Testing MCP Servers: From Unit Tests to CI

Automated Testing Strategies for AI Tool Integrations in CI Pipelines

Exo · March 31, 2026 · 5 min read

TLDR

  • Unit test each tool handler across four categories: happy path, invalid inputs, edge cases, and error conditions with mocked dependencies.

  • Use the MCP SDK’s InMemoryTransport and mock clients for integration tests that verify the full protocol stack, including schema discovery and error propagation.

  • Add schema snapshot testing to your CI pipeline so unintended tool schema changes are flagged before they break AI client integrations.


Custom MCP server development does not end when the server runs. MCP servers are infrastructure that AI models depend on for every tool call, resource read, and prompt expansion. A broken tool handler means the AI assistant fails in production. A schema mismatch means the model sends arguments your server cannot parse. A regression in error handling means cryptic failures that are hard to debug.

Testing MCP servers requires strategies at three levels: unit tests for individual tool handlers, integration tests that exercise the full protocol stack, and CI/CD pipelines that catch regressions before deployment.

Unit Testing Tool Handlers

The simplest and most valuable tests target individual tool handlers in isolation. A tool handler is an async function that takes validated input and returns a result. You can test it directly without spinning up the full MCP server.

For each tool, test four categories:

Happy path. Valid inputs that should produce correct results. Test with representative data that matches real usage patterns. If the tool queries a database, use a test database with known data and verify the output matches expectations.

Invalid inputs. Arguments that fail schema validation (wrong types, missing required fields, out-of-range values). The Zod schema should catch these before your handler runs, but test that the error messages are clear enough for the AI model to understand what went wrong and retry with correct arguments.

Edge cases. Empty strings, maximum-length inputs, special characters, Unicode, null bytes. AI models generate surprising inputs. Test that your handler does not crash or produce unexpected results with boundary values.

Error conditions. Network failures, database timeouts, permission denied, resource not found. Mock external dependencies and verify that your handler returns structured error responses (isError: true with a descriptive message) rather than throwing unhandled exceptions.
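As a sketch of those four categories, here is a plain-assertion test of a hypothetical `lookupUser` handler with an injected, mocked database. All names here (`Db`, `lookupUser`, `findUserByEmail`) are illustrative, not part of the MCP SDK:

```typescript
// Hypothetical handler under test: finds a user by email. The database is
// injected so tests can mock it without a real connection.
type Db = { findUserByEmail(email: string): Promise<{ id: string } | null> };
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

async function lookupUser(db: Db, email: string): Promise<ToolResult> {
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) {
    // Invalid input: a clear message lets the model retry with fixed arguments.
    return { content: [{ type: "text", text: `Invalid email address: "${email}"` }], isError: true };
  }
  try {
    const user = await db.findUserByEmail(email);
    return user
      ? { content: [{ type: "text", text: JSON.stringify(user) }] }
      : { content: [{ type: "text", text: `No user found for ${email}` }], isError: true };
  } catch (err) {
    // Error condition: return a structured error, never an unhandled throw.
    return { content: [{ type: "text", text: `Lookup failed: ${(err as Error).message}` }], isError: true };
  }
}

// Mocked dependencies: one with known data, one that simulates a timeout.
const db: Db = {
  async findUserByEmail(email) {
    return email === "test@example.com" ? { id: "u1" } : null;
  },
};
const failingDb: Db = {
  async findUserByEmail() {
    throw new Error("connection timeout");
  },
};

// Happy path: known email returns the matching user.
const ok = await lookupUser(db, "test@example.com");
if (ok.isError || !ok.content[0].text.includes("u1")) throw new Error("happy path failed");

// Invalid input: malformed email is rejected with a structured error.
const bad = await lookupUser(db, "not-an-email");
if (!bad.isError) throw new Error("invalid input accepted");

// Edge case: empty string is also rejected.
const empty = await lookupUser(db, "");
if (!empty.isError) throw new Error("empty input accepted");

// Error condition: the mocked timeout surfaces as isError, not an exception.
const failed = await lookupUser(failingDb, "test@example.com");
if (!failed.isError) throw new Error("timeout not surfaced as structured error");
```

Because the handler takes its dependencies as arguments, the same function can be registered with the real database in production and exercised with mocks in tests.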

Integration Testing with Mock Clients

Unit tests verify handler logic, but they do not test the protocol layer. Integration tests spin up the full server and communicate with it through a mock MCP client, exercising the complete request/response cycle.

The MCP SDK provides a Client class that connects to a server over any transport. For testing, use an in-memory transport that connects client and server without network or process boundaries:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { InMemoryTransport } from "@modelcontextprotocol/sdk/inMemory.js";

// `server` is your already-constructed McpServer instance with tools registered.
const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
await server.connect(serverTransport);

const client = new Client({ name: "test-client", version: "1.0.0" });
await client.connect(clientTransport);

With the client connected, you can test the full protocol:

Capability discovery. Call client.listTools() and verify the returned tool list matches your registered tools. Check that names, descriptions, and input schemas are correct. This catches schema definition bugs that unit tests miss.

Tool execution through the protocol. Call client.callTool({ name: "lookup-user", arguments: { email: "test@example.com" } }) and verify the response. This tests the full pipeline: JSON-RPC serialization, input validation, handler execution, and response formatting.

Resource reading. Call client.readResource({ uri: "config://app/settings" }) and verify the returned content and MIME type. Test both static and dynamic resources.

Error propagation. Call tools with invalid arguments and verify the client receives appropriate error responses. Test that protocol-level errors (unknown tool name) and tool-level errors (handler failure) are returned correctly.

Using the MCP Inspector for Manual Testing

The MCP Inspector is an interactive debugging tool provided by the MCP project. Run it with npx @modelcontextprotocol/inspector and point it at your server. It provides a visual interface for:

  • Viewing all registered tools, resources, and prompts with their schemas

  • Manually invoking tools with custom arguments and inspecting responses

  • Reading resources and verifying content and MIME types

  • Watching the JSON-RPC message stream in real time

  • Testing the initialization handshake and capability negotiation

The Inspector is invaluable during development. Use it before connecting a real AI client to verify that your server behaves correctly at the protocol level. It catches issues that are hard to debug through an AI interface: malformed schemas, incorrect content types, missing error codes.

Schema Snapshot Testing

Tool schemas are a contract between your server and every AI client that connects to it. A change in a tool’s input schema can break existing integrations. Schema snapshot testing catches unintended changes.

The approach: in your test suite, connect a client to the server, call listTools(), and compare the result against a stored snapshot. If the schemas change, the test fails. You review the diff, verify the change is intentional, and update the snapshot.
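The comparison itself needs no framework support. A minimal sketch, assuming the tool list has already been fetched via listTools() and the snapshot is a checked-in file (the helper names and sample tool here are hypothetical):

```typescript
// Shape of a tool entry as returned by listTools() (simplified).
type ToolSchema = { name: string; description?: string; inputSchema: unknown };

// Serialize deterministically: sort by name so registration order
// never produces a spurious diff.
function serializeSchemas(tools: ToolSchema[]): string {
  const sorted = [...tools].sort((a, b) => a.name.localeCompare(b.name));
  return JSON.stringify(sorted, null, 2);
}

// Fail loudly when the serialized schemas drift from the stored snapshot.
function assertSchemasMatch(current: ToolSchema[], snapshot: string): void {
  if (serializeSchemas(current) !== snapshot) {
    throw new Error(
      "Tool schemas changed; review the diff and update the snapshot if intentional.",
    );
  }
}

// In a real suite, `tools` comes from client.listTools() and the snapshot
// from a checked-in file (e.g. read with fs.readFileSync).
const tools: ToolSchema[] = [
  { name: "lookup-user", description: "Find a user by email", inputSchema: { type: "object" } },
];
const snapshot = serializeSchemas(tools);

// Unchanged schemas pass silently.
assertSchemasMatch(tools, snapshot);

// Any drift, even in a description, fails the check.
const changed = [{ ...tools[0], description: "Changed wording" }];
let drifted = false;
try {
  assertSchemasMatch(changed, snapshot);
} catch {
  drifted = true;
}
```

Dedicated snapshot support in test runners (e.g. `toMatchSnapshot` in Jest or Vitest) gives you the same behavior plus a built-in workflow for reviewing and updating snapshots.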

This is especially important for MCP servers that multiple teams depend on. A schema change that looks minor (renaming a field, changing a type from string to number) can break every client using that tool.

CI/CD Pipeline Integration

MCP server tests belong in your CI pipeline like any other critical infrastructure component. Here is a practical CI configuration:

Build stage: Compile the server and verify type correctness. For TypeScript, run tsc with strict mode. For Rust, cargo build catches type errors and ownership violations.

Unit test stage: Run handler-level tests with mocked dependencies. These should be fast (under 30 seconds for the full suite) and require no external services.

Integration test stage: Spin up the server with a test database or mock services. Run the full client-server test suite. This stage may need Docker for external dependencies.

Schema validation stage: Compare current tool schemas against snapshots. Flag any changes for review. This prevents accidental breaking changes from reaching production.

Security scan stage: Run input validation tests that simulate prompt injection attacks. Verify that SQL injection, path traversal, and other attack vectors are blocked by your validation layer.
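For the security stage, a path traversal check can be as small as the following sketch. The `resolveSafe` helper is a hypothetical validation function, not an SDK API; it only relies on Node's built-in `path` module:

```typescript
import path from "node:path";

// Hypothetical validation helper: resolve a requested file path and reject
// anything that escapes the allowed root directory (path traversal).
function resolveSafe(root: string, requested: string): string | null {
  const rootResolved = path.resolve(root);
  const resolved = path.resolve(root, requested);
  return resolved === rootResolved || resolved.startsWith(rootResolved + path.sep)
    ? resolved
    : null;
}

// Security tests simulate the inputs an injected prompt might produce.
if (resolveSafe("/srv/data", "../../etc/passwd") !== null) {
  throw new Error("path traversal was not blocked");
}
if (resolveSafe("/srv/data", "reports/q1.csv") === null) {
  throw new Error("legitimate relative path was rejected");
}
```

The same pattern applies to SQL injection tests: feed handler inputs like `"'; DROP TABLE users; --"` through the validation layer and assert they are rejected or safely parameterized.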

The Testing Pyramid for MCP Servers

The standard testing pyramid applies, but with MCP-specific adjustments. The base is unit tests for handlers (fast, isolated, high coverage). The middle layer is integration tests with mock clients (protocol correctness, schema validation). The top layer is end-to-end tests with a real AI model (expensive, slow, but catches issues the other layers miss).

End-to-end tests with a real model are optional but valuable for critical servers. Connect an AI client to your server, send a natural language prompt that should trigger a specific tool, and verify the tool was called with correct arguments. This catches description quality issues: if the model does not call your tool when it should, the description needs improvement.


Exo builds and tests production MCP servers for technical teams. Our testing practices come from building infrastructure that handles real traffic for 26 protocols. We bring the same rigor to MCP server development. Ready to build? Reach out at founders@exotechnologies.xyz