If your AI agent or MCP server uses Server-Sent Events (SSE), you can’t load test it like a normal REST API and call it done. SSE is built around long-lived connections over which data streams back to the client, giving the user an interactive, real-time feel.
This guide is a practical how-to for load testing the SSE transports used by agents and MCP services, focused on what breaks in production and how to catch it early. See also this post on load testing recipes for other scenarios.
This post targets SSE-based and legacy MCP deployments. For new deployments, you should evaluate the Streamable HTTP transport instead.
Why this matters
In addition to the usual reasons for load testing any server, an LLM’s response takes far longer to produce than a typical REST response. Your agent streams characters back to the user, so they see the response arriving piece by piece over many seconds.
Many agent frameworks run with a small number of workers by default. While a worker is handling a prompt, it cannot pick up more incoming requests, so an agent can saturate under even a modest request rate.
A single prompt often triggers multiple tool calls to an MCP server. It’s important to have confidence that both your agent and your own MCP server(s) can handle that load.
Quick and dirty SSE overview
A typical request to an AI agent or MCP server running over SSE goes through these phases.
Note: MCP servers support other transports such as stdio and Streamable HTTP. This post is about the SSE transport.
The client sends a POST /foo request with the usual TCP and HTTP semantics. Its body is specific to your agent or MCP server.
The server streams lines of text back to the client. In SSE, events are delimited by a blank line, and each event is composed of one or more field: value lines (data:, event:, id:, retry:). Here’s an example of a session with an agent speaking the AG-UI protocol:
```
data:{"type":"RUN_STARTED", ...}

data:{"type":"TEXT_MESSAGE_CONTENT", "delta":"Hello"}

data:{"type":"TEXT_MESSAGE_CONTENT", "delta":" World"}

data:{"type":"TEXT_MESSAGE_CONTENT", "delta":"!"}

data:{"type":"TEXT_MESSAGE_END"}

data:{"type":"TOOL_CALL_START", ...}

data:{"type":"TOOL_CALL_RESULT", ...}

data:{"type":"TOOL_CALL_END", ...}

data:{"type":"RUN_FINISHED", ...}
```
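To make the framing concrete, here is a minimal sketch of that blank-line-delimited parsing in plain JavaScript. This is for illustration only (the k6 extension used later handles this for you), and `parseSSE` is a made-up helper name:

```javascript
// Minimal sketch of SSE framing: events are separated by a blank line;
// each event is one or more "field: value" lines, and multiple data:
// lines within one event are joined with newlines.
function parseSSE(raw) {
  return raw
    .split("\n\n") // a blank line terminates an event
    .filter((block) => block.trim() !== "")
    .map((block) => {
      const event = {};
      for (const line of block.split("\n")) {
        const idx = line.indexOf(":");
        if (idx <= 0) continue; // skip comments (":...") and malformed lines
        const field = line.slice(0, idx);
        // the spec strips a single leading space after the colon
        const value = line.slice(idx + 1).replace(/^ /, "");
        event[field] = field in event ? event[field] + "\n" + value : value;
      }
      return event;
    });
}

const stream =
  'data:{"type":"RUN_STARTED"}\n\n' +
  'data:{"type":"TEXT_MESSAGE_CONTENT", "delta":"Hello"}\n\n';
const events = parseSSE(stream);
// events[0].data is '{"type":"RUN_STARTED"}'
```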
K6 SSE script
K6 is great for repeatable, CI-friendly tests! The k6/x/sse community extension helps us consume SSE streams. If you’re running K6 version 1.2 or higher, you don’t even need to create a custom build: automatic extension resolution ensures the extension is downloaded and integrated.
```typescript
//Intellisense thinks it's an unresolvable module
//just ignore the linting error for next line
//@ts-ignore
import sse from 'k6/x/sse'
import { check } from 'k6'

export default function () {
    const params = {
        method: "POST",
        body: JSON.stringify({
            jsonrpc: "2.0",
            id: 1,
            method: "tools/call",
            params: {
                name: "mcp-servers-tool-name",
                arguments: { arg1: "Hello" }
            }
        }),
        headers: {
            "Accept": "*/*",
            "Content-Type": "application/json",
        },
        timeout: "10s"
    }

    //This sends the HTTP request. The handler function is responsible
    //for dealing with the received SSE events streamed back over the
    //connection.
    const res = sse.open("https://my-server/mcp", params, myHandler)

    //The return value of open() contains the HTTP status code in
    //res.status. Although it looks similar to a k6 http response,
    //it's a different object and lacks members like body and json()
    check(res, { "no http errors": (res) => res.status === 200 })

    function myHandler(client: any) {
        client.on("error", () => {
            //This example uses the normal k6 check() method
            //to record errors on SSE level
            check(true, { "No SSE errors allowed": () => false })
        })

        client.on("event", (event: any) => {
            try {
                const mcpResponse = JSON.parse(event.data)
                check(mcpResponse, {
                    "No errors from MCP server allowed": (r) => !r.result?.isError
                })
                //Add any more checks you need to verify your MCP server
                //response contains sensible data
            } catch {
                check(false, { "MCP response must be valid JSON": () => false })
            }
        })
    }
}
```
Relevant metrics for your test
For an MCP server tool, watch HTTP requests per second against SSE events per second. If the two rates continuously diverge, the server is saturated and has exceeded its peak capacity. It can also be interesting to see how long the server survives such a peak and whether it recovers after load decreases. K6 scenarios are your friend here.
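A ramp-hold-recover load shape can be expressed with a k6 scenario. This is a sketch; the scenario name, durations, and VU targets below are placeholders you would tune for your own server:

```javascript
// k6 options fragment (a sketch; all durations and VU targets are
// placeholder values). The idea: ramp into saturation, hold the peak,
// then back off and watch whether events-per-second recovers.
export const options = {
  scenarios: {
    overload_and_recover: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 }, // ramp into saturation
        { duration: '5m', target: 50 }, // hold the peak
        { duration: '2m', target: 5 },  // back off...
        { duration: '5m', target: 5 },  // ...and check for recovery
      ],
    },
  },
};
```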
For an AI agent, a custom metric tracking time to first token is a great indicator of how end users will experience the conversation flow.
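One way to capture it, sketched as a small helper (the tracker name and AG-UI event type check are my own; in a k6 script you would add the measured value to a Trend metric):

```javascript
// Sketch of a time-to-first-token tracker (helper name is made up).
// It records elapsed time when the first TEXT_MESSAGE_CONTENT event
// arrives; a k6 script would feed this into a Trend metric, e.g.
//   const ttftTrend = new Trend('time_to_first_token', true)
function makeTtftTracker(now = Date.now) {
  const startedAt = now()
  let elapsed = null
  return {
    // call from the SSE handler: client.on('event', (e) => tracker.onEvent(e))
    onEvent(event) {
      if (elapsed === null && event.data.includes('TEXT_MESSAGE_CONTENT')) {
        elapsed = now() - startedAt // only the first token counts
      }
    },
    value: () => elapsed,
  }
}
```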