Sampling (MCP)

Sampling is an MCP feature that lets a server request an LLM completion from the connected client, enabling agentic behavior inside the server without its own model access.

Also known as: MCP sampling, sampling/createMessage

What is sampling in MCP?

Sampling is the Model Context Protocol feature that reverses the usual relationship between server and model: instead of only responding to tool calls, an MCP server can ask the connected client to run an LLM completion on its behalf. The name comes from the underlying operation — sampling tokens from a language model.

Concretely, a server handling a tool call might need intelligence mid-task: summarize a 50-page result before returning it, classify an ambiguous input, or decide which of three strategies to take next. Sampling lets it borrow the client's model for exactly that step.

How a sampling request works

Sampling is a client capability negotiated at connection time. When a server needs a completion, it sends a sampling/createMessage request containing the messages to complete, plus optional model preferences — hints about whether to prioritize speed, cost, or intelligence — a system prompt, and a max token count. The client runs the completion against whatever model it has access to and returns the result.

The specification puts a human in the loop by design: clients should give the user visibility and control over incoming sampling requests, since a server is effectively asking to spend the user's tokens and inject content into a model conversation. The server never learns the client's API key and never chooses the exact model; it only expresses preferences.

Why sampling exists

Without sampling, any server needing model access would have to ship its own API key and bill its own usage — pushing credential management and inference cost onto every server author and fragmenting model access across the ecosystem. Sampling centralizes both in the client: one key, one bill, one place to enforce policy, while servers stay stateless and credential-free.

This division also makes server logic more portable. The same server gets smarter or cheaper depending on which client connects to it, without a single code change.

Sampling and agentic servers

Sampling turns servers from passive tool providers into participants that can reason. A research server can iteratively refine queries based on intermediate results; a data-cleaning server can resolve edge cases that defeat regex. Combined with tools, it allows multi-step agent loops to live behind a single MCP tool call.

There is a commercial angle worth noting: a server whose tool calls embed model-quality reasoning is delivering more value per call than a thin API wrapper, which supports a higher per-call price when the server is monetized — on Loomal, anywhere from the $0.01 minimum upward, paid by the calling agent over x402 before the handler runs.

Limitations to be aware of

Client support for sampling is uneven — as of mid-2026 many popular MCP clients still do not implement it, so production servers treat it as an enhancement with a fallback path rather than a dependency. Check your target client's documentation before building around it. There is also a latency cost: every sampling round-trip includes a full model completion, so a tool that samples repeatedly will feel slow compared to one that computes directly.

Roots (MCP)MCP Client Large Language Model Tool Calling