Loomal

Best Code Execution MCP Servers for AI Agents

Sandboxes, multi-language runners, container runtimes, shell access, and GPU-backed execution — the servers that let agents run the code they write.

Code an agent cannot run is code it cannot verify. Execution servers close the loop: the agent writes a script, runs it, reads the output, and corrects itself — the single biggest quality multiplier in agentic coding. The category covers everything from locked-down sandboxes to raw shell access, and the spread between those two poles is the whole story.
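The write, run, read, correct loop can be sketched in a few lines. This is a minimal illustration, not any listed server's implementation: `run_script` and `refine` are hypothetical names, the list of drafts stands in for successive model outputs, and a plain child process is used where a real deployment would put a sandbox or container.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script(source: str, timeout: float = 10.0) -> tuple[int, str]:
    """Run Python source in a child process; return (exit code, combined output)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "attempt.py"
        path.write_text(source)
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    return proc.returncode, proc.stdout + proc.stderr


def refine(attempts: list[str]) -> tuple[str, str]:
    """Try each candidate in order and stop at the first that exits cleanly.

    A real agent would generate the next candidate from the previous error
    text; here the candidates are supplied up front to keep the sketch small.
    """
    output = ""
    for source in attempts:
        code, output = run_script(source)
        if code == 0:
            return source, output
    raise RuntimeError(f"no attempt succeeded; last output:\n{output}")


# The first draft fails with a NameError; the second is the corrected version.
drafts = ["print(totl)", "total = 2 + 3\nprint(total)"]
winner, result = refine(drafts)
```

The point of the sketch is the feedback signal: the exit code and captured output are what the agent reads before deciding whether to try again.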

Loomal tracks 33 live servers tagged Code Execution. The sample below covers the main execution models; each entry links to a marketplace listing with full tool details.

The isolation spectrum

Every execution server sits somewhere on a line from 'cannot hurt you' to 'can do anything you can'. At the safe end, runno, the category's most-starred listing at 767 stars, wraps the Runno sandbox, giving agents a contained place to run code with no access to the host. code-runner applies the same runner pattern across multiple programming languages.

The middle ground is containers: podman-mcp-server drives Podman and Docker runtimes, so each agent task gets a disposable, resource-limited environment — the isolation model most production deployments land on. At the open end sits Shell Exec, which runs bash with background job support: maximally useful, minimally protected, appropriate only where the agent is trusted like a colleague with your terminal.
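To make "disposable, resource-limited" concrete, here is a sketch that builds a `podman run` command line by hand. It is an assumption about what such a container invocation might look like, not the tool interface of podman-mcp-server; the helper name `podman_argv`, the image tag, and the default limits are illustrative. The flags themselves (`--rm`, `--memory`, `--cpus`, `--network`) are standard `podman run` options.

```python
import shlex


def podman_argv(
    image: str,
    command: list[str],
    memory: str = "256m",
    cpus: float = 1.0,
    network: bool = False,
) -> list[str]:
    """Build an argv for a throwaway container with memory and CPU caps."""
    argv = ["podman", "run", "--rm", f"--memory={memory}", f"--cpus={cpus}"]
    if not network:
        # No network namespace access unless the caller opts in.
        argv.append("--network=none")
    return argv + [image, *command]


# Illustrative image and command; nothing is executed here, only assembled.
argv = podman_argv(
    "docker.io/library/python:3.12-slim",
    ["python", "-c", "print(6 * 7)"],
)
command_line = shlex.join(argv)
```

`--rm` is what makes the environment disposable: the container is deleted when the command exits, so nothing the agent wrote survives into the next task.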

Execution beyond the local machine

Some of the most interesting listings execute somewhere unusual. mcp-server-colab-exec runs Python on Google Colab's T4 and L4 GPU runtimes from any MCP client — an agent can train a small model or run CUDA workloads without you owning a GPU. The Qiskit IBM Runtime server goes further afield, submitting quantum circuits to IBM's hardware.

Execution also turns out to be a substrate for non-coding domains: build123d MCP executes parametric CAD code and renders, measures, and exports geometry; cesium-mcp-runtime drives a CesiumJS 3D globe for spatial analysis. The pattern is the same loop — write code, run it, inspect the result — pointed at solids and maps instead of test suites.

How to choose an execution model

Decide on the blast radius first and the feature list second. Untrusted or experimental agents get a sandbox (runno) or a container (podman-mcp-server); only well-supervised workflows on dev machines earn direct shell access. Statefulness is the next question: RLM Tools provides a persistent Python sandbox, which keeps variables and loaded data alive across calls — far more token-efficient for iterative exploration than re-running setup code every invocation.
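The difference persistence makes is easy to see in miniature. The sketch below is not RLM Tools' implementation: `PersistentSession` is a hypothetical class, and Python's `exec` offers no isolation at all, so treat it purely as an illustration of state surviving between calls.

```python
import contextlib
import io


class PersistentSession:
    """Keep one namespace alive across calls, the way a stateful sandbox would."""

    def __init__(self) -> None:
        self.namespace: dict = {}

    def run(self, source: str) -> str:
        """Execute source against the shared namespace and return what it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)  # illustration only: exec is NOT a sandbox
        return buffer.getvalue()


session = PersistentSession()
session.run("data = list(range(1000))")   # setup happens once
out = session.run("print(sum(data))")     # later calls reuse `data` directly
```

In a stateless runner the second call would have to resend and re-run the setup line, and its output would land in the agent's context again on every step.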

For regulated environments, this category even includes its own compliance layer: the Vaara servers produce tamper-evident, EU AI Act-oriented runtime evidence for MCP activity, useful when 'the agent ran code' needs an audit trail.
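One common way to make a log tamper-evident is a hash chain, where each entry commits to the one before it. The sketch below shows that general technique under stated assumptions; it is not the Vaara servers' format, and the function names and event fields are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev + payload).encode()).hexdigest()


def append_entry(log: list[dict], event: dict) -> None:
    """Append an event, chaining it to the hash of the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_entry(log, {"tool": "run_code", "exit_code": 0})
append_entry(log, {"tool": "run_code", "exit_code": 1})
intact = verify(log)

log[0]["event"]["exit_code"] = 1  # rewrite history after the fact
still_valid = verify(log)
```

A chain like this detects edits but does not prevent them; an auditor still needs the latest hash anchored somewhere the log's owner cannot rewrite.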

Paying for compute by the call

Execution is the category where per-call pricing is most literal: every invocation consumes measurable CPU, GPU, or sandbox time. Self-hosting the open-source servers here costs only your own hardware, but hosted execution — especially GPU-backed — is a metered product by nature. On Loomal, maintainers who claim their listing can attach x402 pricing from $0.01 per call; agents pay in USDC before the handler runs, settlement hits Base in about two seconds, and receipts are Ed25519-signed. Loomal's 5% fee on settled transactions is currently waived.
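The ordering described above (price quoted, payment made, handler run) and the fee arithmetic can be modeled without any network code. This is a toy model, not the x402 wire protocol or Loomal's API: `PaidHandler`, `net_revenue`, and the request IDs are hypothetical, and settlement and receipt signing are left out entirely.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class PaidHandler:
    """Refuse to run work for a request until that request has been paid for."""

    price: Decimal
    paid: set[str] = field(default_factory=set)

    def pay(self, request_id: str, amount: Decimal) -> None:
        if amount < self.price:
            raise ValueError("underpayment")
        self.paid.add(request_id)

    def call(self, request_id: str, source: str) -> tuple[int, str]:
        if request_id not in self.paid:
            # Mirrors the HTTP 402 step: quote the price, do no work.
            return 402, f"payment required: {self.price} USDC"
        self.paid.discard(request_id)  # one payment buys one call
        return 200, f"ran {len(source)} bytes"


def net_revenue(price: Decimal, calls: int, fee_rate: Decimal) -> Decimal:
    """Maintainer's take after a percentage fee on settled volume."""
    gross = price * calls
    return gross - gross * fee_rate


handler = PaidHandler(price=Decimal("0.01"))
before = handler.call("req-1", "print('hi')")
handler.pay("req-1", Decimal("0.01"))
after = handler.call("req-1", "print('hi')")

with_fee = net_revenue(Decimal("0.01"), 1000, Decimal("0.05"))
fee_waived = net_revenue(Decimal("0.01"), 1000, Decimal("0"))
```

At the $0.01 minimum, 1,000 calls gross 10 USDC; the 5% fee would take 0.50 USDC of that, and while the fee is waived the maintainer keeps the full amount.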

All 33 Code Execution listings: loomal.ai/marketplace?category=Code%20Execution.

Frequently asked questions

What's the safest way to let an agent execute code?

Use a sandbox or container server rather than raw shell access: runno for contained execution, or podman-mcp-server for disposable Podman/Docker environments with resource limits. Reserve direct shell tools like Shell Exec for supervised use on machines you'd let the agent break.

Can an agent run GPU workloads through MCP?

Yes. mcp-server-colab-exec executes Python on Google Colab T4/L4 GPU runtimes from any MCP client, which covers small training runs and CUDA work without local hardware. Note that you are still bound by Colab's own usage terms and quotas.

Why do execution servers emphasize persistence and token efficiency?

Because agents iterate. A persistent sandbox like RLM Tools keeps state between calls, so the agent doesn't re-import libraries and reload data every step — that saves both wall-clock time and the context tokens that re-printing setup output would burn.

How does pay-per-call work for hosted execution?

The maintainer claims their Loomal listing and sets a USDC price per call, minimum $0.01. An x402-capable agent gets an HTTP 402 with the price, pays, and only then does the code run — settlement on Base takes about two seconds and there are no chargebacks, which matters when the cost is irreversible compute.
