Sell model inference to agents

Charge agents per generation in USDC. Put your fine-tuned or specialized model behind a paywall and get paid before each generation runs.

You trained a model that does one thing better than the general-purpose APIs: a domain-tuned LLM, a specialized vision model, a custom diffusion checkpoint. Other agents would pay to call it, but standing up keys, quotas, and billing for them is a project of its own.

Loomal lets you put your model behind a per-generation paywall. An agent calls your endpoint, pays in USDC over x402, and gets the output. Your GPU time is covered on every generation, and you never issue a credential.

Charge for the compute you actually run

Inference cost tracks output length and model size. Per-generation pricing lets a short completion cost a fraction of a cent and a long, high-resolution, or multi-step generation cost more, all on the same endpoint.

You can quote from request parameters: token budget, image dimensions, sampling steps. The agent pays a price tied to the work it asked for, not a flat rate that loses money on the heavy calls.
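As a rough sketch of what quoting from request parameters could look like, the function below derives a price from a token budget or from image dimensions and sampling steps. The request shape, the image rate, and the minimum price are assumptions made up for illustration; only the $0.00002 per token figure comes from the route example on this page. None of these names belong to the Loomal SDK.

```typescript
// Hypothetical request shape; not part of the Loomal SDK.
type GenerationRequest =
  | { kind: "text"; maxTokens: number }
  | { kind: "image"; width: number; height: number; steps: number };

// Rates in micro-USDC (1 USDC = 1_000_000 micro-USDC).
// Working in integer micro-units keeps tiny prices free of rounding surprises.
const MICRO_USDC_PER_TOKEN = 20; // $0.00002 per token, as in the route example
const MICRO_USDC_PER_MEGAPIXEL_STEP = 400; // assumed rate, for illustration only
const MIN_PRICE_MICRO_USDC = 100; // assumed floor of $0.0001 per call

function quoteMicroUsdc(req: GenerationRequest): number {
  let raw: number;
  if (req.kind === "text") {
    raw = req.maxTokens * MICRO_USDC_PER_TOKEN;
  } else {
    // Diffusion-style cost grows with pixel count and with sampling steps.
    const megapixels = (req.width * req.height) / 1_000_000;
    raw = megapixels * req.steps * MICRO_USDC_PER_MEGAPIXEL_STEP;
  }
  if (!Number.isFinite(raw) || raw < 0) {
    throw new RangeError("request parameters must be non-negative numbers");
  }
  // Round up so a heavy call is never quoted below its computed cost.
  return Math.max(MIN_PRICE_MICRO_USDC, Math.ceil(raw));
}

// Format as the dollar string used in the route example.
function formatUsd(microUsdc: number): string {
  return `$${(microUsdc / 1_000_000).toFixed(4)}`;
}
```

Under these assumed rates, a 256-token completion is quoted at 5,120 micro-USDC ($0.0051) and a 1024×1024 image at 30 steps at 12,583 micro-USDC ($0.0126), so the heavier call pays more on the same endpoint.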

One generation, paid for up front

Wrap your inference handler with requirePayment. The agent sends a prompt, Loomal quotes the price and runs the x402 handshake, settlement clears on Base, and your model produces the output only after payment has settled.

route.ts

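import { requirePayment } from "@loomal/sdk";

// `model` stands in for your own inference client; it is not part of the SDK.
export const POST = requirePayment(
  // Quote from the requested token budget: $0.00002 per token.
  (req) => ({ price: `$${(req.body.maxTokens * 0.00002).toFixed(4)}` }),
  // Runs only after payment has settled.
  async (req) => {
    const { prompt, maxTokens } = await req.json();
    return Response.json({ output: await model.generate(prompt, maxTokens) });
  },
);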
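To make the ordering concrete, here is a minimal sketch of a pay-before-compute gate. It is not how requirePayment is implemented; payFirst, the request shape, and the verifier are all hypothetical names chosen for illustration, and the real x402 handshake carries more detail than a single proof string. The one point it shows is that the generate function is never invoked unless verification succeeds; otherwise the caller gets HTTP 402 (Payment Required) with the quoted price.

```typescript
// All names below are hypothetical; this is not the Loomal SDK.
type PaidRequest = { paymentProof?: string; prompt: string };

type GateResult =
  | { status: 402; body: { error: string; price: string } }
  | { status: 200; body: { output: string } };

// Stand-in for whatever checks that payment for `price` has settled.
type Verifier = (proof: string, price: string) => Promise<boolean>;

function payFirst(
  price: string,
  verify: Verifier,
  generate: (prompt: string) => Promise<string>,
) {
  return async (req: PaidRequest): Promise<GateResult> => {
    // Missing or invalid proof: return the quote and skip inference entirely.
    if (!req.paymentProof || !(await verify(req.paymentProof, price))) {
      return { status: 402, body: { error: "payment required", price } };
    }
    // Payment verified: only now is GPU time spent.
    return { status: 200, body: { output: await generate(req.prompt) } };
  };
}
```

Keeping verification and generation as injected functions means the gate can be exercised with a fake verifier and a stub model before any real settlement is involved.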

List your model where agents shop for capability

Your model is listed in the Loomal marketplace and the x402 discovery feed with its task, modality, and price. An agent looking for your specialty can find it and pay per generation without a contract or an integration call.

Revenue settles in USDC on Base to a non-custodial wallet, with a signed receipt per generation so every paid call is provable.

FAQ

Can I price by output length?

Yes. Quote the price from the token budget, image size, or step count in the request so callers pay for the compute they trigger.

Does my model have to be hosted on Loomal?

No. Keep it on your own GPUs and add the middleware, or use a Loomal-hosted endpoint if you prefer not to run infrastructure.

Do calling agents need an account?

No. Any x402-capable client pays inline; there are no keys to issue or rotate.

Start selling to agents.

Wrap an endpoint, set a price, get paid in USDC.
