A/B test pricing for agent-facing APIs
Agents read your price out of a 402 response, not a pricing page. That makes price experiments cheap to run and easy to measure — if you design them properly.
With human customers, changing a price means updating a pricing page, grandfathering plans, and emailing everyone. With x402, the price is a field in the payment requirements your server returns on the 402 response. Change the field, and every subsequent call is quoted the new price.
That property makes agent-facing APIs unusually testable. This guide covers how to structure a price experiment on an MCP server, what to measure, and how to avoid the traps that make the data lie to you.
Why agents make pricing testable
An agent decides whether to pay your price programmatically, at call time, against whatever budget its operator gave it. There is no sticker shock, no negotiation, no sales cycle — just a quoted price and an accept-or-skip decision. That decision repeats thousands of times, which is exactly the kind of signal A/B testing needs.
Because the minimum x402 price is $0.01 and settlement is per call, even small experiments produce real revenue data rather than survey answers.
Pick one variable and a clean split
Test one thing: the per-call price of one tool. Don't change the tool's description, output, or latency in the same window, or you won't know what moved the numbers.
Two split strategies work. Time-based rotation — run $0.01 for a week, then $0.03 for a week — is simplest and avoids quoting different callers different prices for the same request. Per-request assignment converges faster but means two agents can see two prices simultaneously; if your listing or any index caches a quoted price, keep windows long and prefer rotation.
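If you do go with per-request assignment, one way to keep it consistent is to derive the variant from a hash of the caller, so the same agent is always quoted the same price for the length of the test. The sketch below is illustrative only: it assumes you already have some stable caller identifier (the `callerId` argument is hypothetical), and `VARIANTS`, `hash32`, and `assignVariant` are names made up for this example.

```typescript
// Hypothetical variants for one tool; the prices mirror this guide's examples.
const VARIANTS: readonly string[] = ["$0.01", "$0.03"];

// 32-bit FNV-1a followed by the MurmurHash3 finalizer, so that IDs differing
// only in their last characters still spread across buckets.
function hash32(input: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // convert to an unsigned 32-bit integer
}

// Assumption: callerId is a stable identifier you already hold for the caller.
// The experiment name salts the hash so a later test reshuffles the buckets.
function assignVariant(callerId: string, experiment: string): string {
  const bucket = hash32(`${experiment}:${callerId}`) % VARIANTS.length;
  return VARIANTS[bucket];
}
```

Hashing rather than calling a random number generator means a caller that retries or returns later lands in the same bucket without you storing any assignment state.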
Implement the variant
Your payment gate already computes a price per request; make that computation read from a variant schedule instead of a constant.
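// Each slot is active until its `until` date (exclusive).
// Date-only strings such as "2026-06-21" are parsed as midnight UTC.
const SCHEDULE = [
  { until: "2026-06-21", price: "$0.01" },
  { until: "2026-06-28", price: "$0.03" },
];
export function currentPrice(now = new Date()): string {
  // First slot whose end date is still in the future.
  const slot = SCHEDULE.find((s) => now < new Date(s.until));
  return slot?.price ?? "$0.02"; // fallback after the test
}
Measure the right three numbers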
Per variant, track: paid calls per day, the 402-to-paid conversion rate (the share of 402 responses that were followed by a payment), and revenue per day (price times paid calls). Signed receipts give you an exact, verifiable count of paid calls, so revenue attribution is straightforward.
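As a rough sketch, the three numbers can be computed from a log of quoted calls. The record shape here is an assumption, not a format any gateway is known to emit: `CallEvent`, `VariantStats`, and `summarize` are hypothetical names, and prices are held in integer cents to avoid floating-point drift when summing.

```typescript
// Hypothetical log record: one entry per 402 served.
interface CallEvent {
  priceCents: number; // quoted price in cents, e.g. 3 for $0.03
  paid: boolean;      // true if a settled payment followed the 402
}

interface VariantStats {
  quoted: number;              // 402 responses served
  paidCalls: number;           // of those, how many were paid
  conversion: number;          // paidCalls / quoted (0 when nothing was quoted)
  paidCallsPerDay: number;
  revenueCentsPerDay: number;
}

// Summarize the events of ONE variant over a window of `days` days.
function summarize(events: CallEvent[], days: number): VariantStats {
  const paid = events.filter((e) => e.paid);
  const revenueCents = paid.reduce((sum, e) => sum + e.priceCents, 0);
  return {
    quoted: events.length,
    paidCalls: paid.length,
    conversion: events.length === 0 ? 0 : paid.length / events.length,
    paidCallsPerDay: paid.length / days,
    revenueCentsPerDay: revenueCents / days,
  };
}
```

Call it once per variant with that variant's events and the number of days it ran, then compare the resulting objects side by side.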
Conversion is the early-warning metric. If a price hike drops conversion sharply, you have probably crossed a budget ceiling set by agent operators: many frameworks cap per-call or per-session spend, and above the cap your tool silently falls out of consideration.
Read the results like elasticity, not like a vote
If you triple the price and volume barely moves, you were underpriced — take the revenue. If volume collapses, the agents calling you had cheaper substitutes; check what comparable tools in your category quote before cutting further. The floor is $0.01, so 'race to free' isn't an available failure mode.
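One way to put a number on "barely moves" versus "collapses" is the arc (midpoint) price elasticity of demand. The helper below is a sketch with made-up inputs, and `arcElasticity` is a name chosen for this example. With the midpoint formula, an absolute value below 1 means revenue went up with the higher price, and above 1 means it went down.

```typescript
// Arc (midpoint) price elasticity between two variants.
// p1, p2: prices; q1, q2: paid calls per day at each price.
function arcElasticity(p1: number, q1: number, p2: number, q2: number): number {
  const pctQuantity = (q2 - q1) / ((q1 + q2) / 2);
  const pctPrice = (p2 - p1) / ((p1 + p2) / 2);
  return pctQuantity / pctPrice;
}

// Illustrative numbers only: price goes from 1 cent to 3 cents.
const barelyMoves = arcElasticity(1, 1000, 3, 900); // volume dips slightly
const collapses = arcElasticity(1, 1000, 3, 200);   // volume falls by 80%

console.log(barelyMoves.toFixed(3)); // about -0.105: inelastic, revenue 1000 -> 2700 cents/day
console.log(collapses.toFixed(3));   // about -1.333: elastic, revenue 1000 -> 600 cents/day
```

The midpoint form is used here because it gives the same magnitude whichever variant you treat as the baseline.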
On Loomal, repricing is a single field in the console, and the change takes effect on the next 402 your listing serves. There's no platform friction to running another round: Loomal's fee is 5% on settled transactions, currently waived, regardless of which price wins.
FAQ
Won't agents get confused if my price changes between calls?
No. The x402 flow quotes the price fresh on every 402 response, and the agent signs an authorization for that specific amount. Each call is a self-contained transaction, so a price change simply applies to the next call. Just avoid switching prices in the short window between a 402 and the agent's paid retry, since the amount the agent signed would no longer match the current quote.
How long should each price variant run?
Long enough to cover your traffic's natural weekly cycle — usually one to two weeks per variant for a moderately trafficked tool. If you get thousands of calls a day, a few days per variant is enough; if you get dozens, run longer or the noise will swamp the signal.
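To check whether a difference in conversion is more than noise, one common tool is a two-proportion z-test. The sketch below is a rough guide rather than a full analysis: it treats every 402 as an independent trial, which repeated calls from the same agent will violate, and `conversionZ` is a name invented for this example.

```typescript
// Two-proportion z-test on 402-to-paid conversion.
// paidA / quotedA: paid calls and 402s served for variant A; likewise for B.
function conversionZ(paidA: number, quotedA: number, paidB: number, quotedB: number): number {
  const rateA = paidA / quotedA;
  const rateB = paidB / quotedB;
  const pooled = (paidA + paidB) / (quotedA + quotedB);
  const stdErr = Math.sqrt(pooled * (1 - pooled) * (1 / quotedA + 1 / quotedB));
  return (rateA - rateB) / stdErr;
}

// Same conversion rates (50% vs 45%), different traffic levels.
const highTraffic = conversionZ(500, 1000, 450, 1000);
const lowTraffic = conversionZ(50, 100, 45, 100);

// |z| above roughly 1.96 corresponds to the usual 95% two-sided threshold.
console.log(Math.abs(highTraffic) > 1.96); // true: the gap is unlikely to be noise
console.log(Math.abs(lowTraffic) > 1.96);  // false: not enough calls to tell
```

The same five-point gap clears the threshold at a thousand quotes per variant and misses it at a hundred, which is why low-traffic tools need longer windows.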
Do I need different infrastructure to run the test?
No. The price is computed by your payment gate at request time, so a variant schedule is a few lines of code. Counting results needs nothing exotic either — paid calls come with Ed25519-signed receipts, so your logs are the ledger.
What if both variants perform the same?
Then price probably isn't your binding constraint within the range you tested; discovery is the more likely limit. Spend the effort on your listing's tool descriptions and category placement instead, and retest pricing once call volume is high enough to discriminate between variants.
Set a price. Then test it.
Reprice your listing in one field and watch the call volume respond.