Embedding

An embedding is a numerical vector that represents text, images, or other data so that semantically similar items end up close together in vector space.

Also known as: vector embedding, text embedding

What is an embedding?

An embedding converts data — most commonly text — into a long list of numbers (a vector) positioned so that items with similar meaning sit near each other in vector space. "How do I reset my password?" and "I'm locked out of my account" share almost no words, but their embeddings land close together because they mean nearly the same thing.

That property is what makes embeddings useful: instead of matching keywords, software can compare meanings by measuring the distance between vectors.

How embeddings are produced

Embeddings come from embedding models: neural networks trained specifically to map inputs to vectors. You send a string to the model and get back a fixed-length array of numbers, typically a few hundred to a few thousand dimensions depending on the model.

Similarity between two embeddings is usually measured with cosine similarity or dot product; for vectors normalized to unit length the two are identical. One important caveat: vectors from different embedding models are not comparable with each other, so if you switch models you must re-embed your entire corpus.
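As a rough illustration of the two similarity measures named above, here is a minimal sketch in plain Python. The three 4-dimensional vectors are made up for the example (real embeddings have hundreds or thousands of dimensions and come from a model), and the function names are illustrative rather than taken from any library.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


def normalize(v):
    """Scale v to unit length; dot product of unit vectors equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


# Made-up vectors for illustration only, not output from a real model.
reset_password = [0.9, 0.1, 0.3, 0.0]
locked_out = [0.8, 0.2, 0.4, 0.1]
pizza_recipe = [0.0, 0.9, 0.1, 0.8]

similar = cosine_similarity(reset_password, locked_out)
unrelated = cosine_similarity(reset_password, pizza_recipe)
print(f"related pair:   {similar:.3f}")
print(f"unrelated pair: {unrelated:.3f}")
```

The related pair scores much higher than the unrelated one. Many pipelines normalize vectors once at write time so that the cheaper dot product can be used at query time.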

Embeddings in semantic search and RAG

Embeddings are the engine behind retrieval-augmented generation (RAG). A document set is chunked, each chunk is embedded, and the vectors are stored in a vector database. At query time, the user's question is embedded too, and the database returns the chunks whose vectors are closest — those become the context the LLM answers from.
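The chunk, embed, store, and query steps above can be sketched in a few lines. This is a toy under stated assumptions: `toy_embed` is a hypothetical stand-in that hashes words into buckets, so it only captures shared vocabulary, not meaning, and `InMemoryIndex` is a hypothetical brute-force list rather than a real vector database. A real system would call an embedding model and a vector store in their place.

```python
import hashlib
import math

DIMS = 256  # toy dimensionality, chosen arbitrarily for this sketch


def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, scaled to unit length."""
    vec = [0.0] * DIMS
    for raw in text.lower().split():
        word = raw.strip(".,!?\"'")
        if not word:
            continue
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % DIMS
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def chunk(document, max_words=40):
    """Split a document into pieces of at most max_words words."""
    words = document.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


class InMemoryIndex:
    """Hypothetical stand-in for a vector database: stores (vector, text) rows."""

    def __init__(self):
        self.rows = []

    def add(self, text):
        self.rows.append((toy_embed(text), text))

    def search(self, query, k=2):
        """Return the k (score, text) rows closest to the query, best first."""
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), text) for vec, text in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


documents = [
    "To reset your password, open Settings and choose Reset password.",
    "Invoices are emailed on the first day of each month.",
    "Two-factor authentication can be enabled from the Security tab.",
]

index = InMemoryIndex()
for doc in documents:
    for piece in chunk(doc):
        index.add(piece)

question = "how do I reset my password"
results = index.search(question, k=1)
context = "\n".join(text for _, text in results)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```

The final `prompt` string is what would be sent to the LLM. With a real embedding model in place of `toy_embed`, the same lookup would also match questions that share no words with the stored chunk.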

The same mechanism powers semantic product search, deduplication, recommendation, and clustering. Anywhere the question is "what in this dataset is most like X?", embeddings are usually the answer.
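Deduplication, for instance, reduces to comparing pairs of vectors and flagging those above a similarity threshold. The sketch below assumes made-up 3-dimensional vectors and an arbitrary threshold of 0.95; the ticket IDs and the `find_near_duplicates` helper are hypothetical, and a sensible threshold depends on the model and the data.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_near_duplicates(items, threshold=0.95):
    """Return (id_a, id_b, score) for every pair scoring at or above threshold.

    items maps an ID to its embedding vector.
    """
    ids = list(items)
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            score = cosine_similarity(items[ids[i]], items[ids[j]])
            if score >= threshold:
                pairs.append((ids[i], ids[j], score))
    return pairs


# Made-up vectors for illustration only, not output from a real model.
tickets = {
    "ticket-1": [0.90, 0.10, 0.30],
    "ticket-2": [0.88, 0.12, 0.31],
    "ticket-3": [0.10, 0.90, 0.20],
}

duplicates = find_near_duplicates(tickets)
for id_a, id_b, score in duplicates:
    print(f"{id_a} and {id_b} look like duplicates (similarity {score:.3f})")
```

Comparing every pair costs time quadratic in the number of items, which is fine for small sets; larger datasets usually rely on a nearest-neighbor index instead of a full pairwise scan.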

Embeddings and MCP servers

Many MCP servers are embedding-powered under the hood. A documentation-search or knowledge-base server typically embeds the agent's query, runs a nearest-neighbor lookup against pre-computed vectors, and returns the top matches as tool output. The agent never sees the vectors — it just gets relevant text back.

Embedding-backed tools also map cleanly onto per-call pricing. Each query has a real marginal cost (an embedding-model call plus a vector lookup), so charging per search via x402 — where the agent pays in USDC before the handler runs — keeps revenue aligned with the compute each call actually consumes.

Embedding vs related terms

An embedding is the vector itself; a vector database is where embeddings are stored and queried at scale; RAG is the end-to-end pattern that uses both to feed retrieved context into an LLM. People sometimes say "embeddings" loosely to mean the whole retrieval pipeline, but strictly it refers only to the numeric representation.

Related terms: Vector Database, RAG (Retrieval-Augmented Generation), LLM (Large Language Model), MCP Server