Vector Database

A vector database is a database optimized for storing and searching high-dimensional embeddings, the backbone of semantic search and RAG.

Also known as: vector store, embedding database

What is a vector database?

A vector database is a database optimized for storing embeddings (numerical representations of text, images, or other data in high-dimensional space) and running fast similarity search over them. Instead of matching exact keywords, it finds the stored items whose vectors sit closest to a query's vector. Because an embedding model places items with similar meaning near each other, closeness in vector space serves as a proxy for semantic similarity.

Ask a keyword index for "refund policy" and it matches those literal words; ask a vector database and it also surfaces a paragraph about "returning a purchase for your money back", because the embeddings of the two phrases land near each other.

How similarity search works

Each item is first passed through an embedding model, which outputs a vector of hundreds or thousands of floating-point dimensions. The database indexes these vectors using approximate nearest neighbor (ANN) structures, most commonly HNSW graphs, so that finding the closest matches among millions of vectors takes milliseconds rather than requiring a full scan. The trade-off is in the name: an ANN index may occasionally miss a true nearest neighbor in exchange for that speed.
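To make the idea of a graph-based ANN index concrete, here is a minimal sketch of a greedy walk over a nearest-neighbor graph. It is a deliberately simplified, single-layer illustration and not HNSW itself, which adds multiple layers and a wider candidate list. The function names, the graph degree m=8, and the random 2-D points are all assumptions made for the example.

```python
import math
import random


def build_neighbor_graph(points, m):
    """Link every point to its m nearest points (brute force, for clarity only)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        graph[i] = others[:m]
    return graph


def greedy_search(points, graph, query, entry=0):
    """Walk the graph, always stepping to the neighbor closest to the query.

    Stops when no neighbor improves on the current node, which may be a
    local optimum rather than the true nearest point (hence "approximate").
    """
    current = entry
    current_dist = math.dist(points[current], query)
    evaluated = {current}  # distinct points whose distance was computed
    while True:
        best, best_dist = current, current_dist
        for n in graph[current]:
            evaluated.add(n)
            d = math.dist(points[n], query)
            if d < best_dist:
                best, best_dist = n, d
        if best == current:
            return current, current_dist, len(evaluated)
        current, current_dist = best, best_dist


random.seed(0)  # fixed seed so the illustrative data is reproducible
points = [(random.random(), random.random()) for _ in range(500)]
graph = build_neighbor_graph(points, m=8)
query = (0.5, 0.5)
found, dist, evaluated = greedy_search(points, graph, query)
print(f"reached point {found} at distance {dist:.4f} after scoring {evaluated} points")
```

The walk only scores the neighbors of the nodes it passes through, so it typically touches a fraction of the data set, whereas a full scan scores every point. Production indexes build the graph incrementally rather than by brute force as this sketch does.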

Queries follow the same path: embed the query text, search the index, return the top-k nearest items ranked by similarity (typically cosine similarity, though Euclidean distance and dot product are also common). Most vector databases also support metadata filters, so a search can be restricted to one tenant, document set, or date range.
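The query path can be sketched in a few lines. This is a minimal illustration that assumes vectors have already been produced by an embedding model, and it uses an exhaustive scan where a real database would consult its ANN index. The Item class, the search function, and the "tenant" metadata key are hypothetical names chosen for the example.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Item:
    id: str
    vector: list[float]
    metadata: dict = field(default_factory=dict)


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(items, query_vector, k=3, where=None):
    """Return the top-k (score, id) pairs among items matching the filter."""
    where = where or {}
    # Metadata filter: keep only items whose metadata matches every condition.
    candidates = [
        it for it in items
        if all(it.metadata.get(key) == value for key, value in where.items())
    ]
    # Exhaustive scoring stands in for the ANN index lookup.
    scored = [(cosine_similarity(query_vector, it.vector), it.id) for it in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


items = [
    Item("a", [1.0, 0.0], {"tenant": "acme"}),
    Item("b", [0.9, 0.1], {"tenant": "acme"}),
    Item("c", [0.0, 1.0], {"tenant": "acme"}),
    Item("d", [1.0, 0.0], {"tenant": "globex"}),
]
results = search(items, [1.0, 0.0], k=2, where={"tenant": "acme"})
print([item_id for _, item_id in results])  # ['a', 'b']
```

Item "d" points in exactly the query's direction but is excluded by the tenant filter, which is the behavior a multi-tenant deployment relies on.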

Vector databases in RAG

Retrieval-augmented generation (RAG) is the dominant use case. Documents are split into chunks, embedded, and upserted into a vector database; at question time, the system embeds the user's query, retrieves the most relevant chunks, and feeds them into the LLM's context so it answers from the source material instead of from memory.

The vector database is what makes this work at scale — it is the component that turns "a million pages of documentation" into "the five passages that answer this question", inside the latency budget of a chat response.
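The RAG flow described in this section (chunk, embed, upsert, retrieve, assemble context) can be sketched end to end. Everything here is a toy under stated assumptions: embed() is a bag-of-words stand-in that only captures word overlap, where a real embedding model would capture meaning; the store is a plain dictionary rather than a vector database; and the function names and prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


store = {}  # (doc_id, chunk_index) -> (chunk_text, vector)


def upsert(doc_id, document):
    """Insert a document's chunks, overwriting chunks stored under the same key."""
    for i, piece in enumerate(chunk(document)):
        store[(doc_id, i)] = (piece, embed(piece))


def retrieve(question, k=2):
    """Return the k chunk texts most similar to the question."""
    q = embed(question)
    ranked = sorted(store.values(), key=lambda row: cosine(q, row[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question, k=2):
    """Place the retrieved chunks into the context handed to the LLM."""
    context = "\n".join(f"- {c}" for c in retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


upsert("returns", "Refunds are issued within 14 days of a return request.")
upsert("shipping", "Shipping takes three to five business days.")
prompt = build_prompt("How long do refunds take?", k=1)
print(prompt)
```

Only the refund passage reaches the prompt, so the model is asked to answer from that source text rather than from whatever it memorized during training.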

How AI agents use vector databases

Agents reach vector databases through MCP servers that wrap them, exposing operations like search, upsert, and delete as agent-callable tools. The Loomal Index lists servers in its RAG and knowledge-memory categories that wrap popular vector stores, so an agent in Claude Desktop or Cursor can query a knowledge base the same way it calls any other tool.

This pattern also gives agents durable memory: an agent can embed and store what it learns during a session, then retrieve it semantically in later sessions. A raw LLM context window cannot provide this, because its contents do not persist once the session ends.
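The tool-wrapping pattern above can be sketched as a small dispatch table. This is not the MCP SDK or any real server's interface: the tool names mirror the operations named in the text, while VectorMemory, call_tool, the argument shapes, and the word-count embed() are hypothetical stand-ins. The store lives in memory here; a real deployment would persist it so that memories survive between sessions.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse word-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemory:
    def __init__(self):
        self.rows = {}  # id -> (text, vector)

    def upsert(self, id, text):
        self.rows[id] = (text, embed(text))
        return {"ok": True, "id": id}

    def delete(self, id):
        return {"ok": self.rows.pop(id, None) is not None}

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(
            self.rows.items(),
            key=lambda kv: cosine(q, kv[1][1]),
            reverse=True,
        )
        return [{"id": i, "text": text} for i, (text, _) in ranked[:k]]


memory = VectorMemory()

# Each store operation is exposed under a name the agent can call.
TOOLS = {"search": memory.search, "upsert": memory.upsert, "delete": memory.delete}


def call_tool(name, arguments):
    """Dispatch an agent's tool call to the wrapped store operation."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)


# Earlier session: the agent records what it learned.
call_tool("upsert", {"id": "pref-1", "text": "user prefers metric units"})
call_tool("upsert", {"id": "note-1", "text": "project deadline is friday"})

# Later session: the agent recalls by similarity instead of by exact key.
hits = call_tool("search", {"query": "which units does the user prefer", "k": 1})
print(hits[0]["id"])  # pref-1
```

Keeping the agent-facing surface to a name plus a dictionary of arguments is what lets the same agent call a vector store the way it calls any other tool.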

The per-query economics

Vector search has real marginal cost — embedding compute, index memory, and storage all scale with usage — which makes it a natural fit for per-call pricing rather than flat subscriptions. A hosted retrieval endpoint that charges per query aligns revenue with the work each search actually performs.

On the Loomal Index, the owner of a retrieval-backed MCP server can claim the listing and attach x402 pricing — from $0.01 per call, paid in USDC and settled on Base in about two seconds — so agent queries pay for themselves without API keys or subscription onboarding.
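The per-call arithmetic is simple enough to write down. This sketch uses the $0.01 floor price quoted above; the monthly query volume and the per-query serving cost are purely hypothetical figures chosen to show the calculation, not measurements.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.01")  # floor price quoted in the text, in USDC


def revenue(queries, price=PRICE_PER_CALL):
    """Gross revenue when every query pays the per-call price."""
    return price * queries


def margin(queries, cost_per_query, price=PRICE_PER_CALL):
    """Revenue minus serving cost, both of which scale with query count."""
    return (price - cost_per_query) * queries


# Hypothetical inputs for illustration only.
monthly_queries = 25_000
assumed_cost_per_query = Decimal("0.002")  # embedding compute + index + storage

print(revenue(monthly_queries))                         # 250.00
print(margin(monthly_queries, assumed_cost_per_query))  # 200.000
```

Because revenue and cost are both multiplied by the same query count, the margin per query stays constant as usage grows, which is the alignment that per-call pricing is meant to provide.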

Related terms: Embedding, RAG (Retrieval-Augmented Generation), MCP Server, Context Window