Now in private beta

Your context window
is full of duplicates

30-40% of RAG context is semantically redundant: the same information from docs, code, and memory competing for tokens. Distill removes the duplicates and maximizes diversity in ~12ms.

How it works

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  Vector DB  │────▶│  Over-fetch │────▶│   Cluster   │────▶│  Select +   │────▶ LLM
│  (Pinecone) │     │    (50)     │     │    (12)     │     │  MMR (8)    │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
                          │                   │                   │
                     Retrieve 3-5x      Group similar       Pick diverse
                     more chunks        chunks together     representatives

Distill sits between your vector database and LLM. No infrastructure changes required.
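
For concreteness, here is a minimal sketch of that drop-in flow. The Distill client and its compress call are hypothetical placeholders for the beta API, the query is written in Pinecone's Python SDK style, and index and query_embedding are assumed to come from your existing retrieval setup:

    from distill import Distill  # hypothetical beta client

    distill = Distill(api_key="...")

    # 1. Over-fetch: pull 3-5x more chunks than you plan to keep.
    matches = index.query(vector=query_embedding, top_k=50,
                          include_metadata=True).matches
    chunks = [m.metadata["text"] for m in matches]

    # 2-4. Cluster near-duplicates and select 8 diverse representatives
    #      via MMR before the context reaches the LLM.
    context = distill.compress(chunks, keep=8)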

Two problems, one solution

RAG pipeline optimization

Your vector DB returns similar chunks from different sources: docs, code, changelogs. Same answer, multiple times.

Distill clusters and deduplicates before the context reaches your LLM.
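
As an illustration of the dedup step, here is a greedy cosine-similarity clustering sketch in NumPy. It is a stand-in, not Distill's actual algorithm; the 0.9 threshold is an assumption, and it keeps the first (highest-ranked) chunk of each cluster as its representative:

    import numpy as np

    def cluster_dedup(embeddings: np.ndarray, threshold: float = 0.9) -> list[int]:
        # Normalize rows so dot products are cosine similarities.
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        reps: list[int] = []
        for i in range(len(normed)):
            # Join an existing cluster if any representative is similar
            # enough; otherwise this chunk starts (and represents) a new one.
            if not reps or (normed[reps] @ normed[i]).max() < threshold:
                reps.append(i)
        return reps  # indices of one representative per cluster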

Context window efficiency

Every token counts. Redundant context wastes budget and dilutes the signal your model needs.

Get more diverse information in fewer tokens. Better answers, lower cost.

~12ms overhead

Distance matrix, clustering, selection, and MMR re-ranking. All in under 15ms.
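
Those stages are mostly a handful of matrix operations, which is why they stay fast. For reference, a self-contained MMR selection sketch in NumPy; the lam parameter and its defaults are illustrative assumptions, not Distill's API:

    import numpy as np

    def mmr_select(query: np.ndarray, cands: np.ndarray,
                   k: int = 8, lam: float = 0.7) -> list[int]:
        # MMR: repeatedly pick the candidate maximizing
        #   lam * sim(query, d) - (1 - lam) * max sim(d, already selected)
        q = query / np.linalg.norm(query)
        c = cands / np.linalg.norm(cands, axis=1, keepdims=True)
        relevance = c @ q        # sim(query, d) for every candidate
        pairwise = c @ c.T       # candidate-candidate similarity matrix
        selected = [int(np.argmax(relevance))]  # seed with the most relevant
        while len(selected) < min(k, len(cands)):
            redundancy = pairwise[:, selected].max(axis=1)
            scores = lam * relevance - (1 - lam) * redundancy
            scores[selected] = -np.inf  # never re-pick a chosen chunk
            selected.append(int(np.argmax(scores)))
        return selected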

Tunable parameters

Adjust clustering threshold and MMR lambda to balance relevance and diversity for your use case.
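
Reusing the cluster_dedup and mmr_select sketches above (with embeddings, query_vec, and chunks assumed from your retrieval step), tuning looks like this; the values shown are illustrative:

    # Lower threshold -> more aggressive merging of near-duplicates.
    reps = cluster_dedup(embeddings, threshold=0.85)

    # Lower lam -> more weight on diversity; higher -> on query relevance.
    picks = mmr_select(query_vec, embeddings[reps], k=8, lam=0.5)
    context = [chunks[reps[i]] for i in picks]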

Enterprise authorization (coming soon)

Document-level permissions with OpenFGA. Filter chunks by user access before they reach the LLM.
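
A sketch of what permission-aware filtering could look like. Here check_access stands in for an OpenFGA authorization check over a (user, relation, object) tuple; the can_read relation and the chunk shape are assumptions, not a published schema:

    async def filter_chunks(chunks, user_id, check_access):
        # Keep only chunks whose source document the user may read.
        allowed = []
        for chunk in chunks:
            # One OpenFGA-style check per chunk's source document.
            if await check_access(f"user:{user_id}", "can_read",
                                  f"document:{chunk['doc_id']}"):
                allowed.append(chunk)
        return allowed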

Works with your stack

Pinecone · Qdrant · LangChain · LlamaIndex

Ready to optimize your context?

Join the private beta. Get early access and help shape the product.

Join the waitlist