30-40% of RAG context is semantically redundant: the same information from docs, code, and memory competing for tokens. Distill removes duplicates and maximizes diversity in ~12ms.
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Vector DB │────▶│ Over-fetch │────▶│ Cluster │────▶│ Select + │────▶ LLM
│ (Pinecone) │ │ (50) │ │ (12) │ │ MMR (8) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
│ │ │
Retrieve 3-5x Group similar Pick diverse
more chunks chunks together representatives

Distill sits between your vector database and LLM. No infrastructure changes required.
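The cluster-and-select stage above can be sketched in a few lines. This is an illustrative greedy cosine-similarity clustering, not Distill's actual implementation; the 0.9 threshold and function names are assumptions, and embeddings are assumed unit-normalized:

```python
import numpy as np

def greedy_cluster(embeddings: np.ndarray, threshold: float = 0.9) -> list[list[int]]:
    """Group chunks whose cosine similarity to a cluster's first member
    exceeds `threshold`. Assumes rows are unit-normalized embeddings."""
    clusters: list[list[int]] = []
    for i, vec in enumerate(embeddings):
        for cluster in clusters:
            if float(vec @ embeddings[cluster[0]]) >= threshold:
                cluster.append(i)  # near-duplicate: join existing cluster
                break
        else:
            clusters.append([i])  # semantically new: start a cluster
    return clusters

# Over-fetched chunks: two near-duplicates and one distinct vector.
chunks = np.array([
    [1.0, 0.0],
    [0.999, 0.045],   # near-duplicate of the first chunk
    [0.0, 1.0],       # semantically different
])
chunks /= np.linalg.norm(chunks, axis=1, keepdims=True)

clusters = greedy_cluster(chunks, threshold=0.9)
representatives = [c[0] for c in clusters]  # one chunk per cluster
print(clusters)         # [[0, 1], [2]]
print(representatives)  # [0, 2]
```

The two near-duplicate chunks collapse into one cluster, so only one of them spends tokens in the final context.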
Your vector DB returns similar chunks from different sources: docs, code, changelogs. Same answer, multiple times.
Distill clusters and deduplicates before the context reaches your LLM.
Every token counts. Redundant context wastes budget and dilutes the signal your model needs.
Get more diverse information in fewer tokens. Better answers, lower cost.
Distance matrix, clustering, selection, and MMR re-ranking. All in under 15ms.
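The MMR (maximal marginal relevance) step scores each candidate by its relevance to the query minus its maximum similarity to chunks already selected, trading relevance against redundancy via a lambda weight. A self-contained sketch with toy 2-D embeddings (the 0.7 default for lambda is an illustrative assumption, not Distill's default):

```python
import numpy as np

def mmr_select(query: np.ndarray, candidates: np.ndarray,
               k: int, lam: float = 0.7) -> list[int]:
    """Greedy MMR: at each step pick the candidate maximizing
    lam * sim(query, c) - (1 - lam) * max(sim(c, selected)).
    Assumes unit-normalized rows; similarity is the dot product."""
    relevance = candidates @ query
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            redundancy = max((float(candidates[i] @ candidates[j])
                              for j in selected), default=0.0)
            return lam * float(relevance[i]) - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query = np.array([1.0, 0.0])
cands = np.array([
    [3.0, 1.0],    # most relevant
    [2.9, 1.05],   # relevant but nearly identical to the first chunk
    [3.0, -1.5],   # slightly less relevant, adds diversity
])
cands /= np.linalg.norm(cands, axis=1, keepdims=True)

picked = mmr_select(query, cands, k=2)
print(picked)  # [0, 2]: the near-duplicate loses to the diverse chunk
```

With lambda near 1 the selection orders purely by relevance; lowering it penalizes redundancy more heavily.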
Adjust clustering threshold and MMR lambda to balance relevance and diversity for your use case.
Document-level permissions with OpenFGA. Filter chunks by user access before they reach the LLM.

Coming soon
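Conceptually, the permission filter is a per-chunk access check applied before selection. A minimal sketch: the `Chunk` type and `is_allowed` callback are hypothetical stand-ins, with the callback playing the role an OpenFGA check would fill:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Chunk:
    doc_id: str
    text: str

def filter_by_access(chunks: list[Chunk], user: str,
                     is_allowed: Callable[[str, str], bool]) -> list[Chunk]:
    """Drop chunks whose source document the user cannot read.
    `is_allowed(user, doc_id)` would be backed by an authorization
    service such as OpenFGA in the real integration."""
    return [c for c in chunks if is_allowed(user, c.doc_id)]

# Toy ACL standing in for an authorization service.
acl = {("alice", "handbook"), ("alice", "changelog")}
chunks = [Chunk("handbook", "..."), Chunk("secrets", "..."), Chunk("changelog", "...")]

visible = filter_by_access(chunks, "alice", lambda u, d: (u, d) in acl)
print([c.doc_id for c in visible])  # ['handbook', 'changelog']
```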
Join the private beta. Get early access and help shape the product.
Join the waitlist