Distill is now open source!
Now in private beta

Reliable LLM outputs
start with clean context

Same inputs should produce same outputs. We make that possible.

Deterministic deduplication. Zero-latency feel. No inference. Full traceability.

LLM outputs are unreliable because context is polluted.

30-40% of RAG context is redundant: the same information from docs, code, memory, and tools, all competing for attention. The model gets confused. Outputs become non-deterministic.

Read the research

Non-Deterministic Outputs

Same workflow, different results. Redundant context makes outputs unpredictable run-to-run.

Confused Reasoning

Models see the same thing 5 ways. Signal gets diluted. Long-horizon reasoning breaks down.

Production Failures

"Vibe coding" works in demos. In production, unreliable outputs block enterprise adoption.

"Garbage in, garbage out"

- The fundamental problem with LLM reliability

You can't fix unreliable outputs with better prompts.
You need to fix the context that goes in.

Clean context → Reliable outputs.

Math, not magic.

No models were harmed in the making of this optimization.

CLUSTER

Agglomerative clustering groups similar chunks.

~6ms

SELECT

Pick the best representative from each cluster.

<1ms

RERANK

MMR balances relevance and diversity.

~3ms

Total: ~12ms • LLM calls: 0 • Deterministic, auditable, reliable
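The three stages above can be sketched in a few lines of Go. This is an illustrative toy, not the Distill API: the `Chunk` type is hypothetical, and a greedy single-pass grouping stands in for the real agglomerative clustering step. The key property it demonstrates is determinism: no model calls, same input, same output.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// Chunk is a retrieved context chunk (hypothetical type for this
// sketch). Score is the chunk's relevance to the query.
type Chunk struct {
	ID    string
	Score float64
	Vec   []float64
}

// cosine similarity between two embeddings of equal length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// distill sketches CLUSTER / SELECT / RERANK: group near-duplicates
// (greedy stand-in for agglomerative clustering), keep the
// highest-scoring representative per group, then rank the
// representatives by relevance. Fully deterministic, zero LLM calls.
func distill(chunks []Chunk, simThreshold float64) []Chunk {
	var reps []Chunk
	for _, c := range chunks {
		merged := false
		for i, r := range reps {
			if cosine(c.Vec, r.Vec) >= simThreshold { // near-duplicate
				if c.Score > r.Score {
					reps[i] = c // keep the better representative
				}
				merged = true
				break
			}
		}
		if !merged {
			reps = append(reps, c)
		}
	}
	sort.Slice(reps, func(i, j int) bool { return reps[i].Score > reps[j].Score })
	return reps
}

func main() {
	out := distill([]Chunk{
		{"a", 0.90, []float64{1, 0}},
		{"a2", 0.95, []float64{1, 0.01}}, // near-duplicate of "a"
		{"b", 0.80, []float64{0, 1}},
	}, 0.95)
	for _, c := range out {
		fmt.Print(c.ID, " ")
	}
	fmt.Println() // a2 b
}
```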

Where Distill fits

Your App → Vector DB (any vector database) → Distill (Cluster • Select • Rerank) → LLM (any model)

A reliability layer between your data sources and your LLM. No infrastructure changes required.

Why not just use an LLM?

LLMs are non-deterministic. Reliability requires deterministic preprocessing.

                 LLM Compression   Distill
Latency          ~500ms            ~12ms
Cost per call    $0.01+            $0.0001
Deterministic    No                Yes
Lossless         No                Yes
Auditable        No                Yes

Use LLMs for reasoning. Use deterministic algorithms for reliability.

Built on research, not hype.

We read the papers. We wrote the code.

Agglomerative Clustering

Hierarchical clustering that adapts to your data. No need to specify K. No hyperparameter tuning.

Average linkage • O(n² log n)
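A toy illustration of the idea, assuming nothing about Distill's internals: merge the closest pair of clusters under average linkage until no pair is closer than a distance threshold, so the number of clusters emerges from the data rather than a fixed K. (This naive version is O(n³) on 1-D points for brevity; a priority-queue implementation reaches the O(n² log n) quoted above.)

```go
package main

import (
	"fmt"
	"math"
)

// avgLinkage returns the mean pairwise distance between two clusters
// of 1-D points (toy data; real chunks are high-dimensional embeddings).
func avgLinkage(a, b []float64) float64 {
	var sum float64
	for _, x := range a {
		for _, y := range b {
			sum += math.Abs(x - y)
		}
	}
	return sum / float64(len(a)*len(b))
}

// agglomerate repeatedly merges the closest cluster pair (average
// linkage) until no pair is within threshold -- no K to specify.
func agglomerate(points []float64, threshold float64) [][]float64 {
	clusters := make([][]float64, len(points))
	for i, p := range points {
		clusters[i] = []float64{p}
	}
	for len(clusters) > 1 {
		bi, bj, best := -1, -1, math.Inf(1)
		for i := 0; i < len(clusters); i++ {
			for j := i + 1; j < len(clusters); j++ {
				if d := avgLinkage(clusters[i], clusters[j]); d < best {
					bi, bj, best = i, j, d
				}
			}
		}
		if best > threshold {
			break // remaining clusters are all well separated
		}
		clusters[bi] = append(clusters[bi], clusters[bj]...)
		clusters = append(clusters[:bj], clusters[bj+1:]...)
	}
	return clusters
}

func main() {
	// Two obvious groups: {0.0, 0.1, 0.2} and {5.0, 5.1}.
	got := agglomerate([]float64{0.0, 0.1, 5.0, 0.2, 5.1}, 1.0)
	fmt.Println(len(got)) // 2
}
```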

Maximal Marginal Relevance

Balance relevance and diversity in one pass. λ = 1.0 → pure relevance. λ = 0.0 → pure diversity.

Carbonell & Goldstein (1998)
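The MMR objective from Carbonell & Goldstein is score(d) = λ·Sim(d, q) − (1−λ)·max over selected s of Sim(d, s), applied greedily. A minimal sketch (the `Doc` type and similarity values are made up for illustration):

```go
package main

import (
	"fmt"
	"math"
)

// Doc holds a document's query relevance and its pairwise
// similarities to other docs (hypothetical shape for this sketch).
type Doc struct {
	ID  string
	Rel float64            // Sim(d, query)
	Sim map[string]float64 // Sim(d, other doc) by ID
}

// mmrOrder greedily ranks docs by Maximal Marginal Relevance:
//   score(d) = lambda*Rel(d) - (1-lambda)*max_{s in selected} Sim(d, s)
// lambda=1.0 reduces to pure relevance; lambda=0.0 to pure diversity.
func mmrOrder(docs []Doc, lambda float64) []string {
	remaining := append([]Doc(nil), docs...)
	var selected []Doc
	var order []string
	for len(remaining) > 0 {
		bi, best := 0, math.Inf(-1)
		for i, d := range remaining {
			redundancy := 0.0 // max similarity to anything already picked
			for _, s := range selected {
				if v := d.Sim[s.ID]; v > redundancy {
					redundancy = v
				}
			}
			if score := lambda*d.Rel - (1-lambda)*redundancy; score > best {
				bi, best = i, score
			}
		}
		selected = append(selected, remaining[bi])
		order = append(order, remaining[bi].ID)
		remaining = append(remaining[:bi], remaining[bi+1:]...)
	}
	return order
}

func main() {
	docs := []Doc{
		{"a", 0.90, nil},
		{"b", 0.85, map[string]float64{"a": 0.95}}, // near-duplicate of "a"
		{"c", 0.70, map[string]float64{"a": 0.10, "b": 0.10}},
	}
	fmt.Println(mmrOrder(docs, 0.5)) // [a c b] -- diversity demotes the duplicate
	fmt.Println(mmrOrder(docs, 1.0)) // [a b c] -- pure relevance
}
```

Note how at λ = 0.5 the near-duplicate "b" drops below the diverse "c" despite its higher raw relevance.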

SIMD-Accelerated Similarity

Vectorized distance computation. Process 50 chunks in under 2ms.

Pure Go • No dependencies
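For a feel of what vectorized distance computation looks like, here is a portable stand-in: a dot product with 4-way loop unrolling, the scalar shape that SIMD lanes map onto. (Real SIMD kernels use assembly or intrinsics; this sketch makes no claim about Distill's actual kernels.)

```go
package main

import "fmt"

// dot computes an inner product with four independent accumulators,
// mirroring how a SIMD register processes four lanes per iteration.
func dot(a, b []float32) float32 {
	var s0, s1, s2, s3 float32
	i := 0
	for ; i+4 <= len(a); i += 4 {
		s0 += a[i] * b[i]
		s1 += a[i+1] * b[i+1]
		s2 += a[i+2] * b[i+2]
		s3 += a[i+3] * b[i+3]
	}
	s := s0 + s1 + s2 + s3
	for ; i < len(a); i++ { // scalar tail for leftover elements
		s += a[i] * b[i]
	}
	return s
}

func main() {
	q := []float32{1, 2, 3, 4, 5}
	c := []float32{5, 4, 3, 2, 1}
	fmt.Println(dot(q, c)) // 35
}
```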

Enterprise ready.

Determinism, traceability, and compliance for production AI.

Fine-grained Authorization

Secure RAG access controls

Audit Logs

Full traceability

PII Detection

Auto-redact sensitive data

On-Prem / VPC

Your infrastructure

Pricing

Open Source

Free

Forever

  • Core deduplication engine
  • Clustering + MMR algorithms
  • Local execution (your infra)
  • Pinecone & Qdrant support
  • Community support (GitHub)

Cloud

Popular

$0.001/query

Pay as you go

  • Everything in Open Source
  • Hosted API (99.9% uptime)
  • Advanced re-ranking models
  • Real-time usage analytics
  • 1K free queries/month
  • Email support

Enterprise

Custom

Volume discounts

  • Everything in Cloud
  • Fine-grained authorization
  • SSO (SAML, OIDC)
  • Audit logs & compliance
  • On-prem / VPC deployment
  • Dedicated support & SLA

Works with your stack

Pinecone • Qdrant • Weaviate • LangChain • LlamaIndex

Reliable outputs start with clean context.

Deterministic. Auditable. Zero-latency feel.
The reliability layer your LLM needs.

Get started free