Persistent memory across sessions. Semantic dedup. Context compression. ~12ms. No LLM calls.
Other tools compress what goes into your agent. Distill controls what your agent remembers: across sessions, without conflicts, ranked by what matters now.
Context deduplication is not an optimization. It is a correctness measure.
No LLM calls. Same input, same output. Every decision is auditable.
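As a quick check of that claim, you can run the same recall twice and compare outputs. This sketch reuses the recall command from the CLI example below and assumes nothing is written to the store between runs:

```bash
# Determinism check: with no writes in between, the same query
# should produce byte-identical output (no LLM, no sampling).
# Flags mirror the CLI example further down this page.
distill memory recall --query "how does auth work?" --boost-tags auth > run1.txt
distill memory recall --query "how does auth work?" --boost-tags auth > run2.txt
diff run1.txt run2.txt && echo "identical output across runs"
```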
POST /v1/pipeline chains all six stages; a request sketch follows the CLI example below.

Store once. Recall by relevance. Forget what's outdated. The model does not know what matters to you. Distill does.
```bash
# Enable memory + sessions
distill mcp --memory --session

# Store a memory
distill memory store \
  --text "Auth uses JWT with RS256" \
  --tags auth,security \
  --source code_review \
  --auto-classify

# Recall by relevance
distill memory recall \
  --query "how does auth work?" \
  --boost-tags auth \
  --min-relevance 0.3
```
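For the HTTP route, here is a minimal sketch of calling the pipeline endpoint. Only the path `/v1/pipeline` comes from this page; the host, port, and payload fields are assumptions, so check the API reference for the actual schema:

```bash
# Hypothetical request shape: two near-duplicate entries that the
# dedup stage should collapse. Host, port, and field names are
# assumptions; only the /v1/pipeline path is documented above.
curl -s http://localhost:8080/v1/pipeline \
  -H "Content-Type: application/json" \
  -d '{
        "entries": [
          {"text": "Auth uses JWT with RS256", "tags": ["auth"]},
          {"text": "Authentication is JWT-based (RS256)", "tags": ["auth"]}
        ]
      }'
```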
Push context incrementally as the agent works. Distill deduplicates entries, compresses aging ones, and evicts when the budget is exceeded.
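In practice that can look like the loop below. Note that `distill session push`, `--session-id`, and `--budget-tokens` are hypothetical names for the incremental-push operation described above, not confirmed CLI syntax:

```bash
# Hypothetical loop: push each new note into the session as it is
# produced; dedup, aging compression, and budget eviction happen
# on the Distill side. Subcommand and flag names are illustrative.
for note in notes/*.md; do
  distill session push \
    --session-id build-42 \
    --file "$note" \
    --budget-tokens 4000
done
```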
For engineering teams running automated code-mod rollouts, CVE patching, or any workflow where the same change pattern has caused problems before.
HTTP API, MCP server, or CLI. Run fully local with Ollama. No API key, no external calls.
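A fully local setup might look like the following. `ollama pull` is the standard Ollama command, but the `--embedder` flag is an assumed name for wiring Distill to the local backend; only Ollama support itself is stated above:

```bash
# Fetch a local embedding model (standard Ollama command).
ollama pull nomic-embed-text

# Start the MCP server (documented entry point above); the
# --embedder flag is a hypothetical name for selecting Ollama.
distill mcp --memory --session --embedder ollama
```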
Python SDK with native LangChain and LlamaIndex integrations is in progress. #5
| | Distill | LLM compression |
|---|---|---|
| Latency | ~12ms | ~500ms |
| Cost per call | $0.0001 | $0.01+ |
| Deterministic | Yes, always | No, varies per run |
| Auditable | Full cluster log | Black box |
| Requires API key | Optional (Ollama works) | Always |
| Batch support | Async job queue | Sequential |
Whether you need a hosted API or an enterprise deployment, or just want to talk through your use case, reach out directly.