Free demo with limits: 10 runs/day, 20 chunks max, 500 chars/chunk. The demo runs on Render's free tier, so the first request may take 30-50s (cold start). Subsequent requests are faster (~500ms-2s, including embedding generation).
Need more? Schedule a call or reach out.
Distance threshold for clustering. Lower = stricter matching, more clusters.
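To make the effect of the threshold concrete, here is a minimal sketch of threshold-based clustering. The demo's actual clustering algorithm and distance metric are not stated on this page, so the greedy single-pass approach, the cosine distance, the function names, and the toy 2-D vectors below are all assumptions for illustration only.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 0.0 for same direction, 1.0 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def cluster_by_threshold(vectors, threshold):
    """Greedy clustering (an assumed, simplified scheme).

    Each vector joins the first cluster whose representative (its first
    member) lies within `threshold`; otherwise it starts a new cluster.
    """
    clusters = []  # lists of indices; cluster[0] is the representative
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_distance(vec, vectors[cluster[0]]) <= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Made-up 2-D stand-ins for chunk embeddings (real embeddings are much larger).
toy_vectors = [
    (1.0, 0.0),
    (0.98, 0.2),  # very close to the first vector
    (0.8, 0.6),   # moderately close to the first vector
    (0.0, 1.0),   # unrelated
]

loose = cluster_by_threshold(toy_vectors, threshold=0.25)
strict = cluster_by_threshold(toy_vectors, threshold=0.05)
print(len(loose), loose)    # fewer, larger clusters
print(len(strict), strict)  # more, smaller clusters
```

With the higher threshold the moderately similar vector is merged into the first cluster; with the lower threshold it is split off, so the cluster count goes up.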
Maximal Marginal Relevance
Balances relevance vs diversity when selecting chunks.
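As a rough illustration of that trade-off, the sketch below implements the standard MMR selection rule: at each step, pick the candidate that maximizes lambda * relevance - (1 - lambda) * (highest similarity to anything already selected). How this demo parameterizes MMR is not stated here, so the `lambda_` parameter name, the cosine similarity, and the toy vectors are assumptions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def mmr_select(query, candidates, k, lambda_=0.5):
    """Return indices of up to k candidates chosen by Maximal Marginal Relevance.

    lambda_ = 1.0 ranks purely by relevance to the query;
    lower values penalize similarity to already-selected items.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = cosine_similarity(query, candidates[i])
            # Redundancy is 0.0 until something has been selected.
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_ * relevance - (1.0 - lambda_) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Made-up 2-D stand-ins for embeddings: candidates 0 and 1 are near-duplicates.
query = (0.9, 0.3)
candidates = [(1.0, 0.0), (0.98, 0.2), (0.6, 0.8)]

relevance_only = mmr_select(query, candidates, k=2, lambda_=1.0)
balanced = mmr_select(query, candidates, k=2, lambda_=0.5)
print(relevance_only)  # both near-duplicates are picked
print(balanced)        # the second pick is the more distinct candidate
```

With `lambda_=1.0` the two near-duplicate candidates are both selected; with `lambda_=0.5` the second slot goes to the less similar candidate instead.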
The auth service validates JWT tokens using RS256. Tokens expire after 24 hours.
Authentication is handled by the auth service. It uses RS256-signed JWTs with a 24h expiry.
JWT tokens are validated by the auth service using RS256 signing. Expiry is set to 24 hours.
Rate limiting is enforced at the API gateway: 100 requests per minute per user.
The API gateway enforces rate limits of 100 req/min per user.
Database writes go through a write-ahead log. Reads are served from a read replica.
The database uses WAL for writes. Read queries hit the replica, not the primary.
All database writes use a write-ahead log. Read traffic is routed to the read replica.
Deployments are triggered by merging to main. The pipeline runs tests, builds a Docker image, and pushes to ECR.
Merging to main triggers the CI/CD pipeline: tests run, Docker image is built, pushed to ECR.
The search index is rebuilt nightly at 02:00 UTC. Incremental updates run every 15 minutes.
Search index full rebuild: nightly at 02:00 UTC. Incremental sync: every 15 minutes.