Enabling automatic evaluation
Toggle Enable Automatic Evaluation
Turn on Enable Automatic Evaluation. When this toggle is off, no evals fire, regardless of which profiles are active.
How evaluation works
- Every agent response triggers the eval capture hook.
- MIRA runs rule, similarity, and metric evals immediately; these are fast local computations. llm_judge evals are processed via a separate queue with configurable concurrency to avoid overwhelming the judge provider.
- All results are stored in the local database and accessible in the dashboard.
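The split between inline local evals and queued llm_judge evals can be sketched as follows. This is an illustrative model only: the function names (run_local_eval, judge_eval, capture) and the use of an asyncio semaphore are assumptions, not MIRA's actual implementation.

```python
import asyncio

LLM_CONCURRENCY = 2  # models the "LLM Concurrency" setting (1-4)

def run_local_eval(kind, response):
    # rule / similarity / metric evals: fast, synchronous, local checks
    return {"kind": kind, "passed": bool(response)}

async def judge_eval(response, sem):
    # llm_judge evals: bounded by a semaphore so at most
    # LLM_CONCURRENCY judge calls are in flight at once
    async with sem:
        await asyncio.sleep(0)  # stand-in for the judge provider call
        return {"kind": "llm_judge", "passed": True}

async def capture(response):
    # local evals run immediately on every captured response
    results = [run_local_eval(k, response)
               for k in ("rule", "similarity", "metric")]
    # llm_judge evals are fanned out through the rate-limited queue
    sem = asyncio.Semaphore(LLM_CONCURRENCY)
    results += await asyncio.gather(
        *(judge_eval(response, sem) for _ in range(3)))
    return results

results = asyncio.run(capture("hello"))
```

The semaphore is what keeps a burst of agent responses from overwhelming the judge provider: extra llm_judge calls simply wait until a slot frees up.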
Eval settings
Configure evaluation behaviour in Settings → Evals:

| Setting | Description |
|---|---|
| Enable Automatic Evaluation | Master toggle — disables all eval capture when off |
| Local-Only Mode | Suspends llm_judge evals; only rule, similarity, and metric evals run |
| LLM Concurrency | Number of simultaneous LLM judge calls (1–4) |
| Data Retention | Retention period for stored results: 7 / 30 / 90 / 180 days, or Forever |
| Run Cleanup Now | Immediately purge results older than the retention window |
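The Data Retention and Run Cleanup Now settings amount to a cutoff-based purge. The sketch below is a minimal model of that behaviour, assuming a record shape with a created_at timestamp and using None to stand for the Forever option; none of these names come from MIRA itself.

```python
from datetime import datetime, timedelta, timezone

def purge_old_results(results, retention_days):
    """Drop results older than the retention window.

    retention_days=None models the "Forever" setting: nothing is purged.
    """
    if retention_days is None:
        return results
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    return [r for r in results if r["created_at"] >= cutoff]

now = datetime.now(timezone.utc)
rows = [{"created_at": now - timedelta(days=d)} for d in (1, 45, 200)]
print(len(purge_old_results(rows, 30)))    # 1: only the day-old row survives
print(len(purge_old_results(rows, None)))  # 3: Forever keeps everything
```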
Monitoring results
After chatting with the engine, open the Eval Studio dashboard:

- Conversations tab: one card per captured agent response, showing a pass/fail summary across all active eval definitions
- Eval Health tab: per-eval-definition pass-rate trends over time
- Compare tab: A/B comparison across two conversations or time windows
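The per-eval-definition pass rate shown in the Eval Health tab is a simple aggregation over stored results. A minimal sketch, assuming each stored result records its definition name and a pass/fail flag (the record shape is an assumption, not MIRA's schema):

```python
from collections import defaultdict

def pass_rates(records):
    # definition name -> [passed_count, total_count]
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        bucket = totals[r["definition"]]
        bucket[0] += 1 if r["passed"] else 0
        bucket[1] += 1
    return {name: passed / total for name, (passed, total) in totals.items()}

records = [
    {"definition": "rule:no-profanity", "passed": True},
    {"definition": "rule:no-profanity", "passed": False},
    {"definition": "llm_judge:helpfulness", "passed": True},
]
rates = pass_rates(records)
print(rates["rule:no-profanity"])  # 0.5
```

Bucketing the same aggregation by day or week gives the trend lines the dashboard plots over time.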
Cancelling / pausing evaluation
To stop evals from firing temporarily, toggle Enable Automatic Evaluation off in Settings → Evals. Active profiles and eval definitions remain configured; re-enabling resumes capture from the next response onward.