Enabling automatic evaluation
Toggle Enable Automatic Evaluation
Turn on Enable Automatic Evaluation. When this toggle is off, no evals fire, regardless of which profiles are active.
How evaluation works
- Every agent response triggers the eval capture hook.
- MIRA runs rule, similarity, and metric evals immediately; these are fast local computations. llm_judge evals are processed via a separate queue with configurable concurrency to avoid overwhelming the judge provider.
- All results are stored in the local database and accessible in the dashboard.
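The split between inline local evals and queued llm_judge evals can be sketched as follows. This is an illustrative model only: the function names (run_local_eval, judge_eval, capture) and the use of an asyncio semaphore are assumptions, not MIRA's actual implementation.

```python
import asyncio

LLM_CONCURRENCY = 2  # models the "LLM Concurrency" setting (1-4)

def run_local_eval(kind, response):
    # rule / similarity / metric evals: fast, synchronous, local checks
    return {"kind": kind, "passed": bool(response)}

async def judge_eval(response, sem):
    # llm_judge evals: bounded by a semaphore so at most
    # LLM_CONCURRENCY judge calls are in flight at once
    async with sem:
        await asyncio.sleep(0)  # stand-in for the judge provider call
        return {"kind": "llm_judge", "passed": True}

async def capture(response):
    # local evals run immediately on every captured response
    results = [run_local_eval(k, response)
               for k in ("rule", "similarity", "metric")]
    # llm_judge evals are fanned out through the rate-limited queue
    sem = asyncio.Semaphore(LLM_CONCURRENCY)
    results += await asyncio.gather(
        *(judge_eval(response, sem) for _ in range(3)))
    return results

results = asyncio.run(capture("hello"))
```

The semaphore is what keeps a burst of agent responses from overwhelming the judge provider: extra llm_judge calls simply wait until a slot frees up.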
Eval settings
Configure evaluation behaviour in Settings → Evals:

| Setting | Description |
|---|---|
| Enable Automatic Evaluation | Master toggle — disables all eval capture when off |
| Local-Only Mode | Suspends llm_judge evals; only rule, similarity, and metric evals run |
| LLM Concurrency | Number of simultaneous LLM judge calls (1–4) |
| Data Retention | Retention period for stored results: 7 / 30 / 90 / 180 days, or Forever |
| Run Cleanup Now | Immediately purge results older than the retention window |
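The Data Retention and Run Cleanup Now settings amount to a cutoff-based purge. The sketch below is a minimal model of that behaviour, assuming a record shape with a created_at timestamp and using None to stand for the Forever option; none of these names come from MIRA itself.

```python
from datetime import datetime, timedelta, timezone

def purge_old_results(results, retention_days):
    """Drop results older than the retention window.

    retention_days=None models the "Forever" setting: nothing is purged.
    """
    if retention_days is None:
        return results
    cutoff = datetime.now(timezone.utc) - timedelta(days=retention_days)
    return [r for r in results if r["created_at"] >= cutoff]

now = datetime.now(timezone.utc)
rows = [{"created_at": now - timedelta(days=d)} for d in (1, 45, 200)]
print(len(purge_old_results(rows, 30)))    # 1: only the day-old row survives
print(len(purge_old_results(rows, None)))  # 3: Forever keeps everything
```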
Monitoring results
After chatting with the engine, open the Eval Studio dashboard:

- Conversations tab: one card per captured agent response, showing a pass/fail summary across all active eval definitions
- Eval Health tab: per-eval-definition pass-rate trends over time
- Compare tab: A/B comparison across two conversations or time windows
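The per-eval-definition pass rate shown in the Eval Health tab is a simple aggregation over stored results. A minimal sketch, assuming each stored result records its definition name and a pass/fail flag (the record shape is an assumption, not MIRA's schema):

```python
from collections import defaultdict

def pass_rates(records):
    # definition name -> [passed_count, total_count]
    totals = defaultdict(lambda: [0, 0])
    for r in records:
        bucket = totals[r["definition"]]
        bucket[0] += 1 if r["passed"] else 0
        bucket[1] += 1
    return {name: passed / total for name, (passed, total) in totals.items()}

records = [
    {"definition": "rule:no-profanity", "passed": True},
    {"definition": "rule:no-profanity", "passed": False},
    {"definition": "llm_judge:helpfulness", "passed": True},
]
rates = pass_rates(records)
print(rates["rule:no-profanity"])  # 0.5
```

Bucketing the same aggregation by day or week gives the trend lines the dashboard plots over time.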
Cancelling / pausing evaluation
To stop evals from firing temporarily, toggle Enable Automatic Evaluation off in Settings → Evals. Active profiles and eval definitions remain configured; re-enabling resumes capture from the next response onward.