## When to use A/B comparison
- After updating a skill’s instructions — did quality improve?
- After switching providers or models — which performs better on your workloads?
- After enabling a new MCP tool — did additional context help?
## Running an A/B comparison
Alternatively, open any single run and click Compare with another run to pick the second run from a dropdown.
## Comparison view

The comparison table has one row per case, with columns for both runs:

| Column | Description |
|---|---|
| Case name | Shared case identifier |
| Run A status | ✅ Pass / ❌ Fail |
| Run A score | Numeric score |
| Run B status | ✅ Pass / ❌ Fail |
| Run B score | Numeric score |
| Delta | Score B − Score A (green = improved, red = regressed) |
| Change | 🟢 Improved / 🔴 Regressed / ➡️ Unchanged / 🆕 New in B / ❌ Removed in B |
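The Change column's classification can be sketched as a small function. The `CaseResult` type and field names below are illustrative assumptions, not MIRA's actual schema; a case absent from a run is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseResult:
    score: float          # numeric score for the case in this run
    passed: bool = False  # ✅ Pass / ❌ Fail status

def classify(a: Optional[CaseResult], b: Optional[CaseResult]) -> str:
    """Map a pair of per-case results to the Change label."""
    if a is None:
        return "new_in_b"       # 🆕 case exists only in Run B
    if b is None:
        return "removed_in_b"   # ❌ case exists only in Run A
    if b.passed and not a.passed:
        return "improved"       # 🟢 flipped fail → pass
    if a.passed and not b.passed:
        return "regressed"      # 🔴 flipped pass → fail
    return "unchanged"          # ➡️ same pass/fail status

# A case that flipped from fail to pass is classed as improved.
print(classify(CaseResult(0.4, False), CaseResult(0.9, True)))  # improved
```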
## Summary metrics

Above the table, MIRA shows:

- Overall pass rate: Run A → Run B
- Cases improved: number of cases that flipped from fail to pass
- Cases regressed: number of cases that flipped from pass to fail
- Average score delta: Mean(Score B) − Mean(Score A)
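The summary metrics above can be sketched over the shared cases of two runs. The run data and the `(passed, score)` tuple layout are invented for illustration:

```python
from statistics import mean

# Per-case results keyed by case name: (passed, score). Sample data only.
run_a = {"case1": (True, 0.9), "case2": (False, 0.3), "case3": (True, 0.8)}
run_b = {"case1": (True, 0.95), "case2": (True, 0.7), "case3": (False, 0.4)}

shared = run_a.keys() & run_b.keys()  # cases present in both runs

pass_rate_a = sum(run_a[c][0] for c in shared) / len(shared)
pass_rate_b = sum(run_b[c][0] for c in shared) / len(shared)
improved = sum(1 for c in shared if not run_a[c][0] and run_b[c][0])   # fail → pass
regressed = sum(1 for c in shared if run_a[c][0] and not run_b[c][0])  # pass → fail
avg_score_delta = mean(run_b[c][1] for c in shared) - mean(run_a[c][1] for c in shared)

print(f"pass rate: {pass_rate_a:.0%} → {pass_rate_b:.0%}, "
      f"improved: {improved}, regressed: {regressed}")
```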
## Exporting a comparison
Click Export Comparison to download a CSV with all per-case data from both runs, including deltas.
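The exported CSV can be post-processed with the standard library. The column names (`case`, `score_a`, `score_b`, `delta`) and the inline sample are assumptions for illustration, not MIRA's actual export schema:

```python
import csv
import io

# Stand-in for the downloaded file; column names are assumed.
csv_text = """case,score_a,score_b,delta
login_flow,0.80,0.95,0.15
checkout,0.90,0.60,-0.30
"""

# Collect cases whose score regressed (negative delta) for follow-up.
regressions = [
    row["case"]
    for row in csv.DictReader(io.StringIO(csv_text))
    if float(row["delta"]) < 0
]
print(regressions)  # ['checkout']
```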