The A/B Comparison view puts two runs from the same profile side by side so you can see exactly which cases improved, regressed, or stayed the same between runs.

When to use A/B comparison

  • After updating a skill’s instructions — did quality improve?
  • After switching providers or models — which performs better on your workloads?
  • After enabling a new MCP tool — did additional context help?

Running an A/B comparison

1. Open the Eval Dashboard
   Press ⌘4, then click Dashboard.

2. Select a profile
   Click the profile card whose runs you want to compare.

3. Select two runs
   In the Run History table, check the boxes next to two runs and click Compare Selected.
   Alternatively, open any single run and click Compare with another run to pick the second run from a dropdown.

Comparison view

The comparison table has one row per case, with columns for both runs:
  • Case name: Shared case identifier
  • Run A status: ✅ Pass / ❌ Fail
  • Run A score: Numeric score
  • Run B status: ✅ Pass / ❌ Fail
  • Run B score: Numeric score
  • Delta: Score B − Score A (green = improved, red = regressed)
  • Change: 🟢 Improved / 🔴 Regressed / ➡️ Unchanged / 🆕 New in B / ❌ Removed in B
Click any row to expand and see both outputs side by side.
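The Change column's categories can be sketched in a few lines. This is a minimal illustration, not MIRA's actual implementation; it assumes Improved/Regressed follow the same fail-to-pass definition the summary metrics use, and that a case missing from one run is marked New or Removed.

```python
def classify_change(pass_a, pass_b):
    """Classify a case the way the Change column does.

    pass_a / pass_b: True = pass, False = fail,
    None = case absent from that run.
    """
    if pass_a is None:
        return "New in B"
    if pass_b is None:
        return "Removed in B"
    if not pass_a and pass_b:
        return "Improved"
    if pass_a and not pass_b:
        return "Regressed"
    return "Unchanged"
```

Note that under this reading a case can be Unchanged in status while still having a nonzero score Delta; the two columns answer different questions.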

Summary metrics

Above the table, MIRA shows:
  • Overall pass rate: Run A → Run B
  • Cases improved: N cases that flipped from fail to pass
  • Cases regressed: N cases that flipped from pass to fail
  • Average score delta: Mean(Score B) − Mean(Score A)
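The four summary metrics above can be reproduced from per-case data. The sketch below is an assumption-laden illustration (the data shape and function name are hypothetical): it counts flips only over cases present in both runs, and computes the average score delta as Mean(Score B) − Mean(Score A) over each run's full case set, per the definition above.

```python
def summarize(runs_a, runs_b):
    """Compute A/B summary metrics.

    runs_a / runs_b map case name -> (passed: bool, score: float).
    """
    shared = runs_a.keys() & runs_b.keys()
    # Cases that flipped fail -> pass (improved) or pass -> fail (regressed).
    improved = sum(1 for c in shared if not runs_a[c][0] and runs_b[c][0])
    regressed = sum(1 for c in shared if runs_a[c][0] and not runs_b[c][0])
    # Overall pass rates, shown as "Run A -> Run B".
    pass_rate_a = sum(p for p, _ in runs_a.values()) / len(runs_a)
    pass_rate_b = sum(p for p, _ in runs_b.values()) / len(runs_b)
    # Mean(Score B) - Mean(Score A).
    avg_delta = (sum(s for _, s in runs_b.values()) / len(runs_b)
                 - sum(s for _, s in runs_a.values()) / len(runs_a))
    return {
        "pass_rate_a": pass_rate_a,
        "pass_rate_b": pass_rate_b,
        "improved": improved,
        "regressed": regressed,
        "avg_score_delta": avg_delta,
    }
```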

Exporting a comparison

Click Export Comparison to download a CSV with all per-case data from both runs, including deltas.
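The exported CSV works well with standard tooling. The snippet below is a hedged example of filtering it for regressions with Python's stdlib csv module; the header names (`case`, `score_a`, `score_b`, `delta`) and the sample rows are hypothetical, so adjust them to match the columns in your actual export.

```python
import csv
import io

# Hypothetical excerpt of an exported comparison CSV; real header
# names depend on MIRA's export format.
sample = """case,score_a,score_b,delta
greeting,0.60,0.85,0.25
refund-policy,0.90,0.70,-0.20
"""

# Collect case names whose score dropped between Run A and Run B.
regressions = [
    row["case"]
    for row in csv.DictReader(io.StringIO(sample))
    if float(row["delta"]) < 0
]
print(regressions)  # ['refund-policy']
```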