## When to use A/B comparison
- After updating a skill’s instructions — did quality improve?
- After switching providers or models — which performs better on your workloads?
- After enabling a new MCP tool — did additional context help?
## Running an A/B comparison
Alternatively, open any single run and click Compare with another run to pick the second run from a dropdown.
## Comparison view

The comparison table has one row per case, with columns for both runs:

| Column | Description |
|---|---|
| Case name | Shared case identifier |
| Run A status | ✅ Pass / ❌ Fail |
| Run A score | Numeric score |
| Run B status | ✅ Pass / ❌ Fail |
| Run B score | Numeric score |
| Delta | Score B − Score A (green = improved, red = regressed) |
| Change | 🟢 Improved / 🔴 Regressed / ➡️ Unchanged / 🆕 New in B / ❌ Removed in B |
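The Change column's classification can be sketched as a small function. The `CaseResult` type and field names below are illustrative assumptions, not MIRA's actual schema; a case absent from a run is modeled as `None`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseResult:
    score: float          # numeric score for the case in this run
    passed: bool = False  # ✅ Pass / ❌ Fail status

def classify(a: Optional[CaseResult], b: Optional[CaseResult]) -> str:
    """Map a pair of per-case results to the Change label."""
    if a is None:
        return "new_in_b"       # 🆕 case exists only in Run B
    if b is None:
        return "removed_in_b"   # ❌ case exists only in Run A
    if b.passed and not a.passed:
        return "improved"       # 🟢 flipped fail → pass
    if a.passed and not b.passed:
        return "regressed"      # 🔴 flipped pass → fail
    return "unchanged"          # ➡️ same pass/fail status

# A case that flipped from fail to pass is classed as improved.
print(classify(CaseResult(0.4, False), CaseResult(0.9, True)))  # improved
```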
## Summary metrics

Above the table, MIRA shows:

- Overall pass rate: Run A → Run B
- Cases improved: number of cases that flipped from fail to pass
- Cases regressed: number of cases that flipped from pass to fail
- Average score delta: Mean(Score B) − Mean(Score A)
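The summary metrics above can be sketched over the shared cases of two runs. The run data and the `(passed, score)` tuple layout are invented for illustration:

```python
from statistics import mean

# Per-case results keyed by case name: (passed, score). Sample data only.
run_a = {"case1": (True, 0.9), "case2": (False, 0.3), "case3": (True, 0.8)}
run_b = {"case1": (True, 0.95), "case2": (True, 0.7), "case3": (False, 0.4)}

shared = run_a.keys() & run_b.keys()  # cases present in both runs

pass_rate_a = sum(run_a[c][0] for c in shared) / len(shared)
pass_rate_b = sum(run_b[c][0] for c in shared) / len(shared)
improved = sum(1 for c in shared if not run_a[c][0] and run_b[c][0])   # fail → pass
regressed = sum(1 for c in shared if run_a[c][0] and not run_b[c][0])  # pass → fail
avg_score_delta = mean(run_b[c][1] for c in shared) - mean(run_a[c][1] for c in shared)

print(f"pass rate: {pass_rate_a:.0%} → {pass_rate_b:.0%}, "
      f"improved: {improved}, regressed: {regressed}")
```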
## Exporting a comparison
Click Export Comparison to download a CSV with all per-case data from both runs, including deltas.
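The exported CSV can be post-processed with the standard library. The column names (`case`, `score_a`, `score_b`, `delta`) and the inline sample are assumptions for illustration, not MIRA's actual export schema:

```python
import csv
import io

# Stand-in for the downloaded file; column names are assumed.
csv_text = """case,score_a,score_b,delta
login_flow,0.80,0.95,0.15
checkout,0.90,0.60,-0.30
"""

# Collect cases whose score regressed (negative delta) for follow-up.
regressions = [
    row["case"]
    for row in csv.DictReader(io.StringIO(csv_text))
    if float(row["delta"]) < 0
]
print(regressions)  # ['checkout']
```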