- Automated scores miss nuance that expert judgment can catch
- You want to audit LLM judge results for reliability
- Regulatory or compliance requirements mandate human-in-the-loop sign-off
How to submit a human review

1. **Open the Eval Studio dashboard.** Click the Flask icon in the sidebar. The Conversations tab shows one card per captured agent response.
2. **Open a run result.** Click a conversation card to open the run detail view. Each eval result for that response is listed with its automated score.
3. **Enter a score and comment.** Enter a score from 0.0–1.0 and an optional comment explaining your assessment.
How the override is stored
The `humanOverride` field on an eval result contains:

| Field | Type | Description |
|---|---|---|
| `score` | number | Human-assigned score, 0.0–1.0 |
| `reviewerNote` | string | Optional qualitative comment |
| `reviewedAt` | string | ISO timestamp of submission |
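As a sketch, a record with this shape could be built and validated as follows. The field names and types come from the table above; `make_human_override` itself is a hypothetical helper, not part of Eval Studio:

```python
from datetime import datetime, timezone

def make_human_override(score: float, reviewer_note: str = "") -> dict:
    """Build a humanOverride-shaped record (hypothetical helper).

    Only the field names and types mirror the documented schema.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return {
        "score": score,
        "reviewerNote": reviewer_note,
        "reviewedAt": datetime.now(timezone.utc).isoformat(),
    }
```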
In CSV exports, the override appears in the `human_override_score` and `human_override_note` columns.
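A minimal sketch of filtering an export down to human-reviewed rows. Only the two `human_override_*` column names come from the docs; the other columns and the sample data are assumptions:

```python
import csv
import io

# Hypothetical excerpt of a CSV export; only the human_override_*
# column names are taken from the documentation above.
export = io.StringIO(
    "eval_name,score,human_override_score,human_override_note\n"
    "helpfulness,0.62,0.90,Judge missed a caveat in the answer\n"
    "helpfulness,0.71,,\n"
)

# An empty human_override_score means the row was never human-reviewed.
overridden = [
    row for row in csv.DictReader(export)
    if row["human_override_score"]
]
for row in overridden:
    print(row["eval_name"], row["human_override_score"], row["human_override_note"])
```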
When to use human review
Human override is best used as a spot-check layer on top of automated evals, not as the primary eval mechanism. For high-volume evaluation, set up automated LLM Judge evals and use human review to validate edge cases or disputed results.
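The spot-check workflow above can be sketched as a disagreement check: surface results where the human override diverges sharply from the automated score. The `humanOverride` field names follow the storage schema documented above; the helper, the sample records, and the 0.3 threshold are assumptions:

```python
def flag_disputed(results: list[dict], threshold: float = 0.3) -> list[dict]:
    """Return eval results where the human override disagrees with the
    automated score by more than `threshold` (hypothetical helper)."""
    disputed = []
    for r in results:
        override = r.get("humanOverride")
        if override is None:
            continue  # never human-reviewed; nothing to compare
        if abs(r["score"] - override["score"]) > threshold:
            disputed.append(r)
    return disputed

# Assumed sample data: automated `score` plus the documented humanOverride shape.
results = [
    {"id": "run-1", "score": 0.9,
     "humanOverride": {"score": 0.2, "reviewerNote": "hallucinated"}},
    {"id": "run-2", "score": 0.8,
     "humanOverride": {"score": 0.85, "reviewerNote": ""}},
    {"id": "run-3", "score": 0.5},  # no human review yet
]
print([r["id"] for r in flag_disputed(results)])  # → ['run-1']
```

Disputed results are then candidates for re-running the judge, tightening the eval prompt, or escalating to a second reviewer.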