Human review is not a separate eval type — it is a post-run score override available on any evaluated result. After MIRA automatically scores an agent response, you can open that result in the dashboard and submit a manual score (0.0–1.0) that replaces the automated score for reporting and pass/fail purposes. This is useful when:
  • Automated scores miss nuance that expert judgment can catch
  • You want to audit LLM judge results for reliability
  • Regulatory or compliance requirements mandate human-in-the-loop sign-off

How to submit a human review

1. Open the Eval Studio dashboard
   Click the Flask icon in the sidebar. The Conversations tab shows one card per captured agent response.

2. Open a run result
   Click a conversation card to open the run detail view. Each eval result for that response is listed with its automated score.

3. Click Review
   Click the Review button on any individual eval result row.

4. Enter a score and comment
   Enter a score from 0.0–1.0 and an optional comment explaining your assessment.

5. Submit
   Click Submit. The result is updated immediately. The human score is stored as humanOverride on the eval result and takes precedence over the automated score in pass/fail calculations.
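The precedence rule described in the final step can be sketched as a small helper. This is illustrative only, assuming the eval result is available as JSON with the field names shown in this page; `effective_score` is a hypothetical function, not part of MIRA:

```python
def effective_score(result: dict) -> float:
    """Return the score used for pass/fail reporting.

    A human override, when present, takes precedence over the
    automated score (field names assumed from this page).
    """
    override = result.get("humanOverride")
    if override is not None:
        return override["score"]
    return result["score"]
```

For example, a result with an automated score of 0.4 and a human override of 0.9 reports 0.9.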

How the override is stored

The humanOverride field on an eval result contains:
| Field | Type | Description |
| --- | --- | --- |
| score | number | Human-assigned score, 0.0–1.0 |
| reviewerNote | string | Optional qualitative comment |
| reviewedAt | string | ISO timestamp of submission |
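As an illustration, a record with this shape could be built and validated like so. This is a sketch only: `make_override` is a hypothetical helper, and in practice MIRA populates these fields when you submit the review form:

```python
from datetime import datetime, timezone

def make_override(score: float, note: str = "") -> dict:
    """Build a record shaped like humanOverride (field names from the table above).

    Enforces the documented 0.0-1.0 score range and stamps an ISO timestamp.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return {
        "score": score,
        "reviewerNote": note,
        "reviewedAt": datetime.now(timezone.utc).isoformat(),
    }
```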
When exported, override data appears in the human_override_score and human_override_note CSV columns.
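A minimal sketch of consuming those export columns downstream, preferring the override score when the column is non-empty (the `result_id` column name and the sample rows are assumptions for illustration, not the exact export schema):

```python
import csv
import io

# Hypothetical export excerpt; only the two override column names
# (human_override_score, human_override_note) come from the docs above.
EXPORT = """result_id,score,human_override_score,human_override_note
r1,0.62,0.90,Judge missed sarcasm
r2,0.81,,
"""

def final_scores(csv_text: str) -> dict[str, float]:
    """Map result id -> effective score, preferring the human override column."""
    out = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row["human_override_score"]
        out[row["result_id"]] = float(raw) if raw else float(row["score"])
    return out
```

Here `r1` resolves to its override (0.90) while `r2`, never reviewed, keeps its automated score (0.81).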

When to use human review

Human override is best used as a spot-check layer on top of automated evals, not as the primary eval mechanism. For high-volume evaluation, set up automated LLM Judge evals and use human review to validate edge cases or disputed results.
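One way to run that spot-check layer is to queue a reproducible random subset of results for manual review. A sketch under the assumption that you have the result IDs available; the sampling helper is not a MIRA feature:

```python
import random

def spot_check_sample(result_ids: list[str],
                      fraction: float = 0.05,
                      seed: int = 0) -> list[str]:
    """Pick a reproducible random subset of results to queue for human review.

    A fixed seed keeps the sample stable across reruns; always reviews
    at least one result.
    """
    rng = random.Random(seed)
    k = max(1, round(len(result_ids) * fraction))
    return rng.sample(result_ids, k)
```

Reviewers can then open just the sampled conversations in the dashboard and submit overrides where the automated score looks wrong.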