- Automated scores miss nuance that expert judgment can catch
- You want to audit LLM judge results for reliability
- Regulatory or compliance requirements mandate human-in-the-loop sign-off
How to submit a human review

1. **Open the Eval Studio dashboard.** Click the Flask icon in the sidebar. The Conversations tab shows one card per captured agent response.
2. **Open a run result.** Click a conversation card to open the run detail view. Each eval result for that response is listed with its automated score.
3. **Enter a score and comment.** Enter a score from 0.0–1.0 and an optional comment explaining your assessment.
How the override is stored
The `humanOverride` field on an eval result contains:

| Field | Type | Description |
|---|---|---|
| `score` | number | Human-assigned score, 0.0–1.0 |
| `reviewerNote` | string | Optional qualitative comment |
| `reviewedAt` | string | ISO timestamp of submission |
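As a sketch, a record with this shape could be built and validated as follows. The field names and types come from the table above; `make_human_override` itself is a hypothetical helper, not part of Eval Studio:

```python
from datetime import datetime, timezone

def make_human_override(score: float, reviewer_note: str = "") -> dict:
    """Build a humanOverride-shaped record (hypothetical helper).

    Only the field names and types mirror the documented schema.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    return {
        "score": score,
        "reviewerNote": reviewer_note,
        "reviewedAt": datetime.now(timezone.utc).isoformat(),
    }
```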
In CSV exports, the override appears in the `human_override_score` and `human_override_note` columns.
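A minimal sketch of filtering an export down to human-reviewed rows. Only the two `human_override_*` column names come from the docs; the other columns and the sample data are assumptions:

```python
import csv
import io

# Hypothetical excerpt of a CSV export; only the human_override_*
# column names are taken from the documentation above.
export = io.StringIO(
    "eval_name,score,human_override_score,human_override_note\n"
    "helpfulness,0.62,0.90,Judge missed a caveat in the answer\n"
    "helpfulness,0.71,,\n"
)

# An empty human_override_score means the row was never human-reviewed.
overridden = [
    row for row in csv.DictReader(export)
    if row["human_override_score"]
]
for row in overridden:
    print(row["eval_name"], row["human_override_score"], row["human_override_note"])
```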
When to use human review
Human override is best used as a spot-check layer on top of automated evals, not as the primary eval mechanism. For high-volume evaluation, set up automated LLM Judge evals and use human review to validate edge cases or disputed results.
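The spot-check workflow above can be sketched as a disagreement check: surface results where the human override diverges sharply from the automated score. The `humanOverride` field names follow the storage schema documented above; the helper, the sample records, and the 0.3 threshold are assumptions:

```python
def flag_disputed(results: list[dict], threshold: float = 0.3) -> list[dict]:
    """Return eval results where the human override disagrees with the
    automated score by more than `threshold` (hypothetical helper)."""
    disputed = []
    for r in results:
        override = r.get("humanOverride")
        if override is None:
            continue  # never human-reviewed; nothing to compare
        if abs(r["score"] - override["score"]) > threshold:
            disputed.append(r)
    return disputed

# Assumed sample data: automated `score` plus the documented humanOverride shape.
results = [
    {"id": "run-1", "score": 0.9,
     "humanOverride": {"score": 0.2, "reviewerNote": "hallucinated"}},
    {"id": "run-2", "score": 0.8,
     "humanOverride": {"score": 0.85, "reviewerNote": ""}},
    {"id": "run-3", "score": 0.5},  # no human review yet
]
print([r["id"] for r in flag_disputed(results)])  # → ['run-1']
```

Disputed results are then candidates for re-running the judge, tightening the eval prompt, or escalating to a second reviewer.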