An Eval Profile is a named collection of eval definitions that activate together. When a profile is active, every agent response is evaluated against all the evals in that profile.

Creating a profile

1. Open the Eval Framework: click the Flask icon in the sidebar.
2. Go to the Profiles tab: click Profiles in the top tab bar of the Eval Studio.
3. Click + New profile.
4. Fill in details: enter a Name and an optional Description. A good name describes the workload being tested, e.g. “Contract Analysis — Accuracy”, “Code Review — Completeness”.
5. Save the profile: click Save. The profile appears in the list, ready for evals to be assigned.

Assigning evals to a profile

After creating a profile, expand it to see a checklist of all your eval definitions. Check the box next to each eval you want in the profile; unchecking removes it. Toggling an eval from the profile card also updates the profileIds field on that eval definition automatically, so the change is reflected when you edit the eval in the Eval Editor.
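The profile–eval relationship described above can be sketched as a simple data model. Only the profileIds field name comes from this page; the dictionary shape and helper functions below are hypothetical illustrations, not the product's API:

```python
# Hypothetical sketch of checking/unchecking an eval on a profile card.
# Only the profileIds field name is from the docs; everything else is illustrative.

def assign_to_profile(eval_def: dict, profile_id: str) -> None:
    """Checking the box adds the profile's id to the eval definition."""
    if profile_id not in eval_def["profileIds"]:
        eval_def["profileIds"].append(profile_id)

def remove_from_profile(eval_def: dict, profile_id: str) -> None:
    """Unchecking removes it again."""
    if profile_id in eval_def["profileIds"]:
        eval_def["profileIds"].remove(profile_id)

eval_def = {"name": "Cites sources", "profileIds": []}
assign_to_profile(eval_def, "contract-analysis-accuracy")
remove_from_profile(eval_def, "contract-analysis-accuracy")
```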

Activating a profile

Click Activate on a profile card to make it active. Only active profiles trigger automatic evaluation on new responses. You can have multiple profiles active at the same time — each one evaluates responses independently.
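The activation behaviour above can be sketched as follows. All names here are illustrative assumptions; the only facts taken from the text are that inactive profiles are skipped and that each active profile evaluates a response independently:

```python
# Hedged sketch: only active profiles evaluate new responses, each independently.
# Field names and the stand-in eval check are assumptions, not the product's API.

def run_eval(eval_def: dict, response: str) -> bool:
    # Stand-in check; real evals would be rules, metrics, or LLM judges.
    return eval_def["required_phrase"] in response

def evaluate_response(response: str, profiles: list[dict]) -> dict:
    return {
        p["name"]: {e["name"]: run_eval(e, response) for e in p["evals"]}
        for p in profiles
        if p["active"]  # inactive profiles never trigger evaluation
    }

profiles = [
    {"name": "Accuracy", "active": True,
     "evals": [{"name": "mentions-source", "required_phrase": "Source:"}]},
    {"name": "Draft checks", "active": False, "evals": []},
]
results = evaluate_response("Source: contract §4", profiles)
# only the active profile appears in results
```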

Organising evals

Eval definitions have built-in organisation fields you can set in the Eval Editor:
  • Priority — normal or critical. A failing critical eval sets the composite score to 0 regardless of other scores.
  • Scope — chat, skill, workflow, or all. Scopes an eval to only run against responses produced in a specific context.
  • Weight — relative weight (1–10) used in the composite score formula.
  • Status — draft (not yet running), active (auto-evaluates), or archived.
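The Priority and Weight fields above interact in the composite score. The exact formula is not given on this page; the sketch below assumes a plain weighted average plus the documented rule that one failing critical eval zeroes the composite:

```python
# Hedged sketch of a composite score. The product's exact formula isn't
# documented here; this assumes a weighted average over eval results.

def composite_score(results: list[dict]) -> float:
    """Each result: {"score": 0..1, "weight": 1..10,
    "priority": "normal"|"critical", "passed": bool}."""
    # Documented rule: a failing critical eval sets the composite to 0.
    if any(r["priority"] == "critical" and not r["passed"] for r in results):
        return 0.0
    total_weight = sum(r["weight"] for r in results)
    return sum(r["score"] * r["weight"] for r in results) / total_weight

results = [
    {"score": 0.9, "weight": 3, "priority": "normal", "passed": True},
    {"score": 0.5, "weight": 1, "priority": "normal", "passed": True},
]
# weighted average: (0.9*3 + 0.5*1) / 4 = 0.8
```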

Importing and exporting profiles

Profiles can be exported as JSON and shared with teammates. Click ⋮ → Export Profile on the profile card. The export includes all profile metadata but not run history. To import: click Import Profile in the profile list and select the JSON file.
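The export/import round trip can be sketched like this. The only facts taken from the text are that the export is JSON and includes metadata but not run history; all field names below are illustrative assumptions:

```python
import json

# Hypothetical in-memory profile; field names are illustrative assumptions.
profile = {
    "name": "Contract Analysis — Accuracy",
    "description": "Accuracy checks for the contract workload",
    "evalIds": ["cites-sources", "no-hallucinated-clauses"],
    "runHistory": [{"responseId": "r1", "score": 0.92}],  # excluded from export
}

def export_profile(profile: dict) -> str:
    """Serialize profile metadata as JSON, dropping run history per the docs."""
    metadata = {k: v for k, v in profile.items() if k != "runHistory"}
    return json.dumps(metadata, ensure_ascii=False, indent=2)

def import_profile(payload: str) -> dict:
    return json.loads(payload)

roundtrip = import_profile(export_profile(profile))
```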

Profile best practices

  • Keep profiles focused — one profile per feature area or eval dimension
  • Mark business-critical evals as priority: critical so one failure triggers immediate visibility
  • Start with a small set of rule and metric evals before adding LLM judge evals (lower cost, faster feedback)