The four built-in eval types (rule, similarity, llm_judge, metric) cover most use cases. If you need a custom scoring strategy, you can add a new executor function following the same pattern as the existing ones in src/main/eval-executors/.

1. Understand the existing pattern

Each eval type is a standalone async (or sync) function in src/main/eval-executors/. There is no shared class hierarchy — each executor receives a typed config and returns a typed result:
// src/main/eval-executors/rule-executor.ts (simplified)
export async function executeRule(
  config: RuleConfig,
  output: string
): Promise<RuleResult> { … }

// src/main/eval-executors/metric-executor.ts (simplified)
export function executeMetric(
  config: MetricConfig,
  metrics: MetricInput
): MetricResult { … }

2. Define the config and result types

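Add your new config type to src/shared/eval-types.ts and extend the EvalType and EvalConfig unions. In this example the result type is declared alongside the executor in step 3: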
// src/shared/eval-types.ts — extend EvalType union
export type EvalType = 'llm_judge' | 'rule' | 'similarity' | 'metric' | 'my_custom'

// Add config interface
export interface MyCustomConfig {
  expectedFormat: string // example field
  caseSensitive: boolean
}

// EvalConfig union — add your config
export type EvalConfig =
  | RuleConfig
  | LlmJudgeConfig
  | SimilarityConfig
  | MetricConfig
  | MyCustomConfig
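
The worker in step 4 casts the stored config to MyCustomConfig without checking it. If you would rather verify the shape at runtime, a user-defined type guard is one option. The sketch below is an assumption about how that could look, not code from the repository: `isMyCustomConfig` is a hypothetical helper, and the interface is redeclared locally only so the example runs on its own.

```typescript
// Local copy of the interface from step 2, repeated so the sketch is standalone.
interface MyCustomConfig {
  expectedFormat: string
  caseSensitive: boolean
}

// Hypothetical helper (not part of the codebase): returns true only when the
// value is a non-null object with both fields present and correctly typed.
function isMyCustomConfig(value: unknown): value is MyCustomConfig {
  if (typeof value !== 'object' || value === null) return false
  const candidate = value as Record<string, unknown>
  return (
    typeof candidate.expectedFormat === 'string' &&
    typeof candidate.caseSensitive === 'boolean'
  )
}
```

With a guard like this, the worker could skip the `as MyCustomConfig` cast and report a config error when the check fails.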

3. Implement the executor

Create a new file in src/main/eval-executors/:
// src/main/eval-executors/my-custom-executor.ts
import type { MyCustomConfig } from '../../shared/eval-types'
import log from 'electron-log/node'

export interface MyCustomResult {
  passed: boolean
  detail: string
  error?: string
}

export function executeMyCustom(config: MyCustomConfig, output: string): MyCustomResult {
  try {
    const target = config.caseSensitive
      ? config.expectedFormat
      : config.expectedFormat.toLowerCase()
    const actual = config.caseSensitive ? output : output.toLowerCase()

    const passed = actual.includes(target)
    return {
      passed,
      detail: passed
        ? `Output contains expected format "${config.expectedFormat}"`
        : `Output does not contain "${config.expectedFormat}"`,
    }
  } catch (err) {
    log.error('MyCustomExecutor error:', err)
    return { passed: false, detail: '', error: String(err) }
  }
}

4. Wire it into the worker

The src/workers/eval-worker.ts file dispatches to each executor based on eval.type. Add a case for your new type:
// src/workers/eval-worker.ts — inside the switch on eval.type
// (MyCustomConfig must be imported as a type from '../shared/eval-types')
case 'my_custom': {
  const { executeMyCustom } = await import('../main/eval-executors/my-custom-executor')
  const result = executeMyCustom(eval.config as MyCustomConfig, run.outputResponse)
  score = result.passed ? 1 : 0
  passed = result.passed
  detail = result.detail
  error = result.error
  break
}
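
When a union such as EvalType gains a member, it is easy to miss a switch branch somewhere. A common TypeScript idiom is a `never`-typed default branch, which makes the compiler flag any unhandled member. The sketch below is illustrative only and makes no claim about how the real worker is written: `assertNever` and `describeEvalType` are hypothetical names, and the switch is a stand-in for the worker's dispatch.

```typescript
type EvalType = 'llm_judge' | 'rule' | 'similarity' | 'metric' | 'my_custom'

// Hypothetical helper: its parameter is typed `never`, so passing a value that
// could still be a real EvalType member is a compile-time error.
function assertNever(value: never): never {
  throw new Error(`Unhandled eval type: ${String(value)}`)
}

// Hypothetical stand-in for the worker's switch on eval.type.
function describeEvalType(type: EvalType): string {
  switch (type) {
    case 'llm_judge':
      return 'LLM judge'
    case 'rule':
      return 'Rule'
    case 'similarity':
      return 'Similarity'
    case 'metric':
      return 'Metric'
    case 'my_custom':
      return 'My custom'
    default:
      // Removing any case above turns this line into a type error.
      return assertNever(type)
  }
}
```

The default branch also throws at runtime if an unexpected string reaches the switch, for example from an older database row.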

5. Add the config UI

In src/renderer/src/components/evals/, add a settings component for your config fields and register it in the eval case form — follow the pattern of the existing RuleConfig form component.

6. Add a DB migration (if needed)

Usually no migration is needed. The config is stored as JSON in the eval_definitions settings blob, so new config fields do not require a column migration; just add them to the TypeScript interface.
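
Because the config lives in a JSON blob, rows saved before a field existed will simply lack it. One way to handle that is to apply defaults when parsing. The sketch below assumes the blob reaches your code as a JSON string; `parseMyCustomConfig`, `MY_CUSTOM_DEFAULTS`, and the default values themselves are hypothetical and not taken from the codebase.

```typescript
// Local copy of the interface from step 2, repeated so the sketch is standalone.
interface MyCustomConfig {
  expectedFormat: string
  caseSensitive: boolean
}

// Hypothetical defaults for fields missing from older stored configs.
const MY_CUSTOM_DEFAULTS: MyCustomConfig = {
  expectedFormat: '',
  caseSensitive: false,
}

// Hypothetical helper: parses a stored JSON string and replaces any missing
// or wrongly typed field with its default. Invalid JSON still throws.
function parseMyCustomConfig(json: string): MyCustomConfig {
  const raw: unknown = JSON.parse(json)
  const fields: Record<string, unknown> =
    typeof raw === 'object' && raw !== null ? (raw as Record<string, unknown>) : {}
  return {
    expectedFormat:
      typeof fields.expectedFormat === 'string'
        ? fields.expectedFormat
        : MY_CUSTOM_DEFAULTS.expectedFormat,
    caseSensitive:
      typeof fields.caseSensitive === 'boolean'
        ? fields.caseSensitive
        : MY_CUSTOM_DEFAULTS.caseSensitive,
  }
}
```

Filling defaults at read time keeps old rows usable without rewriting them, at the cost of hiding malformed data; stricter validation may suit some projects better.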

7. Write tests

// src/main/eval-executors/my-custom-executor.test.ts
import { executeMyCustom } from './my-custom-executor'

describe('executeMyCustom', () => {
  it('passes when output contains expected format', () => {
    const result = executeMyCustom(
      { expectedFormat: 'Conclusion', caseSensitive: false },
      'In Conclusion, the analysis shows…'
    )
    expect(result.passed).toBe(true)
  })

  it('fails when expected format is absent', () => {
    const result = executeMyCustom(
      { expectedFormat: 'Conclusion', caseSensitive: false },
      'The analysis shows mixed results.'
    )
    expect(result.passed).toBe(false)
  })
})

8. Update docs

Add your eval type to the Eval Framework section in docs/docs.json and create a new page at docs/eval-framework/my-custom.mdx following the pattern of the existing eval type pages.