Evaluation Studio live demo
A complete read-only evaluation path: rubric editor, confidence bands, concordance, bias and cost gates, deploy check, multi-turn traces, and routed review work.
rubric pass rate: 94% (median scorecard outcome)
judge confidence: 91% (median confidence band)
human concordance: 89% (reviewer agreement signal)
cost per call: $0.0008 (current run)
Read-only surfaces
Trace-led evaluation run
evaluation run
Support assistant release candidate
01 Prompt: Refund exception with hostile user tone (input locked)
02 Context: Policy block 7.4 and account tenure (retrieved)
03 Tool call: orders.lookup + credit.limit (verified)
04 Answer: Offer partial credit with escalation path (scored)
05 Judge score: Safety pass, tone warning, cost pass (89/100)
06 Human override: Require empathy rewrite before ship (applied)
queue: 12
overrides: 3
accord: 89%
override strip
Tone warning accepted
Human reviewer keeps the safety pass, rewrites empathy language, and sends the signed scorecard to release review.
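The page does not say how the concordance figure is computed. As a minimal sketch, assuming it is a simple agreement rate between judge verdicts and human verdicts on the same cases, it could look like the following. The function name and the verdict labels are hypothetical and are not taken from the product.

```python
def concordance(judge_verdicts, human_verdicts):
    """Fraction of cases where the human reviewer agrees with the judge.

    Assumption: both lists cover the same cases in the same order.
    """
    if len(judge_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not judge_verdicts:
        raise ValueError("at least one case is required")
    matches = sum(j == h for j, h in zip(judge_verdicts, human_verdicts))
    return matches / len(judge_verdicts)


# Hypothetical verdicts for four cases; the human overrides the third one.
judge = ["pass", "fail", "pass", "pass"]
human = ["pass", "fail", "fail", "pass"]
rate = concordance(judge, human)
```

A plain agreement rate does not correct for chance agreement; a chance-corrected statistic would be a different design choice.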
Demo path
Inspect the work, the gate, the owner, and the record that remains after every decision.
01 Create the scorecard, weights, judge prompt, and acceptance thresholds.
02 Score model outputs, traces, and multi-turn cases against the rubric.
03 Send uncertain cases to the right inbox with context attached.
04 Attach the result to release review and deploy checks.
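The four steps above can be sketched as a weighted scorecard with per-criterion thresholds and a confidence-based routing rule. This is an illustration under stated assumptions, not the product's implementation: the `Criterion` type, the `route` function, the 0-to-1 score scale, the 0.8 confidence cutoff, and the destination names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float      # relative weight in the overall score
    threshold: float   # minimum acceptable score on an assumed 0..1 scale


def weighted_score(criteria, scores):
    """Weighted mean of per-criterion scores."""
    total_weight = sum(c.weight for c in criteria)
    return sum(c.weight * scores[c.name] for c in criteria) / total_weight


def route(criteria, scores, confidence, min_confidence=0.8):
    """Pick a destination for one scored case (hypothetical destinations)."""
    if confidence < min_confidence:
        return "human_review"      # uncertain cases go to a reviewer inbox
    failed = [c.name for c in criteria if scores[c.name] < c.threshold]
    return "blocked" if failed else "release_review"


# Step 01: scorecard with weights and acceptance thresholds.
criteria = [
    Criterion("safety", weight=2.0, threshold=0.9),
    Criterion("tone", weight=1.0, threshold=0.7),
]
# Step 02: per-criterion scores for one case, e.g. from a judge model.
scores = {"safety": 1.0, "tone": 0.6}
overall = weighted_score(criteria, scores)
# Steps 03 and 04: route on confidence first, then on thresholds.
destination = route(criteria, scores, confidence=0.91)
```

In this sketch the confidence check runs before the threshold check, so a low-confidence case reaches a human even when every criterion passes.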
Route map
Rubric Studio walkthrough
This block mirrors the shipped PR #1 path: author a rubric, create an AI draft, get expert approval, send work to grading, and write the contribution used by scorecards.
Read-only PR #1 path
Name the task type, domain, risk level, and first criteria.
Generate a draft with warnings and review mode attached.
AI-drafted rubrics stay blocked until an expert approves them.
Grade model output criterion by criterion with evidence gates.
A submitted grade writes the scorecard contribution path.
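One way to picture the rule that AI-drafted rubrics stay blocked until an expert approves them is a small state machine with an explicit table of allowed transitions. This is a sketch only, assuming a strictly linear path; the state names and the `advance` helper are hypothetical and not taken from PR #1.

```python
from enum import Enum


class RubricState(Enum):
    AUTHORED = "authored"
    AI_DRAFT = "ai_draft"
    APPROVED = "approved"
    GRADING = "grading"
    CONTRIBUTED = "contributed"


# Each state lists the only states it may move to next.
ALLOWED = {
    RubricState.AUTHORED: {RubricState.AI_DRAFT},
    RubricState.AI_DRAFT: {RubricState.APPROVED},   # expert approval gate
    RubricState.APPROVED: {RubricState.GRADING},
    RubricState.GRADING: {RubricState.CONTRIBUTED},
    RubricState.CONTRIBUTED: set(),
}


def advance(current, target):
    """Return the target state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


state = advance(RubricState.AUTHORED, RubricState.AI_DRAFT)
state = advance(state, RubricState.APPROVED)
```

Because `GRADING` is reachable only from `APPROVED`, an attempt to grade against an unapproved AI draft raises `ValueError` in this sketch.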