Author the criteria
Criterion-level rubrics in a project folder with schema validation, examples, evidence requirements, and theme tags.
Local, file-based, git-friendly authoring for the criterion-level evaluations that now shape frontier AI. Author, test, calibrate, diff, and export.
Criteria, judges, samples, calibration data — a project folder a reviewer can diff.
Cohen's κ, Fleiss' κ, Krippendorff's α, bootstrap confidence intervals, ordinal support.
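The agreement statistics start with Cohen's κ for two raters. A minimal, dependency-free sketch of the computation — the `cohen_kappa` helper and the sample labels are illustrative, not part of the product:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters agree.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    pe = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (po - pe) / (1 - pe)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohen_kappa(a, b), 3))  # → 0.667
```

Fleiss' κ and Krippendorff's α generalize the same observed-versus-chance idea to more raters, missing data, and ordinal scales.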
rubric-spec, judge cards, manifests, adapters for Inspect, Evals, Promptfoo.
Write the criteria. Score with a mock or BYO judge. Diff the wording, see the score impact.
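A mock judge only needs to be deterministic so that score diffs reflect wording changes, not noise. A minimal stand-in, assuming nothing about the product's actual judge interface — `mock_judge` and its 0–4 scale are illustrative:

```python
import hashlib

def mock_judge(criterion_id: str, sample_text: str, seed: str = "v1") -> int:
    """Deterministic stand-in judge: hashes criterion + sample into a 0-4 score.
    Lets you wire up a scoring pipeline before plugging in a real model."""
    digest = hashlib.sha256(f"{seed}:{criterion_id}:{sample_text}".encode()).hexdigest()
    return int(digest, 16) % 5

# Same sample, two criterion versions: re-scoring surfaces the wording's impact.
before = mock_judge("clarity", "The answer cites its sources.")
after = mock_judge("clarity-v2", "The answer cites its sources.")
print(before, after)
```

Because the score is a pure function of the inputs, re-running after a rubric edit isolates the effect of the edit itself.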
Bring expert scores into the calibration tab. Compute agreement. Probe judge bias. Rank criteria that need work.
Semantic rubric changes next to score-impact overlays. Export rubric-spec, manifests, conformance badges, intake packets.
Every project leaves a folder. Every export is portable. Reviewers run it without a hosted account.
Portable rubric in the rubric-spec schema. Validated, linted, diffable, and adapter-ready.
Disclosure card for the judge prompt: calibration results, known bias, use envelope, limits.
Reproducible scoring envelope with provenance, hashes, and the exact data the run touched.
Exports for Inspect, OpenAI Evals, Promptfoo, Hugging Face, and lm-eval-harness.
Signed .auraonepkg with a privacy preview before handoff to AuraOne reviewers.
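The scoring envelope's "exact data the run touched" claim reduces to hashing every input file into a manifest. A sketch of that idea — `run_manifest` and its field names are assumptions, not the product's export schema:

```python
import hashlib
import json
import pathlib

def run_manifest(paths):
    """Record a SHA-256 digest and size for each input file, so a reviewer
    can verify that a scoring run touched exactly this data and no other."""
    entries = []
    for p in sorted(paths):  # stable ordering keeps the manifest diffable
        data = pathlib.Path(p).read_bytes()
        entries.append({
            "path": str(p),
            "sha256": hashlib.sha256(data).hexdigest(),
            "bytes": len(data),
        })
    return json.dumps({"inputs": entries}, indent=2)
```

A reviewer re-hashes the files in the exported folder and compares against the manifest; any drift shows up as a plain text diff.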
Local-first IDE for MCP and A2A agents. Replay, compare, export.
Scrub sensor streams. Cluster failures. Export reviewed subsets.
Twelve installable packages including rubric-spec, iaa-kit, judge-bench, judge-card.
Open is not a trial. It is the IDE. Cloud begins when multi-author review is the actual problem.