Author the criteria
Criterion-level rubrics in a project folder with schema validation, examples, evidence requirements, and theme tags.
Local, file-based, git-friendly authoring for the criterion-level evaluations that now shape frontier AI. Author, test, calibrate, diff, and export.
Criteria, judges, samples, calibration data — a project folder a reviewer can diff.
Cohen's κ, Fleiss' κ, Krippendorff's α, bootstrap confidence intervals, ordinal support.
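The agreement statistics start with Cohen's κ for two raters. A minimal, dependency-free sketch of the computation — the `cohen_kappa` helper and the sample labels are illustrative, not part of the product:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (nominal labels)."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters agree.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label marginals.
    ca, cb = Counter(rater_a), Counter(rater_b)
    pe = sum(ca[label] * cb[label] for label in ca) / (n * n)
    return (po - pe) / (1 - pe)

a = ["pass", "pass", "fail", "pass", "fail", "pass"]
b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohen_kappa(a, b), 3))  # → 0.667
```

Fleiss' κ and Krippendorff's α generalize the same observed-versus-chance idea to more raters, missing data, and ordinal scales.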
rubric-spec, judge cards, manifests, adapters for Inspect, Evals, Promptfoo.
Write the criteria. Score with a mock or BYO judge. Diff the wording, see the score impact.
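A mock judge only needs to be deterministic so that score diffs reflect wording changes, not noise. A minimal stand-in, assuming nothing about the product's actual judge interface — `mock_judge` and its 0–4 scale are illustrative:

```python
import hashlib

def mock_judge(criterion_id: str, sample_text: str, seed: str = "v1") -> int:
    """Deterministic stand-in judge: hashes criterion + sample into a 0-4 score.
    Lets you wire up a scoring pipeline before plugging in a real model."""
    digest = hashlib.sha256(f"{seed}:{criterion_id}:{sample_text}".encode()).hexdigest()
    return int(digest, 16) % 5

# Same sample, two criterion versions: re-scoring surfaces the wording's impact.
before = mock_judge("clarity", "The answer cites its sources.")
after = mock_judge("clarity-v2", "The answer cites its sources.")
print(before, after)
```

Because the score is a pure function of the inputs, re-running after a rubric edit isolates the effect of the edit itself.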
Bring expert scores into the calibration tab. Compute agreement. Probe judge bias. Rank criteria that need work.
Semantic rubric changes next to score-impact overlays. Export rubric-spec, manifests, conformance badges, intake packets.
Every project leaves a folder. Every export is portable. Reviewers run it without a hosted account.
Portable rubric in the rubric-spec schema. Validated, linted, diffable, and adapter-ready.
Disclosure card for the judge prompt: calibration results, known bias, use envelope, limits.
Reproducible scoring envelope with provenance, hashes, and the exact data the run touched.
Exports for Inspect, OpenAI Evals, Promptfoo, Hugging Face, and lm-eval-harness.
Signed .auraonepkg with a privacy preview before handoff to AuraOne reviewers.
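The scoring envelope's "exact data the run touched" claim reduces to hashing every input file into a manifest. A sketch of that idea — `run_manifest` and its field names are assumptions, not the product's export schema:

```python
import hashlib
import json
import pathlib

def run_manifest(paths):
    """Record a SHA-256 digest and size for each input file, so a reviewer
    can verify that a scoring run touched exactly this data and no other."""
    entries = []
    for p in sorted(paths):  # stable ordering keeps the manifest diffable
        data = pathlib.Path(p).read_bytes()
        entries.append({
            "path": str(p),
            "sha256": hashlib.sha256(data).hexdigest(),
            "bytes": len(data),
        })
    return json.dumps({"inputs": entries}, indent=2)
```

A reviewer re-hashes the files in the exported folder and compares against the manifest; any drift shows up as a plain text diff.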
Local-first IDE for MCP and A2A agents. Replay, compare, export.
Scrub sensor streams. Cluster failures. Export reviewed subsets.
Twelve installable packages including rubric-spec, iaa-kit, judge-bench, judge-card.
Open is not a trial. It is the IDE. Cloud begins when multi-author review is the actual problem.