Every label leaves with the rubric it was graded against, the reviewer who cleared it, and a signed chain of who created it and under what rights. A dataset that survives an audit, on infrastructure you keep.
The label schema is reviewable and the same on every batch.
The reviewer who cleared the label follows it out of the workspace.
Who created it, who reviewed it, under what rights — sealed with the export.
Define the rubric. Label with review. Sign the dataset before it leaves.
The label schema, the guidance, and the examples are versioned. Every annotator works against the same definition.
Annotation and review run in the same workspace. IAA, reopen rate, and reviewer notes stay attached to the batch.
Export only when the gate clears. Rubric version, reviewer record, and verdict travel with the dataset.
Expert pay is quote-scoped because frontier work needs real domain experts. Run your experts, ours, or a mix. Every reviewer is identity-verified and calibration-tested before they touch a row.
Your coders, clinicians, lawyers, and analysts work in the same governed workspace. Their reviews carry the same signed chain as everyone else's.
Identity-verified, credential-checked, calibration-tested specialists routed to the exact rows the rubric flagged. Frontier work needs experts, not generalist crowds.
Mix your bench with ours on the same program. Every label — whoever made it — leaves with the reviewer who cleared it and the rights it was created under.
For real-work programs, experts recreate professional tasks from clean-room scenarios, using no employer data and signing an attestation to that effect. Trade-secret and PII exposure stays out of your dataset.
78% of teams cannot validate their training data, and 77% cannot trace where it came from. The EU AI Act's provenance provisions become enforceable in August 2026. Every label here carries the chain that answers them.
Each label carries the identity-verified person who made it — not an anonymous contractor onboarded like a consumer. The chain holds under audit.
The reviewer who cleared the row, the time they cleared it, and the rubric version they cleared it against travel inside the dataset.
Signed consent attached at the datapoint, not assumed at the contract. You can show what you are allowed to train on, row by row.
Your data runs on your infrastructure, never pooled in one vendor's data center. A competitor lost four terabytes of data, including the identities of its workers.
Founder-led, and not a data vendor aligned with one of the labs it serves. The kind of supply you can defend under audit.
PII and PHI handling, on-prem or VPC deployment, data residency controls, and audit logging on every action.
IAA below the threshold, reopen rate above the ceiling, coverage or gold-set agreement below the floor: any one check holds the release, with the reason attached.
A workspace, mid-program.
Release status: review-scoped. This is a checked-in metrics snapshot; the provider-backed workspace is scoped only after provider readiness and review evidence are accepted. Pricing remains quote-scoped.
Target 0.75 · trend up
Ceiling 2.0% · trend down
3 blocked · 1 in review
6 auto-escalated · 38m median wait
Image, video, audio, text, structured, 3D point cloud, biosignal — seven modalities, one quality path. Masks, timelines, waveforms, cuboids, and spans all attach to the same review record.
Bounding boxes, polygons, and pixel-mask segmentation with SAM2 assist. Every stroke and every correction stays on the same review record.
Frame-by-frame tracking with timelines and action-recognition segments that reviewers can scrub together. Spans survive review.
Voice comments on segments, diarization, transcripts, and biosignal overlays kept together. The reviewer's voice note lives on the row that earned it.
Named entity recognition, sentiment, classification, span annotation, and multilingual prompts in one governed workflow.
Hierarchical taxonomies, metadata validation, and bulk edits for tables and schema-driven labels. One source of truth for the schema.
LiDAR and depth-sensor annotation with cuboids, measurement tools, and calibration evidence that travels with the sensor frame.
The Studio is where the work happens. Review is where it gets cleared. The Quality Hub is where it gets measured. Datasets are where it leaves.
Pull source data from your cloud buckets, pre-label with model assist, then work across image, video, text, audio, and LiDAR without relearning the workflow. Autosave keeps work intact when a connection drops.
Disagreements turn into clear decisions with shared context. Voice and video comments stay attached to the task, not scattered in side channels.
IAA, annotator drift, gold-set gaps, and the export gate sit in one hub before data leaves the workflow. Threshold rules can escalate when metrics drift.
Projects, datasets, queue routing, and exports stay tied to the same review workflow. Dataset management and export flows share one source of truth.
Every batch shows where the rubric was hit, where review caught the disagreement, and where policy held the line. One readout, three readings.
Every batch leaves something the training team can act on — and something the next reviewer can read.
The label schema and guidance the batch was annotated against, pinned and immutable.
Annotations tied to the rubric version, the annotator, and the time they were made.
The disagreement, the reasoning, and the resolution stay with the row that triggered them.
Export only after the gate clears. Rubric version and verdict travel inside the package.
Inter-annotator agreement, reopen rate, and pass rate for every batch — readable, not buried.
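As an illustration of what a per-batch agreement reading involves, here is a minimal sketch of inter-annotator agreement for two annotators. The page does not say which statistic the Quality Hub uses; Cohen's kappa is an assumption here, and the function name and sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items.

    Assumption: kappa stands in for "IAA"; the statistic the product
    actually uses is not stated in the source.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")

    n = len(labels_a)
    # Share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both annotators used one identical class throughout
    return (observed - expected) / (1 - expected)


# Hypothetical batch of four rows labelled by two annotators.
annotator_1 = ["cat", "cat", "dog", "dog"]
annotator_2 = ["cat", "cat", "dog", "cat"]
kappa = cohens_kappa(annotator_1, annotator_2)  # 0.5 for this sample
```

Kappa corrects raw agreement for chance, which is why a batch with 75% matching labels can still read well below a 0.75 target.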
Seven export targets can be scoped during project setup. Every format carries the manifest with it — rubric version, reviewer coverage, gate state, and the checksum on the dataset.
Instance + keypoint + panoptic JSON for image and video workflows.
Bounding-box TXT files per image, class-index manifest included.
Polygon + mask coordinates for YOLO segmentation training.
Pascal VOC XML per image for long-tail legacy pipelines.
LabelMe JSON with shapes, groups, and flags preserved.
JSON Lines for streaming and incremental dataset updates.
Columnar dataset with schema enforced by the taxonomy editor.
the schema the batch was graded against
who cleared what, and the gate state
the training side verifies what it received
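To make the manifest idea concrete, here is a minimal sketch of how a training team might verify an export against its manifest. The field names, the `"cleared"` gate value, and the choice of SHA-256 are all assumptions for illustration; the source lists what the manifest carries but not its actual layout.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex SHA-256 digest of the exported dataset bytes."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(dataset_bytes, rubric_version, reviewer_coverage, gate_state):
    """Hypothetical manifest layout; real field names may differ."""
    return {
        "rubric_version": rubric_version,
        "reviewer_coverage": reviewer_coverage,
        "gate_state": gate_state,
        "checksum": {"algorithm": "sha256", "value": sha256_of(dataset_bytes)},
    }


def verify_export(dataset_bytes, manifest):
    """True only if the gate cleared and the bytes match the checksum."""
    gate_ok = manifest["gate_state"] == "cleared"
    checksum_ok = sha256_of(dataset_bytes) == manifest["checksum"]["value"]
    return gate_ok and checksum_ok


# Hypothetical export: one JSON Lines row and the manifest written with it.
payload = b'{"row": 1, "label": "cat"}\n'
manifest = build_manifest(payload, rubric_version="v3",
                          reviewer_coverage=1.0, gate_state="cleared")

received_intact = verify_export(payload, manifest)           # True
received_altered = verify_export(payload + b"x", manifest)   # False
```

Checking the gate state alongside the checksum means a file that is byte-perfect but was never cleared still fails verification.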
“Adjudication moved from spreadsheets to one review record. IAA is visible before export, and the quality lead can reopen a case without losing context.”
Six things change the first time the whole workflow runs. None of them are about labelling speed — they are about the record the labels leave behind.
Disputed labels go through adjudication instead of spreadsheets and side threads. Every disagreement turns into a row with a resolution.
Inter-annotator agreement is not a quarterly report. It runs against every batch, with reviewer-level and project-level views.
Reopen rate per annotator and per project with service-target alerts. A drift in either one can route to review before the dataset leaves.
Four checks can block a release when the dataset is weak. Gate state and the evidence stay on the job — not on a dashboard nobody reads.
Complete audit trail for every label decision. Who labelled it, who reviewed it, what changed, and why the cleared dataset was allowed to leave.
Work stays intact when the connection drops. Robotics sessions captured in the field arrive attached to the same clip record once the network returns.
The workspace keeps running when the network goes away. Mask edits, span boundaries, and reviewer notes queue locally and reconcile the moment you reconnect.
Robotics teams capturing in the field, clinical reviewers in a basement reading room, audio teams in a studio booth — none of them lose work. Two reviewers editing the same span merge cleanly, with the resolution kept.
Autosave committed
29 mask edits queued locally
Two reviewers edited the same span
Edits reconciled · resolution kept
Six questions the program lead asks before they sign the export. The same six come up across labs, programs, and regulated workflows.
Rubrics are versioned. A change creates a new version, and the program lead approves the move. Existing batches stay on the version they were labelled against — the export carries the version with it.
Adjudication is a defined role with its own queue. Disagreements flow into a single record with both labels, both reasons, and the resolver's call. The decision lives with the row.
Per batch and rolling 30-day, per annotator and per project. The Quality Hub surfaces the trend, the drift, and the threshold against the gate. Reopens and IAA share one source of truth.
Four checks. IAA below the threshold. Reopen rate above the ceiling. Coverage below the required percentage. Gold-set agreement below the floor. Any one of them blocks the release with the reason attached.
Through the export manifest. Export formats are scoped during project setup. The manifest carries the rubric version, reviewer coverage, gate state, and the dataset checksum so the training side can verify what it received.
Image, video, audio, text, structured, 3D point cloud, and biosignal are all on the same workflow. New modalities are added behind the same review, quality, and export gates as the existing ones.
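The four release checks described in the answers above can be sketched as one function that returns a verdict plus the reasons for any hold. This is an illustration under stated assumptions, not the product's implementation: the class and function names are hypothetical, the 0.75 and 2.0% defaults assume the "Target" and "Ceiling" figures on this page refer to IAA and reopen rate, and the coverage and gold-set floors are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    iaa_min: float = 0.75             # assumed to be the "Target 0.75" reading
    reopen_rate_max: float = 0.02     # assumed to be the "Ceiling 2.0%" reading
    coverage_min: float = 0.95        # hypothetical floor
    gold_agreement_min: float = 0.90  # hypothetical floor


@dataclass(frozen=True)
class BatchMetrics:
    iaa: float
    reopen_rate: float
    coverage: float
    gold_agreement: float


def evaluate_gate(m, t=GateThresholds()):
    """Return (cleared, reasons). Any single failing check blocks release."""
    reasons = []
    if m.iaa < t.iaa_min:
        reasons.append(f"IAA {m.iaa:.2f} is below the threshold {t.iaa_min:.2f}")
    if m.reopen_rate > t.reopen_rate_max:
        reasons.append(
            f"Reopen rate {m.reopen_rate:.1%} is above the ceiling {t.reopen_rate_max:.1%}"
        )
    if m.coverage < t.coverage_min:
        reasons.append(
            f"Coverage {m.coverage:.1%} is below the required {t.coverage_min:.1%}"
        )
    if m.gold_agreement < t.gold_agreement_min:
        reasons.append(
            f"Gold-set agreement {m.gold_agreement:.1%} is below the floor "
            f"{t.gold_agreement_min:.1%}"
        )
    return (not reasons, reasons)


# Hypothetical batch that fails only the IAA check.
cleared, reasons = evaluate_gate(BatchMetrics(0.70, 0.012, 0.99, 0.95))
```

Returning the reasons with the verdict mirrors the stated behavior that a blocked release carries its reason with it.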
Annotation breaks when the label, the disagreement, the quality reading, and the export gate live in different systems. The labels arrive. The disputes get emailed. The quality report comes out a week later. And the dataset leaves before anyone has actually looked at it.
On one record, the rubric version is pinned to the batch. The disagreement is a row in adjudication, not a thread in chat. The IAA reads against the gate before the export is even staged. And the dataset only leaves when the gate has cleared — with the reason, the reviewer, and the version still attached.
Test the run. Review the hard cases. Recruit the right specialist. Remember the misses. Approve what's right. When Evaluation Studio hits a case a model can't score, this is where a human scores it.
Specialists routed to the rows the rubric flagged.
Reads the rubric, the review, and the dataset alongside you.
The same rubric that grades a release grades the dataset.
Bring the rubric your reviewers already trust. We'll keep it attached to every label, every review, and every dataset that leaves, on infrastructure you keep.
Pilots start at a single reviewed batch. Bring the modality, the rubric, and the rights — we'll sign the dataset.