Define the dataset
Name the languages, accents, and speakers your model is weakest on. Set target speaker counts, hours of audio, and the turnaround you need.
AuraOne · Human Data OS · Voice
Multilingual speech requests are waitlist-first and provider-dependent. Each scope carries rights metadata, evaluation-readiness notes, and review records before any licensing claim is made.
Name the languages, accents, and speakers your model is weakest on. Set target speaker counts, hours of audio, and the turnaround you need.
Every recording gets an identity-verified consent record: who spoke, in what language, under what rights, and whether the voice can be reused. The rights ride with the data, not just a worker contract.
Speech-to-text and text-to-speech rubrics, scored accuracy, resolved disagreements, and voice-agent safety cases. The data is reviewed for your task, not just collected.
One record from speaker to delivery, with an export artifact you can audit. When a speaker revokes, you can prove which recordings are gone.
What you can request
A speaker can revoke. The EU AI Act starts enforcing training-data provenance for high-risk systems in August 2026, and a voice is a biometric identifier. A competitor lost terabytes once, including who its workers were. That is why every recording carries its own signed, identity-verified consent record, and why your data is never pooled in one store a single breach can drain.
Tell us the languages, the accents, the speaker count, and the hours. A neutral source you can defend under audit, not aligned with any one lab. Public voice programs open by waitlist.