Human Data Evals Lead (Remote/US/LATAM)
Anyone AI
Reports to: CEO
Owns: data proposals, sample development, quality, and pilot delivery
Location: Remote / LATAM / US

THE ROLE
You will own Anyone AI's data initiatives and proposals to AI labs, from the initial proposal or response to a lab request through pilot delivery. You own how we build proposals and develop the sample packages and benchmarks: frontier-grade packages across reasoning, coding, agents, tool use, multimodal, and other domains, produced in collaboration with subject-matter experts, with expert-verified ground truth, multi-model headroom results, and QC that survives buyer-side scrutiny. You are the person who designs the sample that demonstrates our quality and converts pilots into production engagements. On a small team, this is the operational center of the Human Data Division.

RESPONSIBILITIES
- Proposals & requests. Study public benchmarks and eval targets, and turn them into proposals and sample packages that demonstrate capability and win the work. Respond to lab data requests and pilots.
- Sample & benchmark development. Design and build the sample packages, working with subject-matter experts. Every package meets the bar of our current sample set:
  - Expert-verified, ...