#7
Datacurve
Commercial
medium confidence
Datacurve is a YC W24 commercial data vendor that supplies expert-curated frontier coding data, RLHF traces, and repository-wide reinforcement learning environments (with unit-test verifiers) to foundation model labs, sourced via its Shipd bounty platform of vetted software engineers. It also publishes DeepSWE, a long-horizon agentic coding benchmark.
Key facts
Headquarters
San Francisco, USA
confirmed
Last round
Series A, $15M, October 2025
confirmed
Open source
no (company products are commercial/proprietary; only the DeepSWE benchmark repo is published publicly)
reported
Deployment
managed-hosted (data delivered as a service; private model endpoints spun up for RLHF traces)
reported
Scale & velocity
Current headcount
approx 36 (LinkedIn 'Discover all 36 employees', size band 11-50) as of 2026-06-07
reported
Distributed / remote
no (San Francisco office-based; careers page lists in-office meals and commuter benefits)
estimated
Research depth
Backgrounds
Serena Ge (co-founder/CEO): University of Waterloo CS; worked on LLM reasoning during a co-op at Cohere; Forbes 30 Under 30. Charley Lee (co-founder): University of Waterloo CS; AI research background.
reported
Papers / benchmarks
DeepSWE, long-horizon agentic coding benchmark, 113 tasks across TypeScript/Go/Python/JavaScript/Rust with isolated test environments and program-based verifiers (github.com/datacurve-ai/deep-swe). Distinct from the Together AI/Agentica 'DeepSWE' coding agent of the same name.
confirmed
Capital
Total raised
$17.7M ($15M Series A + $2.7M seed)
confirmed
Last round
Series A, $15M, October 2025
confirmed
Investors
Chemistry (Mark Goldberg, lead Series A), Y Combinator, Balaji Srinivasan (seed), angel investors who are employees of DeepMind, Vercel, Anthropic and OpenAI (individuals, not the companies)
reported
Revenue signals
unknown (over $1M has been paid out in bounties to contributors; this is a payout figure, not revenue)
Security & compliance
Other certifications
unknown
Product
Open source
no (company products are commercial/proprietary; only the DeepSWE benchmark repo is published publicly)
reported
License
unknown (the DeepSWE repo showed no license file as of the 2026-06-07 access date)
Deployment model
managed-hosted (data delivered as a service; private model endpoints spun up for RLHF traces)
reported
Buyer analysis
Best fit: Frontier/foundation model labs needing expert-sourced coding SFT/RLHF data and code-execution RL environments with verifiable rewards (code execution, tight loops).
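As a rough illustration of what "verifiable rewards" from code execution can mean, the sketch below scores a candidate solution by running a unit-test file against it and returning a binary reward. It is a generic minimal example, not Datacurve's implementation: the function name `unit_test_reward`, the file names, and the pass/fail reward scheme are all assumptions made for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def unit_test_reward(solution_code: str, test_code: str, timeout_s: float = 30.0) -> float:
    """Hypothetical verifier: 1.0 if the unit tests pass, otherwise 0.0."""
    with tempfile.TemporaryDirectory() as workdir:
        # Write the candidate and its tests into an isolated scratch directory.
        Path(workdir, "solution.py").write_text(solution_code)
        Path(workdir, "test_solution.py").write_text(test_code)
        try:
            result = subprocess.run(
                [sys.executable, "-m", "unittest", "test_solution"],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A run that hangs is treated as a failure.
            return 0.0
        # unittest exits with code 0 only when every test passed.
        return 1.0 if result.returncode == 0 else 0.0


EXAMPLE_TESTS = """
import unittest
from solution import add


class AddTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
"""

if __name__ == "__main__":
    print(unit_test_reward("def add(a, b):\n    return a + b\n", EXAMPLE_TESTS))
```

A temporary directory and a subprocess only keep the example self-contained; a production RL environment would need real sandboxing, which this sketch does not attempt.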
How we verified this
Confirmed this is the correct company: Datacurve (YC W24, datacurve.ai) sells frontier coding data and repo-wide RL environments with unit-test verifiers, matching the directory note 'code execution, tight loops'.

Funding re-verified: $15M Series A (Oct 2025, led by Chemistry/Mark Goldberg) + $2.7M seed (Balaji Srinivasan) = $17.7M total, corroborated by TechCrunch and the University of Waterloo announcement; a StartupHub '$34M' figure was treated as an unreliable aggregator outlier and discounted. Valuation undisclosed (unknown).

Headcount ~36 / band 11-50 confirmed via LinkedIn public snippet (consistent with a Series A startup; YC's 'team size 4' is stale). Founded 2024 confirmed.

notable_customers kept empty/unknown: press and aggregators reference unnamed 'frontier/leading AI labs' but no specific customer is verifiable, and no institutional frontier-lab investor exists (lab affiliations are individual angels).

DeepSWE benchmark confirmed as Datacurve's own (113 tasks, 5 languages), explicitly distinguished from the same-named Together AI/Agentica coding agent.

SOC2/certifications/security page remain unknown (no trust page found). Downgrades applied to open_source, notable_investors, and researcher_backgrounds to remove overreach; distributed_remote set to 'no (estimated)' based on on-site careers benefits.
Sources
- datacurve.ai/ · 2026-06-07, Official site, 'data engine for frontier AI', DeepSWE, total funding $17.7M
- datacurve.ai/careers · 2026-06-07, 3 open roles, SF location, team description
- www.ycombinator.com/companies/datacurve · 2026-06-07, Founded 2024, founders, W24 batch, SF, product description
- www.linkedin.com/company/datacurveai · 2026-06-07, Headcount ~36, size band 11-50, HQ San Francisco
- techcrunch.com/2025/10/09/datacurve-raises-15-million-to-take-on-scaleai · 2026-06-07, $15M Series A, Chemistry lead, investor list, $1M bounties paid, frontier lab ties
- sacra.com/c/datacurve/ · 2026-06-07, Funding, RLHF traces via private endpoints, repo-wide RL environments with unit tests, 14,000 vetted engineers
- github.com/datacurve-ai/deep-swe · 2026-06-07, DeepSWE benchmark, 113 tasks, 5 languages, 652 stars
- api.github.com/repos/datacurve-ai/deep-swe · 2026-06-07, Repo metadata: created 2026-05-15, pushed 2026-06-05, stars 652, license null
- uwaterloo.ca/computer-science/news/cs-led-startup-secures-177m-transform · 2026-06-07, Founder backgrounds: Ge ex-Cohere, Lee ex-Google/RL research, Waterloo CS
- www.chemistry.vc/post/staying-ahead-of-the-curve · 2026-06-07, Lead investor Chemistry's Series A announcement
- www.menlotimes.com/post/datacurve-is-taking-on-scale-ai-building-frontie · 2026-06-07, Bounty model, complex RL environments, future expansion plans
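The repo metadata cited above (creation date, last push, stars, license) can be re-checked against the public GitHub REST API. The sketch below is one minimal way to do that using only the standard library; the helper names `summarize_repo` and `fetch_repo_summary` are hypothetical, and the live values will differ from those recorded on the access date.

```python
import json
import urllib.request

REPO_API_URL = "https://api.github.com/repos/datacurve-ai/deep-swe"


def summarize_repo(payload: dict) -> dict:
    """Pick out the repository fields this profile relies on."""
    # GitHub returns "license": null when it detects no license file.
    license_info = payload.get("license")
    return {
        "created": payload.get("created_at"),
        "pushed": payload.get("pushed_at"),
        "stars": payload.get("stargazers_count"),
        "license": license_info.get("spdx_id") if license_info else None,
    }


def fetch_repo_summary(url: str = REPO_API_URL, timeout_s: float = 10.0) -> dict:
    """Fetch live metadata; unauthenticated requests are rate-limited."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return summarize_repo(json.load(response))


if __name__ == "__main__":
    print(fetch_repo_summary())
```

Keeping the parsing in `summarize_repo` separate from the network call means a saved API response can be checked offline.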
Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked unknown, never estimated. See the methodology.