rl-list.com
UPDATED 2026.06.07
#7

Datacurve

Commercial
medium confidence
datacurve.ai · Status: active (confirmed) · Founded 2024

Datacurve is a YC W24 commercial data vendor that supplies foundation model labs with expert-curated frontier coding data, RLHF traces, and repository-wide reinforcement learning environments with unit-test verifiers. It sources this data through Shipd, its bounty platform for vetted software engineers. It also publishes DeepSWE, a long-horizon agentic coding benchmark.

Key facts
Headquarters
San Francisco, USA · confirmed
Headcount band
11-50 · reported
Total raised
$17.7M · confirmed
Last round
Series A, $15M, October 2025 · confirmed
SOC 2
unknown
What they sell
mixed · confirmed
Open source
no (company products are commercial/proprietary; only the DeepSWE benchmark repo is public) · reported
Deployment
managed-hosted (data delivered as a service; private model endpoints spun up for RLHF traces) · reported

Scale & velocity

Current headcount
~36 employees per LinkedIn (size band 11-50), as of 2026-06-07 · reported
Headcount growth
unknown
Open roles
3 · confirmed
Other locations
unknown
Distributed / remote
no (office-based in San Francisco; the careers page lists in-office meals and commuter benefits) · estimated

Research depth

Has researchers
yes · reported
Researcher count
unknown
Backgrounds
Serena Ge (co-founder/CEO): worked on LLM reasoning during a co-op at Cohere; University of Waterloo CS; Forbes 30 Under 30. Charley Lee (co-founder): University of Waterloo CS; AI research background · reported
Papers / benchmarks
DeepSWE: a long-horizon agentic coding benchmark of 113 tasks across five languages (TypeScript, Go, Python, JavaScript, Rust), each with an isolated test environment and a program-based verifier (github.com/datacurve-ai/deep-swe). Not to be confused with the Together AI/Agentica coding agent also named DeepSWE. · confirmed

Capital

Total raised
$17.7M ($15M Series A + $2.7M seed) · confirmed
Investors
Chemistry (Mark Goldberg, Series A lead), Y Combinator, Balaji Srinivasan (seed), and angel investors employed at DeepMind, Vercel, Anthropic, and OpenAI (investing as individuals, not through the companies) · reported
Valuation
unknown
Revenue signals
unknown (over $1M has been paid out in bounties to contributors; this is a payout figure, not revenue)

Security & compliance

Other certifications
unknown
Security page
unknown

Product

License
unknown (the DeepSWE repo showed no license file as of 2026-06-07)
Maturity
GA (generally available) · estimated
Notable customers
unknown

Buyer analysis

Best fit: frontier/foundation model labs that need expert-sourced coding SFT/RLHF data and RL environments whose rewards are verified by code execution (matching the directory note 'code execution, tight loops').
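To make "rewards verified by code execution" concrete, here is a minimal Python sketch of a unit-test verifier used as a binary RL reward. It is an illustration under stated assumptions, not Datacurve's delivery format or API: the function name `unit_test_reward`, the file names `solution.py` and `test_solution.py`, the 0/1 reward, and the use of Python's standard `unittest` runner are all hypothetical choices.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def unit_test_reward(candidate_code: str, test_code: str, timeout_s: float = 30.0) -> float:
    """Binary reward: 1.0 if every unit test passes against the candidate, else 0.0.

    Hypothetical sketch; file names and the 0/1 reward are illustrative assumptions.
    """
    with tempfile.TemporaryDirectory() as workdir:
        root = Path(workdir)
        # Candidate code (e.g. a model's proposed solution) and the fixed
        # verifier tests are written side by side in a throwaway directory.
        (root / "solution.py").write_text(candidate_code)
        (root / "test_solution.py").write_text(test_code)
        try:
            # Run the tests in a separate interpreter so a crash or infinite
            # loop in the candidate cannot take down the caller.
            result = subprocess.run(
                [sys.executable, "-m", "unittest", "discover", "-s", str(root), "-p", "test_*.py"],
                cwd=root,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # a hung candidate earns no reward
        # unittest exits non-zero on any failure or error, including an
        # import error in solution.py.
        return 1.0 if result.returncode == 0 else 0.0


# Hypothetical verifier tests for a toy task.
TESTS = """
import unittest
from solution import add

class AddTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
"""

print(unit_test_reward("def add(a, b):\n    return a + b\n", TESTS))  # 1.0
print(unit_test_reward("def add(a, b):\n    return a - b\n", TESTS))  # 0.0
```

A subprocess is not a security sandbox; production verifiers would typically add stronger isolation such as containers. Returning the fraction of tests passed instead of 0/1 would give a denser signal; the binary form is used here only for readability.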

How we verified this

Confirmed this is the correct company: Datacurve (YC W24, datacurve.ai) sells frontier coding data and repo-wide RL environments with unit-test verifiers, matching the directory note 'code execution, tight loops'.

Funding re-verified: $15M Series A (October 2025, led by Chemistry/Mark Goldberg) + $2.7M seed (Balaji Srinivasan) = $17.7M total, corroborated by TechCrunch and the University of Waterloo announcement. A StartupHub figure of $34M was treated as an unreliable aggregator outlier and discounted. Valuation is undisclosed (unknown).

Headcount ~36 / band 11-50 confirmed via LinkedIn's public snippet, consistent with a Series A startup; YC's 'team size 4' is stale. Founded 2024 confirmed.

notable_customers kept empty/unknown: press and aggregators refer to unnamed 'frontier/leading AI labs', but no specific customer is verifiable, and no frontier lab has invested institutionally (the lab affiliations belong to individual angels).

DeepSWE benchmark confirmed as Datacurve's own (113 tasks, 5 languages) and explicitly distinguished from the Together AI/Agentica coding agent of the same name. SOC 2, other certifications, and the security page remain unknown (no trust page found).

Downgrades were applied to open_source, notable_investors, and researcher_backgrounds to remove overreach; distributed_remote was set to 'no (estimated)' based on the on-site benefits listed on the careers page.


Sources

  1. datacurve.ai/ · 2026-06-07, Official site, 'data engine for frontier AI', DeepSWE, total funding $17.7M
  2. datacurve.ai/careers · 2026-06-07, 3 open roles, SF location, team description
  3. www.ycombinator.com/companies/datacurve · 2026-06-07, Founded 2024, founders, W24 batch, SF, product description
  4. www.linkedin.com/company/datacurveai · 2026-06-07, Headcount ~36, size band 11-50, HQ San Francisco
  5. techcrunch.com/2025/10/09/datacurve-raises-15-million-to-take-on-scaleai · 2026-06-07, $15M Series A, Chemistry lead, investor list, $1M bounties paid, frontier lab ties
  6. sacra.com/c/datacurve/ · 2026-06-07, Funding, RLHF traces via private endpoints, repo-wide RL environments with unit tests, 14,000 vetted engineers
  7. github.com/datacurve-ai/deep-swe · 2026-06-07, DeepSWE benchmark, 113 tasks, 5 languages, 652 stars
  8. api.github.com/repos/datacurve-ai/deep-swe · 2026-06-07, Repo metadata: created 2026-05-15, pushed 2026-06-05, stars 652, license null
  9. uwaterloo.ca/computer-science/news/cs-led-startup-secures-177m-transform · 2026-06-07, Founder backgrounds: Ge ex-Cohere, Lee ex-Google/RL research, Waterloo CS
  10. www.chemistry.vc/post/staying-ahead-of-the-curve · 2026-06-07, Lead investor Chemistry's Series A announcement
  11. www.menlotimes.com/post/datacurve-is-taking-on-scale-ai-building-frontie · 2026-06-07, Bounty model, complex RL environments, future expansion plans
Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked unknown, never estimated. See the methodology.