Datacurve

Commercial

medium confidence

datacurve.ai · Status: active (confirmed) · Founded 2024

Datacurve is a YC W24 commercial data vendor that supplies foundation model labs with expert-curated frontier coding data, RLHF traces, and repository-wide reinforcement learning environments with unit-test verifiers. The data is sourced through Shipd, its bounty platform of vetted software engineers. Datacurve also publishes DeepSWE, a long-horizon agentic coding benchmark.

Key facts

Headquarters

San Francisco, USA · confirmed

Headcount band

11-50 · reported

Total raised

$17.7M · confirmed

Last round

Series A, $15M, October 2025 · confirmed

SOC 2

unknown

What they sell

mixed · confirmed

Open source

no (company products are commercial and proprietary; only the DeepSWE benchmark repo is published publicly) · reported

Deployment

managed-hosted (data delivered as a service; private model endpoints spun up for RLHF traces) · reported

Scale & velocity

Current headcount

approx. 36 as of 2026-06-07 (LinkedIn 'Discover all 36 employees'; size band 11-50) · reported

Headcount growth

unknown

Open roles

3 · confirmed

Other locations

unknown

Distributed / remote

no (San Francisco office-based; careers page lists in-office meals and commuter benefits) · estimated

Research depth

Has researchers

yes · reported

Researcher count

unknown

Backgrounds

Serena Ge (co-founder/CEO): University of Waterloo CS; worked on LLM reasoning during a co-op at Cohere; Forbes 30 Under 30. Charley Lee (co-founder): University of Waterloo CS; AI research background · reported

Papers / benchmarks

DeepSWE: a long-horizon agentic coding benchmark of 113 tasks across TypeScript, Go, Python, JavaScript, and Rust, with isolated test environments and program-based verifiers (github.com/datacurve-ai/deep-swe). Distinct from the Together AI/Agentica 'DeepSWE' coding agent of the same name · confirmed

Capital

Total raised

$17.7M ($15M Series A + $2.7M seed) · confirmed

Last round

Series A, $15M, October 2025 · confirmed

Investors

Chemistry (Mark Goldberg; Series A lead), Y Combinator, Balaji Srinivasan (seed), and angel investors employed at DeepMind, Vercel, Anthropic, and OpenAI (investing as individuals, not on behalf of those companies) · reported

Valuation

unknown

Revenue signals

unknown (over $1M has been paid out in bounties to contributors; this is a payout figure, not revenue)

Security & compliance

SOC 2

unknown

Other certifications

unknown

Security page

unknown

Product

What they sell

mixed · confirmed

Open source

no (company products are commercial and proprietary; only the DeepSWE benchmark repo is published publicly) · reported

License

unknown (the DeepSWE repo showed no license as of 2026-06-07)

Deployment model

managed-hosted (data delivered as a service; private model endpoints spun up for RLHF traces) · reported

Maturity

GA · estimated

Notable customers

unknown

Buyer analysis

Best fit: Frontier/foundation model labs needing expert-sourced coding SFT/RLHF data and code-execution RL environments with verifiable rewards (code execution, tight loops).
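
For readers unfamiliar with the term, a "verifiable reward" from a unit-test verifier can be pictured with the minimal sketch below. It is an illustration only and not Datacurve's implementation: the function name `unit_test_reward`, the file names `solution.py` and `test_solution.py`, the binary 1.0/0.0 reward, and the timeout are all assumptions made for this example.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def unit_test_reward(candidate_source: str, test_source: str, timeout_s: float = 30.0) -> float:
    """Hypothetical verifier: 1.0 if the candidate passes every unit test, else 0.0."""
    with tempfile.TemporaryDirectory() as workdir:
        # Write the candidate solution and its tests into a throwaway directory.
        Path(workdir, "solution.py").write_text(candidate_source)
        Path(workdir, "test_solution.py").write_text(test_source)
        try:
            # Run the tests in a separate interpreter process; unittest exits 0 only on success.
            result = subprocess.run(
                [sys.executable, "-m", "unittest", "discover", "-s", ".", "-q"],
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return 0.0  # A hanging candidate is scored as a failure.
        return 1.0 if result.returncode == 0 else 0.0


# Illustrative task: the tests expect an `add` function in solution.py.
EXAMPLE_TESTS = '''
import unittest
from solution import add


class AddTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add(2, 3), 5)
'''

GOOD_CANDIDATE = "def add(a, b):\n    return a + b\n"
BAD_CANDIDATE = "def add(a, b):\n    return a - b\n"
```

Calling `unit_test_reward(GOOD_CANDIDATE, EXAMPLE_TESTS)` scores the passing candidate and `unit_test_reward(BAD_CANDIDATE, EXAMPLE_TESTS)` scores the failing one. A subprocess with a timeout is not a security sandbox, so a production environment would need stronger isolation before executing untrusted code.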

How we verified this

Company identity: confirmed this is the correct company. Datacurve (YC W24, datacurve.ai) sells frontier coding data and repo-wide RL environments with unit-test verifiers, matching the directory note 'code execution, tight loops'. Founded 2024 confirmed.

Funding: re-verified as $15M Series A (October 2025, led by Chemistry/Mark Goldberg) + $2.7M seed (Balaji Srinivasan) = $17.7M total, corroborated by TechCrunch and the University of Waterloo announcement. A StartupHub '$34M' figure was treated as an unreliable aggregator outlier and discounted. Valuation is undisclosed (unknown).

Headcount: ~36 (band 11-50) confirmed via the LinkedIn public snippet, consistent with a Series A startup; YC's 'team size 4' is stale.

Customers: notable_customers kept empty/unknown. Press and aggregators reference unnamed 'frontier/leading AI labs', but no specific customer is verifiable, and no institutional frontier-lab investor exists (lab affiliations are individual angels).

Benchmark: DeepSWE confirmed as Datacurve's own (113 tasks, 5 languages), explicitly distinguished from the same-named Together AI/Agentica coding agent.

Security: SOC 2, other certifications, and security page remain unknown (no trust page found).

Downgrades: applied to open_source, notable_investors, and researcher_backgrounds to remove overreach; distributed_remote set to 'no (estimated)' based on on-site careers benefits.

Sources

datacurve.ai/ · 2026-06-07, Official site, 'data engine for frontier AI', DeepSWE, total funding $17.7M
datacurve.ai/careers · 2026-06-07, 3 open roles, SF location, team description
www.ycombinator.com/companies/datacurve · 2026-06-07, Founded 2024, founders, W24 batch, SF, product description
www.linkedin.com/company/datacurveai · 2026-06-07, Headcount ~36, size band 11-50, HQ San Francisco
techcrunch.com/2025/10/09/datacurve-raises-15-million-to-take-on-scaleai · 2026-06-07, $15M Series A, Chemistry lead, investor list, $1M bounties paid, frontier lab ties
sacra.com/c/datacurve/ · 2026-06-07, Funding, RLHF traces via private endpoints, repo-wide RL environments with unit tests, 14,000 vetted engineers
github.com/datacurve-ai/deep-swe · 2026-06-07, DeepSWE benchmark, 113 tasks, 5 languages, 652 stars
api.github.com/repos/datacurve-ai/deep-swe · 2026-06-07, Repo metadata: created 2026-05-15, pushed 2026-06-05, stars 652, license null
uwaterloo.ca/computer-science/news/cs-led-startup-secures-177m-transform · 2026-06-07, Founder backgrounds: Ge ex-Cohere, Lee ex-Google/RL research, Waterloo CS
www.chemistry.vc/post/staying-ahead-of-the-curve · 2026-06-07, Lead investor Chemistry's Series A announcement
www.menlotimes.com/post/datacurve-is-taking-on-scale-ai-building-frontie · 2026-06-07, Bounty model, complex RL environments, future expansion plans

Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked unknown, never estimated. See the methodology.