#18
BenchFlow
Commercial
medium confidence
BenchFlow is an early-stage, YC-backed open-source 'environment lab' building evaluation infrastructure and a community Benchmark Hub for AI agents, with products including SkillsBench, ClawsBench (mock workplace environments) and a sandboxed agent runtime. It positions environments as 'the new data' for training and evaluating agents across domains like enterprise workflows, coding, computer use and browser tasks.
Key facts
Headquarters
New Castle, DE, USA (incorporation); Bay Area / San Francisco operating presence
reported
Last round
Seed, $1M, January 2025
reported
What they sell
environments
confirmed
Deployment
API + self-hosted (open-source runtime; sandboxes via Docker/Daytona/Modal); Benchmark Hub managed-hosted
reported
Scale & velocity
Current headcount
1-10 employees (LinkedIn shows 2-10 band; ~3 identified on LinkedIn as of 2026-06-07)
reported
Distributed / remote
unknown
Research depth
Backgrounds
Xiangyi Li (founder/CEO): creator of SkillsBench; prior engineering roles per founder interview
Moritz Wallawitsch: early co-founder, reported departure ~Feb 2025
reported
Papers / benchmarks
SkillsBench: 'Benchmarking How Well Agent Skills Work Across Diverse Tasks' (arXiv:2602.12670); 86 tasks across 11 domains with curated Skills and deterministic verifiers
ClawsBench: mock workplace environments (Gmail, Calendar, Drive, Docs, Slack)
Benchmark Hub: community ports incl. OS-World, WebArena
confirmed
Capital
Last round
Seed, $1M, January 2025
reported
Investors
Y Combinator, Pear VC, Construct Capital, FAST by GETTYLAB, Ankit Jain (angel)
reported
Security & compliance
Other certifications
unknown
Product
What they sell
environments
confirmed
Deployment model
API + self-hosted (open-source runtime; sandboxes via Docker/Daytona/Modal); Benchmark Hub managed-hosted
reported
Maturity
research preview / early-stage (open-source runtime and Benchmark Hub live and actively released; RFT framework still in development; pre-Series-A YC startup)
estimated
Buyer analysis
Best fit: Teams needing open-source, reproducible agent evaluation environments and a runtime to benchmark coding/computer-use/workplace agents at low setup cost.
How we verified this
Re-verified BenchFlow independently. Company identity is correct and matches the directory note 'dev workflow benchmarking': BenchFlow (benchflow-ai) is an early-stage, YC-backed open-source 'environment lab' for AI-agent benchmarking and evaluation, founded by Xiangyi Li, and is not an unrelated entity with the same name. Confirmed via the official site, GitHub (Apache-2.0, 249 stars, release 0.5.2 on 2026-06-05), and the SkillsBench arXiv paper (2602.12670). Funding ($1M seed, Jan 2025; YC/Pear/Construct/FAST-GETTYLAB/Ankit Jain) traces to a single aggregator (startupintros) with no primary press release. Crunchbase and PitchBook were inaccessible, so we kept all funding fields at 'reported' and flagged the single-source weakness rather than upgrading them. Headcount of 1-10 is corroborated by LinkedIn and the aggregator (no inflated 200+ figure). No named customers appear on the official site or any third-party source. The 'featured during Google Gemini launch' claim appears only in AI-generated search summaries with no primary source and is explicitly marked unverified (the site lists Gemini merely as a supported model). No SOC2 report, other certifications, or trust/security page was found, so those fields are all left 'unknown'. Main correction: downgraded maturity from 'GA' to 'research preview / early-stage', since GA overreaches for a pre-Series-A startup with an in-development RFT framework. Also made a modest upgrade of has_researchers to 'reported', given the arXiv publication. Overall confidence: medium.
Sources
- www.benchflow.ai/ · 2026-06-07, Official site, 'frontier environment lab for AI agents'; products SkillsBench, ClawsBench, Runtime
- docs.benchflow.ai/introduction · 2026-06-07, Official docs, Benchmark Hub + eval infra; RFT framework in development
- github.com/benchflow-ai/benchflow · 2026-06-07, Main repo, Apache-2.0, ~249 stars, release 0.5.2 (2026-06-05); RL environments framework, sandboxes via Docker/Daytona/Modal
- github.com/benchflow-ai/skillsbench · 2026-06-07, SkillsBench repo
- startupintros.com/orgs/benchflow · 2026-06-07, Founded 2024, New Castle DE, $1M seed Jan 2025, investors (YC, Pear, Construct, FAST/GETTYLAB, Ankit Jain), 1-10 employees, co-founder departure
- www.linkedin.com/company/benchflow-ai · 2026-06-07, Public snippet, 2-10 employees, founded 2024, Software Development; specialties 'Data and environments for agents to learn'
- www.linkedin.com/in/l1xiangyi/ · 2026-06-07, Founder Xiangyi Li, creator of SkillsBench
- www.inverse.com/tech/building-ais-testing-ground-benchflows-mission-as-e · 2026-06-07, Founder interview, founding story, background, no funding/customers named
- news.ycombinator.com/item?id=43440893 · 2026-06-07, Show HN: BenchFlow – run AI benchmarks as an API
- www.crunchbase.com/organization/benchflow · 2026-06-07, Crunchbase profile, returned HTTP 403 (not directly accessible)
- pitchbook.com/profiles/company/711737-02 · 2026-06-07, PitchBook profile (not fetched)
Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked unknown, never estimated. See the methodology.