#18

BenchFlow

Commercial

medium confidence

www.benchflow.ai ↗ · Status: active confirmed · Founded 2024

BenchFlow is an early-stage, YC-backed open-source 'environment lab' building evaluation infrastructure and a community Benchmark Hub for AI agents, with products including SkillsBench, ClawsBench (mock workplace environments) and a sandboxed agent runtime. It positions environments as 'the new data' for training and evaluating agents across domains like enterprise workflows, coding, computer use and browser tasks.

Key facts

Headquarters

New Castle, DE, USA (incorporation); Bay Area / San Francisco operating presence · reported

Headcount band

1-10 · reported

Total raised

$1.0M · reported

Last round

Seed, $1M, January 2025 · reported

SOC 2

unknown

What they sell

environments · confirmed

Open source

yes · confirmed

Deployment

API + self-hosted (open-source runtime; sandboxes via Docker/Daytona/Modal); managed-hosted Benchmark Hub · reported

Scale & velocity

Current headcount

1-10 employees (LinkedIn shows 2-10 band; ~3 identified on LinkedIn as of 2026-06-07) · reported

Headcount growth

unknown

Open roles

unknown

Other locations

unknown

Distributed / remote

unknown

Research depth

Has researchers

yes · reported

Researcher count

unknown

Backgrounds

Xiangyi Li (founder/CEO), creator of SkillsBench, prior engineering roles per founder interview; Moritz Wallawitsch, early co-founder, reported departure ~Feb 2025 · reported

Papers / benchmarks

SkillsBench, 'Benchmarking How Well Agent Skills Work Across Diverse Tasks' (arXiv:2602.12670): 86 tasks across 11 domains with curated Skills and deterministic verifiers; ClawsBench (mock workplace environments: Gmail, Calendar, Drive, Docs, Slack); Benchmark Hub (community ports incl. OS-World, WebArena) · confirmed

Capital

Total raised

$1.0M · reported

Last round

Seed, $1M, January 2025 · reported

Investors

Y Combinator, Pear VC, Construct Capital, FAST by GETTYLAB, Ankit Jain (angel) · reported

Valuation

unknown

Revenue signals

unknown

Security & compliance

SOC 2

unknown

Other certifications

unknown

Security page

unknown

Product

What they sell

environments · confirmed

Open source

yes · confirmed

License

Apache-2.0 · confirmed

Deployment model

API + self-hosted (open-source runtime; sandboxes via Docker/Daytona/Modal); managed-hosted Benchmark Hub · reported

Maturity

research preview / early-stage (open-source runtime and Benchmark Hub live and actively released; RFT framework still in development; pre-Series-A YC startup) · estimated

Notable customers

unknown

Buyer analysis

Best fit: Teams needing open-source, reproducible agent evaluation environments and a runtime to benchmark coding/computer-use/workplace agents at low setup cost.

How we verified this

Re-verified BenchFlow independently. Company identity is correct and matches the directory note 'dev workflow benchmarking': BenchFlow (benchflow-ai) is an early-stage, YC-backed open-source 'environment lab' for AI-agent benchmarking and evaluation, founded by Xiangyi Li, and not a same-named unrelated entity. Confirmed via the official site, GitHub (Apache-2.0, 249 stars, release 0.5.2 on 2026-06-05), and the SkillsBench arXiv paper (2602.12670). Funding ($1M seed, Jan 2025; YC, Pear, Construct, FAST/GETTYLAB, Ankit Jain) traces to a single aggregator (startupintros) with no primary press release. Crunchbase returned HTTP 403 and PitchBook was not fetched, so all funding fields stay at 'reported' and the single-source weakness is flagged rather than upgraded. Headcount of 1-10 is corroborated by LinkedIn and the aggregator (no inflated 200+ figure). No named customers appear on the official site or in any third-party source; the 'featured during Google Gemini launch' claim appears only in AI-generated search summaries with no primary source and is explicitly marked unverified (the site lists Gemini merely as a supported model). No SOC 2 report, other certifications, or trust/security page was found, so all are left 'unknown'. Main correction: maturity downgraded from 'GA' to 'research preview / early-stage', because GA overreaches for a pre-Series-A startup with an in-development RFT framework. has_researchers was modestly upgraded to 'reported' given the arXiv publication. Overall confidence: medium.

Sources

www.benchflow.ai/ · 2026-06-07, Official site, 'frontier environment lab for AI agents'; products SkillsBench, ClawsBench, Runtime
docs.benchflow.ai/introduction · 2026-06-07, Official docs, Benchmark Hub + eval infra; RFT framework in development
github.com/benchflow-ai/benchflow · 2026-06-07, Main repo, Apache-2.0, ~249 stars, release 0.5.2 (2026-06-05); RL environments framework, sandboxes via Docker/Daytona/Modal
github.com/benchflow-ai/skillsbench · 2026-06-07, SkillsBench repo
startupintros.com/orgs/benchflow · 2026-06-07, Founded 2024, New Castle DE, $1M seed Jan 2025, investors (YC, Pear, Construct, FAST/GETTYLAB, Ankit Jain), 1-10 employees, co-founder departure
www.linkedin.com/company/benchflow-ai · 2026-06-07, Public snippet, 2-10 employees, founded 2024, Software Development; specialties 'Data and environments for agents to learn'
www.linkedin.com/in/l1xiangyi/ · 2026-06-07, Founder Xiangyi Li, creator of SkillsBench
www.inverse.com/tech/building-ais-testing-ground-benchflows-mission-as-e · 2026-06-07, Founder interview, founding story, background, no funding/customers named
news.ycombinator.com/item?id=43440893 · 2026-06-07, Show HN: BenchFlow – run AI benchmarks as an API
www.crunchbase.com/organization/benchflow · 2026-06-07, Crunchbase profile, returned HTTP 403 (not directly accessible)
pitchbook.com/profiles/company/711737-02 · 2026-06-07, PitchBook profile (not fetched)

Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked unknown, never estimated. See the methodology.