rl-list.com
UPDATED 2026.06.07
#18

BenchFlow

Commercial
medium confidence
www.benchflow.ai · Status: active (confirmed) · Founded 2024

BenchFlow is an early-stage, YC-backed open-source 'environment lab' building evaluation infrastructure and a community Benchmark Hub for AI agents, with products including SkillsBench, ClawsBench (mock workplace environments) and a sandboxed agent runtime. It positions environments as 'the new data' for training and evaluating agents across domains like enterprise workflows, coding, computer use and browser tasks.

Key facts
Headquarters
New Castle, DE, USA (incorporation); Bay Area / San Francisco operating presence (reported)
Headcount band
1-10 (reported)
Total raised
$1.0M (reported)
Last round
Seed, $1M, January 2025 (reported)
SOC 2
unknown
What they sell
environments (confirmed)
Open source
yes (confirmed)
Deployment
API + self-hosted (open-source runtime; sandboxes via Docker/Daytona/Modal); Benchmark Hub managed-hosted (reported)

Scale & velocity

Current headcount
1-10 employees (LinkedIn shows 2-10 band; ~3 identified on LinkedIn as of 2026-06-07) (reported)
Headcount growth
unknown
Open roles
unknown
Other locations
unknown
Distributed / remote
unknown

Research depth

Has researchers
yes (reported)
Researcher count
unknown
Backgrounds
Xiangyi Li (founder/CEO), creator of SkillsBench, with prior engineering roles per founder interview; Moritz Wallawitsch, early co-founder, reported to have departed around February 2025 (reported)
Papers / benchmarks
SkillsBench, 'Benchmarking How Well Agent Skills Work Across Diverse Tasks' (arXiv:2602.12670), 86 tasks across 11 domains with curated Skills and deterministic verifiers; ClawsBench (mock workplace environments: Gmail, Calendar, Drive, Docs, Slack); Benchmark Hub (community ports including OS-World and WebArena) (confirmed)

Capital

Total raised
$1.0M (reported)
Last round
Seed, $1M, January 2025 (reported)
Investors
Y Combinator, Pear VC, Construct Capital, FAST by GETTYLAB, Ankit Jain (angel) (reported)
Valuation
unknown
Revenue signals
unknown

Security & compliance

SOC 2
unknown
Other certifications
unknown
Security page
unknown

Product

What they sell
environments (confirmed)
Open source
yes (confirmed)
License
Apache-2.0 (confirmed)
Deployment model
API + self-hosted (open-source runtime; sandboxes via Docker/Daytona/Modal); Benchmark Hub managed-hosted (reported)
Maturity
research preview / early-stage (open-source runtime and Benchmark Hub live and actively released; RFT framework still in development; pre-Series-A YC startup) (estimated)
Notable customers
unknown

Buyer analysis

Best fit: Teams needing open-source, reproducible agent evaluation environments and a runtime to benchmark coding/computer-use/workplace agents at low setup cost.

How we verified this

We re-verified BenchFlow independently. The company identity is correct and matches the directory note 'dev workflow benchmarking': BenchFlow (benchflow-ai) is an early-stage, YC-backed open-source 'environment lab' for AI-agent benchmarking and evaluation, founded by Xiangyi Li, and not an unrelated entity of the same name. This was confirmed via the official site, GitHub (Apache-2.0, 249 stars, release 0.5.2 on 2026-06-05), and the SkillsBench arXiv paper (2602.12670).

Funding ($1M seed, January 2025; YC, Pear, Construct, FAST/GETTYLAB, Ankit Jain) traces to a single aggregator (startupintros) with no primary press release, and Crunchbase and PitchBook were inaccessible. We therefore kept all funding fields at 'reported' and flagged the single-source weakness rather than upgrading them. The 1-10 headcount is corroborated by LinkedIn and the aggregator (no inflated 200+ figure).

No named customers appear on the official site or in any third-party source. The 'featured during Google Gemini launch' claim appears only in AI-generated search summaries with no primary source and is explicitly marked unverified; the site lists Gemini merely as a supported model. No SOC 2 report, other certifications, or trust/security page was found, so all are left 'unknown'.

Main correction: maturity was downgraded from 'GA' to 'research preview / early-stage', since GA overreaches for a pre-Series-A startup with an in-development RFT framework. has_researchers received a modest upgrade to 'reported' given the arXiv publication. Overall confidence: medium.
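
The tagging decisions described above can be read as a simple rule, sketched here in Python. This is an illustration only, not the directory's actual tooling: the DirectoryField class, the has_primary_source flag, and the confidence_tag function are hypothetical names, and the rule itself (no source gives 'unknown', a primary source gives 'confirmed', anything else stays 'reported') is an assumption inferred from how the license, funding, and SOC 2 fields were handled. The 'estimated' tag used for maturity is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirectoryField:
    """One profile field. Hypothetical structure, not the directory's real schema."""
    name: str
    value: Optional[str] = None
    sources: List[str] = field(default_factory=list)
    has_primary_source: bool = False  # e.g. official site, repository, or paper


def confidence_tag(f: DirectoryField) -> str:
    """Assumed rule: unsourced -> unknown, primary source -> confirmed, else reported."""
    if f.value is None or not f.sources:
        return "unknown"
    if f.has_primary_source:
        return "confirmed"
    return "reported"


# Three fields from this profile, with sources taken from the list below.
license_field = DirectoryField(
    "license", "Apache-2.0", ["github.com/benchflow-ai/benchflow"], True
)
last_round = DirectoryField(
    "last_round", "Seed, $1M, January 2025", ["startupintros.com/orgs/benchflow"]
)
soc2 = DirectoryField("soc2")

for f in (license_field, last_round, soc2):
    print(f.name, "->", confidence_tag(f))
```

Under these assumptions the license resolves to 'confirmed', the last round stays at 'reported' because its only source is an aggregator, and SOC 2 falls through to 'unknown'.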

Sources

  1. www.benchflow.ai/ · 2026-06-07, Official site, 'frontier environment lab for AI agents'; products SkillsBench, ClawsBench, Runtime
  2. docs.benchflow.ai/introduction · 2026-06-07, Official docs, Benchmark Hub + eval infra; RFT framework in development
  3. github.com/benchflow-ai/benchflow · 2026-06-07, Main repo, Apache-2.0, ~249 stars, release 0.5.2 (2026-06-05); RL environments framework, sandboxes via Docker/Daytona/Modal
  4. github.com/benchflow-ai/skillsbench · 2026-06-07, SkillsBench repo
  5. startupintros.com/orgs/benchflow · 2026-06-07, Founded 2024, New Castle DE, $1M seed Jan 2025, investors (YC, Pear, Construct, FAST/GETTYLAB, Ankit Jain), 1-10 employees, co-founder departure
  6. www.linkedin.com/company/benchflow-ai · 2026-06-07, Public snippet, 2-10 employees, founded 2024, Software Development; specialties 'Data and environments for agents to learn'
  7. www.linkedin.com/in/l1xiangyi/ · 2026-06-07, Founder Xiangyi Li, creator of SkillsBench
  8. www.inverse.com/tech/building-ais-testing-ground-benchflows-mission-as-e · 2026-06-07, Founder interview, founding story, background, no funding/customers named
  9. news.ycombinator.com/item?id=43440893 · 2026-06-07, Show HN: BenchFlow – run AI benchmarks as an API
  10. www.crunchbase.com/organization/benchflow · 2026-06-07, Crunchbase profile, returned HTTP 403 (not directly accessible)
  11. pitchbook.com/profiles/company/711737-02 · 2026-06-07, PitchBook profile (not fetched)
Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked unknown, never estimated. See the methodology.