Pillar 01 · Updated 2026
RL Environment Companies: The 2026 List
RL environments are now the raw material of frontier AI: resettable simulations — a codebase, a browser, a CRM — where agents practice and get rewarded. One frontier lab has reportedly weighed spending over $1 billion on them in a year.
A working 2026 list of the companies building them — specialists, pivoted incumbents, and open-source labs — scored on a transparent rubric. Maintained by Huzzle Labs, who are listed and scored on the same criteria as everyone else.
How we rank
§ Methodology
Only companies whose primary product is reinforcement learning environments, simulators, or evaluation harnesses are eligible for the ranking. Adjacent businesses (human-data labelers, expert marketplaces, generalist AI services firms) are listed for context but kept out of the ranked list, even when they also offer environments on the side. Eligible companies are scored 0–10 on three weighted dimensions (growth 50%, credibility 35%, accessibility 15%), and the weighted total sets the order. No pay-to-rank, no sponsored slots.
Growth & momentum
Funding, hiring, shipping cadence, and how the trajectory has moved over the last few quarters.
Traction & credibility
Named lab customers, founder pedigree, research output, and technical depth of the work.
Accessibility
Open-source releases, public docs, and how easily a team outside a frontier lab can engage.
Where a field is unconfirmed, we mark it rather than guess. Spot something wrong or missing? There's a correction link at the foot of every profile.
The list
§ 19 entries · 16 ranked by weighted score · 3 listed unranked for context
- 01
Prime Intellect
the open-source backbone of the category
Focus · Open Environments Hub, the PRIME-RL training framework, and verifiers — plus training and open-sourcing its own RL models.
- Domains
- Code · Math · Science · Reasoning
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- ~$70M (Series B, Jan 2026)
- Open source
- Open-source
An open superintelligence stack: distributed compute, the PRIME-RL training framework, verifiers, and a public Environments Hub positioned as a Hugging Face for RL environments.
Best for · Teams that want open, reusable environments and tooling.
- 02
Mechanize
deep, narrow, elite
Focus · A small number of high-fidelity software-engineering environments and graders that score coding agents on real SWE tasks.
- Domains
- Code · Software Engineering
- Founded
- Est. 2025
- HQ
- San Francisco, USA
- Funding
- Undisclosed (angel, 2025)
- Open source
- Closed
Builds a small number of high-fidelity software-engineering environments and evals for frontier coding agents, with graders that score performance on real SWE tasks.
Best for · Frontier coding-agent training where depth beats breadth.
- 03
General Reasoning
research-grade, open-leaning
Focus · Environments and reasoning data for long-horizon, multi-agent reliability, with open community releases.
- Domains
- Long Horizon · Reasoning · Multi-agent
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Seed (undisclosed)
- Open source
- Open-source
A research-and-infrastructure startup building environments and reasoning data for long-horizon, multi-agent reliability, paired with open community releases.
Best for · Long-horizon and multi-agent reasoning work.
- 03
Bespoke Labs
open data and reproducible recipes
Focus · Evaluation and data-curation tooling for post-training, with open datasets and reproducible reasoning recipes.
- Domains
- Data Curation · Evaluation · Reasoning
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Seed (reported)
- Open source
- Open-source
Evaluation and data-curation tooling for post-training, known for open datasets and reproducible reasoning recipes used widely across the community.
Best for · Eval pipelines and open dataset curation.
- 05
Halluminate
computer-use, done deterministically
Focus · Resettable, high-fidelity computer-use sandboxes (e.g., a synthetic CRM) and datasets, with a financial-services focus.
- Domains
- Computer Use · Browser · Finance
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Bootstrapped / pre-seed (YC)
- Open source
- Closed
YC-backed builder of resettable, high-fidelity sandboxes (a synthetic Salesforce/CRM, for example), plus proprietary datasets for training and evaluating computer- and browser-use agents.
Best for · Reproducible computer-use testing without flaky live sites.
- 06
Gray Swan AI
adversarial evaluation arenas
Focus · Red-teaming and safety-evaluation arenas that adversarially stress-test models and agents.
- Domains
- Security · Evaluation · Safety
- Founded
- Est. 2024
- HQ
- USA
- Funding
- Funded (round unconfirmed)
- Open source
- Closed
Runs adversarial arenas and red-teaming programs that probe models and agents for failures, feeding safety evaluations back to developers.
Best for · Safety and robustness evaluation under adversarial pressure.
- 07
HUD
the wrapper layer
Focus · Tooling that wraps arbitrary software into dockerized, MCP-exposed RL environments.
- Domains
- Computer Use · Tooling · Browser
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Undisclosed
- Open source
- Open-source
Tooling that wraps arbitrary software (browsers, games, apps) into dockerized, MCP-exposed RL environments. Less an environment catalog than the infrastructure to build them.
Best for · Teams standing up custom environments from existing software.
- 08
Veris AI
the enterprise environment layer
Focus · High-fidelity simulated environments to train and test enterprise AI agents on real workflows — 'experience' instead of prompt engineering.
- Domains
- Enterprise · Agents · Long Horizon
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- $8.5M seed (2025)
- Open source
- Closed
Builds high-fidelity simulated experiences so enterprises can train and test agents safely before production, targeting accuracy, consistency, and governance.
Best for · Enterprises training agents on their own workflows.
- 09
Huzzle Labs
environments + human data + evals in one stack
Focus · Long-horizon coding/agent environments, RLHF datasets, and eval benchmarks, paired with vetted human-expert feedback.
- Domains
- Long Horizon · Code · RLHF · Evaluation
- Founded
- Est. 2025
- HQ
- London, UK
- Funding
- Undisclosed
- Open source
- Closed
Builds long-horizon coding and agent environments, RLHF datasets, and evaluation benchmarks, pairing engineered environments with vetted human-expert feedback. Early-stage, but unusually broad for its size.
Best for · Teams wanting environments and the human feedback to train against them together.
- 10
Andon Labs
agent benchmarks with a sense of humor
Focus · Benchmarks and evaluation environments for autonomous agents (e.g., Vending-Bench) plus safety/control protocols for autonomous AI 'organizations.'
- Domains
- Long Horizon · Evaluation · Safety
- Founded
- Est. 2023
- HQ
- Bromma, Sweden
- Funding
- ~$500K
- Open source
- Open-source
Builds attention-grabbing real-world agent benchmarks (vending machines, radio stations, robot control) and develops protocols for controlling autonomous AI systems.
Best for · Long-horizon agent reliability and safety evaluation.
- 11
AfterQuery
benchmarks with real-task framing
Focus · Benchmark-style environments around code and finance workflows with real-task framing.
- Domains
- Code · Finance
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Undisclosed
- Open source
- Closed
Benchmark-style environments around code and finance workflows, with a focus on practical, real-world task framing rather than toy problems.
Best for · Code and finance benchmarking.
- 11
Datacurve
code execution, tight loops
Focus · Code-execution and code-evaluation environments plus high-quality coding data.
- Domains
- Code
- Founded
- Est. 2023
- HQ
- San Francisco, USA
- Funding
- Seed (reported)
- Open source
- Closed
Code-execution and code-evaluation environments plus high-quality coding data, built for fast, iterative model-training loops.
Best for · Iterative coding-model training.
- 11
Plato
browser meets enterprise workflow
Focus · Browser-interaction environments fused with enterprise workflow simulation.
- Domains
- Browser · Enterprise
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Undisclosed
- Open source
- Closed
Blends browser-interaction environments with enterprise workflow simulation for agent training and evaluation.
Best for · Enterprise web-workflow agents.
- 14
Vals AI
domain benchmarks for regulated work
Focus · Domain-specific evaluation environments and benchmarks for legal, finance, and tax workflows.
- Domains
- Finance · Legal · Evaluation
- Founded
- Est. 2024
- HQ
- USA
- Funding
- Undisclosed
- Open source
- Closed
Builds rigorous, domain-specific benchmarks for high-stakes professional work (law, finance, tax) where generic evals miss the nuance.
Best for · Evaluating agents in regulated, expert domains.
- 14
Sepal AI
expert evaluation environments
Focus · Builds expert evaluation environments and data; co-developed SheetBench (financial-analyst spreadsheet evaluation) with HUD.
- Domains
- Finance · Enterprise · Evaluation
- Founded
- Est. 2024
- HQ
- USA
- Funding
- Undisclosed
- Open source
- Closed
Designs expert-grade evaluation environments for complex professional tasks, partnering on public benchmarks like SheetBench.
Best for · Expert-domain evaluation design.
- 16
Matrices
browser-native
Focus · Browser-native environments for web navigation and interaction tasks.
- Domains
- Browser
- Founded
- Est. 2024
- HQ
- San Francisco, USA
- Funding
- Undisclosed
- Open source
- Closed
A browser-native environment builder focused on web navigation and interaction tasks for RL and evaluation.
Best for · Pure web-navigation agents.
- —
Surge AI
human-data leader, environments on the side
Focus · Primarily human data and RLHF; recently stood up a dedicated RL-environments org.
- Domains
- Human Data · RLHF · Multi-domain
- Founded
- Est. 2020
- HQ
- San Francisco, USA
- Funding
- Bootstrapped (reported ~$1.2B revenue)
- Open source
- Closed
A bootstrapped human-data powerhouse working with every major lab, now offering RL environments alongside its core data business.
Best for · Labs already buying human data that want environments from the same vendor.
- —
Scale AI
the labeling incumbent adapting
Focus · Primarily data labeling; extending into agents and environments via its Forge product line.
- Domains
- Data Labeling · Agents · Multi-domain
- Founded
- Est. 2016
- HQ
- San Francisco, USA
- Funding
- Reported ~$29B valuation
- Open source
- Closed
The data-labeling giant of the chatbot era, now retooling for agents and environments through its Forge offering.
Best for · Enterprise-scale environment operations and delivery.
- —
Mercor
expert marketplace, environments emerging
Focus · Primarily an expert/human-data marketplace; now offering domain-specific RL environments.
- Domains
- Code · Healthcare · Law · Multi-domain
- Founded
- Est. 2023
- HQ
- San Francisco, USA
- Funding
- Reported ~$10B valuation
- Open source
- Closed
An expert-marketplace and human-data company now pitching RL environments for domain-specific work: coding, healthcare, law.
Best for · Domain-specific environments needing credentialed human experts.
List in active expansion toward 50+ companies. Funding and scores reflect the most recent public information; fields marked “unconfirmed” are being verified.
§ Context · 2026
How the market is taking shape in 2026
A few patterns are worth naming. The category is splitting into three camps: human-data incumbents (Surge, Mercor, Scale) extending into environments off existing expert networks; specialist startups (Mechanize, Halluminate, AfterQuery) going deep on one domain; and open labs (Prime Intellect, General Reasoning) releasing environments and tooling publicly.
Most commercial environments are closed and sold under exclusive lab contracts, which is exactly why the handful of open players punch above their funding in this ranking. The other live tension is depth versus breadth — a few robust, high-fidelity environments versus a wide catalog of simpler ones — and labs appear to want both, from different vendors.
§ FAQ
Frequently asked
- What is an RL environment company?
- A company that builds the simulated, resettable settings — a codebase, a browser, a synthetic enterprise app — where AI agents practice tasks and receive reward signals during reinforcement learning, plus the evaluations that measure how well agents perform.
- Who are the leading RL environment companies in 2026?
- The most prominent include Prime Intellect among the ranked companies, alongside specialists like Mechanize, General Reasoning, and Halluminate. Mercor, Surge AI, and Scale AI are large or well-funded adjacent players that also sell environments; they appear in the list above but are unranked.
- What's the difference between RL environments and RLHF data?
- RLHF data is human preference judgments used to shape model behavior. RL environments are interactive task settings where an agent acts and is automatically scored. Many companies now offer both, since the human feedback and the environment to train against it are complementary.
- Which RL environment companies are open-source?
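- Prime Intellect is the most prominent, with its Environments Hub, PRIME-RL framework, and open model releases. General Reasoning, Bespoke Labs, HUD, and Andon Labs are also marked open-source in the list above: General Reasoning publishes open research and community releases, Bespoke Labs releases open datasets and recipes, and HUD's environment-wrapping tooling is open.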
- How do AI labs use RL environments?
- Labs drop agents into the environment to attempt real tasks, use a grader to score each attempt, and feed those scores back as reward signals during reinforcement learning — and as benchmarks to evaluate model versions.
- How is this list ranked?
- By a transparent three-part rubric — growth (50%), credibility (35%), and accessibility (15%) — applied identically to every ranked company, including the maintainer. Adjacent incumbents are listed for context but kept unranked.
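The FAQ answer on how labs use RL environments (an agent attempts a task, a grader scores the attempt, the score becomes the reward) can be illustrated with a toy loop. This is a rough sketch under stated assumptions, not any listed company's API: `ToyCrmEnv`, `grade`, `run_episode`, and the two policies are hypothetical names, the "CRM" is just an in-memory dictionary of tickets, and the policies are hand-written stand-ins for a learned agent.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ToyCrmEnv:
    """Hypothetical resettable environment: a tiny in-memory 'CRM'."""
    initial: dict = field(
        default_factory=lambda: {"ticket-1": "open", "ticket-2": "open"})
    state: dict = field(default_factory=dict)

    def reset(self) -> dict:
        # Every episode starts from an identical copy of the initial state.
        self.state = copy.deepcopy(self.initial)
        return dict(self.state)

    def step(self, action: tuple) -> dict:
        # An action is (ticket_id, new_status); unknown tickets are ignored.
        ticket, new_status = action
        if ticket in self.state:
            self.state[ticket] = new_status
        return dict(self.state)


def grade(state: dict) -> float:
    """Grader: reward is the fraction of tickets that ended up closed."""
    closed = sum(1 for status in state.values() if status == "closed")
    return closed / len(state)


def run_episode(env: ToyCrmEnv, policy, max_steps: int = 5) -> float:
    """Reset, let the policy act until it stops or runs out of steps, then grade."""
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        if action is None:  # the policy signals it is done
            break
        obs = env.step(action)
    return grade(env.state)


def close_first_open(obs: dict):
    """Scripted stand-in for an agent: close the first open ticket it sees."""
    for ticket, status in obs.items():
        if status == "open":
            return (ticket, "closed")
    return None


def do_nothing(obs: dict):
    return None


env = ToyCrmEnv()
full_reward = run_episode(env, close_first_open)
partial_reward = run_episode(env, close_first_open, max_steps=1)
zero_reward = run_episode(env, do_nothing)
```

Because `reset` restores the same starting state each time, repeated episodes on one environment are comparable, which is the property the "resettable" label refers to. In real training the returned reward would feed a policy update; here it is only computed.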