Pillar 01 · Updated 2026

RL Environment Companies: The 2026 List

RL environments are now the raw material of frontier AI: resettable simulations — a codebase, a browser, a CRM — where agents practice and get rewarded. One frontier lab has reportedly weighed spending over $1 billion on them in a year.

A working 2026 list of the companies building them — specialists, pivoted incumbents, and open-source labs — scored on a transparent rubric. Maintained by Huzzle Labs, who are listed and scored on the same criteria as everyone else.

How we rank

§ Methodology

Only companies whose primary product is reinforcement learning environments, simulators, or evaluation harnesses are eligible for the ranking. Adjacent businesses — human-data labelers, expert marketplaces, generalist AI services firms — are listed for context but kept out of the ranked list, even when they also offer environments on the side. Eligible companies are scored 0–10 on three weighted dimensions, and the weighted total sets the order. No pay-to-rank, no sponsored slots.

50% · Growth & momentum
Funding, hiring, shipping cadence, and how the trajectory has moved over the last few quarters.

35% · Traction & credibility
Named lab customers, founder pedigree, research output, and technical depth of the work.

15% · Accessibility
Open-source releases, public docs, and how easily a team outside a frontier lab can engage.
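
As a rough illustration of how the three weighted dimensions above combine into one number, here is a minimal Python sketch. The 50/35/15 weights and the 0–10 scale come from the rubric; the `SubScores` type, the `weighted_total` function, the two-decimal rounding, and the example sub-scores are assumptions made for illustration and are not taken from any profile on this list.

```python
from dataclasses import dataclass

# Weights from the rubric above; the key names are illustrative.
WEIGHTS = {"growth": 0.50, "credibility": 0.35, "accessibility": 0.15}


@dataclass(frozen=True)
class SubScores:
    """Hypothetical container for one company's three 0-10 sub-scores."""
    growth: float
    credibility: float
    accessibility: float


def weighted_total(scores: SubScores) -> float:
    """Combine the three sub-scores into a single weighted total on a 0-10 scale."""
    for name in WEIGHTS:
        value = getattr(scores, name)
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    total = sum(weight * getattr(scores, name) for name, weight in WEIGHTS.items())
    # Rounding to two decimals is an assumption, chosen to match how scores are displayed.
    return round(total, 2)


# Hypothetical sub-scores, not the real inputs behind any listed score.
example = SubScores(growth=8.0, credibility=7.0, accessibility=9.0)
print(weighted_total(example))
```

Because the weights sum to 1, the total stays on the same 0–10 scale as the inputs, which is what lets companies of very different sizes be compared on one axis.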

Where a field is unconfirmed, we mark it rather than guess. Spot something wrong or missing? There's a correction link at the foot of every profile.

The list

§ 19 entries · sorted by weighted score

01
Prime Intellect
the open-source backbone of the category
Focus · Open Environments Hub, the PRIME-RL training framework, and verifiers — plus training and open-sourcing its own RL models.
Domains · Code, Math, Science, Reasoning
Founded · Est. 2024
HQ · San Francisco, USA
Funding · ~$70M (Series B, Jan 2026)
Open source · Open-source
An open superintelligence stack: distributed compute, the PRIME-RL training framework, verifiers, and a public Environments Hub positioned as a Hugging Face for RL environments.
Best for · Teams that want open, reusable environments and tooling.
Visit primeintellect.ai ↗ · Suggest a correction · Score 8.80 · Verified
02
Mechanize
deep, narrow, elite
Focus · A small number of high-fidelity software-engineering environments and graders that score coding agents on real SWE tasks.
Domains · Code, Software Engineering
Founded · Est. 2025
HQ · San Francisco, USA
Funding · Undisclosed (angel, 2025)
Open source · Closed
Builds a small number of high-fidelity software-engineering environments and evals for frontier coding agents, with graders that score performance on real SWE tasks.
Best for · Frontier coding-agent training where depth beats breadth.
Visit mechanize.work ↗ · Suggest a correction · Score 7.58 · Verified
03
General Reasoning
research-grade, open-leaning
Focus · Environments and reasoning data for long-horizon, multi-agent reliability, with open community releases.
Domains · Long Horizon, Reasoning, Multi-agent
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Seed (undisclosed)
Open source · Open-source
A research-and-infrastructure startup building environments and reasoning data for long-horizon, multi-agent reliability — paired with open community releases.
Best for · Long-horizon and multi-agent reasoning work.
Visit gr.inc ↗ · Suggest a correction · Score 7.15 · Verified
03
Bespoke Labs
open data and reproducible recipes
Focus · Evaluation and data-curation tooling for post-training, with open datasets and reproducible reasoning recipes.
Domains · Data Curation, Evaluation, Reasoning
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Seed (reported)
Open source · Open-source
Evaluation and data-curation tooling for post-training, known for open datasets and reproducible reasoning recipes used widely across the community.
Best for · Eval pipelines and open dataset curation.
Visit bespokelabs.ai ↗ · Suggest a correction · Score 7.15 · Partial
05
Halluminate
computer-use, done deterministically
Focus · Resettable, high-fidelity computer-use sandboxes (e.g., a synthetic CRM) and datasets, with a financial-services focus.
Domains · Computer Use, Browser, Finance
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Bootstrapped / pre-seed (YC)
Open source · Closed
YC-backed builder of resettable, high-fidelity sandboxes — a synthetic Salesforce/CRM, for example — plus proprietary datasets for training and evaluating computer- and browser-use agents.
Best for · Reproducible computer-use testing without flaky live sites.
Visit halluminate.ai ↗ · Suggest a correction · Score 7.03 · Verified
06
Gray Swan AI
adversarial evaluation arenas
Focus · Red-teaming and safety-evaluation arenas that adversarially stress-test models and agents.
Domains · Security, Evaluation, Safety
Founded · Est. 2024
HQ · USA
Funding · Funded (round unconfirmed)
Open source · Closed
Runs adversarial arenas and red-teaming programs that probe models and agents for failures, feeding safety evaluations back to developers.
Best for · Safety and robustness evaluation under adversarial pressure.
Visit grayswan.ai ↗ · Suggest a correction · Score 6.85 · Partial
07
HUD
the wrapper layer
Focus · Tooling that wraps arbitrary software into dockerized, MCP-exposed RL environments.
Domains · Computer Use, Tooling, Browser
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Undisclosed
Open source · Open-source
Tooling that wraps arbitrary software — browsers, games, apps — into dockerized, MCP-exposed RL environments. Less an environment catalog than the infrastructure to build them.
Best for · Teams standing up custom environments from existing software.
Visit hud.so ↗ · Suggest a correction · Score 6.65 · Partial
08
Veris AI
the enterprise environment layer
Focus · High-fidelity simulated environments to train and test enterprise AI agents on real workflows — 'experience' instead of prompt engineering.
Domains · Enterprise, Agents, Long Horizon
Founded · Est. 2024
HQ · San Francisco, USA
Funding · $8.5M seed (2025)
Open source · Closed
Builds high-fidelity simulated experiences so enterprises can train and test agents safely before production, targeting accuracy, consistency, and governance.
Best for · Enterprises training agents on their own workflows.
Visit veris.ai ↗ · Suggest a correction · Score 6.53 · Partial
09
Huzzle Labs
environments + human data + evals in one stack
Focus · Long-horizon coding/agent environments, RLHF datasets, and eval benchmarks, paired with vetted human-expert feedback.
Domains · Long Horizon, Code, RLHF, Evaluation
Founded · Est. 2025
HQ · London, UK
Funding · Undisclosed
Open source · Closed
Builds long-horizon coding and agent environments, RLHF datasets, and evaluation benchmarks — pairing engineered environments with vetted human-expert feedback. Early-stage, but unusually broad for its size.
Best for · Teams wanting environments and the human feedback to train against them together.
Visit labs.huzzle.com ↗ · Suggest a correction · Score 6.33 · Internal
10
Andon Labs
agent benchmarks with a sense of humor
Focus · Benchmarks and evaluation environments for autonomous agents (e.g., Vending-Bench) plus safety/control protocols for autonomous AI 'organizations.'
Domains · Long Horizon, Evaluation, Safety
Founded · Est. 2023
HQ · Bromma, Sweden
Funding · ~$500K
Open source · Open-source
Builds attention-grabbing real-world agent benchmarks (vending machines, radio stations, robot control) and develops protocols for controlling autonomous AI systems.
Best for · Long-horizon agent reliability and safety evaluation.
Visit andonlabs.com ↗ · Suggest a correction · Score 6.15 · Verified
11
AfterQuery
benchmarks with real-task framing
Focus · Benchmark-style environments around code and finance workflows with real-task framing.
Domains · Code, Finance
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Undisclosed
Open source · Closed
Benchmark-style environments around code and finance workflows, with a focus on practical, real-world task framing rather than toy problems.
Best for · Code and finance benchmarking.
Visit afterquery.com ↗ · Suggest a correction · Score 6.10 · Partial
11
Datacurve
code execution, tight loops
Focus · Code-execution and code-evaluation environments plus high-quality coding data.
Domains · Code
Founded · Est. 2023
HQ · San Francisco, USA
Funding · Seed (reported)
Open source · Closed
Code-execution and code-evaluation environments plus high-quality coding data, built for fast, iterative model-training loops.
Best for · Iterative coding-model training.
Visit datacurve.ai ↗ · Suggest a correction · Score 6.10 · Partial
11
Plato
browser meets enterprise workflow
Focus · Browser-interaction environments fused with enterprise workflow simulation.
Domains · Browser, Enterprise
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Undisclosed
Open source · Closed
Blends browser-interaction environments with enterprise workflow simulation for agent training and evaluation.
Best for · Enterprise web-workflow agents.
Visit plato.so ↗ · Suggest a correction · Score 6.10 · Partial
14
Vals AI
domain benchmarks for regulated work
Focus · Domain-specific evaluation environments and benchmarks for legal, finance, and tax workflows.
Domains · Finance, Legal, Evaluation
Founded · Est. 2024
HQ · USA
Funding · Undisclosed
Open source · Closed
Builds rigorous, domain-specific benchmarks for high-stakes professional work — law, finance, tax — where generic evals miss the nuance.
Best for · Evaluating agents in regulated, expert domains.
Visit vals.ai ↗ · Suggest a correction · Score 6.00 · Partial
14
Sepal AI
expert evaluation environments
Focus · Builds expert evaluation environments and data; co-developed SheetBench (financial-analyst spreadsheet evaluation) with HUD.
Domains · Finance, Enterprise, Evaluation
Founded · Est. 2024
HQ · USA
Funding · Undisclosed
Open source · Closed
Designs expert-grade evaluation environments for complex professional tasks, partnering on public benchmarks like SheetBench.
Best for · Expert-domain evaluation design.
Visit sepal.ai ↗ · Suggest a correction · Score 6.00 · Partial
16
Matrices
browser-native
Focus · Browser-native environments for web navigation and interaction tasks.
Domains · Browser
Founded · Est. 2024
HQ · San Francisco, USA
Funding · Undisclosed
Open source · Closed
A browser-native environment builder focused on web navigation and interaction tasks for RL and evaluation.
Best for · Pure web-navigation agents.
Visit matrices.ai ↗ · Suggest a correction · Score 5.68 · Partial
—
Surge AI
human-data leader, environments on the side
Focus · Primarily human data and RLHF; recently stood up a dedicated RL-environments org.
Domains · Human Data, RLHF, Multi-domain
Founded · Est. 2020
HQ · San Francisco, USA
Funding · Bootstrapped (reported ~$1.2B revenue)
Open source · Closed
A bootstrapped human-data powerhouse working with every major lab, now offering RL environments alongside its core data business.
Best for · Labs already buying human data that want environments from the same vendor.
Visit surgehq.ai ↗ · Suggest a correction · Unranked (not scored) · Verified
—
Scale AI
the labeling incumbent adapting
Focus · Primarily data labeling; extending into agents and environments via its Forge product line.
Domains · Data Labeling, Agents, Multi-domain
Founded · Est. 2016
HQ · San Francisco, USA
Funding · Reported ~$29B valuation
Open source · Closed
The data-labeling giant of the chatbot era, now retooling for agents and environments through its Forge offering.
Best for · Enterprise-scale environment operations and delivery.
Visit scale.com ↗ · Suggest a correction · Unranked (not scored) · Verified
—
Mercor
expert marketplace, environments emerging
Focus · Primarily an expert/human-data marketplace; now offering domain-specific RL environments.
Domains · Code, Healthcare, Law, Multi-domain
Founded · Est. 2023
HQ · San Francisco, USA
Funding · Reported ~$10B valuation
Open source · Closed
An expert-marketplace and human-data company now pitching RL environments for domain-specific work — coding, healthcare, law.
Best for · Domain-specific environments needing credentialed human experts.
Visit mercor.com ↗ · Suggest a correction · Unranked (not scored) · Verified

List in active expansion toward 50+ companies. Funding and scores reflect the most recent public information; fields marked “unconfirmed” are being verified.
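
The rank numbers in the list repeat on ties and then skip ahead (03, 03, then 05). One way to produce that numbering is standard competition ranking, sketched below in Python. The function name `competition_ranks` is hypothetical, and treating two companies as tied when their displayed two-decimal scores are equal is an assumption; the five scores are the top five shown in the list.

```python
def competition_ranks(scores: list[float]) -> list[int]:
    """Return a 1-based rank per score; tied scores share the best rank they span."""
    ordered = sorted(scores, reverse=True)
    # The first position at which a score appears in descending order is its rank,
    # so the rank equals 1 + the number of strictly higher scores.
    first_position: dict[float, int] = {}
    for position, score in enumerate(ordered, start=1):
        first_position.setdefault(score, position)
    return [first_position[score] for score in scores]


# Displayed scores of the first five entries in the list.
top_five = [8.80, 7.58, 7.15, 7.15, 7.03]
for rank, score in zip(competition_ranks(top_five), top_five):
    print(f"{rank:02d}  {score:.2f}")
```

The same rule accounts for the three entries sharing rank 11 being followed by rank 14 rather than 12.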

§ Context · 2026

How the market is taking shape in 2026

A few patterns are worth naming. The category is splitting into three camps: human-data incumbents (Surge, Mercor, Scale) extending into environments off existing expert networks; specialist startups (Mechanize, Halluminate, AfterQuery) going deep on one domain; and open labs (Prime Intellect, General Reasoning) releasing environments and tooling publicly.

Most commercial environments are closed and sold under exclusive lab contracts. That scarcity, combined with the rubric's 15% accessibility weighting, is why the handful of open players rank higher here than their funding alone would suggest. The other live tension is depth versus breadth: a few robust, high-fidelity environments versus a wide catalog of simpler ones. Labs appear to want both, from different vendors.

§ FAQ

Frequently asked

What is an RL environment company?

A company that builds the simulated, resettable settings (a codebase, a browser, a synthetic enterprise app) where AI agents practice tasks and receive reward signals during reinforcement learning, plus the evaluations that measure how well agents perform.

Who are the leading RL environment companies in 2026?

The most prominent include Prime Intellect, Mercor, Surge AI, and Scale AI among large or well-funded players, alongside specialists like Mechanize, General Reasoning, and Halluminate. The full ranked list is above.

What's the difference between RL environments and RLHF data?

RLHF data is human preference judgments used to shape model behavior. RL environments are interactive task settings where an agent acts and is automatically scored. Many companies now offer both, since human feedback and the environments to train against are complementary.

Which RL environment companies are open-source?

Prime Intellect is the most prominent, with its Environments Hub, PRIME-RL framework, and open model releases. General Reasoning pairs its work with open community releases, and HUD's environment-wrapping tooling is open-source. Bespoke Labs and Andon Labs are also marked open-source in the list above.

How do AI labs use RL environments?

Labs drop agents into an environment to attempt real tasks, use a grader to score each attempt, and feed those scores back as reward signals during reinforcement learning. The same graded runs also serve as benchmarks for comparing model versions.

How is this list ranked?

By a transparent three-part rubric (growth 50%, credibility 35%, accessibility 15%) applied identically to every ranked company, including the maintainer. Adjacent incumbents are listed for context but kept unranked.
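
To make the FAQ answer on how labs use environments more concrete, here is a toy Python sketch of the reset, act, grade loop. Everything in it is hypothetical: `ToyTicketEnv`, the ticket-closing task, the `grade` function, and both policies are invented for illustration and do not describe any vendor's product or API. A real training run would pass the returned reward to an RL update, which is omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTicketEnv:
    """Hypothetical resettable environment: close every open ticket in a fake CRM."""
    n_tickets: int = 3
    open_tickets: set[int] = field(default_factory=set)

    def reset(self) -> set[int]:
        # Restore the same starting state at the beginning of every episode.
        self.open_tickets = set(range(self.n_tickets))
        return set(self.open_tickets)

    def step(self, ticket_id: int) -> set[int]:
        # The only action is "close this ticket"; already-closed ids are ignored.
        self.open_tickets.discard(ticket_id)
        return set(self.open_tickets)


def grade(env: ToyTicketEnv) -> float:
    """Automatic grader: fraction of tickets closed, between 0 and 1."""
    return 1.0 - len(env.open_tickets) / env.n_tickets


def run_episode(env: ToyTicketEnv, policy, max_steps: int = 3) -> float:
    """Reset the environment, let the policy act, and return the graded reward."""
    observation = env.reset()
    for _ in range(max_steps):
        if not observation:
            break
        observation = env.step(policy(observation))
    return grade(env)


def careful_policy(open_tickets: set[int]) -> int:
    # Always works on the lowest-numbered ticket that is still open.
    return min(open_tickets)


def stuck_policy(open_tickets: set[int]) -> int:
    # Keeps retrying ticket 0 even after it has been closed.
    return 0


env = ToyTicketEnv()
for name, policy in [("careful", careful_policy), ("stuck", stuck_policy)]:
    print(name, round(run_episode(env, policy), 2))
```

The same `run_episode` call serves both purposes named in the answer: during training its return value is the reward signal, and averaged over many episodes it becomes a benchmark score for comparing model versions.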