Pillar 01 · Updated 2026

RL Environment Companies: The 2026 List

RL environments are now the raw material of frontier AI: resettable simulations — a codebase, a browser, a CRM — where agents practice and get rewarded. One frontier lab has reportedly weighed spending over $1 billion on them in a year.

A working 2026 list of the companies building them — specialists, pivoted incumbents, and open-source labs — scored on a transparent rubric. Maintained by Huzzle Labs, who are listed and scored on the same criteria as everyone else.

How we rank

§ Methodology

Only companies whose primary product is reinforcement learning environments, simulators, or evaluation harnesses are eligible for the ranking. Adjacent businesses — human-data labelers, expert marketplaces, generalist AI services firms — are listed for context but kept out of the ranked list, even when they also offer environments on the side. Eligible companies are scored 0–10 on three weighted dimensions, and the weighted total sets the order. No pay-to-rank, no sponsored slots.

50%

Growth & momentum

Funding, hiring, shipping cadence, and how the trajectory has moved over the last few quarters.

35%

Traction & credibility

Named lab customers, founder pedigree, research output, and technical depth of the work.

15%

Accessibility

Open-source releases, public docs, and how easily a team outside a frontier lab can engage.
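As a rough illustration of how the three weights combine into one number, here is a minimal Python sketch. The weights come from the rubric above. The function name, the dictionary keys, the two-decimal rounding, and the example scores are assumptions for illustration, not the maintainers' actual tooling or any real company's scores.

```python
# Weights are taken from the rubric above; everything else here is hypothetical.
WEIGHTS = {"growth": 0.50, "traction": 0.35, "accessibility": 0.15}


def weighted_total(scores: dict[str, float]) -> float:
    """Combine three 0-10 dimension scores into one 0-10 weighted total."""
    if set(scores) != set(WEIGHTS):
        raise KeyError("scores must cover exactly: " + ", ".join(WEIGHTS))
    for name, value in scores.items():
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    # Rounding to two decimals is an assumption, not a documented rule.
    return round(sum(WEIGHTS[name] * scores[name] for name in WEIGHTS), 2)


# Hypothetical company, not an entry from the list.
example = {"growth": 8.0, "traction": 6.0, "accessibility": 9.0}
print(weighted_total(example))  # 0.50*8 + 0.35*6 + 0.15*9 = 7.45
```

Because growth carries half the weight, a one-point change in growth moves the total by 0.5, while a one-point change in accessibility moves it by only 0.15.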

Where a field is unconfirmed, we mark it rather than guess. Spot something wrong or missing? There's a correction link at the foot of every profile.

The list

§ 19 entries · 16 ranked by weighted score · 3 adjacent companies listed unranked

  1. 01

    Prime Intellect

    the open-source backbone of the category

    Focus · Open Environments Hub, the PRIME-RL training framework, and verifiers — plus training and open-sourcing its own RL models.

    Domains
    Code · Math · Science · Reasoning
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    ~$70M (Series B, Jan 2026)
    ·
    Open source
    Open-source

    An open superintelligence stack: distributed compute, the PRIME-RL training framework, verifiers, and a public Environments Hub positioned as a Hugging Face for RL environments.

    Best for · Teams that want open, reusable environments and tooling.

  2. 02

    Mechanize

    deep, narrow, elite

    Focus · A small number of high-fidelity software-engineering environments and graders that score coding agents on real SWE tasks.

    Domains
    Code · Software Engineering
    ·
    Founded
    Est. 2025
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Undisclosed (angel, 2025)
    ·
    Open source
    Closed

    Builds a small number of high-fidelity software-engineering environments and evals for frontier coding agents, with graders that score performance on real SWE tasks.

    Best for · Frontier coding-agent training where depth beats breadth.

  3. 03

    General Reasoning

    research-grade, open-leaning

    Focus · Environments and reasoning data for long-horizon, multi-agent reliability, with open community releases.

    Domains
    Long Horizon · Reasoning · Multi-agent
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Seed (undisclosed)
    ·
    Open source
    Open-source

    A research-and-infrastructure startup building environments and reasoning data for long-horizon, multi-agent reliability — paired with open community releases.

    Best for · Long-horizon and multi-agent reasoning work.

  4. 03

    Bespoke Labs

    open data and reproducible recipes

    Focus · Evaluation and data-curation tooling for post-training, with open datasets and reproducible reasoning recipes.

    Domains
    Data Curation · Evaluation · Reasoning
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Seed (reported)
    ·
    Open source
    Open-source

    Evaluation and data-curation tooling for post-training, known for open datasets and reproducible reasoning recipes used widely across the community.

    Best for · Eval pipelines and open dataset curation.

  5. 05

    Halluminate

    computer-use, done deterministically

    Focus · Resettable, high-fidelity computer-use sandboxes (e.g., a synthetic CRM) and datasets, with a financial-services focus.

    Domains
    Computer Use · Browser · Finance
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Bootstrapped / pre-seed (YC)
    ·
    Open source
    Closed

    YC-backed builder of resettable, high-fidelity sandboxes — a synthetic Salesforce/CRM, for example — plus proprietary datasets for training and evaluating computer- and browser-use agents.

    Best for · Reproducible computer-use testing without flaky live sites.

  6. 06

    Gray Swan AI

    adversarial evaluation arenas

    Focus · Red-teaming and safety-evaluation arenas that adversarially stress-test models and agents.

    Domains
    Security · Evaluation · Safety
    ·
    Founded
    Est. 2024
    ·
    HQ
    USA
    ·
    Funding
    Funded (round unconfirmed)
    ·
    Open source
    Closed

    Runs adversarial arenas and red-teaming programs that probe models and agents for failures, feeding safety evaluations back to developers.

    Best for · Safety and robustness evaluation under adversarial pressure.

  7. 07

    HUD

    the wrapper layer

    Focus · Tooling that wraps arbitrary software into dockerized, MCP-exposed RL environments.

    Domains
    Computer Use · Tooling · Browser
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Undisclosed
    ·
    Open source
    Open-source

    Tooling that wraps arbitrary software — browsers, games, apps — into dockerized, MCP-exposed RL environments. Less an environment catalog than the infrastructure to build them.

    Best for · Teams standing up custom environments from existing software.

  8. 08

    Veris AI

    the enterprise environment layer

    Focus · High-fidelity simulated environments to train and test enterprise AI agents on real workflows — 'experience' instead of prompt engineering.

    Domains
    Enterprise · Agents · Long Horizon
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    $8.5M seed (2025)
    ·
    Open source
    Closed

    Builds high-fidelity simulated experiences so enterprises can train and test agents safely before production, targeting accuracy, consistency, and governance.

    Best for · Enterprises training agents on their own workflows.

  9. 09

    Huzzle Labs

    environments + human data + evals in one stack

    Focus · Long-horizon coding/agent environments, RLHF datasets, and eval benchmarks, paired with vetted human-expert feedback.

    Domains
    Long Horizon · Code · RLHF · Evaluation
    ·
    Founded
    Est. 2025
    ·
    HQ
    London, UK
    ·
    Funding
    Undisclosed
    ·
    Open source
    Closed

    Builds long-horizon coding and agent environments, RLHF datasets, and evaluation benchmarks — pairing engineered environments with vetted human-expert feedback. Early-stage, but unusually broad for its size.

    Best for · Teams wanting environments and the human feedback to train against them together.

  10. 10

    Andon Labs

    agent benchmarks with a sense of humor

    Focus · Benchmarks and evaluation environments for autonomous agents (e.g., Vending-Bench) plus safety/control protocols for autonomous AI 'organizations.'

    Domains
    Long Horizon · Evaluation · Safety
    ·
    Founded
    Est. 2023
    ·
    HQ
    Bromma, Sweden
    ·
    Funding
    ~$500K
    ·
    Open source
    Open-source

    Builds attention-grabbing real-world agent benchmarks (vending machines, radio stations, robot control) and develops protocols for controlling autonomous AI systems.

    Best for · Long-horizon agent reliability and safety evaluation.

  11. 11

    AfterQuery

    benchmarks with real-task framing

    Focus · Benchmark-style environments around code and finance workflows with real-task framing.

    Domains
    Code · Finance
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Undisclosed
    ·
    Open source
    Closed

    Benchmark-style environments around code and finance workflows, with a focus on practical, real-world task framing rather than toy problems.

    Best for · Code and finance benchmarking.

  12. 11

    Datacurve

    code execution, tight loops

    Focus · Code-execution and code-evaluation environments plus high-quality coding data.

    Domains
    Code
    ·
    Founded
    Est. 2023
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Seed (reported)
    ·
    Open source
    Closed

    Code-execution and code-evaluation environments plus high-quality coding data, built for fast, iterative model-training loops.

    Best for · Iterative coding-model training.

  13. 11

    Plato

    browser meets enterprise workflow

    Focus · Browser-interaction environments fused with enterprise workflow simulation.

    Domains
    Browser · Enterprise
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Undisclosed
    ·
    Open source
    Closed

    Blends browser-interaction environments with enterprise workflow simulation for agent training and evaluation.

    Best for · Enterprise web-workflow agents.

  14. 14

    Vals AI

    domain benchmarks for regulated work

    Focus · Domain-specific evaluation environments and benchmarks for legal, finance, and tax workflows.

    Domains
    Finance · Legal · Evaluation
    ·
    Founded
    Est. 2024
    ·
    HQ
    USA
    ·
    Funding
    Undisclosed
    ·
    Open source
    Closed

    Builds rigorous, domain-specific benchmarks for high-stakes professional work — law, finance, tax — where generic evals miss the nuance.

    Best for · Evaluating agents in regulated, expert domains.

  15. 14

    Sepal AI

    expert evaluation environments

    Focus · Builds expert evaluation environments and data; co-developed SheetBench (financial-analyst spreadsheet evaluation) with HUD.

    Domains
    Finance · Enterprise · Evaluation
    ·
    Founded
    Est. 2024
    ·
    HQ
    USA
    ·
    Funding
    Undisclosed
    ·
    Open source
    Closed

    Designs expert-grade evaluation environments for complex professional tasks, partnering on public benchmarks like SheetBench.

    Best for · Expert-domain evaluation design.

  16. 16

    Matrices

    browser-native

    Focus · Browser-native environments for web navigation and interaction tasks.

    Domains
    Browser
    ·
    Founded
    Est. 2024
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Undisclosed
    ·
    Open source
    Closed

    A browser-native environment builder focused on web navigation and interaction tasks for RL and evaluation.

    Best for · Pure web-navigation agents.

  17. Surge AI

    human-data leader, environments on the side

    Focus · Primarily human data and RLHF; recently stood up a dedicated RL-environments org.

    Domains
    Human Data · RLHF · Multi-domain
    ·
    Founded
    Est. 2020
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Bootstrapped (reported ~$1.2B revenue)
    ·
    Open source
    Closed

    A bootstrapped human-data powerhouse working with every major lab, now offering RL environments alongside its core data business.

    Best for · Labs already buying human data that want environments from the same vendor.

  18. Scale AI

    the labeling incumbent adapting

    Focus · Primarily data labeling; extending into agents and environments via its Forge product line.

    Domains
    Data Labeling · Agents · Multi-domain
    ·
    Founded
    Est. 2016
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Reported ~$29B valuation
    ·
    Open source
    Closed

    The data-labeling giant of the chatbot era, now retooling for agents and environments through its Forge offering.

    Best for · Enterprise-scale environment operations and delivery.

  19. Mercor

    expert marketplace, environments emerging

    Focus · Primarily an expert/human-data marketplace; now offering domain-specific RL environments.

    Domains
    Code · Healthcare · Law · Multi-domain
    ·
    Founded
    Est. 2023
    ·
    HQ
    San Francisco, USA
    ·
    Funding
    Reported ~$10B valuation
    ·
    Open source
    Closed

    An expert-marketplace and human-data company now pitching RL environments for domain-specific work — coding, healthcare, law.

    Best for · Domain-specific environments needing credentialed human experts.

List in active expansion toward 50+ companies. Funding and scores reflect the most recent public information; fields marked “unconfirmed” are being verified.
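The repeated rank badges in the list (03 appears twice and is followed by 05) are consistent with standard competition ranking, where tied totals share a rank and the next rank is skipped. The page does not state its tie rule, so the sketch below is an assumption about how such badges could be produced. The function name and the totals are hypothetical.

```python
def competition_ranks(totals: list[float]) -> list[int]:
    """Rank totals from high to low; equal totals share a rank, the next rank skips."""
    ordered = sorted(totals, reverse=True)
    ranks: list[int] = []
    for position, total in enumerate(ordered, start=1):
        if ranks and total == ordered[position - 2]:
            # Same total as the previous entry: reuse its rank.
            ranks.append(ranks[-1])
        else:
            # Otherwise the rank is the 1-based position in the sorted order.
            ranks.append(position)
    return ranks


# Hypothetical weighted totals, not the scores behind the list.
totals = [9.1, 8.4, 7.9, 7.9, 7.2]
print([f"{rank:02d}" for rank in competition_ranks(totals)])
# → ['01', '02', '03', '03', '05']
```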

§ Context · 2026

How the market is taking shape in 2026

A few patterns are worth naming. The category is splitting into three camps: human-data incumbents (Surge, Mercor, Scale) extending into environments off existing expert networks; specialist startups (Mechanize, Halluminate, AfterQuery) going deep on one domain; and open labs (Prime Intellect, General Reasoning) releasing environments and tooling publicly.

Most commercial environments are closed and sold under exclusive lab contracts, which is exactly why the handful of open players punch above their funding in this ranking. The other live tension is depth versus breadth — a few robust, high-fidelity environments versus a wide catalog of simpler ones — and labs appear to want both, from different vendors.

§ FAQ

Frequently asked

What is an RL environment company?
A company that builds the simulated, resettable settings — a codebase, a browser, a synthetic enterprise app — where AI agents practice tasks and receive reward signals during reinforcement learning, plus the evaluations that measure how well agents perform.
Who are the leading RL environment companies in 2026?
The most prominent include Prime Intellect, Mercor, Surge AI, and Scale AI among large or well-funded players, alongside specialists like Mechanize, General Reasoning, and Halluminate. Mercor, Surge AI, and Scale AI are listed above for context but left unranked, because environments are not their primary product.
What's the difference between RL environments and RLHF data?
RLHF data is human preference judgments used to shape model behavior. RL environments are interactive task settings where an agent acts and is automatically scored. Many companies now offer both, since human preference data and automatically scored environments are complementary training signals.
Which RL environment companies are open-source?
Prime Intellect is the most prominent, with its Environments Hub, PRIME-RL framework, and open model releases. General Reasoning pairs its work with open community releases, and HUD's environment-wrapping tooling is open-source. Bespoke Labs and Andon Labs are also marked open-source in the list above.
How do AI labs use RL environments?
Labs drop agents into the environment to attempt real tasks, use a grader to score each attempt, and feed those scores back as reward signals during reinforcement learning — and as benchmarks to evaluate model versions.
How is this list ranked?
By a transparent three-part rubric — growth (50%), credibility (35%), and accessibility (15%) — applied identically to every ranked company, including the maintainer. Adjacent incumbents are listed for context but kept unranked.
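
To make the answer to "How do AI labs use RL environments?" concrete, here is a toy sketch of the reset, act, grade loop in Python. Everything in it is hypothetical: the `ToyTicketEnv` class, the ticket-closing task, the grader, and the two policies are invented for illustration and do not represent any listed company's product. A real setup would pass the returned reward to an RL algorithm, which is not shown.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class ToyTicketEnv:
    """Hypothetical resettable environment: close every open ticket in a fake CRM."""
    n_tickets: int = 4
    open_tickets: set[int] = field(default_factory=set)

    def reset(self) -> set[int]:
        # Restore the same starting state so every attempt is comparable.
        self.open_tickets = set(range(self.n_tickets))
        return set(self.open_tickets)

    def step(self, ticket_id: int) -> set[int]:
        # The agent's action: close one ticket (already-closed ids change nothing).
        self.open_tickets.discard(ticket_id)
        return set(self.open_tickets)


def grade(env: ToyTicketEnv) -> float:
    """Grader: fraction of tickets closed, used as the reward for the attempt."""
    return 1.0 - len(env.open_tickets) / env.n_tickets


def run_episode(env: ToyTicketEnv,
                policy: Callable[[set[int]], int],
                max_steps: int = 5) -> float:
    """Reset the environment, let the policy act, then score the final state."""
    observation = env.reset()
    for _ in range(max_steps):
        if not observation:  # nothing left to close
            break
        observation = env.step(policy(observation))
    return grade(env)


def close_lowest(open_tickets: set[int]) -> int:
    # A policy that makes progress on every step.
    return min(open_tickets)


def always_zero(open_tickets: set[int]) -> int:
    # A policy that keeps repeating one action.
    return 0


env = ToyTicketEnv()
print(run_episode(env, close_lowest))  # → 1.0
print(run_episode(env, always_zero))   # → 0.25
```

The same environment and grader also serve as a benchmark: running two policies (or two model versions) through identical resets gives directly comparable scores.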