The 2026 List
A ranked list of the companies building the environments AI learns in.
Reinforcement learning environments are where frontier models are trained, stressed, and measured. RL List catalogues the labs and companies shipping them — scored on a single public rubric.
01 · Leaderboard
The current ranking
- 01 · Prime Intellect · the open-source backbone of the category · 8.80
- 02 · Mechanize · deep, narrow, elite · 7.58
- 03 · General Reasoning · research-grade, open-leaning · 7.15
- 04 · Bespoke Labs · open data and reproducible recipes · 7.15
- 05 · Halluminate · computer-use, done deterministically · 7.03
- 06 · Gray Swan AI · adversarial evaluation arenas · 6.85
- 07 · HUD · the wrapper layer · 6.65
- 08 · Veris AI · the enterprise environment layer · 6.53
- 09 · Huzzle Labs · environments + human data + evals in one stack · 6.33
- 10 · Andon Labs · agent benchmarks with a sense of humor · 6.15
- 11 · AfterQuery · benchmarks with real-task framing · 6.10
- 12 · Datacurve · code execution, tight loops · 6.10
- 13 · Plato · browser meets enterprise workflow · 6.10
- 14 · Vals AI · domain benchmarks for regulated work · 6.00
- 15 · Sepal AI · expert evaluation environments · 6.00
- 16 · Matrices · browser-native · 5.68
Outside the ranking
Also building RL environments
These companies built their reputations in human data and labeling and now offer RL environments too — but environments aren't their core focus, so they sit outside the ranking.
Adjacent · Infrastructure
Adjacent: execution infrastructure
These companies build the sandbox and runtime layer that RL environments execute on — fast, isolated, resettable compute. They're not environment vendors themselves, but they're the substrate the category runs on.
- E2B · Open-source secure cloud sandboxes for running AI-generated code, with sub-second startup. · e2b.dev ↗
- Daytona · Secure, elastic infrastructure for running AI-generated and agent code at scale. · daytona.io ↗
- Runloop · Devboxes and code-execution sandboxes purpose-built for coding agents and RL loops. · runloop.ai ↗
- Morph · Snapshot-based compute that forks and resets VM state instantly for agent rollouts. · morph.so ↗
- Modal · Serverless cloud for AI workloads: fast container starts, GPUs, and on-demand sandboxes. · modal.com ↗
02 · Method
How we score
Every ranked company is graded against the same three dimensions, weighted into a single 0–10 score. The rubric is public and applies to all listings — no exceptions.
Growth & momentum
50% · Funding, hiring, shipping cadence, and how the trajectory has moved over the last few quarters.
Traction & credibility
35% · Named frontier-lab customers, founder pedigree, research output, and technical depth.
Accessibility
15% · Open-source releases, public docs, self-serve access vs. closed enterprise.
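As a rough illustration of how the three weights could combine, here is a minimal sketch. It assumes each dimension is itself graded on a 0–10 scale and that the overall score is a plain weighted sum; the rubric states the weights but not the exact aggregation, so treat both as assumptions. The function name `overall_score`, the dictionary keys, and the example sub-scores are hypothetical and not taken from any listing.

```python
# Weights as published in the rubric: 50% / 35% / 15%.
WEIGHTS = {
    "growth_momentum": 0.50,
    "traction_credibility": 0.35,
    "accessibility": 0.15,
}


def overall_score(sub_scores: dict[str, float]) -> float:
    """Combine per-dimension sub-scores into a single 0-10 score.

    Assumption: each sub-score is on a 0-10 scale and the aggregate
    is a simple weighted sum (not confirmed by the rubric).
    """
    if set(sub_scores) != set(WEIGHTS):
        raise ValueError(f"expected exactly these dimensions: {sorted(WEIGHTS)}")
    for name, value in sub_scores.items():
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores for an illustrative company (not a real listing).
example = {
    "growth_momentum": 8.0,
    "traction_credibility": 7.0,
    "accessibility": 6.0,
}

# 0.50 * 8.0 + 0.35 * 7.0 + 0.15 * 6.0 = 4.00 + 2.45 + 0.90
print(round(overall_score(example), 2))  # prints 7.35
```

Under these assumptions, the heavy weight on growth means a one-point change there moves the overall score by 0.50, while the same change in accessibility moves it by only 0.15.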
03 · Writing
Latest guides
How to Choose an RL Environment Vendor
A buyer's framework for picking an RL environment vendor — what to ask, what to ignore, and how to match a vendor's strengths to the kind of agent you're actually trying to train.
Read →
2026-05-22
RL Environments for Coding Agents
Coding is the most commercially active RL environment domain. Here's how code environments work, what makes a good one, and the companies building them in 2026.
Read →
2026-05-22
RL Environments for Browser & Computer-Use Agents
Computer-use is where agents finally hit the real world: clicking buttons, filling forms, navigating apps. Here's how the environments work and who builds them.
Read →