Directory

Every company in the list

19 companies building, selling, or open-sourcing reinforcement learning environments, sorted by weighted score.

01 · 8.80 · Verified
Prime Intellect
the open-source backbone of the category
Code · Math · Science · Reasoning
An open superintelligence stack: distributed compute, the PRIME-RL training framework, verifiers, and a public Environments Hub positioned as a Hugging Face for RL environments.
02 · 7.58 · Verified
Mechanize
deep, narrow, elite
Code · Software Engineering
Builds a small number of high-fidelity software-engineering environments and evals for frontier coding agents, with graders that score performance on real SWE tasks.
03 · 7.15 · Verified
General Reasoning
research-grade, open-leaning
Long Horizon · Reasoning · Multi-agent
A research-and-infrastructure startup building environments and reasoning data for long-horizon, multi-agent reliability — paired with open community releases.
03 · 7.15 · Partial
Bespoke Labs
open data and reproducible recipes
Data Curation · Evaluation · Reasoning
Evaluation and data-curation tooling for post-training, known for open datasets and reproducible reasoning recipes used widely across the community.
05 · 7.03 · Verified
Halluminate
computer-use, done deterministically
Computer Use · Browser · Finance
YC-backed builder of resettable, high-fidelity sandboxes — a synthetic Salesforce/CRM, for example — plus proprietary datasets for training and evaluating computer- and browser-use agents.
06 · 6.85 · Partial
Gray Swan AI
adversarial evaluation arenas
Security · Evaluation · Safety
Runs adversarial arenas and red-teaming programs that probe models and agents for failures, feeding safety evaluations back to developers.
07 · 6.65 · Partial
HUD
the wrapper layer
Computer Use · Tooling · Browser
Tooling that wraps arbitrary software — browsers, games, apps — into dockerized, MCP-exposed RL environments. Less an environment catalog than the infrastructure to build them.
08 · 6.53 · Partial
Veris AI
the enterprise environment layer
Enterprise · Agents · Long Horizon
Builds high-fidelity simulated experiences so enterprises can train and test agents safely before production, targeting accuracy, consistency, and governance.
09 · 6.33 · Internal
Huzzle Labs
environments + human data + evals in one stack
Long Horizon · Code · RLHF · Evaluation
Builds long-horizon coding and agent environments, RLHF datasets, and evaluation benchmarks — pairing engineered environments with vetted human-expert feedback. Early-stage, but unusually broad for its size.
10 · 6.15 · Verified
Andon Labs
agent benchmarks with a sense of humor
Long Horizon · Evaluation · Safety
Builds attention-grabbing real-world agent benchmarks (vending machines, radio stations, robot control) and develops protocols for controlling autonomous AI systems.
11 · 6.10 · Partial
AfterQuery
benchmarks with real-task framing
Code · Finance
Benchmark-style environments around code and finance workflows, with a focus on practical, real-world task framing rather than toy problems.
11 · 6.10 · Partial
Datacurve
code execution, tight loops
Code
Code-execution and code-evaluation environments plus high-quality coding data, built for fast, iterative model-training loops.
11 · 6.10 · Partial
Plato
browser meets enterprise workflow
Browser · Enterprise
Blends browser-interaction environments with enterprise workflow simulation for agent training and evaluation.
14 · 6.00 · Partial
Vals AI
domain benchmarks for regulated work
Finance · Legal · Evaluation
Builds rigorous, domain-specific benchmarks for high-stakes professional work — law, finance, tax — where generic evals miss the nuance.
14 · 6.00 · Partial
Sepal AI
expert evaluation environments
Finance · Enterprise · Evaluation
Designs expert-grade evaluation environments for complex professional tasks, partnering on public benchmarks like SheetBench.
16 · 5.68 · Partial
Matrices
browser-native
Browser
A browser-native environment builder focused on web navigation and interaction tasks for RL and evaluation.
Verified
Surge AI
human-data leader, environments on the side
Human Data · RLHF · Multi-domain
A bootstrapped human-data powerhouse working with every major lab, now offering RL environments alongside its core data business.
Verified
Scale AI
the labeling incumbent adapting
Data Labeling · Agents · Multi-domain
The data-labeling giant of the chatbot era, now retooling for agents and environments through its Forge offering.
Verified
Mercor
expert marketplace, environments emerging
Code · Healthcare · Law · Multi-domain
An expert-marketplace and human-data company now pitching RL environments for domain-specific work — coding, healthcare, law.