rl-list.com
UPDATED 2026.06.07

Methodology

How this directory is researched, sourced, and ranked, and, just as importantly, what we deliberately leave blank. The whole point of rl-list.com is that you can audit every claim.

What we’re trying to do

Procurement teams choosing an RL-environment vendor care about scale, velocity, quality, cost, and data specs. Almost none of that is on the public web; it surfaces only in direct vendor engagement. So we don’t pretend to answer it. Instead, for every company we research the public proxies for those questions, cite each one, and tag how confident we are.

What a buyer wants to know | The public proxy we source
How fast can they scale production | Headcount, headcount growth, open roles, funding
Quality & rigor of the work | Researchers on the team, their backgrounds, published papers/benchmarks
Will they survive / are they credible | Capital raised, investors, customers, founding year
Can we clear security review | SOC 2 / ISO certifications
Footprint & jurisdiction | HQ + office locations

Our confidence tiers

Every non-trivial field on every vendor page carries one of these tags, so you never have to guess how solid a number is:

The rules we hold ourselves to

What we deliberately don’t publish

The numbers frontier labs actually request in an RFI (task/sample counts, unique environments, pass@1 and difficulty, capability and complexity splits, data-type breakdowns, harness and data format, and unit/total pricing) are not on the public web and only come from direct engagement. We do not estimate them, and we do not let an AI “reason toward” a plausible figure. Those fields stay blank on purpose. If you see a number on rl-list.com, it has a source.
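One way to picture the "blank or sourced" rule is as an invariant on each field record. The sketch below is illustrative only: `SourcedField`, its attribute names, and `display` are hypothetical, and the confidence tag is kept as a free-form string because the tier names are not assumed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourcedField:
    """One vendor-page field. All names here are hypothetical."""

    name: str
    value: Optional[str] = None        # None means deliberately left blank
    source_url: Optional[str] = None   # citation backing the value
    confidence: Optional[str] = None   # confidence tag (tier names not assumed)

    def __post_init__(self) -> None:
        # A published value must carry both a source and a confidence tag.
        if self.value is not None and not (self.source_url and self.confidence):
            raise ValueError(f"{self.name}: value published without source or tag")
        # A blank field must not carry a stray source or tag.
        if self.value is None and (self.source_url or self.confidence):
            raise ValueError(f"{self.name}: blank field should not carry a source or tag")


def display(field: SourcedField) -> str:
    """Render a field; blank fields stay visibly blank rather than estimated."""
    if field.value is None:
        return f"{field.name}: (blank)"
    return f"{field.name}: {field.value} [{field.confidence}; {field.source_url}]"
```

Under this sketch, constructing a field with a value but no source raises an error, so an unsourced number cannot reach the page by accident.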

How the ranking works: the RL List score

We rank only the dedicated, pure-play RL-environment vendors. Three groups are deliberately excluded from the ranking and listed separately for reference, because they aren’t like-for-like comparable: data-labeling incumbents moving into environments (Scale AI, Surge AI, Mercor), execution-infrastructure providers (sandbox/compute layers), and open-source projects. Mixing a $1B labeling incumbent into a list of focused environment startups would mislead more than it informs.

Within the ranked set, order is driven by a transparent formula we call the RL List score. The same calculation is applied to every vendor, so the baseline order stays auditable. A small number of vendors we have reviewed in depth are placed editorially; everyone else falls where the score puts them. The score combines:

The score is not a product-quality or endorsement rating; it reflects scale, signal, and how verifiable a vendor’s public record is. A company lower down is often simply earlier-stage or harder to verify, not a weaker product.
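The ordering procedure described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the site's actual implementation: the category labels, the `Vendor` record, and the representation of editorial placement as a name-to-position mapping are all hypothetical, and the score itself is taken as a precomputed input rather than derived from any formula.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical label: only vendors in this category are ranked.
RANKED_CATEGORY = "pure_play"


@dataclass(frozen=True)
class Vendor:
    name: str
    category: str  # e.g. "pure_play", "labeling_incumbent", "infrastructure", "open_source"
    score: float   # precomputed score; the formula is not modeled here


def rank(vendors: List[Vendor], editorial: Dict[str, int]) -> List[str]:
    """Return ranked vendor names.

    `editorial` maps a vendor name to a fixed 0-based position (an assumed
    representation of editorial placement). All other ranked vendors fill
    the remaining slots in descending score order.
    """
    # Excluded groups are dropped before any ordering happens.
    ranked = [v for v in vendors if v.category == RANKED_CATEGORY]
    pinned = {v.name: editorial[v.name] for v in ranked if v.name in editorial}

    # Higher score first; name breaks ties so the order is deterministic.
    by_score = sorted(
        (v for v in ranked if v.name not in pinned),
        key=lambda v: (-v.score, v.name),
    )

    slots: List[Optional[str]] = [None] * len(ranked)
    for name, pos in pinned.items():
        if not 0 <= pos < len(slots) or slots[pos] is not None:
            raise ValueError(f"invalid editorial position for {name}: {pos}")
        slots[pos] = name

    # Fill every unpinned slot from the score-ordered list.
    remaining = iter(by_score)
    return [s if s is not None else next(remaining).name for s in slots]
```

With an empty `editorial` mapping, the result is the pure score order, which is the auditable baseline the text refers to.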

Freshness

Every vendor page shows a “last updated” date, and the directory is re-verified on a rolling basis. This snapshot was last updated 2026-06-07.