#14
HUD
Commercial
medium confidence
HUD (YC W25, formerly hud.so) is a platform for building reinforcement-learning environments and evaluations for computer-use and browser agents. It lets teams wrap real software/code as agent-callable tools in isolated containers, define tasks and rewards, and run evals/RL at scale via an open-source SDK plus a cloud-hosted gateway. It maintains public benchmarks (OSWorld-Verified contributions, SheetBench-50) and positions frontier AI labs and agent-first startups as its target customers.
Key facts
Headquarters
San Francisco, USA
confirmed citeLast round
Seed (amount undisclosed)
reported citeWhat they sell
environments
confirmed citeDeployment
managed-hosted + self-hosted (open-source SDK; cloud platform with local CLI execution and an OpenAI-compatible model gateway at inference.hud.ai)
confirmed cite
Scale & velocity
Current headcount
~15 (per YC profile, accessed 2026-06-07)
reported cite
Research depth
Backgrounds
Jay Ram (CEO) - consumer apps, ML/quant research, Lorenss Martinsons (CPO) - Cognitive Science, Yale, Parth Patel (CTO) - evals and RL environments, Team reported to include International Olympiad medalists (IOI, IPhO) and researchers with ICLR/NeurIPS publications
reported citePapers / benchmarks
OSWorld-Verified (369+ real-world desktop tasks; HUD/'Human Data' acknowledged among institutions providing feedback/fixes, per XLANG Lab), SheetBench-50 (financial-analyst spreadsheet benchmark, developed with Sepal AI; per HUD case study)
reported cite
Capital
Last round
Seed (amount undisclosed)
reported citeInvestors
Y Combinator (W25 batch), Exceptional Capital
reported cite
Security & compliance
Other certifications
unknown
Security page
https://www.hud.ai/dpa (Data Processing Addendum)
reported cite
Product
What they sell
environments
confirmed citeDeployment model
managed-hosted + self-hosted (open-source SDK; cloud platform with local CLI execution and an OpenAI-compatible model gateway at inference.hud.ai)
confirmed citeNotable customers
DoorDash self-claimed UiPath self-claimed Sharpe self-claimed ⚑OpenAI self-claimed ⚑Anthropic self-claimed cite
Buyer analysis
Best fit: Teams that need to benchmark or RL-train computer-use/browser agents against real-software tasks with reproducible, containerized environments.
How we verified this
Confirmed this is the correct company: HUD (hud.ai, YC W25, formerly hud.so), a platform for wrapping real software/code as agent-callable RL environments and evals for computer-use/browser agents - matching the 'wrapper layer' directory note. Disambiguated from an unrelated same-named Israeli startup ('Hud', runtime code sensor, backed by Square Peg, ~$21M) whose funding figures had leaked into search snippets; HUD's actual funding amount remains undisclosed, though a seed round is confirmed via Exceptional Capital's portfolio (added as an investor, reported). Biggest correction: OpenAI and Anthropic were downgraded from 'verified' to 'self-claimed' customers - the cited XLANG/OSWorld-Verified page only co-acknowledges HUD as a benchmark feedback contributor alongside those labs, which is collaboration, not customer verification. Benchmarks downgraded to 'reported' (OSWorld page names 'Human Data', SheetBench is self-reported). open_roles_count downgraded to reported. License upgraded to confirmed after re-fetching the MIT license field on GitHub. DoorDash/UiPath/Sharpe correctly remain self-claimed. No SOC 2/ISO found; security_page is only a DPA. Overall confidence: medium.
Sources
- www.ycombinator.com/companies/hud · 2026-06-07, YC profile: founders (Jay Ram, Lorenss Martinsons, Parth Patel), founded 2025, ~15 people, SF, W25, frontier-lab positioning
- www.ycombinator.com/companies/hud/jobs · 2026-06-07, 5 open roles, all San Francisco
- github.com/hud-evals/hud-python · 2026-06-07, OSS SDK, MIT license reported, ~258 stars, v0.5.41 (Apr 2026), prebuilt computer/shell/file/browser tools
- pypi.org/project/hud-python/ · 2026-06-07, PyPI package for hud-python
- docs.hud.ai/ · 2026-06-07, Deployment model: cloud-hosted + local CLI, containerized isolated environments, OpenAI-compatible gateway at inference.hud.ai (Claude/GPT/Gemini/Grok); no SOC2/license stated in docs
- www.hud.ai/ · 2026-06-07, Homepage (HTTP 429 on fetch); via search: customers DoorDash, UiPath, Sharpe self-claimed; positioning as RL environments/evals for CUAs
- www.hud.ai/case-studies/sheetbench-50 · 2026-06-07, SheetBench-50 developed with Sepal AI; finance professionals from PwC/Cisco/Charles Schwab/Fannie Mae involved
- www.hud.ai/dpa · 2026-06-07, Data Processing Addendum page
- xlang.ai/blog/osworld-verified · 2026-06-07, Third-party: HUD (Human Data, hud.so) listed among institutions providing OSWorld-Verified feedback alongside OpenAI, Anthropic, Moonshot, ByteDance, Simular - verifies frontier-lab collaboration
- foundertrace.com/companies/hud_yc_w25/ · 2026-06-07, Founders, Martinsons Yale Cognitive Science background
- www.workatastartup.com/companies/hud · 2026-06-07, Team described with Olympiad medalists and ICLR/NeurIPS researchers (via search snippet)
- newsletter.semianalysis.com/p/rl-environments-and-rl-for-science · 2026-06-07, Technical description: HUD wraps software in dockerized container + MCP server exposing agent tools
- www.linkedin.com/company/hud-evals · 2026-06-07, Correct LinkedIn handle for HUD (YC W25) RL-environments company
- www.crunchbase.com/organization/hud · 2026-06-07, Crunchbase profile (HTTP 403 on fetch); funding not verified
- x.com/hud_evals/status/1919262852088570225 · 2026-06-07, HUD X post: hiring research engineers, works with frontier labs to evaluate CUAs
Last updated 2026-06-07 · Every quantitative field carries a source and a confidence tag. Fields we could not source publicly are marked
unknown, never estimated. See the
methodology.