Thursday, June 25, 2026
HomeTechnologySilicon Valley bets big on 'environments' to train AI agents

Silicon Valley bets big on ‘environments’ to train AI agents


For years, Big Tech CEOs have touted visions of AI agents that can autonomously use software applications to complete tasks for people. But take todayโ€™s consumer AI agents out for a spin, whether itโ€™s OpenAIโ€™s ChatGPT Agent or Perplexityโ€™s Comet, and youโ€™ll quickly realize how limited the technology still is. Making AI agents more robust may take a new set of techniques that the industry is still discovering.

One of those techniques is carefully simulating workspaces where agents can be trained on multi-step tasks โ€” known as reinforcement learning (RL) environments. Similarly to how labeled datasets powered the last wave of AI, RL environments are starting to look like a critical element in the development of agents.

AI researchers, founders, and investors tell TechCrunch that leading AI labs are now demanding more RL environments, and thereโ€™s no shortage of startups hoping to supply them.

โ€œAll the big AI labs are building RL environments in-house,โ€ said Jennifer Li, general partner at Andreessen Horowitz, in an interview with TechCrunch. โ€œBut as you can imagine, creating these datasets is very complex, so AI labs are also looking at third party vendors that can create high quality environments and evaluations. Everyone is looking at this space.โ€

The push for RL environments has minted a new class of well-funded startups, such as Mechanize and Prime Intellect, that aim to lead the space. Meanwhile, large data-labeling companies like Mercor and Surge say theyโ€™re investing more in RL environments to keep pace with the industryโ€™s shifts from static datasets to interactive simulations. The major labs are considering investing heavily too: according to The Information, leaders at Anthropic have discussed spending more than $1 billion on RL environments over the next year.

The hope for investors and founders is that one of these startups emerge as the โ€œScale AI for environments,โ€ referring to the $29 billion data labelling powerhouse that powered the chatbot era.

The question is whether RL environments will truly push the frontier of AI progress.

Techcrunch event

San Francisco
|
October 27-29, 2025

What is an RL environment?

At their core, RL environments are training grounds that simulate what an AI agent would be doing in a real software application. One founder described building them in recent interview โ€œlike creating a very boring video game.โ€

For example, an environment could simulate a Chrome browser and task an AI agent with purchasing a pair of socks on Amazon. The agent is graded on its performance and sent a reward signal when it succeeds (in this case, buying a worthy pair of socks).

While such a task sounds relatively simple, there are a lot of places where an AI agent could get tripped up. It might get lost navigating the web pageโ€™s drop down menus, or buy too many socks. And because developers canโ€™t predict exactly what wrong turn an agent will take, the environment itself has to be robust enough to capture any unexpected behavior, and still deliver useful feedback. That makes building environments far more complex than a static dataset.

Some environments are quite elaborate, allowing for AI agents to use tools, access the internet, or use various software applications to complete a given task. Others are more narrow, aimed at helping an agent learn specific tasks in enterprise software applications.

While RL environments are the hot thing in Silicon Valley right now, thereโ€™s a lot of precedent for using this technique. One of OpenAIโ€™s first projects back in 2016 was building โ€œRL Gyms,โ€ which were quite similar to the modern conception of environments. The same year, Google DeepMindโ€™s AlphaGo AI system beat a world champion at the board game, Go. It also used RL techniques within a simulated environment.

Whatโ€™s unique about todayโ€™s environments is that researchers are trying to build computer-using AI agents with large transformer models. Unlike AlphaGo, which was a specialized AI system working in a closed environments, todayโ€™s AI agents are trained to have more general capabilities. AI researchers today have a stronger starting point, but also a complicated goal where more can go wrong.

A crowded field

AI data labeling companies like Scale AI, Surge, and Mercor are trying to meet the moment and build out RL environments. These companies have more resources than many startups in the space, as well as deep relationships with AI labs.

Surge CEO Edwin Chen tells TechCrunch heโ€™s recently seen a โ€œsignificant increaseโ€ in demand for RL environments within AI labs. Surge โ€” which reportedly generated $1.2 billion in revenue last year from working with AI labs like OpenAI, Google, Anthropic and Meta โ€” recently spun up a new internal organization specifically tasked with building out RL environments, he said.

Close behind Surge is Mercor, a startup valued at $10 billion, which has also worked with OpenAI, Meta, and Anthropic. Mercor is pitching investors on its business building RL environments for domain specific tasks such as coding, healthcare, and law, according to marketing materials seen by TechCrunch.

Mercor CEO Brendan Foody told TechCrunch in an interview that โ€œfew understand how large the opportunity around RL environments truly is.โ€

Scale AI used to dominate the data labeling space, but has lost ground since Meta invested $14 billion and hired away its CEO. Since then, Google and OpenAI dropped Scale AI as a data provider, and the startup even faces competition for data labelling work inside of Meta. But still, Scale is trying to meet the moment and build environments.

โ€œThis is just the nature of the business [Scale AI] is in,โ€ said Chetan Rane, Scale AIโ€™s head of product for agents and RL environments. โ€œScale has proven its ability to adapt quickly. We did this in the early days of autonomous vehicles, our first business unit. When ChatGPT came out, Scale AI adapted to that. And now, once again, weโ€™re adapting to new frontier spaces like agents and environments.โ€

Some newer players are focusing exclusively on environments from the outset. Among them is Mechanize, a startup founded roughly six months ago with the audacious goal of โ€œautomating all jobs.โ€ However, co-founder Matthew Barnett tells TechCrunch that his firm is starting with RL environments for AI coding agents.

Mechanize aims to supply AI labs with a small number of robust RL environments, Barnett says, rather than larger data firms that create a wide range of simple RL environments. To this point, the startup is offering software engineers $500,000 salaries to build RL environments โ€” far higher than an hourly contractor could earn working at Scale AI or Surge.

Mechanize has already been working with Anthropic on RL environments, two sources familiar with the matter told TechCrunch. Mechanize and Anthropic declined to comment on the partnership.

Other startups are betting that RL environments will be influential outside of AI labs. Prime Intellect โ€” a startup backed by AI researcher Andrej Karpathy, Founders Fund, and Menlo Ventures โ€” is targeting smaller developers with its RL environments.

Last month, Prime Intellect launched an RL environments hub, which aims to be a โ€œHugging Face for RL environments.โ€ The idea is to give open-source developers access to the same resources that large AI labs have, and sell those developers access to computational resources in the process.

Training generally capable agents in RL environments can be more computational expensive than previous AI training techniques, according to Prime Intellect researcher Will Brown. Alongside startups building RL environments, thereโ€™s another opportunity for GPU providers that can power the process.

โ€œRL environments are going to be too large for any one company to dominate,โ€ said Brown in an interview. โ€œPart of what weโ€™re doing is just trying to build good open-source infrastructure around it. The service we sell is compute, so it is a convenient onramp to using GPUs, but weโ€™re thinking of this more in the long term.โ€

Will it scale?

The open question around RL environments is whether the technique will scale like previous AI training methods.

Reinforcement learning has powered some of the biggest leaps in AI over the past year, including models like OpenAIโ€™s o1 and Anthropicโ€™s Claude Opus 4. Those are particularly important breakthroughs because the methods previously used to improve AI models are now showing diminishing returns.ย 

Environments are part of AI labsโ€™ bigger bet on RL, which many believe will continue to drive progress as they add more data and computational resources to the process. Some of the OpenAI researchers behind o1 previously told TechCrunch that the company originally invested in AI reasoning models โ€” which were created through investments in RL and test-time-compute โ€” because they thought it would scale nicely.

The best way to scale RL remains unclear, but environments seem like a promising contender. Instead of simply rewarding chatbots for text responses, they let agents operate in simulations with tools and computers at their disposal. Thatโ€™s far more resource-intensive, but potentially more rewarding.

Some are skeptical that all these RL environments will pan out. Ross Taylor, a former AI research lead with Meta that co-founded General Reasoning, tells TechCrunch that RL environments are prone to reward hacking. This is a process in which AI models cheat in order to get a reward, without really doing the task.

โ€œI think people are underestimating how difficult it is to scale environments,โ€ said Taylor. โ€œEven the best publicly available [RL environments] typically donโ€™t work without serious modification.โ€

OpenAIโ€™s Head of Engineering for its API business, Sherwin Wu, said in a recent podcast that he was โ€œshortโ€ on RL environment startups. Wu noted that itโ€™s a very competitive space, but also that AI research is evolving so quickly that itโ€™s hard to serve AI labs well.

Karpathy, an investor in Prime Intellect that has called RL environments a potential breakthrough, has also voiced caution for the RL space more broadly. In a post on X, he raised concerns about how much more AI progress can be squeezed out of RL.

โ€œI am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically,โ€ said Karpathy.

Update: A previous version of this article referred to Mechanize as Mechanize Work. It has been updated to reflect the companyโ€™s official name.



Source link

RELATED ARTICLES

LEAVE A REPLY

Please enter your comment!
Please enter your name here

Most Popular

Recent Comments

Translate ยป