OpenWebRL-4B reaches 67% success on live-web tasks with minimal supervised data
New open framework trains visual web agents via online reinforcement learning on real websites, achieving state-of-the-art results among open agents with just 400 initialization examples.

Researchers say online reinforcement learning can break the costly demonstration bottleneck that has held back open-source visual web agents. OpenWebRL, a new framework detailed in a preprint this week, trains agents directly on live websites using multi-turn RL and reports the best results among open agents while using just 0.4K (400) initialization trajectories and 2.2K RL training tasks.
OpenWebRL-4B, the 4-billion-parameter model trained under the framework, scored 67.0% success on Online-Mind2Web and 64.0% on DeepShop, two live-web benchmarks that test long-horizon reasoning and interaction with dynamic sites. Those scores exceed those of prior open agents at similar or larger scale and remain competitive with two closed systems, OpenAI's CUA and Gemini CUA. The framework covers the full training pipeline: scalable live-browser infrastructure, supervised initialization, multimodal context management, trajectory-level success judging, and efficient multi-turn policy optimization.
Authors Rui Yang, Qianhui Wu, Yuxi Chen, Hao Bai, Wenlin Yao, and Hao Cheng systematically examined which design choices make online RL effective for visual web agents and analyzed how RL improves agentic reasoning beyond what supervised fine-tuning achieves. The team plans to release its training data, models, and code, which the authors say offers a practical path toward more capable, reproducible, and cost-efficient open web agents without relying on curated demonstration datasets that are hard to scale.
