RL4F: Open-source offline RL benchmark for tokamak plasma control
Researchers released RL4F, an open-source benchmark for offline reinforcement learning in nuclear fusion plasma control, built from historical DIII-D tokamak discharge data across four full-profile tracking tasks.

A team has released RL4F, an offline reinforcement learning benchmark for plasma control in nuclear fusion, built from historical discharge data from DIII-D, a real-world tokamak operated by General Atomics. The benchmark provides closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure control.
The work addresses a longstanding gap in fusion research: the lack of standardized benchmarks for testing multi-actuator, long-horizon plasma control algorithms without risking damage to expensive tokamak hardware. Offline RL trains controllers on historical data rather than through live trial and error, making it a safer route for developing plasma control policies. The RL4F benchmark evaluates a broad set of imitation learning and offline RL baselines under a unified protocol, with dynamics models derived from actual DIII-D operations.
What stands out
- Offline model-based RL methods achieved the best average performance across most objectives, outperforming imitation learning and model-free offline RL baselines on rotation, density, temperature, and pressure tracking tasks.
- No single method dominated all four tasks, indicating that plasma control remains a challenging domain where algorithm choice matters. The benchmark reveals that dynamics modeling is critical for long-horizon control in fusion environments.
- The benchmark is built from real DIII-D tokamak discharge data, not simulated physics. The dynamics function underlying the evaluation environment reflects historical multi-actuator control sequences, making the benchmark representative of real-world plasma behavior.
- Full closed-loop evaluation environments are included, not just datasets. Researchers can train offline RL agents on historical data and then test them in a simulated tokamak loop that mimics DIII-D's response characteristics.
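
To make the "train on historical data, then test in a closed loop" idea concrete, here is a minimal toy sketch in Python. It is not RL4F's API and does not implement any of the benchmark's baselines. The data are synthetic rather than DIII-D discharges, and the linear dynamics model, the two-dimensional state, the proportional controller, and all names (`TRUE_A`, `TRUE_B`, `fit_linear_dynamics`, `rollout_tracking_error`) are assumptions made for illustration. The sketch fits a dynamics model from logged transitions only, then scores two controllers by rolling them out inside that learned model against a fixed target.

```python
import numpy as np

# Hypothetical "true" dynamics used only to generate synthetic logs.
# These matrices are illustrative assumptions, not tokamak physics.
TRUE_A = np.array([[0.9, 0.05], [0.0, 0.85]])
TRUE_B = np.array([[0.1, 0.0], [0.05, 0.2]])


def make_logged_data(n=2000, seed=0):
    """Generate synthetic (state, action, next_state) transitions."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=(n, 2))
    a = rng.normal(size=(n, 2))
    noise = 0.01 * rng.normal(size=(n, 2))
    s_next = s @ TRUE_A.T + a @ TRUE_B.T + noise
    return s, a, s_next


def fit_linear_dynamics(s, a, s_next):
    """Least-squares fit of s_next ~= [s, a] @ W using logged data only."""
    X = np.hstack([s, a])
    W, *_ = np.linalg.lstsq(X, s_next, rcond=None)
    return W  # shape (state_dim + action_dim, state_dim)


def rollout_tracking_error(W, policy, target, s0, horizon=50):
    """Roll a policy out in the learned model; return mean squared tracking error."""
    s = s0.copy()
    errors = []
    for _ in range(horizon):
        a = policy(s, target)
        s = np.concatenate([s, a]) @ W  # one step of the learned dynamics
        errors.append(np.mean((s - target) ** 2))
    return float(np.mean(errors))


def zero_policy(s, target):
    """Baseline that applies no actuation."""
    return np.zeros_like(s)


def proportional_policy(s, target, gain=2.0):
    """Simple proportional controller standing in for a trained policy."""
    return gain * (target - s)


s, a, s_next = make_logged_data()
W = fit_linear_dynamics(s, a, s_next)

target = np.array([1.0, 0.5])
s0 = np.zeros(2)
err_zero = rollout_tracking_error(W, zero_policy, target, s0)
err_prop = rollout_tracking_error(W, proportional_policy, target, s0)
print(f"zero policy error: {err_zero:.3f}")
print(f"proportional policy error: {err_prop:.3f}")
```

In this toy setup the controller is never run against a live system: both the model fit and the evaluation rely on logged transitions, which is the safety argument behind offline evaluation. A real benchmark environment would replace the linear fit with a far richer dynamics model and the proportional controller with a learned policy.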
