
RL4F: Open-source offline RL benchmark for tokamak plasma control | UncensoredHub


Researchers released RL4F, an open-source benchmark for offline reinforcement learning in nuclear fusion plasma control, built from historical DIII-D tokamak discharge data across four full-profile tracking tasks.

By Alex Sokoloff · June 10, 2026


A team has released RL4F, an offline reinforcement learning benchmark for plasma control in nuclear fusion, built from historical discharge data from DIII-D, a real-world tokamak operated by General Atomics. The benchmark provides closed-loop evaluation environments and baseline comparisons across four full-profile tracking tasks: rotation, density, temperature, and pressure control.

The work addresses a longstanding gap in fusion research: the lack of standardized benchmarks for testing multi-actuator, long-horizon plasma control algorithms without risking damage to expensive tokamak hardware. Offline RL trains controllers on historical data rather than through live trial and error, making it a safer route for developing plasma control policies. The RL4F benchmark evaluates a broad set of imitation learning and offline RL baselines under a unified protocol, with dynamics models derived from actual DIII-D operations.
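To make the "learn from logged data" idea concrete, here is a minimal sketch of fitting a dynamics model from recorded transitions. It is not RL4F code: the scalar plant, the function names `make_logged_transitions` and `fit_linear_dynamics`, and the coefficients 0.9 and 0.5 are all assumptions chosen for illustration, and real plasma profiles are high-dimensional and nonlinear.

```python
import random

def make_logged_transitions(n, a=0.9, b=0.5, seed=0):
    """Generate synthetic (state, action, next_state) tuples.

    Hypothetical stand-in for historical discharge logs: a toy scalar
    plant x_next = a*x + b*u driven by random actuator commands.
    """
    rng = random.Random(seed)
    data = []
    x = 0.0
    for _ in range(n):
        u = rng.uniform(-1.0, 1.0)   # logged actuator command
        x_next = a * x + b * u       # toy plant response
        data.append((x, u, x_next))
        x = x_next
    return data

def fit_linear_dynamics(data):
    """Least-squares fit of x_next ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x, _, _ in data)
    sxu = sum(x * u for x, u, _ in data)
    suu = sum(u * u for _, u, _ in data)
    sxy = sum(x * y for x, _, y in data)
    suy = sum(u * y for _, u, y in data)
    det = sxx * suu - sxu * sxu
    a_hat = (sxy * suu - suy * sxu) / det
    b_hat = (suy * sxx - sxy * sxu) / det
    return a_hat, b_hat

# No live experiment is run here: the model is recovered purely from the log.
a_hat, b_hat = fit_linear_dynamics(make_logged_transitions(200))
```

The point of the sketch is the workflow, not the model class: the controller developer only ever touches recorded transitions, which is what makes the offline setting safe for hardware.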

What stands out

01. Offline model-based RL methods achieved the best average performance across most objectives, outperforming imitation learning and model-free offline RL baselines on rotation, density, temperature, and pressure tracking tasks.
02. No single method dominated all four tasks, indicating that plasma control remains a challenging domain where algorithm choice matters. The benchmark reveals that dynamics modeling is critical for long-horizon control in fusion environments.
03. The benchmark is built from real DIII-D tokamak discharge data, not simulated physics. The dynamics function underlying the evaluation environment reflects historical multi-actuator control sequences, making the benchmark representative of real-world plasma behavior.
04. Full closed-loop evaluation environments are included, not just datasets. Researchers can train offline RL agents on historical data and then test them in a simulated tokamak loop that mimics DIII-D's response characteristics.
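The closed-loop evaluation described above can be sketched as a rollout in which a policy and a dynamics model take turns, with tracking error accumulated against a target. This is an illustrative toy, not the RL4F protocol: the scalar state, the `learned_step` model with assumed coefficients, the two policies, and the mean-squared-error metric are all hypothetical stand-ins.

```python
def learned_step(x, u, a=0.9, b=0.5):
    """Assumed data-derived dynamics model: next state from state and action."""
    return a * x + b * u

def zero_policy(x, target):
    """Baseline that never actuates."""
    return 0.0

def one_step_policy(x, target, a=0.9, b=0.5, u_max=1.0):
    """Pick the action the model predicts will hit the target next step,
    clipped to a hypothetical actuator limit."""
    u = (target - a * x) / b
    return max(-u_max, min(u_max, u))

def rollout_tracking_error(policy, step, x0, target, horizon):
    """Run the policy in closed loop and return mean squared tracking error."""
    x = x0
    total = 0.0
    for _ in range(horizon):
        u = policy(x, target)        # controller reads the current state
        x = step(x, u)               # model advances the simulated plasma state
        total += (x - target) ** 2   # penalize distance from the target
    return total / horizon

err_zero = rollout_tracking_error(zero_policy, learned_step, 0.0, 1.0, 50)
err_ctrl = rollout_tracking_error(one_step_policy, learned_step, 0.0, 1.0, 50)
print(f"no control: {err_zero:.4f}, one-step controller: {err_ctrl:.4f}")
```

Because every candidate policy is scored by the same loop, target, and horizon, results are comparable across methods, which is the role a unified evaluation protocol plays in a benchmark.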
