Federated RL agents train faster with personalized input normalization

A new personalized observation normalization technique lets each agent in federated reinforcement learning maintain its own running statistics, accelerating training when environments have different state dynamics.

By Alex Sokoloff · May 28, 2026

Personalized observation normalization (PON) is a federated reinforcement learning technique that addresses training slowdowns in heterogeneous simulation environments. The method, detailed in an arXiv preprint posted this week, tackles a core problem in collaborative AI training: when multiple agents learn a shared policy without exchanging raw data, differing state-transition dynamics across their local environments create non-identical input distributions that sabotage the global aggregation step.

Federated RL suits privacy-sensitive applications, such as hospital robots or financial trading agents, because raw observations stay local while learned behaviors are pooled. But when one agent trains on a lightweight robot arm and another on a heavy manipulator, or when friction coefficients vary across simulated warehouses, the parameter updates that reach the central server differ widely in scale. Conventional federated learning either skips input normalization or shares a single set of running statistics across all agents, and both approaches fail when local distributions diverge.

PON instead gives each agent its own running mean and variance, updated continuously as new state observations arrive. Each agent normalizes its raw inputs locally before feeding them into the policy network, which keeps feature scaling consistent and prevents one agent's statistics from overwhelming another's during weight averaging. Experiments on heterogeneous MuJoCo locomotion tasks show that PON accelerates training and reaches higher final performance than baseline methods.
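The idea of per-agent normalization can be illustrated with a minimal Python sketch. This is not the preprint's implementation: the class `RunningNormalizer`, the helper `federated_average`, the use of Welford's online algorithm for the running statistics, and the unweighted averaging of weights are all assumptions made for illustration. The point it shows is that each agent's mean and variance stay local, while only policy weights would be averaged.

```python
import numpy as np


class RunningNormalizer:
    """Hypothetical per-agent running mean/variance; kept local, never averaged."""

    def __init__(self, obs_dim, eps=1e-8):
        self.mean = np.zeros(obs_dim)
        self.m2 = np.zeros(obs_dim)  # running sum of squared deviations
        self.count = 0
        self.eps = eps

    def update(self, obs):
        # Welford's online update for a single observation vector.
        obs = np.asarray(obs, dtype=float)
        self.count += 1
        delta = obs - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (obs - self.mean)

    def normalize(self, obs):
        # Population variance from the accumulated statistics.
        var = self.m2 / max(self.count, 1)
        return (np.asarray(obs, dtype=float) - self.mean) / np.sqrt(var + self.eps)


def federated_average(weight_list):
    """Unweighted average of policy weights (assumed aggregation rule).

    Normalizer statistics are deliberately not part of the input.
    """
    return np.mean(np.stack(weight_list), axis=0)


# Two hypothetical agents whose observations differ in offset and scale.
rng = np.random.default_rng(0)
agent_a, agent_b = RunningNormalizer(3), RunningNormalizer(3)
obs_a = rng.normal(loc=0.0, scale=1.0, size=(5000, 3))
obs_b = rng.normal(loc=50.0, scale=20.0, size=(5000, 3))
for oa, ob in zip(obs_a, obs_b):
    agent_a.update(oa)
    agent_b.update(ob)

# After local normalization, both agents feed similarly scaled inputs
# to their copy of the shared policy network.
norm_a = np.array([agent_a.normalize(o) for o in obs_a])
norm_b = np.array([agent_b.normalize(o) for o in obs_b])
```

In this toy setup the two agents' raw observations differ by a factor of 20 in scale, yet the normalized inputs have roughly zero mean and unit variance for both, so a shared policy sees comparable features from each environment.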
