LC-MAPF: learned agent communication lifts multi-robot pathfinding success rates by 16 points
A new preprint introduces Local Communication for Multi-agent Pathfinding (LC-MAPF), a pre-trained model that lets neighboring agents exchange features across multiple rounds to coordinate trajectory planning without sacrificing scalability.
Researchers at Moscow Institute of Physics and Technology and Yandex Research have published a preprint on Local Communication for Multi-agent Pathfinding (LC-MAPF), a decentralized solver that frames multi-agent coordination as a partially observable Markov decision process with a learnable communication layer. Posted to Hugging Face on May 15, the model lets agents share local observations with neighbors over multiple communication rounds before choosing actions, improving coordination in dense scenarios where collision avoidance is critical.
Multi-agent pathfinding—getting multiple robots or agents from start positions to goal positions without collisions—is NP-hard to solve optimally, so recent work has turned to reinforcement learning and imitation learning to train decentralized policies that scale. LC-MAPF follows that pattern but introduces a communication module that exchanges feature vectors between agents within a fixed radius. Each agent encodes its local observation (a partial map plus nearby agent positions), broadcasts features to neighbors, aggregates incoming messages, and repeats the cycle for a fixed number of rounds before outputting an action. The architecture is trained via imitation learning on expert trajectories generated by a classical MAPF solver.
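The encode-broadcast-aggregate cycle described above can be sketched in a few lines. This is a minimal illustration of the general pattern, not the authors' architecture: the weight matrices, the mean aggregation, and the fixed radius are placeholder assumptions standing in for the paper's learned components.

```python
import numpy as np

def communication_rounds(features, positions, radius=3.0, rounds=2, seed=0):
    """Sketch of the encode-broadcast-aggregate cycle.

    features:  (n_agents, d) array of per-agent observation encodings
    positions: (n_agents, 2) array of agent grid coordinates
    Agents within `radius` of each other exchange features; each round,
    every agent mixes its own feature with the mean of its neighbors'.
    The weight matrices are random placeholders for learned parameters.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    w_self = rng.standard_normal((d, d)) * 0.1  # placeholder for learned weights
    w_msg = rng.standard_normal((d, d)) * 0.1

    # Adjacency: which agents are within communication radius (excluding self)
    dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
    adj = (dist <= radius) & ~np.eye(n, dtype=bool)

    h = features
    for _ in range(rounds):
        # Mean-aggregate neighbor messages; agents with no neighbors get zeros
        counts = np.maximum(adj.sum(axis=1, keepdims=True), 1)
        msgs = (adj @ h) / counts
        h = np.tanh(h @ w_self + msgs @ w_msg)  # combine own and incoming signal
    return h  # refined features fed to each agent's action head
```

Note that the adjacency matrix is recomputed from positions, so the same code scales to any agent count; only neighbors within the radius exchange messages, which is what keeps the approach decentralized.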
Benchmark results
The authors tested LC-MAPF against existing learning-based solvers—DHC (an RL method), SCRIMP (imitation learning), and others—across warehouse, maze, and random obstacle maps with agent counts from 32 to 256. LC-MAPF achieved higher success rates (the fraction of agents reaching goals without collision) and lower makespan (total time until all agents finish) in every scenario. In one 64-agent warehouse test, LC-MAPF hit a 94 percent success rate versus 78 percent for the next-best baseline—a 16-point improvement. Crucially, inference time scaled linearly with agent count; adding communication rounds increased per-step compute by roughly 30 percent but did not break the linear scaling that makes decentralized solvers practical.
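The two metrics reported above are standard in MAPF evaluation and straightforward to compute. The sketch below assumes a hypothetical trajectory format (per-agent lists of grid cells) and checks only vertex collisions for brevity; it is not the paper's evaluation code.

```python
def evaluate_episode(paths, goals):
    """Compute success rate and makespan for one episode.

    paths: dict mapping agent id -> list of (x, y) cells, one per timestep
    goals: dict mapping agent id -> goal (x, y) cell
    An agent succeeds if it ends on its goal and never shares a cell with
    another agent at the same timestep (vertex collisions only, for brevity).
    """
    horizon = max(len(p) for p in paths.values())
    collided = set()
    for t in range(horizon):
        occupied = {}
        for aid, p in paths.items():
            cell = p[min(t, len(p) - 1)]  # agents wait at their final cell
            if cell in occupied:
                collided.update({aid, occupied[cell]})
            occupied[cell] = aid

    succeeded = [aid for aid, p in paths.items()
                 if p[-1] == goals[aid] and aid not in collided]
    success_rate = len(succeeded) / len(paths)

    # Makespan: latest timestep at which any agent first reaches its goal
    makespan = max((p.index(goals[aid]) for aid, p in paths.items()
                    if goals[aid] in p), default=horizon)
    return success_rate, makespan
```

A full MAPF evaluator would also check edge collisions (two agents swapping cells in one step); the structure is the same.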
The preprint does not release weights or code yet, though the authors note the model is "generalizable" and pre-trained, suggesting they intend to share artifacts. The communication mechanism itself is the main contribution—prior decentralized MAPF solvers either skip inter-agent messaging entirely or use hand-coded protocols that don't adapt to the scenario. LC-MAPF learns which features to broadcast and how to weight incoming messages, letting agents implicitly negotiate who yields in tight corridors without explicit turn-taking rules.
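Learned weighting of incoming messages is typically implemented with attention. The toy function below shows generic scaled dot-product attention over neighbor messages; it illustrates the kind of mechanism the paper describes, not its exact formulation.

```python
import numpy as np

def attention_aggregate(query, neighbor_msgs):
    """Weight neighbor messages by relevance to the receiving agent.

    query:         (d,) the receiving agent's own feature vector
    neighbor_msgs: (k, d) features broadcast by its k neighbors
    Returns a (d,) vector: a weighted sum in which neighbors whose
    messages align with the query contribute more.
    """
    d = query.shape[0]
    scores = neighbor_msgs @ query / np.sqrt(d)  # relevance of each neighbor
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over neighbors
    return weights @ neighbor_msgs               # weighted message blend
```

In a corridor conflict, this is how an agent can implicitly "listen" more to the neighbor blocking its path than to agents heading elsewhere, without any explicit turn-taking protocol.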
