MobileEgo Anywhere uses smartphones to capture 200 hours of long-horizon robot training data
Researchers release an open-source smartphone-based framework and a companion dataset of hour-long egocentric trajectories, lowering the hardware barrier to Vision Language Action model training.

Long-horizon egocentric data is critical for training Vision Language Action (VLA) models, yet most robotics datasets contain only minutes-long episodes. That is too short to teach the complex, multi-step task sequences of the real world.
MobileEgo Anywhere, detailed in a preprint released this week, addresses that gap by repurposing smartphone sensors as persistent egocentric recorders. The system uses the cameras, IMUs, and depth sensors already built into modern mobile devices to track camera pose reliably over sessions lasting more than an hour, without motion-capture rigs or specialized robotics hardware. Researchers Senthil Palanisamy, Abhishek Anand, Satpal Singh Rathor, Pratyush Patnaik, and Shubhanshu Khatana released three components: a 200-hour dataset of diverse long-form egocentric trajectories with continuous state tracking; an open-source mobile app that lets anyone record their own data; and a processing pipeline that converts raw smartphone sensor streams into standardized, training-ready formats for VLA and foundation-model research.
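The preprint summary does not describe the pipeline's internals or its output format. As a hedged illustration of one step a pipeline like this typically needs, the sketch below pairs each camera frame with the nearest-in-time IMU sample and emits one record per frame. The Frame and ImuSample types, the record fields, the assumption that both streams share one clock, and the 10 ms max_gap threshold are all hypothetical, not MobileEgo Anywhere's actual schema or code.

```python
from __future__ import annotations

from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Frame:
    """A camera frame (hypothetical schema); t is the capture time in seconds."""
    t: float
    image_path: str


@dataclass
class ImuSample:
    """One IMU reading (hypothetical schema); t is assumed to share the frames' clock."""
    t: float
    accel: tuple[float, float, float]  # linear acceleration, m/s^2
    gyro: tuple[float, float, float]   # angular velocity, rad/s


def align_streams(frames: list[Frame], imu: list[ImuSample], max_gap: float = 0.01) -> list[dict]:
    """Pair each frame with its nearest-in-time IMU sample.

    Frames with no IMU sample within max_gap seconds are dropped.
    """
    imu = sorted(imu, key=lambda s: s.t)  # bisect requires ascending timestamps
    imu_times = [s.t for s in imu]
    records = []
    for f in frames:
        i = bisect_left(imu_times, f.t)
        # The nearest sample is either the last one before f.t or the first at/after it.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(imu)]
        if not candidates:
            continue  # empty IMU stream
        j = min(candidates, key=lambda k: abs(imu_times[k] - f.t))
        if abs(imu_times[j] - f.t) > max_gap:
            continue  # IMU gap too large around this frame; skip it
        s = imu[j]
        records.append({
            "t": f.t,
            "image": f.image_path,
            "imu_t": s.t,
            "accel": s.accel,
            "gyro": s.gyro,
        })
    return records
```

Nearest-neighbor matching is the simplest choice; a real pipeline might instead interpolate IMU readings between samples or estimate a clock offset between sensors before aligning them.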
By removing the need for specialized hardware, the framework opens egocentric data collection to the diverse global environments and real-world task contexts that lab-based setups rarely capture. Both the app and the dataset are open-sourced to accelerate development of generalizable robotic policies that can handle extended real-world horizons.