Gemini Omni cannot generate a backflip video, exposing motion synthesis gap
Google's multimodal Gemini Omni model cannot generate a simple backflip video on command, revealing limitations in temporal coherence and physics modeling that rival systems such as OpenAI's Sora and Runway's Gen-3 handle routinely.
Google's Gemini Omni model struggles with basic video generation despite its multimodal capabilities. When prompted to create a video of someone performing a backflip, the model accepted the request but failed to produce usable output. The failure underscores the gap between processing diverse input types and synthesizing coherent motion.
