Gemini Omni Flash passes tongue-rotation test; Kling and Seedance fail
A new prompt-following challenge for video models asked four platforms to generate 10 seconds of circular tongue rotation. Gemini Omni Flash was the only model to complete the motion convincingly.

Gemini Omni Flash, Google's multimodal video model, is the only platform to pass a new prompt-following test this week: generate 10 seconds of a tongue rotating in a circle. Four models (Omni Flash, Grok Imagine 1.5, Seedance 2.0, and Kling 3.0 Pro) each received four attempts with the identical prompt. Only Omni Flash produced realistic circular motion across attempts.
Kling 3.0 Pro from Kuaishou failed the instruction every time. The tongue moved sideways, twisted, stretched, and performed what observers called "strange anatomical experiments," but never completed a full rotation. Visual quality was decent, but motion was unnatural and inconsistent.
Seedance 2.0 from ByteDance performed better than Kling but still missed the circular motion requirement. The movements looked more polished, yet the model couldn't execute the core instruction.
Grok Imagine 1.5 from xAI delivered a surprisingly strong result. The output retained Grok's slightly cartoonish visual style, but the tongue actually rotated. The motion wasn't perfect and didn't always complete a full circle, but the model understood the task and made a genuine attempt.
Gemini Omni Flash stood alone in delivering both realistic visuals and accurate circular motion. The tongue completed the rotation cleanly across multiple attempts, with controlled movement that matched the prompt.
The test exposes a persistent gap in video models' ability to interpret and execute precise physical instructions, especially when the motion involves anatomy that doesn't follow typical training-data patterns. As video generation moves beyond static composition into fine motor control, the next frontier is whether models can handle arbitrary physical constraints without falling back on learned motion templates. Expect more tests like this as practitioners push beyond aesthetic benchmarks into functional prompt adherence.



