Six open-source video and image models ship this week
CausalCine, SwiftI2V, HiDream-O1-Image, OmniGen2, CDM, and PhysForge arrive with new approaches to long-form coherence, efficient 2K generation, unified architectures, and physics-grounded 3D synthesis.
CausalCine, an interactive autoregressive framework for multi-shot video narratives, tackles motion stagnation and semantic drift in long-form generation by replacing temporal-proximity memory with Content-Aware Memory Routing. The system retrieves historical key-value entries based on attention relevance rather than recency, keeping generated video coherent across extended rollouts. The authors also distill the model into a few-step generator for real-time use. The arXiv preprint and GitHub repo arrived this week.
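The routing idea can be illustrated with a toy sketch: score historical key-value entries by scaled dot-product relevance to the current query, then attend only over the most relevant entries rather than the most recent ones. This is a minimal illustration, not CausalCine's implementation; the function and parameter names here are hypothetical.

```python
import numpy as np

def route_memory(query, mem_keys, mem_values, top_k=4):
    """Attend over the top_k most relevant memory entries, ignoring recency.

    query:      (d,)   current query vector
    mem_keys:   (n, d) keys of historical entries
    mem_values: (n, d) values of historical entries
    """
    # Relevance = scaled dot-product attention score against the query.
    scores = mem_keys @ query / np.sqrt(query.shape[0])
    # Keep the most relevant entries, regardless of how old they are.
    idx = np.argsort(scores)[-top_k:]
    # Softmax over the routed subset only, then read out the values.
    weights = np.exp(scores[idx] - scores[idx].max())
    weights /= weights.sum()
    return weights @ mem_values[idx]

rng = np.random.default_rng(0)
q = rng.normal(size=8)
K = rng.normal(size=(32, 8))   # 32 historical entries
V = rng.normal(size=(32, 8))
out = route_memory(q, K, V)
print(out.shape)  # (8,)
```

A recency-based cache would instead take `mem_keys[-top_k:]`; routing by score is what lets distant but relevant shots stay in context.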
SwiftI2V delivers 2K image-to-video generation through a two-stage pipeline: low-resolution motion drafting followed by high-resolution refinement that preserves source image detail. OmniGen2 unifies text-to-image, editing, subject-driven generation, and visual conditioning in a single architecture; the paper is public but weights remain unreleased. HiDream-O1-Image, a natively unified image generation foundation model, ships an 8-billion-parameter checkpoint with full code and weights on HuggingFace.
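SwiftI2V's draft-then-refine split can be sketched in miniature: generate motion at low resolution, then upsample each frame and re-inject the source image's high-frequency detail. The two model stages are replaced here by trivial stand-ins (a per-frame shift and a detail re-injection step); everything below is a hypothetical illustration of the pipeline shape, not the paper's code.

```python
import numpy as np

def downsample(img, factor=4):
    # Block-average pooling.
    h, w = img.shape
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def upsample(img, factor=4):
    # Nearest-neighbor upsampling.
    return np.kron(img, np.ones((factor, factor)))

def draft_motion(lowres_img, num_frames=3):
    # Stand-in for the low-resolution motion model: shift the image per frame.
    return [np.roll(lowres_img, shift=t, axis=1) for t in range(num_frames)]

def refine(lowres_frame, source_img, factor=4):
    # Stand-in for the refiner: upsample the draft and add back the
    # source image's high-frequency residual to preserve its detail.
    base = upsample(lowres_frame, factor)
    detail = source_img - upsample(downsample(source_img, factor), factor)
    return base + detail

source = np.arange(64.0).reshape(8, 8)
draft = draft_motion(downsample(source))       # stage 1: low-res motion draft
video = [refine(f, source) for f in draft]     # stage 2: high-res refinement
print(len(video), video[0].shape)  # 3 (8, 8)
```

The point of the split is that the expensive motion modeling runs at low resolution, while fidelity to the source image is restored cheaply per frame.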
CDM, a continuous-time distribution matching method, distills diffusion models to fewer steps without quality loss. Released checkpoints cover Stable Diffusion 3 Medium and Longcat. PhysForge generates physics-grounded 3D assets with parts, materials, joints, mass, and movement rules for simulation and game engines. All six projects posted preprints or releases between May 7 and May 14, with HiDream-O1-Image the only one shipping open weights at a named parameter count: 8 billion.