ByteDance Lance 3B: Apache 2.0 multimodal model for image and video editing
ByteDance released Lance, a 3-billion-parameter multimodal model for image and video understanding, generation, and editing, under the Apache 2.0 license.
Lance handles image and video understanding, generation, and editing in a single 3-billion-parameter model, and ByteDance has released it under the Apache 2.0 license. The weights are on Hugging Face and the source code is on GitHub, which positions Lance as a compact open alternative for practitioners who want to run vision workflows locally rather than depend on closed APIs or models ten times the size.
The release covers several stages of a computer vision pipeline (captioning, object detection, video frame synthesis, and in-place editing) in a single 3B checkpoint. ByteDance designed Lance to run on consumer hardware while maintaining capability across modalities, a balance that has historically required either closed APIs or models an order of magnitude larger. The Apache 2.0 license permits commercial use, modification, and redistribution, subject to its attribution and notice requirements, which makes the model viable for both research prototypes and production deployments. Practitioners who have been stitching together separate models for each stage of a vision pipeline now have a unified option that fits in VRAM budgets under 16GB.
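The VRAM figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not a measurement of Lance itself: it counts only the weights, assumes a round 3 billion parameters, and ignores activations, caches, and framework overhead, all of which add to real usage. The function names and the precision table are illustrative, not part of any Lance tooling.

```python
# Rough weight-memory estimate for a model of a given parameter count.
# Assumption: only the weights are counted. Activations, intermediate
# buffers, and framework overhead are NOT included, so real usage is higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,  # bf16 has the same footprint
    "int8": 1.0,
    "int4": 0.5,
}

GIB = 1024 ** 3


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def fits_budget(num_params: float, dtype: str, budget_gib: float) -> bool:
    """True if the weights alone fit inside the given VRAM budget."""
    return weight_memory_gib(num_params, dtype) <= budget_gib


if __name__ == "__main__":
    lance_params = 3e9  # assumed round figure taken from the "3B" label
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gib(lance_params, dtype)
        ok = fits_budget(lance_params, dtype, 16.0)
        print(f"{dtype}: {size:.2f} GiB of weights, fits in 16 GiB: {ok}")
```

At 16-bit precision the weights come to roughly 5.6 GiB, which leaves headroom on a 16GB card for activations and video frames. How much of that headroom a given workload actually uses depends on resolution, clip length, and the inference stack.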
For developers already running Stable Diffusion or FLUX workflows locally, Lance offers a path to add video understanding and editing without switching to a closed service. The Apache 2.0 terms and sub-4B parameter footprint make it a practical candidate for ComfyUI custom nodes, A1111 extensions, and other community tooling that assumes local execution and the freedom to fine-tune. Weights and source code are available on Hugging Face and GitHub as of this week.
