Ovis2.6-80B-A3B reaches 80B parameters with only 3B active at inference
AIDC-AI released Ovis2.6-80B-A3B, an 80B-parameter multimodal model with Mixture-of-Experts architecture that activates only 3B parameters during inference, supporting 64K context and 2880×2880 image resolution.

AIDC-AI released Ovis2.6-80B-A3B on May 13, a multimodal large language model that scales to 80 billion total parameters while activating only 3 billion during inference. The model uses a Mixture-of-Experts (MoE) architecture to keep serving costs low while expanding capacity for vision-language tasks. It extends context length to 64,000 tokens and supports image resolutions up to 2880×2880 pixels, targeting long-document question answering and high-resolution visual analysis.
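The 80B-total/3B-active split follows from sparse expert routing: each token's router selects only a few experts per MoE layer, so most weights sit idle on any given forward pass. A minimal sketch of top-k routing, with toy expert counts and sizes (the real Ovis2.6 configuration is not published in the material summarized here):

```python
import math
import random

# Hypothetical dimensions, chosen only to illustrate the total-vs-active
# parameter split; not the actual Ovis2.6 configuration.
NUM_EXPERTS = 64        # experts per MoE layer (assumed)
TOP_K = 2               # experts activated per token (assumed)
EXPERT_PARAMS = 10_000  # parameters per expert (toy number)

def route(router_logits):
    """Pick the top-k experts for one token and softmax their logits
    into mixing weights, as in standard top-k MoE routing."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:TOP_K]
    exps = [math.exp(router_logits[i]) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]
selected = route(logits)

total_params = NUM_EXPERTS * EXPERT_PARAMS
active_params = TOP_K * EXPERT_PARAMS
print(f"active fraction per layer: {active_params / total_params:.3f}")
```

With these toy numbers only 2 of 64 experts run per token, so roughly 3% of expert parameters are active; the 80B/3B ratio in the release implies a similarly aggressive sparsity once shared (non-expert) parameters are accounted for.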
The release introduces "Think with Image" reasoning, in which the model can invoke visual tools such as cropping and rotation mid-inference to re-examine image regions during its chain-of-thought. AIDC-AI positions this as active visual cognition rather than passive input processing, aiming for multi-turn self-correction on complex visual reasoning tasks. The model card emphasizes reinforced OCR, document understanding, and chart analysis, claiming the system both extracts structured data and reasons over it.
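Mechanically, this kind of mid-inference tool use amounts to a dispatch loop: the model emits a tool call, the runtime applies it to the image, and the transformed view is fed back as new visual context. A toy sketch using the two tools the release names (the actual tool-call schema for Ovis2.6 is not published; the call format and tool names here are assumptions, and the "image" is a plain 2D grid):

```python
def crop(img, top, left, height, width):
    """Return a sub-grid of a row-major 2D pixel grid."""
    return [row[left:left + width] for row in img[top:top + height]]

def rotate90(img):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

# Hypothetical tool registry; real systems would expose these to the
# model as callable tools during its chain-of-thought.
TOOLS = {"crop": crop, "rotate90": rotate90}

def run_tool_call(img, call):
    """Dispatch one tool call of the assumed form {'tool': name, 'args': [...]}."""
    return TOOLS[call["tool"]](img, *call["args"])

# 4x4 toy "image"; in the described system, each call would originate
# from the model's own reasoning trace rather than be scripted here.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
view = run_tool_call(image, {"tool": "crop", "args": [1, 1, 2, 2]})
view = run_tool_call(view, {"tool": "rotate90", "args": []})
```

The self-correction claim rests on the feedback edge of this loop: because the transformed view re-enters the context, the model can notice that a crop missed the relevant region and issue another call.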
Ovis2.6-80B-A3B joins a recent wave of AIDC-AI releases including Marco-Mini-Instruct, Marco-Nano-Instruct, Marco-DeepResearch-8B, and Ovis2.6-30B-A3B. The checkpoint is available on HuggingFace under the AIDC-AI organization. The open question is whether the 3B active parameter count holds across all task types or only on average, since MoE routing load can spike when visual and text experts fire simultaneously. AIDC-AI has not yet published evaluation numbers comparing Ovis2.6-80B-A3B to the earlier 30B variant or to other open multimodal MoE models, and the model card does not specify the license. Watch for independent community benchmarks and license clarification in the coming days.