SupraLabs launches 2M-parameter Supra-Mini for edge inference
A new open-source model lab is releasing small language models under 10 million parameters, targeting edge deployment and local inference on consumer hardware.
SupraLabs, a newly founded open-source model lab, announced this week it is training and releasing small language models in the 2–10 million parameter range, explicitly targeting edge devices and local inference.
The first release, Supra-Mini-v4-2M, is a 2-million-parameter checkpoint now available on HuggingFace under an open license. The team says it is focused on "making AI accessible to everyone" by shrinking model size without sacrificing usability for basic tasks. Two further checkpoints are listed as coming soon: StorySupra 10M, a 10-million-parameter storytelling model, and Supra Mini v5 at 5 million parameters.
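If the checkpoint follows standard HuggingFace conventions, running it locally should take only a few lines with the transformers library. A minimal sketch, assuming the repo is named SupraLabs/Supra-Mini-v4-2M (only the org URL was announced) and exposes a causal-LM head, which the announcement does not confirm:

```python
# Minimal local-inference sketch. Assumptions: the repo ID below is a
# guess (only huggingface.co/SupraLabs was announced), and the checkpoint
# is a causal LM compatible with transformers' Auto classes.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "SupraLabs/Supra-Mini-v4-2M"  # hypothetical repo name

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# At 2M parameters the weights fit comfortably in CPU memory; no GPU needed.
inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```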
All weights live at huggingface.co/SupraLabs, and the team is posting updates on a HuggingFace Spaces blog. The announcement positions SupraLabs as a pure open-source effort, recruiting contributors through HuggingFace community threads or direct outreach.
The sub-10M parameter range puts these models well below the smallest widely deployed open weights. For context, Qwen2.5-0.5B sits at 494 million parameters, and even Microsoft's Phi-3-mini lands at 3.8 billion. A 2-million-parameter model is roughly the scale of older embedding networks or specialized task models, not general-purpose chat. Whether a model that small can handle coherent multi-turn conversation or follow complex instructions remains an open question.
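The memory arithmetic makes the edge pitch concrete: at fp16, two bytes per weight, a 2-million-parameter model is about 4 MB, versus roughly 1 GB for Qwen2.5-0.5B and 7.6 GB for Phi-3-mini. A quick back-of-the-envelope check, counting weights only (real footprints add activations, tokenizer files, and runtime overhead):

```python
# Rough fp16 weight footprints (2 bytes per parameter), weights only.
# Parameter counts taken from the comparison above.
models = {
    "Supra-Mini-v4-2M": 2e6,
    "Qwen2.5-0.5B": 494e6,
    "Phi-3-mini": 3.8e9,
}
for name, params in models.items():
    print(f"{name}: ~{params * 2 / 1e6:,.0f} MB at fp16")
```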
No benchmark numbers, architecture details, or training corpus information were shared in the announcement. The team didn't specify what tasks the 2M checkpoint was trained on, what tokenizer it uses, or how it compares to existing small-model baselines like TinyLlama or MobileLLM. The 2M checkpoint is downloadable now; the 5M and 10M variants are listed as coming soon with no release timeline.
The pitch is clear: models small enough to run on edge hardware without a GPU, trained and released in the open. Whether the performance justifies the size trade-off is the question practitioners will answer once they download the weights and run inference locally.
