BitCPM4-CANN 1B, 3B, 8B land on HuggingFace with 1.58-bit weights
OpenBMB released three BitNet-architecture language models this week at 1B, 3B, and 8B parameters, all quantized to 1.58 bits per weight and hosted on HuggingFace.
BitCPM4-CANN is a family of 1.58-bit language models from OpenBMB that landed on HuggingFace on May 18 in three sizes: 1B, 3B, and 8B parameters. The models use the BitNet architecture, which represents weights at extremely low precision: about 1.58 bits per parameter, versus 16 bits in a standard FP16 checkpoint or 8 bits in a typical quantized one. The 1.58 figure comes from ternary weights, where each weight takes one of three values (-1, 0, or +1) and log2(3) ≈ 1.58 bits. Standard 8-bit quantization cuts weight memory roughly in half compared to FP16; 1.58-bit weights cut it by roughly a factor of ten, so the weights of an 8B model could potentially fit in under 2 GB of VRAM (8 billion × 1.58 bits ≈ 1.6 GB). That compression matters for practitioners running models on consumer hardware: a MacBook with 16 GB of unified memory could theoretically load the 8B checkpoint with room left for context and system overhead. The tradeoff is precision. Fewer bits per weight means a coarser representation of learned patterns, which can degrade output quality compared to higher-precision checkpoints of the same parameter count.
OpenBMB published all three checkpoints under open-weight licenses, but benchmark results and training corpus size are not yet public. Without published perplexity scores or MMLU numbers, it's unclear how the 1.58-bit quantization affects accuracy relative to standard 8-bit or 16-bit models at similar scale. The immediate blocker for local users is llama.cpp support: Jan, a popular desktop LLM client built on llama.cpp, cannot yet load the BitCPM4-CANN weights because the underlying runtime hasn't added support for the BitNet format. Once llama.cpp merges BitNet inference, the models should slot into existing workflows that already handle Llama, Mistral, and Qwen checkpoints. All three are available now at huggingface.co/openbmb/BitCPM4-CANN-{1B,3B,8B}.
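For readers who want to script the download, the brace pattern in the URL expands to three repository IDs. The sketch below assumes the repos are named exactly as that pattern suggests, which this article does not independently confirm. The `fetch` helper is hypothetical; it relies on `snapshot_download` from the third-party `huggingface_hub` package and is defined but not called here, since it needs network access.

```python
SIZES = ("1B", "3B", "8B")


def repo_ids(org: str = "openbmb", family: str = "BitCPM4-CANN") -> list[str]:
    """Expand the {1B,3B,8B} pattern into Hub repo IDs (assumed naming)."""
    return [f"{org}/{family}-{size}" for size in SIZES]


def fetch(repo_id: str) -> str:
    """Download a full repo snapshot and return its local path.

    Requires `pip install huggingface_hub` and network access.
    """
    from huggingface_hub import snapshot_download  # imported lazily on purpose

    return snapshot_download(repo_id=repo_id)


for rid in repo_ids():
    print(f"https://huggingface.co/{rid}")
```

Calling `fetch(repo_ids()[0])` would pull the 1B checkpoint into the local Hub cache, which is a reasonable first test given its small size.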
