Qwen3-VL-8B NSFW Caption V4.5 gets 4-bit MLX quantization for Apple Silicon

gavinmroy has released a 4-bit MLX quantization of disty0's Qwen3-VL-8B NSFW Caption V4.5, an uncensored image-to-text model distributed under the Apache 2.0 license.

By Alex Sokoloff · July 2, 2026

gavinmroy released a 4-bit quantized version of Qwen3-VL-8B NSFW Caption V4.5 on Hugging Face, bringing the uncensored image captioning model to Apple Silicon users with a reduced memory footprint. The checkpoint is a direct quantization of disty0's base model, packaged in MLX-compatible safetensors format for local inference on M-series Macs.

Qwen3-VL-8B NSFW Caption V4.5 is an image-text-to-text model that generates detailed captions for images without content filtering. The base model has 8 billion parameters; 4-bit affine quantization cuts weight memory by roughly 70 to 75 percent relative to a 16-bit checkpoint once the stored scale and bias values are counted, typically with little loss in caption quality for most use cases. The model is licensed under Apache 2.0, which permits both commercial and personal use.
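The memory saving can be checked with a little arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement of this checkpoint: it assumes all 8 billion parameters are quantized, that the comparison baseline is 16-bit weights, and that each group of 64 weights carries one 16-bit scale and one 16-bit bias. The function name and its parameters are illustrative.

```python
def quantized_weight_bytes(n_params, bits=4, group_size=64, meta_bits=16):
    """Estimate weight storage for group-wise affine quantization.

    Each group of `group_size` weights stores one scale and one bias,
    each `meta_bits` wide, on top of the packed `bits`-wide weights.
    """
    packed = n_params * bits / 8                 # packed integer weights
    n_groups = n_params / group_size             # number of weight groups
    metadata = n_groups * 2 * meta_bits / 8      # one scale + one bias per group
    return packed + metadata


N_PARAMS = 8_000_000_000  # nominal parameter count (assumption: all quantized)

fp16_bytes = N_PARAMS * 2                        # 16-bit baseline
q4_bytes = quantized_weight_bytes(N_PARAMS)
reduction = 1 - q4_bytes / fp16_bytes

print(f"16-bit weights: {fp16_bytes / 1e9:.1f} GB")
print(f"4-bit weights:  {q4_bytes / 1e9:.1f} GB")
print(f"reduction:      {reduction:.1%}")
```

Under these assumptions the estimate works out to about 4.5 GB of weights against 16 GB, a reduction of just under 72 percent. Real checkpoints often keep some layers at higher precision, and inference also needs memory for activations and the key-value cache, so actual usage will be higher.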

Technical details

The checkpoint uses the MLX framework's native affine quantization scheme, which stores weights as 4-bit integers together with a scale and a bias for each small group of weights (64 by default in MLX) rather than one pair per channel. This preserves accuracy better than naive 4-bit rounding over a single global range, though users should expect minor degradation in edge cases compared to the full-precision checkpoint. The model supports conversational prompting, allowing iterative refinement of captions through follow-up queries. Upstream Qwen3-VL models advertise a native context window of 256K tokens for combined image and text input, though the practical limit on a Mac depends on available memory.
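To make the idea of affine quantization concrete, here is a minimal pure-Python sketch of a group-wise quantize and dequantize round trip. It illustrates the general technique (map each group's minimum and maximum onto the integer range, then store a scale and a bias), assuming a scale and bias per group as described in MLX's quantization documentation. It is not MLX's packed implementation, and the function names and the tiny group size of 4 are chosen only for illustration.

```python
def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo             # lo plays the role of the bias


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from integer codes."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=4, bits=4):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Two groups with very different ranges: a single global range would waste
# most of the 16 levels on the second group, while per-group ranges do not.
weights = [-0.8, -0.1, 0.3, 0.7, 0.01, 0.02, 0.03, 0.04]
groups = quantize(weights, group_size=4)

restored = [
    w for codes, scale, bias in groups
    for w in dequantize_group(codes, scale, bias)
]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print("codes per group:", [codes for codes, _, _ in groups])
print("max reconstruction error:", max_err)
```

The reconstruction error in each group is bounded by half that group's scale, which is why smaller groups with their own ranges track the original weights more closely than one range shared across a whole tensor.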

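Users on Apple Silicon can load the weights with an MLX-based loader such as the mlx-vlm package. MLX-format quantized safetensors are generally not a drop-in checkpoint for the transformers library, which remains the usual route to the original full-precision model.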
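One way to run an MLX vision-language checkpoint locally is the mlx-vlm package. The sketch below follows the usage pattern from that project's README and assumes this checkpoint is compatible with it, which the release notes do not confirm. The helper name caption_image is hypothetical, and model_path must be replaced with the actual Hugging Face repository ID or a local directory, since the exact ID is not given here. The imports sit inside the function so the file can be imported on machines without MLX installed.

```python
def caption_image(model_path, image_path, prompt="Describe this image in detail."):
    """Caption one image with an MLX vision-language checkpoint.

    Assumes `pip install mlx-vlm` on an Apple Silicon Mac. `model_path` is a
    Hugging Face repo ID or local directory supplied by the caller.
    """
    # Imported lazily: these modules are only available where mlx-vlm is installed.
    from mlx_vlm import load, generate
    from mlx_vlm.prompt_utils import apply_chat_template
    from mlx_vlm.utils import load_config

    model, processor = load(model_path)
    config = load_config(model_path)

    # Wrap the plain prompt in the model's chat template with one image slot.
    formatted = apply_chat_template(processor, config, prompt, num_images=1)
    result = generate(model, processor, formatted, [image_path], verbose=False)

    # Depending on the mlx-vlm version, the result is either a plain string
    # or an object exposing the generated text as `.text`.
    return getattr(result, "text", result)
```

Calling caption_image with the repository ID from the model page and a local image path should return the caption as a string. The mlx-vlm API has changed between releases, so the installed version's README is the reference if a call fails.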
