Qwen3-VL-8B NSFW Caption V4.5 gets 4-bit MLX quantization for Apple Silicon
gavinmroy released a 4-bit MLX quantization of disty0's Qwen3-VL-8B NSFW Caption V4.5, an uncensored image-to-text model under the Apache 2.0 license.
gavinmroy released a 4-bit quantized version of Qwen3-VL-8B NSFW Caption V4.5 on Hugging Face, bringing the uncensored image captioning model to Apple Silicon users with a reduced memory footprint. The checkpoint is a direct quantization of disty0's base model, packaged as MLX-compatible safetensors files for local inference on M-series Macs.
Qwen3-VL-8B NSFW Caption V4.5 is an image-text-to-text model that generates detailed captions for images without content filtering. The base model has about 8 billion parameters; 4-bit affine quantization cuts weight memory by close to 75 percent relative to 16-bit weights (slightly less once the stored scaling factors are counted), and is intended to preserve caption quality for most use cases. The model is licensed under Apache 2.0, which permits both commercial and personal use.
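The memory saving can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes a nominal 8 billion parameters, 16-bit weights in the original checkpoint, and a group size of 64 with one 16-bit scale and one 16-bit offset per group. None of those settings are confirmed for this particular checkpoint, and layers left unquantized would raise the real figure.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given average bit width."""
    return n_params * bits_per_weight / 8 / 2**30


N_PARAMS = 8e9    # nominal "8B" count (assumption; exact count not stated)
GROUP_SIZE = 64   # assumed quantization group size

# Each weight costs 4 bits, plus a 16-bit scale and a 16-bit offset
# shared across every group of GROUP_SIZE weights.
effective_bits = 4 + (16 + 16) / GROUP_SIZE

fp16_gib = weight_memory_gib(N_PARAMS, 16)
q4_gib = weight_memory_gib(N_PARAMS, effective_bits)
reduction = 1 - q4_gib / fp16_gib

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {q4_gib:.1f} GiB ({effective_bits} bits per weight)")
print(f"reduction:      {reduction:.1%}")
```

Under these assumptions the effective cost is 4.5 bits per weight, so the reduction lands a little under the headline 75 percent.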
Technical details
The checkpoint uses the MLX framework's native affine quantization, which stores weights as 4-bit integers together with a scale and an offset for each small group of weights (64 per group by default in MLX). This balances compression and accuracy better than naive 4-bit rounding, though users should expect minor degradation in edge cases compared with the full-precision checkpoint. The model supports conversational prompting, allowing iterative refinement of captions through follow-up queries. The upstream Qwen3-VL release is documented with a long native context window (256K tokens) for combined image and text input; in practice, usable context on a Mac is bounded by available memory.
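To make the idea of affine quantization concrete, here is a minimal pure-Python sketch that quantizes a flat list of weights in fixed-size groups, storing one scale and one offset per group. It is an illustration of the general technique, not MLX's actual implementation; the function names and the group size of 64 are assumptions chosen for the example.

```python
import random


def quantize_group(values, bits=4):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2**bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid divide-by-zero
    offset = lo
    codes = [min(levels, max(0, round((v - offset) / scale))) for v in values]
    return codes, scale, offset


def quantize(weights, group_size=64, bits=4):
    """Quantize a flat list group by group; returns (codes, scale, offset) tuples."""
    return [
        quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Reconstruct approximate floats from the stored codes, scales and offsets."""
    restored = []
    for codes, scale, offset in groups:
        restored.extend(code * scale + offset for code in codes)
    return restored


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]  # synthetic weights

groups = quantize(weights, group_size=64, bits=4)
restored = dequantize(groups)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_err:.6f}")
```

Because each group gets its own range, the rounding error is bounded by half of that group's scale, which is why smaller groups trade extra storage for better accuracy.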
Users on Apple Silicon would typically load the weights with an MLX-based loader such as the mlx-vlm package; the Hugging Face transformers library does not read MLX-quantized weights directly.





