Mero-Artemis 31B v0.3.1 GGUF quantizations arrive for local uncensored inference
Two GGUF quantization sets of the 31-billion-parameter Mero-Artemis v0.3.1 merge appeared on HuggingFace this week, tagged not-for-all-audiences and built from sophosympatheia's base weights.
Both releases landed on May 14, uploaded by mradermacher.
Mero-Artemis 31B v0.3.1 is a 31-billion-parameter English-language model built with mergekit, a toolkit for combining pretrained weights into new checkpoints. The base weights carry a not-for-all-audiences tag, signaling unrestricted content generation without safety filters. Two separate GGUF repos now offer quantized versions: mradermacher/Mero-Artemis-31B-v0.3.1-GGUF and mradermacher/Mero-Artemis-31B-v0.3.1-i1-GGUF, posted within twenty minutes of each other.
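Fetching a single quant file from either repo takes a few lines with the huggingface_hub library. A minimal sketch follows; the filename matches mradermacher's usual naming pattern but is an assumption, so check the repo's file list before running it.

```python
# Sketch: download one quant file from the new GGUF repo via
# huggingface_hub. The filename is hypothetical -- verify it against
# the actual file list on the repo page first.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="mradermacher/Mero-Artemis-31B-v0.3.1-GGUF",
    filename="Mero-Artemis-31B-v0.3.1.Q4_K_M.gguf",  # hypothetical filename
)
print(path)  # local cache path of the downloaded GGUF
```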
The GGUF format enables CPU inference and partial GPU offload on consumer hardware, a common path for running large uncensored merges locally without enterprise GPU clusters. Quantization trades some precision for a dramatically lower memory footprint: 31 billion parameters at full FP16 would require roughly 62 GB of VRAM, while aggressive GGUF quants can squeeze the same model onto 24 GB or less. Neither repo has logged downloads or likes yet, suggesting the releases are fresh.
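A back-of-envelope calculation makes that trade-off concrete. The bits-per-weight figures below are approximate community estimates for common llama.cpp quant types, not numbers from the model cards, and they exclude KV-cache and runtime overhead.

```python
# Rough memory footprint of 31B parameters at common llama.cpp
# quantization levels. Bits-per-weight values are approximate
# community figures, not values from the Mero-Artemis cards.
PARAMS = 31e9

quants = {
    "FP16":   16.0,   # unquantized half precision
    "Q8_0":    8.5,   # near-lossless 8-bit
    "Q5_K_M":  5.7,   # medium 5-bit k-quant
    "Q4_K_M":  4.8,   # popular quality/size balance
    "IQ3_XS":  3.3,   # aggressive imatrix 3-bit
}

for name, bpw in quants.items():
    gb = PARAMS * bpw / 8 / 1e9
    print(f"{name:>7}: ~{gb:5.1f} GB")
# FP16 works out to ~62 GB, Q4_K_M to ~18.6 GB: comfortably
# inside a 24 GB card, matching the figures above.
```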
The model cards do not detail the merge recipe, benchmark scores, or training corpus. That metadata would typically live in sophosympatheia's original base repo; the quantized forks focus solely on inference-ready weights. The not-for-all-audiences tag is a HuggingFace convention for models that ship without content moderation, a label that has become standard for community merges targeting roleplay, creative writing, and other use cases where safety tuning interferes with output quality.
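Readers who want to verify the tagging themselves can query the Hub API, which exposes a repo's tag list directly:

```python
# Sketch: inspect a repo's tags (including not-for-all-audiences)
# programmatically via the HuggingFace Hub API.
from huggingface_hub import model_info

info = model_info("mradermacher/Mero-Artemis-31B-v0.3.1-GGUF")
print(info.tags)  # e.g. ['gguf', 'mergekit', 'not-for-all-audiences', ...]
```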
Uploader mradermacher has built a reputation for rapid GGUF conversions of popular open-weight releases, often posting multiple quantization profiles within hours of an upstream merge. The dual repos here, one standard and one labeled i1, likely correspond to static quants versus imatrix (importance-matrix) quants, though the cards do not spell out the distinction. Users typically test both and pick the version that best balances perplexity against their hardware's VRAM budget.
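One practical way to run that comparison is to load each variant with llama-cpp-python under the same offload settings. A minimal sketch, assuming a Q4_K_M file has already been downloaded; the filename, layer count, and context size are illustrative, not values from the repos.

```python
# Sketch: load a quantized GGUF with llama-cpp-python, offloading
# part of the model to GPU. Parameters are illustrative; n_gpu_layers
# is tuned to whatever fits the local VRAM budget.
from llama_cpp import Llama

llm = Llama(
    model_path="Mero-Artemis-31B-v0.3.1.Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=30,   # offload as many layers as VRAM allows; -1 = all
    n_ctx=4096,        # context window
)

out = llm("Write one sentence about local inference.", max_tokens=48)
print(out["choices"][0]["text"])
```

Running the same prompt set against the standard and i1 files at matched quant levels is the usual way to judge whether the imatrix variant's quality gain is noticeable on a given workload.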
The 31-billion-parameter class sits between the 13B models that run comfortably on gaming GPUs and the 70B+ giants that demand multi-GPU rigs or high-end workstations. For practitioners chasing uncensored inference without cloud API costs, this size bracket offers a practical middle ground.
