EtherealRainbow v0.3 8B quantized for local inference on consumer GPUs
Beathazard released EtherealRainbow v0.3 8B in IQ4_XS GGUF format, a quantized merge optimized for llama.cpp and tagged not-for-all-audiences.
EtherealRainbow v0.3 8B, an uncensored language model from Beathazard, is now available in IQ4_XS GGUF format on HuggingFace. The quantized weights target llama.cpp users running inference on consumer hardware, with the IQ4_XS compression scheme trading precision for memory savings. The model card carries a not-for-all-audiences tag, signaling unrestricted output capability.
The release is a mergekit product—Beathazard combined multiple base models or fine-tunes into a single checkpoint using the standard open-weight blending toolkit. Mergekit lets creators splice together instruction tuning, domain expertise, and personality traits from different sources without retraining from scratch. The resulting models often occupy niches that commercial vendors won't touch: uncensored creative writing, roleplay, or domain-specific tasks that fall outside mainstream safety guidelines.
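As a rough illustration of how a merge like this is assembled, the sketch below writes a minimal mergekit configuration and hands it to the mergekit-yaml command-line tool. The merge method, the source model names, and the blend weights are placeholders for illustration only; Beathazard's actual recipe may differ and is not described here.

```python
# Hypothetical sketch: producing a merged checkpoint with mergekit.
# The merge method, source models, and weights are illustrative placeholders,
# not EtherealRainbow's actual recipe.
import subprocess
from pathlib import Path

config = """\
models:
  - model: example-org/llama-3-8b-instruct      # placeholder base model
    parameters:
      weight: 1.0
  - model: example-org/llama-3-8b-creative      # placeholder fine-tune
    parameters:
      weight: 0.4
merge_method: linear
dtype: bfloat16
"""

Path("merge-config.yml").write_text(config)

# mergekit-yaml reads the config and writes the merged weights to ./merged
subprocess.run(["mergekit-yaml", "merge-config.yml", "./merged"], check=True)
```

The merged checkpoint in ./merged would then be converted to GGUF and quantized with llama.cpp's tooling before a release like this one.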
IQ4_XS is one of the more aggressive quantization levels in the GGUF family, typically landing around 4 bits per weight with importance-weighted rounding. That compression lets an 8B-parameter model fit comfortably in 4–6 GB of VRAM, making it accessible to users with mid-range consumer GPUs or even CPU-only setups willing to tolerate slower token generation. The model supports English and ships as a standard GGUF file, making it drop-in compatible with text-generation-webui, KoboldCpp, and other llama.cpp frontends.
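For a sense of what running the quant looks like in practice, here is a minimal sketch using the llama-cpp-python bindings. The GGUF filename, context size, and prompt are assumptions; check the repository listing for the actual filename and lower n_gpu_layers if VRAM is tight.

```python
# Minimal sketch of loading an IQ4_XS GGUF locally with llama-cpp-python.
# The filename below is a guess at the repo's naming convention; use the
# actual file from the HuggingFace listing.
from llama_cpp import Llama

# Back-of-envelope size check: 8e9 weights * ~4.25 bits / 8 bits per byte
# is roughly 4.25 GB for the weights alone, before KV cache and overhead,
# which is why 4-6 GB of VRAM is the usual target for this quant.
approx_gb = 8e9 * 4.25 / 8 / 1e9
print(f"Approximate weight footprint: {approx_gb:.1f} GB")

llm = Llama(
    model_path="EtherealRainbow-v0.3-8B.IQ4_XS.gguf",  # assumed filename
    n_ctx=4096,        # context window; raise if memory allows
    n_gpu_layers=-1,   # offload all layers to the GPU; reduce on smaller cards
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Write a two-sentence scene set on a rainy pier."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```

The same file loads without code changes in text-generation-webui or KoboldCpp, since both wrap the same llama.cpp loader.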
