Sakiko Prompt Gen v2.0 preview1 arrives as Q4_K_M GGUF for local multilingual generation
lys1os released a quantized GGUF checkpoint of Sakiko Prompt Gen v2.0 preview1 in Q4_K_M format, supporting multilingual text generation in Chinese and Japanese via llama.cpp.
lys1os released Sakiko-Prompt-Gen-v2.0-preview1-Q4_K_M-GGUF on HuggingFace on May 14, a quantized checkpoint designed for prompt generation workflows. The model ships in GGUF format at Q4_K_M precision, optimized for local inference via llama.cpp. It supports multilingual text generation with explicit Chinese and Japanese language tags, positioning it for cross-language creative prompting tasks.
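A minimal sketch of that local workflow, assuming llama-cpp-python as the runtime; the GGUF filename, sampling settings, and bilingual prompt are illustrative assumptions, since the model card does not document a chat template or exact file name:

```python
# Minimal local-inference sketch with llama-cpp-python.
# The GGUF filename and sampling settings are assumptions,
# not values taken from the model card.
from llama_cpp import Llama

llm = Llama(
    model_path="sakiko-prompt-gen-v2.0-preview1-q4_k_m.gguf",  # assumed filename
    n_ctx=4096,        # context window; adjust to available memory
    n_gpu_layers=-1,   # offload all layers to GPU if one is present
)

# Plain completion with a bilingual instruction (Chinese, then Japanese):
# "Generate a painting prompt for a cyberpunk city night scene."
out = llm(
    "请为一幅赛博朋克城市夜景生成一段绘画提示词。\n"
    "サイバーパンクの夜景のための画像プロンプトを書いてください。\n",
    max_tokens=256,
    temperature=0.8,
)
print(out["choices"][0]["text"])
```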
The release carries a "not-for-all-audiences" tag on HuggingFace, signaling that outputs may be unsuitable for general audiences. Because open-weight GGUF models run locally, no server-side safety layer filters their output, which makes them viable for unrestricted prompt engineering. The Q4_K_M quantization balances memory footprint against output quality, targeting consumer-grade hardware.
Local inference and compatibility
GGUF is the native weight format for llama.cpp, the CPU/GPU inference engine that powers Ollama, LM Studio, and dozens of community tools. Q4_K_M uses 4-bit quantization with higher precision retained on select tensors, a common trade-off that lets 7B–13B parameter models, which would demand 16–24 GB of VRAM at full precision, run on consumer-grade cards. The model card does not list a parameter count or base architecture, but the "transformers" and "llama-cpp" tags suggest a Llama-derived backbone.
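A back-of-the-envelope check of that memory claim: Q4_K_M averages roughly 4.85 bits per weight once the higher-precision tensors are counted. The sketch below applies that rate to hypothetical 7B and 13B parameter counts, which are assumptions, since the card states no model size:

```python
# Back-of-the-envelope GGUF size estimate. The ~4.85 bits/weight figure
# for Q4_K_M is an approximation, and the parameter counts are hypothetical
# because the model card does not state the model's size.
BITS_PER_WEIGHT = 4.85  # approximate effective rate for Q4_K_M

def gguf_size_gb(n_params_billion: float) -> float:
    """Estimated quantized file size in GB for a given parameter count."""
    return n_params_billion * 1e9 * BITS_PER_WEIGHT / 8 / 1e9

for n in (7, 13):
    print(f"{n}B params -> ~{gguf_size_gb(n):.1f} GB")
# 7B  -> ~4.2 GB
# 13B -> ~7.9 GB
```

Either size fits comfortably on a mid-range consumer GPU, leaving headroom for the KV cache.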
The checkpoint is available now at lys1os/Sakiko-Prompt-Gen-v2.0-preview1-Q4_K_M-GGUF. No license information appears in the card metadata; users should verify terms before commercial deployment.
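For fetching the file programmatically, one option is the huggingface_hub client; the exact GGUF filename inside the repo is an assumption here, so verify it against the repo's file listing:

```python
# Download sketch using huggingface_hub. The filename is an assumption
# about the repo's naming; check the repo's "Files" tab for the real one.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="lys1os/Sakiko-Prompt-Gen-v2.0-preview1-Q4_K_M-GGUF",
    filename="sakiko-prompt-gen-v2.0-preview1-q4_k_m.gguf",  # assumed name
)
print(path)  # local cache path, ready to pass to llama.cpp
```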
