What Pony Diffusion V6 XL Actually Is
Pony Diffusion V6 XL is an SDXL 1.0 fine-tune released in January 2024 by AstraliteHeart and the PurpleSmartAI team, trained on roughly 2.6 million curated images spanning anthro, feral, human, cartoon, and anime sources. It is the most-downloaded uncensored image checkpoint in the open-source ecosystem, and the first SDXL derivative that produced explicit, anatomically coherent output by default — no LoRA stack, no jailbreak prompt, no negative-prompt incantations to bypass safety alignment. It does not have safety alignment to bypass.
That single design decision — train an SDXL fine-tune on a dataset that was never filtered for adult content in the first place — is why everything downstream of Pony exists. The score-tag prompt syntax, the source-tag system, the 15,000-plus LoRA ecosystem, the 2024 schism with Civitai, and the eventual mass migration to Illustrious XL all trace back to the fact that someone shipped a competent SDXL anime-and-anthro model that didn't pretend it was for something else.
The base architecture is unmodified SDXL 1.0: the same 2.6B-parameter UNet, the same dual text encoders (CLIP-L and OpenCLIP bigG), the same VAE. What changed is the weights themselves. The way the model responds to prompt tokens is reshaped by the score-tag training regime (the tokenizer itself is untouched), the latent space is pulled hard toward illustrated subjects, and the model lost most of SDXL's ability to render photorealistic landscapes or text. In exchange it gained near-total command of character anatomy, pose, and explicit content, which turned out to be the trade most of the open-source community wanted.
Why It Broke Civitai
For most of the first half of 2024, Pony Diffusion V6 XL and its derivatives dominated Civitai's NSFW model rankings, image feed, and front-page resource lists. Then the rankings changed. In mid-2024 Civitai adjusted how it surfaced models on the public NSFW front pages — the exact algorithmic detail was never fully published, but the visible effect was that Pony-base resources stopped appearing where they had previously been default. Some Pony fine-tunes were quietly age-gated or moved behind authentication walls; others lost rank weight; the model page itself remained accessible but several derivatives were repeatedly delisted, restored, and delisted again over a period of weeks.
AstraliteHeart's public response was measured but unambiguous — the position was that the platform was making editorial decisions inconsistent with how the same content had been treated for the prior year, and that fine-tune authors who had built audiences on Civitai had legitimate cause to look elsewhere. The team did not pull Pony from Civitai outright, and the official model page is still hosted there, but the public messaging shifted toward Hugging Face mirrors and the PurpleSmartAI infrastructure for distribution.
What followed was a quiet but consequential migration. Several of the most prolific fine-tune authors who had previously trained on Pony — the same group responsible for most of the popular checkpoints in the realistic-anime space — switched their next-generation models to a different SDXL fine-tune called Illustrious XL, released by OnomaAI in late 2024. Illustrious had a cleaner training set focused on Danbooru-tagged anime, no score-tag system, and a base license that fine-tune authors found easier to redistribute under. Within roughly six months of the Civitai ranking change, the center of gravity for new anime-leaning fine-tunes had moved from Pony base to Illustrious base. Pony itself kept its furry and anthro audience, kept its photorealistic-women fine-tune ecosystem, and kept its dominant download counts, but it stopped being the default substrate for new work.
Inside the 2.6M-Image Dataset
The 2.6 million figure is what makes Pony structurally different from every other SDXL fine-tune in its weight class. Most aesthetic SDXL fine-tunes train on between 100,000 and 800,000 images, often heavily filtered to a single style domain — anime only, or photorealistic only, or one specific 3D-render aesthetic. Pony's dataset crosses domains deliberately. The curation pulled from anthro art (the historical PurpleSmartAI corpus), feral animal art, Western cartoon styles, full anime, and a substantial slice of human-figure illustration and photography-adjacent renders. Everything got retagged into a unified vocabulary before training.
That unified vocabulary is what the source_pony, source_anime, source_furry, and source_cartoon tags actually control. They are not stylistic suggestions layered on top of a generic model — they are the training-time bucket labels the model used to organize its visual concepts, exposed at inference. Prompting with source_anime shifts the latent toward the anime-tagged slice of the training distribution; combining source_anime with source_furry produces the specific anthro-anime hybrid that Pony renders better than any other public SDXL checkpoint. Character tags and artist tags work the same way — the model was trained on Danbooru-style tag vocabulary with characters and artists named explicitly, which is why specific-character generation works without a LoRA for a surprising fraction of well-known subjects.
The strength of cross-domain training is style transfer that no single-domain model can match. The weakness is bleed. Prompt for a photorealistic human and you will sometimes get subtly cartoon-proportioned anatomy, slightly oversized eyes, or a faint illustration-grade rim light the model picked up from its anime slice. Most of the human-realism fine-tunes built on Pony exist specifically to push the latent away from this bleed, which is also why those fine-tunes work — the photorealistic concepts are in there, just not on the surface.
The base license is Fair AI Public License 1.0-SD, a copyleft variant intended for AI model weights. In practice it permits commercial and non-commercial use, allows fine-tuning and redistribution, and requires that derivative model weights remain under the same license. It is not OpenRAIL — there are no use-case restrictions baked into the license text — which is part of why fine-tune authors adopted it readily.
The score_9 Tag System (And Why You Have To Use It)
Every Pony prompt in the wild starts with some variant of score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up. This is not a magic incantation and it is not aesthetic ranking. During training, every image in the dataset was assigned a quality score from 1 to 9 based on a mixture of community ratings and automated quality-classifier output, and the score was injected into the caption as a tag. The _up suffix is cumulative — score_7_up was attached to every image scoring 7 or higher, score_6_up to every image scoring 6 or higher, and so on. So a single high-quality image carried multiple stacked score tags during training.
At inference, prompting with the full score stack tells the model to draw from the intersection of every quality bucket — which in practice means the highest-quality region of the training distribution. Removing the score tags doesn't break the model, but it visibly degrades output. Anatomy gets looser, line quality drops, color saturation flattens, and the kind of small-detail competence that distinguishes Pony from base SDXL evaporates.
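The cumulative tagging and the "intersection" idea can be sketched in a few lines. This is a toy model, not the actual captioning pipeline: it assumes every image carries an exact `score_N` tag plus a `score_K_up` tag for each threshold K from 4 to 8 that it meets, which is inferred from the prompt stack rather than confirmed by the authors. The function names are hypothetical. The real model conditions softly on these tokens rather than filtering a dataset, so the set logic only shows which slice of the training data shared every tag in the stack.

```python
# Assumed tag scheme, inferred from the prompt stack described above:
# each image gets an exact tag (score_N) plus a cumulative score_K_up tag
# for every threshold K it meets. Thresholds 4..8 are an assumption.
UP_THRESHOLDS = range(4, 9)

FULL_STACK = [
    "score_9", "score_8_up", "score_7_up",
    "score_6_up", "score_5_up", "score_4_up",
]


def training_tags(score: int) -> set[str]:
    """Return the score tags a training image with this score would carry."""
    tags = {f"score_{score}"}
    tags.update(f"score_{k}_up" for k in UP_THRESHOLDS if score >= k)
    return tags


def matches_all(score: int, prompt_tags: list[str]) -> bool:
    """True if an image with this score carries every tag in the prompt."""
    return set(prompt_tags) <= training_tags(score)


# Which score buckets satisfy the full stack? Only the top one.
matching = [s for s in range(1, 10) if matches_all(s, FULL_STACK)]
print(matching)  # [9]

# Dropping score_9 widens the pool to everything scoring 8 or higher.
print([s for s in range(1, 10) if matches_all(s, FULL_STACK[1:])])  # [8, 9]
```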
The negative-prompt counterparts are the inverse — score_6, score_5, score_4, score_3, score_2, score_1 in the negative prompt, sometimes accompanied by worst quality, low quality, normal quality. Some users add source_pony, source_furry to the negative prompt when they want to push the output toward anime or human-realism and away from anthro defaults; this works but is a blunt instrument and tends to also strip the model's anatomical strengths.
A clean baseline prompt looks like this:
```text
score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up, source_anime, 1girl, solo, standing in a forest clearing, soft morning light, detailed background, looking at viewer, casual outfit
```
Negative:
```text
score_6, score_5, score_4, score_3, score_2, score_1, worst quality, low quality, blurry, jpeg artifacts, watermark, text, signature
```
For a more illustrative hybrid pull:
```text
score_9, score_8_up, score_7_up, source_anime, source_cartoon, 1boy, urban rooftop at dusk, neon signage, leather jacket, dynamic pose, cinematic lighting, depth of field
```
The score stack is verbose, ugly, and non-negotiable. Every Pony fine-tune in the catalog inherits it.
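Because the prefix never changes, it is easy to template. The following is a minimal sketch of one way to do that in plain Python. The helper names `build_prompt` and `build_negative` are hypothetical and not part of any frontend; the tag lists are copied from the examples above.

```python
SCORE_PREFIX = [
    "score_9", "score_8_up", "score_7_up",
    "score_6_up", "score_5_up", "score_4_up",
]
NEGATIVE_SCORES = ["score_6", "score_5", "score_4", "score_3", "score_2", "score_1"]
SOURCE_TAGS = {"source_pony", "source_anime", "source_furry", "source_cartoon"}


def build_prompt(subject_tags, sources=()):
    """Join the score prefix, source tags, and subject tags into one prompt."""
    unknown = set(sources) - SOURCE_TAGS
    if unknown:
        raise ValueError(f"unknown source tags: {sorted(unknown)}")
    return ", ".join([*SCORE_PREFIX, *sources, *subject_tags])


def build_negative(extra_tags=()):
    """Join the low-score tags with any extra negative tags."""
    return ", ".join([*NEGATIVE_SCORES, *extra_tags])


positive = build_prompt(
    ["1girl", "solo", "standing in a forest clearing", "soft morning light"],
    sources=["source_anime"],
)
negative = build_negative(["worst quality", "low quality", "blurry"])
print(positive)
print(negative)
```

Validating the source tags up front catches typos such as `source_animes`, which would otherwise be passed to the model as an unrecognized tag.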
How to Run It Locally
The checkpoint file is roughly 6.78 GB in fp16 safetensors format. The SDXL VAE is baked in — no separate VAE file required, and using an external SDXL VAE will produce identical output to the baked-in one.
The hardware floor is 8 GB of VRAM at native SDXL resolution (1024×1024) with no hires fix, using a recent build of Forge or ComfyUI with their default memory optimizations. 12 GB is the comfortable practical minimum if you want to run hires fix at 1.5x without offloading. 16 GB or more is what you want for batch generation, ControlNet stacks, or running multiple LoRAs simultaneously.
Tooling support is universal across the SDXL frontend ecosystem — A1111, Forge, ComfyUI, Fooocus, SD.Next, and InvokeAI all load Pony as a standard SDXL checkpoint with no special handling. ComfyUI gets the most use in production workflows because its node graph makes the score-tag prefix easy to template; Forge is the most common desktop-user choice because it handles the SDXL memory profile better than upstream A1111.
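Outside the GUI frontends, a standard SDXL checkpoint can also be loaded from a script. The sketch below assumes the Hugging Face `diffusers` library and PyTorch with a CUDA GPU, neither of which the frontends above require. The checkpoint path is a hypothetical placeholder. The resolution, step count, and CFG value are picked from within the ranges in the settings table that follows. Clip skip is left at the library default, because `diffusers` numbers that setting differently from the A1111-style frontends and a direct mapping should not be assumed.

```python
from pathlib import Path

# Hypothetical local path; point this at wherever the checkpoint was saved.
CHECKPOINT = Path("models/ponyDiffusionV6XL.safetensors")

# Values chosen from inside the recommended ranges; adjust to taste.
SETTINGS = {
    "width": 832,
    "height": 1216,
    "num_inference_steps": 28,
    "guidance_scale": 6.0,
}


def generate(prompt: str, negative_prompt: str, seed: int = 0):
    """Load the checkpoint as a plain SDXL pipeline and render one image."""
    # Imported lazily so SETTINGS can be inspected without a GPU stack installed.
    import torch
    from diffusers import EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_single_file(
        str(CHECKPOINT), torch_dtype=torch.float16
    )
    # "Euler a" in the frontends corresponds to the Euler ancestral scheduler.
    pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    pipe.to("cuda")

    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=generator,
        **SETTINGS,
    )
    return result.images[0]
```

Calling `generate(positive, negative).save("out.png")` with a score-prefixed prompt would write a single image. Hires fix and LoRA loading are deliberately left out of this sketch.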
Recommended sampler settings, validated across the fine-tune ecosystem:
| Setting | Recommended | Notes |
|---|---|---|
| Sampler | Euler a or DPM++ 2M Karras | Euler a is softer, DPM++ 2M sharper |
| Steps | 25 to 30 | Returns diminish past ~30 steps |
| CFG | 5 to 7 | Above 8 the model burns and over-saturates |
| Resolution | 1024×1024 native | 832×1216 portrait, 1216×832 landscape |
| Clip skip | 2 | Standard for anime-leaning SDXL |
| Hires fix | Latent upscaler, 1.5x, 0.4 denoise | 0.5+ denoise will redraw the image |
The 832×1216 portrait aspect is what most character generation actually uses in practice — full-body framing fits cleanly, and 1.5x hires fix to 1248×1824 gives a usable final resolution without the tiling artifacts that show up past 2x.
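The hires-fix arithmetic is simple enough to check by hand, but a small helper makes it easy to test other base sizes and scale factors. This is an illustrative sketch: the function name `hires_size` is hypothetical, and rounding each side down to a multiple of 8 is an assumption reflecting the 8x downsampling of the SDXL VAE, not a rule taken from any particular frontend.

```python
def hires_size(width: int, height: int, scale: float, multiple: int = 8) -> tuple[int, int]:
    """Scale a base resolution and round each side down to a multiple of `multiple`."""

    def snap(value: float) -> int:
        return int(value) // multiple * multiple

    return snap(width * scale), snap(height * scale)


print(hires_size(832, 1216, 1.5))   # (1248, 1824)
print(hires_size(1216, 832, 1.5))   # (1824, 1248)
print(hires_size(1024, 1024, 1.5))  # (1536, 1536)
```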
For users specifically chasing photorealistic women rather than illustration, Babes By Stable Yogi (Pony) is a more direct fit than base Pony and uses the same prompt syntax.
The Pony Ecosystem
The LoRA count crossed 15,000 at some point in 2025 and is no longer meaningfully countable — Civitai alone hosts the bulk of the public ones, and there are private and Discord-distributed LoRAs on top. Character LoRAs make up the largest share, followed by style LoRAs, followed by concept LoRAs covering specific compositions, outfits, and settings. The score-tag system means every Pony LoRA is implicitly trained against the same prompt-prefix convention, which is part of why the ecosystem composes so cleanly — you can stack three or four LoRAs from different authors and they will all respect the same baseline structural prompt.
The major fine-tune lineage is more interesting than the LoRA count. CyberRealistic Pony was one of the first major realism-focused fine-tunes, eventually splintering into the broader CyberRealistic XL line that runs in parallel with Pony base. AutismMix combined Pony with several other SDXL anime fine-tunes and became a popular generic anime base in its own right. Hassaku started as a Pony fine-tune and later moved to Illustrious base when its author joined the broader migration. The WAI series, with multiple variants targeting different demographics and styles, built specifically on Pony's character-tag fluency. Babes by Stable Yogi and the RealMixPony line both targeted photorealistic women but with different demographic emphases.
What unites all of these is the score-tag prompt structure and the source-tag controls. A user who learns Pony prompting can move between any of these fine-tunes without relearning the syntax — only the negative prompt and the optional source-tag steering need adjustment. This is a meaningful reason Pony's ecosystem has stayed coherent even as individual fine-tunes have diverged.
Where Pony Stops and Illustrious Begins
Illustrious XL is the obvious comparison because it is the model that absorbed most of the post-Civitai migration. The two are both SDXL fine-tunes targeting illustrated content, but they differ in nearly every detail of how they were built and how you prompt them.
| Dimension | Pony Diffusion V6 XL | Illustrious XL v0.1 |
|---|---|---|
| Training set | ~2.6M, multi-domain (anthro, feral, human, anime, cartoon) | ~7.5M, anime/Danbooru focus |
| Prompt syntax | score_9, score_8_up, ... mandatory | Standard Danbooru tags, no score system |
| Anthro/furry | Native, strongest in class | Possible but mediocre |
| Pure anime | Strong with source_anime | Stronger by default |
| Human realism | Bleed-prone, fine-tunes fix it | Cleaner out of the box |
| Character knowledge | Broad but uneven | Deeper for anime-original characters |
| NSFW default | Uncensored by default, no safety alignment | |
Choose Pony when the work is anthro, furry, cartoon-anime hybrid, or anything that benefits from cross-domain style mixing — and when you have a specific Pony fine-tune that already does what you want. Choose Illustrious or its derivatives when the work is pure anime, when you want cleaner human anatomy without a fine-tune layer, or when you are starting a new project in 2026 and want to be on the substrate where most new LoRAs are being trained.
NoobAI XL is worth mentioning as an Illustrious-derived checkpoint that pushed the permissiveness further — its training data overlap with Illustrious is high, but the safety filtering in the captioning was looser, which makes it the closest Illustrious-side analog to Pony's default-uncensored stance.
Alternatives by Use Case
The Pony-vs-everything question collapses by use case. For photorealistic women in commercial-shoot or amateur-photo aesthetics, the dedicated realism fine-tunes outperform base Pony — CyberRealistic XL is the most stable choice and has its own separate development line. For pure anime work in 2026, Illustrious and its derivatives are simply better-tuned for the use case. For furry, anthro, and feral work, Pony remains dominant; Indigo Furry Mix and its variants are the only serious competitors and most of them are Pony-derived anyway. For high-fidelity photographic output that needs accurate text, complex scenes, or non-character compositions, FLUX.1 [dev] is a different architecture entirely and operates in a class above SDXL on those specific axes — at the cost of much higher VRAM, slower inference, and a more complex prompt-engineering surface.
The honest answer in 2026 is that no single model wins all categories, and most serious users keep three to five checkpoints on disk for different jobs. Pony stays on disk for anthro, hybrid styles, and anything that needs its specific cross-domain fluency. Illustrious or NoobAI for anime. A realism Pony fine-tune or CyberRealistic for human photography. FLUX for the jobs that need a different architecture entirely.
Frequently Asked Questions
Is Pony Diffusion safe to download?
The official distributions on Civitai, Hugging Face, and the PurpleSmartAI infrastructure are clean safetensors files and have been since release — safetensors format does not execute code on load, unlike older .ckpt pickle files. The only meaningful download risk is third-party mirrors that repackage the weights with bundled scripts; stick to the original sources and verify the file hash if you are cautious.
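Verifying the file hash needs nothing beyond the Python standard library. The sketch below assumes the source publishes a full SHA-256 digest for the file; some sites display other or truncated hash formats, which this does not handle. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte checkpoint never sits fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA-256 with a published digest, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (hypothetical path and placeholder digest):
# matches_published_hash(Path("models/pony.safetensors"), "<digest from the model page>")
```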
Why was Pony Diffusion banned from Civitai?
It was not banned. In mid-2024 Civitai changed how its NSFW front-page rankings surfaced models, which deprioritized Pony-base resources in the public feed and led to repeated delisting and relisting of specific derivatives. The base Pony Diffusion V6 XL model page remained accessible the entire time, and the model is still hosted there in 2026.
How do I make Pony less anthro or less furry?
Add source_pony, source_furry, anthro, furry to your negative prompt and add source_anime or source_cartoon to your positive prompt to steer the latent away from the anthro default. For human realism specifically, switch to a realism-focused Pony fine-tune like Babes by Stable Yogi, RealMixPony, or one of the WAI variants — these handle the steering at training time and produce cleaner results than negative-prompt fighting on base Pony.
What is score_9 and do I need it?
score_9 is the highest training-time quality bucket from Pony's dataset captioning, and the full stack score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up tells the model to draw from the intersection of every quality tier. Yes, you need it. Removing the score prefix produces visibly worse anatomy, line quality, and detail, and every Pony fine-tune inherits the same convention.
Is Pony better than Illustrious?
Different jobs. Pony is better for anthro, furry, cross-domain hybrid styles, and anywhere its 15,000-plus LoRA ecosystem already has what you need. Illustrious is better for pure anime, cleaner human anatomy without a fine-tune layer, and any new project in 2026 where being on the more actively developed substrate matters.
When will Pony V7 release?
V7 has been in long development with public progress updates from AstraliteHeart, including architecture experiments beyond SDXL base. As of early 2026 there is no public release date, and the team has consistently described it as ready when ready rather than committing to a timeline. V6 XL remains the current production version.


