INT8 ConvRot beats MXFP8 on Anima with older GPU hardware
A week-long quantization benchmark shows INT8 ConvRot delivers lower latent error than MXFP8 while remaining compatible with RTX 20-series cards from 2018.
A ComfyUI user benchmarked quantization methods for Stable Diffusion Anima, comparing INT8 variants against the recently hyped MXFP8 format. The tests captured conditional and unconditional latents at every inference step across 100 samples at 1-megapixel resolution, measuring signal-to-noise ratio, cosine similarity, and relative root-mean-square error against a BF16 baseline. INT8 ConvRot—a rotation-based outlier removal technique from a December 2024 preprint—delivered the lowest relative error at 0.09032, beating MXFP8 (0.10847), standard FP8 (0.12145), and row-wise INT8 (0.13396). GGUF Q8 edged ahead overall, but INT8 variants run on Tensor Cores in RTX 20-series cards and newer, while MXFP8 requires Blackwell-generation hardware.
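The three error metrics are standard comparisons between quantized latents and a full-precision reference. As a rough illustration (the function name, shapes, and noise level here are invented for the example, not taken from the benchmark), the relative RMSE, cosine similarity, and SNR can be computed like this:

```python
import numpy as np

def latent_error_metrics(ref, quant):
    """Compare quantized-model latents against a higher-precision reference.

    Returns (relative RMSE, cosine similarity, SNR in dB).
    """
    ref = ref.astype(np.float64).ravel()
    quant = quant.astype(np.float64).ravel()
    err = quant - ref
    rel_rmse = np.linalg.norm(err) / np.linalg.norm(ref)
    cos_sim = (ref @ quant) / (np.linalg.norm(ref) * np.linalg.norm(quant))
    snr_db = 20.0 * np.log10(np.linalg.norm(ref) / np.linalg.norm(err))
    return rel_rmse, cos_sim, snr_db

# Mock example: a reference latent vs. a copy with 10% additive noise
rng = np.random.default_rng(0)
ref = rng.standard_normal((4, 128, 128))          # stand-in latent tensor
quant = ref + 0.1 * rng.standard_normal(ref.shape)
rel, cos, snr = latent_error_metrics(ref, quant)
```

With this definition, a relative RMSE of 0.09032 means the quantized latents deviate from the BF16 baseline by about 9 percent of the baseline's magnitude, accumulated over the whole denoising trajectory.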
The researcher released ComfyUI-INT8-Fast, a custom node implementing row-wise and ConvRot quantization on the fly. Pre-quantized INT8 checkpoints from Bedovyy and silveroxides were also tested; Bedovyy's row-wise Anima weights logged 0.13396 relative error, and silveroxides' tensor-wise learned variant came in at 0.14721. All tests ran on an RTX 3090 with the latest ComfyUI build. The takeaway: INT8 ConvRot stays within 10 percent relative error on consumer hardware generations older than Blackwell while outperforming MXFP8.
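The intuition behind rotation-based schemes like ConvRot is that a single outlier channel inflates the quantization scale for an entire row, wasting most of the INT8 grid. Rotating the weights by an orthogonal matrix spreads that outlier's energy across all columns before quantizing, and the rotation is undone afterward. The sketch below is a generic illustration of that principle (a random orthogonal matrix stands in for the structured rotations a real implementation would use; `int8_rowwise` is an invented helper, not the node's actual code):

```python
import numpy as np

def int8_rowwise(w):
    """Symmetric per-row INT8 quantize/dequantize round trip."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).clip(-127, 127).astype(np.int8)
    return q.astype(np.float64) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 256))
w[:, 0] *= 50.0  # plant an outlier column: it inflates every row's scale

# Random orthogonal rotation as a stand-in for a structured (e.g. Hadamard)
# rotation; mixing columns spreads the outlier energy before quantization.
q_mat, _ = np.linalg.qr(rng.standard_normal((256, 256)))

plain_err = np.linalg.norm(int8_rowwise(w) - w) / np.linalg.norm(w)
rot_err = np.linalg.norm(int8_rowwise(w @ q_mat) @ q_mat.T - w) / np.linalg.norm(w)
```

On this toy weight matrix the rotated round trip lands well below the plain row-wise error, which is the same effect the benchmark observed at model scale: ConvRot's 0.09032 beats row-wise INT8's 0.13396.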
