LTX 2.3 INT8 quantization nearly halves inference time on RTX 3000-series GPUs
A community-built ComfyUI custom node delivers nearly 2x faster video synthesis on Ampere GPUs through INT8 quantization, with quantized weights on HuggingFace and the loader node on GitHub.
LTX 2.3, Lightricks' open-weight video diffusion model, now runs nearly twice as fast on Ampere-generation GPUs thanks to INT8 quantization. A ComfyUI user posted benchmarks this week showing inference time dropping from 118.77 seconds to 66.45 seconds on an RTX 3080 Ti, a 44 percent reduction (about a 1.79x speedup), using quantized weights and a custom loader node.
The speedup targets older hardware. Ampere cards (RTX 3060 through 3090 Ti) lack the tensor-core optimizations that make newer Ada and Blackwell chips fast at FP16 by default. INT8 math on Ampere closes that gap by trading precision for throughput. The poster notes that RTX 5090 owners see no benefit; those cards already saturate memory bandwidth at half-precision.
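The post doesn't spell out the quantization scheme, but the standard recipe for this kind of load-time quantization is symmetric per-channel INT8: each output channel of a weight matrix gets one floating-point scale so its largest value maps to 127. A minimal PyTorch sketch of the idea (the function names here are illustrative, not the repo's API):

```python
import torch

def quantize_int8(w: torch.Tensor):
    """Symmetric per-output-channel INT8 quantization (illustrative sketch)."""
    # One scale per output channel: the row's max |w| maps to 127.
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    """Recover a half-precision approximation of the original weights."""
    return q.to(torch.float32).mul(scale).to(torch.float16)

w = torch.randn(4096, 4096)  # stand-in for one linear layer's weight matrix
q, scale = quantize_int8(w)
err = (w - dequantize_int8(q, scale).float()).abs().max()
print(f"max rounding error: {err:.5f}")  # small relative to weight magnitudes
```

On Ampere, the INT8 tensor-core path has roughly double the peak throughput of FP16, which is the gap a loader like this exploits.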
What stands out
- Drop-in replacement loader — The custom node replaces the standard LTX model loader in ComfyUI; everything downstream (sampler, VAE decode, output) stays identical. A sketch of this node shape follows the list.
- Quantized weights on HuggingFace — The INT8 checkpoint lives at ovpresent/ltx-2.3-distilled-1.1-INT8. Quantization happens at load time in the custom node.
- Ampere-only acceleration — The roughly 2x speedup is specific to RTX 3000-series tensor cores. Turing (RTX 2000) and Pascal cards see minimal or no gain.
- GitHub custom node required — Installation pulls from overpresentme/ComfyUI-ltx-int8-loader. The node is not yet part of the default ComfyUI Manager registry.
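The loader's actual internals aren't shown in the post, but all ComfyUI custom nodes follow the same registration pattern, which is what makes a drop-in swap possible. A hypothetical sketch of that shape (the class name, display name, and stubbed quantization step are assumptions, not the repo's code):

```python
import folder_paths        # ComfyUI's checkpoint-path helper
from comfy import sd       # ComfyUI's model-loading module

class LTXInt8ModelLoader:
    """Hypothetical loader: loads an LTX checkpoint, quantizing at load time."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "ckpt_name": (folder_paths.get_filename_list("checkpoints"),),
            }
        }

    RETURN_TYPES = ("MODEL",)  # same socket type as the stock loader,
    FUNCTION = "load"          # so downstream nodes connect unchanged
    CATEGORY = "loaders"

    def load(self, ckpt_name):
        path = folder_paths.get_full_path("checkpoints", ckpt_name)
        model, *_ = sd.load_checkpoint_guess_config(path)
        # An INT8 pass (e.g. per-channel quantization as sketched above)
        # would rewrite the model's linear layers here before returning.
        return (model,)

# These mappings are what make the node appear in ComfyUI's node menu.
NODE_CLASS_MAPPINGS = {"LTXInt8ModelLoader": LTXInt8ModelLoader}
NODE_DISPLAY_NAME_MAPPINGS = {"LTXInt8ModelLoader": "LTX 2.3 INT8 Loader"}
```

Because the node returns the same MODEL socket type as the stock loader, an existing workflow only needs the one loader node swapped; the sampler and VAE decode stages are untouched.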
