LTX-2.3 inference drops from 300s to 45s on RTX 3080Ti with INT8 quantization
A developer building a video generation app cut LTX-2.3 generation time from five minutes to 45 seconds on an RTX 3080Ti through resolution tweaks, step reduction, and INT8 quantization.
A developer running LTX-2.3 for a commercial video generation app has documented a path from 300-second generation times down to 45 seconds on an RTX 3080Ti. The optimization work, shared on May 12, centers on ComfyUI backend workflows and reveals INT8 quantization as the single biggest performance gain on Ampere hardware.
The setup pairs an RTX 5090 for LoRA training with a 3080Ti server for inference. Training runs on musubi-tuner, which the developer credits with clean FP8 and NF4 VRAM optimization; inference on the 3080Ti was the bottleneck and the focus of the optimization work.
The speed ladder
Dropping resolution from 1080×1920 to 720×1280 cut generation time from 300 seconds to 120 seconds. Lowering the spatial upscaler from 2× to 1.5× brought it to 80 seconds, though the developer warns that stacking both resolution and upscaler cuts degrades quality noticeably.
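Back-of-envelope arithmetic (ours, not the developer's) shows why the resolution step dominates that first drop: the pixel count per frame falls by 2.25×, close to the observed 2.5× speedup.

```python
# Rough arithmetic for the resolution step. Resolutions and timings are
# from the article; the near-linear pixels-to-time assumption is ours.
hi_res = 1080 * 1920   # original pixel count per frame
lo_res = 720 * 1280    # reduced pixel count per frame

pixel_ratio = hi_res / lo_res   # 2.25x fewer pixels
observed_speedup = 300 / 120    # 2.5x faster generation

print(f"pixels: {pixel_ratio:.2f}x fewer, speedup: {observed_speedup:.2f}x")
```

The slightly super-linear speedup is plausible given attention layers, whose cost grows faster than linearly with token count.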
LTX-2.3 runs in two stages: base generation and upsampling. Stage 2 defaults to three steps with a sigma schedule of [0.85, 0.7250, 0.4219, 0.0]. Trimming that to two steps ([0.85, 0.4219, 0.0]) yields a proportional speedup with acceptable quality loss. Sage Attention showed no improvement on the Ampere-based 3080Ti, which runs standard Triton logic rather than Sage-specific paths.
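The schedule trim can be sketched in plain Python. The sigma values are from the article; the helper function and how a sampler node would consume the list are assumptions for illustration.

```python
# Trim the Stage 2 sigma schedule from three denoising steps to two by
# dropping the second sigma. Values are from the article; the helper
# name and trimming convention are illustrative, not ComfyUI API.
default_sigmas = [0.85, 0.7250, 0.4219, 0.0]  # 4 values = 3 steps

def trim_middle(sigmas):
    """Keep the first sigma, drop the second, keep the rest."""
    return [sigmas[0]] + sigmas[2:]

trimmed = trim_middle(default_sigmas)  # [0.85, 0.4219, 0.0]

# Steps are intervals between sigmas, so 3 steps -> 2 steps
# should give a roughly proportional ~1.5x speedup on Stage 2.
speedup = (len(default_sigmas) - 1) / (len(trimmed) - 1)
```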
INT8 quantization delivered the steepest drop: 80 seconds to 45 seconds. The 3080Ti handles INT8 better than NVFP4 in this workflow. GGUF models, while viable for zero-offload setups, ran Stage 1 in 40 seconds versus 29 seconds for INT8 with VRAM offloading. The developer notes that INT8 models and ComfyUI nodes for LTX-2.3 v1.1 remain sparse, prompting custom node development.
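For readers unfamiliar with why INT8 helps: weights are mapped onto 256 integer levels with a per-tensor scale, halving memory versus FP16 and unlocking Ampere's integer tensor cores. The sketch below shows the idea conceptually; it is not the kernel LTX-2.3 or ComfyUI actually uses.

```python
# Conceptual sketch of symmetric per-tensor INT8 quantization:
# w_q = round(w / scale), with scale chosen so the largest weight
# maps to 127. Rounding error is bounded by scale / 2 per weight.

def quantize_int8(weights):
    """Return (int8 values, scale) for a list of float weights."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights at compute time."""
    return [v * scale for v in q]

weights = [0.31, -1.27, 0.5, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored weight differs from the original by at most scale / 2.
```

The quality cost the developer accepts comes from exactly this rounding error, accumulated across every quantized layer.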
The full workflow is production-deployed. RTX 50-series users may see different results with Sage Attention, and the developer invites testing on newer architectures.
