Dual RTX 3090 hits 113 tokens/sec on Qwen 27B with native Ubuntu

A LocalLLaMA user reports near-Sonnet performance on consumer hardware after switching from WSL2 to native Ubuntu, with 4000 tokens/sec prompt processing and 113 tokens/sec generation on 48GB VRAM.

May 15, 2026

A LocalLLaMA user's dual RTX 3090 rig is delivering what its owner describes as near Claude Sonnet-level performance on open-weight models after a switch from WSL2 to native Ubuntu. Running Qwen 3.6 27B with a 262k context window on 48GB of VRAM, the system now generates 113 tokens per second and processes prompts at 4000 tokens per second, fast enough for real-time code review and live SSH session handling.

Under WSL2, the same hardware managed only 30 tokens per second generation and 400 tokens per second prompt processing, which makes the native numbers roughly 3.8 times faster for generation and 10 times faster for prompt processing. The jump to bare-metal Linux brought better GPU scheduling and memory management. The cards run without NVLink, a bridge that could push speeds higher still. The setup relies on club-3090, a multi-GPU inference tool patched to fix SSE session drops and tool-calling bugs.
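To put the reported figures in terms of waiting time, the arithmetic can be sketched in a few lines of Python. The throughput numbers come from the report above; the workload (an 8,000-token prompt and a 500-token reply) is a hypothetical chosen for illustration, and the model is deliberately simple: it assumes constant throughput and ignores network, tokenization, and scheduling overhead.

```python
# Throughput figures as reported in the article (tokens per second).
WSL2 = {"prefill_tps": 400.0, "decode_tps": 30.0}
NATIVE = {"prefill_tps": 4000.0, "decode_tps": 113.0}


def speedup(before: float, after: float) -> float:
    """Ratio of the new rate to the old rate."""
    return after / before


def request_seconds(prompt_tokens: int, output_tokens: int,
                    prefill_tps: float, decode_tps: float) -> float:
    """Rough end-to-end time: prompt processing plus generation.

    Assumes constant throughput and no other overhead.
    """
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


gen_speedup = speedup(WSL2["decode_tps"], NATIVE["decode_tps"])
prefill_speedup = speedup(WSL2["prefill_tps"], NATIVE["prefill_tps"])

# Hypothetical workload, not taken from the report.
PROMPT_TOKENS = 8000
OUTPUT_TOKENS = 500

wsl2_time = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, **WSL2)
native_time = request_seconds(PROMPT_TOKENS, OUTPUT_TOKENS, **NATIVE)

print(f"generation speedup: {gen_speedup:.2f}x")
print(f"prompt processing speedup: {prefill_speedup:.2f}x")
print(f"WSL2 request time: {wsl2_time:.1f} s")
print(f"native request time: {native_time:.1f} s")
```

Under those assumptions, the same request drops from about 37 seconds on WSL2 to about 6 seconds on native Ubuntu, with most of the saving coming from prompt processing.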

For now, by this user's account, two 3090s and 48GB of VRAM are enough to run a 27-billion-parameter model faster than most cloud APIs.

By Alex Sokoloff · AI enthusiast · MSc Computer Science
