Club-5060ti repo shares tested Qwen3.6 configs for dual RTX 5060 Ti setups
A GitHub repository collects tested configurations for running Qwen3.6 27B and 35B models on dual RTX 5060 Ti 16GB cards, with exact vLLM and llama.cpp settings for practitioners.
A practitioner has published a configuration repository documenting what actually works on dual RTX 5060 Ti 16GB cards, moving past theoretical performance claims to exact, reproducible setups for local LLM deployments.
The club-5060ti repo on GitHub follows the format of the earlier club-3090 project but focuses on the 5060 Ti's 16GB VRAM ceiling. The seed setup runs on Linux with two cards and includes tested configurations for vLLM serving Qwen3.6 27B in NVFP4/MTP format, llama.cpp serving the same model in Q4 and Q6 GGUF quantizations, and initial checks on Qwen3.6 35B in A3B format. Context length presets range from a conservative 65536-token llama.cpp router config with extra headroom to a 204800-token direct long-context preset for Q6 weights.
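The gap between the 65536-token and 204800-token presets comes down largely to KV cache memory, which grows linearly with context length. A rough sizing sketch in Python; the layer count, KV-head count, and head dimension below are illustrative assumptions, not Qwen3.6 27B's actual architecture:

```python
def kv_cache_bytes(ctx_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Rough KV cache size: keys + values for every layer at full context.

    bytes_per_elem is 2 for an fp16 KV cache; quantized KV caches use less.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

# Illustrative architecture numbers (NOT Qwen3.6 27B's real dimensions):
layers, kv_heads, head_dim = 48, 8, 128

for ctx in (65536, 204800):
    gib = kv_cache_bytes(ctx, layers, kv_heads, head_dim) / 2**30
    print(f"{ctx:>7} tokens -> ~{gib:.1f} GiB fp16 KV cache")
```

With numbers in this range, a full-precision KV cache at the long-context preset would dwarf the weights themselves, which is one reason each config in the repo spells out its KV cache settings.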
The repository ships with model download scripts, llama.cpp update helpers, OpenAI-compatible smoke tests, and CSV templates for logging benchmark results. Each configuration lists exact versions, KV cache settings, and caveats. The maintainer is soliciting pull requests from other 5060 Ti users and asks for reproducible detail: exact command lines, context lengths, and quantization formats rather than vague throughput numbers.
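A smoke test against an OpenAI-compatible endpoint can be as small as a single POST to /v1/chat/completions. A minimal sketch using only the standard library; the URL, model name, and prompt are placeholder assumptions, not the repo's actual test script:

```python
import json
import urllib.request

def build_chat_request(model, prompt, max_tokens=16):
    """Assemble a minimal OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def smoke_test(base_url, model, prompt="Say 'ok'."):
    """Send one chat request and return the first choice's text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# e.g. smoke_test("http://localhost:8080", "qwen3.6-27b-q6_k")  # placeholder names
```

Any non-empty response confirms the server is up, the model loaded, and the OpenAI-compatible route is wired correctly, which is all a smoke test needs to establish.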
The 5060 Ti's 16GB of VRAM sits between the 4070's 12GB and the 4090's 24GB, and a pair of the cards yields 32GB total, a practical floor for running 27B-class models locally without offloading layers to system RAM. The repo's focus on Qwen3.6 reflects the model's popularity among local practitioners; the 27B variant fits comfortably in Q6 at moderate context lengths, while the 35B pushes the dual-card setup's limits even at lower quantizations.
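Those fit claims are easy to sanity-check with back-of-the-envelope arithmetic: quantized weight size is roughly parameter count times bits per weight. A sketch using commonly cited approximate bits-per-weight figures for llama.cpp k-quants; treat the exact bpw values as assumptions, since they vary by model and quant recipe:

```python
def weight_gib(params_billion, bits_per_weight):
    """Approximate VRAM footprint of quantized weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 2**30

# Approximate bpw for k-quants (assumed values, vary per model):
Q6_K, Q4_K_M = 6.56, 4.85

print(f"27B @ Q6_K   ~{weight_gib(27, Q6_K):.1f} GiB")
print(f"27B @ Q4_K_M ~{weight_gib(27, Q4_K_M):.1f} GiB")
print(f"35B @ Q4_K_M ~{weight_gib(35, Q4_K_M):.1f} GiB")
```

Under these assumptions, Q6 weights for the 27B land around 20 GiB, leaving roughly a third of the combined 32GB for KV cache and activations, while the 35B even at Q4-class quantization leaves much thinner margins, consistent with the article's framing.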
