SIQ-1-35B beats Qwen-350B with PPO-trained 35B fine-tune
AlexWortega released SIQ-1-35B, a PPO-trained Qwen-35B fine-tune that outperforms models 10× its size on parameter-golf benchmarks and generates research ideas comparable in quality to those from Claude Opus.
SIQ-1-35B is a 35-billion-parameter fine-tune of Qwen-35B A3, trained with proximal policy optimization (PPO) and released this week on HuggingFace. The model outperforms GLM-5.2 and Qwen-350B on Karpathy's AutoResearch parameter-golf benchmark, a test that measures reasoning capability while minimizing parameter count. The developer reports that this is the first PPO training run they have seen deliver consistent improvements, and attributes the result to a verifiable reward model rather than human preference signals.
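For readers unfamiliar with the two ingredients named above, the following is a minimal sketch of how a verifiable reward can feed PPO's clipped surrogate loss. It is not the developer's training code: the function names `verifiable_reward` and `ppo_clipped_loss`, the exact-match checker, the mean-reward baseline, and the clip value of 0.2 are all assumptions chosen for illustration. A real PPO setup would normally use a learned value function for advantages.

```python
import math


def verifiable_reward(answer: str, expected: str) -> float:
    """Hypothetical checker: 1.0 if the answer matches a known-correct
    result, else 0.0. No human preference labels are involved."""
    return 1.0 if answer.strip() == expected.strip() else 0.0


def ppo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (the negated objective).

    logp_new / logp_old are log-probabilities of the sampled outputs under
    the current and the data-collecting policy; clip_eps is an assumed value.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Illustrative batch: two sampled answers to a question whose answer is "42".
rewards = [verifiable_reward("42", "42"), verifiable_reward("41", "42")]
baseline = sum(rewards) / len(rewards)       # simple mean baseline (assumption)
advantages = [r - baseline for r in rewards]  # [0.5, -0.5]
loss = ppo_clipped_loss([-1.0, -1.2], [-1.1, -1.1], advantages)
print(f"rewards={rewards} advantages={advantages} loss={loss:.4f}")
```

The clipping keeps each update close to the policy that generated the samples, and the checker makes the reward something that can be recomputed and audited rather than a learned estimate of human preference.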
On an internal "bullshit benchmark" used to measure output quality under adversarial prompting, SIQ-1-35B scores higher than NEX and GPT-5.5. The model also generates research ideas comparable in quality to Anthropic's Claude Opus, a closed frontier model typically reserved for complex reasoning tasks.
Weights are available on HuggingFace in both full-precision and GGUF-quantized formats. A free demo runs on HuggingFace Spaces on a ZeroGPU backend, allowing users to test the model without API costs or local setup.
What stands out
- PPO training that actually works. The developer emphasizes this is the first PPO run they've seen produce consistent improvements with a verifiable reward model rather than human preference signals.
- Beats models 10× its size. SIQ-1-35B outperforms Qwen-350B (a 350-billion-parameter model) and GLM-5.2 on parameter-golf tasks, where the goal is to maximize capability while minimizing parameter count.
- Opus-class idea generation. The model reportedly generates research ideas similar in quality to Anthropic's Claude Opus, a closed frontier model typically used for complex reasoning tasks.
- Open weights and GGUF quantizations. Full weights and GGUF formats are available on HuggingFace, making the model accessible for local inference on consumer hardware.
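To put the consumer-hardware point in numbers, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. The 35-billion parameter count comes from the article; the helper `weight_memory_gb` and the nominal bit widths are assumptions for illustration. Real GGUF quantization types use somewhat different effective bits per weight, and the KV cache and runtime overhead come on top of these figures.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 35e9  # 35 billion parameters, as stated in the article

# Nominal bit widths (assumed); actual quantized files will differ somewhat.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions, a 4-bit quantization needs roughly a quarter of the memory of 16-bit weights, which is what brings a model of this size within reach of a high-end consumer machine.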




