Qwen 4B agent fine-tune scores 10% on Terminal Bench 2, runs free on HuggingFace Spaces | UncensoredHub

A new agent-tuned Qwen 4B checkpoint runs on HuggingFace Spaces via ZeroGPU, writing small projects and scoring 10 percent on Terminal Bench 2.

By Alex Sokoloff · May 28, 2026

A fine-tuned Qwen 4B checkpoint optimized for agent tasks is now live on HuggingFace Spaces, running on ZeroGPU infrastructure and capable of writing functional small projects. The model scores 10 percent on Terminal Bench 2, a benchmark that measures a language model's ability to generate correct terminal commands and code snippets. It's tuned on Pi Agent and Hermes Agent datasets, giving it enough capability to handle iterative coding workflows inside the HuggingFace environment.

The checkpoint uses Pi Agent as its scaffolding framework and deploys via ZeroGPU, HuggingFace's shared GPU service that spins up compute on demand. That setup makes the model accessible without local hardware: users can test it directly in the browser through the HuggingFace Space. One demo shows the model generating a working Tetris game as a web app, illustrating the kind of self-contained project it can produce when prompted with a task description.

Terminal Bench 2 scores typically range from single digits to low teens for models in the 3B-7B range, so the 10 percent mark places this fine-tune in the viable-for-simple-tasks tier. The benchmark tests both command accuracy and the model's ability to chain multiple steps in a terminal session, making it a stricter eval than single-turn code generation. The 4B parameter size keeps inference fast enough for real-time interaction on shared infrastructure.
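
To see why a multi-step eval is stricter than single-turn generation, here is a minimal Python sketch with made-up data. It assumes all-or-nothing scoring, where a task counts only if every step in the session succeeds; the `TaskResult` class, the task names, and the ten-task set are hypothetical and are not taken from Terminal Bench 2 itself.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one hypothetical multi-step terminal task."""
    name: str
    step_ok: list[bool]  # one flag per step the agent attempted

    @property
    def passed(self) -> bool:
        # Assumed rule: a task counts only if every step succeeded.
        return len(self.step_ok) > 0 and all(self.step_ok)


def pass_rate(results: list[TaskResult]) -> float:
    """Fraction of tasks completed end to end."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def step_accuracy(results: list[TaskResult]) -> float:
    """Fraction of individual steps that succeeded, ignoring task boundaries."""
    steps = [ok for r in results for ok in r.step_ok]
    return sum(steps) / len(steps) if steps else 0.0


# Ten made-up tasks of three steps each:
# one solved fully, nine that fail only on the final step.
results = [TaskResult("task-0", [True, True, True])]
results += [TaskResult(f"task-{i}", [True, True, False]) for i in range(1, 10)]

print(f"task pass rate: {pass_rate(results):.0%}")
print(f"step accuracy:  {step_accuracy(results):.0%}")
```

In this toy set the agent gets most individual steps right, yet only one task in ten passes, which is how a model can look competent command by command and still land at a 10 percent task score.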
