HRM-Text 1B outperforms Llama 3.2 3B on reasoning with 40B tokens and $1K budget
Sapient Intelligence released HRM-Text 1B, a 1-billion-parameter model trained on 40 billion tokens for roughly $1,000 that outperforms larger models on multi-step reasoning benchmarks.
On May 19, Sapient Intelligence released HRM-Text 1B, a 1-billion-parameter hierarchical reasoning model trained from scratch in 1.9 days on 16 GPUs. The company reports using roughly 1/1000th the training data of comparable models (40 billion unique tokens versus the trillions typical of modern small language models) at a cost of around $1,000.
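To put the reported budget in perspective, the short Python sketch below derives GPU-hours, the implied price per GPU-hour, and the corpus size implied by the "1/1000th" comparison. The input figures come from the company's report; the derived numbers are back-of-envelope estimates. They assume that all 16 GPUs ran for the full 1.9 days and that the $1,000 covers GPU time only, neither of which the report confirms.

```python
# Figures as reported by Sapient Intelligence.
TRAINING_DAYS = 1.9
NUM_GPUS = 16
REPORTED_COST_USD = 1_000
TRAINING_TOKENS = 40e9   # 40 billion unique tokens
DATA_RATIO = 1_000       # "roughly 1/1000th the training data"


def gpu_hours(days: float, gpus: int) -> float:
    """Total GPU-hours, assuming every GPU ran for the whole period."""
    return days * 24 * gpus


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Price per GPU-hour, assuming the cost is GPU time only."""
    return cost_usd / hours


def implied_comparison_tokens(tokens: float, ratio: float) -> float:
    """Corpus size a 'comparable model' would need for the ratio to hold."""
    return tokens * ratio


if __name__ == "__main__":
    hours = gpu_hours(TRAINING_DAYS, NUM_GPUS)
    rate = implied_hourly_rate(REPORTED_COST_USD, hours)
    baseline = implied_comparison_tokens(TRAINING_TOKENS, DATA_RATIO)
    print(f"GPU-hours: {hours:.1f}")
    print(f"Implied USD per GPU-hour: {rate:.2f}")
    print(f"Implied comparison corpus: {baseline / 1e12:.0f} trillion tokens")
```

Under these assumptions the run comes to about 730 GPU-hours, or a little under $1.40 per GPU-hour, and the "1/1000th" ratio corresponds to a comparison corpus of about 40 trillion tokens.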
The benchmark results show a sharp tradeoff: HRM-Text 1B excels at multi-step reasoning but lags on knowledge recall. On MATH, it scores 56.2 versus 48.0 for Llama 3.2 3B and 34.1 for GPT-3.5. On DROP (reading comprehension), it reaches 82.2 compared with 45.2 for Llama 3.2 3B. But on MMLU, a knowledge-heavy benchmark, it scores only 60.7, behind Qwen 3.5 2B's 64.7 and Olmo 3 7B's 65.8. The MMLU gap is consistent with the tradeoff: 40 billion tokens is likely too few to pack in broad world knowledge. That pattern makes it less likely that the reasoning gains are an artifact of test-set contamination, though it does not rule contamination out.
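The size of the tradeoff is easier to see as point margins. The Python sketch below tabulates the scores quoted in this article and subtracts each baseline from HRM-Text 1B's score. The scores are the self-reported figures; the dictionary layout and the function name `margins` are illustrative choices, not part of any released tooling.

```python
# Self-reported scores quoted in the article, grouped by benchmark.
SCORES = {
    "MATH": {"HRM-Text 1B": 56.2, "Llama 3.2 3B": 48.0, "GPT-3.5": 34.1},
    "DROP": {"HRM-Text 1B": 82.2, "Llama 3.2 3B": 45.2},
    "MMLU": {"HRM-Text 1B": 60.7, "Qwen 3.5 2B": 64.7, "Olmo 3 7B": 65.8},
}


def margins(scores: dict, model: str = "HRM-Text 1B") -> dict:
    """Return model-minus-baseline point margins for each benchmark."""
    result = {}
    for benchmark, table in scores.items():
        own = table[model]
        result[benchmark] = {
            other: round(own - value, 1)
            for other, value in table.items()
            if other != model
        }
    return result


if __name__ == "__main__":
    for benchmark, deltas in margins(SCORES).items():
        for other, delta in deltas.items():
            print(f"{benchmark}: {delta:+.1f} points vs {other}")
```

Positive margins appear on MATH and DROP and negative ones on MMLU, matching the reasoning-versus-recall split described above.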
Weights are available on Hugging Face (sapientinc/HRM-Text-1B), and the code is on GitHub. The benchmark numbers are self-reported; independent verification is pending.
