GPT-5.5 uses 12% more tokens than GPT-5.4 on Codex benchmarks
An Artificial Analysis benchmark shows GPT-5.5 consuming 2.8 million tokens per Codex task versus 2.5 million for GPT-5.4, contradicting OpenAI's efficiency claims.

GPT-5.5 is consuming more tokens than GPT-5.4 on coding benchmarks, according to an Artificial Analysis chart comparing the models on Codex tasks. GPT-5.5 averaged around 2.8 million tokens per task, while GPT-5.4 used roughly 2.5 million under identical conditions — a 12 percent increase that contradicts OpenAI's positioning of the newer model as more cost-efficient.
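As a quick back-of-the-envelope check, the 12 percent figure follows directly from the two averages in the chart (token counts taken from the Artificial Analysis numbers cited above):

```python
# Average tokens per Codex task, per the Artificial Analysis chart.
gpt_5_4_tokens = 2_500_000
gpt_5_5_tokens = 2_800_000

increase = (gpt_5_5_tokens - gpt_5_4_tokens) / gpt_5_4_tokens
print(f"Relative increase: {increase:.0%}")  # -> Relative increase: 12%
```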
The gap could reflect longer outputs or additional internal reasoning steps required by GPT-5.5 to solve the same programming challenges. Developers have questioned whether real-world pricing, which factors in cached tokens and tiered rate cards, still delivers net savings over GPT-5.4, or whether the efficiency gains apply only outside code synthesis. OpenAI has emphasized that prompt caching can reduce actual costs even when raw token counts rise, a distinction that may explain the discrepancy.
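To illustrate the caching argument, here is a minimal sketch of how blended cost can fall even as raw token counts rise. The prices, cache-hit rates, and discount below are placeholder assumptions chosen for illustration, not OpenAI's actual rate card.

```python
# Hypothetical illustration: a higher raw token count can still cost less once
# prompt caching is factored in. All prices, cache-hit rates, and discounts
# below are made-up placeholders, not real published pricing.

def effective_cost(total_tokens, cached_fraction, price_per_m, cached_discount):
    """Blended cost when a fraction of tokens is served from the prompt cache.

    Cached tokens are billed at price_per_m * cached_discount per million.
    """
    cached = total_tokens * cached_fraction
    uncached = total_tokens - cached
    return (uncached * price_per_m + cached * price_per_m * cached_discount) / 1_000_000

# Older-model run: 2.5M tokens with a modest cache hit rate (assumed).
old = effective_cost(2_500_000, cached_fraction=0.30, price_per_m=1.00, cached_discount=0.50)

# Newer-model run: 2.8M tokens but a higher cache hit rate (assumed).
new = effective_cost(2_800_000, cached_fraction=0.60, price_per_m=1.00, cached_discount=0.50)

print(f"old: ${old:.2f}, new: ${new:.2f}")  # 2.12 vs 1.96: the newer run comes out cheaper
```

Whether the real rate card and real cache-hit rates produce a similar crossover is precisely the question developers are raising.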
Anthropic's Opus 4.7 showed significantly lower token usage on the same Codex tasks, though that comparison spans different architectures. Cursor's platform also paired strong benchmark performance with lower token usage, attributed to the editor's context-pruning and caching strategies.