Caveman plugin cuts LLM token usage by up to 75% by stripping conversational fluff
A GitHub project that forces language models to respond in terse, telegram-style prose has reached nearly 80,000 stars, with early adopters reporting token savings of up to 75 percent on coding and query tasks.

A new open-source plugin is teaching language models to talk like cavemen, and practitioners say it can cut token bills by as much as three-quarters without sacrificing answer quality.
Caveman, a GitHub project from Julius Brussee, strips out greetings, politeness markers, verbose transitions, and beginner-level definitions, leaving only the core information in compressed, telegram-style prose. The tool started as a custom skill for Claude Code but has since expanded into a standalone plugin and a full coding agent called Caveman Code. The repository has climbed to nearly 80,000 stars.
Brussee offers multiple compression modes—from light trimming to ultra-terse output—and claims the approach saves up to 75 percent of tokens without degrading technical accuracy. He reports that engineers at OpenAI, Nvidia, and GitHub are using the skill in production, and that French industrial giant Legrand is recommending Caveman internally to keep teams within token quotas.
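To make the "up to 75 percent" figure concrete, here is a minimal Python sketch of how a percentage saving could be worked out for a single reply. It is not Caveman's own measurement method. The helper names `approx_tokens` and `savings_percent` are hypothetical, the two sample replies are invented for illustration, and whitespace-separated words stand in as a crude proxy for real model tokens, which a proper tokenizer would count differently.

```python
def approx_tokens(text: str) -> int:
    """Crude proxy: count whitespace-separated words, not real model tokens."""
    return len(text.split())


def savings_percent(verbose: str, terse: str) -> float:
    """Percentage reduction in approximate tokens from verbose to terse."""
    before = approx_tokens(verbose)
    after = approx_tokens(terse)
    if before == 0:
        raise ValueError("verbose text must not be empty")
    return 100.0 * (before - after) / before


# Hypothetical replies, invented for illustration only (not real model output).
VERBOSE = (
    "Great question! The error occurs because the variable is undefined. "
    "To fix it, declare the variable before you use it."
)
TERSE = "Variable undefined. Declare before use."

if __name__ == "__main__":
    saved = savings_percent(VERBOSE, TERSE)
    print(f"{saved:.0f}% fewer approximate tokens")
```

The sample pair was chosen so the arithmetic is easy to follow (20 words down to 5), not as evidence for the project's claim. Actual savings would depend on the model, the tokenizer, the task, and the compression mode selected.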
OpenAI CTO Shane Sweeney contributed code to the project, adding native support for Codex. The move signals institutional interest in token-efficiency tooling as inference costs remain a friction point for enterprise AI deployments.


