Hexllama desktop app cuts llama.cpp CLI overhead with one-click templates
Open-source desktop wrapper eliminates command-line flag memorization, adds template manager, built-in version control, and direct HuggingFace GGUF downloads.
Hexllama is a desktop wrapper for llama.cpp that replaces command-line flag juggling with saved templates and a built-in version manager. Released this week under MIT license, the tool targets practitioners who run llama-server locally but want to skip the bash-script overhead when switching models or testing new architectures.
The app's core feature is template-based execution: configure threads, context size, batch settings, and other CLI flags once in a visual editor, save the profile, then launch any model with one click. A built-in llama.cpp version manager polls the ggml-org repository for new releases, downloads them directly, and swaps backends on demand—useful when a new model architecture requires a specific build. The integrated HuggingFace downloader lets users search and pull GGUF files without leaving the interface; when a download completes, Hexllama auto-generates a baseline template from the model's parameter metadata.
Multi-model and API modes
Hexllama supports running multiple models simultaneously on different ports. Users can launch each instance in "Chat UI" mode, which opens llama.cpp's native web interface, or "API Only" mode to serve silently in the background for tools like SillyTavern or OpenWebUI. The multi-tab design eliminates the need to manage separate terminal windows when testing two models side by side.
The project is available now at andercoder.com/hexllama as pre-compiled releases for Windows, macOS, and Linux, or buildable from source. The code is open under MIT license.
