Hedy ships full offline meeting summaries with Qwen 3.5/3.6 on M4 Max
AI meeting app Hedy now runs full transcription, summaries, detailed notes, and chat entirely on-device using llama.cpp and Whisper, with Qwen models from 2B to 35B and no cloud fallback.
Hedy, an AI meeting assistant, shipped full offline meeting processing this week: transcription, summaries, detailed notes, and chat with the meeting all run on-device with Wi-Fi off. The app's founder demonstrated the end-to-end flow on an M4 Max with no network connection, using llama.cpp for inference and Qwen 3.5/3.6 models for language tasks. Speech recognition has always run locally via whisper.cpp and Parakeet; the new release extends that local-only approach to the entire AI pipeline.
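A pipeline like the one described, local speech recognition feeding a local LLM, can be sketched as two CLI invocations chained together. This is an illustrative sketch only: the binary names, model paths, and prompt wording below are assumptions for demonstration, not Hedy's actual implementation.

```python
# Hypothetical offline meeting pipeline: whisper.cpp transcribes audio,
# then llama.cpp summarizes the transcript. No network access required.
# Binary names and model paths are placeholder assumptions.
import subprocess

WHISPER_BIN = "whisper-cli"                  # whisper.cpp CLI (assumed name)
LLAMA_BIN = "llama-cli"                      # llama.cpp CLI (assumed name)
WHISPER_MODEL = "models/ggml-base.en.bin"    # placeholder Whisper model path
LLM_MODEL = "models/qwen.gguf"               # placeholder Qwen GGUF path


def build_summary_prompt(transcript: str) -> str:
    """Wrap a raw transcript in a summarization instruction for the LLM."""
    return (
        "Summarize the following meeting transcript as concise bullet points, "
        "then list action items.\n\nTranscript:\n" + transcript
    )


def transcribe(audio_path: str) -> str:
    """Run whisper.cpp locally on a WAV file and return the transcript text."""
    out = subprocess.run(
        [WHISPER_BIN, "-m", WHISPER_MODEL, "-f", audio_path, "--no-timestamps"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def summarize(transcript: str) -> str:
    """Run llama.cpp locally with the summarization prompt."""
    out = subprocess.run(
        [LLAMA_BIN, "-m", LLM_MODEL, "-p", build_summary_prompt(transcript),
         "-n", "512"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout


if __name__ == "__main__":
    # End-to-end: audio in, summary out, with Wi-Fi off.
    print(summarize(transcribe("meeting.wav")))
```

In a shipping app the same flow would typically go through the libraries' C APIs rather than shelling out, but the division of labor is the same: one local model for speech, one for language.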
