mm-ctx turns images, video, audio, and PDFs into context for LLM agents, with 60ms metadata queries
A new command-line tool converts images, video, audio, and PDFs into LLM-readable context, with sub-100ms metadata queries and automatic file indexing.

mm-ctx is a multimodal file indexer that turns images, video, audio, and PDFs into text context for language model agents. The tool is built on a Rust core with a Python wrapper. It indexes files automatically on first use and exposes a Unix-style command interface that agents use to query and describe media.
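The "index on first use" behavior can be pictured as a lazily built cache. The Python sketch below is illustrative only and does not reflect mm-ctx's actual Rust implementation or storage format: the class name `LazyIndex`, the SQLite table `files`, and the extension-to-kind mapping are all hypothetical. It scans a directory only the first time a query needs the index, then answers later queries from the stored rows.

```python
import os
import sqlite3
from pathlib import Path

# Hypothetical extension-to-modality map; not mm-ctx's real file-type detection.
KINDS = {
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".mp4": "video", ".mov": "video",
    ".mp3": "audio", ".wav": "audio",
    ".pdf": "pdf",
}


class LazyIndex:
    """Hypothetical metadata index that is built the first time it is queried."""

    def __init__(self, root):
        self.root = Path(root)
        self.conn = sqlite3.connect(":memory:")
        self.built = False

    def _build(self):
        # Walk the tree once and record basic metadata for recognized media files.
        self.conn.execute(
            "CREATE TABLE files (path TEXT PRIMARY KEY, kind TEXT, size_bytes INTEGER)"
        )
        rows = []
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                kind = KINDS.get(Path(name).suffix.lower())
                if kind is not None:
                    full = Path(dirpath) / name
                    rows.append((str(full), kind, full.stat().st_size))
        self.conn.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
        self.conn.commit()
        self.built = True

    def query(self, sql, params=()):
        # Index on first use: build lazily, then answer every query from SQLite.
        if not self.built:
            self._build()
        return self.conn.execute(sql, params).fetchall()
```

A call such as `LazyIndex("media").query("SELECT kind, COUNT(*) FROM files GROUP BY kind")` pays the scan cost once; every later query reads only the stored metadata, which is the general reason metadata lookups can stay fast after the first run.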
The tool handles image captioning, object detection, video keyframe summarization, PDF text extraction, and audio transcription through a single mm cat command. Agents can search files with mm find, count tokens to stay within context budgets with mm wc, and run SQL queries against file metadata with mm sql. Metadata operations take roughly 60 milliseconds on a collection of about 700 files.
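Budgeting context by token count can be illustrated with a short sketch. It does not reproduce mm wc: the function names `estimate_tokens` and `fit_to_budget` are hypothetical, and the ratio of about four characters per token is a rough rule of thumb for English text, not the tokenizer mm-ctx uses. Given per-file text descriptions (for example, the kind of output mm cat produces), it greedily keeps those that fit within a token budget.

```python
# Assumption: roughly 4 characters per token for English text.
# Real tokenizers, and mm-ctx's mm wc, may count differently.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    # Ceiling division, so any non-empty text costs at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def fit_to_budget(descriptions: dict[str, str], budget: int) -> list[str]:
    """Greedily pick file descriptions, in the given order, until the budget is spent."""
    chosen = []
    used = 0
    for path, text in descriptions.items():
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # skip items that would overflow; smaller later items may still fit
        chosen.append(path)
        used += cost
    return chosen
```

With a 20-token budget and descriptions estimated at 10, 100, and 5 tokens, the first and third files are kept. Skipping an oversized item instead of stopping at it lets smaller descriptions later in the list still fit.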