IndusAgent detects manufacturing defects zero-shot across five industrial benchmarks
A tool-augmented agentic framework combines multimodal LLMs with dynamic region cropping and reinforcement learning to spot subtle manufacturing anomalies without task-specific training.

IndusAgent is a tool-augmented framework for open-vocabulary industrial anomaly detection that addresses a persistent limitation in multimodal large language models: while these models excel at zero-shot visual reasoning, they struggle with the fine-grained structural analysis required to spot subtle manufacturing defects.
The system orchestrates three external tools (dynamic region cropping, high-frequency feature enhancement, and prior retrieval) that the agent invokes only when visual ambiguities arise. To teach the agent when and how to deploy this toolset, the researchers built Indus-CoT, a structured dataset that pairs global visual observations with high-resolution local patches and expert normalcy priors, and used it to supervise fine-tuning on rigorous industrial inspection trajectories. A gated reinforcement learning objective jointly optimizes anomaly classification, localization accuracy, anomaly type reasoning, and efficient tool usage, so the agent is discouraged from spending compute on unnecessary tool calls.
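To make the idea of a gated objective concrete, here is a minimal Python sketch of one way such a reward could be shaped. The summary above does not give the formula, so everything in it is an assumption for illustration: the reading of "gated" as "secondary terms count only when the anomalous/normal decision is correct", the `Rollout` fields, the weights, and the per-call tool cost are all hypothetical and are not taken from the preprint.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One inspection trajectory. All field names are hypothetical."""
    pred_anomalous: bool   # agent's anomalous/normal decision
    true_anomalous: bool   # ground-truth label
    iou: float             # overlap of predicted and true defect region, in [0, 1]
    type_correct: bool     # predicted anomaly type matches the label
    tool_calls: int        # number of external tool invocations


def gated_reward(r: Rollout,
                 w_loc: float = 0.5,
                 w_type: float = 0.3,
                 tool_cost: float = 0.05,
                 max_tools: int = 4) -> float:
    """Illustrative gated reward (assumed form, not the authors' objective)."""
    # Gate: a wrong anomalous/normal decision earns nothing, so localization,
    # type, and tool terms cannot compensate for a misclassification.
    if r.pred_anomalous != r.true_anomalous:
        return 0.0

    reward = 1.0  # base reward for a correct classification

    # Localization and type terms only make sense for truly anomalous images.
    if r.true_anomalous:
        reward += w_loc * r.iou
        reward += w_type * float(r.type_correct)

    # Efficiency term: each tool call costs a little, capped at max_tools calls.
    reward -= tool_cost * min(r.tool_calls, max_tools)
    return reward


if __name__ == "__main__":
    frugal = Rollout(True, True, iou=0.8, type_correct=True, tool_calls=1)
    wasteful = Rollout(True, True, iou=0.8, type_correct=True, tool_calls=4)
    print(gated_reward(frugal) > gated_reward(wasteful))
```

Under these assumptions, two trajectories that reach the same answer are ranked by how few tool calls they needed, while a trajectory that misclassifies the image receives no credit no matter how well it localizes. This is one plausible way to express "invoke tools only when needed" as a training signal.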
Evaluation across five benchmarks
IndusAgent was tested on five standard industrial anomaly benchmarks: MVTec-AD, VisA, MPDD, DTD, and SDD. The authors report state-of-the-art zero-shot performance on all five, outperforming existing methods that rely on task-specific training or domain-aligned reasoning. They present these results as evidence of the approach's robustness and generalization in real-world manufacturing scenarios where labeled anomaly data is scarce or unavailable.
The preprint, authored by Rongbin Tan, Fangfang Lin, Zhenlong Yuan, Min Qiu, Kejin Cui, and Mengmeng Wang, details the gated reinforcement learning objective and the Indus-CoT dataset construction—both central to the framework's ability to disentangle subtle anomalies from normal variations in industrial imagery.