MODIAD framework cuts edge-device training overhead for multimodal factory anomaly detection
New arXiv preprint proposes distributed scheduling and low-rank adaptation to coordinate multi-class anomaly detection across heterogeneous industrial sensors in real time.

MODIAD is a framework addressing multimodal industrial anomaly detection in distributed, continuously streaming environments. Posted to arXiv on May 26, 2026, the preprint tackles the gap between centralized offline methods and the reality of modern factories where heterogeneous sensors generate data at edge devices capable of local model training. The authors formulate a Multi-class Intelligent Scheduling problem that balances data sufficiency against class update frequency, then solve it with a Sequential Marginal Gain Greedy algorithm to coordinate cross-class model updates under resource constraints.
Industrial anomaly detection has historically relied on centralized systems that process batches of unimodal sensor data offline. As factories deploy diverse sensor arrays (vision, thermal, acoustic, vibration), the detection task becomes multimodal, and the volume of streaming data makes centralized processing impractical. Edge intelligence offers a way forward: modern edge devices can train models locally and share updates across the network, enabling collaborative detection without shipping raw data to a central server. MODIAD is designed for this architecture, coordinating when and how each device updates its portion of a shared anomaly classifier.
The core contribution is the Multi-class Intelligent Scheduling problem, which the paper frames as a resource allocation challenge. In a factory with multiple product classes and limited edge compute, the scheduler must decide which class to train on next, weighing how much fresh data has accumulated for a class against how stale the current model for that class has become. The Sequential Marginal Gain Greedy algorithm approximates an optimal schedule by iteratively selecting the class update that yields the highest marginal improvement in detection performance per unit of resource consumed.
To reduce the cost of each update, the authors introduce REC-LoRA, short for Resource Efficient Class-Wise Low Rank Adaptation. Low-rank adaptation techniques, popularized in large language model fine-tuning, express each weight update as the product of two much smaller matrices, so far fewer parameters need to be trained. REC-LoRA applies this principle class-wise, so each edge device trains only a compact adapter for its assigned class rather than full model weights.
In evaluations on MVTec 3D-AD and Eyecandies, two public multimodal industrial anomaly detection benchmarks, the authors report better detection performance and efficiency than baseline methods in the distributed online scenario.
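
The greedy selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `ClassState` fields, the `marginal_gain` proxy (square root of buffered samples times a staleness factor), the one-update-per-class-per-round limit, and the example numbers are all assumptions chosen only to show the "highest gain per unit of resource" loop.

```python
from dataclasses import dataclass


@dataclass
class ClassState:
    name: str
    pending_samples: int  # new samples buffered since the last update
    staleness: int        # scheduling rounds since the last update
    cost: float           # resource units one update would consume (> 0)


def marginal_gain(c: ClassState) -> float:
    # Hypothetical gain proxy, not taken from the paper: more buffered data
    # and a staler model both raise the value of an update, with
    # diminishing returns on data volume.
    return (c.pending_samples ** 0.5) * (1 + c.staleness)


def greedy_schedule(classes, budget):
    """Pick class updates by best gain-per-cost until the budget runs out."""
    remaining = budget
    candidates = list(classes)
    chosen = []
    while True:
        affordable = [c for c in candidates if c.cost <= remaining]
        if not affordable:
            break
        best = max(affordable, key=lambda c: marginal_gain(c) / c.cost)
        chosen.append(best.name)
        remaining -= best.cost
        candidates.remove(best)  # assumed: at most one update per class per round
    return chosen


# Illustrative numbers only.
demo = [
    ClassState("A", pending_samples=400, staleness=1, cost=4.0),
    ClassState("B", pending_samples=100, staleness=3, cost=2.0),
    ClassState("C", pending_samples=25, staleness=0, cost=1.0),
]
print(greedy_schedule(demo, budget=5.0))  # ['B', 'C']
```

In this toy run, class B wins the first pick because its gain-per-cost ratio is highest, class A no longer fits in the remaining budget, and class C fills the leftover capacity.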

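The class-wise adapter idea can be sketched the same way. The article does not give REC-LoRA's internals, so the code below shows only generic low-rank adaptation with one adapter per class: a frozen shared weight matrix W plus a per-class update B·A of rank r. The `ClassWiseLoRA` class, its method names, and the matrix sizes are hypothetical.

```python
def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class ClassWiseLoRA:
    """Frozen shared weights plus one small low-rank adapter per class."""

    def __init__(self, weight):
        self.weight = weight  # frozen d_out x d_in matrix, shared by all classes
        self.adapters = {}    # class name -> (A, B)

    def add_adapter(self, cls, a, b):
        # a has shape r x d_in, b has shape d_out x r; only these are trained.
        self.adapters[cls] = (a, b)

    def forward(self, cls, x):
        base = matvec(self.weight, x)
        if cls not in self.adapters:
            return base
        a, b = self.adapters[cls]
        delta = matvec(b, matvec(a, x))  # B (A x), the low-rank correction
        return [y + d for y, d in zip(base, delta)]


def adapter_params(d_out, d_in, rank):
    """Trainable parameters in one rank-r adapter."""
    return rank * (d_in + d_out)


layer = ClassWiseLoRA([[1, 0, 0], [0, 1, 0]])
layer.add_adapter("A", a=[[1, 1, 1]], b=[[0.5], [-0.5]])
print(layer.forward("A", [1, 2, 3]))   # [4.0, -1.0]
print(layer.forward("B", [1, 2, 3]))   # [1, 2], no adapter for this class yet
print(adapter_params(1024, 1024, 8), "vs", 1024 * 1024)  # 16384 vs 1048576
```

For a hypothetical 1024 x 1024 layer, a rank-8 adapter trains 16,384 parameters instead of 1,048,576, which is the kind of saving that makes per-class updates plausible on constrained edge hardware.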
