Automated MoE search on RTX 4090 identifies ShuffleNet-MobileNetV3 as top pairing
A 28-day automated search of 4-expert Mixture-of-Experts architectures evaluated 1,021 candidates and identified ShuffleNet-MobileNetV3 ensembles as the highest-performing combinations, with mean accuracy of up to 0.632.

A new arXiv preprint describes an automated pipeline that generated and evaluated 4,463 candidate Mixture-of-Experts architectures over 28 days on a single NVIDIA RTX 4090. The pipeline, part of the open-source NNGPT project, systematically combines base architecture families from the LEMUR neural network dataset into 4-expert ensembles. Each ensemble uses a convolutional gating network with temperature scaling to weight its experts and is trained with mixup augmentation. Of the 4,463 candidates generated across 197 batches, 1,021 were successfully evaluated.
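The plain-Python sketch below illustrates two ingredients named above: a temperature-scaled gate that blends the outputs of four experts, and mixup, which trains on convex blends of input pairs and their labels. It is a minimal illustration under stated assumptions, not the NNGPT implementation. The gate logits are passed in directly instead of coming from a convolutional gating network, each expert is assumed to emit a class-probability vector, and the names `tempered_softmax`, `moe_combine`, and `mixup`, plus the default `alpha=0.2`, are hypothetical.

```python
import math
import random


def tempered_softmax(logits, temperature=1.0):
    """Softmax over gate logits divided by a temperature T.

    T > 1 flattens the expert weights; T < 1 sharpens them toward one expert.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def moe_combine(expert_probs, gate_logits, temperature=1.0):
    """Blend per-expert class-probability vectors with tempered gate weights.

    `gate_logits` stands in for the output of a (hypothetical) gating network:
    one score per expert.
    """
    weights = tempered_softmax(gate_logits, temperature)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Mixup: blend two inputs and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = random.Random()
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


if __name__ == "__main__":
    # Four experts, three classes: each row is one expert's predicted probabilities.
    experts = [
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.3, 0.3, 0.4],
        [0.2, 0.2, 0.6],
    ]
    gate = [2.0, 1.0, 0.0, -1.0]
    for t in (0.5, 1.0, 4.0):
        weights = [round(v, 3) for v in tempered_softmax(gate, t)]
        blended = [round(p, 3) for p in moe_combine(experts, gate, t)]
        print(f"T={t}: weights={weights} blended={blended}")

    x, y, lam = mixup([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1], rng=random.Random(0))
    print(f"lambda={lam:.3f} x={x} y={y}")
```

At T = 0.5 most of the gate weight lands on the first expert, while at T = 4.0 the weights move toward uniform. Averaging expert probabilities is one common way to combine experts; mixing logits before a final softmax is another, and the article does not say which the pipeline uses.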
The highest-accuracy ensembles paired ShuffleNet and MobileNetV3, reaching a mean accuracy of up to 0.632. These two families consistently appeared together in the top-performing heterogeneous 4-expert configurations, while FractalNet and MNASNet emerged as low-yield families warranting exclusion from future campaigns.
The work also uncovered a critical bias in the search space. Python's itertools.combinations emits tuples in the lexicographic order of its input, and the generator enumerated families alphabetically, so the entire explored set was anchored to a single family, AirNet, and covered only 4.8% of the 23,751 theoretically possible 4-family combinations. The authors traced the bias to the generator's deterministic ordering and propose stratified random sampling as a fix. The corrected generator, analysis artefacts, and pipeline are available at https://github.com/ABrain-One/nn-gpt.
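The ordering bias is easy to reproduce. Since 23,751 equals C(29, 4), the sketch below assumes a pool of 29 families; only AirNet, FractalNet, MNASNet, MobileNetV3, and ShuffleNet come from the article, and the 24 `OtherNN` names are placeholders. The budget of 1,140 combinations is an illustrative figure (about 4.8% of 23,751), not a count reported by the authors. `stratified_random` is one hypothetical reading of "stratified random sampling" that stratifies by an anchor family chosen round-robin; the corrected generator in the nn-gpt repository may stratify differently.

```python
import itertools
import math
import random
from collections import Counter

# Five names come from the article; the "OtherNN" entries are hypothetical
# placeholders that bring the pool to 29 families (C(29, 4) = 23,751).
FAMILIES = sorted(
    ["AirNet", "FractalNet", "MNASNet", "MobileNetV3", "ShuffleNet"]
    + [f"Other{i:02d}" for i in range(1, 25)]
)
TOTAL = math.comb(len(FAMILIES), 4)

# Illustrative budget: roughly 4.8% of the space (0.048 * 23,751 ~ 1,140).
BUDGET = 1140


def truncated_lexicographic(families, k, budget):
    """Biased generator: the first `budget` k-combinations in itertools order."""
    return list(itertools.islice(itertools.combinations(families, k), budget))


def stratified_random(families, k, budget, seed=0):
    """Hypothetical stratified sampler.

    Cycles through the families as strata (anchors) and, for each anchor,
    draws a random k-combination that contains it. Duplicates are skipped.
    """
    if budget > math.comb(len(families), k):
        raise ValueError("budget exceeds the number of distinct combinations")
    rng = random.Random(seed)
    chosen = set()
    i = 0
    while len(chosen) < budget:
        anchor = families[i % len(families)]
        others = rng.sample([f for f in families if f != anchor], k - 1)
        chosen.add(tuple(sorted([anchor, *others])))
        i += 1
    return sorted(chosen)


if __name__ == "__main__":
    biased = truncated_lexicographic(FAMILIES, 4, BUDGET)
    print("total combinations:", TOTAL)  # 23751
    print("first members seen:", {combo[0] for combo in biased})  # {'AirNet'}
    print("coverage:", round(len(biased) / TOTAL, 3))  # 0.048

    fixed = stratified_random(FAMILIES, 4, BUDGET, seed=0)
    counts = Counter(f for combo in fixed for f in combo)
    print("families covered:", len(counts))
    print("share containing AirNet:", round(counts["AirNet"] / len(fixed), 3))
```

Because C(28, 3) = 3,276 combinations begin with AirNet, any lexicographic run shorter than that never leaves the AirNet block, which matches the anchoring the authors report. The round-robin anchor spreads the budget across all 29 families while still drawing the other three members at random.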



