Research

Automated MoE search on RTX 4090 identifies ShuffleNet-MobileNetV3 as top pairing

A 28-day automated search of 4-expert Mixture-of-Experts architectures evaluated 1,021 candidates and identified ShuffleNet-MobileNetV3 ensembles as the highest-performing combinations, reaching 0.632 mean accuracy.

By Alex Sokoloff · June 24, 2026

A new arXiv preprint describes an automated pipeline that generated and evaluated 4,463 candidate Mixture-of-Experts architectures over 28 days on a single NVIDIA RTX 4090. The pipeline, part of the open-source NNGPT project, systematically combines base architecture families from the LEMUR neural network dataset into 4-expert ensembles, each governed by a convolutional gating network with temperature scaling and mixup augmentation. Of the 4,463 candidates generated across 197 batches, 1,021 were successfully evaluated.
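To illustrate what temperature scaling does to a gate's output, here is a minimal sketch in plain Python. It is not the preprint's gating network: the convolutional gate is replaced by hand-picked logits, and the function names `softmax_with_temperature` and `mix_experts` and all numeric values are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn gate logits into weights; higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(expert_probs, gate_logits, temperature=1.0):
    """Weighted average of per-expert class probabilities."""
    weights = softmax_with_temperature(gate_logits, temperature)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]


# Hypothetical values: four experts, three classes, fixed gate logits.
EXPERT_PROBS = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.1, 0.8],
]
GATE_LOGITS = [2.0, 1.0, 0.5, -1.0]

sharp = softmax_with_temperature(GATE_LOGITS, temperature=0.5)
flat = softmax_with_temperature(GATE_LOGITS, temperature=5.0)
mixed = mix_experts(EXPERT_PROBS, GATE_LOGITS, temperature=1.0)

print(sharp)
print(flat)
print(mixed)
```

A low temperature concentrates weight on the expert the gate prefers, while a high temperature spreads weight more evenly across all four experts.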

The highest-accuracy ensembles paired ShuffleNet and MobileNetV3, reaching a mean accuracy of up to 0.632. These two families consistently appeared together in the top-performing heterogeneous 4-expert configurations, while FractalNet and MNASNet emerged as low-yield families warranting exclusion from future campaigns. The work also uncovered a critical bias in the search space. Because the generator enumerated combinations alphabetically with Python's itertools.combinations, the entire explored set was anchored to a single family, AirNet, and covered only 4.8% of the 23,751 theoretically possible 4-family combinations. The authors traced this to the generator's deterministic ordering and propose a stratified random sampling fix. The corrected generator, analysis artefacts, and pipeline are available at https://github.com/ABrain-One/nn-gpt.
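The enumeration bias can be reproduced in a few lines of Python. The sketch below is not the authors' generator or their fix. It assumes a pool of 29 families, a figure inferred from C(29, 4) = 23,751 rather than stated in the article, and it pads the five families named here with hypothetical `FamilyNN` placeholders. The budget of 1,021 draws and the `balanced_random_sample` helper are likewise illustrative, the helper being one possible reading of "stratified random sampling".

```python
import itertools
import math
import random

# Five family names come from the article; the "FamilyNN" entries are
# hypothetical placeholders so the pool reaches 29, the size implied by
# C(29, 4) = 23,751.
NAMED = ["AirNet", "FractalNet", "MNASNet", "MobileNetV3", "ShuffleNet"]
FAMILIES = sorted(NAMED + [f"Family{i:02d}" for i in range(24)])
K = 4
N_DRAWN = 1021  # illustrative budget, not a claim about the real pipeline


def sequential_prefix(families, k, n):
    """First n combinations in itertools' lexicographic order."""
    return list(itertools.islice(itertools.combinations(families, k), n))


def balanced_random_sample(families, k, n, seed=0):
    """Draw n distinct k-family combinations, favouring under-used families."""
    if n > math.comb(len(families), k):
        raise ValueError("n exceeds the number of distinct combinations")
    rng = random.Random(seed)
    counts = {f: 0 for f in families}
    seen, out = set(), []
    while len(out) < n:
        # Rank by usage so far; the random component breaks ties.
        ranked = sorted(families, key=lambda f: (counts[f], rng.random()))
        combo = tuple(sorted(ranked[:k]))
        if combo in seen:
            # Fall back to a uniform draw when the balanced pick repeats.
            combo = tuple(sorted(rng.sample(families, k)))
            if combo in seen:
                continue
        seen.add(combo)
        out.append(combo)
        for f in combo:
            counts[f] += 1
    return out


def share_containing(combos, family):
    """Fraction of combinations that include the given family."""
    return sum(family in c for c in combos) / len(combos)


prefix = sequential_prefix(FAMILIES, K, N_DRAWN)
sample = balanced_random_sample(FAMILIES, K, N_DRAWN)

print(math.comb(len(FAMILIES), K))          # size of the full search space
print(share_containing(prefix, "AirNet"))   # sequential order: all AirNet
print(share_containing(sample, "AirNet"))   # balanced draw: far lower share
print(len({f for c in sample for f in c}))  # families reached by the sample
```

With an alphabetically sorted pool, `itertools.combinations` emits all 3,276 combinations that start with the first family before any other. Any budget smaller than that therefore never leaves AirNet, whereas the balanced draw touches every family.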
