Mind Lab scales LoRA to millions of users on trillion-parameter models

Researchers propose infrastructure to cache millions of personal adapters on a single trillion-parameter base model, stabilize ultra-compact LoRA training under reinforcement learning, and demonstrate emergent collective intelligence across adapter populations.

By Alex Sokoloff · June 8, 2026

Mind Lab's new arXiv preprint, On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters, tackles the economics of personalization at scale. The paper describes a system that treats parameter-efficient fine-tuning (LoRA and similar adapter methods) not as a cost-saving trick but as the foundation for serving millions of users with deeply customized AI. The authors introduce mathematical stabilization techniques that let ultra-compact adapters train reliably under reinforcement learning, design a caching architecture that hosts millions of those adapters on a single trillion-parameter base model, and report that populations of diverse adapters exhibit emergent collective intelligence: each adapter specializes while the population as a whole improves.

Training and storing a full trillion-parameter model per user is economically impossible; even inference costs would spiral. Mind Lab's approach splits the problem: a shared "biological" base model carries the general knowledge, while tiny, continuously updated personal adapters capture individual preferences and behavior. The paper argues this division unlocks large-scale user simulation and collective AI systems that were previously out of reach. The mathematical work focuses on keeping those tiny adapters stable during reinforcement learning, a regime where standard LoRA training can collapse when rank is pushed too low.
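
The shared-base, personal-adapter split can be illustrated with the standard LoRA formulation, in which a user's update to a frozen d×k weight W is a rank-r product B·A scaled by alpha/r. This is a minimal sketch under that assumption, not Mind Lab's code: the names LoRAAdapter, lora_forward and adapter_params are hypothetical, and the paper's RL stabilization techniques are not reproduced here.

```python
from dataclasses import dataclass

# A matrix is a list of rows; plain lists keep the sketch dependency-free.
Matrix = list[list[float]]


def matvec(m: Matrix, v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Hypothetical per-user adapter: delta_W = (alpha / r) * B @ A."""
    A: Matrix     # shape (r, k): projects the input down to rank r
    B: Matrix     # shape (d, r): projects back up to the output dimension
    alpha: float  # scaling hyperparameter from the standard LoRA formulation

    @property
    def rank(self) -> int:
        return len(self.A)


def lora_forward(W: Matrix, adapter: LoRAAdapter, x: list[float]) -> list[float]:
    """Compute h = W x + (alpha / r) * B (A x).

    W is the shared base weight and is only read, never modified, so one copy
    can serve every user; only the small A and B matrices differ per user.
    """
    base = matvec(W, x)
    low_rank = matvec(adapter.B, matvec(adapter.A, x))
    scale = adapter.alpha / adapter.rank
    return [b + scale * u for b, u in zip(base, low_rank)]


def adapter_params(d: int, k: int, r: int) -> int:
    """Trainable values a rank-r adapter adds to one d x k weight matrix."""
    return r * (d + k)


W = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 shared weight
alice = LoRAAdapter(A=[[1.0, 1.0]], B=[[1.0], [0.0]], alpha=1.0)  # rank 1
print(lora_forward(W, alice, [1.0, 2.0]))  # [4.0, 2.0]
print(adapter_params(8192, 8192, 4), 8192 * 8192)  # 65536 67108864
```

With r = 4 on a hypothetical 8192×8192 projection, the adapter adds 65,536 trainable values against about 67 million in the frozen matrix, a 1,024-fold reduction for that layer. That gap is why one adapter per user is plausible to store when a full model copy per user is not.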

The preprint includes no code release and no public model weights, so replication will wait on implementation details the paper leaves implicit. The caching architecture is described at a high level—sharding strategies, memory hierarchies, adapter routing—but the actual serving layer remains a blueprint. The emergent-intelligence claim rests on experiments with synthetic user populations; real-world deployment at million-user scale will test whether the collective effects hold under production load and adversarial drift.
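
Because the paper describes its serving layer only at a high level, the memory-hierarchy idea is easiest to picture with a generic two-tier LRU cache: a small "hot" tier standing in for GPU-resident adapters, a larger "warm" tier standing in for host memory, and a loader for cold misses from storage. Everything in this sketch is an assumption for illustration. The TieredAdapterCache class, the tier sizes, and the load_from_storage callback are hypothetical, and sharding across machines is left out entirely.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Generic, TypeVar

V = TypeVar("V")


class TieredAdapterCache(Generic[V]):
    """Hypothetical two-tier LRU cache for per-user adapters.

    'hot' stands in for a small, fast tier (e.g. GPU memory); 'warm' stands in
    for a larger, slower tier (e.g. host RAM). Misses in both tiers call the
    supplied loader, which stands in for persistent storage.
    """

    def __init__(self, hot_slots: int, warm_slots: int,
                 load_from_storage: Callable[[Hashable], V]) -> None:
        self.hot: OrderedDict = OrderedDict()   # user_id -> adapter, LRU first
        self.warm: OrderedDict = OrderedDict()  # user_id -> adapter, LRU first
        self.hot_slots = hot_slots
        self.warm_slots = warm_slots
        self.load_from_storage = load_from_storage
        self.stats = {"hot": 0, "warm": 0, "cold": 0}

    def get(self, user_id: Hashable) -> V:
        """Route a request to the user's adapter, promoting it to the hot tier."""
        if user_id in self.hot:
            self.hot.move_to_end(user_id)  # mark as most recently used
            self.stats["hot"] += 1
            return self.hot[user_id]
        if user_id in self.warm:
            adapter = self.warm.pop(user_id)  # promote from the warm tier
            self.stats["warm"] += 1
        else:
            adapter = self.load_from_storage(user_id)  # cold miss
            self.stats["cold"] += 1
        self._put_hot(user_id, adapter)
        return adapter

    def _put_hot(self, user_id: Hashable, adapter: V) -> None:
        self.hot[user_id] = adapter
        if len(self.hot) > self.hot_slots:
            # Demote the least recently used hot adapter to the warm tier.
            old_id, old_adapter = self.hot.popitem(last=False)
            self.warm[old_id] = old_adapter
            if len(self.warm) > self.warm_slots:
                # Drop the least recently used warm adapter; storage still has it.
                self.warm.popitem(last=False)
```

In this sketch, a user whose adapter has fallen out of both tiers simply triggers another storage load. A production system would likely need more, such as batching requests that share the base model and prefetching adapters for active users, details the preprint does not spell out.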

If the next wave of foundation models reaches the ten-trillion-parameter mark, infrastructure of this kind may become the only economically viable path to personalization. The key questions are whether Mind Lab or a competitor ships an open reference implementation of the caching system, and whether the stabilization math generalizes beyond the specific RL setup the authors tested. Adapter populations under continuous update churn may also degrade in ways the paper's synthetic experiments don't capture.
