Queryable LoRA routes parameter updates through shared attention memory
A new preprint proposes replacing static LoRA adapters with a shared queryable memory that routes input-dependent combinations of low-rank update atoms via attention, improving fine-tuning stability and test performance while keeping parameter counts comparable.

Researchers at the University of Texas at Austin and Harvard have published a preprint describing Queryable LoRA, a parameter-efficient fine-tuning method that replaces fixed layer-local LoRA adapters with a shared memory of low-rank update atoms. Posted to arXiv on May 12, 2026, the paper presents a system in which each block of layers forms a query from its current low-rank state and a running summary of previous blocks, then uses attention to retrieve a content-dependent combination of shared update components. The resulting routed operator is applied within the low-rank bottleneck, allowing the effective update to vary across inputs while sharing reusable structure across layers.
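No reference implementation accompanies the summary above, but the mechanism it describes can be sketched in a few lines of PyTorch. Everything below is an illustrative reading of that description, not the authors' code: the class names, the rank × rank form of the atoms, and the shape of the running summary are all assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAtomMemory(nn.Module):
    """A pool of num_atoms learnable rank x rank update atoms, each paired
    with an attention key. A single instance is shared by all adapted layers."""
    def __init__(self, num_atoms: int, rank: int, key_dim: int):
        super().__init__()
        self.key_dim = key_dim
        self.keys = nn.Parameter(torch.randn(num_atoms, key_dim))
        self.atoms = nn.Parameter(torch.randn(num_atoms, rank, rank) / math.sqrt(rank))

class QueryableLoRA(nn.Module):
    """One adapted layer: a LoRA bottleneck whose inner operator is an
    attention-weighted mixture of the shared atoms."""
    def __init__(self, d_model: int, rank: int, memory: SharedAtomMemory):
        super().__init__()
        self.memory = memory
        self.A = nn.Linear(d_model, rank, bias=False)       # down-projection
        self.B = nn.Linear(rank, d_model, bias=False)       # up-projection
        self.to_query = nn.Linear(2 * rank, memory.key_dim, bias=False)
        nn.init.zeros_(self.B.weight)                       # update starts at zero, as in LoRA

    def forward(self, x: torch.Tensor, summary: torch.Tensor):
        # x: (batch, d_model); summary: (batch, rank) -- a running summary of
        # earlier blocks (its exact form in the paper is assumed here).
        h = self.A(x)                                       # current low-rank state
        q = self.to_query(torch.cat([h, summary], dim=-1))  # query from state + summary
        logits = q @ self.memory.keys.T / math.sqrt(self.memory.key_dim)
        w = F.softmax(logits, dim=-1)                       # attention over atoms
        op = torch.einsum("bk,krs->brs", w, self.memory.atoms)
        h = torch.einsum("brs,bs->br", op, h)               # routed operator in the bottleneck
        return self.B(h), summary + h                       # layer update, updated summary
```

In this reading, the adapter's output would be added to the frozen layer's output exactly as in standard LoRA, and the updated summary would be threaded into the next block's query.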
Standard LoRA methods restrict each layer's update to a fixed low-rank form, ΔW = BA for learned matrices B and A, a static parameterization that becomes rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Queryable LoRA addresses this by maintaining a shared pool of update atoms and routing among them dynamically. The authors also incorporate instruction-regularization, augmenting the routing logits with a language-induced prior over update atoms to bias the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. The design sits between static LoRA-style updates and fully generated parameter updates, retaining the efficiency and scalability of low-rank adaptation while supporting dynamic, context-sensitive adaptation.
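The instruction-regularization is described above only at the level of routing logits. One plausible reading, sketched below, adds a prior computed from an embedding of the task instruction to the content-dependent logits; the projection prior_proj, the embedding instr_emb, and the weight lam are hypothetical, not taken from the paper.

```python
import math
import torch.nn.functional as F

def instruction_regularized_routing(q, keys, instr_emb, prior_proj, lam=0.5):
    """Hypothetical reading of instruction-regularization: routing logits are
    the sum of content logits (from the layer's query) and a language-induced
    prior over atoms. lam and prior_proj are illustrative, not from the paper.
    q: (batch, key_dim); keys: (num_atoms, key_dim); instr_emb: (batch, instr_dim)."""
    scale = math.sqrt(keys.shape[-1])
    content_logits = q @ keys.T / scale                     # content-dependent routing
    prior_logits = prior_proj(instr_emb) @ keys.T / scale   # language-induced prior
    return F.softmax(content_logits + lam * prior_logits, dim=-1)
```

With lam = 0 this reduces to purely content-based routing, which is one way to see the regularizer as a bias on atom selection rather than a replacement for the learned routing.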
Experiments on noisy non-linear regression tasks and on LLM fine-tuning show that the queryable update-memory formulation improves final test performance and training stability over standard low-rank adaptation while using a comparable number of trainable parameters. The preprint does not yet include large-scale LLM benchmarks or ablation studies isolating the contribution of instruction-regularization from the routing mechanism itself. Practitioners will want to see how the method scales to 70B+ parameter models and whether the shared-memory overhead remains negligible as the number of update atoms grows; a reference implementation and reproducibility study should clarify both.