Queryable LoRA routes parameter updates through shared attention memory
A new preprint proposes replacing static LoRA adapters with a shared queryable memory that routes input-dependent combinations of low-rank update atoms via attention, improving fine-tuning stability and test performance while keeping parameter counts comparable.

Researchers at the University of Texas at Austin and Harvard have published a preprint describing Queryable LoRA, a parameter-efficient fine-tuning method that replaces fixed layer-local LoRA adapters with a shared memory of low-rank update atoms. Posted to arXiv on May 12, 2026, the paper presents a system in which each block of layers forms a query from its current low-rank state and a running summary of previous blocks, then uses attention to retrieve a content-dependent combination of shared update components. The resulting routed operator is applied within the low-rank bottleneck, allowing the effective update to vary across inputs while sharing reusable structure across layers.
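No reference implementation accompanies the summary above, but the mechanism it describes can be sketched in a few lines of PyTorch. Everything below is an illustrative reading of that description, not the authors' code: the class names, the rank × rank form of the atoms, and the shape of the running summary are all assumptions.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedAtomMemory(nn.Module):
    """A pool of num_atoms learnable rank x rank update atoms, each paired
    with an attention key. A single instance is shared by all adapted layers."""
    def __init__(self, num_atoms: int, rank: int, key_dim: int):
        super().__init__()
        self.key_dim = key_dim
        self.keys = nn.Parameter(torch.randn(num_atoms, key_dim))
        self.atoms = nn.Parameter(torch.randn(num_atoms, rank, rank) / math.sqrt(rank))

class QueryableLoRA(nn.Module):
    """One adapted layer: a LoRA bottleneck whose inner operator is an
    attention-weighted mixture of the shared atoms."""
    def __init__(self, d_model: int, rank: int, memory: SharedAtomMemory):
        super().__init__()
        self.memory = memory
        self.A = nn.Linear(d_model, rank, bias=False)       # down-projection
        self.B = nn.Linear(rank, d_model, bias=False)       # up-projection
        self.to_query = nn.Linear(2 * rank, memory.key_dim, bias=False)
        nn.init.zeros_(self.B.weight)                       # update starts at zero, as in LoRA

    def forward(self, x: torch.Tensor, summary: torch.Tensor):
        # x: (batch, d_model); summary: (batch, rank) -- a running summary of
        # earlier blocks (its exact form in the paper is assumed here).
        h = self.A(x)                                       # current low-rank state
        q = self.to_query(torch.cat([h, summary], dim=-1))  # query from state + summary
        logits = q @ self.memory.keys.T / math.sqrt(self.memory.key_dim)
        w = F.softmax(logits, dim=-1)                       # attention over atoms
        op = torch.einsum("bk,krs->brs", w, self.memory.atoms)
        h = torch.einsum("brs,bs->br", op, h)               # routed operator in the bottleneck
        return self.B(h), summary + h                       # layer update, updated summary
```

In this reading, the adapter's output would be added to the frozen layer's output exactly as in standard LoRA, and the updated summary would be threaded into the next block's query.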
Standard LoRA methods restrict each layer's update to a fixed low-rank form, ΔW = BA for learned matrices B and A, a static parameterization that becomes rigid when the appropriate correction depends on the input and on the evolving depth-wise computation of the network. Queryable LoRA addresses this by maintaining a shared pool of update atoms and routing among them dynamically. The authors also incorporate instruction-regularization, augmenting the routing logits with a language-induced prior over update atoms to bias the selection of low-rank transformations toward semantically relevant directions without generating unconstrained parameter updates. The design sits between static LoRA-style updates and fully generated parameter updates, retaining the efficiency and scalability of low-rank adaptation while supporting dynamic, context-sensitive adaptation.
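The instruction-regularization is described above only at the level of routing logits. One plausible reading, sketched below, adds a prior computed from an embedding of the task instruction to the content-dependent logits; the projection prior_proj, the embedding instr_emb, and the weight lam are hypothetical, not taken from the paper.

```python
import math
import torch.nn.functional as F

def instruction_regularized_routing(q, keys, instr_emb, prior_proj, lam=0.5):
    """Hypothetical reading of instruction-regularization: routing logits are
    the sum of content logits (from the layer's query) and a language-induced
    prior over atoms. lam and prior_proj are illustrative, not from the paper.
    q: (batch, key_dim); keys: (num_atoms, key_dim); instr_emb: (batch, instr_dim)."""
    scale = math.sqrt(keys.shape[-1])
    content_logits = q @ keys.T / scale                     # content-dependent routing
    prior_logits = prior_proj(instr_emb) @ keys.T / scale   # language-induced prior
    return F.softmax(content_logits + lam * prior_logits, dim=-1)
```

With lam = 0 this reduces to purely content-based routing, which is one way to see the regularizer as a bias on atom selection rather than a replacement for the learned routing.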
Experiments on noisy non-linear regression tasks and on LLM fine-tuning show that the queryable update-memory formulation improves final test performance and training stability over standard low-rank adaptation while using a comparable number of trainable parameters. The preprint does not yet include large-scale LLM benchmarks or ablation studies isolating the contribution of instruction-regularization from the routing mechanism itself. Practitioners will want to see how the method scales to 70B+ parameter models and whether the shared-memory overhead remains negligible as the number of update atoms grows; a reference implementation and reproducibility study should clarify both.