CapVector extracts finetuning gains as reusable parameter vectors
Researchers distill the gains from auxiliary training objectives into transferable capability vectors that enhance vision-language-action models without the computational cost of multi-loss finetuning.

CapVector is a parameter-space method that addresses a core inefficiency in vision-language-action (VLA) model adaptation: pretrained VLA models often fail to improve efficiently under standard supervised finetuning (SFT), while the auxiliary-objective methods that do work well carry significant computational overhead from their extra loss terms. The technique trains a model to convergence on a small task set under two distinct strategies, differing in whether the auxiliary objectives are applied, producing two finetuned checkpoints. The parameter difference between those checkpoints is extracted as a capability vector, a representation of what the auxiliary objectives contribute, and merged with the original pretrained weights to form a capability-enhanced meta model. When standard SFT on this merged model is augmented with a lightweight orthogonal regularization loss, it matches the performance of full auxiliary-objective baselines at lower computational cost.
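
The extraction and merge steps amount to simple parameter arithmetic, in the spirit of task-vector editing. The sketch below assumes the checkpoints are PyTorch state_dicts with matching keys; the function names and the scaling factor alpha are illustrative, not taken from the paper.

```python
import torch

def extract_capability_vector(aux_ckpt: dict, sft_ckpt: dict) -> dict:
    """Capability vector: parameter-wise difference between the checkpoint
    trained with auxiliary objectives and the plain-SFT checkpoint."""
    return {k: aux_ckpt[k] - sft_ckpt[k] for k in aux_ckpt}

def merge_capability(pretrained: dict, cap_vec: dict, alpha: float = 1.0) -> dict:
    """Add the (optionally scaled) capability vector to the pretrained
    weights, yielding the capability-enhanced meta model."""
    return {k: pretrained[k] + alpha * cap_vec[k] for k in pretrained}

# Usage with three checkpoints loaded as state_dicts (paths hypothetical):
# cap_vec = extract_capability_vector(torch.load("aux.pt"), torch.load("sft.pt"))
# meta = merge_capability(torch.load("pretrained.pt"), cap_vec)
```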
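The summary does not specify the form of the orthogonal regularization, but one natural reading is a penalty that keeps the SFT update orthogonal to the capability vector, so task fitting does not erode the merged-in capability direction. A minimal sketch under that assumption follows; all names and the loss weight are hypothetical.

```python
import torch

def orthogonal_reg_loss(model, meta_state, cap_vec, eps=1e-8):
    """Penalize the squared cosine similarity between the current SFT update
    (live parameters minus the merged meta-model parameters) and the
    capability vector, discouraging updates along that direction.
    meta_state and cap_vec are detached tensors keyed by parameter name."""
    delta = torch.cat([(p - meta_state[n]).flatten()
                       for n, p in model.named_parameters()])
    vec = torch.cat([cap_vec[n].flatten()
                     for n, _ in model.named_parameters()])
    cos = torch.dot(delta, vec) / (delta.norm() * vec.norm() + eps)
    return cos ** 2

# Inside the SFT loop (the 0.1 weight is an assumption):
# loss = action_loss + 0.1 * orthogonal_reg_loss(model, meta_state, cap_vec)
```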
Evaluations on both internal and external benchmarks show the capability vectors generalize across diverse model architectures and transfer to novel environments and embodiments without retraining. The approach decouples the dual goals of auxiliary training (enhancing general capabilities and fitting task-specific action distributions) into separate parameter-space operations, preserving the simplicity of standard SFT while capturing the performance gains of more complex methods. The preprint was posted May 12, 2026.