B-spline decoupling cuts transformer parameters while preserving accuracy
New arXiv preprint introduces R-CMTF-BSD, a B-spline-based compression method that reduces Vision Transformer and Swin Transformer parameter counts through structured approximation without major accuracy loss.
Researchers have published a compression technique for transformer models that replaces the polynomial and piecewise-linear functions used in earlier decoupling methods with B-splines, basis functions that the authors say offer better numerical stability and expressiveness. The method, called R-CMTF-BSD (Robust Constrained Matrix-Tensor Factorization with B-Spline Decoupling), decomposes transformer weights into linear transformations and univariate nonlinear components, then parameterizes those nonlinear components with B-splines.
Decoupling methods approximate a multivariate function as a composition of simpler pieces: a linear map, a bank of univariate nonlinear functions, and a second linear map. The result is effectively a single-hidden-layer network with flexible, learned activations. Earlier tensor-based compression schemes relied on polynomial or piecewise-linear parameterizations for the nonlinear parts, but those choices can suffer from numerical instability or limited expressiveness. B-splines are intended to address both problems: each basis function is nonzero only over a few adjacent knot intervals (local support), and smoothness and flexibility can be tuned through the spline degree and knot placement.
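To make the decoupled form concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It evaluates f(x) = W g(Vᵀx), where each univariate branch g_i is a B-spline built with the standard Cox-de Boor recursion. The names `bspline_basis` and `decoupled_eval`, the matrices `V`, `W`, `C`, the knot vector, and all sizes are illustrative assumptions, and clipping the inner values to the knot range is a simplification made only for this sketch.

```python
import numpy as np

def bspline_basis(z, knots, degree):
    """Return the (len(z), n_basis) matrix of B-spline basis values (Cox-de Boor)."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval [t_j, t_{j+1}).
    B = ((z[:, None] >= t[None, :-1]) & (z[:, None] < t[None, 1:])).astype(float)
    # Close the last non-empty interval on the right so z == t[-1] is covered.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[z == t[-1], last] = 1.0
    for p in range(1, degree + 1):
        n_basis = len(t) - p - 1
        new = np.zeros((len(z), n_basis))
        for j in range(n_basis):
            d1 = t[j + p] - t[j]
            d2 = t[j + p + 1] - t[j + 1]
            if d1 > 0:  # skip terms with zero-width support (repeated knots)
                new[:, j] += (z - t[j]) / d1 * B[:, j]
            if d2 > 0:
                new[:, j] += (t[j + p + 1] - z) / d2 * B[:, j + 1]
        B = new
    return B

def decoupled_eval(X, V, W, C, knots, degree):
    """Evaluate f(x) = W g(V^T x) for each row x of X (shape (N, m))."""
    Z = X @ V                                # (N, r): inner linear map
    Z = np.clip(Z, knots[0], knots[-1])      # sketch-only: stay inside the spline domain
    G = np.empty_like(Z)
    for i in range(Z.shape[1]):              # one univariate spline per branch
        G[:, i] = bspline_basis(Z[:, i], knots, degree) @ C[i]
    return G @ W.T                           # (N, n): outer linear map

# Hypothetical sizes: m inputs, r branches, n outputs.
rng = np.random.default_rng(0)
m, r, n, degree = 5, 3, 4, 3
# Clamped knot vector on [-1, 1]: end knots repeated degree + 1 times.
knots = np.concatenate([[-1.0] * degree, np.linspace(-1.0, 1.0, 6), [1.0] * degree])
n_basis = len(knots) - degree - 1

V = 0.3 * rng.normal(size=(m, r))
W = rng.normal(size=(n, r))
C = rng.normal(size=(r, n_basis))            # spline coefficients, one row per branch
X = rng.normal(size=(10, m))
Y = decoupled_eval(X, V, W, C, knots, degree)

# Local support: at any point, at most degree + 1 basis functions are nonzero.
basis = bspline_basis(np.linspace(-1.0, 1.0, 50), knots, degree)
print(Y.shape, np.count_nonzero(basis, axis=1).max())
```

The final check illustrates the local-support property mentioned above: the basis values sum to one at every point, and no point touches more than `degree + 1` basis functions, so changing one coefficient only alters the branch function over a few knot intervals.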
The authors tested R-CMTF-BSD on Vision Transformer and Swin Transformer architectures, reporting "substantial parameter reduction while maintaining competitive accuracy." The algorithm incorporates Tikhonov regularization and normalization steps in an alternating least-squares loop to improve robustness during training.
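The preprint's exact update rules are not reproduced here. As a rough illustration of why Tikhonov regularization helps inside a least-squares loop, the sketch below fits the coefficients of a single spline branch by solving the regularized normal equations. The function names, the `lam` values, and the sine target are assumptions chosen for the example, and the normalization step is omitted.

```python
import numpy as np

def bspline_basis(z, knots, degree):
    """Return the (len(z), n_basis) matrix of B-spline basis values (Cox-de Boor)."""
    z = np.atleast_1d(np.asarray(z, dtype=float))
    t = np.asarray(knots, dtype=float)
    B = ((z[:, None] >= t[None, :-1]) & (z[:, None] < t[None, 1:])).astype(float)
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[z == t[-1], last] = 1.0                # include the right endpoint
    for p in range(1, degree + 1):
        n_basis = len(t) - p - 1
        new = np.zeros((len(z), n_basis))
        for j in range(n_basis):
            d1 = t[j + p] - t[j]
            d2 = t[j + p + 1] - t[j + 1]
            if d1 > 0:
                new[:, j] += (z - t[j]) / d1 * B[:, j]
            if d2 > 0:
                new[:, j] += (t[j + p + 1] - z) / d2 * B[:, j + 1]
        B = new
    return B

def ridge_spline_update(z, y, knots, degree, lam):
    """Tikhonov-regularized least squares: min_c ||B c - y||^2 + lam * ||c||^2."""
    B = bspline_basis(z, knots, degree)
    A = B.T @ B + lam * np.eye(B.shape[1])   # lam > 0 makes A positive definite
    return np.linalg.solve(A, B.T @ y)

degree = 3
knots = np.concatenate([[-1.0] * degree, np.linspace(-1.0, 1.0, 6), [1.0] * degree])

# Case 1: data covers the whole domain, so a tiny penalty barely changes the fit.
z_full = np.linspace(-1.0, 1.0, 200)
y_full = np.sin(np.pi * z_full)
c_full = ridge_spline_update(z_full, y_full, knots, degree, lam=1e-6)
rmse = np.sqrt(np.mean((bspline_basis(z_full, knots, degree) @ c_full - y_full) ** 2))

# Case 2: data covers only [-1, 0]. Basis functions supported on [0.2, 1] see no
# data, so B^T B is rank-deficient and the unregularized problem has no unique solution.
z_half = np.linspace(-1.0, 0.0, 100)
y_half = np.sin(np.pi * z_half)
B_half = bspline_basis(z_half, knots, degree)
rank = np.linalg.matrix_rank(B_half.T @ B_half)
c_half = ridge_spline_update(z_half, y_half, knots, degree, lam=1e-3)

print(f"full-domain RMSE: {rmse:.4f}")
print(f"rank of B^T B on half domain: {rank} of {B_half.shape[1]}")
print("coefficients with no supporting data:", c_half[-2:])
```

In the second case the penalty term is what makes the linear system solvable at all: coefficients that no data point constrains are pulled to zero instead of being left undetermined. In an alternating scheme, each factor update is a subproblem of this kind, which is one plausible reason to regularize every step.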
What stands out
- Generalization of existing methods. The B-spline formulation subsumes polynomial and piecewise-linear decoupling as special cases, making it a drop-in replacement for earlier tensor factorization schemes.
- Regularization built in. Tikhonov regularization penalizes large solutions in each least-squares subproblem; together with the normalization steps, it is meant to keep the alternating updates numerically stable when a subproblem is ill-conditioned.
- Transformer-specific validation. Experiments on Vision Transformer and Swin Transformer, two widely deployed vision architectures, show the method can compress parameter counts without catastrophic accuracy drops, though the preprint does not publish exact percentage reductions or final top-1 scores.
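The first point, that B-spline decoupling contains the polynomial and piecewise-linear variants as special cases, follows from standard properties of the B-spline basis. The notation below (W, V, g_i, coefficients c_ij, degree p, knots t_j) is assumed for illustration and may differ from the preprint's.

```latex
% Decoupled form: linear map, univariate branch functions, linear map
f(\mathbf{x}) = \mathbf{W}\, \mathbf{g}\!\left(\mathbf{V}^{\top}\mathbf{x}\right),
\qquad
g_i(z) = \sum_{j} c_{ij}\, B_{j,p}(z)

% Degree p = 1 with distinct knots t_j < t_{j+1} < t_{j+2}: hat functions,
% so each g_i is piecewise linear with breakpoints at the knots
B_{j,1}(z) =
\begin{cases}
\dfrac{z - t_j}{t_{j+1} - t_j}, & t_j \le z < t_{j+1},\\[4pt]
\dfrac{t_{j+2} - z}{t_{j+2} - t_{j+1}}, & t_{j+1} \le z < t_{j+2},\\[4pt]
0, & \text{otherwise}
\end{cases}

% No interior knots on [a, b], end knots repeated p + 1 times: Bernstein basis
B_{j,p}(z) = \binom{p}{j}\, u^{j} (1 - u)^{p - j},
\qquad u = \frac{z - a}{b - a}, \quad j = 0, \dots, p
```

Because the Bernstein polynomials span every polynomial of degree at most p on [a, b], the no-interior-knot case recovers polynomial decoupling, while p = 1 recovers the piecewise-linear case.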
