SAGA transformer cuts 20-year earnings forecast error by 37.7 percent

Decoder-only transformer trained on 61 million person-years of Swedish tax records beats parametric baselines and ships conformal prediction intervals with finite-sample coverage guarantees.

May 17, 2026

"Microsimulation models that central banks and finance ministries use to project lifetime earnings typically rely on parametric processes that capture only first and second moments of the conditional distribution, missing long-range nonlinear structure," researchers note in a new preprint.

Lundström-Imanov and Cömert propose SAGA, a decoder-only transformer for irregular tabular panel sequences paired with a split conformal calibration wrapper. The wrapper yields individual-level prediction intervals with finite-sample marginal coverage guarantees, which matters for policy work where confidence bounds carry as much weight as point estimates. The model is trained on the longitudinal Swedish LISA register for 1990 to 2022, covering 2,143,817 individuals and 61,284,903 person-years. It forecasts annual labor earnings at horizons of one to thirty years and aggregates the forecasts by Monte Carlo simulation into distributions of present-discounted lifetime earnings.
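For readers unfamiliar with split conformal calibration, the general recipe can be sketched in a few lines of Python. This is a minimal illustration, not the preprint's implementation: the absolute-residual score, the `simulate` helper, and the synthetic numbers are all assumptions made for the example, and the article does not say which score function SAGA uses.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_interval(point_forecast, q):
    """Symmetric interval around a point forecast with half-width q."""
    return point_forecast - q, point_forecast + q


def simulate(n, rng):
    """Hypothetical (truth, forecast) pairs standing in for a real model."""
    truth = [rng.gauss(300.0, 50.0) for _ in range(n)]
    forecast = [y + rng.gauss(0.0, 20.0) for y in truth]
    return truth, forecast


rng = random.Random(0)
alpha = 0.1  # target 90 percent marginal coverage

# Calibration split: held out from model fitting.
cal_truth, cal_forecast = simulate(2000, rng)
scores = [abs(y - f) for y, f in zip(cal_truth, cal_forecast)]
q = conformal_quantile(scores, alpha)

# Fresh test split: check empirical coverage of the calibrated intervals.
test_truth, test_forecast = simulate(2000, rng)
hits = 0
for y, f in zip(test_truth, test_forecast):
    lo, hi = split_conformal_interval(f, q)
    hits += lo <= y <= hi
coverage = hits / len(test_truth)
print(f"half-width={q:.1f}, empirical coverage={coverage:.3f}")
```

The (n + 1) correction in the quantile rank is what gives the finite-sample marginal guarantee when calibration and test points are exchangeable. The guarantee holds on average over the population, not for each subgroup separately.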

On the ten-year horizon, SAGA reduces the continuous ranked probability score (CRPS) by 31.9 percent compared with the canonical Guvenen, Karahan, Ozkan, and Song (GKOS) parametric process and with tabular and recurrent baselines. At twenty years, mean absolute error falls by 37.7 percent. The conformal intervals come within 0.4 percentage points of nominal coverage marginally and within 2.4 percentage points on the worst-case demographic subgroup, which suggests that calibration holds up reasonably well across population segments. The reconstructed lifetime earnings Gini coefficient is 0.327, against a partially observed truth of 0.341 and a GKOS estimate of 0.378.
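Two of the diagnostics quoted above are simple to compute. The sketch below shows one way to measure marginal and worst-subgroup coverage gaps and a sample Gini coefficient. The function names and toy inputs are hypothetical, and the preprint's exact estimators may differ. Only the three Gini values in the last lines come from the article.

```python
from collections import defaultdict


def coverage_gaps(covered, groups, nominal):
    """Return (marginal gap, worst subgroup gap) in percentage points."""
    marginal = sum(covered) / len(covered)
    by_group = defaultdict(list)
    for hit, group in zip(covered, groups):
        by_group[group].append(hit)
    worst = max(abs(sum(h) / len(h) - nominal) for h in by_group.values())
    return abs(marginal - nominal) * 100, worst * 100


def gini(values):
    """Sample Gini coefficient of non-negative values (rank formula)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Toy coverage check: 9 of 10 intervals cover, split across two groups.
covered = [True] * 9 + [False]
groups = ["a"] * 5 + ["b"] * 5
marginal_gap, worst_gap = coverage_gaps(covered, groups, nominal=0.9)
print(f"marginal gap={marginal_gap:.1f} pp, worst subgroup gap={worst_gap:.1f} pp")

# Gini gaps implied by the figures reported in the article.
truth, saga, gkos = 0.341, 0.327, 0.378
print(round(abs(saga - truth), 3), round(abs(gkos - truth), 3))
```

On the reported figures, SAGA's Gini estimate sits 0.014 below the partially observed truth, while the GKOS estimate sits 0.037 above it.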

Model weights, calibration tables, and a synthetic equivalent dataset are released for replication outside the protected SCB MONA environment. The preprint appeared on arXiv on May 20, 2026.
