TAP steers diffusion inpainting toward high-utility tabular rows, lifting accuracy by up to 15.6 points
A new diffusion-based method couples a learner-conditioned policy with inpainting to generate synthetic tabular data that directly improves downstream model performance under severe data scarcity.
TAP (Tabular Augmentation Policy), a method from researchers at the University of Tübingen and Sapienza University of Rome, addresses a persistent gap in synthetic tabular data generation: models that produce plausible-looking rows often fail to improve downstream classifiers or regressors. The authors formalize this as a fidelity-utility gap: common generative objectives optimize for distributional plausibility rather than for reduction in held-out evaluation loss. TAP couples diffusion inpainting with a learner-conditioned policy that decides which rows to generate and when to inject them as training evolves, using explicit gating and a conservative windowed commitment mechanism to avoid poisoning the learner with low-utility samples.
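The gating and windowed commitment idea can be sketched in a few lines. The sketch below is illustrative, not the authors' code: `utility_gate` accepts a synthetic batch only if it lowers held-out loss, and `WindowedCommitment` commits rows only after the gate passes for several consecutive checks. The names, the streak-based window, and the threshold are all assumptions.

```python
def utility_gate(val_loss_with: float, val_loss_without: float,
                 threshold: float = 0.0) -> bool:
    """Accept a synthetic batch only if it reduces held-out evaluation loss.

    Hypothetical gate: the paper's exact utility signal is not specified here.
    """
    return (val_loss_without - val_loss_with) > threshold


class WindowedCommitment:
    """Commit synthetic rows only after `window` consecutive gate passes.

    A conservative buffer like this avoids poisoning the learner with
    batches that help once by chance. Illustrative sketch, not TAP's code.
    """

    def __init__(self, window: int = 3):
        self.window = window
        self.streak = 0  # consecutive successful gate checks

    def update(self, passed_gate: bool) -> bool:
        """Record one gate check; return True when the batch should be committed."""
        self.streak = self.streak + 1 if passed_gate else 0
        if self.streak >= self.window:
            self.streak = 0  # reset after committing
            return True
        return False
```

With `window=2`, a batch is committed only on the second consecutive check where training with it beats training without it; a single failed check resets the streak.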
Tested on seven real-world tabular datasets under severe data scarcity, TAP improved classification accuracy by up to 15.6 percentage points over strong generative baselines and reduced regression RMSE by up to 32 percent. The policy layer is lightweight: it conditions on the current state of the learner rather than requiring a separate large model. The diffusion inpainting component fills missing or low-confidence feature values rather than generating entire rows from scratch. The method is designed for domains where labeled data is expensive and augmentation must directly serve the training objective rather than match summary statistics.
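One way to picture the inpainting component: build a per-feature mask over entries that are missing or held with low confidence, then at each denoising step composite the model's sample into only the masked positions while re-noised known values anchor the rest (a RePaint-style update). This is a minimal sketch under those assumptions; `tau`, the mask rule, and the composition are hypothetical, not TAP's published procedure.

```python
import numpy as np

def inpaint_mask(row: np.ndarray, confidence: np.ndarray,
                 tau: float = 0.8) -> np.ndarray:
    """True where a feature should be regenerated: missing (NaN) or
    held with confidence below the (assumed) threshold `tau`."""
    return np.isnan(row) | (confidence < tau)


def composite_step(x_t: np.ndarray, known_row: np.ndarray,
                   mask: np.ndarray, noise_scale: float,
                   seed: int = 0) -> np.ndarray:
    """One RePaint-style composition: keep re-noised known values outside
    the mask, let the diffusion sample `x_t` fill the masked entries."""
    rng = np.random.default_rng(seed)
    noised_known = known_row + noise_scale * rng.standard_normal(known_row.shape)
    return np.where(mask, x_t, noised_known)
```

Filling only masked entries keeps trusted feature values intact, which matches the article's point that TAP repairs rows rather than generating them from scratch.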
