SENSE cuts urban energy prediction error by up to 11% using diffusion-generated satellite data
Researchers released SENSE, a diffusion framework that jointly synthesizes satellite imagery and building energy maps. It reduces normalized mean bias error (NMBE) by 3–11% across four cities and generates synthetic training data from minimal labeled examples.

SENSE is a generative urban building energy modeling framework that synthesizes satellite imagery alongside aligned building energy consumption and height maps. The model, detailed in a preprint released this week, conditions on road networks and urban density metrics to generate physically consistent energy annotations in latent space. Experiments across New York City, Boston, Lyon, and Busan show the approach meets ASHRAE Standard 14 requirements while producing visually realistic outputs.
The framework addresses a critical gap in urban energy modeling: most existing methods are predictive rather than generative, and aligned high-resolution energy data paired with satellite imagery remains scarce. SENSE leverages knowledge from large vision models to generate energy consumption and building height information simultaneously. The authors report that synthetic data generated from less than 20% of the labeled energy data improves downstream prediction performance by 10% IoU.
On benchmarks
Compared to state-of-the-art urban energy prediction methods, SENSE reduced normalized mean bias error (NMBE) by 3–11% and coefficient of variation of root mean square error (CV(RMSE)) by 1–9% across the four test cities. Visual fidelity and physical consistency held across diverse urban contexts, from dense North American grids to European and East Asian layouts.
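For readers unfamiliar with the two error metrics, the sketch below shows one common way to compute normalized mean bias error and the coefficient of variation of root mean square error from measured and predicted energy values. It is an illustration only, not the authors' evaluation code: the function names, the sample numbers, the sign convention (measured minus predicted), and the degrees-of-freedom adjustment `p` are all assumptions, and the preprint may normalize differently.

```python
import math


def nmbe(measured, predicted, p=1):
    """Normalized mean bias error, in percent.

    Assumed convention: residual = measured - predicted, so a positive
    value means the model under-predicts on average. `p` is an assumed
    degrees-of-freedom adjustment in the denominator (n - p).
    """
    n = len(measured)
    mean_measured = sum(measured) / n
    bias = sum(m - q for m, q in zip(measured, predicted))
    return 100.0 * bias / ((n - p) * mean_measured)


def cv_rmse(measured, predicted, p=1):
    """Coefficient of variation of the RMSE, in percent."""
    n = len(measured)
    mean_measured = sum(measured) / n
    squared_error = sum((m - q) ** 2 for m, q in zip(measured, predicted))
    rmse = math.sqrt(squared_error / (n - p))
    return 100.0 * rmse / mean_measured


# Hypothetical energy values for four buildings (units are arbitrary).
measured = [100.0, 120.0, 80.0, 100.0]
predicted = [90.0, 110.0, 80.0, 100.0]

print(f"NMBE:     {nmbe(measured, predicted):.2f}%")
print(f"CV(RMSE): {cv_rmse(measured, predicted):.2f}%")
```

NMBE captures systematic over- or under-prediction, since positive and negative errors cancel, while CV(RMSE) captures the overall spread of the errors. That is why the two are typically reported together.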
The dataset and code are available on Hugging Face and GitHub. The work targets applications in energy-efficient urban planning and aligns with UN Sustainable Development Goals 7 (affordable and clean energy) and 11 (sustainable cities and communities).