M2Retinexformer fuses depth and semantics for low-light image recovery
A new multi-modal framework extends Retinexformer by fusing depth maps, luminance priors, and semantic features through cross-attention, showing gains on four benchmarks.
M2Retinexformer, a low-light image enhancement framework from researchers at Cairo University and the German University in Cairo, extends the single-modality Retinexformer architecture by incorporating depth cues, luminance priors, and semantic features. The method addresses the core challenge of low-light enhancement—amplified noise, color distortion, and artifacts—by adding geometric context that remains stable across lighting conditions. Depth provides scene structure independent of illumination, while luminance and semantic features guide brightness distribution and scene understanding.
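The article does not spell out how these auxiliary cues are produced, so the snippet below is only a minimal PyTorch sketch of how they might be assembled per image: the channel-mean luminance prior follows the common Retinex-style convention, while the depth and semantic extractors are placeholders for whatever off-the-shelf networks the authors actually use. Class and function names here are illustrative, not the paper's.

```python
# Hypothetical preparation of the three auxiliary cues (not the authors' code).
import torch
import torch.nn as nn


def luminance_prior(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) in [0, 1] -> (B, 1, H, W) mean-intensity map."""
    return rgb.mean(dim=1, keepdim=True)


class AuxiliaryCues(nn.Module):
    def __init__(self, depth_net: nn.Module, seg_net: nn.Module):
        super().__init__()
        # Frozen stand-ins for a monocular depth estimator and a
        # semantic backbone; the paper's exact choices are not given here.
        self.depth_net = depth_net.eval()
        self.seg_net = seg_net.eval()

    @torch.no_grad()
    def forward(self, rgb: torch.Tensor):
        depth = self.depth_net(rgb)    # (B, 1, H, W) scene geometry
        semantics = self.seg_net(rgb)  # (B, C, H, W) semantic feature maps
        lum = luminance_prior(rgb)     # (B, 1, H, W) brightness prior
        return depth, semantics, lum


if __name__ == "__main__":
    # Dummy 1x1 convolutions just to exercise the plumbing.
    cues = AuxiliaryCues(nn.Conv2d(3, 1, 1), nn.Conv2d(3, 19, 1))
    d, s, l = cues(torch.rand(1, 3, 128, 128))
    print(d.shape, s.shape, l.shape)
```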
The architecture extracts these modalities at multiple scales and fuses them through cross-attention layers. An adaptive gating mechanism dynamically balances illumination-guided self-attention against cross-attention, depending on how reliable the auxiliary cues are in each region of the image. This marks a departure from the RGB-only methods that have dominated Retinex-based deep learning, where decomposing intensity alone often breaks down under severe degradation; the lighting-invariant depth prior and the scene context encoded by semantic features give the model extra signal for separating true structure from noise.
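The fusion block itself is not reproduced in this summary, but the following sketch illustrates the idea under stated assumptions: image-feature tokens attend to themselves conditioned on the illumination prior, attend across to the depth/semantic tokens, and a learned sigmoid gate blends the two branches per token. The names, shapes, and the simple additive illumination conditioning are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical gated cross-attention fusion block (illustrative only).
import torch
import torch.nn as nn


class GatedCrossAttentionFusion(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Illumination-guided self-attention over the image features.
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention: image features query the auxiliary modalities.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Per-token gate deciding how much to trust the auxiliary cues.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feat, illum_feat, aux_feat):
        # All inputs: (batch, tokens, dim). Additive conditioning on the
        # luminance prior is an assumption; the paper's mechanism may differ.
        q = self.norm(img_feat + illum_feat)
        self_out, _ = self.self_attn(q, q, q)
        # Cross-attention pulls structure from depth/semantic tokens.
        cross_out, _ = self.cross_attn(q, aux_feat, aux_feat)
        # Adaptive gate: blend the two branches token by token.
        g = self.gate(torch.cat([self_out, cross_out], dim=-1))
        return img_feat + g * cross_out + (1.0 - g) * self_out


if __name__ == "__main__":
    fuse = GatedCrossAttentionFusion(dim=64, heads=4)
    img = torch.randn(2, 256, 64)    # flattened 16x16 feature map
    illum = torch.randn(2, 256, 64)  # luminance-prior features
    aux = torch.randn(2, 256, 64)    # fused depth + semantic tokens
    print(fuse(img, illum, aux).shape)  # torch.Size([2, 256, 64])
```

Where the auxiliary cues are unreliable, for instance noisy depth in a nearly black region, a gate of this kind can fall back toward the self-attention branch, which is the behavior the adaptive gating mechanism is described as providing.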
Evaluations on the LOL, SID, SMID, and SDSD benchmarks show improvements over the original Retinexformer and recent state-of-the-art methods. LOL provides paired low-/normal-light images captured in controlled indoor settings, SID and SMID test enhancement on raw short-exposure sensor data, and SDSD contributes paired low-light footage of indoor and outdoor scenes. Across these varied conditions, the multi-modal fusion consistently outperformed single-modality baselines. Code and pretrained weights are available on GitHub.
