Synthetic layered design data outperforms proprietary datasets at 50K samples
A new HKUST paper shows that purely synthetic layered graphic-design datasets can train decomposition models more effectively than scarce proprietary alternatives, with performance gains plateauing around 50,000 samples.

Researchers at Hong Kong University of Science and Technology have published a data-centric study demonstrating that synthetic layered design data can outperform proprietary training sets for graphic decomposition tasks. The team, led by Kam Man Wu and colleagues, built SynLayers—a fully synthetic dataset of layered graphic elements—and trained models on it using the CLD baseline framework. The core finding: models trained exclusively on synthetic data beat those trained on PrismLayersPro, a widely used but non-scalable proprietary dataset, while offering unlimited generation capacity.
The research reports three concrete results. First, synthetic data alone is viable—no real proprietary assets required. Second, performance scales predictably with dataset size, improving steadily until around 50,000 training samples, after which gains flatten. Third, synthetic generation sidesteps the layer-count imbalance endemic to real-world design datasets, where certain layer configurations dominate and others barely appear. The team automated much of the pipeline: vision-language models generated textual supervision for each layer, and VLM-predicted bounding boxes fed directly into inference, reducing manual annotation overhead.
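The balanced-generation idea can be made concrete with a small sketch. The code below is illustrative only: the function and field names (`make_layer`, `make_dataset`, the `caption` stub) are hypothetical, not from the paper, and the caption is a placeholder where the paper's pipeline would call a vision-language model. The point it demonstrates is drawing layer counts round-robin so that no configuration dominates, unlike scraped real-world design corpora.

```python
import random

# Hypothetical sketch of a layer-count-balanced synthetic generator.
# All names here are illustrative, not the paper's actual API.

CANVAS = (512, 512)
LAYER_TYPES = ["background", "shape", "text", "logo"]

def make_layer(rng):
    """One synthetic layer: a type, a bounding box, and a caption stub."""
    w = rng.randint(32, CANVAS[0])
    h = rng.randint(32, CANVAS[1])
    x = rng.randint(0, CANVAS[0] - w)
    y = rng.randint(0, CANVAS[1] - h)
    kind = rng.choice(LAYER_TYPES)
    # In the paper's pipeline a VLM writes this textual supervision;
    # here it is just a placeholder string.
    return {"type": kind, "bbox": (x, y, w, h), "caption": f"a {kind} layer"}

def make_sample(rng, n_layers):
    return {"n_layers": n_layers,
            "layers": [make_layer(rng) for _ in range(n_layers)]}

def make_dataset(n_samples, min_layers=2, max_layers=8, seed=0):
    """Cycle through layer counts so every configuration appears
    equally often, sidestepping real-world layer-count imbalance."""
    rng = random.Random(seed)
    counts = list(range(min_layers, max_layers + 1))
    return [make_sample(rng, counts[i % len(counts)])
            for i in range(n_samples)]

data = make_dataset(7000)
```

Because the count cycles through seven values, each layer configuration appears exactly 1,000 times in 7,000 samples — the uniformity that the authors argue scraped design datasets lack.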
The assumption underpinning the work is that graphic design decomposition differs from natural-image layer separation. Design elements are typically modular and semantically distinct by intent—a logo, a headline, a background gradient—so inter-layer dependencies matter less than in photographic composition. That modularity, the authors argue, makes synthetic data a better fit for design decomposition than for natural scenes, where lighting, occlusion, and physical constraints tightly couple layers.
The saturation point at 50,000 samples suggests diminishing returns from scale alone; the next step is likely architectural or task-specific tuning rather than raw data volume. The study also leaves open how synthetic data performs on edge cases—highly stylized designs, complex transparency blending, or non-Latin scripts—and whether the approach generalizes to video or animated design workflows.
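A saturation point like the one reported is typically read off a dataset-size sweep: keep scaling until the per-step gain drops below a threshold. The sketch below shows that logic on made-up placeholder scores — neither the numbers nor the `saturation_point` helper come from the paper.

```python
# Illustrative sweep of (dataset size, validation metric) pairs.
# These scores are invented placeholders, not results from the paper.
sweep = [
    (5_000, 0.61), (10_000, 0.70), (25_000, 0.78),
    (50_000, 0.82), (100_000, 0.825), (200_000, 0.827),
]

def saturation_point(points, min_gain=0.01):
    """Return the first size after which adding more data improves the
    metric by less than `min_gain` in the next sweep step."""
    for (n0, s0), (n1, s1) in zip(points, points[1:]):
        if s1 - s0 < min_gain:
            return n0
    return points[-1][0]

print(saturation_point(sweep))  # → 50000 on these placeholder scores
```

On these toy numbers the knee lands at 50,000, mirroring the paper's reported plateau; in practice the threshold would be set relative to evaluation noise.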