Hugging Face agents orchestrate multi-step creative tasks by chaining Spaces
A new demonstration shows an AI agent orchestrating image generation and 3D scene construction by calling two separate Hugging Face Spaces in sequence, producing a navigable virtual gallery.
An AI agent can now build a three-dimensional virtual gallery by chaining calls to two different Hugging Face Spaces, one for image generation and another for 3D scene assembly, without human intervention between steps.
The demonstration, detailed in a Hugging Face blog post this week, uses the platform's Spaces infrastructure to let an agent request images from a generative model hosted in one Space, then pass those outputs to a second Space that constructs a 3D environment. The result is a navigable Paris-themed gallery where each generated image hangs on a virtual wall. Hugging Face Spaces are containerized web apps that run machine learning models and tools. The agent treats each Space as a callable function: it sends a prompt to the image-generation Space, waits for the output URL, then feeds that URL to the 3D-builder Space along with layout parameters.
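The two-step chain described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code from the blog post: the function names, the layout dictionary, and the example.invalid URLs are all hypothetical, and the two Spaces are replaced by local stand-in functions so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical types: each Space is modeled as a plain callable.
ImageSpace = Callable[[str], str]                    # prompt -> image URL
SceneSpace = Callable[[List[str], Dict], str]        # image URLs + layout -> scene URL


def build_gallery(prompts: List[str],
                  generate_image: ImageSpace,
                  build_scene: SceneSpace,
                  layout: Dict) -> str:
    """Call the image Space once per prompt, then hand every URL to the 3D Space."""
    image_urls = [generate_image(prompt) for prompt in prompts]
    return build_scene(image_urls, layout)


def fake_image_space(prompt: str) -> str:
    """Stand-in for the image-generation Space (assumption: it returns a URL)."""
    slug = prompt.lower().replace(" ", "-")
    return f"https://example.invalid/images/{slug}.png"


def fake_scene_space(image_urls: List[str], layout: Dict) -> str:
    """Stand-in for the 3D-builder Space (assumption: it returns a scene file URL)."""
    return (f"https://example.invalid/scenes/"
            f"gallery-{len(image_urls)}-walls{layout['walls']}.glb")


if __name__ == "__main__":
    scene = build_gallery(
        ["Eiffel Tower at dusk", "Seine"],
        fake_image_space,
        fake_scene_space,
        {"walls": 4},
    )
    print(scene)
```

Against real Spaces, each stand-in would presumably become a thin wrapper around a client call to the hosted app. The article does not name the Space IDs or endpoints, so they are left out here.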
The workflow demonstrates multi-step orchestration without custom API glue. The agent's reasoning loop decides which Space to call next based on the task state, and the Spaces themselves expose standardized input/output schemas the agent can parse. The 3D Space returns a scene file the agent can render or share. The blog post includes sample code showing how the agent constructs the chain and handles intermediate results.
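As a rough illustration of a loop that picks the next Space from the task state, the sketch below uses a fixed rule in place of the agent's model-driven reasoning. Everything in it is an assumption made for illustration: the `TaskState` fields, the tool names, and the placeholder URLs do not come from the blog post.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TaskState:
    """Hypothetical task state tracked between Space calls."""
    pending_prompts: List[str]
    image_urls: List[str] = field(default_factory=list)
    scene_url: Optional[str] = None


def choose_next_tool(state: TaskState) -> Optional[str]:
    """Rule-based stand-in for the agent's reasoning step."""
    if state.pending_prompts:
        return "generate_image"   # images still missing
    if state.scene_url is None:
        return "build_scene"      # all images ready, scene not built yet
    return None                   # nothing left to do


def generate_image(state: TaskState) -> None:
    """Stand-in for the image Space: consume one prompt, record one URL."""
    state.pending_prompts.pop(0)
    index = len(state.image_urls)
    state.image_urls.append(f"https://example.invalid/img/{index}.png")


def build_scene(state: TaskState) -> None:
    """Stand-in for the 3D Space: record a scene URL for the collected images."""
    state.scene_url = f"https://example.invalid/scene-{len(state.image_urls)}.glb"


def run_agent(state: TaskState,
              tools: Dict[str, Callable[[TaskState], None]],
              max_steps: int = 20) -> List[str]:
    """Loop until no tool is selected or the step budget runs out."""
    trace: List[str] = []
    for _ in range(max_steps):
        name = choose_next_tool(state)
        if name is None:
            break
        tools[name](state)
        trace.append(name)
    return trace


if __name__ == "__main__":
    state = TaskState(pending_prompts=["Louvre at night", "Montmartre street"])
    tools = {"generate_image": generate_image, "build_scene": build_scene}
    print(run_agent(state, tools), state.scene_url)
```

The `max_steps` cap is a design choice for the sketch: it keeps the loop from running forever if a tool fails to advance the state.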
No benchmark numbers or parameter counts are provided; the focus is the integration pattern rather than model performance. The Paris gallery theme is illustrative: the same chaining logic applies to any multi-stage creative task where one model's output feeds another's input, from video-to-animation workflows to text-to-scene-to-rendering pipelines.







