ChatGPT's four-year arc: from text box to multimodal canvas
A side-by-side comparison video shows how ChatGPT's interface has evolved from its November 2022 launch—text-only prompts and bare-bones responses—to today's voice, vision, and canvas-equipped experience.

A video comparison circulating this week captures the distance ChatGPT has traveled since its November 2022 debut. The left pane shows the original interface: a single text box, monochrome responses, no images, no voice, no memory across sessions. The right pane shows the current build—Advanced Voice Mode, canvas for iterative editing, inline image generation, persistent context, and real-time web search. The contrast is stark enough that newcomers often mistake the 2022 version for a prototype.
The original ChatGPT was a model fine-tuned from the GPT-3.5 series, wrapped in a minimal web UI. It could draft an email or explain a concept, but it couldn't see an image, hear a question, or carry anything over from one conversation to the next. In March 2023 OpenAI added GPT-4 and announced plugin support; image and voice input began rolling out in September 2023, with Advanced Voice Mode following in 2024; and in October 2024 canvas arrived for collaborative editing. Each addition required backend work (multimodal training for vision, low-latency speech models for voice, storage and retrieval of earlier context for memory), but the interface changes are what users see first.