OpenAI and Broadcom unveil Jalapeño, a custom LLM inference chip
OpenAI and Broadcom have unveiled Jalapeño, a purpose-built AI chip designed to accelerate large language model inference, with claimed gains in performance, efficiency, and scale.

OpenAI and Broadcom have unveiled Jalapeño, a custom AI chip built specifically for large language model inference. The chip is intended to improve the performance, efficiency, and scale of OpenAI's AI systems, and it marks the company's first publicly disclosed custom silicon effort in partnership with a major semiconductor vendor.
Jalapeño is optimized for the computational patterns of LLM inference, the phase in which a trained model generates responses to user queries. Inference typically accounts for the majority of compute costs in production AI systems at scale, and purpose-built chips can deliver significant efficiency gains over general-purpose GPUs by tailoring memory bandwidth, matrix multiplication units, and data paths to the operations that dominate transformer architectures. OpenAI has not disclosed the chip's transistor count, fabrication node, power draw, or performance benchmarks, but the chip is designed to slot into the company's existing infrastructure.
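To illustrate why memory bandwidth features so prominently in inference chip design, here is a rough back-of-envelope sketch in Python. It is not based on any disclosed Jalapeño specification: the function names, the 2-byte weight format, and the peak-compute and bandwidth figures are all hypothetical, and the model deliberately ignores KV-cache and activation traffic. It compares the arithmetic intensity of one token-generation step (FLOPs per byte of weights read) against a chip's "ridge point" (peak FLOP/s divided by memory bandwidth).

```python
def decode_arithmetic_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """Approximate FLOPs per byte of weight traffic for one decode step.

    Simplifying assumptions (hypothetical, for illustration only):
    - dense model, every weight is read from memory once per step
    - each weight contributes one multiply-add (2 FLOPs) per sequence in the batch
    - KV-cache and activation traffic are ignored
    """
    return 2.0 * batch_size / bytes_per_param


def is_memory_bound(batch_size: int, peak_flops: float, mem_bandwidth: float,
                    bytes_per_param: float = 2.0) -> bool:
    """Return True if the decode step sits below the chip's ridge point."""
    ridge_point = peak_flops / mem_bandwidth  # FLOPs the chip can do per byte moved
    return decode_arithmetic_intensity(batch_size, bytes_per_param) < ridge_point


if __name__ == "__main__":
    # Hypothetical accelerator: 1e15 FLOP/s peak compute, 3e12 bytes/s memory bandwidth.
    PEAK_FLOPS = 1e15
    MEM_BANDWIDTH = 3e12
    for batch in (1, 64, 512):
        bound = "memory-bound" if is_memory_bound(batch, PEAK_FLOPS, MEM_BANDWIDTH) else "compute-bound"
        print(f"batch={batch}: {decode_arithmetic_intensity(batch):.0f} FLOP/byte, {bound}")
```

Under these assumed figures the ridge point is roughly 333 FLOPs per byte, while a single-sequence decode step performs about 1 FLOP per byte of weights read. In this simplified model, small-batch inference is therefore limited by how fast weights can be streamed from memory rather than by raw matrix-multiply throughput, which is one reason an inference-specific design might trade compute units for memory bandwidth.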
The move follows a broader industry pattern of hyperscalers designing their own inference silicon to reduce reliance on Nvidia's datacenter GPUs. Google has shipped multiple generations of TPUs for both training and inference. Meta runs inference on custom MTIA chips. Amazon offers Inferentia and Trainium instances on AWS. Microsoft has disclosed work on Maia chips for Azure. OpenAI's partnership with Broadcom positions the company alongside those peers in the custom-silicon race, though OpenAI remains a customer of cloud infrastructure rather than a cloud provider selling inference capacity to third parties.
Broadcom has previously collaborated with Google on TPU designs and with Meta on custom networking silicon. Jalapeño represents the chipmaker's first known partnership with OpenAI, whose inference demand has grown alongside the rollout of GPT-4, GPT-4 Turbo, and the ChatGPT platform. The chip is expected to enter production deployments later in 2026.
OpenAI has not announced whether Jalapeño will also be used for fine-tuning workloads or remain dedicated to inference, nor whether the design will eventually be licensed to other AI labs or cloud providers. Inference costs remain a central bottleneck for deploying large models at consumer scale, and custom silicon is one lever OpenAI can pull to continue reducing API pricing without sacrificing margin.



