OpenAI just announced its first custom chip. Not a reference design. Not a paper. A real silicon accelerator called Jalapeño, co-developed with Broadcom, and it’s already running GPT-5.3-Codex-Spark in the lab at production target frequency and power.
This is OpenAI going vertical in a way that changes the conversation around inference cost, availability, and who really controls the AI stack.
Here’s what happened, what Jalapeño actually is, and why it matters for anyone who uses ChatGPT, Codex, or the API.
What is Jalapeño?
Jalapeño is OpenAI’s first Intelligence Processor — a custom accelerator architected from scratch for LLM inference. It is not a general-purpose chip adapted from earlier AI workloads. It’s a blank-slate design built around how modern LLMs actually run: the kernel patterns, the memory movement, the networking topology, and the serving infrastructure that frontier models need.
OpenAI designed it. Broadcom handled the silicon implementation, chip integration, board and rack system engineering, and high-performance networking (including its Tomahawk networking silicon). Celestica contributed additional board and rack-level system expertise.
The chip was physically delivered to Sam Altman and Greg Brockman by Broadcom CEO Hock Tan and President Charlie Kawwas. That handoff photo is the kind of thing that makes NVIDIA take notice.
Nine months from design to tape-out
This is maybe the most impressive operational detail in the announcement. Jalapeño went from initial design to manufacturing tape-out in nine months. OpenAI says this is "what we believe to be the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors."
That speed comes from three things:
- Deep software-hardware co-design. OpenAI’s engineering teams knew exactly what the chip needed to do because they operate the inference stacks for ChatGPT, Codex, and the API every day.
- Broadcom’s silicon implementation expertise. Broadcom has been making custom silicon at scale for decades.
- AI-assisted design. OpenAI used its own models to accelerate the chip design and optimization process. The kind of models you chat with helped design the hardware that will run future models.
That last point is worth sitting with. If AI can help engineers design better chips faster, it could lower the cost of compute for everyone.
Performance: better perf/watt, but the real numbers are coming
OpenAI says early testing shows Jalapeño will deliver “performance per watt substantially better than current state-of-the-art.” They’re not naming the baseline, so take that with the grain of salt it deserves until the detailed technical report drops “in the coming months.”
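For a sense of what a perf/watt claim means once real numbers exist, here is a minimal sketch. For inference, performance per watt reduces to tokens per joule, and from there to energy per million tokens. Every figure below is made up for illustration; the two accelerators and the function names are hypothetical, and none of it comes from OpenAI, Broadcom, or any vendor.

```python
# Hypothetical illustration: none of these numbers come from OpenAI or any vendor.

def tokens_per_joule(tokens_per_second: float, watts: float) -> float:
    """Performance per watt for inference: (tokens/s) / W = tokens per joule."""
    return tokens_per_second / watts


def kwh_per_million_tokens(tokens_per_second: float, watts: float) -> float:
    """Energy needed to generate one million tokens, in kilowatt-hours."""
    joules = 1_000_000 / tokens_per_joule(tokens_per_second, watts)
    return joules / 3.6e6  # 1 kWh = 3.6 million joules


# Two made-up accelerators serving the same model at the same throughput.
baseline = tokens_per_joule(tokens_per_second=10_000, watts=1_000)
candidate = tokens_per_joule(tokens_per_second=10_000, watts=500)

print(f"perf/watt ratio: {candidate / baseline:.1f}x")
print(f"baseline:  {kwh_per_million_tokens(10_000, 1_000):.4f} kWh per 1M tokens")
print(f"candidate: {kwh_per_million_tokens(10_000, 500):.4f} kWh per 1M tokens")
```

The ratio only means something relative to a named baseline running the same model at the same quality and latency target, which is exactly what the announcement leaves out.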
What we do know about the architecture:
- Reduced data movement. This is arguably the single biggest win in LLM inference: much of the energy and latency comes from moving data between memory and compute, especially during token-by-token decoding, which tends to be limited by memory bandwidth rather than arithmetic. A chip designed from scratch to minimize that movement has a structural advantage.
- Balanced compute, memory, and networking resources designed to achieve “realized utilization much closer to theoretical peak performance.” This reads as a jab at general-purpose GPUs, where realized utilization on inference can fall well short of theoretical peak because the architecture has to serve many kinds of workloads, not just transformer inference.
- Broadcom’s Tomahawk networking silicon enables large-scale deployment. Networking is the hidden bottleneck in distributed inference, and Broadcom is the market leader in datacenter switching silicon.
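The data-movement and utilization points above can be illustrated with a roofline-style estimate. This is a minimal sketch, not a model of Jalapeño or any real GPU: the peak compute and memory bandwidth figures are assumptions picked for round numbers, and activation and KV-cache traffic are ignored.

```python
# Roofline-style estimate of realized utilization during decoding.
# All hardware numbers are hypothetical and are NOT Jalapeño or GPU specs.

PEAK_FLOPS = 1.0e15       # assumed peak compute: 1 PFLOP/s
MEM_BANDWIDTH = 4.0e12    # assumed memory bandwidth: 4 TB/s
BYTES_PER_WEIGHT = 2      # 16-bit weights


def decode_intensity(batch_size: int) -> float:
    """FLOPs performed per byte of weights read in one decode step.

    Each weight is read once per step and used for one multiply-add
    (2 FLOPs) per sequence in the batch. Activation and KV-cache
    traffic are ignored to keep the sketch small.
    """
    return 2 * batch_size / BYTES_PER_WEIGHT


def utilization(batch_size: int) -> float:
    """Fraction of peak compute attainable under the roofline model."""
    attainable = min(PEAK_FLOPS, MEM_BANDWIDTH * decode_intensity(batch_size))
    return attainable / PEAK_FLOPS


for batch in (1, 8, 64, 512):
    print(f"batch={batch:4d}  utilization={utilization(batch):6.1%}")
```

Under these assumptions a single-sequence decode step can use well under 1% of peak compute, because the chip spends its time waiting on weights to arrive from memory. Larger batches, more bandwidth, or less data movement per token all push realized utilization closer to peak, which is the lever the announcement says Jalapeño was designed around.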
Multi-generation platform at gigawatt scale
Jalapeño is not a one-off. OpenAI and Broadcom are building a multi-generation compute platform together. The first deployment is targeted for the end of 2026, and they’re explicitly planning for gigawatt-scale data centers with Microsoft and other partners.
Hock Tan, Broadcom’s CEO: “This is just the beginning of a multi-generation roadmap.”
That scale is enormous. A gigawatt is roughly the output of one large nuclear reactor. We’re talking about AI infrastructure at a scale that rivals the largest cloud providers.
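To put a gigawatt in hardware terms, here is a back-of-envelope sketch. The facility overhead (PUE) and the per-accelerator power draw are assumptions chosen for easy arithmetic, not disclosed figures for Jalapeño or any deployment, and the function names are hypothetical.

```python
# Back-of-envelope only: every input below is an assumption, not a disclosed figure.

SITE_POWER_W = 1.0e9           # 1 gigawatt of facility power
PUE = 1.25                     # assumed power usage effectiveness (cooling, losses)
WATTS_PER_ACCELERATOR = 1_000  # assumed draw per accelerator, incl. its share of host and network


def accelerators_supported(site_power_w: float, pue: float, watts_each: float) -> int:
    """Number of accelerators a site can power after facility overhead."""
    it_power_w = site_power_w / pue  # power left for IT equipment
    return int(it_power_w // watts_each)


def annual_energy_twh(site_power_w: float) -> float:
    """Energy used in a year if the site runs at full power continuously."""
    hours_per_year = 24 * 365
    return site_power_w * hours_per_year / 1e12  # Wh -> TWh


print(accelerators_supported(SITE_POWER_W, PUE, WATTS_PER_ACCELERATOR))
print(f"{annual_energy_twh(SITE_POWER_W):.2f} TWh per year")
```

With these inputs, one gigawatt powers on the order of hundreds of thousands of accelerators. The real count depends entirely on per-chip power and facility efficiency, which is why perf/watt matters so much at this scale.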
What this means for the AI chip market
The immediate elephant in the room is NVIDIA. NVIDIA’s H100 and B200 GPUs dominate AI inference today. But they’re general-purpose accelerators that do a lot of things well and nothing perfectly. A chip purpose-built for LLM inference — and designed by the company that builds the most popular LLMs — is a direct threat to that dominance.
It also changes the dynamics for AMD, Intel (with Gaudi), custom ASIC vendors like Marvell, and Broadcom’s other custom silicon clients. If OpenAI and Broadcom deliver on the perf/watt claims at scale, every hyperscaler and large AI company is going to be asking their silicon partners: “Why aren’t we doing this?”
Google has TPU. Amazon has Trainium and Inferentia. Microsoft has Maia. Now OpenAI has Jalapeño. The era of buying AI compute exclusively off the shelf from GPU vendors is ending.
Full-stack flywheel
The most important sentence in the whole announcement is buried in the middle:
“OpenAI is not only developing frontier models or building products on top of them; it is designing the infrastructure underneath them: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience.”
That is the thesis statement. OpenAI now touches every layer of the stack:
- Silicon (Jalapeño)
- Kernels and serving systems
- Models (GPT-5.x family)
- Products (ChatGPT, Codex, API)
- Deployment infrastructure (with Microsoft)
Each layer can be optimized around the same goal. Better chips mean cheaper inference. Cheaper inference means better products. Better products mean more users and revenue. More revenue funds the next chip generation.
That flywheel is hard to compete against.
Caveats
- Final performance numbers are not public. “Substantially better than current SOTA” is a directional claim, not a benchmark. We need the technical report.
- This is a first-generation chip. First-gen silicon always has surprises. The nine-month tape-out is impressive but also means less time for design iteration.
- Jalapeño is not for sale. It runs OpenAI’s infrastructure. There is no indication they plan to sell it as a merchant silicon product. If you want the benefit, it comes through cheaper/better ChatGPT, Codex, and API access.
- Deployment starts “by the end of 2026.” That’s still 6+ months out. The current GPU shortage might shift timelines.
- The detailed technical report is due “in the coming months,” not weeks. Expect the real story on performance later.
Bottom line
OpenAI just became a silicon company. Jalapeño is a serious piece of engineering: a custom LLM inference accelerator co-developed with Broadcom, taken from initial design to tape-out in nine months, designed from the ground up for how modern LLMs actually work, and heading for gigawatt-scale deployment.
The full performance picture is still a few months away. But the strategic direction is clear: OpenAI is building the entire stack, from the transistor to the chat interface. That vertical integration is the kind of move that compounds over time.
For everyone using ChatGPT, Codex, or the API, the bet is simple: inference gets cheaper and faster over time as this matures. For NVIDIA and the rest of the AI hardware market, the question just got a lot more complicated.



