By Geeta Kakrani | Google Developer Expert in AI

Imagine you run a restaurant.For years, you had one chef doing everything sourcing ingredients, prepping, cooking, plating, cleaning. One person. All jobs. It worked when you had 20 customers a day.
Now you have 2 million customers. Every single day. And they all want their food in under two seconds.
You don’t hire one super-chef. You split the kitchen.
That’s exactly what Google just did with its TPUs.
A Decade of One Chip Doing Everything
Since 2016, Google’s Tensor Processing Units have been the silent engine behind every Google product you use — Search, Translate, Photos, and Gemini. One chip family, designed to both train AI models and run them.
For years, that was fine.
But then AI agents arrived. Systems that don’t just answer one question — they reason, plan, remember, and take action across multiple steps. Millions of them, running simultaneously, in real time.
Suddenly, one chip doing everything wasn’t fine anymore.
At Google Cloud Next 2026, Google made an announcement ten years in the making: the 8th generation TPU is actually two completely different chips.
Meet TPU 8t and TPU 8i.
The Problem They’re Each Solving
Here’s a simple way to think about it.
Training an AI model is like writing a novel. You lock yourself in a room for months. You need enormous focus, massive resources, and you’re not in a hurry to show anyone the draft. When it’s done, it’s done.
Running an AI model is like performing that novel as a live play — every night, for a million audiences simultaneously. You need to be fast, fluid, and you absolutely cannot pause mid-sentence because you’re waiting for a prop to arrive.
Same story. Completely different skills required.
Google finally stopped asking one chip to do both.
TPU 8t — The Training Powerhouse
The “t” is for training. And the numbers here are staggering.
One TPU 8t superpod holds 9,600 chips working together as a single system — with 2 petabytes of shared memory. That’s roughly the storage equivalent of 400 million books, all accessible at once.
The compute? 121 ExaFLOPS. Nearly triple the previous generation.
If the previous chip could fill an Olympic swimming pool in an hour, TPU 8t fills three.
Google also solved a long-standing bottleneck: data transfer. Previously, chips had to route data through the CPU — like every order in a restaurant going through one overwhelmed manager. TPU 8t bypasses that entirely with TPUDirect Storage, letting chips talk directly to data. Transfer speeds effectively doubled.
The result: 2.7x better training performance per dollar over the last generation.
TPU 8i — Built for the Age of AI Agents
This is where it gets really interesting.
The “i” is for inference — but honestly, it should stand for intelligence at scale. Because TPU 8i wasn’t just designed to run AI models. It was designed specifically for the messy, complex, real-time world of AI agents.
Google made three radical changes:
1. Triple the on-chip memory
When an AI is mid-conversation with you, it holds a running record of everything said — called a KV Cache. On older chips, this record kept overflowing into slower memory, forcing the chip to pause and fetch data. Like a waiter who keeps forgetting orders and running back to the kitchen.
TPU 8i has 3x more on-chip SRAM (384 MB). The entire conversation stays on the chip. No pausing. No fetching. Just flow.
2. A brand new engine for thinking fast
AI agents that reason — the kind that think step by step before answering — constantly need all their cores to synchronize with each other. On old chips, this synchronization was a bottleneck.
Google replaced the old system with something called the Collectives Acceleration Engine (CAE). It handles all that synchronization with near-zero latency. The result: 5x faster on-chip communication. For an agent running a complex chain-of-thought, this is the difference between feeling instant and feeling sluggish.
3. A completely new way chips talk to each other
Imagine a city where every road goes through the town square. That was the old network design — a 3D grid where messages between chips could take up to 16 hops to arrive.
Google redesigned the entire road system with something called Boardfly. It’s a hierarchical network — small groups of chips fully connected to each other, then connected to bigger groups through optical switches. The longest any message has to travel? 7 hops. A 56% reduction.
For AI agents using modern architectures like Mixture-of-Experts — where different parts of the model need to collaborate constantly — this is transformational.
The combined result of all three changes: 80% better price-performance for inference over the previous generation.
By the Numbers
TPU 8tTPU 8iBuilt forTrainingInference & AgentsChips per system9,6001,152On-chip SRAM128 MB384 MBMemory (HBM)216 GB288 GBNetwork design3D TorusBoardfly (7 hops max)Key innovationTPUDirect StorageCAE (5x latency cut)Performance gain2.7x over Ironwood80% better price-performance
And Then Google Did Something It Has Never Done Before
For ten years, TPUs were Google’s private weapon. You could use them on Google Cloud — but you couldn’t own one.
That just changed.
Google announced it will begin selling TPUs directly to select customers — AI labs, financial institutions, and high-performance computing organizations — to run inside their own data centers.
The secret weapon is now a product.
Why This Moment Matters
The split of TPU into 8t and 8i isn’t just a hardware story. It’s Google saying out loud what engineers have known quietly for years:
Training AI and running AI are two fundamentally different problems. It’s time to stop pretending one chip can solve both.
As the world moves deeper into the agent era — where AI systems don’t just respond but reason, plan, and act — the infrastructure underneath has to evolve too. Purpose-built beats general-purpose. Every time.
Both TPU 8t and TPU 8i arrive on Google Cloud later in 2026.
The kitchen has been split. The restaurant is ready for scale.
Sources:
- TPU 8t and TPU 8i Technical Deep Dive — Google Cloud Blog
- Sundar Pichai’s Google I/O 2026 Keynote
- Google Cloud Next 2026 — Sundar Pichai
TPU Just Got Split in Two — and It Changes Everything About AI Infrastructure was originally published in Google Developer Experts on Medium, where people are continuing the conversation by highlighting and responding to this story.