TPU Just Got Split in Two — and It Changes Everything About AI Infrastructure

June 3, 2026

By Geeta Kakrani | Google Developer Expert in AI

Imagine you run a restaurant. For years, you had one chef doing everything: sourcing ingredients, prepping, cooking, plating, cleaning. One person. All jobs. It worked when you had 20 customers a day.

Now you have 2 million customers. Every single day. And they all want their food in under two seconds.

You don’t hire one super-chef. You split the kitchen.

That’s exactly what Google just did with its TPUs.

A Decade of One Chip Doing Everything

Since Google unveiled them in 2016, Tensor Processing Units have been the silent engine behind the Google products you use every day: Search, Translate, Photos, and Gemini. One chip family, designed to both train AI models and run them.

For years, that was fine.

But then AI agents arrived. Systems that don’t just answer one question — they reason, plan, remember, and take action across multiple steps. Millions of them, running simultaneously, in real time.

Suddenly, one chip doing everything wasn’t fine anymore.

At Google Cloud Next 2026, Google made an announcement ten years in the making: the eighth-generation TPU is actually two completely different chips.

Meet TPU 8t and TPU 8i.

The Problem They’re Each Solving

Here’s a simple way to think about it.

Training an AI model is like writing a novel. You lock yourself in a room for months. You need enormous focus and massive resources, and you’re not in a hurry to show anyone the draft. When it’s done, it’s done.

Running an AI model is like performing that novel as a live play, every night, for a million audiences simultaneously. You need to be fast and fluid, and you absolutely cannot pause mid-sentence because you’re waiting for a prop to arrive.

Same story. Completely different skills required.

Google finally stopped asking one chip to do both.

TPU 8t — The Training Powerhouse

The “t” is for training. And the numbers here are staggering.

One TPU 8t superpod holds 9,600 chips working together as a single system — with 2 petabytes of shared memory. That’s roughly the storage equivalent of 400 million books, all accessible at once.

The compute? 121 exaFLOPS per superpod. Nearly triple the previous generation.

If the previous generation could fill one Olympic swimming pool in an hour, TPU 8t fills nearly three.
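A quick back-of-the-envelope check makes the superpod figures more concrete. This is a minimal sketch that assumes the “2 petabytes of shared memory” is the HBM of all 9,600 chips pooled together, using the 216 GB per chip listed for TPU 8t in the By the Numbers table and decimal units (1 PB = 1,000,000 GB). The article does not say which numeric precision the 121 exaFLOPS figure refers to, so the per-chip number is only a rough average.

```python
# Back-of-the-envelope superpod arithmetic using the article's TPU 8t figures.
# Assumption: the "2 petabytes of shared memory" is the pooled HBM of every chip.
CHIPS_PER_SUPERPOD = 9_600
HBM_PER_CHIP_GB = 216          # TPU 8t HBM per chip, from the By the Numbers table
SUPERPOD_EXAFLOPS = 121        # peak compute per superpod (precision not stated)

GB_PER_PB = 1_000_000          # decimal units
PFLOPS_PER_EFLOPS = 1_000

total_hbm_pb = CHIPS_PER_SUPERPOD * HBM_PER_CHIP_GB / GB_PER_PB
per_chip_pflops = SUPERPOD_EXAFLOPS * PFLOPS_PER_EFLOPS / CHIPS_PER_SUPERPOD

print(f"Pooled HBM: {total_hbm_pb:.2f} PB")               # Pooled HBM: 2.07 PB
print(f"Compute per chip: {per_chip_pflops:.1f} PFLOPS")  # Compute per chip: 12.6 PFLOPS
```

About 2.07 PB lines up with the article’s “2 petabytes,” which is consistent with reading that figure as aggregate HBM across the pod rather than one giant memory bank.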

Google also tackled a long-standing bottleneck: data transfer. Previously, data on its way to the chips had to pass through the host CPU, like every order in a restaurant going through one overwhelmed manager. TPU 8t bypasses that step with TPUDirect Storage, letting chips pull data straight from storage. Transfer speeds effectively doubled.

The result: 2.7x better training performance per dollar over the last generation.

TPU 8i — Built for the Age of AI Agents

This is where it gets really interesting.

The “i” is for inference — but honestly, it should stand for intelligence at scale. Because TPU 8i wasn’t just designed to run AI models. It was designed specifically for the messy, complex, real-time world of AI agents.

Google made three radical changes:

1. Triple the on-chip memory

When an AI model is mid-conversation with you, it keeps a running record of everything said so far (the attention keys and values for every token), called the KV cache. On older chips, this record kept overflowing into slower memory, forcing the chip to pause and fetch data. Like a waiter who keeps forgetting orders and running back to the kitchen.

TPU 8i has 3x more on-chip SRAM (384 MB), so far more of the conversation stays on the chip. Less pausing. Less fetching. More flow.
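How much conversation fits in 384 MB depends on the model. The sketch below applies the standard sizing rule for a transformer KV cache (one key and one value vector per layer, per KV head, per token) to a hypothetical model: 32 layers, 8 KV heads, head dimension 128, 16-bit values. These numbers are assumptions for illustration, not TPU 8i or Gemini specifications. The sketch also treats 384 MB as 384 MiB and ignores everything else competing for SRAM (weights, activations) as well as any sharding of the cache across chips.

```python
MIB = 1024 * 1024

def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, bytes_per_value):
    """Bytes of KV cache added by one token: a key and a value per layer and KV head."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

def tokens_that_fit(sram_bytes, per_token_bytes):
    """How many tokens' worth of KV cache fit in a given SRAM budget."""
    return sram_bytes // per_token_bytes

# Hypothetical model configuration (an assumption, not a real product spec).
per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128, bytes_per_value=2)

for sram_mib in (128, 384):
    print(f"{sram_mib} MiB SRAM -> {tokens_that_fit(sram_mib * MIB, per_token):,} tokens")
# 128 MiB SRAM -> 1,024 tokens
# 384 MiB SRAM -> 3,072 tokens
```

For a fixed model, tripling the SRAM triples the context that can stay on-chip; bigger models, longer contexts, or many concurrent users shrink it again, which is one reason serving systems typically spread the cache across chips.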

2. A brand-new engine for thinking fast

AI agents that reason, the kind that think step by step before answering, constantly need the chip’s cores to synchronize with each other. On older chips, this synchronization was a bottleneck.

Google replaced the old system with something called the Collectives Acceleration Engine (CAE), which takes over that synchronization work. The result: 5x faster on-chip communication, with far lower latency. For an agent running a complex chain of thought, this is the difference between feeling instant and feeling sluggish.
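Much of this “synchronization” is collective operations, where every core contributes data and every core needs the combined result. The article does not describe how CAE works internally, so the sketch below is only the textbook ring all-reduce, simulated in plain Python, to show what a collective is and why its step count matters: with n workers it takes 2(n - 1) synchronous communication rounds, and no worker can finish until the last round lands. The function name is hypothetical.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) across len(buffers) workers.

    buffers[i] is worker i's local vector. Returns (results, steps), where every
    entry of results is the element-wise sum of all inputs and steps counts the
    synchronous communication rounds.
    """
    n = len(buffers)
    length = len(buffers[0])
    if length % n != 0:
        raise ValueError("vector length must split evenly into one chunk per worker")
    size = length // n
    data = [list(b) for b in buffers]  # copy so the caller's lists are untouched

    def chunk(c):
        return range(c * size, (c + 1) * size)

    steps = 0
    # Phase 1, reduce-scatter: in round s, worker i passes chunk (i - s) % n to
    # its right neighbor, which adds it in. Afterwards worker i holds the full
    # sum of chunk (i + 1) % n.
    for s in range(n - 1):
        sends = [(i, (i - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunk(c)] for i, c in sends]  # snapshot first
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] += v
        steps += 1

    # Phase 2, all-gather: circulate each fully reduced chunk around the ring.
    for s in range(n - 1):
        sends = [(i, (i + 1 - s) % n) for i in range(n)]
        payloads = [[data[i][k] for k in chunk(c)] for i, c in sends]
        for (i, c), payload in zip(sends, payloads):
            dst = (i + 1) % n
            for k, v in zip(chunk(c), payload):
                data[dst][k] = v
        steps += 1

    return data, steps


workers = [[float(i + 1)] * 8 for i in range(4)]  # worker i holds eight copies of i + 1
results, steps = ring_all_reduce(workers)
print(results[0][:2], steps)  # [10.0, 10.0] 6
```

Every round is a point where all workers wait on each other, which is the kind of overhead dedicated collective hardware aims to shrink.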

3. A completely new way chips talk to each other

Imagine a city laid out as one giant grid, where crossing town means driving through intersection after intersection. That was the old network design: a 3D torus, a grid whose edges wrap around, where a message between two chips could take up to 16 hops to arrive.

Google redesigned the entire road system with something called Boardfly. It’s a hierarchical network — small groups of chips fully connected to each other, then connected to bigger groups through optical switches. The longest any message has to travel? 7 hops. A 56% reduction.
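Those hop counts are the network’s diameter: the longest shortest path between any two chips. In a 3D torus every axis wraps around, so an axis of length d contributes at most d // 2 hops, and the diameter is the sum over the three axes. The sketch below confirms that with a breadth-first search on an illustrative 8 × 12 × 12 torus (1,152 chips, the same count the table lists for a TPU 8i system). That shape is an assumption chosen because it gives 16 hops; the article states neither the real torus dimensions nor Boardfly’s exact topology, so the 7-hop figure is simply taken from the text. Function names are hypothetical.

```python
from collections import deque

def torus_neighbors(node, dims):
    """Yield the six wraparound neighbors of a chip in a 3D torus."""
    for axis in range(3):
        for step in (-1, 1):
            nbr = list(node)
            nbr[axis] = (nbr[axis] + step) % dims[axis]
            yield tuple(nbr)

def torus_diameter(dims):
    """Worst-case hop count, found by breadth-first search from chip (0, 0, 0).

    Every chip in a torus sees the same network around it, so the farthest
    chip from one starting point is as far apart as any two chips can be.
    """
    start = (0, 0, 0)
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in torus_neighbors(node, dims):
            if nbr not in hops:
                hops[nbr] = hops[node] + 1
                queue.append(nbr)
    return max(hops.values())

def closed_form_diameter(dims):
    """Same result without search: each axis of length d contributes d // 2 hops."""
    return sum(d // 2 for d in dims)

dims = (8, 12, 12)             # hypothetical torus shape: 8 * 12 * 12 = 1,152 chips
old_hops = torus_diameter(dims)
new_hops = 7                   # Boardfly worst case, as quoted in the article
assert old_hops == closed_form_diameter(dims)
print(old_hops, f"{(old_hops - new_hops) / old_hops:.2%} fewer hops")  # 16 56.25% fewer hops
```

The article’s “56% reduction” is (16 - 7) / 16 = 56.25%, rounded down.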

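For AI agents built on modern architectures like Mixture-of-Experts (MoE), where each token is routed to a few specialist sub-networks that may sit on different chips, fewer hops means less time waiting on the network. For these workloads, this is transformational.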
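Why does a Mixture-of-Experts model lean so hard on the network? Each token is sent only to the few experts its router scores highest, and when experts live on different chips, every MoE layer turns into an exchange of tokens between chips. The sketch below shows the common top-k softmax router in plain Python; the gate scores are made-up numbers and the function names are hypothetical, not taken from any Google model or library. The `dispatch` result is effectively a traffic plan: every (token, expert) pair whose expert lives on another chip is a message that has to cross the network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_tokens(gate_logits, k=2):
    """Top-k MoE routing.

    gate_logits[t][e] is the router score of token t for expert e. Returns a
    dict mapping each expert to the (token, weight) pairs it must process; each
    token's k weights are renormalized to sum to 1.
    """
    num_experts = len(gate_logits[0])
    dispatch = {e: [] for e in range(num_experts)}
    for t, logits in enumerate(gate_logits):
        probs = softmax(logits)
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)
        for e in top:
            dispatch[e].append((t, probs[e] / norm))
    return dispatch

# Made-up router scores for 3 tokens over 4 experts.
gate_logits = [
    [2.0, 1.0, 0.1, -1.0],   # token 0 prefers experts 0 and 1
    [0.0, 3.0, 2.5, 0.0],    # token 1 prefers experts 1 and 2
    [1.0, -2.0, 0.0, 4.0],   # token 2 prefers experts 3 and 0
]
dispatch = route_tokens(gate_logits, k=2)
for expert, assigned in dispatch.items():
    print(expert, [t for t, _ in assigned])
# 0 [0, 2]
# 1 [0, 1]
# 2 [1]
# 3 [2]
```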

The combined result of all three changes: 80% better price-performance for inference over the previous generation.

By the Numbers

Spec	TPU 8t	TPU 8i
Built for	Training	Inference & agents
Chips per system	9,600	1,152
On-chip SRAM	128 MB	384 MB
HBM per chip	216 GB	288 GB
Network design	3D torus	Boardfly (7 hops max)
Key innovation	TPUDirect Storage	CAE (5x latency cut)
Gain vs. Ironwood	2.7x training performance per dollar	80% better inference price-performance
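Price-performance multipliers are easiest to read as the cost of a fixed job. A minimal sketch, assuming “2.7x better performance per dollar” means the same training run costs 1/2.7 as much, and reading “80% better price-performance” as a 1.8x multiplier. Both readings are interpretations; real bills also depend on utilization, pricing, and software efficiency.

```python
def relative_cost(perf_per_dollar_multiplier):
    """Cost of a fixed workload on the new chip, as a fraction of the old cost."""
    return 1.0 / perf_per_dollar_multiplier

TRAINING_GAIN = 2.7    # TPU 8t: "2.7x better training performance per dollar"
INFERENCE_GAIN = 1.8   # TPU 8i: "80% better price-performance", read as 1.8x

print(f"Training run: {relative_cost(TRAINING_GAIN):.0%} of the old cost")  # Training run: 37% of the old cost
print(f"Inference: {relative_cost(INFERENCE_GAIN):.0%} of the old cost")    # Inference: 56% of the old cost
```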

And Then Google Did Something It Has Never Done Before

For ten years, TPUs were Google’s private weapon. You could use them on Google Cloud — but you couldn’t own one.

That just changed.

Google announced it will begin selling TPUs directly to select customers — AI labs, financial institutions, and high-performance computing organizations — to run inside their own data centers.

The secret weapon is now a product.

Why This Moment Matters

The split of the TPU line into 8t and 8i isn’t just a hardware story. It’s Google saying out loud what engineers have known quietly for years:

Training AI and running AI are two fundamentally different problems. It’s time to stop pretending one chip can solve both.

As the world moves deeper into the agent era — where AI systems don’t just respond but reason, plan, and act — the infrastructure underneath has to evolve too. Purpose-built beats general-purpose. Every time.

Both TPU 8t and TPU 8i arrive on Google Cloud later in 2026.

The kitchen has been split. The restaurant is ready for scale.

TPU Just Got Split in Two — and It Changes Everything About AI Infrastructure was originally published in Google Developer Experts on Medium.
