Top 5 AWS NAT Gateway Mistakes Teams Make & How to Fix Them

Introduction

NAT Gateway is the AWS service most teams overpay for, and they usually don’t notice until the bill is already in four figures. Every time I run an AWS NAT Gateway cost optimization review for clients, the same mistakes show up on the bill.

NAT Gateway charges are one of the most common hidden costs in AWS, even though the pricing itself isn't hidden at all. It's right on the VPC pricing page. In us-east-1 you pay $0.045 for every hour the gateway exists, whether or not it carries any traffic, plus $0.045 per GB processed, with $0.09 per GB for internet egress on top. What stays hidden is how much traffic actually flows through the gateway. Most teams never check again after the initial setup, so a few terabytes of accidental S3 or ECR traffic move through it every month and the bill keeps growing before anyone notices.
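To see how those three charges add up, here is a minimal Python cost model. It's a sketch, not a billing calculator: the default rates are the us-east-1 list prices quoted above, 730 is an assumed average number of hours per month, and internet egress is treated as a flat $0.09/GB, ignoring the free allowance and volume tiers. The function name nat_monthly_cost is hypothetical.

```python
HOURS_PER_MONTH = 730  # assumed average month length for hourly billing estimates


def nat_monthly_cost(
    gateways: int,
    gb_processed: float,
    gb_internet_egress: float = 0.0,
    hourly_rate: float = 0.045,
    processing_rate: float = 0.045,
    egress_rate: float = 0.09,
) -> float:
    """Estimate one month of NAT Gateway-related charges in USD.

    Defaults are assumed us-east-1 list prices; pass your own region's rates.
    """
    hourly = gateways * HOURS_PER_MONTH * hourly_rate  # billed every hour, traffic or not
    processing = gb_processed * processing_rate  # every GB through the gateway, any destination
    egress = gb_internet_egress * egress_rate  # only what leaves AWS; flat rate, tiers ignored
    return round(hourly + processing + egress, 2)
```

For example, nat_monthly_cost(1, 1000) comes to $77.85: $32.85 for the hours plus $45 for processing 1,000 GB, before any of it reaches the internet.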

After five years of building VPCs on AWS for in-house and client projects, these are the five AWS NAT Gateway mistakes I keep seeing. Each one has a fix that takes less than a day and can take a noticeable chunk out of the NAT line on your AWS bill.

Mistake 1: Routing S3 and DynamoDB traffic through NAT Gateway

This is the most common AWS NAT Gateway mistake teams make. A Lambda function in a private subnet reads from an S3 bucket in the same region, and the subnet's route table sends 0.0.0.0/0 to the NAT Gateway. Nobody stops to think about how those S3 reads will leave the subnet. Every GB the Lambda pulls is billed at $0.045 in NAT data processing, even though transferring data from S3 to compute in the same region is free.

I caught this on a recent client audit. A reporting Lambda was pulling 2 TB a week from S3. That cost $0 in S3 data transfer but about $90 a week (roughly $390 a month) in NAT data processing for the same bytes.

The fix: Create Gateway VPC Endpoints for S3 and DynamoDB and associate them with the route tables for your private subnets. Gateway endpoints are free: no hourly charge, no per-GB charge. They keep traffic to S3 buckets and DynamoDB tables in the same region on AWS's private network, bypassing the NAT Gateway entirely.
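A minimal sketch of that fix in Python, assuming you pass in a boto3 EC2 client. create_vpc_endpoint is the real EC2 API call; the wrapper name create_gateway_endpoints and the placeholder IDs below are hypothetical, and you'd want error handling and a check for existing endpoints before running anything like this for real.

```python
from typing import Any, Sequence


def create_gateway_endpoints(
    ec2: Any, vpc_id: str, route_table_ids: Sequence[str], region: str
) -> list[str]:
    """Create S3 and DynamoDB Gateway VPC Endpoints and attach them to the given route tables.

    Assumes `ec2` is a boto3 EC2 client for `region`. Returns the new endpoint IDs.
    """
    endpoint_ids: list[str] = []
    for service in ("s3", "dynamodb"):
        response = ec2.create_vpc_endpoint(
            VpcEndpointType="Gateway",
            VpcId=vpc_id,
            ServiceName=f"com.amazonaws.{region}.{service}",
            # AWS adds a prefix-list route to each of these tables, which takes
            # precedence over the 0.0.0.0/0 route to the NAT Gateway for this service.
            RouteTableIds=list(route_table_ids),
        )
        endpoint_ids.append(response["VpcEndpoint"]["VpcEndpointId"])
    return endpoint_ids
```

Usage would look like create_gateway_endpoints(boto3.client("ec2", region_name="us-east-1"), "vpc-EXAMPLE", ["rtb-EXAMPLE"], "us-east-1"). Passing the client in, rather than creating it inside the function, keeps the function easy to test with a stub.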

Mistake 2: Forgetting NAT Gateways running in dev and staging accounts

Every team I’ve worked with has had at least one forgotten NAT Gateway sitting in a dev or staging account, costing $32.85/month in us-east-1 hourly charges alone. Multiply that by three AZs and a few environments, and you blow past your budget for gateways that nobody uses on weekends.

A team I worked with had three forgotten gateways in a single staging account, running for six months before a routine audit caught them. At $32.85/month per gateway, three gateways over six months came to about $590 in hourly charges.

The fix: Once a month, run aws ec2 describe-nat-gateways against every account and region, and check the CloudWatch BytesOutToDestination metric for each gateway. If a gateway has been idle for 30 days, delete it. AWS Compute Optimizer also flags idle gateways automatically now, and turning it on takes about thirty seconds.
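The monthly check is easy to script. This sketch assumes boto3 EC2 and CloudWatch clients for one account and region (run it once per region), calls describe_nat_gateways and get_metric_statistics, and treats a gateway as idle when BytesOutToDestination sums to zero over the window. It also skips gateways younger than the window so a freshly created one isn't flagged. The name find_idle_nat_gateways is hypothetical, and the output is a list to review, not something to delete blindly.

```python
from datetime import datetime, timedelta, timezone
from typing import Any


def find_idle_nat_gateways(ec2: Any, cloudwatch: Any, days: int = 30) -> list[str]:
    """Return IDs of available NAT Gateways with zero BytesOutToDestination over `days` days.

    Assumes `ec2` and `cloudwatch` are boto3 clients for the same account and region.
    """
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    idle: list[str] = []
    request: dict[str, str] = {}
    while True:
        page = ec2.describe_nat_gateways(**request)
        for gw in page["NatGateways"]:
            # Skip gateways that are pending, deleting, deleted, or failed.
            if gw["State"] != "available":
                continue
            # Skip gateways younger than the window; they haven't had time to be idle.
            if gw["CreateTime"] > start:
                continue
            stats = cloudwatch.get_metric_statistics(
                Namespace="AWS/NATGateway",
                MetricName="BytesOutToDestination",
                Dimensions=[{"Name": "NatGatewayId", "Value": gw["NatGatewayId"]}],
                StartTime=start,
                EndTime=end,
                Period=86400,  # one datapoint per day in the window
                Statistics=["Sum"],
            )
            if sum(dp["Sum"] for dp in stats["Datapoints"]) == 0:
                idle.append(gw["NatGatewayId"])
        token = page.get("NextToken")
        if not token:
            break
        request = {"NextToken": token}  # fetch the next page of gateways
    return idle
```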

Mistake 3: Pulling ECR images through NAT in EKS or ECS clusters

In EKS and ECS, nodes pull container images constantly. When the private subnets' default route points at a NAT Gateway, every pull goes through it, and every layer a node doesn't already have cached is charged at $0.045/GB.

A client I worked with last year ran into this on their EKS rollout. They were running 30 services on 12 nodes with 400 MB images on rolling deploys. Within two weeks, the NAT data processing line on their bill had quadrupled, and VPC Flow Logs traced close to 8 TB of monthly traffic to ECR pulls. This is the biggest AWS NAT Gateway mistake I see on EKS engagements.

The fix: Create two Interface Endpoints, com.amazonaws.<region>.ecr.api and com.amazonaws.<region>.ecr.dkr, with a network interface in each AZ, plus a Gateway Endpoint for S3, which is where ECR stores image layers. Once these are in place, image pulls stay on AWS’s private network instead of going through the NAT Gateway. The Interface Endpoints cost $0.01/GB plus an hourly charge for each endpoint in each AZ; the S3 Gateway Endpoint is free. We did this for our client, and their NAT data processing dropped about 78% the month after cutover.
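A rough side-by-side of the two paths makes the trade-off concrete. This sketch assumes us-east-1 list prices ($0.045/GB NAT processing; $0.01 per endpoint per AZ per hour and $0.01/GB for Interface Endpoints), two Interface Endpoints spread across `azs` AZs, and that image layers travel through the free S3 Gateway Endpoint while only `interface_gb` crosses the Interface Endpoints. How your traffic actually splits is an assumption to confirm in Flow Logs, and ecr_pull_costs is a hypothetical name.

```python
HOURS_PER_MONTH = 730  # assumed average month length


def ecr_pull_costs(
    pull_gb: float,
    azs: int,
    interface_gb: float,
    nat_rate: float = 0.045,
    endpoint_hourly_rate: float = 0.01,
    endpoint_gb_rate: float = 0.01,
) -> tuple[float, float]:
    """Return (nat_path_usd, endpoint_path_usd) per month for ECR image pulls.

    Assumed model: on the NAT path every pulled GB pays NAT processing. On the
    endpoint path, two Interface Endpoints (ecr.api, ecr.dkr) each have one ENI
    per AZ and `interface_gb` crosses them; the remaining layer data goes through
    the S3 Gateway Endpoint, which is free.
    """
    if not 0 <= interface_gb <= pull_gb:
        raise ValueError("interface_gb must be between 0 and pull_gb")
    nat_path = pull_gb * nat_rate
    endpoint_hours = 2 * azs * HOURS_PER_MONTH * endpoint_hourly_rate
    endpoint_path = endpoint_hours + interface_gb * endpoint_gb_rate
    return round(nat_path, 2), round(endpoint_path, 2)
```

With the 8 TB a month from the EKS example above (taken as 8,000 GB), three AZs, and an assumed 80 GB through the Interface Endpoints, ecr_pull_costs(8000, 3, 80) gives about $360 on the NAT path versus about $44.60 on the endpoint path.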

If you’re running EKS or ECS at scale and want this done properly without pulling your in-house engineers off other work, this is a good point to hire AWS developers who’ve done it before.

Mistake 4: Running a single NAT Gateway for multiple AZs to save money

Some teams deploy a single NAT Gateway in us-east-1a and route the private subnets in 1b and 1c through it, saving $32.85/month for each gateway they skip. The setup looks clean on the diagram. The catch is cross-AZ data transfer. Every byte moving between 1b or 1c and the gateway is billed at $0.01/GB in each direction, on top of the $0.045/GB NAT data processing charge.

We had this setup on an internal data ingestion service that pulled metrics from third-party APIs. Workers in 1b and 1c routed through a single NAT Gateway in 1a to save roughly $66/month, but the cross-AZ fees exceeded the cost of a second NAT Gateway within three weeks.

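The fix: Deploy one NAT Gateway per AZ once the cross-AZ traffic it would eliminate costs more than the extra gateway’s $32.85/month. With cross-AZ transfer billed at $0.01/GB in each direction, every GB that crosses costs about $0.02, so the break-even is roughly 1.6 TB/month of cross-AZ traffic per AZ. Below that, single-AZ NAT can still be cheaper. If you don’t know which side of the line you’re on, VPC Flow Logs will tell you in an afternoon.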
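The break-even arithmetic is simple enough to keep as a helper. A sketch, assuming the $32.85/month us-east-1 hourly cost of one gateway and $0.01/GB cross-AZ transfer billed in each direction. The NAT processing charge drops out because you pay it on the same bytes whichever gateway they use. The function name is hypothetical.

```python
def per_az_nat_breakeven_gb(
    gateway_monthly_usd: float = 32.85,
    cross_az_rate_per_gb: float = 0.01,
    directions_billed: int = 2,
) -> float:
    """Monthly cross-AZ GB above which a dedicated NAT Gateway in that AZ is cheaper.

    NAT data processing is paid on the same bytes either way, so it cancels out;
    only the extra gateway's hourly cost and the cross-AZ transfer it avoids matter.
    """
    return gateway_monthly_usd / (cross_az_rate_per_gb * directions_billed)
```

With the defaults this returns roughly 1,640 GB a month; counting only one direction (directions_billed=1) gives 3,285 GB.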

Mistake 5: Ignoring VPC Flow Logs until the AWS bill forces a look

This is the key NAT Gateway mistake that enables all the others. Most teams don’t enable VPC Flow Logs on their NAT Gateway’s ENI until something on the bill forces them to look, by which point the gateway has been burning money for months. The architecture diagram tells one story. The traffic tells another.

The fix: Turn on Flow Logs for the NAT Gateway’s ENI and query them weekly. Cost Explorer shows how much you’re paying: group by Usage Type and filter for NatGateway-Bytes and NatGateway-Hours. Flow Logs show where that traffic is going. In them, you’re looking for two patterns: AWS service IPs (candidates for endpoints) and small, chatty destinations (candidates for caching or batching). Five minutes a week. You can’t catch AWS NAT Gateway mistakes without it.
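A minimal sketch of the weekly query, assuming flow log lines in the default (version 2) format that you've already pulled from S3 or CloudWatch Logs. It sums bytes per remote address (whichever end of a record sits outside your VPC CIDR) for the NAT Gateway's ENI, so downloads such as S3 reads are counted as well as uploads. in_ranges then checks top addresses against CIDR lists, for example the AWS service prefixes published in ip-ranges.json. Both function names are hypothetical; a custom log format that adds pkt-srcaddr can also tell you which private instance started each flow.

```python
import ipaddress
from collections import Counter
from typing import Iterable

# Default (version 2) flow log fields, in order:
# version account-id interface-id srcaddr dstaddr srcport dstport
# protocol packets bytes start end action log-status


def top_remote_peers(
    lines: Iterable[str], nat_eni: str, vpc_cidr: str, n: int = 10
) -> list[tuple[str, int]]:
    """Sum bytes per remote address for one ENI, largest first."""
    vpc = ipaddress.ip_network(vpc_cidr)
    totals: Counter = Counter()
    for line in lines:
        f = line.split()
        # Skip header rows, malformed lines, other ENIs, and NODATA/SKIPDATA records.
        if len(f) != 14 or f[0] == "version" or f[2] != nat_eni or f[9] == "-":
            continue
        src, dst, nbytes = f[3], f[4], int(f[9])
        # The remote peer is whichever end lies outside the VPC.
        if ipaddress.ip_address(dst) not in vpc:
            totals[dst] += nbytes
        elif ipaddress.ip_address(src) not in vpc:
            totals[src] += nbytes
    return totals.most_common(n)


def in_ranges(addr: str, cidrs: Iterable[str]) -> bool:
    """True if `addr` falls inside any of `cidrs` (e.g. prefixes from ip-ranges.json)."""
    ip = ipaddress.ip_address(addr)
    return any(ip in ipaddress.ip_network(c) for c in cidrs)
```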

Ending Note

Five years of building and reviewing AWS infrastructure have shown me a lot of NAT Gateway mistakes, and the conclusion is almost always the same. The gateway itself is rarely the problem. The problem is the traffic flowing through it that nobody on the team realizes is there. Once that traffic shows up in Flow Logs, most of the fix is a route table change.

If your NAT bill is climbing and you’re not sure why, start with VPC Flow Logs on the NAT ENI. Look at the top destinations. If S3, DynamoDB, or ECR are in there, you’ve already found money. Most teams hit two or three of these AWS NAT Gateway mistakes the first time they actually look.

Teams that have checked their Flow Logs, tried the fixes above, and still see high NAT bills, or that are dealing with exceptional cases that don’t fit the patterns in this article, can bring in an AWS consulting services provider to assess their current NAT setup, recommend the right fixes, and implement them alongside the team.
