Listen to this episode on: Spotify | Apple Podcasts
Building AI products isn’t just about clever prompts and orchestration; it’s about knowing whether what you’ve built actually works. In this episode, Teresa Torres and Petra Wille dive deep into AI evals: how they’re defined, why they’re essential, and how teams can implement them to ensure product quality.
Teresa shares her journey building her Interview Coach tool and the hard lessons she learned about evals along the way. From golden datasets and synthetic data to error analysis, code-based checks, and LLM-as-judge methods, you’ll walk away with a clearer picture of how to measure and improve AI products over time.
What you’ll learn in this episode:
- What “evals” actually mean in the AI/ML world
- Why evals are more than just quality assurance
- The difference between golden datasets, synthetic data, and real-world traces
- How to identify error modes and turn them into evals
- When to use code-based evals vs. LLM-as-judge evals
- How discovery practices inform every step of AI product evaluation
- Why evals require continuous maintenance (and what “criteria drift” means for your product)
- The relationship between evals, guardrails, and ongoing human oversight
Resources & Links:
- Follow Teresa Torres: https://ProductTalk.org
- Follow Petra Wille: https://Petra-Wille.com
Mentioned in the episode:
- How I Designed & Implemented Evals for Product Talk’s Interview Coach by Teresa Torres
- Teresa’s Interview Coach
- ML (Machine learning)
- Story-Based Customer Interviews – On Demand course by Teresa
- LLM (Large language model)
- AI Evals for Engineers and PMs course (get 35% off through Teresa’s link) on Maven
- V0
- JSON (JavaScript Object Notation)
- Anthropic
- The Product Leadership Wheel – A Framework for Defining and Growing Product Leadership at Scale by Petra Wille
- Lovable
- Behind the Scenes: Building the Product Talk Interview Coach by Teresa
- Previous episode: Building AI Products
Coming soon from Teresa:
- Weekly Monday posts sharing lessons learned while building AI products
- A new podcast interviewing cross-functional teams about real-world AI product development stories
Join the Conversation:
Have thoughts on this episode? Leave a comment below.
Full Transcript
Full transcripts are only available for paid subscribers.