Building Todoist Ramble: How Doist Turned Voice Braindumps into Real-Time Task Capture

Listen to this episode on: Spotify | Apple Podcasts

How do you turn a rambling stream of consciousness into a clean task list — while the person is still talking? That’s the core challenge Doist solved with Ramble, a voice-to-task feature inside Todoist that uses live audio AI to capture tasks in real time, no transcription step required.

In this episode of Just Now Possible, Teresa Torres talks with Ernesto Garcia (Front-end Product Engineer), Thomas Jost (Backend Software Engineer), and Hugo Fauquenoi (Product Manager) from Doist about how they built Ramble — Todoist’s first pure AI feature. What started as a two-to-three month AI exploration phase became one of the most technically deliberate features they’ve shipped: a Gemini-powered pipeline that makes tool calls while the user is still speaking, surfacing tasks on screen in real time without any text output from the model.

You’ll hear how they designed around the “brain dump” behavior they found in user research, why they chose direct context injection over RAG for project and label matching, the surprising complexity of date handling in a live audio pipeline, and how they built a multi-language eval system using real employee recordings across 35 countries. It’s a detailed look at the discipline of keeping AI features simple, constrained, and genuinely useful.

Show Notes

Guests

  • Ernesto Garcia, Front-end Product Engineer, Doist
  • Thomas Jost, Backend Software Engineer, Doist
  • Hugo Fauquenoi, Product Manager, Doist

In this episode

  • How Doist’s 2-3 month AI exploration phase led to Ramble — and why voice-to-task emerged as the top contender
  • The user research insight behind Ramble: people using pen and paper or ChatGPT voice to brainstorm tasks before committing them to Todoist
  • Why Ramble skips transcription entirely and processes raw audio directly with a Gemini live audio model
  • How the model makes tool calls (add task, edit task, delete task) in real time while the user is still speaking — no text output at all
  • Designing for the driving use case: sound effects as audio confirmation cues alongside visual task cards
  • The challenge of teaching an LLM to capture tasks literally without over-interpreting or doing them — and how temperature tuning played a role
  • Date handling complexity: injecting the current date, normalizing to days vs. months, and always outputting dates in English for the natural language parser
  • Building an LLM-judge eval system with 20+ language recordings from 100+ employees across 35 countries to catch prompt regressions
  • Why Doist chose to inject the full project/label list into the system prompt instead of building a RAG pipeline — and why it worked
  • How easy correction beats perfect first-time accuracy in natural language interfaces
  • What’s next: multimodal task capture from images and text blobs, Apple Watch support, and automation integrations
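To make the architecture concrete, here is a minimal sketch of the approach described above: the full project and label list injected directly into the system prompt (instead of a RAG pipeline), the current date injected because the model has no clock, due dates forced into English for the downstream natural-language parser, and a tool-only output contract. All names here (`build_system_prompt`, the tool schema fields) are hypothetical illustrations, not Doist's actual code or prompts:

```python
from datetime import date

# Hypothetical tool declarations. Per the episode, the model produces no text
# at all -- it only emits tool calls (add/edit/delete task) while the user speaks.
TOOLS = [
    {
        "name": "add_task",
        "description": "Create a task exactly as the user stated it; do not do the task.",
        "parameters": {
            "type": "object",
            "properties": {
                "content": {"type": "string"},
                "project": {"type": "string"},
                "labels": {"type": "array", "items": {"type": "string"}},
                # Always in English, so the natural-language date parser
                # downstream behaves consistently across spoken languages.
                "due_string": {"type": "string"},
            },
            "required": ["content"],
        },
    },
    {"name": "edit_task"},
    {"name": "delete_task"},
]


def build_system_prompt(projects, labels, today=None):
    """Direct context injection: the user's full project/label list goes
    straight into the system prompt rather than being retrieved on demand."""
    today = today or date.today()
    return (
        "You capture tasks from live speech. Respond only with tool calls; "
        "never produce text output.\n"
        f"Today's date is {today.strftime('%A, %B %d, %Y')}.\n"
        "Assign a project only when one clearly fits: " + ", ".join(projects) + "\n"
        "Available labels: " + ", ".join(labels) + "\n"
        "Always express due dates in English (e.g. 'tomorrow at 9am'), "
        "regardless of the language the user speaks."
    )


prompt = build_system_prompt(
    ["Work", "Groceries"], ["urgent", "errand"], today=date(2025, 3, 14)
)
```

Because a typical user's project and label lists are small, injecting them wholesale keeps the pipeline simple and gives the model full context on every turn, which is the trade-off the team discusses when explaining why they skipped RAG.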

Chapters

00:00 Meet the Doist Team
01:40 What Doist Builds
02:27 Ramble Voice to Tasks
04:16 Why Voice Matters
07:42 Brain Dump Insight
09:46 Prototyping With LLMs
11:08 Live Audio Workflow
14:32 Driving Friendly UX
18:47 Tool Only Architecture
26:06 Evals and Multilingual Testing
28:41 Taming Dates and Time
33:28 Fixing Date Confusion
33:43 Defining Task Boundaries
34:34 Capture Versus Do
37:17 Tuning Creativity Levels
39:01 Evals Across Languages
41:23 Feedback and Regressions
44:09 Model Upgrades Over Time
46:33 Projects Labels Context
51:40 Handling Ambiguous Names
54:23 What's Next: Multimodal
58:48 From Capture to Execution
59:46 Closing Thoughts

Full Transcript

Podcast transcripts are only available to paid subscribers.

