Ever killed a houseplant because you forgot to water it? I have. Twice.
Here’s how I built a hands-free plant care assistant, the architectural nightmare I ran into, and how an AI subagent helped me solve it.
🪴 The Journey
Here’s a look at the journey from a simple seed of an idea to a fully-grown application. Follow along to see how it grew.
🌱 Planting the Seed
- Tools for the Job
🌿 The Growth
- The Architecture That Almost Broke Me
- The Breakthrough: Embracing Process Isolation
- The Last Mile: Making It Real
🌻 The Harvest
- The Partner: My Experience Pair Programming with an LLM
- The Brains: Making the Reminders “Smart”
- Key Lessons Learned
- The Final Product on macOS and Windows
- Conclusion
Happy reading! 🪴
Mastery requires love. And sometimes, that love leads you into debugging at 2 AM because your ficus deserves better uptime.
I currently have seven houseplants. That’s seven different watering schedules, seven different light preferences, and one sometimes very forgetful owner. Keeping up with their routines can become a real chore. I know because I’ve lost two from the initial nine to forgetfulness. And yes, they all have names, named after characters from Harry Potter.
This simple, real-world problem was the seed for my hackathon project: Smart Plant Sitter.
The theme of the CODETV & Goose Hackathon was to build an app that ‘listens, moves, or reacts to anything but your keyboard’. So the vision became to build a voice-only assistant that could manage my plant collection for me. I wanted to be able to say, “Watered Minerva,” and have the app remember it for me, no typing required.
Curious to see it in action? A full video demo is waiting for you at the end of the article.
Tools for the Job
To bring this vision to life, I settled on a modern Python tech stack. From the outset, I wanted to follow Gall’s Law as my guiding principle:
“A complex system that works is invariably found to have evolved from a simple system that worked.”
That principle guided every decision, from the first prototype to the final packaged app.
My plan was to start with the simplest possible working version and evolve it. The tools for this simple system were:
- Flet: A fantastic framework for building beautiful, multi-platform desktop UIs with Python that uses Flutter under the hood. I needed a UI that would look the same across different platforms.
- FastAPI: A high-performance web framework for creating the backend API that would house all the logic. This is part of my standard stack, so it was a no-brainer.
- SpeechRecognition & pyttsx3: The core libraries for handling voice input and text-to-speech output.
- Goose & Gemini CLI: My AI development partners in crime for this heist. I used the Gemini CLI within the Goose environment, which gave the AI direct access to my local file system to read, write, and refactor code alongside me.
- OpenWeatherMap API: To provide real-time weather data for the user's location. This enables the assistant to offer intelligent, context-aware advice (a quick sketch of such a call follows this list).
- Local Data Persistence: A simple plants.json file stored in the system's standard application data directory.
- Cross-Platform Design: From the beginning, the app was designed to work on macOS, Windows, and Linux. This influenced decisions like using Flet and writing platform-aware code to store the application's data in the correct system directory for each operating system.
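To give a flavour of the weather piece, here is a minimal sketch of the kind of OpenWeatherMap call the backend can make. The helper name and the OWM_API_KEY environment variable are my shorthand for this post, not the project's exact code.

```python
# Illustrative sketch only: fetch current conditions from OpenWeatherMap.
# The function name and the OWM_API_KEY environment variable are assumptions.
import os
import requests

def fetch_current_weather(city: str) -> dict:
    """Return temperature (°C) and conditions for a city via OpenWeatherMap."""
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": city, "appid": os.environ["OWM_API_KEY"], "units": "metric"},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    return {
        "temp_c": data["main"]["temp"],
        "conditions": data["weather"][0]["main"],  # e.g. "Rain", "Clear"
    }
```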
The Architecture That Almost Broke Me
My initial goal was to create a simple, packageable desktop app. The most common advice for this is to run everything in a single process: the Flet GUI on the main thread and the FastAPI server on a background thread.
It seemed simple enough. It was not.
This approach led to a cascade of complex, frustrating, and hard-to-debug issues:
- Threading Conflicts: The Flet GUI and the voice session were constantly competing for resources, making the app unresponsive.
- Stateful TTS Engines: Text-to-speech libraries like pyttsx3 are stateful and have known threading issues that simple locks can't fix. My app would often speak its first line and then go silent forever.
- Blocking I/O: The speech_recognition library blocks the entire thread while it's listening for audio. When run in the same process as the GUI, this would freeze the entire application (see the sketch after this list).
- Packaging Complexity: Trying to bundle all these conflicting concerns into a single process for packaging was a nightmare.
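To make the Blocking I/O point concrete, here is a stripped-down sketch (not the project's actual code) of a speech_recognition capture. If a call like this runs on the GUI's thread, the window stops responding until a phrase is heard.

```python
# Minimal sketch of why the GUI froze: listen() blocks until audio arrives.
import speech_recognition as sr

def capture_command() -> str:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)  # blocks the calling thread here
    # Recognition itself is another slow, blocking network call.
    return recognizer.recognize_google(audio)
```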
The Breakthrough: Embracing Process Isolation
After two days of late-night debugging with my AI subagent, it was clear we were fighting a losing battle. The root of all these issues was the single-process model; it was the recurring theme in every troubleshooting session.
After several iterations I realised that these libraries were never designed to coexist so intimately. The breakthrough came when we stopped forcing these libraries together and instead embraced a classic software design principle: separation of concerns.
The final, stable architecture is a multi-process client-server model:
- The Flet Frontend: It became a pure, lightweight client. Its only job is to display the UI and send HTTP requests to the backend.
- The FastAPI Backend: Runs in a completely separate, isolated process. It handles all the heavy lifting: the stateful TTS engine, the blocking speech recognition, and all business logic. (A minimal sketch of this split follows the list.)
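Here is roughly how one interaction flows across the split. The route, port, and payload are illustrative placeholders rather than the app's real API.

```python
# Sketch of the split with illustrative names (not the project's real routes).
# Backend process: FastAPI owns the voice session, TTS engine, and plant state.
from fastapi import FastAPI

app = FastAPI()

@app.post("/water/{plant_name}")
def log_watering(plant_name: str) -> dict:
    # Hypothetical endpoint: record that the plant was just watered.
    return {"status": "ok", "plant": plant_name}

# Frontend process: the Flet UI simply fires plain HTTP requests at localhost, e.g.
#   import requests
#   requests.post("http://127.0.0.1:8000/water/Minerva", timeout=5)
```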
This architecture solved everything. The frontend is always responsive, the voice session runs without interference, and the state is cleanly managed in one place. To achieve this, the main frontend.py script launches a second, identical instance of itself as a subprocess, but passes a special --run-backend flag. The child process sees this flag and starts the FastAPI server, while the parent process continues on to launch the Flet GUI.
```python
# In frontend.py
import subprocess
import sys

backend_process = None  # handle to the spawned backend process

def start_backend():
    global backend_process
    # Relaunch this same script, but pass a special flag
    # so the new process knows to run the backend server.
    command = [sys.executable, "frontend.py", "--run-backend"]
    # Use DETACHED_PROCESS on Windows to prevent the GUI from freezing
    creation_flags = 0
    if sys.platform == "win32":
        creation_flags = subprocess.DETACHED_PROCESS | subprocess.CREATE_NO_WINDOW
    backend_process = subprocess.Popen(
        command,
        creationflags=creation_flags
    )
```
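The child's side of that handshake looks roughly like the sketch below. The run_gui helper and the "backend:app" import string are placeholders for this post, assuming the FastAPI app lives in a backend module.

```python
# In frontend.py (sketch): the dispatch that pairs with start_backend() above.
# run_gui() and the "backend:app" import string are placeholders, not the project's code.
import sys
import uvicorn

def run_backend_server():
    # Child process: serve the FastAPI app (assumed to live in backend.py).
    uvicorn.run("backend:app", host="127.0.0.1", port=8000)

if __name__ == "__main__":
    if "--run-backend" in sys.argv:
        run_backend_server()   # child: become the API server and nothing else
    else:
        start_backend()        # parent: spawn the backend child process...
        run_gui()              # ...then hand control to the Flet GUI (not shown)
```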
The Last Mile: Making It Real
With a stable architecture, I turned to packaging. Because a voice-first app must run locally, packaging wasn’t optional—it was survival. But the real world always has one last challenge.
1. The macOS Microphone Permission Problem
On macOS, my packaged app was being silently blocked from using the microphone. flet pack didn’t expose the advanced options needed to fix this. The solution was to use PyInstaller directly, generate a .spec file, and manually add the NSMicrophoneUsageDescription key to the app’s Info.plist. This gave me the control I needed and was the final key to a working macOS app.
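For anyone hitting the same wall, the relevant part of the .spec file looks roughly like this. The bundle name and identifier are illustrative; the NSMicrophoneUsageDescription entry is the piece that matters.

```python
# Excerpt from a PyInstaller-generated .spec file (names are illustrative).
# The BUNDLE step is where macOS Info.plist entries get injected.
app = BUNDLE(
    exe,
    name="Smart Plant Sitter.app",
    bundle_identifier="com.example.smartplantsitter",
    info_plist={
        # Without this key, macOS silently blocks microphone access.
        "NSMicrophoneUsageDescription": "Smart Plant Sitter listens for voice commands about your plants.",
    },
)
```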
2. The Windows Reality Check
With the Mac version working, testing on a friend’s Windows laptop reminded me how fragile ‘cross-platform’ can be.
This is where theory meets reality. Working on a borrowed machine that belonged to a non-technical friend meant I had to create a clean, sandboxed development environment that could be wiped completely when I was done. It forced me to move beyond my Mac comfort zone and truly understand the OS-level quirks of deployment.
The biggest discovery? Windows has two different application data directories: %APPDATA% (Roaming) and %LOCALAPPDATA% (Local). My frontend was looking in Local, while the backend was writing to Roaming. Finding that bug required hunting through the file system—a reminder that assumptions about “standard” paths don’t always hold.
```python
import os
import sys
from pathlib import Path

def get_app_data_dir(app_name):
    """Return the OS-appropriate directory for this app's data files."""
    if sys.platform == "win32":
        return Path(os.getenv("LOCALAPPDATA")) / app_name
    elif sys.platform == "darwin":  # macOS
        return Path.home() / "Library" / "Application Support" / app_name
    else:  # Linux and other Unix-like
        return Path.home() / ".config" / app_name
```
The Partner: My Experience Pair Programming with an LLM
This project was a true collaboration with an AI subagent. Our partnership had a clear division of labor: I handled the Flet frontend and UX, while the AI took on the heavy lifting for the backend architecture and logic—roughly a 60/40 split in its favor.
My “rule of thumb” when using AI is simple: never let the agent make direct changes. I reviewed every suggestion, understood its implications, and applied the code myself.
My process is deliberate: I start each session with a preamble to guide the overall direction, and I embed strict instructions like “Answer in chat” or “Explain the proposed changes” in my prompts. This allows me to understand the AI’s thought process and verify the implications of every change before I apply the code myself.
This human-in-the-loop process was essential.
Here are some real examples from my 48-prompt journal:
- Architectural Pivot (Prompts 34-37):
> “Refactor the application to run in a single process… Diagnose the resulting critical runtime failure… Revert the architecture back to the stable multi-process model, but implement it in a package-friendly way using sys.executable…”
This was the moment we hit the wall and made the crucial decision to pivot.
- Advanced Packaging Attempts (Prompts 46-47):
> “Refactor the application’s process management from the subprocess module to Python’s multiprocessing module… Following a native development pattern, create a minimal Swift ‘Launcher’ application in Xcode…”
These ambitious attempts to solve a cosmetic issue failed, but they taught us the limits of what was worth optimizing.
Finally, a practical note: this kind of iterative development is compute-intensive. The constant need for the subagent to read multiple files and analyze complex code meant I was glad to have a paid Gemini plan with sufficient credits, as I would have quickly hit the limits of a free tier.
The Brains: Making the Reminders “Smart”
To make the assistant genuinely helpful, I built a small “database” of 12 common houseplants, with watering intervals based on maturity, sunlight needs, and care tips. The assistant cross-references this data with your plant’s age and the real-time weather to give specific, actionable advice.
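I won't reproduce the whole dataset here, but the shape is simple. The sketch below shows a made-up entry and one way the age and weather data could feed into a reminder; the field names and the rain rule are illustrative, not the app's exact logic.

```python
from datetime import date, timedelta

# Illustrative shape of one entry in the plant "database" (field names are assumptions).
PLANT_DB = {
    "snake plant": {
        "watering_days": {"young": 7, "mature": 14},  # interval by maturity
        "sunlight": "indirect light",
        "tip": "Let the soil dry out completely between waterings.",
    },
}

def next_watering(species: str, maturity: str, last_watered: date, rainy: bool) -> date:
    """Hypothetical rule: stretch the interval slightly when the weather is rainy/humid."""
    interval = PLANT_DB[species]["watering_days"][maturity]
    if rainy:
        interval += 2  # cooler, more humid days mean the soil dries more slowly
    return last_watered + timedelta(days=interval)
```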
Key Lessons Learned
- Simple Isn’t Always Simple: The “obvious” single-process architecture hid enormous complexity. The breakthrough came from embracing the design that fit the tools’ constraints.
- AI Excels at Iteration, Humans Excel at Direction: The AI was incredible at refactoring, pattern recognition, and proposing solutions. But the strategic pivots and real-world UX validation still required human insight.
- Cross-Platform Is a First-Class Concern: You can’t bolt on compatibility at the end. Every decision—from file paths to process management—has platform implications.
- Testing Beats Theory Every Time: No amount of reading can replace actually running your code on the target platform.
These lessons reshaped how I think about tool choice, architecture, and the creative partnership between humans and AI.
The Final Product on macOS and Windows
macOS
Windows
Conclusion
Building Smart Plant Sitter was a masterclass in the realities of software development. It taught me that architectural elegance is about finding the design that works for your specific constraints. Working with an AI subagent transformed what would have been weeks of work into days, but it also highlighted that AI is a tool that amplifies your skills rather than replacing them.
This project reminded me that software isn’t built – it’s grown. And that the best AI tools don’t replace developers; they just make our growth faster, and our bugs more interesting.
And now, if you’ll excuse me, my ZZ Plant is telling me it’s thirsty.
If you’d like to explore the code or replicate the build, the full source code, including my detailed 48-prompt journal, is on GitHub: View on GitHub
The downloadable application is available for both macOS and Windows.