A Week-Long Journey into AI Prototyping, Feedback, and Feasibility
A short while ago, we embarked on a rapid prototyping project to explore how AI could provide contextual and timely assistance to users. What started as a one-day build evolved into a week of intense learning about development, user-centric design, and the practical realities of building AI-powered features.
Here is the complete journey.
Part 1: The Power of Lean & Fast
Our first major insight was the incredible impact of a small, focused team. We assembled a nimble group: just one product owner, one developer, and one UX researcher. The mission was to go from initial idea to a functional prototype in a single day.
And we did it! Leveraging powerful AI coding assistance from Cline and Claude 3.7 Sonnet, we built out a fully functioning prototype. The stack included a React frontend, API Gateway & AWS Lambda for the backend, DynamoDB for storage, and Amazon Bedrock for the AI magic, all with infrastructure spun up using AWS CDK. It was an amazing demonstration of what a dedicated trio can achieve with the right tools and a tight deadline.
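For a sense of scale, here is a minimal AWS CDK sketch (TypeScript) of that kind of stack. The construct names, table schema, and asset paths are illustrative placeholders rather than our actual code, and Bedrock IAM permissions are omitted for brevity.

```typescript
import { Stack, StackProps, RemovalPolicy } from "aws-cdk-lib";
import { Construct } from "constructs";
import * as lambda from "aws-cdk-lib/aws-lambda";
import * as apigateway from "aws-cdk-lib/aws-apigateway";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";

export class TutorPrototypeStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Session/interaction state (placeholder schema)
    const sessions = new dynamodb.Table(this, "Sessions", {
      partitionKey: { name: "sessionId", type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      removalPolicy: RemovalPolicy.DESTROY, // fine for a prototype, not for production
    });

    // Lambda that calls Amazon Bedrock to evaluate student work
    const evaluateFn = new lambda.Function(this, "EvaluateFn", {
      runtime: lambda.Runtime.NODEJS_20_X,
      handler: "index.handler",
      code: lambda.Code.fromAsset("lambda/evaluate"), // placeholder asset path
      environment: { TABLE_NAME: sessions.tableName },
    });
    sessions.grantReadWriteData(evaluateFn);

    // REST API in front of the Lambda
    new apigateway.LambdaRestApi(this, "TutorApi", { handler: evaluateFn });
  }
}
```

Pay-per-request DynamoDB and a single Lambda behind API Gateway keep a prototype like this cheap to run and quick to tear down, which is part of why the one-day target was realistic.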
Part 2: When & How AI can help
We initially envisioned AI providing feedback after students completed their work. However, focusing on our users – young students – highlighted that the most impactful assistance occurs during the activity itself.
This insight led us to design an AI tutor for real-time, in-activity support. But how do you ensure such an AI truly helps a young mind learn, rather than simply handing over answers? To quote my 🤖 sidekick –
𝘛𝘳𝘶𝘦 𝘸𝘪𝘴𝘥𝘰𝘮 𝘪𝘯 𝘈𝘐 𝘭𝘪𝘦𝘴 𝘯𝘰𝘵 𝘪𝘯 𝘵𝘩𝘦 𝘴𝘰𝘱𝘩𝘪𝘴𝘵𝘪𝘤𝘢𝘵𝘪𝘰𝘯 𝘰𝘧 𝘢𝘭𝘨𝘰𝘳𝘪𝘵𝘩𝘮𝘴, 𝘣𝘶𝘵 𝘪𝘯 𝘵𝘩𝘦 𝘱𝘳𝘰𝘧𝘰𝘶𝘯𝘥 𝘶𝘯𝘥𝘦𝘳𝘴𝘵𝘢𝘯𝘥𝘪𝘯𝘨 𝘰𝘧 𝘩𝘶𝘮𝘢𝘯 𝘯𝘦𝘦𝘥—𝘬𝘯𝘰𝘸𝘪𝘯𝘨 𝘱𝘳𝘦𝘤𝘪𝘴𝘦𝘭𝘺 𝘸𝘩𝘦𝘯 𝘵𝘰 𝘴𝘱𝘦𝘢𝘬 𝘢𝘯𝘥 𝘸𝘩𝘦𝘯 𝘵𝘰 𝘭𝘪𝘴𝘵𝘦𝘯, 𝘸𝘩𝘦𝘯 𝘵𝘰 𝘨𝘶𝘪𝘥𝘦 𝘢𝘯𝘥 𝘸𝘩𝘦𝘯 𝘵𝘰 𝘴𝘵𝘦𝘱 𝘣𝘢𝘤𝘬, 𝘤𝘳𝘦𝘢𝘵𝘪𝘯𝘨 𝘮𝘰𝘮𝘦𝘯𝘵𝘴 𝘰𝘧 𝘨𝘦𝘯𝘶𝘪𝘯𝘦 𝘪𝘯𝘴𝘪𝘨𝘩𝘵 𝘵𝘩𝘢𝘵 𝘵𝘳𝘢𝘯𝘴𝘧𝘰𝘳𝘮 𝘤𝘰𝘯𝘧𝘶𝘴𝘪𝘰𝘯 𝘪𝘯𝘵𝘰 𝘤𝘭𝘢𝘳𝘪𝘵𝘺.
Guided by this philosophy, we designed our AI tutor around three principles:
𝗣𝗲𝗿𝘀𝗼𝗻𝗮𝗹𝗶𝘇𝗲 𝘀𝘂𝗽𝗽𝗼𝗿𝘁: Using age-appropriate language and considering reading levels.
𝗔𝗱𝗮𝗽𝘁 𝗶𝘁𝘀 𝗴𝘂𝗶𝗱𝗮𝗻𝗰𝗲: Offering a spectrum of help – sometimes “guiding” with direct input, other times “stepping back” with subtle hints to nudge students toward their own discovery, truly aiming to turn confusion into clarity.
𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝗰𝗲 𝗘𝗺𝗲𝗿𝗴𝗲𝘀 𝗳𝗿𝗼𝗺 𝗣𝗮𝘁𝘁𝗲𝗿𝗻 𝗥𝗲𝗰𝗼𝗴𝗻𝗶𝘁𝗶𝗼𝗻: The most effective AI assistants don’t just respond to explicit requests for help; they observe user behavior and proactively offer assistance when it’s most needed. Our state detection logic analyzes user interactions and infers when an intervention would actually be helpful (a simplified sketch follows below).
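To make that concrete, here is a deliberately simple TypeScript sketch of the idea. The signals, thresholds, and action names are hypothetical and only illustrate the pattern of inferring a tutoring move from observed behavior.

```typescript
// Hypothetical signals collected from the activity UI.
interface InteractionSignals {
  secondsIdle: number;          // time since the student's last edit
  deletionsLastMinute: number;  // rapid deleting often signals frustration
  attemptsOnCurrentStep: number;
  hasAskedForHelp: boolean;
}

type TutorAction = "stay-quiet" | "offer-hint" | "offer-guided-help";

// A deliberately simple, rule-based "state detector": decide when to step in.
function inferTutorAction(s: InteractionSignals): TutorAction {
  if (s.hasAskedForHelp) return "offer-guided-help";            // explicit request: guide directly
  if (s.attemptsOnCurrentStep >= 3 || s.deletionsLastMinute >= 10) {
    return "offer-hint";                                        // struggling: nudge, don't answer
  }
  if (s.secondsIdle > 90) return "offer-hint";                  // stuck and silent
  return "stay-quiet";                                          // making progress: step back
}

// Example: a student who has retried three times gets a gentle hint.
console.log(
  inferTutorAction({ secondsIdle: 20, deletionsLastMinute: 2, attemptsOnCurrentStep: 3, hasAskedForHelp: false })
);
```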
Part 3: Full Stack, Full Speed
Initially, our plan was fairly standard for rapid prototyping: start with a React UI, using mock data and simulated interactions. The idea was to get a feel for the UX quickly and iterate. But with a powerful 🤖 at our fingertips, what if going full stack wasn’t the bottleneck we assumed? Our AI power-duo helped get a basic end-to-end backend system, including an evaluation endpoint, running in about 30 minutes! This speed was phenomenal, but what it unlocked was even more transformative: truly rapid, multi-faceted iteration. So, while 🤖 was laying the groundwork, our team could zero in on rapidly advancing the AI’s core ‘thinking’:
- Live-tweaking the prompts for Claude 3.7 Sonnet, allowing us to observe immediate changes in its output and understanding.
- Putting the AI through its paces by rigorously testing its responses against a wide array of real-world student scenarios and edge cases.
- Focusing on clarity, contextual relevance, and achieving the right personalized & supportive tone for young learners.
We were also able to iterate on the UI quickly and make significant design changes in real time. The AI assistant made some surprisingly intuitive design choices: with minimal detailed prompting from our side, it implemented UI elements such as:
- Progress bars to show task completion.
- Collapsible sections for cleaner information display.
- Clearly distinguished primary & secondary buttons.
- Appropriate labels and helpful tooltips.
This hands-on, immediate testing loop allowed us to quickly zero in on key insights for the feature’s effectiveness and overall UX:
- 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗡𝗲𝗲𝗱𝗲𝗱: The AI’s language had to adapt to the student’s specific age and reading level for feedback to be truly effective.
- 𝗖𝗼𝗻𝘀𝗶𝘀𝘁𝗲𝗻𝗰𝘆 𝗗𝗲𝗺𝗮𝗻𝗱𝘀 𝗮 𝗥𝘂𝗯𝗿𝗶𝗰: For the AI’s feedback to be consistent and fair, it needed a detailed rubric of binary, weighted criteria to score responses against (see the sketch after this list).
- 𝗢𝗻𝗯𝗼𝗮𝗿𝗱𝗶𝗻𝗴 𝗳𝗼𝗿 𝗙𝗶𝗿𝘀𝘁-𝗧𝗶𝗺𝗲 𝗨𝘀𝗲𝗿𝘀: We also quickly recognized that for students to fully benefit from the get-go, a clear and simple onboarding flow was essential to introduce the AI’s capabilities and guide their initial interactions successfully.
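To show what we mean by a binary, weighted rubric, here is a TypeScript sketch; the criteria and weights are invented for illustration, not our actual rubric.

```typescript
// Each criterion is a simple pass/fail check so the grader can't "split the difference".
interface RubricCriterion {
  id: string;
  description: string;
  weight: number; // relative importance; weights sum to 1
}

// Invented criteria for illustration only.
const responseRubric: RubricCriterion[] = [
  { id: "on-topic", description: "Addresses the student's actual question", weight: 0.3 },
  { id: "no-direct-answer", description: "Offers a hint rather than the final answer", weight: 0.3 },
  { id: "age-appropriate", description: "Vocabulary matches the student's reading level", weight: 0.2 },
  { id: "supportive-tone", description: "Encouraging, non-judgmental wording", weight: 0.2 },
];

// Overall score is the weighted sum of the binary marks (from a human or an LLM judge).
function scoreResponse(passed: Record<string, boolean>): number {
  return responseRubric.reduce((total, c) => total + (passed[c.id] ? c.weight : 0), 0);
}

// Example: a response that does everything except simplify its vocabulary scores 0.8.
scoreResponse({ "on-topic": true, "no-direct-answer": true, "age-appropriate": false, "supportive-tone": true });
```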
Part 4: The AI Tightrope: Balancing Real-Time Speed & Real-World Spend
Prototyping isn’t just about cool features; it’s fundamentally about feasibility. An early, intense focus on delivering real-time UX AND mastering aggressive cost control is what turns innovative AI concepts into scalable, real-world solutions.
After exploring AI tutor design and rapid iteration, today we’re diving into crucial “engine room” lessons from our prototype: crafting performant, real-time AI that also respects the budget.
𝗞𝗲𝗲𝗽𝗶𝗻𝗴 𝗶𝘁 𝗦𝗻𝗮𝗽𝗽𝘆: 𝗘𝗻𝗴𝗶𝗻𝗲𝗲𝗿𝗶𝗻𝗴 𝗥𝗲𝗮𝗹-𝗧𝗶𝗺𝗲 𝗔𝗜 ⚡
Users expect AI interactions to be seamless. Here’s how we tackled this:
• 𝗪𝗲𝗯𝗦𝗼𝗰𝗸𝗲𝘁𝘀: For that instant, conversational feel.
• 𝗜𝗻𝘁𝗲𝗹𝗹𝗶𝗴𝗲𝗻𝘁 𝗗𝗲𝗯𝗼𝘂𝗻𝗰𝗶𝗻𝗴: Grouping user inputs for smoother, more contextual AI exchanges and limiting the number of messages sent.
• 𝗧𝗮𝗰𝗸𝗹𝗶𝗻𝗴 𝗟𝗟𝗠 𝗟𝗮𝘁𝗲𝗻𝗰𝘆: Let’s be real, even powerful models have thinking time. Our approach:
- Engaging UI: A fun “thinking” animation keeps users happy during brief waits.
- Streaming Responses: This is key! Users see feedback appear word-by-word as the LLM generates it, making the experience feel much faster (both the debouncing and the streaming display are sketched in the snippet below).
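Here is a rough client-side TypeScript sketch of both patterns. The WebSocket URL, message shapes, and the 1.5-second quiet window are assumptions for illustration only.

```typescript
// Placeholder WebSocket endpoint; ours ran behind API Gateway.
const socket = new WebSocket("wss://example.execute-api.us-east-1.amazonaws.com/prod");

let pendingEdits: string[] = [];
let debounceTimer: ReturnType<typeof setTimeout> | undefined;

// Debouncing: collect edits and send one combined message after a quiet period,
// so the backend (and the LLM) sees fewer, more contextual requests.
function onStudentEdit(text: string) {
  pendingEdits.push(text);
  if (debounceTimer) clearTimeout(debounceTimer);
  debounceTimer = setTimeout(() => {
    socket.send(JSON.stringify({ type: "student_update", edits: pendingEdits }));
    pendingEdits = [];
  }, 1500); // assumed 1.5 s pause before sending
}

// Streaming display: append each token to the UI as it arrives,
// instead of waiting for the full LLM response.
socket.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "tutor_token") {
    const output = document.getElementById("tutor-output");
    if (output) output.textContent += msg.text;
  }
};
```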
𝗦𝗺𝗮𝗿𝘁 𝗦𝗽𝗲𝗻𝗱𝗶𝗻𝗴: 𝗔𝗜 𝗖𝗼𝘀𝘁 𝗖𝗼𝗻𝘁𝗿𝗼𝗹 𝗳𝗿𝗼𝗺 𝗗𝗮𝘆 𝗭𝗲𝗿𝗼 💰
Evaluating AI cost wasn’t just a late-stage optimization; it was a critical go/no-go metric for the feature’s feasibility right from the prototype phase. To serve thousands of concurrent users effectively, the solution had to be economically viable. This early focus on cost-per-interaction drove many design choices.
The biggest lever was prompt caching: the foundational “system prompt” and other bulky, unchanging context (like rubric details) are placed first and 𝗰𝗮𝗰𝗵𝗲𝗱, and only the unique student work or current state is injected dynamically on each request.
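Here is a rough TypeScript sketch of that prompt structure, assuming the Bedrock Converse API and its prompt-caching cachePoint markers; the prompts, model ID, and region are placeholders rather than our actual values.

```typescript
import { BedrockRuntimeClient, ConverseCommand } from "@aws-sdk/client-bedrock-runtime";

const client = new BedrockRuntimeClient({ region: "us-east-1" });

// Bulky, unchanging context goes first; placeholders standing in for the real prompts.
const SYSTEM_PROMPT = "You are a patient tutor for young students...";
const RUBRIC_TEXT = "Full rubric details go here...";

export async function evaluateStudentWork(studentWork: string, gradeLevel: string) {
  const command = new ConverseCommand({
    // Model ID is illustrative; the exact ID / inference profile varies by region.
    modelId: "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    system: [
      { text: SYSTEM_PROMPT },
      { text: RUBRIC_TEXT },
      // Everything above this marker can be served from the prompt cache on later calls.
      { cachePoint: { type: "default" } },
    ],
    messages: [
      {
        role: "user",
        // Only the small, per-request context is injected dynamically.
        content: [{ text: `Grade level: ${gradeLevel}\nStudent work:\n${studentWork}` }],
      },
    ],
  });

  const response = await client.send(command);
  return response.output?.message?.content?.[0]?.text;
}
```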
We projected 70-80% savings while maintaining (and even enhancing) response quality and consistency!
To truly understand our efficiency, we diligently measured costs on both a per-interaction basis (how much each student query costs) and an overall session basis (total cost for a student’s entire engagement). This granular tracking helped us estimate the AI cost at scale.
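As a back-of-the-envelope illustration of that tracking, here is a small sketch; the rate constants are placeholders, and mapping your provider’s token-usage fields onto this shape is left as an assumption.

```typescript
// Placeholder per-1K-token rates; substitute current pricing for your model and region.
const RATES = { inputPer1K: 0.003, cachedInputPer1K: 0.0003, outputPer1K: 0.015 };

interface InteractionUsage {
  freshInputTokens: number;   // input tokens billed at the full rate
  cachedInputTokens: number;  // input tokens read from the prompt cache (discounted)
  outputTokens: number;
}

// Cost of a single student query.
function interactionCost(u: InteractionUsage): number {
  return (
    (u.freshInputTokens / 1000) * RATES.inputPer1K +
    (u.cachedInputTokens / 1000) * RATES.cachedInputPer1K +
    (u.outputTokens / 1000) * RATES.outputPer1K
  );
}

// Cost of a whole session is the sum over every interaction in it,
// which is what we extrapolate from when estimating cost at scale.
const sessionCost = (interactions: InteractionUsage[]) =>
  interactions.reduce((sum, u) => sum + interactionCost(u), 0);
```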
Part 5: Build less, Learn more
With AI tools supercharging development, the art of prototyping shifts. It’s not just about building fast, but critically, knowing when to stop. Crafting ‘just enough’ to test core hypotheses with users is vital for agile learning, preventing over-investment in unvalidated ideas, and ensuring user needs truly shape your product.
Welcome back! After diving into real-time architecture and cost, today let’s talk about a subtle challenge in AI-accelerated prototyping: defining “done” for the prototype itself.
𝗧𝗵𝗲 𝗔𝗹𝗹𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 “𝗔𝗹𝗺𝗼𝘀𝘁 𝗙𝘂𝗹𝗹 𝗣𝗿𝗼𝗱𝘂𝗰𝘁” 🌟
Agentic coding tools make building end-to-end features incredibly fast. But they also bring a new temptation: if you can build it all quickly, why not iron out every detail? It’s easy to get sucked into polishing and adding, moving closer to a full-blown product than a learning tool.
𝗪𝗵𝘆 𝗪𝗲 𝗛𝗶𝘁 𝘁𝗵𝗲 𝗕𝗿𝗮𝗸𝗲𝘀 🛑
As a team, we had to consciously pull back. Why?
𝘿𝙚𝙛𝙚𝙖𝙩𝙨 𝙩𝙝𝙚 𝙋𝙧𝙤𝙩𝙤𝙩𝙮𝙥𝙚’𝙨 𝙋𝙪𝙧𝙥𝙤𝙨𝙚: A prototype isn’t meant to be a perfect, complete product. Its primary job is to facilitate rapid learning and validate assumptions quickly. Overbuilding delays this crucial step.
𝙐𝙨𝙚𝙧 𝙏𝙚𝙨𝙩𝙞𝙣𝙜 𝙞𝙨 𝙋𝙖𝙧𝙖𝙢𝙤𝙪𝙣𝙩: We fundamentally believe in letting user feedback guide development. We needed to get something into users’ hands to hear how they would actually use the feature and what they truly value – not just build what we thought was best in a vacuum.
𝘼𝙫𝙤𝙞𝙙𝙞𝙣𝙜 𝙋𝙧𝙚𝙢𝙖𝙩𝙪𝙧𝙚 𝘼𝙩𝙩𝙖𝙘𝙝𝙢𝙚𝙣𝙩: The more effort and detail you pour into a specific feature set before validation, the harder it becomes to pivot or even discard it if users don’t respond well. We wanted to stay nimble and not get too emotionally invested in a solution users might reject.
𝗙𝗶𝗻𝗱𝗶𝗻𝗴 𝗢𝘂𝗿 “𝗝𝘂𝘀𝘁 𝗘𝗻𝗼𝘂𝗴𝗵” ⚖️
For us, “just enough” meant building the core functionality that would allow users to experience the primary value proposition of our AI tutor. It needed to be functional enough to elicit genuine reactions and specific feedback on key interactions, but not so polished that we’d be heartbroken if we had to change major parts (or all!) of it based on user testing.
It’s a continuous balancing act, but embracing this mindset keeps the “rapid” in rapid prototyping truly effective.
Part 6: Priming the prototype for user testing
Effective user testing hinges on designing tests that challenge assumptions, elicit genuine user behaviors, and provide actionable insights – not just echo our preconceived notions. Thoughtful planning, from understanding user context deeply to strategic use of tools like feature flags, is paramount.
Welcome back to #RapidLearnings! After discussing the art of “just enough” prototyping (Part 5), today we’re pulling back the curtain on how we prepare our AI-powered prototype for user experience research. The goal? To ensure we capture real insights that will guide our development.
𝗕𝗲𝘆𝗼𝗻𝗱 𝘁𝗵𝗲 𝗘𝗰𝗵𝗼 𝗖𝗵𝗮𝗺𝗯𝗲𝗿: 𝗗𝗲𝘀𝗶𝗴𝗻𝗶𝗻𝗴 𝗳𝗼𝗿 𝗛𝗼𝗻𝗲𝘀𝘁 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸 👂
It’s human nature to seek validation. However, user testing should be about uncovering truths. Our preparation to achieve this starts before the prototype is even shown:
𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝘁𝗵𝗲 𝗨𝘀𝗲𝗿’𝘀 𝗪𝗼𝗿𝗹𝗱 𝗙𝗶𝗿𝘀𝘁: We prepare a list of key questions designed to deeply explore their current reality. We ask users to talk openly about:
- 𝘛𝘩𝘦𝘪𝘳 𝘗𝘢𝘪𝘯 𝘗𝘰𝘪𝘯𝘵𝘴
- 𝘊𝘶𝘳𝘳𝘦𝘯𝘵 𝘛𝘰𝘰𝘭𝘴 & 𝘗𝘳𝘰𝘤𝘦𝘴𝘴𝘦𝘴
𝗖𝗿𝗮𝗳𝘁𝗶𝗻𝗴 𝗨𝗻𝗯𝗶𝗮𝘀𝗲𝗱 𝗣𝗿𝗼𝗺𝗽𝘁𝘀: The way we frame tasks and questions for the prototype interaction itself is crucial. We consciously create neutral prompts that encourage users to think aloud and share their genuine experiences, rather than leading them.
𝗢𝗯𝘀𝗲𝗿𝘃𝗶𝗻𝗴 𝗕𝗲𝗵𝗮𝘃𝗶𝗼𝗿 𝗢𝘃𝗲𝗿 𝗝𝘂𝘀𝘁 𝗔𝘀𝗸𝗶𝗻𝗴: What users do often speaks louder than what they say. We pay close attention to their interactions, hesitations, and workarounds within the prototype.
𝗧𝗵𝗲 𝗣𝗼𝘄𝗲𝗿 𝗼𝗳 𝗜𝗻𝗰𝗿𝗲𝗺𝗲𝗻𝘁𝗮𝗹 𝗘𝘅𝗽𝗼𝘀𝘂𝗿𝗲: 𝗙𝗲𝗮𝘁𝘂𝗿𝗲 𝗙𝗹𝗮𝗴𝘀 𝗶𝗻 𝗔𝗰𝘁𝗶𝗼𝗻 🚩
To isolate feedback on specific functionalities, we’re leveraging LaunchDarkly for feature flagging. This allows us to start with a core experience and incrementally turn on features during a single user testing session in real-time, observing how users react to each addition without prior priming. This shows us how users organically discover and integrate them into their workflow.
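In the React frontend this can be as small as the sketch below, using the LaunchDarkly React SDK’s useFlags hook; the flag keys and components are hypothetical stand-ins for the real prototype.

```tsx
import { useFlags } from "launchdarkly-react-client-sdk";

// Stand-in components for the real prototype UI.
const HintSidebar = () => <aside>AI hints…</aside>;
const SummaryCard = () => <section>AI summary…</section>;

export function TutorPanel() {
  // Flag keys are illustrative; the React SDK exposes them in camelCase.
  const { aiTutorHints, aiSummaryFeedback } = useFlags();

  return (
    <div>
      {/* Flipped on remotely, mid-session, as the researcher introduces each capability. */}
      {aiTutorHints && <HintSidebar />}
      {aiSummaryFeedback && <SummaryCard />}
    </div>
  );
}
```

Because the client SDK streams flag changes, toggling a flag in the LaunchDarkly dashboard lets the researcher reveal a capability and watch the participant discover it moments later.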
By investing time in this upfront preparation, from understanding user pain points to setting up robust test environments, we aim to make our user testing sessions far more insightful.
𝗧𝗵𝗲 𝗙𝗶𝗻𝗮𝗹𝗲: “𝗜 𝗮𝗺 𝗮𝗻𝘁𝗶 𝗔𝗜, 𝗯𝘂𝘁 𝗜 𝗱𝗼 𝗹𝗶𝗸𝗲 𝘁𝗵𝗶𝘀 𝗮 𝗹𝗼𝘁!”
The ultimate validation of a prototype lies in user feedback. Engaging with real users, actively listening to their concerns and aspirations, and iteratively incorporating those insights is the engine of meaningful product development. Our first round of user testing provided invaluable guidance, with one user’s comment – “𝘐 𝘢𝘮 𝘢𝘯𝘵𝘪 𝘈𝘐, 𝘣𝘶𝘵 𝘐 𝘥𝘰 𝘭𝘪𝘬𝘦 𝘵𝘩𝘪𝘴 𝘢 𝘭𝘰𝘵” – underscoring the potential of well-designed, user-centric AI to address real needs and even win over skeptics.
What a journey this #RapidLearnings series has been! Today, I’m sharing the exciting results and key takeaways from our initial user testing of the AI-powered prototype. It was a fantastic opportunity to see our work through fresh eyes and validate (and challenge!) our assumptions.
𝗩𝗼𝗶𝗰𝗲𝘀 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝗙𝗶𝗲𝗹𝗱: 𝗥𝗲𝗮𝗹 𝗨𝘀𝗲𝗿 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸 🗣️
Here’s a snapshot of what we heard:
𝗔𝗜 𝗛𝗲𝘀𝗶𝘁𝗮𝗻𝗰𝘆: One user, new to AI, expressed a common concern about students becoming overly reliant on it for answers. This highlighted the importance of our design focusing on learning support.
𝗟𝗲𝗮𝗿𝗻𝗶𝗻𝗴 𝗦𝘂𝗽𝗽𝗼𝗿𝘁 𝗙𝗼𝗰𝘂𝘀: Encouragingly, this same user, and others, emphasized a preference for AI to provide “𝘩𝘪𝘯𝘵𝘴 𝘢𝘴 𝘰𝘱𝘱𝘰𝘴𝘦𝘥 𝘵𝘰 𝘢𝘯𝘴𝘸𝘦𝘳𝘴”, reinforcing our pivot towards an AI tutor model.
𝗔𝗰𝘁𝗶𝗼𝗻𝗮𝗯𝗹𝗲 𝗙𝗲𝗲𝗱𝗯𝗮𝗰𝗸 𝗼𝗻 𝗦𝘂𝗺𝗺𝗮𝗿𝗶𝗲𝘀: Users liked the summary feature but wanted more critical feedback – clear guidance on “𝘸𝘩𝘢𝘵 𝘪𝘴 𝘵𝘩𝘦 𝘯𝘦𝘹𝘵 𝘴𝘵𝘦𝘱?” 𝘢𝘯𝘥 “𝘸𝘩𝘢𝘵 𝘢𝘳𝘦 𝘵𝘩𝘦𝘺 𝘮𝘪𝘴𝘴𝘪𝘯𝘨?”
𝗞𝗶𝗱-𝗙𝗿𝗶𝗲𝗻𝗱𝗹𝘆 𝗟𝗮𝗻𝗴𝘂𝗮𝗴𝗲 𝗩𝗮𝗹𝗶𝗱𝗮𝘁𝗶𝗼𝗻: The positive reception to the age-appropriate language confirmed a key design decision.
𝗩𝗶𝘀𝘂𝗮𝗹 𝗖𝗹𝗮𝗿𝗶𝘁𝘆 𝗡𝗲𝗲𝗱𝗲𝗱: A valuable recommendation was to improve visual clarity by color-coding evidence and bolding key recommendations, as the current presentation felt dense.
𝗧𝗿𝗮𝗻𝘀𝗽𝗮𝗿𝗲𝗻𝗰𝘆 & 𝗣𝗿𝗼𝗴𝗿𝗲𝘀𝘀 𝗧𝗿𝗮𝗰𝗸𝗶𝗻𝗴: Users expressed a desire for more transparency around their learning journey and how mastery is being determined.
𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗔𝗜 𝗥𝗲𝗮𝘀𝗼𝗻𝗶𝗻𝗴: There was a clear interest in understanding how the AI arrived at its recommendations, emphasizing the need for explainability.
𝗪𝗵𝗮𝘁’𝘀 𝗡𝗲𝘅𝘁? 𝗜𝘁𝗲𝗿𝗮𝘁𝗶𝗼𝗻 & 𝗙𝘂𝗿𝘁𝗵𝗲𝗿 𝗘𝘅𝗽𝗹𝗼𝗿𝗮𝘁𝗶𝗼𝗻 🚀
This iterative cycle of building, testing, and learning is at the heart of rapid prototyping, and the feedback we’ve received has given us a fantastic roadmap for the next phase of development.