Ever watched ChatGPT type back to you word by word, like it’s actually thinking out loud? That’s streaming AI in action, and it makes web apps feel incredibly alive! Today, we’re building exactly that: a real-time AI chatbot web app where responses flow in instantly. No more staring at loading spinners! We’ll use FastAPI for lightning-fast backends, WebSockets for live chat magic, and PocketFlow to keep things organized. Ready to make your web app feel like a real conversation? You can find the complete code for this part in the FastAPI WebSocket Chat Cookbook.
1. Why Your AI Web App Should Stream (It’s a Game Changer!) 🚀
Picture this: You ask an AI a question, then… you wait. And wait. Finally, BOOM – a wall of text appears all at once. Feels clunky, right?
Now imagine this instead: You ask your question, and the AI starts “typing” back immediately – word by word, just like texting with a friend. That’s the magic of streaming for AI web apps.
Why streaming rocks: It feels lightning fast, keeps users engaged, and creates natural conversation flow. No more “is this thing broken?” moments!
We’re creating a live AI chatbot web app that streams responses in real-time. You’ll type a message, and watch the AI respond word by word, just like the pros do it.
Our toolkit:
- 🔧 FastAPI – Blazing fast Python web framework
- 🔧 WebSockets – The secret sauce for live, two-way chat
- 🔧 PocketFlow – Our LLM framework in 100 lines
Quick catch-up on our series:
- Part 1: Built command-line AI tools ✅
- Part 2: Created interactive web apps with Streamlit ✅
- Part 3 (You are here!): Real-time streaming web apps 🚀
- Part 4 (Coming next!): Background tasks for heavy AI work
Want to see streaming in action without the web complexity first? Check out our simpler guide: “Streaming LLM Responses — Tutorial For Dummies”.
Ready to make your AI web app feel like magic? Let’s dive in!
2. FastAPI + WebSockets = Real-Time Magic ⚡
To build our streaming chatbot, we need two key pieces: FastAPI for a blazing-fast backend and WebSockets for live, two-way chat.
FastAPI: Your Speed Demon Backend
FastAPI is like the sports car of Python web frameworks – fast, modern, and async-ready. Perfect for AI apps that need to handle multiple conversations at once.
Most web apps work like old-school mail: Browser sends request → Server processes → Sends back response → Done. Here’s a basic FastAPI example:
```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/hello")
async def say_hello():
    return {"greeting": "Hi there!"}
```
What’s happening here?

- `app = FastAPI()` – Creates your web server
- `@app.get("/hello")` – Says “when someone visits `/hello`, run the function below”
- `async def say_hello()` – The function that handles the request
- `return {"greeting": "Hi there!"}` – Sends back JSON data to the browser

When you visit `http://localhost:8000/hello`, you’ll see `{"greeting": "Hi there!"}` in your browser!
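Want to run it yourself? Here’s a quick sketch, assuming you saved the code above as `main.py` (the filename is our assumption):

```python
# Start the dev server with: uvicorn main:app --reload
# Or launch it programmatically by adding this to the bottom of main.py:
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)
```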
Your First FastAPI App Flow: browser sends a request → FastAPI runs your function → JSON comes back.
Simple enough, but for chatbots we need something more interactive…
WebSockets: Live Chat Superpowers
WebSockets turn your web app into a live phone conversation. Instead of sending messages back and forth, you open a connection that stays live for instant back-and-forth chat.
Here’s a simple echo server that repeats whatever you say:
```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/chat")
async def chat_endpoint(websocket: WebSocket):
    await websocket.accept()  # Pick up the call!
    while True:
        message = await websocket.receive_text()            # Listen
        await websocket.send_text(f"You said: {message}")   # Reply
```
The browser side is just as simple:
id="messageInput" placeholder="Say something..."/>
id="chatLog">
const ws = new WebSocket("ws://localhost:8000/chat");
const chatLog = document.getElementById('chatLog');
ws.onmessage = (event) => {
chatLog.innerHTML += `Server:
${event.data}`;
};
function sendMessage() {
const message = document.getElementById('messageInput').value;
ws.send(message);
chatLog.innerHTML += `You:
${message}`;
document.getElementById('messageInput').value = '';
}
WebSocket Chat Flow: browser opens a connection → server accepts → messages flow both ways over the same live line.
That’s it! You now have live, real-time communication between browser and server. Perfect foundation for our streaming AI chatbot!
3. Adding AI to the Mix: Why Async Matters 🤖
Great! We have live chat working. But here’s the thing: calling an AI like ChatGPT takes time (sometimes 3-5 seconds). If our server just sits there waiting, our whole web app freezes. Not good!
The problem: Normal code is like a single-lane road. When the AI is thinking, everything else stops.
The solution: Async code is like a highway with multiple lanes. While AI is thinking in one lane, other users can chat in other lanes!
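Here’s a toy sketch of that difference, using plain `asyncio` with a fake two-second delay standing in for the AI call (the names here are illustrative, not part of our app):

```python
import asyncio
import time

async def fake_ai_call(user: str) -> str:
    await asyncio.sleep(2)  # Pretend the LLM "thinks" for 2 seconds
    return f"Answer for {user}"

async def main():
    start = time.perf_counter()
    # Three users chat at once: total time is ~2 seconds, not ~6
    answers = await asyncio.gather(*(fake_ai_call(u) for u in ["Ann", "Bob", "Cai"]))
    print(answers, f"(took {time.perf_counter() - start:.1f}s)")

asyncio.run(main())
```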
PocketFlow Goes Async
Remember PocketFlow from our earlier tutorials? It helps break down complex tasks into simple steps. For web apps, we need the async version:
- `AsyncNode` – Each step can wait for AI without blocking others
- `AsyncFlow` – Manages the whole conversation workflow
Here’s the magic difference:
```python
from openai import OpenAI, AsyncOpenAI

# ❌ This blocks everything
def call_ai(message):
    client = OpenAI()
    response = client.chat.completions.create(...)  # Everyone waits!
    return response

# ✅ This lets others keep chatting
async def call_ai_async(message):
    client = AsyncOpenAI()
    response = await client.chat.completions.create(...)  # Just this task waits
    return response
```
Streaming Chat Node: The Star of the Show
Our `StreamingChatNode` does three things:
- Prep: Add user message to chat history
- Execute: Call AI and stream response word-by-word via WebSocket
- Post: Save AI’s complete response to history
```python
import json

from pocketflow import AsyncNode

class StreamingChatNode(AsyncNode):
    async def prep_async(self, shared):
        # Add user message to history (creating the list on the first message)
        history = shared.setdefault("conversation_history", [])
        history.append({"role": "user", "content": shared["user_message"]})
        return history, shared["websocket"]

    async def exec_async(self, prep_result):
        messages, websocket = prep_result
        # Stream AI response word by word
        full_response = ""
        async for chunk in stream_llm(messages):
            full_response += chunk
            await websocket.send_text(json.dumps({"content": chunk}))
        return full_response

    async def post_async(self, shared, prep_res, exec_res):
        # Save complete AI response
        shared["conversation_history"].append({
            "role": "assistant",
            "content": exec_res
        })
```
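The `stream_llm` helper comes from the cookbook’s utilities and isn’t shown above. Here’s a minimal sketch of what it could look like with OpenAI’s async streaming API (the model name and client setup are our assumptions):

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment

async def stream_llm(messages):
    # Request a streamed completion and yield text chunks as they arrive
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # Assumed model; swap in whatever you use
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```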
That’s it! The node streams AI responses live while keeping chat history. Next, let’s see how this all connects together!
4. Putting It All Together: The Complete Streaming Flow 🔄
Time to connect all the pieces! Here’s how a user message flows through our streaming chatbot:
The Journey of a Message:
User sends message → FastAPI receives it → PocketFlow handles AI logic → Streams response back live!
The FastAPI WebSocket Handler
Here’s the main FastAPI code that ties everything together:
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await websocket.accept()
chat_memory = {
"websocket": websocket,
"conversation_history": []
}
try:
while True:
# Get user message
user_data = await websocket.receive_text()
message = json.loads(user_data) # {"content": "Hello!"}
chat_memory["user_message"] = message["content"]
# Run our PocketFlow
chat_flow = create_streaming_chat_flow()
await chat_flow.run_async(chat_memory)
except WebSocketDisconnect:
print("User left the chat")
def create_streaming_chat_flow():
return AsyncFlow(start_node=StreamingChatNode())
What happens:

- Accept the WebSocket connection
- Wait for user messages in a loop
- For each message, run our `StreamingChatNode`
- The node handles AI calling + streaming automatically!
Note: Each WebSocket connection gets its own `chat_memory` dictionary with the live connection, latest message, and full conversation history. This lets each user have an independent conversation while the AI remembers context.
Frontend: The Streaming Magic in JavaScript
On the browser side, we need just a few lines to make streaming work:
id="aiResponse">
id="userInput" placeholder="Type your message..."/>
const ws = new WebSocket("ws://localhost:8000/ws");
const aiResponse = document.getElementById("aiResponse");
// The magic: append each chunk as it arrives
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.content) {
aiResponse.textContent += data.content; // Stream word by word!
}
};
function sendMessage() {
const input = document.getElementById("userInput");
aiResponse.textContent = ""; // Clear for new response
ws.send(JSON.stringify({content: input.value}));
input.value = "";
}
The streaming happens in `ws.onmessage` – each time the server sends a text chunk, we append it to the display. That’s how you get the “typing” effect!
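One detail these snippets skip: how the browser gets this page in the first place. A simple sketch (our assumption; the cookbook may organize files differently) is to have FastAPI serve the HTML itself:

```python
from fastapi.responses import HTMLResponse

@app.get("/")
async def index():
    # Serve the chat page; assumes the HTML above was saved as index.html
    with open("index.html") as f:
        return HTMLResponse(f.read())
```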
Pretty neat, right? You now have all the pieces for a real-time streaming AI chatbot!
5. Mission Accomplished! You Built a Real-Time AI Chatbot 🎉
Boom! You just built a streaming AI chatbot web app that feels like magic. No more waiting around – your AI responds word by word, just like the pros!
What you crushed today:
- ⚡ FastAPI + WebSockets – Live, two-way chat that never gets old
- 🔄 Async PocketFlow – AI calls that don’t freeze your app
- 🚀 Streaming responses – Watch the AI “type” in real-time
You’ve officially joined the ranks of developers building modern, responsive AI web apps. Pretty cool, right?
What’s next in our series:
- Part 1: Command-line AI tools ✅
- Part 2: Interactive web apps with Streamlit ✅
- Part 3 (You just finished!): Real-time streaming ✅
- Part 4 (Coming up!): Background tasks for heavy AI work
Ready for the big leagues? Part 4 will tackle those marathon AI tasks – think generating reports or complex analyses that take minutes, not seconds. We’ll explore background processing and Server-Sent Events to keep users happy even during the heavy lifting.
Want to try this yourself? Grab the complete code from the PocketFlow cookbook: FastAPI WebSocket Chat Example. You’re building some serious AI web development skills! See you in Part 4! 🚀