Ever watched ChatGPT type back to you word by word, like it’s actually thinking out loud? That’s streaming AI in action, and it makes web apps feel incredibly alive! Today, we’re building exactly that: a real-time AI chatbot web app where responses flow in instantly. No more staring at loading spinners! We’ll use FastAPI for lightning-fast backends, WebSockets for live chat magic, and PocketFlow to keep things organized. Ready to make your web app feel like a real conversation? You can find the complete code for this part in the FastAPI WebSocket Chat Cookbook.
1. Why Your AI Web App Should Stream (It’s a Game Changer!) 🚀
Picture this: You ask an AI a question, then… you wait. And wait. Finally, BOOM – a wall of text appears all at once. Feels clunky, right?
Now imagine this instead: You ask your question, and the AI starts “typing” back immediately – word by word, just like texting with a friend. That’s the magic of streaming for AI web apps.
Why streaming rocks: It feels lightning fast, keeps users engaged, and creates natural conversation flow. No more “is this thing broken?” moments!
We’re creating a live AI chatbot web app that streams responses in real-time. You’ll type a message, and watch the AI respond word by word, just like the pros do it.
Our toolkit:
- 🔧 FastAPI – Blazing fast Python web framework
- 🔧 WebSockets – The secret sauce for live, two-way chat
- 🔧 PocketFlow – Our LLM framework in 100 lines
Quick catch-up on our series:
- Part 1: Built command-line AI tools ✅
- Part 2: Created interactive web apps with Streamlit ✅
- Part 3 (You are here!): Real-time streaming web apps 🚀
- Part 4 (Coming next!): Background tasks for heavy AI work
Want to see streaming in action without the web complexity first? Check out our simpler guide: “Streaming LLM Responses — Tutorial For Dummies”.
Ready to make your AI web app feel like magic? Let’s dive in!
2. FastAPI + WebSockets = Real-Time Magic ⚡
To build our streaming chatbot, we need two key pieces: FastAPI for a blazing-fast backend and WebSockets for live, two-way chat.
FastAPI: Your Speed Demon Backend
FastAPI is like the sports car of Python web frameworks – fast, modern, and async-ready. Perfect for AI apps that need to handle multiple conversations at once.
Most web apps work like old-school mail: Browser sends request → Server processes → Sends back response → Done. Here’s a basic FastAPI example:
```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/hello")
async def say_hello():
    return {"greeting": "Hi there!"}
```
What’s happening here?

- `app = FastAPI()` – Creates your web server
- `@app.get("/hello")` – Says “when someone visits `/hello`, run the function below”
- `async def say_hello()` – The function that handles the request
- `return {"greeting": "Hi there!"}` – Sends back JSON data to the browser

When you visit `http://localhost:8000/hello`, you’ll see `{"greeting": "Hi there!"}` in your browser!
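Want to run it yourself? Here’s a quick sketch, assuming you saved the code above as `main.py` (the filename is our assumption):

```python
# Start the dev server with: uvicorn main:app --reload
# Or launch it programmatically by adding this to the bottom of main.py:
if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="127.0.0.1", port=8000)
```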
Your First FastAPI App Flow: browser sends a request → FastAPI runs your function → JSON comes back.
Simple enough, but for chatbots we need something more interactive…
WebSockets: Live Chat Superpowers
WebSockets turn your web app into a live phone conversation. Instead of sending messages back and forth, you open a connection that stays live for instant back-and-forth chat.
Here’s a simple echo server that repeats whatever you say:
```python
from fastapi import FastAPI, WebSocket

app = FastAPI()

@app.websocket("/chat")
async def chat_endpoint(websocket: WebSocket):
    await websocket.accept()  # Pick up the call!
    while True:
        message = await websocket.receive_text()            # Listen
        await websocket.send_text(f"You said: {message}")   # Reply
```
The browser side is just as simple:
id="messageInput" placeholder="Say something..."/>
id="chatLog">
const ws = new WebSocket("ws://localhost:8000/chat");
const chatLog = document.getElementById('chatLog');
ws.onmessage = (event) => {
chatLog.innerHTML += `Server:
${event.data}`;
};
function sendMessage() {
const message = document.getElementById('messageInput').value;
ws.send(message);
chatLog.innerHTML += `You:
${message}`;
document.getElementById('messageInput').value = '';
}
WebSocket Chat Flow: browser opens a connection → server accepts → messages flow both ways over the same live line.
That’s it! You now have live, real-time communication between browser and server. Perfect foundation for our streaming AI chatbot!
3. Adding AI to the Mix: Why Async Matters 🤖
Great! We have live chat working. But here’s the thing: calling an AI like ChatGPT takes time (sometimes 3-5 seconds). If our server just sits there waiting, our whole web app freezes. Not good!
The problem: Normal code is like a single-lane road. When the AI is thinking, everything else stops.
The solution: Async code is like a highway with multiple lanes. While AI is thinking in one lane, other users can chat in other lanes!
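Here’s a toy sketch of that difference, using plain `asyncio` with a fake two-second delay standing in for the AI call (the names here are illustrative, not part of our app):

```python
import asyncio
import time

async def fake_ai_call(user: str) -> str:
    await asyncio.sleep(2)  # Pretend the LLM "thinks" for 2 seconds
    return f"Answer for {user}"

async def main():
    start = time.perf_counter()
    # Three users chat at once: total time is ~2 seconds, not ~6
    answers = await asyncio.gather(*(fake_ai_call(u) for u in ["Ann", "Bob", "Cai"]))
    print(answers, f"(took {time.perf_counter() - start:.1f}s)")

asyncio.run(main())
```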
PocketFlow Goes Async
Remember PocketFlow from our earlier tutorials? It helps break down complex tasks into simple steps. For web apps, we need the async version:
- `AsyncNode` – Each step can wait for AI without blocking others
- `AsyncFlow` – Manages the whole conversation workflow
Here’s the magic difference:
```python
from openai import OpenAI, AsyncOpenAI

# ❌ This blocks everything
def call_ai(message):
    client = OpenAI()
    response = client.chat.completions.create(...)  # Everyone waits!
    return response

# ✅ This lets others keep chatting
async def call_ai_async(message):
    client = AsyncOpenAI()
    response = await client.chat.completions.create(...)  # Just this task waits
    return response
```
Streaming Chat Node: The Star of the Show
Our `StreamingChatNode` does three things:
- Prep: Add user message to chat history
- Execute: Call AI and stream response word-by-word via WebSocket
- Post: Save AI’s complete response to history
```python
import json

from pocketflow import AsyncNode

class StreamingChatNode(AsyncNode):
    async def prep_async(self, shared):
        # Add user message to history (creating the list on the first message)
        history = shared.setdefault("conversation_history", [])
        history.append({"role": "user", "content": shared["user_message"]})
        return history, shared["websocket"]

    async def exec_async(self, prep_result):
        messages, websocket = prep_result
        # Stream AI response word by word
        full_response = ""
        async for chunk in stream_llm(messages):
            full_response += chunk
            await websocket.send_text(json.dumps({"content": chunk}))
        return full_response

    async def post_async(self, shared, prep_res, exec_res):
        # Save complete AI response
        shared["conversation_history"].append({
            "role": "assistant",
            "content": exec_res
        })
```
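The `stream_llm` helper comes from the cookbook’s utilities and isn’t shown above. Here’s a minimal sketch of what it could look like with OpenAI’s async streaming API (the model name and client setup are our assumptions):

```python
from openai import AsyncOpenAI

client = AsyncOpenAI()  # Reads OPENAI_API_KEY from the environment

async def stream_llm(messages):
    # Request a streamed completion and yield text chunks as they arrive
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # Assumed model; swap in whatever you use
        messages=messages,
        stream=True,
    )
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```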
That’s it! The node streams AI responses live while keeping chat history. Next, let’s see how this all connects together!
4. Putting It All Together: The Complete Streaming Flow 🔄
Time to connect all the pieces! Here’s how a user message flows through our streaming chatbot:
The Journey of a Message:
User sends message → FastAPI receives it → PocketFlow handles AI logic → Streams response back live!
The FastAPI WebSocket Handler
Here’s the main FastAPI code that ties everything together:
@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
await websocket.accept()
chat_memory = {
"websocket": websocket,
"conversation_history": []
}
try:
while True:
# Get user message
user_data = await websocket.receive_text()
message = json.loads(user_data) # {"content": "Hello!"}
chat_memory["user_message"] = message["content"]
# Run our PocketFlow
chat_flow = create_streaming_chat_flow()
await chat_flow.run_async(chat_memory)
except WebSocketDisconnect:
print("User left the chat")
def create_streaming_chat_flow():
return AsyncFlow(start_node=StreamingChatNode())
What happens:

- Accept the WebSocket connection
- Wait for user messages in a loop
- For each message, run our `StreamingChatNode`
- The node handles AI calling + streaming automatically!
Note: Each WebSocket connection gets its own `chat_memory` dictionary with the live connection, latest message, and full conversation history. This lets each user have an independent conversation while the AI remembers context.
Frontend: The Streaming Magic in JavaScript
On the browser side, we need just a few lines to make streaming work:
id="aiResponse">
id="userInput" placeholder="Type your message..."/>
const ws = new WebSocket("ws://localhost:8000/ws");
const aiResponse = document.getElementById("aiResponse");
// The magic: append each chunk as it arrives
ws.onmessage = (event) => {
const data = JSON.parse(event.data);
if (data.content) {
aiResponse.textContent += data.content; // Stream word by word!
}
};
function sendMessage() {
const input = document.getElementById("userInput");
aiResponse.textContent = ""; // Clear for new response
ws.send(JSON.stringify({content: input.value}));
input.value = "";
}
The streaming happens in `ws.onmessage` – each time the server sends a text chunk, we append it to the display. That’s how you get the “typing” effect!
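One detail these snippets skip: how the browser gets this page in the first place. A simple sketch (our assumption; the cookbook may organize files differently) is to have FastAPI serve the HTML itself:

```python
from fastapi.responses import HTMLResponse

@app.get("/")
async def index():
    # Serve the chat page; assumes the HTML above was saved as index.html
    with open("index.html") as f:
        return HTMLResponse(f.read())
```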
Pretty neat, right? You now have all the pieces for a real-time streaming AI chatbot!
5. Mission Accomplished! You Built a Real-Time AI Chatbot 🎉
Boom! You just built a streaming AI chatbot web app that feels like magic. No more waiting around – your AI responds word by word, just like the pros!
What you crushed today:
- ⚡ FastAPI + WebSockets – Live, two-way chat that never gets old
- 🔄 Async PocketFlow – AI calls that don’t freeze your app
- 🚀 Streaming responses – Watch the AI “type” in real-time
You’ve officially joined the ranks of developers building modern, responsive AI web apps. Pretty cool, right?
What’s next in our series:
- Part 1: Command-line AI tools ✅
- Part 2: Interactive web apps with Streamlit ✅
- Part 3 (You just finished!): Real-time streaming ✅
- Part 4 (Coming up!): Background tasks for heavy AI work
Ready for the big leagues? Part 4 will tackle those marathon AI tasks – think generating reports or complex analyses that take minutes, not seconds. We’ll explore background processing and Server-Sent Events to keep users happy even during the heavy lifting.
Want to try this yourself? Grab the complete code from the PocketFlow cookbook: FastAPI WebSocket Chat Example. You’re building some serious AI web development skills! See you in Part 4! 🚀