Building Your Own Web Server — Part 4: Single-threaded non-blocking server


All articles in this series

  1. Building Your Own Web Server: Part 1 — Theory and Foundations
  2. Building Your Own Web Server: Part 2 — Plan and Implementation of HTTP and Configuration parser
  3. Building Your Own Web Server — Part 3: Blocking Single and Multithreaded Server
  4. Building Your Own Web Server — Part 4: Single-threaded non-blocking server

Preface

Hello friend,

If you are reading these lines, it means that you have reached the 4th part of our series, where we are trying to understand how web servers work under the hood, what challenges they face, what approaches and concepts are used, and what pros and cons they have. If you are already here, you have already learned or refreshed so many things: the structure of the HTTP protocol, threads in Linux, lexical analysis, sockets, and the basics of networking and servers. That is already more than some programmers ever learn. I’m very proud of you! And it is an honour to make this journey with you. So to sum up, we’ve already gone through many things together in the first three articles:

  • what HTTP is and how it works
  • which features and versions it has
  • how we can handle HTTP requests and send responses
  • how we can configure the server and set up routing
  • how we can crash the whole server with just one persistent connection when using one blocking thread
  • how we can gobble up all the memory on our machine with a multithreaded server

But that was just preparation, just a basis to get ready for the final and most sophisticated topic.

Do you want to meet the final boss? 😎 Let’s go.

Intro

Let me remind you: in the last article we finished the implementation of the multithreaded web server.

We also compared its performance to the single-threaded server, and we noticed that not everything is so good with that approach. Threads require a lot of additional time and resources, and that affects our performance. And this “non-blocking” multithreaded approach… actually isn’t really non-blocking 😅.

💡 recv() still blocks every thread and consumes a lot of time and resources for nothing.

But is there a way to get the benefits of each of these approaches while avoiding the problems that come with them?

Can we combine the simplicity of a single thread with the absence of blocking? Can we stop wasting CPU time on fruitless waiting? And can we still handle modern load and scale?

Fortunately, yes: the answer is an I/O event notification mechanism!

Let’s talk about it more.

So… what is an I/O event notification mechanism, and what is epoll?

To explain it more visually, let’s find a nice analogy. Imagine that you are a mail assistant in a big apartment building, responsible for managing the mailboxes of hundreds of apartments. You have to pick up the mail every time it arrives in any mailbox and reply to it.

With our first implementation, the blocking single-threaded server, your workflow would look like this.

First implementation

You stand at the main entrance of the building and wait for the postman. When he comes and brings the first letter, you immediately pick it up, write an answer, and then just stand there waiting for new mail for that same flat. And if the postman comes with mail for other flats, you send him away, saying that you are busy and waiting only for one special letter for your first flat.

Second implementation

The multithreaded approach looks very similar, except that instead of one person managing the mail of every apartment, we have a personal assistant for every flat, who spends most of the time just waiting for new post for their flat. Imagine how expensive and inefficient that is.

New approach

Now imagine that you do not need to wait for the postman, check boxes, or wait for a particular letter. You just sit comfortably in your office in the building, and every time the postman has messages for any flat in your house, he brings them directly to you. He rings your bell and leaves the messages in your postbox. Then you can just pick them up and reply to each one. This way, your resources and efforts are spent only on useful work, and only when they are needed. This is how an I/O event notification mechanism works. And on Linux, this mechanism is implemented by the epoll system calls.

⚡ epoll in action

Let’s look at what this approach looks like in code.

When using epoll, you don’t check every socket in a loop. Instead, you wait once — and the kernel gives you a list of sockets that are ready. You simply pick them up and handle them.

events = epoll_wait()
for each ready_socket in events:
    read data

It’s simple. You don’t waste time checking sockets that have nothing to say. You just process the ones that are ready — and go back to waiting.

This is fundamentally different from blocking or even multithreaded models:

  • In the blocking version, you could only wait on one socket at a time.
  • In the multithreaded version, you created a thread per socket and they all waited separately (wasting resources).
  • But with epoll, one thread can wait for thousands of sockets and act only when something meaningful happens.

This is why it’s so efficient — and why it’s used in high-performance servers like NGINX and runtimes like Node.js and Python’s async ecosystem.

How it works under the hood

Let’s now step away from the analogy and look at how epoll actually works at the system level.

  • epoll is a Linux kernel feature designed for I/O event notification.
  • It allows a single thread to monitor thousands of file descriptors — sockets, pipes, files — without constantly looping over them.
  • Instead of asking each socket “Are you ready?”, you simply register them once, and the kernel will inform you when any of them is ready to be read from or written to.

This saves CPU time, avoids unnecessary system calls, and lets you scale your server efficiently.
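
To make that concrete, here is a minimal sketch of the register-once, wait, react cycle using Python’s low-level select.epoll interface (Linux only). The port number, buffer size, and the hardcoded response are arbitrary choices for illustration, and error handling is omitted:

import select
import socket

# One thread, one raw epoll object (Linux only)
server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("", 8080))
server.listen()
server.setblocking(False)

epoll = select.epoll()
epoll.register(server.fileno(), select.EPOLLIN)  # register the listener once
connections = {}  # fd -> socket, so we can look sockets up by descriptor

while True:
    # Sleep until the kernel reports ready descriptors (timeout in seconds)
    for fd, event in epoll.poll(timeout=1):
        if fd == server.fileno():
            conn, _ = server.accept()  # a client is ready to be accepted
            conn.setblocking(False)
            epoll.register(conn.fileno(), select.EPOLLIN)
            connections[conn.fileno()] = conn
        elif event & select.EPOLLIN:
            sock = connections[fd]
            data = sock.recv(1024)  # will not block: epoll said it's ready
            if data:
                sock.send(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")
            else:
                # An empty read means the client closed the connection
                epoll.unregister(fd)
                sock.close()
                del connections[fd]

Notice there is no loop over all sockets anywhere: the kernel hands us only the descriptors that actually have work waiting.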

Triggering modes:

There are two ways you can receive notifications from epoll, and they behave differently:

  • Level-triggered (default):

    The kernel keeps notifying you as long as a socket remains in a ready state.

    This is simpler to implement and works well in most cases.

  • Edge-triggered:

    You get notified only once — the moment a socket becomes ready. If you don’t read all available data immediately, you won’t get notified again.

    This mode is more efficient, but it requires extra care: you must drain all the data (or write as much as possible) in one go, or you might miss future events.

💡 In our Python implementation, we’ll stick with level-triggered behavior — it’s easier to reason about and already gives us great performance.
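
For completeness, with the raw epoll interface the difference between the two modes is just one flag at registration time. A tiny sketch (the socketpair is used purely for demonstration):

import select
import socket

a, b = socket.socketpair()  # two connected sockets, just for demonstration
epoll = select.epoll()

# Level-triggered (the default): the kernel re-reports readiness on every
# poll() call for as long as unread data remains in the buffer.
epoll.register(a.fileno(), select.EPOLLIN)

# Edge-triggered: add the EPOLLET flag; readiness is reported only once,
# when the socket *becomes* readable, so you must drain it completely
# (loop on recv() until it raises BlockingIOError).
epoll.register(b.fileno(), select.EPOLLIN | select.EPOLLET)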

Typical usage flow

Let’s quickly walk through how epoll is typically used in practice. Whether in C or Python (via selectors), the core steps are the same:

  1. Create an epoll instance

    This is your central “inbox” — a place where all events will be collected.

  2. Register sockets you’re interested in

    You tell the kernel: “Please watch this socket and notify me when it’s ready to read or write.”

  3. Call epoll_wait() (or selector.select() in Python)

    Your program now pauses, efficiently waiting. It doesn’t consume CPU cycles — it just sleeps until something meaningful happens.

  4. Handle only the sockets that reported an event

    When epoll_wait() returns, it gives you a list of sockets that are ready.

    You process each one: read the data, write the response, or close the connection — then go back to waiting.

🧘‍♂️ The beauty of this approach is that your server stays completely calm and efficient — only acting when there’s real work to do.
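
Mapping those four steps onto Python’s selectors module gives a skeleton like this (just a sketch: the port is an arbitrary choice, and real request handling is omitted; the full version follows later in this article):

import selectors
import socket

sel = selectors.DefaultSelector()           # 1. create the "inbox" (epoll on Linux)

server = socket.socket()
server.bind(("", 8080))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)  # 2. register the socket we care about

while True:
    events = sel.select(timeout=1)          # 3. sleep until something is ready
    for key, mask in events:                # 4. handle only the ready sockets
        if key.fileobj is server:
            conn, _ = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = key.fileobj.recv(1024)
            if not data:                    # client hung up: clean it up
                sel.unregister(key.fileobj)
                key.fileobj.close()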

📖 Want more depth? Here’s the official man page:

🔗 https://man7.org/linux/man-pages/man7/epoll.7.html

How does Nginx use it?

Nginx non-blocking

I encourage you to read this amazing official article from Nginx about their architecture. They describe nicely not only how the non-blocking approach works for them, but also the evolution and overall architecture of one of the fastest servers in the world.

I will add just a few takeaways from the article about the main concept of the non-blocking approach:

Think of the state machine like the rules for chess. Each HTTP transaction is a chess game. On one side of the chessboard is the web server – a grandmaster who can make decisions very quickly. On the other side is the remote client – the web browser that is accessing the site or application over a relatively slow network.

However, the rules of the game can be very complicated. For example, the web server might need to communicate with other parties (proxying to an upstream application) or talk to an authentication server. Third‑party modules in the web server can even extend the rules of the game.

NGINX is a True Grandmaster

Perhaps you’ve heard of simultaneous exhibition games, where one chess grandmaster plays dozens of opponents at the same time?

That’s how an NGINX worker process plays “chess.” Each worker (remember – there’s usually one worker for each CPU core) is a grandmaster that can play hundreds (in fact, hundreds of thousands) of games simultaneously.

  1. The worker waits for events on the listen and connection sockets.
  2. Events occur on the sockets and the worker handles them:
    • An event on the listen socket means that a client has started a new chess game. The worker creates a new connection socket.
    • An event on a connection socket means that the client has made a new move. The worker responds promptly.

A worker never blocks on network traffic, waiting for its “opponent” (the client) to respond. When it has made its move, the worker immediately proceeds to other games where moves are waiting to be processed, or welcomes new players in the door.

I do not think it makes sense for me to try to rephrase or summarize the whole article. It is written too eloquently to be replaced with something else. Therefore I truly recommend you check it out and enjoy it. 😉

What does this have in common with Node.js?

There is also one very popular technology that, in the end, uses the same approach: Node.js.

Node.js is built on libuv, a cross-platform library that provides an event loop and non-blocking I/O. On Linux, libuv internally uses epoll to efficiently manage multiple I/O operations.

That means when your Node.js app listens for network connections, reads files, or waits for timers — it doesn’t block. Instead, all those events are registered with epoll, and Node.js reacts only when something is ready, just like we described earlier.

This is how Node.js achieves high concurrency in a single thread — exactly the same concept we’re exploring in our server.

I can’t resist adding this beautiful article with a nice visualisation of the event loop and the role of epoll in it:

https://medium.com/preezma/node-js-event-loop-architecture-go-deeper-node-core-c96b4cec7aa4. It is absolutely a side topic, but if you are interested, I recommend you read it. It gives you a deep understanding of one more technology. Consider it a 1+1 sale: two items for the price of one 😉

How does Python use epoll?

Now that we understand what epoll is and why it’s powerful, you might wonder:

“Do I need to use low-level C syscalls to access it?”

The answer is: not at all — Python gives us a clean, high-level way to use epoll through its selectors module.

The idea

Python wraps the system’s best available I/O strategy — whether it’s select, poll, kqueue, or epoll — and gives us a consistent interface. On Linux, this wrapper uses epoll under the hood.

So when we write:

import selectors
sel = selectors.DefaultSelector()

Python automatically chooses the best mechanism for us. On Linux, this is the EpollSelector:

import selectors
print(selectors.DefaultSelector())  # <selectors.EpollSelector object at 0x...>

That means we’re already using epoll — without doing anything special.

Why is this great?

It gives us the power of epoll (massively scalable I/O) without needing to dive into C APIs or system-level programming. You just write Python code that says:

  • “Watch this socket and tell me when it’s ready.”
  • Then react only when something happens.

Simple example:

sock.setblocking(False)
sel.register(sock, selectors.EVENT_READ, data=...)

Later in your event loop:

events = sel.select(timeout=1)
for key, mask in events:
    handle(key.fileobj)

How can we use this approach in our server?

Now that we’ve seen how epoll works — both conceptually and in other technologies like NGINX and Node.js — let’s bring all of that to life in our own server.

Instead of using threads, we’ll build a single-threaded, event-driven server using Python’s selectors module, which wraps around epoll (on Linux) or the best available alternative on other systems.

We’ll still support everything we’ve done before: parsing requests, routing, handling static files, and supporting keep-alive — but now, all of it will happen inside one lightweight, non-blocking event loop.

Take a deep breath. The code might look longer, but it’s structured and commented for clarity. And once you see how it works, you’ll understand why this model powers some of the fastest servers in the world.

Let’s build it.

import socket
import selectors
from typing import Tuple, Optional
from config_parser import load_config, ServerConfig
from http_parser import HTTPParser, HTTPMessage

# -------------------------
# Route Matching Logic
# -------------------------

class RouteMatcher:
    @staticmethod
    def match_location(locations, uri: str):
        # Find the longest prefix-matching location block
        matched_location = None
        longest_prefix = -1
        for path, root_dir in locations.items():
            if uri.startswith(path) and len(path) > longest_prefix:
                matched_location = root_dir
                longest_prefix = len(path)
        return matched_location

# -------------------------
# Data Buffer
# -------------------------

class DataProvider:
    def __init__(self):
        self._data = b""

    @property
    def data(self) -> bytes:
        return self._data

    @data.setter
    def data(self, chunk: bytes):
        # Append new data to the buffer
        self._data += chunk

    def reduce_data(self, size: int):
        # Remove processed data from the buffer
        self._data = self._data[size:]

# -------------------------
# Message Processor
# -------------------------

class HTTPProcessor:
    def __init__(self, data_provider: DataProvider):
        self.data_provider = data_provider

    def get_one_http_message(self) -> Optional[HTTPMessage]:
        try:
            # Attempt to parse a single HTTP message from the buffer
            message, consumed = HTTPParser.parse_message(self.data_provider.data)
            if message:
                # Remove the parsed message from the buffer
                self.data_provider.reduce_data(consumed)
            return message
        except Exception:
            return None

# -------------------------
# Server Entrypoint
# -------------------------

class Server:
    """
    Main server class. Reads config, binds to the correct port, and handles requests.
    """

    # ===================================================
    # Main epoll-based server loop:
    # 1. Create a non-blocking server socket.
    # 2. Register it with a selector to watch for incoming connections.
    # 3. Enter the event loop:
    #     a. Wait for events using selector.select().
    #     b. If the event is on the server socket, accept a new client.
    #     c. If the event is on a client socket, read data and respond.
    # 4. Parse requests and route them using our config and HTTP parser.
    # 5. Serve files or respond with a 404.
    # ===================================================

    def __init__(self, config_path: str):
        self.config = load_config(config_path)
        self.selector = selectors.DefaultSelector()

    def start(self):
        port = self.config.listen_ports[0]
        server_sock = socket.socket()
        # Allow quick restarts without "Address already in use" errors
        server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server_sock.bind(("", port))
        server_sock.listen()
        server_sock.setblocking(False)
        # Register the server socket to accept new connections
        self.selector.register(server_sock, selectors.EVENT_READ, data=None)
        print(f"[Server] Listening on port {port}")

        try:
            while True:
                # Wait for events on registered sockets
                events = self.selector.select(timeout=1)
                # Iterate over the events returned by the selector
                # Each event corresponds to a socket ready for I/O
                for key, mask in events:
                    if key.data is None:
                        # Accept a new incoming connection from a client.
                        self._accept_connection(key.fileobj)
                    else:
                        # Handle incoming data from a client socket.
                        self._service_connection(key, mask)
        except KeyboardInterrupt:
            print("[Server] Shutting down")
        finally:
            # Clean up resources on shutdown
            self.selector.close()
            server_sock.close()

    def _accept_connection(self, sock):
        # Accept a new incoming connection from a client.
        # Set it to non-blocking and register it with the selector.
        conn, addr = sock.accept()
        print(f"[Server] Accepted connection from {addr}")
        conn.setblocking(False)
        # Create a new data provider for the connection
        data_provider = DataProvider()
        # Register the connection for reading
        self.selector.register(conn, selectors.EVENT_READ, data=data_provider)

    def _service_connection(self, key, mask):
        # Handle incoming data from a client socket.
        # Read data and trigger request processing if data is received.
        sock = key.fileobj
        data_provider = key.data
        addr = sock.getpeername()
        if mask & selectors.EVENT_READ:
            try:
                # Read data from the socket
                data = sock.recv(1024)
            except ConnectionResetError:
                data = None
            if data:
                # Add received data to the buffer
                data_provider.data = data
                self._handle_request(sock, data_provider)
            else:
                # Close the connection if no data is received
                print(f"[Server] Closing connection to {addr}")
                self.selector.unregister(sock)
                sock.close()

    def _handle_request(self, sock, data_provider):
        # Parse the buffered data into an HTTP request message.
        http_processor = HTTPProcessor(data_provider)
        while request := http_processor.get_one_http_message():
            # Determine the correct file path based on requested URL.
            url = request.url
            root = 'html'  # Default root directory
            if url == "/":
                url = "/index.html"
            else:
                # Match the location block for the requested URL
                root = RouteMatcher.match_location(self.config.routes[self.config.listen_ports[0]], url)
            if root is None:
                # No location block matched the URL: respond with 404 and close
                self._send_404(sock)
                self.selector.unregister(sock)
                sock.close()
                return

            file_path = f"{root}{url}"
            print(f"[Request] {url} => {file_path}")

            # Try to read and serve the requested file.
            try:
                with open(file_path, "rb") as f:
                    body = f.read()
                headers = (
                    "HTTP/1.1 200 OK\r\n"
                    f"Content-Length: {len(body)}\r\n"
                    "Content-Type: text/plain\r\n"
                )
                # Check if the connection should be kept alive or closed.
                keep_alive = "keep-alive" in request.headers.get("connection", "").lower()
                if keep_alive:
                    headers += "Connection: keep-alive\r\n"
                else:
                    headers += "Connection: close\r\n"
                headers += "\r\n"
                sock.sendall(headers.encode() + body)
                if not keep_alive:
                    # Close the connection only after the response has been sent
                    self.selector.unregister(sock)
                    sock.close()
                    return
            # If something goes wrong (e.g., file not found), send a 404.
            except Exception as e:
                print(f"[Error] {e}")
                self._send_404(sock)
                self.selector.unregister(sock)
                sock.close()
                return

    def _send_404(self, sock):
        # Helper method to respond with a 404 Not Found status.
        msg = b"404 Not Found"
        headers = (
            "HTTP/1.1 404 Not Foundrn"
            f"Content-Length: {len(msg)}rn"
            "Content-Type: text/plainrnrn"
        )
        sock.sendall(headers.encode() + msg)

# -------------------------
# Start Server
# -------------------------

if __name__ == "__main__":
    # Load the server configuration and start the server
    server = Server("config.conf")
    server.start()

Please find the full implementation here.

When I implemented this approach, I was so excited to measure its performance. I believe you are as well 😉 Let’s check it out:

📊 Benchmark Results (All 4 Scenarios)

#   Scenario                             Command                  Requests/sec   Time (s)
1   Single request, no keep-alive        ab -n 100000             9897.79        10.103
2   Single request, keep-alive           ab -n 100000 -k          18247.78       5.480
3   Concurrent (50x), no keep-alive      ab -n 100000 -c 50       9887.54        10.114
4   Concurrent (50x), with keep-alive    ab -n 100000 -c 50 -k    27261.97       3.668

📊 Performance Comparison Table

Scenario                               Command                  Single-threaded     Multithreaded       Non-blocking epoll
1. Single request, no keep-alive       ab -n 100000             10308.65 req/sec    7871.07 req/sec     9897.79 req/sec
2. Single request, with keep-alive     ab -n 100000 -k          19351.89 req/sec    19139.16 req/sec    18247.78 req/sec
3. Concurrent (50x), no keep-alive     ab -n 100000 -c 50       22733.18 req/sec    7182.58 req/sec     9887.54 req/sec
4. Concurrent (50x), with keep-alive   ab -n 100000 -c 50 -k    ❌ Crashed          13777.80 req/sec    27261.97 req/sec

🧪 Interpreting the Benchmarks — What do they really tell us?

The numbers we’ve seen aren’t just data points — they reveal how each architecture behaves under real-world pressure.

Here’s the story behind the stats:

🧵 Multithreaded server

At first glance, it might seem like the multithreaded version should dominate — more threads, more parallelism, right?

But the reality is more nuanced. Each thread adds overhead: memory, context switching, and blocking on recv() calls. That overhead starts to hurt, especially when connection count rises.

So while multithreading helped us avoid complete blocking, it didn’t scale efficiently.

🚫 Naive single-threaded server

Surprisingly, the original single-threaded version performed well in some cases — particularly under low concurrency.

Why? It’s simple: there’s almost no overhead. But the moment multiple concurrent connections pile up, this model hits its limits. It blocks on one socket and ignores the rest. That’s why it crashes in the final scenario.

⚡ Epoll-based non-blocking server

This is where things change.

Under high concurrency with keep-alive (-c 50 -k), this version soars past the others — 27,000+ requests/sec. That’s not a small jump. That’s architectural advantage in action.

Why does it work so well?

  • We wait only when we must.
  • We act only when sockets are ready.
  • We scale without threads.

This version does more work with fewer resources, and it shows.

So yes — non-blocking I/O isn’t just a theoretical advantage. It’s a practical performance booster. Especially in the kind of high-connection environments modern servers live in.

This is the power of event-driven architecture. And now you’ve seen it, measured it, and built it with your own hands. 💪

🎯 Final Thoughts

So… we made it.

We started with the most naive blocking approach, hit the limits of multithreading, and finally built something truly powerful — a non-blocking, single-threaded server that can handle tens of thousands of connections with grace.

Looking at the benchmarks, the difference is clear.

It’s not just about raw numbers — it’s about how we use our machine’s resources.

  • No wasted CPU time.
  • No useless threads sleeping in the background.
  • Just one calm loop, doing real work only when needed.

This architecture isn’t just a toy. It’s how real-world servers like NGINX, Node.js, and many modern async frameworks work. And now you’ve built your own version of it — from scratch.

That’s a huge achievement. Be proud of it.

And if it feels like magic — it kind of is. But it’s the kind of magic you now understand deeply.

This was the last topic I planned to cover in this series, and the planned articles are now complete. But that does not mean the topic is exhausted. There are still so many concepts, features, and topics that we only partially mentioned during this series and that exist in the modern web.

But where do we go from here? Should we build a proxy mode? Serve files smarter? Add SSL? Or maybe even… write our own async framework?

Now it totally depends on you. Please share your ideas and let me know what you’d like to build next.

Please, let me know. Because this project — just like the web — is full of new paths to explore.🚀
