Building a Real-Time Voice Assistant with Local LLMs on a Raspberry Pi


Introduction

In this document, I’m sharing my journey of turning a Raspberry Pi into a powerful, real-time voice assistant. The goal was to:

  • Capture voice input through a web interface.
  • Process the text using a local LLM (like Mistral) running on the Pi.
  • Generate voice responses using Piper for text-to-speech (TTS).
  • Stream everything in real-time via WebSockets.

All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!

1. Setting up the Raspberry Pi

First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.

Steps:

  1. Update the system:
   sudo apt-get update
   sudo apt-get upgrade
  2. Enable the audio interface:
   sudo raspi-config

Navigate to System Options > Audio and select the correct output/input device.

2. Installing Ollama for Local LLMs

Ollama makes it easy to run local LLMs like Mistral on your Raspberry Pi. I installed it using:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I pulled the Mistral model:

ollama pull mistral

To confirm it works, I ran a quick test:

ollama run mistral

The model was ready to process text right on the Pi!
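
To call the model from code rather than the CLI, Ollama also exposes a local HTTP API on port 11434. Here's a minimal sketch in Node (assuming Node 18+ for the built-in fetch; the askMistral helper name is my own):

// Query the local Ollama server (default port 11434) for a one-shot completion.
// stream: false returns a single JSON object instead of line-delimited chunks.
async function askMistral(prompt) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'mistral', prompt, stream: false }),
  });
  const data = await res.json();
  return data.response; // the generated text
}

askMistral('Say hello in one sentence.').then(console.log);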

3. Setting up Piper for Text-to-Speech (TTS)

For offline voice generation, I chose Piper — a fantastic open-source TTS engine.

  1. Install dependencies:
   sudo apt-get install wget build-essential libsndfile1
  2. Download Piper for ARM64 (Raspberry Pi):
   wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
   tar -xvzf piper_arm64.tar.gz
   chmod +x piper
   sudo mv piper /usr/local/bin/
  3. Download a voice model (for example, en_US-lessac-medium.onnx and its matching .json config from the Piper voices repository), then test that Piper works:
   echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file output.wav
   aplay output.wav

Now the Pi could “talk” back!
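
Later on, the backend shells out to Piper with an echo pipe; as an alternative, here's a small Node helper (the speak name is my own, and it assumes the en_US-lessac-medium.onnx voice from the previous step) that writes the text to Piper's stdin instead, which avoids shell-quoting problems:

const { spawn } = require('child_process');

// Sketch: synthesize `text` to `outFile` with Piper by piping the text to stdin.
// Assumes a downloaded voice model such as en_US-lessac-medium.onnx.
function speak(text, outFile = 'output.wav') {
  return new Promise((resolve, reject) => {
    const piper = spawn('piper', [
      '--model', 'en_US-lessac-medium.onnx',
      '--output_file', outFile,
    ]);
    piper.on('error', reject);
    piper.on('close', (code) =>
      code === 0 ? resolve(outFile) : reject(new Error(`piper exited with code ${code}`))
    );
    piper.stdin.write(text);
    piper.stdin.end();
  });
}

speak('Hello from the Raspberry Pi!').then((file) => console.log('Wrote', file));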

4. Creating the Backend (Node.js)

I built a simple Node.js server to:

  • Accept text from the client (voice input from a web app).
  • Process it using Mistral (via Ollama).
  • Convert the LLM response to speech with Piper.
  • Stream the audio back to the client.

server.js:

const express = require('express');
const { exec } = require('child_process');
const WebSocket = require('ws');

const app = express();
const PORT = 3001;

// Serve generated audio files (e.g. output.wav) so the client can fetch them
app.use(express.static(__dirname));

// WebSocket setup
const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    // ws delivers a Buffer, so convert it to a string and escape double quotes
    // before interpolating it into a shell command
    const prompt = message.toString().replace(/"/g, '\\"');
    console.log('Received:', prompt);

    // Run Mistral LLM via Ollama
    exec(`ollama run mistral "${prompt}"`, (err, stdout) => {
      if (err) {
        console.error('LLM error:', err);
        ws.send('Error processing your request.');
        return;
      }

      const reply = stdout.trim().replace(/"/g, '\\"');

      // Convert the LLM response to speech using Piper
      exec(`echo "${reply}" | piper --model en_US-lessac-medium.onnx --output_file output.wav`, (ttsErr) => {
        if (ttsErr) {
          console.error('Piper error:', ttsErr);
          ws.send('Error generating speech.');
          return;
        }

        // Tell the client what was said and where to fetch the audio
        ws.send(JSON.stringify({ text: stdout.trim(), audio: 'output.wav' }));
      });
    });
  });
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
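
One variation worth noting: instead of pointing the client at a file name and making it issue a second HTTP request, the server could push the WAV bytes straight down the WebSocket as a binary frame. A rough sketch (the sendAudioOverSocket helper is hypothetical, and the client would then need to handle binary messages, e.g. by building a Blob from the frame):

const fs = require('fs');

// Variation on the handler above: after Piper has written the WAV, send the
// raw bytes over the WebSocket as a binary frame so the client does not need
// a second HTTP request. `ws` is the connection from wss.on('connection').
function sendAudioOverSocket(ws, text, wavPath = 'output.wav') {
  fs.readFile(wavPath, (err, wavBuffer) => {
    if (err) {
      ws.send('Error reading generated audio.');
      return;
    }
    ws.send(JSON.stringify({ text })); // the transcript first
    ws.send(wavBuffer);                // then the audio as a binary frame
  });
}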

5. Building the Real-Time Web Interface (React)

For the frontend, I created a simple React app to:

  • Record voice input.
  • Display real-time text responses.
  • Play the generated speech audio.

App.js:

import React, { useState, useEffect, useRef } from 'react';

function App() {
  const [text, setText] = useState('');
  const [response, setResponse] = useState('');
  const [audio, setAudio] = useState(null);
  const wsRef = useRef(null);

  useEffect(() => {
    // Open the WebSocket once on mount instead of on every render
    const ws = new WebSocket('ws://localhost:3002');
    wsRef.current = ws;

    ws.onmessage = (event) => {
      let data;
      try {
        data = JSON.parse(event.data);
      } catch {
        // The server sends plain-text error messages
        setResponse(String(event.data));
        return;
      }
      setResponse(data.text);

      // Fetch the generated WAV from the backend and play it
      fetch(`http://localhost:3001/${data.audio}`)
        .then(res => res.blob())
        .then(blob => {
          setAudio(URL.createObjectURL(blob));
        });
    };

    return () => ws.close();
  }, []);

  const handleSend = () => {
    if (wsRef.current && wsRef.current.readyState === WebSocket.OPEN) {
      wsRef.current.send(text);
    }
  };

  return (
    <div>
      <h1>Voice Assistant</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleSend}>Send</button>
      <h2>Response:</h2>
      <p>{response}</p>
      {audio && <audio controls src={audio} />}
    </div>
  );
}

export default App;
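
The goal list at the top mentions capturing voice input, while the component above takes typed text. One way to wire in actual speech capture is the browser's Web Speech API; a rough sketch (the startListening helper is my own, support varies by browser, and Chrome's recognizer may rely on an online service, so this piece isn't guaranteed to stay offline):

// Hook up voice capture with the Web Speech API.
// Chrome exposes the constructor as webkitSpeechRecognition.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

function startListening(onTranscript) {
  if (!SpeechRecognition) {
    console.warn('SpeechRecognition is not supported in this browser.');
    return;
  }
  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US';
  recognition.interimResults = false;

  recognition.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    onTranscript(transcript); // e.g. setText(transcript) in the component above
  };

  recognition.start();
}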

6. Running the Project

Once the backend and frontend were ready, I launched both:

  • Start the backend:
  node server.js
  • Run the React app:
  npm start

I accessed the web app on my Raspberry Pi’s IP at port 3000 and spoke into the mic — and voilà! The assistant responded in real-time, all processed locally.

Conclusion

Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:

  • Ollama for running local LLMs (like Mistral)
  • Piper for high-quality text-to-speech
  • WebSockets for real-time communication
  • React for a smooth web interface

… I now have a personalized voice AI that works without relying on the cloud.
