Building a Real-Time Voice Assistant with Local LLMs on a Raspberry Pi


Introduction

In this document, I’m sharing my journey of turning a Raspberry Pi into a powerful, real-time voice assistant. The goal was to:

  • Capture voice input through a web interface.
  • Process the text using a local LLM (like Mistral) running on the Pi.
  • Generate voice responses using Piper for text-to-speech (TTS).
  • Stream everything in real-time via WebSockets.

All of this runs offline on the Raspberry Pi — no cloud services involved. Let’s dive into how I built it step by step!

1. Setting up the Raspberry Pi

First, I set up my Raspberry Pi with the latest Raspberry Pi OS. It’s important to enable hardware interfaces and connect a USB microphone and speaker.

Steps:

  1. Update the system:
   sudo apt-get update
   sudo apt-get upgrade
  2. Enable the audio interface:
   sudo raspi-config

Navigate to System Options > Audio and select the correct output/input device.

2. Installing Ollama for Local LLMs

Ollama makes it easy to run local LLMs like Mistral on your Raspberry Pi. I installed it using:

curl -fsSL https://ollama.com/install.sh | sh

Once installed, I pulled the Mistral model:

ollama pull mistral

To confirm it works, I ran a quick test:

ollama run mistral

The model was ready to process text right on the Pi!
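
To call the model from code rather than the CLI, Ollama also exposes a local HTTP API on port 11434. Here's a minimal sketch in Node (assuming Node 18+ for the built-in fetch; the askMistral helper name is my own):

// Query the local Ollama server (default port 11434) for a one-shot completion.
// stream: false returns a single JSON object instead of line-delimited chunks.
async function askMistral(prompt) {
  const res = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'mistral', prompt, stream: false }),
  });
  const data = await res.json();
  return data.response; // the generated text
}

askMistral('Say hello in one sentence.').then(console.log);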

3. Setting up Piper for Text-to-Speech (TTS)

For offline voice generation, I chose Piper — a fantastic open-source TTS engine.

  1. Install dependencies:
   sudo apt-get install wget build-essential libsndfile1
  2. Download Piper for ARM64 (Raspberry Pi):
   wget https://github.com/rhasspy/piper/releases/download/v1.0.0/piper_arm64.tar.gz
   tar -xvzf piper_arm64.tar.gz
   chmod +x piper
   sudo mv piper /usr/local/bin/
  3. Download a voice model (for example, en_US-lessac-medium.onnx and its matching .json config from the Piper voices repository), then test that Piper works:
   echo "Hello, world!" | piper --model en_US-lessac-medium.onnx --output_file output.wav
   aplay output.wav

Now the Pi could “talk” back!
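
Later on, the backend shells out to Piper with an echo pipe; as an alternative, here's a small Node helper (the speak name is my own, and it assumes the en_US-lessac-medium.onnx voice from the previous step) that writes the text to Piper's stdin instead, which avoids shell-quoting problems:

const { spawn } = require('child_process');

// Sketch: synthesize `text` to `outFile` with Piper by piping the text to stdin.
// Assumes a downloaded voice model such as en_US-lessac-medium.onnx.
function speak(text, outFile = 'output.wav') {
  return new Promise((resolve, reject) => {
    const piper = spawn('piper', [
      '--model', 'en_US-lessac-medium.onnx',
      '--output_file', outFile,
    ]);
    piper.on('error', reject);
    piper.on('close', (code) =>
      code === 0 ? resolve(outFile) : reject(new Error(`piper exited with code ${code}`))
    );
    piper.stdin.write(text);
    piper.stdin.end();
  });
}

speak('Hello from the Raspberry Pi!').then((file) => console.log('Wrote', file));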

4. Creating the Backend (Node.js)

I built a simple Node.js server to:

  • Accept text from the client (voice input from a web app).
  • Process it using Mistral (via Ollama).
  • Convert the LLM response to speech with Piper.
  • Stream the audio back to the client.

server.js:

const express = require('express');
const { exec } = require('child_process');
const WebSocket = require('ws');

const app = express();
const PORT = 3001;

// Serve generated audio files (e.g. output.wav) so the client can fetch them
app.use(express.static(__dirname));

// WebSocket setup
const wss = new WebSocket.Server({ port: 3002 });

wss.on('connection', (ws) => {
  console.log('Client connected');

  ws.on('message', (message) => {
    // ws delivers a Buffer, so convert it to a string and escape double quotes
    // before interpolating it into a shell command
    const prompt = message.toString().replace(/"/g, '\\"');
    console.log('Received:', prompt);

    // Run Mistral LLM via Ollama
    exec(`ollama run mistral "${prompt}"`, (err, stdout) => {
      if (err) {
        console.error('LLM error:', err);
        ws.send('Error processing your request.');
        return;
      }

      const reply = stdout.trim().replace(/"/g, '\\"');

      // Convert the LLM response to speech using Piper
      exec(`echo "${reply}" | piper --model en_US-lessac-medium.onnx --output_file output.wav`, (ttsErr) => {
        if (ttsErr) {
          console.error('Piper error:', ttsErr);
          ws.send('Error generating speech.');
          return;
        }

        // Tell the client what was said and where to fetch the audio
        ws.send(JSON.stringify({ text: stdout.trim(), audio: 'output.wav' }));
      });
    });
  });
});

app.listen(PORT, () => {
  console.log(`Server running at http://localhost:${PORT}`);
});
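
One variation worth noting: instead of pointing the client at a file name and making it issue a second HTTP request, the server could push the WAV bytes straight down the WebSocket as a binary frame. A rough sketch (the sendAudioOverSocket helper is hypothetical, and the client would then need to handle binary messages, e.g. by building a Blob from the frame):

const fs = require('fs');

// Variation on the handler above: after Piper has written the WAV, send the
// raw bytes over the WebSocket as a binary frame so the client does not need
// a second HTTP request. `ws` is the connection from wss.on('connection').
function sendAudioOverSocket(ws, text, wavPath = 'output.wav') {
  fs.readFile(wavPath, (err, wavBuffer) => {
    if (err) {
      ws.send('Error reading generated audio.');
      return;
    }
    ws.send(JSON.stringify({ text })); // the transcript first
    ws.send(wavBuffer);                // then the audio as a binary frame
  });
}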

5. Building the Real-Time Web Interface (React)

For the frontend, I created a simple React app to:

  • Record voice input.
  • Display real-time text responses.
  • Play the generated speech audio.

App.js:

import React, { useState, useEffect, useRef } from 'react';

function App() {
  const [text, setText] = useState('');
  const [response, setResponse] = useState('');
  const [audio, setAudio] = useState(null);
  const wsRef = useRef(null);

  useEffect(() => {
    // Open the WebSocket once on mount instead of on every render
    const ws = new WebSocket('ws://localhost:3002');
    wsRef.current = ws;

    ws.onmessage = (event) => {
      let data;
      try {
        data = JSON.parse(event.data);
      } catch {
        // The server sends plain-text error messages
        setResponse(String(event.data));
        return;
      }
      setResponse(data.text);

      // Fetch the generated WAV from the backend and play it
      fetch(`http://localhost:3001/${data.audio}`)
        .then(res => res.blob())
        .then(blob => {
          setAudio(URL.createObjectURL(blob));
        });
    };

    return () => ws.close();
  }, []);

  const handleSend = () => {
    if (wsRef.current && wsRef.current.readyState === WebSocket.OPEN) {
      wsRef.current.send(text);
    }
  };

  return (
    <div>
      <h1>Voice Assistant</h1>
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      <button onClick={handleSend}>Send</button>
      <h2>Response:</h2>
      <p>{response}</p>
      {audio && <audio controls src={audio} />}
    </div>
  );
}

export default App;
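
The goal list at the top mentions capturing voice input, while the component above takes typed text. One way to wire in actual speech capture is the browser's Web Speech API; a rough sketch (the startListening helper is my own, support varies by browser, and Chrome's recognizer may rely on an online service, so this piece isn't guaranteed to stay offline):

// Hook up voice capture with the Web Speech API.
// Chrome exposes the constructor as webkitSpeechRecognition.
const SpeechRecognition =
  window.SpeechRecognition || window.webkitSpeechRecognition;

function startListening(onTranscript) {
  if (!SpeechRecognition) {
    console.warn('SpeechRecognition is not supported in this browser.');
    return;
  }
  const recognition = new SpeechRecognition();
  recognition.lang = 'en-US';
  recognition.interimResults = false;

  recognition.onresult = (event) => {
    const transcript = event.results[0][0].transcript;
    onTranscript(transcript); // e.g. setText(transcript) in the component above
  };

  recognition.start();
}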

6. Running the Project

Once the backend and frontend were ready, I launched both:

  • Start the backend:
  node server.js
  • Run the React app:
  npm start

I accessed the web app on my Raspberry Pi’s IP at port 3000 and spoke into the mic — and voilà! The assistant responded in real-time, all processed locally.

Conclusion

Building a real-time, fully offline voice assistant on a Raspberry Pi was an exciting challenge. With:

  • Ollama for running local LLMs (like Mistral)
  • Piper for high-quality text-to-speech
  • WebSockets for real-time communication
  • React for a smooth web interface

… I now have a personalized voice AI that works without relying on the cloud.
