How to Get Audio Transcriptions from Whisper without a File System

Whisper is OpenAI’s speech-to-text transcription model. It lets developers submit audio plus an optional styling prompt, and receive transcribed text in response.

However, the official OpenAI Node.js SDK docs show only one way to use Whisper: reading an audio file with fs.

import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  // Stream the audio file straight from disk
  const transcription = await openai.audio.transcriptions.create({
    file: fs.createReadStream("audio.mp3"),
    model: "whisper-1",
  });

  console.log(transcription.text);
}

That works fine if you have static files… but in any consumer application, we’ll be processing data sent from an end-user client such as an app or web browser. Receiving audio from thousands of users and saving it all as files is a major waste of disk space and a huge inefficiency. Plus, serverless deployment is extremely popular today, and in a serverless environment we usually don’t have persistent file storage. I wrote this article because it was surprisingly hard to figure out how to transcribe audio without saving it as a file first.

How to use Whisper without files

On the client side, you’ll need to get your audio into a Base64-encoded string. I’m using the @ricky0123/vad-react library for this, which ships with utilities for exactly that:

onSpeechEnd: (audio) => {
  const wavBuffer = utils.encodeWAV(audio);
  const base64 = utils.arrayBufferToBase64(wavBuffer);
  // POST with a JSON payload so the Base64 string can't exceed the max URL length
  fetch("/api/transcribe", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ audioData: base64 }),
  });
},
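The encode step above is just standard Base64 handling, and it mirrors the decode step the server will do later. As a sketch (the function names here are illustrative, standing in for vad-react’s utils.arrayBufferToBase64 and Node’s Buffer decoding), the round-trip looks like this:

```javascript
// Sketch of the Base64 round-trip, runnable in Node.

function arrayBufferToBase64(arrayBuffer) {
  // Wrap the raw bytes and serialize them to a Base64 string
  return Buffer.from(arrayBuffer).toString("base64");
}

function base64ToBuffer(base64) {
  // Decode the Base64 string back to binary, as the API handler does
  return Buffer.from(base64, "base64");
}

// A tiny stand-in payload: the four ASCII bytes "RIFF" that open a WAV file
const original = new Uint8Array([0x52, 0x49, 0x46, 0x46]).buffer;
const base64 = arrayBufferToBase64(original);
const decoded = base64ToBuffer(base64);

console.log(base64); // "UklGRg=="
console.log(decoded.toString("ascii")); // "RIFF"
```

Base64 inflates the payload by roughly a third, but it keeps binary audio safe inside a JSON body.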

Then on the server side, the trick is to create a Buffer from the Base64 data and pass it to the undocumented toFile helper exported by OpenAI’s library.

import OpenAI, { toFile } from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export default async function handler(req, res) {
  try {
    // Extract the Base64-encoded data from the request
    // (some frameworks parse JSON bodies automatically, so handle both cases)
    const bodyData =
      typeof req.body === "string" ? JSON.parse(req.body) : req.body;
    const base64Audio = bodyData.audioData;

    // Decode Base64 to binary
    const audioBuffer = Buffer.from(base64Audio, "base64");

    // Use OpenAI API to transcribe the audio
    const transcription = await openai.audio.transcriptions.create({
      file: await toFile(audioBuffer, "audio.wav", {
        contentType: "audio/wav",
      }),
      model: "whisper-1",
    });

    // Send the transcription text as response
    res.json({ transcription: transcription.text });
  } catch (error) {
    console.error("Error during transcription:", error);
    res.status(500).send("Error during transcription");
  }
}
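Before spending an API call, it can also be worth sanity-checking that the decoded buffer really is WAV data. The helper below is my own optional addition, not part of the handler above; it just checks the standard "RIFF" and "WAVE" markers that open every WAV file:

```javascript
// Optional sanity check (a hypothetical helper, not required by the OpenAI API):
// a valid WAV file starts with "RIFF" at bytes 0-3 and "WAVE" at bytes 8-11.

function looksLikeWav(buffer) {
  return (
    buffer.length >= 12 &&
    buffer.toString("ascii", 0, 4) === "RIFF" &&
    buffer.toString("ascii", 8, 12) === "WAVE"
  );
}

// Minimal fake WAV header for demonstration
const header = Buffer.concat([
  Buffer.from("RIFF"),
  Buffer.alloc(4), // chunk size (ignored by this check)
  Buffer.from("WAVE"),
]);

console.log(looksLikeWav(header)); // true
console.log(looksLikeWav(Buffer.from("not audio"))); // false
```

Rejecting garbage early with a 400 response is cheaper than letting Whisper fail on it.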

Voilà! With this approach, you can use Whisper without saving every user’s audio as a static file, which also makes it viable in a serverless environment.
