I am working on a web app in Next.js and TypeScript. For one feature, I need to capture browser or desktop audio on the client side and store it for transcription. I have been able to get audio through Chrome's `getDisplayMedia` API (this feature is not supported on Firefox). After capturing the audio, I built a simple HTML audio player that plays the recording, and everything works. Now I want to send this audio file to an online API like OpenAI so that I can receive a transcription. I have tried several ways to do this, but the OpenAI API always returns an error saying the file format is not supported, even though I am capturing audio in WAV format. I have also tried using the OpenAI Node module, but that didn't work either. Here is the capture component:
// Component to capture user audio
import { useEffect, useRef, useState } from 'react';
import transcribe from '@/API/transcribe';

const AudioCaptureButton = () => {
  const audioRef = useRef<HTMLAudioElement>(null);
  const [mediaRecorder, setMediaRecorder] = useState<MediaRecorder | null>(null);
  const [chunks, setChunks] = useState<BlobPart[]>([]);

  const captureAudio = async () => {
    try {
      const mediaStream = await navigator.mediaDevices.getDisplayMedia({
        video: true, // Required even though we are not capturing video; setting this to false throws an error.
        audio: true
      });
      const recorder = new MediaRecorder(mediaStream);
      setMediaRecorder(recorder);
      recorder.ondataavailable = (e) => {
        setChunks((prev) => {
          const updatedChunks = [...prev, e.data];
          const blob = new Blob(updatedChunks, { type: 'audio/wav' });
          const audioURL = window.URL.createObjectURL(blob);
          if (audioRef.current) {
            audioRef.current.src = audioURL;
          }
          transcribe(blob);
          return updatedChunks;
        });
      };
      // recorder.ondataavailable = (e) => {
      //   setChunks((prev) => [...prev, e.data]);
      // };
      // recorder.onstop = async () => {
      //   const audioBlob = new Blob(chunks, { type: 'audio/wav; codecs=opus' });
      //   await transcribe(audioBlob);
      // };
      // Instead of providing a "Stop Capture" button, we can just stop the capture when the mediaStream ends.
      mediaStream.getTracks().forEach((track) => {
        track.onended = () => {
          if (recorder.state !== 'inactive') {
            recorder.stop();
          }
        };
      });
      recorder.start();
    } catch (err) {
      console.error('Error capturing audio', err);
    }
  };

  return (
    <div>
      <button onClick={captureAudio}>Start Capture</button>
      <audio ref={audioRef} controls />
    </div>
  );
};

export default AudioCaptureButton;
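While debugging, I also wanted to verify what the recorder can actually produce instead of assuming WAV (as far as I can tell, Chrome's `MediaRecorder` reports support for WebM/Opus but not `audio/wav`). This is a small sketch I used for that; the helper name and candidate list are my own, and the `isSupported` predicate is injectable only so the logic can run outside a browser:

```typescript
// Hypothetical helper: return the first MIME type the browser says it can record.
// In the browser, pass (m) => MediaRecorder.isTypeSupported(m) as the predicate.
function pickSupportedMimeType(
  candidates: string[],
  isSupported: (mime: string) => boolean
): string | undefined {
  return candidates.find((mime) => isSupported(mime));
}

// Sketch of how the component could use it (browser only):
// const mimeType = pickSupportedMimeType(
//   ['audio/wav', 'audio/webm;codecs=opus', 'audio/webm'],
//   (m) => MediaRecorder.isTypeSupported(m)
// );
// const recorder = new MediaRecorder(mediaStream, mimeType ? { mimeType } : undefined);
```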
And this is the API code:
import OpenAI from "openai";
import fs from "fs";

const openai = new OpenAI({
  apiKey: "MY API KEY",
  dangerouslyAllowBrowser: true
});

const transcribe = async (audio: Blob) => {
  /* Function that will use OpenAI's Whisper API to transcribe audio. */
  const file = new File([audio], "audio.wav", { type: 'audio/wav' });
  const response = new Response(audio);
  console.log(response);
  const transcription = await openai.audio.transcriptions.create({
    file: response,
    model: "whisper-1",
  }).then((res) => {
    console.log(res);
    return res;
  }).catch((err) => { console.error(err); });
  // const audioBlobWithType = new Blob([audio], { type: 'audio/wav' });
  // const formData = new FormData();
  // formData.append("file", audioBlobWithType, "audio.wav");
  // formData.append("model", "whisper-1");
  // const transcription = await fetch("https://api.openai.com/v1/audio/transcriptions", {
  //   method: "POST",
  //   headers: {
  //     "Authorization": "Bearer MY API KEY",
  //   },
  //   body: formData,
  // });
  // console.log(await transcription.json());
};

export default transcribe;
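One thing I wondered about is the filename: the transcription endpoint seems to rely partly on the file extension, so naming the upload `audio.wav` when the blob's real container is something else might matter. This is a sketch of deriving the extension from the blob's actual `type`; the helper name and mapping table are mine, just for illustration:

```typescript
// Hypothetical helper: derive a filename extension from a Blob's MIME type,
// so the uploaded name matches what the recorder actually produced.
function extensionForMime(mime: string): string {
  const base = mime.split(";")[0].trim().toLowerCase(); // drop "codecs=..." parameters
  const map: Record<string, string> = {
    "audio/wav": "wav",
    "audio/webm": "webm",
    "audio/ogg": "ogg",
    "audio/mp4": "m4a",
    "audio/mpeg": "mp3",
  };
  return map[base] ?? "webm"; // fall back to webm, Chrome's usual recording container
}

// Sketch of building the upload with a matching name:
// const name = `audio.${extensionForMime(audio.type || "audio/webm")}`;
// const file = new File([audio], name, { type: audio.type });
```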
In the transcribe function, I am currently trying to use the OpenAI package to get the transcription. However, as mentioned, it says that the file format is not supported. I have also left in (commented out) the code that called the API directly with `fetch` instead of the library; that too resulted in a "file format is not supported" error.
I would appreciate any help in this matter. I have been stuck on this issue for more than a week now. If there is something wrong with the blob itself, for example if I am sending it at the wrong time, I would like to know that as well. Thank you!