I am trying to use Whisper to transcribe audio recorded in the browser with MediaRecorder, but I cannot produce a file format that Whisper accepts. Whisper worked on a downloaded MP3 file, which suggests the problem lies in the audio I record.
I am recording the audio with the following JS code:
let mediaRecorder;
let recordedChunks = [];

async function startRecording() {
  let stream;
  try {
    stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  } catch (error) {
    console.error("Error accessing microphone:", error);
    return; // no stream, so there is nothing to record
  }

  const options = {
    audioBitsPerSecond: 64000,
    mimeType: 'audio/webm'
  };
  mediaRecorder = new MediaRecorder(stream, options);

  // Collect the encoded chunks as the recorder produces them.
  mediaRecorder.addEventListener("dataavailable", (event) => {
    if (event.data.size > 0) {
      recordedChunks.push(event.data);
    }
  });

  // On stop, merge the chunks into one Blob and send its bytes to the server.
  mediaRecorder.addEventListener("stop", () => {
    const audioBlob = new Blob(recordedChunks, { type: options.mimeType });
    recordedChunks = [];
    const reader = new FileReader();
    reader.onload = (event) => {
      socket.emit("audio", event.target.result);
    };
    reader.readAsArrayBuffer(audioBlob);
  });

  mediaRecorder.start();
}
After receiving the audio on the server, I try to convert it into a format Whisper accepts. I tried a NumPy array, a PyTorch tensor, a waveform, and an MP3 file, all of which Whisper should accept, but whisper.transcribe() failed with every one of them. One example, converting the WebM file to a waveform:
import torch
import torchaudio

waveform, sample_rate = torchaudio.load("incoming_audio.webm")
# Resample to 16 kHz and downmix to mono if the file is not already in that form.
if sample_rate != 16000 or waveform.shape[0] != 1:
    transform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000, dtype=torch.float32)
    waveform = transform(waveform)
    waveform = waveform.mean(dim=0, keepdim=True)
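For comparison, as far as I understand it, Whisper's own loader (whisper.load_audio) has ffmpeg decode the file to 16 kHz mono signed 16-bit PCM and then scales the samples to floats in [-1, 1). The sketch below shows only that scaling step in plain Python. pcm16le_to_floats is a hypothetical helper name, and it assumes the bytes are already raw little-endian 16-bit PCM, which a WebM payload is not.

```python
import struct


def pcm16le_to_floats(pcm: bytes) -> list:
    """Scale raw little-endian signed 16-bit PCM samples to floats in [-1, 1).

    Hypothetical helper: the input must already be decoded PCM,
    not a container format such as WebM or MP3.
    """
    if len(pcm) % 2 != 0:
        raise ValueError("16-bit PCM data must contain an even number of bytes")
    # Each sample is one signed 16-bit integer; dividing by 32768 maps it to [-1, 1).
    return [sample / 32768.0 for (sample,) in struct.iter_unpack("<h", pcm)]
```

With NumPy available, the resulting list could be wrapped in a float32 array before being passed on.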
None of my converted formats worked in Whisper's transcribe method: I get either “TypeError: expected np.ndarray (got bytes)” or “Segmentation fault (core dumped)”. This made me suspect that the original recording is flawed, so I saved a recorded file and played it back in a media player, which worked, and printed the file contents, which did not look corrupted.
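One more check that could be run on the server is whether the bytes arriving over the socket still start with the WebM container header, and whether passing Whisper a file path rather than the raw bytes changes anything. The “expected np.ndarray (got bytes)” message is what torch.from_numpy raises when handed a bytes object, so it may simply mean the raw socket payload reached transcribe() directly, though that is a guess. The sketch below is a minimal version under those assumptions. looks_like_webm and save_payload are hypothetical helper names, and the handler usage in the trailing comment is illustrative only.

```python
import os
import tempfile

# First four bytes of any EBML document; WebM and Matroska files start with them.
EBML_MAGIC = bytes([0x1A, 0x45, 0xDF, 0xA3])


def looks_like_webm(payload: bytes) -> bool:
    """Return True if the payload starts with the EBML header used by WebM."""
    return bytes(payload[:4]) == EBML_MAGIC


def save_payload(payload: bytes, suffix: str = ".webm") -> str:
    """Write the received bytes to a temporary file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    if not looks_like_webm(payload):
        raise ValueError("payload does not start with an EBML/WebM header")
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(payload)
    return path


# Hypothetical use inside the socket handler (not executed here):
#   path = save_payload(data)
#   result = model.transcribe(path)   # model = whisper.load_model("base")
```

Given a path, transcribe() decodes the file through ffmpeg itself, so no manual conversion to an array or tensor is needed in that case.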
I would appreciate any guidance on what I am doing wrong; I am stuck!