Whisper transcription of MediaRecorder audio errors – Blancer.com Tutorials and projects

I am trying to use Whisper to transcribe audio recorded in the browser with MediaRecorder, but I cannot produce a file format that Whisper accepts. Whisper transcribed a downloaded MP3 file without problems, which suggests that the mistake lies in the audio I record.

I am recording the audio with the following JS code:

    // `socket` is created elsewhere (assumed: a Socket.IO client connection)
    let mediaRecorder;
    let recordedChunks = [];

    async function startRecording() {
        let stream;
        try {
            stream = await navigator.mediaDevices.getUserMedia({ audio: true });
        } catch (error) {
            console.error("Error accessing microphone:", error);
            return; // without a stream, new MediaRecorder() below would throw
        }
        const options = {
            audioBitsPerSecond: 64000,
            mimeType: "audio/webm", // WebM container, typically with Opus audio
        };
        mediaRecorder = new MediaRecorder(stream, options);
        mediaRecorder.addEventListener("dataavailable", (event) => {
            if (event.data.size > 0) {
                recordedChunks.push(event.data);
            }
        });
        mediaRecorder.addEventListener("stop", () => {
            // Concatenate all chunks into one complete WebM file
            const audioBlob = new Blob(recordedChunks, { type: options.mimeType });
            recordedChunks = [];
            const reader = new FileReader();
            reader.onload = (event) => {
                // Sends the raw bytes of the WebM file as a binary ArrayBuffer
                socket.emit("audio", event.target.result);
            };
            reader.readAsArrayBuffer(audioBlob);
        });
        mediaRecorder.start();
    }
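A minimal sketch of the receiving side, assuming a python-socketio server whose handler gets the emitted ArrayBuffer as a Python `bytes` object. `whisper.transcribe()` accepts a file path, a NumPy array or a torch tensor, but not raw bytes: in current openai-whisper versions, non-tensor, non-path input is handed to `torch.from_numpy()`, which raises exactly “TypeError: expected np.ndarray (got bytes)”. Writing the bytes to a `.webm` file and passing its path lets Whisper decode the recording with ffmpeg (which must be installed). The helper name `save_audio_bytes` is hypothetical.

```python
import tempfile


def save_audio_bytes(data: bytes, suffix: str = ".webm") -> str:
    """Write raw audio bytes received from the browser to a temp file and return its path.

    Whisper's transcribe() takes a path, a NumPy array or a torch tensor, not bytes,
    so the recording is stored on disk and Whisper decodes it via ffmpeg.
    """
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError(f"expected bytes, got {type(data).__name__}")
    # delete=False keeps the file after close() so ffmpeg can reopen it by name
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as f:
        f.write(data)
        return f.name
```

In a handler, the returned path would go to `model.transcribe(path)`, with the text read from `result["text"]`; the temporary file should be deleted afterwards (for example with `os.remove`).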

When the server receives the audio, I convert it into a format Whisper should accept. I tried a NumPy array, a PyTorch tensor, a waveform and an MP3 file (all accepted by Whisper), but passing any of them to whisper.transcribe() failed. Below is one attempt, converting the WebM file to a waveform:

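    import torch
    import torchaudio

    waveform, sample_rate = torchaudio.load("incoming_audio.webm")  # WebM decoding needs the FFmpeg backend
    if waveform.shape[0] != 1:
        waveform = waveform.mean(dim=0, keepdim=True)  # down-mix to mono
    if sample_rate != 16000:
        transform = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000, dtype=torch.float32)
        waveform = transform(waveform)
    audio = waveform.squeeze(0)  # Whisper expects a 1-D float32 signal at 16 kHz, not shape (1, N)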
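An alternative to `torchaudio.load` is to decode the WebM file with the ffmpeg command-line tool, which, as far as I know, mirrors what openai-whisper's own `whisper.audio.load_audio` does when given a path: it asks ffmpeg for raw 16-bit little-endian mono PCM at 16 kHz and scales the samples to floats in [-1, 1). The sketch below assumes `ffmpeg` is on `PATH` and uses only the standard library; the function names are hypothetical. With NumPy, the conversion step is usually written as `np.frombuffer(pcm, np.int16).astype(np.float32) / 32768.0`.

```python
import subprocess
import sys
from array import array

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio


def ffmpeg_decode_cmd(path: str, sample_rate: int = SAMPLE_RATE) -> list[str]:
    """Build an ffmpeg command that decodes any input file to raw mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le",            # raw little-endian 16-bit samples, no container
        "-ac", "1",               # down-mix to mono
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),  # resample to the target rate
        "-",                      # write to stdout
    ]


def pcm16le_to_float(pcm: bytes) -> list[float]:
    """Convert s16le PCM bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(pcm[: len(pcm) - len(pcm) % 2])  # ignore a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()  # s16le is little-endian; fix up on big-endian hosts
    return [s / 32768.0 for s in samples]


def decode_audio(path: str) -> list[float]:
    """Run ffmpeg and return the decoded 16 kHz mono samples as floats."""
    result = subprocess.run(ffmpeg_decode_cmd(path), capture_output=True, check=True)
    return pcm16le_to_float(result.stdout)
```

Converted with `np.asarray(decode_audio("incoming_audio.webm"), dtype=np.float32)`, the result is a 1-D float32 array that `model.transcribe()` accepts.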

None of these conversions worked with Whisper's transcribe method: I get either “TypeError: expected np.ndarray (got bytes)” or “Segmentation fault (core dumped)”. This makes me think my original recording is flawed. To check, I saved a recorded file and played it back in a media player, which worked, and printed its contents, which did not look corrupted.
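The “printed the file” check can be made scriptable. A WebM file is a Matroska/EBML container that begins with the four bytes `1A 45 DF A3`, and its EBML header declares the DocType `webm`. The sketch below looks for both near the start of a buffer; it is a heuristic sanity check under that assumption, not a full EBML parser, and `looks_like_webm` is a hypothetical name. Passing it only rules out data that is not WebM at all (for example an empty or mislabeled upload).

```python
EBML_MAGIC = b"\x1a\x45\xdf\xa3"  # EBML header ID that starts every Matroska/WebM file
DOCTYPE_ID = b"\x42\x82"          # EBML DocType element ID


def looks_like_webm(data: bytes) -> bool:
    """Heuristic: does the buffer start with an EBML header that mentions DocType 'webm'?"""
    if not data.startswith(EBML_MAGIC):
        return False
    # The EBML header is short, so searching the first 64 bytes is enough for a
    # rough check; this does not parse element sizes.
    header = data[:64]
    return DOCTYPE_ID in header and b"webm" in header
```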

I would appreciate any guidance on what I am doing wrong; I am stuck!