Azure TTS audio is distorted – Blancer.com Tutorials and projects

In JavaScript, I’m streaming audio generated by the SpeechSynthesizer from “microsoft-cognitiveservices-speech-sdk” in the Audio16Khz32KBitRateMonoMp3 format through Express to a React frontend app. The first couple of sentences sound fine, but after that the speech becomes badly distorted.

Here is the code that sends the audio:

    synthesizer.synthesizing = function (s, e) {
      currentAudioChunk = {
        // e.result.audioData holds the latest chunk of encoded MP3 audio
        audio: Buffer.from(e.result.audioData),
        // audioDuration is reported in 100-ns ticks; 10,000 ticks = 1 ms
        offset: e.result.audioDuration / 10000,
      };

      sendEvent("audioData", {
        audio: currentAudioChunk.audio.toString("base64"),

        // This audioOffset data is null, and I'm sending it as a placeholder for now
        audioOffset: "0",
      });

      currentAudioChunk = null;
    };
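The frontend code isn’t shown, so one thing worth ruling out there is decoding or playing each audioData chunk as if it were a complete MP3 file. Each synthesizing event carries the next slice of one continuous MP3 stream, and the slices may not line up with MP3 frame boundaries. A minimal sketch of a client-side buffer that assumes the chunks arrive in order and joins their bytes before anything is decoded (Mp3ChunkBuffer is a hypothetical name, not part of the Speech SDK):

```javascript
// Hypothetical client-side helper: collects the base64 "audioData" chunks in
// arrival order and joins their raw bytes into one contiguous array.
class Mp3ChunkBuffer {
  constructor() {
    this.chunks = [];
    this.totalLength = 0;
  }

  // Decode one base64 chunk (atob is available in browsers and Node 16+).
  push(base64Chunk) {
    const binary = atob(base64Chunk);
    const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
    this.chunks.push(bytes);
    this.totalLength += bytes.length;
  }

  // Concatenate everything received so far without decoding it as audio.
  toBytes() {
    const out = new Uint8Array(this.totalLength);
    let offset = 0;
    for (const chunk of this.chunks) {
      out.set(chunk, offset);
      offset += chunk.length;
    }
    return out;
  }
}
```

In the browser, the joined bytes could then be played via new Blob([buffer.toBytes()], { type: "audio/mpeg" }) and URL.createObjectURL. Waiting for the whole stream adds latency; to play while chunks are still arriving, the same bytes could instead be appended to a MediaSource SourceBuffer, where MediaSource.isTypeSupported("audio/mpeg") reports support.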

When the audio is good, the string sent looks like this:

“//NIxCElU/5IAY+IAYJ/2sMoSxHk0RH9BVzA0GTFmDQIAQ4Xv8iBn5oEqAUwtPDlBnyDh9f+ukXFoLdRAhSgoYRmQwQAIIJ0DVH/+eez/FwFkR+DcgYghGI/ImQQ0kU/////LRuTiDFw8fLRmk9Bv////v/+hTTN0Cuzk4TZ8qGCFf6UeMhhgB8BjVMd/t5V//NIxA4g2tqkAc9YAD7MqNv/AVisORkvL3sJA7Fje5e+5N7r/ZNxXO9991vl98n9A+bqGiY3myJPJAOEDjHGJfRMWtAsfGoO9ejd0GhoOy32zfEvfHvZL4vfvTvfV5xO2TeymMriv6//j+v5/j2MNzc+w4oEElx7v6u/2t/UcW0Tg+oDGEGSAs30kiaCvEsa//NIxA0f6tbFtGsQsB/wSwiDZ+tZaEobTf5rXCqf/XqycoAzWXtvSKlDpriVZX/mmfM+YFhrAo1ookTMDgiGJRKHii37Q6Eh+fFkyMUXF4c2/fTr+0rpvsyXQevBDueX8PPz/PPX88VXXxySFgZQJQqJRW71f/Ss+4XRpRipAO1XGSKjcylABX/VcJmGEBBG”

but once the distortion starts, the chunks contain long runs of repeated characters that appear to be junk:

“//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV//NIxHwAAANIAAAAAFVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV”

The repetition is already present in the raw data received from Azure; it’s not an artefact of the conversion to a string.
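The repeated chunk may be less broken than it looks. A minimal sketch, assuming the standard MPEG audio frame header layout, that decodes the first four bytes of a base64 chunk (inspectMp3Header and its trimmed lookup tables are illustrative, not from any library):

```javascript
// Hypothetical diagnostic: decode the 4-byte MPEG audio frame header at the
// start of a base64 chunk. Only the MPEG-2 Layer III tables are included.
function inspectMp3Header(base64Chunk) {
  const bytes = Buffer.from(base64Chunk, "base64");

  // Frame sync: 11 set bits (0xFF, then the top 3 bits of the next byte).
  if (bytes.length < 4 || bytes[0] !== 0xff || (bytes[1] & 0xe0) !== 0xe0) {
    return { isMpeg2Layer3: false };
  }

  const versionBits = (bytes[1] >> 3) & 0x03; // 0b10 = MPEG-2
  const layerBits = (bytes[1] >> 1) & 0x03; // 0b01 = Layer III
  const bitrateIndex = (bytes[2] >> 4) & 0x0f;
  const sampleRateIndex = (bytes[2] >> 2) & 0x03;
  const channelMode = (bytes[3] >> 6) & 0x03; // 0b11 = single channel (mono)

  const isMpeg2Layer3 = versionBits === 0b10 && layerBits === 0b01;

  // MPEG-2 Layer III bitrates in kbps; index 0 ("free") and 15 ("bad") map to 0.
  const bitratesKbps = [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160, 0];
  // MPEG-2 sample rates; index 3 is reserved.
  const sampleRatesHz = [22050, 24000, 16000];

  return {
    isMpeg2Layer3,
    bitrateKbps: isMpeg2Layer3 ? bitratesKbps[bitrateIndex] : undefined,
    sampleRateHz: isMpeg2Layer3 ? sampleRatesHz[sampleRateIndex] : undefined,
    mono: channelMode === 0b11,
    // LAME-encoded frames may carry the encoder's name as plain text.
    containsLameTag: bytes.includes("LAME"),
  };
}
```

Applied to the first 40 characters of the repeated chunk above ("//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBV"), it reports an MPEG-2 Layer III header at 32 kbps, 16 kHz, mono, which matches Audio16Khz32KBitRateMonoMp3, and the decoded bytes contain the text LAME3.100 followed by 0x55 fill bytes (each run of four Vs in base64 is three 0x55 bytes). That points to encoder-generated padding frames, plausibly silence, rather than random corruption, though on its own it doesn’t explain the distortion.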

How can I get clean audio from Azure TTS?

I tried stripping out the Vs, but that just corrupted the data entirely.
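Removing characters from the base64 text can’t work as a cleanup step: base64 packs every 3 bytes into exactly 4 characters, so deleting characters either shifts the bits of everything after the cut or drops bytes the MP3 frames need. A short demonstration on a 40-character prefix of the repeated chunk (stripAndDecode is a hypothetical helper):

```javascript
// Base64 maps each 3-byte group to exactly 4 characters. Deleting characters
// re-groups everything after the cut (or drops whole bytes if a multiple of
// 4 is removed), so the decoded bytes after that point no longer match.
function stripAndDecode(base64Text, charToStrip) {
  const stripped = base64Text.split(charToStrip).join("");
  return Buffer.from(stripped, "base64");
}

const prefix = "//NIxHwAAANIAAAAAFVVVVVVVVVMQU1FMy4xMDBV";
const before = Buffer.from(prefix, "base64");
const after = stripAndDecode(prefix, "V");

console.log(before.includes("LAME")); // true
console.log(after.includes("LAME")); // false
```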