I have a website that sends dynamically generated text to the server, and the server returns a voiceover of that text to the site. The audio is transmitted in chunks, each 0.33 seconds long. The total size of the voice file is unknown in advance, because it is generated by a third-party API at the time of the request. The chunks arrive through the ReadableStream interface, each one as a Uint8Array. The overall goal is to start playing the voiceover while the chunks are still being uploaded to the site.
I read the ReadableStream with a reader and play the chunks as they arrive. But I ran into a problem: the first chunk plays its part of the text (0.33 seconds), but the next ones throw an error: Uncaught (in promise) EncodingError: Failed to execute 'decodeAudioData' on 'BaseAudioContext': Unable to decode audio data.
This is my code:
const response = await fetch(voiceAPI, requestOptions);
const reader = response.body.getReader(); // getReader() is synchronous, no await needed

const readChunk = () => {
  reader.read().then(({ value, done }) => {
    if (value) {
      let audioContext = new AudioContext();
      let audioSourceNode = audioContext.createBufferSource();
      audioSourceNode.connect(audioContext.destination);

      new Promise((resolve) => {
        let decodedBuffer = audioContext.decodeAudioData(value.buffer);
        resolve(decodedBuffer);
      })
        .then((decodedBuffer) => {
          audioSourceNode.buffer = decodedBuffer;
          audioSourceNode.start(0);
          audioSourceNode.addEventListener('ended', () => {
            audioSourceNode.buffer = null;
            readChunk(); // read the next chunk once this one finishes playing
          });
        });
    } else {
      console.log('Stream is completed');
    }
  });
};

readChunk();
The problem is somehow related to the decodeAudioData method: the error occurs on the line that calls it, and only from the second call onward. The Web API documentation says that "This method only works on complete file data, not fragments of audio file data." So I tried creating a new audioContext with a new audioSourceNode on every chunk (as in the code above), but that did not work.
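Presumably the first chunk only decodes because it happens to begin with the file header, while the later chunks are headerless fragments of the same file. To check whether that is what's happening, something like this can log the first bytes of each chunk (a debugging sketch assuming the stream is MP3; the header bytes would differ for another format):

// Log the first bytes of a chunk to see whether it starts with a
// recognizable file header (assumes the stream is MP3).
const inspectChunk = (chunk, index) => {
  const head = Array.from(chunk.slice(0, 4))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join(' ');
  const hasId3 = chunk[0] === 0x49 && chunk[1] === 0x44 && chunk[2] === 0x33; // "ID3" tag
  const hasFrameSync = chunk[0] === 0xff && (chunk[1] & 0xe0) === 0xe0; // MPEG frame sync
  console.log(`chunk ${index}: ${head} (ID3: ${hasId3}, frame sync: ${hasFrameSync})`);
};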
I also tried making one global audioContext instead of creating a new one on every iteration of readChunk():
let audioContext = new AudioContext();
let audioSourceNode;

const readChunk = () => {
  reader.read().then(({ value, done }) => {
    if (value) {
      audioSourceNode = audioContext.createBufferSource();
      audioSourceNode.connect(audioContext.destination);
      // also tried "let audioSourceNode = audioContext.createBufferSource();"
      new Promise((resolve) => { /* ...same decodeAudioData code as above... */ })
I also tried collecting all chunks into one big array and then playing it with a single call to decodeAudioData, and that works. But with this approach, if the text contains 1000+ characters, about 8 seconds pass between clicking the voice-over button and hearing anything, so it's not an option.
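For reference, this is roughly what the working-but-slow version looks like (a sketch from memory, not my exact code):

// Accumulate every chunk, then decode and play the whole file at once.
// Works, but playback only starts after the entire stream has arrived.
const chunks = [];
let totalLength = 0;
while (true) {
  const { value, done } = await reader.read();
  if (done) break;
  chunks.push(value);
  totalLength += value.length;
}

// Concatenate the chunks into one contiguous Uint8Array.
const fullFile = new Uint8Array(totalLength);
let offset = 0;
for (const chunk of chunks) {
  fullFile.set(chunk, offset);
  offset += chunk.length;
}

// Decode and play the complete file in one go.
const audioContext = new AudioContext();
const decodedBuffer = await audioContext.decodeAudioData(fullFile.buffer);
const sourceNode = audioContext.createBufferSource();
sourceNode.buffer = decodedBuffer;
sourceNode.connect(audioContext.destination);
sourceNode.start(0);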
I'd be glad of any help or ideas. I'm definitely missing something, but I can't figure out what.