Audio offset becomes wrong after some time when streaming audio

I use microsoft-cognitiveservices-speech-sdk (1.38.0) for real-time speech to text.
The offset seems correct when I send the full audio in one piece, but it is wrong when I send the same audio split into many chunks.

The more audio chunks there are, the more inaccurate the offset becomes:

  • No chunks : 1 726 300 000
  • 369 chunks of 0.5 seconds : 1 729 600 000
  • 923 chunks of 0.2 seconds : 1 744 600 000
  • 1443 chunks of 0.1 seconds : 1 757 900 000
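To make the drift easier to compare, here is a minimal sketch that converts the offsets above into seconds and into drift per chunk. It assumes the values are in ticks of 100 ns (the unit the Speech SDK documents for result offsets); the names `TICKS_PER_SECOND`, `runs` and `driftReport` are only illustrative.

```javascript
// Assumption: offsets are expressed in ticks of 100 ns (10,000,000 per second).
const TICKS_PER_SECOND = 10000000;

// Offset reported when the whole file is written at once (no chunking).
const baselineTicks = 1726300000;

// Offsets reported for the chunked runs listed above.
const runs = [
  { chunks: 369, chunkSeconds: 0.5, offsetTicks: 1729600000 },
  { chunks: 923, chunkSeconds: 0.2, offsetTicks: 1744600000 },
  { chunks: 1443, chunkSeconds: 0.1, offsetTicks: 1757900000 },
];

// Drift relative to the unchunked run, in total and per chunk written.
function driftReport(run) {
  const driftSeconds = (run.offsetTicks - baselineTicks) / TICKS_PER_SECOND;
  return {
    chunks: run.chunks,
    driftSeconds,
    driftMsPerChunk: (driftSeconds * 1000) / run.chunks,
  };
}

const reports = runs.map(driftReport);
for (const r of reports) {
  console.log(
    `${r.chunks} chunks: +${r.driftSeconds.toFixed(2)} s total, ` +
    `${r.driftMsPerChunk.toFixed(2)} ms per chunk`
  );
}
```

Under that assumption the drift is about 0.33 s, 1.83 s and 3.16 s respectively, so it grows with the number of chunks written rather than with the audio duration.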

Here is some code to reproduce the issue:

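    const fs = require('fs');
    const { SpeechConfig, AudioInputStream, AudioConfig, SpeechRecognizer } = require('microsoft-cognitiveservices-speech-sdk');

    const speechConfig = SpeechConfig.fromSubscription(<KEY>, <REGION>);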

    const pushStream = AudioInputStream.createPushStream();
    const audioConfig = AudioConfig.fromStreamInput(pushStream);
    const speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);

    speechRecognizer.recognized = async (recognizer, event) => {console.log(event)}
    speechRecognizer.canceled = async (recognizer, event) => {console.log(event)}
    speechRecognizer.startContinuousRecognitionAsync();

    for (let i = 1; i <= 1443; i++) {
      const formattedNumber = i.toString().padStart(4, '0');
      const buffer = fs.readFileSync(`/var/tmp/chunks/output_${formattedNumber}.wav`);
      pushStream.write(buffer);
    }

To create the audio chunks:

    ffmpeg -i <INPUT_FILE> -f segment -segment_time 0.1 -c copy output_%04d.wav
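One thing that may be worth ruling out (an untested assumption, not a confirmed cause): each segment written by ffmpeg is a complete WAV file with its own RIFF header, and the reproduction code writes the whole file, header included, into the push stream. If the stream is consumed as raw PCM, those header bytes would be counted as audio once per chunk. The sketch below uses two hypothetical helpers, `extractPcmData` and `toArrayBuffer`, which are not part of the Speech SDK, to keep only the bytes of the `data` chunk.

```javascript
// Hypothetical helper (not part of the Speech SDK): returns only the bytes of
// the "data" chunk of a RIFF/WAVE buffer, skipping the header and any other
// chunks such as "fmt " or "LIST".
function extractPcmData(wavBuffer) {
  if (wavBuffer.toString('ascii', 0, 4) !== 'RIFF' ||
      wavBuffer.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('Not a RIFF/WAVE buffer');
  }
  let pos = 12; // first sub-chunk starts right after "RIFF<size>WAVE"
  while (pos + 8 <= wavBuffer.length) {
    const id = wavBuffer.toString('ascii', pos, pos + 4);
    const size = wavBuffer.readUInt32LE(pos + 4);
    const start = pos + 8;
    if (id === 'data') {
      // Clamp in case the declared size is larger than the actual file.
      const end = Math.min(start + size, wavBuffer.length);
      return wavBuffer.subarray(start, end);
    }
    pos = start + size + (size % 2); // sub-chunks are padded to an even length
  }
  throw new Error('No "data" chunk found');
}

// Hypothetical helper: copies a Node Buffer view into a standalone ArrayBuffer.
function toArrayBuffer(buf) {
  return buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength);
}
```

In the reproduction loop, the write would then become `pushStream.write(toArrayBuffer(extractPcmData(buffer)))`. If the offsets of the chunked runs then match the unchunked run, the per-file headers were the source of the drift; if not, this can be excluded.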

Here is the audio file: https://drive.google.com/file/d/1H_RJuqMiBaVkpo9XHrgp1bpuFdgQl64O/view?usp=sharing

Thanks for your help