How to Integrate Google Cloud Speech-to-Text with Pusher in a Laravel Application?

I’m working on a Laravel 11 application where I need to stream audio from the frontend to Google Cloud Speech-to-Text and then broadcast the transcriptions using Pusher.

Frontend Code:

let mediaRecorder;
let audioChunks = [];

document.getElementById('record').addEventListener('click', async () => {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    mediaRecorder = new MediaRecorder(stream);

    mediaRecorder.ondataavailable = event => {
        audioChunks.push(event.data);
        sendData(event.data); // Sending data to the server
    };

    mediaRecorder.start(250); // Sending chunks every 250ms
});

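document.getElementById('stop').addEventListener('click', () => {
    mediaRecorder.stop();
    // Release the microphone too; stopping the recorder alone leaves the tracks live.
    mediaRecorder.stream.getTracks().forEach(track => track.stop());
});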

function sendData(audioData) {
    fetch('/stream-audio', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/octet-stream',
            'X-CSRF-TOKEN': '{{ csrf_token() }}',
        },
        body: audioData
    }).catch(error => console.error('Failed to send audio chunk:', error));
}

// Pusher integration
Pusher.logToConsole = true;
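// Read the key through config(): env() returns null outside config files once the configuration is cached.
var pusher = new Pusher('{{ config('broadcasting.connections.pusher.key') }}', { cluster: 'eu' });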
var channel = pusher.subscribe('speech-transcript-created-channel');
var pusherTextArea = document.getElementById('pusherTextArea');
channel.bind('speech-transcript-created', function(data) {
    pusherTextArea.value += JSON.stringify(data) + '\n';
});
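
With `mediaRecorder.start(250)`, the browser emits consecutive pieces of one WebM file. It does not emit standalone 250 ms files. In current Chrome and Firefox, typically only the first chunk carries the EBML header (magic bytes `1A 45 DF A3`) that a `WEBM_OPUS` decoder needs. The chunks are therefore only meaningful if they reach the same recognition stream, in order. The sketch below checks a chunk for that header. The helper names `startsWithEbmlHeader` and `describeChunk` are hypothetical. Logging the result per chunk in the browser is one way to confirm the behavior.

```javascript
// WebM files begin with the EBML magic number 0x1A45DFA3.
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// Returns true if the byte array starts with the EBML header.
function startsWithEbmlHeader(bytes) {
  if (bytes.length < EBML_MAGIC.length) return false;
  return EBML_MAGIC.every((value, i) => bytes[i] === value);
}

// Browser-side helper: read the first 4 bytes of a Blob from ondataavailable
// and report whether it begins a new WebM file or continues the previous one.
async function describeChunk(blob, index) {
  const head = new Uint8Array(await blob.slice(0, 4).arrayBuffer());
  const kind = startsWithEbmlHeader(head) ? 'starts with EBML header' : 'continuation data';
  console.log(`chunk ${index}: ${kind}`);
}
```

Calling `describeChunk(event.data, i)` inside `ondataavailable`, with a running counter `i`, should print the header only for chunk 0 if the browser behaves as described.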

Backend Code (SpeechController.php):

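<?php

namespace App\Http\Controllers;

use App\Events\SpeechTranscriptCreated; // the broadcast event (namespace assumed)
use Google\Cloud\Speech\V1\RecognitionConfig;
use Google\Cloud\Speech\V1\RecognitionConfig\AudioEncoding;
use Google\Cloud\Speech\V1\SpeechClient;
use Google\Cloud\Speech\V1\StreamingRecognitionConfig;
use Google\Cloud\Speech\V1\StreamingRecognizeRequest;
use Illuminate\Http\Request;

class SpeechController extends Controller
{
    protected $speechClient;
    protected $stream;

    // Laravel creates a new controller instance per HTTP request, so this
    // constructor (and a new streaming session) runs on every POST /stream-audio.
    public function __construct()
    {
        $this->speechClient = new SpeechClient([
            'transport' => 'grpc', // bidirectional streaming requires gRPC
            'credentials' => base_path(env('GOOGLE_APPLICATION_CREDENTIALS')),
        ]);

        $config = new StreamingRecognitionConfig([
            'config' => new RecognitionConfig([
                'encoding' => AudioEncoding::WEBM_OPUS,
                'sample_rate_hertz' => 48000,
                'language_code' => 'sk-SK',
            ]),
        ]);

        // The first message on a stream must carry the config, not audio.
        $this->stream = $this->speechClient->streamingRecognize();
        $this->stream->write(new StreamingRecognizeRequest(['streaming_config' => $config]));
    }

    public function streamAudio(Request $request)
    {
        // The request body is a single MediaRecorder chunk (~250 ms of audio).
        $audioContent = $request->getContent();
        $this->stream->write(new StreamingRecognizeRequest(['audio_content' => $audioContent]));

        // read() blocks until the server sends a response or ends the stream.
        while ($response = $this->stream->read()) {
            foreach ($response->getResults() as $result) {
                $transcript = $result->getAlternatives()[0]->getTranscript();
                event(new SpeechTranscriptCreated($transcript));
            }
        }

        return response()->json(['message' => 'Audio streamed successfully']);
    }
}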
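
Laravel resolves a fresh `SpeechController` for every HTTP request, and under PHP-FPM no object state survives between requests. As a result, each `POST /stream-audio` opens its own streaming session, writes the config plus one 250 ms chunk, and then blocks in `read()`. No more audio ever arrives on that session, which is consistent with the "Long duration elapsed without audio" error. One common approach is to keep a single stream per recording inside a long-running process, for example a Node.js or ReactPHP relay. The sketch below shows that bookkeeping in JavaScript. It assumes an injected `openStream` factory, which is hypothetical. The factory would send the config first and return a writable with `write()` and `end()`, as Node.js duplex streams provide.

```javascript
// Keeps ONE recognition stream per recording session instead of one per chunk.
// `openStream(sessionId)` is an assumed factory returning { write(chunk), end() }.
class RecognitionSessions {
  constructor(openStream) {
    this.openStream = openStream;
    this.sessions = new Map(); // sessionId -> open stream
  }

  // Forward a chunk to the session's existing stream, opening it on first use.
  write(sessionId, chunk) {
    let stream = this.sessions.get(sessionId);
    if (!stream) {
      stream = this.openStream(sessionId);
      this.sessions.set(sessionId, stream);
    }
    stream.write(chunk);
  }

  // Half-close the stream when recording stops, so the service can finish
  // and return final results instead of waiting for more audio.
  end(sessionId) {
    const stream = this.sessions.get(sessionId);
    if (stream) {
      stream.end();
      this.sessions.delete(sessionId);
    }
  }

  get size() {
    return this.sessions.size;
  }
}
```

The registry lives in memory. It therefore only works inside a single long-lived process, and a regular Laravel controller cannot hold it between requests.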

Issue:

I’m using Pusher to broadcast the transcriptions back to the frontend, but the streaming request to Google Cloud Speech-to-Text fails with the following error:

{
  "message": "Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time.",
  "code": 11,
  "status": "OUT_OF_RANGE",
  "details": [ { "@type": "pc-high-bwd-bin", "data": "<Unknown Binary Data>" } ]
}
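
The message indicates that the service stopped receiving audio on an open stream. Besides keeping audio flowing, a relay can track the gap since the last chunk and half-close the stream itself, for example when the user stops recording or the browser connection drops. The service then does not have to time out. The sketch below takes injected timestamps so it can be tested without timers. The `AudioGapWatchdog` name and the `maxGapMs` value are assumptions, not a documented Google limit.

```javascript
// Tracks the time since the last audio chunk on a recognition stream.
class AudioGapWatchdog {
  constructor(maxGapMs) {
    this.maxGapMs = maxGapMs; // assumed threshold, tune for your setup
    this.lastChunkAt = null;  // null until the first chunk arrives
  }

  // Record that a chunk was written at time nowMs.
  noteChunk(nowMs) {
    this.lastChunkAt = nowMs;
  }

  // True once audio has started and then paused for longer than maxGapMs.
  isStalled(nowMs) {
    return this.lastChunkAt !== null && nowMs - this.lastChunkAt > this.maxGapMs;
  }
}
```

In a relay, a `setInterval` callback could call `isStalled(Date.now())` and end the recognition stream cleanly when it returns `true`.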

I suspect that using WebSockets might be necessary to maintain a real-time connection with Google Cloud Speech-to-Text, but I’m unsure how to implement this properly in Laravel. How can I integrate Google Cloud Speech-to-Text with Pusher effectively, or how can I resolve this timeout error?
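
WebSockets do fit this problem, but on the browser-to-server leg. One persistent connection per recording replaces one `fetch()` per chunk, so the chunks reach a single server-side process in order. Pusher Channels is designed for broadcasting events to clients rather than for uploading binary audio. The socket here would therefore point at a separate relay endpoint, which is hypothetical. The sketch below queues chunks until the socket opens, so the first chunk (the one with the WebM header) is neither dropped nor reordered. The `createChunkSender` name is illustrative.

```javascript
const OPEN = 1; // value of WebSocket.OPEN

// Returns a function that sends MediaRecorder chunks over one WebSocket,
// buffering anything produced before the connection is open.
function createChunkSender(socket) {
  const pending = [];

  // Flush queued chunks in their original order once connected.
  socket.addEventListener('open', () => {
    while (pending.length > 0) socket.send(pending.shift());
  });

  return function sendChunk(chunk) {
    // Send directly only when open AND nothing is queued, to preserve order.
    if (socket.readyState === OPEN && pending.length === 0) {
      socket.send(chunk);
    } else {
      pending.push(chunk);
    }
  };
}
```

In the browser this could be wired as `const sendChunk = createChunkSender(new WebSocket('wss://example.test/audio'));` and `mediaRecorder.ondataavailable = event => sendChunk(event.data);`, where the URL is a placeholder. Transcripts could then come back over Pusher as before or over the same socket.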

Note:

  1. If I’m mixing incompatible approaches (such as using fetch to send audio data while expecting real-time processing with Pusher), I’d appreciate a brief explanation of what is wrong with this approach and how I might better structure this integration.
  2. The goal is to create a real-time translation solution, so relying on an HTTP API where the request is only sent after clicking “Stop Recording” is not ideal.
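
For live display, `StreamingRecognitionConfig` accepts `interim_results`, and each `StreamingRecognitionResult` carries an `is_final` flag. Interim hypotheses are revised repeatedly, so appending each one (as the `channel.bind` handler does) would duplicate text. The sketch below shows a view that replaces the interim line and appends only final results. It assumes the broadcast event exposes public `transcript` and `isFinal` properties. The `isFinal` property is hypothetical and would have to be added to `SpeechTranscriptCreated`. By default, Laravel broadcasts an event's public properties.

```javascript
// Keeps finalized text separate from the current (revisable) interim hypothesis.
// Payload shape { transcript, isFinal } is an assumption about the event.
function createTranscriptView() {
  let finalText = '';
  let interimText = '';

  return {
    // Apply one broadcast payload and return the full text to display.
    apply({ transcript, isFinal }) {
      if (isFinal) {
        finalText += transcript + '\n';
        interimText = '';
      } else {
        interimText = transcript; // replace, don't append: interim results get revised
      }
      return finalText + interimText;
    },
  };
}
```

Usage in the Pusher handler: `const view = createTranscriptView();` followed by `channel.bind('speech-transcript-created', data => { pusherTextArea.value = view.apply(data); });`.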