I’m working on a Laravel 11 application where I need to stream audio from the frontend to Google Cloud Speech-to-Text and then broadcast the transcriptions using Pusher.
Frontend Code:
let mediaRecorder;
let audioChunks = [];

document.getElementById('record').addEventListener('click', async () => {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    mediaRecorder = new MediaRecorder(stream);
    mediaRecorder.ondataavailable = event => {
        audioChunks.push(event.data);
        sendData(event.data); // Sending data to the server
    };
    mediaRecorder.start(250); // Sending chunks every 250ms
});

document.getElementById('stop').addEventListener('click', () => {
    mediaRecorder.stop();
});

function sendData(audioData) {
    fetch('/stream-audio', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/octet-stream',
            'X-CSRF-TOKEN': '{{ csrf_token() }}',
        },
        body: audioData
    });
}

// Pusher integration
Pusher.logToConsole = true;
var pusher = new Pusher('{{ env('PUSHER_APP_KEY') }}', { cluster: 'eu' });
var channel = pusher.subscribe('speech-transcript-created-channel');
var pusherTextArea = document.getElementById('pusherTextArea');
channel.bind('speech-transcript-created', function(data) {
    // Append each broadcast transcript on its own line
    pusherTextArea.value += JSON.stringify(data) + '\n';
});
Backend Code (SpeechController.php):
protected $speechClient;
protected $stream;

public function __construct()
{
    $this->speechClient = new SpeechClient([
        'transport' => 'grpc',
        'credentials' => base_path(env('GOOGLE_APPLICATION_CREDENTIALS'))
    ]);
    $config = new StreamingRecognitionConfig([
        'config' => new RecognitionConfig([
            'encoding' => RecognitionConfigAudioEncoding::WEBM_OPUS,
            'sample_rate_hertz' => 48000,
            'language_code' => 'sk-SK',
        ]),
    ]);
    $this->stream = $this->speechClient->streamingRecognize();
    $this->stream->write(new StreamingRecognizeRequest(['streaming_config' => $config]));
}

public function streamAudio(Request $request)
{
    $audioContent = $request->getContent();
    $this->stream->write(new StreamingRecognizeRequest(['audio_content' => $audioContent]));
    while ($response = $this->stream->read()) {
        foreach ($response->getResults() as $result) {
            $transcript = $result->getAlternatives()[0]->getTranscript();
            event(new SpeechTranscriptCreated($transcript));
        }
    }
    return response()->json(['message' => 'Audio streamed successfully']);
}
Issue:
I’m using Pusher to broadcast the transcriptions back to the frontend, but the streaming request to Google Cloud Speech-to-Text never succeeds. Each call to /stream-audio fails with the following error:
{
"message": "Audio Timeout Error: Long duration elapsed without audio. Audio should be sent close to real time.",
"code": 11,
"status": "OUT_OF_RANGE",
"details": [ { "@type": "pc-high-bwd-bin", "data": "<Unknown Binary Data>" } ]
}
I suspect that using WebSockets might be necessary to maintain a real-time connection with Google Cloud Speech-to-Text, but I’m unsure how to implement this properly in Laravel. How can I integrate Google Cloud Speech-to-Text with Pusher effectively, or how can I resolve this timeout error?
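To make the WebSocket idea concrete, this is roughly the client-side change I have in mind: instead of one fetch per chunk, every chunk goes over a single long-lived socket, in recording order. This is only a sketch under assumptions. The helper name createChunkSender is made up, and it presumes some server process (not shown, and the part I am unsure about) that keeps one gRPC stream to Google open for the lifetime of the socket. Chunks are buffered until the socket is open because, as far as I understand, only the first MediaRecorder chunk carries the WebM header, so order and completeness matter.

```javascript
// Sketch only: forward MediaRecorder chunks over ONE long-lived socket
// instead of issuing a separate HTTP request per chunk.
// `socket` is any object with the browser WebSocket shape
// (readyState, send, addEventListener); createChunkSender is a hypothetical helper.
const OPEN = 1; // same value as WebSocket.OPEN

function createChunkSender(socket) {
    const pending = []; // chunks recorded before the socket finished connecting

    // Send buffered chunks in recording order while the socket is open
    function flush() {
        while (pending.length > 0 && socket.readyState === OPEN) {
            socket.send(pending.shift());
        }
    }

    socket.addEventListener('open', flush);

    return function sendChunk(chunk) {
        pending.push(chunk);
        flush();
    };
}
```

In the page this would be wired up as something like `const sendChunk = createChunkSender(new WebSocket('wss://example.test/audio'));` followed by `mediaRecorder.ondataavailable = e => sendChunk(e.data);`, where the URL is a placeholder for an endpoint I have not built yet.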
Note:
- If I’m mixing incompatible approaches (such as using fetch to send audio data while expecting real-time processing with Pusher), I’d appreciate a brief explanation of what is wrong with this approach and how I might better structure this integration.
- The goal is to create a real-time translation solution, so relying on an HTTP API where the request is only sent after clicking “Stop Recording” is not ideal.