One way to solve this would be to use SignalR.
You can follow this SO answer to get the microphone input; it also contains a very nice explanation of how to handle microphone input with websockets.
The following is only pseudo code to explain the concept!
It is also greatly simplified; for example, I don't know whether Google's API can handle the fact that you always send it only fragments of speech input. As I said, the code only gives a rough overview of the basic process and has no logic for cases such as the server being offline.
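For context, here is a minimal sketch of how the browser could capture the microphone and hand each audio chunk to a process_microphone_buffer callback, following the Web Audio approach from the linked answer (the buffer size of 4096 is just an example value):

navigator.mediaDevices.getUserMedia({ audio: true })
    .then(function (stream) {
        const audioContext = new AudioContext();
        const source = audioContext.createMediaStreamSource(stream);
        // createScriptProcessor is deprecated but still widely supported;
        // AudioWorklet is the modern alternative
        const processor = audioContext.createScriptProcessor(4096, 1, 1);
        processor.onaudioprocess = process_microphone_buffer;
        source.connect(processor);
        processor.connect(audioContext.destination);
    })
    .catch(function (err) {
        return console.error(err.toString());
    });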
Inside the process_microphone_buffer(event) callback you can then call SignalR, so the function would look something like this:
// build the connection once and reuse it (singleton); don't create a new one per audio chunk
const connection = new signalR.HubConnectionBuilder()
    .withUrl("/speechToTextHub")
    .build();
connection.start().catch(function (err) {
    return console.error(err.toString());
});

function process_microphone_buffer(event) {
    // note: getChannelData returns a Float32Array; convert it if your hub expects byte[]
    const microphone_output_buffer = event.inputBuffer.getChannelData(0);
    connection.invoke("SendMicrophoneBuffer", microphone_output_buffer).catch(function (err) {
        return console.error(err.toString());
    });
}
And on your server you implement a corresponding hub (remember to also map it to the "/speechToTextHub" route when configuring your app, so it matches the URL used by the client):
using Microsoft.AspNetCore.SignalR;
using System.Threading.Tasks;

namespace SignalRChat.Hubs
{
    public class SpeechToTextHub : Hub
    {
        public async Task SendMicrophoneBuffer(byte[] buffer)
        {
            // GoogleApi is a placeholder for whatever speech-to-text client you use
            var googleApi = new GoogleApi();
            var speechToTextResult = await googleApi.GetTextFromSpeechAsync(buffer);

            // send the result back to the calling client
            await Clients.Caller.SendAsync("SpeechToTextResult", speechToTextResult);
        }
    }
}
And on your client you register a handler for the result, something like this:
connection.on("SpeechToTextResult", function (textResult) {
console.log(textResult);
});
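As mentioned above, the sketch has no logic for the server being offline. If you want at least basic resilience, newer versions of the SignalR JavaScript client can be told to reconnect automatically when you build the connection:

const connection = new signalR.HubConnectionBuilder()
    .withUrl("/speechToTextHub")
    .withAutomaticReconnect()
    .build();

// optional: react once the connection has been re-established
connection.onreconnected(function (connectionId) {
    console.log("reconnected with id " + connectionId);
});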
If the answer is too general for Stack Overflow, I can also remove it.
If there are still open questions, I can extend my answer accordingly.