
I have a C# MVC web application with voice-input forms: the user can fill in a textbox by speaking into the microphone. I decided to use the Google Speech-to-Text API to achieve this, but I have some doubts about how to do it.

Because I am implementing this in an MVC web application, the microphone is not accessible from the server-side code. I want to transcribe the speech from real-time audio and show the resulting text on the webpage.

halfer
Ragesh S
  • You could use SignalR to send the input to your server, call the Google API, and push the result back to the client. – Darem Dec 07 '20 at 07:39
  • @Darem Thank you very much for your response. Could you please suggest any sample code? – Ragesh S Dec 07 '20 at 10:54
  • This is from the official Microsoft page: https://docs.microsoft.com/en-us/aspnet/core/tutorials/signalr?view=aspnetcore-5.0&tabs=visual-studio. It is a simple chat app, but you can replace the string input with your microphone (byte) input and send it to your backend. I hope this helps you? – Darem Dec 07 '20 at 11:14
  • Hello @Darem, could you please post your comment as an answer so that it's more visible for other people? – Alejandro Dec 09 '20 at 15:14
  • @Alejandro I added my comment as an answer. – Darem Dec 09 '20 at 15:42

1 Answer


One solution to solve this problem would be to use SignalR.

You can follow this SO answer to get the microphone input; it also gives a very nice explanation of how to handle microphone input with WebSockets.
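One detail worth mentioning: the Web Audio API delivers samples as 32-bit floats in [-1, 1], while Google Speech-to-Text expects an encoding such as LINEAR16 (16-bit signed PCM). A minimal sketch of that conversion, which you would apply before sending the buffer to the server, could look like this (the function name is my own, not from any library):

```javascript
// Sketch: convert the Float32 samples that the Web Audio API delivers
// into 16-bit signed PCM (LINEAR16), one of the encodings that
// Google Speech-to-Text accepts.
function floatTo16BitPcm(float32Samples) {
    const pcm = new Int16Array(float32Samples.length);
    for (let i = 0; i < float32Samples.length; i++) {
        // Clamp to [-1, 1], then scale to the signed 16-bit range.
        const s = Math.max(-1, Math.min(1, float32Samples[i]));
        pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    return pcm;
}
```
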

The following is only pseudo code to illustrate the concept!

It is also greatly simplified: I don't know, for example, whether Google's API can handle receiving only fragments of speech input at a time. As I said, the code gives only a rough overview of the basic flow and has no handling for cases such as the server being offline.

But inside the process_microphone_buffer(event) function you can call SignalR.

So the function would look something like this:

// The connection should be created once and reused (a singleton),
// not rebuilt for every audio buffer.
const connection = new signalR.HubConnectionBuilder().withUrl("/speechToTextHub").build();

function process_microphone_buffer(event) {
    const microphone_output_buffer = event.inputBuffer.getChannelData(0);
    connection.invoke("SendMicrophoneBuffer", microphone_output_buffer).catch(function (err) {
        return console.error(err.toString());
    });
}
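The singleton point deserves a concrete shape. A minimal sketch, independent of the signalR library itself (the factory is injected, so the names here are illustrative only):

```javascript
// Sketch of "handle the connection as a singleton": create it once and
// reuse it for every audio buffer instead of rebuilding it per call.
let connectionInstance = null;

function getConnection(factory) {
    if (connectionInstance === null) {
        connectionInstance = factory();
    }
    return connectionInstance;
}

// In the real client the factory would be something like:
//   () => new signalR.HubConnectionBuilder().withUrl("/speechToTextHub").build()
```
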

And on your server you implement a corresponding hub:

using Microsoft.AspNetCore.SignalR;
using System.Threading.Tasks;

namespace SignalRChat.Hubs
{
    public class SpeechToTextHub : Hub
    {
        public async Task SendMicrophoneBuffer(byte[] buffer)
        {
            // GoogleApi is a placeholder for your own wrapper around the
            // Google Speech-to-Text client library.
            var googleApi = new GoogleApi();
            var speechToTextResult = await googleApi.GetTextFromSpeechAsync(buffer);

            // Send the transcription back to the client that sent the audio.
            await Clients.Caller.SendAsync("SpeechToTextResult", speechToTextResult);
        }
    }
}

And on your client you have something like this:

connection.on("SpeechToTextResult", function (textResult) {
   console.log(textResult);
});
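The handler above only logs the result; for the question's use case it should end up in the textbox. A sketch of the accumulation step, kept as a pure function so the handler could do something like textbox.value = transcript (function name is mine, not from SignalR):

```javascript
// Sketch: accumulate successive recognition results into one transcript
// string that the "SpeechToTextResult" handler can write into the textbox.
function appendResult(transcript, textResult) {
    if (!textResult) return transcript; // ignore empty results
    return transcript === "" ? textResult : transcript + " " + textResult;
}
```
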

If the answer is too general for Stack Overflow, I can also remove it. If there are still open questions, I can extend my answer accordingly.

Darem