
I'm trying to get SAPI 5.4 (also MS Speech Platform SDK v11) to perform continuous speech recognition on the audio coming in from a Skype call.

I can use SKYPE4COMLib to grab the incoming Skype audio and push it over a TCP port by issuing an ALTER CALL instruction. You can direct the Skype audio to either a file or a TCP socket. The file route worked fine, but I want recognition to run live, so I'm using the TCP socket.
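
For context, here is a minimal sketch of that Skype4COM step (the helper name, the port number and the exact ALTER CALL argument syntax are my assumptions, not taken from this post), assuming a `Skype` instance that is already attached and a call already in progress:

```csharp
using System;
using SKYPE4COMLib;

class SkypeAudioTap
{
    // Ask Skype to redirect the incoming audio of an active call to a local
    // TCP port instead of the sound card (the port number is arbitrary here).
    public static void RedirectCallAudio(Skype skype, int callId, int port)
    {
        var cmd = new Command();
        cmd.Blocking = true;
        cmd.Command = string.Format(
            "ALTER CALL {0} SET_OUTPUT PORT=\"{1}\"", callId, port);

        skype.SendCommand(cmd);
        Console.WriteLine("Skype replied: " + cmd.Reply);
    }
}
```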

I then built a TCP listener to gather up the incoming data (raw audio) and pass the byte array to SAPI as a MemoryStream. I've set up SAPI to expect raw audio as 16-bit, 16 kHz, mono PCM. However, a recognition event never occurs.
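
To make that setup concrete, here is a minimal, self-contained sketch of the arrangement described above (the port number, class name and dictation grammar are illustrative choices, and the System.Speech namespaces are used rather than the Speech Platform ones): a TCP listener buffers the raw audio into a MemoryStream and hands it to the recognizer as 16 kHz, 16-bit, mono PCM.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Speech.AudioFormat;
using System.Speech.Recognition;

class SkypeAudioRecognizer
{
    static void Main()
    {
        // Listen on the port that ALTER CALL ... SET_OUTPUT was pointed at.
        var listener = new TcpListener(IPAddress.Any, 8081);
        listener.Start();

        using (TcpClient client = listener.AcceptTcpClient())
        using (NetworkStream netStream = client.GetStream())
        {
            // Buffer the raw call audio in memory, mirroring the MemoryStream
            // approach described in the question. Note that CopyTo only
            // returns once the sender closes the connection.
            var audioBuffer = new MemoryStream();
            netStream.CopyTo(audioBuffer);
            audioBuffer.Position = 0;

            using (var recognizer = new SpeechRecognitionEngine())
            {
                recognizer.LoadGrammar(new DictationGrammar());

                // Raw PCM, no header: 16 kHz, 16 bits per sample, mono.
                var format = new SpeechAudioFormatInfo(
                    16000, AudioBitsPerSample.Sixteen, AudioChannel.Mono);
                recognizer.SetInputToAudioStream(audioBuffer, format);

                recognizer.SpeechRecognized += (s, e) =>
                    Console.WriteLine("Recognized: " + e.Result.Text);

                // Single-shot here; continuous mode is shown with the answer below.
                recognizer.Recognize();
            }
        }
    }
}
```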

I've tried saving that raw audio to disk instead and then reading it into SAPI, and that works fine, so the data itself is good and Skype is sending the audio on correctly. However, this doesn't give me the continuous recognition I need.

The SAPI recognition code works fine with a WAV file, a raw file loaded from disk, or the microphone. I just can't get it to work from a MemoryStream.

I found this similar question, but none of the suggestions there seem to work for me, and the discussion seems to have gone quiet:

Streaming input to System.Speech.Recognition.SpeechRecognitionEngine

Does anyone have any guidance on how to successfully get SAPI to continuously recognise speech from raw audio sent as a MemoryStream in C#?

timemirror
  • If you want to do continuous recognition, why are you using a `MemoryStream` to buffer it? You should give a `NetworkStream` directly to SAPI, optionally with a BufferedStream in front of it. You may need to derive from `NetworkStream` to override `Seek`. – Dark Falcon Sep 10 '12 at 14:24
  • I tried this, which was suggested in the link in the question, but I couldn't get it to work. Is this what you're suggesting as well? `NetworkStream netStream = new NetworkStream(socket, true); BufferedStream buffStream = new BufferedStream(netStream, 8000*16*1); appRecognizer.SetInputToAudioStream(buffStream, formatInfo);` – timemirror Sep 10 '12 at 14:34
  • You need to derive from the stream and override Seek, as noted. – Dark Falcon Sep 10 '12 at 15:07
  • The reason I've been trying to use the MemoryStream is that it supports Seek, whereas NetworkStream doesn't. So I receive the data on the NetworkStream and `CopyTo` a MemoryStream. Still can't get it to work though... – timemirror Sep 11 '12 at 10:42
  • Sean's answer here works well.... http://stackoverflow.com/questions/1682902/streaming-input-to-system-speech-recognition-speechrecognitionengine – timemirror Oct 25 '12 at 16:41

1 Answer


As you are using streaming audio, I think you should use `recognizer.RecognizeAsync`.
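
As a rough sketch of that suggestion (the helper class and handler are illustrative, not from the answer), continuous recognition means starting the engine with `RecognizeMode.Multiple` and consuming results from the `SpeechRecognized` event:

```csharp
using System;
using System.Speech.Recognition;

static class ContinuousRecognition
{
    // Start the engine in continuous mode so it keeps raising SpeechRecognized
    // events instead of stopping after the first utterance.
    public static void Start(SpeechRecognitionEngine recognizer)
    {
        recognizer.SpeechRecognized += (sender, e) =>
            Console.WriteLine("Recognized: " + e.Result.Text);

        // RecognizeMode.Multiple keeps recognition running until
        // RecognizeAsyncStop() or RecognizeAsyncCancel() is called.
        recognizer.RecognizeAsync(RecognizeMode.Multiple);
    }
}
```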

Patel.NET
  • Hi Patel.NET - thanks, yes, I'm using `recognizer.RecognizeAsync(RecognizeMode.Multiple)`. The issue was that when you stream data it never raises the recognition event, so you need to override the stream class (a sketch of that approach is below). – timemirror Jan 02 '14 at 05:52
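
For completeness, here is a minimal sketch of the kind of stream override discussed in the comments (the class name and the choice of which members to override are my own, and the linked answer's approach may differ in detail): a `NetworkStream` subclass that claims to be seekable and turns `Seek`, `Position` and `Length` into harmless no-ops, while `Read` keeps blocking on the socket until more audio arrives.

```csharp
using System.IO;
using System.Net.Sockets;

// A NetworkStream that tolerates any seek-related calls the recognizer might
// make on its input stream, so recognition can run live on socket audio.
class SeekableNetworkStream : NetworkStream
{
    public SeekableNetworkStream(Socket socket, bool ownsSocket)
        : base(socket, ownsSocket) { }

    public override bool CanSeek { get { return true; } }

    // Report a huge length and ignore position changes rather than throwing.
    public override long Length { get { return long.MaxValue; } }
    public override long Position { get { return 0; } set { } }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return 0;  // pretend the seek succeeded
    }
}
```

It would then be passed straight to the recognizer, e.g. `recognizer.SetInputToAudioStream(new SeekableNetworkStream(socket, true), format)` followed by `RecognizeAsync(RecognizeMode.Multiple)`, in place of the MemoryStream buffering shown earlier.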