9

I am trying to do "streaming" speech recognition in C# from a TCP socket. The problem is that SpeechRecognitionEngine.SetInputToAudioStream() seems to require a seekable Stream of a defined length. Right now, the only way I can think of to make this work is to repeatedly run the recognizer on a MemoryStream as more input comes in.

Here's some code to illustrate:

            SpeechRecognitionEngine appRecognizer = new SpeechRecognitionEngine();

            System.Speech.AudioFormat.SpeechAudioFormatInfo formatInfo = new System.Speech.AudioFormat.SpeechAudioFormatInfo(8000, System.Speech.AudioFormat.AudioBitsPerSample.Sixteen, System.Speech.AudioFormat.AudioChannel.Mono);

            NetworkStream stream = new NetworkStream(socket,true);
            appRecognizer.SetInputToAudioStream(stream, formatInfo);
            // The line above throws a NotSupportedException: "This stream does not support seek operations."

Does anyone know how to get around this? It must support streaming input of some sort, since it works fine with the microphone using SetInputToDefaultAudioDevice().

Thanks, Sean

Eric Brown
spurserh

5 Answers

15

I got live speech recognition working by overriding the stream class:

using System.Collections.Generic;
using System.IO;
using System.Threading;

class SpeechStreamer : Stream
{
    private AutoResetEvent _writeEvent;
    private List<byte> _buffer;
    private int _buffersize;
    private int _readposition;
    private int _writeposition;
    private bool _reset;

    public SpeechStreamer(int bufferSize)
    {
        _writeEvent = new AutoResetEvent(false);
        _buffersize = bufferSize;
        _buffer = new List<byte>(_buffersize);
        for (int i = 0; i < _buffersize; i++)
            _buffer.Add(new byte());
        _readposition = 0;
        _writeposition = 0;
    }

    public override bool CanRead
    {
        get { return true; }
    }

    public override bool CanSeek
    {
        get { return false; }
    }

    public override bool CanWrite
    {
        get { return true; }
    }

    public override long Length
    {
        get { return -1L; }
    }

    public override long Position
    {
        get { return 0L; }
        set {  }
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return 0L;
    }

    public override void SetLength(long value)
    {

    }

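    public override int Read(byte[] buffer, int offset, int count)
    {
        int i = 0;
        // Block until `count` bytes have been copied (or the stream is closed);
        // a short read would make the recognizer treat the input as finished.
        while (i<count && _writeEvent!=null)
        {
            if (!_reset && _readposition >= _writeposition)
            {
                // Nothing new to read yet: wait for the writer, then re-check.
                _writeEvent.WaitOne(100, true);
                continue;
            }
            // offset applies to the caller's buffer, not to the ring buffer.
            buffer[offset + i] = _buffer[_readposition];
            _readposition++;
            if (_readposition == _buffersize)
            {
                // Wrapped around: the reader has caught up with the writer's lap.
                _readposition = 0;
                _reset = false;
            }
            i++;
        }

        return count;
    }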

    public override void Write(byte[] buffer, int offset, int count)
    {
        for (int i = offset; i < offset+count; i++)
        {
            _buffer[_writeposition] = buffer[i];
            _writeposition++;
            if (_writeposition == _buffersize)
            {
                _writeposition = 0;
                _reset = true;
            }
        }
        _writeEvent.Set();

    }

    public override void Close()
    {
        _writeEvent.Close();
        _writeEvent = null;
        base.Close();
    }

    public override void Flush()
    {

    }
}

... and then passing an instance of it as the stream input to the SetInputToAudioStream method. As soon as the stream reports a length, or a read returns fewer bytes than requested, the recognition engine assumes the input has finished. The class above avoids both by acting as a circular buffer that never reports an end.
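
As noted in the comments below, the way to feed an existing stream into this class is to read from it and write into the SpeechStreamer. A minimal sketch of that copy loop follows; the `StreamPump` class, the `Pump` method, and the 3200-byte chunk size are hypothetical names and values chosen for illustration, not part of the original answer.

```csharp
using System;
using System.IO;

static class StreamPump
{
    // Copies bytes from source into sink until source reports end of stream
    // (Read returns 0). Returns the total number of bytes copied.
    // chunkSize is an assumed value: 3200 bytes is 0.2 s of 8 kHz 16-bit mono audio.
    public static long Pump(Stream source, Stream sink, int chunkSize = 3200)
    {
        var chunk = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = source.Read(chunk, 0, chunk.Length)) > 0)
        {
            // Forward exactly the bytes that were read, never the whole chunk.
            sink.Write(chunk, 0, n);
            total += n;
        }
        return total;
    }
}
```

Because SpeechStreamer.Read blocks until data arrives, the pump would presumably need to run on its own thread (source being the NetworkStream, sink the SpeechStreamer) while the recognizer reads from the SpeechStreamer, for example via RecognizeAsync(RecognizeMode.Multiple).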

Sean
  • Hi Sean, I've been trying to get your solution to work but so far not managed it. As with others above everything works fine from disk file but just doesn't work with MemoryStream. Do you occasionally issue a recognize request, or are you able to use the SpeechHypothesized, SpeechRecognized events? Could you post any more code to help? Thanks! – timemirror Aug 23 '12 at 09:23
  • Sorry, I missed your question; there you go. With that I'm able to do real-time speech recognition and also stream the audio feed out over the network (part of my open source project ispy - http://www.ispyconnect.com) – Sean Oct 23 '12 at 04:51
  • Thanks Sean...great looking project. – timemirror Oct 24 '12 at 11:59
  • You are a genius Sean! The latest version of your code works perfectly! Capturing the output from Skype and running against SAPI for speech recognition. Thanks very much for your help.... – timemirror Oct 25 '12 at 16:39
  • 2
    Hi timemirror do you have a little sample code using SpeechStreamer with skype ? – Jean-Philippe Encausse Jun 19 '13 at 20:20
  • @Sean hey, could you please give me a pointer on how to use this class? It seems really promising and I would love to use it. But how do I set it to use an existing Stream? – Monacraft Feb 20 '18 at 09:26
  • If you want, I could ask this question again (and that way mark this as correct). Sorry for digging up this from the past ahahaha – Monacraft Feb 20 '18 at 09:27
  • @Monacraft just read from your existing stream and write into this one – Sean Feb 20 '18 at 11:43
  • I got it to "recognize" spoken text on computer using this but unfortunately it's completely bogus for me trying with NAudio methods... See https://stackoverflow.com/questions/58678228/ – NoBugs Nov 03 '19 at 07:30
2

Have you tried wrapping the network stream in a System.IO.BufferedStream?

NetworkStream netStream = new NetworkStream(socket,true);
BufferedStream buffStream = new BufferedStream(netStream, 8000 * 2 * 1); // 8000 samples/s * 2 bytes/sample * 1 channel = 1 second of audio
appRecognizer.SetInputToAudioStream(buffStream, formatInfo);
Serguei
  • Did you verify that the buffered stream supports seeking? I.e., in the above code, does buffStream.CanSeek return true? – Eric Brown Nov 16 '09 at 20:11
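
The question in the comment above can be checked without a socket. BufferedStream.CanSeek reports whatever the wrapped stream reports, so wrapping a non-seekable stream does not make it seekable. The sketch below uses a hypothetical `ForwardOnlyStream` as a stand-in for NetworkStream; that class and `BufferedSeekCheck` are illustrative names, not part of any answer here.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for NetworkStream: readable, but not seekable.
class ForwardOnlyStream : Stream
{
    private readonly Stream _inner;
    public ForwardOnlyStream(Stream inner) { _inner = inner; }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return _inner.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override void Flush() { }
}

static class BufferedSeekCheck
{
    // Wraps a non-seekable stream in a BufferedStream and reports CanSeek.
    public static bool WrappedCanSeek()
    {
        var source = new ForwardOnlyStream(new MemoryStream(new byte[16000]));
        var buffered = new BufferedStream(source, 16000);
        return buffered.CanSeek;
    }
}
```

If this holds for NetworkStream as well, the BufferedStream wrapper would be expected to hit the same NotSupportedException as the original code.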
2

Apparently it can't be done ("By design"!). See http://social.msdn.microsoft.com/Forums/en/netfxbcl/thread/fcf62d6d-19df-4ca9-9f1f-17724441f84e

2

This is my solution.

using System;
using System.IO;
using System.Net.Sockets;

class FakeStreamer : Stream
{
    public bool bExit = false;
    Stream stream;
    TcpClient client;
    public FakeStreamer(TcpClient client)
    {
        this.client = client;
        this.stream = client.GetStream();
        this.stream.ReadTimeout = 100; //100ms
    }
    public override bool CanRead
    {
        get { return stream.CanRead; }
    }

    public override bool CanSeek
    {
        get { return false; }
    }

    public override bool CanWrite
    {
        get { return stream.CanWrite; }
    }

    public override long Length
    {
        get { return -1L; }
    }

    public override long Position
    {
        get { return 0L; }
        set { }
    }
    public override long Seek(long offset, SeekOrigin origin)
    {
        return 0L;
    }

    public override void SetLength(long value)
    {
        stream.SetLength(value);
    }
    public override int Read(byte[] buffer, int offset, int count)
    {
        int len = 0, c = count;
        while (c > 0 && !bExit)
        {
            try
            {
                len = stream.Read(buffer, offset, c);
            }
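            catch (Exception e)
            {
                // -2146232800 is COR_E_IO (0x80131620), the generic IOException
                // HResult; a read timeout on the network stream surfaces this way.
                if (e.HResult == -2146232800)
                {
                    // Treat as a timeout: keep waiting for more data.
                    continue;
                }
                else
                {
                    //Exit read loop
                    break;
                }
            }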
            if (!client.Connected || len == 0)
            {
                //Exit read loop
                return 0;
            }
            offset += len;
            c -= len;
        }
        return count;
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        stream.Write(buffer,offset,count);
    }

    public override void Close()
    {
        stream.Close();
        base.Close();
    }

    public override void Flush()
    {
        stream.Flush();
    }
}

How to Use:

//client connect in
TcpClient clientSocket = ServerSocket.AcceptTcpClient();
FakeStreamer buffStream = new FakeStreamer(clientSocket);
...
//recognizer init
m_recognizer.SetInputToAudioStream(buffStream , audioFormat);
...
//recognizer end
if (buffStream != null)
    buffStream.bExit = true;
gmuraleekrishna
Hassen
1

I ended up buffering the input and then sending it to the speech recognition engine in successively larger chunks. For instance, I might first send the first 0.25 seconds, then the first 0.5 seconds, then the first 0.75 seconds, and so on until I get a result. I am not sure whether this is the most efficient approach, but it yields satisfactory results for me.

Best of luck, Sean
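
The growing-prefix schedule described above can be sketched as a small helper that computes how many bytes to send on each attempt. The `GrowingPrefixes` class and its parameter names are hypothetical; the numbers assume the 8 kHz, 16-bit, mono format from the question (16000 bytes per second), and the step is assumed to be a whole number of samples.

```csharp
using System;
using System.Collections.Generic;

static class GrowingPrefixes
{
    // Returns the prefix lengths (in bytes) to try, in order: one step,
    // two steps, and so on, ending with everything buffered so far.
    public static List<int> PrefixLengths(int bufferedBytes, int bytesPerSecond, double stepSeconds)
    {
        int step = (int)(bytesPerSecond * stepSeconds);
        if (step <= 0) throw new ArgumentOutOfRangeException("stepSeconds");

        var lengths = new List<int>();
        for (int len = step; len < bufferedBytes; len += step)
            lengths.Add(len);

        // Always finish with the full buffer, even if it is not a whole step.
        if (bufferedBytes > 0) lengths.Add(bufferedBytes);
        return lengths;
    }
}
```

Each length would then be used to wrap the buffered audio, for example new MemoryStream(audio, 0, length), which is seekable and has a defined length, before passing it to SetInputToAudioStream and calling Recognize(). Note that every attempt re-processes the audio from the start, so the total work grows quickly with longer utterances.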

spurserh
  • 1
    I'm also having problems with SAPI and MemoryStreams. I just can't get it to work, although everything works fine from the default input or a file. When you said you got it working using a buffer, do you mean you use the BufferedStream approach that Serguei suggested, or do you just hold back the recognition until the MemoryStream is larger? I've tried both without success. Are you using the SpeechHypothesized, SpeechRecognized events, or forcing RecognitionResult rr = recognizer.Recognize() every so often? Are you able to post any more code to help? Would be much appreciated. – timemirror Aug 23 '12 at 09:28