
I simply need to know when the user starts talking into the microphone. I will not be doing any speech processing or anything fancy, just detect when the microphone has picked up anything. I've been looking for an hour now and can't find anything as simple as that. Can somebody point me in the right direction?

Update 1

I apologise for how late this is; I have been having connectivity issues. Here's the code I've been using:

override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view, typically from a nib.

    let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let bus = 0
    // Tap the input node so we receive each buffer of audio from the mic
    inputNode.installTap(onBus: bus, bufferSize: 8192, format: inputNode.inputFormat(forBus: bus)) {
        (buffer: AVAudioPCMBuffer, time: AVAudioTime) in
        print("Speech detected.")
    }

    audioEngine.prepare()
    try? audioEngine.start()
}
Youssef Moawad
  • Here is a Tutorial: http://www.whatsoniphone.com/blog/tutorial-detecting-when-a-user-blows-into-the-mic/ – LoVo Mar 06 '15 at 09:10
  • Post some code as to what you've tried – Bamsworld Mar 06 '15 at 21:58
  • Thank you for updating your question. How many times does the block get called after you start the engine? What is meant to happen, is that the block continuously gets passed in buffers of audio data along with a timestamp. It is up to you to then read the buffers, to help determine how you interpret the data, and make the assumption of whether or not some predetermined event has occurred. – Bamsworld Mar 15 '15 at 04:03
  • @Bamsworld The block is called just two times right when the application starts, and not anymore. Also, I do not need to read the buffers. I do something that is independent of the sound detected, every time ANY sound is detected. – Youssef Moawad Mar 15 '15 at 12:46
  • see answer here: https://stackoverflow.com/questions/27846392/access-microphone-from-a-browser-javascript – Z80 Apr 30 '20 at 14:18

1 Answer


The callback you're passing to installTapOnBus is invoked for every buffer of audio coming from the mic, whether it contains silence or anything else. So the code above only detects that your app has started listening, not that someone is speaking into the mic.

To actually identify the start of speech, you need to examine the data in the buffers.

A simple version of this is similar to an audio noise gate used in PA systems: pick an amplitude threshold and a duration threshold, and once both are exceeded, call it speech. Because phones, mics, and environments all vary, you will probably need to determine the amplitude threshold adaptively to get good performance.
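The gate logic itself needs no AVFoundation, so here is a minimal sketch of the amplitude-plus-duration check. The type name, parameter names, and threshold values are all invented for illustration; you would tune them for your device and environment:

```swift
import Foundation

// Noise-gate style detector: reports speech once the RMS amplitude of
// incoming buffers stays above a threshold for a minimum number of
// consecutive buffers (the "duration" part of the gate).
struct SpeechGate {
    let amplitudeThreshold: Float   // e.g. 0.05; needs per-device tuning
    let requiredBuffers: Int        // consecutive loud buffers before firing
    var loudCount = 0

    mutating func process(samples: [Float]) -> Bool {
        guard !samples.isEmpty else { return false }
        // Root-mean-square amplitude of this buffer
        let rms = sqrt(samples.reduce(0) { $0 + $1 * $1 } / Float(samples.count))
        if rms > amplitudeThreshold {
            loudCount += 1
        } else {
            loudCount = 0   // silence resets the gate
        }
        return loudCount >= requiredBuffers
    }
}
```

Inside the tap's closure you would copy the buffer's floatChannelData into a [Float] (assuming a float PCM format) and feed it to process(samples:); when it returns true, trigger your action.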

Even with an adaptive threshold, sustained loud sounds will still be classified as "speech". If you need to weed those out too, you'll want some sort of frequency analysis (e.g., an FFT) that looks for sufficient amplitude and variation over time in the speech frequencies. Alternatively, you could pass the buffers to a speech recognition engine (e.g., SFSpeechRecognizer) and see whether it recognizes anything, piggybacking on Apple's signal-processing work; but that's pretty heavyweight if you do it often.
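To make the "reject steady sounds" idea concrete without a full FFT, a crude heuristic is to track the RMS level of recent buffers and require some spread before treating the signal as speech-like, since speech fluctuates while a hum or alarm holds a nearly constant level. This is an illustrative sketch only (the type and parameter names are invented), not a substitute for real frequency analysis:

```swift
import Foundation

// Rejects sustained, steady sounds: keeps a sliding window of recent RMS
// values and only reports "speech-like" when the level varies enough.
struct VariationGate {
    let window: Int           // how many recent RMS values to keep
    let minVariation: Float   // minimum spread (max - min) to count as speech-like
    var history: [Float] = []

    mutating func isSpeechLike(rms: Float) -> Bool {
        history.append(rms)
        if history.count > window { history.removeFirst() }
        // Wait until the window is full, then measure the spread of levels
        guard history.count == window,
              let lo = history.min(), let hi = history.max() else { return false }
        return hi - lo > minVariation
    }
}
```

You would feed this the same per-buffer RMS values the amplitude gate computes, and only fire when both checks pass.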

Jason Campbell