
So, I've been trying for some time now to find the best solution for calculating the fundamental frequency of a sample captured with AudioRecord in real time. I have looked at some examples here on SO: this one and this one are the questions that helped me the most, but I still don't fully understand how they would work for finding the fundamental frequency. So what I am looking for is a more detailed explanation of what I need to do to find the fundamental frequency given a sample.

So, I create an AudioRecord:

micData = new AudioRecord(audioSource, sampleRate, channel, encoding, bufferSize);
data = new short[bufferSize];

And start listening:

micData.startRecording();    
sample = micData.read(data,0,bufferSize);

And I understand how to create a Complex array, but I don't know exactly which methods from FFT.java I should use to fill in the values for these complex numbers, and which method would be the one that returns the peak frequency.
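
For reference, here is a minimal end-to-end sketch of the capture step above, including the conversion of the recorded shorts into doubles for analysis. The source/format constants and the normalization by 32768 are assumptions for illustration, not from the question:

import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

//Assumed configuration: 44100Hz mono, 16-bit PCM.
int sampleRate = 44100;
//getMinBufferSize returns a size in bytes; using it as a short count
//simply over-allocates, which is harmless here.
int bufferSize = AudioRecord.getMinBufferSize(sampleRate,
        AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);

AudioRecord micData = new AudioRecord(MediaRecorder.AudioSource.MIC,
        sampleRate, AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT, bufferSize);
short[] data = new short[bufferSize];

micData.startRecording();
int read = micData.read(data, 0, bufferSize); //number of shorts actually read

//Normalize 16-bit PCM to doubles in [-1, 1] for the analysis steps below.
double[] samples = new double[read];
for (int i = 0; i < read; i++) {
    samples[i] = data[i] / 32768.0;
}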

  • Exactly what don't you understand? Pitch estimation is a large research topic. What are you recording and what are your accuracy requirements? – hotpaw2 Nov 25 '14 at 08:31
  • Sorry for not being precise: what I still don't understand is how you get the fundamental frequency (the peak frequency, which I guess would be the predominant one, letting me determine which note is being played) out of an FFT array. – hsteffano Nov 25 '14 at 09:27
  • Google "pitch detection or pitch estimation". There are many research papers (see: http://www.music-ir.org/mirex/wiki/MIREX_HOME) on various techniques. – hotpaw2 Nov 25 '14 at 09:42
  • You should add the Java tag to your question since it's Android related. Otherwise code blocks won't get syntax highlighting... – Alexander Nov 25 '14 at 09:54
  • If you strongly insist on going with FFT, what you need to do is the following: 1. Fill the real part of the complex array with the sample values. 2. Call your FFT calculation method; after that you have the imaginary part too. 3. Calculate the squared magnitude of each point as Re^2 + Im^2 (if you want the measure in dB you additionally need 10*log10() of it). 4. Now your result array is the frequency spectrum, where SamplingFreq/numOfSamples gives you the spacing between frequency points. For index i of the result array, the frequency is i*SamplingFreq/numOfSamples (a sketch of these steps follows the comments below). – Alexander Nov 25 '14 at 10:39
  • Also keep in mind that you need to discard the second half of your result array because, according to the sampling theorem, the recorded sound's frequency range is half the sampling frequency. (For 44100Hz, the recorded sound goes up to 22050Hz, so you discard the second half (22050Hz-44100Hz))... – Alexander Nov 25 '14 at 10:42
  • This is not an easy problem to solve, and the design solution depends very much on context and the signals involved. In particular, finding the spectral peak from a DFT is more often than not a broken solution. – marko Nov 25 '14 at 20:20
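
To make the FFT route from the comments concrete, here is a minimal, self-contained sketch of Alexander's four steps. It uses a naive O(N²) DFT instead of FFT.java so nothing external is required; the method name peakFrequency is just for illustration:

//Naive DFT peak picker implementing the four steps from the comments above.
//O(N^2) for clarity only; use a real FFT for anything time-critical.
public static double peakFrequency(double[] samples, double samplingRate) {
    int n = samples.length;
    double bestMagnitude = -1.0;
    int bestBin = 0;
    //Only bins below n/2 (the Nyquist frequency) carry unique information.
    for (int k = 1; k < n / 2; k++) { //skip bin 0, which is the DC offset
        double re = 0, im = 0;
        for (int i = 0; i < n; i++) {
            double angle = 2 * Math.PI * k * i / n;
            re += samples[i] * Math.cos(angle);
            im -= samples[i] * Math.sin(angle);
        }
        double magnitude = re * re + im * im; //squared magnitude is enough for peak picking
        if (magnitude > bestMagnitude) {
            bestMagnitude = magnitude;
            bestBin = k;
        }
    }
    //Bin k corresponds to frequency k * samplingRate / n.
    return bestBin * samplingRate / n;
}

As marko's comment points out, the strongest bin is often a harmonic rather than the fundamental, which is one reason the answer below stays in the time domain.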

1 Answer


Reading your question, I see you are not yet sure that you want to use FFT. That's good, because I don't recommend using just FFT. Stay in the time domain and use autocorrelation or AMDF; if you want more accurate results, then use FFT as an additional component.

Here is my Java code for calculating the fundamental frequency. I wrote comments because you said you still don't understand the process.

//Needs: import java.util.Arrays; (for copyOfRange)
public double getPitchInSampleRange(AudioSamples as, int start, int end) throws Exception {
    //If your sound is a musical note or a voice you can limit the results, because it won't be above 4500Hz or below 20Hz
    int nLowPeriodInSamples = (int) (as.getSamplingRate() / 4500);
    int nHiPeriodInSamples = (int) (as.getSamplingRate() / 20);

    //I get my sample values from my AudioSamples class. You can get them from wherever you want.
    double[] samples = Arrays.copyOfRange(as.getSamplesChannelSegregated()[0], start, end);
    if(samples.length < nHiPeriodInSamples) throw new Exception("Not enough samples");

    //Since we're looking for the periodicity in samples, the results array only needs to cover the lags between the two period limits
    double[] results = new double[nHiPeriodInSamples - nLowPeriodInSamples];

    //Now you iterate over the time lag
    for(int period = nLowPeriodInSamples; period < nHiPeriodInSamples; period++) {
        double sum = 0;
        //Autocorrelation is the multiplication of the original and the time-lagged signal values
        for(int i = 0; i < samples.length - period; i++) {
            sum += samples[i]*samples[i + period];
        }
        //find the average value of the sum
        double mean = sum / (double)samples.length;
        //and put it into results as the value for this time lag.
        //You subtract nLowPeriodInSamples so the index starts from 0.
        results[period - nLowPeriodInSamples] = mean;
    }
    //Now, the mean will be highest for the time lag equal to the periodicity of the signal, because in that case
    //most of the positive values will be multiplied with other positive values, and most of the negative values with other
    //negative values, giving positive products again and a high positive sum. In the other case, for, say, half a period,
    //autocorrelation will multiply negative with positive values, giving negative products and a low value for the sum.
    double fBestValue = Double.NEGATIVE_INFINITY; //not Double.MIN_VALUE, which is the smallest *positive* double
    int nBestIndex = -1; //the index is the time lag
    //So
    //The autocorrelation is highest at the periodicity of the signal
    //The periodicity of the signal can be converted to a frequency
    for(int i = 0; i < results.length; i++) {
        if(results[i] > fBestValue) {
            nBestIndex = i; 
            fBestValue = results[i]; 
        }
    }
    //Convert the period in samples to a frequency and you've got yourself the fundamental frequency of the sound
    double res = as.getSamplingRate() / (nBestIndex + nLowPeriodInSamples);

    return res;
}
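
The answer also mentions AMDF as a time-domain alternative. Here is a minimal sketch of that variant under the same period limits as above (the method name and signature are mine, not from the original); AMDF replaces the multiplications with absolute differences, so the periodicity shows up as a minimum rather than a maximum:

//AMDF (Average Magnitude Difference Function) variant of the loop above.
//The best lag is the one that minimizes the average difference between
//the signal and its lagged copy.
public double getPitchAmdf(double[] samples, double samplingRate,
                           int nLowPeriodInSamples, int nHiPeriodInSamples) {
    double fBestValue = Double.MAX_VALUE;
    int nBestPeriod = -1;
    for(int period = nLowPeriodInSamples; period < nHiPeriodInSamples; period++) {
        double sum = 0;
        for(int i = 0; i < samples.length - period; i++) {
            sum += Math.abs(samples[i] - samples[i + period]);
        }
        //average over the number of compared pairs
        double mean = sum / (samples.length - period);
        if(mean < fBestValue) {
            nBestPeriod = period;
            fBestValue = mean;
        }
    }
    return samplingRate / nBestPeriod;
}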

What else you need to know is that octave mistakes are common with the autocorrelation method, especially if you have noise in the signal. From my experience, piano or guitar sounds aren't a problem; the mistakes are rare. But the human voice can be...
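
A common mitigation for those octave errors (a generic heuristic, not something from the answer above) is to check whether a submultiple of the winning period scores nearly as well and, if so, prefer the shorter period:

//Hypothetical post-processing step for the results array computed above:
//if the autocorrelation at half (or a third, or a quarter) of the winning
//period is nearly as strong, the true fundamental is likely the higher octave.
static int correctOctave(double[] results, int nBestIndex, int nLowPeriodInSamples) {
    final double THRESHOLD = 0.9; //tuning parameter, an assumption
    int bestPeriod = nBestIndex + nLowPeriodInSamples;
    for (int divisor = 2; divisor <= 4; divisor++) {
        int candidateIndex = bestPeriod / divisor - nLowPeriodInSamples;
        if (candidateIndex < 0) break; //submultiple is below the search range
        if (results[candidateIndex] >= THRESHOLD * results[nBestIndex]) {
            return candidateIndex; //prefer the shorter period (higher frequency)
        }
    }
    return nBestIndex;
}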

  • Thank you so much, this is really helpful. In your code, couldn't nLowPeriodInSamples and nHiPeriodInSamples just be set to 4500 and 20 straight away? And for the double array (samples), I can use the AudioRecord, right? Also, what exactly is bestIndices[i]? – hsteffano Nov 25 '14 at 10:20
  • No.... 20 and 4500 are frequencies in Hz. Since autocorrelation is a method in the time domain, you need to convert the frequency limits into time limits, i.e. periods in samples – Alexander Nov 25 '14 at 10:25
  • Right, sorry, that was obvious. Also, for the double array (samples), can I use the AudioRecord object as it is, or does it need to be converted to an array? – hsteffano Nov 25 '14 at 10:35
  • As you wish... you just need access to the sample values, to multiply them and put them into the results array. If you don't want to use the double[] samples array, transform the code a little and use your class... – Alexander Nov 25 '14 at 10:46
  • Thanks, ok. And one more thing: I see you using a for() loop to find the best values, but where did bestIndices[i] come from? – hsteffano Nov 25 '14 at 10:55
  • Ah, sorry.... it should be the best index. In my code it was bestIndices[i] because I was getting more than one result. I forgot to change it, but now it is changed. I edited my answer... – Alexander Nov 25 '14 at 10:59
  • In your AudioSamples class, how are you capturing and storing the samples? This got really confusing using AudioRecord... – hsteffano Nov 25 '14 at 21:38
  • In my case, AudioSamples is a class from the jAudio library and there are multiple constructors (from a file, from an AudioInputStream, etc.), but as far as I know you won't be able to use the jAudio library on Android. My application wasn't Android. However, you can find plenty of answers here and on other forums about how to capture sound. The snippets in your question are correct. – Alexander Nov 26 '14 at 08:30
  • It's ok, I got it working. However, I am running into an issue: low-pitched frequencies are not being recognized correctly (I am using a synth to test it). When I ran it only once, it recognized the frequencies, but now I am running it on a thread that sleeps every 500ms, and for frequencies below 600Hz it simply doesn't recognize them. Any ideas? – hsteffano Nov 26 '14 at 09:03
  • I don't know, is it an octave error or what? Try changing the number of samples in the array (calculating on a longer or shorter array). A lot of things are in question here. A 600Hz limit shouldn't be a problem at all... – Alexander Nov 27 '14 at 08:11
  • Still not. Notes above C37 have their frequencies recognized fine; the ones below that confuse it, as if it didn't know the peak frequency, so it keeps showing very high or very low ones. – hsteffano Nov 27 '14 at 23:20
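
Regarding the low-note issue in the last few comments, one possible cause (my speculation; it is not confirmed in the thread) is that the analysis window is too short: the code above already requires samples.length >= Fs/20 just to evaluate the longest lag, and a stable correlation peak wants several full periods of the lowest note inside the window:

//Illustrative sizing check: how many samples a window must contain to hold
//a given number of periods of the lowest frequency of interest.
static int minWindowSamples(double samplingRate, double lowestFreqHz, int periods) {
    return (int) Math.ceil(periods * samplingRate / lowestFreqHz);
}

//e.g. minWindowSamples(44100, 65.4, 3) == 2023, i.e. about three periods
//of C2; buffers much shorter than this make low notes unstable.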