35

I want to plot the pitch of a sound into a graph.

Currently I can plot the amplitude. The graph below is created by the data returned by getUnscaledAmplitude():

alt text

AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(new BufferedInputStream(new FileInputStream(file)));
byte[] bytes = new byte[(int) (audioInputStream.getFrameLength()) * (audioInputStream.getFormat().getFrameSize())];
audioInputStream.read(bytes);

// Get amplitude values for each audio channel in an array.
graphData = type.getUnscaledAmplitude(bytes, 1);


public int[][] getUnscaledAmplitude(byte[] eightBitByteArray, int nbChannels)
{
    int[][] toReturn = new int[nbChannels][eightBitByteArray.length / (2 * nbChannels)];
    int index = 0;

    for (int audioByte = 0; audioByte < eightBitByteArray.length;)
    {
        for (int channel = 0; channel < nbChannels; channel++)
        {
            // Do the byte to sample conversion.
            int low = (int) eightBitByteArray[audioByte];
            audioByte++;
            int high = (int) eightBitByteArray[audioByte];
            audioByte++;
            int sample = (high << 8) + (low & 0x00ff);

            toReturn[channel][index] = sample;
        }
        index++;
    }

    return toReturn;
}

But I need to show the audio's pitch, not amplitude. Fast Fourier transform appears to get the pitch, but it needs to know more variables than the raw bytes I have, and is very complex and mathematical.

Is there a way I can do this?

Amy B
  • 17,377
  • 12
  • 61
  • 81
  • 3
    You want to get frequency-domain information, but you don't want to use *THE* method of obtaining it? – Oliver Charlesworth Jan 16 '11 at 22:49
  • @Oli - I'm asking what the method is to obtain it, and said FFT doesn't appear to be the right method. You don't need to write aggressively. – Amy B Jan 16 '11 at 22:50
  • 1
    @Coronatus: Sorry, that wasn't meant to be aggressive, just intrigued. You appear to have rejected the FFT as an approach, but without really detailing what you see as its shortcomings... – Oliver Charlesworth Jan 16 '11 at 22:55
  • Why doesn't FT appear to be the right method? This is exactly what it is built to do. Your response doesn't make sense. – Hovercraft Full Of Eels Jan 16 '11 at 22:56
  • From the original post: `Fast Fourier transform appears to get the pitch, but it needs to know more variables than the raw bytes I have, and is very complex and mathematical.` – Amy B Jan 16 '11 at 22:58
  • 1
    Yes it is very complex and mathematical but it is what you need, and if you read the Wikipedia you'll see why it is crucial. Your job is to get your data to work with it. End of story. – Hovercraft Full Of Eels Jan 16 '11 at 22:59
  • 2
    If it's of any help, the FFT is really just an efficient way to implement the DFT (http://en.wikipedia.org/wiki/Discrete_Fourier_transform), which is a lot simpler to write (but runs in O(N^2) rather than O(N log N)). – Oliver Charlesworth Jan 16 '11 at 23:04
  • @Oli Charlesworth, your responses should be in an answer, not a comment, so they can be voted up as they have been eminently correct. – Hovercraft Full Of Eels Jan 16 '11 at 23:09
  • 1
    Do you need to display pitch or the frequency spectrum? The former is very tricky and its accuracy depends highly on what's offered on the input. The latter had been done countless times and FFT is the way to go. Check out http://www.dspdimension.com/admin/dft-a-pied/. Not about FFT but it teaches you the basics in simple language – Rob Vermeulen Jan 17 '11 at 12:10
  • what is graphData here in above code? – Shreyash Mahajan Jul 02 '13 at 07:15
  • What type is represented in type.getUnscaledAmplitude(bytes, 1); and where it been ceated? – eyal Dec 29 '14 at 09:05

3 Answers3

50

Frequency (an objective metric) is not the same as pitch (a subjective quantity). In general, pitch detection is a very tricky problem.

Assuming you just want to graph the frequency response for now, you have little choice but to use the FFT, as it is THE method to obtain the frequency response of time-domain data. (Well, there are other methods, such as the discrete cosine transform, but they're just as tricky to implement, and more tricky to interpret).

If you're struggling with the implementation of the FFT, note that it's really just an efficient algorithm for calculating the discrete Fourier transform (DFT); see http://en.wikipedia.org/wiki/Discrete_Fourier_transform. The basic DFT algorithm is much easier (just two nested loops), but runs a lot slower (O(N^2) rather than O(N log N)).

If you wish to do anything more complex than simply plotting frequency content (like pitch detection, or windowing (as others have suggested)), I'm afraid you are going to have learn what the maths means.

Oliver Charlesworth
  • 252,669
  • 29
  • 530
  • 650
24

Fast Fourier Transform doesn't need to know more then the input bytes you have. Don't be scared off by the Wikipedia article. An FFT algorithm will take your input signal (with the common FFT algorithms the number of samples is required to be a power of 2, e.g. 256, 512, 1024) and return a vector of complex numbers with the same size. Because your input is real, not complex, (imaginary portion set to zero) the returned vector will be symmetric. Only half of it will contain data. Since you do not care about the phase you can simply take the magnitude of the complex numbers, which is sqrt(a^2+b^2). Just taking the absoulte value of a complex number may also work, in some languages this is equivalent to the previous expression.

There are Java implementations of FFT available, e.g.: http://www.cs.princeton.edu/introcs/97data/FFT.java.html

Pseudo code will look something like:

Complex in[1024];
Complex out[1024];
Copy your signal into in
FFT(in, out)
for every member of out compute sqrt(a^2+b^2)
To find frequency with highest power scan for the maximum value in the first 512 points in out

The output will contain entires for frequencies between zero and half your sampling frequency.

Since FFT assumes a repeating signal you may want to apply a window to your input signal. But don't worry about this at first.

You can find more information on the web, e.g.: FFT for beginners

Also as Oli notes when multiple frequencies are present the perceived pitch is a more complex phenomenon.

Paul R
  • 195,989
  • 32
  • 353
  • 519
Guy Sirton
  • 7,981
  • 1
  • 24
  • 35
  • 1
    Note that *frequency* (an objective metric) is not the same as *pitch* (a subjective quantity). In general, pitch detection is a very tricky problem. – Oliver Charlesworth Jan 16 '11 at 23:05
  • That's correct but obtaining a power spectrum is probably the first step anyways. The perceived pitch may not be the frequency with the most power and some more work would be required. – Guy Sirton Jan 16 '11 at 23:08
  • Why does output contain entries for frequencies between zero and half your sampling frequency, not the entire sampling frequency? – Zen Feb 23 '17 at 02:13
  • 1
    @Zen That's to do with Nyquist limit ( https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem ). For a given sample rate you can only distinguish frequencies up to half the sample rate. Typically you want to make sure the signal you're processing doesn't contain higher frequencies by pre-filtering it before you sample. One extreme example is that a signal that is a sine wave at the sampling frequency will look like a DC signal since you will sample it at the same phase. That's aliasing, the higher frequency created an alias at the DC frequency (0Hz). – Guy Sirton Feb 23 '17 at 06:52
2

There are several other questions on stackoverflow about this problem. Maybe these will help.

Instead, you could try to find a copy of Digital Audio with Java by Craig Lindley. I don't think it's in print anymore, but the copy on my desk has a section on the FFT and also a sample application of a guitar tuner.

Community
  • 1
  • 1
Stewart Murrie
  • 1,301
  • 7
  • 12