
OK, what I'm trying to do is a kind of audio processing software that can detect a prevalent frequency, and if that frequency is played for long enough (a few ms) I know I have a positive match. I know I would need to use FFT or something similar, but in this field of math I suck. I did search the internet but did not find code that could do only this.

The goal I'm trying to achieve is to make myself a custom protocol to send data through sound. I need a very low bitrate (5-10 bps), but I'm also very limited on the transmitting end, so the receiving software will need to be custom (I can't use an actual hardware/software modem). I also want this to be software only (no additional hardware except the sound card).

Thanks a lot for the help.

MatijaG
    This may be helpful (be sure to read the replies): http://www.keyongtech.com/5003865-frequency-analysis-without-numpy – ChristopheD Apr 15 '10 at 19:07

4 Answers


The aubio libraries have been wrapped with SWIG and can thus be used from Python. Their many features include several methods for pitch detection/estimation, including the YIN algorithm and some harmonic comb algorithms.

However, if you want something simpler, I wrote some code for pitch estimation some time ago and you can take it or leave it. It won't be as accurate as using the algorithms in aubio, but it might be good enough for your needs. I basically just took the FFT of the data times a window (a Blackman window in this case), squared the FFT values, found the bin with the highest value, and used quadratic interpolation around the peak, using the log of the max value and its two neighboring values, to find the fundamental frequency. I took the quadratic interpolation from a paper that I found.

It works fairly well on test tones, but it will not be as robust or as accurate as the other methods mentioned above. The accuracy can be increased by increasing the chunk size (or reduced by decreasing it). The chunk size should be a power of 2 to make full use of the FFT. Also, I am only determining the fundamental pitch for each chunk, with no overlap. I used PyAudio to play the sound through while writing out the estimated pitch.

Source Code:

# Read in a WAV and find the freq's
import pyaudio
import wave
import struct
import numpy as np

chunk = 2048

# open up a wave (a mono file is assumed)
wf = wave.open('test-tones/440hz.wav', 'rb')
swidth = wf.getsampwidth()
RATE = wf.getframerate()
# use a Blackman window
window = np.blackman(chunk)
# open stream
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(swidth),
                channels=wf.getnchannels(),
                rate=RATE,
                output=True)

# read some data
data = wf.readframes(chunk)
# play stream and find the frequency of each chunk
while len(data) == chunk * swidth:
    # write data out to the audio stream
    stream.write(data)
    # unpack the data and multiply by the Blackman window
    indata = np.array(struct.unpack("%dh" % (len(data) // swidth),
                                    data)) * window
    # take the FFT and square each value
    fftData = abs(np.fft.rfft(indata)) ** 2
    # find the maximum, skipping bin 0 (the DC offset)
    which = fftData[1:].argmax() + 1
    # use quadratic interpolation around the max
    if which != len(fftData) - 1:
        y0, y1, y2 = np.log(fftData[which-1:which+2])
        x1 = (y2 - y0) * .5 / (2 * y1 - y2 - y0)
        # find the frequency and output it
        thefreq = (which + x1) * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    else:
        thefreq = which * RATE / chunk
        print("The freq is %f Hz." % thefreq)
    # read some more data
    data = wf.readframes(chunk)
if data:
    stream.write(data)
stream.close()
p.terminate()
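The per-chunk pipeline above (window, FFT, squared magnitude, peak bin, log-parabolic interpolation) can also be condensed into a standalone function that estimates the pitch of a single block, which may be easier to drop into a detector loop. This is a sketch; the function name and the 440 Hz test tone are mine, not from the original answer:

```python
import numpy as np

def interpolated_peak_hz(samples, rate):
    """Estimate the dominant frequency of one block: Blackman window,
    squared FFT magnitude, then log-parabolic interpolation at the peak."""
    n = len(samples)
    power = np.abs(np.fft.rfft(samples * np.blackman(n))) ** 2
    k = power[1:].argmax() + 1          # skip bin 0 (the DC offset)
    if k == len(power) - 1:             # peak at the last bin: no right neighbor
        return k * rate / n
    y0, y1, y2 = np.log(power[k-1:k+2])
    offset = (y2 - y0) * 0.5 / (2 * y1 - y2 - y0)
    return (k + offset) * rate / n

# a synthetic 440 Hz test tone, one 4096-sample block at 44.1 kHz
rate = 44100
t = np.arange(4096) / rate
tone = np.sin(2 * np.pi * 440.0 * t)
print("%.1f Hz" % interpolated_peak_hz(tone, rate))
```

With a 4096-sample block the raw bin spacing is about 10.8 Hz; the interpolation is what brings the estimate close to the true 440 Hz.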
Justin Peel
  • wow great thanks, this looks like it will do. Now I only gotta figure out how to read the audio in real time from the audio input (microphone) – MatijaG Apr 15 '10 at 23:56
  • 2
    Go the PyAudio site http://people.csail.mit.edu/hubert/pyaudio/ and scroll down the page to the examples. You'll see some that take input from the microphone. – Justin Peel Apr 16 '10 at 00:13
  • uhm, can you help me figure out why this error is happening: "need more than 0 values to unpack" on the following line: "y0,y1,y2 = np.log(fftData[which-1:which+2:])" – MatijaG Apr 16 '10 at 14:51
  • Yeah, that was kind of buggy there. I've fixed it. The problem was that if which was equal to 0 or the last index of fftData, then the slice wouldn't return 3 values. We don't want the value in the 0 bin of fftData anyway (it is the DC offset). – Justin Peel Apr 16 '10 at 15:10
  • Would it have been possible to use np.fft.fftfreq to get the frequencies instead of having to do the conversion and interpolation yourself? – ad rees Mar 30 '11 at 08:40
  • @ad rees, it doesn't look like np.fft.fftfreq does what we want. – Justin Peel Mar 30 '11 at 15:03
  • @JustinPeel I want to compare two audios, can you help me please? – ehsandotnet Jul 07 '13 at 12:51
  • Someone know how to get fundamental frequency from the microphone audio input? – GiacomoLicari Nov 07 '14 at 10:57
  • 1
    Why is my length of the data always double the chunk*sample width? I cant enter the loop. – Liam Larsen Feb 27 '20 at 02:56

If you're going to use FSK (frequency shift keying) for encoding data, you're probably better off using the Goertzel algorithm so you can check just the frequencies you want, instead of a full DFT/FFT.
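To make the suggestion concrete, here is a minimal sketch of the Goertzel algorithm, which computes the power at a single DFT bin in O(N) without a full FFT. The function name and the 440 Hz test tone are my own illustration, not from the answer:

```python
import math

def goertzel_power(samples, sample_rate, target_freq):
    """Squared magnitude of `samples` at the DFT bin nearest
    `target_freq`, via the Goertzel recurrence (a single-bin DFT)."""
    n = len(samples)
    k = int(0.5 + n * target_freq / sample_rate)  # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

# a 50 ms block of a 440 Hz tone at 8 kHz: power at 440 Hz
# dominates power at an off-target 1000 Hz bin
sr = 8000
tone = [math.sin(2 * math.pi * 440 * i / sr) for i in range(400)]
print(goertzel_power(tone, sr, 440) > goertzel_power(tone, sr, 1000))  # True
```

For an FSK receiver you would run one Goertzel filter per mark/space frequency on each block and compare the two powers, which is far cheaper than an FFT when only a handful of frequencies matter.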

Guilherme

You can find the frequency spectrum of sliding windows over your sound from here, and then check for the presence of the prevalent frequency band by finding the area under the frequency spectrum curve for that band, from here.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import auc
np.random.seed(0)

# Sine sample with a frequency of 5 Hz plus some noise
sr = 32  # sampling rate
y = np.linspace(0, 5 * 2*np.pi, sr)
y = np.tile(np.sin(y), 5)
y += np.random.normal(0, 1, y.shape)
t = np.arange(len(y)) / float(sr)

# Generate frequency spectrum
spectrum, freqs, _ = plt.magnitude_spectrum(y, sr)

# Calculate percentage for a frequency range 
lower_frq, upper_frq = 4, 6
ind_band = np.where((freqs > lower_frq) & (freqs < upper_frq))
plt.fill_between(freqs[ind_band], spectrum[ind_band], color='red', alpha=0.6)
frq_band_perc = auc(freqs[ind_band], spectrum[ind_band]) / auc(freqs, spectrum)
print('{:.1%}'.format(frq_band_perc))
# 19.8%

(figure: magnitude spectrum of the noisy 5 Hz signal, with the 4-6 Hz band shaded in red)

Reveille

While I haven't tried audio processing with Python before, perhaps you could build something based on SciPy (and the NumPy library it builds on), a framework for efficient scientific/engineering numerical computation? You might start by looking at scipy.fftpack for your FFT.
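As a sketch of that starting point, the dominant frequency of a block can be read straight off the FFT peak; this uses NumPy's real FFT helpers (scipy.fftpack works similarly), and the 440 Hz test tone is my own example:

```python
import numpy as np

# one second of a synthetic 440 Hz test tone at 8 kHz
sr = 8000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440.0 * t)

# magnitude spectrum; rfftfreq maps each bin to its frequency in Hz
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
peak = freqs[spectrum.argmax()]
print("Dominant frequency: %.1f Hz" % peak)  # -> Dominant frequency: 440.0 Hz
```

With one full second of samples the bin spacing is exactly 1 Hz, so the peak bin lands on 440 Hz with no interpolation needed; shorter blocks would need the interpolation trick from the accepted answer.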

Karmastan
  • 1
    ok i found this http://www.swharden.com/blog/2010-03-05-realtime-fft-graph-of-audio-wav-file-or-microphone-input-with-python-scipy-and-wckgraph/ tho now i wonder how will i find the freq range that is at the highest (also the SciPy helped a bit thanks – MatijaG Apr 15 '10 at 19:40
  • 1
    so did you figure out how to do this? – kRazzy R Dec 05 '17 at 18:10