Is it possible to see if two MP3 files are the same song by analyzing the files' bytes?

Question

This is to be done in C++ or C. I know we can read the MP3s' metadata, but that information can be changed by anyone, can't it? So is there a way to analyze a file's contents, compare it against another file, and determine whether it is in fact the same song?

Edit: Lots of interesting things came up that I hadn't thought of. Not at all a good idea to attempt this.

You might find http://www.codinghorror.com/blog/2010/09/youtube-vs-fair-use.html to be of interest when contemplating these issues. — Brian, Dec 20 '10 at 15:22

score 13 · Accepted Answer · answered Dec 19 '10 at 12:29

It's possible, but very hard.

Even the same original recording may well be encoded differently by different MP3 encoders or the same encoder with different settings... leading to different results when the MP3 is then decoded. You'd need to work out an aural model to "understand" how big the differences are, and make a judgement.

Then there's the matter of different recordings. If I sing "Once in Royal David's City" and Aled Jones sings it, are those the same song? What if there are two different versions of a song where one has slightly modified lyrics? The key could be different, it could be in a different vocal range - all kinds of things.

How different can two songs be but still count as "the same song"? Once you've decided that, then there's the small matter of implementing it ;)

+1 for "very hard" (it's quite possibly an extant research problem, at least in the general case) — Stuart Golodetz, Dec 19 '10 at 12:58

score 5 · Answer 2 · answered Dec 19 '10 at 13:13


If I really had to do this, my first attempt would be to decode both songs to PCM, take a Fourier transform of each, and compare the resulting magnitude spectra (frequency histograms). You can use FFTW (http://www.fftw.org/) to take the Fourier transform, and then compare the spectra by summing the squares of the differences at each frequency. If the resulting sum is greater than some threshold (which you must determine by experimentation), the songs are deemed different; otherwise they are deemed the same.
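A minimal sketch of that comparison, under some loud assumptions: both files have already been decoded to mono PCM of equal length, and a naive O(n^2) DFT stands in for FFTW so the snippet has no dependencies (for real tracks you would swap in an FFT). The function names and the threshold parameter are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Magnitude spectrum of a real signal via a naive O(n^2) DFT.
// Only bins 0..n/2 are returned, because a real input has a symmetric spectrum.
std::vector<double> magnitude_spectrum(const std::vector<double>& pcm) {
    const std::size_t n = pcm.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<double> mags(n / 2 + 1);
    for (std::size_t k = 0; k < mags.size(); ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += pcm[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        mags[k] = std::abs(sum) / static_cast<double>(n);
    }
    return mags;
}

// Sum of squared differences between two spectra of equal length.
double spectral_distance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double total = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        total += d * d;
    }
    return total;
}

// Hypothetical decision rule: the threshold has to be found by experiment.
bool probably_same_recording(const std::vector<double>& pcm_a,
                             const std::vector<double>& pcm_b,
                             double threshold) {
    return spectral_distance(magnitude_spectrum(pcm_a),
                             magnitude_spectrum(pcm_b)) <= threshold;
}
```

Because only magnitudes are compared, a small time offset between the two files matters less than it would for a sample-by-sample diff, but a change of key or tempo will still defeat it.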

answered Dec 19 '10 at 13:13 by jstanley

@user547703: This sounds like a promising starting point, but it won't handle the general case -- for example, what if the two recordings are sung in a different key, or at a different tempo? – Stuart Golodetz Dec 19 '10 at 13:17
+1 for the brute force and quite simple but not so detailed: "Use Fourier" approach `:)` – rubenvb Dec 19 '10 at 13:18

@Stuart: Might we assume that @robinsonc494 meant "same song" in the *iTunes* sense of the phrase, meaning *same recording*? – Clifford Dec 19 '10 at 13:38
@Clifford: It's not clear what was meant, but I'm happy to up-vote. I think the general case problem is a lot more interesting, though, for what it's worth. – Stuart Golodetz Dec 19 '10 at 13:40
Note that if the source files are the same samplerate, you can skip doing any Fourier transform and just decode the mp3 bitstream without performing IMDCT on it. Then you have frequency-domain data for free. – R.. GitHub STOP HELPING ICE Dec 19 '10 at 14:05
Reading all the responses on here I've realised how flawed that approach would be... how do media players determine how to get the image thumbnail if none is available? Or on YouTube, when you're watching a vid that has a song in the background, they display an ad to get the song... – zcourts Dec 19 '10 at 22:31

score 2 · Answer 3 · edited Dec 27 '10 at 12:06


No. Not SO simple.

You can check they contain the same encoded data, BUT:

Could be a different bitrate
Could be the same song, just 1/100th of a second off

In both cases the bytes would not match.

Basically, if a solution looks too simple to be true, it often is.

edited Dec 27 '10 at 12:06 by aib · answered Dec 19 '10 at 12:30 by TomTom

score 1 · Answer 4 · answered Dec 19 '10 at 13:59

If you mean "same song" in the iTunes sense of "same recording", it would be possible to compare two audio files, but not by a byte-by-byte comparison of the encoded files, since even for the same format there are variables such as data rate and compression that are selected at the time of encoding.

Also each encoding of the same recording may include different lead-in/lead-out timings, different amplitude and equalisation, and may have come from differing original sources (vinyl, CD, original master etc.). So you need a comparison method that takes all these variables into account, and even then you will end up with a 'likelihood' of a match rather than a definitive match.

If you genuinely mean "same song", i.e. any recording by any artist of the same composition and lyrics, then you are unlikely to get a high statistical correlation in most cases since pitch, tempo, range, instrumental arrangement will be very different.

In the "same recording" scenario, relatively simple signal processing and statistical techniques could be applied. In the "same song" scenario, AI techniques would need to be deployed, and even then I suspect the results would be poor.
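For the "same recording" case, one simple technique of that kind is normalized cross-correlation over a range of time offsets. This is only a sketch under assumptions: both inputs are decoded mono PCM at the same sample rate, and the function names and `max_lag` parameter are hypothetical. It tolerates different lead-in timing and overall amplitude, but not equalisation or a different source master.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between a[i] and b[i - lag] over the overlapping region.
// The result lies in [-1, 1]; 0 is returned when there is no overlap or no energy.
double correlation_at_lag(const std::vector<double>& a,
                          const std::vector<double>& b, long lag) {
    double dot = 0.0, energy_a = 0.0, energy_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const long j = static_cast<long>(i) - lag;
        if (j < 0 || j >= static_cast<long>(b.size())) continue;
        const double x = a[i];
        const double y = b[static_cast<std::size_t>(j)];
        dot += x * y;
        energy_a += x * x;
        energy_b += y * y;
    }
    const double denom = std::sqrt(energy_a * energy_b);
    return denom > 0.0 ? dot / denom : 0.0;
}

// Best score over all lags in [-max_lag, max_lag].
// A value near 1 is a likelihood of a match, not a proof.
double best_match_score(const std::vector<double>& a,
                        const std::vector<double>& b, long max_lag) {
    double best = -1.0;
    for (long lag = -max_lag; lag <= max_lag; ++lag) {
        best = std::max(best, correlation_at_lag(a, b, lag));
    }
    return best;
}
```

The brute-force lag search is quadratic, so on full-length tracks you would probably run it on a short excerpt or a downsampled signal.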

score 1 · Answer 5 · answered Dec 27 '10 at 11:32

If you want to compare MP3 files that originated from the same MP3 but have been tagged with different metadata, it would be straightforward to compare just the actual audio data. Since they originated from the same MP3 encoding, you should be able to do a byte-by-byte comparison. You would not have to compare every byte; it should be sufficient to sample just a few to get a unique key that would be statistically almost impossible to find in another song.
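As a rough sketch of the "same encoding, different tags" case: skip a leading ID3v2 tag and a trailing ID3v1 tag, then compare what is left. The helper names are hypothetical, the file contents are assumed to be already loaded into memory, and other tag formats (APEv2, Lyrics3) or encoder info frames are not handled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Returns the [begin, end) byte range that remains after skipping a leading
// ID3v2 tag and a trailing ID3v1 tag, if present.
std::pair<std::size_t, std::size_t> audio_range(const Bytes& file) {
    std::size_t begin = 0;
    std::size_t end = file.size();
    // ID3v2 header: "ID3", two version bytes, one flags byte,
    // then a 4-byte "synchsafe" size (7 significant bits per byte).
    if (file.size() >= 10 && file[0] == 'I' && file[1] == 'D' && file[2] == '3') {
        const std::size_t size =
            (static_cast<std::size_t>(file[6] & 0x7F) << 21) |
            (static_cast<std::size_t>(file[7] & 0x7F) << 14) |
            (static_cast<std::size_t>(file[8] & 0x7F) << 7) |
            static_cast<std::size_t>(file[9] & 0x7F);
        const std::size_t footer = (file[5] & 0x10) ? 10 : 0;  // optional v2.4 footer
        begin = std::min(10 + size + footer, file.size());
    }
    // ID3v1 tag: the last 128 bytes, starting with "TAG".
    if (end - begin >= 128 && file[end - 128] == 'T' &&
        file[end - 127] == 'A' && file[end - 126] == 'G') {
        end -= 128;
    }
    return {begin, end};
}

// True when the two files carry identical bytes outside their ID3 tags.
bool same_audio_bytes(const Bytes& a, const Bytes& b) {
    const auto ra = audio_range(a);
    const auto rb = audio_range(b);
    if (ra.second - ra.first != rb.second - rb.first) return false;
    return std::equal(a.begin() + static_cast<std::ptrdiff_t>(ra.first),
                      a.begin() + static_cast<std::ptrdiff_t>(ra.second),
                      b.begin() + static_cast<std::ptrdiff_t>(rb.first));
}
```

Hashing the returned range instead of comparing it directly would give the "unique key" mentioned above, which is handier when checking one file against a whole library.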

If the files have been produced by different encoders, you would have to extract some "fuzzy" feature keys from the data and compare those keys. In a hurry I would probably construct an algorithm like this:

1. Decode the audio to pulse-code modulation (wave) at a standard sample rate.
2. Find a fixed number of feature starting points using some dynamic location algorithm. For example, find the top 10 highest wave peaks ordered from the beginning of the wave, or simply spread the points evenly across the wave (it would be a good idea to fix the first and last positions dynamically though, since different encodings might not start and end at exactly the same point). An improvement would be to select feature points at positions in the wave that are not likely to be too repetitive.
3. Extract a set of one-dimensional feature key scalars from the feature points. For example, for each feature point, normalize the following n samples and compute the number of zero crossings, the peak-to-average ratio, the mean zero-crossing distance, and the signal energy. The goal is to extract robust features that are relatively unique, while still characteristic even if some noise and distortion is added to the signal. This can obviously be improved almost infinitely.
4. Compare the extracted feature keys of the two files using some accuracy measurement (e.g. 9 out of 10 feature extractions must match at least 99% on 4 out of 5 of their extracted feature keys).
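The feature-key step above could be sketched roughly like this. It assumes decoded mono PCM and a feature point that has already been chosen; the struct, the function names and the tolerance rule are all hypothetical, and only three of the suggested scalars are computed. The matching rule is simplified to "every scalar within a relative tolerance" rather than the 4-out-of-5 vote described above.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Scalars describing one window of samples (illustrative, not a standard set).
struct WindowFeatures {
    double zero_crossings = 0.0;   // number of sign changes in the window
    double peak_to_average = 0.0;  // peak magnitude divided by mean magnitude
    double energy = 0.0;           // mean squared sample after peak normalization
};

// Extracts features from pcm[start, start + length), clamped to the signal end.
WindowFeatures extract_features(const std::vector<double>& pcm,
                                std::size_t start, std::size_t length) {
    WindowFeatures f;
    if (start >= pcm.size()) return f;
    const std::size_t end = std::min(pcm.size(), start + length);
    double peak = 0.0;
    for (std::size_t i = start; i < end; ++i) peak = std::max(peak, std::abs(pcm[i]));
    if (peak == 0.0) return f;  // silent window: leave all features at zero

    double sum_abs = 0.0, sum_sq = 0.0;
    for (std::size_t i = start; i < end; ++i) {
        const double x = pcm[i] / peak;  // normalize so overall volume drops out
        sum_abs += std::abs(x);
        sum_sq += x * x;
        if (i > start && ((pcm[i - 1] < 0.0) != (pcm[i] < 0.0))) f.zero_crossings += 1.0;
    }
    const double count = static_cast<double>(end - start);
    f.peak_to_average = count / sum_abs;  // normalized peak is 1, so 1 / mean(|x|)
    f.energy = sum_sq / count;
    return f;
}

// Simplified match rule: every scalar must agree within a relative tolerance.
bool features_match(const WindowFeatures& a, const WindowFeatures& b, double tolerance) {
    const auto close = [tolerance](double p, double q) {
        const double scale = std::max(std::abs(p), std::abs(q));
        return scale == 0.0 || std::abs(p - q) <= tolerance * scale;
    };
    return close(a.zero_crossings, b.zero_crossings) &&
           close(a.peak_to_average, b.peak_to_average) &&
           close(a.energy, b.energy);
}
```

Normalizing by the window peak is what makes a quieter copy of the same audio produce the same keys; noise, resampling and lossy encoding will still shift them a little, hence the tolerance.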

The benefit of a feature extraction approach is that you can build a database of features for all your MP3 files and, for a single file, ask the question: what other media files have exactly or almost exactly the same features as this one? The feature lookup could be implemented very efficiently with R*-trees or similar, which could be used to give you a fast distance measurement between the n-dimensional feature sets.

The above technique is essentially a variant of what is used in image search algorithms such as SIFT, which is probably the basis of applications such as Photosynth and Google Goggles. In image searching you filter the image for good candidate points for relatively unique features (such as corners of shapes), then you normalize the area around each feature to get normalized color, intensity, scale and direction. Finally you extract the features, search an n-dimensional database of features of other images, and verify that the features found in other images are geometrically positioned in the same pattern as in your search image. The technique for searching audio would be the same, only simpler, since audio is one-dimensional.

score 0 · Answer 6 · answered Apr 28 '18 at 20:41

I think the Fast Fourier Transform (FFT) approach hinted at by jstanley is pretty good for most use cases; in particular, it works for verifying that the two files are the same release / same recording by the same artist / same bitrate / same audio quality.

To be more explicit, sox and spek (via command line and GUI, respectively) can do this pretty painlessly.

Spek is pretty foolproof -- just open the software and point it to the two audio files in question.

sox can generate spectrograms (computed with FFTs) from the command line like so: sox "$file" -n spectrogram -o "$outfile".

Either tool gives you two images; if they look basically identical, then for almost all intents and purposes the two songs are equivalent.

For example, I wanted to test if these two files:

Soundtrack to an imaginary film mixtape 2011.mp3
DJRUM - Sountrack to an imaginary film mixtape 2011 (for mary-anne hobbs).mp3

were the same. diff reported a difference in the binary files (perhaps due to metadata differences or minor encoding differences), but a quick glance at their spectrograms resolved it:

score 0 · Answer 7 · answered Jul 05 '11 at 02:19

Use the open source EchoPrint library to create a signature of the two audio files, and compare them with each other.

The library is very easy to use, and has clear examples on how to create the signatures.

http://echoprint.me/

You can even query their database with the signature and find matching song metadata (such as title, artist, etc).
