This is a two-part question. First, is it possible to access the audio data in an MP3 independently of the ID3 tags? Second, is there a way to do so using available libraries?

I recently consolidated my music collection from 3 computers and ended up with songs which had changed ID3 tags, but the audio data itself was unmodified. Running a search for duplicate files failed because the file changed with the ID3 tag change, but I think it should be possible to identify duplicate files if I just run a deduplication using the audio data for comparison.

I know that it's possible to seek to a position past the ID3 header in the file and read the data directly, but I was wondering if there's a library that would expose the audio data so I could extract it, run a checksum on it, store the computed result somewhere, and then look for identical checksums. (I'd probably have to use some kind of library anyway, since the headers are variable length.)

Kyle
  • Similar questions, since there seems to be no 'link to a different question' box: Reading ID3 tags - http://stackoverflow.com/questions/1645803/how-to-read-mp3-file-tags (Consensus: Use a library) Tag readers for Java - http://stackoverflow.com/questions/73147/i-need-an-id3-tag-reader-library-for-java-preferably-a-fast-one http://stackoverflow.com/questions/86083/any-good-recommendations-for-mp3-sound-libraries-for-java http://stackoverflow.com/questions/278612/java-mp3-audio-editing-trimming-library – Kyle May 29 '10 at 05:18
  • Also, this seems to be my best bet, as far as I can tell: http://stackoverflow.com/questions/476227/detect-duplicate-mp3-files-with-different-bitrates-and-or-different-id3-tags – Kyle May 29 '10 at 05:19

1 Answer

Coincidentally I wanted to do something similar the other day.

Here is a Ruby script that I whipped up:

http://code.google.com/p/kodebucket/source/browse/trunk/bin/mp3dump.rb

It dumps mpeg frames to stdout, so one could grab a checksum like so:

# mp3dump.rb file.mp3 | md5sum
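
For the simpler case where only the tags differ, a rough alternative to dumping frames is to skip the tag regions and hash whatever is left. The sketch below is not the linked Ruby script; it is a minimal Python illustration that assumes an ID3v2 tag (if any) sits at the very start of the file and an ID3v1 tag (if any) occupies the last 128 bytes. The names `audio_span` and `audio_checksum` are made up for this example, and other tag formats (APE, ID3v2 tags appended at the end) are not handled.

```python
import hashlib


def audio_span(data: bytes) -> bytes:
    """Return the bytes between a leading ID3v2 tag and a trailing ID3v1 tag."""
    start, end = 0, len(data)

    # ID3v2: "ID3", 2 version bytes, 1 flags byte, 4-byte syncsafe size
    # (7 useful bits per byte). The size excludes the 10-byte header.
    if len(data) >= 10 and data[:3] == b"ID3":
        size = 0
        for b in data[6:10]:
            size = (size << 7) | (b & 0x7F)
        start = 10 + size
        if data[5] & 0x10:  # footer-present flag adds another 10 bytes
            start += 10

    # ID3v1: fixed 128-byte block at the end of the file, starting with "TAG".
    if end - start >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128

    return data[start:end]


def audio_checksum(path: str) -> str:
    """MD5 of the file contents with the ID3 regions removed."""
    with open(path, "rb") as f:
        return hashlib.md5(audio_span(f.read())).hexdigest()
```

Two files whose `audio_checksum` values match should then contain the same audio bytes, whatever their tags say. Reading the whole file into memory keeps the sketch short; for very large files a streaming read would be kinder.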

user358390
  • Hmm. Kind of what I was looking for, though I have no clue what it's doing. I'll accept it, but I wouldn't mind an explanation of what it's doing. I'm presuming the unless sequence is filtering out the ID3 tags somehow, but can't tell how. A link to whatever doc you used to create this would be awesome. :) – Kyle Jun 06 '10 at 13:51
  • Yeah, it's probably a bit obfuscated; stream of consciousness coding... The gist of it: open an mp3 file; read 4 bytes; if the bytes we've read are a valid MPEG frame header, read the frame and send it to stdout; otherwise rewind 3 bytes and try again, until we reach the end of the file. I used the following MPEG frame resource: http://www.datavoyage.com/mpgscript/mpeghdr.htm – user358390 Jun 07 '10 at 18:43
  • This script has turned my MP3 to broken 54 KB. – Pavel Vlasov Aug 28 '12 at 21:13
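
The scan described in the comments above (read 4 bytes, test for a valid frame header, otherwise move forward one byte and retry) can be sketched roughly as follows. This is a Python illustration of the idea, not a port of mp3dump.rb. It assumes Layer III audio only and skips free-format streams; `frame_length` and `iter_frames` are hypothetical names for this example. The bitrate and sample-rate tables follow the MPEG audio frame header layout.

```python
# Layer III bitrates in kbps, indexed by (bitrate_index - 1).
BITRATES = {
    1: [32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],  # MPEG1
    2: [8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],      # MPEG2/2.5
}
# Sample rates in Hz, keyed by the 2-bit version field.
SAMPLE_RATES = {
    3: [44100, 48000, 32000],  # MPEG1
    2: [22050, 24000, 16000],  # MPEG2
    0: [11025, 12000, 8000],   # MPEG2.5
}


def frame_length(header: bytes) -> int:
    """Length in bytes of the Layer III frame this header starts, or 0 if invalid."""
    if len(header) < 4 or header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        return 0  # no 11-bit frame sync
    version = (header[1] >> 3) & 0x03  # 3 = MPEG1, 2 = MPEG2, 0 = MPEG2.5, 1 = reserved
    layer = (header[1] >> 1) & 0x03    # 1 = Layer III
    bitrate_idx = header[2] >> 4       # 0 = free format, 15 = invalid
    rate_idx = (header[2] >> 2) & 0x03  # 3 = reserved
    padding = (header[2] >> 1) & 0x01
    if version == 1 or layer != 1 or bitrate_idx in (0, 15) or rate_idx == 3:
        return 0
    kbps = BITRATES[1 if version == 3 else 2][bitrate_idx - 1]
    coeff = 144 if version == 3 else 72
    return coeff * kbps * 1000 // SAMPLE_RATES[version][rate_idx] + padding


def iter_frames(data: bytes):
    """Yield each complete frame found; step one byte forward on a mismatch."""
    pos = 0
    while pos + 4 <= len(data):
        n = frame_length(data[pos:pos + 4])
        if n and pos + n <= len(data):
            yield data[pos:pos + n]
            pos += n
        else:
            pos += 1
```

Feeding `b"".join(iter_frames(data))` to a hash function gives a checksum over the frames alone. One caveat with any byte-by-byte scan like this: bytes inside a tag (embedded cover art, for instance) can happen to look like a frame header, so skipping known tag regions before scanning is the safer order.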