I have almost six years of photos spread across Flickr, Facebook and Instagram, plus a local iPhoto library.

What would be the best way of programmatically figuring out which photos are missing from each of these services?

Some ideas I had:

  • Using an MD5 of the image thumbnail?
  • Comparing date / time timestamps?

I am looking for a way to generate a list of URLs / filenames which exist on one service but not on another.

I'm not fussy about the language used for the solution, as long as it runs on OS X.

Tom

3 Answers

Using an MD5 of the image thumbnail wouldn't necessarily work, as different services crop their thumbnails differently. They also compress their images differently, so an MD5 of the larger versions would not match either.

Unfortunately, services like Facebook also strip out all the EXIF data.

Here is one possible solution:

I bet you can break each image up into a 2x2 grid and get an average color for each grid cell. You'd have four scores per image. To judge similarity, you would just take the sum of squared differences between the corresponding scores of two images.

This is basically just taking the RGB average of an image four times, once per cell. Doing it four times helps account for rotation.
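A rough Python sketch of the grid idea, in case it helps. It assumes each image has already been decoded into rows of (R, G, B) tuples (for example with an imaging library such as Pillow, which is not shown here), and the names `grid_signature` and `distance` are made up for illustration. What counts as "similar enough" is a threshold you would have to tune on your own photos.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]            # (R, G, B), each 0-255
PixelGrid = Sequence[Sequence[Pixel]]   # rows of pixels, all the same width
Signature = List[Tuple[float, ...]]     # one average color per grid cell


def grid_signature(img: PixelGrid, grid: int = 2) -> Signature:
    """Average RGB color of each cell of a grid x grid split, row by row.

    Assumes the image is at least `grid` pixels wide and tall.
    """
    height, width = len(img), len(img[0])
    signature = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            n = len(cell)
            # Mean of each channel over the pixels in this cell.
            signature.append(tuple(sum(p[c] for p in cell) / n for c in range(3)))
    return signature


def distance(sig_a: Signature, sig_b: Signature) -> float:
    """Sum of squared differences over every cell and every channel."""
    return sum(
        (a - b) ** 2
        for cell_a, cell_b in zip(sig_a, sig_b)
        for a, b in zip(cell_a, cell_b)
    )
```

A distance of 0 means the four cell averages are identical; re-compressed copies of the same photo should give a small but non-zero value.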

For a simpler, faster and more robust analysis, I would also suggest the TinEye API.

If you want to write the similarity-compute algorithm yourself, look here for ideas:

Image fingerprint to compare similarity of many images

sambehera

I'll assume that you already know how to get the photos from each service via its API, and that the hard part of the problem is comparing the photos. Check out the following answers on SO for how to do that:

And if you don't mind paying for a web service that does it for you, you could try the Match Engine from TinEye.

Todd Chaffee

I think that maintaining a local, centralized database of your photos should be the starting point of your work. So, if you don't have such a database yet (or it's not up to date), you should start by downloading everything from all of your accounts.

This task shouldn't be too hard. There are several official and unofficial methods and tools for downloading entire accounts from these social networks.

  1. Facebook gives you a convenient zip file with all your images, wall posts etc. directly: just go to account settings and select download a copy of your data.
  2. Flickr has a nice tool called Bulkr to download all of your photos.
  3. Instagram doesn't seem to provide official tools for this task, but you can choose, for example, between Instagram Downloader and Instaport.
  4. iPhoto should already be synchronized.

Now that all of your photos are on your computer, you'll have to figure out which are identical, which are similar, and so forth. I think that this question should provide the solution to this problem.

Personally, I vote for this method, in the hope that pHash can be compiled under OS X. If pHash compiles and works, you can do a first pass with MD5, SHA1 or whatever to identify exact matches. If there is no exact match, you can then run pHash to see how close the two images are.
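The exact-match first pass could look roughly like the Python below. This is only a sketch: it assumes each service has been downloaded into its own folder, the function names are invented for illustration, and it covers the MD5 pass only. Whatever it reports as missing is a candidate for the second, perceptual comparison (pHash or similar), which is not implemented here.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file's bytes, read in chunks so large photos stay out of memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_md5(root: Path) -> Dict[str, Path]:
    """Map digest -> path for every file under root (duplicates collapse to one entry)."""
    return {file_md5(p): p for p in sorted(root.rglob("*")) if p.is_file()}


def missing_from(source: Path, target: Path) -> List[Path]:
    """Files under source that have no byte-identical copy anywhere under target."""
    target_hashes = set(index_by_md5(target))
    return [p for h, p in index_by_md5(source).items() if h not in target_hashes]
```

For example, `missing_from(Path("flickr"), Path("facebook"))` (hypothetical folder names) would list the Flickr downloads with no byte-identical file in the Facebook export. Since the services re-compress uploads, expect most files to fall through to the perceptual pass.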

Given enough time, I could script everything in bash under Linux. I suppose this would also work under Mac OS X, but you can probably achieve the same result with even less coding in Cocoa.

Once you have found which photos are missing from a given service, you can finally push them to that service. But I suppose that's where another question starts :)

Avio