
I'm working on an app (Rails 3.2, Mongoid) that takes all (by all I mean 'a lot') photos from a user. First, we get the albums (<userid>/albums), then, for each album, we take the photos in it (<albumid>/photos), then, the user photos (<userid>/photos). Then we do some operations on each photo, and save them to a DB.

Right now, as a safety net, for each 'orphan' photo, I check that the photo is not already present, to avoid duplicates. The check is currently done in the DB with an exists query. However, this makes a lot of DB queries, which is not acceptable. I tried doing it server side instead, using an array to keep track of the photos, but it was much slower (I used Array#select, if I remember correctly).
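Array#select scans the whole array on every lookup, so checking n photos against an array costs roughly n² comparisons. A Ruby Set is hash-backed, so each membership check is a single lookup. A minimal sketch, assuming each photo is a hash with an "id" key as returned by the Graph API; `dedupe_photos` and its `existing_ids` parameter are hypothetical names. `existing_ids` stands for ids already stored in the DB, fetched with one query up front (the exact Mongoid call depends on your model) instead of one exists query per photo.

```ruby
require 'set'

# Hypothetical helper: merges album photos and user photos, keeping the first
# occurrence of each id and skipping ids already stored in the database.
def dedupe_photos(album_photos, user_photos, existing_ids = [])
  seen = Set.new(existing_ids)
  unique = []
  (album_photos + user_photos).each do |photo|
    # Set#add? returns nil when the id is already in the set.
    unique << photo if seen.add?(photo["id"])
  end
  unique
end

album_photos = [{ "id" => "1" }, { "id" => "2" }]
user_photos  = [{ "id" => "2" }, { "id" => "3" }]
p dedupe_photos(album_photos, user_photos).map { |ph| ph["id"] } # prints ["1", "2", "3"]
```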

So, two questions:

1/ Is this 'safety net' useful, or can I take for granted that the orphan photos cannot be duplicates? I'm thinking yes, but I suspect the check is there for a reason.

2/ If I do have to check, to be sure there are no duplicates in the database, how should I do it efficiently?

EDIT

Ok, it looks like there truly can be duplicates, so question 1/ is solved. Now about 2/: is it possible to "guess" from the retrieved photo fields whether a photo belongs to an album, even though there's no field like album_id? As in: "if the photo is not from the user and the user is tagged in it" -> orphan?

Thanks for your time!

ksol

3 Answers


Ok - we're gonna get our hands a little bit dirty and attempt to locate an orphan photo's album.

Disclaimer: these methods are subject to changes by Facebook that might not be announced.
I.e., we will be taking advantage of URI structures that mean nothing in the Graph API. Facebook might change these URIs while leaving the API unchanged, and would therefore not need to alert developers of any changes.

If you make a Graph API call to /me/PHOTO_ID, you'll get a response similar to this:

{
  "id": "101...", 
  "from": {
    "name": "Lix", 
    "id": "101..."
  }, 
  "name": "Carrot cake chocolate cake.", 
  "picture": "https://fbcdn-photos...jpg", 
  ...
  "link": "https://www.facebook.com/photo.php?fbid=101...&set=a.105...&type=1", 
  ...
}

I've stripped down that response so that we can talk specifically about the link property. As you can see, it is not a link that has anything to do with the API; it is a sort of permalink to the image within Facebook. If you navigate to that URL, you'll get the classic (non-lightbox) photo view.

Let's look at the link property a bit closer, this time with a real example:

https://www.facebook.com/photo.php?fbid=376995716728&set=a.376995711728.190761.20531316728&type=3

We can see it has 3 query parameters:

  1. fbid
  2. set
  3. type

As absurd as it sounds (and it's pretty absurd :), the fbid parameter is in fact the photo_id. The set parameter has the format a.{NUM}.{NUM}.{NUM}, and the first number after the a. is the album_id of that photo.
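The parsing described above can be sketched with Ruby's standard `uri` library. This is an illustration only: `parse_photo_link` is a hypothetical helper, and it depends on the undocumented photo.php URL format, so it returns nil for anything it cannot find rather than guessing.

```ruby
require 'uri'

# Hypothetical helper: pulls the photo id (fbid) and album id out of a photo's
# "link" field. Relies on Facebook's internal photo.php URL format, which may
# change without notice.
def parse_photo_link(link)
  query = URI.parse(link).query
  return { photo_id: nil, album_id: nil } if query.nil?

  params = URI.decode_www_form(query).to_h # e.g. {"fbid"=>"...", "set"=>"a....", "type"=>"3"}
  set = params["set"]
  # "set" looks like a.{album_id}.{NUM}.{NUM}; keep the first number after "a."
  album_id = (set && set.start_with?("a.")) ? set.split(".")[1] : nil
  { photo_id: params["fbid"], album_id: album_id }
end

link = "https://www.facebook.com/photo.php?fbid=376995716728&set=a.376995711728.190761.20531316728&type=3"
p parse_photo_link(link)
```

With the example link, this yields photo_id "376995716728" and album_id "376995711728".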

Now that you see the method, you can tell that it is vulnerable to change at any time. Facebook pushes updates to their UI all the time without having to announce them through a 90-day breaking-change notice... yadda, yadda, yadda. These are simply the URLs they use for internal navigation within the site. So, beware...

Lix
    The fact that this can go away any day is blocking, otherwise this is definitely an interesting way to go about it. – ksol Apr 15 '12 at 19:44
  • They haven't changed since I have been using their systems... but yes... in production code this would not be recommended. – Lix Apr 15 '12 at 19:45
    I don't think they will change it, at least without a warning -- after all, lots of links would be broken -- but yep, not really safe for production – ksol Apr 15 '12 at 19:47

Yes, you can :-)

From the documentation: Photo FQL table

For a given photo with id XXXXXX:

select owner,album_object_id from photo where object_id=XXXXXX
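To run that query from code rather than the explorer, a request URL can be built as below. This is a sketch under assumptions: it targets the Graph API's /fql endpoint (the same fql?q= path the explorer link in this answer uses), and `fql_album_lookup_url` is a hypothetical helper. The numeric check on the id keeps arbitrary text out of the FQL string.

```ruby
require 'uri'

# Hypothetical helper: builds a Graph API URL that runs the FQL lookup for one
# photo. Assumes the /fql endpoint and a token with user_photos or friends_photos.
def fql_album_lookup_url(photo_id, access_token)
  raise ArgumentError, "photo_id must be numeric" unless photo_id.to_s =~ /\A\d+\z/

  fql = "select owner,album_object_id from photo where object_id=#{photo_id}"
  "https://graph.facebook.com/fql?" + URI.encode_www_form(q: fql, access_token: access_token)
end

puts fql_album_lookup_url(123, "TOKEN")
```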

If you don't get anything back, you queried a photo that is out of your access token's reach (e.g. a photo belonging to someone else).

You need the user_photos permission to access the user's photos, or friends_photos to access photos belonging to the user's friends.

Otherwise, you should get the album id of that photo in album_object_id.
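Interpreting the result then comes down to the two cases just described: an empty result (photo not visible to the token) or a row carrying album_object_id. A minimal sketch, assuming the FQL rows come back wrapped as {"data": [...]}; `album_id_from_fql` is a hypothetical name.

```ruby
require 'json'

# Hypothetical helper: returns the album id as a string, or nil when the query
# returned no rows (photo outside the token's reach) or no album_object_id.
# Assumes the response body looks like {"data": [{"owner": ..., "album_object_id": ...}]}.
def album_id_from_fql(body)
  rows = JSON.parse(body).fetch("data", [])
  return nil if rows.empty?

  album = rows.first["album_object_id"]
  album.nil? ? nil : album.to_s
end

p album_id_from_fql('{"data":[{"owner":101,"album_object_id":105}]}') # prints "105"
p album_id_from_fql('{"data":[]}')                                      # prints nil
```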

When testing the query in the Graph API Explorer, make sure to click the Get Access Token button and check the user_photos permission.

P.S.

I have tested this on various photos just to be sure and all tests came back positive :-)

Link to test in Graph API explorer:

https://developers.facebook.com/tools/explorer/?method=GET&path=fql%3Fq%3Dselect%20owner%2Calbum_object_id%20from%20photo%20where%20object_id%3DXXXXXXXX 

(don't forget to replace XXXXXXXX with the photo id)

Roni

I'm not sure how much of your question involves detecting identical images. If that is indeed part of the issue you are addressing, then one sort-of brute-force approach might be:

Examine image dimensions and file size. If no other image has the same properties (height, width, file size), then the image cannot be a duplicate.

If two images may be duplicates, perform a pixel-by-pixel digital subtraction. If the result is zero everywhere, the images are duplicates.
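The two-stage check above can be sketched in plain Ruby. This assumes each image has already been decoded into a flat array of pixel values; the `Image` struct and `find_duplicates` are hypothetical, and decoding real files is left out.

```ruby
# Hypothetical record: dimensions, file size, and decoded pixel values.
Image = Struct.new(:name, :width, :height, :filesize, :pixels)

def find_duplicates(images)
  pairs = []
  # Stage 1: bucket by (width, height, filesize); a lone image is never a duplicate.
  images.group_by { |img| [img.width, img.height, img.filesize] }.each_value do |bucket|
    next if bucket.size < 2

    # Stage 2: subtract pixel by pixel; all-zero differences mean identical images.
    bucket.combination(2) do |a, b|
      same = a.pixels.size == b.pixels.size &&
             a.pixels.zip(b.pixels).all? { |pa, pb| (pa - pb).zero? }
      pairs << [a.name, b.name] if same
    end
  end
  pairs
end

imgs = [
  Image.new("a.jpg", 2, 1, 10, [0, 255]),
  Image.new("b.jpg", 2, 1, 10, [0, 255]),
  Image.new("c.jpg", 2, 1, 10, [0, 254]),
  Image.new("d.jpg", 1, 2, 10, [0, 255])
]
p find_duplicates(imgs) # prints [["a.jpg", "b.jpg"]]
```

Grouping first means the expensive pixel comparison only runs inside buckets of images that share all three cheap properties.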

Depending on your dataset, this might be a good combination of "Not too hard to implement" + "Not too processor intensive" + "Will always return an accurate result".

Many other, more elegant approaches exist. Some discussion can be seen here:
Image comparison - fast algorithm
and here:
Detecting image equality at different resolutions

If the issue you are discussing is more a question of "How can I determine the original PATH or FILE_LOCATION of a given image?", then I suppose you have to record the origin of each photo as it is imported.

Perry Horwich
  • This could be useful if the dataset was not large, and if we actually had the image data & the filesize. The Facebook API returns a few fields about a photo, but no filesize. And we don't download the actual picture on our servers, since we have the links to it – ksol Apr 15 '12 at 19:46
  • Oh. How do you, "... do some operations on each photo, and save them to a DB." without having the image data available? I suppose I misunderstood your initial question. – Perry Horwich Apr 15 '12 at 21:59
  • My mistake. I meant operations on the data we retrieve : renaming attributes to fit our models, deducing various statistics from the comments/likes/etc. That's what I meant :-) – ksol Apr 16 '12 at 07:15