16

I want to get the full history of my wall. But I seem to hit a limit somewhere back in June.

I do multiple calls like this:

SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 50

SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 51,100

and so on...

But I always end up on the same last (first) post on my wall. Through facebook.com I can go back much further, so Facebook obviously has the data.

Why am I not getting older posts? Is there another way to scrape my history?

sakibmoon
fdqps

9 Answers

12

From http://developers.facebook.com/docs/reference/fql/stream :

The stream table is limited to the last 30 days or 50 posts, whichever is greater

digital illusion
  • This answer is not entirely correct. The same reference also says: `however you can use time-specific fields such as created_time along with FQL operators (such as < or >) to retrieve a much greater range of posts.` So maybe the solution here is to **not** use `LIMIT` but to use the time parameters. – kongo09 Aug 23 '11 at 16:56
  • I'll add 30 days is usually much more than 100. – studgeek Apr 05 '12 at 21:03
  • I've tried using time parameters, I tried a 3 day period and it worked. I tried a 30 day period and it didn't work. It only gave me 14 posts from a 9 day period. – crunkchitis Jun 06 '12 at 15:42
9

I am experiencing the same thing. I don't understand it at all, but it appears that the offset has to stay below limit * 1.5.

Theoretically, this means that always increasing the limit to match the offset would fix it, but I haven't been able to verify this (I'm not sure whether the problems I'm seeing are other bugs in my code or if there are other limitations I don't understand about getting the stream).

Can anyone explain what I'm seeing and whatever I'm missing?

You can reproduce my results by going to the FQL Test Console:

http://developers.facebook.com/docs/reference/rest/fql.query

pasting in this query:

SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id 
FROM stream 
WHERE filter_key IN 
(
      SELECT filter_key 
      FROM stream_filter 
      WHERE uid=me() AND type='newsfeed'
) 
AND is_hidden = 0 limit 100 offset 150

When you click "Test Method" you will see one of the 2 results I am getting:

  1. The results come back: [{post_id:"926... (which I expected)
  2. It returns empty [] (which I didn't expect)

You will likely need to experiment by changing the "offset" value until you find the exact place where it breaks. Just now I found it breaks for me at 155 and 156.

Try changing both the limit and the offset and you'll see that the empty results don't occur at a particular location in the stream. Here are some examples of results I've seen:

  • "...limit 50 offset 100" breaks, returning empty []
  • "...limit 100 offset 50" works, returning expected results
  • "...limit 50 offset 74" works
  • "...limit 50 offset 75" breaks
  • "...limit 20 offset 29" works
  • "...limit 20 offset 30" breaks

Besides seeing that things break once offset reaches limit * 1.5, I really don't understand what is going on here.
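The pattern in the examples above can be written down as a small check. This is only a sketch of the observed heuristic, not documented Facebook behavior; the function name `offset_likely_works` is made up for illustration.

```python
def offset_likely_works(limit: int, offset: int) -> bool:
    """Observed heuristic only: results came back while offset < 1.5 * limit."""
    # Integer form of offset < 1.5 * limit, to avoid float rounding.
    return 2 * offset < 3 * limit


# (limit, offset, worked) triples taken from the examples listed above.
observations = [
    (50, 100, False),
    (100, 50, True),
    (50, 74, True),
    (50, 75, False),
    (20, 29, True),
    (20, 30, False),
]

# The heuristic agrees with every listed observation.
assert all(offset_likely_works(l, o) == worked for l, o, worked in observations)
```

If the heuristic holds, raising the limit along with the offset should keep a query on the "works" side, but as noted above that has not been verified.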

Mortalus
Subcreation
3

Skip FQL and go straight to the Graph API. I tried FQL and it was buggy when it came to limits and fetching specific date ranges. Here's the Graph URL. Put in your own page's facebook_id and access_token:

https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN

Then if you want to get your history set your date range using since, until and limit:

https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN&since=START_DATE&until=END_DATE&limit=1000

Those start and end dates are in Unix time, and I set limit because without it I would only get 25 posts at a time. Finally, if you want insights for your posts, you'll have to go to each individual post and grab the insights for that post:

https://graph.facebook.com/POST_ID/insights?access_token=ACCESS_TOKEN
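For what it's worth, here is a minimal sketch of building those URLs in Python. It only assembles the strings shown above; the helper names `posts_url` and `insights_url` are made up for illustration, and nothing here calls Facebook.

```python
from urllib.parse import urlencode

GRAPH_BASE = "https://graph.facebook.com"


def posts_url(facebook_id: str, access_token: str,
              since: int, until: int, limit: int = 1000) -> str:
    """Build the /posts URL; since and until are Unix timestamps."""
    query = urlencode({
        "access_token": access_token,
        "since": since,
        "until": until,
        "limit": limit,
    })
    return f"{GRAPH_BASE}/{facebook_id}/posts?{query}"


def insights_url(post_id: str, access_token: str) -> str:
    """Build the per-post /insights URL."""
    query = urlencode({"access_token": access_token})
    return f"{GRAPH_BASE}/{post_id}/insights?{query}"
```

You would then fetch each URL with whatever HTTP client you like and loop over the returned post ids to build the insights URLs.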

crunkchitis
  • Tested this and confirmed that adding a limit allowed the system to return more than 25 results. Thanks for the fix! – James Gentes Feb 14 '13 at 05:01
  • Kind of outdated, but facebook_id/posts does not supply the same results as the stream. Photos, albums and videos are excluded from this API method. – Segers-Ian Dec 05 '13 at 13:02
3

I don't know why, but when I use filter_key = 'others' the LIMIT xx works.

Here is my fql query

SELECT message, attachment, message_tags FROM stream WHERE type = 'xx' AND source_id = xxxx AND is_hidden = 0 AND filter_key = 'others' LIMIT 5

and now I get exactly 5 posts... when I use LIMIT 7 I get 7, and so on.

sakibmoon
Roman
1

As @Subcreation said, something is wack with FQL on stream with LIMIT and OFFSET and higher LIMIT/OFFSET ratios seem to work better.

I have filed an issue for it with Facebook at http://developers.facebook.com/bugs/303076713093995. I suggest you subscribe to it and indicate you can reproduce it to get it bumped up in priority.

In the bug I describe how a simple stream FQL returns very inconsistent response counts based on its LIMIT/OFFSET. For example:

433 - LIMIT 500 OFFSET 0
333 - LIMIT 500 OFFSET 100
100 - LIMIT 100 OFFSET 0
0 - LIMIT 100 OFFSET 100
113 - LIMIT 200 OFFSET 100
193 - LIMIT 200 OFFSET 20
studgeek
0

You get at most 1000 likes back, whatever LIMIT you use in FQL: SELECT user_id FROM like WHERE object_id=10151751324059927 LIMIT 20000000

Saki
0

You could specify created_time in your Facebook query. The created_time field is Unix time. You can convert dates with a converter such as http://www.onlineconversion.com/unix_time.htm, or programmatically, depending on your language.

Template based on your request

SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>BEGIN_OF_RANGE and created_time<END_OF_RANGE LIMIT 50

And a specific example, from 20.09.2012 to 20.09.2013:

 SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>1348099200 and created_time<1379635200 LIMIT 50
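If you would rather do the conversion in code, here is a minimal Python sketch. It assumes the range is meant to have a lower and an upper bound (created_time above the start and below the end) and that dates are taken at midnight UTC; the helper names `to_unix` and `stream_range_query` are made up for illustration.

```python
from datetime import datetime, timezone


def to_unix(year: int, month: int, day: int) -> int:
    """Midnight UTC on the given date, as Unix seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def stream_range_query(source_id: str, start: int, end: int, limit: int = 50) -> str:
    """Build the FQL text for posts with start < created_time < end."""
    return (
        "SELECT created_time,message FROM stream "
        f"WHERE source_id={source_id} "
        f"AND created_time>{start} AND created_time<{end} "
        f"LIMIT {limit}"
    )


# 20.09.2012 to 20.09.2013, as in the example above.
query = stream_range_query("MY_USER_ID", to_unix(2012, 9, 20), to_unix(2013, 9, 20))
```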
0

I have a similar issue trying to download older posts from a public page, adding a filter ' AND created_time < t', and setting t for each query to the minimum created_time I have got so far. The weird thing is that for some values of t this returns an empty set, but if I manually set t back by one or two hours, I start getting results again. I tried to debug this using the explorer and got to a point where a certain t would get me 0 results and t-1 would get results, and repeating the queries gave me the same behavior.

I think this may be a bug, because obviously if created_time < t-1 gives me results, then created_time < t should too. If it were a question of rate limits or access rights, I should get an error; instead I get an empty set, and only for some values of t.

My suggestion for you is to filter on created_time, and change it manually when you stop getting results.
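That manual procedure can be automated roughly as follows. This is a sketch only: `fetch_before` is a hypothetical callable that you would write yourself to run the query with ' AND created_time < t' and return the posts as dicts, and `step` and `max_empty` are arbitrary knobs, not Facebook parameters.

```python
from typing import Callable, List


def fetch_history(fetch_before: Callable[[int], List[dict]],
                  start: int, step: int = 3600, max_empty: int = 24) -> List[dict]:
    """Walk backwards with created_time < t; on an empty result, move t back and retry."""
    posts: List[dict] = []
    t = start
    empty_in_a_row = 0
    while empty_in_a_row < max_empty:
        batch = fetch_before(t)
        if batch:
            posts.extend(batch)
            # Continue from the oldest post seen so far.
            t = min(p["created_time"] for p in batch)
            empty_in_a_row = 0
        else:
            # Empty set: nudge t back by `step` seconds and try again.
            t -= step
            empty_in_a_row += 1
    return posts
```

Note that stepping t back can skip posts that fall inside the skipped gap, so a smaller step is safer but means more queries. The loop gives up after `max_empty` consecutive empty results, on the assumption that the history really has ended by then.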

ggll
-4

Try it with a comma:

SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id FROM stream WHERE filter_key IN (SELECT filter_key FROM stream_filter WHERE uid=me() AND type='newsfeed') AND is_hidden = 0 limit 11,5

Jon
Muck