
It's said that using skip() for pagination in a MongoDB collection with many records is slow and not recommended.

Ranged pagination (based on an _id comparison with $gt) could be used instead:

db.items.find({_id: {$gt: ObjectId('4f4a3ba2751e88780b000000')}});

It's good for displaying prev. & next buttons, but it's not very easy to implement when you want to display actual page numbers (1 ... 5 6 7 ... 124): you need to pre-calculate the "_id" each page starts from.

So I have two questions:

1) When should I start worrying about this? How many records is "too many", with a noticeable slowdown for skip()? 1,000? 1,000,000?

2) What is the best approach to show links with actual page numbers when using ranged pagination?

Roman

3 Answers


Good question!

"How many is too many?" - that, of course, depends on your data size and performance requirements. Personally, I feel uncomfortable when I skip more than 500-1000 records.

The actual answer depends on your requirements. Here's what modern sites do (or, at least, some of them).

First, the navbar looks like this:

1 2 3 ... 457

They derive the final page number from the total record count and the page size. Let's jump to page 3. That will involve some skipping from the first record. When the results arrive, you know the _id of the first record on page 3.

1 2 3 4 5 ... 457

Let's skip some more and go to page 5.

1 ... 3 4 5 6 7 ... 457

You get the idea. At each point you see the first, last and current pages, plus two pages forward and backward from the current page.
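The windowing above can be sketched as a small helper (hypothetical code, not from the original answer) that computes which page numbers and ellipses to render:

```javascript
// Compute the page-number window described above: first page, last page,
// and two pages on either side of the current page, with '...' where
// pages are elided.
function pageWindow(current, total) {
  const pages = [];
  const lo = Math.max(1, current - 2);
  const hi = Math.min(total, current + 2);
  if (lo > 1) pages.push(1);             // always show the first page
  if (lo > 2) pages.push('...');         // gap after the first page
  for (let p = lo; p <= hi; p++) pages.push(p);
  if (hi < total - 1) pages.push('...'); // gap before the last page
  if (hi < total) pages.push(total);     // always show the last page
  return pages;
}
```

For example, `pageWindow(1, 457)` yields `1 2 3 ... 457` and `pageWindow(5, 457)` yields `1 ... 3 4 5 6 7 ... 457`, matching the navbars shown above.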

Queries

var current_id; // id of first record on current page.

// go to page current+N
db.collection.find({_id: {$gte: current_id}}).
              skip(N * page_size).
              limit(page_size).
              sort({_id: 1});

// go to page current-N
// note that due to the nature of skipping back,
// this query will get you records in reverse order 
// (last records on the page being first in the resultset)
// You should reverse them in the app.
db.collection.find({_id: {$lt: current_id}}).
              skip((N-1)*page_size).
              limit(page_size).
              sort({_id: -1});
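To make the skip arithmetic concrete, here is a plain-JS simulation (illustrative only, run against an in-memory array of numeric-_id documents rather than a real collection) of the two queries, including the reverse step needed after paging backwards:

```javascript
// Equivalent of find({_id: {$gte: currentId}}).sort({_id: 1})
//   .skip(n * pageSize).limit(pageSize)
function pageForward(docs, currentId, n, pageSize) {
  return docs
    .filter(d => d._id >= currentId)
    .sort((a, b) => a._id - b._id)
    .slice(n * pageSize, n * pageSize + pageSize);
}

// Equivalent of find({_id: {$lt: currentId}}).sort({_id: -1})
//   .skip((n - 1) * pageSize).limit(pageSize), then reversed in the app
function pageBackward(docs, currentId, n, pageSize) {
  return docs
    .filter(d => d._id < currentId)
    .sort((a, b) => b._id - a._id)
    .slice((n - 1) * pageSize, n * pageSize)
    .reverse();
}
```

Note that going forward uses skip(N * page_size) because $gte includes the current page's records, while going back uses skip((N-1) * page_size) because $lt already excludes them.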
Sergio Tulentsev
  • Thanks, that's exactly what I need. Great combined approach - range by "_id" + skip(), very easy to use, much better than all the approaches I have read today while searching about the subject. – Roman Mar 14 '12 at 15:03
  • 1
    good answer, but in this approach you must know the current page number. The only way to know it is to send it in the request – vacuum May 24 '12 at 08:41
  • Not the only one, but the one I'd choose. – Sergio Tulentsev May 24 '12 at 08:46
  • 1
    will this work if index need to be reversed? _sort({_id: -1})_ – vacuum May 28 '12 at 11:53
  • 1
    and one more question: how to effectively get last page ? – vacuum May 28 '12 at 13:11
  • Yes, it should work. And this is how you efficiently get the last page :) – Sergio Tulentsev May 28 '12 at 13:25
  • so, if the current_id is the last record on the current page, should we then skip one page less, i.e. skip((N - 1) * page_size)? – Albert Gan May 21 '13 at 13:27
  • Going back a page is still expensive ? Since it's skipping all but total_count_of_the_query - page – mickey06 Mar 18 '14 at 17:26
  • it looks to me that you'd need the `$gte` query selector, else you are always skipping first record of next page. –  Jun 16 '16 at 09:00
  • @amc: it looks like you're right. Thanks! :) Also the sorting in the second query looks suspicious to me. I'm gonna recheck this. – Sergio Tulentsev Jun 16 '16 at 09:01
  • just be aware that if records are added or deleted in between fetching pages, you'll get duplicate results. I.e. If there are 10 records and you read 5 then 5 new records arrive and then you skip to page 2, you'll get the same 5 records again. Since skip starts from the new 5 records. – mark May 25 '17 at 05:20
  • is it possible to also get the total # of pages (to return in the response)? – user1709076 Apr 10 '18 at 17:24
  • @user1709076: Yes, simply do another query. – Sergio Tulentsev Apr 10 '18 at 18:22
  • This is useful but has a problem which I am not able to solve. Pagination depends on the sort order and often a different field than ```_id``` would be used for sorting. Assume you are sorting by a field which can have duplicate values (e.g. orders by user name). Then if there happen to be two pages where the first items have the same value for the field by which you are sorting, you will get stuck, since the result for $gte will be the same for both pages. If there are at least 2N such items, you are guaranteed to get stuck eventually. Perhaps an idea comes to mind how this could be solved? – Avius Oct 22 '18 at 11:28
  • 1
    Clarification - this will not work if there are duplicate values. In case someone else stumbles upon this, I managed to overcome this limitation by following https://mixmax.com/blog/api-paging-built-the-right-way/ and adapting it to my own needs. – Avius Oct 22 '18 at 19:36

It's hard to give a general answer because it depends a lot on what query (or queries) you are using to construct the set of results that are being displayed. If the results can be found using only the index and are presented in index order, then db.dataset.find().limit().skip() can perform well even with a large number of skips. This is likely the easiest approach to code up. But even in that case, if you can cache page numbers and tie them to index values, you can make it faster for the second and third person who wants to view page 71, for example.

In a very dynamic dataset where documents will be added and removed while someone else is paging through data, such caching will become out-of-date quickly and the limit and skip method may be the only one reliable enough to give good results.
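The page-number cache idea from the first paragraph could be sketched like this (a hypothetical helper; the names and structure are mine, not from the answer). The first visitor to a page pays the skip() cost, and later visitors reuse the cached starting _id:

```javascript
// Map from page number to the _id of the first record on that page.
const pageStartCache = new Map();

// Record the starting _id once a page has been fetched the expensive way.
function rememberPageStart(page, firstId) {
  pageStartCache.set(page, firstId);
}

// Build query parameters for a page: a cheap range scan when the start
// _id is cached, an expensive skip from the beginning otherwise.
function queryForPage(page, pageSize) {
  const cachedId = pageStartCache.get(page);
  if (cachedId !== undefined) {
    return { filter: { _id: { $gte: cachedId } }, skip: 0, limit: pageSize };
  }
  return { filter: {}, skip: (page - 1) * pageSize, limit: pageSize };
}
```

As noted above, in a very dynamic dataset these cached entries go stale as documents are inserted and removed, so they would need to be invalidated or given a TTL.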

Tad Marshall

I recently encountered the same problem when trying to paginate a request while using a field that wasn't unique, for example "FirstName". The idea of this query is to implement pagination on a non-unique field without using skip().

The main problem here is querying on a field that is not unique ("FirstName"), because the following will happen:

  1. {"FirstName": {$gt: "Carlos"}} -> this will skip all the records where the first name is "Carlos"
  2. {"FirstName": {$gte: "Carlos"}} -> will always return the same set of data

Therefore the solution I came up with was making the $match portion of the query unique by combining the targeted search field with a secondary field, turning it into a unique search.

Ascending order:

db.customers.aggregate([
    {$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$gt: 'Carlos'}}]}},
    {$sort: {'FirstName': 1, '_id': 1}},
    {$limit: 10}
    ])

Descending order:

db.customers.aggregate([
    {$match: { $or: [ {$and: [{'FirstName': 'Carlos'}, {'_id': {$gt: ObjectId("some-object-id")}}]}, {'FirstName': {$lt: 'Carlos'}}]}},
    {$sort: {'FirstName': -1, '_id': 1}},
    {$limit: 10}
    ])

The $match part of this query basically behaves like an if statement: if FirstName is "Carlos", then the _id also needs to be greater than this id; if FirstName is not "Carlos", then FirstName needs to be greater than "Carlos".

The only problem is that you cannot navigate to a specific page number (it can probably be done with some code manipulation), but other than that it solved my problem with pagination on non-unique fields without having to use skip(), which eats a lot of memory and processing power when getting to the end of whatever dataset you are querying.
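The ascending-order $match stage above can be generated from the last document on the current page. Here is a hypothetical helper (names are mine, not from the answer) that builds that stage for any field/tiebreaker pair:

```javascript
// Build the "next page" $match stage for ascending-order pagination on a
// non-unique field, using _id as the tiebreaker, from the last document
// on the current page.
function nextPageMatch(field, lastValue, lastId) {
  return {
    $match: {
      $or: [
        // same field value as the last document: break the tie with _id
        { $and: [{ [field]: lastValue }, { _id: { $gt: lastId } }] },
        // strictly greater field value: no tiebreaker needed
        { [field]: { $gt: lastValue } }
      ]
    }
  };
}
```

The stage it produces for `nextPageMatch('FirstName', 'Carlos', someId)` has the same shape as the $match in the ascending pipeline above; it would be followed by the same {$sort: {FirstName: 1, _id: 1}} and {$limit: 10} stages.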