
What's the most efficient way to iterate through an entire table using DataMapper?

If I do this, does DataMapper try to pull the entire result set into memory before performing the iteration? Assume, for the sake of argument, that I have millions of records and that this is infeasible:

Author.all.each do |a|
  puts a.title
end

Is there a way that I can tell DataMapper to load the results in chunks? Is it smart enough to know to do this automatically?

Jason Marcell

3 Answers


Thanks, Nicolas, I actually came up with a similar solution. I've accepted your answer since it makes use of DataMapper's dm-pagination system, but I'm wondering if this would do equally well (or worse):

CHUNK = 1_000 # rows per query
offset = 0
# Guard on authors.any?: slicing past the end of the table yields an
# empty collection (or nil), which would otherwise loop forever
while (authors = Author.slice(offset, CHUNK)) && authors.any?
  authors.each do |a|
    # do something with a
  end
  offset += CHUNK
end
Jason Marcell

DataMapper will run just one SQL query for the example above, so it has to keep the whole result set in memory.

I think you should use some sort of pagination if your collection is big. Using dm-pagination you could do something like:

PAGE_SIZE = 20
pager = Author.page(:per_page => PAGE_SIZE).pager # This will run a count query
(1..pager.total_pages).each do |page_number|
  Author.page(:per_page => PAGE_SIZE, :page => page_number).each do |a|
    puts a.title
  end
end

You can play around with different values for PAGE_SIZE to find a good trade-off between the number of SQL queries and memory usage.
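As a rough back-of-envelope illustration of that trade-off (plain Ruby arithmetic with a hypothetical table size, not real DataMapper calls):

```ruby
# Hypothetical numbers, chosen only to make the trade-off concrete
total_records = 1_000_000
PAGE_SIZE = 20

pages   = (total_records.to_f / PAGE_SIZE).ceil
queries = pages + 1  # one SELECT per page, plus the initial COUNT query
# pages   => 50000
# queries => 50001
```

Raising PAGE_SIZE divides the query count proportionally, at the cost of holding more rows in memory per page.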

  • Thanks, Nicolas. I also posted an answer to my own question. I'd be glad to hear feedback on my version. I think they're equivalent. – Jason Marcell May 15 '11 at 02:01
  • Your solution should work equally well and it may be cleaner. Abstracting that into a pagination-aware iterator would be nice, something like `Author.each_in_chunks do` – coffeeworks May 15 '11 at 02:29
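A minimal sketch of such a pagination-aware iterator, demonstrated against a plain Array stand-in rather than a real DataMapper model (`each_in_chunks` here is a hypothetical helper, not part of DataMapper's API):

```ruby
CHUNK = 3

# Yields each element of `collection` while only ever holding one
# chunk (obtained via #slice) at a time; mirrors the while/offset
# loop from the answer above.
def each_in_chunks(collection, chunk_size)
  offset = 0
  loop do
    chunk = collection.slice(offset, chunk_size)
    break if chunk.nil? || chunk.empty?
    chunk.each { |record| yield record }
    offset += chunk_size
  end
end

authors = (1..10).map { |i| "Author #{i}" }
seen = []
each_in_chunks(authors, CHUNK) { |a| seen << a }
# seen now holds all 10 authors, visited 3 at a time
```

Attaching this to a DataMapper model would just mean replacing `collection.slice` with the model's own `slice`, which issues a LIMIT/OFFSET query per chunk.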

What you want is the dm-chunked_query plugin (example from the docs):

require 'dm-chunked_query'

MyModel.each_chunk(20) do |chunk|
  chunk.each do |resource|
    # ...
  end
end

This will allow you to iterate over all the records in the model, in chunks of 20 records at a time.

EDIT: the example above originally had an unnecessary extra #each after #each_chunk. The gem author updated the README example, and I changed the code above to match.

dkubb
  • Thanks, this looks very clean. It does look funny that they have `each_chunk(20).each` though. Why not just `each_chunk(20)`? Let the `each_chunk` method take the block. Oh well, still better syntax than what I came up with. – Jason Marcell May 15 '11 at 03:02
  • Jason, you're right. I just looked at the code and `#each_chunk` does yield a block, so I think the example I lifted might not be entirely accurate. I'll check with the gem author and post an update to make sure. – dkubb May 16 '11 at 03:07
  • Jason, I contacted the gem author, and he said the example I had used could be simplified to not require the extra `#each`. I updated my message above to match the example in the gem README. – dkubb May 16 '11 at 05:15