
I'm looking for a Ruby ORM to replace ActiveRecord. I've been looking at Sequel and DataMapper. They look pretty good, but neither of them seems to do the basics: avoid loading everything into memory when you don't need it.

I mean I've tried the following (or equivalent) with both ActiveRecord and Sequel on a table with lots of rows:

 posts.each { |p| puts p }

Both of them go crazy on memory: they seem to load the whole result set at once rather than fetching rows as needed. I tried find_in_batches in ActiveRecord, but it's not an acceptable solution:

  1. ActiveRecord is not an acceptable solution because we had too many problems with it.
  2. Why should my code be aware of a paging mechanism? I'm happy to configure the page size somewhere, but that's it. With find_in_batches you need to do something like:

    post.find_in_batches { |batch| batch.each { |p| puts p } }

But that should be transparent.
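To illustrate what I mean by transparent, here is a minimal plain-Ruby sketch (no ORM; fetch_page is a hypothetical stand-in for a real LIMIT/OFFSET query): the caller writes a flat each, and the paging stays internal.

```ruby
ROWS = (1..10).to_a  # pretend table with many rows

# Hypothetical page fetcher; a real ORM would issue a LIMIT/OFFSET query here.
def fetch_page(offset, limit)
  ROWS[offset, limit] || []
end

# Yields rows one at a time, pulling one page at a time behind the scenes.
def each_row(page_size: 3)
  return enum_for(:each_row, page_size: page_size) unless block_given?
  offset = 0
  loop do
    page = fetch_page(offset, page_size)
    break if page.empty?
    page.each { |row| yield row }
    offset += page_size
  end
end

each_row { |r| print r, " " }  # prints 1 through 10; only one page is in memory at a time
```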

So is there somewhere a reliable Ruby ORM which does the fetch properly?


Update:

As Sergio mentioned, in Rails 3 you can use find_each, which is exactly what I want. However, as ActiveRecord is not an option (unless someone can really convince me to use it), the questions are:

  1. Which ORMs support the equivalent of find_each?
  2. How to do it?
  3. Why do we need find_each at all? Shouldn't find behave like this by default?
mb14

5 Answers


Sequel's Dataset#each does yield individual rows, but most database drivers load the entire result set into memory first.

If you are using Sequel's Postgres adapter, you can choose to use real cursors:

posts.use_cursor.each{|p| puts p}

This fetches 1000 rows at a time by default, but you can use an option to specify the number of rows to grab per cursor fetch:

posts.use_cursor(:rows_per_fetch=>100).each{|p| puts p}

If you aren't using Sequel's Postgres adapter, you can use Sequel's pagination extension:

Sequel.extension :pagination
posts.order(:id).each_page(1000){|ds| ds.each{|p| puts p}}

However, like ActiveRecord's find_in_batches/find_each, this does separate queries, so you need to be careful if there are concurrent modifications to the dataset you are retrieving.
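A toy illustration of that hazard (plain Ruby arrays standing in for per-page queries; no real database involved): a row deleted between two page fetches shifts the offsets, and a later row is silently skipped.

```ruby
rows = ("a".."f").to_a                  # pretend table, in order
page_size = 2
seen = []
offset = 0
loop do
  page = rows[offset, page_size] || []  # stand-in for a LIMIT/OFFSET query
  break if page.empty?
  seen.concat(page)
  rows.delete("a") if offset.zero?      # concurrent delete between two fetches
  offset += page_size
end
p seen  # prints ["a", "b", "d", "e", "f"] -- "c" was never yielded
```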

The reason this isn't the default in Sequel is probably the same reason it isn't the default in ActiveRecord, which is that it isn't a good default in the general case. Only queries with large result sets really need to worry about it, and most queries don't return large result sets.

At least with the Postgres adapter cursor support, it's fairly easy to make it the default for your model:

Post.dataset = Post.dataset.use_cursor

For the pagination extension, you can't really do that, but you can wrap it in a method that makes it mostly transparent.
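Such a wrapper could look like the sketch below. This is not part of Sequel's API: FakeDataset and each_row are made up here, and FakeDataset stands in for a real dataset so the example runs without a database, but the each_page call is the same one the pagination extension provides.

```ruby
# Stand-in for a Sequel dataset; with real Sequel you'd use
# dataset.order(:id).each_page(size) from the pagination extension.
class FakeDataset
  def initialize(rows)
    @rows = rows
  end

  # Mimics each_page: yields one page (array of rows) at a time.
  def each_page(page_size)
    @rows.each_slice(page_size) { |page| yield page }
  end
end

# Hides the paging behind a flat per-row iteration.
module TransparentEach
  def each_row(page_size: 1000, &block)
    dataset.each_page(page_size) { |page| page.each(&block) }
  end
end

class Post
  extend TransparentEach
  def self.dataset
    FakeDataset.new((1..5).to_a)
  end
end

Post.each_row(page_size: 2) { |p| print p, " " }  # prints 1 2 3 4 5
```

Callers never mention pages; changing the page size is a single configuration point.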

Jeremy Evans
  • The use_cursor stuff is postgres-only. I'm not sure MySQL supports cursors for returning results. The MySQL cursor documentation states "MySQL supports cursors inside stored programs", see http://dev.mysql.com/doc/refman/5.6/en/cursors.html. – Jeremy Evans Jan 17 '12 at 17:15
  • PHP uses db_fetch heavily, so I guess MySQL supports cursors. However, maybe not in the default Ruby driver. – mb14 Jan 17 '12 at 17:32
  • Is using Postgres' cursors faster than `find_each` or `paged_each`? Is the only difference in speed that you aren't making multiple queries? – janko-m Feb 18 '15 at 12:26
Sequel.extension :pagination
posts.order(:id).each_page(1000) do |ds|
  ds.each { |p| puts p }
end

It is very slow on large tables! The reason becomes clear from the method body: every page is fetched with a fresh LIMIT/OFFSET query, so the database has to scan past all previously read rows and later pages get progressively slower:

http://sequel.rubyforge.org/rdoc-plugins/classes/Sequel/Dataset.html#method-i-paginate

# File lib/sequel/extensions/pagination.rb, line 11

def paginate(page_no, page_size, record_count=nil)
  raise(Error, "You cannot paginate a dataset that already has a limit") if @opts[:limit]
  paginated = limit(page_size, (page_no - 1) * page_size)
  paginated.extend(Pagination)
  paginated.set_pagination_info(page_no, page_size, record_count || count)
end
Alexander Dyomin

ActiveRecord actually has an almost transparent batch mode:

User.find_each do |user|
  NewsLetter.weekly_deliver(user)
end
Sergio Tulentsev
  • Thanks (+1). It's in Rails 3 (we are using Rails 2.3.x). Anyway, ActiveRecord is not an option. I'll reformulate my question. – mb14 Jan 13 '12 at 11:00
  • Too bad you can't use it :-( But I'll keep an eye on this question, maybe something interesting will pop up :-) – Sergio Tulentsev Jan 13 '12 at 11:02
  • find_each has been around since 2.3.2 according to APIdock: http://apidock.com/rails/v2.3.2/ActiveRecord/Batches/ClassMethods/find_each – tokland Jan 13 '12 at 11:49
  • @tokland: You are right. Anyway, the question is: is there any ORM (excluding ActiveRecord) which provides this feature, ideally without having to specify it (i.e. find should do the same)? – mb14 Jan 13 '12 at 12:43

This code works faster than ActiveRecord's find_in_batches:

id_max = table.max(:id)
id_min = table.min(:id)
n = 1000
(0..(id_max - id_min) / n).each do |i|
  table.where { (id >= id_min + n * i) & (id < id_min + n * (i + 1)) }.each { |row| }
end
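A toy model of why this id-range approach scales (a plain Ruby hash stands in for the table; no database): each batch touches only ids in [lo, hi), which an index on id can serve regardless of how many rows precede it, unlike a growing OFFSET, and gaps in the ids are handled naturally.

```ruby
table = { 1 => "a", 2 => "b", 5 => "c", 9 => "d", 10 => "e" }  # gappy ids
id_min, id_max = table.keys.minmax
n = 4                                   # batch width in id units
seen = []
(0..(id_max - id_min) / n).each do |i|
  lo = id_min + n * i
  hi = id_min + n * (i + 1)
  # Stand-in for: table.where { (id >= lo) & (id < hi) }
  table.each { |id, row| seen << row if id >= lo && id < hi }
end
p seen  # prints ["a", "b", "c", "d", "e"]
```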
Alexander Dyomin

Maybe you can consider Ohm, which is based on the Redis NoSQL store.

Carlo Pecchia