14

Summary

Given a Hash, what is the most efficient way to create a subset Hash based on a list of keys to use?

h1 = { a:1, b:2, c:3 }        # Given a hash...
p foo( h1, :a, :c, :d )       # ...create a method that...
#=> { :a=>1, :c=>3, :d=>nil } # ...returns specified keys...
#=> { :a=>1, :c=>3 }          # ...or perhaps only keys that exist

Details

The Sequel database toolkit allows one to create or update a model instance by passing in a Hash:

foo = Product.create( hash_of_column_values )
foo.update( another_hash )

The Sinatra web framework makes available a Hash named params that includes form variables, querystring parameters and also route matches.

If I create a form holding only fields named the same as the database columns and post it to this route, everything works very conveniently:

post "/create_product" do
  new_product = Product.create params
  redirect "/product/#{new_product.id}"
end

However, this is both fragile and dangerous. It's dangerous because a malicious hacker could post a form with columns not intended to be changed and have them updated. It's fragile because using the same form with this route will not work:

post "/update_product/:foo" do |prod_id|
  if product = Product[prod_id]
    product.update(params)
    #=> <Sequel::Error: method foo= doesn't exist or access is restricted to it>
  end
end

So, for robustness and security I want to be able to write this:

post "/update_product/:foo" do |prod_id|
  if product = Product[prod_id]
    # Only update two specific fields
    product.update(params.slice(:name,:description))
    # The above assumes a Hash (or Sinatra params) monkeypatch
    # I will also accept standalone helper methods that do the same thing
  end
end

...instead of the more verbose and non-DRY option:

post "/update_product/:foo" do |prod_id|
  if product = Product[prod_id]
    # Only update two specific fields
    product.update({
      name:params[:name],
      description:params[:description]
    })
  end
end

Update: Benchmarks

Here are the results of benchmarking the (current) implementations:

                    user     system      total        real
sawa2           0.250000   0.000000   0.250000 (  0.269027)
phrogz2         0.280000   0.000000   0.280000 (  0.275027)
sawa1           0.297000   0.000000   0.297000 (  0.293029)
phrogz3         0.296000   0.000000   0.296000 (  0.307031)
phrogz1         0.328000   0.000000   0.328000 (  0.319032)
activesupport   0.639000   0.000000   0.639000 (  0.657066)
mladen          1.716000   0.000000   1.716000 (  1.725172)

The second answer by @sawa is the fastest of all, a hair in front of my tap-based implementation (based on his first answer). Choosing to add the check for has_key? adds very little time, and is still more than twice as fast as ActiveSupport.

Here is the benchmark code:

h1 = Hash[ ('a'..'z').zip(1..26) ]
keys = %w[a z c d g A x]
n = 60000

require 'benchmark'
Benchmark.bmbm do |x|
  %w[ sawa2 phrogz2 sawa1 phrogz3 phrogz1 activesupport mladen ].each do |m|
    x.report(m){ n.times{ h1.send(m,*keys) } }
  end
end
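
The benchmark calls each implementation by its report label, so those methods have to exist on Hash under those names. The post does not show that setup. The following is a minimal sketch that assumes the labels map to the answers' implementations in the order they were posted (sawa1 and sawa2 to slice1 and slice2, phrogz1 through phrogz3 to Implementations 1 through 3, mladen to the select-based version). The activesupport entry is left out because it needs the gem.

```ruby
# Assumed mapping of benchmark labels to the posted implementations;
# the method names come from the report labels, not from the answers.
class Hash
  def sawa1(*keys)
    keys.each_with_object({}) { |k, h| h[k] = self[k] }
  end

  def sawa2(*keys)
    h = {}
    keys.each { |k| h[k] = self[k] }
    h
  end

  def phrogz1(*keys)
    Hash[keys.zip(values_at(*keys))]
  end

  def phrogz2(*keys)
    {}.tap { |h| keys.each { |k| h[k] = self[k] } }
  end

  # Skips requested keys that are absent from the receiver.
  def phrogz3(*keys)
    {}.tap { |h| keys.each { |k| h[k] = self[k] if has_key?(k) } }
  end

  # Also skips absent keys, but scans the key list once per hash entry.
  def mladen(*keys)
    select { |k, _v| keys.member?(k) }
  end
end

sample = Hash[('a'..'z').zip(1..26)]
wanted = %w[a z c d g A x]
labels = %w[sawa1 sawa2 phrogz1 phrogz2 phrogz3 mladen]

# Collect one result per label so the variants can be compared.
results = labels.map { |m| [m, sample.send(m, *wanted)] }.to_h
results.each { |label, subset| puts "#{label}: #{subset.inspect}" }
```

The first four variants include the missing key "A" with a nil value, while phrogz3 and mladen omit it, so the timings compare two slightly different behaviours.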
Phrogz
  • Your example at the top doesn't seem to agree with the details? In the example you show that if you select a key that doesn't exist in the original hash you should get a nil value in the new hash. In your Sequel example it doesn't seem you need to create a new hash but really just a subset. What is the real requirement? – Wes Apr 13 '11 at 19:07
  • @Wes The two are not incompatible, I think. The real situation will be that I will never (knowingly) ask for a key that does not exist in the original. I included `:d` in the summary to clearly specify how the edge case should be handled. However, I am also amenable to solutions which do not include any missing-but-requested keys. (Indeed, Mladen's answer and ActiveSupport both do not include any keys not present in the original.) – Phrogz Apr 13 '11 at 19:11
  • Wow, good to hear the result. A possible lesson here: a naive implementation can be faster than leaning heavily on Ruby idioms and built-in methods? I hope the Ruby implementation gets faster. – sawa Apr 13 '11 at 23:36
  • The differences between the first 5 benchmarks are not statistically significant. That is, they are all essentially the same speed. – Rein Henrichs Apr 14 '11 at 19:55
  • Great question and followup! Exactly what I was looking for :) – pithyless Apr 25 '11 at 10:58
  • Just an FYI for anyone reading this 9 years later: Ruby has this built in now. https://ruby-doc.org/core-2.5.0/Hash.html#method-i-slice – lunarfyre Apr 05 '20 at 23:43
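
As the last comment notes, Ruby 2.5 and later include Hash#slice in core. A minimal sketch of how it handles the example from the summary, assuming a plain Hash with symbol keys (Sinatra's params object may behave differently):

```ruby
# Requires Ruby 2.5 or later, where Hash#slice is part of core.
h1 = { a: 1, b: 2, c: 3 }

# Hash#slice returns only the requested keys that exist; :d is dropped.
existing_only = h1.slice(:a, :c, :d)

# To get nil for missing keys, build the hash from the key list instead.
# Array#to_h with a block requires Ruby 2.6 or later.
with_nils = %i[a c d].to_h { |k| [k, h1[k]] }

p existing_only
p with_nils
```

The built-in method therefore matches the "only keys that exist" variant; the nil-filling variant still needs a line of its own.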

5 Answers

19

I would just use the slice method provided by active_support

require 'active_support/core_ext/hash/slice'
{a: 1, b: 2, c: 3}.slice(:a, :c)                  # => {a: 1, c: 3}

Of course, make sure to update your Gemfile (the gem itself is named activesupport, without an underscore):

gem 'activesupport'
Matt Huggins
Blake Taylor
  • +1 I didn't know that you could cherry pick individual methods from ActiveSupport. See the updated question above for the results of benchmarking this method. – Phrogz Apr 13 '11 at 19:36
4

I changed my mind. The previous one doesn't seem to be any good.

class Hash
  def slice1(*keys)
    keys.each_with_object({}){|k, h| h[k] = self[k]}
  end
  def slice2(*keys)
    h = {}
    keys.each{|k| h[k] = self[k]}
    h
  end
end
sawa
3

Sequel has built-in support for only picking specific columns when updating:

product.update_fields(params, [:name, :description])

That doesn't do exactly the same thing if :name or :description is not present in params, though. But assuming you are expecting the user to use your form, that shouldn't be an issue.

I could always expand update_fields to take an option hash with an option that will skip the value if not present in the hash. I just haven't received a request to do that yet.

Jeremy Evans
  • I had no idea. Very nice. This still does not address the needs of `Product.create()`, correct? – Phrogz Apr 14 '11 at 22:59
  • Ah, good point. Note that I just ran into the case last night where I was processing checkboxes and I _did_ explicitly want to include a `nil` value when asking for a field not present in the hash. I will definitely not be making a request for functionality to skip non-present values. :) – Phrogz Apr 15 '11 at 17:19
  • FWIW, another case where I just needed slice with Sequel: `model.add_associateditem( existing_item.slice( hash_of_fields_without_id ) )` – Phrogz Feb 07 '14 at 18:56
2

Perhaps

class Hash
  def slice *keys
    select{|k, _v| keys.member?(k)}
  end
end

Or you could just copy ActiveSupport's Hash#slice, it looks a bit more robust.

Mladen Jablanović
0

Here are my implementations; I will benchmark and accept faster (or sufficiently more elegant) solutions:

# Implementation 1
class Hash
  def slice(*keys)
    Hash[keys.zip(values_at(*keys))]
  end
end

# Implementation 2
class Hash
  def slice(*keys)
    {}.tap{ |h| keys.each{ |k| h[k]=self[k] } }
  end
end

# Implementation 3 - silently ignore keys not in the original
class Hash
  def slice(*keys)
    {}.tap{ |h| keys.each{ |k| h[k]=self[k] if has_key?(k) } }
  end
end
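
The question also allows standalone helper methods instead of a monkeypatch. A minimal sketch of that option follows; the module and method names are hypothetical and not part of Ruby, Sinatra, or Sequel. Keeping the helpers outside Hash also avoids redefining the Hash#slice that newer Ruby versions ship in core.

```ruby
# Hypothetical helper names; neither is part of Ruby, Sinatra, or Sequel.
module HashSubset
  module_function

  # Returns a new hash with every requested key; missing keys map to nil.
  def subset_with_nils(hash, *keys)
    keys.each_with_object({}) { |k, out| out[k] = hash[k] }
  end

  # Returns a new hash with only the requested keys present in the original.
  def subset_existing(hash, *keys)
    keys.each_with_object({}) { |k, out| out[k] = hash[k] if hash.key?(k) }
  end
end

h1 = { a: 1, b: 2, c: 3 }
p HashSubset.subset_with_nils(h1, :a, :c, :d)
p HashSubset.subset_existing(h1, :a, :c, :d)
```

Which helper fits depends on the case discussed in the comments: an unchecked checkbox may call for the nil-filling version, while a partial update usually wants the existing-keys version.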
Phrogz
  • Why not Hash#only from ActiveSupport? – Rein Henrichs Apr 13 '11 at 17:20
  • @ReinHenrichs Because I'm using Sinatra, not Rails, and I don't have the bloat of ActiveSupport included in my app. Also, because I didn't know about it. :) Thanks, I'll look into that. – Phrogz Apr 13 '11 at 18:51
  • Wouldn't including keys not existing in the hash cause setting some table columns to NULL? I believe you would have to check for `has_key?` in your app. – Mladen Jablanović Apr 13 '11 at 20:57
  • @Mladen Yes, it would. I'm torn on whether or not this is desirable. For example, an HTML checkbox that is unchecked will not send a key value pair. This is one case where I might ask for a column that is reasonably not present in the hash, and desire the `nil`. As you can see, I edited my answer above with a version that uses `has_key?`, for when this is desirable. – Phrogz Apr 13 '11 at 21:08
  • A database table can provide default values (not necessarily NULLs) for its columns, so you're good if the parameter is not there. – Mladen Jablanović Apr 14 '11 at 06:20