I have two arrays of hashes with the format:

hash1

[{:root => root_value, :child1 => child1_value, :subchild1 => subchild1_value, :bases => [hit1, hit2, hit3]}, ...]

hash2

[{:path => "root_value/child1_value/subchild1_value", :hit1_exist => 't', :hit2_exist => 't', :hit3_exist => 'f'}, ...]

If I do this:

def sample
  results = nil
  project = Project.find(params[:project_id])
  testrun_query = "SELECT root_name, suite_name, case_name, ic_name, executed_platforms FROM testrun_caches WHERE start_date >= '#{params[:start_date]}' AND start_date < '#{params[:end_date]}' AND project_id = #{params[:project_id]} AND result <> 'SKIP' AND result <> 'N/A'"
  if !params[:platform].nil? && params[:platform] != [""]
    #yell_and_log "platform not nil"
    platform_query = nil
    params[:platform].each do |platform|
      if platform_query.nil?
        platform_query = " AND (executed_platforms LIKE '%#{platform.to_s},%'"
      else
        platform_query += " OR executed_platforms LIKE '%#{platform.to_s},%'"
      end
    end
    testrun_query += platform_query + ")"
  end
  if !params[:location].nil? &&!params[:location].empty?
    #yell_and_log "location not nil"
    testrun_query += " AND location LIKE '#{params[:location].to_s}%'"
  end
  testrun_query += " GROUP BY root_name, suite_name, case_name, ic_name,   executed_platforms ORDER BY root_name, suite_name, case_name, ic_name"
  ic_query = "SELECT ics.path, memberships.pts8210, memberships.sv6, memberships.sv7,   memberships.pts14k, memberships.pts22k, memberships.pts24k, memberships.spb32, memberships.spb64, memberships.sde, projects.name FROM ics INNER JOIN memberships on memberships.ic_id = ics.id INNER JOIN test_groups ON test_groups.id = memberships.test_group_id INNER JOIN projects ON test_groups.project_id = projects.id WHERE deleted = 'false' AND (memberships.pts8210 = true OR memberships.sv6 = true OR memberships.sv7 = true OR memberships.pts14k = true OR memberships.pts22k = true OR memberships.pts24k = true OR memberships.spb32 = true OR memberships.spb64 = true OR memberships.sde = true) AND projects.name = '#{project.name}' GROUP BY path, memberships.pts8210, memberships.sv6, memberships.sv7, memberships.pts14k, memberships.pts22k, memberships.pts24k, memberships.spb32, memberships.spb64, memberships.sde, projects.name ORDER BY ics.path"
  if params[:ic_type] == "never_run"
    runtest = TestrunCache.connection.select_all(testrun_query)
    alltest = TrsIc.connection.select_all(ic_query) 
    (alltest.length).times do |i|
      #exec_pltfrm = test['executed_platforms'].split(",")
      unfinishedtest = comparison(runtest[i],alltest[i])
      yell_and_log("test = #{unfinishedtest}")
      yell_and_log("#{runtest[i]}")
      yell_and_log("#{alltest[i]}")
    end
  end
end

I get in my log:

test = true
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"cli",  "case_name"=>"functional", "ic_name"=>"cli_sanity_test", "executed_platforms"=>"pts22k,pts24k,sv7,"}
array of hash 2 = {"path"=>"BSDPLATFORM/cli/functional/cli_sanity_test", "pts8210"=>"f", "sv6"=>"f", "sv7"=>"t", "pts14k"=>nil, "pts22k"=>"t", "pts24k"=>"t", "spb32"=>nil, "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_packet_9", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/copyrights", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"t", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_status_1", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/ic_1", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"t", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}
test = false
array of hash 1 = {"root_name"=>"BSDPLATFORM", "suite_name"=>"infrastructure", "case_name"=>"bypass_pts14k_copper", "ic_name"=>"ic_status_2", "executed_platforms"=>"sv6,"}
array of hash 2 = {"path"=>"BSDPLATFORM/infrastructure/build/ic_files", "pts8210"=>"f", "sv6"=>"t", "sv7"=>"f", "pts14k"=>"f", "pts22k"=>"t", "pts24k"=>"t", "spb32"=>"f", "spb64"=>nil, "sde"=>nil, "name"=>"pts_6_20"}

So only the first pair matches; after that the rows are misaligned, and I end up with one result instead of 4230.

I would like some way to match the two arrays by path (root/suite/case/ic) and then compare the executed platforms in array of hashes 1 against the platforms set to true in array of hashes 2.

lifejuggler
Tom Choi
Hey. So, still not completely clear. Could you provide a few examples of real hash structures that you want to compare where they aren't the same size and aren't sequential, and what you'd expect the result to be? The title, the original question and the example code don't match up exactly, because optimizing for the fastest code based on your sample code may have more to do with optimizing queries and processing result sets to compare them. – Gary S. Weaver Feb 06 '13 at 14:25

2 Answers

Not sure if this is fastest, and I wrote this based on your original question that didn't provide sample code, but:

def compare(h1, h2)
  (h2[:path] == "#{h1[:root]}/#{h1[:child1]}/#{h1[:subchild1]}") && \
  (h2[:hit1_exist] == ((h1[:bases][0] == nil) ? 'f' : 't')) && \
  (h2[:hit2_exist] == ((h1[:bases][1] == nil) ? 'f' : 't')) && \
  (h2[:hit3_exist] == ((h1[:bases][2] == nil) ? 'f' : 't'))
end

def compare_arr(h1a, h2a)
  (h1a.length).times do |i|
    compare(h1a[i],h2a[i])
  end
end

Test:

require "benchmark"

h1a = []
h2a = []

def rstr
  # from http://stackoverflow.com/a/88341/178651
  (0...2).map{65.+(rand(26)).chr}.join
end

def rnil
  rand(2) > 0 ? '' : nil
end

10000.times do
  h1a << {:root => rstr(), :child1 => rstr(), :subchild1 => rstr(), :bases => [rnil,rnil,rnil]}
  h2a << {:path => "#{rstr()}/#{rstr()}/#{rstr()}", :hit1_exist => 't', :hit2_exist => 't', :hit3_exist => 'f'}
end

Benchmark.measure do
  compare_arr(h1a,h2a)
end

Results:

=>   0.020000   0.000000   0.020000 (  0.024039)

Now that I'm looking at your code, I think it could be optimized by removing the array creations, splits and joins: they create arrays and strings that have to be garbage collected, which slows things down, though not by as much as you mention.

Your database queries may be slow. Run EXPLAIN/ANALYZE or similar on them to see why each is slow, optimize or reduce your queries, add indexes where needed, etc. Also check CPU and memory utilization. It might not just be the code.

But, there are some definite things that need to be fixed. You also have several risks of SQL injection attack, e.g.:

... start_date >= '#{params[:start_date]}' AND start_date < '#{params[:end_date]}' AND project_id = #{params[:project_id]} ...

Anywhere that params and variables are put directly into the SQL may be a danger. You'll want to make sure to use prepared statements or at least SQL escape the values. Read this all the way through: http://guides.rubyonrails.org/active_record_querying.html
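
As a rough illustration of the placeholder approach, here is a minimal sketch that builds the WHERE conditions as a SQL template plus a list of values to bind, so nothing from params ends up in the SQL text. The method name testrun_conditions is made up for this example; the column names are taken from the query in the question.

```ruby
# Builds an ActiveRecord-style condition array: a SQL template with "?"
# placeholders, followed by the values to bind to them.
# testrun_conditions is a hypothetical helper, not part of Rails.
def testrun_conditions(params)
  clauses = ["start_date >= ?", "start_date < ?", "project_id = ?"]
  binds   = [params[:start_date], params[:end_date], params[:project_id]]

  # Ignore a missing platform list and blank entries such as [""].
  platforms = Array(params[:platform]).reject { |p| p.to_s.empty? }
  unless platforms.empty?
    likes = platforms.map { "executed_platforms LIKE ?" }
    clauses << "(#{likes.join(' OR ')})"
    # The LIKE pattern is passed as a bound value, not spliced into the SQL.
    binds.concat(platforms.map { |p| "%#{p},%" })
  end

  [clauses.join(" AND "), *binds]
end
```

Assuming TestrunCache is an ordinary ActiveRecord model, the returned array could be handed to something like TestrunCache.where(testrun_conditions(params)), which quotes the bound values for you.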

Gary S. Weaver
elements_being_tested.flat_map do |el|
  hash_array_1.zip(hash_array_2).reject do |x, y|
    x[el] == y[el]
  end
end.each { |x, y| puts((x[:bases] | y[:bases]).inspect) }

Enumerate the hash elements to test, where elements_being_tested is an array of keys. elements_being_tested.flat_map do |el|

Then iterate through the hash arrays themselves, paired up by position with zip (so this assumes both arrays have the same length and order), comparing each pair of hashes on the element given by the outer loop and rejecting those not appropriately equal. (The == may actually need to be != but you can figure that much out)

  hash_array_1.zip(hash_array_2).reject do |x, y|
    x[el] == y[el]
  end

Finally, you go through the remaining pairs and take the set union of their :bases arrays (assuming both hashes carry one).

.each { |x, y| puts((x[:bases] | y[:bases]).inspect) }

You may need to test the code. It's not meant for production so much as demonstration because I wasn't sure I read your code right. Please post a larger sample of the source including the data structures in question if this answer is unsatisfactory.

Regarding speed: if you're iterating through a large data set and comparing multiple elements, there's probably not much you can do. Perhaps you can invert the loops I presented and make the hash arrays the outer loop. You're not going to get lightning speed here in Ruby (or really any language) if the data structure is large.
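
If the rows in the two result sets don't line up by position, one option is to key them on the path instead of the index. The following is a minimal sketch under several assumptions: rows look like the hashes logged in the question (string keys, "t" for true), the path is root_name/suite_name/case_name/ic_name, and each path appears once in the second array. PLATFORMS and never_run are names invented for this example.

```ruby
# Platform columns, copied from the ic_query in the question.
PLATFORMS = %w[pts8210 sv6 sv7 pts14k pts22k pts24k spb32 spb64 sde].freeze

# Returns { path => [platforms enabled but never executed] } for every
# entry of alltest that has at least one such platform.
def never_run(runtest, alltest)
  # Collect the executed platforms per path in a single pass.
  # The same path may occur in several runtest rows, so results are merged.
  executed = Hash.new { |h, k| h[k] = [] }
  runtest.each do |row|
    path = row.values_at("root_name", "suite_name", "case_name", "ic_name").join("/")
    executed[path] |= row["executed_platforms"].to_s.split(",")
  end

  # Look each IC up by path instead of relying on matching array positions.
  alltest.each_with_object({}) do |ic, missing|
    enabled = PLATFORMS.select { |p| ic[p] == "t" }
    unrun   = enabled - executed.fetch(ic["path"], [])
    missing[ic["path"]] = unrun unless unrun.empty?
  end
end
```

Each array is walked once and the lookup is a hash access, so the arrays can differ in length and order without the comparison drifting out of step.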