I have a list of music videos, each with a year and the list of artistes who starred in it. This is represented as a nested array:
musicvid_arr =
[["MUSICVID 1", 2014, ["ARTISTE 1", "ARTISTE 2", "ARTISTE 3"]],
["MUSICVID 2", 2014, ["ARTISTE 4", "ARTISTE 1", "ARTISTE 9", "ARTISTE 10"]],
["MUSICVID 3", 1935, ["ARTISTE 2", "ARTISTE 10", "ARTISTE 6"]],
["MUSICVID 4", 2010, ["ARTISTE 1", "ARTISTE 2", "ARTISTE 3"]],
["MUSICVID 5", 2009, ["ARTISTE 4", "ARTISTE 1", "ARTISTE 9", "ARTISTE 2", "ARTISTE 6", "ARTISTE 5"]],
["MUSICVID 6", 2014, ["ARTISTE 18", "ARTISTE 10", "ARTISTE 6", "ARTISTE 2"]],
["MUSICVID 7", 2014, ["ARTISTE 9", "ARTISTE 2", "ARTISTE 3", "ARTISTE 0", "ARTISTE 9"]],
["MUSICVID 8", 2000, ["ARTISTE 8", "ARTISTE 3", "ARTISTE 9", "ARTISTE 11", "ARTISTE 2", "ARTISTE 1"]],
["MUSICVID 9", 2014, ["ARTISTE 21", "ARTISTE 0", "ARTISTE 6"]],
["MUSICVID 10", 2014, ["ARTISTE 12", "ARTISTE 2", "ARTISTE 3"]],
["MUSICVID 11", 2013, ["ARTISTE 14", "ARTISTE 1", "ARTISTE 9", "ARTISTE 12"]],
["MUSICVID 12", 2014, ["ARTISTE 2"]]]
I want to create a method get_artistes that takes the parameters k, r, and musicvid_arr:
def get_artistes(k, r, musicvid_arr)
# the code here
end
where
k: the number of artistes to return
r: the least number of artistes, out of the returned array of k artistes, that must appear in a music video for that music video to be counted as valid
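To make the definition of a valid music video concrete, here is a minimal sketch of how the validity count could be computed for one candidate array of artistes. The name count_valid_videos is hypothetical and only for illustration; it assumes each entry of musicvid_arr has the shape [title, year, artistes] shown above.

```ruby
# Hypothetical helper (not part of the method being asked about): counts the
# music videos in which at least r of the given artistes appear.
def count_valid_videos(artistes, r, musicvid_arr)
  musicvid_arr.count do |_title, _year, cast|
    # Array#& is a set intersection, so an artiste listed twice in a
    # video's cast is only counted once.
    (cast & artistes).size >= r
  end
end
```

It could be called as count_valid_videos(["ARTISTE 1", "ARTISTE 2", "ARTISTE 9"], 2, musicvid_arr) to score one candidate answer.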
This method should return a list of artistes. For example, if k = 3:
["ARTISTE 1", "ARTISTE 2", "ARTISTE 9"]
This image gives a better understanding of how k and r affect the number of valid music videos. With reference to the image above,
# for r = 1 , it would have 11 valid music videos.
# for r = 2 , it would have 6 valid music videos.
# for r = 3 , it would have 3 valid music videos.
Whatever r and k we pass to this method, it should return the array of k artistes that gives the largest number of valid music videos.
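Stated as code, the requirement amounts to an exhaustive search over every group of k artistes. The sketch below is only a reference for the intended result, not an efficient solution: the name get_artistes_brute_force is made up for illustration, and the number of groups it tries grows combinatorially with the number of distinct artistes, so it is only usable on small inputs.

```ruby
# Hypothetical brute-force reference: tries every group of k distinct
# artistes and keeps the group with the most valid music videos.
def get_artistes_brute_force(k, r, musicvid_arr)
  # Collect every distinct artiste across all videos.
  all_artistes = musicvid_arr.flat_map { |_title, _year, cast| cast }.uniq

  best = all_artistes.combination(k).max_by do |candidate|
    # A video is valid if at least r of the candidate artistes appear in it.
    musicvid_arr.count { |_title, _year, cast| (cast & candidate).size >= r }
  end

  # combination(k) yields nothing when k exceeds the number of artistes,
  # in which case max_by returns nil; fall back to an empty array.
  best || []
end
```

When several groups tie for the highest count, this returns just one of them. It can serve as a check on small datasets for any faster approach.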
What would be an effective and efficient approach to tackling this problem?
I attempted to do this with the following algorithm, but I do not think it is the most effective. With big datasets, it takes a very long time to run.
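def get_artistes(k, r, musicvid_arr)
  # Discard videos with fewer than r artistes; they can never be valid.
  musicvid_arr = musicvid_arr.select { |t| t[2].size >= r }
  # Map each artiste to the titles of the videos they appear in,
  # then sort the [artiste, titles] pairs by video count, descending.
  artiste_arr = musicvid_arr.reduce({}) { |a, vs|
    vs[2].each { |v| (a[v] ||= []) << vs[0] }
    a
  }.to_a.sort_by { |x| -x[1].count }
  # Return the k most frequent [artiste, titles] pairs.
  artiste_arr.first(k)
end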