2

I'd like to find the 10 most common questions in an array of 300-500 strings, in Ruby.

An example array:

["HI, I'd like your product. I just have one question. How do I change my password?", "Can someone tell me how I change my password?", "I can't afford this. How do I cancel my account?", "Account cancelation?", "I forgot my password, how do I change my password?", .....]

Basically, I'll have an array of many strings, and I need to extract the question from each one and then find the 10 most common questions across the array.

I've looked around (I checked out n-grams, but they didn't seem very relevant) and have yet to come up with any ideas.

Do you know of any algorithms you'd suggest I look at? Links to a couple of examples would be terrific!

Emil Hajric
  • This is a very challenging task. If you want to do it well, you will have to invest a lot of time into research and experimentation. – Alex D Mar 08 '12 at 19:39

2 Answers

2

I would say the first step is to determine which strings (or substrings) are actually questions. A no-brainer approach is to look for "?", but depending on your requirements you can enhance that, for example by also looking out for "question words" (how, what, why, and so on). That is probably the easier part of your task.
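As a rough illustration of that first step, here is a minimal sketch that splits each message into sentences and keeps the ones that end in "?" or start with a question word. The helper name `extract_questions` and the `QUESTION_WORDS` list are assumptions made up for this example, and the sentence splitting is deliberately naive.

```ruby
# Hypothetical list of "question words"; extend it to suit your data.
QUESTION_WORDS = %w[how what why when where who which can could do does is are].freeze

# Hypothetical helper: return the question-like sentences in a message.
def extract_questions(text)
  # Split after sentence-ending punctuation, keeping the punctuation attached.
  sentences = text.split(/(?<=[.?!])\s+/).map(&:strip).reject(&:empty?)

  sentences.select do |sentence|
    first_word = sentence[/\A\W*(\w+)/, 1].to_s.downcase
    sentence.end_with?('?') || QUESTION_WORDS.include?(first_word)
  end
end

messages = [
  "HI, I'd like your product. I just have one question. How do I change my password?",
  "I can't afford this. How do I cancel my account?",
  "Account cancelation?"
]

# One flat array of candidate questions across all messages.
questions = messages.flat_map { |message| extract_questions(message) }
puts questions
```

Note that a sentence such as "I forgot my password, how do I change my password?" is kept whole, because this sketch only splits on sentence-ending punctuation.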

Once you have a list of strings that are supposedly questions, you need to cluster the similar ones and return the 10 largest bins. The best way would be to combine semantic and syntax-based approaches. You could have a look at this paper, which tackles the problem of finding similarities between two strings and presents some compelling reasons why a dual syntactic-semantic approach is required.
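To make the "cluster and return the largest bins" idea concrete, here is a minimal sketch using a purely lexical measure (Jaccard similarity over word sets) and a greedy single-pass clustering. This is a simplification, not the dual syntactic-semantic method from the paper; the stop-word list, the 0.5 threshold, and the names `tokens`, `jaccard`, `cluster`, and `top_questions` are all assumptions chosen for illustration.

```ruby
require 'set'

# Small, illustrative stop-word list (an assumption; extend for real data).
STOP_WORDS = Set.new(%w[a an the i me my do does how can is to of someone tell]).freeze

# Turn a question into a set of lowercase content words.
def tokens(question)
  Set.new(question.downcase.scan(/[a-z']+/)) - STOP_WORDS
end

# Jaccard similarity: size of the intersection over size of the union.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Greedy clustering: put each question into the first bin whose
# representative (its first member) is similar enough, else open a new bin.
def cluster(questions, threshold: 0.5)
  bins = []
  questions.each do |question|
    toks = tokens(question)
    bin = bins.find { |b| jaccard(b[:tokens], toks) >= threshold }
    if bin
      bin[:members] << question
    else
      bins << { tokens: toks, members: [question] }
    end
  end
  bins
end

# Return [representative question, bin size] for the n largest bins.
def top_questions(questions, n = 10, threshold: 0.5)
  cluster(questions, threshold: threshold)
    .max_by(n) { |b| b[:members].size }
    .map { |b| [b[:members].first, b[:members].size] }
end

questions = [
  "How do I change my password?",
  "Can someone tell me how I change my password?",
  "How do I cancel my account?",
  "Account cancelation?",
  "I forgot my password, how do I change my password?"
]

top_questions(questions).each { |question, count| puts "#{count}x #{question}" }
```

With this sample, the three password questions land in one bin, but "How do I cancel my account?" and "Account cancelation?" do not, because "cancel" and "cancelation" are different tokens. That gap is the kind of thing stemming or a semantic similarity measure would be needed for.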

Hari
0

Not sure about special algorithms, but if I were assigned this task, I'd start by counting how often each word appears:

array = ["my account is locked.", "can i have the account password to my account?", "what's my password?"]

# Split each sentence into an array of its words
sentences = array.map { |sentence| sentence.split(' ') }

# Count how often each word appears across all sentences
word_freq = Hash.new(0)
sentences.each do |words|
  words.each { |word| word_freq[word] += 1 }
end

word_freq.each { |word, count| puts "#{word} appears #{count} times" }  # words are now keys with frequency values

print word_freq.keys  # an array of key words to mess with
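The snippet above splits on spaces only, so "account?" and "account" are counted as different words. A possible variation, sketched here under the assumption of Ruby 2.7+ (for `Enumerable#tally`), strips punctuation and case first and then ranks the words by frequency. It still yields common keywords, not common questions.

```ruby
sentences = [
  "my account is locked.",
  "can i have the account password to my account?",
  "what's my password?"
]

# Lowercase each sentence and keep only runs of letters and apostrophes,
# so "account?" and "account" count as the same word.
word_freq = sentences.flat_map { |s| s.downcase.scan(/[a-z']+/) }.tally

# The 10 most frequent words, highest count first.
top_words = word_freq.max_by(10) { |_word, count| count }

top_words.each { |word, count| puts "#{word} appears #{count} times" }
```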
Michael Leveton
  • This does not really solve anything, it only gives you the most common keywords. The idea is to get the most common phrases or questions. – Emil Hajric Mar 12 '12 at 16:14
  • Are the question strings exact copies verbatim? Or are they variations on functionally equivalent questions? i.e. will they be user generated from a text area or from a drop down menu that you specify? – Michael Leveton Mar 13 '12 at 22:58