
I have a list of keywords, and I need to know whether each one appears within 4 words of the word "access" in each sentence from a list. At the end, I want the total number of keywords matched with "access" for each sentence.

Current output:

['Minority', 'patients', 'often', 'have', 'barrier', 'with', 'their', 'access', 'to', 'healthcare.'] 0
['Rural', 'patients', 'often', 'cite', 'distance', 'as', 'a', 'barrier', 'to', 'access', 'health', 'services.']
['Minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 0
['Minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 1

Desired output:

['Minority', 'patients', 'often', 'have', 'barriers', 'with', 'their', 'access', 'to', 'healthcare.'] 2
['I', 'am', 'an', 'avid', 'user', 'of', 'Microsoft', 'Access', 'databases'] 0
['Rural', 'patients', 'often', 'cite', 'distance', 'as', 'a', 'barrier', 'to', 'access', 'healthcare', 'services.'] 3

accessdesc = ["care", "services", "healthcare", "barriers"]

sentences = ["Minority patients often have barriers with their access to healthcare.",
             "I am an avid user of Microsoft Access databases",
             "Rural patients often cite distance as one of the barriers to access healthcare services."]

for sentence in sentences:
    nummatches = 0
    for desc in accessdesc:
        sentence = sentence.replace(".","") if "." in sentence else ''
        sentence = sentence.replace(",","") if "," in sentence else ''

        if 'access' in sentence.lower() and desc in sentence.lower():
            sentence = sentence.lower().split()

            access_position = sentence.index('access') if "access" in sentence else 0

            desc_position = sentence.index(desc) if desc in sentence else 0

            if abs(access_position - desc_position) < 5:
                nummatches = nummatches + 1
            else:
                nummatches = nummatches + 0
        print(sentence, nummatches)
  • There is a mismatch between your keyword list and your test case: `barriers` in your keyword list cannot match `barrier` as it is written in your test case. Which one is it? Also, should we assume that there can be only one 'access' per sentence? – Luci Jun 18 '19 at 15:07
  • Thank you for pointing out the "barriers" issue. I switched it to singular form. There could be more than one "access" per sentence, yes. – tenebris silentio Jun 18 '19 at 15:09


I think you need to switch the order of your loops from:

for desc in accessdesc:    
    for sentence in sentences: 

to:

for sentence in sentences:
    nummatches = 0 # Resets the count to 0 for each sentence
    for desc in accessdesc: 

This means you check every keyword against a sentence before moving on to the next sentence. Then move the print(sentence, nummatches) statement outside of the second loop so the result is printed once per sentence, after all keywords have been checked.

Something else to look at is the line if 'access' and desc in sentence :. The and combines the expression on its left with the expression on its right and checks that both evaluate to True. The left side is just the string 'access', and a non-empty string is always truthy, so the whole condition reduces to desc in sentence. What you want is to check that both 'access' and desc are in the sentence. I would also recommend ignoring case for this check, as 'access' does not equal 'Access'. So you can rewrite it as:

if 'access' in sentence.lower() and desc in sentence.lower():
    sentence = sentence.lower().split()

Because the if condition now already checks that desc is in the sentence, you don't have to check it again when computing the positions, as you mentioned in the comment.
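A quick interpreter check illustrates why the original condition misbehaves. The sentence below is a made-up example that does not contain "access", yet the original form of the condition still passes:

```python
# Made-up sentence that does NOT contain the word "access"
sentence = "Patients value good healthcare"
desc = "healthcare"

# 'in' binds more tightly than 'and', so this parses as: 'access' and (desc in sentence)
buggy = 'access' and desc in sentence
# Intended check: both words must appear in the lower-cased sentence
fixed = 'access' in sentence.lower() and desc in sentence.lower()

print(bool('access'))  # True: a non-empty string is truthy
print(buggy)           # True, even though "access" is missing
print(fixed)           # False
```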

As a note, your code will likely only work as expected if 'access' and each keyword appear at most once in the sentence, because sentence.index() only finds the first occurrence. Handling repeated words will need extra logic.
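To see the limitation, compare list.index(), which returns only the first match, with a comprehension over enumerate() that collects every position. The word list is a made-up example with two occurrences of 'access':

```python
words = "access to care and access to services".split()

# list.index() stops at the first match
first = words.index("access")

# enumerate() yields (position, word) pairs, so this keeps every match
all_positions = [i for i, w in enumerate(words) if w == "access"]

print(first)          # 0
print(all_positions)  # [0, 4]
```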

EDIT

So your lines replacing the punctuation, e.g. sentence = sentence.replace(".","") if "." in sentence else '' will set the sentence to '' if that punctuation does not exist in the sentence. You could do all your replacements in one line, and then check against the list rather than the sentence string. Also you will want to check the word exists in the split list rather than in the string so it matches only on whole words.

>>> 'it' in 'bit'
True
>>> 'it' in ['bit']
False
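Both points matter for this keyword list. 'care' is a substring of 'healthcare', so a string check reports a match that a whole-word check does not, and the conditional replace turns a sentence without a '.' into ''. The snippet uses the question's second sentence:

```python
sentence = "I am an avid user of Microsoft Access databases"

# Substring test on the raw string vs. whole-word test on the split list
print("care" in "access to healthcare")          # True: 'care' is inside 'healthcare'
print("care" in "access to healthcare".split())  # False: no word equals 'care'

# The original conditional replace discards the sentence when it has no '.'
cleaned = sentence.replace(".", "") if "." in sentence else ''
print(repr(cleaned))  # ''

# Unconditional replace() is safe: it returns the string unchanged when nothing matches
cleaned = sentence.replace(".", "").replace(",", "")
print(cleaned == sentence)  # True
```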

So you could rewrite your code to this:

for sentence in sentences:
    nummatches = 0
    # moved this outside of the second loop as the sentence doesn't change through the iterations
    # not changing the sentence variable, so it can be printed in its original form
    words = sentence.replace(".", "").replace(",", "").lower().split()
    if 'access' not in words:
        print(sentence, nummatches)  # still report 0 for sentences without 'access'
        continue  # No need to proceed if access is not in the sentence
    for desc in accessdesc:
        if desc not in words:
            continue  # Can use continue to go to the next iteration of the loop
        access_position = words.index('access')
        desc_position = words.index(desc)

        if abs(access_position - desc_position) < 5:
            nummatches += 1
            # else statement not required
    print(sentence, nummatches)  # moved outside of the second loop so it prints after checking through all the words

As already mentioned, this only works if 'access' and each keyword appear at most once in the sentence; if one appears more than once, index() only finds the first occurrence. Take a look at this answer and see if you can work a solution into your code. Also take a look at this answer on how to strip punctuation from a string.
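One possible way to handle repeated words is to collect every position of 'access' and count a keyword occurrence as matched if it lies within 4 words of any 'access'. This sketch assumes that counting rule (the question does not say how repeats should be scored) and strips punctuation with str.translate() and string.punctuation; the function name count_access_matches is hypothetical:

```python
import string

# Translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_access_matches(sentence, keywords, window=4):
    """Hypothetical helper: count keyword occurrences within `window` words of any 'access'.

    Assumption: each keyword occurrence is counted at most once, even if it
    is close to several occurrences of 'access'.
    """
    words = sentence.translate(STRIP_PUNCT).lower().split()
    access_positions = [i for i, w in enumerate(words) if w == "access"]
    matches = 0
    for i, w in enumerate(words):
        # any() is False when there is no 'access' at all, giving a count of 0
        if w in keywords and any(abs(i - a) <= window for a in access_positions):
            matches += 1
    return matches


accessdesc = ["care", "services", "healthcare", "barriers"]
sentences = [
    "Minority patients often have barriers with their access to healthcare.",
    "I am an avid user of Microsoft Access databases",
    "Rural patients often cite distance as one of the barriers to access healthcare services.",
]

for sentence in sentences:
    print(sentence, count_access_matches(sentence, accessdesc))
```

With the question's sentences this gives 2, 0 and 3, matching the desired output. If a keyword should instead count only once per sentence no matter how often it appears, you could loop over accessdesc and test whether any of that keyword's positions is close to any 'access' position.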

Mikey Lockwood
  • Thank you for your response. I added some of your suggestions, but I'm still hitting a roadblock. 1) It isn't summing the total matches found per sentence and 2) now I'm getting all 0's. I'm sure I made some kind of newbie mistake, but any help is appreciated. Thanks! – tenebris silentio Jun 18 '19 at 16:16
  • @tenebrissilentio I've updated my answer with some suggested improvements – Mikey Lockwood Jun 19 '19 at 14:46
  • Thank you! I got it to work. I'll need to think about if "access" is in a string twice. Maybe I can split each sentence or something. This gives me a framework for now though. Thanks again. – tenebris silentio Jun 19 '19 at 16:25