Please help me with the following.

  1. I have a list of stop words and a search list. I want to remove the stop words from each string in the search list.
  2. After Step 1, I want to match each remaining word against the dictionary values. If a word matches a value, it should be replaced with the corresponding dictionary key, and the words are then joined back together.

So far I have done Step 1 (see the code below), and it works well:

    stopwords=['what','hello','and','at','is','am','i']
    search_list=['where is north and northern side',
                 'ask in the community at western environmental',
                 'my name is alan and i am coming from london southeast']
    dictionary = {'n': ['north','northern'],
                  's': ['south','southern'],
                  'e': ['east','eastern'],
                  'w': ['west','western'],
                  'env': ['environ.','enviornment','environmental']}

    result = [' '.join(w for w in place.split() if w.lower() not in stopwords)
                for place in search_list]

    print (result)

I need the final output below to complete Step 2. What should I change or add in the one-line comprehension above to get it? Any alternative method is also welcome.

['where n n side', 'ask in the community w env', 'my name alan coming from london s']
martineau
hAI CHI
  • It would be clearer if you inverted the keys and values in your dictionary, i.e. `{ 'north': 'n', 'northern': 'n' ... }` etc. It would make your code easier to maintain as well. – Maurice Reeves Aug 11 '18 at 17:49

1 Answer

You have to "reverse" your dictionary, as the lookup is the other way round:

rev_dict = {v:k for k,l in dictionary.items() for v in l}

Now each replacement is a single lookup:

>>> rev_dict
{'east': 'e',
 'eastern': 'e',
 'enviornment': 'env',
 'environ.': 'env',
 'environmental': 'env',
 'north': 'n',
 'northern': 'n',
 'south': 's',
 'southern': 's',
 'west': 'w',
 'western': 'w'}

Split each string from your Step 1 `result` again (you could have kept the lists of words as they were to avoid this second split) and replace each word, using the word itself as the default when there is no match:

result = [" ".join([rev_dict.get(x,x) for x in s.split()]) for s in result]

Or combine stop-word removal and replacement in a single pass:

stopwords={'what','hello','and','at','is','am','i'}  # define as a set for fast lookup
result = [" ".join([rev_dict.get(x,x) for x in s.split() if x not in stopwords]) for s in search_list]

In both cases, the result is as follows ('southeast' is left unchanged because it is not among the dictionary values):

['where n n side', 'ask in the community w env', 'my name alan coming from london southeast']
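
For reference, the steps above can be put together into one self-contained sketch. The helper name `normalize` is an assumption made here for illustration, and the stop-word test uses `w.lower()` as in the question's code rather than the plain `x not in stopwords` check shown above:

```python
stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}  # set for fast lookup
search_list = ['where is north and northern side',
               'ask in the community at western environmental',
               'my name is alan and i am coming from london southeast']
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}

# Invert the mapping so each word points at its abbreviation.
rev_dict = {word: key for key, words in dictionary.items() for word in words}


def normalize(phrase):
    """Hypothetical helper: drop stop words, then abbreviate known words."""
    kept = (w for w in phrase.split() if w.lower() not in stopwords)
    # Fall back to the word itself when it has no abbreviation.
    return " ".join(rev_dict.get(w, w) for w in kept)


result = [normalize(s) for s in search_list]
print(result)
```

Only whole words are looked up, so a compound such as 'southeast' passes through untouched unless it is added to the dictionary.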
Jean-François Fabre
  • NameError: name 'result' is not defined – kantal Aug 11 '18 at 18:54
  • Of course, `result` is the result from the OP's code. I've seen your edit. I had the same idea, but I'd like to add the alternative (all-in-one code) afterwards. You normally should not propose better solutions in edits (comments are better for this), so I should reject it, but I'm just going to improve it. – Jean-François Fabre Aug 11 '18 at 19:02
  • @Jean-FrançoisFabre Thanks for your wonderful help. I need one more thing. After completing the two steps above, I want the output to show only the first 20 characters (including spaces). E.g. ['my name alan coming from london southeast'] should be printed as ['my name alan coming from londo']. An answer would be a great help. – hAI CHI Aug 17 '18 at 17:21
  • just slice the resulting strings: `[x[:20] for x in result]` – Jean-François Fabre Aug 17 '18 at 17:25
  • @Jean-FrançoisFabre Along with the above output, I want to remove punctuation (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~) from search_list, the same way we remove stop words. What should I change in the above code? Is this correct? punct=[!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~] result = [" ".join([rev_dict.get(x,x) for x in s.split() if x not in zip(stopwords,punct)]) for s in search_list] – hAI CHI Aug 28 '18 at 20:06
  • It's not clear from the comments. I suggest looking into this: https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string-in-python and, if that doesn't solve it, asking a new question. Follow-up questions are discouraged on the site. – Jean-François Fabre Aug 28 '18 at 20:22
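
One possible way to fold punctuation stripping into the same pass is sketched below, using `str.translate` with `string.punctuation` from the standard library. The helper name `clean` and the lookup order (raw token first, then the stripped token) are assumptions made for illustration; the raw-first lookup is there so that a dictionary value containing punctuation, such as 'environ.', can still match.

```python
import string

stopwords = {'what', 'hello', 'and', 'at', 'is', 'am', 'i'}
dictionary = {'n': ['north', 'northern'],
              's': ['south', 'southern'],
              'e': ['east', 'eastern'],
              'w': ['west', 'western'],
              'env': ['environ.', 'enviornment', 'environmental']}
rev_dict = {word: key for key, words in dictionary.items() for word in words}

# Translation table that deletes every ASCII punctuation character.
strip_punct = str.maketrans('', '', string.punctuation)


def clean(phrase):
    """Hypothetical helper: strip punctuation, drop stop words, abbreviate."""
    out = []
    for raw in phrase.split():
        bare = raw.translate(strip_punct)
        # Skip tokens that were only punctuation, and stop words.
        if not bare or bare.lower() in stopwords:
            continue
        # Try the raw token first so keys like 'environ.' still match.
        out.append(rev_dict.get(raw, rev_dict.get(bare, bare)))
    return " ".join(out)


print([clean(s)[:20] for s in ['hello, where is north?', 'western environ.']])
```

The `[:20]` slice in the last line applies the truncation suggested in the earlier comment.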