
I have a nested list of words with lots of duplicates, and a list `uniquewords` that holds each distinct word from `words` once. For each unique word, I want the index of its first occurrence in `words`. For instance:

words = [['apple',5],['apple',7],['apple',8],['pear',9], ['pear',4],
         ['grape',6],['baby',3],['baby',2],['baby',87]]

uniquewords = ['apple','pear','grape','baby']

I want the final result to be:

[0,3,5,6]

I tried using enumerate(), because list.index() only matches whole sublists, not the words inside them.

>>> a = []
>>> for i in range(len(uniquewords)):
...     for index,sublist in enumerate(words):
...         if uniquewords[i] in sublist:
...             a.append(min(index)) 
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
TypeError: 'int' object is not iterable

I suspect this doesn't work because I'm not telling Python to append the index for each of the uniquewords. How would I get there?
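The traceback itself comes from `min(index)`: `index` is a single integer produced by `enumerate()`, and `min()` expects an iterable. A minimal sketch that keeps the same nested-loop idea but first gathers every matching index for a word and only then calls `min()` (the name `matches` is illustrative):

```python
words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

a = []
for word in uniquewords:
    # Collect every index whose sublist contains this word...
    matches = [index for index, sublist in enumerate(words) if word in sublist]
    # ...then take the smallest one; min() needs an iterable, not a single int.
    a.append(min(matches))

print(a)  # [0, 3, 5, 6]
```

This scans all of `words` once per unique word, which is fine for small inputs; the answers below avoid the repeated scans.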

jpp
song0089
  • I started down the road to solving this in the lists of lists format, and I ended up converting things in place to dictionaries... It would seem to me that if you used a dictionary, say, that mapped your tokens to a list of their occurrences, you'd have a better time. `k = { 'apple': [5, 7, 8]}` and `min(k['apple'])` would be a fine replacement. Thoughts? – burling Oct 09 '18 at 23:29

3 Answers


One way is to build a dictionary that maps each word to its index with a simple for loop, adding a word only if it does not already exist in the dictionary. Then use map to look up the index of each word in uniquewords.

d = {}
for idx, (word, _) in enumerate(words):
    if word not in d:
        d[word] = idx

res = list(map(d.__getitem__, uniquewords))

print(res)

[0, 3, 5, 6]
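The "only if the word does not already exist" check can also be expressed with `dict.setdefault`, which stores a value only when the key is missing. A sketch of that variant, using a list comprehension in place of `map(d.__getitem__, ...)`:

```python
words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

d = {}
for idx, (word, _) in enumerate(words):
    # setdefault only stores idx the first time a word is seen,
    # so each key keeps the index of its first occurrence.
    d.setdefault(word, idx)

res = [d[word] for word in uniquewords]
print(res)  # [0, 3, 5, 6]
```

Like `d.__getitem__`, `d[word]` raises `KeyError` for a word that never appears in `words`; `d.get(word)` would return `None` instead.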
jpp

Per my comment:

# dictionary comprehension... make an empty list entry for each word
k = {word[0]:list() for word in words}
# iterate through the list appending the word occurrence list entries
for word in words:
    k[word[0]].append(word[1])
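Storing the numbers (5, 7, 8, ...) means `min(k['apple'])` is 5, not a list index, which is why this doesn't produce `[0, 3, 5, 6]`. A sketch of the same occurrence-list idea that records list positions instead; the name `positions` and the use of `collections.defaultdict` are illustrative choices, not part of this answer:

```python
from collections import defaultdict

words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

# Map each word to the list of positions where it occurs in `words`.
positions = defaultdict(list)
for idx, (word, _) in enumerate(words):
    positions[word].append(idx)

# The first occurrence is the smallest position.
res = [min(positions[word]) for word in uniquewords]
print(res)  # [0, 3, 5, 6]
```

Because positions are appended in increasing order, `positions[word][0]` would give the same result as `min(...)`.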
burling
  • I tried to decipher how this is meant to get `[0, 3, 5, 6]` but failed. Can you give a working example? – jpp Oct 10 '18 at 00:20
  • Oh, it won't! I totally misread the question. Thanks for pointing that out, @jpp – burling Oct 10 '18 at 00:31
  • @mburling Yea this doesn't answer the question, but can I ask about the first line -- how does k = {word[0]:list() for word in words} get the unique word? (I know it does, I want to know how) – song0089 Oct 10 '18 at 00:40
  • @song0089 It's a lazy one-liner that creates a dictionary. It's equivalent to creating an empty dictionary and, for each element of the words list we're iterating over, inserting a key that maps to an empty list. The dictionary enforces uniqueness. – burling Oct 10 '18 at 00:54
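To make that equivalence concrete, a sketch of the explicit loop the comprehension stands for. It assumes Python 3.7+, where dicts keep insertion order, so the keys come out in first-seen order:

```python
words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]

# Explicit-loop equivalent of: k = {word[0]: list() for word in words}
k = {}
for word in words:
    # Assigning to a key that already exists just replaces its value,
    # so each distinct word ends up as exactly one key.
    k[word[0]] = list()

print(list(k))  # ['apple', 'pear', 'grape', 'baby']
```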

Because identical words are adjacent in this list, we can use itertools.groupby with key=lambda x: x[0] to group the sublists by word, then take the index of the first item in each group, list(g)[0].

from itertools import groupby

res = [words.index(list(g)[0]) for k, g in groupby(words, key=lambda x: x[0])]

Expanded:

res = []
for k, g in groupby(words, key=lambda x: x[0]):
    res.append(words.index(list(g)[0]))

print(res)
# [0, 3, 5, 6]
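Two notes on `groupby`, sketched below. Passing `enumerate(words)` instead of `words` keeps each index alongside its item, which avoids the extra linear scan that `words.index()` makes for every group. And `groupby` only merges *adjacent* items with equal keys, so this approach relies on each word's duplicates being contiguous, as they are here. The names `pair`, `mixed` and `keys` are illustrative:

```python
from itertools import groupby

words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]

# Group (index, sublist) pairs by the word, so each index travels with its item.
# next(g) is the first pair in each group; [0] is its index.
res = [next(g)[0] for _, g in groupby(enumerate(words), key=lambda pair: pair[1][0])]
print(res)  # [0, 3, 5, 6]

# Caveat: non-adjacent duplicates form separate groups.
mixed = [['apple', 1], ['pear', 2], ['apple', 3]]
keys = [k for k, _ in groupby(mixed, key=lambda x: x[0])]
print(keys)  # ['apple', 'pear', 'apple']
```

The result also follows the order in which words first appear in `words`, which happens to match `uniquewords` here.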

Alternatively, for each unique word we can scan the sublists, append the index of the first sublist that contains it, and then break. The break stops the inner loop from appending the indexes of later duplicates.

res = []
for i in uniquewords:
    for j in words:
        if i in j:
            res.append(words.index(j))
            break
print(res)
# [0, 3, 5, 6]
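The same first-match search can be written with `next()` over a generator expression: `next()` stops at the first match, which plays the role of `break`, and taking the index from `enumerate()` avoids the second pass that `words.index(j)` makes. Comparing only the word field with `==`, rather than testing `i in j` against the whole sublist, is an adjustment in this sketch, not part of the answer:

```python
words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

# For each unique word, next() returns the index of the first sublist
# whose word field equals it, then stops consuming the generator.
res = [next(idx for idx, (word, _) in enumerate(words) if word == target)
       for target in uniquewords]
print(res)  # [0, 3, 5, 6]
```

`next()` raises `StopIteration` if a word never appears; supplying a default, as in `next((...), None)`, returns `None` instead.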
vash_the_stampede