Slicing list with different string matching conditions

Question

I'd like to slice a list of strings based on substrings that may be contained in its elements:

l = ['Some long text', 'often begins', ' with ',
     'impenetrable fog ', 'which ends', ' somewhere further']
startIndex = [u for u, v in enumerate(l) if 'begins' in v][0]
finalIndex = [u for u, v in enumerate(l) if 'ends' in v][0]

so that I'd get:

' '.join(l[startIndex:finalIndex]) == 'often begins with impenetrable fog'

My main problem is that the start and end conditions used to get the indexes are different and should be configurable (basic substring containment as above, regexes, or other methods).

The first and last elements might need to be excluded, but I guess this is a matter of adjusting the indexes by 1. My code works in the ideal cases but often fails, because the structure and contents of l are not very predictable. If one or both conditions match no element, the final string should be None.

Are comprehensions the right tool here, or should I map a lambda function that applies both conditions?

Can you please provide a clear example? Do you want all the `strings` in the `list` between the `start` and `end` strings? For example: `['How May', 'I help', 'You with', 'Your Problem', 'Andreas']`, with start = `help` and end = `Problem`. What should the required output be? — Anonymous, Aug 25 '16 at 09:31
Yes I'd like to get `'You with Your Problem'` as output with your example. Thanks for replying anyway! — Andreas Lawrence, Aug 25 '16 at 09:45
By the way, what did you already try with map and lambda functions? — Kruupös, Aug 25 '16 at 20:46
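For the example discussed in the comments, the requested output corresponds to excluding the start match and including the end match, i.e. shifting both slice bounds by one, as the question suggests. A minimal sketch, assuming both words are present (a bare next() raises StopIteration otherwise):

```python
# Example list and trigger words taken from the comments above
l = ['How May', 'I help', 'You with', 'Your Problem', 'Andreas']

start = next(i for i, s in enumerate(l) if 'help' in s)     # index 1
end = next(i for i, s in enumerate(l) if 'Problem' in s)    # index 3

# Exclude the start match (+1 on the left) and include the end match
# (+1 on the right, because the end of a slice is exclusive)
result = ' '.join(l[start + 1:end + 1])
print(result)  # You with Your Problem
```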

score 1 · Answer 1 · answered Aug 25 '16 at 09:48

Try:

l = ['Some long text', 'often begins', 'with', 'impenetrable fog', 'which ends', 'somewhere further']

def index(phrases, word, default):
    """Return the index of the first phrase in `phrases` that contains `word`.

    If no phrase contains `word`, return `default`.
    """
    for i, s in enumerate(phrases):
        if word in s:
            return i
    return default

startIndex = index(l, "long", -1)
finalIndex = index(l, "somewhere", len(l))

print(' '.join(l[startIndex+1:finalIndex]))
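The question also asks for conditions other than plain substring tests (regexes, etc.). One way to generalize the index() helper above is to pass predicate functions instead of a word. This is only a sketch: find_index and slice_between are hypothetical names, the end condition is searched only after the start match, and stripping each piece is an assumption made to reproduce the single-spaced output from the question.

```python
import re

def find_index(phrases, predicate, default=None):
    """Return the index of the first phrase for which predicate(phrase) is true."""
    return next((i for i, s in enumerate(phrases) if predicate(s)), default)

def slice_between(phrases, is_start, is_end):
    """Join the phrases from the start match up to (not including) the end match.

    Returns None if either condition never matches, or if the end
    condition only matches before the start.
    """
    start = find_index(phrases, is_start)
    if start is None:
        return None
    # Only look for the end condition after the start match
    end = find_index(phrases[start + 1:], is_end)
    if end is None:
        return None
    return ' '.join(s.strip() for s in phrases[start:start + 1 + end])

l = ['Some long text', 'often begins', ' with ',
     'impenetrable fog ', 'which ends', ' somewhere further']

# Substring test for the start, regex test for the end
result = slice_between(l, lambda s: 'begins' in s,
                       lambda s: re.search(r'\bends\b', s) is not None)
print(result)  # often begins with impenetrable fog
```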

Kruupös · Answer 2 · answered Aug 25 '16 at 20:32

Or with next():

l = ['Some long text', 'often begins', ' with ', 'impenetrable fog ',
     'which   ends', ' somewhere further']

# next() returns the first index produced, or the default (None) if there is none
startIndex = next((u for u, v in enumerate(l) if 'begins' in v), None)
finalIndex = next((u for u, v in enumerate(l) if 'ends' in v), None)

if startIndex is not None and finalIndex is not None and finalIndex > startIndex:
    sentence = ' '.join(l[startIndex:finalIndex])
else:
    sentence = None
print(sentence)

next() is similar to a list comprehension, except that it doesn't return a list but the first element it finds. If it doesn't find anything, it returns the optional default (here None). None is used rather than 0 because 0 is also a valid index: a match on the first element would otherwise look like "not found".

This way, if there is no 'begins' or no 'ends' in your list, sentence is simply None. It also lets you check whether 'ends' comes before 'begins'.
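A small illustration of next() with a default, and of why the default must be distinguishable from a real index (0 is both a valid index and falsy):

```python
words = ['often begins', 'which ends', 'begins again']

# next() pulls only the first value from the generator expression
first = next((i for i, w in enumerate(words) if 'begins' in w), None)
print(first)    # 0

# With no match, the default is returned instead of raising StopIteration
missing = next((i for i, w in enumerate(words) if 'fog' in w), None)
print(missing)  # None

# A match at index 0 is falsy, so compare against the sentinel explicitly
print(first is not None)  # True
print(bool(first))        # False
```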

I also love list comprehensions, but sometimes what you need isn't a list.

SOLUTION FOR ADVANCED USERS:

The problem with using two separate comprehensions is that each one scans the list from the beginning, so it fails when 'ends' appears before 'begins':

l = ['Some long text ends here',  'often begins', ' with ', 'which   ends']
                     ^^^

To avoid this, you can use a generator with send() so that the list is iterated only once.

def get_index(trigger_word):
    for u, v in enumerate(l):
        if trigger_word in v:
            trigger_word = yield u

gen = get_index('begins')
startIndex = gen.send(None)
finalIndex = gen.send('ends')

Here, yield hands back the index without exiting the function, and each send() resumes the loop where it stopped while swapping in the next trigger word.
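The key detail is that `x = yield value` both hands `value` out and receives whatever is passed to the following send(). A toy generator (echo is a made-up name, unrelated to the solution itself) makes this visible:

```python
def echo():
    """Toy generator: each value passed to send() is reported back by the next yield."""
    received = yield 'ready'
    while True:
        received = yield f'got {received}'

gen = echo()
first = gen.send(None)      # send(None) starts the generator, like next()
second = gen.send('begins')
third = gen.send('ends')
print(first)   # ready
print(second)  # got begins
print(third)   # got ends
```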

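This is better, but if there is no 'begins' or no 'ends' in the list, send() raises a StopIteration exception. To avoid this, the generator can fall into an infinite loop that keeps yielding None once the list is exhausted (None rather than 0, since 0 is a valid index). The complete solution then becomes:

def get_index(l, trigger_word):
    for u, v in enumerate(l):
        if trigger_word in v:
            # Hand back the index and receive the next trigger word
            trigger_word = yield u
    # List exhausted: keep answering None instead of raising StopIteration
    while True:
        yield None

def concat_with_trigger_words(l):
    gen = get_index(l, 'begins')
    startIndex = gen.send(None)      # start the generator: first 'begins'
    finalIndex = gen.send('ends')    # first 'ends' after that
    if startIndex is None or finalIndex is None:
        return None
    return ' '.join(l[startIndex:finalIndex])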

# Here are some lists for your future unit tests ;)

l_original = ['Some long text here',  'often begins', ' with ',
              'impenetrable fog ', 'which   ends', ' somewhere further']
l_start_with_ends = ['ends',  'often begins', ' with ',
                     'impenetrable fog ', 'which   ends', 'begins']
l_none = ['random', 'word']
l_without_begin = ['fog', 'ends here']
l_without_end = ['begins', 'but never', '...']

print(concat_with_trigger_words(l_original)) # often begins  with  impenetrable fog 
print(concat_with_trigger_words(l_start_with_ends)) # often begins  with  impenetrable fog 
print(concat_with_trigger_words(l_none)) # None
print(concat_with_trigger_words(l_without_end)) # None
print(concat_with_trigger_words(l_without_begin)) # None
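The same single-pass behaviour can also be written without send(), as an ordinary loop that first searches for the start word and then for the end word. A sketch, with concat_between as a hypothetical name; because it tracks the start with None rather than a truthiness check, a start match at index 0 is handled too:

```python
def concat_between(phrases, start_word='begins', end_word='ends'):
    """Single pass: look for start_word, then for end_word after it.

    Returns the joined phrases from the start match up to (not including)
    the end match, or None if either word is missing.
    """
    start = None
    for i, s in enumerate(phrases):
        if start is None:
            if start_word in s:
                start = i          # found the start; now look for the end
        elif end_word in s:
            return ' '.join(phrases[start:i])
    return None                    # start or end never matched

print(concat_between(['ends', 'often begins', ' with ', 'which   ends']))
print(concat_between(['begins', 'x', 'ends']))  # begins x
print(concat_between(['random', 'word']))       # None
```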

score 1 · Answer 3 · answered Aug 25 '16 at 09:55

>>> l = ['Some long text', 'often begins', ' with ',
...      'impenetrable fog ', 'which ends', ' somewhere further']
>>> start, end = 'begins', 'ends'
>>> key_index = {'start': {'word': start, 'index': -1},
...              'end': {'word': end, 'index': -1}}
>>> for i, val in enumerate(l):
...     if key_index['start']['word'] in val:
...         key_index['start']['index'] = i
...     elif key_index['end']['word'] in val:
...         key_index['end']['index'] = i
...
>>> start_index, end_index = key_index['start']['index'], key_index['end']['index']
>>> my_list = l[start_index+1:end_index] if start_index >=0 and end_index >= 0 and start_index+1 < end_index else None
>>> my_list
[' with ', 'impenetrable fog ']
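The loop above keeps overwriting the indexes, so it ends up with the last match of each word, and it returns a list. If the first 'begins' and the first 'ends' after it are wanted, joined into a string as in the question, a variant might look like this (between is a hypothetical name; stripping the pieces is an assumption to get single spaces):

```python
def between(phrases, start_word, end_word):
    """Join the phrases strictly between the first start_word match and the
    first end_word match after it; return None if there is nothing between."""
    start_index = end_index = -1
    for i, val in enumerate(phrases):
        if start_index < 0:
            if start_word in val:
                start_index = i
        elif end_word in val:
            end_index = i
            break                  # stop at the first end after the start
    if start_index < 0 or end_index < 0 or start_index + 1 >= end_index:
        return None
    return ' '.join(s.strip() for s in phrases[start_index + 1:end_index])

l = ['Some long text', 'often begins', ' with ',
     'impenetrable fog ', 'which ends', ' somewhere further']
print(between(l, 'begins', 'ends'))  # with impenetrable fog
```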
