I have a string: "Are you ok? [Hello Hello Hello]. Yes I am! [Bye Bye Bye]"

I need to return ['Hello Hello Hello', 'Bye Bye Bye'] in a list.

A regular expression seems like the easiest approach. I have tried findall(), but it only returns the first word (Hello and Bye) rather than the entire string inside the brackets (Hello Hello Hello or Bye Bye Bye). I have also tried finditer(), but that returns only the first word too.

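import re

text = "Are you ok? [Hello Hello Hello]. Yes I am! [Bye Bye Bye]"

def find_words(text):
    # Group 1 captures a word of 3+ characters; \1 requires it to appear once more.
    p = re.compile(r'(\w{3,})\s\1')
    for match in p.finditer(text):
        print(match.groups(0))  # prints only the captured group, not the full match

find_words(text)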

Expected result: ['Hello Hello Hello', 'Bye Bye Bye']. When I run the code, I get only 'Hello' and 'Bye'.

1 Answer

You could try this regex:

\b(\w+)\s+\1\b

It is taken from the question "Regular Expression For Consecutive Duplicate Words".
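Note that a pattern with a capturing group only gives you the group (the first word) back from findall(), and \1 on its own matches a single repeat. A minimal sketch of one way around both issues, assuming you want every run of the same word repeated two or more times: wrap the repeat in a non-capturing group with +, and read the whole match with group(0). The function name find_repeated_runs is just an illustrative choice.

```python
import re

text = "Are you ok? [Hello Hello Hello]. Yes I am! [Bye Bye Bye]"

def find_repeated_runs(text):
    # (\w+) captures a word; (?:\s+\1\b)+ requires one or more further copies of it.
    pattern = re.compile(r'\b(\w+)(?:\s+\1\b)+')
    # group(0) is the entire matched run, not just the captured first word.
    return [m.group(0) for m in pattern.finditer(text)]

print(find_repeated_runs(text))  # ['Hello Hello Hello', 'Bye Bye Bye']
```

If you only care about what is inside the square brackets, regardless of repetition, a simpler pattern such as r'\[([^\]]*)\]' with findall() may be enough.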

SamHDev