I have a list of strings:

te = ['Published 10 December 2020',
      'Published 10 November 2020\n    Last updated 30 November 2020\n      — see all updates',
      'Published 1 October 2020\n    Last updated 21 October 2020\n      — see all updates',
      'Published 23 July 2020\n    Last updated 1 December 2020',
      'Published 1 March 2021\n    Last updated 21 October 2020\n      — see all updates']

I want to extract only the date that comes after "Published". The result I want from this list is:

['10 December 2020',
 '10 November 2020',
 '1 October 2020',
 '23 July 2020',
 '1 March 2021']

How can I do that? Can a regex be used for this?

Wiktor Stribiżew
Talib Daryabi

1 Answer

Yes, you can use a regex. Try this one:

import re

# Capture only the date that follows "Published":
# a 1-2 digit day, a month name, and a four-digit year.
pattern = r'Published\s+(\d{1,2}\s+\w+\s+\d{4})'
regex = re.compile(pattern)

te = ['Published 10 December 2020',
      'Published 10 November 2020\n    Last updated 30 November 2020\n      — see all updates',
      'Published 1 October 2020\n    Last updated 21 October 2020\n      — see all updates',
      'Published 23 July 2020\n    Last updated 1 December 2020',
      'Published 1 March 2021\n    Last updated 21 October 2020\n      — see all updates']

ans = []

for x in te:
    # search() returns the first match, or None if "Published <date>" is absent
    match = regex.search(x)
    if match:
        ans.append(match.group(1))

print(ans)

Output:

['10 December 2020', '10 November 2020', '1 October 2020', '23 July 2020', '1 March 2021']
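
If some entries might not contain a "Published" date at all, it can help to keep the output aligned with the input instead of skipping them. The following is a minimal sketch of that idea, not the only way to do it: the helper name `extract_published_dates`, the constant `PUBLISHED_RE`, and the sample list are hypothetical and chosen only for illustration. It assumes English month names written with ASCII letters.

```python
import re

# Hypothetical names for illustration. The single capture group holds only the
# date that follows "Published" (assumes ASCII month names).
PUBLISHED_RE = re.compile(r'Published\s+(\d{1,2}\s+[A-Za-z]+\s+\d{4})')


def extract_published_dates(items):
    """Return the date after 'Published' for each string, or None if absent."""
    dates = []
    for text in items:
        match = PUBLISHED_RE.search(text)
        # Keep a None placeholder so results stay aligned with the input list.
        dates.append(match.group(1) if match else None)
    return dates


sample = ['Published 10 December 2020',
          'Published 23 July 2020\n    Last updated 1 December 2020',
          'Last updated 21 October 2020']

print(extract_published_dates(sample))
# ['10 December 2020', '23 July 2020', None]
```

Because the pattern is anchored on the word "Published", the "Last updated" dates are never captured, and an entry without a published date yields `None` rather than raising an error.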
Hirusha Fernando