
My data look like this:

data = "2020-11-16 18:07:12,752 part1: 3, part2: [0.43732753 0.53800666 0.5945345  0.4680181  0.39867717 0.6513964
 0.6692839  0.5011144  0.41480267 0.40932187 0.40816575 0.44242284
 0.56950533 0.54023486 0.46301603 0.46460354], part3: 0.3253246169133328
2020-11-16 18:07:23,940 part1: 4, part2: [0.4273718  0.5393375  0.591234   0.46008328 0.3886507  0.658916
 0.7164184  0.37173408 0.42199427 0.5302575  0.34260145 0.5678605
 0.5731818  0.5455015  0.45556515 0.47291118], part3: 0.37686885359458105"

I want to extract everything after the timestamp, namely from part1 to the end of part3. The desired output for the first record should look like:

output = "part1: 3, part2: [0.43732753 0.53800666 0.5945345  0.4680181  0.39867717 0.6513964
     0.6692839  0.5011144  0.41480267 0.40932187 0.40816575 0.44242284
     0.56950533 0.54023486 0.46301603 0.46460354], part3: 0.3253246169133328"

But the records span multiple lines, and that breaks my pattern. My current code looks like this:

output = re.findall(r"part1:(.*)\d{4}-\d{2}", data,re.DOTALL)[0]

I tried every method I could find, including all of those in this post: How do I match any character across multiple lines in a regular expression?
Namely, I tried replacing (.*) with ([\s\S]*) or (.|\n|\r*) or ((?s).*), and combinations of these with the re.DOTALL and re.MULTILINE flags. None of them worked. Could anyone help me out?

Update: I tried part1:((.*)\n)*?(.*)part3(.*) and it worked in the VS Code find box, but it doesn't work in Python.
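
For reference, here is a minimal sketch of one pattern that might do what is described above. It assumes every record contains "part1:" and ends with "part3:" followed by a plain decimal number; the sample string is a shortened copy of the data above, and the names pattern and records are only illustrative. It uses a non-greedy .*? together with re.DOTALL so the match can cross line breaks but stops at the first part3 value instead of running on to a later record.

```python
import re

# Shortened copy of the sample data (arrays truncated for brevity).
data = """2020-11-16 18:07:12,752 part1: 3, part2: [0.43732753 0.53800666
 0.6692839  0.5011144], part3: 0.3253246169133328
2020-11-16 18:07:23,940 part1: 4, part2: [0.4273718  0.5393375
 0.7164184  0.37173408], part3: 0.37686885359458105"""

# Non-greedy match from "part1:" up to the number that follows "part3:".
# re.DOTALL lets "." match the newlines inside the bracketed list.
# Assumption: the part3 value consists only of digits and dots.
pattern = re.compile(r"part1:.*?part3:\s*[\d.]+", re.DOTALL)

# One string per record, each starting at "part1:" and ending at the part3 value.
records = pattern.findall(data)

for record in records:
    print(record)
    print("---")
```

With no capture group in the pattern, re.findall returns the whole match for each record, so records[0] would be the first record without its timestamp. If the part3 value can appear in other forms (for example scientific notation), the final character class would need adjusting.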
