1

My objective is to extract a date string and following characters using Regex in Google Sheets (sheets function: regexextract) where the string is the last line of a cell and starts with a date format "yyyy-DD-MM" followed by ":". So the RegExpression that I currently have looks like: \d{4}-\d{2}-\d{2}:.+

This works fine but it returns the first match. Instead I want to start at the end of the cell and extract the last match when there are multiple date strings. This is because the contents are stored ascending by date inside the cell.

Sample cell:

2020-05-20: Status update blah blah
2020-05-27: PO Issued blah blah

Result requested: I want the end result to be a string starting with date and the characters that follow "2020-05-27: PO Issued blah blah" which is the last result. However I always get the first match which in the example above is: "2020-05-20: Status update blah blah"

Also I'm doing this in google sheets using regexextract() which shouldn't make a difference in the regex but just wanted to mention it.

Edit: I found out Sheets is using RE2 so I guess it did make a difference.

Freddy
  • 154
  • 2
  • 10
  • 1
    It makes a huge difference that you are using RE2. But try `(?s).*\n(\d{4}-\d{2}-\d{2}:.*)` if you have multiline string inside a cell, and you want to get the date that is at the start of the last line. Or, `(?s).*\n(\d{4}-\d{2}-\d{2}:[^\r\n]*)` – Wiktor Stribiżew May 29 '20 at 18:50
  • this should do it `REGEXEXTRACT(A1,"(?:.+)(\d{4}-\d{2}-\d{2})")` – Maher Fattouh May 29 '20 at 19:08
  • @WiktorStribiżew what you posted here in the comments works. Thanks so much. I know you mentioned it was a duplicate but the other post that I followed didn't work but I didn't realize Sheets uses RE2. This is the correct answer. – Freddy Jun 03 '20 at 11:34
  • Yeah, sorry it was downvoted (not by me, there was some hater here). I did not quite get your requirements, if you really want to only match the date that is at the beginning of the last line, the question is not a duplicate. Please precise your requirements in the question. – Wiktor Stribiżew Jun 03 '20 at 11:40
  • 1
    @WiktorStribiżew I revised my question and note I hope that clarifies that it's a different issue then the one marked as similar dupe. Thanks – Freddy Jun 03 '20 at 12:33

1 Answers1

1

You may use

=REGEXEXTRACT(A1, "(?m)^\d{4}-\d{2}-\d{2}:.*\z")

See the RE2 regex demo and the Google Sheets screenshot:

enter image description here

The (?m)^\d{4}-\d{2}-\d{2}:.*\z regex matches

  • (?m) - a MULTILINE modifier that makes ^ match start of a line and $ match end of a line
  • ^ - start of a line
  • \d{4}-\d{2}-\d{2}:.* - 4 digits, -, 2 digit, -, 2 digits, : and then rest of the line since . does not match line break chars by default
  • \z - the very end of the string (it is not affected by the (?m) modifier).

Note that (?s).*\n(\d{4}-\d{2}-\d{2}:.*) I suggested in the top comment below the question will match the last lines starting with a date, see a regex demo.

Wiktor Stribiżew
  • 484,719
  • 26
  • 302
  • 397