
Trying to capture the timestamp in this log event (for Splunk)

172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | 2020-06-10 17:43:18,951 | "POST /rest/build-status/latest/commits/stats HTTP/1.1" | "http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | 200 | 345 | 431 | - | 5 | 3dk4qm | 

With the TIME_PREFIX setting, Splunk looks for a match of the specified regular expression before attempting to extract a timestamp.

TIME_PREFIX = <regular expression>  

By default, Splunk tries to read the timestamp from the start of the line, but this line starts with an IP address. Hence the need for a regex that matches everything up to and including the fourth pipe, which then serves as the TIME_PREFIX.
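A rough Python sketch of the idea, assuming Splunk's behaviour can be approximated as "find where the TIME_PREFIX match ends, then parse a timestamp from there". This is not Splunk's implementation: the helper `parse_timestamp`, the format string `%Y-%m-%d %H:%M:%S,%f` and the 23-character timestamp length are assumptions chosen to fit this one sample event, and the prefix pattern is the one suggested in the answer below.

```python
import re
from datetime import datetime

# Sample event from the question (the trailing space is part of the line).
LOG_LINE = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '"http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

# Assumed timestamp layout: "2020-06-10 17:43:18,951" (milliseconds after a comma).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2020-06-10 17:43:18,951")  # 23 characters


def parse_timestamp(line, time_prefix=None):
    """Hypothetical helper: parse a timestamp right after the prefix match.

    With no prefix, parsing starts at the beginning of the line.
    Returns None when no timestamp can be parsed at that position.
    """
    start = 0
    if time_prefix is not None:
        match = re.search(time_prefix, line)
        if match is None:
            return None
        start = match.end()
    try:
        return datetime.strptime(line[start:start + TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


# Starting at column 0 hits the IP address, so nothing parses.
print(parse_timestamp(LOG_LINE))  # → None
# Skipping the first four pipe-delimited fields lands on the timestamp.
print(parse_timestamp(LOG_LINE, r"^(?:[^|]*\|){4}\s*"))  # → 2020-06-10 17:43:18.951000
```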

I am using the following regex:

(?:[^\|]*(\|)){4}

I want the regex to match up to the fourth occurrence of '|' and then stop (non-greedy, I guess).
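On the "non-greedy" point, a small Python check, using Python's `re` module as a stand-in for Splunk's regex engine (an assumption; the constructs involved are basic and behave the same way in PCRE). Because `[^\|]` can never match a pipe and `{4}` is an exact count, the greedy pattern from the question and a lazy variant with `*?` end at the same place, right after the fourth `|`.

```python
import re

# Sample event from the question.
LOG_LINE = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '"http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

# re.match only tries position 0, so each call returns the first match.
greedy = re.match(r"(?:[^\|]*(\|)){4}", LOG_LINE)  # pattern from the question
lazy = re.match(r"(?:[^\|]*?(\|)){4}", LOG_LINE)   # same pattern with a lazy *?

# Both stop right after the fourth pipe: [^\|] cannot step over a '|',
# and {4} allows exactly four repetitions.
print(repr(greedy.group(0)))  # → '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv |'
print(greedy.group(0) == lazy.group(0))  # → True
```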

rhellem
  • You need `^(?:[^|]*\|){4}(?[^|]*)`, I believe. See [regex demo](https://regex101.com/r/2CHkE6/2). Or `^(?:[^|]*\|){4}\s*(?[^|]*[^|\s])` – Wiktor Stribiżew Jun 10 '20 at 16:58
  • Please try it and let me know if it works for you – Wiktor Stribiżew Jun 10 '20 at 17:13
  • The first one matches the timestamp, which I do not need; I only need it to stop at the fourth occurrence of the pipe, not to capture anything. But it might be that I do not actually need the regex to stop; I am checking now whether Splunk got what it needed. – rhellem Jun 10 '20 at 17:16
  • What exactly do you want to extract? In your question you said you want to capture the timestamp, yet in your comment you say you do not need it – Chase Jun 10 '20 at 17:18
  • Question updated. To better understand regex, I would still like to know if it is possible to stop matching after the first match, but for my actual Splunk problem it might actually be good enough... – rhellem Jun 10 '20 at 17:18
  • What does it mean for the regex to "stop"? What's the point of the regex stopping if you don't want to capture anything? Do you want to extract everything before the 4th `|`? – Chase Jun 10 '20 at 17:19
  • Then `^(?:[^|]*\|){4}\s*` will do. – Wiktor Stribiżew Jun 10 '20 at 17:26


There are two things to consider:

  • Anchor the pattern at the start of the string; otherwise, the environment may run a regex search at every position inside the string, and you may get many more matches than you expect.

  • When you do not need to create captures, i.e. when you do not need to save part of the regex match to a separate memory buffer (in Splunk, this is equivalent to creating a separate field), use a non-capturing group rather than a capturing one to group a sequence of patterns.
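Both points can be observed with Python's `re` module, used here as a stand-in for Splunk's regex engine (an assumption, not a description of how Splunk applies the pattern). `re.finditer` keeps searching after each match, the way an unanchored search may, and `re.findall` returns the captured group rather than the whole match when the pattern contains a capturing group.

```python
import re

# Sample event from the question: it contains 13 pipe characters.
LOG_LINE = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '"http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

unanchored = r"(?:[^\|]*(\|)){4}"  # the question's pattern
anchored = r"^(?:[^|]*\|){4}"      # anchored, non-capturing group only

# Without ^ the search resumes after each match, so every further run of
# four pipes yields another match (13 pipes -> 3 matches).
print(len(list(re.finditer(unanchored, LOG_LINE))))  # → 3
print(len(list(re.finditer(anchored, LOG_LINE))))    # → 1

# With a capturing group, findall returns only the captured text (the last
# '|' of each match); without one, it returns the whole match.
print(re.findall(unanchored, LOG_LINE))  # → ['|', '|', '|']
print(re.findall(anchored, LOG_LINE))    # → ['172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv |']
```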

Thus, you need

^(?:[^|]*\|){4}\s*

See the regex demo, which shows that the match extends up to the datetime substring without including it.
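The same check can be reproduced locally with Python's `re` module (assuming it behaves like Splunk's PCRE for this pattern, which uses only basic constructs). The match ends exactly where the timestamp begins, so the timestamp itself is not part of it.

```python
import re

# Sample event from the question.
LOG_LINE = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '"http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

match = re.search(r"^(?:[^|]*\|){4}\s*", LOG_LINE)

# Everything up to (but not including) the timestamp.
print(repr(match.group(0)))  # → '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
# The first 23 characters after the match are the timestamp.
print(LOG_LINE[match.end():match.end() + 23])  # → 2020-06-10 17:43:18,951
```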

Details

  • ^ - start of string anchor
  • (?:[^|]*\|){4} - a non-capturing group ((?:...)) that matches four repetitions ({4}) of any 0 or more chars other than | ([^|]*) and then a | char (\|)
  • \s* - zero or more whitespace characters (here, the space before the timestamp).
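To see what each part of the pattern contributes, the pieces can be applied one at a time. A short Python sketch using `re.match` (a stand-in for Splunk's engine; the variable names are illustrative only):

```python
import re

# Sample event from the question.
LOG_LINE = (
    '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
    '2020-06-10 17:43:18,951 | '
    '"POST /rest/build-status/latest/commits/stats HTTP/1.1" | '
    '"http://bitbucket.my.com/projects/WF/repos/klp-libs/compare/commits" '
    '"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" | '
    '200 | 345 | 431 | - | 5 | 3dk4qm | '
)

# [^|]*\| : one field plus the pipe that closes it
one_field = re.match(r"^[^|]*\|", LOG_LINE).group(0)
# (?:...){4} : the same unit repeated exactly four times
four_fields = re.match(r"^(?:[^|]*\|){4}", LOG_LINE).group(0)
# \s* : additionally consume the space in front of the timestamp
full = re.match(r"^(?:[^|]*\|){4}\s*", LOG_LINE)

print(repr(one_field))      # → '172.21.201.135 |'
print(repr(four_fields))    # → '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv |'
print(repr(full.group(0)))  # → '172.21.201.135 | http | o@1I0BTOx1063x3667295x0 | hkv | '
# The non-capturing group creates no numbered groups (no extra field).
print(full.groups())        # → ()
```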
Wiktor Stribiżew