I apologize for the poorly worded question.
I have a large number of strings like:
"ODLS_ND33283633__PS1185"
in which the letters up to the first "_"
are a header and the remainder (ND33283633__PS1185) is a unique ID.
I wrote a regex in Python to remove everything up to and including the first "_", with
"ND33283633__PS1185"
as the desired end result.
I figured something like:
.*_? or .+?_
would do the trick, but neither did.
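For reference, here is a minimal sketch of how these two attempts behave. It assumes the patterns are applied with re.sub and an empty replacement; the exact call is not shown in this post, so that part is an assumption.

```python
import re

s = "ODLS_ND33283633__PS1185"

# Assumption: re.sub with an empty replacement and no count argument,
# so every match in the string is removed.
greedy = re.sub(r".*_?", "", s)
lazy = re.sub(r".+?_", "", s)

print(repr(greedy))  # ''
print(repr(lazy))    # '_PS1185'
```

Neither call leaves "ND33283633__PS1185" behind.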
I kept trying various regexes without success, and finally went online and found another person's answer that I was able to adapt into:
^[^_]+_
which gave me the desired result, but now I have questions I can't figure out the answers to.
I found that removing the "^" at the front and writing it as:
[^_]+_
caused the regex to remove everything up to the second "_" so the resulting string was:
"_PS1185"
I understand that "^" anchors the match to the beginning of the line, but I would like to know why leaving it out removes everything up to the second "_".
My understanding is that [^_]+
matches characters NOT equal to "_"
one or more times, so why would including the "^" at the beginning cause it to stop at the first "_", while excluding it causes it to stop at the second?
Another thing: when I replaced the "+" symbol with a "*":
[^_]*_
I expected the same result but instead got:
"PS1185"
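One way to inspect which pieces each variant consumes is re.findall, sketched below. The re.sub call at the end assumes the removal is done with an empty replacement, which is not stated explicitly in this post.

```python
import re

s = "ODLS_ND33283633__PS1185"

# re.findall lists every non-overlapping match, i.e. every piece that
# a replace-all call such as re.sub would remove.
plus_matches = re.findall(r"[^_]+_", s)
star_matches = re.findall(r"[^_]*_", s)

print(plus_matches)  # ['ODLS_', 'ND33283633_']
print(star_matches)  # ['ODLS_', 'ND33283633_', '_']

# Assumption: the removal itself uses re.sub with an empty replacement.
print(re.sub(r"[^_]*_", "", s))  # PS1185
```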
I thought that *
matches 0 or more, while +
matches 1 or more, so they're effectively the same except that + is 'stricter'. However, these results make me feel that I don't fully understand how the regex is behaving. Can anyone please explain what is actually going on?