Disclaimer: this question has been redone, so comments and answers may appear unrelated. I apologize, but I did it for the sake of a clearer and better structured question.
Suppose I have a string in which I want to find two different groups of names: group A satisfies condition 1, while group B satisfies condition 2 in addition to condition 1.
To put it in an example, say I have a mathematical function stored as a string (func in the snippets below):
'[class.parameterA] * numpy.exp( [x]*module.constantA - constant_B/[x] ) + [parameter_B]'
where I control the values of the parameters but not those of the constants.
I want to get (by using re.findall()) a group for the constants and a group for the parameters:
>>> group1
['numpy.exp', 'module.constantA', 'constant_B']
>>> group2
['class.parameterA', 'x', 'x', 'parameter_B']
I know that for this specific case I shouldn't match numpy.exp, but for the sake of the question's purpose, I allow it to be a match.
To clarify, this question seeks a way to express "ignore matching {sequence}" in regex, and asks whether the problem can be approached in a "satisfy condition 1 ONLY" manner rather than a "satisfy condition 1 and NOT condition 2" manner, so that the solution can be extended to multiple conditions. Please provide a reasonably general answer (not one that is overly specific to this example).
After a while, of course, I was able to find a partial solution (see bonus) for only one of the groups, but any other clear ones are very welcome:
c1 = r'\w+\.?\w*' # forces alphanumeric variable structure
# c1 = r'[\w\.\(\)]*?' allows more freedom (can introduce function calls)
# at the cost of matching invalid names, like class..parameterA
c2 = r'(?<=\[)', r'(?=\])'
re_group2 = c2[0] + c1 + c2[1]
>>> re.findall(re_group2, func)
['class.parameterA', 'x', 'x', 'parameter_B']
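For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same lookaround idea. The helper name enclosed_by is hypothetical (it is not part of re); it only wraps the name pattern in a lookbehind and a lookahead for two literal delimiters, so the same approach could be tried with delimiters other than brackets.

```python
import re

# The example string from the question.
func = ('[class.parameterA] * numpy.exp( [x]*module.constantA '
        '- constant_B/[x] ) + [parameter_B]')

NAME = r'\w+\.?\w*'  # same name pattern as c1 above


def enclosed_by(opening, closing, body=NAME):
    """Hypothetical helper: pattern for `body` directly between two literal delimiters."""
    # re.escape keeps the delimiters literal; a literal string is fixed width,
    # which Python requires inside a lookbehind.
    return '(?<=' + re.escape(opening) + ')' + body + '(?=' + re.escape(closing) + ')'


group2 = re.findall(enclosed_by('[', ']'), func)
print(group2)
```

The delimiters themselves are never consumed, so only the names end up in the result list.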
The apparently intuitive bracket negation does not work for group1, but I may be introducing it incorrectly:
c1 = r'\w+\.?\w*'
nc2 = r'(?<!\[\w)', r'(?!\w\])' # condition 2 negation approach
re_group1 = nc2[0] + c1 + nc2[1]
>>> re.findall(re_group1, func)
['class.parameterA', 'numpy.exp', 'x', 'module.constantA',
'constant_B', 'x', 'parameter_B']
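One possible reading of why the negation above lets everything through: (?<!\[\w) inspects the two characters before the match start and rejects only "[ followed by a word character", which never describes the position right after an opening bracket. Likewise, (?!\w\]) rejects only "a word character followed by ]" after the match. The sketch below reproduces that result and then tries a tightened variant; the character classes in the tightened lookarounds are my assumption about what "not inside brackets" should mean here, not an established idiom.

```python
import re

func = ('[class.parameterA] * numpy.exp( [x]*module.constantA '
        '- constant_B/[x] ) + [parameter_B]')

name = r'\w+\.?\w*'

# Negation as written in the question: both lookarounds pass everywhere.
naive = re.findall(r'(?<!\[\w)' + name + r'(?!\w\])', func)

# Tightened variant (assumption): the match may not start right after '[',
# a word character, or a dot, and may not be followed by a word character,
# a dot, or ']'. This also prevents matching a suffix of a bracketed name.
tight = re.findall(r'(?<![\[\w.])' + name + r'(?![\w.\]])', func)

print(naive)
print(tight)
```

Including \w and . in the lookbehind matters: with (?<!\[) alone, the engine could simply start one character later and match the tail of a bracketed name.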
Bonus: if there were, say, module.submodule.constantA (more than 1 dot), how would the regex change? I supposed c1 = r'\w+(\.\w+)*', but it doesn't do what I expected. Edit: I need to use a non-capturing group since I'm using re.findall. So c1 = r'\w+(?:\.\w+)*'.
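A short sketch of the difference, using a made-up test string: when the pattern contains a capturing group, re.findall returns the contents of that group instead of the whole match, which is why the first attempt looks wrong.

```python
import re

# Made-up sample containing one multi-dot name and one plain name.
text = 'module.submodule.constantA + constant_B'

# Capturing group: findall returns only what the group captured last
# (an empty string when the group did not take part in the match).
capturing = re.findall(r'\w+(\.\w+)*', text)

# Non-capturing group: findall returns the full match.
non_capturing = re.findall(r'\w+(?:\.\w+)*', text)

print(capturing)
print(non_capturing)
```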