So to understand this block of code, you have to understand a bit of regular expressions and a bit of the re Python module. Let's first look at what re.sub does. From the docs, the signature of the function looks like:
re.sub(pattern, repl, string, count=0, flags=0)
Of importance here are the pattern, repl, and string parameters.
- pattern is the regular expression pattern to be replaced
- repl is what you want to replace the matched pattern with; it can be a string or a function that takes a match object as an argument
- string is the string you want the replacement to act on
The function finds the portions of string that match the regex pattern and replaces those portions with repl.
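As a quick illustration of the two kinds of repl, here is a minimal sketch. The sample text and the helper name double are made up for this example; only re.sub itself comes from the docs.

```python
import re

# repl as a plain string: every run of digits becomes "#".
masked = re.sub(r"\d+", "#", "room 12, floor 3")
print(masked)  # room #, floor #

# repl as a function: it receives a match object and returns the replacement text.
def double(match):  # hypothetical helper, for illustration only
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double, "room 12, floor 3")
print(doubled)  # room 24, floor 6
```

The function form is the one that matters for the code in question, because the replacement depends on what was matched.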
Now let's go into the regular expression used: [_-](.)
- [_-] matches any one of the characters within the square brackets (_ or -)
- . matches any character (except a newline, by default)
- (.) captures that character in a capture group
Now let's put it all together. The full pattern will match two characters. The first character will be a _ or - and the second character can be anything. In effect, the following portions of these strings will be matched.
- one_two (matches _t)
- test_3 (matches _3)
- nomatchhere- (no match: nothing follows the -)
- thiswill_Match (matches _M)
- NoMatchHereEither_ (no match: nothing follows the _)
- need_more_creative_examples- (matches _m, _c, and _e, but not the trailing -)
The important part here is that the (.) portion of the regex matches any character and stores it in a capture group. This allows us to reference that matched character in the repl argument.
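If you want to check the matches yourself, something like the following sketch works. The describe helper is a name I made up; re.compile and finditer are standard. For each match it reports the whole matched text (group 0) next to the captured character (group 1).

```python
import re

pattern = re.compile(r"[_-](.)")

samples = [
    "one_two",
    "test_3",
    "nomatchhere-",
    "thiswill_Match",
    "NoMatchHereEither_",
    "need_more_creative_examples-",
]

def describe(text):  # hypothetical helper, for illustration only
    # One (whole match, captured character) pair per non-overlapping match.
    return [(m.group(0), m.group(1)) for m in pattern.finditer(text)]

for sample in samples:
    print(sample, describe(sample))
# one_two [('_t', 't')]
# test_3 [('_3', '3')]
# nomatchhere- []
# thiswill_Match [('_M', 'M')]
# NoMatchHereEither_ []
# need_more_creative_examples- [('_m', 'm'), ('_c', 'c'), ('_e', 'e')]
```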
Let's get into what repl is doing here. In this case, repl is a lambda function.
lambda x: x.group(1).upper()
A lambda is not much different from a normal Python function. You define the arguments before the colon and the return expression after it. The lambda above takes x as an argument and assumes that x is a match object. Match objects have a group method that lets you reference the groups matched by the regex pattern (remember (.) from before?). The lambda grabs the first group matched and uppercases it using the str object's built-in upper method. It then returns that uppercased string, and that is what replaces the text matched by pattern.
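If the lambda is hard to read, it may help to see it written as an ordinary named function. This is just a sketch; the name upper_first_group is mine, and it should behave the same as the lambda when passed as repl.

```python
import re

def upper_first_group(match):  # hypothetical name, equivalent to the lambda
    # match is a match object; group(1) is the text captured by (.)
    return match.group(1).upper()

m = re.search(r"[_-](.)", "one_two")
print(m.group(0))            # _t  (the whole match)
print(m.group(1))            # t   (just the captured character)
print(upper_first_group(m))  # T
```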
All together now:
import re

def to_camel_case(text):
    return re.sub('[_-](.)', lambda x: x.group(1).upper(), text)
The pattern is [_-](.), which matches any underscore or dash followed by any character. That character is captured and uppercased using the repl lambda function. The portion of string that matched that pattern is then replaced with that uppercased character.
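Here is the function run on a few inputs. The input strings are my own examples, not from the question. The last two show edge cases that follow from the pattern: a trailing delimiter has nothing after it to capture, and in a doubled delimiter the second one is the "any character" that gets captured.

```python
import re

def to_camel_case(text):
    return re.sub('[_-](.)', lambda x: x.group(1).upper(), text)

print(to_camel_case("the_stealth_warrior"))  # theStealthWarrior
print(to_camel_case("The-Stealth-Warrior"))  # TheStealthWarrior
print(to_camel_case(""))                     # (empty string)
print(to_camel_case("trailing_"))            # trailing_
print(to_camel_case("a__b"))                 # a_b
```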
In conclusion, I think the above answers most of your questions, but to summarize:
I looked up about re.sub() and group(), but I still couldn't put it together. I'm not sure how [_-](.) works, how come [_-](w+) doesn't work?
I will assume that you meant to use the \w character set instead of just w. The \w character set matches all alphanumeric characters and underscores. This pattern would work if the + operator were not used. The + matches greedily, so every character in the \w set that follows an underscore or hyphen is captured. This causes two issues: it capitalizes all captured characters (which could be a whole word), and it captures underscores, so later underscores are not replaced properly.
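To make the difference concrete, here is a small comparison. The function names and test strings are made up for this sketch. It also includes the pattern exactly as written in the question, where w is a literal letter w rather than a character set.

```python
import re

def with_plus(text):  # hypothetical name: the \w+ variant
    return re.sub(r"[_-](\w+)", lambda m: m.group(1).upper(), text)

def without_plus(text):  # hypothetical name: the \w variant
    return re.sub(r"[_-](\w)", lambda m: m.group(1).upper(), text)

def literal_w(text):  # hypothetical name: the pattern as typed in the question
    return re.sub(r"[_-](w+)", lambda m: m.group(1).upper(), text)

print(with_plus("one_two_three"))     # oneTWO_THREE
print(with_plus("one-two-three"))     # oneTWOTHREE
print(without_plus("one_two_three"))  # oneTwoThree
print(literal_w("one_two_three"))     # one_two_three
```

With underscores, \w+ swallows the rest of the string, including the second underscore. With hyphens it stops at each hyphen but still uppercases whole words. The literal w version finds nothing to replace, because no delimiter in the string is followed by the letter w.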
How did he get rid of the hyphen and underscore with sub?
The function given to repl returns only the uppercased version of the first capture group. In the pattern [_-](.), only the character following the hyphen or underscore is captured. In effect, everything matched by [_-](.) is replaced with the uppercased character matched by (.). This is why the hyphen/underscore is removed.
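One way to convince yourself of this is to return group(0) (the whole match) instead of group(1). A rough sketch, with a sample string and variable names of my own choosing:

```python
import re

text = "snake_case-name"

# Returning the whole match, uppercased: the delimiter is part of it, so it stays.
keeps_delimiter = re.sub(r"[_-](.)", lambda m: m.group(0).upper(), text)
print(keeps_delimiter)  # snake_Case-Name

# Returning only the captured character: the delimiter is dropped.
drops_delimiter = re.sub(r"[_-](.)", lambda m: m.group(1).upper(), text)
print(drops_delimiter)  # snakeCaseName
```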
Successfully capitalize only the first char of each word except the first word?
I thought x.group(1).upper() would capitalize the entire word, how come group(1) is referring to the first char?
The capture group only matches the first character following the underscore or hyphen, so that is what is uppercased.