Why does re.split(r'\W+', 'Words, words, words.') output ['Words', 'words', 'words', '']?

Question

I read this example in the official Python docs for re.split():

>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']

I am confused by the output. I expected it to produce

[",",  ",",  ","]

The following makes sense to me:

In [100]: re.split(r',', 'Words, words, words.')
Out[100]: ['Words', ' words', ' words.']

How does re.split(r'\W+', 'Words, words, words.') produce that output?

score 1 · Accepted Answer · answered Aug 23 '18 at 06:20

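\W matches any character that is not a word character (\w); for ASCII text, that means anything not in [a-zA-Z0-9_].

In your case, each ", " (comma plus space) and the final "." match the \W+ expression (one or more characters that are not alphanumeric or an underscore). re.split() treats those matches as separators and returns the text between them, which is why you end up with only the words. The trailing '' appears because the final "." is a separator with nothing after it.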
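To see what is being matched versus what is being returned, it may help to compare re.findall() and re.split() on the same pattern. This is just a small sketch; the variable names (text, separators, pieces, with_separators) are mine, not from the docs.

```python
import re

text = 'Words, words, words.'

# The separators: every run of one or more non-word characters.
separators = re.findall(r'\W+', text)

# re.split() returns the pieces between those separators.
pieces = re.split(r'\W+', text)

# With a capturing group, the separators are kept in the result too.
with_separators = re.split(r'(\W+)', text)

print(separators)       # [', ', ', ', '.']
print(pieces)           # ['Words', 'words', 'words', '']
print(with_separators)  # ['Words', ', ', 'words', ', ', 'words', '.', '']
```

So the list you expected is close to what re.findall() gives you, while re.split() gives you everything around those matches.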
