
I was trying to search for some patterns using regular expressions in Python.

As we know, the pattern '[A-Za-z]+' means: find a sequence of one or more characters, each of which is either an uppercase letter A to Z or a lowercase letter a to z. So my single line of code (followed by its output) was:

>>> re.findall('[A-Za-z]+', 'This is my area!')
['This', 'is', 'my', 'area']

When I tried another pattern, '[[A-Z][a-z]]+', it returned an empty list. The code is as follows:

>>> re.findall('[[A-Z][a-z]]+', 'This is my area!')
[]

So when I put two character sets inside a character set, what pattern actually gets created? Please guide me.

Salil Tamboli

1 Answer


The engine will treat [[A-Z][a-z]]+ as:

  • [[A-Z] is the first character class. It matches any uppercase letter (A-Z) or a literal [. Think of it as [\[A-Z], where the inner [ is escaped.
  • [a-z] is the second character class. It matches any lowercase letter (a-z).
  • ]+ matches one or more literal ] characters.

So it will match strings such as [b], Aa] or Aa]]]]]]], but it won't find anything in your string 'This is my area!', because that string contains no ] at all.
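A quick way to confirm this reading is to compile the original pattern next to a version with the inner [ escaped explicitly and compare them on the strings above. This is a minimal sketch; the sample strings are the ones from this answer, and the explicit pattern r'[\[A-Z][a-z]\]+' is my own rewrite. Python 3.7 and later emit a FutureWarning ("Possible nested set") when a [ directly follows the opening [ of a class, so the sketch silences that warning while compiling.

```python
import re
import warnings

text = 'This is my area!'

with warnings.catch_warnings():
    # Python 3.7+ warns about a possible nested set; silence it for this demo.
    warnings.simplefilter('ignore', FutureWarning)
    nested = re.compile('[[A-Z][a-z]]+')

# Same pattern with the inner '[' and the trailing ']' escaped explicitly.
explicit = re.compile(r'[\[A-Z][a-z]\]+')

# Both patterns should agree on every sample string.
for s in ['[b]', 'Aa]', 'Aa]]]]]]]', text]:
    print(repr(s), bool(nested.fullmatch(s)), bool(explicit.fullmatch(s)))

print(nested.findall(text))  # [] because the text has no ']'
```

Each of the first three samples is matched in full by both patterns, while the question's sentence yields an empty list.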

To play around with it further, you can try this regex101 demo.

Nesting of Square Brackets:

To better understand how nesting of square brackets works in regex, consider another example: [[[ABC]]].

Once an opening square bracket [ is found, which starts a character class, every subsequent opening square bracket [ is treated as a literal character (as if it were escaped, \[) until a closing square bracket ] is encountered, which ends the character class.
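The rule above can be sketched as a small scanner that finds where a character class ends. The helper class_end is hypothetical (not part of the re module); it assumes Python's rules that a backslash escapes the next character, a leading ^ negates the class, and a ] appearing as the very first member is literal. Everything after the returned index is outside the class.

```python
def class_end(pattern: str, start: int) -> int:
    """Return the index of the ']' that closes the character class
    opened by the '[' at pattern[start].

    Hypothetical helper: inside a class, '[' is an ordinary character,
    a backslash escapes the next character, and a ']' that is the very
    first member is taken literally.
    """
    if pattern[start] != '[':
        raise ValueError('no character class starts here')
    i = start + 1
    if i < len(pattern) and pattern[i] == '^':
        i += 1  # the negation marker is not a member of the class
    first = True
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            i += 2  # skip the backslash and the character it escapes
        elif ch == ']' and not first:
            return i  # first unescaped ']' after a member closes the class
        else:
            i += 1  # any other character, including '[', is a member
        first = False
    raise ValueError('unterminated character set')


for p in ['[[[ABC]]]', '[[[[ABC]]]]', '[[A-Z][a-z]]+']:
    end = class_end(p, 0)
    print(p, '-> class', p[:end + 1], 'then', repr(p[end + 1:]))
```

For '[[A-Z][a-z]]+' the scanner stops at the first ], which shows why the pattern splits into [[A-Z], [a-z] and ]+.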

To test it out, take a look at these examples:

  • [[[ABC]]] is the same as [\[ABC] followed by ]{2}. Example match: A]]

  • [[[[ABC]]]] is the same as [\[ABC] followed by ]{3}. Example match: []]]

  • [[[[[ABC]]]]] is the same as [\[ABC] followed by ]{4}. Example match: A]]]]
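These equivalences can be checked directly with re.fullmatch. The sketch below pairs each nested pattern with its escaped form (the trailing ] characters are escaped here only for readability) and tests it against the example match listed for it; the FutureWarning that Python 3.7+ raises for the nested [ is silenced.

```python
import re
import warnings

# (nested pattern, equivalent escaped pattern, example match from the list)
cases = [
    ('[[[ABC]]]', r'[\[ABC]\]{2}', 'A]]'),
    ('[[[[ABC]]]]', r'[\[ABC]\]{3}', '[]]]'),
    ('[[[[[ABC]]]]]', r'[\[ABC]\]{4}', 'A]]]]'),
]

with warnings.catch_warnings():
    # Silence Python 3.7+'s "Possible nested set" FutureWarning.
    warnings.simplefilter('ignore', FutureWarning)
    for nested, escaped, sample in cases:
        print(nested, bool(re.fullmatch(nested, sample)),
              escaped, bool(re.fullmatch(escaped, sample)))
```

Every line should print True for both patterns.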

Hope this helps!

degant