The new version of the text-expansion software I upgraded to supports regular expressions. I am trying to understand them a little better, so I would like to break down the two expressions it includes to help avoid double capitalization at the beginning of a word.

The first is

\b[:upper:][:upper:][:lower:]+

I take that to mean there is a word boundary before the entry begins, the first two letters are capitals, and they are followed by one or more lowercase letters.

The second is

\b(IJ|CC)[:lower:]+

I take this to mean that words beginning with a capital "I" followed by a capital "J", or with two consecutive capital "C"s, followed by one or more lowercase letters, are allowed as exceptions.

I feel like I am missing something here. Can anyone advise as to what these expressions mean?

Keith Thompson
  • possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) – Mathletics Jun 18 '14 at 19:53

2 Answers

"IJ" means the character sequence, "I" followed by "J" - nothing special - and the conclusion about the behavior (if not the reasoning) is correct.

The expression \b(IJ|CC)[:lower:]+ (2) is merely a more restrictive subset of \b[:upper:][:upper:][:lower:]+ (1): it only matches input starting with "IJ" or "CC".

String    Matches
------    -------
foo       (None)
IJ        (None)     No match on [:lower:]+
IJfoo     1, 2       Matches IJ, which also matches [:upper:][:upper:]
CCfoo     1, 2
XXfoo     1          Matches [:upper:][:upper:], not IJ|CC
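
The table above can be reproduced with a short Python sketch. This is only an approximation: Python's re module does not implement POSIX classes, so [A-Z] and [a-z] are assumed as ASCII stand-ins for [:upper:] and [:lower:], and the helper name matching_patterns is made up for illustration.

```python
import re

# Assumption: [A-Z] and [a-z] stand in for [:upper:] and [:lower:]
# (ASCII letters only), since Python's re has no POSIX classes.
PATTERN_1 = re.compile(r"\b[A-Z][A-Z][a-z]+")
PATTERN_2 = re.compile(r"\b(IJ|CC)[a-z]+")


def matching_patterns(text):
    """Return the numbers (1 and/or 2) of the patterns that match text."""
    hits = []
    if PATTERN_1.search(text):
        hits.append(1)
    if PATTERN_2.search(text):
        hits.append(2)
    return hits


for word in ["foo", "IJ", "IJfoo", "CCfoo", "XXfoo"]:
    print(word, matching_patterns(word))
```

Every string that matches pattern 2 also matches pattern 1, which is what "subset" means here.
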
user2864740

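[:lower:] and [:upper:] are POSIX character classes. In strict POSIX syntax they are written inside a bracket expression, e.g. [[:upper:]]; this software apparently accepts them on their own.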

\b = word boundary
(IJ|CC) = "IJ" or "CC"
[:lower:] = [a-z] (for ASCII letters)
[:upper:] = [A-Z] (for ASCII letters)
+ = one or more matches of what is right before it
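
As a rough illustration of how these pieces combine, here is the second expression written out token by token in Python's verbose regex mode. It is a sketch under one assumption: [a-z] replaces [:lower:], because Python's re module does not support POSIX classes. The name EXCEPTION is hypothetical.

```python
import re

# Assumption: [a-z] is an ASCII stand-in for [:lower:].
EXCEPTION = re.compile(
    r"""
    \b          # word boundary: the match must start at the edge of a word
    (IJ|CC)     # literally "IJ" or "CC"
    [a-z]+      # one or more lowercase letters
    """,
    re.VERBOSE,
)

print(bool(EXCEPTION.search("IJfoo")))   # True: boundary, "IJ", then "foo"
print(bool(EXCEPTION.search("xCCfoo")))  # False: no word boundary before "CC"
print(bool(EXCEPTION.search("CC")))      # False: "+" needs at least one lowercase letter
```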

http://www.regular-expressions.info/posixbrackets.html

Gerhard Powell