IJ in a Regular Expression - What does it mean?

Question

The new version of a type expansion software I upgraded to includes regular expressions. I am trying to understand them a little better, so I am looking to break down the two expressions it ships with to help avoid double capitalization at the beginning of a word.

The first is

\b[:upper:][:upper:][:lower:]+

I take that to mean that there is a word boundary before the match begins, the first two letters are capitals, and they are followed by one or more lowercase letters.

The second is

\b(IJ|CC)[:lower:]+

Which I take to mean that a word beginning with capital "I" and capital "J", or with two consecutive capital "C"s, followed by one or more lowercase letters, is allowed as an exception.

I feel like I am missing something here. Can anyone advise as to what these expressions mean?

possible duplicate of [Reference - What does this regex mean?](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean) — Mathletics, Jun 18 '14 at 19:53

user2864740 · Answer 1 · 2014-06-18T19:52:15.227

"IJ" means the character sequence, "I" followed by "J" - nothing special - and the conclusion about the behavior (if not the reasoning) is correct.

The expression \b(IJ|CC)[:lower:]+² is merely a more restrictive version of \b[:upper:][:upper:][:lower:]+¹: it only accepts input that starts with "IJ" or "CC".

String    Matches
------    -------
foo       (None)
IJ        (None)     No match on [:lower:]+
IJfoo     1, 2       Matches IJ, which also matches [:upper:][:upper:]
CCfoo     1, 2
XXfoo     1          Matches [:upper:][:upper:], not IJ|CC
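
The table can be reproduced with a short script. This is only a sketch: Python's `re` module does not support POSIX classes such as [:upper:], so the ASCII ranges `[A-Z]` and `[a-z]` are substituted here as an assumption, and the helper name `matching_patterns` is made up for illustration. The expansion software's own engine may behave differently, for example with non-ASCII letters.

```python
import re

# ASCII stand-ins for the POSIX classes (an assumption, see above).
PATTERN_1 = re.compile(r"\b[A-Z][A-Z][a-z]+")  # any two capitals, then lowercase
PATTERN_2 = re.compile(r"\b(IJ|CC)[a-z]+")     # only "IJ" or "CC", then lowercase


def matching_patterns(text):
    """Return the numbers (1 and/or 2) of the patterns matching at the start of text."""
    hits = []
    if PATTERN_1.match(text):
        hits.append(1)
    if PATTERN_2.match(text):
        hits.append(2)
    return hits


for word in ["foo", "IJ", "IJfoo", "CCfoo", "XXfoo"]:
    print(word, matching_patterns(word))
```

Every string that matches pattern 2 also matches pattern 1, which is why the second expression can only act as a narrower list of exceptions.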

Gerhard Powell · Answer 2 · 2014-06-18T19:57:46.440

[:lower:] and [:upper:] are POSIX character classes. In standard POSIX syntax they are written inside a bracket expression, e.g. [[:upper:]].

\b Word boundary
(IJ|CC) = "IJ" or "CC"
[:lower:] = [a-z] (for ASCII text)
[:upper:] = [A-Z] (for ASCII text)
+ One or more repetitions of the item right before it.

http://www.regular-expressions.info/posixbrackets.html
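
To see what each piece contributes, here is a minimal sketch in Python. It assumes `[a-z]` as an ASCII stand-in for [:lower:], since Python's `re` module has no POSIX classes. The helper `first_match` and the sample inputs are invented for illustration.

```python
import re

# [a-z] stands in for [:lower:] (ASCII only; an assumption, see above).
pattern = re.compile(r"\b(IJ|CC)[a-z]+")


def first_match(text):
    """Return the first substring matched by the pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(first_match("the IJsselmeer"))  # \b holds between the space and "I"
print(first_match("xIJssel"))         # no word boundary between "x" and "I"
print(first_match("CC"))              # "+" needs at least one lowercase letter
print(first_match("CCleaner rocks"))  # "CC" alternative, then lowercase letters
```

The second and third calls return None: one fails on \b, the other on the + quantifier.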

edited Jun 18 '14 at 19:57

answered Jun 18 '14 at 19:44
