
I need to replace the uppercase O inside sentences, but keep any O at the beginning of a sentence.

So I need to find and match only a lone O inside a sentence.

For example:

OOP is a programing concept. Objects are instances Of O classes
                             ↑                        ↑
                          correct                   replace

The O in Objects is correctly capitalized because it starts a sentence. The regex should also not match an O at the very beginning of the line. The second O, in 'O classes', should be the digit zero (0), meaning a quantity. That is the one the regex needs to match.

I know how to match any O, but it would also match the first one. Any clues?

KyleMit

1 Answer


Just to expand on Tibrogargan's comment, you can identify cases like this:

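let input = "OOP is a programing concept. Objects are instances Of O classes"
// Match a lone "O" with whitespace on both sides, unless a period plus 1-2 whitespace characters precede it
let regex = /(?<!\.\s{1,2})(?<=\s)O(?=\s)/g
let output = input.replace(regex, '0')

console.log(output) // OOP is a programing concept. Objects are instances Of 0 classes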

RegExp Groups

This uses two types of groups: a negative lookbehind, plus a positive lookbehind and a positive lookahead. These are usually called lookaround groups, since they let you check the characters on either side of the main expression without those characters becoming part of the match themselves.

With the negative lookbehind, if the pattern inside (?<!VALUES_HERE) is found immediately before the expression that follows the parentheses, the match is rejected.

The backslash in \. escapes the period, since an unescaped period is part of the RegEx syntax (it matches any character). The \s matches one whitespace character, and the braces that follow, {1,2}, are a quantifier specifying that the expression will recognize 1 to 2 whitespace characters.
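To see the negative lookbehind on its own, here is a minimal sketch that keeps only that part of the expression. The sample string and variable names are made up for illustration, and it assumes a JavaScript engine with lookbehind support (ES2018 or later):

```javascript
// Only the negative lookbehind: match "O" unless it is preceded by
// a period followed by one or two whitespace characters.
const notAfterPeriod = /(?<!\.\s{1,2})O/g;

// Illustrative sample: an "O" after ". ", after ".  ", and after "x ".
const sample = "End. O one.  O two. x O three";
const replaced = sample.replace(notAfterPeriod, "0");

console.log(replaced); // End. O one.  O two. x 0 three
```

Only the last "O" is replaced, because the first two are preceded by a period and one or two spaces.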

With the positive lookahead (?=VALUES_HERE) and the positive lookbehind (?<=VALUES_HERE), it is the other way around: the pattern inside must be found immediately after or before the main expression (respectively) for the match to succeed. Even then, only the main expression is returned, not the text inside the lookahead or lookbehind.
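The same kind of sketch for the positive lookbehind and lookahead, again with an invented sample string. It shows that the surrounding whitespace is required but not consumed, so the matched text is just the single letter:

```javascript
// Only the positive lookarounds: "O" must have whitespace before and after it.
const loneLetter = /(?<=\s)O(?=\s)/g;

// Illustrative sample: "Of" (start of string), a lone "O", and "O." before a period.
const text = "Of O or O.";
const matches = [...text.matchAll(loneLetter)].map(m => m[0]);
const swapped = text.replace(loneLetter, "0");

console.log(matches); // [ 'O' ]
console.log(swapped); // Of 0 or O.
```

The spaces around the lone "O" survive the replacement because they were only checked, never matched.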

Why like this?

Written like this, the expression throws out any "O" that is preceded by a period and one or two spaces. It allows up to two spaces because we know the input here comes from OCR, which could occasionally render a single space as two.

If the keyword were a proper noun, this expression would be more difficult to compose, since the word would stay capitalized wherever it appeared in the sentence. That is not the case here.

It will only accept as a match an "O" that has a whitespace character on both sides, which prevents accidentally matching words that begin with a capital "O".
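As a rough check of the behaviour described above, the sketch below runs the full expression over a few invented inputs. The helper name fixLoneO and the test strings are hypothetical and not part of the original question:

```javascript
// Hypothetical helper wrapping the expression from this answer.
const loneO = /(?<!\.\s{1,2})(?<=\s)O(?=\s)/g;
const fixLoneO = (text) => text.replace(loneO, "0");

const cases = [
  "There are O errors",         // lone O mid-sentence: replaced
  "O classes exist",            // O at the start of the string: kept
  "It failed. O well",          // period + one space: kept
  "It failed.  O well",         // period + two spaces: kept
  "Done! O dear",               // "!" is not covered by the lookbehind: replaced
  "first line\nO second line",  // "\n" counts as whitespace for \s: replaced
];

const results = cases.map(fixLoneO);
cases.forEach((c, i) =>
  console.log(JSON.stringify(c), "->", JSON.stringify(results[i])));
```

The last two cases suggest that, if the input can contain other sentence terminators or several lines, the lookbehinds would probably need widening, for example with a character class such as [.!?] in place of \. in the negative lookbehind.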

See the live expression on RegExr.

Further Reading