I'm trying to capture all consecutive words that start with an uppercase letter and come directly before the word Inc. For example, from the row Parent company Test Alphabet Inc. announced, I want to capture Test Alphabet. I wrote this regular expression pattern:

([A-Z]{1}[a-z]+)+

which matches every word starting with an uppercase letter, so it also grabs Parent, which is not wanted. When I try to restrict the match with a lookahead:

([A-Z]{1}[a-z]+)+ (?=(Inc))

it matches only Alphabet and misses Test, which is also needed. How can I match all capitalized words that follow one another and precede the word Inc? Thanks in advance!

Vlad

2 Answers

You can use this lookahead regex to match:

[A-Z][a-zA-Z]*(?=\s*(?:[A-Z][a-zA-Z]*\s+)*Inc\.)

  • [A-Z][a-zA-Z]* matches a word that starts with an uppercase letter.
  • The lookahead expression inside (?=...) ensures that the current word is followed by 0 or more capitalized words and then Inc.
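
As a quick check, here is a minimal Python sketch that runs the pattern above against the sample row from the question. The variable names are only illustrative, and other regex engines may differ in small details.

```python
import re

# Pattern copied from this answer; the sample row comes from the question.
word_pattern = re.compile(r"[A-Z][a-zA-Z]*(?=\s*(?:[A-Z][a-zA-Z]*\s+)*Inc\.)")
text = "Parent company Test Alphabet Inc. announced"

# Each capitalized word is returned as its own match.
words = word_pattern.findall(text)
print(words)  # ['Test', 'Alphabet']
```

Parent is skipped because the lowercase word company sits between it and Inc., so the lookahead cannot succeed from there.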
anubhava
  • @anubhava thanks! As far as I can see, it treats `Alphabet` and `Test` as two separate words? – Vlad Jan 25 '18 at 15:24
  • To match it as a single string with spaces in between use: `[A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*(?=\s+Inc\.)` – anubhava Jan 25 '18 at 15:27
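
A minimal Python sketch of the single-string variant from the comment above, assuming the same sample row from the question (variable names are illustrative):

```python
import re

# Pattern copied from the comment: consecutive capitalized words,
# followed (but not consumed) by whitespace and "Inc."
name_pattern = re.compile(r"[A-Z][a-zA-Z]*(?:\s+[A-Z][a-zA-Z]*)*(?=\s+Inc\.)")
text = "Parent company Test Alphabet Inc. announced"

match = name_pattern.search(text)
company = match.group(0) if match else None
print(company)  # Test Alphabet
```

The engine first extends the match greedily through Inc, then backtracks one word so that the lookahead can see Inc. right after the match.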

Try

((?:[A-Z]\w*\s*)*\s?)(?=\sInc)

It captures the company name as one group. It takes one shortcut by using \w for the allowed characters in the name. This means names can contain a mixture of upper and lower case letters, as well as digits and _. If this is unwanted behavior, change the \w to [a-z] for lower case letters only, or [A-Za-z] for mixed lower and upper case.
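
To illustrate the difference, here is a small Python sketch comparing the \w version with the [A-Za-z] variant. The second input row with Test_1 is a made-up example, not from the question, and the variable names are illustrative.

```python
import re

# Pattern from this answer, plus the stricter variant with \w swapped for [A-Za-z].
loose = re.compile(r"((?:[A-Z]\w*\s*)*\s?)(?=\sInc)")
strict = re.compile(r"((?:[A-Z][A-Za-z]*\s*)*\s?)(?=\sInc)")

text = "Parent company Test Alphabet Inc. announced"
name = loose.search(text).group(1)
print(name)  # Test Alphabet

# Hypothetical row: \w lets the underscore and digit through.
odd_text = "Parent company Test_1 Alphabet Inc. announced"
loose_name = loose.search(odd_text).group(1)
strict_name = strict.search(odd_text).group(1)
print(loose_name)   # Test_1 Alphabet
print(strict_name)  # Alphabet
```

Note that search is used here rather than findall, because the group can also match an empty string at the position just before Inc.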

SamWhan
  • You can save a lot of steps and cut the space at the end by using `(?:[A-Z]\w*\s*?)+(?=\s+Inc)` instead. – ctwheels Jan 25 '18 at 15:27
  • @ctwheels True, if just matching is OK. That doesn't capture the name as requested. But I agree with you. – SamWhan Jan 25 '18 at 15:29