
So I have incoming data that looks something like this:

Applications                    7 days          6 days

I'm trying to create a regex that will match this line but not a line that has another column, like this:

Applications                    7 days          6 days        5 days

The regex that I'm trying to use is:

^(.*?)(\s){4,}(.*?)(\s){4,}[^(\s){2}]+

Here [^(\s){2}]+ was meant to match everything up to a double space. There are two problems with this:

  1. It doesn't work to begin with.
  2. The second line above would still match.

Is there any regex I can use to only match the 3 column table and not the 4 column, 5 column, etc.?

Taher Khorshidi
tallkid24
  • I would go about it differently and just test by splitting each line on `\s{2,}` then checking that the length of the array is equal to 3. – tenub May 16 '14 at 16:40
  • is this space or tab-delimited string? – nikis May 16 '14 at 16:40
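The split-based check suggested in the first comment could be sketched roughly as follows in Python. This assumes columns are separated by runs of two or more whitespace characters and that values never contain a double space themselves; `has_three_columns` is a hypothetical helper name, and the exact spacing of the sample rows is approximated.

```python
import re

def has_three_columns(line: str) -> bool:
    # Trim the ends, then split on runs of two or more whitespace characters.
    fields = re.split(r"\s{2,}", line.strip())
    return len(fields) == 3

# Sample rows modelled on the question (spacing is approximate).
three = "Applications" + " " * 20 + "7 days" + " " * 10 + "6 days"
four = three + " " * 8 + "5 days"

print(has_three_columns(three))  # True
print(has_three_columns(four))   # False
```

A single space inside a value such as "7 days" is fine here, because only runs of at least two whitespace characters count as separators.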

2 Answers


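Take care with character classes ([]): most characters inside them are treated literally, as if they were escaped. So [^(\s){2}] does not mean "anything up to two spaces"; it is a single negated class that matches any one character other than (, ), {, }, 2, or whitespace.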
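To see how the character class from the question is actually interpreted, here is a small Python check. The sample string is made up purely for illustration.

```python
import re

# Inside [...], the characters ( ) { } and 2 are literals, so this is a
# negated set: "one or more characters that are not (, ), {, }, 2 or whitespace".
negated_set = re.compile(r"[^(\s){2}]+")

sample = "a(b)2 c{d}"
pieces = negated_set.findall(sample)
print(pieces)  # ['a', 'b', 'c', 'd']
```

Every excluded character simply ends the current run, which is why the class cannot express "stop at a double space".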

Try this regex (demo here):

^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$
  • I replaced (.*?) with ((?:(?!\s\s).)+), which matches everything up to a sequence of two whitespace characters.
  • I added a $ at the end, so it won't match lines with more than three columns.
  • I also added some ?: so those groups become non-capturing groups.
  • Finally, I removed the character class from the end of the regex and used a negative look-ahead instead.
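A quick way to try the regex above is a few lines of Python. The sample rows are rebuilt from the question with approximate spacing, and the variable names are only for illustration.

```python
import re

PATTERN = re.compile(
    r"^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$"
)

# Sample rows modelled on the question (spacing is approximate).
three = "Applications" + " " * 20 + "7 days" + " " * 10 + "6 days"
four = three + " " * 8 + "5 days"

match_three = PATTERN.match(three)
match_four = PATTERN.match(four)

print(match_three.groups())  # ('Applications', '7 days', '6 days')
print(match_four)            # None
```

The four-column row fails because the third group cannot cross a double space, so the $ anchor is never reached.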

Columns not ending with spaces

This variant also rejects lines where the last column ends with a space (demo here):

^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s)(?!\s$))+)$

Notice the addition of a second negative look-ahead in the last group: (?!\s$).
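The difference between the two versions can be checked with a line that has a single trailing space. This is a minimal Python sketch; the sample row and the names `PLAIN` and `STRICT` are illustrative assumptions.

```python
import re

# First version: only forbids a double space inside the last column.
PLAIN = re.compile(
    r"^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s))+)$"
)
# Second version: additionally forbids a whitespace character right before the end.
STRICT = re.compile(
    r"^((?:(?!\s\s).)+)(?:\s){4,}((?:(?!\s\s).)+)(?:\s){4,}((?:.(?!\s\s)(?!\s$))+)$"
)

clean = "Applications" + " " * 20 + "7 days" + " " * 10 + "6 days"
trailing = clean + " "  # same row with one trailing space

print(bool(PLAIN.match(trailing)))   # True
print(bool(STRICT.match(trailing)))  # False
print(bool(STRICT.match(clean)))     # True
```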

acdcjunior
  • The only problem with that is that if there are more than 2 spaces after the end of the line, it will still match it. Is there any way for that to look ahead for 2 OR MORE spaces? Would it be as simple as adding a quantifier in the negative lookahead at the end? – tallkid24 May 16 '14 at 16:43
  • I added another negative-lookahead so it won't match lines ending with spaces. – acdcjunior May 16 '14 at 16:59
  • This is awesome! It works very well! May I ask some questions on it? What does the ?: do in regex? – tallkid24 May 16 '14 at 17:59
  • Sure. The `?:` is used to make the group a [non-capturing group](http://www.regular-expressions.info/brackets.html). For instance, if you have a regex like `(.*?)(a){2}(.*?)(b){2}`, when you "run" it on a string like `123aa789bb` the captured groups (important in replacing) will be: 1-`123`, 2-`a`, 3-`789` and 4-`b` (a repeated group keeps only its last repetition). Now, maybe you are just interested in the `(.*?)` parts in the captured groups. In that case, you use `?:` in the ones you don't want, like: `(.*?)(?:a){2}(.*?)(?:b){2}`. This way, the captured groups in that sample string would be 1-`123` and 2-`789`. – acdcjunior May 16 '14 at 20:08
  • In other words, `?:` has no effect on what the regex matches, just on what it will consider to be captured groups. In your case, as you probably didn't want to reference the `(\s){4,}`, I used `(?:\s){4,}`. In the end, if what the captured groups are doesn't matter to you, you can ditch `?:` and let your regex be a little more readable. – acdcjunior May 16 '14 at 20:11
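The effect of `?:` described in these comments can be reproduced with a short Python snippet using the same sample string. Note that a quantified capturing group such as `(a){2}` reports only its last repetition.

```python
import re

text = "123aa789bb"

capturing = re.match(r"(.*?)(a){2}(.*?)(b){2}", text)
non_capturing = re.match(r"(.*?)(?:a){2}(.*?)(?:b){2}", text)

# With plain parentheses every group is numbered and captured.
print(capturing.groups())      # ('123', 'a', '789', 'b')
# With (?:...) only the two (.*?) groups remain.
print(non_capturing.groups())  # ('123', '789')
# The overall match is the same in both cases.
print(capturing.group(0) == non_capturing.group(0))  # True
```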

Try this:

^[^\s]*(\s{2,}[^\s].*){2,}

This assumes that each column value is preceded by at least 2 spaces.
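It is worth testing this pattern against rows of different widths before relying on it. The Python check below uses sample rows rebuilt from the question (spacing approximate); because of the `{2,}` quantifier and the greedy `.*`, the pattern appears to accept any row with three or more columns, not exactly three.

```python
import re

PATTERN = re.compile(r"^[^\s]*(\s{2,}[^\s].*){2,}")

# Sample rows modelled on the question (spacing is approximate).
two = "Applications" + " " * 20 + "7 days"
three = two + " " * 10 + "6 days"
four = three + " " * 8 + "5 days"

results = {name: bool(PATTERN.match(row))
           for name, row in (("two", two), ("three", three), ("four", four))}
print(results)  # {'two': False, 'three': True, 'four': True}
```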


Taher Khorshidi