3

I'm having problems designing a regular expression. I'm not even sure if it is possible at all.

I want to match n characters, but one of them has to be a line break (or any other defined character).

This is my input:

0000000
0000000
000A000
00AB000
AAAB000
ABBB000

My (not working) regex is

.*A.{5}A.{5}A.{5}A.*

Changing the mode to DOTALL is not enough, since I have to ensure there is one line break between each pair of matched A's.

I just want to know if my input matches, I don't want to extract anything.

I want to check if there is an A-diagonal in my input.
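To make the DOTALL concern concrete, here is a minimal Python sketch. It is an illustration only: the variable names and the one-line `no_breaks` string are made up for this example, and the leading and trailing `.*` are dropped because `search` scans the whole input anyway. In the sample the A's sit 7 positions apart once the line break is counted, so `.{5}` never fits, while `.{6}` with DOTALL fits but cannot tell whether a line break was among the skipped characters.

```python
import re

# The sample input from the question, joined with "\n" line breaks.
sample = "\n".join([
    "0000000",
    "0000000",
    "000A000",
    "00AB000",
    "AAAB000",
    "ABBB000",
])

# Core of the regex from the question, with "." allowed to match "\n".
original = re.compile(r"A.{5}A.{5}A.{5}A", re.DOTALL)
# Same idea with one more skipped character per step.
widened = re.compile(r"A.{6}A.{6}A.{6}A", re.DOTALL)

# Hypothetical input with no line break at all: A's every 7 characters.
no_breaks = "A000000A000000A000000A"

original_hits_sample = original.search(sample) is not None   # False
widened_hits_sample = widened.search(sample) is not None     # True
widened_hits_flat = widened.search(no_breaks) is not None    # True, a false positive
```

The last result is the problem described above: nothing in `.{6}` forces exactly one of the skipped characters to be a line break.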

Unihedron
samjaf
  • 2
    What should the output be? – vks Sep 30 '14 at 10:51
  • 1
    What should it match? Even with DOTALL, the sample input doesn't match the `A.{5}A.{5}A.{5}` part of your regex. I don't understand the question :) – Jonny 5 Sep 30 '14 at 10:57
  • My example regex does not work. My example input should be matched by the regex. – samjaf Sep 30 '14 at 10:59
  • It works only if you change the second-to-last line to `AAAB00A`. – Avinash Raj Sep 30 '14 at 11:01
  • I'm not sure I understand but, with this: `.*A+.*` (replace `A` with the character you need), you'll match any string that has at least one `A`. This matches all the lines except the first two. With `\n` as the character, it matches all of them. http://regex101.com/r/xG9zZ6/1 – mechalynx Sep 30 '14 at 11:09
  • Well, if your initial problem is to check whether your input contains an A-diagonal or not, then I'm afraid regexes aren't powerful enough. Related: http://meta.stackexchange.com/q/66377/186921 `;)` – sp00m Sep 30 '14 at 11:43
  • @sp00m New link for the XY Problem: http://xyproblem.info – Unihedron Sep 30 '14 at 13:02

4 Answers

5

Self-referencing groups (Qtax Trick)

/^(?:.(?=.*+\n(\1?+.).*+\n(\2?+.).*+\n(\3?+.)))*?...A.*+\n\1?+..A.*+\n\2?+.A.*+\n\3?+A/m

Explanation:

  • ^ Start of a line.
  • (?:. Matches any character except newline.
  •     (?= Positive lookahead: Asserts that the following can be matched. This part is for capturing.
  •       .*+\n Matches everything up to the end of the line, then the newline itself as well.
  •             (\1?+.) The ?+ is a possessive optional: if this group has been matched before, consume what it captured and add one more character to the group; otherwise just match a single character and move on.
  •       .*+\n Matches everything up to the next line, same as the above.
  •             (\2?+.) Same as subpattern 1.
  •       .*+\n Advances the line.
  •             (\3?+.) Same as subpattern 1 and 2.
  •     ) Finishes the lookahead.
  • )*? Zero or more, match reluctantly.

The above group does the following. Note the colored groups:

pic
(source: gyazo.com)

However because this group is reluctantly quantified, this happens:

pic
(source: gyazo.com)

Note that while the colored groups may or may not be captured, the pointer location remains unchanged during the capture. Hence, at the very first iteration, none of the capturing groups has captured anything. As such, we move on to the next part of the regexp:

  • ...A Three characters, then an "A" (literal character).
  •     .*+\n Skip through the rest of the line, and the newline character...
  • \1?+ If we captured group one, consume it!
  •   ..A Two characters, then an "A" (literal character).
  •     .*+\n Next line.
  • \2?+ Consume if possible.
  •   .A You get the idea, but I'll write it out anyway. Same as above.
  •     .*+\n Advance.
  • \3?+ ......
  • A End of match!

If you don't like text, I'll just draw it one more time:

pic
(source: gyazo.com)

Let's all bow to our master -

“vertical” regex matching in an ASCII “image”

Here's a code demo, and here's a regex demo of an extended version.
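For readers who find the pattern hard to follow, the column bookkeeping that the self-referencing groups perform can be mirrored in plain code. The following Python sketch is not the linked demo and uses no regex at all; the function name `has_a_diagonal` and its parameters are made up for this illustration. It checks the same shape the pattern describes: an A on one line, then an A one column further left on each of the following lines.

```python
def has_a_diagonal(text, length=4, ch="A"):
    """Return True if `ch` appears on `length` consecutive lines,
    shifting one column to the left on each following line."""
    rows = text.splitlines()
    for r in range(len(rows) - length + 1):
        # The topmost cell needs at least length - 1 columns to its left.
        for c in range(length - 1, len(rows[r])):
            if all(
                c - k < len(rows[r + k]) and rows[r + k][c - k] == ch
                for k in range(length)
            ):
                return True
    return False


sample = "\n".join([
    "0000000",
    "0000000",
    "000A000",
    "00AB000",
    "AAAB000",
    "ABBB000",
])

found = has_a_diagonal(sample)                         # True
found_in_zeros = has_a_diagonal("0000000\n" * 6)       # False
```

Because each line is indexed separately, this version does not depend on all lines having the same width; lines that are too short are simply skipped by the bounds check.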

Community
Unihedron
3

With DOTALL, the . also matches newlines. You possibly want something like A(?:[\r\n]*[^\r\n]){5}, where [\r\n] matches a newline character and the negation [^\r\n] matches a non-newline character.

Repeat this at least 4 times, with any number of characters before or after it:

.*?(?:A(?:[\r\n]*[^\r\n]){5}[\r\n]*){4,}.*

Additionally, you could use a negative lookahead after A to verify that it is not followed by a run of at least 7 characters without a line break: (?![^\r\n]{7}), so the pattern becomes:

.*?(?:A(?![^\r\n]{7})(?:[\r\n]*[^\r\n]){5}[\r\n]*){4,}.*

Test at regex101
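As a rough check of the two patterns above, here is a small Python sketch. The names `basic`, `guarded` and the one-line string `flat` are made up for this illustration; the patterns themselves are copied unchanged and no flags are needed, since the character classes handle the line breaks explicitly.

```python
import re

# The sample input from the question, joined with "\n" line breaks.
sample = "\n".join([
    "0000000",
    "0000000",
    "000A000",
    "00AB000",
    "AAAB000",
    "ABBB000",
])

# Hypothetical one-line input: an A every 6 characters, no line break.
flat = "A00000A00000A00000A00000"

# First pattern: A followed by 5 non-newline characters, at least 4 times.
basic = re.compile(r".*?(?:A(?:[\r\n]*[^\r\n]){5}[\r\n]*){4,}.*")
# Second pattern: additionally rejects an A followed by 7 characters on the same line.
guarded = re.compile(
    r".*?(?:A(?![^\r\n]{7})(?:[\r\n]*[^\r\n]){5}[\r\n]*){4,}.*"
)

basic_sample = basic.search(sample) is not None      # True
basic_flat = basic.search(flat) is not None          # True, no line break required
guarded_sample = guarded.search(sample) is not None  # True
guarded_flat = guarded.search(flat) is not None      # False
```

The lookahead is what ties the match to the line structure; both the `{5}` and the `{7}` assume lines that are 7 characters wide, as in the sample.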

Jonny 5
  • Whether mine is correct depends on the OP's requirements. Looking forward to seeing yours, @Unihedron – Jonny 5 Sep 30 '14 at 12:23
  • 1
    After taking a closer look at how your regex works, I realized that yours matches very well and is much more efficient than mine on both successful and failed matches, but it doesn't work properly when the width is different or not fixed. – Unihedron Sep 30 '14 at 16:39
1

The regex below ensures that there is a newline character between the A's.

(?:(A)(?:(?=.*?\n)(?=.*?.).){6}){1,}(?=(A))

DEMO

Avinash Raj
0
^[^A]*|(A.{6})

Try this. It will match your input. Because there is a \n between the A's, .{5} was causing problems. When you change it to .{6}, you get the correct result. See the captures. Do not forget to add the s and g modifiers.

See demo.

http://regex101.com/r/nA6hN9/46

vks