I want to match the code below with a regex. Put simply, I want to match the whole for loop statement, starting at `for` and ending with its closing `}`.

This is what I tried, but with my approach I have to repeat \R exactly as many times as there are lines in the for loop. Can it be dynamic? Or is there a better solution?

Here is my regex:

for.+\(.*\R.*\R.*\R.*\R.*\R.*\R.*

And this is what I want to match:

  for (i = 2; i <= n / 2; ++i) {
    // condition for non-prime
    if (n % i == 0) {
      flag = 1;
      break;
    }
  }
Toto
Mukul
  • Using a parser would probably be a better solution. – choroba Mar 18 '21 at 10:42
  • I want to do it with regex; can you help me with that? Also, I don't know which parser you are talking about. Is it related to web scraping? – Mukul Mar 18 '21 at 10:44
  • What language or regex library do you use? – choroba Mar 18 '21 at 10:54
  • You need a library that will treat newlines as whitespace. – Mr R Mar 18 '21 at 11:02
  • True regular expressions can't count, so e.g. matching brace pairs to find the closing `}` of the block is beyond what they can do. Some "regex" engines actually go quite a bit beyond what a true regular expression can do, so it *might* be possible. See [this question](https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses). – Gordon Davisson Mar 18 '21 at 11:06
  • Anyway, can you suggest some good regex books? I want to master regex from scratch. – Mukul Mar 18 '21 at 11:24

3 Answers

Here's an example using AWK. It uses regexes for the matching, plus a counter that goes up on each opening { and down on each closing }.

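awk 'BEGIN { infor = 0 }   # infor: brace depth inside a for loop (0 = outside)
# A "for (...) {" line opens a loop: count its brace, print it, skip the other rules.
/for *\(.*{/ { infor++; print $0; next }
# Inside a loop, print every line (the check runs before the depth is updated).
/^.*$/ { if (infor) { print $0 } }
# Track nested braces so that only the matching "}" brings the depth back to 0.
/{/ { if (infor) { infor++ } }
/}/ { if (infor) { infor-- } }'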

Given this input:

What the AWK man

for (i = 2; i <= n / 2; ++i) {
    // condition for non-prime
    if (n % i == 0) {
      flag = 1;
      break;
    }
  }

echo ME
Echo you

  for (i = 2; i <= n / 2; ++i) {
      // condition for non-prime
      if (n % i == 0) {
        flag = 1;
        break;
      }
    }

it prints just the for loops:

for (i = 2; i <= n / 2; ++i) {
    // condition for non-prime
    if (n % i == 0) {
      flag = 1;
      break;
    }
  }
  for (i = 2; i <= n / 2; ++i) {
      // condition for non-prime
      if (n % i == 0) {
        flag = 1;
        break;
      }
    }
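
For comparison, the same depth-counting idea can be sketched in Python. This is only an illustration under the same assumptions as the AWK script (each loop starts on a line like `for (...) {`, and no braces hide in comments or strings); `extract_for_loops` and `FOR_START` are names made up for this sketch. Unlike the AWK version, it counts every brace on a line, so a line such as `} else {` nets to zero.

```python
import re

# Hypothetical helper names; this mirrors the AWK logic and is not a C parser.
FOR_START = re.compile(r"\bfor\s*\(.*\{")

def extract_for_loops(text):
    """Return each outermost for loop as one string, tracking brace depth."""
    loops, current, depth = [], [], 0
    for line in text.splitlines():
        if depth == 0 and not FOR_START.search(line):
            continue  # outside any loop: ignore the line
        current.append(line)
        # Count every brace on the line, so "} else {" nets to zero.
        depth += line.count("{") - line.count("}")
        if depth <= 0:  # the loop's own closing brace has been reached
            loops.append("\n".join(current))
            current, depth = [], 0
    return loops

sample = """\
What the AWK man
for (i = 2; i <= n / 2; ++i) {
    // condition for non-prime
    if (n % i == 0) {
      flag = 1;
      break;
    }
  }
echo ME
"""

for loop in extract_for_loops(sample):
    print(loop)
```

Because the loops come back as a list, they can be processed one at a time instead of being printed back to back.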
Mr R

You can try this regex:

#!/usr/bin/env bash

grep -z -Po '(?s)[ \t]*for [^{]+{[^{}]+condition for non-prime[^}]+}[^}]+}' << EOF
  for (i = 2; i <= n / 2; ++i) {
    // Different comment
    if (n % i == 0) {
      flag = 1;
      break;
    }
  }
  for (i = 2; i <= n / 2; ++i) {
    // condition for non-prime
    if (n % i == 0) {
      flag = 1;
      break;
    }
  }
other stuff
EOF
Philippe

An issue with your requirement is the possibility of unmatched braces (in comments or strings):

// This comment line will break the search of matching } pairs
print "And a message with a single } fails too."

When the code is well indented (perhaps by a code formatting tool in your IDE), you can try a solution using the indents:

sed -rz 's/.*(^|\n)(\s*)(for[^\n]*\n)((\2[^}][^\n]*\n)*[^\n]*).*/\2\3\4\n/' inputfile

This will fail for loops that contain an empty line, and it will only extract the last loop from the file. I won't explain the command; using regex for your task is too complex.
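
The indentation idea behind the sed command is perhaps easier to follow in Python's `re` module. The sketch below is not a translation of that command. It assumes the closing brace sits at exactly the same indentation as its `for`, and that no inner line starts with a `}` at that indentation. The name `FOR_LOOP` is made up for illustration. The line-by-line lazy repetition is what makes the number of lines dynamic.

```python
import re

# Assumption: the closing brace is indented exactly like its "for" line.
FOR_LOOP = re.compile(
    r"^([ \t]*)for\b[^\n]*\{[ \t]*\n"  # group 1 = indentation of the "for" line
    r"(?:[^\n]*\n)*?"                  # as few body lines as possible (lazy)
    r"\1\}[ \t]*$",                    # "}" at the same indentation ends the loop
    re.MULTILINE,
)

sample = """\
int main() {
  for (i = 2; i <= n / 2; ++i) {
    // condition for non-prime

    if (n % i == 0) {
      flag = 1;
      break;
    }
  }
  return 0;
}
"""

matches = [m.group(0) for m in FOR_LOOP.finditer(sample)]
for loop in matches:
    print(loop)
```

Under these assumptions an empty line inside the loop is not a problem, and `finditer` returns every loop in the file rather than only the last one. Badly indented code still defeats it.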

When you know how many spaces are used for your indents (or rewrite the next command for tabs), you might try selecting for loops that have either no indent or a 2-space indent.

sed -n '/^  for/,/^  }/ p; /^for/,/^}/ p' inputfile

This is all guesswork and will fail in real-life cases. Please think about why you want to extract the for loops, and consider writing a parser that understands the syntax.
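
As a rough idea of what a first step towards such a parser could look like, here is a small Python scanner that matches comments and string literals as whole tokens, so braces inside them are never counted. It is a sketch, not a real parser. It assumes C-like syntax and that every for loop has a braced body. `TOKEN` and `find_for_loops` are hypothetical names.

```python
import re

# Assumption: C-like syntax. Comments and literals are consumed as whole
# tokens, so any brace inside them is skipped rather than counted.
TOKEN = re.compile(
    r"""//[^\n]*               # line comment
      | /\*.*?\*/              # block comment
      | "(?:\\.|[^"\\\n])*"    # double-quoted string
      | '(?:\\.|[^'\\\n])*'    # character literal
      | \bfor\s*\(             # start of a for loop
      | [{}]                   # a brace that counts
    """,
    re.VERBOSE | re.DOTALL,
)

def find_for_loops(src):
    """Return the source text of each outermost for loop with a braced body."""
    loops, start, depth = [], None, 0
    for m in TOKEN.finditer(src):
        tok = m.group()
        if tok.startswith("for"):
            if start is None:
                start = m.start()  # an outermost loop begins here
        elif tok == "{" and start is not None:
            depth += 1
        elif tok == "}" and start is not None:
            depth -= 1
            if depth == 0:  # matching brace of the outermost loop
                loops.append(src[start:m.end()])
                start = None
        # comment and literal tokens fall through and are ignored
    return loops

sample = r'''
for (i = 0; i < n; ++i) {
    // a stray } in a comment
    puts("and a single } in a string");
    if (i % 2) { count++; }
}
done();
'''

loops = find_for_loops(sample)
for loop in loops:
    print(loop)
```

A loop without braces (`for (...) stmt;`) would still confuse this scanner, which is one more reason to use a real parser for anything beyond quick extraction.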

Walter A