
I would like to search a .java file using regular expressions, and I wonder if there is a way to detect on which lines in the file the matches are found.

For example, if I look for the match hello with Java regular expressions, will some method tell me that the matches were found on lines 9, 15, and 30?

zx81
goaman

3 Answers


Possible... with Regex Trickery!

Disclaimer: This is not meant to be a practical solution, but an illustration of a way to use an extension of a terrific regex hack. Moreover, it only works on regex engines that allow capture groups to refer to themselves (and, for the version below, that support conditionals). For instance, you could use it in Notepad++, as it uses the PCRE engine, but not in Java.

Let's say your file is:

some code
more code
hey, hello!
more code

At the bottom of the file, paste :1:2:3:4:5:6:7, where : is a delimiter not found in the rest of the code, and where the numbers go at least as high as the number of lines.
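Typing the counter by hand gets tedious for long files. A minimal Java sketch that generates it, assuming the line count is known (here it is taken from the sample file above, held in a string) and that ':' does not occur anywhere else in the file; `buildLineCounter` is a hypothetical helper name:

```java
public class LineCounterSuffix {
    // Hypothetical helper: builds ":1:2:...:n" for a file with n lines.
    // Assumes ':' does not appear anywhere else in the file.
    static String buildLineCounter(int lineCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= lineCount; i++) {
            sb.append(':').append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String code = "some code\nmore code\nhey, hello!\nmore code\n";
        // split drops trailing empty strings, so the final newline adds no extra line
        int lines = code.split("\r?\n").length;
        System.out.println(buildLineCounter(lines)); // prints :1:2:3:4
    }
}
```

For the four-line sample, the output :1:2:3:4 is what gets pasted below the last line.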

Then, to get the line of the first hello, you can use:

(?m)(?:(?:^(?:(?!hello).)*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*.*hello(?=[^:]+(?(1)\1):(\d+))

The line number of the first line containing hello will be captured by Group 2.

  • In the demo, see Group 2 capture in the right pane.
  • The hack relies on a group referring to itself. In the classic @Qtax trick, this is done with (?>\1?). For diversity, I used a conditional instead.

Explanation

  • The first part of the regex is a line skipper, which captures an increasingly long portion of the line counter at the bottom into Group 1.
  • The second part of the regex matches hello and captures the line number to Group 2.
  • Inside the line skipper, (?:^(?:(?!hello).)*(?:\r?\n)) matches a line that doesn't contain hello.
  • Still inside the line skipper, the lookahead (?=[^:]+((?(1)\1):\d+)) first gets us to the first : with [^:]+. The outer parentheses of ((?(1)\1):\d+) then capture to Group 1: the previous value of Group 1 if it is already set (that is the conditional (?(1)\1)), followed in every case by a colon and some digits. This ensures that each time the line skipper matches a line, Group 1 expands to a longer portion of :1:2:3:4:5:6:7.
  • The * matches the line skipper zero or more times.
  • .*hello matches the line with hello.
  • The final lookahead (?=[^:]+(?(1)\1):(\d+)) works like the one in the line skipper, except that it no longer updates Group 1; instead it captures the digits that follow the current value of Group 1 to Group 2: (\d+). In the sample file, the skipper consumes two lines, so Group 1 ends up as :1:2 and Group 2 captures 3.
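Java's java.util.regex has no conditional groups, so the pattern above is rejected there. In Java itself, a different (and more practical) technique yields the same information: let Matcher.find() locate each match, then count the line breaks that precede Matcher.start(). A minimal sketch; the class and method names are hypothetical, and unlike the regex trick it reports every match, not only the first:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchLineNumbers {
    // Returns the 1-based line number on which each match of `pattern` starts.
    // Only '\n' is counted, so both "\n" and "\r\n" line endings work.
    static List<Integer> lineNumbersOfMatches(String text, Pattern pattern) {
        List<Integer> result = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        int line = 1;    // line number at position `scanned`
        int scanned = 0; // newlines before this offset have already been counted
        while (m.find()) {
            // Count the newlines between the previous match start and this one.
            for (int i = scanned; i < m.start(); i++) {
                if (text.charAt(i) == '\n') {
                    line++;
                }
            }
            scanned = m.start();
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        String code = "some code\nmore code\nhey, hello!\nmore code\nhello again\n";
        System.out.println(lineNumbersOfMatches(code, Pattern.compile("hello"))); // prints [3, 5]
    }
}
```

Because it scans the whole text rather than one line at a time, this also works for patterns whose matches span several lines; the reported number is the line on which the match starts.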

zx81

If you are using a Unix-based OS or terminal, you could use sed:

sed -n '/regex/=' file

(from this StackOverflow response)

renlo
  • This does not really attempt to answer the question. OP stated the use of regular expressions in Java. – Unihedron Jul 14 '14 at 05:10
  • He wants to find the line numbers within a java file. Using sed, he could use something like: sed -n '/hello/=' foo.java – renlo Jul 14 '14 at 08:24
  • While that is a solution, OP states "if I look for the match hello with Java regular expressions, will some method ...", which implies the use of Java; a Unix-based OS and terminal would be a different dependency. – Unihedron Jul 14 '14 at 08:26
  • The question is only about using a regex and determining a line number. The example of using a java regular expression was not stating a requirement of solving the problem with java. The solution given by Renlo is simple and easy. Just replace "regex" with your regex. – Erin Heyming Aug 08 '14 at 18:26

There are no methods in Java that will do it for you. You must read the file line-by-line and check for a match on each line. You can keep an index of the lines as you read them and do whatever you want with that index when a match is found.
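The line-by-line approach described above might look like the following minimal sketch. The class and method names are hypothetical, and a sample string stands in for the .java file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class GrepLines {
    // Reads the input one line at a time, keeping a 1-based line index,
    // and records the index of every line in which `pattern` finds a match.
    static List<Integer> matchingLines(BufferedReader reader, Pattern pattern) {
        List<Integer> hits = new ArrayList<>();
        int lineNumber = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (pattern.matcher(line).find()) {
                    hits.add(lineNumber);
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers do not have to declare IOException.
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String code = "some code\nmore code\nhey, hello!\nmore code\n";
        BufferedReader reader = new BufferedReader(new StringReader(code));
        System.out.println(matchingLines(reader, Pattern.compile("hello"))); // prints [3]
    }
}
```

To scan a real file, pass Files.newBufferedReader(Path.of("Foo.java")) instead (Path.of needs Java 11+; Foo.java is a placeholder). A line-by-line scan cannot find matches that span a line break.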

Greg