
I have a path dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext that I want to process one segment at a time. For each segment, I want to know how many tabs precede it, and I want to keep the rest of the path intact. For the given example, I expect:

Iteration 1:

Preceding tabs: 0
Segment: dir
Rest: \n\tsubdir1\n\tsubdir2\n\t\tfile.ext

Iteration 2:

Preceding tabs: 1
Segment: subdir1
Rest: \n\tsubdir2\n\t\tfile.ext

Iteration 3:

Preceding tabs: 1
Segment: subdir2
Rest: \n\t\tfile.ext

Iteration 4:

Preceding tabs: 2
Segment: file.ext
Rest: ""

The pattern I came up with is ((?<=\\R)\\h*)(\\H+). However, it gives me \tsubdir1\n as the first match. What am I doing wrong?

Abhijit Sarkar

1 Answer


Since all segments are separated by the line separator \n, you can simply use .+ to match them. By default the dot . cannot match line separators, so each match is guaranteed to stop before \n (or any other line separator, such as \r).
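To see what goes wrong with the pattern from the question, here is a small comparison sketch (the class name CompareSegmentPatterns and the helper matches are made up for illustration). In Java, \H matches any character that is not horizontal whitespace, and \n is vertical whitespace, so \H+ runs on through the newline. The lookbehind (?<=\R) also requires a line break right before the match, so dir at the very start is never matched. The dot-based pattern avoids both problems:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompareSegmentPatterns {
    // Collects every full match (group 0) of regex in input,
    // escaping newlines and tabs so they are visible when printed.
    static List<String> matches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group().replace("\n", "\\n").replace("\t", "\\t"));
        }
        return result;
    }

    public static void main(String[] args) {
        String path = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext";

        // Pattern from the question: \H+ also consumes the newline,
        // and the lookbehind skips the first segment entirely.
        System.out.println(matches("((?<=\\R)\\h*)(\\H+)", path));
        // prints [\tsubdir1\n, \tsubdir2\n, \t\tfile.ext]

        // Dot-based pattern: . stops before any line separator.
        System.out.println(matches("(?<tabs>\t*)(?<segment>.+)", path));
        // prints [dir, \tsubdir1, \tsubdir2, \t\tfile.ext]
    }
}
```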

You can also add groups to separate the tabs from the actual segment, for example a named group (?<tabs>\t*) that matches zero or more tabs at the start of each match.

To print the rest of the text after a match, simply take the substring that starts right after the last matched character; you can get that index via Matcher#end.

To print a string so that \n and \t appear as a backslash followed by a letter (rather than as an actual line break and tab), you can either manually replace each "\n" with "\\n" and each "\t" with "\\t", or use a utility class such as StringEscapeUtils, whose escapeJava method does this for you. It used to live in org.apache.commons.lang; the current version is org.apache.commons.text.StringEscapeUtils in Apache Commons Text.
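If you would rather not add a dependency, the manual replacement is enough for this input. A minimal sketch (the class and method names are hypothetical); note that it only handles \n and \t, whereas escapeJava also escapes backslashes, quotes and other control characters:

```java
public class EscapeWithoutLibrary {
    // Hypothetical helper: turns real newlines and tabs into the two-character
    // sequences \n and \t. String.replace works on literal text (no regex)
    // and replaces every occurrence. Backslashes are NOT escaped here.
    static String escape(String s) {
        return s.replace("\n", "\\n").replace("\t", "\\t");
    }

    public static void main(String[] args) {
        String rest = "\n\tsubdir2\n\t\tfile.ext";
        System.out.println(escape(rest)); // prints \n\tsubdir2\n\t\tfile.ext
    }
}
```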

So your code can look like:

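import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.text.StringEscapeUtils; // Commons Text; formerly org.apache.commons.lang.StringEscapeUtils

public class PathSegments {
    public static void main(String[] args) {
        String path = "dir\n\tsubdir1\n\tsubdir2\n\t\tfile.ext";
        Pattern p = Pattern.compile("(?<tabs>\t*)(?<segment>.+)"); // dot can't match line separators
        Matcher m = p.matcher(path);
        int i = 1;
        while (m.find()) {
            System.out.println("iteration: " + i++);
            System.out.println("Preceding tabs: " + m.group("tabs").length());
            System.out.println("Segment: " + m.group("segment"));
            // everything after the current match, with \n and \t escaped for printing
            System.out.println("Rest: " + StringEscapeUtils.escapeJava(path.substring(m.end())));
            System.out.println();
        }
    }
}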

Output:

iteration: 1
Preceding tabs: 0
Segment: dir
Rest: \n\tsubdir1\n\tsubdir2\n\t\tfile.ext

iteration: 2
Preceding tabs: 1
Segment: subdir1
Rest: \n\tsubdir2\n\t\tfile.ext

iteration: 3
Preceding tabs: 1
Segment: subdir2
Rest: \n\t\tfile.ext

iteration: 4
Preceding tabs: 2
Segment: file.ext
Rest: 
Pshemo
  • Couple of comments: 1) `StringEscapeUtils` is now in [commons-text](https://search.maven.org/search?q=g:org.apache.commons%20AND%20a:commons-text), the one in commons-lang has been deprecated. 2) To literally print `\n`, replace with `\\\\n`, not `\\n`. – Abhijit Sarkar Mar 09 '19 at 06:42
  • @AbhijitSarkar (1) Thanks for the update. (2) Only if you are using `replaceAll`, which supports regex, where \ is a metacharacter and requires additional escaping. But if you use `replace`, which doesn't support regex syntax and *also* replaces all matches, `replace("\n", "\\n")` should work fine. – Pshemo Mar 09 '19 at 10:28
  • You're correct about `replace`. Could it be any more confusing that both `replace` and `replaceAll` actually replace all? – Abhijit Sarkar Mar 09 '19 at 20:42
  • @AbhijitSarkar Yes, the naming of the replacing methods is confusing. The probable rationale behind the `All` suffix is that it emphasizes the difference between it and `replaceFirst`, which also supports regex syntax. The confusing part is that the other replacing methods, `replace(char target, char replacement)` and `replace(CharSequence target, CharSequence replacement)`, don't use regex but *also* replace *all* occurrences of `target`. – Pshemo Mar 09 '19 at 22:24
  • Alternative names could be `replaceRegex` and `replaceFirstRegex` which IMO would be less confusing but some could say that these names could be too long (which IMO is not the case since IDE would suggest them and people would autocomplete them, so we wouldn't really need more keystrokes). But that is just my opinion. – Pshemo Mar 09 '19 at 22:29
  • Or simply `replace(String target, String replacement, boolean regex)`. Similarly, `replaceAll` – Abhijit Sarkar Mar 09 '19 at 23:42
  • Yes, that is also an option, but regardless of what we will conclude I doubt it will change Java API :) – Pshemo Mar 10 '19 at 00:28
  • It most likely won't, and that's Java's biggest problem, inability to change. – Abhijit Sarkar Mar 10 '19 at 04:47
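The `replace`/`replaceAll` difference discussed in the comments can be checked with a short sketch (class and method names are illustrative only). `replace` treats both arguments as literal text, while `replaceAll` compiles the first argument as a regex and gives \ and $ a special meaning in the replacement, which is why it needs the extra escaping. `Matcher.quoteReplacement` does that escaping for you:

```java
import java.util.regex.Matcher;

public class ReplaceVersusReplaceAll {
    // replace: both arguments are literal text, and every occurrence is replaced.
    static String viaReplace(String s) {
        return s.replace("\n", "\\n");
    }

    // replaceAll: the replacement treats a backslash as an escape,
    // so the backslash has to be doubled once more.
    static String viaReplaceAll(String s) {
        return s.replaceAll("\n", "\\\\n");
    }

    // Pitfall: the replacement string here is backslash + n, which replaceAll
    // reads as an escaped "n", so the line break becomes a plain letter n.
    static String viaReplaceAllUnescaped(String s) {
        return s.replaceAll("\n", "\\n");
    }

    // Matcher.quoteReplacement escapes \ and $ so the replacement is taken literally.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll("\n", Matcher.quoteReplacement("\\n"));
    }

    public static void main(String[] args) {
        String s = "a\nb"; // contains a real line break
        System.out.println(viaReplace(s));             // a\nb
        System.out.println(viaReplaceAll(s));          // a\nb
        System.out.println(viaReplaceAllUnescaped(s)); // anb
        System.out.println(viaQuoteReplacement(s));    // a\nb
    }
}
```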