How to perform replaceAll excluding comments in Java

Question

I have files, typically XML files, in which I want to replace all occurrences of 'x.y' with 'p.q'. During this replacement, I want to ignore occurrences of x.y inside comments (<!-- ... -->).

I was trying to use String.replaceAll() to perform this task.

For Example :

<?xml version="1.0" encoding="UTF-8"?>
<name>This occurrence of x.y should be replaced</name>
<!-- This occurrence of x.y should not be replaced -->

I tried using String.replaceAll("x[\\.]y", "p.q"), but the occurrences in comments are replaced as well.

As an alternative, I could read the file line by line and skip the lines that start with a comment, but I am interested in using replaceAll().

Please provide a way by which this can be achieved.

[Obligatory link](http://stackoverflow.com/a/1732454/2071828). Don't use regex use one of the many XML parsers JAXP offers. — Boris the Spider, Aug 23 '14 at 10:59
Don't use regex for parsing XML. The easiest approach in this case is to use an XML parser. — Sergey Kalinichenko, Aug 23 '14 at 10:59
I've always personally preferred simple/standard JDOM, but I agree whole heartedly with Boris. Using regex for XML is a recipe for disaster. — Rudi Kershaw, Aug 23 '14 at 11:00
The question is equivalent to asking how to remove a screw using only a hammer. Even if there is a way of doing it, it will be harder, more complicated, and more dangerous than using a screwdriver. — Patricia Shanahan, Aug 23 '14 at 11:01
I've updated my answer so that it should work with all XML structures instead of with your specific example. That way it scales if you need to use larger or deeper XMLs in the future. — Rudi Kershaw, Aug 23 '14 at 13:18

score 2 · Accepted Answer · edited May 23 '17 at 12:13

Although this isn't strictly the answer you are looking for, I have a recommendation.

I'd recommend using a proper XML parser such as the Java DOM API to check and replace text in your nodes, rather than treating the XML as a raw String. Something like this should replace the text in every text node while leaving comments untouched.

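import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Parse the file into a DOM tree (parse() throws checked exceptions).
File f = new File("your.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(f);

// "*" selects every element in the document.
NodeList eList = doc.getElementsByTagName("*");
for (int e = 0; e < eList.getLength(); e++) {
    Node element = eList.item(e);
    NodeList nList = element.getChildNodes();
    for (int n = 0; n < nList.getLength(); n++) {
        Node node = nList.item(n);
        // Only text nodes are edited; comment nodes have a different node type.
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace("x.y", "p.q"));
        }
    }
}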

If memory or efficiency is an issue (for example, when your.xml is huge), you would be better off using SAX, which is faster (though a little more code-intensive) and doesn't hold the whole XML document in memory.

Once your Document has been edited, you'll probably want to use a Transformer to create suitable output. (Official guide here, courtesy of Boris the Spider's comment.)
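As a rough, self-contained sketch of the whole round trip (parse, replace in text nodes, serialize with a Transformer), something like the following should work. The class name XmlTextReplacer and the method replaceInText are made up for illustration, and the XML is read from and written to a String only to keep the example short; for a real file you would parse a File and write to a StreamResult wrapping a file instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
class XmlTextReplacer {

    // Replaces 'target' with 'replacement' in text nodes only, then serializes the DOM.
    static String replaceInText(String xml, String target, String replacement) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList elements = doc.getElementsByTagName("*");
            for (int e = 0; e < elements.getLength(); e++) {
                NodeList children = elements.item(e).getChildNodes();
                for (int n = 0; n < children.getLength(); n++) {
                    Node node = children.item(n);
                    // Comment nodes have type COMMENT_NODE, so they are skipped here.
                    if (node.getNodeType() == Node.TEXT_NODE) {
                        node.setNodeValue(node.getNodeValue().replace(target, replacement));
                    }
                }
            }

            // A Transformer created without a stylesheet copies the source to the result.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            // Wrapped so that callers do not have to handle the checked JAXP exceptions.
            throw new IllegalStateException("Could not process XML", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><name>has x.y here</name><!-- keep x.y --></root>";
        System.out.println(replaceInText(xml, "x.y", "p.q"));
    }
}
```

Note that with a default (non-coalescing) DocumentBuilderFactory, CDATA sections are separate nodes and would not be touched by this loop.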

Hope this helps.

answered Aug 23 '14 at 11:03 · Rudi Kershaw · edited May 23 '17 at 12:13 · Community

Totally agree, +1. Although I have to say that converting a `Document` to a `String` to save it is wrong. A [`Transformer`](http://docs.oracle.com/javase/7/docs/api/javax/xml/transform/Transformer.html) is specifically designed for the task. There is a [tutorial here](http://docs.oracle.com/javase/tutorial/jaxp/xslt/writingDom.html). – Boris the Spider Aug 23 '14 at 11:09
@BoristheSpider - Thanks, good call, I will replace that part of the answer. – Rudi Kershaw Aug 23 '14 at 11:12
@JaqenH'ghar - Also a good point, it does beg the question of whether the `getElementXXX()` methods or `getChildNodes()` will also find comments. I don't think they would but I haven't tested it. – Rudi Kershaw Aug 23 '14 at 11:18
I think one can simply do `if (!(node instanceof Comment))` as Comment extends Node – Jaqen H'ghar Aug 23 '14 at 11:27
@JaqenH'ghar - Thanks for the help. I've amended the code : ) – Rudi Kershaw Aug 23 '14 at 11:34
Thanks for the suggestions, I am exploring the DOM parser for parsing and replacing strings. – Appana Sandeep Aug 23 '14 at 14:05
There is no need to use regex here! [`String.replace`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replace(java.lang.CharSequence,%20java.lang.CharSequence)) is **much** faster and doesn't need the ugly escapes. – Boris the Spider Aug 23 '14 at 15:26
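To illustrate the difference between the two methods mentioned in the last comment, here is a small sketch. The class and method names are invented for the example. String.replace treats its first argument as literal text (and still replaces every occurrence), while String.replaceAll treats it as a regular expression, in which an unescaped dot matches any character.

```java
// Hypothetical demo class; only String.replace and String.replaceAll are real APIs here.
class ReplaceVsReplaceAll {

    // Literal replacement: the dot is just a dot.
    static String literal(String s) {
        return s.replace("x.y", "p.q");
    }

    // Regex replacement with an unescaped dot: also matches "xzy", "x-y", and so on.
    static String unescapedRegex(String s) {
        return s.replaceAll("x.y", "p.q");
    }

    // Regex replacement with the dot escaped: behaves like the literal version.
    static String escapedRegex(String s) {
        return s.replaceAll("x\\.y", "p.q");
    }

    public static void main(String[] args) {
        String input = "x.y xzy";
        System.out.println(literal(input));        // p.q xzy
        System.out.println(unescapedRegex(input)); // p.q p.q
        System.out.println(escapedRegex(input));   // p.q xzy
    }
}
```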

Jonny 5 · Answer 2 · 2014-08-23T12:32:31.897

1

If you do use regex, one option is a negative lookahead that only matches x.y when it is outside a comment:

(?s)x\.y(?!(?:(?!<!--).)+-->)

As a Java string:

"(?s)x\\.y(?!(?:(?!<!--).)+-->)"

The (?s) DOTALL modifier makes the . also match newlines, so comments spanning several lines are handled as well.

Test at regexplanet (click on Java)
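A minimal sketch of how this pattern could be plugged into replaceAll, using the sample from the question. The class name CommentAwareReplace and its method are assumptions made for the example. The lookahead rejects a match when a closing --> can be reached from it without first passing an opening <!--, which is what "inside a comment" means here.

```java
import java.util.regex.Matcher;

// Hypothetical wrapper around the regex from this answer.
class CommentAwareReplace {

    // x.y not followed by "-->" unless a "<!--" comes first, with DOTALL enabled.
    static final String OUTSIDE_COMMENT = "(?s)x\\.y(?!(?:(?!<!--).)+-->)";

    static String replaceOutsideComments(String xml, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being interpreted.
        return xml.replaceAll(OUTSIDE_COMMENT, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String xml = "<name>This x.y should be replaced</name>\n"
                   + "<!-- This x.y should not be replaced -->";
        System.out.println(replaceOutsideComments(xml, "p.q"));
        // <name>This p.q should be replaced</name>
        // <!-- This x.y should not be replaced -->
    }
}
```

This relies on --> appearing only as a comment terminator. Input where that sequence occurs elsewhere (for example inside a CDATA section) would likely confuse the pattern, and an x.y placed immediately before --> is not protected because the + requires at least one character in between.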

answered Aug 23 '14 at 12:11 · Jonny 5 · edited Aug 23 '14 at 12:32

This works fine for XML comments. I tried to apply the same pattern to exclude '#' comments in a .properties file using x\\.y(?!(?:(?!#).)+), but it is not working: the text in the # line is also matched. Is there anything I am missing here? – Appana Sandeep Aug 23 '14 at 14:02
@AppanaSandeep That's a different task. If you know e.g. one line can't be longer such as `1024` try this in `(?m)` *multiline*-mode: `(?m)(? – Jonny 5 Aug 23 '14 at 17:37
This works great. Could you explain how it works? Is there a link where I can learn more about regex? – Appana Sandeep Aug 23 '14 at 17:53
@AppanaSandeep It matches in `(?m)` multi-line mode: `^` and `$` match start and end of each line. `(? – Jonny 5 Aug 23 '14 at 18:08
@AppanaSandeep To learn more about regex see: [SO Regex FAQ](http://stackoverflow.com/a/22944075/3110638), [RexEgg](http://www.rexegg.com/), [regular-expressions.info](http://www.regular-expressions.info/tutorial.html), test on [regex101](http://regex101.com/) and read explanations, read [Jeffrey Friedl's book](http://regex.info/) :) – Jonny 5 Aug 23 '14 at 18:10
