
I am trying to extract the content within the tags shown below using a regular expression in Java. I have been trying to get the paragraph content within the tags but can't get it out.

   Some text without tags here...
   <question1>
   Paragraph 1...

   Paragraph 2...
   </question1>
     Some text without tags here...
   <question2>
   Paragraph 1...

   Paragraph 2...
   </question2>
     Some text without tags here...

The tags and content above are stored in a string variable, stringToSearch. Here is my code:

Pattern p = Pattern.compile("<question1>(.*)</question1>");
Matcher a = p.matcher(stringToSearch);
System.out.print("\n Matching pattern...");
// Search the pattern in the string
if (a.find()) {
    String codeGroup = a.group(1);
    System.out.format("'%s'\n", codeGroup);
}

However, I am unable to get the content within the tags, which I suspect is due to the newlines that may appear within the paragraphs. I am using a regular expression rather than an XML parser because, depending on the environment, I may have to use special delimiters such as |question| |/question| or [[question]] [[/question]] instead of tags.
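One way to confirm the suspicion about newlines is to run the same pattern against a single-line and a multi-line version of the input. This is a minimal sketch; the class name NewlineCheck and the sample strings are made up for illustration.

```java
import java.util.regex.Pattern;

public class NewlineCheck {

    // Returns true if the question's original pattern finds a match in the input.
    static boolean originalPatternFinds(String input) {
        Pattern p = Pattern.compile("<question1>(.*)</question1>");
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        String oneLine = "<question1>Paragraph 1... Paragraph 2...</question1>";
        String multiLine = "<question1>\n   Paragraph 1...\n\n   Paragraph 2...\n</question1>";

        // By default '.' matches any character except line terminators,
        // so (.*) cannot cross the '\n' characters in multiLine.
        System.out.println(originalPatternFinds(oneLine));   // true
        System.out.println(originalPatternFinds(multiLine)); // false
    }
}
```

If the second call prints false, the newlines are indeed what stops the match.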

2 Answers

Compile the pattern with the Pattern.DOTALL flag so that . also matches line terminators:

Pattern p = Pattern.compile("<question1>(.*)</question1>", Pattern.DOTALL);
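A fuller sketch of the DOTALL approach, assuming the input may contain several question1 blocks. It swaps the greedy (.*) for the reluctant (.*?) so each match stops at the nearest closing tag; the class name DotallExample and the extract helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotallExample {

    // DOTALL lets '.' match line terminators too; the reluctant (.*?) stops
    // at the first closing tag instead of running on to the last one.
    private static final Pattern QUESTION1 =
            Pattern.compile("<question1>(.*?)</question1>", Pattern.DOTALL);

    // Collects the content of every <question1>...</question1> block.
    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = QUESTION1.matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String stringToSearch = "Some text without tags here...\n"
                + "<question1>\n   Paragraph 1...\n\n   Paragraph 2...\n</question1>\n"
                + "Some text without tags here...\n";
        for (String content : extract(stringToSearch)) {
            System.out.format("'%s'%n", content.trim());
        }
    }
}
```

With the greedy (.*) and DOTALL, an input containing two question1 blocks would yield a single match running from the first opening tag to the last closing tag. The embedded flag (?s) at the start of the regex is equivalent to passing Pattern.DOTALL.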
gagan singh

Your regex only matches the question1 tag, not all of the question tags (question1, question2, ...). Try something like the following:

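String stringToSearch = "Some text without tags here..."
        + "<question1>"
        + "   Paragraph 1..."
        + "   Paragraph 2..."
        + "</question1>"
        + "     Some text without tags here..."
        + "<question2>"
        + "   Paragraph 1..."
        + "   Paragraph 2..."
        + "</question2>"
        + "Some text without tags here...";

// group(1) = tag name, group(2) = optional attributes, group(3) = tag content
// \\1 is a back reference, so the closing tag must match the opening one
Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");

Matcher matcher = pattern.matcher(stringToSearch);

while (matcher.find()) {
    // Print each tag name followed by its content
    System.out.println(matcher.group(1) + ": " + matcher.group(3));
}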
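The string in this answer is built by plain concatenation, so it contains no newlines and the pattern works without DOTALL. A minimal sketch of the same back-reference idea on multi-line input, assuming the tags are always named question followed by digits and carry no attributes (so the ( +.+)* group is dropped); the class name NumberedTags and its extract helper are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberedTags {

    // (question\d+) captures the tag name; \1 requires the matching closing tag.
    // DOTALL is needed once the paragraphs contain newlines.
    private static final Pattern TAG =
            Pattern.compile("<(question\\d+)>(.*?)</\\1>", Pattern.DOTALL);

    // Maps each tag name (question1, question2, ...) to its trimmed content.
    // Assumes each tag name appears once; a repeated name overwrites the earlier entry.
    static Map<String, String> extract(String input) {
        Map<String, String> results = new LinkedHashMap<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            results.put(m.group(1), m.group(2).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String stringToSearch = "Some text without tags here...\n"
                + "<question1>\n   Paragraph 1...\n\n   Paragraph 2...\n</question1>\n"
                + "     Some text without tags here...\n"
                + "<question2>\n   Paragraph 1...\n\n   Paragraph 2...\n</question2>\n"
                + "     Some text without tags here...\n";
        extract(stringToSearch).forEach((tag, content) ->
                System.out.println(tag + ": " + content));
    }
}
```

Restricting the name to question\d+ also keeps unrelated tags in the surrounding text from being picked up.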

Alternatively, you can use a parser library such as jTopas or jsoup to make this much easier.
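If a parser is not an option because of the |question| or [[question]] delimiters mentioned in the question, the pattern can instead be built from literal open and close markers. This is a sketch under that assumption; extractBetween and DelimitedBlocks are hypothetical names. Pattern.quote escapes the markers so that characters like | and [ are treated literally rather than as regex syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimitedBlocks {

    // Hypothetical helper: returns the trimmed text between each open/close pair.
    // Pattern.quote escapes regex metacharacters such as '|' and '['.
    static List<String> extractBetween(String input, String open, String close) {
        Pattern p = Pattern.compile(
                Pattern.quote(open) + "(.*?)" + Pattern.quote(close), Pattern.DOTALL);
        List<String> results = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            results.add(m.group(1).trim());
        }
        return results;
    }

    public static void main(String[] args) {
        String pipes = "text |question|\n Paragraph 1...\n|/question| text";
        String brackets = "text [[question]]\n Paragraph 1...\n[[/question]] text";
        String tags = "text <question1>\n Paragraph 1...\n</question1> text";

        // The same helper handles all three delimiter styles.
        System.out.println(extractBetween(pipes, "|question|", "|/question|"));
        System.out.println(extractBetween(brackets, "[[question]]", "[[/question]]"));
        System.out.println(extractBetween(tags, "<question1>", "</question1>"));
    }
}
```

Compiling the pattern on every call keeps the sketch short; if the same delimiters are used repeatedly, the compiled Pattern can be cached.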

Madushan Perera