
I am trying to use a regular expression in Java to extract the content between the tags in the text below. I want the paragraphs inside each pair of tags, but I can't get them out.

   Some text without tags here...
   <question1>
   Paragraph 1...

   Paragraph 2...

   </question1>
     Some text without tags here...
   <question2>

   Paragraph 1...

   Paragraph 2...
   </question2>
     Some text without tags here...

The tags and content above are stored in a string variable, stringToSearch. Here is my code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("<question1>(.*)</question1>");
Matcher a = p.matcher(stringToSearch);
System.out.print("\n Matching pattern...");
// Search for the pattern in the string
if (a.find()) {
    String codeGroup = a.group(1);
    System.out.format("'%s'\n", codeGroup);
}

However, I am unable to get the content between the tags, which I suspect is because of the line breaks inside the paragraphs. I am using a regular expression rather than an XML parser because in some environments I may have to use other delimiters, such as |question| |/question| or [[question]] [[/question]].
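Because the delimiters may change between <question1>, |question| and [[question]], one option is to build the pattern from the literal delimiter strings. The sketch below assumes a hypothetical helper, extractBetween (not from the question), that wraps each delimiter in Pattern.quote so characters such as | and [ are matched literally, uses the DOTALL flag so . also matches line breaks, and uses a reluctant (.*?) so each match stops at the nearest closing delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DelimitedExtractor {

    // Hypothetical helper: returns the text between every occurrence of
    // `open` and the next following `close`, line breaks included.
    static List<String> extractBetween(String text, String open, String close) {
        // Pattern.quote makes the delimiters literal, so '|' or '[' need no
        // manual escaping; DOTALL lets '.' match '\n' as well.
        Pattern p = Pattern.compile(
                Pattern.quote(open) + "(.*?)" + Pattern.quote(close),
                Pattern.DOTALL);
        Matcher m = p.matcher(text);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            results.add(m.group(1)); // body between this pair of delimiters
        }
        return results;
    }

    public static void main(String[] args) {
        String s = "intro\n[[question]]\nParagraph 1...\n\nParagraph 2...\n[[/question]]\noutro";
        System.out.println(extractBetween(s, "[[question]]", "[[/question]]"));

        String t = "x |question|A\nB|/question| y";
        System.out.println(extractBetween(t, "|question|", "|/question|"));
    }
}
```

The same helper covers the XML-style case by passing "<question1>" and "</question1>" as the delimiters.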

2 Answers

Pattern p = Pattern.compile("<question1>(.*)</question1>",Pattern.DOTALL);
gagan singh
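Why the DOTALL flag in the answer above helps: by default . does not match line terminators, so (.*) cannot span the paragraphs. The sketch below (class and method names are hypothetical) runs the question's pattern on a multi-line string with and without the flag. It swaps the greedy (.*) for a reluctant (.*?), which behaves the same when <question1> appears once but stops at the first </question1> if the tag occurs more than once.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DotAllDemo {

    // Returns the body of the first <question1>...</question1> block, or null.
    static String firstQuestion1(String stringToSearch, boolean dotAll) {
        int flags = dotAll ? Pattern.DOTALL : 0;
        Pattern p = Pattern.compile("<question1>(.*?)</question1>", flags);
        Matcher a = p.matcher(stringToSearch);
        return a.find() ? a.group(1) : null;
    }

    public static void main(String[] args) {
        String stringToSearch = "Some text without tags here...\n"
                + "<question1>\nParagraph 1...\n\nParagraph 2...\n\n</question1>\n"
                + "Some text without tags here...\n";

        // Without DOTALL, '.' stops at '\n', so the multi-line body never matches.
        System.out.println(firstQuestion1(stringToSearch, false)); // prints "null"

        // With DOTALL, the body (including its line breaks) is captured.
        System.out.println(firstQuestion1(stringToSearch, true));
    }
}
```

An equivalent alternative is to put the inline flag (?s) at the start of the pattern string instead of passing Pattern.DOTALL.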

Your regex only matches <question1>, not every numbered question tag. Try something like the following:

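import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Note: the concatenated pieces contain no line breaks, so this test string is a single line.
String stringToSearch = "Some text without tags here..."
        + "<question1>"
        + "   Paragraph 1..."
        + "   Paragraph 2..."
        + "</question1>"
        + "     Some text without tags here..."
        + "<question2>"
        + "   Paragraph 1..."
        + "   Paragraph 2..."
        + "</question2>"
        + "Some text without tags here...";

// Group 1 is the tag name, group 2 an optional run of attributes, group 3 the body;
// the backreference \1 requires the closing tag to repeat the opening tag's name.
Pattern pattern = Pattern.compile("<(\\w+)( +.+)*>((.*))</\\1>");

Matcher matcher = pattern.matcher(stringToSearch);

while (matcher.find()) {
    System.out.println(matcher.group(3));
}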

Alternatively, you can use a parsing library such as jTopas or jsoup to make this much easier.

Madushan Perera
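Combining the ideas from both answers, a backreference to match any numbered tag plus DOTALL to cross line breaks, gives a version that works on the multi-line input from the question. This is a sketch under assumptions: the tag name is restricted to question followed by digits, the ( +.+)* attribute group is dropped because the sample tags have no attributes, the body uses a reluctant (.*?), and the helper name extract is hypothetical. Tags are assumed not to repeat, since a repeated name would overwrite the earlier entry in the map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuestionTags {

    // <(question\d+)> captures the tag name; </\1> requires the same name to close it.
    // (.*?) is reluctant so each match stops at the nearest matching close tag,
    // and DOTALL lets it run across line breaks inside the paragraphs.
    private static final Pattern TAG =
            Pattern.compile("<(question\\d+)>(.*?)</\\1>", Pattern.DOTALL);

    // Hypothetical helper: maps each tag name to its trimmed body, in document order.
    static Map<String, String> extract(String text) {
        Map<String, String> bodies = new LinkedHashMap<>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            bodies.put(m.group(1), m.group(2).trim());
        }
        return bodies;
    }

    public static void main(String[] args) {
        String stringToSearch = "Some text without tags here...\n"
                + "<question1>\nParagraph 1...\n\nParagraph 2...\n\n</question1>\n"
                + "  Some text without tags here...\n"
                + "<question2>\n\nParagraph 1...\n\nParagraph 2...\n</question2>\n"
                + "  Some text without tags here...\n";

        extract(stringToSearch).forEach((tag, body) ->
                System.out.println(tag + ":\n" + body + "\n"));
    }
}
```

For the |question| or [[question]] variants, the literal < and > in the pattern would need to be replaced with the escaped delimiter characters.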