0

Please help me split a large HTML file into multiple HTML files using Java. The algorithm is tricky, and I have only got so far on my own.

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<link href="template.css" rel="stylesheet" type="text/css"/>
<link href="page-template.xpgt" rel="stylesheet" type="application/vnd.adobe-page-template+xml"/>
</head>
<body>
<div class="story">
<p class="cn">2</p>
<p class="img"><img src="images/common.jpg" alt=""/></p>
<p class="ct"> some text!</p>
<p class="tx"><span class="dropcap"> some text</span> some text!</p>
<p class="tx"> some text!</p>
<p class="img"><img src="images/ch02-fig1.jpg" alt=""/></p>
<p class="tx"> some text some text   some text  some text.</p>
<p class="img"><img src="images/ch02-fig2.jpg" alt=""/></p>
<p class="tx"> some text some text  some text  some text </p>
<p class="tx"> some text  some text  some text </p>
<p class="tx"> some text  some text  some text some text.</p>
<p class="img"><img src="images/ch02-fig3.jpg" alt=""/></p>
<p class="tx"> some text!</p>
<p class="tx">
</p>
</div>
</body>
</html>

This is my HTML file. It should be split according to the count of the "some text" entries.
januprasad

2 Answers

1

Your question is pretty vague :).

On splitting a String (HTML in this case): the easiest way is to read the HTML file as text into a String, then use the String.split() method to split the string around the desired regex. For example, .split("</div>") is a crude approach that breaks your HTML up at each closing div tag (supposing you even have divs in your HTML); note that the delimiter itself is dropped from the pieces. However, this works badly for nested divs.
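As a minimal sketch of the String.split() idea, the snippet below splits before every opening `<p class="tx">` tag rather than on `/div`. The marker tag is an assumption taken from the sample file in the question, and `SplitSketch` and `splitBeforeTx` are hypothetical names. A zero-width lookahead is used so the tag stays in each piece instead of being consumed as the delimiter.

```java
import java.util.Arrays;
import java.util.List;

class SplitSketch {
    // Splits the input before every opening <p class="tx"> tag.
    // The lookahead (?=...) matches a position, not text, so nothing is removed.
    static List<String> splitBeforeTx(String html) {
        return Arrays.asList(html.split("(?=<p class=\"tx\">)"));
    }

    public static void main(String[] args) {
        String body = "<p class=\"ct\">title</p>"
                + "<p class=\"tx\">one</p>"
                + "<p class=\"tx\">two</p>";
        for (String piece : splitBeforeTx(body)) {
            System.out.println(piece);
        }
    }
}
```

This is still a regex over raw markup, so it shares the weakness mentioned above: it knows nothing about nesting and only works while the marker tag is written exactly as in the pattern.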

On reading/writing files: see "Reading a plain text file in Java". You will also find plenty of HTML parsers on the net that will most likely work ten times better in your case.

Don Kartacs
1
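You can use the following logic:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// sometext_pattern is a placeholder for the regex that marks the text to split on.
// Compile it once, outside the loop, instead of once per line.
Pattern splitPattern = Pattern.compile(sometext_pattern);

// Read the whole HTML file as a list of lines.
List<String> lines = Files.readAllLines(Paths.get("yourhtmlfile"), StandardCharsets.UTF_8);
for (String htmlData : lines)
{
    Matcher match = splitPattern.matcher(htmlData);

    while (match.find())
    {
        // The text matched by the pattern on this line.
        String lineToBeSplit = match.group();
    }

    .
    .

}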

"lineToBeSplit" will have the split data.
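One way to turn that line-by-line matching into separate files is to group the body lines into chunks and start a new chunk after a fixed number of marker lines. The sketch below assumes the marker is an opening `<p class="tx">` tag, as in the sample file; `HtmlChunker`, `chunk`, `wrap`, and `maxTx` are hypothetical names, and the header and footer strings are supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class HtmlChunker {
    // Assumed split marker: a line containing an opening <p class="tx"> tag.
    private static final Pattern TX = Pattern.compile("<p class=\"tx\">");

    // Groups body lines into chunks. A new chunk starts at a marker line
    // once the current chunk already holds maxTx marker lines.
    static List<List<String>> chunk(List<String> bodyLines, int maxTx) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        int txCount = 0;
        for (String line : bodyLines) {
            boolean isTx = TX.matcher(line).find();
            if (isTx && txCount == maxTx) {
                chunks.add(current);
                current = new ArrayList<>();
                txCount = 0;
            }
            current.add(line);
            if (isTx) {
                txCount++;
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    // Wraps one chunk in the shared document header and footer.
    static String wrap(String header, List<String> chunk, String footer) {
        return header + "\n" + String.join("\n", chunk) + "\n" + footer;
    }
}
```

Here the header would be everything up to and including `<div class="story">` and the footer the closing `</div></body></html>` lines, so each output is a complete document. Each wrapped string can then be written to its own file, for example with `Files.write`. Non-marker lines such as images stay with the chunk they follow.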