I'm trying to split a PDF file into separate HTML files, i.e. one HTML file for each PDF page. This is how I do it:

pdf2htmlEX --split-pages 1 LMS.pdf --page-filename lms%03.html

As a result I got an empty LMS.html and the page files lms%031.html, lms%032.html, and so on. The problem is that those HTML files are not formatted correctly: there is no CSS styling.


1 Answer

Funny thing about that... I stumbled across your question while trying to solve an identical problem. I used the same command as yours, except without setting the --page-filename parameter. Using your example, my pdf2htmlEX call would be analogous to:

pdf2htmlEX --split-pages 1 LMS.pdf 

Then I opened up the main HTML file in Chrome, only to find a bunch of blank pages. After searching around a bit, I opened the same file in Firefox, and it worked. Very strange, with no errors reported in the console output. Of course, I didn't even think to look at the Chrome console output; when I did, I found:

Uncaught NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'file:///...'.

Thank God for Stack Overflow. I don't know why it works in Firefox, but if you're getting the error Chrome reports above, you need to be running a web server.

The easiest and fastest way for me to do this was to change into the directory in which I converted the PDF and run:

python -m SimpleHTTPServer

By default, your page will be served up at http://localhost:8000. Problem solved. Use whatever server suits you best.
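SimpleHTTPServer is the Python 2 module name; if you only have Python 3, the equivalent built-in server is http.server (assuming your Python 3 interpreter is invoked as python3):

python3 -m http.server

It likewise serves the current directory on port 8000 by default.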

This is due to a difference in implementation of the Same Origin Policy. For downloaded HTML files, Chrome doesn't allow any external file access, while Firefox allows access within that directory. The Same Origin Policy through a web server goes by domain name, which is much more sensible. – 700 Software Jun 15 '15 at 18:14
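If you only need a quick local preview rather than a proper server, Chrome can also be launched with its --allow-file-access-from-files switch, which relaxes that restriction for file:// URLs; the binary name below is the usual Linux one and may differ on your platform, and the switch only takes effect when no other Chrome instance is already running:

google-chrome --allow-file-access-from-files LMS.html

Running a small local web server as described above is still the cleaner option.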