I'm trying to split a PDF file into separate HTML files, i.e. one HTML file for each PDF page. This is how I do it:

pdf2htmlEX --split-pages 1 LMS.pdf --page-filename lms%03.html

As a result I got an empty LMS.html and the page files lms%031.html, lms%032.html, and so on. The problem is that those HTML files are not formatted correctly: there is no CSS styling.


1 Answer

Funny thing about that... I stumbled across your question while trying to solve an identical problem. I used the same command as yours, except without setting the --page-filename parameter. Using your example, my pdf2htmlEX call would be analogous to:

pdf2htmlEX --split-pages 1 LMS.pdf 

Then I opened up the main HTML file in Chrome, only to find a bunch of blank pages. After searching around a bit, I opened the same file in Firefox, and it worked. Very strange, with no errors reported in the console output. Of course, I didn't even think to look at the Chrome console output; when I did, I found:

Uncaught NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'file:///...'.

Thank God for Stack Overflow. I don't know why it works in Firefox, but if you're getting the error Chrome reports above, you need to be running a web server.

The easiest and fastest way for me to do this was to change into the directory in which I converted the PDF and run:

python -m SimpleHTTPServer

By default, your page will be served up at http://localhost:8000. Problem solved. Use whatever server suits you best.
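SimpleHTTPServer is the Python 2 module name; if you only have Python 3, the equivalent built-in server is http.server (assuming your Python 3 interpreter is invoked as python3):

python3 -m http.server

It likewise serves the current directory on port 8000 by default.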

This is due to a difference in implementation of the Same Origin Policy. For downloaded HTML files, Chrome doesn't allow any external file access, while Firefox allows access within that directory. The Same Origin Policy through a web server goes by domain name, which is much more sensible. – 700 Software Jun 15 '15 at 18:14
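If you only need a quick local preview rather than a proper server, Chrome can also be launched with its --allow-file-access-from-files switch, which relaxes that restriction for file:// URLs; the binary name below is the usual Linux one and may differ on your platform, and the switch only takes effect when no other Chrome instance is already running:

google-chrome --allow-file-access-from-files LMS.html

Running a small local web server as described above is still the cleaner option.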