Questions tagged [pdf2htmlex]

pdf2htmlEX renders PDF files in HTML, utilizing modern Web technologies. It aims to provide an accurate rendering while remaining optimized for Web display.

pdf2htmlEX is best for text-based PDF files, for example scientific papers with complicated formulas and figures. Text, fonts and formats are natively preserved in HTML such that you can still search and copy. Math formulas, figures and images are also supported. The generated HTML file is static, with optional features powered by JavaScript.

pdf2htmlEX is also a publishing tool: almost 50 options make it flexible for many different use cases, such as PDF preview, book/magazine publishing, and personal resumes.
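As a rough illustration of how a few of those options combine, here is a minimal Python sketch that only assembles a command line and does not run the tool. The flags `--dest-dir`, `--split-pages`, and `--page-filename` are the ones that appear in the questions listed on this page; the helper name `build_command`, the file names, and the `%d` page-number placeholder are assumptions for illustration and should be checked against the pdf2htmlEX documentation for your version.

```python
import shlex


def build_command(pdf_path, dest_dir=None, split_pages=False, page_filename=None):
    """Assemble an argument list for a pdf2htmlEX call (hypothetical helper).

    Nothing is executed here; the list can be handed to subprocess.run later.
    """
    cmd = ["pdf2htmlEX"]
    if dest_dir is not None:
        # Directory that receives the generated files.
        cmd += ["--dest-dir", dest_dir]
    if split_pages:
        # Ask for one output file per PDF page.
        cmd += ["--split-pages", "1"]
        if page_filename is not None:
            # Assumed: the pattern carries a printf-style page-number placeholder.
            cmd += ["--page-filename", page_filename]
    cmd.append(pdf_path)
    return cmd


if __name__ == "__main__":
    args = build_command(
        "paper.pdf",
        dest_dir="out",
        split_pages=True,
        page_filename="page-%d.html",
    )
    # Print a shell-quoted version for inspection.
    print(shlex.join(args))
```

Keeping the arguments in a list rather than a single string avoids shell-quoting problems when the list is later passed to `subprocess.run`.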

30 questions
1
vote
1 answer

split pdf to multiple html file with pdf2htmlEX

I'm trying to split a PDF file into separate HTML files. I mean for each PDF page I want an HTML file. This is how I do it: pdf2htmlEX --split-pages 1 LMS.pdf --page-filename lms%03.html In the result I got an empty LMS.html and other files:…
HamidIng
0
votes
0 answers

reading and publishing pdf with nodejs

I want to read and publish the pdf file using pdf2html with nodejs. I use the pdf2html library for this. I can see the pdf content in html with console.log. However, when I assign the html information I have seen to a variable called pdfContent and…
omerix
0
votes
0 answers

Background color of node in DOM

Could you please let me know how I can get the background color of an element/node in the DOM? I am able to get the output below: style="top:161.80327pt;left:29.21pt;line-height:7.4866333pt;font-family:Arial;font-size:7.0pt;width:48.82689pt;" using the code below…
0
votes
0 answers

Invalid font weight while converting pdf to html

I am using the pdf2html package for conversion of PDF to HTML. Things get converted but it always shows warning - Syntax Error - Invalid Font Weight. Due to that, we got distorted text in Html output. We are getting issue when we process this Html…
Gaurav Tomer
0
votes
1 answer

Convert PDF to HTML without losing any format

I'm developing a Python Flask webapp and I'm trying to convert some user uploaded pdfs to nicely formatted HTML, like the HTML that is being produced when you display a pdf inside an iframe. I tried several things so far: the pdfminer.six library,…
robo-monk
0
votes
1 answer

Pdf2htmlEX common error "Cannot load font"

Running the pdf2htmlEX.exe Windows binary from the command prompt works as expected. However, when running the pdf2htmlEX Windows binary in a wrapper (.NET in my case), I received an error like the one below. __tmp_font1.ttf is not in a known format (or…
Bernesto
0
votes
1 answer

Pdf2Html Installation

I'm trying to install pdf2htmlEX on Ubuntu Server 18.04.1 LTS. The repository is not maintained, but the software is very useful for me. I installed it on a Xubuntu desktop distro and on a Docker image, but I can't do it on Ubuntu Server. It…
0
votes
2 answers

Install pdf2htmlEX on heroku

I used this Aptfile: fonts-liberation libreoffice-base-core libreoffice-calc libreoffice-writer libreoffice libpython2.7 pdf2htmlex poppler-utils And installation completed successfully. I even checked version of pdf2htmlEX in heroku…
0
votes
0 answers

running Pdf2htmlEX on linux using php

I kindly request your help with the following issue: I am using pdf2htmlEX to convert my PDF files to HTML. The tool works perfectly in WAMP; however, when I use it on my Linux server, it does not work. My PHP code:
0
votes
0 answers

pdfminer when I am trying to run pdf2txt.py not working in windows

I have installed pdfminer, but when I run pdf2txt.py test.pdf -t html -o test.html on Windows, no error is shown and the command does not appear to execute. How can I convert true PDF files to HTML? Thanks.
0
votes
1 answer

pdf2htmlEX's output shows Times New Roman font for only a few characters?

I have never seen anything like this. I use a tool called pdf2htmlEX, which converts a PDF to HTML, but I have a weird issue. Look at this screenshot: See the first character (W)? It's in Times New Roman. Now here's the even more weird part: Only…
MortenMoulder
0
votes
1 answer

Pdf2htmlEx: The html size converted by pdf is very large?

I convert PDF to HTML with pdf2htmlEX. The source PDF is 21 MB, but the converted HTML is nearly 900 MB. Conversion command: pdf2htmlEX --no-drm 0 --embed-image 1 --dest-dir ./output09 ./b.pdf ./b.html Is there any way to reduce the size of the output HTML?
charisMao
0
votes
1 answer

Font misalignment during pdf to html conversion using pdf2htmlEx tool

Font issues with PDF to HTML conversion: all "ti", "fi", "tt" characters are missing, and there is a font overlapping issue. Note: I don't get this issue with Firefox. I get the above issues in Chrome and Safari. I am…
Tom Taylor
0
votes
2 answers

Getting text location from pdf

I want to know the location of all the words in the pdf page. I have been trying to find something on the web but couldn't. Can anyone help me which library (preferably in java platform) should I use?
Prabhjot Rai
0
votes
1 answer

cmake complaints about lack of support of C++0x of the compiler despite the latest version of clang is installed

I am trying to use cmake to build pdf2htmlEX This is the error message: CMake Error at CMakeLists.txt:108 (message): Error: your compiler does not support C++0x, please update it Here is the version number of the clang compiler $ which…
Anthony Kong