I'm using pdf2htmlEX to convert a PDF to HTML. The output displays correctly when it's generated locally on a Mac, but not when it's generated in production on Amazon Linux. Multiple pages have this issue, but I'll use page 22 of this PDF as a specific example.

For the incorrect HTML output (generated on Linux):

  1. Certain text is not visible when the page is rendered in the browser, although the correct text is present in the underlying HTML when I inspect it with Chrome DevTools.
  2. The text is hidden because the element's CSS visibility property (set through the class ff13) is hidden; in the correct conversion it is visible.
  3. In DevTools, the rendered fonts listed under the Computed styles tab are DejaVu Sans for the correct output and Helvetica for the incorrect output.
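One way to find every affected font class at once, rather than inspecting elements one by one, is to scan the generated CSS for hidden font rules. This is a minimal sketch, not part of pdf2htmlEX: the function name `hidden_font_classes` is hypothetical, and the rule shape it looks for (a class such as `.ff13` whose body contains `visibility:hidden`) is an assumption based on the symptoms described above.

```python
import re

# Assumed rule shape, based on the ff13 class described above, e.g.:
#   .ff13{font-family:sans-serif;visibility:hidden;}
_FONT_RULE = re.compile(r"\.(ff[0-9a-f]+)\s*\{([^}]*)\}")
_HIDDEN = re.compile(r"visibility\s*:\s*hidden")


def hidden_font_classes(css):
    """Return the names of ffN classes whose rule body hides the text."""
    return [name for name, body in _FONT_RULE.findall(css) if _HIDDEN.search(body)]


if __name__ == "__main__":
    # Hypothetical CSS fragment for illustration only.
    sample = (
        ".ff12{font-family:ff12;visibility:visible;}"
        ".ff13{font-family:sans-serif;visibility:hidden;}"
    )
    print(hidden_font_classes(sample))
```

Running this against the CSS from both the Mac and the Linux conversions should show which font classes differ between the two.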

I confirmed that DejaVuSans.ttf (and other DejaVu fonts) is installed on the Linux machine at /usr/share/fonts/dejavu/, so my best guess is that pdf2htmlEX can't find the font file during the conversion and therefore sets the CSS visibility property to hidden. I also tried installing the core Mac (source here) and Microsoft fonts, rebooting the machine, and running the conversion again, but it didn't help.

Does anyone know either how to fix this or troubleshoot from here? Thanks in advance for any help!

JustCodin

1 Answer

You need to ensure that font files for all unembedded PDF fonts are in the fontconfig path. You can see the path list in the fontconfig config file (usually /etc/fonts/fonts.conf). Look at the top of this file for the list of directories. If your font file is not in one of these, it will not be found.

In your case I would move the font files into /usr/share/fonts rather than leaving them in a subdirectory.
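The directory check described above can be scripted. The following is a rough sketch, assuming the config file lists its font directories as `<dir>` elements; the helper names `font_dirs` and `is_covered` are hypothetical. It also assumes that fontconfig scans subdirectories of each listed directory, which is worth verifying on your system.

```python
import os
import xml.etree.ElementTree as ET


def font_dirs(conf_xml):
    """Return the directories named by <dir> elements in a fontconfig file."""
    root = ET.fromstring(conf_xml)
    dirs = []
    for elem in root.iter("dir"):
        text = (elem.text or "").strip()
        if text:
            # Entries such as ~/.fonts are expanded to the user's home.
            dirs.append(os.path.expanduser(text))
    return dirs


def is_covered(font_path, dirs):
    """True if font_path sits in, or anywhere below, one of the directories.

    Assumption: subdirectories of a listed directory are scanned too.
    """
    font_path = os.path.abspath(font_path)
    for d in dirs:
        d = os.path.abspath(d)
        if os.path.commonpath([font_path, d]) == d:
            return True
    return False


if __name__ == "__main__":
    # Illustrative config fragment, not copied from a real machine.
    sample = "<fontconfig><dir>/usr/share/fonts</dir></fontconfig>"
    dirs = font_dirs(sample)
    print(dirs)
    print(is_covered("/usr/share/fonts/dejavu/DejaVuSans.ttf", dirs))
```

To check a real machine, pass the contents of /etc/fonts/fonts.conf to `font_dirs` instead of the sample string.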

David Hedley
  • I checked `/etc/fonts/fonts.conf` and I see the directory `/usr/share/fonts` listed there. I copied the font files from `/usr/share/fonts/dejavu/` to `/usr/share/fonts/` and rebooted, but it didn't seem to work. When I run `sudo fc-cache -f -v`, I see `/usr/share/fonts/dejavu/`, and when I run `fc-list` I see various `DejaVu Sans` entries. – JustCodin Mar 09 '20 at 01:22
  • I'm assuming on page 22 the missing text is one of the two unembedded fonts on that page: ArialMT or Arial-BoldMT. What does it say if you do `fc-match ArialMT`? – David Hedley Mar 25 '20 at 07:23
  • When I execute `fc-match ArialMT`, it returns `DejaVuSans.ttf: "DejaVu Sans" "Book"`. Thanks for your continued help. I sincerely appreciate it as this is important for what I'm working on. – JustCodin Mar 27 '20 at 02:03
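The `fc-match` check from the comments can be repeated for every unembedded font name in one go. This is a minimal sketch, assuming the output always has the shape quoted above (`DejaVuSans.ttf: "DejaVu Sans" "Book"`); the helper names `parse_fc_match` and `resolve` are hypothetical.

```python
import re
import shutil
import subprocess

# Matches lines such as: DejaVuSans.ttf: "DejaVu Sans" "Book"
_FC_MATCH_LINE = re.compile(
    r'^(?P<file>[^:]+):\s*"(?P<family>[^"]*)"\s*"(?P<style>[^"]*)"'
)


def parse_fc_match(line):
    """Split one line of fc-match output into (file, family, style)."""
    m = _FC_MATCH_LINE.match(line.strip())
    if m is None:
        return None
    return m.group("file"), m.group("family"), m.group("style")


def resolve(font_name):
    """Run `fc-match font_name`; returns None if fc-match is unavailable."""
    if shutil.which("fc-match") is None:
        return None
    result = subprocess.run(
        ["fc-match", font_name], capture_output=True, text=True
    )
    return parse_fc_match(result.stdout)


if __name__ == "__main__":
    # The two unembedded font names mentioned in the comments above.
    for name in ("ArialMT", "Arial-BoldMT"):
        print(name, "->", resolve(name))
```

Comparing the results on the Mac and on the Linux machine shows which substitute font each system picks for the same PDF font name.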