
FONT ISSUES WITH PDF TO HTML CONVERSION

  1. All "ti", "fi" and "tt" character pairs are missing

SAMPLE SCREENSHOT

  2. Font overlapping issue

SAMPLE SCREENSHOT

  • NOTE: I don't get these issues in Firefox. I only get them in Chrome and Safari.

I AM USING

  • pdf2htmlEX version 0.13.6
  • The following command to convert PDF to HTML:

pdf2htmlEX --split-pages 1 --zoom 3 --fit-width 920 --correct-text-visibility 1 --dest-dir "$1" "$2" 2>&1

TRIED

Using the --fallback 1 option solves all of the above problems, but:

  1. It reduces the clarity of the document.
  2. Tables on the page disappear and are replaced with empty space.

DOUBTS

  1. Could you please explain a bit more about the fallback option?

  2. I have tried the fallback approach above. If you know a better way to solve these font problems, please suggest it.

Again, I only get these issues in Chrome and Safari; in Firefox the output renders fine.

Tom Taylor

ANSWER


The above issue occurs only in WebKit-based browsers such as Chrome and Safari, which apply ligatures to this text, whereas Firefox does not.

A ligature is a combination of two or more letters joined into a single glyph, for example "fi" drawn as one "ﬁ" glyph.

Root cause

The missing characters are caused by the ligature support in these modern browsers. Here is how:

1. During conversion, the tool uses poppler to render the text and converts the characters into glyphs in an embedded font. When these browsers come across character pairs such as "tt", "tf", "ti", "ff" or "fi", they treat them as ligatures and look for a single glyph for "tt" instead of two separate "t" glyphs.

2. Since the embedded font has no corresponding ligature glyphs, the browser skips those characters and renders the rest, which is why the characters appear to be missing.

Could be solved by

Disabling ligatures in these browsers by embedding CSS in the generated content.
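Embedding that CSS can be automated as a post-processing step after each conversion. Below is a minimal sketch in POSIX sh, assuming awk is available and that each generated .html file contains a literal `</head>` tag; the script name `disable-ligatures.sh` and the function `disable_ligatures` are hypothetical and not part of pdf2htmlEX. It inserts a `<style>` block that switches off standard and contextual ligatures with `font-variant-ligatures: none` and `font-feature-settings: "liga" 0, "clig" 0`, plus `-webkit-` prefixed copies for older WebKit builds.

```sh
#!/bin/sh
# Hypothetical post-processing script (disable-ligatures.sh) for pdf2htmlEX output.
# Assumption: every generated .html file contains a literal "</head>" tag.
set -eu

# Style block that turns off standard ("liga") and contextual ("clig") ligatures.
# The -webkit- prefixed properties target older Chrome/Safari builds.
LIGATURE_CSS='<style>*{font-variant-ligatures:none;-webkit-font-variant-ligatures:none;font-feature-settings:"liga" 0,"clig" 0;-webkit-font-feature-settings:"liga" 0,"clig" 0;}</style>'

# disable_ligatures FILE: insert the style block just before the first </head>.
disable_ligatures() {
    file=$1
    tmp="$file.tmp.$$"
    awk -v css="$LIGATURE_CSS" '
        !done && index($0, "</head>") {
            i = index($0, "</head>")
            $0 = substr($0, 1, i - 1) css substr($0, i)
            done = 1
        }
        { print }
    ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Usage: sh disable-ligatures.sh DEST_DIR (the directory passed to --dest-dir).
if [ "$#" -ge 1 ]; then
    for f in "$1"/*.html; do
        [ -f "$f" ] || continue   # skip when the glob matches nothing
        disable_ligatures "$f"
    done
fi
```

For example, it could be run right after the conversion command with the same destination directory: `sh disable-ligatures.sh "$1"`. Running it twice on the same file inserts the style block twice, which is harmless but redundant.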

For more details, please refer to:

Please correct me if I am wrong.

Tom Taylor