I am converting a PDF to HTML with pdf2htmlEX.

The source PDF is 21 MB, but the converted HTML is nearly 900 MB. The conversion command is:

pdf2htmlEX --no-drm 0 --embed-image 1 --dest-dir ./output09 ./b.pdf ./b.html

Is there any way to reduce the size of the output HTML?

charisMao

1 Answer

I resolved it with the following command:

pdf2htmlEX --embed-image 1 --embed-css 0 --embed-font 1 --embed-javascript 0 --embed-outline 0 --no-drm 0 --dest-dir ./output0928 ./a.pdf ./a.html

The meaning of the parameters is as follows:

--embed-css <int>             embed CSS files into output (default: 1)   
--embed-font <int>            embed font files into output (default: 1)  
--embed-image <int>           embed image files into output (default: 1)  
--embed-javascript <int>      embed JavaScript files into output (default: 1) 
--embed-outline <int>         embed outlines into output (default: 1)   
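
To see which files account for most of the size once resources are written out separately, something like the sketch below may help. It assumes GNU find (for -printf) and GNU sort; largest_files is a made-up helper name, not part of pdf2htmlEX.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdf2htmlEX): print the N largest files
# under a directory as "size-in-bytes<TAB>path", biggest first.
# Assumes GNU find, which provides -printf.
largest_files() {
    local dir="$1"
    local count="${2:-10}"
    find "$dir" -type f -printf '%s\t%p\n' | sort -rn | head -n "$count"
}

# Example (commented out so that sourcing this file has no side effects):
# largest_files ./output0928 5
```

Running it on the --dest-dir directory shows whether the HTML itself, the fonts, or the images dominate.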
charisMao
  • You can further reduce the output size by patching FontForge so that it stops writing timestamps into the generated font files, and then post-processing the files to remove duplicates, which will include fonts and background images. – David Hedley Sep 14 '17 at 08:43
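
The duplicate-finding half of that suggestion could be sketched roughly as follows. It assumes the fonts and images were written as separate files (for example with --embed-font 0 and --embed-image 0), and that GNU coreutils (sha256sum, uniq -w) is available. list_duplicates is a made-up helper name. It only reports byte-identical files, so fonts that differ only in an embedded timestamp will not match until that timestamp is removed; it does not delete anything or rewrite references in the HTML/CSS.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of pdf2htmlEX): list groups of files under a
# directory whose contents are byte-identical.
# Assumes GNU coreutils: sha256sum, and uniq with -w / --all-repeated.
list_duplicates() {
    local dir="$1"
    # Hash every file, sort so equal hashes are adjacent, then keep only
    # lines whose first 64 characters (the SHA-256 hex digest) repeat.
    # Groups are separated by a blank line.
    find "$dir" -type f -exec sha256sum {} + \
        | sort \
        | uniq -w 64 --all-repeated=separate
}

# Example (commented out so that sourcing this file has no side effects):
# list_duplicates ./output0928
```

Each group in the output is a set of files that could in principle be collapsed into one, provided the references to them are updated to match.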