I use KnpSnappyBundle 1.6.0 and wkhtmltopdf 0.12.5 to generate PDFs from HTML in PHP like so:

$html = $this->renderView(
    'pdf/template.html.twig',
    [ 'entity' => $entity, ]
);

return new PdfResponse(
    $snappy->getOutputFromHtml($html, ['encoding' => 'UTF-8', 'images' => true]),
    'file'.$entity->getUniqueNumber().'.pdf'
);

My issue: on my production server, when I refer to assets (images or CSS) that are hosted on the same server as my code, generating a PDF takes around 40-50 seconds. Even a single tiny image hosted on the same server makes it take 40 seconds. If I instead use much larger images hosted on another server, the PDF is generated instantly.

My server is not slow in serving assets or files in general. If I simply render out the HTML as a page it happens instantly (with or without the assets). When I locally (on my laptop) request assets from my production server to generate a PDF it also happens instantly.

The assets I require in the HTML that needs to be rendered to PDF all have absolute URLs, which is required for wkhtmltopdf to work. For example: <img src="https://www.example.com/images/logo.png"> The difficult thing is that everything works, just very slowly. No URL points to a non-existent asset that would cause a time-out.

I first thought it might have to do with wkhtmltopdf, so I tried different versions and different settings, but this did not change anything. I also tried to point to another domain on the same server, the problem remains. I tried not using the KnpSnappyBundle, but the problem also remains.

So my guess now is that it is a server issue (or a combination with wkhtmltopdf). I am running Nginx-1.16.1 and serve all content over SSL. I have OpenSSL 1.1.1d 10 Sep 2019 (Library: OpenSSL 1.1.1g 21 Apr 2020) installed and my OS is Ubuntu 18.04.3 LTS. Everything else works as expected on this server.

When I look in the Nginx access logs, I can see that a GET request is made from my own server's IP address when using assets from the same server. I cannot understand why this takes so long, though, and I have run out of ideas for what to try next. Any ideas are appreciated!

I'll add my Nginx config for my domain (in case it might help):

server {
    root /var/www/dev.example.com/public;
    index index.php index.html index.htm index.nginx-debian.html;

    server_name dev.example.com www.dev.example.com;

    location / {
        # try to serve file directly, fallback to index.php
        try_files $uri /index.php$is_args$args;
    }

    location ~ ^/index\.php(/|$) {
        fastcgi_pass unix:/var/run/php/php7.3-fpm.sock;
        fastcgi_split_path_info ^(.+\.php)(/.*)$;
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        fastcgi_param DOCUMENT_ROOT $realpath_root;
        internal;
    }

    location ~ \.(?:jpg|jpeg|gif|png|ico|woff2|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc|js|css)$ {
        gzip_static on;

        # Set rules only if the file actually exists.
        if (-f $request_filename) {
            expires max;
            access_log off;
            add_header Cache-Control "public";
        }
        try_files $uri /index.php$is_args$args;
    }

    error_log /var/log/nginx/dev_example_com_error.log;
    access_log /var/log/nginx/dev_example_com_access.log;

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/dev.example.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/dev.example.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

server {
    if ($host = dev.example.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    server_name dev.example.com www.dev.example.com;
    listen 80;
    return 404; # managed by Certbot
}

Update 5 Aug 2020: I tried wkhtmltopdf 0.12.6, but it gives me exactly the same problem. The "solution" that I posted as an answer to my question a few months ago is far from perfect, which is why I am looking for new suggestions. Any help is appreciated.

Dirk J. Faber
  • Very hard to say. Can you try without the Knp bundle? Try this one instead https://github.com/mikehaertl/phpwkhtmltopdf. Here are my blog notes on the package https://delboy1978uk.wordpress.com/2014/11/24/html-to-pdf-using-wkhtmltopdf/ – delboy1978uk Nov 04 '19 at 10:40
  • @delboy1978uk, thank you for this suggestion. I tried it with that bundle, but I have the exact same issue. It's good to know I can rule out the Knp Bundle though. – Dirk J. Faber Nov 04 '19 at 11:17
  • If the output is correct but just takes too long to generate, then I reckon the issue is either 1/ https - peer SSL verification or 2/ DNS host name lookup in CLI mode. To fix 1/ try switching ALL links from `https` to `http`. For 2/ check what server name PHP sees in CLI mode. Do not trust HTTP mode - that may use different php.ini / env variables. If needed, define /etc/hostname properly. – lubosdz Aug 05 '20 at 20:25
  • @lubosdz, `http` made no change. When I use PHP in the cli (with `php -a`) and run `echo gethostname();` it shows my name `server`, which is the same as in `/etc/hostname`. – Dirk J. Faber Aug 05 '20 at 21:09
  • @DirkJ.Faber 1. Since we still suspect issues with assets, have you tried generating PDF without including any assets? 2. Check the [response time from curl](https://stackoverflow.com/a/22625150/3209381) on one of your assets from the server itself. 3. Try generating directly from `wkhtmltopdf` CLI on your rendered HTML page if possible. – sykez Aug 12 '20 at 12:13
  • On a side note, I've had some issues with `wkhtmltopdf` as well before. IIRC, I was previously using `localhost` in my asset URLs which caused delays, but after changing to actual domain name with assets on CDN, this improved. However I still ditched it due to memory issues and switched to [Browsershot](https://github.com/spatie/browsershot/) ([Puppeteer](https://github.com/puppeteer/puppeteer) wrapper). It's not the answer to your question, but perhaps you can try it :) – sykez Aug 12 '20 at 12:17
  • You seem to imply it only takes a long time with locally hosted assets. Does this mean you tried generating a PDF with assets from another source (maybe https://google.com) and it went quickly? – dracstaxi Aug 12 '20 at 16:35
  • I had a lot of problems with wkhtmltopdf. I strongly suggest you try AthenaPDF: https://github.com/arachnys/athenapdf – Felippe Duarte Aug 12 '20 at 16:35
  • @dracstaxi, yes. with assets from another source it went quickly. – Dirk J. Faber Aug 12 '20 at 18:47
  • @sykez, if I do not use any assets from the server itself, generating the PDF happens instantly. When I use curl, the response time is < 1 second. I will try your suggestions though! – Dirk J. Faber Aug 12 '20 at 18:49

3 Answers


This sounds like a DNS issue to me. I would try adding entries to /etc/hosts, for example:

127.0.0.1     example.com
127.0.0.1     www.example.com

Then point your image URLs at that domain.
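
To check whether name resolution really is the slow part, you could time a lookup from PHP on the server itself, before and after adding the hosts entries. This is only a rough sketch: timeLookup() is a made-up helper name, and PHP's gethostbyname() goes through the system resolver, which is not guaranteed to behave exactly like whatever wkhtmltopdf uses internally.

```php
<?php
declare(strict_types=1);

/**
 * Time a single hostname lookup.
 * timeLookup() is a hypothetical helper, not part of any library.
 * $resolver defaults to PHP's built-in gethostbyname(); it is injectable
 * only so the helper can be exercised without touching the network.
 */
function timeLookup(string $host, ?callable $resolver = null): array
{
    $resolver = $resolver ?? 'gethostbyname';

    $start = microtime(true);
    $address = $resolver($host);
    $seconds = microtime(true) - $start;

    // gethostbyname() returns the unmodified hostname when the lookup fails.
    return [
        'host'     => $host,
        'address'  => $address,
        'resolved' => $address !== $host,
        'seconds'  => $seconds,
    ];
}

// Usage on the server (www.example.com stands in for your asset host):
//   $r = timeLookup('www.example.com');
//   printf("%s -> %s in %.3f s\n", $r['host'], $r['address'], $r['seconds']);
```

If the lookup takes many seconds without the hosts entries and milliseconds with them, that would support the DNS explanation.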

kaan_a
  • As simple as this solution is, it worked for me! Strangely enough, I did not experience any other issues without having these lines in my hosts file, while the server has been up and running for quite some time. – Dirk J. Faber Aug 13 '20 at 07:11

I have not found the root of my problem. However, I have found a workaround. What I have done is:

Install wkhtmltopdf globally (provided by my distribution):

sudo apt-get install wkhtmltopdf 

This installs wkhtmltopdf 0.12.4 (as of 5 Nov 2019) through the Ubuntu repositories. This is an older version of wkhtmltopdf and running it by itself gave me a myriad of problems. To solve this, I now run it inside xvfb. First install xvfb by running:

sudo apt-get install xvfb

Then change the binary path of the wrapper you use that points to wkhtmltopdf to:

'/usr/bin/xvfb-run /usr/bin/wkhtmltopdf' 

In my case, I use KnpSnappyBundle and set the binary path in my .env file. In knp_snappy.yaml I set binary: '%env(WKHTMLTOPDF_PATH)%' and in .env I set WKHTMLTOPDF_PATH='/usr/bin/xvfb-run /usr/bin/wkhtmltopdf' (as described above). I can now generate PDFs, although there are some issues with the layout.

Dirk J. Faber

Not sure if this is acceptable for you or not, but in my case, I always generate an HTML file that can stand on its own. I convert all CSS references to be included directly. I do this programmatically so I can still keep them as separate files for tooling. This is fairly trivial if you make a helper method to include them based on the URI. Likewise, I try to base64-encode all the images and include those as well. Again, I keep them as separate files and do this programmatically.

I then feed this "self-contained" html to wkhtmltopdf.
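
Since the question is in PHP, here is a rough PHP equivalent of that idea. It is not my actual code (that is C#), so treat it as a sketch: imageDataUri() and inlineStylesheet() are hypothetical helper names, the extension-to-MIME map is just a small illustrative subset, and both helpers assume the assets are readable from the local filesystem.

```php
<?php
declare(strict_types=1);

/**
 * Return a data: URI for a local image so the renderer needs no HTTP request.
 * Hypothetical helper; the MIME map below is illustrative, not exhaustive.
 */
function imageDataUri(string $path): string
{
    $types = [
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'jpeg' => 'image/jpeg',
        'gif'  => 'image/gif',
        'svg'  => 'image/svg+xml',
    ];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($types[$ext])) {
        throw new InvalidArgumentException("Unsupported image type: $path");
    }
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }

    return 'data:' . $types[$ext] . ';base64,' . base64_encode(file_get_contents($path));
}

/**
 * Return the contents of a local stylesheet wrapped in a <style> element.
 * Note: url(...) references inside the CSS are NOT rewritten by this sketch.
 */
function inlineStylesheet(string $path): string
{
    if (!is_readable($path)) {
        throw new RuntimeException("Cannot read $path");
    }

    return "<style>\n" . file_get_contents($path) . "\n</style>";
}
```

In a Symfony project these could be exposed to the template as Twig functions, so the template emits something like <img src="data:image/png;base64,..."> instead of an absolute URL.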

I'd share some examples, but my implementation is actually C# & Razor.

That aside, I would also build some logging into those helpers with timestamps if you're still having problems so you can see how long the includes are taking.

I'm not sure what the server setup is, but possibly there's a problem connecting to the NAS or something.

You could also stand to throw some logging with timestamps around the rest of the steps to get a feel for exactly which steps are taking a long time.
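
For the timestamp logging, something along these lines would do. Again a sketch, not a drop-in: timed() is a made-up helper name, and it defaults to PHP's error_log(), so swap in whatever logger your app uses.

```php
<?php
declare(strict_types=1);

/**
 * Run $step, log how long it took, and return its result.
 * timed() is a hypothetical helper; $log defaults to PHP's error_log().
 */
function timed(string $label, callable $step, ?callable $log = null)
{
    $log = $log ?? 'error_log';
    $start = microtime(true);

    try {
        return $step();
    } finally {
        // Logged even if $step throws, so a failing step still shows its duration.
        $log(sprintf('[%s] %s took %.3f s', date('c'), $label, microtime(true) - $start));
    }
}

// Example wiring, using the variables from the question:
//   $html = timed('render', function () use ($entity) { /* renderView(...) */ });
//   $pdf  = timed('wkhtmltopdf', function () use ($snappy, $html) {
//       return $snappy->getOutputFromHtml($html);
//   });
```

Wrapping the render step and the wkhtmltopdf call separately should show quickly which of the two eats the 40 seconds.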

Other tips: I try to use SVGs (where I can) for images, and I try not to pull large (or any) CSS libraries into the HTML that becomes the PDF.

Nick Kuznia