Elastic Beanstalk disable health state change based on 4xx responses

Question

I have a REST API running on Elastic Beanstalk, which works great. Application-wise, everything is running well and working as expected.

The application is a REST API used to look up different users.

example url: http://service.com/user?uid=xxxx&anotherid=xxxx

If a user with either ID is found, the API responds with 200 OK; if not, it responds with 404 Not Found, as per the HTTP/1.1 status code definitions.

It is not uncommon for our API to answer 404 Not Found on a lot of requests, and Elastic Beanstalk moves our environment from OK into Warning or even into Degraded because of this. It also looks like nginx has refused connections to the application because of this degraded state. (It looks like it has a threshold of 30%+ for the Warning state and 50%+ for the Degraded state.) This is a problem, because the application is actually working as expected, but Elastic Beanstalk's default settings treat it as a problem when it's really not.

Does anyone know of a way to edit the threshold of the 4xx warnings and state transitions in EB, or completely disable them?

Or should I really treat the symptom and stop using 404 Not Found on a call like this? (I really do not like this option.)

You should provide a dedicated endpoint for health check. This endpoint will check all the components of your system (e.g. database ping, external system ping, etc), and respond according to the health. Do not use the user endpoint for that, as you can see it is not a good representation of the health of your system. — cdelmas, Apr 04 '16 at 11:21
The problem is that elastic beanstalk monitors all the application responses in the load balancer. And when it reaches a threshold of 30+% 4xx statuses, beanstalk changes my applications state, even when the /health endpoint still returns 200 OK — Martin Hansen, Apr 04 '16 at 12:01
One option is to migrate the environment from Enhanced to Basic health reporting which does not monitor status codes -- however, this is less recommended. The other option would probably require patching up the underlying EB health check daemon running on the EB servers. — Elad Nava, Jul 01 '16 at 07:35

Elad Nava · Accepted Answer · 2018-09-11T09:26:20.147

Update: AWS EB finally includes a built-in setting for this: https://stackoverflow.com/a/51556599/1123355

Old Solution: Upon diving into the EB instance and spending several hours looking for where EB's health check daemon actually reports the status codes back to EB for evaluation, I finally found it, and came up with a patch that can serve as a perfectly fine workaround for preventing 4xx response codes from turning the environment into a Degraded environment health state, as well as pointlessly notifying you with this e-mail:

Environment health has transitioned from Ok to Degraded. 59.2 % of the requests are erroring with HTTP 4xx.

The status code reporting logic is located within healthd-appstat, a Ruby script developed by the EB team that constantly monitors /var/log/nginx/access.log and reports the status codes to EB, specifically in the following path:

/opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb

The following .ebextensions file will patch this Ruby script to avoid reporting 4xx response codes back to EB. This means that EB will never degrade the environment health due to 4xx errors because it just won't know that they're occurring. This also means that the "Health" page in your EB environment will always display 0 for the 4xx response code count.

container_commands:
    01-patch-healthd:
        command: "sudo /bin/sed -i 's/\\# normalize units to seconds with millisecond resolution/if status \\&\\& status.index(\"4\") == 0 then next end/g' /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb"
    02-restart-healthd:
        command: "sudo /usr/bin/kill $(/bin/ps aux | /bin/grep -e '/bin/bash -c healthd' | /usr/bin/awk '{ print $2 }')"
        ignoreErrors: true

Yes, it's a bit ugly, but it gets the job done, at least until the EB team provide a way to ignore 4xx errors via some configuration parameter. Include it with your application when you deploy, in the following path relative to the root directory of your project:

.ebextensions/ignore_4xx.config

Good luck, and let me know if this helped!

Okay!! It works perfectly, sorry for the noise. The issue was that REDIRECTING IN NGINX, I do this for http -> https, does not count in the health. The high number of redirects on production were actual redirects (not https). Thanks again! — Andy Hayden, Aug 30 '16 at 23:00
I changed `status.index(\"4\") == 0` to `status.start_with?(\"404\", \"422\")` and it's working a treat. — Andy Hayden, Aug 31 '16 at 20:50
Sorry for the late reply. I have not had the time to look more into this. (We switched back to basic health check). I will accept this answer as it's the closest thing i have seen to a solution, and hope Amazon will implement something properly in EB itself. — Martin Hansen, Sep 06 '16 at 13:28
We have a cloudwatch metrics filter setup for 412 errors that we would like to monitor. I was wondering if this workaround will disable errors going to cloudwatch as well? — Khai Do, Sep 11 '17 at 18:36
Yes, most likely this workaround will affect your CloudWatch metrics as well. — Elad Nava, Sep 16 '17 at 04:02
This is now configurable from EB settings - see the answer below (https://stackoverflow.com/a/51556599/69002) for details — Mat Schaffer, Sep 11 '18 at 02:04
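
For reference, here is a sketch of the accepted answer's .ebextensions file with the narrower check from Andy Hayden's comment substituted in, so that only 404 and 422 responses are hidden from EB while other 4xx codes are still reported. It assumes the same healthd-appstat 1.0.1 path and the same marker comment in plugin.rb as the original patch; on other platform versions the path or the marker may differ, in which case the sed command silently changes nothing.

```yaml
container_commands:
    01-patch-healthd:
        # Same sed patch as the accepted answer, but skipping only 404 and 422
        # (String#start_with? accepts several prefixes) instead of every 4xx.
        command: "sudo /bin/sed -i 's/\\# normalize units to seconds with millisecond resolution/if status \\&\\& status.start_with?(\"404\", \"422\") then next end/g' /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb"
    02-restart-healthd:
        # Unchanged from the accepted answer: restart healthd so the patch is loaded.
        command: "sudo /usr/bin/kill $(/bin/ps aux | /bin/grep -e '/bin/bash -c healthd' | /usr/bin/awk '{ print $2 }')"
        ignoreErrors: true
```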

score 38 · Answer 2 · Anshul Gupta · answered Jul 27 '18 at 11:01

There is a dedicated health monitoring rule customization called "Ignore HTTP 4xx" (screenshot attached). Just enable it and EB will not degrade instance health on 4xx errors.

This is a recent addition, but I think should supplant the currently accepted answer. Frustratingly I can't find any documentation in https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/command-options-general.html that would give a clue as to how to configure this programmatically instead of from the console. – MrCranky Sep 05 '18 at 08:00
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced-rules.html now has the info for how to set this via the CLI or even an .ebextensions file – Mat Schaffer Sep 11 '18 at 02:03
For those searching for where it is: Configuration > Monitoring, section "Health monitoring rule customization". Recent versions have two options, one for the app and one for the load balancer. – Cyril N. Apr 14 '21 at 08:34
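
To set the same rule without the console, the documentation page linked in the comments above describes a ConfigDocument option in the aws:elasticbeanstalk:healthreporting:system namespace. The following is a minimal sketch of an .ebextensions file based on that format; the file name is only an example, and the rule names should be checked against the current documentation before relying on them.

```yaml
# .ebextensions/ignore_4xx_rules.config (hypothetical file name)
option_settings:
  aws:elasticbeanstalk:healthreporting:system:
    ConfigDocument:
      Rules:
        Environment:
          Application:
            # Stop 4xx responses from the application affecting environment health
            ApplicationRequests4xx:
              Enabled: false
          ELB:
            # Stop 4xx responses seen at the load balancer affecting environment health
            ELBRequests4xx:
              Enabled: false
      Version: 1
```

Unlike the patches in the other answers, this leaves the 4xx counts visible on the Health page and only stops them from changing the health state. It requires enhanced health reporting.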

score 13 · Answer 3 · answered Jun 22 '17 at 03:09

Thank you for your answer Elad Nava, I had the same problem and your solution worked perfectly for me!

However, after I opened a ticket in the AWS Support Center, they recommended that I modify the nginx configuration to ignore 4xx in the health check instead of modifying the Ruby script. To do that, I also had to add a config file to the .ebextensions directory in order to overwrite the default nginx.conf file:

files:
  "/tmp/nginx.conf":
    content: |

      # Elastic Beanstalk Managed

      # Elastic Beanstalk managed configuration file
      # Some configuration of nginx can be by placing files in /etc/nginx/conf.d
      # using Configuration Files.
      # http://docs.amazonwebservices.com/elasticbeanstalk/latest/dg/customize-containers.html
      #
      # Modifications of nginx.conf can be performed using container_commands to modify the staged version
      # located in /tmp/deployment/config/etc#nginx#nginx.conf

      # Elastic_Beanstalk
      # For more information on configuration, see:
      #   * Official English Documentation: http://nginx.org/en/docs/
      #   * Official Russian Documentation: http://nginx.org/ru/docs/

      user  nginx;
      worker_processes  auto;

      error_log  /var/log/nginx/error.log;

      pid        /var/run/nginx.pid;


      worker_rlimit_nofile 1024;

      events {
          worker_connections  1024;
      }

      http {

          ###############################
          # CUSTOM CONFIG TO IGNORE 4xx #
          ###############################

          map $status $loggable {
            ~^[4]  0;
            default 1;
          }

          map $status $modstatus {
            ~^[4]  200;
            default $status;
          }

          #####################
          # END CUSTOM CONFIG #
          #####################

          port_in_redirect off;
          include       /etc/nginx/mime.types;
          default_type  application/octet-stream;


          log_format  main   '$remote_addr - $remote_user [$time_local] "$request" '
                             '$status $body_bytes_sent "$http_referer" '
                             '"$http_user_agent" "$http_x_forwarded_for"';

          access_log  /var/log/nginx/access.log  main;

          # The healthd log format reports $modstatus, so 4xx responses appear as 200.
          # "if=" is a parameter of access_log, not log_format, so $loggable only
          # takes effect when passed to the access_log directive that writes the healthd log.
          log_format healthd '$msec"$uri"'
                             '$modstatus"$request_time"$upstream_response_time"'
                             '$http_x_forwarded_for';

          sendfile        on;
          include /etc/nginx/conf.d/*.conf;

          keepalive_timeout  1200;

      }

container_commands:
  01_modify_nginx:
    command: cp /tmp/nginx.conf /tmp/deployment/config/#etc#nginx#nginx.conf

Although this solution is considerably more verbose, I personally believe it is safer to implement, since it does not depend on any AWS proprietary script. What I mean is that if for some reason AWS decides to remove or modify their Ruby script (believe it or not, they love to change scripts without prior notice), there is a good chance that the sed workaround will stop working.

ERROR: [Instance: i-00fe453a7b32ae26c] Command failed on instance. Return code: 1 Output: cp: cannot create regular file '/tmp/deployment/config/#etc#nginx#nginx.conf': No such file or directory. — Jeremie Weldin, Jul 19 '17 at 15:03
This method of replacing the nginx.conf no longer works, see https://stackoverflow.com/a/45155825/194538 — Jeremie Weldin, Jul 19 '17 at 15:20

score 1 · Answer 4 · answered Jul 22 '17 at 06:20

Here is a solution based off of Adriano Valente's answer. I couldn't get the $loggable bit to work, although skipping logging for the 404s seems like that would be a good solution. I simply created a new .conf file that defined the $modstatus variable, and then overwrote the healthd log format to use $modstatus in place of $status. This change also required nginx to get restarted. This is working on Elastic Beanstalk's 64bit Amazon Linux 2016.09 v2.3.1 running Ruby 2.3 (Puma).

# .ebextensions/nginx.conf

files:
  "/tmp/nginx.conf":
    content: |

      # Custom config to ignore 4xx in the health file only
      map $status $modstatus {
        ~^[4]  200;
        default $status;
      }

container_commands:
  modify_nginx_1:
    command: "cp /tmp/nginx.conf /etc/nginx/conf.d/custom_status.conf"
  modify_nginx_2:
    command: sudo sed -r -i 's@\$status@$modstatus@' /opt/elasticbeanstalk/support/conf/webapp_healthd.conf
  modify_nginx_3:
    command: sudo /etc/init.d/nginx restart

Vlassios · Answer 5 · 2017-09-11T12:29:19.280

Based on Elad Nava's Answer, I think it's better to use the elasticbeanstalk healthd's control script directly instead of a kill:

container_commands:
    01-patch-healthd:
        command: "sudo /bin/sed -i 's/\\# normalize units to seconds with millisecond resolution/if status \\&\\& status.index(\"4\") == 0 then next end/g' /opt/elasticbeanstalk/lib/ruby/lib/ruby/gems/2.2.0/gems/healthd-appstat-1.0.1/lib/healthd-appstat/plugin.rb"
    02-restart-healthd:
        command: "sudo /opt/elasticbeanstalk/bin/healthd-restart"

Finally, while investigating this issue, I noticed that healthd and Apache log status codes differently: the former uses %s while the latter uses %>s, which results in discrepancies between them. I've also patched this using:

    03-healthd-logs:
        command: sed -i 's/^LogFormat.*/LogFormat "%{%s}t\\"%U\\"%>s\\"%D\\"%D\\"%{X-Forwarded-For}i" healthd/g' /etc/httpd/conf.d/healthd.conf

score 0 · Answer 6 · edited Jun 27 '18 at 11:36

I recently ran into the same issue of being bombarded with 4xx errors as you have. I tried the suggestions listed above, but nothing worked for me. I reached out to AWS Support and here is what they have suggested, and it solved my problem. I have an Elastic Beanstalk application with 2 instances running.

Create a folder called .ebextensions
Inside this folder, create a file called nginx.config (make sure it has the .config extension. ".conf" won't do!)
If you are deploying your application with a Docker container, then please make sure this .ebextensions folder is included in the deployment bundle. For me, the bundle included the folder as well as the Dockerrun.aws.json

Here is the entire content of the nginx.config file:

files:
  "/etc/nginx/nginx.conf":
    content: |
      # Elastic Beanstalk Nginx Configuration File
      user  nginx;
      worker_processes  auto;

      error_log  /var/log/nginx/error.log;

      pid        /var/run/nginx.pid;

      events {
          worker_connections  1024;
      }

      http {

          # Custom config
          # HTTP 4xx ignored.
          map $status $loggable {
            ~^[4]  0;
            default 1;
          }

          # Custom config
          # HTTP 4xx ignored.
          map $status $modstatus {
            ~^[4]  200;
            default $status;
          }

          include       /etc/nginx/mime.types;
          default_type  application/octet-stream;

          access_log    /var/log/nginx/access.log;

          log_format  healthd '$msec"$uri"$modstatus"$request_time"$upstream_response_time"$http_x_forwarded_for';

          include       /etc/nginx/conf.d/*.conf;
          include       /etc/nginx/sites-enabled/*;
      }

If you clean up the formatting, this solution works on beanstalk platform v2.8.4 running Docker 17.09.1-ce — w2bro, Mar 02 '18 at 17:12
We discovered that the above nginx file works perfect, EXCEPT for on application rebuild, such as in the event of auto-scaling. To make it work for that situation too, remove the last 3 lines which restarts the nginx. I removed it, so people can just copy and paste the above script. :) But originally, there are these 3 lines: container_commands: restart-nginx: command: "service nginx restart" — Tracy Xia, Mar 13 '18 at 22:14
Thanks @qing-xia, I had the same issue as well and removed the same lines to resolve. — w2bro, May 12 '18 at 06:38

score 0 · Answer 7 · answered Apr 02 '18 at 06:59

Solution provided by AWS support as of April 2018:

 files:
  "/tmp/custom-site-nginx.conf":
    mode: "000664"
    owner: root
    group: root
    content: |
       map $http_upgrade $connection_upgrade {
           default        "upgrade";
           ""            "";
       }
       # Elastic Beanstalk Modification(EB_INCLUDE)
       # Custom config
       # HTTP 4xx ignored.
       map $status $loggable {
                ~^[4]  0;
                default 1;
       }


       server {
           listen 80;

         gzip on;
         gzip_comp_level 4;
         gzip_types text/html text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript;

           if ($time_iso8601 ~ "^(\d{4})-(\d{2})-(\d{2})T(\d{2})") {
               set $year $1;
               set $month $2;
               set $day $3;
               set $hour $4;
           }
           access_log /var/log/nginx/healthd/application.log.$year-$month-$day-$hour healthd if=$loggable;

           access_log    /var/log/nginx/access.log;

           location / {
               proxy_pass            http://docker;
               proxy_http_version    1.1;

               proxy_set_header    Connection            $connection_upgrade;
               proxy_set_header    Upgrade                $http_upgrade;
               proxy_set_header    Host                $host;
               proxy_set_header    X-Real-IP            $remote_addr;
               proxy_set_header    X-Forwarded-For        $proxy_add_x_forwarded_for;
           }
       }

 container_commands:
   override_beanstalk_nginx:
     command: "mv -f /tmp/custom-site-nginx.conf /etc/nginx/sites-available/elasticbeanstalk-nginx-docker-proxy.conf"
