45

I am launching a Django application on AWS Elastic Beanstalk. I'd like to run a background task or worker in order to run Celery.

I cannot find out whether this is possible or not. If yes, how could it be achieved?

Here is what I am doing right now, but it produces an event of type error every time.

container_commands:
  01_syncdb:
    command: "django-admin.py syncdb --noinput"
    leader_only: true
  50_sqs_email:
    command: "./manage.py celery worker --loglevel=info"
    leader_only: true
Francisco C
Maxime P
  • what kind of error do you have? – EsseTi Feb 13 '13 at 11:13
  • I suspect you need to run celery in daemon mode: http://docs.celeryproject.org/en/latest/tutorials/daemonizing.html#daemonizing which would require a custom AMI for your beanstalk. This is not for the fainthearted as suggested here: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.ec2.html – Chris Wheadon Feb 18 '13 at 12:06
  • I think you can find an answer here: http://stackoverflow.com/questions/12813586/running-pythons-celery-on-elastic-beanstalk-with-django – Zaar Hai Mar 20 '13 at 13:45
  • If you want something lighter than celery, you can try https://pypi.org/project/django-eb-sqs-worker/ package - it uses Amazon SQS for queueing tasks. – DataGreed Jun 22 '20 at 23:08

3 Answers

71

As @chris-wheadon suggested in his comment, you should try to run celery as a daemon in the background. AWS Elastic Beanstalk already uses supervisord to run some daemon processes, so you can leverage that to run celeryd and avoid creating a custom AMI for this. It works nicely for me.

What I do is programmatically add a celeryd config file to the instance after EB deploys the app to it. The tricky part is that the file needs to set the required environment variables for the daemon (such as AWS access keys if you use S3 or other services in your app).

Below is a copy of the script that I use; add it to the .ebextensions folder that configures your EB environment.

The setup script creates a file in the /opt/elasticbeanstalk/hooks/appdeploy/post/ folder (documentation), which exists on all EB instances. Any shell script in that folder is executed post-deployment. The shell script placed there works as follows:

  1. In the celeryenv variable, the virtualenv environment is stored in a format that follows the supervisord notation. This is a comma-separated list of env variables.
  2. Then the script creates a variable celeryconf that contains the configuration file as a string, which includes the previously parsed env variables.
  3. This variable is then piped into a file called celery.conf, a supervisord configuration file for the celery daemon.
  4. Finally, the path to the newly created config file is added to the main supervisord.conf file, if it is not already there.

Here is a copy of the script:

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      celeryenv=${celeryenv%?}

      # Create celery configuration script
      celeryconf="[program:celeryd]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A myappname --loglevel=INFO

      directory=/opt/python/current/app
      user=nobody
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
          then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
André Laszlo
yellowcap
  • Thank you for posting this! Celery and EB have been a challenge, but your solution seems to work! I found an issue, however: if there's a `%` sign in an environment variable, supervisord throws a formatting error. I believe `%` is escaped by adding an additional `%`, like `%%`. Is there any way to format the env vars to add that extra `%` to all `%`? https://github.com/Supervisor/supervisor/issues/291 –  Jul 09 '14 at 00:27
  • In that case you could add an additional find/replace piece to the part where the environmental variables are parsed. For instance, `sed 's/%/%%/g'` will replace any `%` with `%%`. The command chain at the beginning of the script does a bunch of string replacements to make the env vars list supervisord compatible. So try adding it after the first command: `cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | ...` (the full line is sketched below, after these comments). – yellowcap Jul 09 '14 at 09:41
  • @yellowcap Thank you for the great and detailed answer! – neurix Jul 09 '15 at 22:51
  • This definitely works, but there are some issues with it. If you do this, your web and worker instances are tied to each other. So if the load on your workers increases, you are scaling both your web and worker instances. The other issue is that if you have a celery beat task, you will end up with duplicate tasks when you scale up. You must only have one instance running your celery beat. I know the second issue is not related to what this question is about, but a project with celery workers can have celery beat as well. – AliBZ Jul 12 '16 at 18:14
  • Yes of course ideally you would have two separate instances running! The above setup is useful if you don't have the resources to buy several servers and you want to squeeze out as much as you can from each instance. I am running a low traffic Django app on a single small instance, for that it works great. And even if you have several instances, you might not want to "reserve" one just for the worker. That depends entirely on the use case. Agreed on the celery beat side, that would duplicate tasks so it would not be a good solution for celery beat if you have multiple instances. – yellowcap Jul 13 '16 at 08:07
  • I've created a script named "99-celery.config" and copied your script but it didn't work. Can you help me? Should I configure anything about supervisor on my local computer? http://stackoverflow.com/questions/38566456/how-to-run-a-celery-worker-on-aws-elastic-beanstalk – Çağatay Barın Jul 27 '16 at 17:03
  • somehow in my ec2, supervisorctl is not available as a command...but I got it working, thanks a bunch. OP should accept this answer. – Evan Chu Aug 06 '16 at 02:32
  • For the duplicate tasks, use a central cache server like redis or memcached and create a lock so that other instances don't run the same task twice – Dr Manhattan Aug 11 '16 at 20:29
  • This is a great help, but as mentioned, scalability requires execution on the main node only. So container_commands should be used instead, since it allows the leader_only option. I used two commands: the first creates the bash file, the second executes it (a rough sketch of this is included after these comments). This is my solution for a Django app: http://stackoverflow.com/questions/41161691/how-to-run-a-celery-worker-with-django-app-scalable-by-aws-elastic-beanstalk/41161692#41161692 – smentek Dec 15 '16 at 10:23
  • Your code worked fine until I decided to migrate some variables from my settings.py to my Elastic Beanstalk environment properties. Now I get the following error when the script is called: for \'environment\' is badly formatted'>: file: /usr/lib64/python2.7/xmlrpclib.py line: 800 celeryd: ERROR (no such process). Thanks for the help. – Paul Wasson Dec 20 '16 at 09:23
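
Regarding the `%` escaping discussed in the comments above: with yellowcap's extra substitution in place, the line that collects the environment variables would look roughly like this (a sketch; the rest of the script stays exactly as in the answer):

# Get django environment variables, escaping % so supervisord does not treat it as a format character
celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/%/%%/g' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
celeryenv=${celeryenv%?}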
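
And a rough sketch of the container_commands variant smentek describes, assuming the run_supervised_celeryd.sh hook script has already been written out via the files: section of the answer above. The command name is only illustrative, and container_commands run earlier in a deployment than the post-deploy hooks, so test the paths on your platform version:

container_commands:
  # run the supervisord hook script only on the leader instance
  50_start_celery_worker:
    command: "bash /opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh"
    leader_only: true

With leader_only: true, only the leader instance starts the worker during a deployment, which also avoids the duplicate celery beat runs mentioned in the comments.
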
3

I was trying to do something similar in PHP; however, for whatever reason I couldn't keep the worker running. I switched to an AMI on an EC2 server and have had success ever since.

Michael J. Calkins
1

For those using Elastic Beanstalk with Rails & Sidekiq, here's a collection of ebextensions that ultimately did the trick for me:

https://gist.github.com/ctrlaltdylan/f75b2e38bbbf725acb6d48283fc2f174

Dylan Pierce