Update: Fig has been deprecated and replaced by Docker Compose. Docker Compose should serve as a drop-in replacement for the fig.sh used in this article.
Today, you'll learn how to set up a distributed task processing system for quick prototyping. You will configure Celery with Django, Postgres, Redis, and RabbitMQ, and then run everything in Docker containers. You'll need some working knowledge of Docker for this tutorial, which you can get in one of my previous posts here.
Django is a well-known Python web framework, and Celery is a distributed task queue. You'll use Postgres as a regular database to store jobs, RabbitMQ to route tasks to different queues, and Redis as a task storage backend.
Note: Although I don't demonstrate it in this post, Redis can be used in a variety of different ways:
- As a key-value store
- As a cache
- To publish and/or subscribe
- For distributed locking
When you build a web application, sooner or later you'll have to implement some kind of offline task processing.
Example:
A user wants to convert her cat photos from .jpg to .png or create a .pdf from her collection of .jpg cat files. Either task would take too long to execute in a single HTTP request and would unnecessarily burden the web server - meaning we couldn't serve other requests at the same time. The common solution is to execute the task in the background - often on another machine - and poll for the result.
A simple setup for offline task processing could look like this:
This setup looks nice, but it has one flaw - it doesn't scale. What if she has a lot of cat pictures and one server isn't enough to process them all? Or what if there was one very big job and all the other jobs were blocked by it? This is why you need to be prepared to scale.
To scale, you need something between the web server and worker: a broker. The web server would schedule new tasks by communicating with the broker, and the broker would communicate with the workers. You probably also want to buffer your tasks, retry if they fail, and monitor how many of them were processed.
You would also have to create queues for tasks with different priorities, or for tasks suited to different kinds of workers.
All of this can be greatly simplified by using Celery - an open-source distributed task queue. It works like a charm after you configure it - as long as you do so correctly.
Celery consists of:
- workers that execute tasks
- a broker that routes task messages from clients to workers (RabbitMQ here)
- a result backend that stores the outcome of tasks (Redis here)
You can watch a more in-depth introduction to Celery here or jump straight to Celery's getting started guide.
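To make this concrete, here is a minimal standalone sketch of defining and calling a task. The app name, broker/backend URLs, and the convert_image task are all hypothetical, separate from the project we build below:

from celery import Celery

# hypothetical standalone app; broker and backend URLs are illustrative
app = Celery('demo',
             broker='amqp://admin:mypass@localhost//',
             backend='redis://localhost:6379/0')

@app.task
def convert_image(path):
    # the slow work happens in a worker process, not in the web request
    return path.replace('.jpg', '.png')

The web server schedules work with convert_image.delay('cat.jpg'), which returns immediately with an AsyncResult; it can then poll the result or block on result.get(timeout=10).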
Sooner or later, you will end up with a pretty complex distributed system - and distributed systems have well-known fallacies that you should be aware of:
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
With Docker, it's much easier to test solutions on a system level - by prototyping different task designs and the interactions between them.
Start with the standard Django project structure. It can be created with django-admin, if you have it installed.
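If you don't have a project yet, something like this should produce the skeleton (the version is pinned to match requirements.txt below; celeryconf.py, models.py, serializers.py, tasks.py, and views.py are files we add by hand afterwards, not ones startproject generates):

$ pip install django==1.7.2
$ django-admin startproject myproject

The full layout we're aiming for looks like this: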
$ tree -I "*.pyc"
.
├── Dockerfile
├── fig.yml
├── myproject
│ ├── manage.py
│ └── myproject
│ ├── celeryconf.py
│ ├── __init__.py
│ ├── models.py
│ ├── serializers.py
│ ├── settings.py
│ ├── tasks.py
│ ├── urls.py
│ ├── views.py
│ └── wsgi.py
├── README.md
├── requirements.txt
├── run_celery.sh
└── run_web.sh
Since we are working with Docker, we need a proper Dockerfile to specify how our image will be built.
Dockerfile
# use base python image with python 2.7
FROM python:2.7
# add requirements.txt to the image
ADD requirements.txt /app/requirements.txt
# set working directory to /app/
WORKDIR /app/
# install python dependencies
RUN pip install -r requirements.txt
# create unprivileged user
RUN adduser --disabled-password --gecos '' myuser
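fig will build this image for us in a moment, but you can also sanity-check the Dockerfile on its own with a manual build (the tag name here is just an example):

$ docker build -t myproject-web .
$ docker run --rm myproject-web python -c "import django; print(django.get_version())"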
Our dependencies are:
requirements.txt
django==1.7.2
celery==3.1.17
djangorestframework==3.0.3
psycopg2==2.5.4
redis==2.10.3
I've frozen the versions of the dependencies to make sure that you will have a working setup. If you wish, you can update any of them, but it's not guaranteed to work.
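If you do upgrade something and verify that it works, pip freeze is the usual way to re-pin the combination you end up with:

# inside a virtualenv with the upgraded packages installed
$ pip freeze > requirements.txt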
Now we only need to set up RabbitMQ, PostgreSQL, and Redis. Since Docker introduced its official library, I use the official images whenever possible. However, even these can be broken sometimes. When that happens, you'll have to use something else.
Here are the images I tested and selected for this project:
- postgres:9.4 - the official PostgreSQL image
- redis:2.8.19 - the official Redis image
- tutum/rabbitmq - a RabbitMQ image that lets you set the admin password through an environment variable
Now you'll use fig.sh to combine your own containers with the images we chose in the last section. If you're not familiar with fig.sh, check out my post on making your Docker workflow awesome with fig.
fig.yml
# database container
db:
  image: postgres:9.4
  environment:
    - POSTGRES_PASSWORD=mysecretpassword

# redis container
redis:
  image: redis:2.8.19

# rabbitmq container
rabbitmq:
  image: tutum/rabbitmq
  environment:
    - RABBITMQ_PASS=mypass
  ports:
    - "5672:5672"    # we forward this port because it's useful for debugging
    - "15672:15672"  # here, we can access the rabbitmq management plugin

# container with the Django web server
web:
  build: .  # build using the default Dockerfile
  command: ./run_web.sh
  volumes:
    - .:/app  # mount the current directory inside the container
  ports:
    - "8000:8000"
  # set up links so that web knows about db, rabbit and redis
  links:
    - db:db
    - rabbitmq:rabbit
    - redis:redis

# container with the Celery worker
worker:
  build: .
  command: ./run_celery.sh
  volumes:
    - .:/app
  links:
    - db:db
    - rabbitmq:rabbit
    - redis:redis
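With fig.yml in place, two commands build the web and worker images and start the whole five-container stack (with Docker Compose, substitute docker-compose for fig):

$ fig build
$ fig up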
You've probably noticed that both the worker and the web server run startup scripts. Here they are:
run_web.sh
#!/bin/sh
cd myproject
# migrate db, so we have the latest db schema
su -m myuser -c "python manage.py migrate"
# start development server on public ip interface, on port 8000
su -m myuser -c "python manage.py runserver 0.0.0.0:8000"
run_celery.sh
#!/bin/sh
cd myproject
# run the Celery worker for our project myproject, with the Celery configuration stored in celeryconf.py
su -m myuser -c "celery worker -A myproject.celeryconf -Q default -n default@%h"
The first script - run_web.sh - will migrate the database and start the Django development server on port 8000.
The second one - run_celery.sh - will start a Celery worker listening on the queue named default.
At this stage, these scripts won't work as we'd like them to because we haven't yet configured them. Our app still doesn't know that we want to use Postgres as a database, or where to find it (in a container somewhere). We also have to configure Redis and RabbitMQ.
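It helps to know what the links in fig.yml actually give us: Docker injects the connection details of each linked container into the environment, using the link alias as a prefix. Inside the web or worker container you'd see variables along these lines (the addresses are illustrative):

$ env | grep -E 'DB_|RABBIT_|REDIS_'
DB_PORT_5432_TCP_ADDR=172.17.0.2
DB_PORT_5432_TCP_PORT=5432
DB_ENV_POSTGRES_PASSWORD=mysecretpassword
RABBIT_PORT_5672_TCP=tcp://172.17.0.3:5672
RABBIT_ENV_RABBITMQ_PASS=mypass
REDIS_PORT_6379_TCP_ADDR=172.17.0.4

These are exactly the variables the settings below read.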
But before we get to that, there are some useful Celery settings that will make your system perform better. Below are the complete settings for this Django app.
myproject/settings.py
import os

from kombu import Exchange, Queue

BASE_DIR = os.path.dirname(os.path.dirname(__file__))

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'megg_yej86ln@xao^+)it4e&ueu#!4tl9p1h%2sjr7ey0)m25f'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True
TEMPLATE_DEBUG = True
ALLOWED_HOSTS = []

# Application definition
INSTALLED_APPS = (
    'django.contrib.staticfiles',
    'rest_framework',
    'myproject',
)

MIDDLEWARE_CLASSES = (
)

REST_FRAMEWORK = {
    'DEFAULT_PERMISSION_CLASSES': ('rest_framework.permissions.AllowAny',),
    'PAGINATE_BY': 10
}

ROOT_URLCONF = 'myproject.urls'
WSGI_APPLICATION = 'myproject.wsgi.application'

# Localization and timezone settings
TIME_ZONE = 'UTC'
USE_TZ = True
CELERY_ENABLE_UTC = True
CELERY_TIMEZONE = "UTC"
LANGUAGE_CODE = 'en-us'
USE_I18N = True
USE_L10N = True

# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/1.7/howto/static-files/
STATIC_URL = '/static/'

# Database configuration
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': os.environ.get('DB_ENV_DB', 'postgres'),
        'USER': os.environ.get('DB_ENV_POSTGRES_USER', 'postgres'),
        'PASSWORD': os.environ.get('DB_ENV_POSTGRES_PASSWORD', ''),
        'HOST': os.environ.get('DB_PORT_5432_TCP_ADDR', ''),
        'PORT': os.environ.get('DB_PORT_5432_TCP_PORT', ''),
    },
}

# Redis
REDIS_PORT = 6379
REDIS_DB = 0
REDIS_HOST = os.environ.get('REDIS_PORT_6379_TCP_ADDR', '127.0.0.1')

RABBIT_HOSTNAME = os.environ.get('RABBIT_PORT_5672_TCP', 'localhost:5672')
if RABBIT_HOSTNAME.startswith('tcp://'):
    RABBIT_HOSTNAME = RABBIT_HOSTNAME.split('//')[1]

BROKER_URL = os.environ.get('BROKER_URL', '')
if not BROKER_URL:
    BROKER_URL = 'amqp://{user}:{password}@{hostname}/{vhost}/'.format(
        user=os.environ.get('RABBIT_ENV_USER', 'admin'),
        password=os.environ.get('RABBIT_ENV_RABBITMQ_PASS', 'mypass'),
        hostname=RABBIT_HOSTNAME,
        vhost=os.environ.get('RABBIT_ENV_VHOST', ''))

# We don't want dead connections stored on rabbitmq, so we have to negotiate using heartbeats
BROKER_HEARTBEAT = '?heartbeat=30'
if not BROKER_URL.endswith(BROKER_HEARTBEAT):
    BROKER_URL += BROKER_HEARTBEAT

BROKER_POOL_LIMIT = 1
BROKER_CONNECTION_TIMEOUT = 10

# Celery configuration

# configure queues