Bits about Django performance, scalability and deployment

Django and PostgreSQL – improving the performance with no effort and no code.

Posted by ymirpl on July 8, 2011
We really like Django and PostgreSQL. And we can double their performance (2x!) without changing a single line of your application.

It’s a long post, so we’ll start with a summary:

Application performance

We will be using:

  • Django 1.3 + Jinja2 2.6-dev
  • nginx 1.0.4 + gunicorn 0.12.2 (1 worker)
  • PostgreSQL 8.4

Performance tests will be run using:

  • Blitz – set to sweep 1-110 users
  • Apache Benchmark – with concurrency set to 1 and 100 requests, to average out the single-request time

Case study

Yup, it’s going to be a social example. I know. We start with a simple twitter-like application (microblogging, but without followers – what a twist!). Let’s call it:

The Tuitter!

You can see the source code at Ask The Pony Performance Test GitHub repo. This repository contains a few more clever files, but we will not talk about them just yet. Their time will come in the next few weeks (keep in touch for The Big Django Hosting real cost and performance review).

We will be looking at the performance of the index page, which checks your session (kept in the DB, the Django default; we do not recommend this in production) and prints out the 10 latest Tuits (using an INNER JOIN query). It will come in 2 flavours:

  • / – using Django templating engine
  • /jinja2/ – using the excellent Jinja2 templating system (we DO recommend using it!)

See the Demo here.

Stage 1

We’ve created some user accounts and Tuitted some Tuits to fill the DB. Let’s use blitz to find out its melting point.

Stage 1 – Django templates

Stage 1 – Jinja2 templates

Apache Benchmark shows:

Requests per second:    24.04 [#/sec] (mean)
Time per request:       41.602 [ms] (mean)
Time per request:       41.602 [ms] (mean, across all concurrent requests)

Using Jinja2:

Requests per second:    31.40 [#/sec] (mean)
Time per request:       31.850 [ms] (mean)
Time per request:       31.850 [ms] (mean, across all concurrent requests)

With Django templates it peaked at 24 requests/second. By the way, see the advantage of using Jinja2? It made 31 requests/second.
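As a sanity check on the Apache Benchmark output: with concurrency set to 1, "Requests per second" is simply the reciprocal of the mean "Time per request". A minimal Python sketch of that arithmetic (the function name is ours, purely for illustration):

```python
def requests_per_second(mean_ms_per_request, concurrency=1):
    """Convert ab's mean 'Time per request' (in ms) into requests per second."""
    return concurrency * 1000.0 / mean_ms_per_request


# Mean times per request taken from the ab output above.
django_rps = requests_per_second(41.602)  # Django templates
jinja2_rps = requests_per_second(31.850)  # Jinja2 templates

print(round(django_rps, 2), round(jinja2_rps, 2))
```

Both values agree with the "Requests per second" lines ab printed, so at concurrency 1 either figure can be used to compare the stages.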

So let’s profile this view. We use our simple InstrumentMiddleware (just add it to your MIDDLEWARE_CLASSES, as the first item). If you add ‘?profile=’ to the URL, it will display the profiling data.
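If you just want to see how such a profile is produced, here is a minimal sketch using the standard library's cProfile and pstats. It is not our InstrumentMiddleware, only an illustration of the underlying idea, written for Python 3; `profile_call` and `fake_view` are hypothetical names, and `fake_view` merely stands in for a real Django view.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Write the report into an in-memory stream instead of a temporary file.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("tottime").print_stats(10)  # ten most expensive entries
    return result, stream.getvalue()


def fake_view():
    # Hypothetical stand-in for the work a view would do.
    return sum(i * i for i in range(10000))


result, report = profile_call(fake_view)
print(report)
```

The report uses the same columns (ncalls, tottime, percall, cumtime) as the listing below. A middleware would wrap the view call in the same way and return the report as the response when the profiling parameter is present.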

 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3    0.013    0.004    0.013    0.004 {cursor.execute}
      1    0.007    0.007    0.007    0.007 {psycopg2._psycopg.connect}
     22    0.002    0.000    0.003    0.000 base.py:275(__init__)
   1560    0.002    0.000    0.002    0.000 {isinstance}
  22/11    0.002    0.000    0.007    0.001 query.py:1128(get_cached_row)
294/284    0.001    0.000    0.003    0.000 encoding.py:54(force_unicode)

Problem

Whoa, so right after the queries themselves (cursor.execute), the biggest single cost is psycopg2._psycopg.connect: 7 ms spent just connecting to PostgreSQL? What can we do to improve it?

Solution

Well, we’ve got to try connection pooling. Each time you make a request, Django opens a new connection to the database. Pooling inside the application would mean that each gunicorn worker keeps its own connection open and never closes it, which removes most of the calls to _psycopg.connect. But that would mean having to modify Django. We do not want to do that, as it makes upgrading Django and using fabric/pip deployment painful in the long run.
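To make the idea concrete, here is a toy sketch of what pooling saves. It is not Django's, pgpool's or pgBouncer's actual code; `FakeDatabase` and `ConnectionPool` are hypothetical classes that only count how often a connection has to be opened.

```python
import queue


class FakeDatabase:
    """Hypothetical stand-in for PostgreSQL that counts connect() calls."""

    def __init__(self):
        self.connect_calls = 0

    def connect(self):
        self.connect_calls += 1
        return object()  # pretend this is an open connection


class ConnectionPool:
    """Hands out already-open connections instead of opening new ones."""

    def __init__(self, db, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(db.connect())

    def acquire(self):
        return self._idle.get()

    def release(self, conn):
        self._idle.put(conn)


def serve_without_pool(db, requests):
    for _ in range(requests):
        db.connect()  # a new connection per request, thrown away afterwards


def serve_with_pool(pool, requests):
    for _ in range(requests):
        conn = pool.acquire()
        pool.release(conn)  # handed back to the pool, not closed


plain_db = FakeDatabase()
serve_without_pool(plain_db, 100)

pooled_db = FakeDatabase()
serve_with_pool(ConnectionPool(pooled_db, size=1), 100)

print(plain_db.connect_calls, pooled_db.connect_calls)
```

In the toy model, 100 requests cost 100 connects without the pool and a single connect with it. An external pooler gives a similar effect on the database side without touching the application.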

We’ve got 2 options:

  • use pgpool-II
  • use pgBouncer

They are both production-ready solutions. From the user’s perspective they are used the same way: we connect to them instead of the database, and they deal with the connection pooling.

Stage 2 – pgpool-II

Installing pgpool

On Debian, just run:

apt-get install pgpool

pgpool configuration

The file we’re interested in is pgpool.conf. On Debian it will be in /etc. If you compile pgpool from source, you will have to copy the sample configuration file and rename it to pgpool.conf.

The default options are all right; we just have to set the database info. Set the options backend_host_name, backend_port and backend_socket_dir to match your PostgreSQL instance. By default pgpool will listen on port 5433. Start pgpool up, change the database port in your settings.py and you’re ready to go.
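For reference, the settings.py change could look roughly like this. It is a sketch in the Django 1.3 settings format; the NAME, USER and PASSWORD values are placeholders we made up, so substitute your own.

```python
# settings.py (excerpt)
# NAME, USER and PASSWORD below are placeholders, not values from the Tuitter repo.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "tuitter",
        "USER": "tuitter",
        "PASSWORD": "secret",
        "HOST": "127.0.0.1",
        "PORT": "5433",  # pgpool's port, instead of PostgreSQL's default 5432
    }
}
```

Only PORT changes compared with a direct connection; the application code stays untouched.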

Performance

Stage 2 – Django templates + pgpool

Stage 2 – Jinja2 templates + pgpool

Apache benchmark shows:

Requests per second:    30.74 [#/sec] (mean)
Time per request:       32.534 [ms] (mean)
Time per request:       32.534 [ms] (mean, across all concurrent requests)

Using Jinja2:

Requests per second:    48.93 [#/sec] (mean)
Time per request:       20.437 [ms] (mean)
Time per request:       20.437 [ms] (mean, across all concurrent requests)

This time we peaked at 31 requests/sec (49 for Jinja2). Well, that’s about 28% better (56% for Jinja2), with almost no effort! But the profiling data says that we still spend a lot of time connecting, because pgpool has to authenticate us with the database every time. It would help much more if the database server were in a different datacenter.

Stage 3 – pgBouncer

Now it starts getting interesting. pgBouncer is an event-driven connection pooler that does not authenticate every connection against the database. Instead, it authenticates you itself, using PostgreSQL’s auth file or even your own text file with users and passwords. This should make our application perform A LOT better.

Installing pgBouncer

Use our script to compile it from source (Debian/Ubuntu-compatible) or do it yourself.

pgBouncer configuration

Modify the configuration file. If you do not have one yet, copy the sample configuration file from etc/pgbouncer.ini in the source tree to /usr/local/etc/pgbouncer.ini and edit that:

  1. Modify the [databases] section to look like this (if your server is on your local machine on the default port):
    [databases]
    * = host=127.0.0.1 port=5432
  2. In [pgbouncer] section:
    1. Set the listening port for example to:
      listen_port = 6432
    2. Set the authorization file, if using PostgreSQL 8.x on default settings:
      auth_file = /var/lib/postgresql/<your postgres version, e.g. 8.3>/main/global/pg_auth
    3. Set the log and pid files:
      logfile = /var/log/pgbouncer.log
      pidfile = /tmp/pgbouncer.pid
    4. You have to set the user pgBouncer will run as:
      user = postgres

      You can choose any user, as long as it’s not root. We will not dwell on security here.

  3. Run by executing:
    su -l postgres -c "pgbouncer -d /usr/local/etc/pgbouncer.ini"
  4. Modify the settings.py file, set the database port accordingly (in this example – 6432).
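Before pointing Django at the pooler, it can be worth checking that something is actually listening on the chosen port. A small sketch, assuming pgBouncer runs on 127.0.0.1:6432 as in the steps above (`port_is_open` is a hypothetical helper of ours, not part of pgBouncer or Django):

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 6432 is the listen_port chosen in the configuration steps above.
    print("pgBouncer reachable:", port_is_open("127.0.0.1", 6432))
```

This only shows that the port accepts connections. It does not prove that authentication works or that the process behind the port is pgBouncer, so treat it as a first smoke test.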

Done. Let’s get it smoking:

Performance

Stage 3 – Django templates + pgBouncer

Stage 3 – Jinja2 templates + pgBouncer

Apache Benchmark shows:

Requests per second:    36.44 [#/sec] (mean)
Time per request:       27.445 [ms] (mean)
Time per request:       27.445 [ms] (mean, across all concurrent requests)

Using Jinja2:

Requests per second:    70.99 [#/sec] (mean)
Time per request:       14.086 [ms] (mean)
Time per request:       14.086 [ms] (mean, across all concurrent requests)

Well well well, it peaked at 37 requests/second, which is 50% better than what we started with! With Jinja2: 71 req/s, more than twice the baseline. It will be even better for you: this server has only one CPU core, and we hit another limit. This time we’re CPU-bound, not I/O-bound (as a side effect, that’s why Jinja2 is 2 times faster than Django templates this time).
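To put all three stages side by side, the relative gains can be computed from the Apache Benchmark figures reported above. A short sketch (the dictionary layout and the function name are ours):

```python
# Requests per second reported by Apache Benchmark in each stage.
RESULTS = {
    "Django templates": {"baseline": 24.04, "pgpool": 30.74, "pgBouncer": 36.44},
    "Jinja2 templates": {"baseline": 31.40, "pgpool": 48.93, "pgBouncer": 70.99},
}


def improvement_percent(before, after):
    """Relative throughput gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


for engine, rps in RESULTS.items():
    for stage in ("pgpool", "pgBouncer"):
        gain = improvement_percent(rps["baseline"], rps[stage])
        print("%s + %s: %+.1f%%" % (engine, stage, gain))
```

With these numbers, pgBouncer gives roughly +52% for Django templates and roughly +126% for Jinja2, which is where the "2x" in the introduction comes from.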

Next week Tomek Kopczuk (@tkopczuk) will write about running Django on Heroku’s brand new Cedar stack and getting the most out of it (6 times more, in fact).

Comments

  • http://twitter.com/pcapr kowsik

    Great blog! Thanks for the plug on http://blitz.io. Do let us know how we can continue to improve our service!

    [edit] Just gave @tkopczuk +250 blogging credits for talking us up! Your free plan now allows you to rush up to 500 concurrent users. Enjoy.

  • http://twitter.com/muniu Muniu Kariuki

    Awesome!

  • Philip Cammarata

    I like that you use Jinja2 for use in Django but how do you recommend you go about integrating it?  I’ve seen a few options such as the template_loader, coffin and middleware.  How does ATP do it?

  • http://www.askthepony.com/blog/ Marcin Mincer

    Pony says – go for Coffin. It eases the transition quite a bit for no apparent loss if compared to going for Jinja2 alone. Good luck!

  • Philip Cammarata

    Thanks! I’ll check out Coffin first.

  • Anonymous

    In your `InstrumentMiddleware`, you can simply use: `stats = pstats.Stats(request.profiler, stream=stream)`, instead of creating a temporary file and later deleting it.

  • http://www.askthepony.com/blog Tomek Kopczuk

    True, well spotted!

  • http://www.facebook.com/amir.fru Amir Fruchtman

    great post !

Copyright © 2011 Blade Polska
