Poor Man's ESI with Nginx SSIs and Django
Traditional ESI
ESI (Edge Side Includes) is a markup language used by proxy caches such as Squid and Varnish to mark which parts of a page must be fetched dynamically. It lets the cache serve a static or cached version of a page, with only certain elements generated on the server side.
Like many application developers, I'm cheap and lazy: I don't really have the resources to run a separate frontend cache in front of the webserver. It would also add an obvious new layer of complexity when debugging issues on the production deployment. I like to keep my servers simple and clean, with Nginx proxying to FastCGI, uWSGI, or Gunicorn as the Python server.
Nginx SSIs
Recently I found out that Nginx has a solid server-side include (SSI) implementation, and I considered using it in much the same way as a traditional ESI system. It turns out it works almost exactly like ESI, except that it is easy to configure and uses far fewer server resources than running another daemon for proxy caching.
Here's some example usage from the Nginx wiki:
```nginx
location / {
    ssi on;  # pretty freaking simple
}
```
When this is enabled on a location, Nginx scans the response for the handful of tags that the SSI module defines. The most useful of these is the include tag:
```html
<!--# include virtual="/some/url?with=getvars" -->
```
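As a rough illustration of what Nginx looks for, here is a small Python sketch that pulls the virtual paths out of a page. The pattern is an assumption that only covers the include virtual form shown above, not the full SSI syntax (echo, set, if and the rest are ignored).

```python
import re

# Assumed pattern: matches only <!--# include virtual="..." --> (single or
# double quotes); other SSI commands are deliberately ignored.
INCLUDE_TAG = re.compile(r'<!--#\s*include\s+virtual=(["\'])(?P<path>.+?)\1\s*-->')


def find_includes(html):
    """Return the virtual path of every SSI include tag, in page order."""
    return [m.group("path") for m in INCLUDE_TAG.finditer(html)]


page = (
    '<div id="header"><!--# include virtual="/header/" --></div>'
    '<div id="cart"><!--# include virtual="/cart/?items=3" --></div>'
)
print(find_includes(page))  # ['/header/', '/cart/?items=3']
```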
On their own, these tags seem somewhat irrelevant unless you are serving static HTML documents that use them for dynamic elements. The cool part is that ssi on; also works for memcached and proxied content. For example:
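```nginx
location / {
    ssi on;
    # memcached_pass needs a key; Django must store pages under the same one.
    set $memcached_key "$uri";
    # Content-Type comes from the URI extension or default_type; SSI only runs on text/html.
    default_type text/html;
    memcached_pass 127.0.0.1:11211;
    error_page 404 405 502 = @app_server;
}
```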
Nginx pulls the pre-rendered document from memcached, then processes any SSI tags in it before serving it to the user. If the key isn't in memcached, the memcached module answers 404, and error_page hands the request to @app_server instead. Each include is an extra request on the server side, but the client still makes only one request. The internal request to the Python app carries all the request headers and variables that were sent with the original request, which means you could use dynamic forms within SSI requests. Pretty cool stuff.
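To make the flow concrete, here is a toy Python model of it. It is a simulation under stated assumptions, not how Nginx works internally: a dict stands in for memcached, and a hypothetical app function stands in for the Python backend behind @app_server.

```python
import re

INCLUDE_TAG = re.compile(r'<!--#\s*include\s+virtual=(["\'])(?P<path>.+?)\1\s*-->')


def serve(uri, cache, app):
    """Toy model of Nginx answering one client request for uri.

    cache: dict standing in for memcached (key -> page with SSI tags).
    app:   callable standing in for the Python backend (@app_server).
    """
    cached = cache.get(uri)
    if cached is None:
        # Miss: memcached answers 404, so error_page hands the whole
        # request to the app, which returns a fully resolved page.
        return app(uri)
    # Hit: each include tag becomes an internal subrequest to the app.
    return INCLUDE_TAG.sub(lambda m: app(m.group("path")), cached)


def app(uri):
    # Hypothetical backend; only the greeting fragment is user specific.
    pages = {
        "/": "<h1>Home</h1>Hello, alice",
        "/greeting/": "Hello, alice",
    }
    return pages[uri]


cache = {"/": '<h1>Home</h1><!--# include virtual="/greeting/" -->'}
print(serve("/", cache, app))  # <h1>Home</h1>Hello, alice (tag resolved)
print(serve("/", {}, app))     # <h1>Home</h1>Hello, alice (from the app)
```

On a cache hit the app only renders the small user-specific fragment; on a miss it renders the whole page.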
Where does Django fit in?
You've probably already noticed that none of this is specific to any one server-side technology; the above would be just as useful to a PHP developer.
The tricky part is populating the cache in the first place. You wouldn't want Python to return a page with SSI tags in it, because Nginx would then turn right around and make another request to Python for each tag. The overhead of those extra connections is wasteful when Python was already busy building the page. Ideally you serve the complete page to the user and send a copy full of SSI tags to memcached for subsequent requests. This is where Django fits in.
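Stripped of Django, the idea boils down to rendering once and producing two copies. Here is a minimal sketch of that split; the respond function, the fragments mapping, and the dict standing in for memcached are all hypothetical. Keep in mind that Nginx's memcached_pass returns whatever is stored under the key as the response body, so the tagged copy has to be stored as plain HTML under exactly the key Nginx looks up.

```python
import re

INCLUDE_TAG = re.compile(r'<!--#\s*include\s+virtual=(["\'])(?P<path>.+?)\1\s*-->')


def respond(key, tagged_html, fragments, store):
    """Cache the tagged page for Nginx and return the resolved page.

    fragments: hypothetical dict of include path -> HTML that Python has
               already rendered while building this request.
    store:     any callable taking (key, value), e.g. a memcached client's
               set method; key must equal the $memcached_key Nginx looks up.
    """
    # Copy 1: keep the SSI tags so Nginx can resolve them on later hits.
    store(key, tagged_html)
    # Copy 2: resolve the tags in-process, so the user gets the full page
    # without another round trip through Nginx.
    return INCLUDE_TAG.sub(lambda m: fragments[m.group("path")], tagged_html)


memcache = {}  # stand-in for a real memcached client
page = 'Hi <!--# include virtual="/user/" -->!'
print(respond("/", page, {"/user/": "alice"}, memcache.__setitem__))  # Hi alice!
print(memcache["/"])  # Hi <!--# include virtual="/user/" -->!
```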
The Way I Did It
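```python
import copy
import re
from urllib.parse import urlsplit

from django.http import QueryDict
from django.urls import Resolver404, resolve
from django.utils.deprecation import MiddlewareMixin

# Matches <!--# include virtual="/some/url?with=getvars" -->
INCLUDE_TAG = re.compile(
    rb'<!--#\s*include\s+virtual=(["\'])(?P<path>.+?)\1\s*-->')


class NginxSSIMiddleware(MiddlewareMixin):
    '''
    Emulates the Nginx SSI module for when a page is rendered from Python.
    SSI include tags are cached for serving directly from Nginx, but if the
    page is being built for the first time, we just serve these directly
    from Python without having to make another request.

    Takes a response object and returns the response with Nginx SSI
    include tags resolved.
    '''

    def process_response(self, request, response):
        # Only rewrite complete (non-streaming) HTML responses.
        if response.streaming or 'text/html' not in response.get('Content-Type', ''):
            return response

        def get_tag_response(match):
            url = urlsplit(match.group('path').decode('utf-8'))
            try:
                # resolve() needs the path without the query string.
                view, args, kwargs = resolve(url.path)
            except Resolver404:
                # Leave the tag alone instead of dumping the URL into the page.
                return match.group(0)
            # Like an Nginx subrequest: same headers, but the tag's query string.
            sub_request = copy.copy(request)
            sub_request.GET = QueryDict(url.query)
            sub_response = view(sub_request, *args, **kwargs)
            if hasattr(sub_response, 'render'):
                sub_response.render()  # TemplateResponse renders lazily
            return sub_response.content

        response.content = INCLUDE_TAG.sub(get_tag_response, response.content)
        return response
```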
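I placed this middleware so that it processes the response after my caching middleware does (Django runs response middleware in reverse order of the settings list), so the tags are sent to the cache. Then, before the page is served back, it finds all the include tags, resolves each URL, and replaces each tag with the response of the view associated with that URL.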
There are a couple of downsides to this: the middleware only implements the include tag of the Nginx SSI module, while there are many other useful ones, and any dynamic blocks of content have to be exposed on separate URLs. Neither of these is a deal breaker in my application, but you may want to take this middleware to another level for your own setup and needs.
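Since every dynamic block needs its own URL, it helps to generate the tags rather than hand-write them. Here is a hypothetical helper (ssi_include is not part of Django or Nginx) that builds a tag with a properly encoded query string, assuming the path itself is a trusted URL without quotes.

```python
from urllib.parse import urlencode


def ssi_include(path, **params):
    """Return an Nginx SSI include tag for path, with params as a query string.

    Hypothetical helper: path is assumed to be a trusted URL (no quotes);
    params are URL-encoded so '&', '=', quotes and spaces stay valid.
    """
    if params:
        path = "%s?%s" % (path, urlencode(params))
    return '<!--# include virtual="%s" -->' % path


print(ssi_include("/cart/"))
# <!--# include virtual="/cart/" -->
print(ssi_include("/search/", q="nginx ssi"))
# <!--# include virtual="/search/?q=nginx+ssi" -->
```

If you expose a helper like this as a Django simple_tag, wrap the return value in mark_safe, because simple_tag output is autoescaped and the comment markers would otherwise be escaped.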
Why Would I Do This?
In my opinion, this is the only way to properly have user-specific content on fully cached pages. There are a couple of other approaches, but both have drawbacks:
- Two-phase template rendering
  - This means pulling the rendered page from memcached in Python and then substituting dynamic content into it before serving it to the user.
  - Usually ends up with some kind of non-standard comment tag (<!-- username -->) that gets replaced with ugly regexes.
  - Adds latency while Python contacts the memcached server, renders the template, and then sends the entire response back through the webserver.
- AJAX requests
  - Caching the entire page and then making separate client-side requests for user-specific data.
  - The user sees a slow, piecemeal render.
  - Multiple requests to the server.
  - Completely dependent on the client having JavaScript enabled.
  - More room for security issues.
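For comparison, the heart of two-phase rendering is a second pass like the one sketched here. The <!-- name --> placeholder convention, the second_phase function, and the dict standing in for memcached are hypothetical. Every request still has to pass through Python to run this pass, which is the latency cost listed above.

```python
import html
import re

# Hypothetical placeholder convention: <!-- name --> marks a dynamic value.
# SSI commands start with '<!--#', so they never match \w+ here.
PLACEHOLDER = re.compile(r"<!--\s*(\w+)\s*-->")


def second_phase(cached_html, context):
    """Second rendering pass: fill placeholders in a page taken from the cache."""

    def fill(match):
        name = match.group(1)
        if name not in context:
            return match.group(0)  # leave unknown placeholders untouched
        return html.escape(str(context[name]))  # user data must be escaped

    return PLACEHOLDER.sub(fill, cached_html)


cache = {"/": "<p>Welcome back, <!-- username --></p>"}  # stand-in for memcached
print(second_phase(cache["/"], {"username": "alice"}))
# <p>Welcome back, alice</p>
```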
I'd love to hear how you're dealing with this issue; let me know in the comments.
Posted on June 18, 2010
Tags: caching, django, memcached, nginx, python, scaling, server