Sep 13
FriendFeed, which was recently acquired by Facebook, just released an interesting piece of open source software.
Tornado is an open source version of the scalable, non-blocking web server and tools that power FriendFeed. The FriendFeed application is written using a web framework that looks a bit like web.py or Google’s webapp, but with additional tools and optimizations to take advantage of the underlying non-blocking infrastructure.
The story so far
This release generated widespread interest in the Python and open source development communities, and rightfully so. There are many reasons to like Tornado. To begin with, it's fast, and that's fundamental for a web server. By using nginx as a load balancer and static file server, and running a few Tornado instances (usually one per available core), it's possible to handle thousands upon thousands of concurrent connections on relatively modest hardware. And this isn't just theory: Tornado has already proven its worth in the field by allowing FriendFeed to scale gracefully.
Tornado is not only a fast web server; it is also a very lightweight application framework. As such, it's an appealing alternative to well-established frameworks for the growing group of developers who'd like to work "closer to the metal" and avoid the baggage associated with full-fledged web frameworks. The combination of the two makes Tornado ideal for developing "real time" web services and applications.
The feedback so far hasn't been entirely positive, though. Criticism of the project has mainly focused on the lack of test coverage and on FriendFeed's decision not to contribute to, and improve on, the existing Twisted Web project (which has similar goals). To make things worse, a few rather nonchalant comments were made about it as well. Performance issues and a lack of ease of use in Twisted were the reported motivations for starting a new project from scratch.
Dustin Sallings started working on a hybrid solution (henceforth Tornado on Twisted) that would reportedly keep the good parts that Tornado introduced, while using Twisted as its core for networking and HTTP parsing.
Naturally, at this point I became curious about the relative speed of these three web servers. Is Tornado really faster than Twisted Web? And what about Tornado on Twisted: would it be faster or slower? Let's find out.
Benchmark results
I ran a simple Hello World app on all three web servers, each in standalone mode without a load balancer. I stress-tested them with httperf using a progressively larger number of concurrent requests, generating 100,000 requests for each test. The servers ran on a desktop machine with an Intel® Core™2 Quad Q6600 processor (8M cache, 2.40 GHz, 1066 MHz FSB) and 8GB of RAM. The operating system of choice was Ubuntu 9.04 (x86_64).
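The exact httperf options behind these runs aren't listed, so the following is only a sketch of how such a load sweep could be scripted. The host, port, rates and calls-per-connection values are assumptions chosen for illustration; note that httperf's --rate option sets how many new connections are opened per second, which is one common way to ramp up load. The script only prints the commands; each list could be passed to subprocess.run to actually execute it.

```python
import shlex

# Matches the post: 100,000 requests per test. Everything else is assumed.
TOTAL_REQUESTS = 100_000


def httperf_command(rate, host="localhost", port=8888, uri="/", calls_per_conn=1):
    """Build one httperf invocation that opens `rate` connections per second."""
    if TOTAL_REQUESTS % calls_per_conn:
        raise ValueError("calls_per_conn must divide TOTAL_REQUESTS evenly")
    num_conns = TOTAL_REQUESTS // calls_per_conn
    return [
        "httperf",
        "--server", host,
        "--port", str(port),
        "--uri", uri,
        "--num-conns", str(num_conns),
        "--num-calls", str(calls_per_conn),
        "--rate", str(rate),
    ]


# Progressively higher connection rates (hypothetical values).
for rate in (500, 1000, 2000, 4000):
    print(shlex.join(httperf_command(rate)))
```

Keeping num-conns times num-calls fixed at 100,000 means every step in the sweep issues the same total number of requests, so only the arrival rate changes between runs.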
Without further ado, here are the results:

As you can see, Tornado turned out to be faster than the other two Python web servers. Handling a peak of almost 3900 req/s with a single front-end on commodity hardware is nothing to sneer at.
Twisted Web didn't do too badly either (max. 2703.7 req/s), but the difference in performance is noticeable. As for Tornado on Twisted, its performance was virtually identical to that of Twisted Web.
There you have it. I was curious about the possible outcome and now I know. Remember, this is a report on the numbers I got on my machine, not a research paper. But I hope that you find them interesting nevertheless.
Show me the code
Tornado:
import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
import logging
from tornado.options import define, options
define("port", default=8888, help="run on the given port", type=int)
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world!")
def main():
    tornado.options.parse_command_line()
    application = tornado.web.Application([
        (r"/", MainHandler),
    ])
    http_server = tornado.httpserver.HTTPServer(application)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()
if __name__ == "__main__":
    main()
Twisted Web:
from twisted.internet import epollreactor
epollreactor.install()  # must run before twisted.internet.reactor is imported
from twisted.internet import reactor
from twisted.web import server, resource
class Simple(resource.Resource):
    isLeaf = True
    def render_GET(self, request):
        return b"Hello, world!"  # Twisted Web response bodies are bytes
site = server.Site(Simple())
reactor.listenTCP(8888, site)
reactor.run()
Tornado on Twisted:
from twisted.internet import epollreactor
epollreactor.install()
from twisted.internet import reactor
import tornado.options
import tornado.twister
import tornado.web
import logging
from tornado.options import define, options
define("port", default=8888, help="run on the given port", type=int)
class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello, world!")
def main():
    tornado.options.parse_command_line()
    application = tornado.web.Application([
        (r"/", MainHandler),
    ])
    site = tornado.twister.TornadoSite(application)
    reactor.listenTCP(options.port, site)
    reactor.run()
if __name__ == "__main__":
    main()
UPDATE (September 14, 2009):
- The original version of this post included Unicorn as well. This wasn't fair, however, since it's not an asynchronous web server.
- EventMachine HTTP Server was added, but I have since decided to remove it, as I prefer to keep the article a fair comparison between asynchronous Python web servers.
- I initially used Apache Benchmark (ab). The results were misleading at best. I re-ran the tests with httperf and updated the results above.
- Stock Tornado couldn't be tested with httperf because its HTTP server doesn't implement getClientIP(). I had to manually modify a method to return the remote IP address. This may give Tornado a very slight advantage, but it should be negligible in this context.
- I modified the Twisted and Tornado on Twisted examples to ensure that both took advantage of the epoll-based reactor.
Sep 9
Improve the speed and security of your SQL queries
Filed Under DB2, Python, Ruby
An easy way to improve the performance and security of SQL queries is to replace literals with parameters. By replacing literal values with parameters, advanced relational databases will be able to compile your queries and have their execution plans cached. This saves time and precious resources when the same query (minus the actual values) is executed over and over.
Consider the following series of queries:
SELECT * FROM users WHERE karma BETWEEN 100 AND 499;
SELECT * FROM users WHERE karma BETWEEN 500 AND 999;
SELECT * FROM users WHERE karma BETWEEN 1000 AND 1999;
SELECT * FROM users WHERE karma BETWEEN 2000 AND 4999;
SELECT * FROM users WHERE karma BETWEEN 5000 AND 9999;
SELECT * FROM users WHERE karma BETWEEN 10000 AND 50000;
These are all the same query with different values, and they can be transformed into a single parameterized query:
SELECT * FROM users WHERE karma BETWEEN ? AND ?;
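To make the idea concrete without a DB2 server, here is a minimal sketch using Python's built-in sqlite3 module, which also uses ? placeholders. SQLite stands in for DB2 here, and the users table and its rows are made up for illustration. The same SQL text is executed for every range and only the bound values change; sqlite3 keeps a small per-connection cache of compiled statements keyed on the SQL text, so repeated executions can reuse the compiled form instead of parsing the query again.

```python
import sqlite3

# In-memory SQLite database standing in for the DB2 "users" table (assumed layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, karma INTEGER)")
conn.executemany(
    "INSERT INTO users (name, karma) VALUES (?, ?)",
    [("alice", 150), ("bob", 750), ("carol", 1200), ("dave", 7000)],
)

# One statement text, reused with different parameter values each time.
QUERY = "SELECT name FROM users WHERE karma BETWEEN ? AND ? ORDER BY name"
RANGES = [(100, 499), (500, 999), (1000, 1999),
          (2000, 4999), (5000, 9999), (10000, 50000)]

results = {}
for low, high in RANGES:
    results[(low, high)] = [row[0] for row in conn.execute(QUERY, (low, high))]

for bounds, names in results.items():
    print(bounds, names)
```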
With parameters, clever tricks with quotes aimed at injecting arbitrary SQL code become futile: parameters are treated as values and have no effect on the structure of the query itself.
Parameterized queries are therefore efficient and go a long way towards preventing SQL injection attacks in your applications. They have virtually no downside.
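The difference is easy to demonstrate. The sketch below again uses sqlite3 as a stand-in for DB2, with a made-up table and a hypothetical hostile input. Pasting the input into the SQL text lets it rewrite the WHERE clause, while binding it as a parameter makes the database compare it as an ordinary string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, karma INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 150), ("bob", 750)])

# A hostile "user name" that tries to turn the WHERE clause into a tautology.
attack = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and changes the query's structure.
unsafe_sql = "SELECT name FROM users WHERE name = '%s'" % attack
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the input is bound as a parameter and compared as a plain value.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (attack,)).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The unsafe query returns every row in the table, while the parameterized one returns none, because no user is literally named `nobody' OR '1'='1`.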
Newbie developers are often unaware of this feature and end up irritating seasoned DBAs, who have to deal with the consequences of their incompetence. Leon Katsnelson argues that this matter is so important that every DBA should forward this Computerworld article to their developers. I tend to agree that it's an important issue.
That article provides the following example in Java:
String lastName = req.getParameter("lastName");
String query = "select * from customers where last_name = ?";
PreparedStatement pstmt = connection.prepareStatement(query);
pstmt.setString(1, lastName);
ResultSet results = pstmt.executeQuery();
Here I’ll show you an example of how to work with parameterized queries from Ruby and Python. I’ll use the Ruby and Python drivers for DB2.
Ruby first:
require 'ibm_db'
conn = IBM_DB.connect("mydb", "db2inst1", "mypassword")
query = "SELECT * FROM users WHERE karma BETWEEN ? AND ?"
pstmt = IBM_DB.prepare(conn, query)
values = [500, 999]
IBM_DB.execute(pstmt, values)
while row = IBM_DB.fetch_array(pstmt)
  puts "#{row[0]}:#{row[1]}"
end
We load the driver (use mswin32/ibm_db on Windows, and ibm_db.bundle on Mac), create a prepared statement, and then bind the two parameter values to it through the execute method. We then fetch the result set one row at a time and print the values of the first two fields of each record. For finer-grained control we could have used the IBM_DB::bind_param method.
The Python version is very similar:
import ibm_db
conn = ibm_db.connect("mydb", "db2inst1", "mypassword")
query = "SELECT * FROM users WHERE karma BETWEEN ? AND ?"
pstmt = ibm_db.prepare(conn, query)
values = (500, 999)
ibm_db.execute(pstmt, values)
row = ibm_db.fetch_tuple(pstmt)
while row:
    # Format instead of "+" so non-string columns (e.g. karma) don't raise TypeError
    print("%s:%s" % (row[0], row[1]))
    row = ibm_db.fetch_tuple(pstmt)
As you can see, working with parameterized queries is not any harder than dynamically generating SQL queries. Yet the benefits of doing so are huge.
Unfortunately, ActiveRecord, despite being a very sound choice to base an Object-Relational Mapper (ORM) on, does not use parameterized queries. Even when it looks like you are passing parameters to a given method, they are actually interpolated into a dynamically built SQL query. Of course, you are still free to use parameterized queries in your Rails applications by employing the driver directly. But I really think this is something ActiveRecord should be built upon.
Luckily for Django developers, Django’s ORM uses parameterized queries, thus improving both performance and security with a single design choice. In the Python world you couldn’t get away with ignoring parameterized queries.
For those of you using Rails, all is not lost. DB2 Express-C 9.7 has a killer feature known as the Statement Concentrator, which lets statements that differ only in their literal values share a single access plan. It's not as efficient as using prepared statements in your code, but it's the best you can do when, as with ActiveRecord, you can't use parameterized queries directly. Leon's article explains in greater detail how this feature works.
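The core idea behind a statement concentrator can be illustrated with a toy sketch: strip the literal values out of each incoming statement and use the resulting text as a cache key, so statements that differ only in their literals map to the same "plan". This is a deliberately simplified illustration of the concept, not how DB2 implements it; the regular expression handles only single-quoted strings and plain numbers, and the "plan" strings are placeholders for real compiled access plans.

```python
import re

# Toy literal matcher: single-quoted strings (with '' escapes) or numbers.
# The \b anchors keep digits inside identifiers such as "users2" intact.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")

plan_cache = {}  # normalized SQL text -> stand-in for a compiled access plan


def concentrate(sql):
    """Replace literal values with ? so similar statements share one key."""
    return LITERAL.sub("?", sql)


def get_plan(sql):
    """Return the cached 'plan' for sql, creating one the first time it is seen."""
    key = concentrate(sql)
    if key not in plan_cache:
        plan_cache[key] = "plan#%d" % (len(plan_cache) + 1)
    return plan_cache[key]


queries = [
    "SELECT * FROM users WHERE karma BETWEEN 100 AND 499",
    "SELECT * FROM users WHERE karma BETWEEN 500 AND 999",
    "SELECT * FROM users WHERE karma BETWEEN 1000 AND 1999",
    "SELECT * FROM users WHERE name = 'O''Brien'",
]
for q in queries:
    print(get_plan(q), concentrate(q))
```

The three karma queries share one cache entry, while the name lookup gets its own. A real concentrator has to do this on the server for every statement it receives, which is part of why it is less efficient than preparing the statement once in application code.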
Sep 8
Enabling support for DB2 and Python/Django/SQLAlchemy on Mac OS X Snow Leopard
Filed Under DB2, Django, Python
This is the Python version of a post I made about Ruby a few days ago.
Now that Mac OS X 10.6 is out, it's time to leave the world of 32-bit computing behind. The pre-installed Python interpreter runs in 64-bit mode by default, so you may need to pay attention when installing C-based eggs.
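Before building a C extension such as ibm_db, it can be worth confirming which mode your interpreter is actually running in, so that the architecture you build for (for example via ARCHFLAGS) matches it. A minimal check, using only the standard library, might look like this:

```python
import struct
import sys

# Size of a C pointer in bits: 64 for a 64-bit interpreter, 32 for a 32-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
is_64bit = sys.maxsize > 2**32

print("Python is running in %d-bit mode" % pointer_bits)
```

Both checks should agree; on a stock Snow Leopard interpreter you would expect 64-bit mode, and therefore a 64-bit DB2 client library to link against.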
Assuming you have DB2 Express-C installed already, the ibm_db Python egg for DB2 can easily be installed by following these simple steps:
$ sudo -s
$ export IBM_DB_LIB=/Users/<username>/sqllib/lib64
$ export IBM_DB_DIR=/Users/<username>/sqllib
$ export ARCHFLAGS="-arch x86_64"
$ easy_install ibm_db
This will install the ibm_db C driver and the ibm_db_dbi Python module, which complies with the DB-API 2.0 specification.
You can verify that the installation was successful by running the following:
$ python
>>> import ibm_db
>>>
Now, for the Django adapter, install Django first (if you haven’t done so already):
$ sudo easy_install django
The Django adapter can then be installed as follows:
$ sudo easy_install ibm_db_django
Finally, if you have installed SQLAlchemy and wish to install the DB2 adapter for it, run:
$ sudo easy_install ibm_db_sa
Please let me know if you encounter any issues; I'd be glad to help.