Jun
19
DB2 Express-C 9.7 and the Django adapter released
Filed Under DB2, Django, Ruby, Ruby on Rails | 9 Comments
This is a great day for those of us who love DB2, as DB2 Express-C 9.7 has just been released. As mentioned before, this is the best DB2 ever, and an extremely important release.
To learn more about what’s new in this release, please check out the recording of our latest webinar:
If you run Linux, Unix or Windows, download it while it’s hot.
DB2 9.7 on the Cloud
Another great aspect of this release is that for the first time ever, DB2 has been released both as a product and as a deployment on the Cloud. If you pop over to RightScale, you can get a trial account for free and should see DB2 Express-C 9.7 on both CentOS and Ubuntu within the partner catalog. RightScale has been an amazing partner and they really do wonders to simplify Cloud Computing. In ten minutes time you can be up and running on the Cloud, thanks to the templates provided.
DB2 support for Django
But the good times don’t stop there, we are also announcing the first official release of the Django adapter for DB2. It sounded crazy when I first proposed the idea within IBM back in 2006, but now it’s a reality.
You can download the .tar.gz archive from the Google Code homepage for the project, or simply by clicking here. This version fully supports the Django 1.0.2 API. For instructions on how to install it, please read the Getting started with the IBM DB Django adapter guide. The current version supports DB2 for Linux, Unix, Windows and MAC OS X, version 8.2 or higher (9.5 FP2 or higher for MAC OS X). In the future, IBM Cloudscape, Apache Derby, Informix (IDS) and both System i & z/OS will be supported.
ibm_db gem updated to 1.1
I’ll conclude this DB2-centric post with a smaller, but still interesting announcement. The ibm_db gem has been updated to version 1.1. This release includes support for ActiveRecord’s QueryCache mechanism, enhanced support for BigInt (and BigSerial), support for rename_column (requires DB2 9.7), parametrization of the timestamp datatype (requires DB2 9.7), and a few fixes and performance enhancements as well. It is recommended that you upgrade to this version.
Jun
5
Do Androids Count Electric Sheep with DB2 or MySQL?
Filed Under DB2, Mac, Ruby, Ruby on Rails | 26 Comments
Counting rows is an ubiquitous operation on the web, so much so that it’s often overused. Regardless of misuse, there is no denying that the performance of counting operations has an impact on most applications. In this post I’ll discuss my findings about the performance of DB2 9.5 and MySQL 5.1 regarding counting records.
For those of you who are not into science fiction, let me clarify that the odd title of this post is a tongue-in-cheek reference to the great novel, Do Androids Dream of Electric Sheep?.
I connected to the database, created the table, imported the data and benchmarked counting operations using ActiveRecord in a standalone script. Here is the code I used:
#!/usr/bin/env ruby
require "rubygems"
require "active_record"
require 'benchmark'
ActiveRecord::Base.establish_connection(
:adapter => :mysql,
:username => "myuser",
:password => "mypass",
:database => "mydb")
ActiveRecord::Schema.define do
create_table :people, :force => true do |t|
t.string :name, :null => false
t.string :fbid, :null => false
t.string :gender
t.string :profession
end
end
class Person < ActiveRecord::Base
end
# This can be sped up by performing an import instead
Person.transaction do
File.open("person.tsv").each_line do |line|
line = line.split(/\t/)
p = Person.new
p.name = line[0]
p.fbid = line[1]
p.gender = line[6]
p.profession = line[17]
p.save!
end
end
n = 100
Benchmark.bm(26) do |x|
x.report("Count all:") { n.times { Person.count } }
x.report("Count profession:") { n.times { Person.count(:profession) } }
x.report("Count females:") do
n.times { Person.count(:conditions => "gender = 'Female'") }
end
x.report("Count males w/ profession:") do
n.times { Person.count(:profession, :conditions => "gender = 'Male'") }
end
end
Please note that importing records in a huge transaction containing hundreds of thousands of INSERT operations is far from the most efficient way to import. Massive imports of data using the load/import facilities provided by each database is the way to go (also see the ar-extensions plugin). The lengthy import wasn’t benchmarked here though, so it isn’t determinant for this article.
people.tsv is a 92.7 MB tab separated values file that contains 875,857 records from the Freebase project (in my file I removed the header line, leaving only records).
For those who are not familiar with ActiveRecord, the queries executed behind the scenes are (in order):
SELECT count(*) AS count_all FROM people
SELECT count(people.profession) AS count_profession FROM people
SELECT count(*) AS count_all FROM people WHERE (gender = 'Female')
SELECT count(people.profession) AS count_profession FROM people WHERE (gender = 'Male')
While the table definition (for MySQL) is:
CREATE TABLE `people` (
`id` int(11) DEFAULT NULL auto_increment PRIMARY KEY,
`name` varchar(255) NOT NULL,
`fbid` varchar(255) NOT NULL,
`gender` varchar(255),
`profession` varchar(255)
) ENGINE=InnoDB
As easily verified by enabling logging with:
ActiveRecord::Base.logger = Logger.new(STDOUT)
Without much further ado, here are the times I obtained on my last generation MacBook Pro 2.66 GHz with 4 GB DDR3 RAM, and 320 GB @ 7200 rpm hard disk, running Mac OS X Leopard:
MySQL:
Count all: 42.467522
Count profession: 52.130935
Count females: 54.575469
Count males w/ profession: 64.046631
DB2:
Count all: 5.818485
Count profession: 7.714391
Count females: 8.556377
Count males w/ profession: 9.656739
Or in graph form:
That’s an impressive difference. To be exact, in this example DB2 was between 6 and 7 times faster than MySQL. In the case of COUNT(*), DB2 counted almost a million records in 58 milliseconds, or in about the blink of an eye according to Wolfram Alpha.
For those who are skeptical, please note that DB2 was not manually fine-tuned in any way. The client codepage was set to 1252 to allow Greek letters, and the log size was increased to permit such a huge transaction during the import. That’s it, no optimizations were attempted. This is DB2 Express-C out of the box. It looks like smart androids count electric sheep with DB2 after all.
The advantages of DB2 over MySQL when dealing with a massive volume of traffic are well known (and not limited to performance either), but DB2 can dramatically improve performance even for your average web application. And DB2 9.7, which will be released this month, increases the performance and the ability to self-tune itself to the available resources and required workload even further. If you’d like to try DB2 Express-C for yourself, you can download it here. It doesn’t cost you a dime to obtain and can be used for development, testing and production absolutely free of charge.
May
18
Memoization in Ruby and Python
Filed Under Python, Quick Tips, Ruby | 14 Comments
Wikipedia defines memoization as “an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously-processed inputs.”. This typically means caching the returning value of a function in a dictionary of sorts using the parameters passed to the function as a key. This is done in order to reuse that returning value immediately without calculating it again, when the function is invoked with the same arguments. Even though we are trading space for time, it is often invaluable for speeding up certain recursive functions and when dealing with dynamic programming where intermediate calls are often repeated many times.
Using memoization in Ruby is very easy thanks to the memoize gem. The first step to getting started is therefore to install it:
$ sudo gem install memoize
Successfully installed memoize-1.2.3
1 gem installed
Installing ri documentation for memoize-1.2.3...
Installing RDoc documentation for memoize-1.2.3...
Now we can use the memoize method as illustrated in the example below:
require 'rubygems'
require 'memoize'
require 'benchmark'
include Memoize
def fib(n)
return n if n < 2
fib(n-1) + fib(n-2)
end
Benchmark.bm(15) do |b|
b.report("Regular fib:") { fib(35) }
b.report("Memoized fib:") { memoize(:fib); fib(35)}
end
In the first block we simply invoke fib(35), while in the second one we first invoke the method memoize(:fib) to memoize the method fib. Running this code on my machine prints the following:
user system total real
Regular fib: 55.230000 0.160000 55.390000 ( 55.819205)
Memoized fib: 0.000000 0.000000 0.000000 ( 0.001305)
We went from almost a minute of run time to an instantaneous execution. Optionally we could even pass a file location to the function memoize and this would use marshaling to dump and load the cached values on/from disk.
For Python we can write a simple decorator that behaves in a similar manner. In its simplest form it can be implemented as follows:
# memoize.py
def memoize(function):
cache = {}
def decorated_function(*args):
try:
return cache[args]
except KeyError:
val = function(*args)
cache[args] = val
return val
return decorated_function
Or more efficiently:
# memoize.py
def memoize(function):
cache = {}
def decorated_function(*args):
if args in cache:
return cache[args]
else:
val = function(*args)
cache[args] = val
return val
return decorated_function
When the memoized function has been invoked, we look in the cache to see if an entry for the given arguments already exist. If it does, we immediately return that value. If not, we call the function, cache the results and return its returning value.
Truth be told, the limit of this approach lies in the fact that since we are using a dictionary, only immutable objects can be used as keys. For example, we can use a tuple but are not allowed to have a list as a parameter. For the example within this article, this approach will suffice, but to take advantage of memoization when using arguments that are mutable, you may want to consider the approach described in this recipe.
We can now rewrite the Ruby example above in Python as follows:
import timeit
from memoize import memoize
def fib1(n):
if n < 2:
return n
else:
return fib1(n-1) + fib1(n-2)
@memoize
def fib2(n):
if n < 2:
return n
else:
return fib2(n-1) + fib2(n-2)
t1 = timeit.Timer("fib1(35)", "from __main__ import fib1")
print t1.timeit(1)
t2 = timeit.Timer("fib2(35)", "from __main__ import fib2")
print t2.timeit(1)
Running this code on my machine prints the following:
9.32223105431
0.000314950942993
In Python 2.5’s case by employing memoization we went from more than nine seconds of run time to an instantaneous result.
Granted we don’t write Fibonacci applications for a living, but the benefits and principles behind these examples still stand and can be applied to everyday programming whenever the opportunity, and above all the need, arises.













