
Monte Carlo simulation of the Monty Hall Problem in Ruby and Python

Antonio Cangiano January 1st, 2009

Reading Jeff Atwood’s post The Problem of the Unfinished Game reminded me of a similar problem. The Monty Hall Problem is a well-known probability puzzle that has tricked many people. In fact, if you are not familiar with it already, chances are that you’ll get it wrong. And you would be in good company, along with many mathematicians and physicists, including the great mathematician Paul Erdős. This puzzle is loosely based on the television show Let’s Make a Deal, and is equivalent to some much older puzzles you may be familiar with (e.g. the three prisoners problem). In its simplest form, it asks the following question:

Suppose you’re on a game show, and you’re given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what’s behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice?

This statement of the problem is admittedly ambiguous: it doesn’t say whether the host must always open a door, or how he chooses which one to open. Thankfully, Wikipedia points us towards a more exact definition:

Suppose you’re on a game show and you’re given the choice of three doors. Behind one door is a car; behind the others, goats [that is, booby prizes]. The car and the goats were placed randomly behind the doors before the show. The rules of the game show are as follows: After you have chosen a door, the door remains closed for the time being. The game show host, Monty Hall, who knows what is behind the doors, now has to open one of the two remaining doors, and the door he opens must have a goat behind it. If both remaining doors have goats behind them, he chooses one randomly. After Monty Hall opens a door with a goat, he will ask you to decide whether you want to stay with your first choice or to switch to the last remaining door. Imagine that you chose Door 1 and the host opens Door 3, which has a goat. He then asks you “Do you want to switch to Door Number 2?” Is it to your advantage to change your choice?

The Monty Hall Problem

Think about it for a moment, then read on. To answer this question, most people try to determine which of the two possible outcomes is more likely. The trouble lies in calculating the probability of these two events correctly. There are two closed doors and the car could be behind either of them. Hence, most people’s “common sense” leads them to believe that there is a 50% chance that the car is behind the initially selected door, and a 50% chance that it’s behind the other closed door offered up by Monty. At first glance, it would seem that switching or staying with the first choice doesn’t really make a difference.

Unfortunately, that’s not the right answer. The correct answer is that switching to the other door gives you a two-in-three chance of winning, so switching is always to your advantage. This result is considered a paradox because it runs counter to the way many people think. It is in fact so counterintuitive that most people will argue with you in an attempt to convince you otherwise. I invite you to check out the Wikipedia entry on the problem/paradox for a step-by-step explanation, with figures, of why switching gives you about a 66.7% chance of winning the car, while staying with the initial choice gives you only a 33.3% success rate.

When you make your first choice, your probability of winning the car is only 1/3. If you decide to switch, you will win exactly when your first choice was wrong. And since your first choice came with a 2 out of 3 chance of picking a goat, switching will then (logically) give you a 2/3 chance of winning. Another easy way to come to intuitively accept this surprising result is to wildly exaggerate the terms of the problem. If there were a billion doors, you picked one, and then Monty proceeded to open all the remaining doors but one, it would be extremely unlikely that you had picked the right door at the beginning, and extremely likely that the remaining closed door was the one concealing the car.
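The argument above can also be checked without any randomness at all. The following is a minimal sketch (exact_odds is a hypothetical helper, not part of the simulation below) that enumerates every equally likely combination of car position, first pick and the door Monty leaves closed, generalized to n doors as in the exaggerated version. It assumes, as in the Wikipedia rules quoted earlier, that Monty never reveals the car and chooses uniformly at random whenever he has a choice.

```ruby
# Exact Monty Hall odds for n doors, computed by enumerating every
# (car position, first pick, door left closed by the host) combination.
# Assumption: the host opens every door except the player's pick and one
# other, never revealing the car; when the pick hides the car, he leaves
# one of the goat doors closed, chosen uniformly at random.
def exact_odds(n)
  doors = (0...n).to_a
  stay = Rational(0)
  switch = Rational(0)

  doors.each do |car|      # each car position has probability 1/n
    doors.each do |pick|   # each first pick has probability 1/n
      # Doors the host may leave closed besides the player's pick.
      candidates = pick == car ? doors - [pick] : [car]
      candidates.each do |kept_closed|
        prob = Rational(1, n * n) / candidates.size
        stay += prob if pick == car
        switch += prob if kept_closed == car
      end
    end
  end

  [stay, switch]
end

p exact_odds(3)    # => [(1/3), (2/3)]
p exact_odds(100)  # => [(1/100), (99/100)]
```

The pattern of 1/n for staying versus (n - 1)/n for switching is what the billion-door version pushes to the extreme; enumerating a billion doors this way would take far too long, so the sketch sticks to small values of n.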

Even after reading several explanations and visual aids, some people remain skeptical or refuse to believe these results. Let’s verify the outcome with a simulation.

Below is a quick Ruby script that I wrote to run a Monte Carlo simulation of the Monty Hall problem/paradox. By default it plays the game a million times (a different number can be passed as the first command-line argument), then measures how many times the player won by sticking with their first choice, and how many times switching would have led to winning the car.

#!/usr/bin/env ruby -w

# Monte Carlo simulation for the Monty Hall Problem:
# http://en.wikipedia.org/wiki/Monty_Hall_problem

=begin
When using a Ruby version older than 1.9
define the following two methods:

  class Array
    def shuffle
      self.sort_by { rand }
    end
    
    def sample
      self.shuffle.first
    end
  end
=end

# Utility class for the simulation of a single Monty Hall game.
class MontyHall
  def initialize
    @doors = ['car', 'goat', 'goat'].shuffle
  end

  # Return a number representing the player's first choice.
  def pick_door
    return rand(3)
  end

  # Return the index of the door opened by the host.
  # This cannot represent a door hiding a car or the player's chosen door.
  def reveal_door(pick)
    available_doors = [0, 1, 2]
    available_doors.delete(pick)
    available_doors.delete(@doors.index('car'))
    # Array#choice was renamed to Array#sample in Ruby 1.9.
    return available_doors.sample
  end

  # Return true if the player won by staying
  # with their first choice, false otherwise.
  def staying_wins?(pick)
    won?(pick)
  end

  # Return true if the player won by switching, false otherwise.
  def switching_wins?(pick, open_door)
    switched_pick = ([0, 1, 2] - [open_door, pick]).first
    won?(switched_pick)
  end

  private

  # Return true if the player's final pick hides a car, false otherwise.
  def won?(pick)
    @doors[pick] == 'car'
  end
end

if __FILE__ == $0
  ITERATIONS = (ARGV.shift || 1_000_000).to_i
  staying = 0
  switching = 0

  ITERATIONS.times do
    mh = MontyHall.new
    picked = mh.pick_door
    revealed = mh.reveal_door(picked)
    staying += 1 if mh.staying_wins?(picked)
    switching += 1 if mh.switching_wins?(picked, revealed)
  end

  staying_rate = (staying.to_f / ITERATIONS) * 100
  switching_rate = (switching.to_f / ITERATIONS) * 100

  puts "Staying: #{staying_rate}%."
  puts "Switching: #{switching_rate}%."
end

And here is an “equivalent” version I wrote in Python:

#!/usr/bin/env python3
"""
Monte Carlo simulation for the Monty Hall Problem:
http://en.wikipedia.org/wiki/Monty_Hall_problem.
"""
import sys
from random import randrange, shuffle, choice

DOORS = ['car', 'goat', 'goat']

def pick_door():
    """Return a number representing the player's first choice."""
    return randrange(3)

def reveal_door(pick):
    """Return the index of the door opened by the host.
    This cannot be a door hiding a car or the player's chosen door.
    """
    all_doors = {0, 1, 2}
    unavailable_doors = {DOORS.index('car'), pick}
    available_doors = list(all_doors - unavailable_doors)
    return choice(available_doors)

def staying_wins(pick):
    """Return True if the player won by staying
    with their first choice, False otherwise.
    """
    return won(pick)

def switching_wins(pick, open_door):
    """Return True if the player won by switching,
    False otherwise.
    """
    other_doors = {pick, open_door}
    switched_pick = ({0, 1, 2} - other_doors).pop()
    return won(switched_pick)

def won(pick):
    """Return True if the player's final pick hides a car,
    False otherwise.
    """
    return DOORS[pick] == 'car'

def main(iterations=1000000):
    """Run the main simulation as many
    times as specified by the function argument.
    """
    switching = 0
    staying = 0

    for _ in range(iterations):
        # Place the car and the goats randomly before every game,
        # as MontyHall#initialize does in the Ruby version.
        shuffle(DOORS)
        picked = pick_door()
        revealed = reveal_door(picked)
        if staying_wins(picked):
            staying += 1
        if switching_wins(picked, revealed):
            switching += 1

    staying_rate = (staying / iterations) * 100
    switching_rate = (switching / iterations) * 100

    print("Staying: %f%%" % staying_rate)
    print("Switching: %f%%" % switching_rate)

if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))
    else:
        main()

Even if you are not familiar with Ruby or Python, you may be able to understand what’s going on here. The main body of the program emulates the game and keeps track of the number of victories when the player sticks with their initial choice, and when they switch. Notice that this code intentionally tries not to be clever, in order not to annoy “skeptical” people.

There are many points in the code where correct assumptions about the problem would lead to code that is faster and much more compact. For example, if the player wins a given game by sticking with their first answer, switching would have made them lose, and vice versa. We could simply subtract the success rate of staying with the first choice from 100% to obtain the success rate for switching. But here we are trying to simulate the problem as faithfully as possible, abstracting away as little as necessary.
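To see why that shortcut is valid, here is a compact sketch (play_once is a hypothetical helper, not part of the script above) that plays many games with a seeded Random and counts the games in which both strategies, or neither, would have won. It assumes Ruby 1.9.3 or later, for the random: keyword accepted by Array#shuffle and Array#sample.

```ruby
# Play one game of Monty Hall and return [staying_won, switching_won].
# rng is a Random instance, so a fixed seed makes runs reproducible.
def play_once(rng)
  doors = ['car', 'goat', 'goat'].shuffle(random: rng)
  pick = rng.rand(3)
  # The host opens a goat door that is not the player's pick.
  opened = ([0, 1, 2] - [pick, doors.index('car')]).sample(random: rng)
  switched = ([0, 1, 2] - [pick, opened]).first
  [doors[pick] == 'car', doors[switched] == 'car']
end

rng = Random.new(2009)
games = Array.new(100_000) { play_once(rng) }

# In every game exactly one of the two strategies wins...
mixed = games.count { |stay, switch| stay == switch }
puts "Games where both or neither strategy won: #{mixed}"  # => 0

# ...so the switching rate is just 100% minus the staying rate.
staying_rate = games.count(&:first) * 100.0 / games.size
switching_rate = games.count(&:last) * 100.0 / games.size
puts "Staying: #{staying_rate.round(3)}%, switching: #{switching_rate.round(3)}%"
```

Because the two counts always add up to the number of games, tracking both (as the original script does) is redundant, but it keeps the simulation closer to the problem as stated.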

As always with Monte Carlo simulations, the outcome varies slightly from run to run since it depends on random input; but by the law of large numbers, it will converge, albeit slowly, to the expected values (despite the pseudo-randomness used here). For example, when I executed the code above for the first time on my machine, I obtained the following:

Staying: 33.382%.
Switching: 66.618%.
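How close should a million games get us? A rough answer comes from the standard error of an estimated probability, sqrt(p(1 - p)/n). The sketch below is a back-of-the-envelope check rather than part of the original script; it assumes independent games and a pseudo-random generator that behaves like a true random source, and plugs in the staying rate reported above.

```ruby
# Standard error of a Monte Carlo estimate of a probability `prob`
# after `n` independent games: sqrt(prob * (1 - prob) / n).
def standard_error(prob, n)
  Math.sqrt(prob * (1 - prob) / n)
end

games = 1_000_000
expected_staying = 1.0 / 3

# Expressed in percentage points, like the simulation's output.
se = standard_error(expected_staying, games) * 100
puts "Standard error: #{se.round(4)} percentage points"  # prints 0.0471

# How far the observed staying rate is from 33.333...%, in standard errors.
observed_staying = 33.382
z = (observed_staying - expected_staying * 100) / se
puts "Observed result is #{z.round(2)} standard errors from the expected value"

# Quadrupling the number of games only halves the standard error.
puts standard_error(expected_staying, 4 * games) * 100 / se  # about 0.5
```

The observed 33.382% sits roughly one standard error away from 1/3, which is exactly the kind of deviation chance alone produces. The 1/sqrt(n) behavior is also what "very slowly" means in practice: each extra decimal digit of accuracy costs about a hundred times more games.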

The results of this simulation should be enough to convince you that the theoretical results are actually true; we are easily fooled, and the mathematicians who got it right were not making stuff up. ;-)

Happy New Year to my readers, I wish you all the best for a happy, successful 2009!


If you enjoyed this post, then make sure you subscribe to my RSS Feed.

Developers are blinded by the light

Antonio Cangiano December 30th, 2008

Blinded by the light,
revved up like a deuce,
another runner in the night
— Bruce Springsteen

Humans are exceptionally bad at calculating odds. We let our limited experience strongly influence our perception of the likelihood of an event. For instance, we tend to vastly overestimate the odds of dying in a terrorist attack, from an accidental firearm discharge, or in a hurricane, and vastly underestimate causes of death like falling, drowning or the flu. The reason for this is that the media constantly remind us of the dangers of terrorism and hurricanes, and flash stories about children who were accidentally shot. Seldom do you find stories about a person drowning, falling, or dying of the flu on the national news channels. News stories tend to be sensationalized, so as to capture our attention and hook large audiences, and as such they feed people’s bias when it comes to estimating what is, and is not, likely to occur.

Likewise, overexposure to blissfully happy lottery winners holding up their oversized checks on TV and in the papers tends to distort people’s perception of the likelihood of winning with a single ticket. A more mathematical and objective approach to the problem would quickly reveal that the odds are much worse than they appear on the surface. [1]

I can’t help but notice that exactly the same thing is happening in the development/startup world. It’s the new gold rush. Far too many developers are trying to build the next big social network, to be the next Facebook (or YouTube), to gather crowds in the millions, in the hope of being bought for a ridiculous sum of money by a large company. The media loves these sorts of stories.

As a consequence, developers who are trying to build the next Facebook are akin to lottery ticket buyers. A few of them will succeed and win, but most will fail miserably. How many social networks do we really need? The ad-supported model works for some lucky companies that manage to attract huge crowds while keeping their expenses to a minimum (e.g. PlentyOfFish), or that get acquired (e.g. YouTube, which, acquisition aside, is costing Google money). Everyone else is burning cash and wasting the money and good faith of VCs in the process.

I fear that a lot of developers are blinded by the light. Their perception of the actual odds of “making it” is skewed by the media’s continuous coverage of million (if not billion) dollar acquisitions and success stories. And some VCs encourage this behavior in the hope of seeing great returns on their investments. After all, these are very wealthy people, and they’re not interested in small-scale success.

Aside from the obvious waste of time and resources, I think that many developers are leaving excellent opportunities on the table in order to pursue a highly unlikely outcome. Making 10 million with a traditional business plan is vastly more likely than making a billion à la YouTube, and that gap in odds is far wider than the gap in the quality of life the two amounts can afford you. If you are broke, have $30K in credit card debt, or are middle class, you’ll find that 10 million dollars could improve your quality of life much more than going from 10 million to a billion ever could. And it’s important to understand that aiming at a more likely, albeit smaller, outcome does not in any way prevent you from “dreaming big” afterward, once you’ve already achieved success with your first (or first successful) venture.

Would you rather enter a draw for a million dollars with a 1 in 20 chance of winning, or a draw for five hundred million dollars with a 1 in 50,000,000 chance? A rational person would opt for the first, yet most startups today are leaning towards the second draw. They do so because they vastly overestimate their odds of being successful with the second draw.
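The arithmetic behind that claim is the expected value of each draw: the prize multiplied by the chance of winning it. A minimal sketch, treating each draw as a single prize and ignoring any cost of entering (expected_value is just an illustrative helper), using Rational to keep the numbers exact:

```ruby
# Expected value of a single-prize draw: prize times probability of winning.
def expected_value(prize, chance)
  prize * chance
end

small_draw = expected_value(1_000_000, Rational(1, 20))
big_draw   = expected_value(500_000_000, Rational(1, 50_000_000))

puts "1 in 20 for $1M:    $#{small_draw.to_i}"  # => $50000
puts "1 in 50M for $500M: $#{big_draw.to_i}"    # => $10
```

The first draw is worth 5,000 times more on average, and that is before taking into account the point above that the first million improves your life far more than the next few hundred.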

Create a product and charge people for it. Unless you really have to, don’t take VC money; instead, consider bootstrapping your company. One of the main advantages of the software world is the exceptionally small amount of capital needed to get started. If you want to stick to Web applications, use the Software as a Service (SaaS) model and make your users pay for the software and service you provide. You’ll have a much smaller audience, fewer scalability problems and expenses, a whole lot more revenue and a greater chance of being profitable. Joel Spolsky (with his gorgeous office spaces) makes millions in revenue thanks to a company that, for most of its existence, has sold a web bug tracker. How many free bug trackers do you know of? How many competitors exist in that market? Many, I’m sure. And while Joel’s popularity no doubt helped his company, its success still showcases how a business can thrive by building a better mousetrap.

As David Heinemeier Hansson mentioned, there are countless under-the-radar companies making money this way. [2] If you take your eyes off the spotlight, you’ll see that many companies are very successful at what they do, even though they’re not famous or making news headlines. Some of them actually strive not to attract too much attention to their success (often measured in millions of dollars), in order to prevent competitors from springing up.

Regardless of whether you’re a household name or not, you don’t even have to create Web applications to be hugely successful. Mobile apps for smartphones, including the iPhone, come to mind. But good old-fashioned desktop applications also keep a wide range of software companies in business. That’s why the perception that you can’t make money with commercial desktop software anymore, or that desktop applications are dead, is utterly ridiculous. As a developer/micro ISV/startup, your chances of making money with well-designed desktop software are much higher than with any sort of YouTube, Flickr or Facebook clone.

To understand how skewed our perceptions are, you just need to talk with companies who are open to sharing their software sales statistics. You’ll be shocked by the amount of money that’s being made with relatively common software. Balsamiq makes a UI sketching application that sells for $79. The author managed to make $100K in revenue in the first 5 months, mostly by selling the desktop version of his application. And he is certainly far from being one of the biggest winners in this industry. I mention this example because it shows how a decent, well-executed idea can quickly bring in revenue when you charge your users. And if you think that $100K in five months is small, let me ask you how many free web sites manage to net a comparable monthly income. If you are looking for larger revenues, check out OmniGraffle, which has earned The Omni Group millions of dollars, or set your sights on B2B applications (a market in which some applications sell for thousands of dollars apiece).

While many developers are blinded by the light, wise ones with a mind for entrepreneurship are building actual software businesses. I invite you to get out there and do the same.


Footnotes

[1] The concepts I summarized here are much more eloquently illustrated by Dan Gilbert in this TED talk.

[2] David Heinemeier Hansson makes a similar point in a post of his which inspired this one.



Random thoughts on software piracy and open source business models

Antonio Cangiano December 28th, 2008

In a recent blog entry, Jeff Atwood discussed the subject of software piracy, bringing up the example of a successful indie game called World of Goo, whose estimated piracy rate is about 82% (initially reported as 90%).

Perhaps in an effort to appeal to the ethical side of his readers, Jeff underlines how “this is not a game that deserves to be pirated”, how it’s developed by a team of two indie developers, “not another commercial product extruded from the bowels of some faceless Activision-EA corporate game franchise sweatshop”, and how its low price point makes it affordable (it’s currently on sale for $15).

I understand the psychological reasons behind those arguments, but I don’t feel that their implications are acceptable. No software deserves to be pirated, whether it costs 15 bucks or 3 million dollars, and whether it’s developed by a single programmer who’ll end up bankrupt or by a huge corporation like Microsoft. Piracy is unlawful, regardless of the cost or creator of the object at hand. It is not theft, but it’s still wrong and a violation of a licensing agreement.

Piracy is rampant, and I’m sure the 82 or 90% figure is not far-fetched. The software industry is in fact in a similar position to the music industry. And Jeff gets two important points right. First, there is no point in punishing your legitimate customers with DRM and other inconveniences. It’s OK to “keep honest people honest”, but going out of your way to prevent piracy can do more harm than good, as EA learned through their Spore experience. In December, EA finally released a DRM-free version of Spore on Steam, which many consider an acceptable method for delivering games, offering a decent balance between software protection and the level of annoyance for users. Second, it is absurd to assume that the 90% of non-paying users of your software would have bought it if they couldn’t get hold of a pirated copy. The BSA and RIAA’s astronomical claims in this regard are utter bullshit, and conveniently ignore both simple economics and reality.

Jeff’s post then goes on to argue that the best anti-piracy strategy is to build a great product and charge a fair price for it. World of Goo itself proves that these two points do not constitute a reliable strategy when it comes to reducing piracy. That game is truly a great product (I don’t like games and even I enjoyed the demo), and it sells for what is arguably a very reasonable price. What is true, though, is that the secret to being successful in the software business is to create programs that people want, and to price them so that the legitimate “10% crowd” will be open to buying them. It’s not the best anti-piracy strategy, in the sense that regardless of quality and price, people will pirate your software anyway. It is, however, the best business strategy, since it makes an appealing offer to your pool of potential buyers.

In addition to that, a third strategy that actually reduces piracy is offering additional value to genuine users. You could, for example, reward your customers by providing them with physical goods (e.g. a manual, stickers, posters, etc.), access to an online support community, and/or additional server-side services that are not available to illegitimate users.

I think that the lesson here is the same one that can be applied to the music world. Focus on quality, price in a manner that is appealing to your audience, take some basic technical countermeasures to keep people honest, and then just ignore piracy. You cannot protect your software, no matter what you do. It’s annoying, but piracy is not going away, so any effort put towards penalizing your paying customers in a futile attempt to combat it will only hurt your business.

It is also true that other business models exist, even though they are not always applicable to every type of program. For example, Software as a Service (SaaS) takes care of piracy by providing software server-side, once users have been charged a fee. This also has the added benefit of enabling a recurring billing cycle that would be far less welcome among consumers and small businesses if it were applied to a standard shrinkwrapped piece of software. But I don’t feel that piracy is a strong enough argument for killing off desktop application development, whenever a desktop app is better suited than a web one for a given job.

Amongst other alternative business models that kill piracy, there are open source ones. The open source world can claim to have accomplished many great things software-wise, but it seldom provides viable ways of earning money directly from software. In a rebuttal to Jeff’s post, Dare Obasanjo (a Microsoft evangelist) lists three open source business models and shows how they rarely fit the reality of B2C shrinkwrapped software. Quoting from his post, these are:

  • Selling support, consulting and related services for the “free” software (aka the professional open source business model) – Red Hat
  • Dual license the code and then sell traditional software licenses to enterprise customers who are scared of the GPL – MySQL AB
  • Build a proprietary Web application powered by Open Source software – Google

There may be variations, but those are the main ones. Some people may raise objections to the second point. Why would companies be scared of the GPL? Working for IBM, I’ve experienced a bit of the enterprise world, and let me tell you that Dare is absolutely right. Many companies in the enterprise space are scared of open source software, particularly programs released under the GPL license (due to its possible viral implications), and wouldn’t touch them with a ten-foot pole. I have seen companies spend thousands of dollars on products that were available for free under the GPL license, mainly due to the legal implications of using GPL software.

You’ll notice that none of these models is really applicable to B2C desktop applications. So, as far as desktop applications are concerned, the traditional “World of Goo approach” is the right way to go.

Moving away from the problem of software piracy, the inadequacy of the main open source models when it comes to shrinkwrapped software brings us to two points that I feel are worth raising. Let me preface this by saying that I believe in the value of open source, but I do see fundamental flaws in the business models surrounding it. The first is that it’s the developer’s right to charge for the software they produce. Freedom 2 of Richard Stallman’s free software philosophy (“The freedom to redistribute copies so you can help your neighbor”) works against developers’ best interests. People should be paid for the fruits of their labor, whether copying them is almost free (as in the case of digital content) or not. And this is true for software, songs, videos or any digitally transmissible content.

It’s nice that people decided to volunteer their time to build an empire of free software that’s openly available to everyone. It’s a huge accomplishment that derives from the GNU philosophy, but it should be viewed as the same thing as when lawyers do pro bono work. We shouldn’t expect developers (or lawyers, or architects, or similar professionals) to stop charging for the product of their work. Again, I think it’s great that developers help each other with free tools and libraries for developing programs, but there is no reason why people should be ashamed to sell their programs commercially to businesses and consumers, and make a living off of them.

In fact, selling software, whether desktop, mobile or web based, is a great way of earning money. The proliferation of startups is a testament to many people’s desire to combine their love of the software craft with the possibility of acquiring wealth. “But Antonio, developers can make money by selling support and related services,” I hear you say. And this brings us to a second flaw of FOSS. Instead of being paid for writing the best software you can, you get paid for providing technical support to the few people who buy it, or for the occasional ad-hoc customization. Where is the incentive to provide good documentation and easy-to-use, quality software, when your livelihood depends on your customers needing help from you? As a developer, would you rather spend your time building great software or acting as a customer service representative? Consulting or providing technical support doesn’t scale nearly as well as software sales do. You can sell 10,000 copies of your application without lifting a finger, but you can’t scale to 10,000 people paying for technical support that easily. Both approaches can bring in similar revenues, but while the first can be handled by an indie developer, the second requires a full-blown company with many, many technical agents.

Open source models are fine when they actually make business sense, but programmers should not be afraid to charge for their software. Trying to avoid piracy by switching to an inferior business model that gives software away for free is foolish. Accept piracy as a necessary evil, and focus your attention on coding and promoting your commercial applications.



DB2 on Mac officially released

Antonio Cangiano December 23rd, 2008

As pre-announced in my two previous posts, DB2 for Mac OS X Leopard is finally available for download. It’s now official, DB2 on Mac is here.

Reflections on DB2 on Mac

Several people, including myself, would happily ditch their virtual machines and start introducing DB2 into their native Mac development stacks. But this milestone represents much more than the immediate implications would have us believe. A few years ago, the idea of giving away DB2 for free would have been met with rejection. Yet, DB2 Express-C came along, and unlike the other “express” databases, it’s a true production-ready DB2 version that can be used free of charge.

Likewise, the idea of having a DB2 version for Mac was unthinkable up until a few years ago. Yet today we finally have a copy of DB2 Express-C for Mac OS X that’s available for download. Aside from this being an acknowledgment of the growing importance of the Mac as a development and business platform, I feel it underlines IBM’s ability to change. The desire that a few of us Mac addicts had, coupled with reasonable pressure from the community, was enough to make DB2 on Mac a reality. This matters, and it appeals to both the developer and the technical evangelist in me.

In the list of downloads, you’ll notice that the Mac download is only 138 MB, versus 412 MB for the 64-bit Linux version. The reason for this difference is that DB2 Express-C for Mac currently ships in English only, and at this stage it includes neither DB2 Text Search nor the Java-based tools like the DB2 Control Center. This lighter package is, in my opinion, a welcome side effect of this brand new beta release.

Getting started with DB2 and Rails on Mac OS X

Since the first download went live on Friday, a newer release has been published. It includes a guide for installing DB2 on Mac OS X, along with a few changes that will make developers’ lives easier when building and using drivers (e.g. the ibm_db Ruby gem). If you downloaded this beta version over the weekend, do not worry: just grab, and execute, this shell script (e.g. sudo fixlib.sh). If you are downloading DB2 on Mac now, you won’t need this script, of course.

Once you’ve downloaded DB2 for Mac OS X Leopard, please proceed to read this PDF guide, which will tell you everything you need to know (and more) about installing DB2 on your Mac, as well as providing extra details. It’s best not to skip over reading this document, as the installation on Mac OS X requires a few more steps than simply running the setup wizard.

With DB2 installed and started (db2start), and the SAMPLE database created (db2sampl), you’re ready to start playing with this powerhouse. For details about SAMPLE’s structure, you can read this article in the InfoCenter.

To run the DB2 console (known as the Command Line Processor or CLP for short), run:

$ db2

To connect to the SAMPLE database, from within the CLP run:

db2 => connect to sample

Unless you get an error, you should now be ready to query the database. For example, run the following query:

db2 => select count(*) from staff

Then to exit from the CLP, simply run:

db2 => quit

If this sanity test worked, you can proceed with installing the ibm_db gem (which includes the Ruby driver and the Rails adapter for DB2). To do so, run the following, adjusting the paths to your own username of course:

$ sudo -s
$ export IBM_DB_INCLUDE=/Users/acangiano/sqllib/include
$ export IBM_DB_LIB=/Users/acangiano/sqllib/lib32
$ export ARCHFLAGS="-arch i386"
$ gem update --system
$ gem install ibm_db
$ exit
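If gem install ibm_db fails to compile, one thing worth checking first is whether the two directories exported above actually exist. The helper below is hypothetical and not part of the ibm_db gem; it only assumes the IBM_DB_INCLUDE and IBM_DB_LIB variable names used above, and it needs to run in the same shell session as the export commands so that it sees their values.

```ruby
# Hypothetical pre-flight check (not part of the ibm_db gem): report which
# of the build variables exported above are unset or point to a directory
# that does not exist.
def missing_ibm_db_paths(env = ENV)
  %w[IBM_DB_INCLUDE IBM_DB_LIB].reject do |var|
    env[var] && File.directory?(env[var])
  end
end

missing = missing_ibm_db_paths
if missing.empty?
  puts "IBM_DB_INCLUDE and IBM_DB_LIB point to existing directories."
else
  puts "Check these variables: #{missing.join(', ')}"
end
```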

The ibm_db gem will be installed on your system and is ready to be used. To verify that this is the case, run a small Ruby program with the following code:

require 'rubygems'
require 'ibm_db.bundle'

conn = IBM_DB.connect("sample", "my_username", "my_password")
if conn
  stmt = IBM_DB.exec(conn, "select count(*) from staff")
  count = IBM_DB.fetch_array(stmt)[0]
  puts "The staff table contains #{count} records."
  # Release the connection once we're done with it.
  IBM_DB.close(conn)
else
  puts "Connection error: #{IBM_DB.conn_errormsg}"
end

If everything is fine and dandy, you should see the message “The staff table contains 35 records.”.

Now that Ruby can talk with DB2, we can move on to Rails. Assuming you have Rails 2.2.x installed, run the following to create a sample bookshelf application:

$ rails books -d ibm_db

This generates a Rails application (as usual) with a config/database.yml file customized for DB2. You’ll notice that, unlike with MySQL, the database names are not books_development, books_production and books_test. The names are truncated by default because DB2 currently only allows database names that are up to 8 characters long. Feel free to simply change the development database name in database.yml to ‘books’.

As a Rails developer you may also be accustomed to running rake db:create to automatically create the development database. This feature is not available for DB2 at this point, so you can create the database using the db2 command instead, as follows:

db2 create database books

DB2 allows you to specify all kinds of options for the creation of databases, but in its simplest form, the line above will work just fine.

Once the development database has been created, you should be able to use Rails with DB2 as you normally would with other database management systems. For example, you could scaffold a resource as follows:

$ ruby script/generate scaffold Book title:string author:string isbn:string description:text loaned:boolean

Start the webserver with:

$ ruby script/server

And then visit http://localhost:3000/books to perform CRUD operations on book records.

At this stage, the only caveats are that you’ll have to use the db2 command, rather than ruby script/dbconsole, and that you won’t be able to use the rename_column method in your migrations. On the plus side, you’ll have the XML datatype (t.xml in your sexy migrations) at your disposal, to natively store XML documents and retrieve them through XQuery and SQL/XML.

I really hope that you’ll enjoy DB2 on Mac! Don’t be afraid to ask for help in the DB2 Express-C forum if you need it. Oh, and we are trying to get the word out, so your help is highly appreciated. You can promote this story on Twitter, Hacker News, Reddit, DZone, StumbleUpon and Digg.


Disclaimer: The opinions expressed in this post are mine and mine alone, and do not necessarily represent the opinions of my employer, IBM.


If you enjoyed this post, then make sure you subscribe to my RSS Feed.

Download DB2 on Mac

Antonio Cangiano December 19th, 2008

I’m glad to announce that DB2 Express-C 9.5.2 for Mac OS X Leopard is available for download. Later tonight, I will provide further details. Meanwhile, enjoy! :)


Learn Merb

Antonio Cangiano December 13th, 2008

The most effective martial artists specialize in their discipline, but are not afraid to cross-train in others. Bruce Lee, arguably the most famous and influential martial artist of the past century, trained first in Tai Chi Chuan, then in Gung Fu and boxing, and also learned Western fencing. The insight he gained from so many disciplines led him to create the Jeet Kune Do form of combat.

Programmers are not all that different. Cross-training in other languages and frameworks can only improve one’s overall mastery of the craft. When it comes to Ruby frameworks, the two most popular choices are Ruby on Rails and Merb. They’re often seen as contenders, but this truly isn’t a zero-sum game; learning both is a very sensible move. They both enable you to write web applications in Ruby and are somewhat similar, so learning one after you know the other shouldn’t be very challenging. In many cases people learn Merb after they’ve had some experience with Rails, but either way, acquiring a solid grasp of both frameworks gives developers extra flexibility. People who learn both often end up mostly using one or the other, depending on their individual preferences. But it’s worth knowing both, so as to be able to write CRUD-style applications that fall within Rails’ solution space, as well as more complex edge cases where Rails’ opinions will end up contending with yours.

Among the reasons to give Merb a chance are its focus on performance, its smaller memory footprint and its extreme level of modularity, which enables you to pick and choose the components you’d like to use.

Merb is not as mature as Rails, of course, but it has reached version 1.0.x, and with it developers can have greater confidence in a more stable API. Now is perhaps the best moment to get involved and learn more about this rising framework. Not surprisingly, though, Merb finds itself where Rails was a couple of years ago when it comes to getting-started documentation, which is still weak. Thankfully, this issue is being taken seriously and there’s been major progress in improving the documentation for Merb. Below are some useful links to get you started with Merb.

Merb has official API documentation, a wiki, a Google group, and a community site called Merbunity for news, projects and tutorials. The #merb channel on irc.freenode.net is also a useful and welcoming spot. Furthermore, there is a Peepcode PDF draft called Meet Merb. If you want something even more substantial, several books are coming out in the near future. These include Merb in Action, The Merb Way, Beginning Merb and Merb: What You Need To Know. There is also an open source Merb book, whose development is led by Matt Aimonetti. It’s a work in progress, but probably a very good starting point, and it has the added bonus of being free. And if you’re interested in Merb, don’t miss InfoQ’s interview with Yehuda Katz, Merb’s lead developer and one of the sharpest guys we have in the Ruby community.

Finally, if you are a professional developer who wants to progress quickly with Merb and bring your skills to the next level, do not miss your chance to attend a three-day intensive course on Merb, which is being offered by Yehuda and Matt in Phoenix, AZ, from January 19 to 21, 2009. Registration opened two days ago and 20 of the 30 available spots have already been snapped up. The remaining seats won’t last more than a day or two, so if you are interested, don’t delay (sign up now and you’ll also benefit from an early registration price).

2009 is almost here, so why not take the opportunity to learn Merb this year?


DB2 on Mac to ship before Christmas

Antonio Cangiano December 12th, 2008

PC Vs. MAC, DB2 Edition

This is not an official announcement, but I must share the news with you. DB2 Express-C for Mac OS X Leopard will finally ship before Christmas, in all likelihood as soon as early next week. You may recall that more than a year ago I blogged about the work on porting DB2 to the Mac having started. It admittedly took longer than expected, but DB2 on Mac is coming, and it is absolutely free of charge, of course. The team is still playing with the bubble wrap, but DB2 on Mac is a reality.

What took IBM so long? DB2 is a database management system that’s highly optimized for each platform it’s available for, so that it can take full advantage of the operating system at hand. In other words, porting DB2 from one platform to another is not trivial. The task is made more challenging by the extremely high standards set by IBM. You may be familiar with the whole scandal surrounding MySQL 5.1, which was released despite known fatal bugs. Something like that is simply not acceptable to IBM. Each release of DB2 has to go through a huge amount of regression and performance testing, for months. If the product does not pass all these tests and others, then DB2 is not shipped.

On top of this, a few months ago the decision was made to ship DB2 Express-C 9.5.2 (rather than 9.5), and as you probably know, DB2 Express-C 9.5.2 was released for the other supported platforms only a little while ago. So the first piece of good news is that you’ll get the latest version of DB2 on the Mac. It’s going to be a 64-bit version and will require Leopard to work:

$ db2level
DB21085I  Instance "acangiano" uses "64" bits and DB2 code release "SQL09052" with level identifier "03030107".
Informational tokens are "DB2 v9.5.0.2", "s081205", "DARWIN64", and Fix Pack "2".
Product is installed at "/Users/acangiano/sqllib".

The second piece of good news is that, unlike with 64-bit MySQL, you won’t have to jump through hoops to build the Ruby driver, even though the database is 64-bit and the Ruby that ships with Leopard is 32-bit. We ensured that gem install ibm_db would work out of the box, so you don’t have to.

According to Apple, my personal Mac is broken for good (the video chip is dead), which is very bad timing. But I installed DB2 and played around with it on a work Mac Pro machine. I had some fun with Ruby and Rails as well. This is great news for many categories of developers, including those who have been trying to convince their managers to get them a MacBook Pro but didn’t have much of a case due to the lack of availability of a DB2 version. Now, you’ll have a good excuse to get yourself a Mac. ;-)

Stay tuned for the official announcement and keep in mind that this is going to be a beta (perfect for development purposes) and extra features and performance improvements will be added in future releases.

Disclaimer: The opinions expressed in this post are mine and mine alone, and do not necessarily represent the opinions of my employer, IBM.


Reflections on the Ruby shootout

Antonio Cangiano December 10th, 2008

Yesterday I published The Great Ruby Shootout and it quickly gathered a fair amount of attention. It was on the front page of Slashdot, Hacker News, Reddit, and so on. More than 15,000 people came by to read about the results of my comparison between Ruby implementations.

Those numbers looked good, but something didn’t add up. Ever since I clicked the “Publish” button, I had a very uneasy feeling about the main shootout figures. They just didn’t seem right. I had a chance, particularly while writing my book, to use Ruby extensively on Vista, and I can guarantee you that it’s visibly slower than on GNU/Linux. The Phusion team had benchmarked their Ruby Enterprise Edition against Ruby 1.8.6 many times, and found it to be about 25% faster. Yet my results were showing it as twice as fast as Ruby 1.8.7, which in turn is already faster than 1.8.6. To make things worse, I’ve used Ruby 1.9 and found it to be faster than Ruby 1.8.7, but not 5 times as fast. For most programs that I tried, Rubinius didn’t seem faster than Ruby 1.8. And the more I pondered it, the more it felt like one too many things didn’t add up.

In the comments, Isaac Gouy reported a couple of issues with the Excel formulas, where a few unsuccessful tests were mistakenly added to the totals. This skewed the results slightly, particularly in terms of penalizing JRuby. However, this wasn’t really it. Sure, the totals were inaccurate, but not enough to fundamentally change the main outcome of those results.

As I was discussing this somewhat unexpected result with Hongli Lai (co-author of Ruby Enterprise Edition), he mentioned that he knew what might be causing this anomaly. I had run the initial test against Ruby installed through apt-get, because I’d made a couple of assumptions. The first was that most people would probably be using the Ruby version deployed by their OS’s packaging system, in both development and production. The second was that the performance of this version would be roughly similar to that of one built from source. This second assumption would turn out to be badly mistaken.

I decided to run a test using Ruby 1.8.7 built from source as the baseline and added a column for Ruby 1.8.7, installed through apt-get, to the tables. In addition I also corrected the issue pointed out by Isaac. I updated the original shootout with the correct data, and what you see below is a bar chart for the geometric mean of the ratios for the successful benchmarks.

Geometric mean bar chart


Notice how everything makes much more sense now. Ruby 1.9 and JRuby are very close, respectively 2.5 and 1.9 times faster than Ruby 1.8.7 (from source) on these benchmarks. Less impressive results, sure, but I suspect much more realistic ones. The results for Ruby Enterprise Edition are in line with the 25% speed increase, if we consider that 1.8.7 is a bit faster than 1.8.6. Rubinius is still slower than MRI for most tests, but it’s improving. Ruby on Windows is slow; so slow, in fact, that Ruby on GNU/Linux is twice as fast.

The really big, flashing warning, though, is what happens when you install Ruby through apt-get. According to these tests, compiling from source gives you double the speed. I expected a 10-20% increase, not 100%. The gist of it is that prepackaged Ruby is compiled with the --enable-pthreads option, and there is the whole issue of shared vs. static libraries. But whatever the reason, this is a significant difference. For production use, in light of these results, I feel that it would be foolish to use the slower version of Ruby provided by apt-get/aptitude.

I rectified the results as soon as possible, because the last thing I wanted was to mislead the Ruby community or, worse still, betray its trust. Major kudos to Isaac for spotting the calculation issue, and to Hongli for selflessly pointing out that the excellent Ruby Enterprise Edition results were probably due to the low performance of Ubuntu’s version of Ruby.


The Great Ruby Shootout (December 2008)

Antonio Cangiano December 9th, 2008

The long-awaited Ruby virtual machine shootout is here. In this report I’ve compared the performance of several Ruby implementations against a set of synthetic benchmarks. The implementations that I tested were Ruby 1.8 (aka MRI), Ruby 1.9 (aka Yarv), Ruby Enterprise Edition (aka REE), JRuby 1.1.6RC1, Rubinius, MagLev, MacRuby 0.3 and IronRuby.


Disclaimer


Just as with the previous shootout, before proceeding to the results, I urge you to consider the following important points:

  • Engine Yard sponsors this website, and also happens to sponsor, to a much greater extent, the Rubinius project. Needless to say, there is no bias in the reporting of the data below concerning Rubinius;
  • Don’t read too much into this and don’t draw any final conclusions. Each of these exciting projects has its own reason for being, as well as different pros and cons, which are not considered in this post. They each have a different level of maturity and completeness. Furthermore, not all of them have received the same level of optimization yet. Take this post for what it is: an interesting and fun comparison of Ruby implementations;
  • The results here may change entirely in a matter of months. There will be other future shootouts on this blog. If you wish, grab the feed and follow along;
  • The scope of the benchmarks is limited because they can’t test every single feature of each implementation nor include every possible program. They’re just a sensible set of micro-benchmarks which give us a general idea of where we are in terms of speed. They aren’t meant to be absolutely accurate when it comes to predicting real world performance;
  • Many people are interested in the kind of improvements that the tested VMs can bring to a Ruby on Rails deployment stack. Please do not assume that if VM A is three times faster than VM B, that Rails will serve three times the amount of requests per minute. It won’t. That said, a faster VM is good news and can definitely affect Rails applications positively in production;
  • These tests were run on the machines at my disposal; your mileage may vary. Please do test the VMs that interest you on your hardware and against programs you actually need/use;
  • In this article, I sometimes blur the distinction between “virtual machine” and “interpreter” by simply calling them “virtual machines” for the sake of simplicity;
  • Some of the benchmarks are more interesting for VM implementers than for end users. That said, if you think the benchmarks being tested are silly/inadequate/lame, feel free to contribute code to the Ruby Benchmark Suite and if accepted, they’ll make it into the next shootout;
  • Finally, keep in mind that there are three kinds of lies: lies, damned lies, and statistics.


Ruby implementations being tested


All of the Ruby implementations that were able to run the current Ruby Benchmark Suite have been grouped together in one main shootout. This group consists of Ruby 1.8.7 (p72, both built from source and installed through apt-get), Ruby 1.9.1 (from trunk, p5000 revision 20560), Ruby Enterprise Edition (1.8.6-20081205), JRuby 1.1.6RC1 and Rubinius (from trunk), all of them tested on Ubuntu 8.10 x64, plus Ruby 1.8.6 (p287, from the One-Click Installer) on Windows Vista Ultimate x64. The hardware used for this benchmark was my desktop workstation with an Intel Core 2 Quad Q6600 (2.4 GHz) CPU and 8 GB of RAM. JRuby was run with the -J-server option enabled and with 4 MB of stack specified (required to pass certain recursive benchmarks). The best times out of five iterations were reported, and these do not include startup times or the time required to parse and compile classes and methods for the first time. Several of these new tests also have variable input sizes.
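To make the “best out of five iterations” rule concrete, here is a minimal Ruby sketch of how such a measurement could be taken with the standard library’s Benchmark module. It is an illustration under stated assumptions, not the Ruby Benchmark Suite’s actual harness: the best_of helper and the fib workload are hypothetical names chosen for this example. Because timing starts inside an already-running interpreter, startup time is excluded, and keeping the minimum discards slower runs, such as a first pass that pays one-off parsing or compilation costs.

```ruby
require 'benchmark'

# Hypothetical helper (not part of the Ruby Benchmark Suite): runs the given
# block `iterations` times and returns the fastest wall-clock time in seconds.
# Interpreter startup is never measured, because timing happens in-process.
def best_of(iterations = 5)
  times = Array.new(iterations) { Benchmark.realtime { yield } }
  times.min
end

# Example workload: naive recursive Fibonacci, a typical micro-benchmark.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

best = best_of(5) { fib(20) }
puts "Best of 5: #{format('%.4f', best)} s"
```

The absolute number printed depends entirely on your machine and VM, which is exactly why only times gathered on the same hardware should be compared.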

The MagLev team provided me with an early alpha version of MagLev for the purpose of testing it in this shootout. Since this VM is not yet mature enough to run the Ruby Benchmark Suite, I used custom scripts against an old version of the Ruby Benchmark Suite on Ubuntu 8.10 x64. MagLev was tested, along with Ruby 1.8.6 (p287), on the same machine as the main shootout, though the benchmarks were different (even when they had the same names as the ones in the main shootout).

MacRuby 0.3 and Ruby 1.8.6 (p114) were tested on Mac OS X Leopard using the previous version of the Ruby Benchmark Suite. Since my MacBook Pro died (sigh), for this benchmark I used a Mac Pro with two Quad-Core Intel Xeon 2.8 GHz processors and 18 GB of RAM.

IronRuby (from trunk) and Ruby 1.8.6 (p287) were tested on a previous version of the Ruby Benchmark Suite on Windows Vista x64 on the same quad-core used for the main shootout. The MagLev, MacRuby and IronRuby numbers reported here were the best times out of five iterations, and include startup time. IronRuby on Mono was not tested because I couldn’t get it to work on my machine, despite having tried several IronRuby versions and two different Mono versions. Please also notice that Ruby 1.8.6 (p287) was tested twice on Windows, once for the main shootout against the current Ruby Benchmark Suite, and a second time to compare it with IronRuby, against the old benchmarks.

Note: As tempting as it is, do not compare implementations that belong to different shootouts directly to one another. It would be very disingenuous to directly compare VMs tested with different benchmarks and/or different machines. The only comparisons that make sense are the ones within each of the four groups.


Main shootout


The following table shows the run times for the main implementations. The table is fairly wide, so you’ll have to click on the image to view the data in a new tab.

Main Shootout's times


Green, bold values indicate that the given virtual machine was faster than Ruby 1.8.7 on GNU/Linux (our baseline), whereas a yellow background indicates the absolute fastest implementation for a given benchmark. Values in red are slower than the baseline. Timeout indicates that the script didn’t terminate in a reasonable amount of time and was (automatically) interrupted. The values reported at the bottom are the total amounts of time (in seconds) that it would take to run the common subset of benchmarks which were successfully executed by every virtual machine. When our baseline VM generated an error, other VMs were used as the reference instead, starting with Ruby 1.8.6 on Vista (for color coding purposes only).
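The rule that totals only cover benchmarks every VM completed can be expressed in a few lines of Ruby. This is a hypothetical sketch: the VM names, benchmark names and timings below are made up for illustration and are not figures from this shootout, and nil stands in for an error or a timeout.

```ruby
# Made-up timings in seconds (NOT shootout results); nil marks an error/timeout.
TIMES = {
  'baseline' => { 'bench_1' => 2.0, 'bench_2' => 4.0, 'bench_3' => 1.0 },
  'vm_a'     => { 'bench_1' => 1.0, 'bench_2' => nil, 'bench_3' => 0.5 },
  'vm_b'     => { 'bench_1' => 4.0, 'bench_2' => 2.0, 'bench_3' => 1.0 }
}

# Names of the benchmarks that every VM completed successfully.
def common_subset(times)
  names = times.values.map(&:keys).inject(:&)
  names.select { |name| times.values.all? { |t| t[name] } }
end

# Total time per VM, summed over the common subset only.
def totals(times)
  subset = common_subset(times)
  times.each_with_object({}) do |(vm, t), acc|
    acc[vm] = subset.inject(0.0) { |sum, name| sum + t[name] }
  end
end

p totals(TIMES)
```

With these sample numbers, bench_2 is dropped because vm_a has no time for it, so totals(TIMES) returns {"baseline"=>3.0, "vm_a"=>1.5, "vm_b"=>5.0}. Excluding failures this way keeps a VM from looking faster simply because it skipped the benchmarks it could not run.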

The following image shows a bar chart of the total time required for the common subset of successfully executed benchmarks (those whose names are in blue within the tables):

Total Time


More interestingly, the following table shows the ratios of each Ruby implementation relative to the baseline (MRI):

Main Shootout's ratios


The baseline time is divided by the time at hand to obtain a number that tells us “how many times faster” an implementation is for a given benchmark. 2.0 means twice as fast, while 0.5 means half the speed (so twice as slow). The geometric mean at the bottom of the table tells us how much faster or slower a virtual machine was when compared to the main Ruby interpreter, on “average”. Just as with the totals above, only the 101 tests that were successfully run by every VM were included in the calculation.
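The ratio and geometric mean calculations described above can be sketched in Ruby as follows. The timings are made-up placeholders, not shootout data, and the helper names ratio and geometric_mean are hypothetical. The geometric mean is computed through logarithms, which is mathematically equivalent to taking the n-th root of the product of the n ratios.

```ruby
# Ratio for one benchmark: baseline time divided by the VM's time.
# 2.0 means twice as fast as the baseline; 0.5 means half the speed.
def ratio(baseline_time, vm_time)
  baseline_time / vm_time
end

# Geometric mean: the n-th root of the product of n ratios, computed via
# logarithms so that multiplying many ratios cannot overflow or underflow.
def geometric_mean(ratios)
  Math.exp(ratios.inject(0.0) { |sum, r| sum + Math.log(r) } / ratios.size)
end

# Made-up timings in seconds for three benchmarks (NOT real shootout data).
baseline = [2.0, 4.0, 1.0]
vm       = [1.0, 8.0, 0.25]

ratios = baseline.zip(vm).map { |b, v| ratio(b, v) }
# ratios => [2.0, 0.5, 4.0]
puts geometric_mean(ratios)
```

A geometric mean suits ratios because a VM that is twice as fast on one benchmark (2.0) and half as fast on another (0.5) averages out to exactly 1.0, whereas the arithmetic mean would report a misleading 1.25. A ratio r also maps to a (r - 1) * 100 percent speed increase, so 2.0 means 100% faster.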

More concisely, here is a bar chart showing the geometric mean of the ratios for the various implementations tested:

Geometric Mean


I prefer to let the data speak for itself, but I’d like to briefly comment on these results. Just a few quick considerations.

Working off of the geometric mean of the ratios for the successful tests, Ruby MRI compiled from source is twice as fast as the Ruby shipped by Ubuntu and by the One-Click Installer on Vista. The huge performance gap between ./configure && make && sudo make install and sudo apt-get install ruby-full should not be taken lightly when deploying in production. These numbers also reveal what most of us already knew: Ruby is particularly slow on Windows (800-pound gorillas in the room, or not).

Performance-wise Rubinius has more work left to be done to catch up with Ruby 1.8.7 and other faster VMs, particularly if we take into account the number of timeouts. But it has improved in the past year and I think it’s on the right track.

Ruby Enterprise Edition is about as fast as Ruby 1.8.7 compiled from source, which is reasonable considering that it’s a patched version of Ruby 1.8.6 aimed at the reduction of memory consumption (a parameter which wasn’t tested within the current shootout).

Speaking of excellent results, Ruby 1.9.1 and JRuby 1.1.6 both did very well. It looks like we finally have a couple of relatively fast alternatives to what is a slow main interpreter. According to the results above, and with the exception of a few tests, on average they are respectively 2.5 and 2 times faster than Ruby 1.8.7 (from source), and 5 and 4 times faster than Ruby 1.8.7 installed through apt-get on Ubuntu or Ruby 1.8.6 installed through the One-Click Installer on Vista. Again, this does not mean that every program (particularly Rails) will gain that kind of speed, but these results are very encouraging nevertheless.


MagLev


There has been a lot of buzz about MagLev since Avi Bryant’s first benchmarks were shown a few months ago. Here we finally see it being put to the test. The table below shows the times obtained by running MagLev and Ruby 1.8.6 (p287) against MagLev’s set of benchmarks based on the old Ruby Benchmark Suite:

MagLev's times


And here are the ratios:

MagLev's ratios


You’ll notice how MagLev swings from being much faster than MRI to being much slower. I believe there is much room for improvement, but at almost twice the speed of MRI, these early results are definitely promising.


MacRuby


These are the times for MacRuby 0.3 on Mac OS X 10.5.5:

MacRuby's times


And of course, the ratios against the MRI baseline:

MacRuby's ratios


MacRuby is relatively new, so these are not bad results. More work is required, but it’s a good start.


IronRuby


Finally (I promise these are the last ones), here are the two tables for IronRuby and Ruby 1.8.6:

IronRuby's times
IronRuby's ratios


IronRuby is slower than Ruby 1.8.6 on Windows, which in turn is much slower than Ruby 1.8.7 on GNU/Linux. This is not very surprising. This project has been focusing on integrating with .NET and catching up with the implementation of the language by improving the RSpec pass rate, as opposed to performing any optimizations and/or fine tuning (as per John Lam’s presentation at RubyConf 2008). We’ll measure its improvements in the next shootouts.


Conclusion


Overall I think these are great results. Ruby 1.8 (MRI), with its slowness and memory leaks, belongs to the past. It’s time for the community to move forward and on to something better and faster - and we don’t lack interesting alternatives to do so at this stage.

I hope that for the next shootout, MagLev, MacRuby and IronRuby will be able to run the benchmark suite, so that they can all be tested and directly compared with each other. I also hope to include Tim Bray’s XML benchmark, some sort of “Pet Shop” sample Rails and Merb application and, above all, include memory usage statistics.

You can find the Excel file for the main shootout here. That’s all for now. Feel free to comment, subscribe to my feed, share this link and promote it on Hacker News, Reddit, DZone, StumbleUpon, Twitter, and Co. Putting together this shootout was a lot of work, so I definitely appreciate you spreading the word about it. Until next time…

Update (December 10, 2008): This article has been updated to correct a couple of major issues with yesterday’s results. I adjusted my commentary as well, in light of the corrected figures.


IBM’s XML Challenge (lots of prizes inside)

Antonio Cangiano December 1st, 2008

IBM is holding a series of challenges centered around XML. The whole event is labeled The XML Challenge (subtitle: Search for the XML superstar). Rockstar references aside, this is a pretty cool initiative that can provide you with some freebies, as well as high-quality prizes if you win any of the available contests.

The Contests

What I say below applies to the US and Canada only: the contest is being held separately in 30 countries worldwide, and each of these may have different individual contests and prizes. In fact, the first thing you’ll see when you visit the xmlchallenge.com site should be a popup that prompts you to select your country.

For US/Canada there are 5 contests: Video, Gadget, Query, Porting and XML. The Video Contest consists of creating a funny/creative/cool video about XML, XQuery and/or DB2. The Gadget Contest is about developing a downloadable gadget/widget that leverages DB2. The Query Contest requires you to use XQuery to query a database and come up with the answer to five questions. The Ported Application Contest is all about porting an existing application to DB2 or creating a new one that uses DB2. And finally, the XML Contest asks you to build a useful, user-friendly XML application from scratch. The last two contests can be approached as a team or as an individual. The Query, Ported App and XML contests start today!

You don’t have to participate in all of them, of course. But by participating in any of these you gain points, and there are six badges that you can obtain: XML Challenger, XML Rookie, XML Whiz, XML Star, XML Master and XML Grand Master. I feel so nerdy reporting this. The XML Grand Masters will be enrolled in a draw for an additional prize.

The Prizes

Speaking of prizes, let’s see what goodies are up for grabs. There will be a few giveaways just for participating. For example, the first 500 participants will receive an XML Challenge T-shirt as well as a Rubik’s Cube. The first 1000 to complete the Quick Quiz during registration will also receive a free T-shirt. But let’s move on to the more substantial prizes.

For the sake of awarding prizes, the contestants will be split into two groups: students and professional developers.

The Video Contest: The deadline is December 17, 2008 and a few selected winners will receive an 8GB iPod Video Nano (for both students and developers).

The Gadget Contest: The deadline is December 17, 2008 and the winning students grab Canon Powershot SD870 Cameras, while winning developers get 80GB Zunes.

The Query Contest: After you register, you’ll have 24 hours to submit your answers. The first 50 successful participants in each group (100 in total) will receive a 1GB USB key, while all the contestants with the right answers will be entered in a draw for a grand prize: a PlayStation 3 40GB for the students, and a 32GB iPod Touch for the developers.

The Ported Application Contest: The deadline is January 31st, 2009. The winning team or individual among the students will receive an HP Pavilion HDX Notebook, while the winning developer will score a Lenovo IdeaPad U110. The second prize for both groups will be a Garmin nüvi GPS.

The XML Contest: The deadline is January 31st, 2009. The 1st prize for each group will be a high-end 17″ Alienware Laptop (two laptops will be awarded in total). The second prize for both groups will be a Nintendo Wii (again, two in total).

Finally, two lucky XML Grand Masters, one developer and one student, will receive a Bose Wave Radio II.

I hope you consider enrolling now and best of luck! If you need some help with getting started with DB2 Express-C, you can download the free e-book which is available in several languages. Oh, and finally we have an Italian version as well.

Promote this post

Hey, would you give me a hand in promoting this post? If you are in the US or Canada and mention and link to this post from your blog, you’ll receive a free XML Challenge T-shirt and a Rubik’s Cube as well. All you have to do is send me an email (to acangiano at gmail.com) with a link to your blog entry, as well as your shirt size and complete mailing address. Thank you!

