Archive for February, 2008

On Rails and DB2

Antonio Cangiano February 29th, 2008

James Governor wrote an interesting post entitled “On douchebags, DB2 and Rails: DHH not pragmatic enough?”. It’s partially in response to Jeff Atwood’s article “Douchebaggery” which, in turn is a reply to an old post by David Heinemeier Hansson. So please excuse me if I limit the entertainment value of this post by not calling anyone names. :)

Amongst a few sensible comments, James also argues that David may not be nearly opinionated and pragmatic enough after all:

What’s bothering me though is that perhaps Hansson isn’t opinionated enough. You see Rails targets MySQL. MySQL is the Rails database. Forget all the configuration crap and just use MySQL. There is no need to consider other database platforms. LAMP may or may not be a literal but MySQL is utterly embedded in the Rails way. In Rails database is not a design decision, its a given.

As most of you will already know, Rails was extracted from a real web application called Basecamp. This approach is thought to be the first sign (scientists believe there are 37 of them :-P) that we are dealing with a pragmatic framework. Its features and design choices weren’t defined a priori by David, but they were implemented to satisfy particular needs (and opinions) which Hansson and his team had, while developing database driven web applications.

When analyzed under a proverbial microscope, one sees that Rails didn’t actually invent anything new. Neither the MVC paradigm nor the Active record pattern were especially innovative. They’d been well known for years in the Computer Science world and even adopted (to a certain extent) within the enterprise realm. Yet, Rails was a revolutionary piece of software. What Rails did was to tie all the loose ends together into a neat, tidy package. On top of that achievement, it clearly favored simplicity, pragmatism and developer productivity over anything else. Forget about repetition and countless XML configuration files, it was opinionated software that made web development, or at least most of it, rather trivial. Less choices for the developer (the Paradox of Choice comes to mind) turned out to be a significant gain in terms of “ease of use”, which in turn bolstered Rails’ adoption.

Rails wasn’t supposed to be the answer to all web development needs everywhere - just to most of them. This means that the making of such a DSL involved a lot of assumptions. In turn, there are people who don’t use Rails due to these assumptions, and projects where adopting Rails wouldn’t probably be the smartest choice either. But Rails worked for David and 37Signals, and in doing so, it also happens to work very well for thousands of other developers around the world with similar development needs, who’re facing analogous challenges. David did his best to create a usable and enjoyable web framework, and succeeded spectacularly in responding to the needs of most web developers.

For this, Rails constantly gets accused of being far too opinionated. How many times have we seen comments about the lack of prepared statement support, that there’s no connection pool, no thread safety, no unicode, about deployment issues and so on. James’ comments are interesting because they’re entirely unusual. He’s arguing that Rails is not opinionated enough, and that in itself, is an opinion worth replying to (and not simply because he explicitly called upon my views on the subject at hand within his post):

Why in the Getting Started section of the book, do the authors even mention DB2 as a potential target database, even if its just an aside (Rails does allow exceptions it just doesn’t like them)? I mean if a DB2 bigot came to ask Hansson for a job at 37Signals he would be shit out luck wouldn’t he? You can imagine the laughs- “this guy wants to back end Rails apps with DB2 - what a ***tard.”

I know Anthony would disagree with me, and arguably Rails abstracts the database decision, so Rails is not tightly coupled to MySQL. But that’s not the point - we’re not talking about JDBC here.

I’m pretty sure that David doesn’t care about shared hosting, prepared statements, support for stored procedures, JRuby, DB2 or the “enterprise world”. It’s not his way of creating and going about web development, plus he really doesn’t have any need of these for his own purposes. Therefore don’t expect support for these points from him. However, that said, David is also being fairly reasonable with those who think differently than he does.

David’s message is: if you need these type of features, then please by all means, go ahead and implement them yourself. Rails’ core team has enabled the development community to extend Rails to fit a different kind of bill. While the policy in terms of accepting patches may be strict and not particularly welcoming of every single contribution, the plugin architecture coupled with Ruby’s flexibility and extensibility, allows the injection of substantial features which in turn significantly affect Rails. There are limits of course, and some people even preferred to create a similar framework from scratch, because they found it easier and neater to do so than to completely restructure and refactor Rails’ core.

David clearly favors opinionated conceptual integrity and the act of defining a set of sensible conventions over abstraction/modularity/extensibility, which would be regarded highly by those who require enterprise features or view web development in a markedly different way. He did not intentionally set out to block these things either though, and that’s worth remembering. He proposed his take on things, but he certainly did not impose it on anyone else. Doing so simply would not make sense (and might have required more work on his part in some cases).

Take ActiveRecord for example: it’s not a perfect abstraction, given that its Ruby implementation makes several MySQL centric assumptions in regards to SQL generation, which makes development of alternative adapters for other databases less than trivial (particularly for so called enterprise databases). But it’s still essentially a database agnostic pattern that would be almost harder not to implement as such. David’s implementation poses a few challenges here and there, but we’re still very much within the realm of possibility.

David might only use Rails with MySQL, and he may only slightly care about the subset of features which are common to all databases (including SQLite, the current default database) and nothing more, but at the same time he clearly realizes that we are not all Davids. He’s not willing to warp his creation in order to make everyone happy. That’s not David’s style. However, if other databases or options are available to Rails developers, why not mention them as a testament of the interest surrounding Rails?

The fact that DB2 is a viable (free of charge) option for Rails developers doesn’t affect the Rails core team in a perceivable way, and it still offers the possibility of using Rails to a whole new segment of programmers. Which bring us to the next point:

Anthony says “an increasing number of Rails developers adopting DB2 as their database of choice”. I would like to know more about this “increasing number” from Anthony […]

While perhaps not the most vocal bunch in the blogosphere, there are many developers and companies interested in DB2 on Rails. In my experience, I’d say that there are three main likely/common scenarios (of DB2 on Rails users):

  1. Companies who have already widely adopted DB2 in their infrastructure and are now considering Rails for some of their projects. The developers involved are usually very excited about DB2 on Rails, because it means that they have a rare chance to use Ruby/Rails rather than Java/J2EE or similar technologies, with a data server that they already have a good understanding of. While some web developers choose the framework first and consider the database at a latter stage, most companies don’t work that way.
  2. Developers who have DB2 skills and are looking for a good framework for developing web applications. They have no reason to use any other database but DB2.
  3. Companies or developers who understand the technological advantage offered by DB2’s features (e.g. utmost speed, pureXML, compression, replication, high availability, affordable 24/7 support, etc…) and therefore choose it for their Rails based projects.

Rails doesn’t exploit advanced database features, and as a result, it doesn’t provide any further compelling reasons to adopt other databases aside from MySQL. It takes a discerning developer to realize that DB2 has much more to offer and that there are concrete benefits to adopting it - even when developing with Rails.

It’s also true that working with an independent adapter in Rails will generally be less “comfy” than using SQLite/MySQL/PostgreSQL. For example, certain tasks are not implemented for DB2 by the Rails team, and as such won’t be available. But it’s a small price to pay. In order to change this, I decided to start a project in my spare time (not an IBM project), called DB2 Extended. From the RubyForge description: “db2_extended is a Rails plugin that extends support and enhances the user experience for developers who adopt DB2 as their backend for Rails applications”. This project is currently empty, but I hope to perform the first svn commits over the weekend. I’ll start with simple things that close the gap (like making rake db:create work for DB2) and incrementally, I’d love to throw in DB2 specific functionalities and enhancements as well. Being a personal, open source project, I will more than happily accept patches and contributes from the DB2 on Rails community. Hopefully in a short period, we’ll have a plugin that makes web development with Rails and DB2 even more fun and productive.

As usual, these opinions are only mine and don’t necessarily represent my employer’s (IBM) positions, strategies or opinions.

IBM releases DB2 adapter for SQLAlchemy

Antonio Cangiano February 13th, 2008

A while ago I informally announced IBM’s intention to develop an SQLAlchemy adapter for DB2 and Informix IDS. Today, I’m happy to inform you that we have a first working release for DB2 on Linux, Unix and Windows (LUW). Support for Informix IDS is next (almost done), and after that, it will be System i and z/OS’ turn.

This release will surely excite those Pythonistas who can appreciate DB2 for what it is: one of the most powerful data servers in the world. Which, in its Express-C version, also happens to be gratis (”free as in free beer”). But there is more to it than just that.

IBM has in fact created a project on Google Code, for supporting Python development with IBM Data Servers. Aside from downloads and SVN access, this gives the project a nice public bug tracker which was missing up until this point. A Google Group was also created in order to have an easy to follow support mailing list, and I invite you to join it now.

With the switch to Google Code, there was also an update to the Python drivers (now version 0.2.5), which contain a few improvements and a bug fix for the egg that wasn’t working properly on Linux.

The project currently hosts the following components:

  1. The ibm_db Python Egg for Linux and Windows which contains:
    • The ibm_db driver: a C extension module that wraps IBM’s Data Server Driver for ODBC and CLI APIs and provides a Python client interface for the DB2 and Informix IDS databases.
    • The ibm_db_dbi: a Python coded module that relies on the ibm_db Python driver, and complies with the DB-API 2.0 specs.
  2. The ibm_db_sa: a Python coded adapter which implements the SQLAlchemy 0.4 API specification.

Please use the driver and/or the adapter for SQLAlchemy and let us know if you encounter any issues or have any feedback about it.

Essential guide to the Ruby driver for DB2

Antonio Cangiano February 8th, 2008

This guide is a basic tutorial that’ll quickly get you started with the ibm_db Ruby driver. You’ll be querying DB2 from Ruby in no time, but please bear in mind that this document is not meant to be thorough (use the API reference for this purpose). Also, this is a guide for the IBM provided driver and not the ruby-db2 community provided one, and it introduces you to the Ruby driver only not the Rails adapter.

I strongly encourage you to use the IBM one since, while still open source, it is constantly developed and maintained by IBM and it has an active community of people using it in production. The ruby-db2 driver provided by the community hasn’t been updated for over 2 years and I’m not sure whether it’s abandoned or not.

The latest release of ibm_db is only 2 months old (at the time of writing); it’s very stable and fast, and utilizes the common API adopted by IBM for dynamic language drivers (such as PHP, Perl and Python). This means that if you learn one, you’ll easily be able to adapt your skills to another language. My advice is to stick with IBM’s driver.

Please follow these instructions for installing DB2 and IBM’s Ruby driver. Should you encounter any issues with the installation process, please refer to the DB2 Express-C forum and the Rubyforge forum.

When facing difficulties with the installation of DB2, use the former, while if you have problems with the ibm_db driver, use the latter which is monitored by, besides a group of helpful members of the community, myself, Alex Pitigoi (who is the current maintainer of the Rails adapter) and some other members of the team who developed the driver. We’ll carefully listen to your questions and usually provide help or a resolution in a timely fashion.

Before we get started let me clarify one point about the name of the driver. It was formerly known as IBM_DB2, but IBM decided to drop the 2 at the end, in order to reflect the intention to provide a driver that not only works with DB2, but also with other selected IBM Data Servers, like Informix Dynamic Server version 11 (Cheetha), IBM Cloudscape or Apache Derby.

Loading the IBM_DB driver

In order to connect to DB2 LUW from Ruby, we first need to “require” the driver. Linux and Unix users can run:

require 'rubygems'
require 'ibm_db'

On Windows, use:

require 'rubygems'
require 'mswin32/ibm_db'

Please note that you won’t need the first line if you haven’t installed the driver through RubyGems or if you already have the RUBYOPT environment variable set for rubygems. The execution of the second line will return true if the driver has been successfully “required”, false if it was already loaded, and it will raise an exception otherwise. From now on I’ll use ‘ibm_db’, so if you are using Windows, change it accordingly to ‘mswin32/ibm_db’.

Connecting to a DB2 database

The IBM_DB module contains a series of methods and classes that can be used to interact with DB2 from Ruby. The first method that we’ll use is IBM_DB::connect. This allows us to connect to both local and remote databases. Those who have DB2 Connect installed will even be able to connect to DB2 on iSeries and z/OS.

More technically, connecting to a local database means establishing a cataloged connection by specifying a database alias that exists in the DB2 client catalog. For the first few examples we’ll use the SAMPLE local database/alias. In case you don’t have it already, you can create it by running db2sampl.

The following snippet connects to the SAMPLE local database:

require 'rubygems'
require 'ibm_db'

conn = IBM_DB::connect("sample", "db2inst1", "mypassword")

if conn
  puts "We're connected!"
  IBM_DB::close(conn)
else
  puts "There was an error in the connection: #{IBM_DB::conn_errormsg}"
end

The following is the equivalent code for connecting through TCP/IP to a remote database (in reality it obviously connects to local databases too, but locally it’s usually better to use an alias):

require 'rubygems'
require 'ibm_db'

conn = IBM_DB::connect("DRIVER={IBM DB2 ODBC DRIVER};DATABASE=sample;\
                       HOSTNAME=localhost;PORT=50000;PROTOCOL=TCPIP;\
                       UID=db2inst1;PWD=mypassword;", "", "")
if conn
  puts "We're connected!"
  IBM_DB::close(conn)
else
  puts "There was an error in the connection: #{IBM_DB::conn_errormsg}"
end

For an “uncataloged” connection to a database through TCP/IP, the complete connection string is passed to the connect method. Beside the usual arguments (database, username and password), in this case we also need to specify the hostname or IP address of the dataserver, and the port number (defaulted to 50000 on Linux/Unix).

In both cases (local or remote), the connect method returns a connection handle resource when the connection is successfully established, while it returns false for any failed attempts. This implies a couple of important things. The connection outcome should be verified with a conditional if statement, because it won’t raise any exceptions for you if there are problems connecting. Luckily you can use IBM_DB::conn_errormsg to obtain a string containing the error message returned by DB2. For example, in our snippet, if we were to use the wrong password, we would receive an error message similar to the following:

There was an error in the connection: [IBM][CLI Driver] SQL30082N Security processing failed with reason "24" ("USERNAME AND/OR PASSWORD INVALID"). SQLSTATE=08001 SQLCODE=-30082"

Error messages are usually descriptive enough to let you understand what the problem is. Should you find yourself stuck though, you can first try searching through the Information Center, then Google it and if you still can’t solve the issue, ask directly in the DB2 forum.

The following is a common error with a message that is rather puzzling:

"[IBM][CLI Driver] CLI0133E Option type out of range. SQLSTATE=HY092 SQLCODE=-99999"

If you get this error, it usually means that you are using a version of DB2 that is too old. Install the latest FixPack or the latest version of DB2 (currently 9.5) to resolve the problem.

Please note that while it’s a good idea to use the method IBM_DB::close to close connections that are no longer needed, the driver would have closed the connection automatically for you upon finishing the execution of the script.

The IBM_DB::connect method also accepts an optional parameter that affects the behavior of the connection. For example it’s possible to turn on or off the autocommit, specify the case for the column names in the resultset, or even define a scrollable cursor. As usual, the API reference has all the details, but here is an example that turns off autocommit and specifies the option for retrieving lowercase columns:

require 'rubygems'
require 'ibm_db'

conn = IBM_DB.connect("sample", "db2inst1", "mypassword",
                      {IBM_DB::SQL_ATTR_AUTOCOMMIT => IBM_DB::SQL_AUTOCOMMIT_OFF,
                       IBM_DB::ATTR_CASE => IBM_DB::CASE_LOWER})
if conn
  puts "We're connected!"
  IBM_DB::close(conn)
else
  puts "There was an error in the connection: #{IBM_DB::conn_errormsg}"
end

Finally, before moving on, note that there is a similar method called IBM_DB::pconnect that establishes persistent connections.

Executing queries

The following snippet executes a query and prints the retrieved records:

require 'rubygems'
require 'ibm_db'

if conn = IBM_DB::connect("sample", "db2inst1", "mypassword")
  sql = "SELECT * FROM EMPLOYEE"
  begin
    if stmt = IBM_DB::exec(conn, sql)
      while row = IBM_DB::fetch_assoc(stmt)
        puts "#{row['FIRSTNME']} #{row['LASTNAME']}: #{row['EMPNO']}"
      end
      IBM_DB::free_result(stmt)
    else
      puts "Statement execution failed: #{IBM_DB::stmt_errormsg}"
    end
  ensure
    IBM_DB::close(conn)
  end
else
  puts "Connection failed: #{IBM_DB::conn_errormsg}"
end

It may seem complicated because we added some extra “padding” to the code, but executing queries with the driver is very straightforward. The reason why we have these extra conditions and error checks, is because both the connect and the exec methods don’t raise errors when they fail. Let’s analyze in detail the code above.

We first attempt to establish a connection, placing the result in the conn variable. The returned value can either be a valid connection handle or false. If it’s a false value, it means that the connection attempt failed, therefore we print the error message retrieved with IBM_DB::conn_errormsg.

If the connection was successful we then proceed to execute a query through the method IBM_DB::exec, which accepts two arguments: the connection handle stored in conn, and a string representing the SQL query. This, just like the method connect, will return false if DB2 returns an error and therefore the execution of the query has failed. In that case we print the error message with the IBM_DB:stmt_errormsg method. If the SQL statement was issued successfully the exec method returns a statement resource that we use to fetch the records (of course, for SELECT queries).

In order to fetch the results, we loop using the IBM_DB::fetch_assoc method that retrieves one record at a time. The loop will be interrupted when there’s no more rows left in the resultset. The IBM_DB::fetch_assoc method returns a hash whose keys correspond to the columns. By default, SQL columns are case insensitive, so unless you specified otherwise in a connection attribute, the hash will have uppercase keys.

The table EMPLOYEE has several columns, amongst others FIRSTNME (not a typo on my part), LASTNAME and EMPNO. In this snippet we print all the employees’ first and last names, along with their employee numbers. Finally, we clean up the resources associated with the resultset by executing IBM_DB::free_result. This operation, just like closing the connection, would have happened automatically if not specified, when the script ends.

A series of alternative (and similar) methods exist: fetch_array, fetch_both, fetch_object and fetch_row. Again, the API reference has all the details. They are very easy to use if you understand the manner in which we used fetch_assoc in the snippet above.

Parameterized queries

In the previous section we showed how to run an SQL query to interrogate a DB2 database from Ruby. While the approach adopted so far is perfectly acceptable, in some instances there is a better way. Chances are that sometimes you’ll find yourself repeating the same query over and over.

In some other scenarios, you may also run a series of queries which follow the same pattern except that the values provided vary from query to query. Consider for a moment the following queries:

SELECT * FROM USERS WHERE AGE > 20;
SELECT * FROM USERS WHERE AGE > 30;
SELECT * FROM USERS WHERE AGE > 40;
SELECT * FROM USERS WHERE AGE > 50;
SELECT * FROM USERS WHERE AGE > 60;
SELECT * FROM USERS WHERE AGE > 70;

These are identical except for the cut off value for the column AGE. In such cases, the best approach is to use parameterized queries. For those who aren’t too familiar with the expression, parameterized queries are queries where the SQL statement contains parameters whose values will be passed to the database without having to dynamically create the complete SQL query within your program.

This approach is not only less error prone and much more secure (e.g. in avoiding SQL Injections), but it can also drastically improve performance.

The following snippet shows an example of their usage:

require 'rubygems'
require 'ibm_db'

if conn = IBM_DB::connect("sample", "db2inst1", "mypassword")

  sql = " SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE SALARY BETWEEN ? AND ?"

  stmt = IBM_DB::prepare(conn, sql)
  values = [15000, 50000]

  begin
    if IBM_DB::execute(stmt, values)
      while row = IBM_DB::fetch_array(stmt)
        puts "#{row[0]} #{row[1]}"
      end
    else
      puts "Execution failed: #{IBM_DB::stmt_errormsg(stmt)}"
    end
  ensure
    IBM_DB::close(conn)
  end
else
  puts "Connection failed: #{IBM_DB::conn_errormsg}"
end

Let’s take a look at what’s going on here. We connect as usual but this time around we place question marks in the SQL query rather than the actual values. Then we prepare the statement through the method IBM_DB::prepare which accepts the connection handle and the parameterized SQL query as required arguments. The API reference for the driver has details for an optional third parameter that changes the way the cursor operates.

Behind the scenes, DB2 creates an optimized access plan for retrieving data with that statement. Subsequently issuing the prepared statement with IBM_DB::execute (not IBM_DB::exec) enables the statements to reuse that access plan and avoids the overhead of dynamically creating a new access plan for every statement we issue. Please note that we could bind the parameters explicitly by using the method IBM_DB::bind_param, which allows us to specify the parameter type (input, output, input-output or file), data type, precision, scale and so on. However this is usually only necessary for stored procedures where we intend to obtain information through output parameters.

If all the parameters are input ones, we can simply pass an array of values (aptly named ‘values’ in our snippet) to the IBM_DB::execute method. Notice that in the code above, we arbitrarily decided to use the IBM_DB::fetch_array method rather than IBM_DB::fetch_assoc as used in the previous one. Unlike the latter, with the former we use the numeric index 0 and 1 to access the values contained in the first two columns of the current record, since the method returns an array and not an associative array (a hash).

Stored Procedures

The ibm_db ruby driver fully supports stored procedures. Let’s start with an extremely simple example that is technically no different than running any other query.

require 'rubygems'
require 'ibm_db'

if conn = IBM_DB::connect("books", "db2inst1", "mypassword")

  drop_column_sql = "ALTER TABLE USERS DROP SAMPLE_COLUMN"
  reorg_sql = "CALL ADMIN_CMD('REORG TABLE USERS')"

  if IBM_DB::exec(conn, drop_column_sql)
    puts "Column dropped"
  else
    puts IBM_DB::stmt_errormsg
  end

  if IBM_DB::exec(conn, drop_column_sql)
    puts "Reorg successful"
  else
    puts IBM_DB::stmt_errormsg
  end

else
  puts "Connection failed: #{IBM_DB::conn_errormsg}"
end

The snippet above drops a column from a table. After dropping a column, DB2 requires the issuing of a REORG command. Unfortunately, IBM_DB::exec can only execute valid SQL statements and not DB2 command line processor (CLP) commands. Luckily though, the stored procedure ADMIN_CMD allows us to execute certain administrative commands, including REORG.

So, as you can see, in simple cases like this we can just use pass an SQL statement in which we call a stored procedure or user function to IBM_DB::exec. In practice, things are not always this straightforward. If we are using stored procedures, chances are that we need to handle output parameters too.

Let’s assume that we have a stored procedure called SALES_BY_TITLE which requires three parameters. The first parameter is the name of the author, the second one is the title of a book. Imagine that the second parameter is an input-output one, because we can provide a partial title for the book, and the procedure will provide us with the complete title if it exists.

The third (output) parameter will tell us the number of copies sold to date. From Ruby, we can handle this scenario quite easily:

require 'rubygems'
require 'ibm_db'

conn = IBM_DB::connect("books", "db2inst1", "mypassword")

if conn
  sql = "CALL SALES_BY_TITLE(?, ?, ?)"
  stmt = IBM_DB::prepare(conn, sql)

  author = "Allen Ginsberg"
  title = "Cosmo%"
  copies = 0

  IBM_DB::bind_param(stmt, 1, "author", IBM_DB::PARAM_IN)
  IBM_DB::bind_param(stmt, 2, "title", IBM_DB::PARAM_INOUT)
  IBM_DB::bind_param(stmt, 3, "copies", IBM_DB::PARAM_OUT)

  if IBM_DB::execute(stmt)
    puts "Title: #{title}"
    puts "Copies: #{copies}"
  else
    puts IBM_DB::stmt_errormsg(stmt)
  end
else
  puts "Connection failed: #{IBM_DB::conn_errormsg}"
end

As shown before, we CALL the stored procedure and use question marks as placeholders for the parameters in the SQL statement. Immediately after having prepared the statement, we need to initialize the three variables author, title and copies, and bind them to the prepared statement through the IBM_DB::bind_param method. The second argument of this method is the position of the parameter (starting from 1) and the third one is the type of parameter. Again, the API reference has more details about its usage.

After executing the statement (with the IBM_DB::execute method) the two variables, title and copies, which where respectively bound as INOUT and OUT parameters, will contain the full title of the book and the number of copies which we can print for the user to see.

While the IBM_DB Ruby driver provides us with many more methods, understanding this tutorial will give you enough material to get started with using the driver. Above all, I hope it will make the process of looking up specific methods in the online documentation much easier.

Next »