Archive for the 'Python' Category

Announcing Ruby on Crack

Antonio Cangiano April 1st, 2008

After several months of keeping it under wraps, I’m happy to officially announce my own web framework to the world. It’s called Ruby on Crack and will be released by RailsConf 2008. The name of the framework was chosen because I wanted to push the idea of a complete break from the existing Ruby frameworks, a clear cut, if you will.

Rails is great, don’t get me wrong, but it’s very opinionated. If you need to get things done in a different way, coding in Rails can become a burden. More reasonable from this viewpoint is Merb, but it’s still somewhat too close to Rails. Ruby on Crack is very different. A programmer using Crack no longer has strong opinions or is constrained within the rules defined by the framework. The only guiding principle is FUD (Fast Useful Development), but you get to decide your own specific style of web development. The framework is very modular and each component is connected to another one through a unified API called PIPE.

According to my preliminary tests, using Crack makes you 5 to 10 times more productive than using Rails. Not only that, but speed wise, you’re running much faster with Crack. Ruby on Crack ships with two extremely fast web servers, as opposed to WEBrick with Rails, called Purebred and Fat. Purebred handles requests by spawning new threads, while Fat uses an event based approach. From what I’ve seen so far, they easily outperform any other existing webservers, to the point that I was able to serve a sample app at a rate of 10,000 requests per second on commodity hardware.

Part of the speed boost that Crack can give you, is due the highly efficient, thread-safe and powerful ORM called Freebase. Even Datamapper is really slow compared to Freebase. On average, Freebase smokes ActiveRecord (it’s 4 to 5 times faster) and it can take advantage of advanced database features which are not supported in ActiveRecord, Datamapper, Og or Sequel.

I don’t want to reveal too much right now, but Ruby on Crack is truly a revolutionary approach to web development and makes you value the true power and colorful nature of Ruby. All of this will be explained in detail in future posts, and the code will be released into the wild for all to enjoy within the next couple of months. It took a lot of effort to get my team to work on Crack, but the results are rather satisfying. Don’t just take my word for it though, here are a few testimonials from prominent figures in the community who’ve had a chance to use the closed beta of Ruby on Crack:

“Those who use Crack can really appreciate the beauty of Ruby.” — Matz

“Fuck You” — DHH

“Incredible framework. I plan to publish a book about it ASAP, and I’m sure that it’ll be the first in a long series of books that I’ll write while using Crack.” — Dave Thomas

“Crack is what we really needed at Engine Yard.” — Ezra Zygmuntowicz

“Wow, what an eye opener. Crack made web programming fun again.” — Obie Fernandez

“You ass#$!@ m0#@#!@&%$” — Zed Shaw

“Once you have tried Crack for the first time, you realize how addictive it is. I simply can’t go back to Rails anymore.” — April Pesce

“Web 3.0 will belong to those on Crack.” — Tim O’Reilly

“If there is something that the Ruby community got right it’s Crack. I wish Python programmers were on Crack too!” — Guido van Rossum

“This is truly innovative and a godsend for startups. Give us about 6 years and we’ll have Arc on Crack too.”- Paul Graham

Django’s tipping point

Antonio Cangiano March 20th, 2008

Django seems to have reached its tipping point, that critical mass which will enable its momentum to skyrocket. Getting here took a while though; partially because of a lack of hype and partially due to Rails’ very prominent presence in the market. Now this well deserving framework has finally begun to be widely adopted and considered as a valid alternative to Rails, for agile web development. Why do I care about what other people are going to use? I care because I’m deeply passionate about technology that works and that keeps things as simple as possible - as such forms of innovation always should. Independently from their adoption, promotion, and the differences in their approaches, both Django and Rails have at their core, a lot of substance and can greatly simplify and improve the way a web developer’s creative process flows.

Stating that Django has reached its tipping point is a bold claim, but I can present some evidence to back it up. I will use Rails and Ruby as a comparison for Django and Python - but don’t construe this as a race between the two frameworks. Rails is still the most popular and will probably continue to be for a long time. I’m only comparing the numbers to get an idea of where Django stands right now.

Visiting irc.freenode.net, I’ve noticed that the Python (#python) channel is often more populated than the Ruby one (#ruby-lang), and the same goes for Django (#django) and Ruby on Rails (#rubyonrails). For example, right now I see 517 members for Python and 354 for Ruby, 382 for Django and 298 for Rails. Django and Python consistently have more hackers in their chats than Rails and Ruby. This doesn’t say too much, given that the average developer doesn’t hang out on irc, but it’s still somewhat indicative of Django’s growing community.

Moving to newsgroups/Google Groups, things start to change a little. As I write this, there are 12,457 subscribers for comp.lang.python and only 6,935 for comp.lang.ruby (with 1,857 members in ruby-talk-google). “Django users” has 8,178 members versus the 13,355 of “Ruby on Rails: Talk”. So far this month, the Django group has had 1,244 messages versus the 2,890 of the Rails one. By looking at these numbers, without any pretense of being too scientific in our comparative methods, we get the impression that the Rails community is almost twice as big as the Django one, which sounds about right. On the other hand we also get that the Python community is larger than the Ruby one (confirmed also by the irc results above). In looking at these numbers, Rails also has the advantage of being the most used Ruby framework by far. In Python-land, Turbogears (3,303 members), Pylons (1,333 members) and good old Zope split the pie too, even though Django remains the most popular choice. Guido van Rossum’s blessing for Django was just the icing on the cake.

By observing the TIOBE Index, we see that Python is in 7th position versus the 10th position where we find Ruby. Perhaps more interestingly, Python has had a +0.70 delta since last March, while Ruby a -0.11%. Again, this is certainly not an exact science, my friends. TIOBE accuracy is often disputed for good reason, but I think it’s still an indicative factor.

Speaking of less than entirely reliable things, Alexa (django vs rails), Compete, and Google Trends (yes, rails is a very generic term) all confirm the anecdotal evidence that Rails is still far more popular. That said, the values start to be at least somewhat comparable.

In my opinion the strongest indicators of Django’s increasing popularity come from the publishing world. You’ll see many books in print for a given topic, only if their publishers believe that there is a large enough market for them. In 2007 the following two books were published: Professional Python Frameworks: Web 2.0 Programming with Django and Turbogears and The Definitive Guide to Django: Web Development Done Right (available for free online). 2008 has only just started and already there’s been one Django title published (Sams Teach Yourself Django in 24 Hours), with two further titles lined up: Practical Django Projects and Python Web Development with Django (which I’m currently reviewing for Pearson, as it’s in the process of being written - and I must say, I think it’s going to be a very good one).

5 books on Django announced to date and more lined up to be released this year, I’m sure. There are now many books in print that cover Rails (my recommendations here), but the sudden spur of Django books reminds me of Rails a couple of years ago and will surely help widen Django’s popularity. Watch closely because things will move fast in Django-land.

Using Python to detect the most frequent words in a file

Antonio Cangiano March 18th, 2008

Working with Python is nice. Just like Ruby, it usually doesn’t get in the way of my thought process and it comes “with batteries included”. Let’s consider the small task of printing a list of the N most frequent words within a given file:

from string import punctuation
from operator import itemgetter

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

I won’t provide a step by step explanation of what I believe is already rather understandable. There are however a few tricky considerations to be made on behalf of those who are not too familiar with the language. First and foremost, I love using Generator Expressions because they are lazily evaluated and have a math-like readability. It’s just a very convenient way of crating generator objects. Notice how in the snippet I favor them over the option of placing a whole file into a string by concatenating the read() method to the open() one. Doing so results in a significant performance improvement for large files. Generator Expressions and List Comprehension are extremely useful language features which are inherited from the world of functional programming, and I’m glad that Python fully embraces them.

In the third for loop we count words and add them and their respective frequencies to the words dictionary (similar to a Ruby Hash). Notice how the method get() enabled us to specify a default value before incrementing the counter, in case the given key didn’t exist yet (which means that the word we were adding hadn’t been encountered before). We pass operator.itemgetter() as a keyword argument (another nice Python feature) to the sorted() function. itemgetter() returns a callable object that fetches the given item(s) from its operand which, in our case, essentially means that we can tell sorted() to sort based on the value of the dictionary’s items (the frequency of the words) rather than based on the keys (the words themselves).

Unfortunately there is a problem with this code. It will correctly sort the most popular words in the file, but equally represented words won’t be alphabetically ordered. Given that we specified a reverse order for the sorted() function, we could simply pass it key=itemgetter(1, 0) to order (in descending order) by value first and by key second. But let’s be realistic. In most cases, you want to have these type of keys whose values are equal, be alphabetically ordered (in ascending order). With a few changes to the code, this can be easily achieved:

from string import punctuation

def sort_items(x, y):
    """Sort by value first, and by key (reverted) second."""
    return cmp(x[1], y[1]) or cmp(y[0], x[0])

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(), cmp=sort_items, reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

Previously we specified what “key” should we use for sorting, while in this case we now have a much greater deal of control. By defining the function sort_items() and passing a pointer to it for the cmp argument of the function sorted(), we get to define how the comparison amongst the items of the dictionary should be carried out. The function that we defined at the beginning of the script will return -1, 0 or 1, depending on how the two key-value pairs compare. The returned value is cmp(x[1], y[1]) or cmp(y[0], x[0]). This may seem complicated but the trick is rather easy. The first part compares the frequencies of the two words and returns 1 or -1 if one is greater than the other. If they are equal, the expression to the left of the or will be 0, therefore the expression on the right of the or will be returned. On the right we compare the keys (the words), but invert the order of the arguments y and x to reverse the effects of the reversed ordering defined in sorted().

Finally, for those who prefer to use a lambda expression, rather than to define a function, we can write the following:

from string import punctuation

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(),
                   cmp=lambda x, y: cmp(x[1], y[1]) or cmp(y[0], x[0]),
                   reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

Or simplified further by getting rid of reverse=True and using key rather than cmp:

from string import punctuation

N = 10
words = {}

words_gen = (word.strip(punctuation).lower() for line in open("test.txt")
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(),
                   key=lambda(word, count): (-count, word))[:N] 

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

Please bear in mind that the code makes a few assumptions so as to keep things simple. As it stands, the script would consider “l’amore” as a single word, and an accidental lack of spaces wouldn’t be accounted for (e.g. “word.Another” would be a single word too). The replace() method can be used to address these sorts of special cases.

Sure, this was a rather trivial example, born from an iPython session, but I think it gives away Python’s expressiveness and flexibility when dealing with problems that, approached in some other languages, would be much more error prone and verbose. Batteries included indeed.

Next »