Archive for March, 2008

This Week in Ruby (March 31, 2008)

Antonio Cangiano March 31st, 2008

One of the main advantages of Ruby’s growing community is the speed at which exciting news pops up. This post briefly covers must-read highlights of new developments in the Ruby and Rails communities over the past week. I’ll attempt to publish a “This Week in Ruby” post approximately every week, so feel free to follow along by subscribing to my feed.

Rails 2

Craig Webster published the first part of an easy-to-follow tutorial entitled Getting Started with Rails 2.0. The nice thing is that it also covers version control through Git, which is becoming extremely common in the Ruby/Rails community. Speaking of trends, a study by Gartner predicts that the Ruby language will reach 4 million programmers within the next 5 years, a number that would definitely position Ruby as a mainstream programming language. Incidentally, Obie Fernandez published an interesting survey of major corporations and large companies that are embracing Ruby on Rails. It’s a very impressive list that is destined to grow quickly as Ruby’s implementation and ecosystem improve.

The official Rails blog has a post about a few changes in the core Rails team, which bring the number of hackers down to 6 and establish a Core alumni group of “retired” Rails core programmers. The small team seems to be busy, as Rails 2.1 is about to be released with some exciting new features, such as migrations based on UTC timestamps, named_scope, gems as plugins and much more, as outlined by this caboo.se post.

On the deployment side, there are a few exciting happenings. Apple has published the third part of their tutorial, titled Deploying Rails Applications on Mac OS X Leopard. HighScalability.com has an article about an incredible service from Heroku that offers one-click, hassle-free, cloud-based Rails deployment. The service is still in beta and takes advantage of Amazon Web Services, but from what I’ve seen so far, their secure and scalable deployment is as easy as it gets, and it’s going to be a welcome addition to the deployment options available to Rails programmers. Even more interestingly, a Dutch company called Phusion put up a demo of their Passenger project. Phusion Passenger, aka mod_rails, is a module for Apache that aims to bring ease of deployment, stability and performance to Rails. The company claims that, according to their tests, mod_rails should be slightly faster than Apache+Mongrel. It manages Rails processes automatically, spawning them according to the actual traffic, and Phusion is working with the largest hosts to ensure real-world performance and stability under heavy load.

James Golick continues his series of “Plugins I’ve Known and Loved”, this time covering Ultrasphinx, a plugin for the uber-fast search daemon Sphinx.

Nicolas Sanguinetti wrote a script that attempts to speed up the process of setting up an empty Rails project. I don’t find the procedure particularly boring, so I don’t plan to use it, but it might be useful to some people.

Ruby

Peter Cooper, of Ruby Inside, has an article about two re-implementations of the Ruby language. They are both named Sapphire, which adds to the confusion. The first one is a real fork, apparently born from the author’s dissatisfaction with the current management of MRI and the desire to have better support for Windows along with, of course, extra features like Aspect Oriented Programming, Unicode, etc. The second one appears to be a new language, a stripped-down and “stricter” version of Ruby that is supposed to be more Smalltalk-like and would run, at least in the beginning, on the .NET platform. They are both worth mentioning, but currently I honestly fail to get excited about them. Ruby has an implementation problem, not a design one, in my opinion. However, we’ll see what they are able to deliver. In my book, a good implementation of a Ruby-like language would be far from being frowned upon.

Remaining in the field of new implementations, InfoQ has an article about a neat new addition called HotRuby which is able to execute opcodes generated by the Ruby 1.9 VM (aka YARV). HotRuby is an incomplete and tiny (40 KB) implementation, faster than Ruby 1.9, that runs on JavaScript and Flash.

For those interested in Ruby and compilers, Vidar Hokstad has published the first two parts of a tutorial called “Writing a compiler in Ruby bottom up” (part 1 and part 2). Even if you are not into compilers, it’s a rather interesting read.

Sun reaffirmed their commitment to Ruby this week, by announcing the Ruby Development Center on the Sun Developer Network. I like Sun’s serious efforts and fast paced development of both JRuby and NetBeans (whose 6.1 beta is now out).

A new project hit the front page of programming.reddit.com: StrokeDB. Odd name, I know, but if you are interested in CouchDB, you may also want to take an in-depth look at StrokeDB. It’s a project that sounds rather promising if properly implemented. In their own words: “StrokeDB is an embeddable distributed document database written in Ruby. It is schema-free, it scales infinitely, it even tracks revisions and perfectly integrates with Ruby applications.” The developers gave a talk at EURUKO 2008, the European Ruby conference that took place this past weekend. Videos from the conference are not up yet, but meanwhile you can enjoy the videos and slides from MountainWest RubyConf 2008 on ConFreaks, which has published talks by Evan Phoenix and Ezra Zygmuntowicz about Rubinius and Merb, respectively.

Finally, last week I spotted a bug that made Ruby 1.9 (built from trunk) significantly slower than Ruby 1.8. After a bit of investigation I was able to single out that the problem concerned Mac OS X only. With some testing by Chris Shea, the exact culprit revision was isolated and the core team has already worked on a fix. It was a very prompt and impressive response by Matz and his team.

RubyGems

Eric Hodel has announced the release of RubyGems 1.1.0. Aside from bugfixes and a couple of minor features, the most welcome improvement is a significant speed boost. If you’d like to hear Eric talk about RubyGems and his involvement with the Ruby community, this week InfoQ published an interview with him (the interview itself is from RubyConf 2007 though).

This is my first episode of “This Week in Ruby” so feel free to provide feedback if you found it useful or if you have suggestions for improving it. Thank you for reading!

‘inject’, ‘each’ and ‘times’ methods much slower in Ruby 1.9 on Mac OS X

Antonio Cangiano March 25th, 2008

Jay Fields has a nice post about the goodness of Enumerable#inject in Ruby. Like anyone who enjoys functional programming, I appreciate the concept behind inject and its possible usage. Your language may implement it with a method called inject, or a function called fold or reduce, but the concept is the same. I think that there is nothing inherently evil about using Enumerable#inject in your code; however, one must be very careful and question whether there are simpler alternatives available. Overly clever or creative use of inject is often paid for in terms of the readability of the code and can’t be discouraged enough. Another good reason not to use inject is when performance counts and we are dealing with large datasets. In Ruby 1.8, in fact, inject is much slower than an equivalent solution implemented with the each iterator or other available methods. People will tell you that this doesn’t matter, but in the real world, having a method take twice as long because you opted for inject rather than each is not always justifiable or acceptable.

For example, the following snippet benchmarks four different methods for summing the first n natural numbers. It benchmarks them for n = 10, 10^2, 10^3, 10^4, 10^5, 10^6, 10^7 and 10^8. Please note that the problem at hand can be solved arithmetically, like Gauss did when he was 10, but it will serve us well to test differences in the raw speed of the compared methods provided by Ruby.

sum.png
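As a quick aside before the benchmark code: the arithmetic shortcut is the closed form n(n + 1)/2. Here is a minimal sketch of it, where the method name gauss_sum is my own choice for illustration and is not part of the benchmark:

```ruby
# Closed-form sum of the first n natural numbers: n * (n + 1) / 2.
# gauss_sum is a hypothetical helper name used only for illustration.
def gauss_sum(n)
  n * (n + 1) / 2
end

# Cross-check the formula against an iterative sum for a few sizes.
[10, 100, 1_000].each do |n|
  iterative = (1..n).inject(0) { |sum, i| sum + i }
  raise "mismatch for n=#{n}" unless gauss_sum(n) == iterative
end

puts gauss_sum(100) # prints 5050
```

It runs in constant time regardless of n, which is exactly why it is useless for comparing the raw speed of iterators.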

The sum_with_while is admittedly ugly, and entirely not idiomatic. No Rubyist would write the method in that way. It’s there though, in order to have a non-iterator, loop-based solution which doesn’t rely on methods from the Enumerable class. It will give us some perspective. Also, the choice of reopening the Integer class, rather than writing methods that accept n as an argument, is entirely arbitrary and doesn’t affect the benchmark in any way.

require 'benchmark'
include Benchmark

class Integer
  def sum_with_inject
    (1..self).inject(0) { |sum, i| sum + i }
  end

  def sum_with_each
    sum = 0
    (1..self).each { |i| sum += i }
    sum
  end

  def sum_with_times
    sum = 0
    # times yields 0 up to self - 1, so run self + 1 times to include self
    (self + 1).times { |i| sum += i }
    sum
  end

  def sum_with_while
    sum, i = 0, 1
    while i <= self
      sum += i
      i += 1
    end
    sum
  end
end

(1..8).each do |p|
  n = 10**p
  puts "=== 10^#{p} ==="
  benchmark do |x|
    x.report("inject: ") { n.sum_with_inject }
    x.report("each:   ") { n.sum_with_each }
    x.report("times:  ") { n.sum_with_times }
    x.report("while:  ") { n.sum_with_while }
  end
end

These are the results on my MacBook Pro Core 2 Duo 2.2GHz with ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-darwin9.2.0]:

    === 10^1 ===
    inject:   0.000000   0.000000   0.000000 (  0.000071)
    each:     0.000000   0.000000   0.000000 (  0.000020)
    times:    0.000000   0.000000   0.000000 (  0.000018)
    while:    0.000000   0.000000   0.000000 (  0.000022)
    === 10^2 ===
    inject:   0.000000   0.000000   0.000000 (  0.000084)
    each:     0.000000   0.000000   0.000000 (  0.000043)
    times:    0.000000   0.000000   0.000000 (  0.000041)
    while:    0.000000   0.000000   0.000000 (  0.000058)
    === 10^3 ===
    inject:   0.000000   0.000000   0.000000 (  0.000677)
    each:     0.000000   0.000000   0.000000 (  0.000335)
    times:    0.000000   0.000000   0.000000 (  0.000338)
    while:    0.000000   0.000000   0.000000 (  0.000492)
    === 10^4 ===
    inject:   0.010000   0.000000   0.010000 (  0.008473)
    each:     0.010000   0.000000   0.010000 (  0.003258)
    times:    0.000000   0.000000   0.000000 (  0.003142)
    while:    0.000000   0.000000   0.000000 (  0.004887)
    === 10^5 ===
    inject:   0.120000   0.000000   0.120000 (  0.115180)
    each:     0.060000   0.000000   0.060000 (  0.071298)
    times:    0.070000   0.000000   0.070000 (  0.063845)
    while:    0.080000   0.000000   0.080000 (  0.079443)
    === 10^6 ===
    inject:   1.410000   0.010000   1.420000 (  1.414042)
    each:     0.870000   0.000000   0.870000 (  0.882625)
    times:    0.870000   0.000000   0.870000 (  0.875730)
    while:    1.030000   0.000000   1.030000 (  1.039151)
    === 10^7 ===
    inject:  14.320000   0.030000  14.350000 ( 14.351873)
    each:     9.050000   0.030000   9.080000 (  9.086909)
    times:    9.120000   0.020000   9.140000 (  9.137516)
    while:   10.610000   0.020000  10.630000 ( 10.654512)
    === 10^8 ===
    inject: 144.070000   0.310000 144.380000 (144.517040)
    each:    90.380000   0.360000  90.740000 ( 90.898424)
    times:   89.960000   0.280000  90.240000 ( 90.476917)
    while:  106.760000   0.380000 107.140000 (107.741355)

These results are not surprising. The ugly sum_with_while method is somewhat faster than the sum_with_inject one, while sum_with_each and sum_with_times are significantly faster than sum_with_inject. For n sufficiently large, each is (linearly) about 1.6 times faster than inject. Again, this is nothing new: if you have done some number crunching with Ruby, you already know a few tricks and the fact that inject is notoriously slow.

Just out of curiosity, I decided to see how things have improved with Ruby 1.9. Here comes the surprise: ruby 1.9.0 (2008-03-21 revision 15825) [i686-darwin9.2.0] yields the following results.

    === 10^1 ===
    inject:   0.000000   0.000000   0.000000 (  0.000045)
    each:     0.000000   0.000000   0.000000 (  0.000038)
    times:    0.000000   0.000000   0.000000 (  0.000040)
    while:    0.000000   0.000000   0.000000 (  0.000016)
    === 10^2 ===
    inject:   0.000000   0.000000   0.000000 (  0.000311)
    each:     0.000000   0.000000   0.000000 (  0.000293)
    times:    0.000000   0.000000   0.000000 (  0.000292)
    while:    0.000000   0.000000   0.000000 (  0.000014)
    === 10^3 ===
    inject:   0.000000   0.000000   0.000000 (  0.003008)
    each:     0.000000   0.000000   0.000000 (  0.002914)
    times:    0.000000   0.000000   0.000000 (  0.002891)
    while:    0.000000   0.000000   0.000000 (  0.000079)
    === 10^4 ===
    inject:   0.020000   0.010000   0.030000 (  0.029825)
    each:     0.020000   0.010000   0.030000 (  0.028448)
    times:    0.020000   0.010000   0.030000 (  0.028573)
    while:    0.000000   0.000000   0.000000 (  0.000717)
    === 10^5 ===
    inject:   0.230000   0.090000   0.320000 (  0.315088)
    each:     0.210000   0.100000   0.310000 (  0.306036)
    times:    0.210000   0.090000   0.300000 (  0.304437)
    while:    0.030000   0.000000   0.030000 (  0.021261)
    === 10^6 ===
    inject:   2.410000   0.920000   3.330000 (  3.332743)
    each:     2.320000   0.930000   3.250000 (  3.259242)
    times:    2.320000   0.920000   3.240000 (  3.233116)
    while:    0.320000   0.000000   0.320000 (  0.328699)
    === 10^7 ===
    inject:  24.190000   9.150000  33.340000 ( 33.387602)
    each:    23.350000   9.180000  32.530000 ( 32.795653)
    times:   23.220000   9.100000  32.320000 ( 32.394359)
    while:    3.360000   0.020000   3.380000 (  3.377791)
    === 10^8 ===
    inject: 243.020000  91.920000 334.940000 (335.498686)
    each:   234.440000  92.490000 326.930000 (332.153885)
    times:  234.680000  93.470000 328.150000 (355.418129)
    while:   33.840000   0.160000  34.000000 ( 34.151623)

Look carefully: the good news is that while has been sped up by a factor of 3. You may notice that the execution times of the methods which employ inject, each and times are essentially the same in Ruby 1.9. This could be considered good news too, if we didn’t take a look at the previous output. All three methods are roughly 2 to 4 times slower in Ruby 1.9 than in Ruby 1.8. each and times were both faster than while and are now 10 times slower than it! Forget about inject for a second, which after all can be avoided; idiomatic Ruby code uses each iterators all over the place. Am I missing something, or Houston, do we have a problem? This loss of performance, if confirmed, should be considered a bug.
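To make the comparison concrete, here is a small sketch that computes the slowdown from the wall-clock times for n = 10^8 in the two listings above. The constant and method names are mine and exist only for this illustration:

```ruby
# Wall-clock seconds for n = 10^8, copied from the Ruby 1.8.6 and
# Ruby 1.9.0 listings above (both on Mac OS X).
SECONDS_18 = { "inject" => 144.517040, "each"  => 90.898424,
               "times"  => 90.476917,  "while" => 107.741355 }
SECONDS_19 = { "inject" => 335.498686, "each"  => 332.153885,
               "times"  => 355.418129, "while" => 34.151623 }

# A ratio above 1 means Ruby 1.9 took longer than Ruby 1.8.
def slowdown(method_name)
  SECONDS_19[method_name] / SECONDS_18[method_name]
end

%w[inject each times while].each do |name|
  printf("%-7s 1.9/1.8 = %.2f\n", name + ":", slowdown(name))
end
```

On these numbers inject comes out at about 2.3, each and times at roughly 3.6 to 3.9, and while at about 0.32, i.e. three times faster.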

To expose the extent of this regression, let’s see how JRuby 1.1RC3 (with the -J-server option enabled) performs:

    ==== 10^1 ===
    inject:   0.150000   0.000000   0.150000 (  0.150000)
    each:     0.001000   0.000000   0.001000 (  0.001000)
    times:    0.001000   0.000000   0.001000 (  0.001000)
    while:    0.000000   0.000000   0.000000 (  0.000000)
    ==== 10^2 ===
    inject:   0.001000   0.000000   0.001000 (  0.002000)
    each:     0.001000   0.000000   0.001000 (  0.001000)
    times:    0.001000   0.000000   0.001000 (  0.001000)
    while:    0.000000   0.000000   0.000000 (  0.001000)
    ==== 10^3 ===
    inject:   0.011000   0.000000   0.011000 (  0.011000)
    each:     0.017000   0.000000   0.017000 (  0.018000)
    times:    0.012000   0.000000   0.012000 (  0.012000)
    while:    0.004000   0.000000   0.004000 (  0.003000)
    ==== 10^4 ===
    inject:   0.092000   0.000000   0.092000 (  0.091000)
    each:     0.020000   0.000000   0.020000 (  0.020000)
    times:    0.018000   0.000000   0.018000 (  0.018000)
    while:    0.008000   0.000000   0.008000 (  0.008000)
    ==== 10^5 ===
    inject:   0.059000   0.000000   0.059000 (  0.058000)
    each:     0.030000   0.000000   0.030000 (  0.030000)
    times:    0.031000   0.000000   0.031000 (  0.031000)
    while:    0.024000   0.000000   0.024000 (  0.024000)
    ==== 10^6 ===
    inject:   0.377000   0.000000   0.377000 (  0.377000)
    each:     0.269000   0.000000   0.269000 (  0.268000)
    times:    0.251000   0.000000   0.251000 (  0.251000)
    while:    0.153000   0.000000   0.153000 (  0.154000)
    ==== 10^7 ===
    inject:   3.411000   0.000000   3.411000 (  3.411000)
    each:     2.558000   0.000000   2.558000 (  2.558000)
    times:    2.492000   0.000000   2.492000 (  2.493000)
    while:    1.386000   0.000000   1.386000 (  1.385000)
    ==== 10^8 ===
    inject:  33.743000   0.000000  33.743000 ( 33.743000)
    each:    25.408000   0.000000  25.408000 ( 25.408000)
    times:   25.127000   0.000000  25.127000 ( 25.128000)
    while:   13.834000   0.000000  13.834000 ( 13.835000)

sum_with_while (the quickest one of the lot) is 2.5 times faster in JRuby than in Ruby 1.9, and almost 8 times faster than in Ruby 1.8. inject, each and times are 10 to 14 times faster in JRuby than in Ruby 1.9.

Believe it or not, even Rubinius is faster than Ruby 1.9 (excluding while):

    === 10^1 ===
    inject:   0.000222   0.000000   0.000222 (  0.000230)
    each:     0.000152   0.000000   0.000152 (  0.000154)
    times:    0.000140   0.000000   0.000140 (  0.000141)
    while:    0.000148   0.000000   0.000148 (  0.000149)
    === 10^2 ===
    inject:   0.000390   0.000000   0.000390 (  0.000394)
    each:     0.000232   0.000000   0.000232 (  0.000239)
    times:    0.000164   0.000000   0.000164 (  0.000165)
    while:    0.000153   0.000000   0.000153 (  0.000165)
    === 10^3 ===
    inject:   0.001210   0.000000   0.001210 (  0.001208)
    each:     0.000827   0.000000   0.000827 (  0.000828)
    times:    0.000396   0.000000   0.000396 (  0.000388)
    while:    0.000223   0.000000   0.000223 (  0.000224)
    === 10^4 ===
    inject:   0.015619   0.000000   0.015619 (  0.015602)
    each:     0.006985   0.000000   0.006985 (  0.006980)
    times:    0.002393   0.000000   0.002393 (  0.002397)
    while:    0.001104   0.000000   0.001104 (  0.001105)
    === 10^5 ===
    inject:   0.212239   0.000000   0.212239 (  0.212221)
    each:     0.155911   0.000000   0.155911 (  0.155898)
    times:    0.114280   0.000000   0.114280 (  0.114267)
    while:    0.081416   0.000000   0.081416 (  0.081400)
    === 10^6 ===
    inject:   2.339344   0.000000   2.339344 (  2.339332)
    each:     1.880382   0.000000   1.880382 (  1.880340)
    times:    1.441064   0.000000   1.441064 (  1.441048)
    while:    1.277817   0.000000   1.277817 (  1.277779)
    === 10^7 ===
    inject:  23.969469   0.000000  23.969469 ( 23.969452)
    each:    20.468222   0.000000  20.468222 ( 20.468203)
    times:   16.952604   0.000000  16.952604 ( 16.952586)
    while:   14.341948   0.000000  14.341948 ( 14.341921)
    === 10^8 ===
    inject: 285.504220   0.000000 285.504220 (285.504203)
    each:   233.995257   0.000000 233.995257 (233.995240)
    times:  200.869457   0.000000 200.869457 (200.869444)
    while:  196.518120   0.000000 196.518120 (196.518106)

Is this issue with Ruby 1.9 being addressed?

Update:

Upon further investigation, the issue has been confirmed on Mac OS X, but it doesn’t exist on Linux. From within a virtual machine running Ubuntu on my Mac, I got the following results:

    === 10^1 ===
    inject:   0.000000   0.000000   0.000000 (  0.000017)
    each:     0.000000   0.000000   0.000000 (  0.000009)
    times:    0.000000   0.000000   0.000000 (  0.000009)
    while:    0.000000   0.000000   0.000000 (  0.000007)
    === 10^2 ===
    inject:   0.000000   0.000000   0.000000 (  0.000025)
    each:     0.000000   0.000000   0.000000 (  0.000031)
    times:    0.000000   0.000000   0.000000 (  0.000024)
    while:    0.000000   0.000000   0.000000 (  0.000013)
    === 10^3 ===
    inject:   0.000000   0.000000   0.000000 (  0.000253)
    each:     0.000000   0.000000   0.000000 (  0.000131)
    times:    0.000000   0.000000   0.000000 (  0.000144)
    while:    0.000000   0.000000   0.000000 (  0.000001)
    === 10^4 ===
    inject:   0.000000   0.000000   0.000000 (  0.000000)
    each:     0.000000   0.000000   0.000000 (  0.000000)
    times:    0.000000   0.000000   0.000000 (  0.000000)
    while:    0.000000   0.000000   0.000000 (  0.000000)
    === 10^5 ===
    inject:   0.040000   0.000000   0.040000 (  0.028453)
    each:     0.020000   0.000000   0.020000 (  0.027007)
    times:    0.030000   0.000000   0.030000 (  0.025450)
    while:    0.010000   0.000000   0.010000 (  0.015040)
    === 10^6 ===
    inject:   0.360000   0.000000   0.360000 (  0.358824)
    each:     0.340000   0.000000   0.340000 (  0.333370)
    times:    0.370000   0.000000   0.370000 (  0.378156)
    while:    0.240000   0.000000   0.240000 (  0.243033)
    === 10^7 ===
    inject:   4.040000   0.000000   4.040000 (  4.046872)
    each:     4.120000   0.010000   4.130000 (  4.160456)
    times:    3.530000   0.000000   3.530000 (  3.544973)
    while:    2.610000   0.000000   2.610000 (  2.613019)
    === 10^8 ===
    inject:  44.870000   0.040000  44.910000 ( 45.146779)
    each:    37.590000   0.020000  37.610000 ( 37.803406)
    times:   36.550000   0.010000  36.560000 ( 36.618813)
    while:   26.690000   0.010000  26.700000 ( 26.747007)

Conclusion: the inject, each and times methods are much slower in Ruby 1.9 on Mac OS X. As you can see from the comments, this seems to be caused by the fact that a call to sigsetjmp() was introduced in revision 15124, and that function happens to be much slower on Mac OS X than it is on Linux.
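If you want to check whether a given build shows the same symptom, a rough sketch is to compare each against while on a single input size. The size N and the reading of the ratio are my own assumptions: in the listings above, the affected Mac OS X build had each roughly 10 times slower than while, whereas on Linux the ratio stayed below 2.

```ruby
require 'benchmark'

# N is an arbitrary size, chosen only to keep the check quick.
N = 1_000_000

# Block-based iteration, the path affected by the regression.
each_time = Benchmark.realtime do
  sum = 0
  (1..N).each { |i| sum += i }
end

# Plain loop with no block call per iteration, used as the baseline.
while_time = Benchmark.realtime do
  sum, i = 0, 1
  while i <= N
    sum += i
    i += 1
  end
end

ratio = each_time / while_time
puts "#{RUBY_VERSION} on #{RUBY_PLATFORM}: each/while = #{'%.1f' % ratio}"
```

A single run like this is noisy, so treat the ratio as a hint rather than a measurement.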

Google Translate’s bug and Google Suggest’s racial oddity

Antonio Cangiano March 22nd, 2008

Google Translate

You may have heard about Google launching their AJAX Language API. Translations on the fly via JavaScript: sweet! Google Translate is not that bad, usually. It still messes up quite a few things in translation, but overall it’s pretty acceptable.

Google uses statistical learning techniques, as opposed to a rule-based approach. From their FAQs:

Most state-of-the-art, commercial machine-translation systems in use today have been developed using a rule-based approach, and require a lot of work to define vocabularies and grammars.

Our system takes a different approach: we feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model. We’ve achieved very good results in research evaluations.

It’s a very hard problem to solve and the quality can be so-so at times. However, I’m going to unveil the most ridiculous bug I’ve ever encountered using this system. Do you notice anything strange in the following translation from German to English first, and from German to French second?

GERMAN: Output: 4 – 600 Ohm Made in Austria!! Funktionstüchtig! Die Kopfhörer haben einen Spitzen Sound der unverfälscht wieder gegeben wird!! Die Qualität der Kopfhörer ist einfach Spitze.

ENGLISH: Output: 4 - 600 ohms Made in USA! Funktionstüchtig! The headphones have a peak sound of the genuine will be given again! The quality of the headphones is simple tip.

FRENCH: Output: 4 - 600 Ohm Made in France! Fonctionne! Les casques ont un peu de son authentique sera à nouveau! La qualité des écouteurs est facile de pointe.

Clearly you should see an issue here. In case you don’t, I’ll be more explicit:

madeinusa.png

“Google Translate” sometimes changes the country mentioned within the source language to the main country of the translation language. That’s a pretty big bug they have right there. Certain terms should be translated verbatim using dictionary mapping, especially something as simple and hardcoded as countries.
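To illustrate what I mean by dictionary mapping, here is a rough sketch of one way it could work: mask the known terms, translate the rest, then restore each term from a fixed table. Everything here is hypothetical (the COUNTRIES table, the method name, the placeholder format and the stubbed translator), and it says nothing about how Google’s system is actually built.

```ruby
# Hypothetical per-language table of country names.
COUNTRIES = {
  "Austria" => { "en" => "Austria", "fr" => "Autriche" },
  "Germany" => { "en" => "Germany", "fr" => "Allemagne" }
}

# Swap each known country for an opaque placeholder, hand the masked
# text to the translator block, then substitute the table entry back.
def translate_with_protected_terms(text, target)
  found = []
  masked = text.gsub(Regexp.union(COUNTRIES.keys)) do |name|
    found << name
    "__TERM#{found.size - 1}__"
  end
  translated = yield(masked)
  translated.gsub(/__TERM(\d+)__/) { COUNTRIES[found[$1.to_i]][target] }
end

# Stand-in for the statistical translator: it only knows one phrase.
result = translate_with_protected_terms("Made in Austria", "fr") do |s|
  s.sub("Made in", "Produit en")
end
puts result # prints "Produit en Autriche"
```

A real translator might mangle the placeholders, so this only shows the idea: the country never reaches the statistical model, which therefore cannot “correct” it to the country it has seen most often.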

Thanks to my friend Ludo who noticed this bug.

Google Suggest’s racial oddity

While we are on the topic of Google bugs and anomalies, I’ll add a small oddity to the mix. I must prefix this part of my post by clarifying that I respect all ethnicities and colors and have good friends from all over the place. I am against racism, but not against discussions about racism. I won’t publish anyone’s racist or offensive comments, be warned. What this post does is merely point out Google Suggest’s selective behavior, which of course gets picked up by Firefox’s Google search box in the top corner, too.

Google suggestions are based on the number of queries received and the number of results for any given query. This means that entering words in Google Suggest will reveal the most likely queries starting with that given term(s). In Google’s own words:

Our algorithms use a wide range of information to predict the queries users are most likely to want to see.

For example, if I write “money is”, Google will suggest: “money is the root of all evil”, “money is debt”, “money is power”, “money is everything” and so on.

I’m an Italian programmer, so I tried “programmers are” and got the hilarious suggestion that “programmers are lazy”. :) Alright, what about “Italians are”? Here are the results:

italians.png

Some people are racist; that’s nothing new. These are stereotypes, for and against Italians, and this shouldn’t surprise anyone. And you can’t really blame Google either for what people have been typing in the most. Google is suggesting, automatically, based on the most popular queries. Okay, that’s for Italians. What about other nationalities? The most common stereotypes are all well represented: Americans, French, Germans, Spanish, Chinese, Indians, etc. What about “whites” in general?

whites.png

Sad, I know. The picture doesn’t change too much if you are looking for “Christians are”, “Muslims are”, “Jews are”, “gays are”, “Cops are”, “men are”, “women are” and so on.

Google won’t suggest anything if the queries are not popular enough. This means that “Caucasians are” is not going to yield any suggestions, but “Caucasians” (alone) will. Google could do a couple of things: either blacklist the few dozen racial terms which are popular enough to show up in the suggestions, or simply decide that by policy, suggestions are automated and therefore, if you are looking for stupid queries based on race, you shouldn’t get offended by the suggestions that you receive back.
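The two behaviors described above (a popularity threshold and an optional blacklist) can be sketched in a few lines. This is only a toy model under my own assumptions: the query counts are invented, and the method name, threshold and parameters are hypothetical, not Google’s.

```ruby
# Hypothetical query log: query text => number of times it was searched.
QUERY_COUNTS = {
  "money is the root of all evil" => 900,
  "money is debt"                 => 400,
  "money is power"                => 350,
  "money is everything"           => 300,
  "money isn't real"              => 3
}

# Return the most popular logged queries that start with the prefix,
# skipping rare ones and any that contain a blocked term.
def suggest(prefix, min_count: 10, blocked: [], limit: 4)
  QUERY_COUNTS
    .select { |q, n| q.start_with?(prefix) && n >= min_count }
    .reject { |q, _| blocked.any? { |term| q.include?(term) } }
    .sort_by { |_, n| -n }
    .first(limit)
    .map { |q, _| q }
end

p suggest("money is")
p suggest("money is", blocked: ["evil"])
```

The point of the sketch is that the blocked list is a policy input applied uniformly to every query. The design question is what goes in it, and whether the same rule is applied to every group.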

A few years ago there used to be very reprehensible suggestions against black people, as one would expect given the results for the other ethnicities and nationalities, and the racism that unfortunately still exists today. A while ago though, Google did something rather odd. They removed “blacks” from the list, possibly after receiving complaints, and left everyone else in the suggestion engine. If you search for “blacks are” you won’t find any suggestions. And I’m pretty sure that query is still as popular as it ever was, just as queries containing “whites are”, “Greeks are” or “Christians are” are. On top of that, even if you just search for “blacks” the engine will not suggest anything. To further convince you, even if you were to search for an unusual term like “purples” you’d still get two suggestions: “purples 80s” and “purples wxsand”. If this exclusion was the right thing to do, then Google should do the same for other groups as well. If it wasn’t, then why favor only one group?

I don’t know if we should consider this a form of “selective racism”, but it’s odd and I thought I’d point it out even if the subject is very delicate and risky. If you think about it, it’s not even a racial problem, it’s more of a question of how to make software engineering decisions that properly and equally handle potentially offensive outcomes for some of your users.
