There’s an article on the official Google Research blog asserting that ‘nearly all binary searches … are broken’. What’s really interesting about it is that it’s not the algorithm that is broken, per se.
Allow me to digress for a moment. I don’t program in Java for a number of reasons. Mostly, it’s just that I don’t like the excessive baggage around the syntax (the absence of closures, for example). I suppose that the so-called ‘enterprise’ culture that’s built up around it is a bit of a turn-off too. Perhaps another thing is the fact that Java just isn’t radical enough: despite all the advances, it’s still hobbled by a C/C++ mindset that it doesn’t really need to have.
I don’t blindly hate it, though. Working in Ruby all day, I recognise that there are things that Ruby (and, by extension, the wider Ruby community) could learn from Java, like better memory handling, or how to address some of the weaknesses in ActiveRecord.
To return to the topic at hand, however, the Google blog entry shows a classic binary search algorithm, and points out a flaw:
int mid = (low + high) / 2;
What’s the problem with this? I’ll quote it:
Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.
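To make the arithmetic in that quote concrete, here is a rough Java sketch. The class name MidpointOverflow, the method name naiveMid, and the sample values are hypothetical, chosen only for illustration. They are not taken from the Google post.

```java
// Illustration only: the class and method names are made up for this sketch.
class MidpointOverflow {
    // The midpoint calculation as quoted: the sum is computed in 32-bit int arithmetic.
    static int naiveMid(int low, int high) {
        return (low + high) / 2;
    }

    public static void main(String[] args) {
        int low = 1_500_000_000;
        int high = 2_000_000_000; // each value fits in an int on its own

        // The true sum is 3,500,000,000, which is larger than
        // Integer.MAX_VALUE (2,147,483,647), so the int sum wraps around.
        System.out.println(low + high);          // -794967296
        System.out.println(naiveMid(low, high)); // -397483648
    }
}
```

Using that negative result as an array index is what triggers the exception in Java.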
You know what? This isn’t a problem with the algorithm itself. It’s a problem with the intersection of algorithm, language, and compiler. There’s nothing logically wrong with the statement; it’s just that it overflows in C and Java. That means that, if you are implementing it in C or Java (as opposed to pseudocode), you should rework the calculation to avoid this pitfall. One might even argue that a sufficiently advanced compiler ought to rephrase it automatically.
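One common way to rework the calculation is to add half the distance between the bounds to the lower bound, so that no intermediate value exceeds high. The following is a minimal sketch of that approach. The names SafeBinarySearch, safeMid, and search are hypothetical, and it assumes a sorted int array with non-negative indices.

```java
// Illustration only: names are hypothetical, not from any particular library.
class SafeBinarySearch {
    // With 0 <= low <= high, (high - low) is non-negative and cannot overflow,
    // and the result never exceeds high.
    static int safeMid(int low, int high) {
        return low + (high - low) / 2;
    }

    // Returns the index of key in the sorted array a, or -1 if it is absent.
    static int search(int[] a, int key) {
        int low = 0;
        int high = a.length - 1;
        while (low <= high) {
            int mid = safeMid(low, high);
            if (a[mid] < key) {
                low = mid + 1;   // key can only be in the upper half
            } else if (a[mid] > key) {
                high = mid - 1;  // key can only be in the lower half
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(safeMid(1_500_000_000, 2_000_000_000)); // 1750000000
        System.out.println(search(new int[] {1, 3, 5, 7, 9}, 7));  // 3
    }
}
```

In Java specifically, an unsigned right shift, (low + high) >>> 1, gives the same midpoint for non-negative bounds, because it treats the wrapped sum as an unsigned 32-bit value.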
In Ruby, however, this arbitrary limitation doesn’t exist. Values that are too big for Fixnum are automatically promoted to Bignum. 2^31 + 1 is not negative, for example. Whilst it may not be the optimal calculation, the above method will work.
So which is broken? The algorithm, or the language that breaks it?
2006-06-04 18:39 UTC.