Broken algorithm, or broken language?

There’s an article on the official Google Research blog asserting that ‘nearly all binary searches … are broken’. What’s really interesting about it is that it’s not the algorithm that is broken, per se.

Allow me to digress for a moment. I don’t program in Java for a number of reasons. Mostly, it’s just that I don’t like the excessive baggage around the syntax (the absence of closures, for example). I suppose that the so-called ‘enterprise’ culture that’s built up around it is a bit of a turn off too. Perhaps another thing is the fact that Java just isn’t radical enough: despite all the advances, it’s still hobbled by a C/C++ mindset that it doesn’t really need to have.

I don’t blindly hate it, though. Working in Ruby all day, I recognise that there are things that Ruby (and, by extension, the wider Ruby community) could learn from Java, like better memory handling, or how to address some of the weaknesses in ActiveRecord.

To return to the topic at hand, however, the Google blog entry shows a classic binary search algorithm, and points out a flaw:

int mid =(low + high) / 2;

What’s the problem with this? I’ll quote it:

Specifically, it fails if the sum of low and high is greater than the maximum positive int value (231 – 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.

You know what? This isn’t a problem with the algorithm itself. It’s a problem with the intersection of algorithm, language, and compiler. There’s nothing logically wrong with the statement, it’s just that it overflows in C and Java. That means that, if you are implementing it in C or Java (as opposed to pseudocode), you should rework the calculation to avoid this pitfall. One might even argue that a sufficiently advanced compiler ought to rephrase it automatically.

In Ruby, however, this arbitrary limitation doesn’t exist. Values that are too big for Fixnum are automatically promoted to Bignum. 231 + 1 is not negative, for example. Whilst it may not be the optimal calculation, the above method will work.

So which is broken? The algorithm, or the language that breaks it?

Fixing a major TextMate annoyance

The Reevoo office is all-Mac, with the exception of one bargain-basement Windows XP box used for billing and debugging our website in Internet Explorer (to which, as regular readers will know, I bear an intense and righteous anger, but that’s not what I want to talk about today). We developers use TextMate as an editor. It’s a decent product with some well-thought-out features that really increase productivity, but it’s also hopelessly immature in many ways. In general, although I miss vim key mappings, I’m happy using TextMate for daily work.

Until recently, TextMate’s ‘Find in Project’ feature was really broken: its memory requirements when searching a large directory were so unreasonably huge that it would paralyse the operating system in half an hour of memory swapping ended only by the eventual death of the TextMate process. (In my estimation, Mac OS X’s memory management is also more than a little suspect.)

This sorry state of affairs would most frequently be triggered by attempting to search a large log directory in a Rails project. Given that running tests writes a lot to the logs, and we do a lot of testing, this was the rule rather than the exception. I had to learn not to use ‘Find in Project’ without first deleting everything in the log directory first. Sometimes I’d forget and get a very long enforced tea break. Whilst I enjoy breaks and recognise their health benefits, I prefer to take them on my own schedule—waiting for an unresponsive computer is anything but restful.

I’m happy to say that the rampant memory use has been fixed by a recent update to TextMate. Nonetheless, having to search the log directories makes searches much slower than they need be and tends to clutter the results. Ideally, I’d just like it to skip them entirely. Today, after five months of daily use, I finally worked out how.

Surely I can’t be the only one to have suffered this problem! I don’t think it’s just me being entirely daft: it’s genuinely difficult to find. Hidden in the Preferences dialog, behind the ‘Advanced’ section and in a tab named ‘Folder References, there is an input box labelled ‘Folder Pattern’. The name doesn’t really explain its function, but closer inspection reveals that it contains a regular expression telling TextMate which directories (or ‘Folders’ in Mac OS X parlance) it should ignore when loading a project.

TextMate Preferences screenshot

By default, it contains patterns for the data directories of a variety of version control systems. If we add log to these, it’ll ignore log directories too:

!.*/(.[^/]*|log|CVS|_darcs|{arch}|blib …

Next time we load a Rails project into TextMate the log directory simply won’t appear, and won’t be included in searches. Of course, if you actually wanted to look at the logs in TextMate, you’re out of luck—but then again, it can’t handle large text files anyway. Like I said earlier, it’s still an immature application.

Ruby on Rails month.ago considered harmful

Thirty days hath September,
April, June and November.
All the rest have thirty-one,
Excepting February alone,
Which hath twenty-eight days clear
And twenty-nine in each leap year.

I’m back at work today, and coincidentally found an interesting bug in our Ruby on Rails application. A test that had always worked before was suddenly failing. I happened to discover it first, but it wasn’t working for my colleagues either when they tried it. To add to the puzzle, it had worked perfectly on the continuous integration machine when the last change was checked in yesterday.

The failing component generates a report aggregating data for the last three months. In order to do this, it needs to know when the last three months begin. The obvious way is to use Rails’s time helper methods:

months = [ 
  Time.now.beginning_of_month,
  1.month.ago.beginning_of_month,
  2.months.ago.beginning_of_month
]

so that’s what I had done when writing the report. And, indeed, it works—up until the 31st of the month. Like a werewolf on the full moon, though, the bug rears its ugly head on the 31st day of any month:

[Mon May 01 00:00:00 BST 2006,
Mon May 01 00:00:00 BST 2006,
Sat Apr 01 00:00:00 BST 2006]

Why? Let’s see:

>> 1.month
=> 2592000
>> 30.days
=> 2592000

When you do 1.month.ago in Rails, you don’t get one month ago. You get thirty days ago. Maybe that’s close enough for most of the time, but it handles the edge cases remarkably poorly.

The solution in this case is quite simple:

months = [Time.now.beginning_of_month]
2.times do
  months << (months.last - 15.days).beginning_of_month
end

By subtracting fifteen days from the beginning of any month, we end up in the middle of the previous month. Taking the beginning of that month then gives us a reliable answer that works 365 days a year.

It’s obvious and easy once you realise, but it’s easy to overlook. So be careful when using month.ago in Rails. It doesn’t do what you might expect.

Matsushima

松島や
ああ松島や
松島や

Matsushima is one of the Three Views (三景) of Japan, according to a list composed in the 17th century. It’s one of the places the poet Bashō sold his house and risked his life to visit on his ‘Long Road to the North’ (おくの細道).

Matsushima literally means ‘pine islands’, and that’s what it is: a bay dotted with tiny islands weathered into sometimes-astonishing shapes, capped with luxuriant arboreal growth.

So what do they do, but build a great big ugly coal-fired power station there.

Matsushima with view of power station

Ah, Matsushima! indeed. It’s still a beautiful place, by the way.

Puzzling graffiti

This unusual piece of graffiti has been on a wall in Amerika-mura, downtown Osaka, for years, but it’s the first time I’ve got around to photographing it:

Korean graffiti in Osaka 오사카

If you don’t read Japanese, you’re probably thinking, ‘so what?’ The thing is, though, it isn’t Japanese.

It’s written in Korean, but it says ‘Osaka’. I’m really intrigued by how it came to be there: who wrote it, and why?

Another new country

I go away for a few days, and Europe sprouts yet another new country!

With the final, complete disintegration of Yugoslavia, does this mean the demise of the .yu domain lately allocated to Serbia and Montenegro?

More importantly, when do I get to set up my own micronation?

DRb and NSDistributedNotificationCenter in RubyCocoa

I’m writing an OS X GUI test runner for Ruby. It’s coming along nicely, and I’ll have something to show fairly soon. In the meantime, I’m going to share some things I discovered in the process of writing it.

First, DRb (Distributed Ruby) doesn’t play well with Cocoa objects, due to incompatibilities between Ruby’s threading and Cocoa’s calling model. Sometimes it works; mostly, it results in segmentation faults.

Instead of DRb, I decided to use NSDistributedNotificationCenter, OS X’s own distributed notification system. I couldn’t find many examples of usage on the internet. There were zero server examples in Ruby, and one in Python. After a bit of trial and error, I got it working; hopefully Google and this page will save someone else the trouble.

Here’s some sample server code:

require 'osx/cocoa'

include OSX

class Server < NSObject
  def foo(notification)
    p notification_to_hash(notification)
  end
  
  def notification_to_hash(notification)
    nsd = notification.userInfo
    return nsd.allKeys.to_a.inject({}){ |hash, oc_key| 
      hash[oc_key.to_s] = nsd.objectForKey(oc_key).to_s
      hash
    }
  end
end

server = Server.alloc.init

center = NSDistributedNotificationCenter.defaultCenter
center.addObserver_selector_name_object(
  server,
  "foo:", 
  "My Notification",
  "com.example.MyApp"
)

NSRunLoop.currentRunLoop.run

And the client:

require 'osx/cocoa'

include OSX

center = NSDistributedNotificationCenter.defaultCenter
center.postNotificationName_object_userInfo_deliverImmediately(
  "My Notification",
  "com.example.MyApp",
  {"foo" => "bar", "baz" => 3},
  true
)

It’s generally quite simple, but it’s important that the server object be an instance of an NSObject (or one of its descendants). Plain Ruby objects cause bus errors.

Bringing people together

Up the road from my office, there’s a café. The sign on the front says ‘Arlington’ and, in smaller letters underneath, ‘bringing people together’.

I can’t help but think of another Arlington, which also brings people together, although probably not in the same way the café owners mean.

Doggy paddle

The weather in the south of England generally comes in two kinds: grey, and sunny. Contrary to received wisdom, it doesn’t actually rain that much; in fact, it rains so little that a water shortage is a frequent problem.

Having, it seems, broken through the last many months of grey weather, London has been remarkably pleasant this last fortnight. I’ve been able to enjoy walks to and from work along the banks of the Thames. Whilst going via the river adds ten minutes to the walk, it is more pleasant by far.

Since my office is close to the South Bank, I sometimes take a midday break on a sunny day and walk up there to relax and avoid e-thrombosis.

Today, as I was leaning on the railings, I saw a man standing on the steps down to the water. I wasn’t quite sure what he was doing until I looked a bit further down. He had brought his dog (more accurately, a bitch) for a swim in the river. It was quite lovely, really. She stood on the step and tried to drink from the river, but was defeated by the movement of the waves until a particularly large wave washed completely—and, apparently, unexpectedly—over her head. Then she leapt into the river and took a swim. I hadn’t imagined that doggy paddle was quite so efficient, but she was soon a quarter of the way out into the river before the man whistled her back. This happened three or four times; I would have liked to see her try to make it all the way! It was the first time that I had seen a dog in the Thames.

After work, I needed to buy some bread for tomorrow’s sandwiches. I decided to go to Waterloo to get it: although it’s the wrong way to get home, there are several shops there, and the chance of actually finding something is greater. Well, I succeeded. Nothing interesting so far, admittedly. I went down to the Underground station to take the Jubilee line home.

Quel erreur.

Actually, it wasn’t that bad. There were no westbound trains, but I was going east. However, due to the general disruption, the eastbound trains were also adversely affected, and the journey took twice as long as it should have; most of the wasted time was spent in tunnels waiting for green lights.

On the positive side, the Tube staff did their best to keep us informed of the situation over the PA; what a strange situation that was! Apparently, someone had been ‘surfing’ on the back of the train near Baker Street. Unsurprisingly, they had to stop the train to investigate. A later announcement added the detail that the ‘surfer’ had been ‘apprehended’ in the tunnel.

I’ve got to admit, I was a litle disappointed to hear that the ‘surfer’ had avoided a deserved Darwin award. In a way, I’d have been gratified to hear that the train was delayed while they scraped the recently-abdicated-from-the-gene-pool ‘surfer’ off the tracks instead of them being arrested intact. But then, I am a bad person.

Comment spam

I got my first piece of comment spam today. Predictably, it was for dodgy medications.

Since I’ve been noticed by the spammers, I’ve implemented a simple system that ought to limit their future activities. I’m not going to give away all the details (I don’t want to help them too much!), but I will point out that it won’t tell you if your comment was rejected; it just won’t appear in the page. Please send me an email if a legitimate comment doesn’t get through.

Coding PHP is horrid, though—and it gets more painful the less one does it. Perhaps it’s about time that I restarted working on a Ruby engine for this site. I’ve picked up a lot of experience in my present job about web development in Ruby, and I have plenty of ideas for enhancements that I’d love to try out.

I just need to sit down for a couple of days and concentrate on it—finding that time is the hard part.