Twittering the Shipping Forecast

Mar 29, 2008 15:55 · 763 words · 4 minute read

Although the main use of Twitter is for real people to say what they’re up to “right now”, it was quickly apparent to your average hacker that it’s just as useful for any service or device that changes state on a regular basis. Hence Tom and Tom making Tower Bridge Twitter, so that the bridge announces every time it raises and lowers.

When I was talking with Russell Davies a few weeks ago, he mentioned the Shipping Forecast - which immediately struck me as something that would be fun (although not necessarily useful) to get Twittering.

The Shipping Forecast is one of those uniquely British institutions - four times a day, the Met Office produces a forecast for the seas around the British Isles which is then broadcast on BBC Radio 4. I was brought up on the coast of the Irish Sea by Radio 4-listening parents, so it was part of my life from an early age - and because it’s issued in a standard format, it’s almost poetic. Entire generations of Brits have grown up with phrases like “Southwest, backing southeast for a time, 5 to 7, occasionally gale 8” becoming earworms. Hearing or reading it still brings back memories of a Roberts radio first thing in the morning, after Farming Today but before the Today programme.

Getting it to Twitter, though, presents one or two challenges. The first is getting hold of the source data. The Met Office is one of a number of British institutions that are forced by the grasping UK Government to be profit-generating, so it wants about £600 a year for the privilege of accessing a clean XML feed of the data. And although the forecast is published online by both the BBC and the Met Office, the quality of the HTML leaves a lot to be desired, at least from the point of view of scraping it.

In the case of the BBC, that’s because it’s presented primarily to be human-readable - in the case of the Met Office, it’s because they’re one of many brain-dead British public bodies that are Microsoft monocultures, and know or care nothing about standards and being good online citizens. Nor for that matter would they know good online design if they tripped over it. But that’s a rant for another day.

Back to Twittering. The whole thing is a Ruby script that gets kicked off by a cron job four times a day and uses the marvelous Hpricot gem to grab the appropriate page from the Met Office. The extraneous junk HTML is then thrown away to leave just the table cell containing the forecast itself, which gets chopped up into a number of array elements by splitting it at the emboldened headings.
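For the curious, the splitting step might look something like the sketch below. It's a rough illustration rather than the actual script: it uses plain Ruby string handling instead of Hpricot so that it runs on its own, and both the sample markup and the `split_forecast` name are made up for the example (the real Met Office HTML is messier).

```ruby
# Hypothetical fragment standing in for the forecast table cell.
# The real Met Office markup is an assumption here and will differ.
SAMPLE_CELL = <<~HTML
  <td><b>Viking</b> Southwest 5 to 7. Rain. Good.
  <b>Dogger</b> Southwest, backing southeast for a time, 5 to 7, occasionally gale 8.</td>
HTML

# Split a table cell into [area, forecast] pairs at each emboldened heading.
def split_forecast(cell_html)
  # Each match is a <b>heading</b> followed by everything up to the next
  # heading (or the end of the string).
  cell_html.scan(%r{<b>(.*?)</b>(.*?)(?=<b>|\z)}m).map do |area, body|
    # Drop any leftover tags and collapse runs of whitespace.
    text = body.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip
    [area.strip, text]
  end
end

split_forecast(SAMPLE_CELL).each do |area, text|
  puts "#{area}: #{text}"
end
```

Each pair then becomes one candidate tweet, which is where the length problem comes in.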

At this point the second problem arises - because the Forecast is intended to be read aloud, it’s fairly verbose. Fitting it into 140 characters is something of a problem. To get round this, the individual array elements get the snot parsed out of them, chopping down the character count by searching-and-replacing the content with abbreviations. It doesn’t make for pleasant-looking tweets, although if you’re familiar with the overall syntax and cadences of the verbal Forecast it’s actually surprisingly readable. Once squished down, each element is then tweeted out with the Twitter4R gem.
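As a rough sketch of the squishing step, assuming a small lookup table of abbreviations: the entries below are invented for illustration and are not the real key, and the `squish` helper is a hypothetical name. It also does the replacement with a single case-insensitive regular expression and cuts off anything past the limit, which is an assumption on my part rather than a description of the script.

```ruby
# Hypothetical abbreviation table; the real one is not reproduced here.
ABBREVIATIONS = {
  "occasionally" => "occ",
  "southwest"    => "SW",
  "southeast"    => "SE",
  "backing"      => "bkg",
  "gale"         => "G"
}.freeze

# One pattern matching any of the long forms as whole words, longest first
# so that longer phrases win over shorter ones they might contain.
ABBREVIATION_PATTERN = Regexp.new(
  "\\b(" +
    ABBREVIATIONS.keys.sort_by { |k| -k.length }
                 .map { |k| Regexp.escape(k) }.join("|") +
    ")\\b",
  Regexp::IGNORECASE
)

# Abbreviate a forecast, prefix the area name, and cut at the length limit.
def squish(area, forecast, limit = 140)
  short = forecast.gsub(ABBREVIATION_PATTERN) { |m| ABBREVIATIONS[m.downcase] }
  tweet = "#{area}: #{short}"
  tweet.length > limit ? tweet[0, limit] : tweet
end

puts squish("Dogger", "Southwest, backing southeast for a time, 5 to 7, occasionally gale 8")
```

Blindly truncating is the crudest possible fallback; a longer table of abbreviations is what keeps most forecasts under the limit in the first place.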

Originally I was going to set up a Twitter account for each forecast area, but that was a pain to set up (there are lots of them) and awkward because area names like “Viking” had already been taken. So in the end I’ve compromised by pushing all the area forecasts out to the one Shippingcast account.

There’s room for improvement - cleaning up the abbreviations by using regular expressions would be one option, and replacing obscure abbreviations with Unicode symbols would be another. (There’s a key for the abbreviations I’ve used here.) And it’s clearly not a particularly useful thing to be Twittering in the first place.

However, it’s made me realise just how useful it could be for public bodies to make their data available in structured, machine-readable form - and not to charge ridiculous amounts of money for it. The chances of the Met Office coming up with the idea of Twittering the Shipping Forecast internally are next to nil - so this kind of “innovation” takes place externally. (I’m using innovation loosely here, but there are far more interesting and useful things that can be done with public data - anything that MySociety have done, for example.) Making the information available would be trivial, if only there were the will to do it - and the potential benefits could be huge.