globeandmail.com

BlackBerry outage: Can you say 'we need a backup?'

Thursday, February 14, 2008

MATHEW INGRAM

It's a great feeling to walk around knowing that you can get your e-mails and other data whenever you want, whether it's through a mobile device such as a BlackBerry, or through a Web service like Google's Gmail or Google Calendar, or one of a dozen similar applications.

That's what it's like when you use what some call "cloud computing" services, where all your data is stored somewhere other than your desktop PC. You can get your documents anywhere, your e-mail, your music files, your photos, and just about anything else you could want.

Yes, it's a great feeling - until the Internet suddenly goes down. BlackBerry users felt that kind of pain this week, when a system upgrade caused a North America-wide outage that lasted for hours. Jim Balsillie, co-chief executive officer of Research In Motion, the maker of the BlackBerry, brushed it off as nothing serious - an "intermittent delay" - but in doing so he was also discounting the entire foundation of his company's spectacular success: Namely, the fact that RIM's devices have become indispensable for millions of people.

RIM's outages (the latest downtime was the second serious outage in less than a year) have likely created a lingering sense of unease in users who have come to depend on them, not just for business e-mail but as a way of staying in touch with friends and family. But they have also exposed a key weakness in the way that RIM's service operates, one that many BlackBerry users might not be aware of.

When it comes to Google or Microsoft or Yahoo or Amazon, most people are probably familiar with the football-field-sized "server farms" that such companies operate, each of which contains tens of thousands of computers. Google alone is estimated to have more than 500,000 servers located in over 35 warehouses around the globe.

Each one of these server farms is connected to one of the main entry points for the Internet, and if one goes offline for some reason, the data can be relatively easily shifted elsewhere in Google's network. The company's proprietary software, known as Google FS, ensures that data is always backed up in multiple places, so that an outage somewhere doesn't take entire services offline - or at least not very often.

RIM, however, funnels all of the e-mail messages and other data sent over its network in North America through a single "network operations centre," or NOC, in Waterloo, Ont. (There's another NOC in Waterloo that serves the Asian market, and a similar operations centre in Britain that handles most of Europe.)

The company says it has spent millions of dollars upgrading and expanding its network to handle increased loads and protect against outages, including (ironically) the upgrade that likely took down the network this week. But the fact remains that RIM's structure increases the likelihood of system-wide outages because of the NOC bottleneck.

At the same time, while Google's server infrastructure is more robust, it has outages like any other technology company. And all of Google's Web-based services - or Microsoft's, or Yahoo's, or Amazon's - are vulnerable to a broader Internet outage of some kind, like the one that large chunks of the Middle East experienced a week or so ago, when several major undersea cables were severed or broke down.

There are other issues as well, of course. There have been several reports recently of individuals who were locked out of their Google accounts due to breaches of the company's terms of use, meaning that several years' worth of their e-mail was effectively gone, along with their calendar and all of their contacts and shared documents.

In at least one case, the individual concerned was the victim of identity theft, and had no knowledge of the breaches that led to Google blocking the account. Although they were able to get the company to restore their data, it took days of e-mails and protests to do so.

These and other cases such as the BlackBerry outage are a worthwhile reminder that the freedom that comes with "cloud computing" is a trade-off. In order to achieve that level of freedom, you not only have to rely on third parties to make sure that your data is secure, but you have to take the risk that at any point, all of that information in the cloud may suddenly become unavailable for reasons outside of your control.

What's the answer? One solution is to think like Google: In other words, keep multiple copies of your important files in multiple places, available in a number of different ways. Either that, or use the inevitable downtime as a kind of mini-vacation and enjoy the silence.
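For readers who want to try the "multiple copies in multiple places" idea on their own files, here is a minimal sketch in Python. It is an illustration only, not how Google or RIM does it: the function names `sha256_of` and `back_up` are invented for this example, and it assumes the backup locations are ordinary folders (a USB drive, a second disk, a synced folder) that your computer can write to.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into each destination folder; return the verified copies.

    Hypothetical helper for illustration. A destination that cannot be
    written to is skipped rather than aborting the whole run.
    """
    expected = sha256_of(source)
    verified = []
    for folder in destinations:
        try:
            folder.mkdir(parents=True, exist_ok=True)
            copy = folder / source.name
            shutil.copy2(source, copy)  # copies contents and timestamps
        except OSError:
            # One unreachable location should not stop the other copies.
            continue
        # Only count the copy if its checksum matches the original.
        if sha256_of(copy) == expected:
            verified.append(copy)
    return verified
```

Calling something like `back_up(Path("contacts.csv"), [Path("/media/usb/backup"), Path.home() / "Backups"])` (both paths are placeholders) returns the list of copies that matched the original. If that list is shorter than the list of destinations, at least one of your "multiple places" did not get a good copy.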
