The recent spate of IT glitches and ‘power outages’ at British Airways, which forced the UK’s national carrier to cancel all its flights worldwide at the start of the May bank holiday, along with the WannaCry ransomware attack that brought the National Health Service to a halt, has once again exposed how important IT systems are to today’s businesses. The complexity of these IT systems, and the number of vulnerabilities in critical software used by critical infrastructure sectors such as the NHS, airlines, and telecom operators, have made headlines once more.
In April, Google experienced a fairly significant cloud outage, yet it barely made the news. In fact, it was likely the most widespread outage to hit a major public cloud to date. The lack of coverage is strange, considering the industry’s watchful eyes, such as Brian Krebs and others. The even more recent Salesforce service outage seems to have received more attention. But although Google seems to have gotten a “pass” this time, the glitch draws renewed attention to the fact that tech players large and small are still struggling with software robustness.
Google Compute Engine was down for a full 18 minutes around the 7 o’clock hour Pacific Time on April 11, disconnecting all users in all regions. The root cause was a network failure. Network outages appear to be an ongoing challenge for Google, and this one was the biggest yet.
Software risks to the business, specifically Application Resiliency, headlined a recent executive roundtable hosted by CAST and sponsored by IBM Italy, ZeroUno and the Boston Consulting Group. European IT executives from the financial services industry gathered to debate the importance of mitigating software risks to their business.
Southwest Airlines is the latest victim of the airline scandal. What scandal? The one where airlines keep causing travel delays because of poorly managed IT systems. The one that caused Southwest to delay 836 flights on Monday and hand out handwritten tickets to passengers because of a ‘software glitch’. Southwest isn’t alone: United Airlines grounded hundreds of flights in July, and American Airlines did the same in September and April. How long will consumers have to wait before these organizations realize that the glitches are caused by poor software quality, which leads to poor service?
If you’ve read the news lately, you’ve seen headline after headline (some even on our blog) about computer glitches, technical failures, software risk, and hacks. The health of applications is under closer scrutiny than ever before, because whether a software outage has internal or external causes, the security and stability of your applications are paramount.
In 2014, the IT infrastructure at the Federal government’s Office of Personnel Management (OPM) was upgraded from a security rating of “material weakness” to one of “significant deficiency,” according to The Wall Street Journal’s CIO Report. In other words, even after upgrading to mitigate software risk, the OPM still wasn’t up to snuff. Put simply, that is unacceptable. It is also a dismal and infuriating fact to learn, especially for those among the 21 million present and past Federal employees who, as revealed last week, had their Social Security numbers and other personal information stolen in the recent data breach.
We’re sure that by now, you’ve seen all of the stories about last week’s computer turmoil at the New York Stock Exchange, United Airlines, the Wall Street Journal, and TD Ameritrade. And as a top-level executive you’ve probably launched an internal review, or at least asked yourself, “Could it happen here?”
The simple answer is, unfortunately, “yes, it most definitely could.”