They are there all right: ticking time bombs in government IT systems being moved from legacy to contemporary cloud architectures. It’s no secret that huge initiatives are underway in the federal government to reduce costs by eliminating redundancy. The redundancy administrators point to most often is at the network and infrastructure level.
As a result, there continues to be a massive shutdown of government-run data centers, with the organizations moving their systems to the cloud. Now, one data center can support an entire infrastructure. These colossal structural changes are saving taxpayers a ton of money. But will these moves keep the structural quality of the systems intact? And if not, will taxpayers be on the hook for substantial additional costs in the long run?
To answer those questions, we need to examine the factors in play. When an organization performs wholesale migrations from one architecture to another — and in the case of a migration to the cloud, from one platform to another — unknown variables are introduced to a known system. This increases the likelihood that errors never considered in past testing will be introduced.
Now, the way we usually flush out such errors when moving a legacy system to the cloud is through load testing. It certainly makes sense to perform load testing to see whether the old system scales properly on the new platform and, when dealing with the cloud, has the required elasticity. Once the system is hosted in the cloud, it may have to support more users than it was initially designed for.
But load testing is not enough. IT teams need to assess the system’s structural integrity, and do so before the system has been migrated. This allows the team to address structural problems in the architecture, application, and integration points before the system is moved to the cloud, and before load testing is conducted.
These defects are latent, but can become a major issue when the system moves from its proven platform to the untested cloud. Load testing won’t necessarily detect them (and in fact, rarely does). That’s a ticking time bomb, and one I fear is being introduced to government systems in a big way, thanks to this wholesale shift to the cloud.
Consider an office building that has termites, but nobody is aware of the infestation. The building is old, but it looks great and appears to be holding up well for its years. Then one day a new tenant moves in and brings in some heavy machinery. Suddenly, the invisible structural problems in the building are exposed in a dramatic way: the building collapses.
Here’s a building that appeared sound and passed previous inspections, but because of a hidden structural problem, the landlord has a disaster on his hands. So why don’t more government and enterprise IT organizations test structural integrity before applying load? Two reasons: perceived lack of ROI, and fear of the unknown. Let’s tackle both of these.
Perceived lack of ROI: The savings returned from attending to structural integrity are understood to be in the single digits. Most often it’s described as an incremental 3 to 5 percent gain over several years. That doesn’t sound like much. But the math changes when the organization is going from an internal data center with a known, stable architecture to the cloud, which is essentially the great unknown. The cost of a catastrophic failure of a government IT system can quickly reach into the millions of dollars to remediate. Then, of course, there are the costs we can’t calculate, like loss of citizen services, outages to key mission or enforcement systems, etc.
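To see why the math changes, it helps to frame the decision as an expected-cost comparison. The sketch below uses entirely hypothetical figures (the migration cost, failure probabilities, and remediation cost are illustrative assumptions, not data from any real program), but it shows how even a modest up-front assessment can pay for itself once the probability-weighted cost of a catastrophic failure is on the books:

```python
# Back-of-envelope expected-cost comparison. All dollar figures and
# probabilities below are hypothetical, for illustration only.

def expected_cost(migration_cost, failure_probability, failure_cost,
                  assessment_cost=0.0):
    """Expected total cost of a migration: fixed costs plus the
    probability-weighted cost of a catastrophic structural failure."""
    return migration_cost + assessment_cost + failure_probability * failure_cost

# Hypothetical figures for a mid-sized system migration:
MIGRATION = 2_000_000      # baseline cost to move the system to the cloud
FAILURE_COST = 10_000_000  # remediation after a catastrophic failure
ASSESSMENT = 150_000       # structural-quality assessment before migration

# Assume the assessment cuts the chance of a latent-defect failure
# from 15% to 3% (again, illustrative numbers).
without_assessment = expected_cost(MIGRATION, 0.15, FAILURE_COST)
with_assessment = expected_cost(MIGRATION, 0.03, FAILURE_COST, ASSESSMENT)

print(f"Without assessment: ${without_assessment:,.0f}")  # $3,500,000
print(f"With assessment:    ${with_assessment:,.0f}")     # $2,450,000
```

Under these assumptions the assessment more than pays for itself, and that is before counting the costs the article notes can’t be calculated, like lost citizen services.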
Fear of the unknown: Assessing structural integrity means visualizing the actual system, seeing the unseen, and flushing out the unanticipated. That, it turns out, is a hair-raising prospect for many federal IT program offices. The lower a program office’s engineering maturity, the scarier the prospect becomes. They are understandably leery about putting their work under the microscope and exposing long-hidden problems and bad practices. Better to not let executive management — the Commission, PEO Executive, or CIO — have the power to visualize these systems.
An IT leader recently told me that, while CAST’s diagnostic technology made sense, he didn’t “want to give non-engineering people visibility into the engineering work.” Even though he knew our technology would benefit his team, his processes, and his products, his fears outweighed the reward.
Instead, he is continuing to use his existing QA process (a patchwork of open source tools and scripts) rather than justify an innovative new approach that could ultimately save his federal program team a ton of time and money. And maybe his job in the long run, should one of his mission-critical systems fail in a big way.
When moving to the cloud, it pays to examine the structural quality of the entire system before boxing things up. That way, you know the system you’re migrating is sound, and everything will work as expected. Once the system is moved, and as it evolves, repeat the diagnosis consistently with a fresh round of load testing to simulate the new conditions. Once structural quality is assured, then and only then can you be certain there are no ticking time bombs lurking in the architecture.