Load Testing Fails Facebook IPO for NASDAQ

Are you load testing? Great. Are you counting on load testing to protect your organization from a catastrophic system failure? Well, that’s not so great.
I bet if you surveyed enterprise application development teams around the world and asked them how they would ensure the elasticity and scalability of an enterprise application, they’d answer (in unison, I’d imagine): load testing. Do it early and often.
Well, that was the world view at NASDAQ before the Facebook (NASDAQ: FB) IPO — perhaps the highest profile IPO of the past five years. And we saw how well that worked out for them. NASDAQ has reserved some $40 million to repay its clients after they mishandled Facebook’s market debut. But it doesn’t look like it’ll end there. Investors and financial firms are claiming that NASDAQ’s mistake cost them upwards of $500 million. And with impending legal action on the way, who knows what the final tally will be.
If ever there was a case for the religious faithful (like me) of application quality assessment to evangelize that failure to test your entire system holistically has a high potential cost, here you go. NASDAQ has hard numbers on the table.
NASDAQ readily admitted to extensive load testing before the iconic Facebook IPO, and it was confident, based on load test results, that its system would hold up. It was wrong, because its test results were wrong. And the final result was an epic failure that will go down in IPO history (not to mention the ledger books of a few ticked-off investors).
Now, I’m not saying that you shouldn’t be load testing your applications. You should. But load testing is something that happens at the later stages of your application development process, and before the significant events you are anticipating might cause a linear or even exponential increase in capacity demands to your website or application.
However, load testing gets hung up in its prescriptive approach to user patterns. You create tests that assume prescribed scenarios which the application will have to deal with. Under this umbrella of testing, the unanticipated is never anticipated. You’re never testing the entire system. You are always testing a sequence of events.
But there is a way to test for the unexpected, and that’s (wait for it) application quality assessment and performance measurement. If NASDAQ had analyzed its entire system holistically in addition to traditional load testing, it would have (1) dramatically reduced the number of scenarios it needed to load test, and (2) figured out which type of user interaction could have broken the application when it was deployed.
Application quality assessment goes beyond the simple virtualization of large user loads. It drills into an application, down to the source code, to look for its weak points — usually where the data is flying about — and measures its adherence to architectural and coding standards. You wouldn’t test the structural integrity of a building by jamming it with the maximum number of occupants; you’d assess its foundations and its engineering.
NASDAQ could have drilled down into the source code of its system and identified and eliminated dangerous defects early on. This would have led to a resilient application that wouldn’t bomb when the IPO went live, and would have saved the company the $40 to $500 million we’re estimating it’s exposed to now for its defective application.
At the end of the day, quality assessment can succeed where load testing can, and has, failed. Had NASDAQ considered software quality analysis before Facebook had gone public, there’s a good chance it would still have $40 million burning a hole in its pocket. However, our friends at NASDAQ load tested how they thought users would be accessing their systems, then sat back in anticipation of arguably the most anticipated IPO in recent history. Little did they know they would be the ones making headlines the next day.

Structural Quality Metrics in Outsourcing SLAs

When I speak to customers and prospects trying to incorporate static code analysis into their software development processes, one of the most common questions that I get is “How do we incorporate the outputs of static analysis into SLAS?” Given the prevalence of outsourcing in Fortune 500 and Global 1000 companies, this question is not surprising. Companies have always struggled to measure the quality of products being delivered, beyond the typical defect densities measured after the fact.
To help organizations answer this and similar questions, I thought I would compile some frequently asked questions around introducing Structural Quality Metrics into SLAs.  Before I get into the details, I want to caution readers against using these metrics simply for monitoring, or as a tool to penalize vendors.  This approach invariably becomes counterproductive, and instead I recommend looking at these metrics as an opportunity to make the vendor-client relationship more transparent and fact-based—a win-win on both sides.
In addition to the FAQ’s below, for more on this topic you don’t want to miss our next webinar on May 16th.  We are pleased to have Stephanie Moore, Vice President and Principal Analyst with Forrester Research discussing how to “Ensure Application Quality with Vendor Management Vigilance.”  You can register here.
What kind of structural quality metrics can be included in SLAs?

Quality Indices: Code analysis solutions parse the source code and identify code patterns (rules) which could lead to
potential defects. By categorizing these improper code patterns into application health factors such as Security, Performance, Robustness, Changeability and Transferability, you can aggregate and assign a specific value to each category, like the Quality Index in the CAST Application Intelligence Platform (AIP). You should set a baseline for each of these health factors and monitor the overall health of the over time.
Specific Rules: Quality indices provide a macro picture of the structural quality of the application, however there are often
specific code patterns (rules) that you want to avoid.  For example, if the application is already suffering from performance issues, you want to make sure to avoid any rule that would further degrade the performance. These specific rules should be incorporated into SLAs as “Critical Rules” with Zero Tolerances.
Productivity: Amount charged per KLOC (kilo lines of code) or per Function Point. Static analysis solutions should provide the size of the code base that is added in a given release.  Along with KLOC, CAST AIP provides data on the number of Function Points that have been modified, added and deleted in a release.  This is a very good metric, specially in a multi-vendor scenario where you can see how different vendors are charging you and can set targets and monitor productivity for each vendor.

How do you set targets for Structural Quality Metrics?
The ideal way to set targets is to analyze your applications for a minimum of two to three releases and use the average scores as a baseline.
An alternative method is to use industry benchmark data.  CAST maintains data from hundreds of companies across different technologies and industries in a benchmarking repository called Appmarq, and it can be used to set targets based on industry averages or best-in-class performers.
When do you introduce Structural Quality Metrics into an SLA?
Of course, the best time to introduce Structural Quality Metrics into SLAs is at the beginning of the contract, when it is the easiest to set expectations on quality objectives based on the static analysis solution outputs.  However, if you are in the middle of a long-term contract with a vendor, you can try to make changes to the existing SLAs. A situation like this will require collaboration with the vendor to define common goals on why, how and when to use a static code analysis solution and what kind of metrics make the most sense in the context of those goals.
To hear an analyst perspective on achieving maturity in your outsourcing relationships, don’t forget to register for our webinar on May 16th with Forrester Analyst Stephanie Moore.

Blind Faith and Black Code

Gandhi once said “Faith should be enforced by reason, if it becomes blind it dies”. The same message is at the core of Dr. Bill Curtis’s “fourth wave in software engineering” –which suggests that faith in your application software should be enforced with measurement.
“Third wave of software engineering” – which is process driven, gave a method to the madness of software development. It brought in the much needed discipline, rigor, and standardized approach to it. After a brief period of lull in the software engineering activity, there is some excitement, as the fourth wave is unraveling itself. Software Analysis and Measurement (SAM), which is at the heart of the new measurement based approach to software engineering discipline, is being developed to address the issue of measurement. SAM focuses on the actual output of the software development – the code itself. You can learn more about SAM and Fourth Wave at CISQ website (www.it-cisq.org), which is sponsored by OMG and SEI to develop the new standards.
But more importantly I would like to recommend a new term today that can be measured, monitored and used in the context of SAM – “Black Code”. Analyzing the code using static analysis tools is one of the core requirements of SAM, the output of the analysis will be mined to provide insights that feed into management decision support systems. As organizations start adopting the SAM practices, they would need some new way to measure what portion of the code is actually analyzed and how much risk exposure do they have from the unanalyzed code. That is where the concept of the “Black Code” will be very useful. “Black Code” essentially refers to the portion of the code which is not analyzed and measured, code for which you have no visibility. The inspiration for the term comes from “Black-Box Testing”, which takes into account external perspective of the test object to derive test cases and there is no knowledge of the test object’s internal structure. In few years it will be common for executives to ask questions like – “How much black code do we have in our system?” I will expand more on this concept and how it can be measured and used in the next few blogs, but just want to get some initial feedback.
To sum it up – “Faith in your code should be enforced with measurement, if you are blind to your code, it becomes black”