Output-Based Application Management

You need an application delivered fast. And you’re willing to pay more to get it done quickly. But how much more should you pay?
That depends, of course, on your supplier’s productivity. The more productive they are, the more they can charge on a per-hour basis, because higher productivity lets them deliver the same size application in fewer hours than a less productive supplier would need. Which means that their cost to deliver a function point might actually be lower than that of a supplier whose labor rates are much lower! In other words, a supplier with a higher labor cost can actually be more cost efficient – and at the end of the day, this output metric – cost efficiency – is what matters. (Along with application quality, as we’ll see a bit later.)
There’s nothing surprising about it, but a picture definitely helps sort out the relationship between labor cost, productivity, and cost efficiency.
How Labor Cost and Productivity Drive Cost Efficiency
Each curve above tracks a productivity level. For example, if one supplier’s productivity is 10 function points per hour (that’s not realistic, but stick with me), he can charge $70 per hour and still be just as cost efficient (about 0.14 function points per dollar) as a less productive supplier – one delivering about 5.7 function points per hour – whose labor cost is only $40 per hour.
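To make the arithmetic concrete, here’s a minimal sketch of the cost-efficiency calculation, using the same deliberately unrealistic productivity figures as the example above:

```python
# All figures are the deliberately unrealistic ones from the example above.
def cost_efficiency(productivity_fp_per_hour, labor_cost_per_hour):
    """Function points delivered per dollar spent (FP/$)."""
    return productivity_fp_per_hour / labor_cost_per_hour

supplier_a = cost_efficiency(10, 70)   # high hourly rate, high productivity
supplier_b = cost_efficiency(5.7, 40)  # lower rate, lower productivity

print(round(supplier_a, 2))  # 0.14
print(round(supplier_b, 2))  # 0.14 -- effectively the same cost efficiency
```

The supplier charging nearly twice the hourly rate ends up just as cost efficient as the cheaper one.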
Try setting cost efficiency and productivity thresholds in your contracts. Once you’ve done a few of these projects, you’ll have enough to create your own productivity-quality curves.
With an output measure like cost efficiency (Function Points delivered per dollar spent), you can make decisions based on the output (FP per $) rather than on the input ($ per hour). After all, what matters is not what someone’s charging per hour but what the job as a whole ends up costing you. Fixed-price contracts are supposed to get you there, but they’re not a panacea. The problem is they’re usually missing a handle on the critical output measures you need to effectively manage the development or enhancement of an application.
One missing output measure is supplier productivity. Another missing output measure is application quality. Your supplier can give you a bushel of function points per dollar, but it won’t matter if what they turned out was of poor quality, leading to erratic application behavior, continuous headaches in production, and an application that’s just a bear to enhance.
So what does the picture look like with application quality thrown into the mix? Just to make the bubbles quite distinct, quality here is measured on a scale of 1 (low quality) to 100 (very high quality).
Application Quality in the Mix
Interesting in a Jackson Pollock–meets–Klimt sort of way, isn’t it? What I did was randomly assign a quality between 1 and 100 to each point on each of the productivity curves. You don’t always get what you pay for; it would be nice if the quality of an application increased as labor cost increased, but unfortunately that’s not always the case. So a random assignment of quality is what we have here – just for simulation purposes.
Now, the quality of an application is hard to define, let alone quantify. But there are some good software engineering guidelines on how to do it, and there are automated solutions out there that measure application quality. Since application quality is not just a matter of summing up the quality of an application’s components – how those components are linked together is a critical factor in the overall quality of the application – avoid quality measurement solutions that treat components as independent entities. (This, by the way, rules out most desktop-based code checkers.) Use these automated solutions to get a grip on the quality of a supplier’s output – not just the number of hours they put into the job.
What next? Set productivity and quality thresholds. What if cost efficiency had to always be above 0.2 function points per dollar? A quick glance at the bubble chart above shows that setting this floor on cost efficiency pretty much rules out labor costs that are $60 per hour or greater. When you need cost efficiency to be over a certain value, you naturally put a ceiling on labor cost.
And what if quality always had to be above 75? With these two thresholds set, we get the simplified diagram below which makes it easier to make the right tradeoffs between labor cost, cost efficiency and quality.
Filtering by Quality and Cost Efficiency Thresholds
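As a sketch of how those two thresholds act as a filter, here’s a toy shortlisting pass – the supplier names and numbers are invented for illustration:

```python
# Toy data: (name, labor cost $/hr, productivity FP/hr, quality score 1-100).
# All values are made up for illustration.
suppliers = [
    ("A", 70, 10, 80),
    ("B", 40, 9, 60),
    ("C", 45, 10, 90),
    ("D", 90, 12, 95),
]

MIN_COST_EFFICIENCY = 0.2  # FP per dollar
MIN_QUALITY = 75

shortlist = [
    name
    for name, rate, fp_per_hour, quality in suppliers
    if fp_per_hour / rate >= MIN_COST_EFFICIENCY and quality >= MIN_QUALITY
]
print(shortlist)  # ['C']
```

Supplier B clears the cost-efficiency bar but fails on quality; supplier D is high quality but too expensive per function point. Only C survives both filters.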
When it comes to managing application development or enhancement, conventional wisdom gets it backwards – it focuses on the input metric of labor cost and not on the output metrics of productivity and cost efficiency. That’s why it is, relatively speaking, much easier to manage a complex manufacturing production line or an airline routing system than it is to manage even moderately complex software projects.
An advantage of cost efficiency as I’ve defined it is this. Suppose you define the value delivered to the business in dollars per function point – dollars of revenue or dollars of cost savings, it doesn’t matter which. Now, when you multiply cost efficiency (FP/$) by value delivered ($/FP), you get (drumroll…) a dimensionless quantity – ooh, spooky! Why not call this leverage? It’s the degree to which a bit of functionality in your application powers the engine of your business.
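A quick sketch of the arithmetic (all numbers hypothetical):

```python
def leverage(cost_efficiency_fp_per_dollar, value_per_fp_dollars):
    """Dimensionless: (FP/$) x ($/FP) -- the dollars and FPs cancel out."""
    return cost_efficiency_fp_per_dollar * value_per_fp_dollars

# e.g. 0.2 FP delivered per dollar spent, $12 of business value per FP
print(round(leverage(0.2, 12), 2))  # 2.4
```

A leverage of 2.4 reads naturally: each dollar spent on the application powers $2.40 of business value.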
So, what’s the leverage of your mission-critical applications? Wouldn’t it be cool to benchmark this number against your peers?

How Not To Do APM

A few weeks ago I moderated a webinar on Application Portfolio Management (APM) featuring Phil Murphy, Principal Analyst at Forrester Research.
It contains some excellent information on how to think about APM and I encourage you to download the presentation and the audio track. What follows are some thoughts about what I think is a fundamental misunderstanding of what APM is. In other words, how not to do APM.
For starters, think about this question for a moment: What information about the portfolio do you need to make better decisions about Application A over here? Really. What exactly?
If you know all there is to know about Application A, don’t you have enough information to do everything you need to do to it – retire/replace or modernize? Why does information about other applications in your portfolio matter to the decisions you make about Application A?
It would be nonsensical to ask such a question when you’re managing a financial portfolio. That’s because the best way to stay on the efficient frontier of highest return for the lowest risk is to pay attention to how the securities in your portfolio are interconnected – in particular, the extent to which their risks are correlated.

The Financial Portfolio Efficiency Frontier
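For the curious, here’s the standard two-asset illustration of why correlation matters in a financial portfolio – the weights, volatilities, and correlations are textbook placeholders, not real data:

```python
from math import sqrt

def portfolio_risk(w1, s1, w2, s2, rho):
    """Standard deviation of a two-asset portfolio; rho is the correlation."""
    return sqrt(w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2)

# Same two securities, equal weights; only the correlation changes.
print(round(portfolio_risk(0.5, 0.2, 0.5, 0.2, 1.0), 2))  # 0.2  -- no diversification benefit
print(round(portfolio_risk(0.5, 0.2, 0.5, 0.2, 0.0), 2))  # 0.14 -- risk drops
```

The whole diversification payoff lives in that correlation term – which is exactly the kind of quantity, as argued here, that has no practical analogue for applications.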

Whatever you think about Modern Portfolio Theory (the theoretical underpinning of financial investing), surely you couldn’t apply it to an application portfolio! Apps are not like securities. There are some essential difficulties even when you put aside the fact that it seems damn near impossible to calculate the “return” on an application. And if you’re not using a measurement platform like CAST, I’d be really curious to hear from you on how you calculate the risk of an application.
First off, securities are fungible – if you don’t like one, you can replace it with another in your portfolio. Except for their risk/return profiles, they’re interchangeable. There is a finite transaction cost to making the change, but the change itself can be accomplished with a mouse click. To reduce your exposure, you can sell some shares of a particular stock in your portfolio. I don’t know exactly what that would be analogous to in an application portfolio, but try doing anything like it to yours without ruining your weekend.
The problem is that even when applications are interconnected in some ways, they’re not really amenable to being treated like stocks in a financial portfolio. Again, if you know how to measure risk (more about that at the end), you may find some correlations between application risk values. But I doubt it.
I’m simply mystified by the utility of those ubiquitous two-by-two bubble charts showing all the applications in the portfolio. The thing is, to know that an application has low business value and high cost, you don’t need to know anything about any of the other applications. So what’s the point? Seeing all my applications in one “dashboard” is pretty but useless.
Just a sum of the parts?
At best, if you want to keep thinking about it this way, I suppose you could treat APM as akin to managing a financial portfolio where the transaction costs are orders of magnitude greater than any you’d see in finance.
But applications are related, you’re thinking. And you would be right, except that there’s not much you can do, practically speaking, with these inter-relationships.
So how are applications inter-related? There are a handful of ways:

Transactional Interdependency: Two or more applications might be part of a critical business transaction. They need to work together to accomplish the task.
Sharing Functionality: Two applications might be using the same service or bunch of services. So messing with these services can have some nasty ripple effects.
Functional Dependency: Application A might depend on the outputs of Application B. Or it might send its outputs to Application B.
Resource Dependency: The skills and expertise levels it takes to work on Application A might be the same as for Application B.
Shared Patterns of Success and Failure: On review of your defect logs, you find that the kinds of problems that beset one application are similar to those that beset another. Similarly, applications that share a particular architectural quality might perform really well. You can detect these pockets by looking at all the apps together.

But so what? Practically speaking, it’s tough to switch resources from one application to another or sequence activities on applications based on the portfolio view. For ages, IT execs have pitched their business VPs in vain on how financing a bit of extra work on “shared plumbing” can really benefit the applications that run their businesses.
As Kelly Cannon, former VP of Shared Applications Services at Kaiser Permanente puts it, “Every CIO has been in annual discussions with business partners over where to allocate IT funds. Should we put it toward maintaining the critical operating platforms on which the business runs, or build new business functionality? The business always wins this one – the answer is new business functionality.”
So here’s what I suggest. Forget the inter-connections between applications. Forget the ubiquitous bubble charts. They’re nice in theory but you can’t win any arguments (much less win any funding) with them. Focus on single applications. Do measure risk and do measure return as best you can. Measure them on the applications you care about, not all your applications.
And stay tuned for a forthcoming book from Capers Jones – The Economics of Software Quality. There are frameworks in there for making both kinds of measurements — risk and reward.

The Chaos Monkey

Sometime last year, Netflix began using Amazon Web Services (AWS) to run its immensely successful video streaming business. It moved its entire source of revenue to the cloud and is now totally reliant on the performance of AWS.
How would you manage the business risk of such a move? Stop reading and write down your answer. Come on, humor me. Just outline it in bullet points.
OK, now read on (no cheating!).
Here’s what I would have done. Crossed my fingers and hoped for the best. Of course you monitor and you create the right remediation plans. But you wait for it to break before you do anything. And you keep hoping that nothing too bad will happen.
Obviously, I don’t think like the preternaturally smart engineers at Netflix. Here is how they describe what they did in the Netflix Tech Blog (I first read about this in Jeff Atwood’s blog, Coding Horror):
“One of the first systems our [Netflix] engineers built in AWS is called the Chaos Monkey. The Chaos Monkey’s job is to randomly kill instances and services within our architecture. If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most – in the event of an unexpected outage.”
This is what proactive means! How many companies have the guts, backed up by the technical prowess, to do this (not to build the Chaos Monkey, but to deal with the destruction it leaves in its wake)?
It dawned on me that IT systems are constantly bombarded by countless chaos monkeys – or one chaos monkey, if you prefer, controlling hundreds of variables. The best way to get ahead is to simulate the kind of destruction these chaos monkeys might cause, so you can be resilient to monkey strikes rather than merely reactive.
And strike the monkeys will – especially when software is changing rapidly (Agile development, a major enhancement, etc.). In these conditions, the structural quality of software can degrade over time, iteration by iteration and release by release.
So I built a chaos monkey to simulate this deterioration of structural quality as an application is going through change. Here’s how it works.
An application starts with the highest structural quality – a 4.0 (zero is the minimum quality score). At the end of each iteration/release/enhancement one of three things might happen to this value:

It might, with a certain probability, increase [denoted by Prob(better quality)]
It might, with a certain probability, decrease [Prob(worse quality)]
It might, with a certain probability, stay the same [Prob(same quality)]

Of course, we don’t know what each of these probabilities should be – once you have structural quality data at the end of each iteration for a few projects, you’ll be able to estimate them. Nor do we know how much structural quality will increase or decrease at each iteration, so we can try out a few values for this “step” size.
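Here’s a minimal sketch of such a chaos monkey; the probabilities and step size below are placeholders you’d replace with estimates from your own iteration data:

```python
import random

def chaos_monkey(iterations=24, p_better=0.2, p_worse=0.5, step=0.1, seed=42):
    """Random walk of a structural quality score, clamped to [0.0, 4.0].

    p_better, p_worse, and step are guesses -- estimate them from
    per-iteration structural quality data once you have it.
    """
    quality = 4.0  # start at the maximum score
    history = [quality]
    rng = random.Random(seed)  # fixed seed for a reproducible run
    for _ in range(iterations):
        r = rng.random()
        if r < p_better:
            quality += step
        elif r < p_better + p_worse:
            quality -= step
        # else: quality stays the same this iteration
        quality = max(0.0, min(4.0, quality))  # clamp to the score range
        history.append(quality)
    return history

history = chaos_monkey()
print(history[-1])  # quality score after 24 iterations
```

Run it a few hundred times with different seeds and you get a distribution of end-of-project quality scores rather than a single guess.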
After 24 iterations here is where the chaos monkey has left us.
The Structural Quality Chaos Monkey
Because structural quality sits beneath visible behavior, it can be difficult to detect and monitor in the rough and tumble of development (or rapid enhancement). Even when structural quality drifts downward in small steps, the drift can quickly accumulate and drive down an application’s reliability. It’s not that you or your teammates lack the knowledge; you simply don’t have the time to ferret out this information. Business applications contain thousands of classes, modules, batch programs, and database objects that need to work flawlessly together in the production environment. Automated structural quality measurement is the only feasible way to get a handle on structural quality drift.
Once you know what to expect from the chaos monkey, you can build in the safeguards you need to prevent decline rather than be caught by surprise.
Long live the chaos monkey!

Rumsfeld on Software – Handling Unknown Unknowns

While former Secretary of Defense Donald Rumsfeld never spoke or wrote about software (as far as I know), his quip about unknown unknowns during the early months of the Iraq war is well known.
No matter what you think of Rumsfeld, his classification applies nicely to software and teaches us a lesson or two about building good software.

Some things you can test for right away. Some things you can anticipate and set aside to test for later. But the stuff in the top right, shown in red – the unknown unknowns – is impossible to test for and not easy to plan for either. How an application and its environment will change is deeply uncertain.
How do you handle this uncertainty?
By starting with static analysis, but not stopping there. You have to go beyond static analysis in five ways:

Analyze and measure the application as a whole not just its component parts in isolation. This means going wide on technology coverage — not just a plethora of languages, but being able to handle frameworks and databases. It means putting your measurements in the context of the application as a whole, not just parts of it.
Generate a detailed architectural view that can be readily updated. This gives you the visibility to see what’s changing.
Make sophisticated checks of patterns and anti-patterns in software engineering to catch design and bad-fix problems that are otherwise impossible to find and eradicate.
Provide actionable metrics that give IT teams a sense of what to change (and in what sequence) to improve quality.
Automate, automate, automate! If you do 1 through 4 above, you will be automating design and code reviews – proven to be among the most effective insurance against unknown unknowns.
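As a toy illustration of point 1 – checking a rule against the application as a whole rather than one file at a time – consider a layer-violation check over a dependency map. The component names and the “UI must not call the database directly” rule are invented for illustration:

```python
# Hypothetical component dependency map: component -> components it calls.
dependencies = {
    "ui.OrderPage":     ["svc.OrderService", "db.OrdersTable"],
    "svc.OrderService": ["db.OrdersTable"],
}

# Forbidden layer-to-layer calls (a made-up architectural rule).
LAYER_RULES = {("ui", "db")}

# A per-file checker sees nothing wrong with any single component;
# only the whole-application view exposes the layer violation.
violations = [
    (src, dst)
    for src, targets in dependencies.items()
    for dst in targets
    if (src.split(".")[0], dst.split(".")[0]) in LAYER_RULES
]
print(violations)  # [('ui.OrderPage', 'db.OrdersTable')]
```

Each component compiles and passes its own checks; the anti-pattern only appears when you look across them.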

Stock Exchange Failures – What Next?

So, a slew of high-profile failures.
Major stock exchanges suffering embarrassing technology breakdowns that have left some traders resorting to placing trades over old-fashioned phone lines! What next, ticker tape?
Euronext, Borsa Italiana (bought by the LSE in 2007), the Australian Stock Exchange (ASX) and the London Stock Exchange (LSE) have all suffered outages, some ongoing. And two days ago, it struck Bank of America.
That’s quite a lineup – a total market capitalization of $150 billion, which amounts to about 30% of Switzerland’s 2010 GDP!
Of course, we don’t yet know for sure what went wrong. A last straw that broke the camel’s back? A step in a protocol missed because of miscommunication? A sequence of events, each one harmless on its own, but deadly when put together?
One thing we do know is these systems are incredibly complicated. As Lev writes in the previous blog post, London Bourse is Falling Down:
“An average mission critical application has just under 400,000 lines of code, 5,000 components, 1000 database tables and just under 1000 stored procedures. Architecturally, these components are layered such that an average transaction passes through about five layers between the user and the data. The applications that are in the top quartile of size and complexity comprise over 2.5 million lines of code, and though we haven’t analyzed the MilleniumIT trading system [the one that runs the LSE], it is probably well into the top quartile…probably well beyond that mark.”
You can only test for things you know can go wrong. But the software that runs these exchanges is complex enough that no human or team of humans can possibly know everything to test for. It goes beyond what testing is even meant to do.
The only way to improve your odds on systems like these is to measure structural quality – the way in which the system is built to withstand unknown unknowns.
In a quick video, see how CAST makes structural quality visible.

Bill Curtis Keynotes CONSEG 2011

This week, Bill Curtis, SVP & Chief Scientist at CAST will be delivering a keynote presentation at CONSEG 2011 in Bangalore, India.
Bill’s presentation, entitled “The Structural Quality of Business Application Software: An Empirical Report on the State of the Practice,” will focus on the conference theme: addressing the aspects of software engineering that impact the quality of system processes and products.
In his presentation, Bill will cover:

The quality problem in externally-supplied software (sourced and COTS/packaged)
Measuring and managing the structural quality of applications
How to measure the impact of software structural quality on business outcomes

Dates: February 17-19, 2011
Location: Chancery Pavilion
Residency Road,
Bangalore 560025, India
Registration: http://www.conseg2011.org/register.html

Straight Talk on Technical Debt

The intensity and speed of Agile development puts a premium on satisfying the functional requirements of business users. But this intense focus comes at the expense of structural quality.
Can we measure this effect? Indeed. We can do it by calculating the technical debt accrued during the Agile development process.
And once we quantify technical debt we can begin to ask questions like “how much technical debt is too much?”
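One simple way to put a dollar figure on it – an illustrative formula with made-up numbers, not any vendor’s exact method – is to price out the effort to fix the structural violations you’ve found:

```python
# Hypothetical violation counts and remediation effort, for illustration only.
violations = {           # severity: (count, hours to fix each)
    "critical": (12, 4.0),
    "high":     (40, 2.0),
    "medium":   (150, 0.5),
}
HOURLY_RATE = 75  # loaded labor rate in dollars

# Technical debt = total remediation hours x labor rate.
debt = sum(count * hours * HOURLY_RATE for count, hours in violations.values())
print(f"${debt:,.0f}")  # $15,225
```

With a number like this in hand per release, “how much technical debt is too much?” becomes a budget question rather than a philosophical one.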
On February 17th, in San Francisco, Jim Highsmith, executive consultant for ThoughtWorks, and yours truly will host a session on technical debt. We’ll cover the following topics:

Are we measuring the right metrics in Agile?
How is technical debt measured in Agile?
What are the financial implications of technical debt?
How can we use technical debt to make the right tradeoffs between delivery speed today and cost of maintenance tomorrow?

Click here to register!
Date: Thursday, February 17th
Time: 6:00 p.m. -8:00 p.m.
Location: The Palace Hotel, San Francisco, CA