About Philippe Emmanuel Douziech

Philippe-Emmanuel Douziech has been a Product Manager at CAST since 2005, previously in charge of the CAST Dashboard and now responsible for the CAST Quality Model. He has participated in the CISQ workshops from the start to define the factors affecting applications’ vulnerability, availability and responsiveness to the end users, as well as applications’ maintenance cost, effort and duration. Prior to CAST, he was a Product Manager at ORSYP, in charge of the event-driven peer-to-peer job scheduler Dollar Universe. Prior to ORSYP, he worked on Inertial Confinement Fusion simulation and experiment. Philippe-Emmanuel has a master’s degree in engineering and executive sciences from MINES ParisTech.

Would you be so nice as to not tell me the truth?

I recently found myself in yet another endless discussion about how bug fixes and extra capacity impact the results of a Software Analysis and Measurement (SAM) assessment.
My interlocutor’s first reaction was that it must be the computing configuration (i.e., the way quality findings are turned into an assessment score, status, etc.) that changed. Fixing bugs or adding extra capabilities would not have that impact on assessment results. Therefore, keeping the computing configuration stable would keep the results stable.
Then, after I explained that finding new or more accurate dependencies (thanks to a better understanding of complex behaviors, for instance) would impact the SAM assessment results, my interlocutor reluctantly accepted that it can have a tiny impact, but by no means a dramatic one. His main argument was this: In real life, one would not lose a certification because of additional knowledge. And this is where I tend to disagree most when dealing with risk.
For example, when assessing the safety hazard of a plant:

Would the knowledge that a given construction material is a carcinogen not change the assessment result?
Couldn’t this cause a small or dramatic effect, depending on the amount of hazardous material found in the audited plant?
And wouldn’t the results change in an unpredictable way as, up to this point, no one cared about measuring the amount of the hazardous material?

At this point, my interlocutor started to become evasive because he still could not accept such changes in the SAM world.

What if I know that you have a proven CWE vulnerability in your code?
Should I keep silent, as you would not accept a dramatic impact on your assessment?
Should I minimize the risk, as you would only accept a tiny impact on the assessment outcome?

That is basically what 99 percent of people ask for (I should say 100 percent, but I would rather leave room for some people that remain rational in the digital world of IT).
Is a dramatic change disturbing? Yes, of course. But isn’t it also disturbing in the real world? Knowing what it will cost you to remove asbestos from the 56 floors of the Montparnasse Tower must be disturbing. I read it could cost up to 800,000 EUR per floor.
But that doesn’t change the fact that asbestos is now known to be a health hazard. I understand that some people — most likely the ones signing the checks — would be willing to say that the tower is as safe a place to work in as it was before the world knew asbestos was a health hazard and before the asbestos level was measured in the tower. But that is not a reason to hide the truth.
So the question now becomes: How do we handle the change?
To answer this question, we can look to the non-IT world (let me call it “the real world” from now on).
I also happened to work on a roll-bearing assembly line. Whenever a cutting tooth from a CNC cutter needed to be changed, not a single person in the plant would assume that you could fire up the cutter right away, not before a proper re-calibration of the cutter had been done.
As with just-in-time strategies, productivity measurement in industrial processes, and basic house-cleaning principles, it seems the IT world considers itself so different (or even superior) that the real world’s principles would not even apply.

How many people, even in the workplace, get their computer so full of garbage files and programs that they end up buying a brand-new computer? As if they would hoard junk in their home or office, then move to another home or office when the first one is full. (I know it does happen, but it usually ends up in reality TV shows.)
How many IT professionals think that productivity measurement is only about the produced volume of code, and completely disregard the quality of the production? The industrial world knows for a fact that volume without quality is not the path to growth or a competitive edge.
How much effort does it take to convince IT professionals that just-in-time strategies and event-driven architecture can yield the same responsiveness to business requirements and the same resource-usage efficiency in delivering IT outcomes as they did on assembly lines? In their defense, it took many decades to convince the industrial world of these benefits. The pity is that IT doesn’t have the excuse of being the pioneer in this domain; it has a huge amount of knowledge and experience to leverage. Yet it seldom does.

SAM is not the only measuring activity in the world. However, its practitioners need to reach the same level of maturity as their real-world counterparts.

The Holy Grail: Objective risk level estimation

In my last post we discussed the complementary nature of remediation cost and risk level assessment. As a follow-up, I wanted to dwell on the objective risk level assessment. Is it even possible? If not, how close to it can we get? How valuable is an estimation of the risk level? Could it be the Holy Grail of software analysis and measurement? Or is it even worth the effort?
Risk level
By definition, risk level is about the severity of the consequences of a situation and the probability for this situation to happen. A high-probability, low-impact situation can then “compete” with a low-probability, high-impact one for the riskiest situation award. But when dealing with software, what could be a good indicator of risk severity and risk probability?
This is a good starting point to look at objective risk level assessment, especially since software analysis and measurement is all about measuring things anyway.
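To make the "competition" concrete, here is a minimal sketch that combines the two factors by simple multiplication. The product rule, the probability values, and the severity units are all assumptions chosen for illustration; the definition above does not prescribe any particular formula.

```python
def risk_level(probability: float, severity: float) -> float:
    """Combine a probability (0..1) and a severity (arbitrary impact units).

    Multiplying the two is an assumption made for this sketch.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if severity < 0:
        raise ValueError("severity must not be negative")
    return probability * severity


# Hypothetical situations: one frequent but mild, one rare but harsh.
frequent_minor = risk_level(probability=0.9, severity=10)
rare_major = risk_level(probability=0.01, severity=900)

print(frequent_minor, rare_major)
```

With these made-up figures the two situations end up at roughly the same level, which is exactly why neither factor can be looked at in isolation.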
Hopefully, when looking for risky patterns to measure in the source code, application structure, and architecture, the reasons to look for the patterns are well known. What you’re not going to be able to measure are the consequences of these risky patterns on the organization’s activity. Would they cause revenue loss? Would they lower customer satisfaction? Would they prevent the release of the “killer” Christmas offer?
This information can be collected, but not analyzed and measured from the software itself. However, this missing link shouldn’t stop us in our tracks.
Then, when looking at the way an application is built (at its various execution paths, etc.), one can get an idea of how frequently a faulty piece of code or pattern is used. However, what we can’t see is the frequency with which end-users will use any given application feature or cause the application to use any given execution path. Is it a pattern that is used by a popular transaction from the web-facing shopping site? Is it a component used throughout the application as it participates in application cache management?
Risk severity level estimation with SAM
Some risky patterns, but not the vast majority, can be “direct hits.” They prove that risk is real, and waiting to happen. This makes estimating their severity a lot easier. The business context will still have to be found outside of the software itself (the caveat stated above is acknowledged) but, other than that, the outcome is obvious. For instance, CWE patterns such as cross-site scripting will allow for phishing, which can be incredibly risky if the application is web-facing.
The majority of risky patterns are not direct hits. They are evidence to support the trend that risk is real and very much alive in the enterprise. In this case, the amount of evidence will be the next best thing to proving the risk is real.
For example, let’s talk about complexity. I am not a big fan of the “complexity is bad” argument. I am more in favor of “unneeded complexity is bad.” (See “No Silver Bullet,” in which Fred Brooks differentiates essential complexity from accidental complexity.) Of course, determining the “unneeded” part is the true challenge.
I wouldn’t spend much time focusing on cyclomatic complexity. I would look at many complexity-related indicators to pinpoint the components that concentrate most if not all of them. I can still be proven wrong, but the component’s bad smell is so strong that I am not often disappointed. Compare components with high cyclomatic complexity alone to components with high cyclomatic complexity, high integration complexity, high essential complexity, high depth of code, etc. As far as complexity matters, the latter components are the real winners, as they smell stronger for some good reasons.
This kind of reasoning can be applied to areas other than complexity (I used the complexity area as it is widely and better understood).
You can easily compute a “performance bad smell” when a component starts to concentrate many issues that contribute to possible negative impact on performance levels. Each issue taken individually is not a direct hit, but when they start to pile up they can become an issue.
Therefore, as a general approach, computing a “bad smell” by looking at everything that is not right with a component in a given area can help estimate, but not yet measure, the risk severity level.
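As a rough sketch of this "bad smell" idea, the snippet below scores a component by the share of complexity indicators that exceed a threshold, then ranks components by that share. The indicator names echo the ones mentioned above, but the thresholds, component names, and metric values are hypothetical.

```python
# Hypothetical thresholds; real ones would come from the quality model in use.
THRESHOLDS = {
    "cyclomatic_complexity": 20,
    "integration_complexity": 10,
    "essential_complexity": 5,
    "depth_of_code": 6,
}


def bad_smell(metrics, thresholds=THRESHOLDS):
    """Return the fraction of indicators (0.0 to 1.0) that exceed their threshold."""
    exceeded = sum(
        1 for name, limit in thresholds.items() if metrics.get(name, 0) > limit
    )
    return exceeded / len(thresholds)


# Made-up components: the first is high on one indicator only,
# the second concentrates all of them.
components = {
    "OrderParser": {"cyclomatic_complexity": 35},
    "PricingEngine": {
        "cyclomatic_complexity": 35,
        "integration_complexity": 14,
        "essential_complexity": 9,
        "depth_of_code": 8,
    },
}

ranked = sorted(components, key=lambda name: bad_smell(components[name]), reverse=True)
print(ranked)
```

Both components share the same cyclomatic complexity, yet the one that piles up every indicator comes first. The score is an estimate of where to look, not a measurement of severity.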
Risk probability estimation with SAM
Risk probability can be addressed with or without contextual information. Contextual information is always a plus, but might not be available. Therefore, always have a plan B handy.
First, let’s focus on plan A. I said earlier in this blog post that some information is missing from the source code, application structure, and architecture. However, SAM can help bridge the gap between the application innards and the business context and purpose. With a bridge, it can be a lot easier to know how frequently a feature relying on a faulty component is used, and how important that feature is, thus providing an estimate of the risk probability.
What kind of bridge am I talking about? I’m referring to the ability to map user-facing transactions (the application features in a sense) with the technical components within the applications that participate in the transaction execution. This bridge is objective as it relies on the actual application source code and structure. By the way, this bridge can also help estimate the risk severity because you’ll know how much damage would be caused had a transaction failed or performed too slowly.
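One way to picture such a bridge is a traversal of the application's call graph, starting from each user-facing transaction. The sketch below assumes a hypothetical call graph and transaction list; a real SAM tool would extract them from the source code and structure.

```python
# Hypothetical call graph: each key calls the components listed as its value.
CALLS = {
    "checkout": ["cart_service", "payment_service"],
    "browse_catalog": ["catalog_service"],
    "cart_service": ["cache_manager"],
    "payment_service": ["cache_manager", "card_gateway"],
    "catalog_service": ["cache_manager"],
    "cache_manager": [],
    "card_gateway": [],
}
# Hypothetical user-facing transactions (entry points into the graph).
TRANSACTIONS = ["checkout", "browse_catalog"]


def components_reached(entry_point, calls):
    """Return every component reachable from a transaction entry point."""
    seen, stack = set(), [entry_point]
    while stack:
        node = stack.pop()
        for callee in calls.get(node, []):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


def transactions_using(component, calls, transactions):
    """List the transactions whose execution can involve the given component."""
    return sorted(
        t for t in transactions if component in components_reached(t, calls)
    )


print(transactions_using("card_gateway", CALLS, TRANSACTIONS))
print(transactions_using("cache_manager", CALLS, TRANSACTIONS))
```

Once a faulty component is tied to named transactions, business owners can say how often those transactions run and how much a failure would hurt.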
What if I can’t get the information from the real world? Here is one possible plan B. With SAM, you can actually count “how many roads are there to Rome?” The roads are wired in the application source code and structure. Of course, it falls short of plan A. But it’s always applicable and it’s totally objective. The more roads, the higher the risk.
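Counting the roads can be sketched as counting distinct call paths from each entry point to the faulty component. The graph below is hypothetical and assumed to be acyclic; a graph with recursion would need cycle handling that this sketch leaves out.

```python
from functools import lru_cache

# Hypothetical, acyclic call graph: each key calls the components in its value.
CALLS = {
    "checkout": ["cart_service", "payment_service"],
    "browse_catalog": ["catalog_service"],
    "cart_service": ["cache_manager"],
    "payment_service": ["cache_manager", "card_gateway"],
    "catalog_service": ["cache_manager"],
    "cache_manager": [],
    "card_gateway": [],
}


def count_roads(target, calls, entry_points):
    """Count the distinct call paths from each entry point to the target."""

    @lru_cache(maxsize=None)
    def paths_from(node):
        # Reaching the target counts as one complete road.
        if node == target:
            return 1
        return sum(paths_from(callee) for callee in calls.get(node, ()))

    return {entry: paths_from(entry) for entry in entry_points}


roads_to_cache = count_roads("cache_manager", CALLS, ["checkout", "browse_catalog"])
roads_to_gateway = count_roads("card_gateway", CALLS, ["checkout", "browse_catalog"])
print(roads_to_cache, roads_to_gateway)
```

The count says nothing about how busy each road is in production, which is the limitation compared with plan A, but it can be computed for any application without outside input.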
The Holy Grail
By nature, SAM-based risk level is incomplete, yet SAM-based risk level assessment can be automatically “objectivized.” And I am pretty sure there are new ways to be devised to move forward. Do you think that objective risk level assessment is the Holy Grail for SAM, or does the future hold more in store for us? Share your ideas in a comment.

Remediation cost versus risk level: Two sides of the same coin?

While working in a CISQ technical work group to propose the “best” quality model that would efficiently provide visibility on application quality (mostly to ensure their resilience, performance, and security), we discussed two approaches that would output exposure. The first is a remediation cost approach, which measures the distance to the required internal quality level. The other is a risk level approach, which estimates the impact internal quality issues can have on the business.
Although both are based on the same raw data, the information differs when we identify situations that do not comply with some coding, structural, and architectural practices. The former approach will estimate the cost to fix the situations while the latter approach will estimate the risk the situations create.
The remediation cost approach
This approach has appeal because:

It is simple to understand: we are talking effort and cost. Anyone can understand that fixing this type of issue takes that amount of time and money
It is simple to aggregate: effort or time simply adds up
It is simple to compare: more or less effort or time for this application to meet the requirements
It is simple to translate into an IT budget

However, its major drawback is that it does not estimate the consequences. Using the technical debt metaphor, this approach only estimates the principal of the technical debt (that is, the amount you owe) without estimating the interest payments (the consequences on your development and maintenance activity as well as on the service level of the indebted application). Why should we care? Because you will have to decide: Which part of the debt am I going to repay? Where do I start for a maximum return on investment?
A half-day fix can relate to a situation that can crash the whole application. For example, an unknown variable in a massively used library might take next to nothing to fix, while the consequences on the application behavior in production are severe. However, the remediation cost does not convey any sense of urgency. If I were to monitor the progress of the project, a leftover half day would not scare me and force me to decide to fix it no matter what, even if it meant postponing the release date.
If the application did crash, would the answer, “Oh, we were just a half-day away from the required internal quality …” be acceptable? I think not. Something should have told me that despite the short distance to the required internal quality, the incurred risk was too high.
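The half-day example can be sketched with a few made-up findings. Everything below (finding names, effort figures, risk statuses, and the ordering of those statuses) is hypothetical; the point is only that the same raw data ranks differently depending on the approach.

```python
# Hypothetical findings: effort to fix (the "principal") and an assessed risk status.
findings = [
    {"id": "unknown-variable-in-shared-library", "effort_days": 0.5, "risk": "severe"},
    {"id": "duplicated-report-templates", "effort_days": 12.0, "risk": "low"},
    {"id": "missing-index-on-archive-table", "effort_days": 3.0, "risk": "elevated"},
]

# Assumed ordering of statuses, from least to most worrying.
RISK_ORDER = ["low", "guarded", "elevated", "high", "severe"]

# Remediation cost view: effort simply adds up and ranks by size.
total_effort_days = sum(f["effort_days"] for f in findings)
by_cost = sorted(findings, key=lambda f: f["effort_days"], reverse=True)

# Risk level view: rank by what could go wrong, whatever the effort.
by_risk = sorted(findings, key=lambda f: RISK_ORDER.index(f["risk"]), reverse=True)

print(total_effort_days)
print([f["id"] for f in by_cost])
print([f["id"] for f in by_risk])
```

In the cost view the half-day item sits at the bottom of the list; in the risk view it comes first. Neither list alone tells the project what to do before the release date.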
The risk level estimation approach
This approach has a different kind of appeal. Its proponents say that its models are what truly matter: the risk an application faces regarding its resilience, its performance level in case of an unexpected peak of workload, its ability to ensure data integrity and confidentiality, its level of responsiveness to business requirements, its ability to fit in agile development contexts, and to benefit from all sourcing options.
It puts the focus back on the fact that applications are here to serve the business and serve it well. Technical debt would not matter so much if it had no consequences on the business — It would remain a development organization issue and not a corporate issue.
There are some headlines in the news about late and over-budget projects in the IT sector. There are many more headlines in the mainstream news about major application blackouts and sensitive data leaks.
However, risk-level automation’s major drawback is its lack of pure objectivity. What is the business impact of a vulnerability to SQL injection? Nothing, until you find out. This isn’t so much of a problem in an internal application, but much more in a web-facing, mission-critical, data-sensitive application.
The two sides of the same coin?
Are these irreconcilable differences? Not so much if you think of the impact on the business as the interest-that-matters of the technical debt, while the remediation cost is the principal sum of the technical debt.
What does “interest-that-matters” mean? It means “it depends,” of course. It depends on the value the application delivers to your organization. It depends on your application development and maintenance strategies. The context is key. The same principal amount of technical debt carries widely different interests in different contexts.
Why not use the same unit, that is, $ or €? First, the amounts could be too huge to be of any value to the business (outside a Monopoly board game). They are also too unpredictable, as the amounts are application dependent and, even for a given application, the consequences are difficult to predict.
As for any other risk, this is more about giving a status: Is the risk level tolerable?
Many different statuses can be used:

Severe, high, elevated, guarded, or low
Unacceptable, poor, acceptable, good, or excellent
Very high / extreme, high, moderate, or low

These statuses convey the interpretation of the risk assessment. The output already takes into account the different aspects of risk: likelihood and consequences in context.

Now what?
If you are convinced, as I am, of the complementary nature of remediation cost and risk level, you would nonetheless point out the major hurdle: objective risk level estimation.
Stay tuned for my next post, where we’ll look at this major hurdle to providing visibility into application quality.
How have you gotten visibility into your application’s quality? Share your story in a comment.

Is Every Part of the Application Equal When Assessing the Risk Level?

Risk detection is about identifying any threat that can negatively and severely impact the behavior of applications in operations, as well as the application maintenance and development activity. Then, risk assessment is about conveying the result of the detection through easy-to-grasp pieces of information. Part of this activity is about highlighting what it is you’re seeing while summarizing a plethora of information. But as soon as we utter the word “summarizing,” we risk losing some important context.
Application split impact as a strength in risk assessment
An application can be considered as a whole in its purpose of servicing one area of the business, yet it is composed of multiple technical and functional parts. In other words, an application is not about one single feature.
The ability to split an application into its main features, or groups of features or functional domains, is critical to map the occurrences of risky situations. Indeed, considering that every single piece of code or software construct is equivalent with regard to the risk an application incurs is valuable for objective comparison. Yet it misses the point that they serve different features and that these features are not equal, should they fail in operations. For instance:

The very location where the violations occur is key as it might be in a piece of code or in a construct that is supporting a non-critical feature that does not handle any sensitive data. Or, on the contrary, is supporting a mission-critical feature that does handle sensitive data with a customer-facing front-end over the Internet.
Likewise, a piece of code or software construct that is involved in many such critical features creates a much higher risk even though it is still occurring in one location.
Taking the context into account will help provide a better assessment than a purely objective one.

The same issue holds true when dealing with application upgrades as well. I faced a situation where the team in charge of evolving the application would complain about the huge difficulties they had performing their task, saying it was “terrible to maintain.” Paradoxically, the compliance ratio with applicable coding and architectural practices was pretty good. The issues affected less than one tenth of a percent of the code. The real issue was that the few occurrences of non-compliance were located in the very part of the application they had to evolve regularly in response to business requirements. It all made sense once they knew that this small fraction of the code was the one that mattered.
As it is critical to know the kind of application we are dealing with to adapt the risk assessment accordingly, this mapping ability will provide context to the findings; it will — or should I say must — change the resulting risk level assessment.
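The maintenance anecdote can be sketched by comparing a plain compliance ratio with one weighted by how often each part of the application changes. The module names, sizes, violation counts, and change frequencies are hypothetical, and weighting by change frequency is only one possible way to inject context.

```python
# Hypothetical modules: size, non-compliant lines, and yearly change frequency.
modules = [
    {"name": "pricing_rules", "loc": 2_000, "violating_loc": 180, "changes_per_year": 40},
    {"name": "reporting", "loc": 120_000, "violating_loc": 10, "changes_per_year": 2},
    {"name": "archiving", "loc": 78_000, "violating_loc": 0, "changes_per_year": 1},
]


def overall_compliance(modules):
    """Share of compliant code, with every line counting the same."""
    total = sum(m["loc"] for m in modules)
    bad = sum(m["violating_loc"] for m in modules)
    return 1 - bad / total


def change_weighted_compliance(modules):
    """Per-module compliance, weighted by how often each module is changed."""
    total_changes = sum(m["changes_per_year"] for m in modules)
    return sum(
        (1 - m["violating_loc"] / m["loc"]) * m["changes_per_year"] / total_changes
        for m in modules
    )


print(round(overall_compliance(modules), 4))
print(round(change_weighted_compliance(modules), 4))
```

With these figures the unweighted ratio stays above 99.9 percent, while the change-weighted one drops markedly, because nearly all the non-compliant code sits in the module the team touches most.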
A walk through
Let’s look at the following situation of four applications composed of 10 components each:

The color is designed to provide you with a risk assessment of each component of these applications, with green being the right place to be and red being the wrong one. Would you say the risk level is the same in these four cases?
Then, let us look at another situation:

And now this one:

They all look different and I assume you would like to be responsible for the application showing in the first row, and dread the responsibility for the application in the third row.
And yet:

All of them are based on the same number of defects (10 percent)
Sample #1 uses a linear scale from green to red to show defect percent from 0 to 100
Sample #2 uses a linear scale from green to red to show defect percent from 0 to 50, then a red plateau above 50 percent of defects
Sample #3 uses a linear scale from green to red to show defect percent from 0 to 50, then a red plateau above 50 percent of defects, with 3 modules that are more critical than the others
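The three scales can be sketched as functions that turn a defect percentage into a position between green (0.0) and red (1.0). How the extra criticality of the three modules is applied in sample #3 is not stated above, so the multiplying weight used here is an assumption.

```python
def linear_scale(defect_pct, red_at=100.0):
    """Map a defect percentage to 0.0 (green) .. 1.0 (red), flat once red_at is reached."""
    return min(max(defect_pct, 0.0) / red_at, 1.0)


def sample_1(defect_pct):
    """Linear from 0 to 100 percent."""
    return linear_scale(defect_pct, red_at=100.0)


def sample_2(defect_pct):
    """Linear from 0 to 50 percent, then a red plateau."""
    return linear_scale(defect_pct, red_at=50.0)


def sample_3(defect_pct, critical, weight=2.0):
    """Same as sample 2, but critical modules are pushed toward red.

    The weight value is hypothetical.
    """
    factor = weight if critical else 1.0
    return min(linear_scale(defect_pct, red_at=50.0) * factor, 1.0)


# The same 10 percent of defects in one component, seen through each scale.
pct = 10.0
positions = [sample_1(pct), sample_2(pct), sample_3(pct, critical=True)]
print(positions)
```

The defect data never changes between the three calls; only the scale and the context do, which is why the pictures look so different.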

Does it mean there is no truth in it?
As for me, I would see an opportunity to deliver better risk assessment results:

What do you look for when assessing risk in an application?

Risk Detection and Benchmarking — Feuding Brothers?

Risk detection is the most valid justification to the Software Analysis and Measurement activity: identify any threat that can negatively and severely impact the behavior of applications in operations as well as the application maintenance and development activity.
“Most valid justification” sounds great, but it’s also quite difficult to manage. Few organizations keep track of software issues that originate from the software source code and architecture, so it is difficult to define objective target requirements that could support a “zero defects” approach. Working without clear requirements is the best way to invest one’s time and resources in the wrong place: removing too few or too many non-compliant situations in the software source code and architecture, or in the wrong part of the application.
One answer is to benchmark analysis and measurement results so as to build a predictive model. This application is likely to be OK in operations for this kind of business because all these similar applications show the same results.
Different needs?
On the one hand, by nature, benchmarking requires comparing apples with apples and oranges with oranges. In other words, measurement needs to be applicable to all benchmarked applications and stable over time, so as to get a fair and valid benchmarking outcome.
On the other hand, risk detection for any given project:

benefits from the use of state-of-the-art “weapons”, i.e., the use of any means to identify serious threats, which should be kept up-to-date every day (as with a software virus list)
should not care about fair comparison. It’s never a good excuse to say that the trading application failed but that it showed better results than average
should heed contextual information about the application to better identify threats (an acquaintance of mine — a security guru — once said to me there are two types of software metrics: generic metrics and useful ones), i.e., the use of information that cannot be automatically found in the source code and architecture but that would turn a non-compliant situation into a major threat. For instance: In which part of the application is it located? Which amount of data is stored in the accessed database tables — in production, not only in the development and testing environment? What is the functional purpose of this transaction? What is the officially vetted input validation component?

Is this ground for a divorce on account of irreconcilable differences?
Are we bound to keep the activities apart with a state-of-the-art risk detection system and a common-denominator benchmarking capability?
That would be a huge mistake, as management and project teams would use different indicators and draw different conclusions. Worst case scenario: Project teams identify a major threat they need resources to fix, but management indicators tell the opposite, so management denies the request.
Now what?
Although not so simple, there are steps that can be taken to bridge the gap.
The steps would be to make sure:

that “contextual information” collection is part of the analysis and measurement process
that a lack of such information would show (using the officially vetted input validation component example, not knowing which component it is should be a problem that impacts the results, not an excuse for poor results, which is much too often the case)
that the quality of the information is also assessed by human auditing

Are your risk detection and benchmarking butting heads? Let us know in a comment. And keep your eyes on the blog for my next post about the benefits of a well-designed assessment model.