False positives in SAM — Achilles’ heel or Samson’s hair?

False positives are unavoidable: they appear in every software application measurement system, to a greater or lesser degree. There are several reasons for this.

First, the more we search for information, the higher the risk of false positives.
Second, the more complex the information is to search, the higher the risk of errors.
And third, the less sophisticated the technique used to scan the code, the higher the risk of having bad results.

On this last point, the techniques in common use range from a simple grep search to syntax-based parsing, semantic resolution, and dataflow analysis.
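To illustrate the gap between the two ends of that range, here is a minimal Python sketch. It assumes a hypothetical quality rule ("do not call eval") and a made-up three-line snippet; the function names `grep_scan` and `ast_scan` are illustrative and do not come from any SAM product. The textual search flags every line containing `eval(`, including a comment and a string literal, while the syntax-based scan only reports real call sites.

```python
import ast
import re

# Made-up sample code to analyze (assumption for illustration only).
SOURCE = '''
# never call eval() on user input
message = "eval(x) is forbidden here"
result = eval("1 + 1")
'''


def grep_scan(source):
    """Textual search: return the numbers of lines that contain 'eval('."""
    pattern = re.compile(r"\beval\s*\(")
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if pattern.search(line)
    ]


def ast_scan(source):
    """Syntax-based search: return the lines where eval is really called."""
    tree = ast.parse(source)  # parses only, the sample is never executed
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


if __name__ == "__main__":
    print("grep:", grep_scan(SOURCE))
    print("ast: ", ast_scan(SOURCE))
```

On this sample, the textual search reports three lines and the syntax-based scan reports one, so two of the three grep results are false positives. A syntax-based scan is not perfect either: it would miss a call made through an alias, which is where semantic resolution and dataflow analysis come in.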
However, the situation can be viewed from two opposite angles: a negative one, which treats false positives as the Achilles’ heel of SAM; and a more positive one, which, like Samson’s hair, treats false positives as valuable information.
The Achilles’ heel
The false positives generated when analyzing software applications distort measurement results, which makes risk evaluation harder. We cannot be sure whether the violations we are looking at are true or false, even if we have an idea of the results for a given rule in advance. There is no way to know beforehand where the false positives are, which is frustrating for the people who have to work with the results.
Their occurrence depends on the measurement itself, but also on the nature of the application being measured. I have experienced this situation many times since I started working on SAM in 1990, and it is very irritating to have doubt surrounding your results. I often prefer to check and double-check a large number of cases, to be sure. But is it possible to be really sure? I don’t think so. We can only keep refining our analysis techniques.
Usually, I have an idea of the number of violations a quality rule should generate, and if the results are too numerous, I’m pretty sure there are false positives. The problem is that false positives lead to time wasted checking results and to decreased confidence in the measurement system. Moreover, even after the analysis engine has been improved and the false positives removed, users may continue to treat the violations as false positives and discard them without paying attention.
False positives also disturb an application’s benchmarking. When doing this type of exercise, it is good practice to take the error rate into account. Indeed, the comparison can become inconsistent if the number of false positives is too high. In that case, how do you know whether the results position an application correctly compared to others?
Applications with different characteristics can generate different numbers of false positives. If an application relies on a programming construct that triggers false positives, then its total number of false violations will be abnormally high compared with other applications. In the end, the comparison will be biased.
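The bias described above can be sketched with a few lines of Python. This is only an illustration under stated assumptions: the `AppResult` class, its field names, and all the numbers are made up, and the false positive rate is assumed to be estimated by manually checking a sample of violations for each application. It is not the method of any particular SAM system.

```python
from dataclasses import dataclass


@dataclass
class AppResult:
    """Hypothetical measurement summary for one application."""
    name: str
    violations: int      # raw violations reported by the analyzer
    kloc: float          # application size in thousands of lines of code
    sample_checked: int  # violations reviewed by hand
    sample_false: int    # of those, how many were false positives

    def fp_rate(self):
        # Error rate estimated from the manually checked sample.
        return self.sample_false / self.sample_checked

    def raw_density(self):
        return self.violations / self.kloc

    def adjusted_density(self):
        # Keep only the share of violations expected to be true.
        return self.violations * (1 - self.fp_rate()) / self.kloc


# Made-up figures: app B uses a construct that triggers many false positives.
apps = [
    AppResult("A", violations=400, kloc=100, sample_checked=50, sample_false=5),
    AppResult("B", violations=600, kloc=100, sample_checked=50, sample_false=25),
]

best_raw = min(apps, key=AppResult.raw_density)
best_adjusted = min(apps, key=AppResult.adjusted_density)

if __name__ == "__main__":
    for app in apps:
        print(app.name, app.raw_density(), round(app.adjusted_density(), 2))
    print("best on raw counts:", best_raw.name)
    print("best once the error rate is considered:", best_adjusted.name)
```

With these illustrative figures, application A looks better on raw counts, but B comes out ahead once the estimated error rate is taken into account. The estimate itself is only as good as the sample that was checked, so the adjusted figure should still be read with caution.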
Samson’s hair
On the other hand, let’s be optimistic! Some measures are more complex than others and require considerable effort, and having false positives in the results is better than having no results at all. SAM systems have their strengths as well as their weaknesses!
Indeed, even if the measure is not perfect, it tells you whether the situation is fairly good or fairly serious, and which components are affected. Moreover, we are at least aware that the measure is not easy to take and that the results must be interpreted cautiously.
If the number of violations is too high, then the number of false positives is likely to be high as well, and they will be scattered through long lists, which makes them harder to find. As a consequence, the uncertainty between the reported value and reality becomes significant and must be taken into account when working with the results.

But if the number of results is not too high, then false positives have a limited impact on the user’s work. In this case, they are easy to spot in the list of violations, and the user can quickly identify and filter them out to evaluate the risk incurred by the application that has been assessed. Moreover, even if a false violation is taken into account, it generally does not change the results or the conclusion much.
And, to remain constructive, a false positive also means that what was being searched for is not so easy to find, or that it has several facets. This can be used to justify future investment in improving the SAM system!