Why are there so many hurdles to efficient SAM benchmarking?

Two opposite sides
When dealing with Software Analysis and Measurement benchmarking, people’s behavior generally falls in one of the following two categories:

“Let’s compare anything and draw conclusions without giving any thought about relevance and applicability”
“There is always something that differs and nothing can ever be compared”

As often, there is no sensible middle ground.

Using Hibernate Frameworks: What Are the Best Practices?

I’ve recently been involved in helping CAST Research Labs analyze a large sample of Java EE applications (496 apps), looking to understand the impact of different frameworks on application structural quality. We analyzed these applications using CAST’s Application Intelligence Platform (AIP) to identify critical violations of architectural and coding practices. While looking at the critical violations that were detected by CAST, something struck me: The success ratio (i.e. the ratio between the number of times a rule is violated and the number of opportunities this rule could have been violated) for rules associated to Hibernate was particularly low, indicating issues related to performance and robustness for applications using this framework. (The details of the report will be published next week – we presented a preview of the analysis during a webinar in January.)
Hibernate is one of the most popular frameworks in the Object Relational Mapping area. It prevents you from dealing with the complex task of mapping objects to relational database allowing the development your data layer using only POJO, and keeping your application portable through existing databases. But at the same time, Hibernate solves any existing mapping issues, making it difficult to implement under correct performance and robustness standards.
In my previous post, I discussed whether frameworks could simplify our lives. In this post, I want to focus on Hibernate and which best practices you should follow when using it in your Java EE application.
The rules associated to Hibernate that had the lowest success ratios were the following.
Persistent classes should Implement hashCode() and equals()
In our analysis, this rule had the lowest success ratio (7.70 percent) across all frameworks analyzed, affirming that this architectural practice is too often ignored by developers. Although Hibernate guarantees that there is a unique instance for each row of the database in a session, you still need to supply your own implementation of the equals() and hashCode() methods for your persistent classes whenever you work with objects in a detached state. This is particularly true when you test these objects for equality, usually in hash-based collections.
Avoid using references to the ID in the persistent class’ method equals()
In our analysis, this rule had the second lowest success ratio (37.13 percent). It is possible for the programmer to define the meaning of Java Equality. However, Hibernate will only set the ID field when saving the object; it is therefore important not to use the ID field in the Java Equality definition when it is a surrogate key. For that reason, saving the object that has been added to a set collection results in identity change. In addition, the behavior of the Set/Map collection class is not specified when the value of an object is changed in a manner that impacts equals comparisons while the object is an element in the Set or is the key of a Map, you might corrupt your database.
For example:
Person p = new Person();
Set set = new HashSet();
set.add(p);
System.out.println(set.contains(p));
p.setId(new Long(5));
System.out.println(set.contains(p));
Prints: false
Other best practices that should be followed when using Hibernate with your Java EE applications are the following:
Never use array to map a collection
The details of an array mapping are virtually identical to those of a list. However, we strongly recommend against the use of arrays, since arrays can’t be lazily initialized (there is no way to proxy an array at the virtual machine level). Lists, maps, and sets are the most efficient collection types.
So, using array can affect your application performance when it contains many items: lazy loading, optimized dirty checking, and poor performance features for persistent collections.
Avoid public/protected setter for the generated identifier field
A primary key value must never change once it has been assigned. Since it is a generated key, it is automatically set by Hibernate, or by another JPA implementation or by another provider. The actual behavior of an application tries to modify the value of a primary key that is not defined.
Avoid many-to-many association
“Many to many” usage is discouraged when a simple bidirectional “many-to-one”/“one-to-many” will do the job. In particular, a many-to-many association might always be represented as two many-to-one associations to an intervening class. This model is usually easy to extend. In a real system, you might not have a many-to-many association as there is almost always other information that must be attached to each link between associated instances, such as the date and time when an item was added to a category. The best way to represent this information is via an intermediate association class. On top of this, changing the definition of a primary key and all foreign keys that refer to it is a frustrating task.
Persistent class method’s equals() and hashCode() must access its fields through getter methods
This rule is important: the object instance that is passed as ‘other’ might actually be a proxy object and not the actual instance that holds the persistent state. This is the case where there are lazy associations between classes. This is one area where Hibernate is not completely transparent. But it is good practice to use accessor methods instead of direct instance variable access. When we are tuning the performance of the application, a lazy association might actually be required.
This potential issue raises a ClassCastException and can cause the application to become unstable.
Avoid non serializable Entities
When Entity bean instance is to be passed by value as a detached object (for example, through a remote interface), the entity class must implement the Serializable interface.
Also, in some cases an OptimisticLockException will be thrown and wrapped by another exception, such as a RemoteException, when VM boundaries are crossed. Entities that might be referenced in wrapped exceptions should be Serializable so that marshaling will not fail. One of the consequences to not following this rule is receiving an exception when a non Serializable Entity is passed by value.
This is just an extract of best practices on Hibernate, but you can already see that not following them can have severe consequences in terms of robustness and performance. These rules can be quite obvious by an expert of Hibernate, but for the novice user, Hibernate can be tough to use. Abstract is big and complex and the user must spend more time in assessing the concept, function, and uses in the developing program.

Show your love for Java


Ask a dozen techies about the best programming language and you’ll likely end up with 13 opinions, and a few objects might get thrown. But has your love of your programming tools ever won you anything other than an argument? Well, now’s your chance!
We’re running a Facebook contest to find out which Java framework (if any) you prefer, and why. When the contest ends March 12, 2013, we will randomly select one name out of the proverbial hat to win a brand new Kindle Fire HD with an 8.9” display.
So what are you waiting for? Hop on over to our Facebook page, click on the Java Sweeps tab, like us, and tell us the framework you think is best.
Good luck!

Use static analysis tools to increase developers’ knowledge

Static code analysis is used more and more frequently to improve application software quality. Management and development teams put specific processes in place to scan the source code (automatically or not) and control the architecture of the applications they are in charge of. Multiple analyzers are deployed to parse the files that are involved in application implementation and configuration, and they generate results like lists of violations, ranking indexes, quality grades, and health factors.

Based on the information that is presented in dedicated tools like dashboards or code viewers, managers and team leaders can then decide which problems must be solved and the way the work has to be done. At the end of the analysis chain, action plans can be defined and shared with development teams to fix the problems. This is the first aspect of static analysis regarding application software quality, but this must not be the only one.
But what about the importance of understanding the problem?
The second aspect concerns the developer himself, as he is strongly impacted by quality improvement. Obviously he can use the results to fix any potential issues that have been identified, but he must understand the problem behind a violation and, above all, the remediation that must be applied to fix it.
Documentation, like coding guidelines, must be available and an analysis platform, like CAST AIP for instance, can provide him with a clear description of the rules that are violated. The documentation presents explanations about the problem and the associated remediation, with meaningful examples to illustrate how the violation is identified and how a possible remediation can be implemented. With that, the developer will be more confident regarding the tool, and he will spend less time finding the exact piece of code that has been pinpointed, and replace it with a better one in less time.
However, this aspect also concerns team leaders and mid-management. As I said above, they have to define action plans by taking into account the severity of problems and the bandwidth they have regarding development activities. The documentation is also important here, since it is going to help management understand the problems, their impact on the business, and the cost to fix them. Defining action plans is often a challenge for people who are not very technical. How do you select which violations need to be fixed first with limited resources?
Involving the actors
This second aspect is very important because it contributes to improving the developer’s knowledge of the technology and its impact on the software quality, which leads to decreased violations. I have already experienced developers in some shops who were not aware of tricky programming language behavior.
Developers wrote code with risky constructs without knowing the resulting behavior and induced defects in the software. This is especially true with complex programming languages, like C++ or framework-based environments like JEE. It’s also true when developers are newcomers with basic training and low experience. Thus, documentation with clear rationale, remediation, and with good examples that can be reused easily can be considered as a complement to training.
As I said in a previous post related to the developer’s choice, it is never too late to improve our knowledge and to avoid making the same mistake several times. Moreover, it is very frustrating to receive a list of violations (sometimes coming with unpleasant remarks!) without knowing what exactly the problem is and how to fix it. You get a notification saying there is a violation here, and so what? Frustration and misunderstanding often prevent involvement.
In addition to that, static code analysis could be an interesting way to make developers more involved in terms of software quality by producing better implementation. Also, it’s a good idea to search for similar cases that developers are aware of that may not have been detected by the static analyzers.
Finally, software quality improvement is not only a matter of tools searching for violations. It is also a matter of understanding and knowledge for all the actors who are involved in application development and maintenance. This is why the quality rule documentation, the coding guidelines, the architecture description, and easy access to these sources of information are very important and should be taken into consideration in software quality projects.