Third-generation programming languages (3GLs) like COBOL or PL/1 are seen as outdated languages for “has-been” developers, and no longer interest new ones (there were even predictions that COBOL would die in the medium term). These new developers prefer more modern technologies, like J2EE or .NET, and, worryingly, educational organizations provide few learning opportunities for 3GLs.
I’ve recently been involved in helping CAST Research Labs analyze a large sample of Java EE applications (496 apps), looking to understand the impact of different frameworks on application structural quality. We analyzed these applications using CAST’s Application Intelligence Platform (AIP) to identify critical violations of architectural and coding practices. While looking at the critical violations CAST detected, something struck me: the success ratio (i.e., the share of opportunities in which a rule is respected rather than violated) for rules associated with Hibernate was particularly low, indicating performance and robustness issues for applications using this framework. (The details of the report will be published next week – we presented a preview of the analysis during a webinar in January.)
Hibernate is one of the most popular frameworks in the object-relational mapping area. It spares you the complex task of mapping objects to a relational database, letting you develop your data layer using only POJOs and keeping your application portable across databases. But at the same time, because Hibernate takes care of the mapping issues for you, it can be difficult to implement to correct performance and robustness standards.
In my previous post, I discussed whether frameworks could simplify our lives. In this post, I want to focus on Hibernate and which best practices you should follow when using it in your Java EE application.
The rules associated with Hibernate that had the lowest success ratios were the following.
Persistent classes should implement hashCode() and equals()
In our analysis, this rule had the lowest success ratio (7.70 percent) across all frameworks analyzed, confirming that this architectural practice is too often ignored by developers. Although Hibernate guarantees that there is a unique instance for each row of the database within a session, you still need to supply your own implementation of the equals() and hashCode() methods for your persistent classes whenever you work with objects in a detached state. This is particularly true when you test these objects for equality, usually in hash-based collections.
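A minimal sketch of what such an implementation can look like, assuming a hypothetical Person entity whose natural business key is a social security number (the entity and field names are illustrative, not from the study):

import java.util.Objects;

public class Person {
    private String ssn;   // an immutable business key, not the generated database id

    public String getSsn() { return ssn; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;   // proxy-friendly, unlike a getClass() comparison
        return Objects.equals(getSsn(), ((Person) other).getSsn());
    }

    @Override
    public int hashCode() {
        return Objects.hash(getSsn());
    }
}

Basing equality on a business key like this keeps detached instances comparable across sessions, which a default identity-based equals() cannot do.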
Avoid using references to the ID in the persistent class’ method equals()
In our analysis, this rule had the second-lowest success ratio (37.13 percent). The programmer is free to define the meaning of Java equality; however, Hibernate only sets the ID field when the object is saved, so it is important not to use the ID field in the equality definition when it is a surrogate key. Otherwise, saving an object that has already been added to a Set collection results in an identity change. And since the behavior of the Set and Map collection classes is unspecified when the value of an element (or key) changes in a way that affects equals comparisons, you might corrupt your database.
For example, with a generated identifier used in equals() and hashCode():

Person p = new Person();
Set<Person> set = new HashSet<>();
set.add(p);           // hashCode() is computed while the generated id is still null
session.save(p);      // Hibernate assigns the id; set.contains(p) may now return false
Other best practices that should be followed when using Hibernate with your Java EE applications are the following:
Never use an array to map a collection
The details of an array mapping are virtually identical to those of a list. However, we strongly recommend against the use of arrays, since arrays can’t be lazily initialized (there is no way to proxy an array at the virtual machine level). Lists, maps, and sets are the most efficient collection types.
So using an array can hurt your application’s performance when the collection contains many items: arrays forgo lazy loading and optimized dirty checking, the very performance features that make persistent collections efficient.
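A hedged sketch of the difference, using hypothetical Order and OrderLine entities with JPA annotations (names are illustrative):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
public class Order {
    @Id @GeneratedValue
    private Long id;

    // A List can be proxied, so Hibernate initializes it lazily on first access:
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<OrderLine> lines = new ArrayList<>();

    // By contrast, an array-valued property cannot be proxied, so it would be
    // loaded eagerly together with its owner:
    // private OrderLine[] lines;
}

@Entity
class OrderLine {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    private Order order;   // the owning side referenced by mappedBy
}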
Avoid public/protected setter for the generated identifier field
A primary key value must never change once it has been assigned. Since it is a generated key, it is set automatically by Hibernate or by another JPA provider. The behavior of an application that tries to modify the value of a primary key is not defined.
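One common way to enforce this, sketched here with JPA annotations on a hypothetical entity (the provider can populate the field directly or through a private setter via reflection, so no public setter is needed):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Invoice {
    @Id
    @GeneratedValue
    private Long id;

    public Long getId() { return id; }

    // Private (or absent) setter: only the persistence provider assigns the key,
    // so application code cannot accidentally change it once it has been set.
    private void setId(Long id) { this.id = id; }
}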
Avoid many-to-many association
“Many to many” usage is discouraged when a simple bidirectional “many-to-one”/“one-to-many” will do the job. In particular, a many-to-many association can always be represented as two many-to-one associations to an intervening class, and this model is usually easier to extend. In a real system, you rarely have a true many-to-many association: there is almost always other information that must be attached to each link between associated instances, such as the date and time when an item was added to a category, and the best way to represent this information is an intermediate association class. On top of this, changing the definition of a primary key and of all the foreign keys that refer to it is a frustrating task.
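A hedged sketch of the intermediate-class approach, assuming hypothetical Item and Category entities already exist: instead of a raw @ManyToMany, each link becomes an entity of its own that can carry the extra data.

import java.time.LocalDateTime;
import javax.persistence.*;

@Entity
public class CategorizedItem {
    @Id @GeneratedValue
    private Long id;

    @ManyToOne
    private Item item;               // first many-to-one leg of the former link

    @ManyToOne
    private Category category;       // second many-to-one leg

    private LocalDateTime addedOn;   // the kind of link data a raw @ManyToMany cannot carry
}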
A persistent class’s equals() and hashCode() methods must access fields through getter methods
This rule is important: the object instance that is passed as ‘other’ might actually be a proxy object, not the actual instance that holds the persistent state. This is the case when there are lazy associations between classes, and it is one area where Hibernate is not completely transparent. Since a lazy association might well be required when tuning the application’s performance, it is good practice to use accessor methods instead of direct instance-variable access.
Violating this rule can raise a ClassCastException and cause the application to become unstable.
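Continuing the hypothetical Person example from above, a short contrast between the two access styles inside equals():

// Inside Person.equals(Object other), after an instanceof check:
Person that = (Person) other;

// Risky: if 'that' is an uninitialized Hibernate proxy, its instance
// variables were never populated, so direct field access reads empty state.
boolean viaFields = this.ssn.equals(that.ssn);

// Safe: the getter call is intercepted by the proxy, which initializes
// itself from the database before returning the persistent value.
boolean viaGetters = this.getSsn().equals(that.getSsn());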
Avoid non-serializable Entities
When an Entity bean instance is to be passed by value as a detached object (for example, through a remote interface), the entity class must implement the Serializable interface.
Also, in some cases an OptimisticLockException will be thrown and wrapped by another exception, such as a RemoteException, when VM boundaries are crossed. Entities that might be referenced in wrapped exceptions should be Serializable so that marshaling does not fail. One consequence of not following this rule is receiving an exception when a non-serializable Entity is passed by value.
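A minimal sketch, assuming a hypothetical Customer entity that may travel across a remote interface:

import java.io.Serializable;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

@Entity
public class Customer implements Serializable {
    private static final long serialVersionUID = 1L;   // stable marker for marshaling

    @Id
    @GeneratedValue
    private Long id;

    private String name;
}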
This is just an extract of the best practices for Hibernate, but you can already see that not following them can have severe consequences for robustness and performance. These rules may be obvious to a Hibernate expert, but for the novice user Hibernate can be tough: its abstraction is big and complex, and users must spend time understanding its concepts, functions, and uses before applying it in a program under development.
I recently found myself in yet another endless discussion about how bug fixes and extra capacity impact the results of a Software Analysis and Measurement (SAM) assessment.
My interlocutor’s first reaction was that it must be the computing configuration (i.e., the way quality findings are turned into an assessment score, status, etc.) that changed; in his view, fixing bugs or adding extra capabilities would not have that kind of impact on assessment results. Therefore, keeping the computing configuration stable would keep the results stable.
Then, after I explained that finding new or more accurate dependencies would impact the SAM assessment results — thanks to a better understanding of complex behaviors, for instance — my interlocutor reluctantly accepted that it could have a tiny impact, but by no means a dramatic one. His main argument was this: in real life, one would not lose a certification because of additional knowledge. And this is where I tend to disagree with most people when dealing with risk.
For example, when assessing the safety hazard of a plant:
Would the knowledge that a given construction material is a carcinogen not change the assessment result?
Couldn’t this cause a small or dramatic effect, depending on the amount of hazardous material found in the audited plant?
And wouldn’t the results change in an unpredictable way as, up to this point, no one cared about measuring the amount of the hazardous material?
At this point, my interlocutor started to become evasive because he still could not accept such changes in the SAM world.
What if I know that you have a proven CWE vulnerability in your code?
Should I keep silent, as you would not accept a dramatic impact of your assessment?
Should I minimize the risk, as you would only accept a tiny impact on the assessment outcome?
That is basically what 99 percent of people ask for (I should say 100 percent, but I would rather leave room for some people that remain rational in the digital world of IT).
Is a dramatic change disturbing? Yes, of course. But isn’t it also disturbing in the real world? Knowing what it will cost you to remove asbestos from the 56 floors of the Montparnasse Tower must be disturbing. I read it could cost up to 800,000 EUR per floor.
But that doesn’t change the fact that asbestos is now known to be a health hazard. I understand that some people — most likely the ones signing the checks — would be willing to say that the tower is as safe a place to work in as it was before the world knew asbestos was a health hazard and before the asbestos level was measured in the tower. But that is not a reason to hide the truth.
So the question now becomes: How do we handle the change?
To answer this question, we can look to the non-IT world (let me call it “the real world” from now on).
I also happen to have worked on a roller-bearing assembly line. Whenever a cutting tooth on a CNC cutter needed to be changed, not a single person in the plant would assume that you could fire up the cutter right away; a proper re-calibration of the cutter had to be done first.
Yet when it comes to just-in-time strategies, productivity measurement in industrial processes, and basic housekeeping principles, the IT world seems to consider itself so different — or even superior — that the real world’s principles would not even apply.
How many people, even in the workplace, get their computer so full of garbage files and programs that they end up buying a brand-new computer? As if they would hoard junk in their home or office, then move to another home or office when the first one is full. (I know it does happen, but it usually ends up in reality TV shows.)
How many IT professionals think that productivity measurement is only about the produced volume of code, and completely disregard the quality of the production? The industrial world knows for a fact that volume without quality is not the path to growing a competitive edge.
How much effort does it take to convince IT professionals that just-in-time strategies and event-driven architecture can yield the same responsiveness to business requirements and the same resource-usage efficiency in delivering IT outcomes as they did on assembly lines? In their defense, it took many decades to convince the industrial world of these benefits. The pity is that IT doesn’t have the excuse of being the pioneer in this domain; it has a huge amount of knowledge and experience to leverage. Yet it seldom does.
SAM is not the only measuring activity in the world. However, its practitioners need to reach the same level of maturity as their real-world counterparts.
In today’s world, we expect everything to run efficiently. People do not have time to lose.
One small efficiency improvement, when spread over many users, can lead to massive time and money savings. This also applies to your business applications. How much time would you and your company save if your business applications were more efficient? Probably much more than you think.
But, in what forms can efficiency express itself? Well, for starters:
How many times have we been upset by an application that does not start fast enough on our computers or smartphones? When a user faces this kind of annoyance, what happens? The company that developed the software suffers from a bad public image, and the user loses productivity every time he starts the application. Consider real-time applications used to launch rockets and satellites: such applications cannot tolerate any delay.
We cannot forget the famous saying, “time is money.”
Therefore applications should be time efficient.
Some applications are more critical than others. If your music player crashes often, it is annoying, but not really important. But what if your email provider is not available for multiple hours? What if your spreadsheet application crashes every time you input too much data? And what if the train control software system or air traffic monitoring system crashes? The impact of such a crash is not just annoying, it can cost the lives of thousands of people, and it must be taken very seriously.
So such applications must be robust and survive in all conditions.
Everyone is ready to express their expectations or, more often than not, to complain. But fulfilling these expectations and avoiding complaints requires a lot of work and constant action.
The first action is to measure the application’s quality using appropriate tools. Early measurement makes it possible to detect speed and robustness defects in the early stages of the application development lifecycle, and therefore to remediate them from the beginning. An example of remediation is reviewing the entire architecture of the application. If these types of defects are not detected from the beginning, such sweeping changes become all but impossible later.
Once remediation is done and the application fulfills its efficiency constraints, continuous monitoring can detect other issues, such as excessive memory consumption or the use of a variable that was never initialized. At first glance these issues seem to have small consequences, but they can end in a crash of the entire application or in very low speed. For every type of defect found (memory consumption, performance, architecture, etc.), an appropriate remediation must also be planned.
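As a hypothetical Java illustration of the “uninitialized variable” kind of defect, which compiles cleanly but only fails at runtime (class and field names are invented for the example):

import java.util.Map;

public class SettingsCache {
    private Map<String, String> entries;   // declared but never initialized

    public String lookup(String key) {
        // Throws a NullPointerException on the first call; an early quality scan
        // can flag the missing initialization long before production does.
        return entries.get(key);
    }
}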
Measurement is the key
Starting with a good level of efficiency for an application is good; keeping it there is even better. So the first step in keeping a good level of efficiency is constant measurement of application quality using the appropriate tools, to detect potential issues before it’s too late and the damage is already done.
As with reliability, the causes of performance inefficiency are often found in violations of good architectural and coding practice — which can be detected by measuring the static quality attributes of an application. These static attributes predict potential operational performance bottlenecks and future scalability problems, especially for applications requiring high execution speed for handling complex algorithms or huge volumes of data.
Assessing performance efficiency requires checking at least the following software engineering best practices and technical attributes:
Application architecture practices
Appropriate interactions with expensive and/or remote resources
Data access performance and data management
Memory, network, and disk space management
Compliance with object-oriented and structured programming best practices (as appropriate)
Compliance with SQL programming best practices
Have you had issues with efficiency in your software application development? Share your story in a comment.
When my organization decided to hire a new CTO, one of his top priorities was to look through our old support contracts and “cut the fat,” as it were. It was there, among the rubble, that we found a transformational tool we had cast aside which could help us increase our development productivity and software quality. But in learning more about this tool, we found that it hadn’t failed us; rather, we had failed it!
So my brand-new boss gave me a brand-new ultimatum: Integrate this tool into our software development lifecycle, or we’re dumping it.
The tool was CAST’s Application Intelligence Platform (AIP), used to increase an application’s structural quality during development. Previous teams had struggled with this tool because they thought of it as a plug-and-play solution that would fix all their problems, instead of an integral part of the development lifecycle. And when it didn’t work, all too often it was the tool itself that would get blamed. We knew we had to figure out a way to alter development’s perception of software quality in order to get this project off the shelf.
We started scanning applications with CAST, and immediately hit a roadblock. The reports were coming back with a lot of violations and many developers started to panic thinking, “the sky is falling!” We had to take a step back and explain that tackling every violation would be costly and ineffective — only critical violations were going to swing the quality needle. After those violations were fixed and rescanned, their quality scores improved by 30 percent … this would be our Aha! moment.
But even with the improved scores, we found our quality processes were still not being integrated into the development lifecycle. So we decided to stop making it optional. We asked senior management for a list of our most critical business applications, and we started approaching teams like police with a search warrant. With a swift knock on their door we could say, “Your application has been deemed critical by our senior leadership, and we’re here to check the code.”
With this approach, we got no objections at all. Teams made time because they understood that their app was critical and that we needed to capture specific quality metrics, such as complexity and transferability, as well as how well our development team coded not only to our own standards, but to those of the rest of the industry. What’s more, after an application had been analyzed once, its team could never release new code of lower quality than the previous version. This made it very easy to implement a quality gate, one that was objective instead of anecdotal, backed by the new violations surfacing in the code.
We learned a lot in our first year — we scanned 12 applications as we refined our process even more. And development teams began to see how these scans could help them produce more efficient apps. Today, we’ve been able to make significant progress in our entire application portfolio, having increased the footprint of scans to 58 apps. In just over a year’s time, CAST had become an integral part of our software development lifecycle.
Looking forward, our next steps are to automate the process as part of a continuous delivery effort to increase deployment speed. We also plan to partner with software engineering managers to objectively determine who our best coders are across the board, and who we need to focus coaching on.
Lots of companies like to talk about software quality, but there aren’t many that live it, breathe it, and love it. But if you really want to get software quality assurance working in your organization, it needs to be institutionalized into your entire development process. That’s the power of the CAST AIP — the more we worked with it, the more difficult it’s been to develop quality software without it.
Do you know what happens to your cherished design patterns once your application is delivered and enters the hard, wild life of operation, software evolution, and maintenance? Life is a jungle for application code in an ecosystem of permanent software evolution, rapid maintenance, and changing maintenance staff. Their life expectancy is likely to become shorter than ever as the application’s evolution changes hands.
When you carefully crafted your design patterns, your intent was to exploit the experience of the “Gang of Four” masters and others by using proven, rock-solid arrangements of objects and their documented tradeoffs. Your goal was probably to help promote easier program changes and object reusability through these shared solutions.
However, findings in the field show that your interest in and knowledge of design patterns — their purpose, their use, and their benefits — are not always shared as widely as they could be. Lack of knowledge and misunderstanding of major design patterns is a plague. And the consequence is that, many times, once the team that originally developed the application hands it over to new staff, the life expectancy of its design patterns falls rapidly.
Several researchers have worked on this issue, mainly by proposing automatic detection and documentation of design patterns. Many theses and research projects have tried (and still try) to automatically detect design patterns in existing code in order to document them. As an example, you can look at this 2007 thesis by Marcel Birkner, titled “Object-Oriented Design Pattern Detection Using Static and Dynamic Analysis in Java Software.”
More recently, and using advanced learning techniques, a team from the School of Computing of DePaul University, Chicago, Ill. produced a “Tactic-Centric Approach for Automating Traceability of Quality Concerns.” This system automates the documentation of “tactics,” a higher level of design patterns. The Chicago team explains in its research paper that:
”Unfortunately, software architectures tend to degrade over time as maintainers modify the system without understanding the underlying architectural decisions. Although this problem can be mitigated by manually tracing architectural decisions into the code, the cost and effort required to do this can be prohibitively expensive.”
At CAST, we regularly analyze recent object-oriented IT applications as well as aging ones, and we have seen the effect of both lack of understanding of design patterns and missing design-to-code traceability.
This is why the protection of design patterns out in the wild is one of our concerns at CAST. And we decided to provide such protection for both our customers and our own software platform. However, we think that instead of writing or generating paper documentation that will hardly be read, it is necessary to automatically alert development teams and project managers each time a design pattern is in danger. The idea came while working with customers on how to improve the evolutive maintenance of a mission-critical application. To protect their design patterns, they had the idea of using CAST Architecture Checker to help monitor these precious pieces of code in their applications.
The very first pattern they checked was their custom implementation of the classic Model-View-Controller (MVC) pattern. Here is how it works. CAST Architecture Checker is used to define layers and authorized (or forbidden) dependencies, gathered in an architecture model. This model is then checked either within CAST Architecture Checker or within CAST Application Intelligence Platform after static code analysis of all the programming languages in the application code. The violating dependencies are then displayed for remediation.
The idea was to define layers containing each class of the pattern, define the authorized dependencies, and activate the check to detect violations of the pattern. In the case of the MVC pattern, we defined three layers: one for model classes, another for view classes, and one for controller classes. When defining layers, we tell CAST Architecture Checker how to find M classes, V classes, and C classes using inheritance, naming conventions, or other conventions. Based on these definitions, architects can check interactively for unwanted dependencies or integrate the checks into the application health check performed automatically (for example, each night or before new deliveries).
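To make the idea concrete, here is a hypothetical Java sketch of the kind of dependency such a layer check authorizes or flags; the class names and the naming convention are invented for the example, not taken from a real architecture model:

// Layering rule: *View may call *Controller, *Controller may call *Model.
class OrderModel {
    java.util.List<String> loadAll() { return java.util.List.of("order-1"); }
}

class OrderController {
    private final OrderModel model = new OrderModel();
    java.util.List<String> currentOrders() { return model.loadAll(); }   // authorized: Controller -> Model
}

class OrderView {
    private final OrderController controller = new OrderController();
    void refresh() {
        System.out.println(controller.currentOrders());   // authorized: View -> Controller
        // new OrderModel().loadAll();   // forbidden: View -> Model; the architecture check would flag it
    }
}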
Architecture Checker became a pattern checker! So we started using it that way at CAST too. The idea is to check our own patterns to avoid any errors and degradation of the software platform. One of the patterns checked using CAST Architecture Checker on our own application source code is the interpreter pattern. The interpreter pattern, originally documented by the “Gang of Four,” is heavily used in the CAST family of 16 source code analyzers that cover the application analysis from end to end.
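For readers unfamiliar with it, here is a minimal, illustrative sketch of the interpreter pattern (not CAST’s actual implementation): each grammar rule becomes a class, and interpretation is a recursive walk over the resulting tree.

interface Expression {
    int interpret();
}

class NumberLiteral implements Expression {          // terminal expression
    private final int value;
    NumberLiteral(int value) { this.value = value; }
    public int interpret() { return value; }
}

class Addition implements Expression {               // non-terminal expression
    private final Expression left, right;
    Addition(Expression left, Expression right) { this.left = left; this.right = right; }
    public int interpret() { return left.interpret() + right.interpret(); }
}

// new Addition(new NumberLiteral(1), new NumberLiteral(2)).interpret() == 3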
Using the CAST quality platform on our own application source code is an operation we call “CAST on CAST.” This design-pattern protection scheme, part of the CAST on CAST checks, enables the entire development team to implement changes and continually evolve these analyzers while being sure that precious patterns are not in danger.
As CAST Architecture Checker enables cross-language checks, we are also looking at protecting framework design patterns that involve programming languages, APIs, and XML configuration files.
How about you? Have you experienced any endangered design patterns? Which ones would you like to see protected in your application? Tell us in a comment.
I’ve been asked time and again how CAST is different from performance engineering. And here’s my answer: the CAST discipline of software analysis and measurement and the discipline of performance engineering couldn’t be more different. I’ll explain why and how in a moment. But it should also be noted that they are like peanut butter and chocolate — they can go very well together.
Here’s the high level explanation, which I’ll drill into further for those of you who like details. Fundamentally, when you’re dealing with CAST, you’re improving code quality during the engineering phase and throughout the development of the product. CAST technology is used as the code is built, and even after you’ve done performance engineering.
Performance engineering, on the other hand, is just one phase late in the game, when the product is essentially considered complete and you’re doing a final round of outcome-driven optimization. Consider performance engineering as fine tuning the finished system to reach specific response time objectives, while CAST is part and parcel of the software engineering process.
Using dynamic analysis and load testing, performance engineering learns how a system is behaving at runtime and if there are things the development team could do to make the software perform faster. CAST is a design-time, proactive measure to avoid common performance problems in the first place.
Performance engineering is like taking a brand-new Mercedes onto the racetrack to see how it performs. The development is over, now you’re getting into fine tuning what you have. With CAST, we’re actually checking for structural problems while the car is still being built.
But I can see where the confusion comes from. The objectives of performance engineering are aligned fairly closely to those of CAST. When it comes to performance, CAST is intended for: eliminating late system deployment due to performance issues; eliminating avoidable system rework due to performance issues; and reducing increased software maintenance costs due to performance problems in production.
So, getting back to our car analogy, working with CAST is more like noticing that one tire on the car is smaller than the rest, or that two axles are not perfectly aligned, or that the engine is diesel while the spark plugs are meant for gasoline engines. These are issues that might not cause problems at low speeds, or right away, but after a few laps they’ll cause the tires to become bald or the engine to stop running — not an ideal situation for a racetrack.
This distinction becomes readily apparent when we take these similar processes and set them on the same task: improving the overall performance of a system. Performance engineers might trace the round-trip time of transactions, figure out which parts of the process take the longest, and then take measures (adjust the network, change the indexes in the database, adjust table spaces, optimize where they have the most processing power, etc.).
With CAST, you start with standard performance rules applied throughout the application’s development. But, this is where performance engineers can have an even greater impact. Along the way, if they notice a certain pattern that’s causing performance to suffer, the performance engineers can turn it into a custom rule. Then CAST can check to make sure all developers avoid such patterns at all times.
I know that some of you are probably reading this and thinking that we’re just splitting hairs. And while it’s true that performance engineering and CAST’s processes might share similar outcomes, the real business value comes in its implementation.
CAST’s platform is best used during the design phase of a product, so that code problems can be checked while there is still a chance to fix them. Performance engineering is done after the product is finished, like a quality control test, to see how well you made out during development.
Combining the two — with CAST streamlining the application during the design phase and performance engineering doing the fine tuning after its release and providing feedback to developers and architects — is a strategic advantage that can be used to create dynamic, high performance, quality applications. This is an essential part of avoiding waste in Lean Application Management. So you can go ahead and call it splitting hairs, but I call it a competitive edge.