The Rule of Three: NYSE, UAL, and WSJ Operations Foiled by Their Own Systems

The events of last Wednesday proved that things often do come in threes. The “rule of three” reared its ugly head, as technical failures occurred at three large American organizations: the New York Stock Exchange, United Airlines, and The Wall Street Journal. United Airlines grounded all flights nationwide, wasn’t able to conduct background checks of passengers, and left flight attendants handwriting tickets (many of which were not accepted by TSA agents). Then, the NYSE suspended trading for almost four hours, the first time in a decade that trading was halted during regular business hours. The Wall Street Journal’s homepage also faced difficulties and was offline for almost an hour.

Paying Down Technical Debt with Microservices: Miracle or Myth?

The growing problem of technical debt cannot be overlooked. InfoWorld Editor Eric Knorr recently highlighted the need for technical debt to be paid down, noting, “I wouldn’t be surprised to learn we’re now in the process of accumulating technical debt at historic rates.”
Technical debt is expected to exceed $1.5 million per business application.

Using Hibernate Frameworks: What Are the Best Practices?

I’ve recently been involved in helping CAST Research Labs analyze a large sample of Java EE applications (496 apps), looking to understand the impact of different frameworks on application structural quality. We analyzed these applications using CAST’s Application Intelligence Platform (AIP) to identify critical violations of architectural and coding practices. While looking at the critical violations that were detected by CAST, something struck me: The success ratio (i.e., the ratio between the number of times a rule is respected and the number of opportunities the rule could have been violated) for rules associated with Hibernate was particularly low, indicating issues related to performance and robustness for applications using this framework. (The details of the report will be published next week – we presented a preview of the analysis during a webinar in January.)
Hibernate is one of the most popular frameworks in the object-relational mapping (ORM) area. It spares you the complex task of mapping objects to a relational database, letting you develop your data layer using only POJOs and keeping your application portable across databases. But Hibernate does not make every mapping issue disappear; those issues are merely hidden behind its abstraction, which can make it difficult to meet performance and robustness standards.
In my previous post, I discussed whether frameworks could simplify our lives. In this post, I want to focus on Hibernate and which best practices you should follow when using it in your Java EE application.
The rules associated with Hibernate that had the lowest success ratios were the following.
Persistent classes should implement hashCode() and equals()
In our analysis, this rule had the lowest success ratio (7.70 percent) across all frameworks analyzed, affirming that this architectural practice is too often ignored by developers. Although Hibernate guarantees that there is a unique instance for each row of the database in a session, you still need to supply your own implementation of the equals() and hashCode() methods for your persistent classes whenever you work with objects in a detached state. This is particularly true when you test these objects for equality, usually in hash-based collections.
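A minimal sketch of such an implementation, assuming a hypothetical Person entity whose immutable social security number serves as the business key (the recommended alternative to basing equality on the surrogate id):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical persistent class: equality is based on the immutable
// business key (ssn), never on the database-generated surrogate id.
public class Person {
    private Long id;          // surrogate key, assigned by Hibernate on save
    private final String ssn; // immutable business key

    public Person(String ssn) { this.ssn = ssn; }

    public Long getId() { return id; }
    public String getSsn() { return ssn; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        return Objects.equals(getSsn(), ((Person) other).getSsn());
    }

    @Override
    public int hashCode() { return Objects.hash(getSsn()); }

    public static void main(String[] args) {
        Set<Person> set = new HashSet<>();
        set.add(new Person("123-45-6789"));
        // A second detached instance with the same business key is found,
        // even though neither instance has an id yet.
        System.out.println(set.contains(new Person("123-45-6789"))); // prints: true
    }
}
```

Because the business key never changes, the object behaves consistently in hash-based collections whether it is transient, persistent, or detached.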
Avoid using references to the ID in the persistent class’ method equals()
In our analysis, this rule had the second lowest success ratio (37.13 percent). The programmer is free to define what Java equality means; however, Hibernate only sets the ID field when saving the object, so it is important not to use the ID field in the equality definition when it is a surrogate key. Otherwise, saving an object that has already been added to a Set collection results in an identity change. Moreover, the behavior of the Set and Map collection classes is unspecified if the value of an object is changed in a manner that affects equals comparisons while the object is an element of the Set or a key of the Map; in the worst case, you might corrupt your database.
For example, assuming equals() and hashCode() are based on the id field:
Person p = new Person();
Set<Person> set = new HashSet<>();
set.add(p);                          // hashCode() is computed while id is still null
p.setId(new Long(5));                // Hibernate assigns the surrogate key like this on save
System.out.println(set.contains(p)); // Prints: false
Other best practices that should be followed when using Hibernate with your Java EE applications are the following:
Never use an array to map a collection
The details of an array mapping are virtually identical to those of a list. However, we strongly recommend against the use of arrays, since arrays can’t be lazily initialized (there is no way to proxy an array at the virtual machine level). Lists, maps, and sets are the most efficient collection types.
So using an array can hurt your application’s performance when it contains many items: you lose lazy loading and optimized dirty checking, two key performance features of persistent collections.
Avoid public/protected setter for the generated identifier field
A primary key value must never change once it has been assigned. Since it is a generated key, it is set automatically by Hibernate (or by another JPA implementation or provider). The behavior of an application that tries to modify the value of a generated primary key is undefined.
Avoid many-to-many association
Many-to-many usage is discouraged when a simple bidirectional many-to-one/one-to-many association will do the job. In particular, a many-to-many association can always be represented as two many-to-one associations to an intervening class. This model is usually easier to extend. In a real system, you often will not have a true many-to-many association, as there is almost always other information that must be attached to each link between associated instances, such as the date and time when an item was added to a category. The best way to represent this information is via an intermediate association class. On top of this, changing the definition of a primary key, and of all the foreign keys that refer to it, is a frustrating task.
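The intermediate-class model can be sketched in plain Java (all class names here are hypothetical; in a real mapping, each side of CategorizedItem would be a Hibernate many-to-one):

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

// Sketch: replacing a many-to-many between Item and Category with an
// intermediate association class that carries extra link data (the date
// the item was added). Names are hypothetical, for illustration only.
public class AssociationDemo {
    static class Item {
        final String name;
        Item(String name) { this.name = name; }
    }

    static class Category {
        final String name;
        final List<CategorizedItem> items = new ArrayList<>();
        Category(String name) { this.name = name; }

        void addItem(Item item, Date when) {
            // One many-to-one link in each direction, instead of many-to-many.
            items.add(new CategorizedItem(this, item, when));
        }
    }

    static class CategorizedItem {
        final Category category;
        final Item item;
        final Date dateAdded; // the extra information a raw join table cannot hold

        CategorizedItem(Category category, Item item, Date dateAdded) {
            this.category = category;
            this.item = item;
            this.dateAdded = dateAdded;
        }
    }

    public static void main(String[] args) {
        Category tools = new Category("Tools");
        tools.addItem(new Item("Chain"), new Date());
        System.out.println(tools.items.size()); // prints: 1
    }
}
```

When a new attribute must be attached to the link later, it is added to CategorizedItem without touching either entity’s primary key.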
A persistent class’s equals() and hashCode() must access fields through getter methods
This rule is important: the object instance that is passed as ‘other’ might actually be a proxy object, not the actual instance that holds the persistent state. This is the case when there are lazy associations between classes, one area where Hibernate is not completely transparent. It is therefore good practice to use accessor methods instead of direct instance-variable access, since a lazy association might well be required when tuning the application’s performance.
If this rule is violated, the issue can surface as a ClassCastException and cause the application to become unstable.
Avoid non-serializable entities
When an entity bean instance is to be passed by value as a detached object (for example, through a remote interface), the entity class must implement the Serializable interface.
Also, in some cases an OptimisticLockException will be thrown and wrapped by another exception, such as a RemoteException, when VM boundaries are crossed. Entities that might be referenced in wrapped exceptions should therefore be Serializable so that marshaling does not fail. One consequence of not following this rule is receiving an exception whenever a non-serializable entity is passed by value.
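A minimal sketch of the marshaling step, using a hypothetical ProductEntity and plain Java serialization in place of a real remote call:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Sketch: a detached entity must implement Serializable to cross VM
// boundaries. ProductEntity is a hypothetical stand-in for a JPA entity;
// roundTrip() simulates the marshal/unmarshal a remote interface performs.
public class SerializableEntityDemo {
    static class ProductEntity implements Serializable {
        private static final long serialVersionUID = 1L;
        final Long id;
        final String name;
        ProductEntity(Long id, String name) { this.id = id; this.name = name; }
    }

    static ProductEntity roundTrip(ProductEntity entity) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bos);
            out.writeObject(entity); // fails here if the entity is not Serializable
            out.flush();
            ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return (ProductEntity) in.readObject();
        } catch (Exception e) {
            throw new RuntimeException("marshaling failed", e);
        }
    }

    public static void main(String[] args) {
        ProductEntity copy = roundTrip(new ProductEntity(712L, "Chain"));
        System.out.println(copy.name); // prints: Chain
    }
}
```

Removing `implements Serializable` from ProductEntity makes roundTrip() fail with a NotSerializableException, which is exactly the failure mode described above.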
This is just an extract of the best practices for Hibernate, but you can already see that not following them can have severe consequences in terms of robustness and performance. These rules may be quite obvious to a Hibernate expert, but for the novice user Hibernate can be tough: its abstraction layer is big and complex, and you must spend time understanding its concepts and functions before you can use it effectively in a program.

There is code duplication detection, and there is code duplication detection

Many software solutions feature the detection of duplicated source code. Indeed, this is one cornerstone of software analysis and measurement:
It is easy to understand the value of dealing with duplicated code: it avoids having to propagate bug fixes and changes to every copy of the faulty piece of code, promotes reuse, and prevents an unnecessarily large code base (especially when maintenance outsourcing is billed by the line of code).
Now that everyone is convinced of the importance of such capabilities, let’s dive deeper into how to do it. There are various solutions, and not all are equal.
Can the difference be explained without looking at an algorithm or cryptic formulas? Let’s try.

How to: tackle database performance issues

IT companies spend millions of dollars trying to recover losses incurred due to poor application performance. I am sure each one of us has complained about a machine or application being slow or even dead, and then spent time at the coffee machine waiting for the results of a long running query. How can we fix that?
Most of the business applications or systems are designed to retrieve and/or write information to a local hard disk or a database system.
Consider a typical multi-tier architecture. It will contain the client tier, web tier, application tier, and data tier as shown below.

The data tier represents the database and mainly acts as the store and manager for business data. Usually, when an end user requests some information or executes a query on the client tier, he or she expects a response as soon as possible. However, the client tier has to talk to the data tier to get the appropriate information back, which might take a few microseconds or sometimes even a few hours, depending on several parameters.
Common parameters responsible for such delays include:

Architecture of the system
Code complexity
Unoptimized SQL queries
Hardware (CPUs, RAM)
Number of users
Network traffic
Database size

Out of all these parameters, unoptimized SQL queries contribute to the majority (around 60-70%) of database performance issues.
To avoid these delays, let’s look at some common database optimization approaches. There are three main approaches to go about optimizing databases:

Optimize the database server hardware and network usage. This involves changing the hardware to a specific configuration to speed up the read/write onto the database.
For example, use RAID 10 if read and write activity is roughly equal, or RAID 5 if there are more read operations. This task is often performed as part of deployment or infrastructure planning in the requirements-analysis phase of the Software Development Lifecycle (SDLC); the exercise is also referred to as hardware sizing.
Optimize the database design. This involves normalizing the database. For example, you can go up to the third normal form, which generally helps the database perform better. Usually this task is carried out during the design phase of the SDLC.
Optimize the database queries. This involves studying the query plan, analyzing the queries for the use of indexes, and joining and simplifying the queries for better performance. It is the most critical and effective approach to optimizing the database. Query optimization can start in the implementation phase and continue through the testing, evolution, and maintenance phases.

In this post, I will focus on the database/SQL query optimization techniques/guidelines alone. The idea is to help tackle some of the critical database performance issues.
Many databases come with a built-in optimizer, which performs optimization and helps improve performance to a certain extent. However, the results are not always promising. There are also database monitoring tools, which only capture information on the resources consumed by the database servers; these can help address perhaps 20% of performance issues. The best way to go about query optimization is to review the functional areas that take a long time to respond, mainly because of the underlying SQL queries.
Below I have tried to list a few SQL rules with examples, based on experience and best practices, which can help optimize the database to a great extent.

Not using a “WHERE” clause to filter the data returns all records and therefore makes the system very slow
Example 1:
Original Query 1: select * from Production.TransactionHistory
Returns 113443 rows
Optimized Query 2: select * from Production.TransactionHistory where ProductID=712
Returns 2348 rows

As the number of records retrieved is smaller (thanks to the “where” clause), the query executes much faster.
Not selecting only the required column names in the “SELECT” clause takes more time to return the same number of rows.
Example 2:
Original Query 1: select * from Production.TransactionHistory where ProductID=712
Returns 2348 rows
Optimized Query 2: select TransactionID, Quantity from Production.TransactionHistory where ProductID=712
Returns 2348 rows

Examples 1 & 2 might look quite obvious, but the idea is to think in a filtering mode and fetch the optimal set of data required for your application.
Using Cartesian joins kills performance, especially when large data sets are involved. A Cartesian join is a multiple-table query that does not explicitly state a join condition among the tables, and it results in a Cartesian product.
Example 3:
Query 1: select count(*) from Production.Product
Return 504 rows
Query 2: select count(*) from Production.TransactionHistory
Return 113443 rows
Query 3: select count(*) from Production.Product, Production.TransactionHistory
Return 57175272 rows (= 504 x 113443) -> Cartesian Product
Original Query 4: select P.ProductID,TH.Quantity from Production.Product P, Production.TransactionHistory TH
Return 57175272 rows
Optimized Query 5: select P.ProductID,TH.Quantity from Production.Product P, Production.TransactionHistory TH where P.ProductID = TH.ProductID
Return 113443 rows

Use Joins on indexed columns as much as possible.
Example 4:
Query 1: select P.ProductID,TH.Quantity from Production.Product P, Production.TransactionHistory TH where P.ProductID = TH.ProductID

Execute the query without any index for the first time. Re-run the same query after adding an index on column ProductID.
Avoid full table scans when dealing with larger tables. To prevent full table scans, we can add clustered indexes on the key columns with distinct values.
Example 5:
Query 1: select P.ProductID,PIn.LocationID from Production.Product P,Production.ProductInventory PIn where P.ProductID = PIn.ProductID

A. Execute the query without any indexes for the first time; it will use table scans by default.
(Execution plan showing table scans for Product and ProductInventory tables below.)

B. Re-run the same query after adding a clustered index on one of the columns (LocationID).
(Execution plan showing clustered index scan on the table ProductInventory and table scan on table Product below.)

C. Re-run the same query after adding indexes on both columns ProductID and LocationID to avoid table scan.
(Execution Plan using index scan for both the tables Product and ProductInventory below.)

In some cases, where there are not many unique values, a table scan can be more efficient as indexes will not be used.
In general, subqueries tend to degrade database performance. In many cases, the alternate option is to use joins.
Example 6:
Non-correlated sub-query
Original Query 1: SELECT Name,ProductID FROM Production.Product WHERE ProductID NOT IN (SELECT ProductID FROM Production.TransactionHistory)
Correlated sub-query
Original Query 2: SELECT Name,ProductID FROM Production.Product P WHERE NOT EXISTS (SELECT ProductID FROM Production.TransactionHistory where P.ProductID=ProductID)
Replace sub-query by join
Optimized Query 3: SELECT Name,P.ProductID FROM Production.Product P LEFT OUTER JOIN Production.TransactionHistory TH On (P.ProductID = TH.ProductID) where TH.ProductID is NULL

Too many indexes on a single table are costly. Every index increases the time it takes to perform INSERTs, UPDATEs, and DELETEs, so the number of indexes should not be too high. There is also an additional disk and memory cost for each index. The best approach is to monitor your system’s performance and add further indexes only where they are needed.
When a SQL query is executed in a loop, it requires several round trips between the client and the database server, which consumes network bandwidth and hurts performance. SQL queries inside loops should therefore be avoided. The recommended workaround is to combine them into one query, if necessary using a temporary table; that way only one network round trip is required, and the combined query can then be optimized further.
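A sketch of the idea in Java: instead of issuing one query per iteration, build a single parameterized IN-list query so only one round trip is needed (buildInQuery and the id list are hypothetical; the table and columns follow the examples above).

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: collapsing N per-iteration queries into one IN-list query.
// In real code the returned SQL would be fed to a PreparedStatement and
// each id bound to one "?" placeholder.
public class BatchedQuery {
    public static String buildInQuery(List<Integer> productIds) {
        String placeholders = productIds.stream()
                .map(id -> "?")
                .collect(Collectors.joining(","));
        return "select TransactionID, Quantity from Production.TransactionHistory"
                + " where ProductID in (" + placeholders + ")";
    }

    public static void main(String[] args) {
        System.out.println(buildInQuery(List.of(712, 713, 714)));
        // select TransactionID, Quantity from Production.TransactionHistory where ProductID in (?,?,?)
    }
}
```

One query with three bound ids replaces three loop iterations, so the network cost stays constant as the id list grows.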
There are a few database analyzers on the market that check SQL code against such rules and help identify the weak SQL queries.
I will continue to blog on this subject to cover advanced optimizing guidelines linked to Stored Procedures, Cursors, Views and Dynamic SQL. Hope this post gives you a few tips to identify and resolve database performance issues.
Please feel free to share your feedback/questions on this blog, or experiences with any tools you have tried for database optimization.

Do software frameworks simplify your life?

We’re covering Java frameworks and their impact on application quality in an upcoming webinar, Java Applications and Coffee: The Variations are Endless, on Jan 29. As part of that, I wanted to share some insights along the lines of what we might discuss during the webinar. But first, what is a software framework?
A software framework is an abstraction in which software provides generic functionality that can be reused by many different applications.
Nowadays, it seems impossible to start developing an application without thinking about frameworks. Some have become de facto standards used in most applications, such as log4j, which has spawned implementations in other languages.
For other frameworks, the choice can be more difficult. For example, if you look at ORM frameworks in J2EE technology, you can see that new frameworks keep appearing.

And there are many more presentation frameworks.
The problem with this list of frameworks, all supposed to simplify the coding of an application, is that you must master each framework itself in addition to Java. And that is not as easy as it seems. Take, for example, all the books written just for the Hibernate framework:

When analyzing several applications that use Hibernate, I often found, for example, that persistent classes do not implement hashCode() and/or equals().
You have to override the equals() and hashCode() methods if you:

intend to put instances of persistent classes in a Set (the recommended way to represent many valued associations); and
intend to use reattachment of detached instances.

What does that mean? It means that Hibernate guarantees that there is a unique instance for each row of the database inside a session. But whenever you work with objects in a detached state, and especially if you test them for equality (usually in hash-based collections), you need to supply your own implementation of the equals() and hashCode() methods for your persistent classes.
Nevertheless, it’s possible to build a complex application with identity (default) equals as long as you exercise discipline when dealing with detached objects from different sessions. If this concept of equality isn’t what you want, you must override equals() in your persistent classes. But this method requires discipline and it’s easy to make a mistake.
Thanks to CAST’s solution, you can check that this rule is enforced, as well as many other rules related to robustness, security, and performance.
CAST’s solution does not treat a J2EE application as a single Java/JSP application; it provides rules for the most common frameworks such as Struts 1 & 2, Tiles, JSF, Spring, Hibernate, JPA-compatible frameworks, and EJB. CAST’s product takes into account Java annotations, XML files, and of course the Java language itself to check these rules.
In addition, it can be extended to manage other frameworks.
As we have seen, implementing a framework is not so easy.
It is obvious that frameworks simplify development; they save you from reinventing the wheel. And they usually come with a community built in: the bigger the community, the better the framework tends to be in terms of stability and completeness.
But as we’ve seen, frameworks come with their own rules that must be followed to avoid mistakes that can come up later in the development lifecycle and are difficult to diagnose. This is why it is important to have a static analyzer check that framework best practices are being followed.
Again, if you’re interested in learning more about the resiliency of Java frameworks, be sure to check out our most recent CRASH report, which compared the quality and stability of Java frameworks for enterprise applications. Keep in mind, this is the only available repository in the world of real business software that has been subjected to this level of scrutiny. And for a deeper dive into the research results, be sure to register for our Jan. 29 webinar, Java Applications and Coffee: The Variations are Endless, which covers the full findings of the research.

Reduce Software Risk through Improved Quality Measures with CAST, TCS and OMG

Webinar Summary
I had the pleasure of moderating a panel discussion with Bill Martorelli, Principal Analyst at Forrester Research Inc; Dr. Richard Mark Soley, Chairman and CEO of Object Management Group (OMG); Siva Ganesan, VP & Global Head of Assurance Services at Tata Consultancy Services (TCS); and Lev Lesokhin, EVP, Strategy & Market Development at CAST.
We focused on industry trends, and specifically discussed how standardizing quality measures can have a big impact on reducing software risk.  This interactive format allowed attendees to hear four distinct perspectives on the challenges and progress that is being made within organizations directly, and also at systems integrators.
Mr. Martorelli started the discussion by providing insight into four powerful dynamics reshaping our ecosystem:

Innovation revolution
As-a-Service as a norm
Changing demographics
Rise of social and mobile

Mr. Martorelli punctuated the importance in preparing for these shifts by highlighting the impact poor quality can have on the business:

Poor performing, unstable applications
Diminished potential for brand loyalty, market share, revenues
Costly outages and unfavorable publicity

Dr. Soley from OMG built on Mr. Martorelli’s observations by discussing how standards bodies, such as OMG, SEI and CISQ, are helping industry respond to these challenges by providing specific standards and guidance to gain visibility into business critical applications, control outsourcers, and benchmark in-house and outsource development teams.
Mr. Martorelli emphasized the focus he has seen at client organizations in shifting quality to the left, and how quality is bleeding into many new stakeholders’ responsibilities.
Some of the trends covered during the discussion included:

Moving test and quality to the left of the waterfall
Addressing architectural sprawl with more architectural and engineering know-how
Seeing quality measurement become an important component of service levels
Emerging combined professional services/managed services offerings
Shifting responsibility for quality management to the business user
Favoring more results-driven approaches over conventional staffing-based testing services

Mr. Ganesan from TCS provided insight into how TCS Assurance Services is evolving to meet these new challenges.  Mr. Ganesan explained TCS’s rationale for evolving beyond code checkers and simple code hygiene and the need to employ automated, structural analysis to provide world class service to their clients and ensure more reliable, high quality deliverables.
We’d like to thank each of our panelists for their time and insight. We received a high level of interest from attendees, with many questions submitted for our speakers. Please find a selection of these questions below. If you’d like to listen to the recording of the webinar, click here.
It is clear how one might apply this to new development, but how does one approach applying a code quality metric to an existing portfolio? Would not the changes be overwhelming?
In truth, this concern is real, and it is a significant non-starter for many organizations. The sudden accounting of all the potential issues within applications can be daunting, and many solutions tend to generate a lot of ‘noise’ during their analysis. At CAST, we propose a risk-based approach: one that focuses on identifying the most critical violations rather than all possible violations. We also focus on the new violations being added, rather than the ones that have been sitting in your systems for years. This way, the critical path during an initial technical assessment of an application or portfolio focuses on the most critical risks. CAST AIP provides a Transaction-wide Risk Index that displays the different transactions of the application sorted by risk category (performance, robustness, or security). By focusing on these violations, you will improve the critical transactions of the application. Additionally, AIP generates a Propagated Risk Index to illustrate the object/rule pairings that will have the biggest impact on improving the overall health of the application or system. Any analysis without this level of detail and prioritization will certainly create more obstacles than it removes.
How do you see the use of Open Source code changing software risk?
Open Source, just like code developed by your own team or partner, injects risk into systems. And just like any other code, the biggest risk is lack of visibility into that code.  Studies have found that in general open source code is better than industry averages.  Other studies suggest that the quality of the code is a factor of the testing approach of that open source community.  Code that is tested continuously tends to have fewer defects.  It is nearly impossible to suggest that Open Source is more risky.  What is possible is to suggest that receiving code from any source, Open or contracted, without a proper and objective measure of that deliverable adds risk to your systems.
Bill Martorelli mentioned “Technical/Code Debt” as a quality metric; could you explain a little further, please?
The term “Technical Debt”, first defined by Ward Cunningham in 1992, is having a renaissance. A wide variety of ways to define and calculate Technical Debt are emerging.
While the methods may vary, how you define and calculate Technical Debt makes a big difference to the accuracy and utility of the result. Some authors count the need for upgrades as Technical Debt; however this can lead to some very large estimates. At CAST, our calculation of Technical Debt is data-driven, leading to an objective, conservative, and actionable estimate.
We define Technical Debt in an application as the effort required to fix only those problems that are highly likely to cause severe business disruption and remain in the code when an application is released; it does not include all problems, just the most serious ones.
Based on this definition, we estimate that the Technical Debt of an average-sized application of 300,000 lines of code is $1,083,000 – so, a million dollars. For further details on our calculation method and results on the current state of software quality, please see the CRASH Report (CAST Report on Application Software Health).
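As a quick sanity check of that figure: $1,083,000 over 300,000 lines of code works out to $3.61 of Technical Debt per line, the per-line rate reported in the CRASH research. In Java, using integer cents to avoid rounding error:

```java
// Back-of-the-envelope check of the Technical Debt figure above:
// 300,000 lines at $3.61 per line equals $1,083,000.
public class TechnicalDebt {
    static final long CENTS_PER_LINE = 361; // $3.61 per line of code

    public static long estimateDollars(long linesOfCode) {
        return linesOfCode * CENTS_PER_LINE / 100; // whole dollars
    }

    public static void main(String[] args) {
        System.out.println(estimateDollars(300_000)); // prints: 1083000
    }
}
```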
Here’s a community dedicated to the awareness and education of the topic:
I have heard a lot focused on Quality discussion today, but curious about this group’s perspective on the other component of CAST AIP, function point analysis?
In addition to measuring a system’s quality, the ability to measure the number of function points as well as precise measures of the changes in the number and complexity of all application components makes it possible to accurately measure development team productivity.  Employing CAST AIP as a productivity measurement solution enables:

The calculation of a productivity baseline for either in-house or offshore teams.
The tracking of productivity over time by month or release.
The ability to automatically generate measures of quality and complexity.
The identification of the root cause of process inefficiencies
The capability to measure effectiveness of process improvements.

CAST AIP and the CISQ Automated Function Point Specification: The CISQ Automated Function Point Specification produced by the CISQ team led by David Herron of the David Consulting Group has recently passed an important milestone. CISQ has worked with the OMG Architecture Board to get the specification properly represented in OMG’s existing meta-models. This specification was defined as closely as possible to the IFPUG counting guidelines, while providing the specificity required for automation. This fall it was approved for a 3-month public review on the OMG website. All comments received will be reviewed at the December OMG Technical Meeting, and the relevant OMG boards will vote on approving it as an OMG-supported specification (OMG’s equivalent of a standard). From there, it will undergo OMG’s fast-track process with ISO to have it considered for inclusion in the relevant ISO standard.  We believe this standard will expand the use of Function Point measures by dramatically reducing their cost and improving their consistency.
Is the industry average of production incidents one per week and one outage per month? Are these major incidents and outages for the enterprise?
Here’s a site that provides additional insight into the impact of outages.