1.0 Introduction 2.0 Algorithm Analysis 3.0 Data Structures 3.1 Sets and SortedSets 3.2 Lists 3.3 Maps and Sorted Maps 4.0 Summary 5.0 JVM Tuning 5.1 Server Mode 5.2 Sizing the Heap 5.3 GC Monitoring & Tuning 6.0 I/O Buffering6.1 Tuning with
Java Performance Tuning emphasizes techniques that provide big performance gains with minimal code restructuring. It gives you crucial guidance that helps you tune without destroying your program’s architecture. Blindly changing things in an effort to make a program run faster is a great way to create buggy, unmaintainable code. Java Performance Tuning gets you to work efficiently and effectively, resulting in code that is robust, maintainable, and fast. Performance includes to:
1. Minimize the number of objects your program creates, particularly critical for J2EE applications
2. Optimize the use of strings
3. Avoid performance penalties from inefficient code
4. Improve the behavior of loops and switches
5. Optimize I/O behavior
6. Use appropriate algorithms for sorting and other common tasks
7. Use threads effectively
8. Optimize the performance of distributed systems
9. Speed up servlets and JSPs
10. Structure JDBC usage efficiently
11. Use effective design patterns to optimize EJB performance
Regardless of the complexity of your application and regardless if you are running a stand-alone application or a J2EE application inside an application server, before you release your application to the public, you are going to have to address its performance. From a very high-level view, performance tuning can be put into three categories:
1. Tuning the application itself
2. Tuning the deployment environment
3. Tuning the application’s external dependencies
In the context of a stand-alone application, your environment consists of the Java Virtual Machine (JVM) that your application is running in. If you are running an application inside an application server then there are hoards of configuration options specific to the application server and its interactions with backend resources. External resources include all applications upon which your application depends (e.g. a database). Let us focus on core tuning of the application itself. Core application tuning can be broken down into the following categories:
1. Algorithms that you use and the ones that you implement
2. Data Structures that you use and the ones that you implement
2.0 Algorithm Analysis
When you solve a problem programmatically, you will eventually derive some kind of algorithm; this is usually manifested in one or more methods. Because programming languages only provide you the tools, or infrastructure, to build solutions and do not implement them for you, a problem can be solved multiple ways. Depending on your requirements, one way of solving a problem may be better than another. The best solution is going to be the one that solves the problem in a "reasonable" amount of time;"reasonable" will be defined by your business requirements and the amount of data your algorithm will be working with.
Keeping in mind that only you will know what constitutes a "reasonable" response time for you application, what is the best way to determine how long an algorithm will take to complete? In the field of algorithm analysis, we do not look at the exact time that we expect the algorithm to run, but rather how the algorithm is going to behave with respect to the amount of data it must process. Furthermore we look at three distinct values:
1. The Best-case Time
2. The Average-case Time
3. The Worst-case Time
In most cases we are only concerned with the worst-case time, although the average case time in certain situations is more interesting. The notation that we use to represent our worst-case findings is referred to as "Big-Oh" and is represented by a capital "O". The result of the analysis is in units "N" where "N" represents the number of elements that the algorithm must process. If an algorithm must search through all of the elements then it is said to be linear and its Big-Oh is written as "O(N)". If an algorithm repeatedly subdivides the number of elements it must search through to find its value then it is said to run in logarithmic time and is written as "O(log N )". The most typical runtimes are as follows:
|O(log N)||Logarithmic||Examines up to the logarithm of the number of elements; usually the result of subdividing the elements and narrowing in on the one of interest|
|O(N)||Linear||Can examines every element|
|O(N log N)||N log N||Usually consists of an outer loop that examines all elements (linear) and an inner loop that examines up to the logarithm of the number of elements (logarithmic)|
|O(N2)||Quadratic||Usually consists of an outer loop that examines every element and an inner loop that examines every element|
|O(N3)||Cubic||Outer loop, middle loop, and inner loop that all examine every element|
The analysis of algorithms involves calculating the maximum number of times that the working elements will be evaluated; this is usually accomplished via iteration (loops) or through recursion (a method that calls itself). For example:
// Assume an array ‘a’ for(int i=0; i
This loop iterates over all of the elements in the array ‘a’ and thus runs in linear, or O( N ), time. Another example:// Assume an array ‘a’ for(int i=0; i
In this example, both the outer loop and the inner loop iterate over the array, thus this code runs in quadratic, or O( N2 ), time.
What this means practically is that as the number of elements you have to work on increases, the amount of time that the algorithm is going to take increases by the "Big-Oh" value, referred to as the order of the algorithm. And in reality quadratic algorithms are almost always impractical when the input is more than a few thousand and cubic algorithms are impractical for input sizes as small as a few hundred. Most algorithms must run in subquadratic time (better than O(N2)) to be practical for any large set of input data.
Consider for example sorting an array of integers. There are several sorting algorithms, listed here in order of ascending performance:
1. Bubble Sort O(N2)
2. Selection Sort O(N2)
3. Insertion Sort O(N2)
4. Shell Sort O(N2)
5. Heap Sort O(N log N)
6. Merge Sort O(N log N)
7. Quick Sort O(N log N)
Your choice of algorithms will be based off of the amount of data you have to sort, the performance requirements, and how much time you have to implement it. If you have billions of records to sort, you can throw out all of the O(N2) algorithms and then focus on the difference between the O(N log N) algorithms. If memory is a concern then you would want to use Heap Sort (it can sort completely in-place and is great for large data sets), but if it is not you might want to consider Merge or Quick Sort: Quick Sort is the fastest of the group and although it is also the most complicated to implement, the implementation is not overwhelming. On the other hand if you have 100 records to sort, you might as well implement a Bubble Sort because you can effectively implement it in a few minutes.
The summary of this discussion is that you need to carefully examine your code and try to remove as many loops and more importantly inner loops as you can. Furthermore, if there is an existing solution, you need to evaluate the different implementations and choose the one most appropriate for your specific requirements.
3.0 Data Structures
The Java Collection classes provide implementations of data structures in three distinct categories, defined as interfaces:
Sets: a collection of unique elements
Lists: a collection of elements, duplicates allowed
Maps: a mapping of keys to values
And there are two sub-interfaces derived from these:
1. SortedSet: an ordered collection of unique elements
2. SortedMap: a mapping of ordered keys to values
The various implementations of each have their advantages and disadvantages that depend on your requirements. The purpose of this section is to present the performance and usage tradeoffs of the most popular Collection Classes.
3.1 Sets and SortedSets
From a performance standpoint, the Set interface has three methods of interest: add(), remove(), and contains(); add() adds a new object to the Set, remove removes a specified object from the Set and contains() returns a boolean value of true if the specified element is in the Set or false otherwise. The HashSet performs each of these methods in constant time, which is the best you can do. The TreeSet performs each of these methods in O( log N ) time, which is good.
The functional tradeoff between the three is that the elements in the TreeSet are sorted in their "natural" order, those in the LinkedHashSet are ordered in the sequence they were inserted in, and HashSet is not sorted in any way. Therefore, if it is a requirement that you have to be able to display the contents of the Set in its "natural" order, then you must use a TreeSet; if you need to preserve the input order then you must use LinkedHashSet; if the order is not a requirement then you are better off using a HashSet.