Wednesday 17 September 2014

Performance Improvement techniques in Collections


This topic illustrates the performance improvement techniques in Collections with the following sections:
Collection is a group of objects.  java.util package provides important types of collections. There are two fundamental types of collections they are Collection and Map. Collection types hold a group of objects, Eg. Lists and Sets where as Map types hold group of objects as key, value pairs Eg. HashMap and Hashtable.
This section examples are tested on Windows 2000, millennium, 320mb RAM, JDK 1.3 and JDK1.2.2
Note: This section assumes that reader has some basic knowledge of Java Collections.
List types represent an ordered collection of objects. ArrayList, Vector, Stack and LinkedList are the List implementation classes. All  List types support basic operations - adding objects, removing objects, accessing objects and iterating through the list. So which one to choose since all the list implementations support these basic operations? Performance is different for each class based on specific operations. So your choice is driven by the performance and the requirement options. Your requirement could be
1. Thread safe collection
2. Size of collection (large or small collection)
3. Type of operation ( adding, removing, accessing or iterating )
If you want your collection to be thread safe then Vector or Stack must be used because both have synchronized methods. While ArrayList and LinkedList are not thread safe. Stack is meant for specific LIFO (last in - first out) operations, this can be filtered down based on this specific requirement. If you don't want your collection to be thread safe then you have can choose between ArrayList or LinkedList. General concept from performance point of view is that ArrayList gives better performance when accessing and iterating objects whereas LinkedList gives better performance when adding and removing objects. Although true in most cases, sometimes there is an exception.
The following are the benchmarks shown for each operation separately.
Adding objects:
The table below shows time taken when I run ListAddTest.java which adds elements at different positions and is run on different JDK versions to measure performance bench marks.  Time shown here is in milli seconds.
Size of collectionType of operationJDK versionArrayList with out initializationArrayList with initializationVector with out initializationVector with initializationLinkedList
50000 objectsAdding objects at end1.2195153025390
1.37010602050
Adding objects at middle1.22488251325782508162413
1.3254325532619257496553
Adding objects at first1.2553860535472837745
1.3649958485568607830
10000 objectsAdding objects at end1.2215357060325
Adding objects at middle1.211191114161100110875662172
Adding objects at first1.211587111996211883112029385

conclusion is
Type of operationArrayList with out initializationArrayList with initializationVector with out initializationVector with initializationLinkedList
Adding objects at endfast (but slower than initialization)fastfast (but sligtly slower than initialization and slower than ArrayList because of synchronization)fast(but sligtly slower than ArrayList because of synchronization)fast ( but slightly slower than ArrayList and Vector)
Adding objects at middleslow ( slower than when adding objects at last)slow ( slower than when adding objects at last)slow ( slower than when adding objects at last)slow ( slower than when adding objects at last)worse( worse than every operation)
Adding objects at firstslow ( slower than when adding objects at last and middle)slow ( slower than when adding objects at last and middle)slow ( slower than when adding objects at last and middle)slow ( slower than when adding objects at last and middle)slow ( slower than when adding objects at last and middle)

The reason is
1. JDK 1.3 gives best performance because of HotSpot Virtual Machine.
2. The initial size for ArrayList and Vector is 10. ArrayList increases its capacity by half approximately whenever its capacity reaches maximum (10) but Vector increases its capacity by double whenever its capacity reaches maximum. That is the reason why ArrayList takes more time than Vector if it is not initialized with proper size though ArrayList is not synchronized. As soon as it reaches its maximum capacity when adding objects, it creates one more bigger array ( with 15 capacity for ArrayList approximately and 20 capacity for Vector) and copies the previous and new objects into new array. Obviously it is expensive to create new array and copy objects. So best approach is to initialize the ArrayList and Vector with proper size using constructors or using ensureCapacity(int capacity) which gives good performance.  If you initialize with proper size then the ArrayList gives better performance than Vector.
ArrayList with initialization gives better performance than others because its methods are non-synchronized. Synchronized methods are bit expensive because JVM has to lock the objects whenever it finds synchronized methods.
Vector takes slightly more time than ArrayList when you use JDK1.3 Hotspot JVM ,if you are not sure that whether your collection needs to be thread safe or not then it is better to use Vector to have higher safety.
You can convert an ArrayList as thread safe collection using Collections.synchronizedList(ArrayList object) but it is more expensive than using a Vector.

3. ArrayList and Vector maintain internal Object array ( Object[]) to store objects. So whenever you add an object, they add it to the end of the array which is  fine as long as it doesn't reach its maximum capacity. If you want to add an object at any other position, it creates a new object array and recopies all the objects which is expensive. That is the reason why adding objects at middle and beginning of collection takes a long time than when it is adding at the end. 
4. LinkedList gives good performance when adding elements at the end and beginning but it is worse when adding objects at middle because it needs to scan the node whenever it needs to add an object. LinkedList cannot be initialized.
The constructors for ArrayList and Vector to initialize with proper size are
ArrayList( int initialcapacity)
Vector( int initialcapacity)
Vector( int initialcapacity, int capacityIncrement)
You can give incremental capacity in Vector to change the default increment in capacity.
Here is the ListAddTest.java source code.

package com.performance.util;
 //This class gives the List classes  benchmarks for adding objects at  end,middle and first
 import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Vector;
 public class ListAddTest {
              private static final int NUM = 50000;
             private static String[] objs = null;
 public void addLast(List list) {
              long startTime = System.currentTimeMillis();
              for(int i=0;i<NUM;i++){
                          list.add(objs[i]);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for adding Objects at End: "
                                                                        + (endTime - startTime) );
              }
public void addFirst(List list) {
              long startTime = System.currentTimeMillis();
              for(int i=0;i<NUM;i++){
                          list.add(0,objs[i]);
             }
            long endTime = System.currentTimeMillis();
             System.out.println("Time taken for adding Objects at First : "
                                                                        + (endTime - startTime) );
            }
public void addMiddle(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=0;i<NUM;i++){
                          list.add(i/2,objs[i]);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for adding Objects at Middle : "
                                                                        + (endTime - startTime) );
            }
public void doTest(List list) {
                          addLast(list);
                          clear(list);
                          addMiddle(list);
                          clear(list);
                          addFirst(list);
                          clear(list);
             }
public void clear(List col){
                          if(!col.isEmpty())
                                          col.clear();
}
public static void main(String[] args){
             objs = new String[NUM];
             for(int i=0;i<NUM;i++){
                          objs[i] = "Object"+i;
             }
             ListAddTest col = new ListAddTest();
             ArrayList collection1 = new ArrayList();
             col.doTest(collection1);
             ArrayList collection1A = new ArrayList(NUM);
             col.doTest(collection1A);
             Vector collection2 = new Vector();
             col.doTest(collection2);
             Vector collection2A = new Vector(NUM);
             col.doTest(collection2A);
             LinkedList collection4 = new LinkedList();
             col.doTest(collection4);
             }
}

Removing objects:
The table below shows time taken when I run ListRemoveTest.java on removing elements from different positions to measure performance bench marks.  Time shown here is in milliseconds.

Size of collectionJDK versionType of operationArrayListVectorLinkedList
20000 objects1.3Removing objects from end05012
Removing objects from middle3703022240
Removing objects from first6156170
50000 objectsRemoving objects from end605015
Removing objects from middle1810181046850
Removing objects from first615558620
80000 objectsRemoving objects from end15520
Removing objects from middle54275972139952
Removing objects from first308273637027

The conclusion for removing objects is
  1. All classes take approximately same time when removing objects from end
  2. ArrayList and Vector give similar performance with slight difference because of JDK1.3 Hotspot JVM.
  3. LinkedList  gives worst performance when removing objects from middle (similar to adding objects at middle).
  4. LinkedList gives better performance when removing objects from the beginning.
  5. Only LinkedList gives better performance when removing objects from the beginning.
Here is the ListRemoveTest.java source code

package com.performance.util;
// This class shows the benchmarks of List types for removing objects at end, middle and first 
import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Vector;
import java.util.Arrays;
public class ListRemoveTest {
             private static final int NUM = 20000;
             private static Object[] objs = null;
public void removeLast(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=NUM;i>0;i--){
                          list.remove(i-1);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for removing Objects at End: "
                                                                        + (endTime - startTime) );
             }
public void removeFirst(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=0;i<NUM;i++){
                          list.remove(0);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for removing Objects at First : "
                                                                        + (endTime - startTime) );
             }
public void removeMiddle(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=0;i<NUM;i++){
                          list.remove((NUM-i)/2);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for removing Objects at Middle : "
                                                                        + (endTime - startTime) );
             }
public void doTest(List collection) {
                          collection.addAll(getList()); // fill the List
                          removeLast(collection);
                          clear(collection);
                          collection.addAll(getList()); // fill the List
                          removeMiddle(collection);
                          clear(collection);
                          collection.addAll(getList()); // fill the List
                          removeFirst(collection);
                          clear(collection);
             }
public void clear(List col){
                          if(!col.isEmpty())
                                          col.clear();
}
public List getList(){
             objs = new Object[NUM];
             for(int i=0;i<NUM;i++){
                          objs[i] = new Object();
             }
              return Arrays.asList(objs);       
}
public static void main(String[] args){
             ListRemoveTest col = new ListRemoveTest();
             ArrayList collection1 = new ArrayList();
             col.doTest(collection1);
             Vector collection2 = new Vector();
             col.doTest(collection2);
             LinkedList collection4 = new LinkedList();
             col.doTest(collection4);
             }
}

Accessing objects:
The table below shows time taken when I run ListAccessTest.java for accessing elemnts at different positions to measure performance bench marks.  Time shown here is in milliseconds.

Size of collectionJDK versionType of operationArrayListVectorLinkedList
25000 objects1.3Accessing objects from end004701
Accessing objects from middle0021002
Accessing objects from first0015
50000 objectsAccessing objects from end04545100
Accessing objects from middle00112100
Accessing objects from first02015
100000 objectsAccessing objects from end050211850
Accessing objects from middle030457090
Accessing objects from first0015
The conclusion is
  1. ArrayList and Vector give best performance because they access objects using index. Vector takes slightly more time but it is negligible.
  2. LinkedList gives worst performance  when accessing objects at end and middle because it has to scan nodes to access objects.
Here is the ListAccessTest.java source code

package com.performance.util;
// This class shows the benchmarks of List types for accessing objects at end, middle and first  
import java.util.List;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.Vector;
import java.util.Arrays;
public class ListAccessTest {
             private static final int NUM = 25000;
             private static Object[] objs = null;
public void getFromLast(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=NUM;i>0;i--){
                          list.get(i-1);
             }
              long endTime = System.currentTimeMillis();
             System.out.println("Time taken for getting Objects at Last: "
                                                                        + (endTime - startTime) );
             }
public void getFromFirst(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=0;i<NUM;i++){
                          list.get(0);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for getting Objects at First : "
                                                                        + (endTime - startTime) );
             }
public void getFromMiddle(List list) {
             long startTime = System.currentTimeMillis();
             for(int i=0;i<NUM;i++){
                          list.get(NUM/2);
             }
             long endTime = System.currentTimeMillis();
             System.out.println("Time taken for getting Objects at Middle : "
                                                                        + (endTime - startTime) );
             }
public void doTest(List list) {
                          list.addAll(getList());
                          getFromLast(list);
                          getFromMiddle(list);
                          getFromFirst(list);
             }
public void clear(List col){
                          if(!col.isEmpty())
                                          col.clear();
}
public static List getList(){
             objs = new Object[NUM];
             for(int i=0;i<NUM;i++){
                          objs[i] = new Object();
             }
              return Arrays.asList(objs);       
}
public static void main(String[] args){
             ListAccessTest col = new ListAccessTest();
             ArrayList collection1 = new ArrayList();
             col.doTest(collection1);
             Vector collection2 = new Vector();
             col.doTest(collection2);
             LinkedList collection4 = new LinkedList();
             col.doTest(collection4);
             }
}

Iterating collection:
Iterating collection using all three types of classes ,ArrayList ,Vector and LinkedList gives similar performance because they need not do any extra work  they simply iterate one by one. So I did not give any benchmark results. You can use any Iterator . But using ListIterator gives more flexibility than Iterator and Enumeration. You can traverse both sides

Optimization techniques in Sets
Set is a collection of unique objects, it doesn't allow duplicate objects and modification of existing objects. Set types also allow basic operations like adding objects, removing objects, accessing objects, iterating objects but do not allow modification. There are two implementations of the Set interface they are HashSet and TreeSet. 
HashSet gives better performance than TreeSet because , TreeSet is an ordered collection of objects and the objects are sorted while they are inserted in the TreeSet where as in case of HashSet objects are added in an adhoc manner. It is expensive to do all basic operations in TreeSet because it has to compare and sort every object. We can get better performance by using a HashSet and converting  it to a TreeSet later on.
HashSet and TreeSet are backed by HashMap and TreeMap respectively. Whenever we use a HashSet we can specify an initial capacity and load factor using constructors. The default size for a HashSet is 11 and it's load factor is 0.75. Load factor determines at which capacity HashSet has to be resized. It's internal structure will become double in size when it reaches it's maximum capacity based on load factor. HashSet scales well when it is initialized with proper size and default load factor. When you know the the number of objects to be added, it is better to initialize with that capacity and put load factor as 1.0f. The objects in HashSet are stored and retrieved through hash code which provides constant look up time.  I did not give any bench marks for these two Sets because We don't have many options here to compare and evaluate. We have two Sets to choose. Use TreeSet if you want sorted collection otherwise use HashSet.
The constructors for the HashSet to initialize with proper size are:
HashSet(int initialcapacity)
HashSet(int initialcapacity, float loadfactor)

Map is a collection of key and value object associations. You can do all basic operations as in Lists and Sets such as adding , removing ,and accessing key-value pairs. There are four types of Map implementations they are HashMap, Hashtable, WeakHashMap and TreeMap.
HashMap, Hashtable and WeakHashMap have similar implementations. TreeMap is meant for sorted collection, this can be filtered down based on the requirement. Then we have other three types to choose from. The choice  again depends upon your requirement. Your requirement could be
1. Thread safe
2. Type of operation ( basic operations )
HashMap and  WeakHashMap are not synchronized whereas Hashtable is synchronized.
WeakHashMap is a special purpose map which uses an internal hashtable. When there are no more references to key object except weak reference maintained by WeakHashMap, the garbage collector reclaims the key object and mapping between key object and value object is also reclaimed, if the value object does not have any other references then the value object is also reclaimed by the garbage collector.
If you want your Map type collection to be thread safe, then you need to use Hashtable otherwise use HashMap. HashMap gives better performance than Hashtable because of it's non-synchronized methods. The reason I did not give any bench marks for Map types is that it is pretty straight forward to choose Map type depending on requirement .
You can improve performance by using proper initial size and load factor in the constructor for all types of Map types except TreeMap.
The constructors are
HashMap(int initialcapacity)
HashMap(int initialcapacity, float loadfactor)
Hashtable(int initialcapacity)
Hashtable(int initialcapacity, float loadfactor)
WeakHashMap(int initialcapacity)
WeakHashMap(int initialcapacity, float loadfactor)
When the number of objects exceed loadfactor capacity, then the capacity of the class increases to (2*capacity + 1). The default load factor is 0.75.   
All these classes work in accordance with hash values which are used to identify the value objects. The key objects are converted to integer called hash code by using hashing algorithms which is used as an index for value objects. Hash code must be same for two equal objects, i.e. when tested with the equals() method,it must return true.  The hash code is determined by using hashCode() method. The hash code of a collection is determined by cumulating all the hash codes of the associated objects.



    Lists:
  1. Use ArrayList with proper initialization if you don't want thread safe for the collection whenever you  add/remove/access objects at end and middle of collection.
  2. Use Vector with proper initialization if you want thread safe for the collection whenever you  add/remove/access objects at end and middle of collection.
  3. Use LinkedList if you don't want thread safe for the collection whenever you  add/remove/access objects at beginning of collection.
  4. Use synchronized LinkedList if you want thread safe for the collection whenever you add/remove/access objects at beginning of collection.
  5. Use ListIterator than Iterator and Enumeration for List types
Sets:
  1. Use HashSet for maintaining unique objects if you don't want thread safe for the collection for all basic(add/remove/access) operations otherwise use synchronized HashSet for thread safe.
  2. Use TreeSet for ordered and sorted set of unique objects for non-thread safe collection otherwise use synchronized TreeSet for thread safe
Maps:
  1. Use HashMap for non-thread safe map collection otherwise use Hashtable for thread safe collection.
  2. Use TreeMap for non-thread safe ordered map collection otherwise use synchronized TreeMap for thread safe.


Feed back

No comments:

Post a Comment

Angular Tutorial (Update to Angular 7)

As Angular 7 has just been released a few days ago. This tutorial is updated to show you how to create an Angular 7 project and the new fe...