Java application latency reduction and pitfalls

One of the harder and more ambiguous problems I have dealt with in my application development career was improving the latency of a distributed data retrieval application.

It was a containerized Java application that served product ads on one of the biggest retail websites. The goal was to reduce latency so that it could provide room for additional processing, especially for running and experimenting with advanced machine learning models to serve better ads to customers.

One of the techniques I used was memory analysis to get insight into JVM memory usage. Although it sounds trivial, I ran into major roadblocks that took some time to figure out. In the end I overcame each of them and reduced the application's p99 latency from 400 ms to 240 ms.

Latency reduction was a new challenge for me, so I needed the right tools to tackle it. There are many tools available, both open source and paid, but I found the Eclipse Memory Analyzer Tool (MAT) to be the most useful among the free ones. There are plenty of articles on how to install and use MAT, so I won't go into detail on that here.

In this article I will cover the challenges of memory analysis for a large production application and how to overcome them.

Challenges

  1. The JVM heap footprint of a large application is quite big; in my case it was around 100 GB. Analyzing such a big heap dump requires a lot of memory to run the analyzer tool and is usually slow on a regular laptop.

  2. Large heap dumps consume an equal amount of disk space. If there is not enough disk, the heap dump command will fail, or in the worst case fill up the root partition and crash the host it runs on.

  3. Taking a heap dump is a stop-the-world event. It pauses all activity in the application, which can cause the service's health checks to fail and the instance to be terminated, making it hard to grab the heap dump file.

Solution

  1. For large heap dumps it is best to run the analysis on a cloud resource, such as an AWS EC2 instance with sufficient memory and disk space.

  2. To solve the disk space issue: applications running on a cloud resource usually have separate storage volumes attached. That storage can be expanded before taking the heap dump, and the dump should be written there rather than to the root partition.

  3. Check whether the application is monitored by a periodic health check, e.g. whether it sits behind a load balancer. If so, take the instance out of the serving fleet before starting the heap dump command so it is not terminated mid-dump.

  4. Take multiple heap dumps at a fixed interval to capture how the service state changes over time (a minimal sketch follows this list).
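As a rough illustration of point 4, a heap dump can be triggered from inside the JVM through the HotSpotDiagnosticMXBean (the same kind of dump jmap produces), and scheduled at a fixed interval. This is only a sketch, not the exact tooling I used; the /mnt/dumps path is a hypothetical mount on a separately attached volume, and the interval is arbitrary.

```java
import com.sun.management.HotSpotDiagnosticMXBean;

import javax.management.MBeanServer;
import java.lang.management.ManagementFactory;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeapDumpScheduler {

    public static void main(String[] args) throws Exception {
        // Proxy to the HotSpot diagnostic MXBean of the current JVM.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        HotSpotDiagnosticMXBean diagnostic = ManagementFactory.newPlatformMXBeanProxy(
                server, "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Hypothetical path on an attached volume, not the root partition.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                String file = "/mnt/dumps/heap-" + System.currentTimeMillis() + ".hprof";
                // 'true' dumps only live objects; this triggers a full GC and
                // pauses the application, hence the health check concern above.
                diagnostic.dumpHeap(file, true);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 10, TimeUnit.MINUTES);
    }
}
```

Each dump lands in its own timestamped file, so the snapshots can later be compared in MAT to see how retained heap changes between intervals.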

Improvements

  1. One of the biggest culprits was an in-memory cache with excessive retained heap, which caused frequent garbage collection and the associated latency impact (a bounded-cache sketch follows this list).

  2. Memory analysis also gave a major clue about the data index used for retrieval. It turned out the full index was loaded into the JVM heap and was also stored on tmpfs, using twice the required memory and contributing to frequent garbage collection (a memory-mapping sketch also follows below).
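For the cache issue in point 1, the usual fix is to put a hard bound on the cache and evict old entries so retained heap cannot grow without limit. Below is a minimal sketch using the JDK's LinkedHashMap; it is an illustration of the idea, not the actual cache from the application, and the entry limit is a made-up number. A dedicated cache library such as Caffeine or Guava would serve the same purpose with better concurrency support.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A size-bounded LRU cache built on LinkedHashMap. Once MAX_ENTRIES is
// reached, the least recently accessed entry is evicted, so retained heap
// stays bounded instead of growing with every cached lookup.
// Note: LinkedHashMap is not thread-safe; a real service would wrap it or
// use a concurrent cache library instead.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {

    private static final int MAX_ENTRIES = 100_000; // tune to the memory budget

    public BoundedCache() {
        // accessOrder = true gives LRU (rather than insertion-order) eviction.
        super(16, 0.75f, true);
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > MAX_ENTRIES;
    }
}
```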
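For the duplicated index in point 2, one way to avoid holding a second copy in the heap is to memory-map the index file (on disk or tmpfs) and read it through a MappedByteBuffer, so the data is backed by the OS page cache rather than the JVM heap. This is a sketch under that assumption, not necessarily how the application was changed; the fixed-width record layout is hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedIndex {

    private final MappedByteBuffer index;

    public MappedIndex(Path indexFile) throws IOException {
        try (FileChannel channel = FileChannel.open(indexFile, StandardOpenOption.READ)) {
            // A single mapping is capped at 2 GB; a larger index would need to
            // be mapped in segments. The mapping stays valid after the channel
            // is closed and lives outside the JVM heap, so it adds no GC pressure.
            long size = Math.min(channel.size(), Integer.MAX_VALUE);
            this.index = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    // Hypothetical lookup over fixed-width 16-byte records: returns the stored
    // offset for a record id, reading directly from the mapping without
    // copying the index onto the heap.
    public long recordOffset(int recordId) {
        return index.getLong(recordId * 16);
    }
}
```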

Conclusion

Analyzing memory is critical for any large-scale production application.

Caching data within the application can be useful, but it should be closely monitored for any degradation over time.

Heap dump analysis is a powerful technique when done with the right machines and tools; otherwise it can become painful.

Keep an eye on the health check routines of a production application while taking heap dumps so the dumps can be collected successfully.

I have not dived too deep into the details to keep the article short. If anyone wants more information, feel free to message me.