Last week was JBoss World, and it was exciting to be a part of it. I gave a presentation on performance tuning our Enterprise Application Platform, or EAP, and it was packed; people were sitting on the floor in pretty much all of the available space. What struck me about the presentation, and much of the discussion I had with people afterwards, is that JVM tuning is a big topic. So, I thought I would share some of what I learned over the past couple of months while preparing for my presentation.
In preparing for my presentation, I wrote an EJB 3 application, wrote a load test for it, and applied optimizations to various configuration parameters within the EAP, the JVM, and the operating system. In particular, one combination of JVM and OS settings made a huge difference in throughput, and it's something I wanted to share here.
When using a 64-bit OS, in my case Fedora 8 and RHEL 5.1, I wanted to investigate large page memory support, or HugeTLB as it's referred to within the Linux kernel. What I found was very scarce documentation around using it, and the documentation that does exist is incomplete when it comes to actually making it work. What I also found is that it makes a huge difference in the overall throughput and response times of an application when using heap sizes above 2GB.
So, without further ado, let's dive into how to set this up. These instructions are for Linux, specifically Fedora 8 and RHEL 5.1, but the results should be generally applicable to any 64-bit OS and 64-bit JVM that supports large page memory (all of the proprietary UNIXes do, and I found an MSDN article describing how to use it on 64-bit Windows).
You must have root access for these settings. First, you need to set the kernel parameter for shared memory to be at least as large as the amount of memory you want to set aside for the JVM to use as large page memory. Personally, I like to just set it to the total amount of memory in the server, so I can play with different heap sizes without having to adjust it every time. You set this by putting the following entry into /etc/sysctl.conf:
kernel.shmmax = n
where n is the number of bytes. So, if you have a server with 8GB of RAM, you would set it to 8589934592, which is 1024*1024*1024*8, or 8GB.
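For example, on an 8GB machine the entry would look like the following (the value is just 8 * 1024^3 bytes; adjust it to match your own server's RAM):

kernel.shmmax = 8589934592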
Second, you need to set a virtual memory kernel parameter to tell the OS how many large memory pages you want to set aside. You set this by putting the following entry into /etc/sysctl.conf:
vm.nr_hugepages = n
where n is the number of pages, based on the page size listed in /proc/meminfo. If you cat /proc/meminfo you will see the large page size for your particular system; it varies depending on the architecture. Mine is an old Opteron system with a page size of 2048 kB, as shown by the following line in /proc/meminfo:
Hugepagesize: 2048 kB
In my case I wanted to set aside 6GB, so I set the parameter to 3072, which is (1024*1024*1024*6)/(1024*1024*2): 6GB divided by 2MB, since 2048 kB is 2MB.
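Putting that together, a quick way to check your own page size and the resulting entry (3072 assumes the 2MB page size shown above; recalculate for your page size and the amount of memory you want to reserve):

grep Hugepagesize /proc/meminfo
# then, in /etc/sysctl.conf, for 6GB of 2MB pages:
vm.nr_hugepages = 3072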
After this, you need to set another virtual memory parameter to give your process permission to access the shared memory segment. In /etc/group, I created a new group called hugetlb (you can call it whatever you like, as long as it doesn't collide with any other group name); it was assigned a group id of 501 on my system (this will vary, depending on whether you use the GUI tool, like I did, or the command line, and on what groups you already have defined). You put that group id in /etc/sysctl.conf as follows:
vm.hugetlb_shm_group = gid
where gid, in my case, was 501. You also add that group to whatever user id the JVM will be running as; in my case this was a user called jboss.
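A minimal sketch of that setup, assuming the group name hugetlb, the GID 501, and a JVM user called jboss from my example (substitute your own names and ids, and note that the GID your system assigns will almost certainly differ):

groupadd hugetlb              # create the group; check /etc/group for the GID it receives
usermod -aG hugetlb jboss     # add the JVM user to the new group
# and in /etc/sysctl.conf, using the GID from /etc/group:
vm.hugetlb_shm_group = 501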
Now, that concludes the kernel parameter setup, but there is still one more OS setting, which changes the user's security limits to allow the user to use the memlock system call to access the shared memory. Large page shared memory is locked into memory and cannot be swapped to disk, which is another major advantage of using large page memory, since having your heap space swapped to disk can be catastrophic for an application. You set this parameter in /etc/security/limits.conf as follows:
jboss soft memlock n
jboss hard memlock n
where n is equal to the number of huge pages set in vm.nr_hugepages times the page size from /proc/meminfo, which in my example is 3072*2048 = 6291456. This concludes the OS setup, and now we can actually configure the JVM.
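With the numbers from my example (the jboss user and the 6GB of 2MB pages above), the two limits.conf lines would read:

jboss soft memlock 6291456
jboss hard memlock 6291456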
The JVM parameter for the Sun JVM is -XX:+UseLargePages (for BEA JRockit it's -XXlargePages, and for IBM's JVM it's -lp). If you have everything set up correctly, you should be able to look at /proc/meminfo after starting the JVM and see that the large pages are being used.
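As a rough illustration for the Sun JVM, a start-up along these lines (the 3.5GB heap matches my test setup, and myapp.jar is just a placeholder for your own application) followed by a quick check of /proc/meminfo will tell you whether the pages are actually being taken:

java -Xms3584m -Xmx3584m -XX:+UseLargePages -jar myapp.jar
grep HugePages /proc/meminfo    # HugePages_Free should drop once the heap is allocated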
A couple of additional caveats and warnings. First, you can have the kernel settings take effect dynamically by running sysctl -p. However, if the server has been running for almost any length of time, you may not get all the pages you requested, because large pages require contiguous memory; you may have to reboot for the settings to take effect. Second, when you allocate this memory, it is removed from the general memory pool and is not available to applications that don't have explicit support for large page memory and aren't configured to use it. So, what kind of results can you expect?
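If you want to try applying the settings without a reboot, something like this is a reasonable first attempt before falling back to rebooting:

sysctl -p                            # re-read /etc/sysctl.conf
grep HugePages_Total /proc/meminfo   # compare against the vm.nr_hugepages value you requested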
Well, in my case, I was able to achieve an over 3x improvement in throughput for my EJB 3 application, of which fully 60 to 70% was due to using large page memory with a 3.5GB heap. Notably, a 3.5GB heap without large memory pages didn't provide any benefit over smaller heaps without large pages. Besides the throughput improvements, I also noticed that GC frequency was cut by two-thirds, and total GC time was cut by a similar percentage (each individual GC event was much shorter in duration). Of course, your mileage will vary, but this one optimization is worth looking at for any high-throughput application.
Good luck!