Fine-Tuning the Java G1 Garbage Collector

G1 is the new garbage collector in the Java Virtual Machine. In JDK 7 / JRE 7 it supersedes the current CMS collector. As background for this article, I recommend reading my article on the CMS GC from 31.12.2010. The name G1 stands for "Garbage First": it refers to the basic idea of cleaning up the most heavily "contaminated" areas (regions) first.

G1 works in parallel and almost fully concurrently with the application, and it has been optimized for agility: while cleaning, it causes only very short pauses, the so-called stop-the-world breaks. In general it pursues the same goal as CMS: the Java heap is subdivided into a young and a tenured generation. In its details, however, the implementation is very different.

The heap is no longer divided into two contiguous generations. Instead it is split into n "regions" of 1 MiB each, which are tracked globally in a card table of one-byte values. Each of these regions may belong to the young generation (Eden, From- or To-survivor space) or to the tenured generation. The basic idea behind G1 and CMS is therefore the same; the 1-MiB regions merely tune it for agility and thread safety. This makes very short pauses possible and lets the GC threads run independently, so unnecessary GC runs are avoided and stop-the-world pauses are minimized.
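On the JVMs of that era (JDK 7), G1 is not the default collector and has to be switched on explicitly. A minimal sketch, assuming Tomcat's setenv.sh and the JAVA_OPTS convention (both assumptions, not from the original post):

```shell
# setenv.sh (assumed location): enable G1 explicitly; it is not the
# default collector on JDK 7.
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"

# Optionally pin the region size. If unset, the JVM picks a power-of-two
# size between 1 MiB and 32 MiB based on the heap size.
JAVA_OPTS="$JAVA_OPTS -XX:G1HeapRegionSize=1m"
```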

Since I have been working on a request-intensive project, I wanted to share the results of the tests we ran. But first, a short introduction from Oracle:

The Garbage-First (G1) collector is a server-style garbage collector, targeted for multi-processor machines with large memories. It meets garbage collection (GC) pause time goals with a high probability, while achieving high throughput. The G1 garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is designed for applications that:

Can operate concurrently with application threads, like the CMS collector.
Compact free space without lengthy GC induced pause times.
Need more predictable GC pause durations.
Do not want to sacrifice a lot of throughput performance.
Do not require a much larger Java heap.
G1 is planned as the long term replacement for the Concurrent Mark-Sweep Collector (CMS). Comparing G1 with CMS, there are differences that make G1 a better solution. One difference is that G1 is a compacting collector. G1 compacts sufficiently to completely avoid the use of fine-grained free lists for allocation, and instead relies on regions. This considerably simplifies parts of the collector, and mostly eliminates potential fragmentation issues. Also, G1 offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets.
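The pause targets Oracle mentions are set directly on the command line. A hedged sketch of the two most relevant knobs (the values shown are the JVM's illustrative defaults, not the settings used in the tests below):

```shell
# Soft target for the maximum GC pause; G1 adjusts young-generation size
# and collection frequency to try to meet it (not a hard guarantee).
JAVA_OPTS="$JAVA_OPTS -XX:MaxGCPauseMillis=200"

# Heap occupancy (percent of the whole heap) at which a concurrent
# marking cycle is started.
JAVA_OPTS="$JAVA_OPTS -XX:InitiatingHeapOccupancyPercent=45"
```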

 

Setup:

Server: ProLiant DL585 G7
Memory: 98304 MB (12 x 8192 MB @ 1333 MHz)
Processor: 4 sockets(16 cores each) 64 cores AMD Opteron™ 6300 Series

Processor Speed 2800 MHz
Execution technology 16/16 cores; 16 threads
Memory Technology 64-bit Capable
Internal L1 cache 768 KB
Internal L2 cache 16384 KB
Internal L3 cache 16384 KB

Tomcat v 7.0.42
Java version: Java(TM) SE Runtime Environment (build 1.7.0_40-b43)
Operating system: Red Hat Enterprise Linux Server release 6.4 (Santiago) (Default gui install)
Kernel: Linux 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

We are working on a project where we will be using Terracotta BigMemory to constantly fetch certain data from our 3-node Oracle 11g RAC in order to decrease the load on the Oracle servers. This system will help us respond to the 12+ million daily requests that we receive and process. Besides answering the requests correctly, the aim is to respond to clients in under 1 second, which leaves me a window of roughly 600 ms after receiving a request.

Here are the tests and results we ran on 02 Oct 2013:

Test01

Result 1.9 seconds.

Test02

Result 959 ms

Test03

Result 1 sec 32 ms

Test04

Result 882 ms

Test05

Result 885 ms

Test06

Result 744 ms

Test07

Result 614 ms

Test08

Result 815 ms

Test09

Result 809 ms

Test10

Result 704 ms

Test11

Result 939.23 ms

Test12

Result 1 sec 39 ms

Test13

Result 736.71 ms

Test14

Result 880 ms

Test15

Result 900 ms

Test16 

Result 956 ms

Test17 (Repeated at 16.18h )

Result 808 ms

Test18 (Repeated 16.24h) 

Result 777.13 ms

Test19

Result 917 ms

Test20

Result 885 ms

Test 21 FINAL

Result 590 ms

Having arrived at the optimal GC method and options, we kept the GC configuration the same and started to fiddle with sysctl.conf:
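The post does not list the actual sysctl.conf changes, so the fragment below is purely illustrative: typical network-stack settings that get tuned on request-heavy Linux servers of this kind. All keys and values here are assumptions, not the configs tested below:

```shell
# /etc/sysctl.conf (illustrative values only)

# Larger accept backlog for bursts of incoming connections
net.core.somaxconn = 4096
net.core.netdev_max_backlog = 4096

# Widen the ephemeral port range for many concurrent connections
net.ipv4.ip_local_port_range = 10240 65535

# Release sockets in FIN-WAIT faster under heavy request churn
net.ipv4.tcp_fin_timeout = 15
```

Changes of this kind are applied with `sysctl -p` and should be tested one at a time, as we did below.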

Test21 (Sysctl config01)

Result 700 ms

Test22 (Sysctl Config02)

Result 626 ms

Test23 (Sysctl Config03)

Result 660 ms

Test24 (Sysctl Config04)

Result 594.48 ms

Test25 (Sysctl Config05)

Result 639 ms

Test26 (Sysctl Config06) (Best result until now)

Result 586.16 ms

After all 26 tests, which took about 7 hours, we managed to drop the response time from 1.9 sec to 586.16 ms, roughly a third of where we started.
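For readers who want a starting point: my reading of the best-performing label in the table below, GC1_HO%80_PT_10th_1sec_CG10, is a heap-occupancy trigger at 80%, a pause target of about one second with ten parallel GC threads, and ten concurrent GC threads. This decoding of label to flags is an interpretation, not something the test names guarantee:

```shell
# Sketch of the winning combination as JVM flags, decoded from the test
# label GC1_HO%80_PT_10th_1sec_CG10 (the decoding is an assumption):
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"
JAVA_OPTS="$JAVA_OPTS -XX:InitiatingHeapOccupancyPercent=80"  # HO%80
JAVA_OPTS="$JAVA_OPTS -XX:MaxGCPauseMillis=1000"              # 1sec pause target
JAVA_OPTS="$JAVA_OPTS -XX:ParallelGCThreads=10"               # PT_10th
JAVA_OPTS="$JAVA_OPTS -XX:ConcGCThreads=10"                   # CG10
```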

Here are the last results we got from the application:

Type Total Rq Timeout >500ms >600ms >1000ms Average(ms)
GC1_Unlock 6905.00 1481.00 2719.00 504.00 1711.00 673.12
GC1_Heap_occupancy_%0 6809.00 1771.00 2595.00 417.00 1609.00 716.85
GC1_HO%80 7214.00 1145.00 3348.00 613.00 1668.00 620.86
GC1_HO%80_PT_1000 3803.00 2618.00 425.00 83.00 329.00 1308.78
GC1_HO%80_ParallGCThreads1000 7407.00 6327.00 3612.00 726.00 1635.00 593.43
GC1_HO%80_ParallGCThreads10 7833.00 832.00 4152.00 746.00 1819.00 540.90
GC1_HO%80_ParallGCThreads5 7074.00 1259.00 3280.00 437.00 1623.00 618.65
GC1_HO%80_PT_10th_1sec_CG10 8414.00 614.00 5724.00 762.00 1214.00 455.57
GC1_HO%80_PT_10th_1sec_CG10_MPM_100 7476.00 968.00 3435.00 583.00 2059.00 588.19
GC1_HO%80_PT_10th_1sec_CG10_NewRatio10 7539.00 985.00 3707.00 642.00 1802.00 577.64
GC1_HO%100_PT_10th_1sec_CG10 7931.00 846.00 4707.00 754.00 1395.00 511.68
GC1_HO%80_PT_5th_1sec_CG5 6994.00 1451.00 3128.00 417.00 1609.00 639.86
GC1_HO%80_PT_2th_1sec 6886.00 1634.00 3174.00 466.00 1015.00 657.95
GC1_HO%80_PT_8th_1sec 7854.00 842.00 4240.00 673.00 1817.00 541.88
GC1_HO%80_PT_6th_1sec 7221.00 1296.00 3169.00 536.00 1821.00 622.29
GC1_HO%80_PT_4th_1sec 6824.00 1616.00 3131.00 437.00 1206.00 654.00
Repeat
GC1_HO%80_ParallGCThreads5 6944.00 1519.00 3196.00 439.00 1384.00 635.33
GC1_HO%80_PT_10th_1sec_CG10_newRatio10 7533.00 1115.00 3690.00 565.00 1731.00 587.11
GC1_HO%80_PT_10th_1sec_CG10 7670.00 920.00 4404.00 582.00 1498.00 534.64
GC1_HO%80_PT_20th_1sec_CG20 7090.00 1269.00 3020.00 701.00 1674.00 641.47
GC1_HO%80_PT_20th_1sec_CG10 7212.00 1000.00 3454.00 603.00 1717.00 599.25
GC1_HO%80_PT_10th_1sec_CG10 rep 16:52 8434.00 616.00 5491.00 794.00 1353.00 459.17
GC1_HO%80_PT_10th_1sec_CG10 SYSCTL Config01 8076.00 752.00 4672.00 685.00 1713.00 513.26
GC1_HO%80_PT_10th_1sec_CG10 SYSCTL Config02 8363.00 619.00 5208.00 869.00 1519.00 478.86
GC1_HO%80_PT_10th_1sec_CG10 SYSCTL Config03 8179.00 695.00 4542.00 863.00 1842.00 513.66
GC1_HO%80_PT_10th_1sec_CG10 SYSCTL Config04 8543.00 535.00 5630.00 895.00 1310.00 446.84
GC1_HO%80_PT_10th_1sec_CG10 SYSCTL Config05 8307.00 542.00 5132.00 892.00 1571.00 475.57
GC1_HO%80_PT_10th_1sec_CG10 SYSCTL Config06 8575.00 512.00 5696.00 844.00 1401.00 454.37
