DTrace to Troubleshoot Java Native Memory Problems

时间：2016-07-01 14:47:48 阅读：240 评论：0 收藏：0 [点我收藏+]

标签：

How to Use DTrace to Troubleshoot Java Native Memory Problems on Oracle Solaris 11

Hands-On Labs of the System Admin and Developer Community of OTN

by Wang Yu, Vincent Liu, and Gary Wang

This lab will introduce the basic concepts of DTrace and provide exercises for using DTrace to identify common Java native memory problems, such as direct buffer overflow, Inflater/Deflater memory leak, and thread stack overflow.

Published February 2014

Want to comment on this article? Post the link on Facebook‘s OTN Garage page. Have a similar article to share? Bring it up on Facebook or Twitter and let‘s discuss.

Table of Contents
Introduction
Intended Audience
System Requirements
Prerequisites
Notes for Users
Installation and Configuration (15 Minutes)
Exercise 1: Introduction to DTrace (30 Minutes)
Exercise 2: Identifying and Solving "Inflater/Deflater Memory Leak" Problems (30 Minutes)
Exercise 3: Identifying and Solving "Cannot Create Native Threads" Problems (40 Minutes)
Exercise 4: Identifying and Solving "Direct ByteBuffer Memory Leak" Problems (40 Minutes)
Summary
See Also
About the Authors

Introduction

DTrace is a comprehensive dynamic tracing framework for the Oracle Solaris operating system. DTrace provides a powerful infrastructure for troubleshooting kernel and application problems in production environments. It allows administrators and developers to concisely answer arbitrary questions about the behavior of user applications and operating systems.

Java native memory leaks are one of the hardest technical issues to solve. JProfiler, VisualVM, heap dump, and thread dump tools have limited capabilities for handling this kind of issue. Google Performance Tools are very popular for identifying issues caused by native memory allocation using malloc. These tools provide a high-performance multithreaded malloc() implementation, with checkpoints for collecting needed information.

The DTrace facility provided by Oracle Solaris 11 is able to detect and troubleshoot various Java native memory problems. Moreover, DTrace is easy to learn and lightweight, and it can be safely used in a production environment.

This lab will introduce the basic concepts of DTrace and provide exercises for using DTrace to identify common Java native memory problems, such as direct buffer overflow, Inflater/Deflater memory leak, and thread stack overflow. It will take you approximately 155 minutes to finish this lab.

Intended Audience

This hands-on lab assumes you have some basic knowledge about the following technologies:

Java language programming
Oracle Solaris or a similar UNIX or Linux OS

System Requirements

Ensure your system meets the following requirements:

Operating system: Solaris 11.1
Memory: 2 GB
Available disk space: 20 GB

Prerequisites

Please download the following software.

Apache Ant 1.9.2 (or a more recent version), which you will install in the "Installation and Configuration" section
Java SE 7.0 U45 (or a more recent version), which you will install in the "Installation and Configuration" section

Then go here for instructions on how to create the directory structure, .d files, .java files, and build.xml files used in the lab exercises.

Notes for Users

Be aware of the following:

For this lab, the GNOME desktop environment is preferred over Oracle Solaris 11 (with desktop packages installed).
In order to open a terminal window in GNOME, right-click any point on the background of the desktop, and selectOpen Terminal in the pop-up menu.
The following source code editors are provided by the Oracle Solaris 11 desktop environment:
- vi (type vi in a terminal window)
- gedit (type gedit in a terminal window)

Installation and Configuration (15 Minutes)

In this section, we are going to prepare our hands-on lab environment. Before you start, make sure you have downloaded the required software, as described in the Prerequisites section.

Install JDK 7

Use the following procedure to install JDK 7 on Oracle Solaris.

Open a command-line console, and move the jdk-7u45-solaris-i586.gz (or later) file you downloaded to the directory where you want to install the JDK.
Uncompress the file, for example using a command similar to the following:
```
$ tar -xvfz jdk-7u45-solaris-i586.gz
```
This command creates a directory named jdk1.7.0_45 in which the JDK is installed.
Add <JDK_7_Installation>/bin to your PATH variable:

For example, for an Oracle Solaris system, type the following (if you are using the bash shell), but replace<JDK_7_Installation> with the directory that corresponds to your environment:
```
$ export PATH=<JDK_7_Installation>/bin:$PATH
```
Add <JDK_7_Installation> to your JAVA_HOME variable:

For example, for an Oracle Solaris system, type the following (if you are using the bash shell), but replace<JDK_7_Installation> with the directory that corresponds to your environment:
```
$ export JAVA_HOME=<JDK_7_Installation>
```
Note: The variable setting above affects only the current console; putting it in your profile file ($HOME/.profile for the bash shell) will make this variable available for all the consoles.

Install Ant

Use the following procedure to install Ant on Oracle Solaris.

Open a command-line console and move the apache-ant-1.9.2-bin.zip (or later) file you downloaded to the directory where you want to install Ant.
Unzip the file, for example, by typing a command similar to the following:
```
$ unzip apache-ant-1.9.2-bin.zip
```
This unzip command creates a directory called apache-ant-1.9.2 into which Ant is installed.
Add <ant_Installation>/bin to your PATH variable, and add <ant_Installation> to your ANT_HOME variable:

For example, for an Oracle Solaris system, type the following (if you are using the bash shell), but replace<ant_Installation> with the directory that corresponds to your environment:
```
$ export PATH=<ant_Installation>/bin:$PATH
$ export ANT_HOME=<ant_Installation>
```
Note: The variable settings above affect only the current console; putting them in your profile file ($HOME/.profile for the bash shell) will make these variables available for all the consoles.

Verify the Installations

Open a command-line console, and type the jps command. If you see a Java process list similar to the following, the JDK is installed successfully.
```
$ jps
1935 Jps
1859 App
```
Open a command-line console, and type the ant command. If you see output similar to the following, Ant is installed successfully.
```
$ ant
Buildfile: build.xml does not exist!
Build failed
```

Exercise 1: Introduction to DTrace (30 Minutes)

In this exercise, we are going to learn basic concepts about D language. The D script files used in this exercise are located at <lab_root>/dtrace_javanative/DScripts.

Background Information: What Is DTrace?

DTrace is a comprehensive dynamic tracing facility that is built into Oracle Solaris and can be used by administrators and developers on live production systems to examine the behavior of both user programs and the operating system itself. DTrace enables you to explore your system to understand how it works, track down performance problems across many layers of software, or locate the cause of aberrant behavior. DTrace lets you create your own custom programs to dynamically instrument the system and provide immediate, concise answers to arbitrary questions you can formulate using the DTrace D programming language.

A D program source file consists of one or more probe clauses that describe the instrumentation to be enabled by DTrace. Each probe clause has the following general form:

probe descriptions
/ predicates /
{
     action statements;
}

Exercise 1, Step 1: Write a ‘Hello, world!‘ Program

After logging in, click the right mouse button on the desktop and choose Open Terminal to bring up a terminal window.
DTrace execution requires root user privilege, so switch to root before trying DTrace commands:
```
ouser@solaris11:~$ su
Password: 
root@solaris11:~#
```

Type dtrace to print the different options for the DTrace command:

root@solaris11:~# dtrace
Usage: dtrace [-32|-64] [-aACeFGhHlqSvVwZ] [-b bufsz] [-c cmd] [-D name[=def]]
    [-I path] [-L path] [-o output] [-p pid] [-s script] [-U name]
    [-x opt[=val]] [-X a|c|s|t]

    [-P provider [[ predicate ] action ]]
    [-m [ provider: ] module [[ predicate ] action ]]
    [-f [[ provider: ] module: ] func [[ predicate ] action ]]
    [-n [[[ provider: ] module: ] func: ] name [[ predicate ] action ]]
    [-i probe-id [[ predicate ] action ]] [ args ... ]

    predicate -> ‘/‘ D-expression ‘/‘
       action -> ‘{‘ D-statements ‘}‘

    -32 generate 32-bit D programs and ELF files
    -64 generate 64-bit D programs and ELF files

    -a  claim anonymous tracing state
    -A  generate driver.conf(4) directives for anonymous tracing
    -b  set trace buffer size
    -c  run specified command and exit upon its completion
    -C  run cpp(1) preprocessor on script files
    -D  define symbol when invoking preprocessor
    -e  exit after compiling request but prior to enabling probes
    -f  enable or list probes matching the specified function name
    -F  coalesce trace output by function
    -G  generate an ELF file containing embedded dtrace program
    -h  generate a header file with definitions for static probes
    -H  print included files when invoking preprocessor
    -i  enable or list probes matching the specified probe id
    -I  add include directory to preprocessor search path
    -l  list probes matching specified criteria
    -L  add library directory to library search path
    -m  enable or list probes matching the specified module name
    -n  enable or list probes matching the specified probe name
    -o  set output file
    -p  grab specified process-ID and cache its symbol tables
    -P  enable or list probes matching the specified provider name
    -q  set quiet mode (only output explicitly traced data)
    -s  enable or list probes according to the specified D script
    -S  print D compiler intermediate code
    -U  undefine symbol when invoking preprocessor
    -v  set verbose mode (report stability attributes, arguments)
    -V  report DTrace API version
    -w  permit destructive actions
    -x  enable or modify compiler and tracing options
    -X  specify ISO C conformance settings for preprocessor
    -Z  permit probe descriptions that match zero probes

Now let‘s try to write a "Hello, world!" program using D language:

ouser@solaris11:~# dtrace -n ‘BEGIN {trace("hello,world");exit(0);}‘
dtrace: description ‘BEGIN ‘ matched 1 probe
 CPU     ID                    FUNCTION:NAME
   0      1                           :BEGIN   hello,world

Alternatively, you can compose a script like the <lab_root>/dtrace_javanative/DScripts/ex1-hello.d file:

BEGIN 
{
    trace("hello, world");
    exit(0);
}

Then run the script file:

root@solaris11:~# dtrace -s ex1-hello.d
dtrace: script ‘ex1-hello.d‘ matched 1 probe
 CPU     ID                    FUNCTION:NAME
   0      1                           :BEGIN   hello,world

Exercise 1, Step 2: Explore Probes

A probe is a location or activity to which DTrace can bind a request to perform a set of actions, such as recording a stack trace, a time stamp, or the argument to a function. Probes are like programmable sensors scattered all over your Oracle Solaris system in interesting places. If you want to figure out what‘s going on, you use DTrace to program the appropriate sensors to record the information that is of interest to you.

Let‘s see what kinds of probes there are in Oracle Solaris 11.1 by running the following command:

root@solaris11:~# dtrace -l | more
   ID   PROVIDER            MODULE                          FUNCTION NAME
    1     dtrace                                                     BEGIN
    2     dtrace                                                     END
    3     dtrace                                                     ERROR
    4 javascript2010         libxul.so __1cCjsHExecute6FpnJJSContext_pnIJSScript_
rnIJSObject_pnCJSFValue__b_ execute-done
    5 javascript2010         libxul.so __1cCjsNExecuteKernel6FpnJJSContext_pnIJSS
cript_rnIJSObject_rknCJSFValue_n0ALExecuteType_pn0AKStackFrame_p8_b_ execute-done
    6 javascript2010         libxul.so __1cCjsHExecute6FpnJJSContext_pnIJSScript_
rnIJSObject_pnCJSFValue__b_ execute-start
    7 javascript2010         libxul.so __1cCjsNExecuteKernel6FpnJJSContext_pnIJSS
cript_rnIJSObject_rknCJSFValue_n0ALExecuteType_pn0AKStackFrame_p8_b_ execute-start
    8 nfsmapid1010          nfsmapid                  cb_update_domain daemon-domain
    9 kerberos995    mech_krb5.so.1                         k5_mk_rep krb_ap_rep-make
   10 kerberos995    mech_krb5.so.1                       krb5_rd_rep krb_ap_rep-read
   11 kerberos995    mech_krb5.so.1              krb5_mk_req_extended krb_ap_req-make
   12 kerberos995    mech_krb5.so.1                rd_req_decoded_opt krb_ap_req-read
   13 kerberos995    mech_krb5.so.1                     krb5_mk_ncred krb_cred-make
   14 kerberos995    mech_krb5.so.1                krb5_rd_cred_basic krb_cred-read
   15 kerberos995    mech_krb5.so.1                     krb5_mk_error krb_error-make
   16 kerberos995    mech_krb5.so.1                     krb5_rd_error krb_error-read
   17 kerberos995    mech_krb5.so.1               krb5_encode_kdc_rep krb_kdc_rep-make
--More--

Every probe in DTrace has two names: a unique integer ID and a human-readable string name that is composed of four parts, shown as separate columns in the dtrace output. When writing out the full human-readable name of a probe, write all four parts of the name separated by colons like this:

provider:module:function:name

Type the following command to determine how many probes are available:
```
root@solaris11:~# dtrace -l | wc -l
   81183
```
Please note that the number of probes varies depending on your OS and the software you have installed.

In order to locate a DTrace probe, we can also use the following options to filter the list:

-P for provider
-m for module
-f for function
-n for name

For instance, if you want to list all the probes from the syscall provider, run the following command:

root@solaris11:~# dtrace -l -P syscall | more
   ID   PROVIDER            MODULE                          FUNCTION NAME
 7067    syscall                                               nosys entry
 7068    syscall                                               nosys return
 7069    syscall                                               rexit entry
 7070    syscall                                               rexit return
 7071    syscall                                                read entry
 7072    syscall                                                read return
 7073    syscall                                               write entry
 7074    syscall                                               write return
 7075    syscall                                               close entry
 7076    syscall                                               close return
 7077    syscall                                              linkat entry
 7078    syscall                                              linkat return
 7079    syscall                                           symlinkat entry
 7080    syscall                                           symlinkat return
 7081    syscall                                               chdir entry
 7082    syscall                                               chdir return
 7083    syscall                                               gtime entry
 7084    syscall                                               gtime return
 7085    syscall                                                 brk entry
 7086    syscall                                                 brk return
 7087    syscall                                               lseek entry
--More--

Exercise 1, Step 3: Explore Predicates and Actions

Predicates are expressions enclosed between slashes (/ and /) that are evaluated at probe-firing time to determine whether the associated actions should be executed. Predicates are the primary conditional construct used for building more complex control flow in a D program. You can omit the predicate section of the probe clause entirely for any probe, in which case the actions are always executed when the probe fires.

Probe actions are described by a list of statements separated by semicolons (;) and enclosed between braces ({ and }). If you want to note only that a particular probe fired on a particular CPU without tracing any data or performing any additional actions, you can specify an empty set of braces with no statements inside.

Now you will use a script to list all system calls, excluding write, for all the processes running on your current system. See the following code in <lab_root>/dtrace_javanative/DScripts/ex1-predicate-1.d:
```
syscall:::
/probefunc!="write"/
{
    printf("probefunc:%s, pid:%d, execname:%s\n",probefunc, pid, execname);
}
```

Run ex1-predicate-1.d. You might get a result similar to the following. Press Ctrl-c to stop this program.

root@solaris11:~# dtrace -qs ex1-predicate-1.d
^C
probefunc:ioctl, pid:2084, execname:dtrace
probefunc:ioctl, pid:2084, execname:dtrace
probefunc:ioctl, pid:2084, execname:dtrace
probefunc:ioctl, pid:2084, execname:dtrace
probefunc:portfs, pid:1829, execname:nautilus
probefunc:portfs, pid:1829, execname:nautilus
probefunc:portfs, pid:1829, execname:nautilus
probefunc:portfs, pid:1829, execname:nautilus
probefunc:fstatat64, pid:1829, execname:nautilus
probefunc:fstatat64, pid:1829, execname:nautilus
probefunc:clock_gettime, pid:1829, execname:nautilus
probefunc:clock_gettime, pid:1829, execname:nautilus
probefunc:recv, pid:1829, execname:nautilus
probefunc:recv, pid:1829, execname:nautilus
probefunc:clock_gettime, pid:1829, execname:nautilus
probefunc:clock_gettime, pid:1829, execname:nautilus
probefunc:pollsys, pid:1829, execname:nautilus
probefunc:mmap, pid:2084, execname:dtrace
probefunc:mmap, pid:2084, execname:dtrace
probefunc:lwp_park, pid:2084, execname:dtrace
probefunc:recv, pid:1871, execname:VBoxClient
probefunc:recv, pid:1871, execname:VBoxClient
probefunc:nanosleep, pid:1871, execname:VBoxClient
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:nanosleep, pid:1871, execname:VBoxClient
probefunc:recv, pid:1871, execname:VBoxClient
probefunc:recv, pid:1871, execname:VBoxClient
probefunc:nanosleep, pid:1871, execname:VBoxClient
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java

In order to discover all of the no-write system calls caused by Java, we need to update the original file to be like ex1-predicate-2.d, as shown below:
```
syscall:::
/probefunc!="write" && execname=="java"/
{
    printf("probefunc:%s, pid:%d, execname:%s\n",probefunc, pid, execname);
}
```

Run ex1-predicate-2.d. You will get results similar to this:

root@solaris11:~# dtrace -qs ex1-predicate-2.d
^C
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_wait, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java
probefunc:lwp_cond_signal, pid:1844, execname:java

Exercise 1, Step 4: Explore Aggregations

When instrumenting the system to answer performance-related questions, it is useful to consider how data can be aggregated to answer a specific question rather than thinking in terms of data gathered by individual probes. For example, if you wanted to know the number of system calls by user ID, you would not necessarily care about the data collected at each system call. You simply want to see a table of user IDs and system calls. In DTrace, you will find it is very easy to achieve this by using aggregations.

In D language, the syntax for an aggregation is the following:

@name[ keys ] = aggfunc ( args );

where:

name is the name of the aggregation.
keys is a comma-separated list of D expressions.
aggfunc is a DTrace aggregating function.
args is a comma-separated list of arguments appropriate for the aggregating function.

In this example, we‘d like to know how many system calls are invoked within five seconds. Here‘s the code in the<lab_root>/dtrace_javanative/DScripts/ex1-aggregation-1.d file.
```
syscall:::entry
{
   @counts["syscall numbers"]=count();
}
tick-5sec
{
    exit(0);
}
```

Run ex1-aggregation-1.d, which will exit in five seconds; you will see results similar to this:

root@solaris11:~# dtrace -s ex1-aggregation-1.d
dtrace: script ‘ex1-aggregation-1.d‘ matched 215 probes
 CPU     ID                    FUNCTION:NAME
   0  81183                       :tick-5sec 

  syscall numbers                                                3420

To take a further step to check which application is making the most system calls, we need to update the original file to be like ex1-aggregation-2.d, as shown below:
```
syscall:::entry
{
    @counts[execname]=count();
}
tick-5sec
{
    exit(0);
}
```

Run the ex1-aggregation-2.d file, which will give you results like this:

root@solaris11:~# dtrace -s ex1-aggregation-2.d
dtrace: script ‘ex1-aggregation-2.d‘ matched 215 probes
 CPU     ID                    FUNCTION:NAME
   0  81183                       :tick-5sec 

  automountd                                                        1
  dhcpagent                                                         3
  nwam-manager                                                      4
  devfsadm                                                          6
  gnome-settings-d                                                  7
  isapython2.6                                                      8
  sendmail                                                         10
  metacity                                                         11
  xscreensaver                                                     12
  gnome-power-mana                                                 15
  updatemanagernot                                                 16
  VBoxService                                                      37
  VBoxClient                                                      354
  gnome-terminal                                                  361
  Xorg                                                            393
  java                                                            596
  dtrace                                                         1573

The following example, ex1-aggregation-3.d, displays the average time spent in the write system call, organized by process name. This example uses the avg aggregating function, specifying time stamp - self->ts as the argument. The example averages the wall clock time spent in the system call:

syscall:::entry
{
    self->ts=timestamp;
}
syscall:::return
/self->ts/
{
    @time[execname]=avg(timestamp-self->ts);
    self->ts=0;
}

About `self->`: Thread-Local Variables DTrace provides the ability to declare variable storage that is local to each operating system thread, as well as the global variables mentioned previously. Thread-local variables are useful when we want to enable a probe and mark every thread that fires the probe with some tags or data. In a D program, we can use thread-local variables to share a common name but refer to different data storage associated with different threads. Thread-local variables are referenced by applying the `->` operator to the special identifier `self`.

About self->: Thread-Local Variables
DTrace provides the ability to declare variable storage that is local to each operating system thread, as well as the global variables mentioned previously. Thread-local variables are useful when we want to enable a probe and mark every thread that fires the probe with some tags or data. In a D program, we can use thread-local variables to share a common name but refer to different data storage associated with different threads. Thread-local variables are referenced by applying the -> operator to the special identifier self.

Run the ex1-aggregation-3.d file. You need to wait for several seconds, and then press Ctrl-c. A result similar to Listing 1 will be shown:

root@solaris11:~# dtrace -s ex1-aggregation-3.d
dtrace: script ‘ex1-aggregation-3.d‘ matched 428 probes
^C

  fmd                                                           18894
  nscd                                                          25965
  utmpd                                                         28892
  ssh-agent                                                     37801
  gconfd-2                                                      79348
  nwam-manager                                                  79479
  hald                                                          88817
  hald-addon-acpi                                               95590
  dtrace                                                      3307590
  VBoxClient                                                 13981578
  Xorg                                                       15435316
  gnome-terminal                                             20763581
  java                                                       40776641
  gnome-power-mana                                           58499707
  xscreensaver                                              184487447
  updatemanagernot                                          241129100
  sendmail                                                  280901476
  VBoxService                                               358442957
  gnome-settings-d                                          392415577
  metacity                                                  409615593
  isapython2.6                                              502919529
  dhcpagent                                                 800092245
  devfsadm                                                  914232179

Listing 1

The average time records shown in Listing 1 are displayed in nanoseconds.

If you want to know the distribution of system call duration from different applications, you need to use the quantizeaggregation function in ex1-aggregation-4.d:
```
syscall:::entry
{
    self->ts=timestamp;
}
syscall:::return
/self->ts/
{
    @time[execname]=quantize(timestamp-self->ts);
    self->ts=0;
}
```

Run ex1-aggregation-4.d and wait for several seconds. You should get a result like Listing 2:

root@solaris11:~# dtrace -s ex1-aggregation-4.d
dtrace: script ‘ex1-aggregation-4.d‘ matched 428 probes
^C

  nwam-manager                                      
           value  ------------- Distribution ------------- count    
           16384 |                                         0        
           32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 3        
           65536 |                                         0        

  sendmail                                          
           value  ------------- Distribution ------------- count    
             512 |                                         0        
            1024 |@@@@                                     1        
            2048 |@@@@@@@@@@@@@                            3        
            4096 |                                         0        
            8192 |@@@@                                     1        
           16384 |@@@@                                     1        
           32768 |@@@@@@@@@@@@@                            3        
           65536 |                                         0        

  gnome-power-mana                                  
           value  ------------- Distribution ------------- count    
            1024 |                                         0        
            2048 |@@@@@@@@@@@                              4        
            4096 |@@@@@@@@@                                3        
            8192 |@@@@@@@@@                                3        
           16384 |                                         0        
           32768 |                                         0        
           65536 |                                         0        
          131072 |@@@                                      1        
          262144 |@@@                                      1        
          524288 |                                         0        
         1048576 |@@@                                      1        
         2097152 |                                         0        
         4194304 |                                         0        
         8388608 |                                         0        
        16777216 |                                         0        
        33554432 |                                         0        
        67108864 |                                         0        
       134217728 |                                         0        
       268435456 |@@@                                      1        
       536870912 |                                         0  
    ...

Listing 2

Note that the rows for the frequency distribution are always power-of-two values. Each row indicates the count of the number of elements greater than or equal to the corresponding value, but less than the next larger row value. For example, the output in Listing 2 shows that sendmail had three system calls taking between 2,048 nanoseconds and 4,096 nanoseconds, inclusive.

Exercise 2: Identifying and Solving "Inflater/Deflater Memory Leak" Problems (30 Minutes)

Java Native Interface (JNI) is used to call native code from Java. This native code can allocate native OS resources and also create Java objects. The memory allocated by the JNI code is not managed by the GC (Garbage Collection) threads, so it should be freed by the native code itself. Therefore, memory leaks caused by JNI are particularly nasty and hard to find out.

In JDK, the java.util.zip.Inflater(Deflater) class provides support for general-purpose decompression using the popular zlib compression library. The Inflater(Deflater) class passes its input and output buffers directly tozlib after acquiring them via the GetPrimitiveArrayCritical JNI call. If these classes are not used properly, there will be native memory leaks that cannot be found by common profiling tools.

In this exercise, we will learn how to use a simple DTrace script to trace the malloc native syscalls at run time, in order to identify the root cause.

The D-script files used in this exercise are located in <lab_root>/dtrace_javanative/DScripts. The Java source code files used in this exercise are located in <lab_root>/dtrace_javanative/exercises/ex2.

Background Information

Check the following Java classes, which compress and uncompress data by using java.util.zip.Deflater andjava.util.zip.Inflater. The source files (Compresser.java and Decompresser.java) are located in<lab_root>/dtrace_javanative/ exercises/ex2/src/nativememory/compresser.

package nativememory.compresser;

import java.util.zip.Deflater;

public class Compresser {
    private Deflater deflater;
    
    public Compresser(){
        deflater = new Deflater();
    }
    
    public byte[] compress(byte[] input) {
        byte[] output = new byte[8000];
        deflater.setInput(input);
        deflater.finish();
        int compressedDataLength = deflater.deflate(output);
        //deflater.end();
        byte[] result = new byte[compressedDataLength];
        System.arraycopy(output, 0, result, 0, compressedDataLength);
        return result;
    }
}

package nativememory.compresser;

import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class Decompresser {
    private Inflater inflater;
    
    public Decompresser(){
        inflater = new Inflater();
    }
    
    public byte[] decompress(byte[] input) throws DataFormatException {
        byte[] output = new byte[8000];
        inflater.setInput(input);
        int uncompressedDataLength = inflater.inflate(output);
        //inflater.end();
        byte[] result = new byte[uncompressedDataLength];
        System.arraycopy(output, 0, result, 0, uncompressedDataLength);
        return result;
    }
}

In this exercise, we will have a running thread which does a loop of work compressing and decompressing data in the background. The InflatorWorkingObject.java file is located in <lab_root>/dtrace_javanative/ exercises/ex2/src/nativememory.

public class InflatorWorkingObject extends DefaultWorkingObject{

    @Override
    public void loopWork() {
        try {
            // Encode a long String into bytes
            Random generator = new Random();
            int messageID = generator.nextInt(10000);
            String inputString = "This is the data with mmessageID: "+messageID;
            byte[] input = inputString.getBytes("UTF-8");
            byte[] result_compress;
            byte[] result_uncompress;
            
            //compress
            Compresser compresser = new Compresser();
            result_compress = compresser.compress(input);
            System.out.println("Data compressed!");
            //decompress
            Decompresser decompresser = new Decompresser();
            result_uncompress = decompresser.decompress(result_compress); 
            
            // Decode the bytes into a String
            String outputString = new String(result_uncompress,"UTF-8");
            System.out.println("Data Decompressed: "+outputString);
            Thread.sleep(200);
            
        } catch (InterruptedException ex) {
        } catch (java.io.UnsupportedEncodingException ex) {
        } catch (java.util.zip.DataFormatException ex) {
        }
    }
}

Exercise 2, Step 1: Run a Data Compressing/Decompressing Program that Has Native Memory Leaks

Go to the directory <lab_root>/dtrace_javanative/ exercises/ex2, and execute the ant command as follows:

ouser@solaris11:~$ cd dtrace_javanative/exercises/ex2
ouser@solaris11:~/dtrace_javanative/exercises/ex2$ ant

The ant command runs a multithread program that simulates multiple users compressing and uncompressing data. After running for a while, the program produces output similar to the following. You can stop this program by pressing the Enter key at any time, and you can rerun it by entering the ant command.

Note: In order to finish the following steps, you must keep this program running.

[java] Data Decompressed: This is the data with mmessageID: 8036
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 6029
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 4046
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 7587
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 6213
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 205
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 5249
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 2687
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 4588
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 1971
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 3686
[java] Data compressed!
[java] Data Decompressed: This is the data with mmessageID: 3259
[java] Data compressed!

Exercise 2, Step 2: Monitor the Program with JDK Tools

Use the jps tool to find the process ID (pid) for this Java program; you will get output like Listing 3:
```
ouser@solaris11:~$ jps -v
1839 App -Djava.security.policy=/usr/share/vpanels/java.policy
2010 OOW -DDescription=OOWDtraceNativeLab_ex2 -Xms256m -Xmx256m
2011 Jps -Dapplication.home=/home/gary/sf_OOW/jdk1.7.0_25 -Xms8m
2008 Launcher -Dant.home=/home/gary/sf_OOW/apache-ant-1.9.2 -Dant.library.dir=/home/gary/sf_OOW/apache-ant-1.9.2/lib
```
Listing 3

By using jps -v, we can find each Java processes with its process ID and startup arguments. From the output in Listing 3, we can tell that the process ID of the above Java program on my machine is 2010 (it might different on your machine), because it has a name of OOW with other arguments. Then we can pick it out from other Java processes.
Use the prstat -p <pid> system command to monitor the memory usage of this process:
```
PID  USERNAME  SIZE   RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP       
2010 gary      723M  209M sleep    59    0   0:00:05 4.8% java/11
```
Listing 4

From the output of prstat in Listing 4, we can see that the process with PID 2010 is consuming more than 700 MB of virtual memory, which is still increasing.

Use the jmap -heap <pid> command to monitor the Java heap usage of this program. (You need to use your process ID to replace <pid>):

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 71630848 (68.3125MB)
   used     = 45988288 (43.85784912109375MB)
   free     = 25642560 (24.45465087890625MB)
   64.20179194304666% used
From Space:
   capacity = 8912896 (8.5MB)
   used     = 0 (0.0MB)
   free     = 8912896 (8.5MB)
   0.0% used
To Space:
   capacity = 8912896 (8.5MB)
   used     = 0 (0.0MB)
   free     = 8912896 (8.5MB)
   0.0% used
tenured generation:
   capacity = 178978816 (170.6875MB)
   used     = 0 (0.0MB)
   free     = 178978816 (170.6875MB)
   0.0% used
Perm Generation:
   capacity = 12582912 (12.0MB)
   used     = 1838216 (1.7530593872070312MB)
   free     = 10744696 (10.246940612792969MB)
   14.60882822672526% used

690 interned Strings occupying 41984 bytes.

Listing 5

From the output of jmap in Listing 5, we can see that process with PID 2010 has consumed 68.3 MB (Eden) + 8.5 MB (From) + 8.5 MB (To) + 170.69 MB (tenured) + 12 MB (Perm) of memory, which is less than the memory size (723 MB) detected by prstat. So there must be some native memory leaks in this Java process.

Exercise 2, Step 3: Write a DTrace Program to Identify the Problem

We know that malloc is the most common way to allocate memory from the system. For this reason, we can first use DTrace to monitor malloc usage to find a clue. In Oracle Solaris 11, malloc is integrated in standard C library functions. So we consider using the pid provider to analyze the behavior of the process.

About the `pid` Provider The DTrace `pid` provider allows you to trace the internal execution of processes such as a Java application or a database. It can trace the entry and return of any function in a user process as well as any instruction specified by an absolute address or function offset. The `pid` provider is so powerful that you‘re very likely to use it in your daily work. The `pid` provider actually defines a class of providers. Each process can potentially have its own associated `pid`provider. A process with an ID of 123, for example, would be traced by using the `pid123` provider. Usually we use a variable such as `$target` or `$1` as a parameter to define the provider dynamically.

About the pid Provider
The DTrace pid provider allows you to trace the internal execution of processes such as a Java application or a database. It can trace the entry and return of any function in a user process as well as any instruction specified by an absolute address or function offset. The pid provider is so powerful that you‘re very likely to use it in your daily work.

The pid provider actually defines a class of providers. Each process can potentially have its own associated pidprovider. A process with an ID of 123, for example, would be traced by using the pid123 provider. Usually we use a variable such as $target or $1 as a parameter to define the provider dynamically.

First we will determine how frequently malloc is being called by this Java process. As shown in Listing 6, the script<lab_root>/dtrace_javanative/DScripts/ex2.d calculates the number of malloc and free calls by the process:

#!/usr/sbin/dtrace -qs

pid$1::malloc:entry
{
   printf("process(%d) is calling malloc %d times\n",$1,self->malloc++);
}

pid$1::free:entry
{
   printf("process(%d) is calling %d times\n",$1,self->free++);
}

Listing 6

Note: $1 is the first argument for running the script, $2 is the second argument, and so on.

Switch to the root user before trying DTrace commands:

ouser@solaris11:~$ su
Password: 
root@solaris11:~#

Execute ex2.d with the Java PID. You will get a result like the result shown in Listing 7. Type Ctrl-c to stop the script.

root@solaris11: ./ex2.d 2010
process(2010) is calling malloc 0 times
process(2010) is calling malloc 1 times
process(2010) is calling malloc 2 times
process(2010) is calling malloc 3 times
process(2010) is calling malloc 4 times
process(2010) is calling malloc 5 times
process(2010) is calling malloc 6 times
process(2010) is calling malloc 7 times
process(2010) is calling malloc 8 times
process(2010) is calling malloc 9 times
process(2010) is calling malloc 10 times
process(2010) is calling malloc 11 times
process(2010) is calling malloc 12 times
process(2010) is calling malloc 13 times

Listing 7

We can see in Listing 7 that most of the output is about malloc calls. Given more time, you might see only a few freecalls.

In order to see more clearly, we can run ex2.1.d, using an aggregate function and the profile provider to observe the result every two seconds. Listing 8 is the source code in which the profile provider is used.

#!/usr/sbin/dtrace -qs
pid$1:libc:malloc:entry
{
    @malloc_count["number of malloc"] = count() ;
}

pid$1:libc:free:entry
{
    @free_count["number of free"] = count() ;
}

profile:::tick-2sec
{
    printa(@malloc_count) ;
    printa(@free_count) ;
    trunc(@malloc_count) ;
    trunc(@free_count) ;
}

Listing 8

About profile-n Probes
A profile-n probe fires every fixed interval on every CPU at high interrupt level. The probe‘s firing interval is denoted by the value of n: the interrupt source will fire n times per second. n may also have an optional suffix, in which case nis interpreted to be in the units denoted by the suffix. Valid suffixes and the units they denote are as follows:

Suffix	Time Units
`nsec` or `ns`	nanoseconds
`usec` or `us`	microseconds
`msec` or `ms`	milliseconds
`sec` or `s`	seconds
`min` or `m`	minutes
`hour` or `h`	hours
`day` or `d`	days
`hz`	hertz (frequency per second)

In ex2.1.d, probe profile:::tick-2sec fires every two seconds, prints the number of malloc and free calls recorded during the past two seconds, and clears the buffer using the trunc function.

Run ex2.1.d with the Java PID. You will get a result similar to the output shown in Listing 9:
```
root@solaris11:~/DScripts# ./ex2.1.d 2010

  number of malloc                                                312

  number of malloc                                                304

  number of malloc                                                304

  number of malloc                                                304

  number of malloc                                                304

    ^C
```
Listing 9

The output in Listing 9 shows that the process with PID 2010 is issuing about 300 malloc calls every two seconds, and there is no free call during this period. (Note that if @free_count is 0, dtrace will not allocate buffer for it, so it is not displayed.)

Now we can say that this should account for the increasing usage of virtual memory for this Java process. Then how we can find the cause and fix the problem?
The ustack action in DTrace can be used to trace the process‘s stack. So we can use ustack to discover the code path that executes the malloc call. Running ex2.2.d will print out the process‘s thread stack when malloc is called.
```
#!/usr/sbin/dtrace -qs

pid$1:libc:malloc:entry
{
    @stack_count[ustack(4)] = count() ;
}
```
Please note that the input argument of ustack denotes stack frame depth. Here we use 4, which is enough to locate the statements that make malloc calls. You can try different numbers in order to get further information.

Run ex2.2d with the Java PID for a few seconds, and then type Ctrl-c to stop the script. You will see a result similar to Listing 10:

root@solaris11:~/DScripts# ./ex2.2.d 2010
^C

              libc.so.1`malloc
              libzip.so`zcalloc+0x26
              libzip.so`deflateInit2_+0x19b
              libzip.so`Java_java_util_zip_Deflater_init+0x5d
              145

              libc.so.1`malloc
              libzip.so`zcalloc+0x26
              libzip.so`deflateInit2_+0x1b5
              libzip.so`Java_java_util_zip_Deflater_init+0x5d
              145

              libc.so.1`malloc
              libzip.so`zcalloc+0x26
              libzip.so`deflateInit2_+0x1cf
              libzip.so`Java_java_util_zip_Deflater_init+0x5d
              145

              libc.so.1`malloc
              libzip.so`zcalloc+0x26
              libzip.so`inflateInit2_+0x88
              libzip.so`Java_java_util_zip_Inflater_init+0x4f
              145

Listing 10

In Listing 10, we can tell that the Inflater and Deflater classes in the java.util.zip package are makingmalloc calls. This is a well-known problem with the Inflater and Deflater classes. Now we can locate the root cause in the Java source code.

About `Inflater` and `Deflater` Memory Usage The `Inflater` and `Deflater` classes pass their input and output buffers directly to the `zlib` compression library after acquiring them via the `GetPrimitiveArrayCritical` JNI call. Both of these classes have `finalize()`methods that release c-heap buffers allocated by native code. However, the finalizable objects can be discovered only by GC, which occurs when there is pressure on the Java heap. The c-heap memory allocated by the `Inflater` and `Deflater` classes will not trigger GC, so the c-heap may be filled up before GC detects and enqueues unreachable finalizable objects. The same buffers can be freed before `Inflater` and `Deflater` objects are finalized by calling `end()`. So the preferred way to use `Deflater` and `Inflater` is to call the `end()` method after use in order to avoid native memory leaks.

About Inflater and Deflater Memory Usage
The Inflater and Deflater classes pass their input and output buffers directly to the zlib compression library after acquiring them via the GetPrimitiveArrayCritical JNI call. Both of these classes have finalize()methods that release c-heap buffers allocated by native code.

However, the finalizable objects can be discovered only by GC, which occurs when there is pressure on the Java heap. The c-heap memory allocated by the Inflater and Deflater classes will not trigger GC, so the c-heap may be filled up before GC detects and enqueues unreachable finalizable objects.

The same buffers can be freed before Inflater and Deflater objects are finalized by calling end(). So the preferred way to use Deflater and Inflater is to call the end() method after use in order to avoid native memory leaks.

Exercise 2, Step 4: Modify the Java Program and Run It Again

Stop the Java program by pressing the Enter key in the console that was started in Exercise 2, Step 1.
Use an editor to open the files named Compresser.java and Decompresser.java, which are located in<lab_root>/dtrace_javanative/ exercises/ex2/src/nativememory/compresser.

Add in the Java statements highlighted in bold in Listing 11:

package nativememory.compresser;

import java.util.zip.Deflater;

public class Compresser {
    private Deflater deflater;
    
    public Compresser(){
        deflater = new Deflater();
    }
    
    public byte[] compress(byte[] input) {
        byte[] output = new byte[8000];
        deflater.setInput(input);
        deflater.finish();
        int compressedDataLength = deflater.deflate(output);
        deflater.end();
        byte[] result = new byte[compressedDataLength];
        System.arraycopy(output, 0, result, 0, compressedDataLength);
        return result;
    }
}


package nativememory.compresser;

import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class Decompresser {
    private Inflater inflater;
    
    public Decompresser(){
        inflater = new Inflater();
    }
    
    public byte[] decompress(byte[] input) throws DataFormatException {
        byte[] output = new byte[8000];
        inflater.setInput(input);
        int uncompressedDataLength = inflater.inflate(output);
        inflater.end();
        byte[] result = new byte[uncompressedDataLength];
        System.arraycopy(output, 0, result, 0, uncompressedDataLength);
        return result;
    }
}

Listing 11

Repeat Exercise 2, Step 1 through Step 3 to see the difference.

Exercise 3: Identifying and Solving "Cannot Create Native Threads" Problems (40 Minutes)

One of the common native memory problems of Java EE production systems is OutOfMemoryError: unable to create new native thread, thrown by Java HotSpot VM when it is unable to create new Java threads.

Java HotSpot VM memory is split into three memory spaces:

The Java heap
The PermGen (permanent generation) space
The native memory

These three parts of the memory space compete with each other for the limited amount of memory. If the Java heap and PermGen occupy too much space, there will be little left for the native memory. Please note that the maximum amount of virtual memory space for a 32-bit Java Virtual Machine (JVM) process is 4 GB (perhaps much lower on Windows or Linux).

Java threads require native memory from the OS. If this space is not big enough, the OS will refuse any further native thread and memory allocation and "unable to create new native thread" errors will be thrown. In this exercise, we will use a DTrace script to detect the native memory leaks caused by Java thread creation.

Background Information

The ThreadMemoryWorkingObject Java class shown in Listing 12 does a loop of work accepting a business request and processing it in a running thread.

Refer to<lab_root>/dtrace_javanative/exercises/ex3/src/nativememory/ThreadMemoryWorkingObject.java.

public class ThreadMemoryWorkingObject extends DefaultWorkingObject{

    @Override
    public void loopWork() {
        try {
            BusinessRequest request = RequestAccept.accept();
            RequestProcess.process(request);
            Thread.sleep(200);
        } catch (InterruptedException ex) {
          ex.printStackTrace();
        }
    }
}

Listing 12

The RequestProcess class shown in Listing 13 is a business request processor that processes each request by starting a dedicated thread.

Refer to <lab_root>/exercises/ex2/src/nativememory/threadcreation/RequestProcess.java.

public class RequestProcess {
    static public void process(final BusinessRequest request) {
        new Thread(new Runnable() {
            public void run() {
                System.out.println(request.getRequestString());
                try {
                    Thread.sleep(300000);
                } catch (Exception e) {
                    System.err.println(e);
                }
            }
        }).start();
    }
}

Listing 13

Exercise 3, Step 1: Run a Request Processing Program that Causes Native Memory Leaks

Go to the directory of <lab_root>/dtrace_javanative/exercises/ex3, and execute the ant command, as follows:

ouser@solaris11:~$ cd dtrace_javanative/exercises/ex3
ouser@solaris11:~/dtrace_javanative/exercises/ex3$ ant

The ant command runs a program consisting of the Java classes mentioned in Listing 12 and Listing 13. This program accepts and processes incoming requests. The processing logic is to create a dedicated Java thread for each request, which continuously generates new Java threads and releases them later.

After running for a while (about one minute, depending on your environment), the program might generate the output shown in Listing 14 or something similar.

You can stop this program by pressing the Enter key at any time, and you can restart it by running ant again.

[java] requestID is: 6768
[java] requestID is: 7425
[java] requestID is: 2676
[java] requestID is: 1701
[java] requestID is: 1624
[java] requestID is: 7856
[java] requestID is: 7230
[java] requestID is: 2313
[java] requestID is: 3956
[java] requestID is: 9852
[java] Exception in thread "pool-1-thread-2" java.lang.OutOfMemoryError: unable to create new native thread
[java]     at java.lang.Thread.start0(Native Method)
[java]     at java.lang.Thread.start(Thread.java:693)
[java]     at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
[java]     at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1017)
[java]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
[java]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[java]     at java.lang.Thread.run(Thread.java:724)

Listing 14

From the output in Listing 14, you can see the OutOfMemoryError exception thrown by the JVM. Now, let‘s try to find out what caused the error.

Exercise 3, Step 2: Monitor the Program with JDK Tools

After getting the exception, press the Enter key to stop the program, and then restart it by running ant again. Keep the program running, and use the jps tool to find the process ID for this Java program. You will get the output like Listing 15:
```
ouser@solaris11:~$ jps -v
1839 App -Djava.security.policy=/usr/share/vpanels/java.policy
3641 Jps -Dapplication.home=/home/gary/sf_OOW/jdk1.7.0_25 -Xms8m
3640 OOW -DDescription=OOWDtraceNativeLab_ex3 -Xms128m -Xmx128m -XX:MaxPermSize=128m -XX:PermSize=128m -Xss2m
3638 Launcher -Dant.home=/home/gary/sf_OOW/apache-ant-1.9.2 -Dant.library.dir=/home/gary/sf_OOW/apache-ant-1.9.2/lib
```
Listing 15

By using jps -v, we can find each Java processes with its process ID and startup arguments. From the output in Listing 15, we can see that the process ID of the Java program on my machine is 3640 (it might be different on your machine), because it has a name of OOW with other arguments that can help us to differentiate it from other Java processes.
Use the prstat -p <pid> system command to monitor the memory usage of this process. You will get output similar to Listing 16:
```
PID  USERNAME  SIZE   RSS STATE   PRI NICE   TIME    CPU  PROCESS/NLWP      
3640 ouser    276M   36M sleep    59    0   0:00:39 4.6% java/845
```
Listing 16

From the output in Listing 16, we see that the process with PID 3640 is consuming 276 MB virtual memory and this number is stable. Since such small memory usage should not cause OutOfMemoryError, we need to do further tracing.

Use the jmap -heap <pid> command to monitor the Java heap usage of this program. (Use your process ID to replace <pid>).

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 34603008 (33.0MB)
   used     = 4815664 (4.5925750732421875MB)
   free     = 29787344 (28.407424926757812MB)
   13.916894161339963% used
From Space:
   capacity = 5767168 (5.5MB)
   used     = 303200 (0.289154052734375MB)
   free     = 5463968 (5.210845947265625MB)
   5.2573464133522725% used
To Space:
   capacity = 5767168 (5.5MB)
   used     = 0 (0.0MB)
   free     = 5767168 (5.5MB)
   0.0% used
PS Old Generation
   capacity = 90177536 (86.0MB)
   used     = 0 (0.0MB)
   free     = 90177536 (86.0MB)
   0.0% used
PS Perm Generation
   capacity = 134217728 (128.0MB)
   used     = 1881216 (1.7940673828125MB)
   free     = 132336512 (126.2059326171875MB)
   1.4016151428222656% used

Listing 17

From the output in Listing 17, we see that process with this PID has consumed about 33 MB (Eden) + 5.5 MB (From) +5.5 MB (To) + 86 MB (PS Old Generation) Java heap memory and 128 MB PermGen space, which is very small. There is no reason the program should be hitting the memory limit and causing OutOfMemoryError. Then, what on earth is causing the memory leaks in this Java process?

Exercise 3, Step 3: Write a DTrace Program to Identify the Problem

After getting OutOfMemoryError, from the output of prstat shown in Listing 16, we can see that the values in columnsSIZE and RSS are small, and there are about 845 threads in the process. This cannot explain why we cannot create new native threads.

The memory usage of a process can be categorized into two categories: virtual memory usage and physical memory usage (RSS). The virtual memory size is the amount of virtual address space allocated to the process. The physical memory is the amount of real memory pages allocated to a process.

Run the ps command to display the process‘s virtual and physical memory usage (use your own PID number to replace 3640):
```
ouser@solaris11:~$ ps -eo pid,vsz,rss | grep 3640
 3640 282624 68608
```
Listing 18

The output shown in Listing 18 is the same as with prstat: virtual memory is about 276 MB and RSS is about 67 MB.

We use the following ex3.d script to monitor the behavior of malloc and free for the process at a fixed time interval (refer to <lab_root>/dtrace_javanative/DScripts/ex3.d):

pid$1::malloc:entry
{
        @size["malloc size"] = quantize(arg0);
        @heap["total_size"]  = sum(arg0) ;
        count_time++ ;
}

pid$1::free:entry
{
        count_time--;
}

profile:::tick-2sec
{
        printf("(malloc-free) times is %d\n", count_time);
        count_time =0 ;
        printa(@size) ;
        printa(@heap) ;
        trunc(@size) ;
        trunc(@heap) ;
}

Switch to the root user before trying DTrace commands:

ouser@solaris11:~$ su
Password: 
root@solaris11:~#

Run ex3.d with the PID. You will get output similar to Listing 19:

root@solaris11:~/dtrace_javanative/DScripts# ./ex3.d 3640
Tracing... Hit Ctrl-C to end.
(malloc-free) times is 162

  malloc size                                       
           value  ------------- Distribution ------------- count    
               8 |                                         0        
              16 |@@@@@@@@@@@                              50       
              32 |@@@@                                     20       
              64 |@@                                       10       
             128 |@@@@@@@                                  30       
             256 |@@@@@@@@@                                40       
             512 |@@@@                                     20       
            1024 |@@                                       10       
            2048 |                                         0        


  total_size                                                    60400

Listing 19

The output in Listing 19 shows that the amount of memory used by malloc every two seconds (total_size) is usually about 60 K. Therefore, the program will use about 60 K x 300 = 18 MB of memory for malloc every 10 minutes. So we can say that OutOfMemoryError is not caused by malloc.

Another way to check memory allocation is to use mmap (void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off)). The mmap() function establishes a mapping between a process‘s address space and a file or a shared memory object. The munmap() function does the reverse; it deletes the mapping relationship and releases the memory resources.

So we next want to check the use of mmap and munmap for the process. The following script (ex3.1.d) can do this for you:
```
syscall::mmap:entry
/pid == $1 /
{
   printf("calling mmap, with size %d\n", arg1) ;
}

syscall::munmap:entry
/pid == $1 /
{
   printf("calling munmap(%d)\n",arg1);
}
```
arg1 is the second argument of the mmap system call, which is the memory size.

Run ex3.1.d with <pid>:

root@solaris11:~/DScripts# ./ex3.1.d 3640
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152
calling mmap, with size 2097152

Listing 20

After monitoring this process for a while, we notice in Listing 20 that there are no munmap calls, but there are continuous mmap calls with a size of 2 MB. This might be the cause of OutOfMemoryError.

To verify this, we use the following ex3.2.d script to check the frequency of mmap calls and the mmap call stack every four seconds.

syscall::mmap:entry
/pid == $1/
{
        @mmap_num["number of mmaps"] = count();
        @mmap_size["size of mmaps"]  = quantize(arg1) ;
        @mmap_path[ustack()] = count() ;
}

profile:::tick-4sec
{
        printa(@mmap_num);
        printa(@mmap_size);
        printa(@mmap_path);

        clear(@mmap_num);
        clear(@mmap_size);
        clear(@mmap_path);
}

The quantize aggregate function is used to show the frequency distribution of the memory size.

Run ex3.2.d with <pid>. The output will look like Listing 21:

root@solaris11:~/DScripts# ./ex3.2.d 3640

  number of mmaps                                                  28

  size of mmaps                                     
           value  ------------- Distribution ------------- count    
         1048576 |                                         0        
         2097152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 28       
         4194304 |                                         0        



              libc.so.1`mmap+0x15
              libc.so.1`_thrp_create+0xd2
              libc.so.1`thr_create+0x32
              libjvm.so`__1cCosNcreate_thread6FpnGThread_n0AKThreadType_I_b_+0x29d
              libjvm.so`__1cKJavaThread2t5B6MpFp0pnGThread__vI_v_+0x4c1
              libjvm.so`JVM_StartThread+0x133
              0xfb4af9d0
              0xfb4afcb3
              0xfb403ab9
              0xfb40353b
              0xfb403ab9
              0xfb4003d1

Listing 21

There are 28 mmap calls every four seconds. The size of each call is about 2 MB. According to the user stack, the mmapcalls originated from thr_create, which creates a new OS thread. Thus, when the program creates a new thread, at least 2 MB memory will be mapped to the process‘s address space. Since this is a 32-bit JVM, the maximum address space is 4 GB, including kernel space and other address spaces for code segments, shared libraries, and so on. It seems that this is the cause for OutOfMemoryError.

Next, we will trace the thr_create system call to check how each system call consumes 2 MB of address space.thr_create is defined as follows:

int thr_create(void *stack_base, 
                    size_t stack_size, 
                    void *(*start_func) (void*), 
                    void *arg, 
                    long flags, 
                    thread_t *new_thread_ID);

The following script (ex3.3.d) shows the behavior of the thr_create call for a process by checking the distribution ofstack_size (arg1) of the newly created threads.

#!/usr/sbin/dtrace -qs 

pid$1::thr_create:entry
{
   @thread_num["number of threads"] = count();
   @thread["thread_stack"] = quantize(arg1); 
}
profile:::tick-4sec
{
        printa(@thread_num);
        printa(@thread);
        clear(@thread_num);
        clear(@thread);
}

Run ex3.3.d with <pid>:

root@solaris11:~/DScripts# ./ex3.3.d 3640

  number of threads                                                 184

  thread_stack                                      
           value  ------------- Distribution ------------- count    
         1048576 |                                         0        
         2097152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 184      
         4194304 |                                         0

Listing 22

From the output in Listing 22, we can see that 2 MB is the stack size for every newly created thread. So the program will throw OutOfMemoryError after creating about 900 threads, because the total virtual address space of this Java process will reach the limitation of 4 GB of memory.

So, why didn‘t the RSS and SIZE fields in the output of prstat (Listing 16) reflect the memory usage? Let‘s turn to mmapfor an explanation by using the following script (ex3.4.d) to check the argument of mmap:

syscall::mmap:entry
/pid == $1 /
{
   printf("calling mmap, with flag set to %d\n", arg3) ; 
   /*arg3 is the 4th argument, which is flag*/
    }

Run ex3.4.d with <pid>. The output will be like this:

root@solaris11:~/DScripts/ex3# ./ex3.4.d 3640
calling mmap, with flag set to 322
calling mmap, with flag set to 322
calling mmap, with flag set to 322
calling mmap, with flag set to 322
calling mmap, with flag set to 322
calling mmap, with flag set to 322
^C

By consulting the man page of mmap, we know that the flag value is the bitwise inclusive OR of the options that are defined in </usr/include/sys/mman.h>. That is, the flag value (322) equals MAP_PRIVATE | MAP_ANON | MAP_NORESERVE.

Since each mmap with MAP_NORESERVE will not allocate physical memory, the memory usage cannot be seen from the output of prstat. However, mmap system calls consume the virtual memory space, which is limited (about 4 GB) in a 32-bit process. And this leads to native memory leaks.

About Java Thread Stack Size In a Java application, each thread has its own stack. The stack is used to hold return addresses, function/method call arguments, and so on. As a result, if a thread tends to process larger structures via recursive algorithms, it will need a larger stack. The `-Xss` JVM option is used to define the limit of the Java thread stack size, which also indirectly limits how deep a stack can get. When a Java thread is created, the defined size of virtual memory is reserved; that is, the address space is reserved. The actual memory is allocated in pages and they are allocated only when used. If the `-Xss` option is not defined, the default size will be used. This default size is platform-specific: Oracle Solaris SPARC 32-bit JVM: 512 KB Oracle Solaris SPARC 64-bit JVM: 1024 KB Oracle Solaris x86 32-bit JVM: 320 KB Linux AMD64 64-bit JVM: 1024 KB In this exercise, we defined the stack size to be 2 MB. Please refer to the build file`<lab_root>/dtrace_javanative/exercises/ ex3/build.xml`, where you will find that this runtime parameter is defined as follows: `<jvmarg value="-Xss2m"/>` Please feel free to modify it to see the different effects.

About Java Thread Stack Size
In a Java application, each thread has its own stack. The stack is used to hold return addresses, function/method call arguments, and so on. As a result, if a thread tends to process larger structures via recursive algorithms, it will need a larger stack. The -Xss JVM option is used to define the limit of the Java thread stack size, which also indirectly limits how deep a stack can get.

When a Java thread is created, the defined size of virtual memory is reserved; that is, the address space is reserved. The actual memory is allocated in pages and they are allocated only when used.

If the -Xss option is not defined, the default size will be used. This default size is platform-specific:

Oracle Solaris SPARC 32-bit JVM: 512 KB
Oracle Solaris SPARC 64-bit JVM: 1024 KB
Oracle Solaris x86 32-bit JVM: 320 KB
Linux AMD64 64-bit JVM: 1024 KB

In this exercise, we defined the stack size to be 2 MB. Please refer to the build file<lab_root>/dtrace_javanative/exercises/ ex3/build.xml, where you will find that this runtime parameter is defined as follows:

<jvmarg value="-Xss2m"/>

Please feel free to modify it to see the different effects.

Exercise 3, Step 4: Modify the Program and Run It Again

Stop the program by pressing Enter in the console that was started in Exercise 3, Step 1.
Use an editor to open the file<lab_root>/dtrace_javanative/exercises/ex3/src/nativememory/threadcreation/RequestProcess.java.

Modify the code line highlighted in bold in Listing 23. This code will change the sleep time from 30 seconds to 30 ms, which will release the thread resource more frequently.

public class RequestProcess {
    static public void process(final BusinessRequest request) {
        new Thread(new Runnable() {
            public void run() {
                System.out.println(request.getRequestString());
                try {
                    Thread.sleep(30);
                } catch (Exception e) {
                    System.err.println(e);
                }
            }
        }).start();

    }
}

Listing 23

Repeat Exercise 3, Step 1 through Exercise 3, Step 3 to see the difference.

Exercise 4: Identifying and Solving "Direct ByteBuffer Memory Leak" Problems (40 Minutes)

In Java, almost all the objects exist in the heap. Usually, the Java heap is managed by the garbage collector. Automatic garbage collector makes Java one of the most productive languages. However, automatic garbage collection also causes unpredictable halt time for Java processes; even the latest garbage collection algorithm will pause the JVM for a relatively long time (varying from several seconds to tens of seconds) if more than a 10 GB heap size is used. Therefore, even though many servers are equipped with dozens of gigabytes (even terabytes) of RAM, we still prefer to use only a few gigabytes of them because of the possible garbage collection performance hits.

With the introduction of the direct ByteBuffer in Java‘s new I/O (NIO) package, memory can be allocated without being tracked by the garbage collector, because it can be available to native code for tasks such as avoiding the copying of data to and from the kernel for I/O. Java HotSpot VM has a "back door" that provides a number of low-level operations for manipulating memory directly. This back door—sun.misc.Unsafe—is widely used by the JDK itself in packages such asjava.nio and java.util.concurrent. Unsafe allows us to allocate and deallocate memory explicitly via theallocateMemory and freeMemory methods. Allocated memory is not under management by GC and is not limited to the maximum JVM heap size.

Functionalities based on NIO‘s off-heap buffers have been widely used among popular products such as Oracle Coherence, Terracotta BigMemory, and Apache DirectMemory. Obviously, the drawback of this approach is that you have to take responsibility for memory management yourself, which might lead to native memory leaks.

In this exercise, we will use a DTrace script to detect the native memory leaks caused by "NIO DirectBuffer" management.

Background Information

The following Java class (QueueWriterWorkingObject) runs a thread that does a loop of work for creating a data structure and putting it into a queue. The queue data is stored in DirectBuffer by writing the data structure into the location pointed to by the memory address. (Refer to<lab_root>/dtrace_javanative/exercises/ex4/src/nativememory/QueueWriterWorkingObject.java.)

public class QueueWriterWorkingObject extends DefaultWorkingObject{

    public static long idIncreatment = 0;
    public static QueueDataAccess qda = new QueueDataAccess();

    @Override
    public void loopWork() {
        try {
            qda.setObjectOffset(NativeMemoryManager.createOneMessageSpace());
            qda.setMessageId(idIncreatment);
            qda.setMessageType(‘B‘);
            qda.setReceiverId(idIncreatment+100);
            qda.setSenderId(idIncreatment+300);
            qda.setMessage("hello,OOW");
            idIncreatment++;
            Thread.sleep(20);
        } catch (InterruptedException ex) {
          ex.printStackTrace();
        }
    }

}

The following Java class (QueueReaderWorkingObject) runs a thread that does a loop of work for consuming messages from a queue. The queue message is read from DirectBuffer by locating the memory address of the data structure. (Refer to<lab_root>/dtrace_javanative/exercises/ex4/src/nativememory/QueueReaderWorkingObject.java.)

public class QueueReaderWorkingObject extends DefaultWorkingObject{
 
    public static long idIncreatment = 0;
    public static QueueDataAccess qda = new QueueDataAccess();

    @Override
    public void loopWork() {
        try {
            Thread.sleep(20);
            long offset = NativeMemoryManager.getMessageAddress(idIncreatment);
            if (offset >= 0) {
                qda.setObjectOffset(NativeMemoryManager.getMessageAddress(idIncreatment));
                long messageid = qda.getMessageId();
                char messageType = qda.getMessageType();
                long receiverid = qda.getReceiverId();
                long senderid = qda.getSenderId();
                byte[] message = qda.getMessage();
                idIncreatment++;
                System.out.println("we got messageid: " + messageid + " with type " + messageType 
                  + " from user "+ senderid + " to user " + receiverid + " with message: " 
                  + new String(message));
                NativeMemoryManager.destroyMessage(idIncreatment);
            } else {
                Thread.sleep(10);
            }
            
            
        } catch (InterruptedException ex) {
          ex.printStackTrace();
        }
    }

}

Exercise 4, Step 1: Run a Program that Produces and Consumes Queue Messages and Causes Native Memory Leaks

Go to the directory <lab_root>/dtrace_javanative/exercises/ex4, and execute the ant command, as follows:

ouser@solaris11:~$ cd dtrace_javanative/exercises/ex4
ouser@solaris11:~/dtrace_javanative/exercises/ex4$ ant

The ant command runs a program that prints the queue message reader‘s log information. After running for a while, the program might produce output similar to the following. You can stop this program by pressing the Enter key at any time, and you can restart it by running ant again.

[java] we got messageid: 0 with type B from user 300 to user 100 with message: hello,OOW
[java] we got messageid: 1 with type B from user 301 to user 101 with message: hello,OOW
[java] we got messageid: 2 with type B from user 302 to user 102 with message: hello,OOW
[java] we got messageid: 3 with type B from user 303 to user 103 with message: hello,OOW
[java] we got messageid: 4 with type B from user 304 to user 104 with message: hello,OOW
[java] we got messageid: 5 with type B from user 305 to user 105 with message: hello,OOW
[java] we got messageid: 6 with type B from user 306 to user 106 with message: hello,OOW
[java] we got messageid: 7 with type B from user 307 to user 107 with message: hello,OOW
[java] we got messageid: 8 with type B from user 308 to user 108 with message: hello,OOW
[java] we got messageid: 9 with type B from user 309 to user 109 with message: hello,OOW
[java] we got messageid: 10 with type B from user 310 to user 110 with message: hello,OOW
[java] we got messageid: 11 with type B from user 311 to user 111 with message: hello,OOW
[java] we got messageid: 12 with type B from user 312 to user 112 with message: hello,OOW
[java] we got messageid: 13 with type B from user 313 to user 113 with message: hello,OOW

Exercise 4, Step 2: Monitor the Program with JDK Tools

While running the program started in Exercise 4, Step 1, use the jps tool to find the PID of the program. You should get output similar to Listing 24:
```
ouser@solaris11:~$ jps -v
1839 App -Djava.security.policy=/usr/share/vpanels/java.policy
3952 Launcher -Dant.home=/home/gary/sf_OOW/apache-ant-1.9.2 -Dant.library.dir=/home/gary/sf_OOW/apache-ant-1.9.2/lib
3955 Jps -Dapplication.home=/home/gary/sf_OOW/jdk1.7.0_25 -Xms8m
3954 OOW -DDescription=OOWDtraceNativeLab_ex4 -Xms128m -Xmx128m -XX:+ExtendedDTraceProbes
```
Listing 24

By using jps -v, we can find each Java process with its PID and startup arguments. From the output in Listing 24, we can tell that the PID of the Java program on my machine is 3954 (it might be different on your machine), because it has a name of OOW with other arguments.
Use the prstat -p <pid> system command to monitor the memory usage of this process, as shown in Listing 25.
```
 PID USERNAME  SIZE    RSS STATE   PRI NICE      TIME  CPU PROCESS/NLWP      
3954 ouser      201M   29M sleep    59    0   0:00:03 2.5% java/12
```
Listing 25

From the output shown in Listing 25, we can see that the process with PID 3954 is consuming about 201 MB of virtual memory and this number is still increasing.

Use the jmap -heap <pid> command to monitor the Java heap usage of this program (you can use your PID to replace <pid>).

Heap Usage:
PS Young Generation
Eden Space:
   capacity = 34630848 (33.3125MB)
   used     = 4848624 (4.564041137695312MB)
   free     = 29702224 (28.34845886230469MB)
   14.000426185098354% used
From Space:
   capacity = 5762896 (5.5MB)
   used     = 0 (0.0MB)
   free     = 5762896 (5.5MB)
   0.0% used
To Space:
   capacity = 5762896 (5.5MB)
   used     = 0 (0.0MB)
   free     = 5762896 (5.5MB)
   0.0% used
tenured generation:
   capacity = 90178816 (86.0MB)
   used     = 0 (0.0MB)
   free     = 90178816 (86.0MB)
   0.0% used
Perm Generation:
   capacity = 12582912 (12.0MB)
   used     = 1869608 (1.7829971313476562MB)
   free     = 10713304 (10.217002868652344MB)
   14.858309427897135% used

760 interned Strings occupying 47136 bytes.

Listing 26

From the output in Listing 26, we see that the process with PID 3954 had consumed about 128 MB of heap memory and 12 MB of PermGen space, which is less than the result of the prstat output shown in Listing 25. So there must be some native memory leaks in this Java process.

Exercise 4, Step 3: Use DTrace to Trace the Problem

Now we will explore how to use the Java HotSpot VM provider to analyze a Java problem and quickly locate the root cause in the Java source code.

In the output of prstat shown in Listing 25, the values in both the SIZE and RSS fields are increasing. Therefore, we need to observe the malloc and mmapcalls of this Java process by using ex4.d:

pid$1::malloc:entry
{
   printf("calling malloc(%d) %d times\n",arg0,malloc++);
}

pid$1::free:entry
{
   printf("calling free %d times\n",free++);
}

syscall::mmap:entry,
syscall::munmap:entry
/pid == $1/
{
   printf("calling %s with size %d\n",probefunc, arg1) ;
}

Switch to the root user before trying any DTrace commands:
```
ouser@solaris11:~$ su
Password: 
root@solaris11:~#
```

Run ex4.d with the PID. You will get output similar to Listing 27:

root@solaris11:~/dtrace_javanative/DScripts# ./ex4.d 3954
Tracing... Hit Ctrl-C to end.
calling malloc(2003000) 0 times
calling malloc(2003000) 1 times
calling malloc(2003000) 2 times
calling malloc(2003000) 3 times
calling malloc(2003000) 4 times
calling malloc(2003000) 5 times
calling malloc(2003000) 6 times
calling malloc(2003000) 7 times
calling malloc(2003000) 8 times
calling malloc(2003000) 9 times
calling malloc(2003000) 10 times
^C

Listing 27

As you can see in Listing 27, there is no mmap call. This process uses only malloc for memory allocation with a fixed size of about 2 MB and without calling free. Therefore, it is the malloc calls that continuously eat up memory.

Now we will use the ustack function to check the user-level stack when malloc is being called. Refer to the following script (ex4.1.d):
```
pid$1:libc:malloc:entry
{
   @stack_count[ustack(),arg0] = count() ;
}
```
Run ex4.1.d with the PID. You will get the output similar to Listing 28:
```
root@solaris11:~/dtrace_javanative/DScripts# ./ex4.1.d 3954
^C


              libc.so.1`malloc
              libjvm.so`__1cCosGmalloc6FI_pv_+0x2d
              libjvm.so`Unsafe_AllocateMemory+0x17e
              0xfa20ab12
              0xfa262960
          2003000                12
```
Listing 28

As you can see in Listing 28, all of the malloc calls have a size of about 2 MB and occurred 12 times during the running time of this script. And all of these malloc calls come from Unsafe_AllocateMemory. Therefore, we can easily locate the root cause in the Java source code.

However, the real world is not that simple. Usually there are lots of malloc calls from different segments of source code. As a result, the output of running ex4.1.d will be unreadable and it will be difficult for us to locate the cause.
Actually we have a better choice—to use the DTrace hotspot provider—which provides probes to track the lifespan of the VM, thread start and stop events, GC memory pool statistics, method compilations, and monitor activity. With a startup flag, additional probes—such as method-entry probes and method-return probes—are enabled that can be used to monitor the running Java program and object allocations.

Now we will use method-entry and method-return probes for tracking this Java process. These probes can provide a fine-grained examination of Java thread execution. They will fire any time a method is entered or returned. The method-entry and method-return probes take the following arguments:
- args[0]: The Java thread ID of the thread that is entering or leaving the method
- args[1]: A pointer to UTF-8 string data that contains the name of the class of the method
- args[2]: The length of the class name data (in bytes)
- args[3]: A pointer to UTF-8 string data that contains the name of the method
- args[4]: The length of the method name data (in bytes)
- args[5]: A pointer to UTF-8 string data that contains the signature of the method
- args[6]: The length of the signature (in bytes)
The following script (ex4.2.d) uses the probes mentioned above to find out which classes and methods makemalloc calls.
```
hotspot$1:::method-entry
/ stringof(copyin(arg1,6) )== "native"/
{
        self->class = stringof(copyin(arg1, arg2 + 1));
        self->class[arg2] = ‘\0‘;
        self->method = stringof(copyin(arg3, arg4 + 1));
        self->method[arg4] = ‘\0‘;
}

hotspot$1:::method-return
/ self->class != NULL /
{
   self->class = 0 ;
   self->method = 0 ;
}

pid$1::malloc:entry
/ self->class != NULL /
{
   printf("calling malloc(%d) in %s.%s\n",arg0,self->class,self->method);
   @malloc_size[self->class,self->method] = sum(arg0) ;
   @malloc_dist[self->class,self->method] = quantize(arg0) ;
}

dtrace:::END
{
        printf("\nJava malloc byte distributions by class and method,\n\n");
        printa("   %s, %s, bytes total = %@d %@d\n", @malloc_size,
            @malloc_dist) ;
}
```
We added / stringof(copyin(arg1,6) )== "native"/ in the predicate field in order to focus on the classes of interest without being distracted by other irrelevant pieces of code.

Run ex4.2.d with the PID. You will get the output similar to Listing 29:

root@solaris11:~/dtrace_javanative/DScripts# ./ex4.2.d 3954
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
calling malloc(2003000) in nativememory/DataAccess/NativeMemoryManager.createOneMessageSpace
^C

Java malloc byte distributions by class and method,

   nativememory/DataAccess/NativeMemoryManager, createOneMessageSpace, bytes total = 26039000 
           value  ------------- Distribution ------------- count    
          524288 |                                         0        
         1048576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 13       
         2097152 |                                         0

Listing 29

As you can see in Listing 29, all of the malloc calls are called by the createOneMessageSpace method of theNativeMemoryManager class.

Exercise 4, Step 4: Modify the Program and Run It Again

Stop the program by pressing the Enter key in the console that was started in Exercise 4, Step 1.
Use an editor to open the file<lab_root>/dtrace_javanative/exercises/ex4/src/nativememory/DataAccess/NativeMemoryManager.java.

In this file, we can find that NativeMemoryManager allocates one page of memory space for every 100 records.

Add the code lines highlighted in bold in Listing 30. This code will free a native memory page if all of the 100 records in it are consumed.

public class NativeMemoryManager {
    
    public static int recordPerPage = 100;
    public static volatile long nextMessageid;
    public static long[] memoryPagesOffset = new long[1000];
    public static final long recordsize = QueueDataAccess.getObjectSize();
    public static final long requiredHeap = recordPerPage * QueueDataAccess.getObjectSize();
    
    
    
    public static long getMessageAddress(final long messageid){
        if (messageid>nextMessageid-2) return -1;
        long reminder = messageid%recordPerPage;
        int index = (int) ((messageid-reminder)/recordPerPage);
        return memoryPagesOffset[index]+reminder*recordsize;
    }
    
    
    synchronized public static long createOneMessageSpace(){
        long reminder = nextMessageid%recordPerPage;
        int index = (int) ((nextMessageid-reminder)/recordPerPage);
        if(reminder ==0) {
            long address = Util.unsafe.allocateMemory(requiredHeap);
            memoryPagesOffset[index]=address;
        }
        
        nextMessageid++;
        return memoryPagesOffset[index]+reminder*recordsize;
        
    }
    
    public static void destroyMessage(final long messageid) {
        long reminder = messageid%recordPerPage;
        int index = (int) ((messageid-reminder)/recordPerPage);
        if (reminder == 0 && index >1) Util.unsafe.freeMemory(memoryPagesOffset[index-1]);
        
    }
    
}

Listing 30

Repeat the process from Exercise 4, Step 1 through Exercise 4, Step 3 to see the difference.

Summary

You have successfully completed the "How to Use DTrace to Troubleshoot Java Native Memory Problems on Oracle Solaris 11" hands-on lab! Starting by learning basic concepts, we used a set of powerful DTrace scripts to solve different Java native memory problems. The usage of predicates, aggregations, and the pid, profile, and hotspot providers were also demonstrated.

About the Authors

Wang Yu presently works for Oracle‘s ISV Engineering group as an Oracle Solaris (Java) technology engineer and technology architecture consultant. His duties include supporting local independent software vendors (ISVs) and evangelizing about and consulting on important Oracle Solaris and Java technologies.

Wensheng (Vincent) Liu is a technical consultant in Oracle‘s ISV Engineering group. His responsibility includes providing technical support to local ISVs and evangelizing about and consulting on important Oracle Solaris and system-related technologies.

Gang (Gary) Wang, manager of Oracle‘s ISV Engineering group, leads ISV Engineering teams in Beijing helping ISVs in China, Japan, and Korea to make their applications run best on Oracle Solaris, Oracle servers, and Oracle storage.

Revision 1.0, 02/25/2014

Revision 1.0, 02/25/2014

DTrace to Troubleshoot Java Native Memory Problems

标签：

原文地址：http://www.cnblogs.com/zengkefu/p/5632990.html

踩

(0)

评论一句话评论（0）

分享档案

更多>

2021年07月29日 (22)
2021年07月28日 (40)
2021年07月27日 (32)
2021年07月26日 (79)
2021年07月23日 (29)
2021年07月22日 (30)
2021年07月21日 (42)
2021年07月20日 (16)
2021年07月19日 (90)
2021年07月16日 (35)

周排行