Debugging the Unthinkable: JVM Crash Analysis with gdb

When a Java application crashes with a segmentation fault or fatal error, it presents one of the most challenging scenarios a developer can face. The JVM itself, normally a rock-solid foundation, has encountered something it cannot handle. In these situations, traditional Java debugging tools are insufficient, and you need to dive deeper with native debugging tools such as gdb (the GNU Debugger).

This article explores how to analyze JVM crashes using gdb, from capturing crash dumps to interpreting core files and extracting meaningful information.


When Does the JVM Crash?

The JVM is a complex native application written in C/C++. It can crash due to:

  1. Native Memory Corruption: Bugs in JNI code or native libraries
  2. JVM Bugs: Rare defects in the JVM itself (more likely in early builds of a new release)
  3. System Resource Exhaustion: Running out of memory, file descriptors, etc.
  4. Hardware Issues: Faulty memory, CPU problems, or disk errors
  5. Operating System Bugs: Kernel-level issues affecting the JVM

Common symptoms include:

  • Segmentation fault (SIGSEGV)
  • Bus error (SIGBUS)
  • Fatal error logs with hs_err_pid files
  • Abrupt process termination without stack traces

Prerequisites for JVM Crash Analysis

Essential Tools:

# On Ubuntu/Debian
sudo apt-get install gdb openjdk-17-dbg
# On RHEL/CentOS
sudo yum install gdb java-17-openjdk-debuginfo
# On Amazon Linux 2023
sudo dnf install gdb java-17-openjdk-debuginfo

Key Components:

  • gdb: The GNU Debugger for analyzing core dumps and live processes
  • Debug Symbols: JVM debug packages (openjdk-XX-dbg or java-XX-openjdk-debuginfo); see the quick check below
  • Core Dump Configuration: Proper system setup for core dump generation
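
A quick way to confirm that gdb can actually resolve HotSpot symbols once the debug packages are installed (the libjvm.so path is illustrative and varies by distribution):

# With debug symbols installed, this prints the function with file and line
# instead of matching nothing (adjust the path for your distro and JDK)
gdb -batch -ex 'info functions Threads::create_vm' \
    /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so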

Configuring the System for Crash Analysis

Enable Core Dumps:

# Check current limits
ulimit -a
# Enable unlimited core dumps (current session)
ulimit -c unlimited
# Permanent configuration
echo "ulimit -c unlimited" >> ~/.bashrc
echo "/tmp/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern
# For systemd services, add to service file:
# [Service]
# LimitCORE=infinity
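
A throwaway process verifies the whole chain before you depend on it for the JVM (this assumes the ulimit above is in effect for the current shell):

# Force a segfault in a disposable process, then look for the core file
sleep 60 &
kill -SEGV $!
ls -l /tmp/core.sleep.*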

JVM Crash Dump Options:

# Turn OutOfMemoryError into a fatal error (hs_err log plus core dump)
java -XX:+CrashOnOutOfMemoryError -jar app.jar
# Write a core dump on any fatal error (on by default; shown for clarity)
java -XX:+CreateCoredumpOnCrash -jar app.jar
# Explicit location for the fatal-error log (the hs_err file)
java -XX:ErrorFile=/var/log/hs_err_pid%p.log -jar app.jar
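
These options compose; a launch line prepared for post-mortem analysis might look like this (the log path and app.jar are illustrative):

java -XX:+CrashOnOutOfMemoryError \
     -XX:+CreateCoredumpOnCrash \
     -XX:ErrorFile=/var/log/java/hs_err_pid%p.log \
     -jar app.jar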

Basic gdb Commands for JVM Analysis

Starting gdb with a Core Dump:

gdb /usr/bin/java core.1234
# or
gdb --core=core.1234 /usr/bin/java

Essential gdb Commands:

(gdb) bt                     # Backtrace - most important first command
(gdb) bt full               # Detailed backtrace with local variables
(gdb) info threads          # List all threads
(gdb) thread apply all bt   # Backtrace for all threads
(gdb) info registers        # Show CPU registers
(gdb) x/10i $pc            # Disassemble instructions at program counter
(gdb) print expr           # Print variable value
(gdb) where                # Current stack trace
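
The same commands run unattended with batch mode, which exits once the command list finishes:

gdb -batch -ex "thread apply all bt" /usr/bin/java core.1234 > all_threads.txt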

Step-by-Step Crash Analysis Workflow

Scenario 1: Live Process Crashed with Core Dump

# 1. Find the core dump
find / -name "core.*" 2>/dev/null   # matches core.java.<pid> from the pattern above
# 2. Load core dump with debug symbols
gdb /usr/lib/jvm/java-17-openjdk/bin/java core.java.1234
# 3. Get comprehensive thread information
(gdb) info threads
(gdb) thread apply all bt full
# 4. Focus on the faulting thread (gdb usually selects it when loading
#    the core; switch with "thread N" if it is not the current thread)
(gdb) thread 1
(gdb) bt full

Scenario 2: Analyzing a Running JVM

# Attach to running JVM process
sudo gdb -p 1234
# Get thread information
(gdb) info threads
(gdb) thread apply all bt
# Detach without killing process
(gdb) detach
(gdb) quit
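
To capture a core from a live JVM without waiting for a crash, gcore (shipped with gdb) writes one from a running process, pausing it only while the dump is written:

sudo gcore -o /tmp/java-snapshot 1234
# Analyze offline later: gdb /usr/bin/java /tmp/java-snapshot.1234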

Interpreting JVM Crash Signatures

Common Crash Patterns:

1. SIGSEGV in Native Code:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f8a5b7fe700 (LWP 12345)]
0x00007f8a4a3b2150 in SomeNativeFunction () from /path/to/libnative.so
(gdb) bt
#0  0x00007f8a4a3b2150 in SomeNativeFunction () from /path/to/libnative.so
#1  0x00007f8a5a1c3e20 in Java_com_example_NativeClass_nativeMethod ()
#2  0x00007f8a6b2a1c40 in ?? ()

2. JVM Internal Crash:

Program received signal SIGILL, Illegal instruction.
0x00007f8a6a5c3d10 in VM_Version::get_processor_features() ()
(gdb) bt
#0  0x00007f8a6a5c3d10 in VM_Version::get_processor_features() ()
#1  0x00007f8a6a5c1a20 in VM_Version::initialize() ()

Advanced JVM-Specific gdb Commands

JVM Debug Symbols Commands:

# Ensure debug symbols are loaded
(gdb) info sharedlibrary
(gdb) set debug-file-directory /usr/lib/debug
# JVM-specific debugging (requires debug symbols)
(gdb) p *thread
(gdb) p *this

Examining JVM Memory:

(gdb) info proc mappings        # Show memory map
(gdb) x/100x 0x7f8a5a000000    # Examine memory at address
(gdb) x/10s 0x7f8a5a123456     # Examine as strings
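
Suspicious regions can also be saved for offline inspection; the addresses below are placeholders taken from info proc mappings:

(gdb) dump binary memory /tmp/region.bin 0x7f8a5a000000 0x7f8a5a010000
# Inspect outside gdb, e.g.: xxd /tmp/region.bin | less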

Real-World Crash Analysis Examples

Example 1: JNI Code Crash

# Core dump shows crash in JNI code
gdb /usr/bin/java core.1234
(gdb) bt
#0  0x00007f345a2b1150 in process_buffer (env=0x7f3444007890,
    obj=0x7f3444012ab0, buffer=0x0, len=1024) at jni_native.c:45
#1  0x00007f345a2b12a0 in Java_com_myapp_NativeProcessor_process
    (env=0x7f3444007890, obj=0x7f3444012ab0, buffer=0x0, len=1024)
    at jni_native.c:89
(gdb) frame 0
(gdb) print buffer
$1 = (unsigned char *) 0x0  # NULL pointer dereference!

Analysis: The JNI code is trying to use a NULL buffer pointer, causing SIGSEGV.
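
With the faulting function identified, a conditional breakpoint on a live process can catch the bad call before the crash (process_buffer and buffer are the names from the backtrace above):

sudo gdb -p 1234
(gdb) break process_buffer if buffer == 0
(gdb) continue
# When it fires, walk up the stack to find who passed the NULL
(gdb) up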

Example 2: Heap Corruption

(gdb) bt
#0  0x00007f8e1a4c9d50 in G1ParScanThreadState::copy_to_survivor_space(
oopDesc*, markWord, oopDesc*) ()
#1  0x00007f8e1a4c8b20 in G1ParScanThreadState::trim_queue() ()
#2  0x00007f8e1a4c7e10 in G1ParScanThreadState::steal() ()

Analysis: Crash during garbage collection, possibly due to heap corruption from native code.
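
When GC crashes point to heap corruption, HotSpot's heap verification can move the failure closer to its cause. These are diagnostic flags with significant overhead, so use them in a reproduction environment rather than production:

java -XX:+UnlockDiagnosticVMOptions \
     -XX:+VerifyBeforeGC -XX:+VerifyAfterGC \
     -jar app.jar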


Using gdb with hs_err_pid Files

The JVM generates hs_err_pid<pid>.log files containing valuable information:

# Extract key information from hs_err file
grep -A 10 -B 5 "Problematic frame" hs_err_pid12345.log
grep "CURRENT_THREAD" hs_err_pid12345.log
grep "Stack:" hs_err_pid12345.log -A 20

Correlate with gdb:

# Extract the faulting pc from the hs_err header line ("... at pc=0x..., pid=...")
CRASH_ADDR=$(grep -oE 'pc=0x[0-9a-f]+' hs_err_pid12345.log | head -1 | cut -d= -f2)
# Pass it in via -ex: shell variables are not expanded at the (gdb) prompt
gdb -batch -ex "x/10i $CRASH_ADDR" --core=core.12345 /usr/bin/java

Automated Crash Analysis Script

Create a script for consistent crash analysis:

#!/bin/bash
# analyze_crash.sh - summarize a JVM core dump
CORE_DUMP="$1"
PID=$(echo "$CORE_DUMP" | grep -o '[0-9]\+' | head -1)
echo "=== JVM Crash Analysis Report ==="
echo "Core dump: $CORE_DUMP"
echo "PID: $PID"
echo
# Look for the matching hs_err file
HS_ERR_FILE="hs_err_pid${PID}.log"
if [ -f "$HS_ERR_FILE" ]; then
    echo "Found hs_err file: $HS_ERR_FILE"
    grep "Problematic frame" "$HS_ERR_FILE"
    echo
fi
# Extract backtraces from the core dump (-batch exits when done)
gdb -batch -ex "thread apply all bt full" \
    /usr/bin/java "$CORE_DUMP" 2>/dev/null | head -100
echo "=== End of Report ==="
Usage:

chmod +x analyze_crash.sh
./analyze_crash.sh core.1234

Best Practices for JVM Crash Analysis

  1. Always Install Debug Symbols:

   # Match the debug symbols to the exact JDK version
   java -version
   sudo apt-get install openjdk-17-dbg

  2. Configure Core Dumps Proactively:

   # Add to JVM startup options
   -XX:+CrashOnOutOfMemoryError
   -XX:ErrorFile=/var/log/java/hs_err_pid%p.log
   -XX:OnError="gdb -batch -ex 'thread apply all bt' -ex 'quit' /usr/bin/java %p"

  3. Preserve Evidence:

   # Archive all crash artifacts
   tar czf crash_analysis_$(date +%Y%m%d_%H%M%S).tar.gz \
       core.* hs_err_pid*.log /path/to/app.jar

  4. Reproduce in Development:

   • Use the same JDK version and build
   • Use the same system configuration
   • Use the same application version and data

Common Solutions to JVM Crashes

JNI-Related Crashes:

  • Validate all native method parameters (see the -Xcheck:jni example below)
  • Use JNI_ABORT for read-only buffers
  • Check for memory leaks in native code
  • Verify pointer validity before dereferencing
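
The JVM can enforce much of this list at runtime: -Xcheck:jni validates JNI arguments and reports violations instead of letting them corrupt the process. It is slow, so reserve it for test runs:

java -Xcheck:jni -jar app.jar
# Violations appear as warnings, e.g.:
# WARNING in native method: JNI call made with exception pending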

Memory-Related Crashes:

  • Monitor native memory usage with NMT (example below)
  • Use -XX:MaxDirectMemorySize to limit direct buffers
  • Check for native memory leaks in third-party libraries
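
NMT is enabled with a startup flag and queried with jcmd, both part of the JDK; replace <pid> with the JVM's process id:

java -XX:NativeMemoryTracking=summary -jar app.jar &
jcmd <pid> VM.native_memory summary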

Garbage Collection Crashes:

  • Try different GC algorithms (-XX:+UseG1GC)
  • Reduce heap size if experiencing memory fragmentation
  • Update to latest JVM patch release

Conclusion

JVM crash analysis with gdb is a critical skill for Java developers and operators dealing with complex applications, especially those using JNI, native libraries, or running under heavy load. By mastering these techniques, you can:

  1. Quickly Identify Root Causes: From core dumps and crash logs
  2. Reduce Mean Time to Resolution (MTTR): With systematic analysis approaches
  3. Improve Application Stability: By identifying and fixing underlying issues
  4. Communicate Effectively: Provide detailed crash reports to library vendors or JVM teams

Remember that while gdb provides low-level insights, the best solution is often to prevent crashes through proper coding practices, comprehensive testing, and proactive monitoring of both Java and native components.
