Mastering Flame Graphs: Visualizing Performance with Async-Profiler in Java

Flame graphs have revolutionized how developers analyze performance in Java applications. When combined with Async-Profiler, they create a powerful toolkit for identifying bottlenecks and optimizing code. This comprehensive guide explores how to generate and interpret flame graphs to uncover performance insights.

What are Flame Graphs?

Flame graphs are visual representations of stack traces that show the frequency of code execution across your application. Created by Brendan Gregg, they provide an intuitive way to understand where your application spends most of its CPU time or where memory allocation occurs.

Key Characteristics:

  • Width represents the frequency of stack traces
  • Height shows the depth of call stacks
  • Color typically indicates stack type (Java, C++, Kernel)
  • Each rectangle represents a stack frame
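
To make the width rule concrete, here is a minimal Java sketch. It assumes samples in the collapsed stack format (semicolon-separated frames, root first, followed by a sample count), which async-profiler can emit with `-o collapsed`. The class name `FrameWidths` and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts how many samples contain each frame.
class FrameWidths {

    // Returns, per frame name, the number of samples whose stack contains it.
    static Map<String, Long> inclusiveSamples(List<String> collapsedLines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : collapsedLines) {
            int split = line.lastIndexOf(' ');
            long samples = Long.parseLong(line.substring(split + 1));
            // Count each frame once per stack, even if it recurses.
            Set<String> frames =
                    new LinkedHashSet<>(Arrays.asList(line.substring(0, split).split(";")));
            for (String frame : frames) {
                counts.merge(frame, samples, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative data only, not output from a real profile.
        List<String> samples = Arrays.asList(
                "Thread.run;Service.process;Database.query 60",
                "Thread.run;Service.process;Json.serialize 30",
                "Thread.run;Housekeeping.tick 10");
        long total = 100;
        inclusiveSamples(samples).forEach((frame, n) ->
                System.out.printf("%-20s %3d%% of width%n", frame, 100 * n / total));
    }
}
```

A real flame graph draws one rectangle per frame per call path, so this by-name aggregation is a simplification. It still shows why a frame present in most samples ends up nearly as wide as the graph.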

Setting Up Async-Profiler

Installation

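# Download async-profiler 2.9 (newer releases are listed on the project's GitHub releases page)
# The commands in this guide use the 2.x launcher, profiler.sh; 3.x replaces it with bin/asprof
wget https://github.com/async-profiler/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz
tar -xzf async-profiler-2.9-linux-x64.tar.gz
cd async-profiler-2.9-linux-x64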

Basic Usage

# Profile CPU for 60 seconds and generate a flame graph
# (async-profiler 2.x writes flame graphs as HTML; SVG output was dropped in 2.0)
./profiler.sh -d 60 -f /tmp/flamegraph.html <pid>
# Profile allocations
./profiler.sh -d 60 -e alloc -f /tmp/alloc-flamegraph.html <pid>

Generating Flame Graphs: Common Scenarios

1. CPU Profiling

# Simple CPU profiling
./profiler.sh -d 30 -f cpu-profile.html <pid>
# More detail: split the profile by thread
./profiler.sh -d 60 -t -f detailed-cpu.html <pid>
# Finer sampling (1ms interval); kernel frames are included by default when perf_events access is permitted
./profiler.sh -d 30 -e cpu -i 1ms -f cpu-kernel.html <pid>
# User-mode stacks only (--all-user drops kernel frames)
./profiler.sh -d 30 -e cpu --all-user -f user-stacks.html <pid>

2. Memory Allocation Profiling

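# Allocation profiling
./profiler.sh -d 60 -e alloc -f alloc-flamegraph.html <pid>
# Live objects only: --live is a modifier for the alloc event, not an event of its own
./profiler.sh -d 60 -e alloc --live -f live-objects.html <pid>
# Weight frames by total bytes allocated instead of sample count
./profiler.sh -d 60 -e alloc --total -f heap-usage.html <pid>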

3. Advanced Profiling Scenarios

# Profile lock contention
./profiler.sh -d 30 -e lock -f lock-contention.html <pid>
# Wall-clock profiling (measures elapsed time, including waiting; -t splits by thread)
./profiler.sh -d 60 -e wall -t -f wall-clock.html <pid>
# Profile with a custom sampling interval (the CPU default is 10ms)
./profiler.sh -d 120 -i 5ms -f custom-interval.html <pid>
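
The gap between CPU time and wall-clock time can be seen without a profiler. The sketch below uses the standard `ThreadMXBean` API to measure both for a method that mostly sleeps. The class and method names are illustrative, and it assumes the JVM supports per-thread CPU time measurement, which HotSpot normally does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class WallVsCpu {

    // Returns {elapsedNanos, cpuNanos} for running the task on the current thread.
    static long[] measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wall = System.nanoTime() - wallStart;
        long cpu = threads.getCurrentThreadCpuTime() - cpuStart;
        return new long[] {wall, cpu};
    }

    // Simulates a blocking call such as a slow database query.
    static void blockingCall() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long[] t = measure(WallVsCpu::blockingCall);
        System.out.printf("wall: %d ms, cpu: %d ms%n", t[0] / 1_000_000, t[1] / 1_000_000);
    }
}
```

A CPU profile would show almost nothing for `blockingCall`, because the thread is off-CPU while it sleeps. A wall-clock profile attributes the elapsed time to it, which is why `-e wall` suits I/O-bound or lock-bound code.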

Java-Specific Integration

JVM Attach Modes

# Attach to running JVM
./profiler.sh start <pid>
./profiler.sh stop -f profile.html <pid>
# Profile from application start
java -agentpath:/path/to/libasyncProfiler.so=start,event=cpu,file=startup.html \
  -jar myapp.jar
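
If the JVM is launched from Java code, for example in an integration test, the same agent option can be assembled programmatically. This is a minimal sketch: `ProfiledLaunch` is a hypothetical helper, and the library path, output file, and jar name are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

class ProfiledLaunch {

    // Builds a java command line that loads async-profiler at JVM startup.
    static List<String> command(String libPath, String event, String outputFile, String jar) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // Agent options are comma-separated and follow the library path after '='.
        cmd.add("-agentpath:" + libPath + "=start,event=" + event + ",file=" + outputFile);
        cmd.add("-jar");
        cmd.add(jar);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd =
                command("/path/to/libasyncProfiler.so", "cpu", "startup.html", "myapp.jar");
        System.out.println(String.join(" ", cmd));
        // To actually launch the process: new ProcessBuilder(cmd).inheritIO().start();
    }
}
```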

Containerized Environments

# Profile Java app in Docker (assumes profiler.sh was copied to / in the image)
docker exec <container> /profiler.sh -d 30 -f /tmp/profile.html <pid>
# Copy profile out of container
docker cp <container>:/tmp/profile.html ./profile.html

Interpreting Flame Graphs

Reading the Visualization

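In the standard layout, stacks grow upward: each frame sits on top of its caller.

[    mysql.driver.executeQuery()    ]  ← Top (deepest frame, the code actually running)
[    com.example.Database.query()   ]
[    com.example.Service.process()  ]
[    java.util.concurrent.ThreadPoolExecutor$Worker.run() ]
[    java.lang.Thread.run()         ]  ← Bottom (root of the stack)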

Common Patterns and What They Mean

1. Wide Top Frames

▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆▆ [java.util.HashMap.get() ]

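Interpretation: The method itself was running in a large share of samples, so its own code (not its callees) is a hotspot candidate. Calls that arrive through different call paths appear as separate frames unless you generate a reversed graph with --reverse.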

2. Tall Narrow Stacks

[ Top ]
[     ]
[ ... ] ← Deep call chain
[     ]
[Bottom]

Interpretation: A deep call chain that accounts for few samples. It is common with layered frameworks or recursion and is rarely a bottleneck by itself.

3. Plateaus

[ com.example.Validator.verify()      ] ← Wide plateau (nothing stacked above it)
[ com.example.Validator.checkRule()   ]
[ com.example.Validator.validate()    ]

Interpretation: A wide, flat top edge means the topmost method spends its time in its own code rather than in callees, so it dominates that call path.
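
Wide top frames and plateaus both come down to self time: samples in which a method is the last frame of the stack. As a rough sketch, assuming collapsed-format input (frames separated by semicolons, root first, then a sample count), self time per method could be tallied like this. `SelfTime` and the sample data are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SelfTime {

    // Returns, per method, the number of samples in which it is the leaf frame.
    static Map<String, Long> selfSamples(List<String> collapsedLines) {
        Map<String, Long> self = new LinkedHashMap<>();
        for (String line : collapsedLines) {
            int split = line.lastIndexOf(' ');
            long samples = Long.parseLong(line.substring(split + 1));
            String stack = line.substring(0, split);
            // The leaf is the text after the last ';' (or the whole stack if there is none).
            String leaf = stack.substring(stack.lastIndexOf(';') + 1);
            self.merge(leaf, samples, Long::sum);
        }
        return self;
    }

    public static void main(String[] args) {
        // Illustrative data only: HashMap.get is the leaf on two different call paths.
        List<String> samples = Arrays.asList(
                "Thread.run;Service.process;HashMap.get 40",
                "Thread.run;Cache.lookup;HashMap.get 35",
                "Thread.run;Service.process;Validator.validate 25");
        selfSamples(samples).forEach((method, n) ->
                System.out.println(method + " self samples: " + n));
    }
}
```

Summing by leaf across call paths is roughly what a reversed flame graph shows at its base, which makes methods that are hot from many callers easier to spot.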

Real-World Examples

Identifying CPU Bottlenecks

# Generate profile for web application (assumes pgrep matches exactly one Tomcat process)
./profiler.sh -d 180 -f /tmp/webapp-cpu.html $(pgrep -f tomcat)

Common Findings:

  • JSON/XML serialization dominating
  • Inefficient database query processing
  • Cryptographic operations
  • Regular expression compilation
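
The last finding is easy to reproduce. In the sketch below, `String.matches` recompiles its pattern on every call, so `Pattern.compile` would be expected to show up as a wide frame under `countSlow`, while `countFast` reuses a precompiled pattern. The class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class RegexHotspot {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Slow path: String.matches compiles the regex again for every input.
    static int countSlow(List<String> inputs) {
        int n = 0;
        for (String s : inputs) {
            if (s.matches("\\d+")) {
                n++;
            }
        }
        return n;
    }

    // Fast path: only the matching work remains in the loop.
    static int countFast(List<String> inputs) {
        int n = 0;
        for (String s : inputs) {
            if (DIGITS.matcher(s).matches()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("123", "abc", "42");
        System.out.println(countSlow(inputs) + " " + countFast(inputs));
    }
}
```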

Memory Allocation Analysis

# Profile allocations in microservice
./profiler.sh -d 120 -e alloc -f /tmp/alloc-profile.html $(jps | grep MyService | cut -d' ' -f1)

Common Findings:

  • Excessive string creation in loops
  • Unnecessary object creation in hot paths
  • Inefficient collection usage
  • Caches that keep growing and retain objects (a leak signal best confirmed with --live)
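
As an illustration of the first finding, the sketch below builds the same string two ways. The `+=` version creates a new `String` on every iteration, which an allocation profile would typically attribute to `joinSlow`. The `StringBuilder` version appends into one growing buffer. Names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

class StringAlloc {

    // Allocates a fresh String (and a temporary builder) on every iteration.
    static String joinSlow(List<String> parts) {
        String out = "";
        for (String p : parts) {
            out += p + ",";
        }
        return out;
    }

    // Appends into a single buffer and allocates the result once at the end.
    static String joinFast(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p).append(',');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList("a", "b", "c");
        System.out.println(joinSlow(parts).equals(joinFast(parts)));
    }
}
```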

Advanced Features and Integration

Programmatic Control

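// Requires async-profiler.jar on the classpath and libasyncProfiler.so on the library path
import one.profiler.AsyncProfiler;

AsyncProfiler profiler = AsyncProfiler.getInstance();
// Start CPU profiling with a 1 ms (1,000,000 ns) sampling interval
profiler.start("cpu", 1_000_000);
// Critical section
performExpensiveOperation();
// stop() returns nothing, so stop and write the flame graph via execute()
profiler.execute("stop,file=profile.html");  // may throw IOException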

Continuous Profiling

# Continuous profiling with rolling outputs (the sleep leaves a 10-second gap between profiles)
mkdir -p /tmp/profiles
while true; do
  ./profiler.sh -d 300 -f /tmp/profiles/profile-$(date +%s).html <pid>
  sleep 10
done
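
A loop like this fills the disk eventually, so some retention policy is needed. The following is a minimal sketch of one, assuming files are named `profile-<epoch seconds>.<extension>` as in the loop above. `ProfileRetention` is a hypothetical helper, not part of async-profiler.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ProfileRetention {

    // Deletes the oldest profiles so that at most `keep` remain; returns how many were deleted.
    static int prune(Path dir, int keep) throws IOException {
        List<Path> profiles;
        try (Stream<Path> files = Files.list(dir)) {
            profiles = files
                    .filter(p -> p.getFileName().toString().matches("profile-\\d+\\.\\w+"))
                    .sorted(Comparator.comparingLong(ProfileRetention::timestamp))
                    .collect(Collectors.toList());
        }
        int excess = Math.max(0, profiles.size() - keep);
        for (Path old : profiles.subList(0, excess)) {
            Files.delete(old);
        }
        return excess;
    }

    // Extracts the epoch seconds that $(date +%s) embedded in the file name.
    static long timestamp(Path p) {
        String name = p.getFileName().toString();
        return Long.parseLong(name.substring("profile-".length(), name.lastIndexOf('.')));
    }

    public static void main(String[] args) throws IOException {
        // Keep the 12 newest profiles, about one hour at 5 minutes each.
        Path dir = Paths.get("/tmp/profiles");
        if (Files.isDirectory(dir)) {
            System.out.println("deleted " + prune(dir, 12) + " old profiles");
        }
    }
}
```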

Integration with APM Tools

# Generate profile and upload to monitoring system
./profiler.sh -d 60 -f /tmp/current.html <pid>
curl -X POST -F "profile=@/tmp/current.html" \
  https://monitoring.company.com/api/profiles

Best Practices

1. Profile Under Realistic Load

# Profile during load test
./profiler.sh -d 300 -f production-load.html <pid>

2. Use Appropriate Duration

  • Short runs (30-60s): For immediate issues
  • Medium runs (2-5min): For typical performance analysis
  • Long runs (10-30min): For intermittent issues
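
Duration matters because it determines how many samples the graph is built from. As a back-of-the-envelope sketch, assuming async-profiler's default CPU sampling interval of 10 ms and threads that are busy the whole time, the sample count can be estimated as follows. `SampleBudget` is an illustrative name, and real counts are lower when threads sit idle.

```java
class SampleBudget {

    // Rough upper bound: one sample per interval of CPU time for each busy thread.
    static long expectedSamples(long durationSeconds, long intervalMillis, int busyThreads) {
        return durationSeconds * 1000 / intervalMillis * busyThreads;
    }

    public static void main(String[] args) {
        long[] durations = {30, 300, 1800};
        for (long d : durations) {
            System.out.println(d + " s at 10 ms, 1 busy thread: about "
                    + expectedSamples(d, 10, 1) + " samples");
        }
    }
}
```

A frame that accounts for 1% of a 30-second single-thread profile rests on roughly 30 samples, which is why rare or intermittent behavior calls for longer runs.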

3. Compare Before/After

# Before optimization
./profiler.sh -d 60 -f before-optimization.html <pid>
# After optimization
./profiler.sh -d 60 -f after-optimization.html <pid>
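
Comparing two graphs by eye gets harder as they grow. One possible approach, sketched below, is to capture both runs in collapsed format (`-o collapsed`) and diff each method's share of self samples. `ProfileDiff` and its sample data are hypothetical, and the input is assumed to be semicolon-separated frames followed by a count.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ProfileDiff {

    // Fraction of all samples (0.0 to 1.0) in which each method is the leaf frame.
    static Map<String, Double> selfShare(List<String> collapsedLines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        long total = 0;
        for (String line : collapsedLines) {
            int split = line.lastIndexOf(' ');
            long samples = Long.parseLong(line.substring(split + 1));
            String stack = line.substring(0, split);
            counts.merge(stack.substring(stack.lastIndexOf(';') + 1), samples, Long::sum);
            total += samples;
        }
        Map<String, Double> share = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            share.put(e.getKey(), e.getValue() / (double) total);
        }
        return share;
    }

    // Change in share (after minus before) for every method seen in either profile.
    static Map<String, Double> diff(List<String> before, List<String> after) {
        Map<String, Double> b = selfShare(before);
        Map<String, Double> a = selfShare(after);
        Map<String, Double> delta = new TreeMap<>();
        for (String method : b.keySet()) {
            delta.put(method, -b.get(method));
        }
        for (String method : a.keySet()) {
            delta.merge(method, a.get(method), Double::sum);
        }
        return delta;
    }

    public static void main(String[] args) {
        // Illustrative data only.
        List<String> before = Arrays.asList("run;process;serialize 50", "run;process;query 50");
        List<String> after = Arrays.asList("run;process;serialize 20", "run;process;query 80");
        diff(before, after).forEach((method, d) ->
                System.out.printf("%-12s %+.1f%%%n", method, d * 100));
    }
}
```

Shares are relative, so a method whose share rises may simply be what remains after another method got faster. Compare total sample counts or request throughput alongside the shares.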

4. Container-Aware Profiling

# Dockerfile example
FROM openjdk:11-jre
COPY async-profiler /opt/async-profiler
ENV PATH="/opt/async-profiler:${PATH}"

Troubleshooting Common Issues

Permission Problems

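# Docker's default seccomp profile blocks perf_event_open; lift it for the profiled container
docker run --security-opt seccomp=unconfined ...
# Alternatively, grant only the capability that is needed: docker run --cap-add SYS_ADMIN ...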

Missing Symbols

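# Install JDK debug symbols (Debian/Ubuntu package shown; older OpenJDK builds need them for alloc profiling)
apt install openjdk-11-dbg
# Make Java frames more accurate for inlined code
java -XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints -jar myapp.jar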

High Overhead

# Sample less often to reduce overhead (the default CPU interval is already 10ms)
./profiler.sh -d 60 -i 50ms -f low-overhead.html <pid>

Conclusion

Flame graphs generated with Async-Profiler give detailed visibility into Java application performance. By mastering their generation and interpretation, developers can:

  • Quickly identify performance bottlenecks
  • Optimize resource-intensive code paths
  • Validate performance improvements
  • Understand complex call stack relationships

The combination of easy generation, intuitive visualization, and detailed stack trace information makes flame graphs an essential tool in every Java developer's performance optimization toolkit.

Remember that the most valuable insights often come from comparing flame graphs across different loads, versions, or configurations, revealing the true impact of your code changes on application performance.
