The Java Virtual Machine's (JVM) tiered compilation strategy is a well-known performance story: start in the interpreter, gather profiling data, and then compile hot methods to optimized native code. But what about a long-running method that's already in the middle of execution? Must it finish its slow, interpreted run before benefiting from compilation? This is where On-Stack Replacement (OSR) performs its magic, allowing the JVM to swap code for a method while it is still executing.
The Problem: The Long-Running Loop
Imagine a hot method that is a single, massive loop. It could be processing a large file, running a complex simulation, or handling a lengthy computation.
public void processBatch(DataBatch batch) {
    // This method is called and starts executing in the interpreter.
    for (int i = 0; i < 1_000_000; i++) { // This loop takes 30 seconds!
        Data item = batch.get(i);
        // ... complex processing logic ...
        applyBusinessRules(item);
        updateAggregates(item);
    }
}
Under a naive system, this method would be identified as "hot" early on, and the JIT compiler would queue it up for compilation. However, the method is already running. The compilation would finish, but the current activation of the method—the one stuck in that 30-second loop—would continue to plod along in the interpreter. The optimized version would only be used for the next call to processBatch. This is a huge missed opportunity.
On-Stack Replacement solves this exact problem. It allows the JVM to replace the currently executing code of a method with its optimized version in the middle of its execution, without waiting for it to return.
How OSR Works: A Technical Dive
OSR is a complex process, but its conceptual steps can be broken down:
- Detection and Compilation: The JVM's profiling mechanism detects that a loop has become "hot" from its back-edge counter (invocation counters, by contrast, trigger ordinary compilations). The JIT compiler then compiles the method with a special entry point: instead of starting at the normal method entry, the generated code can begin execution at a specific bytecode index, typically the header of the long-running loop.
- Safepoint and Stack Frame Transformation: The JVM brings the thread executing the method to a safepoint—a known, consistent state where the JVM can safely manipulate thread stacks. It then performs a delicate surgery:
  - It takes the current interpreter frame on the stack, which contains all the local variables, operands, and program counter for the slow code.
  - It materializes an equivalent compiled frame for the newly compiled, optimized code.
  - It maps the state from the interpreter frame to the compiled frame. This involves translating the interpreter's representation of local variables and the program counter (a bytecode index) into the corresponding machine state for the compiled code (register values and a machine code address).
- Execution Transfer: The thread's instruction pointer is set to the OSR entry point in the newly compiled code. The thread then resumes execution, but now at full native speed, right in the middle of the loop.
In essence, OSR performs a "mid-flight upgrade" of a method's execution engine. The following sequence diagram visualizes this intricate process:
sequenceDiagram
    participant T as Java Thread
    participant JIT as JIT Compiler
    participant JVM as JVM Runtime
    T->>T: Executes method in interpreter
    Note over T: Long-running loop begins
    loop Each Loop Iteration
        T->>JVM: Increments loop back-edge counter
    end
    T->>JIT: Method detected as HOT (via counter)
    Note over T, JIT: Normal execution continues
    JIT->>JIT: Compiles method<br>with OSR entry point
    T->>JVM: Reaches a safepoint
    JIT->>JVM: Signals OSR code is ready
    JVM->>T: Transforms interpreter frame<br>to compiled frame
    JVM->>T: Sets instruction pointer<br>to OSR code
    Note over T: Execution RESUMES in<br>optimized native code
    T->>T: Finishes loop at high speed
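The steps above can be sketched as a toy model in plain Java. This is only an illustration of the idea, not how HotSpot or any real JVM implements OSR: the class `ToyOsr`, the threshold of 1,000 back-edges, and the loop-header index 15 are all hypothetical. The "interpreter" keeps its locals in array slots, and the "compiled" code picks that state up mid-loop.

```java
/** Toy model of OSR. It mirrors the steps above, not a real JVM's code. */
final class ToyOsr {
    /** Hypothetical threshold; real JVMs use tunable, tier-dependent values. */
    static final int BACK_EDGE_THRESHOLD = 1_000;
    /** Hypothetical bytecode index of the loop header. */
    static final int LOOP_HEADER_BCI = 15;

    /** Interpreter-style frame: locals live in numbered slots next to a BCI. */
    static final class InterpreterFrame {
        final long[] locals = new long[2]; // slot 0 = i, slot 1 = sum
        int bci = LOOP_HEADER_BCI;         // where execution currently is
    }

    /** Set when the hand-off happens, so callers can observe it. */
    static boolean transferred;

    /** "Compiled" code whose only entry point is the loop header. */
    static long osrEntry(InterpreterFrame frame, long n) {
        if (frame.bci != LOOP_HEADER_BCI) {
            throw new IllegalStateException("no OSR entry for bci " + frame.bci);
        }
        // State mapping: slots become plain locals (stand-ins for registers).
        long i = frame.locals[0];
        long sum = frame.locals[1];
        for (; i < n; i++) {
            sum += i * i;
        }
        return sum;
    }

    /** "Interpreted" loop: counts back-edges and hands off once it is hot. */
    static long run(long n) {
        InterpreterFrame frame = new InterpreterFrame();
        int backEdges = 0;
        while (frame.locals[0] < n) {
            frame.locals[1] += frame.locals[0] * frame.locals[0];
            frame.locals[0]++;
            if (++backEdges >= BACK_EDGE_THRESHOLD) {
                transferred = true;
                return osrEntry(frame, n); // execution transfer, mid-loop
            }
        }
        return frame.locals[1]; // loop ended before it became hot
    }

    public static void main(String[] args) {
        long sum = run(1_000_000);
        System.out.println("sum=" + sum + " transferred=" + transferred);
    }
}
```

The point of the sketch is that `osrEntry` never re-runs the iterations the slow path already completed: it resumes from the captured values of `i` and `sum`, which is the property a real frame transformation has to preserve.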
Why OSR is Crucial for Performance
- Faster Time to Peak Performance: OSR is a key reason why modern JVMs achieve peak performance so quickly. It ensures that not just future calls, but currently executing heavy workloads, benefit from JIT compilation. Without OSR, the first invocation of a method with a long loop would always suffer through the full interpreted cost.
- Handling Initialization and Startup: Many applications have initialization routines or first-request processing that involves long-running methods. OSR ensures that even these one-off or startup-phase tasks are optimized, leading to better perceived startup times and responsiveness.
Observing OSR in Action
While OSR is largely transparent, you can see its effects in JVM compilation logs.
Using -XX:+PrintCompilation
This flag logs method compilations. OSR compilations are marked with a % symbol.
// A normal compilation (non-OSR)
123   56    3   java.util.ArrayList::add (25 bytes)
// An OSR compilation (notice the '%')
124   57%   3   MyClass::processBatch @ 15 (59 bytes)
In the log above:
- 57%: The % indicates this was an On-Stack Replacement compilation.
- @ 15: This is the bytecode index (BCI) where the OSR entry point was generated. In this case, it's for the loop that starts at BCI 15.
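If you want to pick OSR entries out of a long log, a small parser helps. The sketch below is a hypothetical helper (`CompilationLine` is not part of any JDK API), and it assumes only the column layout of the two sample lines above: timestamp, compile ID with an optional %, tier, method, optional `@ bci`, and size. Real logs can carry extra attribute flags that this pattern does not handle, so such lines are simply skipped.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for lines shaped like the samples above. */
final class CompilationLine {
    // <timestamp> <compile id>[%] <tier> <method> [@ <bci>] (<size> bytes)
    private static final Pattern LINE = Pattern.compile(
        "^\\s*(\\d+)\\s+(\\d+)\\s*(%?)\\s+(\\d+)\\s+(\\S+)"
        + "(?:\\s+@\\s+(\\d+))?\\s+\\((\\d+) bytes\\)\\s*$");

    final int compileId;
    final boolean osr;
    final int tier;
    final String method;
    final int osrBci; // -1 when the line has no "@ bci" part

    private CompilationLine(int compileId, boolean osr, int tier,
                            String method, int osrBci) {
        this.compileId = compileId;
        this.osr = osr;
        this.tier = tier;
        this.method = method;
        this.osrBci = osrBci;
    }

    /** Returns empty for comments or any line that does not fit the pattern. */
    static Optional<CompilationLine> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        int bci = m.group(6) == null ? -1 : Integer.parseInt(m.group(6));
        return Optional.of(new CompilationLine(
            Integer.parseInt(m.group(2)),
            !m.group(3).isEmpty(),
            Integer.parseInt(m.group(4)),
            m.group(5),
            bci));
    }

    public static void main(String[] args) {
        String[] sample = {
            "123 56 3 java.util.ArrayList::add (25 bytes)",
            "124 57% 3 MyClass::processBatch @ 15 (59 bytes)",
        };
        for (String line : sample) {
            parse(line).ifPresent(c -> System.out.println(
                c.method + " osr=" + c.osr + " bci=" + c.osrBci));
        }
    }
}
```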
Using -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation
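These flags write a much more detailed XML compilation log (by default to a hotspot_pid<pid>.log file in the working directory), which can be analyzed with the JITWatch tool. The log records, among other things, the counter values behind each compilation (e.g., backedge_count).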
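To try these flags yourself, you need a program in which one invocation runs one long loop. The following is a minimal sketch, with a made-up class name (`OsrDemo`) and an arbitrary iteration count. Because `hotLoop` is called exactly once, any compiled code it benefits from has to arrive while the loop is already running.

```java
/** Minimal sketch: one call, one long loop, so only OSR can speed it up. */
final class OsrDemo {
    /** Cheap but data-dependent work, so the loop cannot be trivially removed. */
    static long hotLoop(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += (i ^ (acc >>> 3)) + 1;
        }
        return acc;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = hotLoop(200_000_000); // arbitrary count, tune as needed
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " elapsed=" + elapsedMs + " ms");
    }
}
```

On a HotSpot-based JVM, running `java -XX:+PrintCompilation OsrDemo` should show at least one line for `OsrDemo::hotLoop` carrying the % marker. Running it again with `-XX:-UseOnStackReplacement` disables OSR and gives a rough feel for the difference. Exact log lines and timings vary by JVM version and hardware.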
The Limits and Costs of OSR
OSR is not a free lunch and has its complexities:
- Engineering Complexity: The frame transformation and state mapping is one of the most complex parts of the JVM. It requires precise knowledge of both the interpreter's and the compiler's stack frame layouts.
- Less Optimal Code: OSR code is often not as well optimized as a full method compilation. The compiler cannot reason from the method's start and must accept whatever state the interpreter had at the OSR entry point. For example, loop unrolling might be more conservative.
- Overhead: Pausing the executing thread at a safepoint and transforming its frame has a small but non-zero cost, paid each time a transition happens.
Conclusion
On-Stack Replacement is a testament to the sophistication of the JVM's adaptive optimization system. It solves a critical problem in a just-in-time compiled world: ensuring that all expensive execution, even that which has already begun, can benefit from runtime profiling and compilation. While you may never code directly for OSR, understanding its existence and role demystifies Java's performance characteristics, explaining how it can optimize long-running methods "on the fly" and deliver such impressive speed in real-world scenarios. It is the JIT's powerful tool for ensuring no expensive operation is left behind.