In the world of microservices, traffic is rarely constant. A sudden spike in users can overwhelm your Spring Boot service, leading to slow response times and errors. Conversely, running too many instances during low-traffic periods wastes resources and money. The Horizontal Pod Autoscaler (HPA) is Kubernetes' native solution to this problem, providing automatic, metric-based scaling for your Java applications.
For Java developers, understanding HPA is crucial for building resilient, cost-effective, and truly cloud-native applications.
What is the Horizontal Pod Autoscaler (HPA)?
The HPA is a Kubernetes controller that automatically adjusts the number of Pods in a Deployment, StatefulSet, or other similar resource to match observed demand.
- Horizontal Scaling: This means scaling out (adding more Pods) and scaling in (removing Pods). This is different from Vertical Pod Autoscaling (VPA), which adjusts the CPU/memory requests of a single Pod.
- The Core Logic: The HPA continuously monitors a set of metrics (like average CPU utilization). If the metrics exceed your target, the HPA increases the replica count. If they fall below, it decreases it.
How HPA Works: The Control Loop
The HPA operates on a simple but effective control loop:
- Metrics Collection: The HPA queries the Metrics Server (or a custom metrics API) to get current metric values for your Pods.
- Calculation: It calculates the desired replica count using the formula:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
- Scaling Action: It updates the .spec.replicas field of the target resource (e.g., your Deployment) to match the desired count.
- The Deployment's controller then creates or terminates Pods, and the Kubernetes scheduler places any new Pods onto nodes.
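The calculation step can be sketched in plain Java. This is an illustrative model of the formula above, not the actual controller source; the real controller additionally applies a tolerance band and readiness handling before acting.

```java
// Illustrative model of the HPA replica calculation (not the real
// controller's code; Kubernetes also applies a tolerance before scaling).
public class HpaFormula {

    // desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    static int desiredReplicas(int currentReplicas,
                               double currentMetricValue,
                               double desiredMetricValue) {
        return (int) Math.ceil(currentReplicas * (currentMetricValue / desiredMetricValue));
    }

    public static void main(String[] args) {
        // 2 Pods averaging 400m CPU against a 250m target -> ceil(2 * 1.6) = 4
        System.out.println(desiredReplicas(2, 400, 250));
        // 4 Pods averaging 100m against a 250m target -> ceil(4 * 0.4) = 2
        System.out.println(desiredReplicas(4, 100, 250));
    }
}
```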
A Practical Example: Scaling a Spring Boot Application
Let's walk through configuring an HPA for a typical Java microservice.
Prerequisites:
- Metrics Server: Must be installed in your cluster. It provides core resource metrics like CPU and memory.
- Resource Requests: Your Java application's Pod spec must have CPU and/or memory requests defined. The HPA uses these as a reference for calculating utilization percentages.
Step 1: The Java Deployment with Resource Requests
Your Deployment YAML must specify resource requests. This is non-negotiable for CPU/Memory-based HPA.
```yaml
# spring-boot-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-app
spec:
  selector:
    matchLabels:
      app: my-java-app
  template:
    metadata:
      labels:
        app: my-java-app
    spec:
      containers:
        - name: app
          image: my-registry/spring-boot-app:latest
          resources:
            # REQUESTS are crucial for HPA calculation
            requests:
              cpu: 500m       # 0.5 CPU cores
              memory: 512Mi
            limits:
              cpu: 1000m      # 1 CPU core
              memory: 1Gi
          ports:
            - containerPort: 8080
```
Step 2: The HorizontalPodAutoscaler Manifest
This YAML defines the autoscaling policy for the Deployment above.
```yaml
# java-app-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-java-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-app
  minReplicas: 2    # Always run at least 2 instances for availability
  maxReplicas: 10   # Don't scale beyond 10 instances
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Target: average Pod CPU use is 50% of its request (500m)
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # Target: average Pod memory use is 70% of its request (512Mi)
  behavior:                        # (Optional) Fine-tune scaling speed
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down again
      policies:
        - type: Percent
          value: 50          # Don't remove more than 50% of current replicas at once
          periodSeconds: 60  # Evaluated over a 60-second window
```
What this means in practice:
If the average CPU usage across all Pods exceeds 50% of their 500m request (i.e., 250m), the HPA adds Pods. If it drops well below 50%, the HPA starts removing Pods, but it will never go below the minimum of 2.
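Putting numbers to this, the formula combined with the min/max bounds from the manifest behaves as follows. Again a sketch for intuition, not the controller's actual code:

```java
// Illustrative sketch: the HPA formula plus clamping to the
// minReplicas/maxReplicas bounds declared in the manifest.
public class HpaClamp {

    static int scale(int currentReplicas, double currentCpuMillicores,
                     double targetCpuMillicores, int minReplicas, int maxReplicas) {
        int desired = (int) Math.ceil(
                currentReplicas * (currentCpuMillicores / targetCpuMillicores));
        return Math.min(maxReplicas, Math.max(minReplicas, desired));
    }

    public static void main(String[] args) {
        // 2 Pods at 450m average vs the 250m target (50% of 500m) -> ceil(3.6) = 4
        System.out.println(scale(2, 450, 250, 2, 10));
        // Traffic dies down: 4 Pods at 50m -> ceil(0.8) = 1, clamped up to minReplicas = 2
        System.out.println(scale(4, 50, 250, 2, 10));
        // Big spike: 3 Pods at 1000m -> ceil(12) = 12, clamped down to maxReplicas = 10
        System.out.println(scale(3, 1000, 250, 2, 10));
    }
}
```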
Beyond CPU and Memory: Custom Metrics for Java Apps
While CPU is a common scaling metric, it's often not the best one for Java applications. A more sophisticated approach uses custom metrics based on application-level data.
Popular Custom Metrics for Java:
- HTTP requests per second (from an Istio or Nginx ingress controller)
- JVM Heap Memory Usage
- Average REST API Latency
- Kafka consumer lag
- Internal thread pool queue size (e.g., Tomcat's thread pool)
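Before the HPA can act on any of these, the application has to expose them. As a minimal sketch using only the JDK, here is what a counter in the Prometheus text exposition format looks like; in a real Spring Boot service you would use Micrometer and Actuator instead, and the metric name and label here are illustrative:

```java
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of a counter exposed in the Prometheus text format,
// using only the JDK. The metric name "http_requests_total" and the
// "app" label are illustrative assumptions.
public class RequestCounter {

    private final LongAdder requests = new LongAdder();

    void onRequest() {
        requests.increment();   // call this from your request handler
    }

    // Prometheus text exposition: a "# TYPE" line plus one sample line.
    String scrape() {
        return "# TYPE http_requests_total counter\n"
             + "http_requests_total{app=\"my-java-app\"} " + requests.sum() + "\n";
    }

    public static void main(String[] args) {
        RequestCounter counter = new RequestCounter();
        for (int i = 0; i < 3; i++) counter.onRequest();
        System.out.print(counter.scrape());
    }
}
```

Prometheus scrapes this text endpoint, and rate functions turn the raw counter into the per-second values a custom-metric HPA consumes.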
To use these, you need to install the Prometheus Adapter, which implements the custom metrics API and allows the HPA to query metrics from a Prometheus monitoring system.
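As a sketch, a Prometheus Adapter rule that turns a raw request counter into the per-second rate consumed by a custom-metric HPA might look like the following (a Helm values excerpt; the series name and the 2-minute rate window are assumptions to adapt to your own setup):

```yaml
# prometheus-adapter rules (values.yaml excerpt) -- illustrative; adjust
# the series name and rate window to match your metrics.
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```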
Example HPA using a custom metric:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    # ... same as before
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100   # Scale to maintain an average of 100 RPS per Pod
```
Java-Specific Considerations & Best Practices
- Warm-Up Time: JVMs need time to warm up (JIT compilation, class loading, cache population), so a new Pod isn't immediately performant. Use the behavior section to slow down scale-up and prevent a flood of cold starts.
- Garbage Collection: Be mindful of GC cycles causing brief CPU/memory spikes. The HPA's averaging period helps, but tuning your JVM flags is critical.
- Thread Pools: If your application uses a fixed thread pool (e.g., in Tomcat or for parallel processing), ensure its size is appropriate for the CPU request. A Pod with cpu: 500m should not have a thread pool of 200 threads.
- Liveness and Readiness Probes: These are essential. The HPA controls the number of Pods, but the Service relies on readiness probes to know when a new Pod is ready to receive traffic.
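The thread-pool point can be made concrete. A sketch of deriving a pool size from the container's CPU request instead of hard-coding a constant; the threads-per-core factor is an illustrative assumption to tune for your workload (higher for I/O-bound work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: size a fixed thread pool from the Pod's CPU
// request. The threads-per-core factor is an assumption; tune it for
// your workload. Note that on JDK 10+, availableProcessors() is
// cgroup-aware and reflects the container's CPU limit.
public class PoolSizing {

    static int poolSize(int cpuRequestMillicores, int threadsPerCore) {
        // Scale threads with fractional cores; never drop below 2.
        return Math.max(2,
                (int) Math.ceil(cpuRequestMillicores / 1000.0 * threadsPerCore));
    }

    public static void main(String[] args) {
        int size = poolSize(500, 4);   // cpu: 500m at 4 threads/core -> 2
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("pool size: " + size);
        pool.shutdown();
    }
}
```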
Conclusion
The Horizontal Pod Autoscaler is a foundational tool for running Java applications efficiently in Kubernetes. Moving beyond simple CPU scaling to using custom, application-aware metrics allows Java teams to build systems that are not only resilient to traffic spikes but also highly cost-optimized.
By combining HPA with well-defined resource requests, thoughtful JVM tuning, and application-specific metrics, you can ensure your Java microservices automatically right-size themselves to meet demand, delivering a seamless user experience while controlling cloud costs.