In the world of microservices, traffic is rarely constant. A sudden spike in users can overwhelm your Spring Boot service, leading to slow response times and errors. Conversely, running too many instances during low-traffic periods wastes resources and money. The Horizontal Pod Autoscaler (HPA) is Kubernetes' native solution to this problem, providing automatic, metric-based scaling for your Java applications.
For Java developers, understanding HPA is crucial for building resilient, cost-effective, and truly cloud-native applications.
What is the Horizontal Pod Autoscaler (HPA)?
The HPA is a Kubernetes controller that automatically adjusts the number of Pods in a Deployment, StatefulSet, or other similar resource to match observed demand.
- Horizontal Scaling: This means scaling out (adding more Pods) and scaling in (removing Pods). This is different from Vertical Pod Autoscaling (VPA), which adjusts the CPU/memory requests of a single Pod.
- The Core Logic: The HPA continuously monitors a set of metrics (like average CPU utilization). If the metrics exceed your target, the HPA increases the replica count. If they fall below, it decreases it.
How HPA Works: The Control Loop
The HPA operates on a simple but effective control loop:
- Metrics Collection: The HPA queries the Metrics Server (or a custom metrics API) to get current metric values for your Pods.
- Calculation: It calculates the desired replica count using the formula:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
- Scaling Action: It updates the .spec.replicas field of the target resource (e.g., your Deployment) to match the desired count.
- The Deployment's controller then creates or terminates Pods, and the Kubernetes scheduler places any new Pods onto nodes.
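The calculation step can be sketched in plain Java. This is an illustrative model of the formula above, not the actual controller source; the real controller additionally applies a tolerance band and readiness handling before acting.

```java
// Illustrative model of the HPA replica calculation (not the real
// controller's code; Kubernetes also applies a tolerance before scaling).
public class HpaFormula {

    // desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
    static int desiredReplicas(int currentReplicas,
                               double currentMetricValue,
                               double desiredMetricValue) {
        return (int) Math.ceil(currentReplicas * (currentMetricValue / desiredMetricValue));
    }

    public static void main(String[] args) {
        // 2 Pods averaging 400m CPU against a 250m target -> ceil(2 * 1.6) = 4
        System.out.println(desiredReplicas(2, 400, 250));
        // 4 Pods averaging 100m against a 250m target -> ceil(4 * 0.4) = 2
        System.out.println(desiredReplicas(4, 100, 250));
    }
}
```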
A Practical Example: Scaling a Spring Boot Application
Let's walk through configuring an HPA for a typical Java microservice.
Prerequisites:
- Metrics Server: Must be installed in your cluster. It provides core resource metrics like CPU and memory.
- Resource Requests: Your Java application's Pod spec must have CPU and/or memory requests defined. The HPA uses these as a reference for calculating utilization percentages.
Step 1: The Java Deployment with Resource Requests
Your Deployment YAML must specify resource requests. This is non-negotiable for CPU/Memory-based HPA.
```yaml
# spring-boot-app-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-java-app
spec:
  selector:
    matchLabels:
      app: my-java-app
  template:
    metadata:
      labels:
        app: my-java-app
    spec:
      containers:
        - name: app
          image: my-registry/spring-boot-app:latest
          resources:
            # REQUESTS are crucial for HPA calculation
            requests:
              cpu: 500m       # 0.5 CPU cores
              memory: 512Mi
            limits:
              cpu: 1000m      # 1 CPU core
              memory: 1Gi
          ports:
            - containerPort: 8080
```
Step 2: The HorizontalPodAutoscaler Manifest
This YAML defines the autoscaling policy for the Deployment above.
```yaml
# java-app-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-java-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-java-app
  minReplicas: 2    # Always run at least 2 instances for availability
  maxReplicas: 10   # Don't scale beyond 10 instances
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # Target: average Pod CPU use is 50% of its request (500m)
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 70   # Target: average Pod memory use is 70% of its request (512Mi)
  behavior:                        # (Optional) Fine-tune scaling speed
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 minutes before scaling down again
      policies:
        - type: Percent
          value: 50          # Don't remove more than 50% of current replicas at once
          periodSeconds: 60  # Evaluated over a 60-second window
```
What this means in practice:
If the average CPU usage across all Pods exceeds 50% of their 500m request (i.e., 250m), the HPA adds Pods. If it drops well below 50%, the HPA starts removing Pods, but it will never go below the minimum of 2.
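Putting numbers to this, the formula combined with the min/max bounds from the manifest behaves as follows. Again a sketch for intuition, not the controller's actual code:

```java
// Illustrative sketch: the HPA formula plus clamping to the
// minReplicas/maxReplicas bounds declared in the manifest.
public class HpaClamp {

    static int scale(int currentReplicas, double currentCpuMillicores,
                     double targetCpuMillicores, int minReplicas, int maxReplicas) {
        int desired = (int) Math.ceil(
                currentReplicas * (currentCpuMillicores / targetCpuMillicores));
        return Math.min(maxReplicas, Math.max(minReplicas, desired));
    }

    public static void main(String[] args) {
        // 2 Pods at 450m average vs the 250m target (50% of 500m) -> ceil(3.6) = 4
        System.out.println(scale(2, 450, 250, 2, 10));
        // Traffic dies down: 4 Pods at 50m -> ceil(0.8) = 1, clamped up to minReplicas = 2
        System.out.println(scale(4, 50, 250, 2, 10));
        // Big spike: 3 Pods at 1000m -> ceil(12) = 12, clamped down to maxReplicas = 10
        System.out.println(scale(3, 1000, 250, 2, 10));
    }
}
```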
Beyond CPU and Memory: Custom Metrics for Java Apps
While CPU is a common scaling metric, it's often not the best one for Java applications. A more sophisticated approach uses custom metrics based on application-level data.
Popular Custom Metrics for Java:
- HTTP requests per second (from an Istio or Nginx ingress controller)
- JVM Heap Memory Usage
- Average REST API Latency
- Kafka consumer lag
- Internal thread pool queue size (e.g., Tomcat's thread pool)
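Before the HPA can act on any of these, the application has to expose them. As a minimal sketch using only the JDK, here is what a counter in the Prometheus text exposition format looks like; in a real Spring Boot service you would use Micrometer and Actuator instead, and the metric name and label here are illustrative:

```java
import java.util.concurrent.atomic.LongAdder;

// Minimal sketch of a counter exposed in the Prometheus text format,
// using only the JDK. The metric name "http_requests_total" and the
// "app" label are illustrative assumptions.
public class RequestCounter {

    private final LongAdder requests = new LongAdder();

    void onRequest() {
        requests.increment();   // call this from your request handler
    }

    // Prometheus text exposition: a "# TYPE" line plus one sample line.
    String scrape() {
        return "# TYPE http_requests_total counter\n"
             + "http_requests_total{app=\"my-java-app\"} " + requests.sum() + "\n";
    }

    public static void main(String[] args) {
        RequestCounter counter = new RequestCounter();
        for (int i = 0; i < 3; i++) counter.onRequest();
        System.out.print(counter.scrape());
    }
}
```

Prometheus scrapes this text endpoint, and rate functions turn the raw counter into the per-second values a custom-metric HPA consumes.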
To use these, you need to install the Prometheus Adapter, which implements the custom metrics API and allows the HPA to query metrics from a Prometheus monitoring system.
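As a sketch, a Prometheus Adapter rule that turns a raw request counter into the per-second rate consumed by a custom-metric HPA might look like the following (a Helm values excerpt; the series name and the 2-minute rate window are assumptions to adapt to your own setup):

```yaml
# prometheus-adapter rules (values.yaml excerpt) -- illustrative; adjust
# the series name and rate window to match your metrics.
rules:
  custom:
    - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^(.*)_total$"
        as: "${1}_per_second"
      metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```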
Example HPA using a custom metric:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
spec:
  scaleTargetRef:
    # ... same as before
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100   # Scale to maintain an average of 100 RPS per Pod
```
Java-Specific Considerations & Best Practices
- Warm-Up Time: JVMs need time to warm up (JIT compilation, class loading, cache population), so a new Pod isn't immediately performant. Use the behavior section to slow down scale-up and prevent a flood of cold starts.
- Garbage Collection: Be mindful of GC cycles causing brief CPU/memory spikes. The HPA's averaging period helps, but tuning your JVM flags is critical.
- Thread Pools: If your application uses a fixed thread pool (e.g., in Tomcat or for parallel processing), ensure its size is appropriate for the CPU request. A Pod with cpu: 500m should not have a thread pool of 200 threads.
- Liveness and Readiness Probes: These are essential. The HPA controls the number of Pods, but the Service relies on readiness probes to know when a new Pod is ready to receive traffic.
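The thread-pool point can be made concrete. A sketch of deriving a pool size from the container's CPU request instead of hard-coding a constant; the threads-per-core factor is an illustrative assumption to tune for your workload (higher for I/O-bound work):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative sketch: size a fixed thread pool from the Pod's CPU
// request. The threads-per-core factor is an assumption; tune it for
// your workload. Note that on JDK 10+, availableProcessors() is
// cgroup-aware and reflects the container's CPU limit.
public class PoolSizing {

    static int poolSize(int cpuRequestMillicores, int threadsPerCore) {
        // Scale threads with fractional cores; never drop below 2.
        return Math.max(2,
                (int) Math.ceil(cpuRequestMillicores / 1000.0 * threadsPerCore));
    }

    public static void main(String[] args) {
        int size = poolSize(500, 4);   // cpu: 500m at 4 threads/core -> 2
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("pool size: " + size);
        pool.shutdown();
    }
}
```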
Conclusion
The Horizontal Pod Autoscaler is a foundational tool for running Java applications efficiently in Kubernetes. Moving beyond simple CPU scaling to using custom, application-aware metrics allows Java teams to build systems that are not only resilient to traffic spikes but also highly cost-optimized.
By combining HPA with well-defined resource requests, thoughtful JVM tuning, and application-specific metrics, you can ensure your Java microservices automatically right-size themselves to meet demand, delivering a seamless user experience while controlling cloud costs.