Horizontal Pod Autoscaler in Java: Dynamic Scaling for Kubernetes Applications

The Horizontal Pod Autoscaler (HPA) is a Kubernetes controller that automatically scales the number of pods in a workload such as a Deployment, based on observed CPU utilization, memory usage, or custom metrics. For Java applications, understanding and implementing HPA is essential for building responsive, cost-effective cloud-native systems.

Understanding Horizontal Pod Autoscaler

HPA automatically adjusts the number of pod replicas in a Deployment, ReplicaSet, or StatefulSet based on observed metrics. The key components are:

Metrics Source: CPU, memory, or custom metrics from various adapters
Scaling Logic: Algorithm that determines when and how to scale
Target Value: The desired metric value that HPA tries to maintain
Scale Limits: Minimum and maximum number of pods
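
The scaling logic can be made concrete. The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), skipped when the ratio is within a tolerance (0.1 by default) and bounded by the scale limits. The sketch below restates that in plain Java. The class and method names are illustrative, and the real controller also handles missing metrics, unready pods, and stabilization windows, none of which is modeled here.

```java
/** Illustrative restatement of the documented HPA replica calculation (not the controller's code). */
final class HpaMath {

    /** Default controller tolerance: ratios within 10% of 1.0 cause no change. */
    static final double TOLERANCE = 0.1;

    private HpaMath() {}

    static int desiredReplicas(int currentReplicas, double currentMetric, double targetMetric,
                               int minReplicas, int maxReplicas) {
        if (targetMetric <= 0) {
            throw new IllegalArgumentException("targetMetric must be positive");
        }
        double ratio = currentMetric / targetMetric;
        // Inside the tolerance band the replica count is left alone
        int desired = Math.abs(ratio - 1.0) <= TOLERANCE
                ? currentReplicas
                : (int) Math.ceil(currentReplicas * ratio);
        // Scale limits always win over the computed value
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }
}
```

For example, four pods averaging 90% CPU against a 60% target give a ratio of 1.5 and therefore six pods, while an average of 63% stays within the tolerance and leaves the count at four.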

Native HPA vs. Custom Scaling in Java

While Kubernetes provides a built-in HPA, Java applications often need custom scaling logic driven by business metrics, queue depths, or composite performance indicators.
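
As one example of such logic, a worker pool that drains a queue can size itself from the backlog rather than from CPU. The sketch below assumes a hypothetical per-pod capacity figure (how many queued messages one pod can work through within an acceptable delay). Both the QueueDepthScaler name and that parameter are assumptions made for illustration.

```java
/** Hypothetical helper: sizes a worker deployment from queue backlog. */
final class QueueDepthScaler {

    private QueueDepthScaler() {}

    /** Pods needed so that each one handles at most messagesPerPod queued messages. */
    static int replicasFor(long queueDepth, int messagesPerPod, int minReplicas, int maxReplicas) {
        if (messagesPerPod <= 0) {
            throw new IllegalArgumentException("messagesPerPod must be positive");
        }
        if (queueDepth < 0) {
            throw new IllegalArgumentException("queueDepth must not be negative");
        }
        // Ceiling division without floating point
        long needed = (queueDepth + messagesPerPod - 1) / messagesPerPod;
        return (int) Math.max(minReplicas, Math.min(maxReplicas, needed));
    }
}
```

With a capacity of 100 messages per pod, a backlog of 250 asks for three pods, an empty queue falls back to the minimum, and a very deep queue is capped at the maximum.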

Implementing HPA with Kubernetes Client

1. Project Setup and Dependencies

Maven Dependencies:

<dependencies>
    <dependency>
        <groupId>io.fabric8</groupId>
        <artifactId>kubernetes-client</artifactId>
        <version>6.8.1</version>
    </dependency>
    <dependency>
        <groupId>io.fabric8</groupId>
        <artifactId>kubernetes-model</artifactId>
        <version>6.8.1</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
        <version>3.1.0</version>
    </dependency>
</dependencies>

2. Creating HPA with Kubernetes Java Client

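// HpaManager.java
import io.fabric8.kubernetes.api.model.Quantity;
import io.fabric8.kubernetes.api.model.autoscaling.v2.CrossVersionObjectReferenceBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import org.springframework.stereotype.Component;

@Component
public class HpaManager {

    // The KubernetesClient must be exposed as a Spring bean elsewhere,
    // for example with new KubernetesClientBuilder().build()
    private final KubernetesClient kubernetesClient;

    public HpaManager(KubernetesClient kubernetesClient) {
        this.kubernetesClient = kubernetesClient;
    }

    // Creates an autoscaling/v2 HPA that targets average CPU utilization,
    // expressed as a percentage of each pod's CPU request
    public void createCpuBasedHPA(String namespace, String deploymentName,
                                  int minPods, int maxPods, int targetCpuUtilization) {
        HorizontalPodAutoscaler hpa = new HorizontalPodAutoscalerBuilder()
                .withNewMetadata()
                    .withName(deploymentName + "-hpa")
                    .withNamespace(namespace)
                .endMetadata()
                .withNewSpec()
                    .withScaleTargetRef(new CrossVersionObjectReferenceBuilder()
                            .withApiVersion("apps/v1")
                            .withKind("Deployment")
                            .withName(deploymentName)
                            .build())
                    .withMinReplicas(minPods)
                    .withMaxReplicas(maxPods)
                    .addNewMetric()
                        .withType("Resource")
                        .withNewResource()
                            .withName("cpu")
                            .withNewTarget()
                                .withType("Utilization")
                                .withAverageUtilization(targetCpuUtilization)
                            .endTarget()
                        .endResource()
                    .endMetric()
                .endSpec()
                .build();
        kubernetesClient.autoscaling().v2()
                .horizontalPodAutoscalers()
                .inNamespace(namespace)
                .create(hpa);
    }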
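    // Creates an HPA driven by a per-pod custom metric. The metric has to be
    // served through the custom metrics API (for example by a metrics adapter).
    public void createCustomMetricHPA(String namespace, String deploymentName,
                                      String metricName, int targetValue) {
        HorizontalPodAutoscaler hpa = new HorizontalPodAutoscalerBuilder()
                .withNewMetadata()
                    .withName(deploymentName + "-custom-hpa")
                    .withNamespace(namespace)
                .endMetadata()
                .withNewSpec()
                    .withScaleTargetRef(new CrossVersionObjectReferenceBuilder()
                            .withKind("Deployment")
                            .withName(deploymentName)
                            .withApiVersion("apps/v1")
                            .build())
                    .withMinReplicas(2)
                    .withMaxReplicas(10)
                    .addNewMetric()
                        .withType("Pods")
                        .withNewPods()
                            // autoscaling/v2 nests the metric identifier and the target;
                            // metricName/targetAverageValue belong to the older v2beta1 model
                            .withNewMetric()
                                .withName(metricName)
                            .endMetric()
                            .withNewTarget()
                                .withType("AverageValue")
                                .withAverageValue(new Quantity(String.valueOf(targetValue)))
                            .endTarget()
                        .endPods()
                    .endMetric()
                .endSpec()
                .build();
        kubernetesClient.autoscaling().v2()
                .horizontalPodAutoscalers()
                .inNamespace(namespace)
                .create(hpa);
    }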
    // Returns the HPA object (including its status), or null if it does not exist
    public HorizontalPodAutoscaler getHPAStatus(String namespace, String hpaName) {
        return kubernetesClient.autoscaling().v2()
                .horizontalPodAutoscalers()
                .inNamespace(namespace)
                .withName(hpaName)
                .get();
    }

    // Changes only the replica bounds of an existing HPA
    public void updateHPAScaling(String namespace, String hpaName,
                                 int minReplicas, int maxReplicas) {
        kubernetesClient.autoscaling().v2()
                .horizontalPodAutoscalers()
                .inNamespace(namespace)
                .withName(hpaName)
                .edit(h -> new HorizontalPodAutoscalerBuilder(h)
                        .editSpec()
                            .withMinReplicas(minReplicas)
                            .withMaxReplicas(maxReplicas)
                        .endSpec()
                        .build());
    }
}

3. Custom Metrics Exporter for Java Applications

// CustomMetricsExporter.java
import com.google.common.util.concurrent.AtomicDouble; // Guava; must be added to the build
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class CustomMetricsExporter {

    private final MeterRegistry meterRegistry;
    private final AtomicInteger activeUsers = new AtomicInteger(0);
    private final AtomicDouble requestQueueSize = new AtomicDouble(0.0);

    public CustomMetricsExporter(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
        initializeCustomMetrics();
    }

    private void initializeCustomMetrics() {
        // Gauge for active users: the state object and value function go to builder()
        Gauge.builder("app_active_users", activeUsers, AtomicInteger::get)
                .description("Number of active users")
                .register(meterRegistry);
        // Gauge for request queue size
        Gauge.builder("app_request_queue_size", requestQueueSize, AtomicDouble::get)
                .description("Size of request processing queue")
                .register(meterRegistry);
        // Custom business metric: a counter of processed transactions. A per-second
        // rate is derived from it by the monitoring backend, not stored as its own meter.
        Counter.builder("app_transactions_total")
                .description("Transactions processed")
                .baseUnit("transactions")
                .register(meterRegistry);
    }

    public void setActiveUsers(int users) {
        activeUsers.set(users);
    }

    public void setRequestQueueSize(double size) {
        requestQueueSize.set(size);
    }

    public void recordTransaction() {
        // Looks up the counter registered above (same name, no tags)
        meterRegistry.counter("app_transactions_total").increment();
    }

    // RequestProcessedEvent is an application-defined event type
    @EventListener
    public void handleRequestEvent(RequestProcessedEvent event) {
        // Update metrics based on application events
        recordTransaction();
    }
}
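// Configuration for scheduled metric updates
// CustomMetricsExporter is already registered through @Component, so it is not
// declared again as a @Bean here: a second definition with the same name is
// redundant and can fail startup when bean overriding is disabled (the Spring Boot default).
@Configuration
@EnableScheduling
public class MetricsConfig {
}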
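
A rule keyed on transactions per second needs a rate, while recordTransaction() only maintains a running total. Monitoring backends usually derive the rate from the counter. If the application has to compute it itself, one option is to sample the total periodically and divide the delta by the elapsed time. The RateSampler class below is a hypothetical helper written for this article, not part of Micrometer, and it takes the clock reading as a parameter so it can be tested deterministically.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: turns a monotonically increasing total into an events-per-second rate. */
final class RateSampler {

    private final AtomicLong total = new AtomicLong();
    private long lastTotal;
    private long lastSampleNanos;

    RateSampler(long startNanos) {
        this.lastSampleNanos = startNanos;
    }

    /** Called on every event, from any thread. */
    void record() {
        total.incrementAndGet();
    }

    /** Events per second since the previous sample; pass System.nanoTime() in production. */
    synchronized double sample(long nowNanos) {
        long current = total.get();
        long elapsedNanos = nowNanos - lastSampleNanos;
        double rate = elapsedNanos > 0
                ? (current - lastTotal) * 1_000_000_000.0 / elapsedNanos
                : 0.0;
        lastTotal = current;
        lastSampleNanos = nowNanos;
        return rate;
    }
}
```

A scheduled method could call sample(System.nanoTime()) at a fixed interval and publish the result through a gauge.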

4. Custom HPA Controller for Complex Scaling Logic

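// CustomHpaController.java
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Do not attach a native HPA to a deployment managed by this controller:
// both would write the replica count and override each other.
@Component
@EnableScheduling
public class CustomHpaController {

    private static final Logger log = LoggerFactory.getLogger(CustomHpaController.class);

    private final KubernetesClient kubernetesClient;
    private final ApplicationMetricsService metricsService; // application-defined metric lookup
    private final HpaManager hpaManager;

    public CustomHpaController(KubernetesClient kubernetesClient,
                               ApplicationMetricsService metricsService,
                               HpaManager hpaManager) {
        this.kubernetesClient = kubernetesClient;
        this.metricsService = metricsService;
        this.hpaManager = hpaManager;
    }

    @Scheduled(fixedDelay = 30000) // Run 30 seconds after the previous run completes
    public void performCustomScaling() {
        List<ApplicationScalingRule> rules = getScalingRules();
        for (ApplicationScalingRule rule : rules) {
            try {
                ScalingDecision decision = evaluateScalingRule(rule);
                if (decision.shouldScale()) {
                    executeScaling(rule.getDeploymentName(), decision);
                }
            } catch (RuntimeException e) {
                // One failing rule should not block the remaining rules
                log.warn("Skipping scaling for {}: {}", rule.getDeploymentName(), e.getMessage());
            }
        }
    }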
    // Steps one replica at a time, staying within the rule's replica bounds
    private ScalingDecision evaluateScalingRule(ApplicationScalingRule rule) {
        double currentMetricValue = metricsService.getMetricValue(rule.getMetricName());
        int currentReplicas = getCurrentReplicas(rule.getDeploymentName());
        if (currentMetricValue > rule.getScaleUpThreshold() &&
                currentReplicas < rule.getMaxReplicas()) {
            return new ScalingDecision(true, currentReplicas + 1, "Metric above threshold");
        }
        if (currentMetricValue < rule.getScaleDownThreshold() &&
                currentReplicas > rule.getMinReplicas()) {
            return new ScalingDecision(true, currentReplicas - 1, "Metric below threshold");
        }
        return ScalingDecision.noScaling();
    }

    private void executeScaling(String deploymentName, ScalingDecision decision) {
        kubernetesClient.apps().deployments()
                .inNamespace("default")
                .withName(deploymentName)
                .scale(decision.getTargetReplicas());
        log.info("Scaled deployment {} to {} replicas. Reason: {}",
                deploymentName, decision.getTargetReplicas(), decision.getReason());
    }

    private int getCurrentReplicas(String deploymentName) {
        Deployment deployment = kubernetesClient.apps().deployments()
                .inNamespace("default")
                .withName(deploymentName)
                .get();
        // get() returns null when the deployment does not exist
        if (deployment == null || deployment.getSpec() == null
                || deployment.getSpec().getReplicas() == null) {
            throw new IllegalStateException(
                    "Deployment not found or has no replica count: " + deploymentName);
        }
        return deployment.getSpec().getReplicas();
    }

    private List<ApplicationScalingRule> getScalingRules() {
        // In practice, this could come from a configuration file or database
        // Arguments: deployment, metric, scale-up threshold, scale-down threshold, min, max
        return List.of(
                new ApplicationScalingRule("user-service", "app_active_users",
                        50, 10, 2, 5),
                new ApplicationScalingRule("order-service", "app_transactions_per_second",
                        100, 20, 3, 10)
        );
    }
}
// Supporting classes
class ApplicationScalingRule {
    private final String deploymentName;
    private final String metricName;
    private final double scaleUpThreshold;
    private final double scaleDownThreshold;
    private final int minReplicas;
    private final int maxReplicas;
    // Constructor, getters...
}

class ScalingDecision {
    private final boolean shouldScale;
    private final int targetReplicas;
    private final String reason;
    // Constructor, getters...

    public static ScalingDecision noScaling() {
        return new ScalingDecision(false, -1, "No scaling required");
    }
}
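
The supporting classes above elide their constructors and getters. One compact way to fill them in, assuming Java 16 or later, is with records. The sketch below also pulls the threshold check into a pure function so it can be unit-tested without a cluster. The names ScalingRule, Decision, and ThresholdEvaluator are stand-ins chosen for this example, not the classes used in the controller.

```java
/** Record counterpart of ApplicationScalingRule (stand-in name). */
record ScalingRule(String deploymentName, String metricName,
                   double scaleUpThreshold, double scaleDownThreshold,
                   int minReplicas, int maxReplicas) {}

/** Record counterpart of ScalingDecision (stand-in name). */
record Decision(boolean shouldScale, int targetReplicas, String reason) {
    static Decision noScaling() {
        return new Decision(false, -1, "No scaling required");
    }
}

final class ThresholdEvaluator {

    private ThresholdEvaluator() {}

    /** Same one-step threshold logic as evaluateScalingRule, with inputs passed in. */
    static Decision evaluate(ScalingRule rule, double metricValue, int currentReplicas) {
        if (metricValue > rule.scaleUpThreshold() && currentReplicas < rule.maxReplicas()) {
            return new Decision(true, currentReplicas + 1, "Metric above threshold");
        }
        if (metricValue < rule.scaleDownThreshold() && currentReplicas > rule.minReplicas()) {
            return new Decision(true, currentReplicas - 1, "Metric below threshold");
        }
        return Decision.noScaling();
    }
}
```

Passing the metric value and replica count as arguments keeps the decision independent of the Kubernetes client, so the controller only has to fetch the inputs and apply the result.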

5. HPA Configuration Service

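// HpaConfigurationService.java
import io.fabric8.kubernetes.api.model.autoscaling.v2.CrossVersionObjectReferenceBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerSpecBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerStatus;
import io.fabric8.kubernetes.api.model.autoscaling.v2.MetricStatus;
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service
public class HpaConfigurationService {

    private static final Logger log = LoggerFactory.getLogger(HpaConfigurationService.class);

    private final KubernetesClient kubernetesClient;

    public HpaConfigurationService(KubernetesClient kubernetesClient) {
        this.kubernetesClient = kubernetesClient;
    }

    public void configureHpaForDeployment(HpaConfigRequest request) {
        validateHpaConfig(request);
        HorizontalPodAutoscaler hpa = buildHpaFromRequest(request);
        // Apply the HPA. createOrReplace is deprecated in recent fabric8 6.x releases;
        // resource(hpa).serverSideApply() is one alternative to consider.
        kubernetesClient.autoscaling().v2()
                .horizontalPodAutoscalers()
                .inNamespace(request.getNamespace())
                .createOrReplace(hpa);
        log.info("Configured HPA for deployment {} in namespace {}",
                request.getDeploymentName(), request.getNamespace());
    }

    public HpaStatus getHpaStatus(String namespace, String deploymentName) {
        HorizontalPodAutoscaler hpa = kubernetesClient.autoscaling().v2()
                .horizontalPodAutoscalers()
                .inNamespace(namespace)
                .withName(deploymentName + "-hpa")
                .get();
        if (hpa == null) {
            throw new HpaNotFoundException("HPA not found for deployment: " + deploymentName);
        }
        return extractHpaStatus(hpa);
    }

    // A minimal mapping sketch. It assumes HpaStatus has a no-argument constructor
    // and the conventional setters elided in the DTO below.
    private HpaStatus extractHpaStatus(HorizontalPodAutoscaler hpa) {
        HpaStatus result = new HpaStatus();
        HorizontalPodAutoscalerStatus status = hpa.getStatus();
        if (status == null) {
            return result; // the controller has not reported a status yet
        }
        Integer current = status.getCurrentReplicas();
        Integer desired = status.getDesiredReplicas();
        result.setCurrentReplicas(current != null ? current : 0);
        result.setDesiredReplicas(desired != null ? desired : 0);
        result.setMetrics(status.getCurrentMetrics());
        if (status.getConditions() != null) {
            result.setConditions(status.getConditions().stream()
                    .map(c -> c.getType() + "=" + c.getStatus())
                    .collect(Collectors.joining(", ")));
        }
        return result;
    }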
    private HorizontalPodAutoscaler buildHpaFromRequest(HpaConfigRequest request) {
        // Build the spec with its own builder so metrics can be added in a loop;
        // the fluent chain after withNewSpec() returns a nested type, not the HPA builder
        HorizontalPodAutoscalerSpecBuilder spec = new HorizontalPodAutoscalerSpecBuilder()
                .withScaleTargetRef(new CrossVersionObjectReferenceBuilder()
                        .withApiVersion("apps/v1")
                        .withKind("Deployment")
                        .withName(request.getDeploymentName())
                        .build())
                .withMinReplicas(request.getMinReplicas())
                .withMaxReplicas(request.getMaxReplicas());
        // Add metrics
        for (MetricConfig metric : request.getMetrics()) {
            // Only resource metrics (cpu, memory) fit the Utilization target built here
            if (!"Resource".equals(metric.getType())) {
                throw new InvalidHpaConfigException("Unsupported metric type: " + metric.getType());
            }
            spec.addNewMetric()
                    .withType(metric.getType())
                    .withNewResource()
                        .withName(metric.getResourceName())
                        .withNewTarget()
                            .withType("Utilization")
                            .withAverageUtilization(metric.getTargetValue())
                        .endTarget()
                    .endResource()
                    .endMetric();
        }
        return new HorizontalPodAutoscalerBuilder()
                .withNewMetadata()
                    .withName(request.getDeploymentName() + "-hpa")
                    .withNamespace(request.getNamespace())
                    .withLabels(request.getLabels())
                .endMetadata()
                .withSpec(spec.build())
                .build();
    }
    private void validateHpaConfig(HpaConfigRequest request) {
        if (request.getMinReplicas() > request.getMaxReplicas()) {
            throw new InvalidHpaConfigException("Min replicas cannot be greater than max replicas");
        }
        if (request.getMinReplicas() < 1) {
            throw new InvalidHpaConfigException("Min replicas must be at least 1");
        }
    }
}

// DTO classes
class HpaConfigRequest {
    private String namespace;
    private String deploymentName;
    private int minReplicas;
    private int maxReplicas;
    private Map<String, String> labels;
    private List<MetricConfig> metrics;
    // Constructors, getters, setters...
}

class MetricConfig {
    private String type;
    private String resourceName;
    private int targetValue;
    // Constructors, getters, setters...
}

class HpaStatus {
    private int currentReplicas;
    private int desiredReplicas;
    private List<MetricStatus> metrics;
    private String conditions;
    // Constructors, getters, setters...
}

6. REST API for HPA Management

// HpaController.java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/v1/hpa")
public class HpaController {

    private final HpaConfigurationService hpaService;
    private final HpaManager hpaManager;

    public HpaController(HpaConfigurationService hpaService, HpaManager hpaManager) {
        this.hpaService = hpaService;
        this.hpaManager = hpaManager;
    }

    @PostMapping("/configure")
    public ResponseEntity<String> configureHpa(@RequestBody HpaConfigRequest request) {
        try {
            hpaService.configureHpaForDeployment(request);
            return ResponseEntity.ok("HPA configured successfully");
        } catch (Exception e) {
            return ResponseEntity.badRequest().body("Failed to configure HPA: " + e.getMessage());
        }
    }

    @GetMapping("/status/{namespace}/{deployment}")
    public ResponseEntity<HpaStatus> getHpaStatus(
            @PathVariable String namespace,
            @PathVariable String deployment) {
        try {
            HpaStatus status = hpaService.getHpaStatus(namespace, deployment);
            return ResponseEntity.ok(status);
        } catch (HpaNotFoundException e) {
            return ResponseEntity.notFound().build();
        }
    }

    // "Manual" scaling here pins the HPA by setting min and max to the same value;
    // the HPA then holds the deployment at that replica count
    @PostMapping("/scale/{namespace}/{deployment}")
    public ResponseEntity<String> manualScale(
            @PathVariable String namespace,
            @PathVariable String deployment,
            @RequestParam int replicas) {
        if (replicas < 1) {
            return ResponseEntity.badRequest().body("Scaling failed: replicas must be at least 1");
        }
        try {
            hpaManager.updateHPAScaling(namespace, deployment + "-hpa", replicas, replicas);
            return ResponseEntity.ok("Manual scaling initiated");
        } catch (Exception e) {
            return ResponseEntity.badRequest().body("Scaling failed: " + e.getMessage());
        }
    }
}
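
For a quick smoke test of these endpoints from another JVM process, the JDK's built-in HTTP client is enough. The sketch below only builds the requests. The base URL http://localhost:8080, the JSON values, and the HpaApiRequests helper are assumptions made for illustration; the JSON field names mirror the HpaConfigRequest and MetricConfig fields shown earlier. It uses a text block, so it assumes Java 15 or later.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical request factory for the HPA management API. */
final class HpaApiRequests {

    // Assumed local address of the Spring Boot application
    static final String BASE = "http://localhost:8080/api/v1/hpa";

    private HpaApiRequests() {}

    static HttpRequest configure(String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/configure"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    static HttpRequest status(String namespace, String deployment) {
        return HttpRequest.newBuilder(URI.create(BASE + "/status/" + namespace + "/" + deployment))
                .GET()
                .build();
    }

    /** Example payload with illustrative values. */
    static String exampleBody() {
        return """
                {
                  "namespace": "default",
                  "deploymentName": "user-service",
                  "minReplicas": 2,
                  "maxReplicas": 10,
                  "labels": {"app": "user-service"},
                  "metrics": [{"type": "Resource", "resourceName": "cpu", "targetValue": 70}]
                }
                """;
    }
}
```

Each request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).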

Best Practices for HPA with Java Applications

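Proper Resource Requests: Always set CPU and memory requests; utilization targets are computed as a percentage of the request, so the HPA cannot evaluate them without one
Warm-up Periods: Account for JVM warm-up (class loading, JIT compilation) so that freshly started pods are not judged on their startup CPU spike
Custom Metrics: Use application-specific metrics such as queue depth or request rate when CPU does not track load
Conservative Scaling: Avoid aggressive scaling that causes thrashing; prefer slower scale-down than scale-up
Monitoring: Track scaling events and replica counts alongside the metrics that drive them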
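
Conservative scaling usually comes down to refusing to act again too soon. The native HPA expresses this through the behavior section of the autoscaling/v2 spec and its stabilization windows. For a custom controller, a per-deployment cooldown is a simple equivalent. ScalingCooldown below is a hypothetical helper, and the three-minute value in the usage note is only an example; a sensible cooldown is at least as long as the JVM warm-up time of the service.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: allows at most one scaling action per deployment per cooldown period. */
final class ScalingCooldown {

    private final Duration cooldown;
    private final Map<String, Instant> lastScaled = new ConcurrentHashMap<>();

    ScalingCooldown(Duration cooldown) {
        this.cooldown = cooldown;
    }

    /** Returns true, and records the time, if the deployment may be scaled at the given instant. */
    boolean tryAcquire(String deployment, Instant now) {
        boolean[] allowed = {false};
        // compute() makes the check-and-update atomic per key
        lastScaled.compute(deployment, (name, last) -> {
            if (last == null || !now.isBefore(last.plus(cooldown))) {
                allowed[0] = true;
                return now;
            }
            return last;
        });
        return allowed[0];
    }
}
```

A controller would create one instance, for example new ScalingCooldown(Duration.ofMinutes(3)), and call tryAcquire(deploymentName, Instant.now()) before applying a scaling decision.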

# Example HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: java-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-application
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Pods
    pods:
      metric:
        name: app_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Horizontal Pod Autoscaling is a powerful tool for Java applications running in Kubernetes. By combining native HPA capabilities with custom Java-based scaling logic, you can create highly responsive and efficient auto-scaling systems that optimize both performance and cost.