The Horizontal Pod Autoscaler (HPA) is a critical Kubernetes component that automatically scales the number of pods in a deployment based on observed CPU utilization or other custom metrics. For Java applications, understanding and implementing HPA is essential for building responsive, cost-effective cloud-native systems.
Understanding Horizontal Pod Autoscaler
HPA automatically adjusts the number of pod replicas in a deployment, replica set, or stateful set based on observed metrics. The key components are:
- Metrics Source: CPU and memory from the resource metrics API (typically served by metrics-server), or custom and external metrics exposed through a metrics adapter
- Scaling Logic: Algorithm that determines when and how to scale
- Target Value: The desired metric value that HPA tries to maintain
- Scale Limits: Minimum and maximum number of pods
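The scaling logic can be illustrated with the replica formula described in the Kubernetes documentation: the desired count is the current count multiplied by the ratio of the observed metric to the target, rounded up. The sketch below is a simplified illustration, not the controller's actual code. The class and method names are hypothetical, the 10% tolerance is the documented default, and details such as unready pods and missing metrics are ignored.

```java
/** Simplified illustration of the HPA replica formula (hypothetical helper, not Kubernetes code). */
final class HpaReplicaMath {
    // Documented default tolerance: ratios within 10% of 1.0 do not trigger scaling.
    private static final double TOLERANCE = 0.10;

    private HpaReplicaMath() {}

    /** desired = ceil(currentReplicas * currentMetric / targetMetric), clamped to [min, max]. */
    public static int desiredReplicas(int currentReplicas, double currentMetric,
                                      double targetMetric, int minReplicas, int maxReplicas) {
        if (targetMetric <= 0) {
            throw new IllegalArgumentException("targetMetric must be positive");
        }
        double ratio = currentMetric / targetMetric;
        if (Math.abs(ratio - 1.0) <= TOLERANCE) {
            return currentReplicas; // close enough to the target, keep the current count
        }
        int desired = (int) Math.ceil(currentReplicas * ratio);
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 4 pods at 90% average CPU with a 60% target: ceil(4 * 1.5) = 6
        System.out.println(desiredReplicas(4, 90.0, 60.0, 2, 10));
    }
}
```

Because the result is rounded up and then clamped, a small overshoot above the tolerance band adds at least one pod, while the scale limits always bound the outcome.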
Native HPA vs. Custom Scaling in Java
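While Kubernetes provides built-in HPA, Java applications often need custom scaling logic for business metrics, queue depths, or complex performance indicators. If you run custom scaling logic, avoid pointing it and a native HPA at the same Deployment, since both would write the replica count and override each other.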
Implementing HPA with Kubernetes Client
1. Project Setup and Dependencies
Maven Dependencies:
<dependencies>
    <dependency>
        <groupId>io.fabric8</groupId>
        <artifactId>kubernetes-client</artifactId>
        <version>6.8.1</version>
    </dependency>
    <dependency>
        <groupId>io.fabric8</groupId>
        <artifactId>kubernetes-model</artifactId>
        <version>6.8.1</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <version>3.1.0</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
        <version>3.1.0</version>
    </dependency>
</dependencies>
2. Creating HPA with Kubernetes Java Client
// HpaManager.java
import io.fabric8.kubernetes.api.model.Quantity;
import io.fabric8.kubernetes.api.model.autoscaling.v2.CrossVersionObjectReferenceBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import org.springframework.stereotype.Component;

@Component
public class HpaManager {
    private final KubernetesClient kubernetesClient;

    public HpaManager(KubernetesClient kubernetesClient) {
        this.kubernetesClient = kubernetesClient;
    }

    // Creates an HPA that targets average CPU utilization (percent of the CPU request)
    public void createCpuBasedHPA(String namespace, String deploymentName,
                                  int minPods, int maxPods, int targetCpuUtilization) {
        HorizontalPodAutoscaler hpa = new HorizontalPodAutoscalerBuilder()
            .withNewMetadata()
                .withName(deploymentName + "-hpa")
                .withNamespace(namespace)
            .endMetadata()
            .withNewSpec()
                .withScaleTargetRef(new CrossVersionObjectReferenceBuilder()
                    .withApiVersion("apps/v1")
                    .withKind("Deployment")
                    .withName(deploymentName)
                    .build())
                .withMinReplicas(minPods)
                .withMaxReplicas(maxPods)
                .addNewMetric()
                    .withType("Resource")
                    .withNewResource()
                        .withName("cpu")
                        .withNewTarget()
                            .withType("Utilization")
                            .withAverageUtilization(targetCpuUtilization)
                        .endTarget()
                    .endResource()
                .endMetric()
            .endSpec()
            .build();

        // create(item) is deprecated in client 6.x; resource(item).create() is the newer form
        kubernetesClient.autoscaling().v2()
            .horizontalPodAutoscalers()
            .inNamespace(namespace)
            .create(hpa);
    }

    // Creates an HPA driven by a per-pod custom metric (requires a custom metrics adapter)
    public void createCustomMetricHPA(String namespace, String deploymentName,
                                      String metricName, int targetValue) {
        HorizontalPodAutoscaler hpa = new HorizontalPodAutoscalerBuilder()
            .withNewMetadata()
                .withName(deploymentName + "-custom-hpa")
                .withNamespace(namespace)
            .endMetadata()
            .withNewSpec()
                .withScaleTargetRef(new CrossVersionObjectReferenceBuilder()
                    .withKind("Deployment")
                    .withName(deploymentName)
                    .withApiVersion("apps/v1")
                    .build())
                .withMinReplicas(2)
                .withMaxReplicas(10)
                .addNewMetric()
                    .withType("Pods")
                    .withNewPods()
                        // autoscaling/v2 splits a Pods metric into an identifier and a target
                        .withNewMetric()
                            .withName(metricName)
                        .endMetric()
                        .withNewTarget()
                            .withType("AverageValue")
                            .withAverageValue(new Quantity(String.valueOf(targetValue)))
                        .endTarget()
                    .endPods()
                .endMetric()
            .endSpec()
            .build();

        kubernetesClient.autoscaling().v2()
            .horizontalPodAutoscalers()
            .inNamespace(namespace)
            .create(hpa);
    }

    // Returns the HPA including its status, or null if it does not exist
    public HorizontalPodAutoscaler getHPAStatus(String namespace, String hpaName) {
        return kubernetesClient.autoscaling().v2()
            .horizontalPodAutoscalers()
            .inNamespace(namespace)
            .withName(hpaName)
            .get();
    }

    // Updates the replica bounds of an existing HPA
    public void updateHPAScaling(String namespace, String hpaName,
                                 int minReplicas, int maxReplicas) {
        kubernetesClient.autoscaling().v2()
            .horizontalPodAutoscalers()
            .inNamespace(namespace)
            .withName(hpaName)
            .edit(h -> new HorizontalPodAutoscalerBuilder(h)
                .editSpec()
                    .withMinReplicas(minReplicas)
                    .withMaxReplicas(maxReplicas)
                .endSpec()
                .build());
    }
}
3. Custom Metrics Exporter for Java Applications
// CustomMetricsExporter.java
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class CustomMetricsExporter {
    private final AtomicInteger activeUsers = new AtomicInteger(0);
    // A volatile double avoids AtomicDouble, which is a Guava class and not among the listed dependencies
    private volatile double requestQueueSize = 0.0;
    private final Counter transactions;

    public CustomMetricsExporter(MeterRegistry meterRegistry) {
        // Gauge for active users: the builder takes the observed object and a value function
        Gauge.builder("app_active_users", activeUsers, AtomicInteger::get)
            .description("Number of active users")
            .register(meterRegistry);

        // Gauge for request queue size
        Gauge.builder("app_request_queue_size", this, exporter -> exporter.requestQueueSize)
            .description("Size of request processing queue")
            .register(meterRegistry);

        // Cumulative transaction counter; a per-second rate is derived from it
        // by the monitoring backend (for example with PromQL rate())
        this.transactions = Counter.builder("app_transactions_total")
            .description("Total transactions processed")
            .baseUnit("transactions")
            .register(meterRegistry);
    }

    public void setActiveUsers(int users) {
        activeUsers.set(users);
    }

    public void setRequestQueueSize(double size) {
        requestQueueSize = size;
    }

    public void recordTransaction() {
        transactions.increment();
    }

    // RequestProcessedEvent is an application-defined event type
    @EventListener
    public void handleRequestEvent(RequestProcessedEvent event) {
        // Update metrics based on application events
        recordTransaction();
    }
}
// Configuration for metrics and scheduling.
// CustomMetricsExporter is already registered through @Component, so declaring a second
// @Bean method with the same name would produce a conflicting bean definition.
@Configuration
@EnableScheduling
public class MetricsConfig {
}
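The exporter above increments a cumulative counter (app_transactions_total), while the scaling rules in the next section read a per-second value. A monitoring backend normally derives the rate, but if the application has to compute it itself, one possible approach is to sample the counter periodically and divide the delta by the elapsed time. The class below is a hypothetical helper written for this illustration and is not part of Micrometer.

```java
/** Hypothetical helper: turns samples of a cumulative counter into a per-second rate. */
final class CounterRate {
    private long lastCount;
    private long lastNanos;
    private boolean primed;

    /**
     * Feeds one sample of the cumulative counter and returns events per second
     * since the previous sample. Returns 0.0 for the first sample, after a
     * counter reset, or when no time has elapsed.
     */
    public synchronized double sample(long totalCount, long nowNanos) {
        double rate = 0.0;
        if (primed && totalCount >= lastCount && nowNanos > lastNanos) {
            double elapsedSeconds = (nowNanos - lastNanos) / 1_000_000_000.0;
            rate = (totalCount - lastCount) / elapsedSeconds;
        }
        lastCount = totalCount;
        lastNanos = nowNanos;
        primed = true;
        return rate;
    }

    public static void main(String[] args) {
        CounterRate rate = new CounterRate();
        rate.sample(0, 0L);                                      // first sample only primes the state
        System.out.println(rate.sample(100, 2_000_000_000L));    // 100 events in 2 s, prints 50.0
    }
}
```

In a Spring application the sample call could be driven by a scheduled method that reads the counter's current count and passes System.nanoTime() as the timestamp.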
4. Custom HPA Controller for Complex Scaling Logic
// CustomHpaController.java
import io.fabric8.kubernetes.api.model.apps.Deployment;
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

@Component
@EnableScheduling
public class CustomHpaController {
    private static final Logger log = LoggerFactory.getLogger(CustomHpaController.class);
    // The namespace is hard-coded for brevity
    private static final String NAMESPACE = "default";

    private final KubernetesClient kubernetesClient;
    // ApplicationMetricsService is an application-defined lookup for current metric values
    private final ApplicationMetricsService metricsService;
    private final HpaManager hpaManager;

    public CustomHpaController(KubernetesClient kubernetesClient,
                               ApplicationMetricsService metricsService,
                               HpaManager hpaManager) {
        this.kubernetesClient = kubernetesClient;
        this.metricsService = metricsService;
        this.hpaManager = hpaManager;
    }

    // fixedDelay: the next run starts 30 seconds after the previous run finishes
    @Scheduled(fixedDelay = 30000)
    public void performCustomScaling() {
        for (ApplicationScalingRule rule : getScalingRules()) {
            try {
                ScalingDecision decision = evaluateScalingRule(rule);
                if (decision.shouldScale()) {
                    executeScaling(rule.getDeploymentName(), decision);
                }
            } catch (RuntimeException e) {
                // One failing rule should not block the remaining rules
                log.error("Scaling evaluation failed for {}", rule.getDeploymentName(), e);
            }
        }
    }

    // Moves one replica at a time toward the thresholds, within the rule's replica bounds
    private ScalingDecision evaluateScalingRule(ApplicationScalingRule rule) {
        double currentMetricValue = metricsService.getMetricValue(rule.getMetricName());
        int currentReplicas = getCurrentReplicas(rule.getDeploymentName());

        if (currentMetricValue > rule.getScaleUpThreshold() &&
                currentReplicas < rule.getMaxReplicas()) {
            return new ScalingDecision(true, currentReplicas + 1, "Metric above threshold");
        }
        if (currentMetricValue < rule.getScaleDownThreshold() &&
                currentReplicas > rule.getMinReplicas()) {
            return new ScalingDecision(true, currentReplicas - 1, "Metric below threshold");
        }
        return ScalingDecision.noScaling();
    }

    private void executeScaling(String deploymentName, ScalingDecision decision) {
        kubernetesClient.apps().deployments()
            .inNamespace(NAMESPACE)
            .withName(deploymentName)
            .scale(decision.getTargetReplicas());
        log.info("Scaled deployment {} to {} replicas. Reason: {}",
            deploymentName, decision.getTargetReplicas(), decision.getReason());
    }

    private int getCurrentReplicas(String deploymentName) {
        Deployment deployment = kubernetesClient.apps().deployments()
            .inNamespace(NAMESPACE)
            .withName(deploymentName)
            .get();
        if (deployment == null) {
            // get() returns null when the resource does not exist
            throw new IllegalStateException("Deployment not found: " + deploymentName);
        }
        return deployment.getSpec().getReplicas();
    }

    private List<ApplicationScalingRule> getScalingRules() {
        // In practice, this could come from a configuration file or database.
        // Arguments: deployment, metric, scale-up threshold, scale-down threshold, min, max
        return List.of(
            new ApplicationScalingRule("user-service", "app_active_users",
                50, 10, 2, 5),
            new ApplicationScalingRule("order-service", "app_transactions_per_second",
                100, 20, 3, 10)
        );
    }
}
// Supporting classes
class ApplicationScalingRule {
    private final String deploymentName;
    private final String metricName;
    private final double scaleUpThreshold;
    private final double scaleDownThreshold;
    private final int minReplicas;
    private final int maxReplicas;
    // Constructor, getters...
}

class ScalingDecision {
    private final boolean shouldScale;
    private final int targetReplicas;
    private final String reason;
    // Constructor, getters...

    public static ScalingDecision noScaling() {
        return new ScalingDecision(false, -1, "No scaling required");
    }
}
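To make the threshold rule easier to try in isolation, the following is a minimal, dependency-free sketch of the same step-by-one decision. The names ThresholdScaling, Rule, and targetReplicas are hypothetical stand-ins for the classes above. The constructor check that the scale-down threshold sits below the scale-up threshold is an added assumption, since a gap between the two is what keeps the rule from flapping.

```java
/** Hypothetical, self-contained version of the threshold rule used by the custom controller. */
final class ThresholdScaling {

    /** Stand-in for ApplicationScalingRule, reduced to the fields the decision needs. */
    public static final class Rule {
        final double scaleUpThreshold;
        final double scaleDownThreshold;
        final int minReplicas;
        final int maxReplicas;

        public Rule(double scaleUpThreshold, double scaleDownThreshold,
                    int minReplicas, int maxReplicas) {
            if (scaleDownThreshold >= scaleUpThreshold) {
                throw new IllegalArgumentException("scaleDownThreshold must be below scaleUpThreshold");
            }
            if (minReplicas < 1 || minReplicas > maxReplicas) {
                throw new IllegalArgumentException("require 1 <= minReplicas <= maxReplicas");
            }
            this.scaleUpThreshold = scaleUpThreshold;
            this.scaleDownThreshold = scaleDownThreshold;
            this.minReplicas = minReplicas;
            this.maxReplicas = maxReplicas;
        }
    }

    private ThresholdScaling() {}

    /** Returns the target replica count; a result equal to currentReplicas means no scaling. */
    public static int targetReplicas(Rule rule, double metricValue, int currentReplicas) {
        if (metricValue > rule.scaleUpThreshold && currentReplicas < rule.maxReplicas) {
            return currentReplicas + 1; // add one replica per evaluation
        }
        if (metricValue < rule.scaleDownThreshold && currentReplicas > rule.minReplicas) {
            return currentReplicas - 1; // remove one replica per evaluation
        }
        return currentReplicas; // between the thresholds, or already at a bound
    }

    public static void main(String[] args) {
        Rule userService = new Rule(50, 10, 2, 5);
        System.out.println(targetReplicas(userService, 60, 3)); // above threshold: 4
        System.out.println(targetReplicas(userService, 30, 3)); // between thresholds: 3
    }
}
```

Values between the two thresholds leave the replica count unchanged, so the width of that band controls how sensitive the controller is to short metric fluctuations.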
5. HPA Configuration Service
// HpaConfigurationService.java
import io.fabric8.kubernetes.api.model.autoscaling.v2.CrossVersionObjectReferenceBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscaler;
import io.fabric8.kubernetes.api.model.autoscaling.v2.HorizontalPodAutoscalerBuilder;
import io.fabric8.kubernetes.api.model.autoscaling.v2.MetricSpec;
import io.fabric8.kubernetes.api.model.autoscaling.v2.MetricSpecBuilder;
import io.fabric8.kubernetes.client.KubernetesClient;
import java.util.ArrayList;
import java.util.List;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

// HpaNotFoundException and InvalidHpaConfigException are application-defined runtime exceptions
@Service
public class HpaConfigurationService {
    private static final Logger log = LoggerFactory.getLogger(HpaConfigurationService.class);

    private final KubernetesClient kubernetesClient;

    public HpaConfigurationService(KubernetesClient kubernetesClient) {
        this.kubernetesClient = kubernetesClient;
    }

    public void configureHpaForDeployment(HpaConfigRequest request) {
        validateHpaConfig(request);
        HorizontalPodAutoscaler hpa = buildHpaFromRequest(request);

        // Apply the HPA
        kubernetesClient.autoscaling().v2()
            .horizontalPodAutoscalers()
            .inNamespace(request.getNamespace())
            .createOrReplace(hpa);
        log.info("Configured HPA for deployment {} in namespace {}",
            request.getDeploymentName(), request.getNamespace());
    }

    public HpaStatus getHpaStatus(String namespace, String deploymentName) {
        HorizontalPodAutoscaler hpa = kubernetesClient.autoscaling().v2()
            .horizontalPodAutoscalers()
            .inNamespace(namespace)
            .withName(deploymentName + "-hpa")
            .get();
        if (hpa == null) {
            throw new HpaNotFoundException("HPA not found for deployment: " + deploymentName);
        }
        // extractHpaStatus maps hpa.getStatus() onto the HpaStatus DTO; its implementation is not shown
        return extractHpaStatus(hpa);
    }

    private HorizontalPodAutoscaler buildHpaFromRequest(HpaConfigRequest request) {
        // Build the metric list first, then attach it in a single builder chain.
        // Only the resource source is populated, so metric.getType() must be "Resource".
        List<MetricSpec> metrics = new ArrayList<>();
        for (MetricConfig metric : request.getMetrics()) {
            metrics.add(new MetricSpecBuilder()
                .withType(metric.getType())
                .withNewResource()
                    .withName(metric.getResourceName())
                    .withNewTarget()
                        .withType("Utilization")
                        .withAverageUtilization(metric.getTargetValue())
                    .endTarget()
                .endResource()
                .build());
        }

        return new HorizontalPodAutoscalerBuilder()
            .withNewMetadata()
                .withName(request.getDeploymentName() + "-hpa")
                .withNamespace(request.getNamespace())
                .withLabels(request.getLabels())
            .endMetadata()
            .withNewSpec()
                .withScaleTargetRef(new CrossVersionObjectReferenceBuilder()
                    .withApiVersion("apps/v1")
                    .withKind("Deployment")
                    .withName(request.getDeploymentName())
                    .build())
                .withMinReplicas(request.getMinReplicas())
                .withMaxReplicas(request.getMaxReplicas())
                .withMetrics(metrics)
            .endSpec()
            .build();
    }

    private void validateHpaConfig(HpaConfigRequest request) {
        if (request.getMinReplicas() > request.getMaxReplicas()) {
            throw new InvalidHpaConfigException("Min replicas cannot be greater than max replicas");
        }
        if (request.getMinReplicas() < 1) {
            throw new InvalidHpaConfigException("Min replicas must be at least 1");
        }
    }
}
// DTO classes
class HpaConfigRequest {
    private String namespace;
    private String deploymentName;
    private int minReplicas;
    private int maxReplicas;
    private Map<String, String> labels;
    private List<MetricConfig> metrics;
    // Constructors, getters, setters...
}

class MetricConfig {
    private String type;
    private String resourceName;
    private int targetValue;
    // Constructors, getters, setters...
}

class HpaStatus {
    private int currentReplicas;
    private int desiredReplicas;
    private List<MetricStatus> metrics;
    private String conditions;
    // Constructors, getters, setters...
}
6. REST API for HPA Management
// HpaController.java
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;

@RestController
@RequestMapping("/api/v1/hpa")
public class HpaController {
    private final HpaConfigurationService hpaService;
    private final HpaManager hpaManager;

    public HpaController(HpaConfigurationService hpaService, HpaManager hpaManager) {
        this.hpaService = hpaService;
        this.hpaManager = hpaManager;
    }

    @PostMapping("/configure")
    public ResponseEntity<String> configureHpa(@RequestBody HpaConfigRequest request) {
        try {
            hpaService.configureHpaForDeployment(request);
            return ResponseEntity.ok("HPA configured successfully");
        } catch (Exception e) {
            return ResponseEntity.badRequest().body("Failed to configure HPA: " + e.getMessage());
        }
    }

    @GetMapping("/status/{namespace}/{deployment}")
    public ResponseEntity<HpaStatus> getHpaStatus(
            @PathVariable String namespace,
            @PathVariable String deployment) {
        try {
            HpaStatus status = hpaService.getHpaStatus(namespace, deployment);
            return ResponseEntity.ok(status);
        } catch (HpaNotFoundException e) {
            return ResponseEntity.notFound().build();
        }
    }

    // "Manual" scaling pins the HPA by setting min and max replicas to the same value;
    // the HPA stays pinned until its bounds are reconfigured
    @PostMapping("/scale/{namespace}/{deployment}")
    public ResponseEntity<String> manualScale(
            @PathVariable String namespace,
            @PathVariable String deployment,
            @RequestParam int replicas) {
        try {
            hpaManager.updateHPAScaling(namespace, deployment + "-hpa", replicas, replicas);
            return ResponseEntity.ok("Manual scaling initiated");
        } catch (Exception e) {
            return ResponseEntity.badRequest().body("Scaling failed: " + e.getMessage());
        }
    }
}
Best Practices for HPA with Java Applications
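- Proper Resource Requests: Always set CPU and memory requests on the containers; HPA computes utilization as a percentage of the requested amount, so a Utilization target cannot be evaluated without them
- Warm-up Periods: Account for JVM warm-up time in scaling decisions, since a freshly started pod can show high CPU during class loading and JIT compilation before it serves traffic at full speed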
- Custom Metrics: Use application-specific metrics for meaningful scaling
- Conservative Scaling: Avoid aggressive scaling that causes thrashing
- Monitoring: Implement comprehensive monitoring of scaling events
# Example HPA configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: java-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: java-application
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: app_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
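The advice on conservative scaling can also be applied to custom Java scaling logic. Native HPA offers a scale-down stabilization window (behavior.scaleDown.stabilizationWindowSeconds in autoscaling/v2, 300 seconds by default), which acts on the highest recommendation seen within the window. The sketch below imitates that idea under simplifying assumptions; the class name ScaleDownStabilizer and its API are hypothetical and do not reproduce the controller's implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper: delays scale-down by acting on the highest recent recommendation. */
final class ScaleDownStabilizer {
    private final long windowMillis;
    // Each entry is {timestampMillis, recommendedReplicas}, oldest first
    private final Deque<long[]> history = new ArrayDeque<>();

    public ScaleDownStabilizer(long windowMillis) {
        if (windowMillis < 0) {
            throw new IllegalArgumentException("windowMillis must not be negative");
        }
        this.windowMillis = windowMillis;
    }

    /**
     * Records a new recommendation and returns the highest recommendation seen
     * within the window. Scale-up takes effect immediately; scale-down only
     * happens once every higher recommendation has aged out of the window.
     */
    public synchronized int stabilize(int recommendation, long nowMillis) {
        history.addLast(new long[] {nowMillis, recommendation});
        while (nowMillis - history.peekFirst()[0] > windowMillis) {
            history.removeFirst(); // drop entries older than the window
        }
        int max = recommendation;
        for (long[] entry : history) {
            max = Math.max(max, (int) entry[1]);
        }
        return max;
    }

    public static void main(String[] args) {
        ScaleDownStabilizer stabilizer = new ScaleDownStabilizer(300_000); // 5 minutes
        System.out.println(stabilizer.stabilize(5, 0));        // 5
        System.out.println(stabilizer.stabilize(3, 60_000));   // still 5, the drop is too recent
        System.out.println(stabilizer.stabilize(3, 301_000));  // 3, the higher value has expired
    }
}
```

A custom controller could pass each computed target through such a stabilizer before calling the scale operation, which trades slower scale-down for fewer replica changes.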
Horizontal Pod Autoscaling is a powerful tool for Java applications running in Kubernetes. By combining native HPA capabilities with custom Java-based scaling logic, you can create highly responsive and efficient auto-scaling systems that optimize both performance and cost.