Introduction
Running Java applications in Kubernetes introduces new complexities for monitoring. Traditional VM-based monitoring approaches fall short in dynamic containerized environments where pods are ephemeral and resources are shared. This guide covers the essential patterns, tools, and best practices for effectively monitoring your Java applications in Kubernetes, ensuring you can maintain performance, debug issues, and meet SLOs in production.
Article: Mastering Java Application Monitoring in Kubernetes Environments
Monitoring Java applications in Kubernetes requires a multi-layered approach that combines application-level metrics, JVM insights, container resources, and Kubernetes cluster state. Let's explore the complete monitoring stack.
The Four Pillars of Kubernetes Monitoring for Java
- Application Metrics - Business logic, custom metrics
- JVM Metrics - Memory, GC, threads, classloading
- Container Metrics - CPU, memory, disk I/O
- Kubernetes Metrics - Pod status, replicas, services
1. Instrumenting Your Java Application
Using Micrometer for Application Metrics
Micrometer is the metrics facade that Spring Boot Actuator builds on, and it is widely used for Java application metrics in Kubernetes. It provides a vendor-neutral interface that can export to various monitoring systems, including Prometheus.
Maven Dependencies:
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-core</artifactId>
    <version>1.11.5</version>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
    <version>1.11.5</version>
</dependency>
Spring Boot Configuration:
# application.yaml
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics,prometheus
  endpoint:
    health:
      show-details: always
    metrics:
      enabled: true
    prometheus:
      enabled: true
  metrics:
    export:
      prometheus:
        enabled: true
    distribution:
      percentiles-histogram:
        http.server.requests: true
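Enabling percentiles-histogram for http.server.requests publishes histogram buckets, and Prometheus then estimates latency percentiles from those buckets at query time. The following is a simplified sketch of that estimation (linear interpolation inside the bucket that contains the requested rank). The class name, bucket bounds, and counts are illustrative assumptions, not values taken from a real service.

```java
/** Simplified sketch: estimate a quantile from cumulative histogram buckets. */
final class HistogramQuantileSketch {

    /**
     * @param q                quantile in [0, 1], for example 0.95
     * @param upperBounds      finite bucket upper bounds in seconds, ascending
     * @param cumulativeCounts cumulative count per finite bucket, plus one
     *                         trailing entry for the +Inf bucket (the total)
     */
    static double quantile(double q, double[] upperBounds, long[] cumulativeCounts) {
        long total = cumulativeCounts[cumulativeCounts.length - 1];
        if (total == 0) {
            return Double.NaN; // no observations yet
        }
        double rank = q * total;
        for (int i = 0; i < upperBounds.length; i++) {
            if (cumulativeCounts[i] >= rank) {
                double lower = (i == 0) ? 0.0 : upperBounds[i - 1];
                long below = (i == 0) ? 0 : cumulativeCounts[i - 1];
                long inBucket = cumulativeCounts[i] - below;
                if (inBucket == 0) {
                    return lower;
                }
                // Assume observations are spread evenly inside the bucket.
                return lower + (upperBounds[i] - lower) * (rank - below) / inBucket;
            }
        }
        // The rank falls into the +Inf bucket: report the highest finite bound.
        return upperBounds[upperBounds.length - 1];
    }

    public static void main(String[] args) {
        double[] bounds = {0.1, 0.5, 1.0};   // hypothetical buckets (seconds)
        long[] counts = {50, 90, 99, 100};   // hypothetical cumulative counts
        System.out.println("p95 estimate: " + quantile(0.95, bounds, counts));
    }
}
```

The estimate can only be as precise as the bucket layout, which is why percentile panels built on histograms show approximate rather than exact latencies.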
Custom Business Metrics:
import io.micrometer.core.annotation.Timed;
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.util.concurrent.atomic.AtomicInteger;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final MeterRegistry meterRegistry;
    private final Counter orderCreationCounter;
    private final Timer orderProcessingTimer;
    private final Gauge activeOrdersGauge;
    private final AtomicInteger activeOrders = new AtomicInteger(0);

    public OrderService(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;

        // Counter for tracking order creation
        this.orderCreationCounter = Counter.builder("orders.created")
                .description("Total number of orders created")
                .tag("service", "order-service")
                .register(meterRegistry);

        // Timer for tracking order processing duration
        this.orderProcessingTimer = Timer.builder("orders.processing.time")
                .description("Time taken to process orders")
                .register(meterRegistry);

        // Gauge for active orders: the builder takes the observed object and a
        // value function, and the gauge reads activeOrders on every scrape
        this.activeOrdersGauge = Gauge.builder("orders.active", activeOrders, AtomicInteger::get)
                .description("Number of active orders being processed")
                .register(meterRegistry);
    }

    public Order createOrder(OrderRequest request) {
        activeOrders.incrementAndGet();
        return orderProcessingTimer.record(() -> {
            try {
                // Business logic here; processOrder(...) is the application's own method (not shown)
                Order order = processOrder(request);
                orderCreationCounter.increment();
                return order;
            } finally {
                activeOrders.decrementAndGet();
            }
        });
    }

    // @Timed on a service method only takes effect when a TimedAspect bean
    // is registered and Spring AOP is on the classpath
    @Timed(value = "orders.special.process", description = "Time to process special orders")
    public Order processSpecialOrder(OrderRequest request) {
        return processOrder(request);
    }
}
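The meter names registered above use Micrometer's dotted convention, while Prometheus queries and alerts see renamed series. The sketch below is a deliberately simplified model of that renaming for the three meter types used in OrderService. The class and enum names are hypothetical, and the real Prometheus registry applies further rules (for example for base units and histogram buckets).

```java
import java.util.List;

/** Simplified model of how dotted Micrometer names appear in Prometheus. */
final class PrometheusNamingSketch {

    enum MeterType { COUNTER, TIMER, GAUGE }

    static List<String> prometheusNames(String micrometerName, MeterType type) {
        String base = micrometerName.replace('.', '_');
        switch (type) {
            case COUNTER:
                // Counters get a _total suffix.
                return List.of(base + "_total");
            case TIMER:
                // Timers are exported in seconds as count, sum, and max series.
                return List.of(base + "_seconds_count",
                               base + "_seconds_sum",
                               base + "_seconds_max");
            default:
                return List.of(base);
        }
    }

    public static void main(String[] args) {
        System.out.println(prometheusNames("orders.created", MeterType.COUNTER));
        System.out.println(prometheusNames("orders.processing.time", MeterType.TIMER));
        System.out.println(prometheusNames("orders.active", MeterType.GAUGE));
    }
}
```

Knowing the exported names up front helps when writing alert rules or autoscaling metrics against these meters.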
2. JVM Monitoring Essentials
JVM Micrometer Configuration:
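import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JvmMonitoringConfig {

    @Bean
    MeterRegistryCustomizer<MeterRegistry> metricsCommonTags() {
        // HOSTNAME defaults to the pod name in Kubernetes; NAMESPACE is not set
        // automatically and must be injected into the container (for example
        // through the Downward API), otherwise the fallback "default" is used
        return registry -> registry.config().commonTags(
                "application", "order-service",
                "namespace", System.getenv().getOrDefault("NAMESPACE", "default"),
                "pod", System.getenv().getOrDefault("HOSTNAME", "unknown")
        );
    }
}
// JVM metrics are automatically exposed when using Spring Boot Actuator
// including memory, GC, threads, class loading, etc.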
Critical JVM Metrics to Monitor:
- jvm_memory_used_bytes - Heap and non-heap memory usage
- jvm_gc_pause_seconds - GC duration and frequency
- jvm_threads_live_threads - Live thread count
- jvm_classes_loaded_classes - Class loading statistics
- process_cpu_usage - CPU utilization
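The same kind of data behind these metrics is available from the JDK's java.lang.management MXBeans, which can be handy for a quick check inside a pod before any dashboards exist. The sketch below captures a one-off snapshot. The class name and map keys are hypothetical and are not Micrometer names.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** One-off snapshot of JVM health data using only the JDK. */
final class JvmSnapshot {

    static Map<String, Number> capture() {
        Map<String, Number> snapshot = new LinkedHashMap<>();

        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        snapshot.put("heap.used.bytes", memory.getHeapMemoryUsage().getUsed());
        snapshot.put("nonheap.used.bytes", memory.getNonHeapMemoryUsage().getUsed());

        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report a value.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        snapshot.put("gc.collections", gcCount);
        snapshot.put("gc.time.millis", gcMillis);

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        snapshot.put("threads.live", threads.getThreadCount());

        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        snapshot.put("classes.loaded", classes.getLoadedClassCount());

        return snapshot;
    }

    public static void main(String[] args) {
        capture().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A snapshot like this is a debugging aid only; continuous monitoring still belongs in the metrics pipeline so that values are recorded over time.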
3. Kubernetes Deployment with Monitoring
Deployment with Proper Labels and Probes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  labels:
    app: order-service
    version: v1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        version: v1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/actuator/prometheus"
    spec:
      containers:
        - name: order-service
          image: my-registry/order-service:1.0.0
          ports:
            - containerPort: 8080
          env:
            - name: JAVA_OPTS
              value: "-XX:+UseG1GC -Xmx512m -Xms512m -XX:MaxRAM=700m"
            - name: MANAGEMENT_ENDPOINTS_WEB_EXPOSURE_INCLUDE
              value: "health,metrics,prometheus,info"
          resources:
            requests:
              memory: "768Mi"
              cpu: "250m"
            limits:
              memory: "1024Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 60
            periodSeconds: 10
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 5
            timeoutSeconds: 3
          startupProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
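Whether the JVM actually picks up the heap flags and the container's CPU limit depends on the image's entrypoint and the JDK version, so it is worth verifying from inside the running pod. The sketch below reports what the JVM sees and computes the non-heap headroom left under the memory limit. The class and method names are hypothetical, and the 1024 MiB and 512 MiB figures are taken from the manifest above.

```java
/** Startup check: what does the JVM see inside the container? */
final class ContainerLimitsCheck {

    /** Reports the effective max heap and CPU count as seen by this JVM. */
    static String describe() {
        Runtime runtime = Runtime.getRuntime();
        long maxHeapMiB = runtime.maxMemory() / (1024 * 1024);
        int cpus = runtime.availableProcessors();
        return String.format("maxHeapMiB=%d availableProcessors=%d", maxHeapMiB, cpus);
    }

    /**
     * Memory left under the container limit for everything that is not heap:
     * metaspace, thread stacks, direct buffers, code cache, and so on.
     */
    static long headroomMiB(long containerLimitMiB, long maxHeapMiB) {
        return containerLimitMiB - maxHeapMiB;
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // Values from the Deployment: 1024Mi limit, -Xmx512m heap.
        System.out.println("non-heap headroom MiB = " + headroomMiB(1024, 512));
    }
}
```

If the reported max heap does not match the configured -Xmx value, the JAVA_OPTS variable is probably not being passed to the java command by the container's start script.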
4. Custom Health Indicators
Spring Boot Health Indicators:
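import java.sql.Connection;
import javax.sql.DataSource;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class DatabaseHealthIndicator implements HealthIndicator {

    private final DataSource dataSource;

    public DatabaseHealthIndicator(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Health health() {
        try (Connection conn = dataSource.getConnection()) {
            // Connection.isValid takes its timeout in seconds, not milliseconds
            if (conn.isValid(1)) {
                return Health.up()
                        .withDetail("database", "Available")
                        .withDetail("validationQuery", "SUCCESS")
                        .build();
            }
        } catch (Exception e) {
            return Health.down(e)
                    .withDetail("database", "Unavailable")
                    .build();
        }
        return Health.unknown().build();
    }
}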
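import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.http.ResponseEntity;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;

@Component
public class ExternalServiceHealthIndicator implements HealthIndicator {

    private final RestTemplate restTemplate;

    public ExternalServiceHealthIndicator(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @Override
    public Health health() {
        try {
            ResponseEntity<String> response = restTemplate.getForEntity(
                    "http://payment-service/actuator/health", String.class);
            if (response.getStatusCode().is2xxSuccessful()) {
                return Health.up()
                        .withDetail("payment-service", "Available")
                        .build();
            } else {
                return Health.down()
                        .withDetail("payment-service", "Unhealthy")
                        .withDetail("statusCode", response.getStatusCode().value())
                        .build();
            }
        } catch (Exception e) {
            // With the default error handler, RestTemplate throws on 4xx and 5xx
            // responses, so those cases end up here
            return Health.down(e)
                    .withDetail("payment-service", "Unreachable")
                    .build();
        }
    }
}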
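Individual indicators such as the two above are combined into one overall status, so a single failing dependency can change what the health endpoint reports. The sketch below models that aggregation in plain Java. It assumes the severity order DOWN, OUT_OF_SERVICE, UP, UNKNOWN, which is worth confirming against the Spring Boot version in use, and the class and component names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified model of combining component health into one overall status. */
final class HealthAggregationSketch {

    // Declared most-severe first; this order is an assumption for the sketch.
    enum Status { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

    /** Returns the most severe status among the components. */
    static Status aggregate(Map<String, Status> components) {
        Status worst = null;
        for (Status status : components.values()) {
            if (worst == null || status.ordinal() < worst.ordinal()) {
                worst = status;
            }
        }
        return worst == null ? Status.UNKNOWN : worst;
    }

    public static void main(String[] args) {
        Map<String, Status> components = new LinkedHashMap<>();
        components.put("database", Status.UP);
        components.put("payment-service", Status.DOWN);
        System.out.println("overall = " + aggregate(components)); // prints "overall = DOWN"
    }
}
```

This is also why checks on remote services deserve care: if such an indicator feeds a liveness probe, an outage in a downstream service can restart otherwise healthy pods.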
5. Distributed Tracing with Jaeger
Dependencies:
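<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<!-- Needed to actually export spans over OTLP -->
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>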
Configuration:
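management:
  tracing:
    sampling:
      # 1.0 records every request; lower this for high-traffic production services
      probability: 1.0
  otlp:
    tracing:
      # Spring Boot's OTLP span exporter uses HTTP by default (port 4318, path
      # /v1/traces); 4317 is the OTLP gRPC port
      endpoint: http://jaeger-collector:4318/v1/traces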
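The sampling probability controls what fraction of traces is recorded. Ratio-based samplers typically derive the decision from the trace ID itself, so every service that sees the same trace makes the same choice. The sketch below illustrates that idea in plain Java. It is a simplified model with a hypothetical class name and is not the OpenTelemetry implementation.

```java
/** Simplified ratio-based sampler keyed on the trace ID. */
final class RatioSamplerSketch {

    private final double probability;
    private final long threshold;

    RatioSamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        this.threshold = (long) (probability * Long.MAX_VALUE);
    }

    /**
     * @param traceIdLowBits the low 64 bits of the trace ID, which are
     *                       assumed to be randomly generated
     */
    boolean shouldSample(long traceIdLowBits) {
        if (probability >= 1.0) {
            return true;
        }
        if (probability <= 0.0) {
            return false;
        }
        // Clear the sign bit to get a non-negative value in [0, Long.MAX_VALUE].
        long value = traceIdLowBits & Long.MAX_VALUE;
        return value < threshold;
    }

    public static void main(String[] args) {
        RatioSamplerSketch sampler = new RatioSamplerSketch(0.1);
        long traceId = 0x1234_5678_9abc_def0L; // hypothetical trace ID bits
        // The same trace ID always yields the same decision.
        System.out.println(sampler.shouldSample(traceId) == sampler.shouldSample(traceId));
    }
}
```

Because the decision is deterministic per trace ID, a trace is either captured end to end or not at all, which avoids partial traces.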
6. Logging Configuration for Kubernetes
Structured Logging with Logback:
<!-- logback-spring.xml -->
<configuration>
    <springProperty scope="context" name="appName" source="spring.application.name" defaultValue="java-app"/>
    <springProperty scope="context" name="namespace" source="kubernetes.namespace" defaultValue="default"/>
    <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
        <encoder class="net.logstash.logback.encoder.LogstashEncoder">
            <customFields>{"app":"${appName}","namespace":"${namespace}","pod":"${HOSTNAME}"}</customFields>
        </encoder>
    </appender>
    <root level="INFO">
        <appender-ref ref="JSON" />
    </root>
</configuration>
Java Logging with MDC:
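import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class OrderController {

    private static final Logger logger = LoggerFactory.getLogger(OrderController.class);

    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping("/orders")
    public ResponseEntity<Order> createOrder(@RequestBody OrderRequest request) {
        // Add contextual information to every log line of this request;
        // MDC values must be strings
        MDC.put("orderId", String.valueOf(request.getOrderId()));
        MDC.put("customerId", String.valueOf(request.getCustomerId()));
        try {
            // Use placeholders: an argument without a matching {} is not rendered in the message
            logger.info("Creating new order: items={}, total={}",
                    request.getItems().size(), request.getTotalAmount());
            Order order = orderService.createOrder(request);
            logger.info("Order created successfully");
            return ResponseEntity.ok(order);
        } catch (Exception e) {
            logger.error("Failed to create order", e);
            throw e;
        } finally {
            // Request threads are pooled, so always clear the context
            MDC.clear();
        }
    }
}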
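The MDC.clear() call in the finally block matters because request threads are pooled and MDC data is stored per thread. The sketch below reproduces the effect with a plain ThreadLocal standing in for the MDC (a hypothetical stand-in, not the SLF4J class): without the cleanup, the second request on the same worker thread still sees the first request's order ID.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Shows how per-thread logging context leaks across pooled threads. */
final class MdcLeakSketch {

    // Stand-in for the MDC: a per-thread key/value map.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    /** Handles one request and returns the orderId left over from an earlier request. */
    static String handle(String orderId, boolean clearAfterwards) {
        String leftover = CONTEXT.get().get("orderId");
        CONTEXT.get().put("orderId", orderId);
        try {
            return leftover; // business logic would run here
        } finally {
            if (clearAfterwards) {
                CONTEXT.get().clear();
            }
        }
    }

    /** Runs two requests on one reused worker thread; returns what the second one saw. */
    static String leftoverSeenBySecondRequest(boolean clearAfterwards) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Callable<String> first = () -> handle("order-1", clearAfterwards);
            Callable<String> second = () -> handle("order-2", clearAfterwards);
            pool.submit(first).get();
            return pool.submit(second).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("with cleanup:    " + leftoverSeenBySecondRequest(true));  // null
        System.out.println("without cleanup: " + leftoverSeenBySecondRequest(false)); // order-1
    }
}
```

In a real service the leaked value would show up as a wrong orderId or customerId on log lines of an unrelated request, which is hard to spot and misleading during incident analysis.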
7. Prometheus Queries for Java in Kubernetes
Critical Alerting Rules:
# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: java-app-alerts
  labels:
    prometheus: k8s
    role: alert-rules
spec:
  groups:
    - name: java-app
      rules:
        # Compares container memory (not only the JVM heap) with the container limit
        - alert: HighContainerMemoryUsage
          expr: sum by (pod) (container_memory_usage_bytes{container="order-service"}) / sum by (pod) (kube_pod_container_resource_limits{container="order-service", resource="memory"}) > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High Memory Usage on {{ $labels.pod }}"
            description: "Container memory usage is above 80% of its limit for pod {{ $labels.pod }}"
        # Fraction of wall-clock time spent in young-generation GC pauses
        - alert: GCPauseDurationHigh
          expr: rate(jvm_gc_pause_seconds_sum{gc="G1 Young Generation"}[5m]) > 0.1
          for: 2m
          labels:
            severity: warning
          annotations:
            summary: "High GC Pressure on {{ $labels.pod }}"
            description: "Garbage collection is taking significant time on {{ $labels.pod }}"
        # Both sides are aggregated per pod; dividing the raw series would only
        # match series with identical labels and would not yield an error ratio
        - alert: ApplicationErrorRateHigh
          expr: sum by (pod) (rate(http_server_requests_seconds_count{status=~"5.."}[5m])) / sum by (pod) (rate(http_server_requests_seconds_count[5m])) > 0.05
          for: 3m
          labels:
            severity: critical
          annotations:
            summary: "High Error Rate on {{ $labels.pod }}"
            description: "5xx error rate is above 5% for pod {{ $labels.pod }}"
        - alert: PodRestartFrequently
          expr: rate(kube_pod_container_status_restarts_total[1h]) > 0
          for: 0m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is restarting frequently"
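The error-rate alert divides the per-second rate of 5xx requests by the per-second rate of all requests and compares the result with 5%. The arithmetic can be sketched in plain Java as follows. This is a simplified model (Prometheus additionally extrapolates to the window boundaries), and the counter samples are made-up numbers.

```java
/** Simplified arithmetic behind a counter-based error-ratio alert. */
final class ErrorRateSketch {

    /** Per-second increase of a counter between two samples. */
    static double perSecondRate(double earlier, double later, double windowSeconds) {
        // A counter that went down was reset (for example by a pod restart),
        // so the later value is treated as the increase since the reset.
        double increase = later >= earlier ? later - earlier : later;
        return increase / windowSeconds;
    }

    /** True when errors make up more than the given fraction of all requests. */
    static boolean errorRatioExceeds(double errorRate, double totalRate, double threshold) {
        if (totalRate == 0.0) {
            return false; // no traffic, nothing to alert on
        }
        return errorRate / totalRate > threshold;
    }

    public static void main(String[] args) {
        double window = 300.0;                                  // 5 minutes
        double totalRate = perSecondRate(10_000, 13_000, window); // 10 requests/s
        double errorRate = perSecondRate(100, 280, window);       // 0.6 errors/s
        // 0.6 / 10 = 6% of requests failed, which is above the 5% threshold.
        System.out.println(errorRatioExceeds(errorRate, totalRate, 0.05)); // true
    }
}
```

Working with a ratio instead of an absolute error count keeps the alert meaningful at both low and high traffic levels.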
8. Grafana Dashboards for Java/Kubernetes
Key Dashboard Panels:
- Application Performance: Request rate, error rate, latency percentiles
- JVM Metrics: Heap usage, GC duration, thread states
- Container Resources: CPU, memory, network I/O
- Business Metrics: Orders processed, active users, cache hit rates
9. Kubernetes HPA with Custom Metrics
Horizontal Pod Autoscaler:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Pods
      pods:
        metric:
          name: orders_processing_time_seconds
        target:
          type: AverageValue
          averageValue: "500m"
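To reason about how this autoscaler reacts, it helps to work through the scaling rule Kubernetes documents: desired replicas = ceil(current replicas × current metric value / target value), clamped to minReplicas and maxReplicas, with no change while the ratio stays within a tolerance (0.1 by default). The sketch below applies that rule to the values in the manifest, where averageValue "500m" means 0.5. The class name is hypothetical, and stabilization windows and scaling policies are left out.

```java
/** Simplified version of the HPA replica calculation. */
final class HpaMathSketch {

    static int desiredReplicas(int currentReplicas,
                               double currentMetric,
                               double targetMetric,
                               int minReplicas,
                               int maxReplicas,
                               double tolerance) {
        double ratio = currentMetric / targetMetric;
        int desired;
        if (Math.abs(ratio - 1.0) <= tolerance) {
            // Close enough to the target: keep the current replica count.
            desired = currentReplicas;
        } else {
            desired = (int) Math.ceil(currentReplicas * ratio);
        }
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }

    public static void main(String[] args) {
        // 3 replicas, per-pod average of 0.9 against a target of 0.5:
        // ceil(3 * 0.9 / 0.5) = ceil(5.4) = 6
        System.out.println(desiredReplicas(3, 0.9, 0.5, 2, 10, 0.1)); // 6
    }
}
```

When several metrics are configured, as with CPU and the custom metric here, the autoscaler evaluates each one and uses the largest resulting replica count.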
Best Practices Summary
- Use Structured Logging - JSON format with proper context
- Export JVM Metrics - Enable comprehensive JVM monitoring
- Configure Proper Resource Limits - Prevent resource starvation
- Implement Meaningful Health Checks - Liveness, readiness, and startup probes
- Use Distributed Tracing - For microservices architectures
- Set Up Meaningful Alerts - Focus on symptoms, not causes
- Monitor Business Metrics - Connect technical metrics to business value
- Use Sidecar Pattern - For log aggregation when needed
- Label Everything - Consistent labels across metrics and logs
- Test Your Monitoring - Regularly verify alerts and dashboards
Conclusion
Monitoring Java applications in Kubernetes requires a holistic approach that combines application insights, JVM metrics, container resources, and Kubernetes cluster state. By implementing the patterns and tools discussed here (Micrometer for metrics, structured logging, proper health checks, and comprehensive alerting), you can gain deep visibility into your Java applications' behavior and performance in Kubernetes.
The key is to start with the fundamentals (application metrics and health checks) and gradually add more sophisticated monitoring (distributed tracing, custom business metrics) as your needs evolve. This layered approach ensures you can effectively troubleshoot issues, optimize performance, and deliver reliable Java applications in your Kubernetes environment.
Call to Action: Start by instrumenting one of your Java applications with Micrometer today. Export basic JVM metrics and set up a simple Grafana dashboard. What performance insights will you uncover about your application's behavior in Kubernetes?