Service Level Indicators (SLIs) in Java: Measuring Service Reliability

Service Level Indicators (SLIs) are quantitative measures of the parts of a service's behavior that users experience directly, such as whether requests succeed and how long they take. They form the foundation for Service Level Objectives (SLOs), which set targets for SLI values, and Service Level Agreements (SLAs), which attach commitments to those targets. This article builds an SLI framework for Java applications on top of Micrometer.

Core SLI Concepts

Key SLI Categories

  • Availability: Uptime and error rates
  • Latency: Response time percentiles
  • Throughput: Request volume and capacity
  • Quality: Data accuracy and business correctness
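
Several of these categories can be expressed the same way: as the share of "good" events among all valid events. The sketch below shows that arithmetic under stated assumptions; the class and method names are hypothetical and are not part of the framework that follows.

```java
/** Hypothetical helper: expresses an SLI as the percentage of good events among valid events. */
final class SliRatioSketch {

    /** Returns good/valid as a percentage, or NaN when there were no valid events. */
    static double percentGood(long goodEvents, long validEvents) {
        if (goodEvents < 0 || validEvents < 0 || goodEvents > validEvents) {
            throw new IllegalArgumentException("require 0 <= good <= valid");
        }
        if (validEvents == 0) {
            return Double.NaN; // no traffic: the SLI is undefined rather than 0% or 100%
        }
        return 100.0 * goodEvents / validEvents;
    }

    public static void main(String[] args) {
        // Availability: 9,990 non-failing responses out of 10,000 requests.
        System.out.println(percentGood(9_990, 10_000));
        // Latency as a ratio: 9,500 of 10,000 requests finished under a chosen threshold.
        System.out.println(percentGood(9_500, 10_000));
    }
}
```

Returning NaN for an empty window mirrors the calculators later in this article, which report "No data available" instead of a value.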

SLI Implementation Framework

Dependencies

<!-- Micrometer for metrics -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-core</artifactId>
<version>1.11.0</version>
</dependency>
<!-- For Prometheus integration -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
<version>1.11.0</version>
</dependency>
<!-- For custom SLI calculations -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-math3</artifactId>
<version>3.6.1</version>
</dependency>

The examples below also rely on Lombok (@Data, @Slf4j), Spring Boot (@Component, @Service, @RestController, @ConfigurationProperties), and the Servlet API; add those dependencies in the versions your project already uses.

Core SLI Implementation

Example 1: SLI Framework and Base Classes

// SLI Type Enum
public enum SLIType {
AVAILABILITY,
LATENCY,
THROUGHPUT,
QUALITY,
FRESHNESS,
CORRECTNESS,
COVERAGE
}
// SLI Definition
@Data
public class SLIDefinition {
private final String name;
private final String description;
private final SLIType type;
private final String measurementUnit;
private final Map<String, String> labels;
private final Duration windowSize;
private Duration evaluationInterval; // not final: withEvaluationInterval() reassigns it
public SLIDefinition(String name, String description, SLIType type, 
String measurementUnit, Duration windowSize) {
this.name = name;
this.description = description;
this.type = type;
this.measurementUnit = measurementUnit;
this.windowSize = windowSize;
this.evaluationInterval = Duration.ofMinutes(1); // Default
this.labels = new HashMap<>();
}
public SLIDefinition withLabel(String key, String value) {
this.labels.put(key, value);
return this;
}
public SLIDefinition withEvaluationInterval(Duration interval) {
this.evaluationInterval = interval;
return this;
}
}
// SLI Measurement Result
@Data
public class SLIMeasurement {
private final SLIDefinition definition;
private final double value;
private final Instant timestamp;
private final Map<String, String> context;
private final String unit;
private final boolean success;
private String errorMessage;
public SLIMeasurement(SLIDefinition definition, double value, Instant timestamp) {
this.definition = definition;
this.value = value;
this.timestamp = timestamp;
this.context = new HashMap<>();
this.unit = definition.getMeasurementUnit();
this.success = true;
}
public SLIMeasurement(SLIDefinition definition, String errorMessage, Instant timestamp) {
this.definition = definition;
this.value = Double.NaN;
this.timestamp = timestamp;
this.context = new HashMap<>();
this.unit = definition.getMeasurementUnit();
this.success = false;
this.errorMessage = errorMessage;
}
public SLIMeasurement withContext(String key, String value) {
this.context.put(key, value);
return this;
}
}
// Base SLI Calculator Interface
public interface SLICalculator {
String getName();
SLIType getType();
SLIDefinition getDefinition();
SLIMeasurement calculate(Instant timestamp);
boolean supportsWindow(Duration windowSize);
void recordDataPoint(double value, Map<String, String> labels);
void recordDataPoint(double value, Map<String, String> labels, Instant timestamp);
}
// Abstract Base Calculator
public abstract class AbstractSLICalculator implements SLICalculator {
protected final SLIDefinition definition;
protected final MeterRegistry meterRegistry;
protected final Clock clock;
protected AbstractSLICalculator(SLIDefinition definition, MeterRegistry meterRegistry) {
this.definition = definition;
this.meterRegistry = meterRegistry;
this.clock = meterRegistry.config().clock();
}
@Override
public String getName() {
return definition.getName();
}
@Override
public SLIType getType() {
return definition.getType();
}
@Override
public SLIDefinition getDefinition() {
return definition;
}
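// Micrometer builders accept String... or Iterable<Tag>, not a Map, so convert first
protected static List<Tag> toTags(Map<String, String> tags) {
return tags.entrySet().stream()
.map(entry -> Tag.of(entry.getKey(), entry.getValue()))
.collect(Collectors.toList());
}
protected Counter createCounter(String name, Map<String, String> tags) {
return Counter.builder(name)
.tags(toTags(tags))
.register(meterRegistry);
}
protected Timer createTimer(String name, Map<String, String> tags) {
return Timer.builder(name)
.tags(toTags(tags))
.register(meterRegistry);
}
protected DistributionSummary createDistributionSummary(String name, Map<String, String> tags) {
return DistributionSummary.builder(name)
.tags(toTags(tags))
.register(meterRegistry);
}
// Gauge.builder takes the observed object and the value function up front
protected <T> Gauge createGauge(String name, Map<String, String> tags, T source, ToDoubleFunction<T> function) {
return Gauge.builder(name, source, function)
.tags(toTags(tags))
.register(meterRegistry);
}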
}

Example 2: Availability SLI Calculator

@Component
@Slf4j
public class AvailabilitySLICalculator extends AbstractSLICalculator {
private final Counter totalRequests;
private final Counter successfulRequests;
private final Counter failedRequests;
private final SlidingTimeWindowReservoir reservoir;
public AvailabilitySLICalculator(MeterRegistry meterRegistry) {
super(createAvailabilityDefinition(), meterRegistry);
Map<String, String> baseTags = Map.of(
"sli_name", definition.getName(),
"sli_type", definition.getType().name()
);
this.totalRequests = createCounter("sli_requests_total", baseTags);
this.successfulRequests = createCounter("sli_requests_successful", baseTags);
this.failedRequests = createCounter("sli_requests_failed", baseTags);
// 10-minute sliding window for availability calculation
this.reservoir = new SlidingTimeWindowReservoir(
definition.getWindowSize().toMillis(), 
TimeUnit.MILLISECONDS
);
}
private static SLIDefinition createAvailabilityDefinition() {
return new SLIDefinition(
"http_availability",
"HTTP request availability percentage",
SLIType.AVAILABILITY,
"percent",
Duration.ofMinutes(10)
);
}
@Override
public SLIMeasurement calculate(Instant timestamp) {
try {
long windowStart = timestamp.minus(definition.getWindowSize()).toEpochMilli();
long windowEnd = timestamp.toEpochMilli();
// Get data from reservoir for the window
List<RequestDataPoint> dataPoints = reservoir.getDataPoints(windowStart, windowEnd);
if (dataPoints.isEmpty()) {
return new SLIMeasurement(definition, "No data available for calculation", timestamp);
}
long total = dataPoints.size();
long successful = dataPoints.stream().filter(RequestDataPoint::isSuccess).count();
double availability = (double) successful / total * 100.0;
SLIMeasurement measurement = new SLIMeasurement(definition, availability, timestamp);
measurement.withContext("total_requests", String.valueOf(total));
measurement.withContext("successful_requests", String.valueOf(successful));
measurement.withContext("failed_requests", String.valueOf(total - successful));
measurement.withContext("window_size", definition.getWindowSize().toString());
log.debug("Calculated availability: {}% for window ending at {}", 
availability, timestamp);
return measurement;
} catch (Exception e) {
log.error("Failed to calculate availability SLI", e);
return new SLIMeasurement(definition, "Calculation error: " + e.getMessage(), timestamp);
}
}
@Override
public boolean supportsWindow(Duration windowSize) {
return windowSize.compareTo(Duration.ofMinutes(1)) >= 0 && 
windowSize.compareTo(Duration.ofHours(24)) <= 0;
}
@Override
public void recordDataPoint(double value, Map<String, String> labels) {
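// clock is Micrometer's Clock, not java.time.Clock, so read its wall time directly
recordDataPoint(value, labels, Instant.ofEpochMilli(clock.wallTime()));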
}
@Override
public void recordDataPoint(double value, Map<String, String> labels, Instant timestamp) {
boolean success = value >= 0; // Negative values indicate failure
RequestDataPoint dataPoint = new RequestDataPoint(timestamp.toEpochMilli(), success, labels);
reservoir.add(dataPoint);
totalRequests.increment();
if (success) {
successfulRequests.increment();
} else {
failedRequests.increment();
}
log.trace("Recorded availability data point: success={}, labels={}", success, labels);
}
public void recordHttpRequest(boolean success, String method, String endpoint, int statusCode) {
Map<String, String> labels = Map.of(
"method", method,
"endpoint", endpoint,
"status_code", String.valueOf(statusCode)
);
recordDataPoint(success ? 1.0 : -1.0, labels);
}
// Data point for request tracking
@Data
private static class RequestDataPoint {
private final long timestamp;
private final boolean success;
private final Map<String, String> labels;
}
// Sliding time window reservoir
private static class SlidingTimeWindowReservoir {
private final long windowMillis;
// Each timestamp maps to a queue so requests recorded in the same millisecond are all kept
private final ConcurrentSkipListMap<Long, Queue<RequestDataPoint>> dataPoints;
public SlidingTimeWindowReservoir(long window, TimeUnit unit) {
this.windowMillis = unit.toMillis(window);
this.dataPoints = new ConcurrentSkipListMap<>();
}
public void add(RequestDataPoint dataPoint) {
dataPoints.computeIfAbsent(dataPoint.getTimestamp(), key -> new ConcurrentLinkedQueue<>())
.add(dataPoint);
cleanupOldDataPoints(dataPoint.getTimestamp());
}
public List<RequestDataPoint> getDataPoints(long startTime, long endTime) {
cleanupOldDataPoints(endTime);
return dataPoints.subMap(startTime, true, endTime, true).values().stream()
.flatMap(Collection::stream)
.collect(Collectors.toList());
}
private void cleanupOldDataPoints(long currentTime) {
long cutoffTime = currentTime - windowMillis;
dataPoints.headMap(cutoffTime).clear();
}
}
}
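
To make the windowing logic easier to follow in isolation, here is a minimal, dependency-free sketch of the same idea: keep timestamped outcomes, drop the ones older than the window, and divide successes by the total. The class name is hypothetical, and it makes two assumptions that the calculator above does not: events arrive in timestamp order, and only 5xx status codes count as failures.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified stand-in for the availability reservoir above. */
final class WindowedAvailabilitySketch {

    /** One request outcome; stored as its own object so equal timestamps never collide. */
    private static final class Event {
        final long timestampMillis;
        final boolean success;

        Event(long timestampMillis, boolean success) {
            this.timestampMillis = timestampMillis;
            this.success = success;
        }
    }

    private final Deque<Event> events = new ArrayDeque<>();
    private final long windowMillis;

    WindowedAvailabilitySketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Assumption: callers record events in non-decreasing timestamp order. */
    synchronized void record(long timestampMillis, int statusCode) {
        // Assumption: only server errors (5xx) count against availability.
        events.addLast(new Event(timestampMillis, statusCode < 500));
    }

    /** Availability in percent over events no older than now - window, or NaN if none remain. */
    synchronized double availabilityPercent(long nowMillis) {
        long cutoff = nowMillis - windowMillis;
        while (!events.isEmpty() && events.peekFirst().timestampMillis < cutoff) {
            events.removeFirst(); // expired: fell out of the sliding window
        }
        if (events.isEmpty()) {
            return Double.NaN;
        }
        long successful = events.stream().filter(e -> e.success).count();
        return 100.0 * successful / events.size();
    }

    public static void main(String[] args) {
        WindowedAvailabilitySketch window = new WindowedAvailabilitySketch(1_000);
        window.record(0, 200);
        window.record(500, 200);
        window.record(500, 503); // same millisecond as the previous request, still counted
        window.record(900, 404); // client error: not a server failure under the assumption above
        System.out.println(window.availabilityPercent(1_000)); // 3 of 4 succeeded
    }
}
```

A deque is enough here because of the ordering assumption; the skip-list map in the calculator tolerates out-of-order timestamps at the cost of more bookkeeping.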

Example 3: Latency SLI Calculator

@Component
@Slf4j
public class LatencySLICalculator extends AbstractSLICalculator {
private final Timer requestTimer;
private final SlidingTimeWindowReservoir reservoir;
private final List<Double> targetPercentiles = List.of(50.0, 95.0, 99.0, 99.9);
public LatencySLICalculator(MeterRegistry meterRegistry) {
super(createLatencyDefinition(), meterRegistry);
Map<String, String> baseTags = Map.of(
"sli_name", definition.getName(),
"sli_type", definition.getType().name()
);
this.requestTimer = createTimer("sli_request_latency", baseTags);
this.reservoir = new SlidingTimeWindowReservoir(
definition.getWindowSize().toMillis(), 
TimeUnit.MILLISECONDS
);
}
private static SLIDefinition createLatencyDefinition() {
return new SLIDefinition(
"http_latency",
"HTTP request latency percentiles",
SLIType.LATENCY,
"milliseconds",
Duration.ofMinutes(5)
);
}
@Override
public SLIMeasurement calculate(Instant timestamp) {
try {
long windowStart = timestamp.minus(definition.getWindowSize()).toEpochMilli();
long windowEnd = timestamp.toEpochMilli();
List<LatencyDataPoint> dataPoints = reservoir.getDataPoints(windowStart, windowEnd);
if (dataPoints.isEmpty()) {
return new SLIMeasurement(definition, "No data available for calculation", timestamp);
}
// Calculate percentiles
DescriptiveStatistics stats = new DescriptiveStatistics();
dataPoints.forEach(point -> stats.addValue(point.getLatencyMs()));
Map<String, Double> percentiles = calculatePercentiles(stats, targetPercentiles);
double p99 = percentiles.getOrDefault("p99", Double.NaN);
SLIMeasurement measurement = new SLIMeasurement(definition, p99, timestamp);
measurement.withContext("data_points", String.valueOf(dataPoints.size()));
measurement.withContext("window_size", definition.getWindowSize().toString());
// Add all percentiles to context
percentiles.forEach((key, value) -> 
measurement.withContext(key, String.format("%.2f", value)));
measurement.withContext("mean", String.format("%.2f", stats.getMean()));
measurement.withContext("min", String.format("%.2f", stats.getMin()));
measurement.withContext("max", String.format("%.2f", stats.getMax()));
log.debug("Calculated latency SLI - P99: {}ms, data points: {}", p99, dataPoints.size());
return measurement;
} catch (Exception e) {
log.error("Failed to calculate latency SLI", e);
return new SLIMeasurement(definition, "Calculation error: " + e.getMessage(), timestamp);
}
}
private Map<String, Double> calculatePercentiles(DescriptiveStatistics stats, List<Double> percentiles) {
Map<String, Double> results = new HashMap<>();
for (Double percentile : percentiles) {
double value = stats.getPercentile(percentile);
// Keep the fractional part in the key so 99.9 becomes "p99.9" instead of overwriting "p99"
String key = percentile % 1 == 0
? "p" + percentile.intValue()
: "p" + percentile;
results.put(key, value);
}
return results;
}
@Override
public boolean supportsWindow(Duration windowSize) {
return windowSize.compareTo(Duration.ofSeconds(30)) >= 0 && 
windowSize.compareTo(Duration.ofHours(1)) <= 0;
}
@Override
public void recordDataPoint(double value, Map<String, String> labels) {
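// clock is Micrometer's Clock, not java.time.Clock, so read its wall time directly
recordDataPoint(value, labels, Instant.ofEpochMilli(clock.wallTime()));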
}
@Override
public void recordDataPoint(double value, Map<String, String> labels, Instant timestamp) {
LatencyDataPoint dataPoint = new LatencyDataPoint(
timestamp.toEpochMilli(), 
value, 
labels
);
reservoir.add(dataPoint);
// Also record in micrometer timer
requestTimer.record((long) value, TimeUnit.MILLISECONDS);
log.trace("Recorded latency data point: {}ms, labels={}", value, labels);
}
public Timer.Sample startTimer() {
return Timer.start(clock);
}
public void stopTimer(Timer.Sample sample, String method, String endpoint, int statusCode) {
Map<String, String> labels = Map.of(
"method", method,
"endpoint", endpoint,
"status_code", String.valueOf(statusCode)
);
// stop() records the sample in the timer and returns the elapsed time in nanoseconds
long durationNanos = sample.stop(requestTimer);
// Add to the reservoir directly; recordDataPoint() would record in the timer a second time
reservoir.add(new LatencyDataPoint(clock.wallTime(), durationNanos / 1_000_000.0, labels));
}
@Data
private static class LatencyDataPoint {
private final long timestamp;
private final double latencyMs;
private final Map<String, String> labels;
}
private static class SlidingTimeWindowReservoir {
private final long windowMillis;
// Each timestamp maps to a queue so requests recorded in the same millisecond are all kept
private final ConcurrentSkipListMap<Long, Queue<LatencyDataPoint>> dataPoints;
public SlidingTimeWindowReservoir(long window, TimeUnit unit) {
this.windowMillis = unit.toMillis(window);
this.dataPoints = new ConcurrentSkipListMap<>();
}
public void add(LatencyDataPoint dataPoint) {
dataPoints.computeIfAbsent(dataPoint.getTimestamp(), key -> new ConcurrentLinkedQueue<>())
.add(dataPoint);
cleanupOldDataPoints(dataPoint.getTimestamp());
}
public List<LatencyDataPoint> getDataPoints(long startTime, long endTime) {
cleanupOldDataPoints(endTime);
return dataPoints.subMap(startTime, true, endTime, true).values().stream()
.flatMap(Collection::stream)
.collect(Collectors.toList());
}
private void cleanupOldDataPoints(long currentTime) {
long cutoffTime = currentTime - windowMillis;
dataPoints.headMap(cutoffTime).clear();
}
}
}
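
For readers who want to see what a latency percentile means without the commons-math dependency, the sketch below uses the nearest-rank definition: sort the samples and pick the value at rank ceil(p/100 * n). This is an illustration with a hypothetical class name, not the estimator the calculator uses; DescriptiveStatistics interpolates between samples by default, so its results may differ slightly on small data sets.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentile over a batch of latency samples. */
final class NearestRankPercentileSketch {

    /** Returns the smallest sample such that at least p percent of samples are <= it. */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        double[] latenciesMs = {30, 10, 50, 20, 40, 60, 70, 80, 90, 100};
        System.out.println("p50 = " + percentile(latenciesMs, 50));
        System.out.println("p95 = " + percentile(latenciesMs, 95));
    }
}
```

With only ten samples, p95 and p99 both resolve to the largest value, which is one reason tail percentiles are only meaningful when the window contains enough requests.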

Example 4: Throughput SLI Calculator

@Component
@Slf4j
public class ThroughputSLICalculator extends AbstractSLICalculator {
private final Counter requestCounter;
private final SlidingTimeWindowReservoir reservoir;
public ThroughputSLICalculator(MeterRegistry meterRegistry) {
super(createThroughputDefinition(), meterRegistry);
Map<String, String> baseTags = Map.of(
"sli_name", definition.getName(),
"sli_type", definition.getType().name()
);
this.requestCounter = createCounter("sli_throughput_requests", baseTags);
this.reservoir = new SlidingTimeWindowReservoir(
definition.getWindowSize().toMillis(), 
TimeUnit.MILLISECONDS
);
}
private static SLIDefinition createThroughputDefinition() {
return new SLIDefinition(
"http_throughput",
"HTTP requests per second",
SLIType.THROUGHPUT,
"requests_per_second",
Duration.ofMinutes(1)
);
}
@Override
public SLIMeasurement calculate(Instant timestamp) {
try {
long windowStart = timestamp.minus(definition.getWindowSize()).toEpochMilli();
long windowEnd = timestamp.toEpochMilli();
List<ThroughputDataPoint> dataPoints = reservoir.getDataPoints(windowStart, windowEnd);
if (dataPoints.isEmpty()) {
return new SLIMeasurement(definition, "No data available for calculation", timestamp);
}
// Calculate requests per second
long totalRequests = dataPoints.size();
double windowSeconds = definition.getWindowSize().toSeconds();
double requestsPerSecond = totalRequests / windowSeconds;
SLIMeasurement measurement = new SLIMeasurement(definition, requestsPerSecond, timestamp);
measurement.withContext("total_requests", String.valueOf(totalRequests));
measurement.withContext("window_seconds", String.valueOf(windowSeconds));
measurement.withContext("window_size", definition.getWindowSize().toString());
// Calculate per-endpoint throughput if available
Map<String, Long> endpointCounts = dataPoints.stream()
.filter(point -> point.getLabels().containsKey("endpoint"))
.collect(Collectors.groupingBy(
point -> point.getLabels().get("endpoint"),
Collectors.counting()
));
endpointCounts.forEach((endpoint, count) -> 
measurement.withContext("endpoint_" + endpoint + "_rps", 
String.format("%.2f", count / windowSeconds)));
log.debug("Calculated throughput: {} RPS for window ending at {}", 
requestsPerSecond, timestamp);
return measurement;
} catch (Exception e) {
log.error("Failed to calculate throughput SLI", e);
return new SLIMeasurement(definition, "Calculation error: " + e.getMessage(), timestamp);
}
}
@Override
public boolean supportsWindow(Duration windowSize) {
return windowSize.compareTo(Duration.ofSeconds(10)) >= 0 && 
windowSize.compareTo(Duration.ofMinutes(10)) <= 0;
}
@Override
public void recordDataPoint(double value, Map<String, String> labels) {
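// clock is Micrometer's Clock, not java.time.Clock, so read its wall time directly
recordDataPoint(value, labels, Instant.ofEpochMilli(clock.wallTime()));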
}
@Override
public void recordDataPoint(double value, Map<String, String> labels, Instant timestamp) {
ThroughputDataPoint dataPoint = new ThroughputDataPoint(timestamp.toEpochMilli(), labels);
reservoir.add(dataPoint);
requestCounter.increment();
log.trace("Recorded throughput data point, labels={}", labels);
}
public void recordRequest(String method, String endpoint) {
Map<String, String> labels = Map.of(
"method", method,
"endpoint", endpoint
);
recordDataPoint(1.0, labels);
}
@Data
private static class ThroughputDataPoint {
private final long timestamp;
private final Map<String, String> labels;
}
private static class SlidingTimeWindowReservoir {
private final long windowMillis;
// Each timestamp maps to a queue so requests recorded in the same millisecond are all counted
private final ConcurrentSkipListMap<Long, Queue<ThroughputDataPoint>> dataPoints;
public SlidingTimeWindowReservoir(long window, TimeUnit unit) {
this.windowMillis = unit.toMillis(window);
this.dataPoints = new ConcurrentSkipListMap<>();
}
public void add(ThroughputDataPoint dataPoint) {
dataPoints.computeIfAbsent(dataPoint.getTimestamp(), key -> new ConcurrentLinkedQueue<>())
.add(dataPoint);
cleanupOldDataPoints(dataPoint.getTimestamp());
}
public List<ThroughputDataPoint> getDataPoints(long startTime, long endTime) {
cleanupOldDataPoints(endTime);
return dataPoints.subMap(startTime, true, endTime, true).values().stream()
.flatMap(Collection::stream)
.collect(Collectors.toList());
}
private void cleanupOldDataPoints(long currentTime) {
long cutoffTime = currentTime - windowMillis;
dataPoints.headMap(cutoffTime).clear();
}
}
}
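
The per-endpoint breakdown in calculate() boils down to counting requests per endpoint and dividing each count by the window length. A small standalone sketch of that step, with a hypothetical class name and a plain list of endpoint strings standing in for the reservoir contents:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical helper: per-endpoint requests per second over one window. */
final class ThroughputSketch {

    /** Each list entry represents one request seen inside the window. */
    static Map<String, Double> requestsPerSecond(List<String> endpoints, double windowSeconds) {
        if (windowSeconds <= 0) {
            throw new IllegalArgumentException("windowSeconds must be positive");
        }
        // Count requests per endpoint; TreeMap keeps the output in a stable order.
        Map<String, Long> counts = endpoints.stream()
                .collect(Collectors.groupingBy(e -> e, TreeMap::new, Collectors.counting()));
        Map<String, Double> rates = new TreeMap<>();
        counts.forEach((endpoint, count) -> rates.put(endpoint, count / windowSeconds));
        return rates;
    }

    public static void main(String[] args) {
        List<String> seen = List.of("/orders", "/orders", "/users/{id}");
        System.out.println(requestsPerSecond(seen, 2.0));
    }
}
```

Note that dividing by the full window length under-reports throughput while the window is still filling, for example right after startup.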

SLI Management and Orchestration

Example 5: SLI Manager Service

@Service
@Slf4j
public class SLIManagerService {
private final Map<String, SLICalculator> calculators;
private final ScheduledExecutorService scheduler;
private final List<SLIListener> listeners;
private final Map<String, SLIMeasurement> lastMeasurements;
public SLIManagerService(List<SLICalculator> calculatorList) {
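// ConcurrentHashMap: calculators can be registered or removed while the scheduler thread iterates
this.calculators = new ConcurrentHashMap<>(calculatorList.stream()
.collect(Collectors.toMap(SLICalculator::getName, Function.identity())));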
this.scheduler = Executors.newScheduledThreadPool(2);
this.listeners = new CopyOnWriteArrayList<>();
this.lastMeasurements = new ConcurrentHashMap<>();
startPeriodicEvaluation();
}
public void registerCalculator(SLICalculator calculator) {
calculators.put(calculator.getName(), calculator);
log.info("Registered SLI calculator: {}", calculator.getName());
}
public void unregisterCalculator(String calculatorName) {
calculators.remove(calculatorName);
log.info("Unregistered SLI calculator: {}", calculatorName);
}
public void addListener(SLIListener listener) {
listeners.add(listener);
}
public void removeListener(SLIListener listener) {
listeners.remove(listener);
}
private void startPeriodicEvaluation() {
// Evaluate all SLIs every minute
scheduler.scheduleAtFixedRate(this::evaluateAllSLIs, 0, 1, TimeUnit.MINUTES);
log.info("Started periodic SLI evaluation");
}
public void evaluateAllSLIs() {
Instant evaluationTime = Instant.now();
for (SLICalculator calculator : calculators.values()) {
try {
evaluateSLI(calculator, evaluationTime);
} catch (Exception e) {
log.error("Failed to evaluate SLI: {}", calculator.getName(), e);
}
}
}
public SLIMeasurement evaluateSLI(String calculatorName) {
return evaluateSLI(calculatorName, Instant.now());
}
public SLIMeasurement evaluateSLI(String calculatorName, Instant timestamp) {
SLICalculator calculator = calculators.get(calculatorName);
if (calculator == null) {
throw new IllegalArgumentException("Unknown SLI calculator: " + calculatorName);
}
return evaluateSLI(calculator, timestamp);
}
private SLIMeasurement evaluateSLI(SLICalculator calculator, Instant timestamp) {
SLIMeasurement measurement = calculator.calculate(timestamp);
lastMeasurements.put(calculator.getName(), measurement);
// Notify listeners
for (SLIListener listener : listeners) {
try {
listener.onSLIMeasurement(measurement);
} catch (Exception e) {
log.error("Listener failed to process SLI measurement: {}", calculator.getName(), e);
}
}
return measurement;
}
public Map<String, SLIMeasurement> getLastMeasurements() {
return new HashMap<>(lastMeasurements);
}
public SLIMeasurement getLastMeasurement(String calculatorName) {
return lastMeasurements.get(calculatorName);
}
public List<SLIDefinition> getSLIDefinitions() {
return calculators.values().stream()
.map(SLICalculator::getDefinition)
.collect(Collectors.toList());
}
public SLIHealthReport generateHealthReport() {
SLIHealthReport report = new SLIHealthReport();
report.setGeneratedAt(Instant.now());
Map<String, SLIHealthStatus> statuses = new HashMap<>();
for (SLICalculator calculator : calculators.values()) {
SLIMeasurement lastMeasurement = lastMeasurements.get(calculator.getName());
SLIHealthStatus status = calculateHealthStatus(calculator, lastMeasurement);
statuses.put(calculator.getName(), status);
}
report.setStatuses(statuses);
report.setOverallHealth(calculateOverallHealth(statuses));
return report;
}
private SLIHealthStatus calculateHealthStatus(SLICalculator calculator, SLIMeasurement measurement) {
SLIHealthStatus status = new SLIHealthStatus();
status.setSliName(calculator.getName());
status.setSliType(calculator.getType());
status.setLastMeasurement(measurement);
status.setLastUpdated(measurement != null ? measurement.getTimestamp() : Instant.now());
if (measurement == null || !measurement.isSuccess()) {
status.setHealth(SLIHealth.UNKNOWN);
status.setMessage("No valid measurement available");
} else {
// Apply health rules based on SLI type and value
status.setHealth(evaluateSLIHealth(calculator, measurement));
status.setMessage(String.format("Current value: %.2f %s", 
measurement.getValue(), measurement.getUnit()));
}
return status;
}
private SLIHealth evaluateSLIHealth(SLICalculator calculator, SLIMeasurement measurement) {
// Simplified health evaluation - in practice, this would use SLO targets
switch (calculator.getType()) {
case AVAILABILITY:
return measurement.getValue() >= 99.9 ? SLIHealth.HEALTHY : 
measurement.getValue() >= 99.0 ? SLIHealth.DEGRADED : SLIHealth.UNHEALTHY;
case LATENCY:
return measurement.getValue() <= 100.0 ? SLIHealth.HEALTHY : 
measurement.getValue() <= 500.0 ? SLIHealth.DEGRADED : SLIHealth.UNHEALTHY;
case THROUGHPUT:
// Throughput health depends on expected load
return SLIHealth.HEALTHY; // Simplified
default:
return SLIHealth.UNKNOWN;
}
}
private SLIHealth calculateOverallHealth(Map<String, SLIHealthStatus> statuses) {
if (statuses.isEmpty()) {
return SLIHealth.UNKNOWN;
}
long unhealthyCount = statuses.values().stream()
.filter(status -> status.getHealth() == SLIHealth.UNHEALTHY)
.count();
long degradedCount = statuses.values().stream()
.filter(status -> status.getHealth() == SLIHealth.DEGRADED)
.count();
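boolean anyUnknown = statuses.values().stream()
.anyMatch(status -> status.getHealth() == SLIHealth.UNKNOWN);
if (unhealthyCount > 0) {
return SLIHealth.UNHEALTHY;
} else if (degradedCount > 0) {
return SLIHealth.DEGRADED;
} else if (anyUnknown) {
// Without a valid measurement for every SLI, overall health cannot be confirmed
return SLIHealth.UNKNOWN;
} else {
return SLIHealth.HEALTHY;
}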
}
@PreDestroy
public void cleanup() {
scheduler.shutdown();
try {
if (!scheduler.awaitTermination(30, TimeUnit.SECONDS)) {
scheduler.shutdownNow();
}
} catch (InterruptedException e) {
scheduler.shutdownNow();
Thread.currentThread().interrupt();
}
}
// Listener interface for SLI measurements
public interface SLIListener {
void onSLIMeasurement(SLIMeasurement measurement);
}
// Health status enums and classes
public enum SLIHealth {
HEALTHY,
DEGRADED,
UNHEALTHY,
UNKNOWN
}
@Data
public static class SLIHealthStatus {
private String sliName;
private SLIType sliType;
private SLIHealth health;
private String message;
private SLIMeasurement lastMeasurement;
private Instant lastUpdated;
}
@Data
public static class SLIHealthReport {
private Instant generatedAt;
private SLIHealth overallHealth;
private Map<String, SLIHealthStatus> statuses;
}
}
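
The health evaluation above hard-codes its thresholds and notes that real code would use SLO targets. One possible shape for that, sketched with hypothetical names, is a single classification function that takes the two bounds and the direction of "better" as parameters:

```java
/** Hypothetical helper: threshold-based health classification with configurable bounds. */
final class ThresholdHealthSketch {

    enum Health { HEALTHY, DEGRADED, UNHEALTHY, UNKNOWN }

    /**
     * @param healthyBound   value at or beyond which the SLI is healthy
     * @param degradedBound  value at or beyond which the SLI is degraded rather than unhealthy
     * @param higherIsBetter true for availability-style SLIs, false for latency-style SLIs
     */
    static Health classify(double value, double healthyBound, double degradedBound,
                           boolean higherIsBetter) {
        if (Double.isNaN(value)) {
            return Health.UNKNOWN; // failed measurements carry NaN in this article's framework
        }
        if (higherIsBetter) {
            if (value >= healthyBound) {
                return Health.HEALTHY;
            }
            return value >= degradedBound ? Health.DEGRADED : Health.UNHEALTHY;
        }
        if (value <= healthyBound) {
            return Health.HEALTHY;
        }
        return value <= degradedBound ? Health.DEGRADED : Health.UNHEALTHY;
    }

    public static void main(String[] args) {
        // Same bounds as the hard-coded switch above.
        System.out.println(classify(99.5, 99.9, 99.0, true));     // availability in percent
        System.out.println(classify(250.0, 100.0, 500.0, false)); // P99 latency in milliseconds
    }
}
```

Passing the bounds in keeps the thresholds next to the SLO definitions instead of inside the manager.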

Integration with Web Framework

Example 6: Spring Boot Integration and HTTP Filter

@Configuration
@EnableConfigurationProperties(SLIConfigurationProperties.class)
@Slf4j
public class SLIAutoConfiguration {
@Bean
@ConditionalOnMissingBean
public AvailabilitySLICalculator availabilitySLICalculator(MeterRegistry meterRegistry) {
return new AvailabilitySLICalculator(meterRegistry);
}
@Bean
@ConditionalOnMissingBean
public LatencySLICalculator latencySLICalculator(MeterRegistry meterRegistry) {
return new LatencySLICalculator(meterRegistry);
}
@Bean
@ConditionalOnMissingBean
public ThroughputSLICalculator throughputSLICalculator(MeterRegistry meterRegistry) {
return new ThroughputSLICalculator(meterRegistry);
}
@Bean
@ConditionalOnMissingBean
public SLIManagerService sliManagerService(List<SLICalculator> calculators) {
return new SLIManagerService(calculators);
}
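@Bean
@ConditionalOnMissingBean // SLIHttpFilter is also annotated @Component; avoid registering it twice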
public SLIHttpFilter sliHttpFilter(SLIManagerService sliManagerService,
AvailabilitySLICalculator availabilityCalculator,
LatencySLICalculator latencyCalculator,
ThroughputSLICalculator throughputCalculator) {
return new SLIHttpFilter(sliManagerService, availabilityCalculator, 
latencyCalculator, throughputCalculator);
}
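@Bean
@ConditionalOnMissingBean // SLIController is also annotated @RestController; avoid registering it twice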
public SLIController sliController(SLIManagerService sliManagerService) {
return new SLIController(sliManagerService);
}
}
// Configuration properties
@ConfigurationProperties(prefix = "sli")
@Data
public class SLIConfigurationProperties {
private boolean enabled = true;
private Evaluation evaluation = new Evaluation();
private Health health = new Health();
@Data
public static class Evaluation {
private Duration interval = Duration.ofMinutes(1);
private Duration defaultWindow = Duration.ofMinutes(5);
}
@Data
public static class Health {
private boolean enabled = true;
private Duration checkInterval = Duration.ofSeconds(30);
}
}
// HTTP Filter for automatic SLI collection
@Component
@Slf4j
public class SLIHttpFilter implements Filter {
private final SLIManagerService sliManagerService;
private final AvailabilitySLICalculator availabilityCalculator;
private final LatencySLICalculator latencyCalculator;
private final ThroughputSLICalculator throughputCalculator;
public SLIHttpFilter(SLIManagerService sliManagerService,
AvailabilitySLICalculator availabilityCalculator,
LatencySLICalculator latencyCalculator,
ThroughputSLICalculator throughputCalculator) {
this.sliManagerService = sliManagerService;
this.availabilityCalculator = availabilityCalculator;
this.latencyCalculator = latencyCalculator;
this.throughputCalculator = throughputCalculator;
}
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
throws IOException, ServletException {
if (!(request instanceof HttpServletRequest) || !(response instanceof HttpServletResponse)) {
chain.doFilter(request, response);
return;
}
HttpServletRequest httpRequest = (HttpServletRequest) request;
HttpServletResponse httpResponse = (HttpServletResponse) response;
String method = httpRequest.getMethod();
String endpoint = getEndpoint(httpRequest);
long startTime = System.nanoTime(); // monotonic clock: unaffected by wall-clock adjustments
try {
chain.doFilter(request, response);
long duration = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime);
int statusCode = httpResponse.getStatus();
// Returning normally is not enough: a 5xx response is still a failed request
boolean success = statusCode < 500;
// Record SLI data points
recordSLIDataPoints(method, endpoint, statusCode, duration, success);
} catch (Exception e) {
// The exception escaped the filter chain; treat it as a 500 response
long duration = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startTime);
// Record SLI data points for failure
recordSLIDataPoints(method, endpoint, 500, duration, false);
throw e;
}
}
private void recordSLIDataPoints(String method, String endpoint, int statusCode, 
long duration, boolean success) {
try {
// Availability
availabilityCalculator.recordHttpRequest(success, method, endpoint, statusCode);
// Latency (only for successful requests or all, depending on your needs)
if (success || recordLatencyForFailures()) {
Map<String, String> latencyLabels = Map.of(
"method", method,
"endpoint", endpoint,
"status_code", String.valueOf(statusCode)
);
latencyCalculator.recordDataPoint(duration, latencyLabels);
}
// Throughput
throughputCalculator.recordRequest(method, endpoint);
log.debug("Recorded SLI data points - method: {}, endpoint: {}, status: {}, duration: {}ms",
method, endpoint, statusCode, duration);
} catch (Exception e) {
log.error("Failed to record SLI data points", e);
}
}
private String getEndpoint(HttpServletRequest request) {
// Normalize endpoint to remove variable parts
String path = request.getRequestURI();
// Replace UUIDs first; otherwise a UUID starting with digits is partly consumed by the numeric rule
path = path.replaceAll("/[a-fA-F0-9-]{36}(?=/|$)", "/{uuid}");
// Replace purely numeric segments only, so paths such as /v2/report2024 are left intact
path = path.replaceAll("/\\d+(?=/|$)", "/{id}");
return path;
}
private boolean recordLatencyForFailures() {
// Configuration option - whether to record latency for failed requests
return true;
}
}
// REST Controller for SLI data
@RestController
@RequestMapping("/api/sli")
@Slf4j
public class SLIController {
private final SLIManagerService sliManagerService;
public SLIController(SLIManagerService sliManagerService) {
this.sliManagerService = sliManagerService;
}
@GetMapping("/definitions")
public ResponseEntity<List<SLIDefinition>> getSLIDefinitions() {
return ResponseEntity.ok(sliManagerService.getSLIDefinitions());
}
@GetMapping("/measurements")
public ResponseEntity<Map<String, SLIMeasurement>> getLastMeasurements() {
return ResponseEntity.ok(sliManagerService.getLastMeasurements());
}
@GetMapping("/measurements/{sliName}")
public ResponseEntity<SLIMeasurement> getMeasurement(@PathVariable String sliName) {
SLIMeasurement measurement = sliManagerService.getLastMeasurement(sliName);
if (measurement == null) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.ok(measurement);
}
@PostMapping("/measurements/{sliName}/evaluate")
public ResponseEntity<SLIMeasurement> evaluateSLI(@PathVariable String sliName) {
try {
SLIMeasurement measurement = sliManagerService.evaluateSLI(sliName);
return ResponseEntity.ok(measurement);
} catch (IllegalArgumentException e) {
return ResponseEntity.notFound().build();
} catch (Exception e) {
log.error("Failed to evaluate SLI: {}", sliName, e);
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR).build();
}
}
@GetMapping("/health")
public ResponseEntity<SLIManagerService.SLIHealthReport> getHealthReport() {
SLIManagerService.SLIHealthReport report = sliManagerService.generateHealthReport();
return ResponseEntity.ok(report);
}
@GetMapping("/health/{sliName}")
public ResponseEntity<SLIManagerService.SLIHealthStatus> getSLIHealth(@PathVariable String sliName) {
SLIManagerService.SLIHealthReport report = sliManagerService.generateHealthReport();
SLIManagerService.SLIHealthStatus status = report.getStatuses().get(sliName);
if (status == null) {
return ResponseEntity.notFound().build();
}
return ResponseEntity.ok(status);
}
}
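
Endpoint normalization matters because every distinct label value becomes its own entry in the reservoirs and its own context key. The following standalone sketch shows one way to collapse variable path segments; the class name and the stricter UUID pattern are assumptions made for illustration, not the filter's exact rules.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: collapses variable path segments into placeholders. */
final class EndpointNormalizerSketch {

    // A whole path segment in the 8-4-4-4-12 hexadecimal UUID layout.
    private static final Pattern UUID_SEGMENT = Pattern.compile(
            "/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}(?=/|$)");

    // A whole path segment made of digits only.
    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("/\\d+(?=/|$)");

    static String normalize(String path) {
        // UUIDs go first so that a UUID beginning with digits is not split by the numeric rule.
        String result = UUID_SEGMENT.matcher(path).replaceAll("/{uuid}");
        return NUMERIC_SEGMENT.matcher(result).replaceAll("/{id}");
    }

    public static void main(String[] args) {
        System.out.println(normalize("/users/42/orders/7"));
        System.out.println(normalize("/items/123e4567-e89b-12d3-a456-426614174000"));
        System.out.println(normalize("/v2/users"));
    }
}
```

Where the web framework exposes the matched route template, using that template as the endpoint label is usually more reliable than pattern matching on the raw URI.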

Advanced SLI Features

Example 7: SLO Tracking and Error Budget
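
Before the service code, it may help to see the arithmetic behind the error-budget fields it tracks. For a ratio-based SLO such as 99.9% availability, the budget is the share of requests allowed to fail within the budget window. The sketch below is an illustration under that assumption, with hypothetical names; it is not taken from the SLOService implementation.

```java
/** Hypothetical helper: error-budget arithmetic for a ratio-based SLO such as availability. */
final class ErrorBudgetSketch {

    /** Number of failed requests the SLO tolerates for the given traffic volume. */
    static double allowedFailures(long totalRequests, double targetPercent) {
        return totalRequests * (100.0 - targetPercent) / 100.0;
    }

    /** Share of the error budget already used, in percent; values above 100 mean a breach. */
    static double budgetConsumedPercent(long totalRequests, long failedRequests,
                                        double targetPercent) {
        double allowed = allowedFailures(totalRequests, targetPercent);
        if (allowed == 0.0) {
            // A 100% target (or no traffic) leaves no budget at all.
            return failedRequests == 0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return 100.0 * failedRequests / allowed;
    }

    public static void main(String[] args) {
        long total = 1_000_000;
        long failed = 250;
        double target = 99.9;
        double consumed = budgetConsumedPercent(total, failed, target);
        System.out.printf("allowed failures: %.1f%n", allowedFailures(total, target));
        System.out.printf("budget consumed:  %.1f%%%n", consumed);
        System.out.printf("budget remaining: %.1f%%%n", 100.0 - consumed);
    }
}
```

With one million requests and a 99.9% target, roughly 1,000 failures are allowed, so 250 failures use about a quarter of the budget. Latency SLOs need a different formulation (for example, the share of requests under a threshold) before this arithmetic applies.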

@Service
@Slf4j
public class SLOService {
private final SLIManagerService sliManagerService;
private final Map<String, SLODefinition> sloDefinitions;
public SLOService(SLIManagerService sliManagerService) {
this.sliManagerService = sliManagerService;
this.sloDefinitions = new ConcurrentHashMap<>();
loadDefaultSLODefinitions();
// Register listener for SLO tracking
sliManagerService.addListener(this::trackSLOCompliance);
}
@Data
public static class SLODefinition {
private final String name;
private final String sliName;
private final double target; // e.g., 99.9 for 99.9% availability
private final Duration window; // SLO evaluation window
private final Duration budgetWindow; // Error budget window
public SLODefinition(String name, String sliName, double target, 
Duration window, Duration budgetWindow) {
this.name = name;
this.sliName = sliName;
this.target = target;
this.window = window;
this.budgetWindow = budgetWindow;
}
}
@Data
public static class SLOStatus {
private String sloName;
private String sliName;
private double currentValue;
private double target;
private double compliance; // Current compliance percentage
private double errorBudget; // Remaining error budget percentage
private double errorBudgetConsumed; // Error budget consumed percentage
private Instant lastUpdated;
private SLIComplianceLevel complianceLevel;
public boolean isCompliant() {
return compliance >= target;
}
}
public enum SLIComplianceLevel {
WITHIN_SLO,      // Within SLO target
AT_RISK,         // Approaching SLO violation
BREACHING        // Currently breaching SLO
}
private void loadDefaultSLODefinitions() {
// Define default SLOs
sloDefinitions.put("availability-slo", new SLODefinition(
"availability-slo",
"http_availability",
99.9, // 99.9% availability
Duration.ofDays(30), // 30-day rolling window
Duration.ofDays(30)  // 30-day error budget
));
sloDefinitions.put("latency-slo", new SLODefinition(
"latency-slo", 
"http_latency",
100.0, // P99 latency <= 100ms
Duration.ofDays(7), // 7-day rolling window
Duration.ofDays(30) // 30-day error budget
));
}
public void registerSLO(SLODefinition sloDefinition) {
sloDefinitions.put(sloDefinition.getName(), sloDefinition);
log.info("Registered SLO: {}", sloDefinition.getName());
}
public SLOStatus calculateSLOStatus(String sloName) {
SLODefinition slo = sloDefinitions.get(sloName);
if (slo == null) {
throw new IllegalArgumentException("Unknown SLO: " + sloName);
}
// In a real implementation, you would query historical data
// For this example, we'll use the current measurement
SLIMeasurement measurement = sliManagerService.getLastMeasurement(slo.getSliName());
if (measurement == null || !measurement.isSuccess()) {
return createUnknownStatus(slo);
}
return calculateStatusFromMeasurement(slo, measurement);
}
private SLOStatus createUnknownStatus(SLODefinition slo) {
SLOStatus status = new SLOStatus();
status.setSloName(slo.getName());
status.setSliName(slo.getSliName());
status.setTarget(slo.getTarget());
status.setCompliance(0.0);
status.setErrorBudget(100.0); // Assume full budget when unknown
status.setErrorBudgetConsumed(0.0);
status.setLastUpdated(Instant.now());
// No data is treated as a breach so that missing measurements still raise an alert
status.setComplianceLevel(SLIComplianceLevel.BREACHING);
return status;
}
private SLOStatus calculateStatusFromMeasurement(SLODefinition slo, SLIMeasurement measurement) {
SLOStatus status = new SLOStatus();
status.setSloName(slo.getName());
status.setSliName(slo.getSliName());
status.setCurrentValue(measurement.getValue());
status.setTarget(slo.getTarget());
status.setLastUpdated(measurement.getTimestamp());
// Calculate compliance (simplified)
double compliance = calculateCompliance(slo, measurement);
status.setCompliance(compliance);
// Calculate error budget (simplified)
ErrorBudget errorBudget = calculateErrorBudget(slo, compliance);
status.setErrorBudget(errorBudget.getRemainingPercent());
status.setErrorBudgetConsumed(errorBudget.getConsumedPercent());
status.setComplianceLevel(determineComplianceLevel(compliance, errorBudget));
return status;
}
private double calculateCompliance(SLODefinition slo, SLIMeasurement measurement) {
// Simplified compliance calculation
// In practice, this would use historical data over the SLO window
switch (slo.getSliName()) {
case "http_availability":
// For availability, higher is better
return Math.min(measurement.getValue(), 100.0);
case "http_latency":
// For latency, lower is better. The target is a threshold in ms, not a percentage:
// at or below the target scores 100, then the score falls linearly to 0 at 2x the target.
double latencyTarget = slo.getTarget();
if (measurement.getValue() <= latencyTarget) {
return 100.0;
}
double overshoot = (measurement.getValue() - latencyTarget) / latencyTarget;
return Math.max(0.0, 100.0 * (1.0 - overshoot));
default:
return measurement.getValue();
}
}
private ErrorBudget calculateErrorBudget(SLODefinition slo, double compliance) {
// Simplified error budget calculation; assumes the target is a percentage
double errorRate = 100.0 - compliance;
double allowedErrorRate = 100.0 - slo.getTarget();
if (allowedErrorRate <= 0) {
// A target of 100 leaves no budget: any shortfall exhausts it immediately
return errorRate > 0 ? new ErrorBudget(0.0, 100.0) : new ErrorBudget(100.0, 0.0);
}
// consumedPercent may exceed 100 when the budget is overspent
double consumedPercent = (errorRate / allowedErrorRate) * 100.0;
double remainingPercent = Math.max(0, 100.0 - consumedPercent);
return new ErrorBudget(remainingPercent, consumedPercent);
}
private SLIComplianceLevel determineComplianceLevel(double compliance, ErrorBudget errorBudget) {
if (compliance >= 99.0 && errorBudget.getRemainingPercent() > 50.0) {
return SLIComplianceLevel.WITHIN_SLO;
} else if (compliance >= 95.0 && errorBudget.getRemainingPercent() > 10.0) {
return SLIComplianceLevel.AT_RISK;
} else {
return SLIComplianceLevel.BREACHING;
}
}
private void trackSLOCompliance(SLIMeasurement measurement) {
// Track SLO compliance for relevant SLIs
sloDefinitions.values().stream()
.filter(slo -> slo.getSliName().equals(measurement.getDefinition().getName()))
.forEach(slo -> {
SLOStatus status = calculateSLOStatus(slo.getName());
logSLOStatus(status);
if (status.getComplianceLevel() == SLIComplianceLevel.BREACHING) {
triggerSLOAlert(status);
}
});
}
private void logSLOStatus(SLOStatus status) {
log.info("SLO Status - {}: Compliance={}%, Target={}%, Error Budget={}%, Level={}",
status.getSloName(),
String.format("%.2f", status.getCompliance()),
String.format("%.2f", status.getTarget()),
String.format("%.2f", status.getErrorBudget()),
status.getComplianceLevel());
}
private void triggerSLOAlert(SLOStatus status) {
// Implement SLO breach alerting
log.warn("SLO BREACH - {}: Compliance={}% (target: {}%), Error Budget={}%",
status.getSloName(),
String.format("%.2f", status.getCompliance()),
String.format("%.2f", status.getTarget()),
String.format("%.2f", status.getErrorBudget()));
// Send alert to monitoring system, PagerDuty, etc.
}
@Data
private static class ErrorBudget {
private final double remainingPercent;
private final double consumedPercent;
}
}
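
The error budget calculation in SLOService works from a single measurement, and its comments note that a real implementation would query historical data. As a hedged sketch of that window-based variant, the standalone class below derives the budget from event counts over one SLO window. The class name WindowedErrorBudget and its methods are hypothetical and not part of the service above; where the counts come from (for example, a metrics backend) is left as an assumption.

```java
import java.time.Duration;

/** Hypothetical helper: error budget from event counts over one SLO window. */
final class WindowedErrorBudget {
    private final double targetPercent; // e.g. 99.9
    private final Duration window;      // e.g. 30 days

    WindowedErrorBudget(double targetPercent, Duration window) {
        if (targetPercent <= 0.0 || targetPercent >= 100.0) {
            throw new IllegalArgumentException("target must be between 0 and 100 (exclusive)");
        }
        this.targetPercent = targetPercent;
        this.window = window;
    }

    /** Number of bad events the window can absorb before the SLO is missed. */
    double allowedBadEvents(long totalEvents) {
        return totalEvents * (100.0 - targetPercent) / 100.0;
    }

    /** Share of the budget already spent, in percent; exceeds 100 when overspent. */
    double consumedPercent(long totalEvents, long badEvents) {
        double allowed = allowedBadEvents(totalEvents);
        if (allowed == 0.0) {
            // No traffic in the window: nothing is consumed unless bad events were reported
            return badEvents == 0 ? 0.0 : 100.0;
        }
        return badEvents / allowed * 100.0;
    }

    /** Budget still available, in percent, floored at zero. */
    double remainingPercent(long totalEvents, long badEvents) {
        return Math.max(0.0, 100.0 - consumedPercent(totalEvents, badEvents));
    }

    /** Downtime the window tolerates when the SLO is read as time-based availability. */
    Duration allowedDowntime() {
        long millis = Math.round(window.toMillis() * (100.0 - targetPercent) / 100.0);
        return Duration.ofMillis(millis);
    }
}
```

With a 99.9% target over 30 days, allowedDowntime() comes to 43 minutes 12 seconds, and 250 failed requests out of 1,000,000 consume about 25% of the budget.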

Best Practices

Configuration Example

# application-sli.yml
sli:
  enabled: true
  evaluation:
    interval: 1m
    default-window: 5m
  health:
    enabled: true
    check-interval: 30s
# SLO Definitions
slo:
  definitions:
    availability:
      name: "availability-slo"
      sli-name: "http_availability"
      target: 99.9
      window: 30d
      budget-window: 30d
    latency:
      name: "latency-slo"
      sli-name: "http_latency"
      target: 100.0
      window: 7d
      budget-window: 30d
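
Values such as 1m, 30s, and 30d use a compact amount-plus-unit notation. Spring Boot can bind this notation to java.time.Duration when a property is typed as Duration. For code that reads the same values outside of Spring, a minimal parser might look like the sketch below. SimpleDurations is a hypothetical helper, not a Spring or Micrometer class, and it assumes only the units ms, s, m, h, and d.

```java
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: parses values such as "30s", "5m" or "30d" into a Duration. */
final class SimpleDurations {
    // A whole-number amount followed by one of the supported unit suffixes
    private static final Pattern SIMPLE = Pattern.compile("^(\\d+)(ms|s|m|h|d)$");

    private SimpleDurations() {
    }

    static Duration parse(String text) {
        Matcher matcher = SIMPLE.matcher(text.trim());
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported duration: " + text);
        }
        long amount = Long.parseLong(matcher.group(1));
        switch (matcher.group(2)) {
            case "ms":
                return Duration.ofMillis(amount);
            case "s":
                return Duration.ofSeconds(amount);
            case "m":
                return Duration.ofMinutes(amount);
            case "h":
                return Duration.ofHours(amount);
            default:
                return Duration.ofDays(amount); // "d" is the only remaining suffix
        }
    }
}
```

For example, SimpleDurations.parse("30d") yields the same value as Duration.ofDays(30), which matches the window used by the availability SLO.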

Conclusion

Service Level Indicators provide crucial visibility into service reliability and performance:

Key Implementation Patterns:

  1. Modular Calculators: Separate calculators for different SLI types
  2. Sliding Windows: Time-based data aggregation
  3. Health Evaluation: Automated compliance checking
  4. SLO Integration: Error budget tracking and alerting
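
As a compact illustration of the sliding-window pattern in the list above, the following standalone sketch keeps request outcomes for a fixed time window and reports availability over it. SlidingWindowAvailability is a hypothetical class written for this summary rather than the calculator used earlier in the article, and it assumes events are recorded in chronological order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: availability over a time-based sliding window. */
final class SlidingWindowAvailability {
    private static final class Event {
        final Instant at;
        final boolean success;

        Event(Instant at, boolean success) {
            this.at = at;
            this.success = success;
        }
    }

    private final Duration window;
    private final Deque<Event> events = new ArrayDeque<>();
    private long successes;

    SlidingWindowAvailability(Duration window) {
        this.window = window;
    }

    /** Records one request outcome; callers are assumed to pass increasing timestamps. */
    synchronized void record(Instant at, boolean success) {
        events.addLast(new Event(at, success));
        if (success) {
            successes++;
        }
    }

    /** Availability in percent over the window ending at 'now', or NaN without traffic. */
    synchronized double availabilityPercent(Instant now) {
        evictOlderThan(now.minus(window));
        if (events.isEmpty()) {
            return Double.NaN;
        }
        return 100.0 * successes / events.size();
    }

    // Drops events that fell out of the window and keeps the success counter in sync
    private void evictOlderThan(Instant cutoff) {
        while (!events.isEmpty() && events.peekFirst().at.isBefore(cutoff)) {
            if (events.pollFirst().success) {
                successes--;
            }
        }
    }
}
```

Keeping one entry per request is simple but memory-hungry under heavy traffic; bucketing counts per second or per minute would be a common refinement.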

Common SLI Categories:

  • Availability: Uptime and success rates
  • Latency: Response time percentiles
  • Throughput: Request volume and capacity
  • Quality: Data accuracy and business metrics
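
For the latency category, a percentile can be computed from raw samples with the nearest-rank method. The sketch below uses only the JDK; LatencyPercentiles is a hypothetical helper and is not the calculation used by Micrometer or by the calculators earlier in this article.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentile over raw latency samples. */
final class LatencyPercentiles {
    private LatencyPercentiles() {
    }

    /**
     * Returns the smallest sample such that at least p percent of all
     * samples are less than or equal to it (nearest-rank definition).
     */
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = samples.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

With the ten samples 11 to 19 ms plus one 250 ms outlier in place of a tenth regular value, the P50 stays at 15 ms while the P99 reports the 250 ms outlier, which is why tail percentiles are usually preferred over averages for latency SLIs.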

Operational Benefits:

  • Proactive Monitoring: Early detection of reliability issues
  • Data-Driven Decisions: Objective basis for capacity planning
  • Clear Communication: Standardized reliability metrics
  • Continuous Improvement: Tracking progress toward reliability goals

Implementing SLIs in Java applications enables organizations to measure what matters most to their users and make informed decisions about reliability investments.
