Availability testing validates that your system remains operational and accessible under various conditions. It's crucial for ensuring SLA compliance and user satisfaction.
Core Concepts of Availability Testing
What is Availability Testing?
- Measuring system uptime and reliability
- Testing under normal and peak load conditions
- Validating failover and recovery mechanisms
- Ensuring compliance with Service Level Agreements (SLAs)
Key Metrics:
- Uptime Percentage: (Total Time - Downtime) / Total Time * 100
- Mean Time Between Failures (MTBF): Average time between system failures
- Mean Time To Recovery (MTTR): Average time to restore service after failure
- Error Rate: Percentage of failed requests
- Response Time: Time taken to respond to requests
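The formulas above can be checked with a small worked example. This is a minimal sketch using only the JDK; the class and method names (`AvailabilityFormulas`, `mtbf`, `mttr`) are illustrative and not part of any library.

```java
import java.time.Duration;
import java.util.List;

/** Worked example of the metric formulas above; class and method names are illustrative. */
final class AvailabilityFormulas {

    /** Uptime % = (total time - downtime) / total time * 100. */
    static double uptimePercentage(Duration total, Duration downtime) {
        if (total.isZero()) {
            return 100.0; // nothing observed yet, so no downtime observed either
        }
        long uptimeMs = total.toMillis() - downtime.toMillis();
        return (double) uptimeMs / total.toMillis() * 100.0;
    }

    /** MTBF = total uptime / number of failures. */
    static Duration mtbf(Duration total, List<Duration> outages) {
        requireFailures(outages);
        return total.minus(sum(outages)).dividedBy(outages.size());
    }

    /** MTTR = total downtime / number of failures. */
    static Duration mttr(List<Duration> outages) {
        requireFailures(outages);
        return sum(outages).dividedBy(outages.size());
    }

    /** Error rate % = failed requests / total requests * 100. */
    static double errorRate(long failedRequests, long totalRequests) {
        return totalRequests == 0 ? 0.0 : (double) failedRequests / totalRequests * 100.0;
    }

    private static Duration sum(List<Duration> durations) {
        return durations.stream().reduce(Duration.ZERO, Duration::plus);
    }

    private static void requireFailures(List<Duration> outages) {
        if (outages.isEmpty()) {
            throw new IllegalArgumentException("MTBF and MTTR are undefined without failures");
        }
    }
}
```

For a 24-hour window with two outages of 6 and 9 minutes, this gives an uptime of about 98.96%, an MTBF of 11 h 52 min 30 s, and an MTTR of 7 min 30 s.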
Dependencies and Setup
Maven Dependencies
<properties>
<spring-boot.version>3.1.0</spring-boot.version>
<junit.version>5.9.2</junit.version>
<testcontainers.version>1.18.3</testcontainers.version>
<resilience4j.version>2.0.2</resilience4j.version>
<awaitility.version>4.2.0</awaitility.version>
</properties>
<dependencies>
<!-- Spring Boot -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>${spring-boot.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
<version>${spring-boot.version}</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<version>${spring-boot.version}</version>
<scope>test</scope>
</dependency>
<!-- Testing -->
<dependency>
<groupId>org.junit.jupiter</groupId>
<artifactId>junit-jupiter</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>junit-jupiter</artifactId>
<version>${testcontainers.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.awaitility</groupId>
<artifactId>awaitility</artifactId>
<version>${awaitility.version}</version>
<scope>test</scope>
</dependency>
<!-- Resilience -->
<dependency>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-spring-boot3</artifactId>
<version>${resilience4j.version}</version>
</dependency>
<dependency>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-timelimiter</artifactId>
<version>${resilience4j.version}</version>
</dependency>
<!-- HTTP Client -->
<dependency>
<groupId>org.apache.httpcomponents.client5</groupId>
<artifactId>httpclient5</artifactId>
<version>5.2.1</version>
</dependency>
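<!-- Monitoring (no explicit version: assumes Spring Boot's dependency management, e.g. spring-boot-starter-parent, supplies it) -->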
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-core</artifactId>
</dependency>
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Lombok: the examples below use @Slf4j, @Data, and @Builder (version also assumed to be managed by Spring Boot) -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
</dependencies>
Core Availability Testing Framework
1. Availability Metrics Collector
@Component
public class AvailabilityMetrics {
private final MeterRegistry meterRegistry;
private final Counter totalRequests;
private final Counter failedRequests;
private final Timer responseTimer;
private final Gauge uptimeGauge;
private long startTime;
private long totalDowntime;
private long lastFailureTime;
public AvailabilityMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.startTime = System.currentTimeMillis();
this.totalRequests = Counter.builder("availability.requests.total")
.description("Total number of requests")
.register(meterRegistry);
this.failedRequests = Counter.builder("availability.requests.failed")
.description("Number of failed requests")
.register(meterRegistry);
this.responseTimer = Timer.builder("availability.response.time")
.description("Request response time")
.register(meterRegistry);
this.uptimeGauge = Gauge.builder("availability.uptime.percentage")
.description("System uptime percentage")
.register(meterRegistry, this, AvailabilityMetrics::calculateUptimePercentage);
}
public void recordRequest(boolean success, long duration) {
totalRequests.increment();
if (!success) {
failedRequests.increment();
recordFailure();
}
responseTimer.record(duration, TimeUnit.MILLISECONDS);
}
public void recordFailure() {
this.lastFailureTime = System.currentTimeMillis();
}
public void recordDowntime(long downtimeMs) {
this.totalDowntime += downtimeMs;
}
public double calculateUptimePercentage() {
long totalTime = System.currentTimeMillis() - startTime;
if (totalTime == 0) return 100.0;
long uptime = totalTime - totalDowntime;
return (double) uptime / totalTime * 100;
}
public double getErrorRate() {
double total = totalRequests.count();
double failed = failedRequests.count();
if (total == 0) return 0.0;
return (failed / total) * 100;
}
public long getMTBF() {
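// Simplified MTBF: every failed request counts as a separate failure,
// so a sustained outage with many failed requests understates MTBF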
long totalUptime = (System.currentTimeMillis() - startTime) - totalDowntime;
double failureCount = failedRequests.count();
if (failureCount == 0) return Long.MAX_VALUE;
return (long) (totalUptime / failureCount);
}
// Getters
public double getTotalRequests() { return totalRequests.count(); }
public double getFailedRequests() { return failedRequests.count(); }
}
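AvailabilityMetrics keeps its downtime in plain long fields, which is fine for a single-threaded test but can lose updates when several threads report at once. One possible alternative is sketched below, assuming outages are reported as explicit down/up transitions. `DowntimeTracker` and its injectable millisecond clock are hypothetical names introduced only for illustration; the clock parameter exists so the logic can be tested without sleeping.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical thread-safe downtime accumulator with an injectable millisecond clock. */
final class DowntimeTracker {
    private static final long NOT_DOWN = -1L;

    private final LongSupplier clockMs;
    private final long startMs;
    private final AtomicLong downSinceMs = new AtomicLong(NOT_DOWN);
    private final AtomicLong totalDowntimeMs = new AtomicLong();
    private final AtomicLong failureCount = new AtomicLong();

    DowntimeTracker(LongSupplier clockMs) {
        this.clockMs = clockMs;
        this.startMs = clockMs.getAsLong();
    }

    /** Marks the start of an outage; repeated calls during the same outage are ignored. */
    void markDown() {
        if (downSinceMs.compareAndSet(NOT_DOWN, clockMs.getAsLong())) {
            failureCount.incrementAndGet();
        }
    }

    /** Marks the end of the current outage and adds its length to the total. */
    void markUp() {
        long since = downSinceMs.getAndSet(NOT_DOWN);
        if (since != NOT_DOWN) {
            totalDowntimeMs.addAndGet(clockMs.getAsLong() - since);
        }
    }

    /** Uptime percentage since construction, counting an outage that is still in progress. */
    double uptimePercentage() {
        long now = clockMs.getAsLong();
        long total = now - startMs;
        if (total <= 0) {
            return 100.0;
        }
        long since = downSinceMs.get();
        long ongoing = since == NOT_DOWN ? 0 : now - since;
        long downtime = totalDowntimeMs.get() + ongoing;
        return (double) (total - downtime) / total * 100.0;
    }

    long failureCount() {
        return failureCount.get();
    }
}
```

In production code the clock would typically be `System::currentTimeMillis`. Counting outages rather than failed requests also gives a failure count that is better suited to an MTBF calculation.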
2. Health Check Service
@Service
@Slf4j
public class HealthCheckService {
private final List<HealthIndicator> healthIndicators;
private final AvailabilityMetrics availabilityMetrics;
public HealthCheckService(List<HealthIndicator> healthIndicators,
AvailabilityMetrics availabilityMetrics) {
this.healthIndicators = healthIndicators;
this.availabilityMetrics = availabilityMetrics;
}
public HealthStatus checkHealth() {
List<HealthCheckResult> results = new ArrayList<>();
boolean overallHealthy = true;
for (HealthIndicator indicator : healthIndicators) {
HealthCheckResult result = indicator.check();
results.add(result);
if (!result.isHealthy()) {
overallHealthy = false;
log.warn("Health check failed: {} - {}", result.getComponent(), result.getMessage());
}
}
HealthStatus status = HealthStatus.builder()
.status(overallHealthy ? "UP" : "DOWN")
.timestamp(Instant.now())
.checks(results)
.build();
// Record availability metric
availabilityMetrics.recordRequest(overallHealthy, 0);
return status;
}
public CompletableFuture<HealthStatus> checkHealthAsync() {
return CompletableFuture.supplyAsync(this::checkHealth);
}
}
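// Custom interface, distinct from Spring Boot Actuator's HealthIndicator.
// It carries no @Component: implementations register themselves as beans.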
public interface HealthIndicator {
String getComponentName();
HealthCheckResult check();
}
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
class HealthCheckResult {
private String component;
private boolean healthy;
private String message;
private Map<String, Object> details;
private Instant timestamp;
}
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
class HealthStatus {
private String status;
private Instant timestamp;
private List<HealthCheckResult> checks;
}
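checkHealth() calls each indicator in turn, so a single slow dependency delays the whole result. The following is a rough sketch of running the checks concurrently with a per-check timeout. It assumes Java 9 or later (for `completeOnTimeout`) and uses plain `Supplier<Boolean>` values as stand-ins for the article's indicators; `ParallelHealthChecks` is a hypothetical name.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: runs named health checks concurrently with a per-check timeout. */
final class ParallelHealthChecks {

    /** A check that throws or exceeds the timeout is reported as unhealthy (false). */
    static Map<String, Boolean> runAll(Map<String, Supplier<Boolean>> checks, Duration timeout) {
        Map<String, CompletableFuture<Boolean>> futures = new LinkedHashMap<>();
        checks.forEach((name, check) -> futures.put(name,
                CompletableFuture.supplyAsync(check)
                        .completeOnTimeout(false, timeout.toMillis(), TimeUnit.MILLISECONDS)
                        .exceptionally(error -> false)));

        Map<String, Boolean> results = new LinkedHashMap<>();
        futures.forEach((name, future) -> results.put(name, future.join()));
        return results;
    }

    /** Overall status is healthy only if no individual check reported false. */
    static boolean allHealthy(Map<String, Boolean> results) {
        return !results.containsValue(false);
    }
}
```

Note that a timed-out check keeps running in the background on the common pool; the timeout only bounds how long the caller waits. Passing a dedicated executor to `supplyAsync` would be a reasonable refinement for checks that block on I/O.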
3. Database Health Indicator
@Component
@Slf4j
public class DatabaseHealthIndicator implements HealthIndicator {
private final DataSource dataSource;
private static final String VALIDATION_QUERY = "SELECT 1";
public DatabaseHealthIndicator(DataSource dataSource) {
this.dataSource = dataSource;
}
@Override
public String getComponentName() {
return "database";
}
@Override
public HealthCheckResult check() {
long startTime = System.currentTimeMillis(); // include connection acquisition in the measured time
try (Connection connection = dataSource.getConnection();
Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery(VALIDATION_QUERY)) { // ResultSet is now closed automatically
long responseTime = System.currentTimeMillis() - startTime;
boolean healthy = resultSet.next() && resultSet.getInt(1) == 1;
return HealthCheckResult.builder()
.component(getComponentName())
.healthy(healthy)
.message(healthy ? "Database is accessible" : "Database validation failed")
.details(Map.of(
"responseTime", responseTime,
"url", connection.getMetaData().getURL()
))
.timestamp(Instant.now())
.build();
} catch (Exception e) {
log.error("Database health check failed", e);
return HealthCheckResult.builder()
.component(getComponentName())
.healthy(false)
.message("Database connection failed: " + e.getMessage())
.timestamp(Instant.now())
.build();
}
}
}
4. External Service Health Indicator
@Component
@Slf4j
public class ExternalServiceHealthIndicator implements HealthIndicator {
private final RestTemplate restTemplate;
private final CircuitBreaker circuitBreaker;
public ExternalServiceHealthIndicator(RestTemplate restTemplate,
CircuitBreakerRegistry circuitBreakerRegistry) {
this.restTemplate = restTemplate;
this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("externalServiceHealth");
}
@Override
public String getComponentName() {
return "external-payment-service";
}
@Override
public HealthCheckResult check() {
try {
// Let exceptions escape the supplier so the circuit breaker can record them as failures
return circuitBreaker.executeSupplier(() -> {
long startTime = System.currentTimeMillis();
ResponseEntity<String> response = restTemplate.getForEntity(
"https://api.payment-service.com/health", String.class);
long responseTime = System.currentTimeMillis() - startTime;
boolean healthy = response.getStatusCode().is2xxSuccessful();
return HealthCheckResult.builder()
.component(getComponentName())
.healthy(healthy)
.message(healthy ? "Service is responsive" :
"Service returned " + response.getStatusCode())
.details(Map.of(
"responseTime", responseTime,
"statusCode", response.getStatusCode().value(), // getStatusCodeValue() is deprecated in Spring 6
"url", "https://api.payment-service.com/health"
))
.timestamp(Instant.now())
.build();
});
} catch (Exception e) {
// Also reached with CallNotPermittedException while the breaker is open
log.warn("External service health check failed", e);
return HealthCheckResult.builder()
.component(getComponentName())
.healthy(false)
.message("Service unreachable: " + e.getMessage())
.timestamp(Instant.now())
.build();
}
}
}
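To make the role of the circuit breaker more concrete, here is a deliberately simplified state machine. It is not Resilience4j's algorithm (which uses a sliding window and a failure-rate threshold); it trips after a number of consecutive failures and only illustrates the CLOSED, OPEN, and HALF_OPEN states. `ToyCircuitBreaker` and all of its parameters are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Simplified consecutive-failure breaker for illustration; not Resilience4j's algorithm. */
final class ToyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long openDurationMs;
    private final LongSupplier clockMs;
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtMs;

    ToyCircuitBreaker(int failureThreshold, long openDurationMs, LongSupplier clockMs) {
        this.failureThreshold = failureThreshold;
        this.openDurationMs = openDurationMs;
        this.clockMs = clockMs;
    }

    /** Returns the current state, moving from OPEN to HALF_OPEN once the wait time has passed. */
    synchronized State state() {
        if (state == State.OPEN && clockMs.getAsLong() - openedAtMs >= openDurationMs) {
            state = State.HALF_OPEN; // allow trial calls to probe the dependency
        }
        return state;
    }

    /** Runs the action unless the breaker is open; returns the fallback when rejected or failed. */
    <T> T call(Supplier<T> action, T fallback) {
        if (state() == State.OPEN) {
            return fallback; // fail fast without touching the dependency
        }
        try {
            T result = action.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed trial call in HALF_OPEN reopens the breaker immediately
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtMs = clockMs.getAsLong();
        }
    }
}
```

The key property for availability is the fail-fast path: while the breaker is open, a health check returns immediately instead of waiting on a dependency that is known to be failing.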
Availability Testing Framework
1. Base Availability Test
public abstract class BaseAvailabilityTest {
protected static final Duration TEST_DURATION = Duration.ofMinutes(10);
protected static final Duration REQUEST_INTERVAL = Duration.ofSeconds(30);
protected static final double ACCEPTABLE_ERROR_RATE = 1.0; // 1%
protected static final long MAX_RESPONSE_TIME_MS = 2000; // 2 seconds
protected AvailabilityMetrics metrics;
protected ScheduledExecutorService scheduler;
@BeforeEach
void setUp() {
metrics = new AvailabilityMetrics(new SimpleMeterRegistry());
scheduler = Executors.newScheduledThreadPool(5);
}
@AfterEach
void tearDown() {
if (scheduler != null) {
scheduler.shutdownNow();
}
}
protected abstract CompletableFuture<Boolean> executeAvailabilityCheck();
protected AvailabilityTestResult runAvailabilityTest() {
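// Written by the scheduler thread and read by the test thread, so use a thread-safe list
List<CompletableFuture<Boolean>> futures = new CopyOnWriteArrayList<>();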
CountDownLatch completionLatch = new CountDownLatch(1);
// Schedule periodic health checks
ScheduledFuture<?> scheduledFuture = scheduler.scheduleAtFixedRate(() -> {
CompletableFuture<Boolean> future = executeAvailabilityCheck();
futures.add(future);
}, 0, REQUEST_INTERVAL.toMillis(), TimeUnit.MILLISECONDS);
// Stop after test duration
scheduler.schedule(() -> {
scheduledFuture.cancel(false);
completionLatch.countDown();
}, TEST_DURATION.toMillis(), TimeUnit.MILLISECONDS);
try {
completionLatch.await();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
}
// Wait for all futures to complete
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
return analyzeResults(futures);
}
private AvailabilityTestResult analyzeResults(List<CompletableFuture<Boolean>> futures) {
long successfulChecks = futures.stream()
.filter(CompletableFuture::isDone)
.map(future -> {
try {
return future.get();
} catch (Exception e) {
return false;
}
})
.filter(Boolean::booleanValue)
.count();
long totalChecks = futures.size();
// Guard against 0/0 (NaN) when no check was executed
double availabilityPercentage = totalChecks == 0 ? 0.0 : (double) successfulChecks / totalChecks * 100;
return AvailabilityTestResult.builder()
.totalChecks(totalChecks)
.successfulChecks(successfulChecks)
.failedChecks(totalChecks - successfulChecks)
.availabilityPercentage(availabilityPercentage)
.errorRate(100 - availabilityPercentage)
.testDuration(TEST_DURATION)
.meetsSla(availabilityPercentage >= (100 - ACCEPTABLE_ERROR_RATE))
.timestamp(Instant.now())
.build();
}
}
@Data
@Builder
@AllArgsConstructor
@NoArgsConstructor
class AvailabilityTestResult {
private long totalChecks;
private long successfulChecks;
private long failedChecks;
private double availabilityPercentage;
private double errorRate;
private Duration testDuration;
private boolean meetsSla;
private Instant timestamp;
public String getSummary() {
return String.format("Availability: %.2f%%, Error Rate: %.2f%%, SLA Met: %s",
availabilityPercentage, errorRate, meetsSla ? "YES" : "NO");
}
}
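With TEST_DURATION at 10 minutes and REQUEST_INTERVAL at 30 seconds, a run produces only about 20 checks, so a single failed check moves the measured availability by roughly 5 percentage points. The sketch below works through that arithmetic; `SampleSizeMath` and its methods are hypothetical helpers, and targets are passed as strings so that `BigDecimal` avoids floating-point rounding near values such as 99.9.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Duration;

/** Hypothetical helper: how many checks a test needs before a target percentage is meaningful. */
final class SampleSizeMath {
    private static final BigDecimal HUNDRED = new BigDecimal("100");

    /** Approximate number of fixed-rate checks that fit into the test window. */
    static long checksInWindow(Duration testDuration, Duration interval) {
        return testDuration.toMillis() / interval.toMillis();
    }

    /** Largest number of failed checks that still meets the target, e.g. "99.9". */
    static long maxAllowedFailures(long totalChecks, String targetPercent) {
        return errorBudgetPercent(targetPercent)
                .multiply(BigDecimal.valueOf(totalChecks))
                .divide(HUNDRED, 0, RoundingMode.FLOOR)
                .longValueExact();
    }

    /** Smallest number of checks at which one failure still meets the target. */
    static long minChecksToTolerateOneFailure(String targetPercent) {
        return HUNDRED.divide(errorBudgetPercent(targetPercent), 0, RoundingMode.CEILING)
                .longValueExact();
    }

    private static BigDecimal errorBudgetPercent(String targetPercent) {
        BigDecimal budget = HUNDRED.subtract(new BigDecimal(targetPercent));
        if (budget.signum() <= 0) {
            throw new IllegalArgumentException("target must be below 100 percent");
        }
        return budget;
    }
}
```

With the constants above, a 99% target tolerates zero failures in 20 checks, and a 99.9% target needs at least 1000 checks before one failure is tolerated. Shortening the interval or lengthening the test is one way to get a sample that can actually distinguish those targets.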
2. HTTP Endpoint Availability Test
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@Slf4j // provides the log field used below
class HttpEndpointAvailabilityTest extends BaseAvailabilityTest {
@LocalServerPort
private int port;
private RestTemplate restTemplate;
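@BeforeEach
void setUpHttpClient() {
// No super.setUp() call: JUnit already runs the inherited @BeforeEach first,
// and calling it again would create and leak a second scheduler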
restTemplate = new RestTemplateBuilder()
.setConnectTimeout(Duration.ofSeconds(5))
.setReadTimeout(Duration.ofSeconds(10))
.build();
}
@Test
void testHealthEndpointAvailability() {
AvailabilityTestResult result = runAvailabilityTest();
assertThat(result.getAvailabilityPercentage())
.as("Health endpoint availability should meet SLA")
.isGreaterThanOrEqualTo(100 - ACCEPTABLE_ERROR_RATE);
assertThat(result.getFailedChecks())
.as("Number of failed checks should be within acceptable limits")
.isLessThanOrEqualTo((long) (result.getTotalChecks() * ACCEPTABLE_ERROR_RATE / 100));
System.out.println("Health Endpoint Test Result: " + result.getSummary());
}
@Override
protected CompletableFuture<Boolean> executeAvailabilityCheck() {
return CompletableFuture.supplyAsync(() -> {
long startTime = System.currentTimeMillis();
try {
String healthUrl = String.format("http://localhost:%d/actuator/health", port);
ResponseEntity<String> response = restTemplate.getForEntity(healthUrl, String.class);
long responseTime = System.currentTimeMillis() - startTime;
boolean success = response.getStatusCode().is2xxSuccessful() &&
responseTime <= MAX_RESPONSE_TIME_MS;
metrics.recordRequest(success, responseTime);
if (!success) {
log.warn("Health check failed: Status={}, ResponseTime={}ms",
response.getStatusCode(), responseTime);
}
return success;
} catch (Exception e) {
long responseTime = System.currentTimeMillis() - startTime;
metrics.recordRequest(false, responseTime);
log.warn("Health check failed with exception: {}", e.getMessage());
return false;
}
});
}
}
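The same probe logic can be written without Spring, which is sometimes convenient for a standalone synthetic monitor. The following is a minimal sketch using the JDK's `java.net.http.HttpClient` (Java 11+). `HttpProbe` is a hypothetical name, and the 5-second connect timeout simply mirrors the value used in the test above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical standalone probe: a 2xx response within the latency limit counts as available. */
final class HttpProbe {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    boolean isAvailable(URI uri, Duration maxLatency) {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(maxLatency) // abort the request if no response arrives in time
                .GET()
                .build();
        long start = System.nanoTime(); // monotonic clock, unaffected by wall-clock changes
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            int status = response.statusCode();
            return status >= 200 && status < 300 && elapsedMs <= maxLatency.toMillis();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (IOException e) {
            return false; // connection refused, request timeout, connection reset, and similar
        }
    }
}
```

Unlike `RestTemplate` with its default error handler, this client does not throw on 4xx or 5xx responses, so the status check is what decides the outcome for those cases.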
3. Database Availability Test
@SpringBootTest
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@Slf4j // provides the log field used below
class DatabaseAvailabilityTest extends BaseAvailabilityTest {
@Autowired
private DataSource dataSource;
@Autowired
private DatabaseHealthIndicator databaseHealthIndicator;
@Test
void testDatabaseAvailability() {
AvailabilityTestResult result = runAvailabilityTest();
assertThat(result.getAvailabilityPercentage())
.as("Database availability should meet SLA")
.isGreaterThanOrEqualTo(99.9); // 99.9% for database
System.out.println("Database Availability Test Result: " + result.getSummary());
}
@Override
protected CompletableFuture<Boolean> executeAvailabilityCheck() {
return CompletableFuture.supplyAsync(() -> {
long startTime = System.currentTimeMillis();
try {
HealthCheckResult result = databaseHealthIndicator.check();
long responseTime = System.currentTimeMillis() - startTime;
boolean success = result.isHealthy() && responseTime <= MAX_RESPONSE_TIME_MS;
metrics.recordRequest(success, responseTime);
if (!success) {
log.warn("Database health check failed: {}", result.getMessage());
}
return success;
} catch (Exception e) {
long responseTime = System.currentTimeMillis() - startTime;
metrics.recordRequest(false, responseTime);
log.warn("Database health check failed with exception: {}", e.getMessage());
return false;
}
});
}
}
4. Load Testing for Availability
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT)
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@Slf4j // provides the log field used below
class LoadAvailabilityTest {
@LocalServerPort
private int port;
private RestTemplate restTemplate;
private AvailabilityMetrics metrics;
@BeforeEach
void setUp() {
metrics = new AvailabilityMetrics(new SimpleMeterRegistry());
restTemplate = new RestTemplateBuilder()
.setConnectTimeout(Duration.ofSeconds(2))
.setReadTimeout(Duration.ofSeconds(5))
.build();
}
@Test
void testAvailabilityUnderLoad() throws InterruptedException {
int concurrentUsers = 50;
Duration testDuration = Duration.ofMinutes(5);
CountDownLatch startLatch = new CountDownLatch(1);
CountDownLatch completionLatch = new CountDownLatch(concurrentUsers);
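List<CompletableFuture<LoadTestResult>> futures = new ArrayList<>();
// Dedicated pool: the default common ForkJoinPool is sized to the CPU count,
// so 50 blocking sessions would not actually run concurrently on it
ExecutorService executor = Executors.newFixedThreadPool(concurrentUsers);
// Start concurrent users
for (int i = 0; i < concurrentUsers; i++) {
CompletableFuture<LoadTestResult> future = CompletableFuture.supplyAsync(() -> {
try {
startLatch.await(); // Block until the start signal so all users begin together
return simulateUserSession();
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return new LoadTestResult(0, 0, 0);
} finally {
completionLatch.countDown();
}
}, executor);
futures.add(future);
}
executor.shutdown(); // accept no new tasks; the submitted sessions still run to completion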
// Start the test
startLatch.countDown();
// Wait for test duration
boolean completed = completionLatch.await(testDuration.toMillis(), TimeUnit.MILLISECONDS);
if (!completed) {
log.warn("Test timed out before all users completed");
}
// Analyze results
LoadTestResult aggregateResult = aggregateResults(futures);
assertThat(aggregateResult.getAvailabilityPercentage())
.as("System should maintain availability under load")
.isGreaterThanOrEqualTo(99.0);
assertThat(aggregateResult.getAverageResponseTime())
.as("Response time should remain acceptable under load")
.isLessThan(1000); // 1 second
System.out.println("Load Test Result: " + aggregateResult);
}
private LoadTestResult simulateUserSession() {
int requests = 0;
int successes = 0;
long totalResponseTime = 0;
Random random = new Random();
long endTime = System.currentTimeMillis() + Duration.ofMinutes(1).toMillis();
while (System.currentTimeMillis() < endTime) {
try {
long startTime = System.currentTimeMillis();
String endpoint = random.nextBoolean() ? "/api/users" : "/api/orders";
String url = String.format("http://localhost:%d%s", port, endpoint);
ResponseEntity<String> response = restTemplate.getForEntity(url, String.class);
long responseTime = System.currentTimeMillis() - startTime;
totalResponseTime += responseTime;
requests++;
if (response.getStatusCode().is2xxSuccessful()) {
successes++;
metrics.recordRequest(true, responseTime);
} else {
metrics.recordRequest(false, responseTime);
}
// Random delay between requests
Thread.sleep(random.nextInt(1000) + 500);
} catch (Exception e) {
requests++;
metrics.recordRequest(false, 0);
}
}
// Availability is derived from the counts by LoadTestResult.getAvailabilityPercentage()
double avgResponseTime = successes > 0 ? (double) totalResponseTime / successes : 0;
return new LoadTestResult(requests, successes, avgResponseTime);
}
private LoadTestResult aggregateResults(List<CompletableFuture<LoadTestResult>> futures) {
long totalRequests = 0;
long totalSuccesses = 0;
double weightedResponseTime = 0; // sum of (per-user average * per-user successes)
for (CompletableFuture<LoadTestResult> future : futures) {
try {
LoadTestResult result = future.get();
totalRequests += result.getTotalRequests();
totalSuccesses += result.getSuccessfulRequests();
weightedResponseTime += result.getAverageResponseTime() * result.getSuccessfulRequests();
} catch (Exception e) {
log.warn("Failed to get load test result", e);
}
}
// Weight each user's average by its success count; a plain average of averages
// would over-represent sessions that completed only a few requests
double avgResponseTime = totalSuccesses > 0 ? weightedResponseTime / totalSuccesses : 0;
return new LoadTestResult(totalRequests, totalSuccesses, avgResponseTime);
}
@Data
@AllArgsConstructor
private static class LoadTestResult {
private long totalRequests;
private long successfulRequests;
private double averageResponseTime;
public double getAvailabilityPercentage() {
return totalRequests > 0 ? (double) successfulRequests / totalRequests * 100 : 0;
}
@Override
public String toString() {
return String.format("Availability: %.2f%%, Avg Response Time: %.2fms, Requests: %d/%d",
getAvailabilityPercentage(), averageResponseTime, successfulRequests, totalRequests);
}
}
}
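The load test asserts on the average response time, which can hide a slow tail: a few multi-second requests barely move the mean. A common complement is to check a high percentile as well. The sketch below uses the nearest-rank method; `LatencyPercentiles` is a hypothetical helper, and it assumes the individual response times have been collected into an array.

```java
import java.util.Arrays;

/** Hypothetical helper: nearest-rank percentiles over collected response times. */
final class LatencyPercentiles {

    /** Returns the smallest sample such that at least p percent of samples are at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }

    static double mean(long[] samplesMs) {
        return Arrays.stream(samplesMs).average().orElse(0);
    }
}
```

For nine requests at 100 ms and one at 5000 ms, the mean is 590 ms, which would pass a 1-second average limit, while the 95th percentile is 5000 ms.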
5. Failover and Recovery Testing
@SpringBootTest
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
@Slf4j // provides the log field used below
class FailoverRecoveryTest {
@Autowired
private HealthCheckService healthCheckService;
@Autowired
private AvailabilityMetrics availabilityMetrics;
@Test
void testSystemRecoveryAfterFailure() {
// Initial health check
HealthStatus initialHealth = healthCheckService.checkHealth();
assertThat(initialHealth.getStatus()).isEqualTo("UP");
// Record downtime start before injecting the failure so that detection time counts as downtime
long downtimeStart = System.currentTimeMillis();
// Simulate a failure (e.g., by stopping a dependent service)
simulateFailure();
// Verify system detects failure
await().atMost(30, TimeUnit.SECONDS).until(() -> {
HealthStatus health = healthCheckService.checkHealth();
return "DOWN".equals(health.getStatus());
});
// Simulate recovery
simulateRecovery();
// Verify system recovers
await().atMost(60, TimeUnit.SECONDS).until(() -> {
HealthStatus health = healthCheckService.checkHealth();
return "UP".equals(health.getStatus());
});
long downtimeEnd = System.currentTimeMillis();
long downtimeDuration = downtimeEnd - downtimeStart;
// Record downtime
availabilityMetrics.recordDowntime(downtimeDuration);
// Verify recovery metrics
double uptimePercentage = availabilityMetrics.calculateUptimePercentage();
long mtbf = availabilityMetrics.getMTBF();
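// Uptime is measured from the creation of AvailabilityMetrics, so this threshold
// assumes the application has been running much longer than the simulated outage
assertThat(uptimePercentage)
.as("Uptime percentage should be high despite failure")
.isGreaterThan(99.0);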
assertThat(downtimeDuration)
.as("Recovery time should be within acceptable limits")
.isLessThan(120000); // 2 minutes max recovery time
log.info("Failover test completed: Downtime={}ms, Uptime={}%, MTBF={}ms",
downtimeDuration, uptimePercentage, mtbf);
}
private void simulateFailure() {
// In a real scenario, this might:
// - Stop a database container
// - Block network access to a service
// - Fill up disk space
log.info("Simulating system failure...");
// Example: Set a flag that health checks will detect
System.setProperty("SIMULATE_FAILURE", "true");
}
private void simulateRecovery() {
log.info("Simulating system recovery...");
System.setProperty("SIMULATE_FAILURE", "false");
}
}
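simulateFailure() sets the SIMULATE_FAILURE system property, but none of the health indicators shown reads it, so something has to turn that flag into a failing check. Below is one way this could look, reduced to plain Java. `SimulatedFailureCheck` is a hypothetical class; in the Spring application the same one-line condition would sit inside the `check()` method of a component implementing the article's HealthIndicator interface.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical check that reports unhealthy while the SIMULATE_FAILURE property is "true". */
final class SimulatedFailureCheck implements BooleanSupplier {
    static final String FLAG = "SIMULATE_FAILURE";

    /** Returns true when the system is considered healthy. */
    @Override
    public boolean getAsBoolean() {
        // Boolean.getBoolean reads the named system property; it is false when the property is unset
        return !Boolean.getBoolean(FLAG);
    }
}
```

A flag like this only exercises the detection and reporting path. Stopping a real dependency, for example a Testcontainers-managed database, gives more realistic failover behaviour.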
6. Continuous Availability Monitoring
@Component
@Slf4j
public class ContinuousAvailabilityMonitor {
private final HealthCheckService healthCheckService;
private final AvailabilityMetrics availabilityMetrics;
private final ScheduledExecutorService scheduler;
private final List<AvailabilityEvent> events;
private volatile boolean monitoring = false; // read and written from different threads
public ContinuousAvailabilityMonitor(HealthCheckService healthCheckService,
AvailabilityMetrics availabilityMetrics) {
this.healthCheckService = healthCheckService;
this.availabilityMetrics = availabilityMetrics;
this.scheduler = Executors.newSingleThreadScheduledExecutor();
this.events = Collections.synchronizedList(new ArrayList<>());
}
@EventListener(ContextRefreshedEvent.class)
public void startMonitoring() {
if (monitoring) {
return;
}
monitoring = true;
scheduler.scheduleAtFixedRate(this::performHealthCheck, 0, 30, TimeUnit.SECONDS);
log.info("Continuous availability monitoring started");
}
@PreDestroy
public void stopMonitoring() {
monitoring = false;
scheduler.shutdown();
log.info("Continuous availability monitoring stopped");
}
private void performHealthCheck() {
try {
long startTime = System.currentTimeMillis();
HealthStatus healthStatus = healthCheckService.checkHealth();
long responseTime = System.currentTimeMillis() - startTime;
boolean isHealthy = "UP".equals(healthStatus.getStatus());
// checkHealth() already records this check in AvailabilityMetrics;
// recording it again here would count every request twice
// Record event if status changed
recordStatusChange(healthStatus, isHealthy, responseTime);
if (!isHealthy) {
log.warn("System unhealthy detected: {}", healthStatus);
// Could trigger alerts here
}
} catch (Exception e) {
log.error("Health check execution failed", e);
availabilityMetrics.recordRequest(false, 0);
}
}
private void recordStatusChange(HealthStatus healthStatus, boolean isHealthy, long responseTime) {
if (events.isEmpty()) {
events.add(new AvailabilityEvent(isHealthy, healthStatus, responseTime));
return;
}
AvailabilityEvent lastEvent = events.get(events.size() - 1);
if (lastEvent.isHealthy() != isHealthy) {
events.add(new AvailabilityEvent(isHealthy, healthStatus, responseTime));
log.info("System status changed to: {}", isHealthy ? "HEALTHY" : "UNHEALTHY");
}
}
public AvailabilityReport generateReport() {
double uptimePercentage = availabilityMetrics.calculateUptimePercentage();
double errorRate = availabilityMetrics.getErrorRate();
long mtbf = availabilityMetrics.getMTBF();
// Work on a snapshot: streaming a synchronizedList directly is not thread-safe
List<AvailabilityEvent> snapshot = new ArrayList<>(events);
long healthyEvents = snapshot.stream()
.filter(AvailabilityEvent::isHealthy)
.count();
long totalEvents = snapshot.size();
return AvailabilityReport.builder()
.timestamp(Instant.now())
.uptimePercentage(uptimePercentage)
.errorRate(errorRate)
.mtbf(mtbf)
.totalEvents(totalEvents)
.healthyEvents(healthyEvents)
.unhealthyEvents(totalEvents - healthyEvents)
.events(snapshot)
.build();
}
@Data
@Builder
public static class AvailabilityReport {
private Instant timestamp;
private double uptimePercentage;
private double errorRate;
private long mtbf;
private long totalEvents;
private long healthyEvents;
private long unhealthyEvents;
private List<AvailabilityEvent> events;
}
@Data
@AllArgsConstructor
public static class AvailabilityEvent {
private boolean healthy;
private HealthStatus healthStatus;
private long responseTime;
private Instant timestamp;
public AvailabilityEvent(boolean healthy, HealthStatus healthStatus, long responseTime) {
this.healthy = healthy;
this.healthStatus = healthStatus;
this.responseTime = responseTime;
this.timestamp = Instant.now();
}
}
}
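Because the monitor stores an event only when the status changes, the event list is effectively a timeline of outages, and recovery times can be derived from it. The sketch below shows the idea with a minimal stand-in for AvailabilityEvent; `OutageAnalysis` and `StatusChange` are hypothetical names, and the events are assumed to be in chronological order.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: derives outage durations and MTTR from status-change events. */
final class OutageAnalysis {

    /** Minimal stand-in for the article's AvailabilityEvent. */
    static final class StatusChange {
        final boolean healthy;
        final Instant timestamp;

        StatusChange(boolean healthy, Instant timestamp) {
            this.healthy = healthy;
            this.timestamp = timestamp;
        }
    }

    /** Pairs each unhealthy change with the next healthy one; an outage still open at the end is ignored. */
    static List<Duration> outages(List<StatusChange> changes) {
        List<Duration> result = new ArrayList<>();
        Instant downSince = null;
        for (StatusChange change : changes) {
            if (!change.healthy && downSince == null) {
                downSince = change.timestamp;
            } else if (change.healthy && downSince != null) {
                result.add(Duration.between(downSince, change.timestamp));
                downSince = null;
            }
        }
        return result;
    }

    /** Mean time to recovery: total outage time divided by the number of outages. */
    static Duration meanTimeToRecovery(List<Duration> outages) {
        if (outages.isEmpty()) {
            return Duration.ZERO;
        }
        Duration total = outages.stream().reduce(Duration.ZERO, Duration::plus);
        return total.dividedBy(outages.size());
    }
}
```

The resolution of these durations is limited by the polling interval: with a check every 30 seconds, each measured outage can be off by up to one interval at either end.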
Configuration
# application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,metrics,availability
  endpoint:
    health:
      show-details: always
      show-components: always
    availability:
      enabled: true
resilience4j:
  circuitbreaker:
    instances:
      externalServiceHealth:
        register-health-indicator: true
        failure-rate-threshold: 50
        minimum-number-of-calls: 10
        automatic-transition-from-open-to-half-open-enabled: true
        wait-duration-in-open-state: 10s
        permitted-number-of-calls-in-half-open-state: 3
        sliding-window-type: COUNT_BASED
        sliding-window-size: 20
availability:
  monitoring:
    enabled: true
    interval: 30s
  sla:
    uptime: 99.9
    max-response-time: 2000ms
    error-rate: 1.0
Best Practices
- Test Realistic Scenarios: Simulate actual failure conditions
- Monitor Key Metrics: Track uptime, error rates, and response times
- Automate Testing: Integrate availability tests in CI/CD pipeline
- Test Failover: Verify system recovers automatically from failures
- Set Realistic SLAs: Define achievable availability targets
- Monitor Dependencies: Track availability of external services
@Component
public class AvailabilitySLAValidator {
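private final ContinuousAvailabilityMonitor monitor;
private final double targetUptime = 99.9;
private final long maxResponseTime = 2000;
private final double maxErrorRate = 1.0;
// Constructor injection: the final monitor field must be initialized
public AvailabilitySLAValidator(ContinuousAvailabilityMonitor monitor) {
this.monitor = monitor;
}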
public boolean validateSLACompliance() {
ContinuousAvailabilityMonitor.AvailabilityReport report = monitor.generateReport();
boolean uptimeCompliant = report.getUptimePercentage() >= targetUptime;
boolean errorRateCompliant = report.getErrorRate() <= maxErrorRate;
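// Average response time over the recorded events; the monitor stores an event only
// when the status changes, so this is not an average over every health check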
double avgResponseTime = report.getEvents().stream()
.mapToLong(ContinuousAvailabilityMonitor.AvailabilityEvent::getResponseTime)
.average()
.orElse(0);
boolean responseTimeCompliant = avgResponseTime <= maxResponseTime;
return uptimeCompliant && errorRateCompliant && responseTimeCompliant;
}
}
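An uptime target is easier to reason about when it is translated into an error budget, meaning the amount of downtime the target permits over a period. Here is a small sketch of that conversion; `ErrorBudget` is a hypothetical helper, and the target is passed as a string so that `BigDecimal` keeps values such as 99.9 exact.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.time.Duration;

/** Hypothetical helper: converts an uptime target into allowed downtime for a window. */
final class ErrorBudget {
    private static final BigDecimal HUNDRED = new BigDecimal("100");

    /** Allowed downtime for a target such as "99.9" over the given window, rounded down to milliseconds. */
    static Duration allowedDowntime(String targetPercent, Duration window) {
        BigDecimal target = new BigDecimal(targetPercent);
        if (target.signum() < 0 || target.compareTo(HUNDRED) > 0) {
            throw new IllegalArgumentException("target must be between 0 and 100");
        }
        BigDecimal budgetFraction = HUNDRED.subtract(target).divide(HUNDRED);
        long millis = budgetFraction
                .multiply(BigDecimal.valueOf(window.toMillis()))
                .setScale(0, RoundingMode.DOWN)
                .longValueExact();
        return Duration.ofMillis(millis);
    }

    /** Budget left after the observed downtime; negative when the SLA is already violated. */
    static Duration remaining(Duration allowed, Duration observedDowntime) {
        return allowed.minus(observedDowntime);
    }
}
```

Over a 30-day window, a 99.9% target allows 43 minutes 12 seconds of downtime, and 99.99% allows about 4 minutes 19 seconds. Comparing measured recovery times against that budget is a quick check on whether a target is achievable.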
Conclusion
Availability testing in Java provides:
- Quantitative measurement of system reliability
- Proactive failure detection before users are impacted
- SLA compliance validation for business requirements
- Recovery process verification for failover scenarios
- Performance baseline under various load conditions
By implementing comprehensive availability testing, you can ensure your system meets reliability expectations, quickly recovers from failures, and maintains consistent performance for end-users. The combination of synthetic monitoring, load testing, and failover validation creates a robust framework for ensuring high availability.