Introduction
Maesh (now Traefik Mesh) is a lightweight service mesh that provides simple and effective traffic management, observability, and security for microservices. It sits alongside your applications without requiring code changes, making it ideal for Java microservices deployments.
Architecture Overview
1. Maesh Architecture
Java Microservices → Maesh Proxy → External Services ↓ ↓ Application Traffic Routing ↓ ↓ Business Logic Load Balancing ↓ ↓ REST/gRPC APIs Circuit Breaking
Setup and Dependencies
1. Kubernetes Deployment Dependencies
<!-- For Kubernetes client integration --> <dependency> <groupId>io.fabric8</groupId> <artifactId>kubernetes-client</artifactId> <version>6.7.2</version> </dependency> <!-- Spring Cloud Kubernetes --> <dependency> <groupId>org.springframework.cloud</groupId> <artifactId>spring-cloud-starter-kubernetes-fabric8</artifactId> <version>2.1.0</version> </dependency> <!-- Micrometer for metrics --> <dependency> <groupId>io.micrometer</groupId> <artifactId>micrometer-core</artifactId> <version>1.11.5</version> </dependency> <!-- Resilience4j for circuit breaker --> <dependency> <groupId>io.github.resilience4j</groupId> <artifactId>resilience4j-spring-boot2</artifactId> <version>2.1.0</version> </dependency>
2. Maesh Installation (Helm)
# maesh-values.yaml # Maesh/Traefik Mesh configuration # Lightweight mode (recommended for most use cases) lightweight: true # Enable access logging accessLog: enabled: true format: json # Traffic splitting configuration trafficSplitting: true # Mesh configuration mesh: # Enable automatic endpoint discovery endpointSlices: true # Ingress controller configuration ingressController: enabled: true # Service configuration service: type: LoadBalancer # RBAC configuration rbac: enabled: true # Metrics configuration metrics: prometheus: enabled: true serviceMonitor: enabled: true # Tracing configuration tracing: jaeger: enabled: true collectorUrl: "http://jaeger-collector:14268/api/traces" # Dashboard (optional) dashboard: enabled: true
# Install Maesh using Helm helm repo add traefik-mesh https://helm.traefik.io/mesh helm repo update helm install maesh traefik-mesh/traefik-mesh -f maesh-values.yaml --namespace maesh
Java Application Configuration
1. Spring Boot Configuration for Maesh
# application.yml
spring:
application:
name: user-service
cloud:
kubernetes:
discovery:
all-namespaces: true
reload:
enabled: true
server:
port: 8080
# Resilience4j Circuit Breaker Configuration
resilience4j:
circuitbreaker:
instances:
userService:
register-health-indicator: true
sliding-window-size: 10
failure-rate-threshold: 50
wait-duration-in-open-state: 10s
permitted-number-of-calls-in-half-open-state: 3
automatic-transition-from-open-to-half-open-enabled: true
retry:
instances:
userService:
max-attempts: 3
wait-duration: 2s
# Micrometer Metrics
management:
endpoints:
web:
exposure:
include: health,info,metrics,prometheus
endpoint:
health:
show-details: always
metrics:
export:
prometheus:
enabled: true
tags:
application: ${spring.application.name}
environment: ${ENVIRONMENT:development}
2. Kubernetes Deployment with Maesh
# k8s/user-service-deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: user-service labels: app: user-service version: v1 annotations: # Maesh specific annotations maesh.traefik.io/traffic-type: http maesh.traefik.io/retry-attempts: "3" spec: replicas: 3 selector: matchLabels: app: user-service template: metadata: labels: app: user-service version: v1 annotations: # Prometheus scraping prometheus.io/scrape: "true" prometheus.io/port: "8080" prometheus.io/path: "/actuator/prometheus" spec: serviceAccountName: user-service-account containers: - name: user-service image: myregistry/user-service:1.0.0 ports: - containerPort: 8080 name: http env: - name: SPRING_PROFILES_ACTIVE value: "kubernetes" - name: JAVA_OPTS value: "-Xmx512m -Xms256m -javaagent:/app/opentelemetry-javaagent.jar" - name: OTEL_SERVICE_NAME value: "user-service" - name: OTEL_EXPORTER_OTLP_ENDPOINT value: "http://maesh-proxy:4317" resources: requests: memory: "512Mi" cpu: "200m" limits: memory: "1Gi" cpu: "500m" livenessProbe: httpGet: path: /actuator/health/liveness port: 8080 initialDelaySeconds: 60 periodSeconds: 10 readinessProbe: httpGet: path: /actuator/health/readiness port: 8080 initialDelaySeconds: 30 periodSeconds: 5 --- # Service with Maesh annotations apiVersion: v1 kind: Service metadata: name: user-service labels: app: user-service annotations: # Enable Maesh for this service maesh.traefik.io/traffic-type: http maesh.traefik.io/retry-attempts: "3" maesh.traefik.io/circuit-breaker-expression: "LatencyAtQuantileMS(50.0) > 100" spec: selector: app: user-service ports: - name: http port: 80 targetPort: 8080 protocol: TCP type: ClusterIP
Traffic Management
1. Service Communication with Maesh
@Service
@Slf4j
public class UserService {
private final RestTemplate restTemplate;
private final CircuitBreakerRegistry circuitBreakerRegistry;
private final RetryRegistry retryRegistry;
public UserService(@LoadBalanced RestTemplate restTemplate,
CircuitBreakerRegistry circuitBreakerRegistry,
RetryRegistry retryRegistry) {
this.restTemplate = restTemplate;
this.circuitBreakerRegistry = circuitBreakerRegistry;
this.retryRegistry = retryRegistry;
}
@CircuitBreaker(name = "orderService", fallbackMethod = "getUserOrdersFallback")
@Retry(name = "orderService", fallbackMethod = "getUserOrdersFallback")
@RateLimiter(name = "orderService")
public List<Order> getUserOrders(String userId) {
String url = "http://order-service/orders?userId=" + userId;
ResponseEntity<OrderResponse> response = restTemplate.exchange(
url, HttpMethod.GET, null, OrderResponse.class);
if (response.getStatusCode().is2xxSuccessful()) {
return response.getBody().getOrders();
}
throw new ServiceUnavailableException("Order service returned: " + response.getStatusCode());
}
public List<Order> getUserOrdersFallback(String userId, Exception e) {
log.warn("Fallback triggered for user orders: {}", userId, e);
return Collections.emptyList();
}
@TimeLimiter(name = "paymentService")
@Bulkhead(name = "paymentService")
public CompletableFuture<PaymentResponse> processPayment(PaymentRequest request) {
return CompletableFuture.supplyAsync(() -> {
String url = "http://payment-service/payments";
ResponseEntity<PaymentResponse> response = restTemplate.postForEntity(
url, request, PaymentResponse.class);
if (response.getStatusCode().is2xxSuccessful()) {
return response.getBody();
}
throw new PaymentServiceException("Payment service unavailable");
});
}
}
@Configuration
public class RestTemplateConfig {
@Bean
@LoadBalanced
public RestTemplate loadBalancedRestTemplate() {
return new RestTemplate();
}
@Bean
public WebClient.Builder loadBalancedWebClientBuilder() {
return WebClient.builder();
}
}
2. Circuit Breaker and Resilience Patterns
@Service
@Slf4j
public class ResilientServiceClient {
private final WebClient webClient;
private final CircuitBreaker circuitBreaker;
private final Retry retry;
private final Timer timer;
public ResilientServiceClient(WebClient.Builder webClientBuilder,
CircuitBreakerRegistry circuitBreakerRegistry,
RetryRegistry retryRegistry,
MeterRegistry meterRegistry) {
this.webClient = webClientBuilder.build();
this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("externalService");
this.retry = retryRegistry.retry("externalService");
this.timer = meterRegistry.timer("external.service.calls");
}
public Mono<UserProfile> getUserProfileWithResilience(String userId) {
return CircuitBreakerOperator.of(circuitBreaker)
.transform(
RetryOperator.of(retry)
.transform(
webClient.get()
.uri("http://profile-service/profiles/{userId}", userId)
.retrieve()
.bodyToMono(UserProfile.class)
.name("external.service.calls")
.tag("service", "profile-service")
.tag("operation", "getProfile")
.metrics()
)
)
.doOnError(error -> log.error("Failed to get user profile: {}", userId, error))
.doOnSuccess(profile -> log.debug("Successfully retrieved profile: {}", userId));
}
public Flux<Notification> getUserNotifications(String userId) {
return webClient.get()
.uri("http://notification-service/notifications/{userId}", userId)
.retrieve()
.bodyToFlux(Notification.class)
.transform(ResilienceOperator.of(retry))
.transform(ResilienceOperator.of(circuitBreaker))
.timeout(Duration.ofSeconds(10))
.onErrorResume(throwable -> {
log.warn("Failed to fetch notifications, returning empty flux", throwable);
return Flux.empty();
});
}
}
Maesh Traffic Routing Configuration
1. Traffic Splitting Configuration
# k8s/traffic-splitting.yaml apiVersion: traefik.maesh.github.io/v1alpha1 kind: TrafficSplit metadata: name: user-service-split namespace: default spec: service: name: user-service port: 80 backends: - service: name: user-service-v1 port: 80 weight: 90 - service: name: user-service-v2 port: 80 weight: 10 --- # Canary deployment for user service apiVersion: apps/v1 kind: Deployment metadata: name: user-service-v2 labels: app: user-service version: v2 spec: replicas: 1 selector: matchLabels: app: user-service version: v2 template: metadata: labels: app: user-service version: v2 annotations: maesh.traefik.io/traffic-type: http spec: containers: - name: user-service image: myregistry/user-service:2.0.0-canary # ... rest of container spec
2. Retry and Timeout Configuration
# k8s/retry-policy.yaml apiVersion: traefik.maesh.github.io/v1alpha1 kind: Service metadata: name: order-service annotations: maesh.traefik.io/retry-attempts: "3" maesh.traefik.io/retry-initial-interval: "100ms" maesh.traefik.io/response-timeout: "30s" maesh.traefik.io/circuit-breaker-expression: "ResponseCodeRatio(500, 600, 0, 600) > 0.5" spec: selector: app: order-service ports: - port: 80 targetPort: 8080
Observability with Maesh
1. Distributed Tracing Integration
@Configuration
public class TracingConfig {
@Bean
public OpenTelemetry openTelemetry() {
return OpenTelemetrySdk.builder()
.setTracerProvider(
SdkTracerProvider.builder()
.addSpanProcessor(
BatchSpanProcessor.builder(
OtlpGrpcSpanExporter.builder()
.setEndpoint("http://maesh-proxy:4317")
.build()
).build()
)
.setResource(Resource.getDefault()
.merge(Resource.create(Attributes.of(
ResourceAttributes.SERVICE_NAME, "user-service"
)))
)
.build()
)
.setPropagators(ContextPropagators.create(
W3CTraceContextPropagator.getInstance()
))
.build();
}
@Bean
public Tracer tracer(OpenTelemetry openTelemetry) {
return openTelemetry.getTracer("user-service");
}
}
@Service
@Slf4j
public class TracedUserService {
private final Tracer tracer;
private final RestTemplate restTemplate;
public TracedUserService(Tracer tracer,
@LoadBalanced RestTemplate restTemplate) {
this.tracer = tracer;
this.restTemplate = restTemplate;
}
public User getUserWithTracing(String userId) {
Span span = tracer.spanBuilder("getUser")
.setAttribute("user.id", userId)
.startSpan();
try (Scope scope = span.makeCurrent()) {
// Add tracing headers to outgoing request
HttpHeaders headers = new HttpHeaders();
W3CTraceContextPropagator.getInstance().inject(
Context.current(),
headers,
(carrier, key, value) -> carrier.add(key, value));
HttpEntity<String> entity = new HttpEntity<>(headers);
ResponseEntity<User> response = restTemplate.exchange(
"http://user-service/users/{userId}",
HttpMethod.GET,
entity,
User.class,
userId
);
span.setStatus(StatusCode.OK);
return response.getBody();
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, e.getMessage());
throw e;
} finally {
span.end();
}
}
}
2. Metrics and Monitoring
@Service
@Slf4j
public class ServiceMetrics {
private final MeterRegistry meterRegistry;
private final Counter serviceCallsCounter;
private final Timer serviceResponseTimer;
private final Gauge activeRequestsGauge;
private final AtomicInteger activeRequests = new AtomicInteger(0);
public ServiceMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.serviceCallsCounter = Counter.builder("service.calls.total")
.description("Total number of service calls")
.tags("application", "user-service")
.register(meterRegistry);
this.serviceResponseTimer = Timer.builder("service.response.time")
.description("Service response time")
.tags("application", "user-service")
.register(meterRegistry);
this.activeRequestsGauge = Gauge.builder("service.active.requests")
.description("Number of active requests")
.tags("application", "user-service")
.register(meterRegistry, activeRequests);
}
public <T> T measureServiceCall(String serviceName, String operation, Supplier<T> operation) {
activeRequests.incrementAndGet();
serviceCallsCounter.increment();
Timer.Sample sample = Timer.start(meterRegistry);
try {
T result = operation.get();
sample.stop(serviceResponseTimer
.tags("service", serviceName, "operation", operation, "status", "success"));
return result;
} catch (Exception e) {
sample.stop(serviceResponseTimer
.tags("service", serviceName, "operation", operation, "status", "error"));
throw e;
} finally {
activeRequests.decrementAndGet();
}
}
public void recordServiceError(String serviceName, String operation) {
meterRegistry.counter("service.errors.total",
"service", serviceName,
"operation", operation,
"application", "user-service"
).increment();
}
}
Security with Maesh
1. mTLS Configuration
# k8s/mtls-policy.yaml apiVersion: traefik.maesh.github.io/v1alpha1 kind: TLSOption metadata: name: strict-mtls namespace: default spec: minVersion: VersionTLS12 cipherSuites: - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 clientAuth: clientAuthType: RequireAndVerifyClientCert --- apiVersion: traefik.maesh.github.io/v1alpha1 kind: Middleware metadata: name: mtls-middleware spec: tls: options: name: strict-mtls
2. Rate Limiting
@Service
@Slf4j
public class RateLimitService {
private final RateLimiterRegistry rateLimiterRegistry;
public RateLimitService(RateLimiterRegistry rateLimiterRegistry) {
this.rateLimiterRegistry = rateLimiterRegistry;
}
public boolean isAllowed(String serviceName, String clientId) {
RateLimiter rateLimiter = rateLimiterRegistry.rateLimiter(
serviceName + "-" + clientId,
() -> RateLimiterConfig.custom()
.limitForPeriod(100)
.limitRefreshPeriod(Duration.ofMinutes(1))
.timeoutDuration(Duration.ZERO)
.build()
);
return rateLimiter.acquirePermission();
}
@RateLimiter(name = "userCreation", fallbackMethod = "userCreationRateLimited")
public User createUserWithRateLimit(CreateUserRequest request) {
// Business logic for user creation
return userService.createUser(request);
}
public User userCreationRateLimited(CreateUserRequest request, Exception e) {
log.warn("User creation rate limited for: {}", request.getEmail());
throw new RateLimitExceededException("Too many user creation requests");
}
}
Advanced Maesh Configurations
1. A/B Testing Configuration
# k8s/ab-testing.yaml apiVersion: traefik.maesh.github.io/v1alpha1 kind: TrafficSplit metadata: name: payment-service-ab-test namespace: default spec: service: name: payment-service port: 80 matches: - headers: x-user-type: exact: premium backends: - service: name: payment-service-v2 port: 80 weight: 100 - headers: x-user-type: exact: standard backends: - service: name: payment-service-v1 port: 80 weight: 100 backends: - service: name: payment-service-v1 port: 80 weight: 50 - service: name: payment-service-v2 port: 80 weight: 50
2. Circuit Breaker Configuration
# k8s/circuit-breaker.yaml apiVersion: traefik.maesh.github.io/v1alpha1 kind: Service metadata: name: inventory-service annotations: maesh.traefik.io/circuit-breaker-expression: "ResponseCodeRatio(500, 600, 0, 600) > 0.5" maesh.traefik.io/circuit-breaker-check-period: "10s" maesh.traefik.io/circuit-breaker-fallback-status-code: "503" spec: selector: app: inventory-service ports: - port: 80 targetPort: 8080
Java Application Best Practices
1. Health Check Endpoints
@RestController
@RequestMapping("/actuator")
@Slf4j
public class HealthController {
private final ServiceMetrics serviceMetrics;
private final CircuitBreakerRegistry circuitBreakerRegistry;
public HealthController(ServiceMetrics serviceMetrics,
CircuitBreakerRegistry circuitBreakerRegistry) {
this.serviceMetrics = serviceMetrics;
this.circuitBreakerRegistry = circuitBreakerRegistry;
}
@GetMapping("/health/readiness")
public ResponseEntity<Health> readiness() {
Health.Builder status = Health.up();
// Check circuit breaker status
circuitBreakerRegistry.getAllCircuitBreakers().forEach((name, cb) -> {
status.withDetail("circuitbreaker." + name + ".state", cb.getState());
status.withDetail("circuitbreaker." + name + ".failureRate",
cb.getMetrics().getFailureRate());
});
// Check database connectivity
if (!isDatabaseHealthy()) {
status.down().withDetail("database", "unhealthy");
}
return ResponseEntity.ok(status.build());
}
@GetMapping("/health/liveness")
public ResponseEntity<Health> liveness() {
return ResponseEntity.ok(Health.up().build());
}
@GetMapping("/health/mesh")
public ResponseEntity<Map<String, Object>> meshHealth() {
Map<String, Object> health = new HashMap<>();
health.put("status", "UP");
health.put("mesh", "maesh");
health.put("timestamp", Instant.now().toString());
// Add service discovery information
health.put("discoveredServices", getDiscoveredServices());
health.put("trafficSplits", getActiveTrafficSplits());
return ResponseEntity.ok(health);
}
private boolean isDatabaseHealthy() {
// Implement database health check
return true;
}
private List<String> getDiscoveredServices() {
// Return list of services discovered via Maesh
return List.of("order-service", "payment-service", "inventory-service");
}
private List<String> getActiveTrafficSplits() {
// Return active traffic splits
return List.of("user-service-split", "payment-service-ab-test");
}
}
2. Configuration Management
@Configuration
@ConfigurationProperties(prefix = "maesh")
@Data
public class MaeshConfig {
private TrafficManagement traffic = new TrafficManagement();
private Observability observability = new Observability();
private Security security = new Security();
@Data
public static class TrafficManagement {
private int retryAttempts = 3;
private Duration timeout = Duration.ofSeconds(30);
private boolean circuitBreakerEnabled = true;
private String circuitBreakerExpression = "LatencyAtQuantileMS(50.0) > 100";
}
@Data
public static class Observability {
private boolean tracingEnabled = true;
private String tracingEndpoint = "http://maesh-proxy:4317";
private boolean metricsEnabled = true;
private String metricsPath = "/actuator/prometheus";
}
@Data
public static class Security {
private boolean mtlsEnabled = true;
private boolean rateLimitingEnabled = true;
private int rateLimitRequestsPerMinute = 100;
}
}
@Component
@Slf4j
public class MaeshConfigRefresher {
private final MaeshConfig maeshConfig;
private final KubernetesClient kubernetesClient;
@Scheduled(fixedRate = 30000) // Check every 30 seconds
public void refreshMaeshConfiguration() {
try {
// Check for updated Maesh configurations
ConfigMap configMap = kubernetesClient.configMaps()
.inNamespace("maesh")
.withName("maesh-config")
.get();
if (configMap != null) {
updateConfiguration(configMap.getData());
}
} catch (Exception e) {
log.warn("Failed to refresh Maesh configuration", e);
}
}
private void updateConfiguration(Map<String, String> configData) {
// Update application configuration based on Maesh config changes
log.info("Updating Maesh configuration from ConfigMap");
}
}
Testing with Maesh
1. Integration Tests
@SpringBootTest
@Testcontainers
@ActiveProfiles("test")
class MaeshIntegrationTest {
@Container
static GenericContainer maeshContainer = new GenericContainer("traefik/mesh:latest")
.withExposedPorts(8080, 8443)
.withEnv("MESH_MODE", "lightweight");
@DynamicPropertySource
static void configureProperties(DynamicPropertyRegistry registry) {
registry.add("maesh.proxy.url",
() -> "http://" + maeshContainer.getHost() + ":" + maeshContainer.getMappedPort(8080));
}
@Test
void testServiceCommunicationThroughMaesh() {
// Test service-to-service communication through Maesh proxy
String response = restTemplate.getForObject(
"http://user-service/users/123", String.class);
assertNotNull(response);
// Verify headers added by Maesh
}
}
@WebMvcTest(UserController.class)
class UserControllerTest {
@MockBean
private UserService userService;
@Autowired
private MockMvc mockMvc;
@Test
void shouldReturnUserWhenFound() throws Exception {
User user = new User("123", "[email protected]");
when(userService.getUser("123")).thenReturn(user);
mockMvc.perform(get("/users/123")
.header("X-Request-ID", "test-request")
.header("X-User-Type", "premium"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.id").value("123"))
.andExpect(jsonPath("$.email").value("[email protected]"));
}
}
Monitoring and Dashboards
1. Grafana Dashboard Configuration
{
"dashboard": {
"title": "Maesh Service Mesh - Java Applications",
"panels": [
{
"title": "Service Response Times",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(service_response_time_seconds_bucket[5m]))",
"legendFormat": "{{service}} - P95"
}
]
},
{
"title": "Circuit Breaker States",
"type": "stat",
"targets": [
{
"expr": "resilience4j_circuitbreaker_state",
"legendFormat": "{{name}} - {{state}}"
}
]
},
{
"title": "HTTP Traffic by Service",
"type": "table",
"targets": [
{
"expr": "sum by (service) (rate(http_requests_total[5m]))"
}
]
}
]
}
}
Conclusion
Maesh provides a simple yet powerful service mesh solution for Java applications with:
- Zero Code Changes - Works alongside existing applications
- Lightweight - Minimal resource overhead compared to other service meshes
- Traffic Management - Advanced routing, splitting, and circuit breaking
- Observability - Built-in metrics, tracing, and logging
- Security - mTLS and rate limiting out of the box
By integrating Maesh with Java applications using the patterns shown above, teams can achieve sophisticated service mesh capabilities without the complexity of heavier alternatives like Istio or Linkerd.