Maesh for Service Mesh in Java

Introduction

Maesh (now Traefik Mesh) is a lightweight service mesh that provides simple and effective traffic management, observability, and security for microservices. It sits alongside your applications without requiring code changes, making it ideal for Java microservices deployments.

Architecture Overview

1. Maesh Architecture

Java Microservices → Maesh Proxy → External Services
↓                  ↓
Application      Traffic Routing
↓                  ↓
Business Logic    Load Balancing
↓                  ↓
REST/gRPC APIs    Circuit Breaking

Setup and Dependencies

1. Kubernetes Deployment Dependencies

<!-- For Kubernetes client integration -->
<dependency>
<groupId>io.fabric8</groupId>
<artifactId>kubernetes-client</artifactId>
<version>6.7.2</version>
</dependency>
<!-- Spring Cloud Kubernetes -->
<dependency>
<groupId>org.springframework.cloud</groupId>
<artifactId>spring-cloud-starter-kubernetes-fabric8</artifactId>
<version>2.1.0</version>
</dependency>
<!-- Micrometer for metrics -->
<dependency>
<groupId>io.micrometer</groupId>
<artifactId>micrometer-core</artifactId>
<version>1.11.5</version>
</dependency>
<!-- Resilience4j for circuit breaker -->
<dependency>
<groupId>io.github.resilience4j</groupId>
<artifactId>resilience4j-spring-boot2</artifactId>
<version>2.1.0</version>
</dependency>

2. Maesh Installation (Helm)

# maesh-values.yaml
# Maesh/Traefik Mesh configuration
# Lightweight mode (recommended for most use cases)
lightweight: true
# Enable access logging
accessLog:
enabled: true
format: json
# Traffic splitting configuration
trafficSplitting: true
# Mesh configuration
mesh:
# Enable automatic endpoint discovery
endpointSlices: true
# Ingress controller configuration
ingressController:
enabled: true
# Service configuration
service:
type: LoadBalancer
# RBAC configuration
rbac:
enabled: true
# Metrics configuration
metrics:
prometheus:
enabled: true
serviceMonitor:
enabled: true
# Tracing configuration
tracing:
jaeger:
enabled: true
collectorUrl: "http://jaeger-collector:14268/api/traces"
# Dashboard (optional)
dashboard:
enabled: true
# Install Maesh using Helm
helm repo add traefik-mesh https://helm.traefik.io/mesh
helm repo update
helm install maesh traefik-mesh/traefik-mesh -f maesh-values.yaml --namespace maesh

Java Application Configuration

1. Spring Boot Configuration for Maesh

# application.yml
spring:
application:
name: user-service
cloud:
kubernetes:
discovery:
all-namespaces: true
reload:
enabled: true
server:
port: 8080
# Resilience4j Circuit Breaker Configuration
resilience4j:
circuitbreaker:
instances:
userService:
register-health-indicator: true
sliding-window-size: 10
failure-rate-threshold: 50
wait-duration-in-open-state: 10s
permitted-number-of-calls-in-half-open-state: 3
automatic-transition-from-open-to-half-open-enabled: true
retry:
instances:
userService:
max-attempts: 3
wait-duration: 2s
# Micrometer Metrics
management:
endpoints:
web:
exposure:
include: health,info,metrics,prometheus
endpoint:
health:
show-details: always
metrics:
export:
prometheus:
enabled: true
tags:
application: ${spring.application.name}
environment: ${ENVIRONMENT:development}

2. Kubernetes Deployment with Maesh

# k8s/user-service-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service
labels:
app: user-service
version: v1
annotations:
# Maesh specific annotations
maesh.traefik.io/traffic-type: http
maesh.traefik.io/retry-attempts: "3"
spec:
replicas: 3
selector:
matchLabels:
app: user-service
template:
metadata:
labels:
app: user-service
version: v1
annotations:
# Prometheus scraping
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/actuator/prometheus"
spec:
serviceAccountName: user-service-account
containers:
- name: user-service
image: myregistry/user-service:1.0.0
ports:
- containerPort: 8080
name: http
env:
- name: SPRING_PROFILES_ACTIVE
value: "kubernetes"
- name: JAVA_OPTS
value: "-Xmx512m -Xms256m -javaagent:/app/opentelemetry-javaagent.jar"
- name: OTEL_SERVICE_NAME
value: "user-service"
- name: OTEL_EXPORTER_OTLP_ENDPOINT
value: "http://maesh-proxy:4317"
resources:
requests:
memory: "512Mi"
cpu: "200m"
limits:
memory: "1Gi"
cpu: "500m"
livenessProbe:
httpGet:
path: /actuator/health/liveness
port: 8080
initialDelaySeconds: 60
periodSeconds: 10
readinessProbe:
httpGet:
path: /actuator/health/readiness
port: 8080
initialDelaySeconds: 30
periodSeconds: 5
---
# Service with Maesh annotations
apiVersion: v1
kind: Service
metadata:
name: user-service
labels:
app: user-service
annotations:
# Enable Maesh for this service
maesh.traefik.io/traffic-type: http
maesh.traefik.io/retry-attempts: "3"
maesh.traefik.io/circuit-breaker-expression: "LatencyAtQuantileMS(50.0) > 100"
spec:
selector:
app: user-service
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
type: ClusterIP

Traffic Management

1. Service Communication with Maesh

@Service
@Slf4j
public class UserService {
private final RestTemplate restTemplate;
private final CircuitBreakerRegistry circuitBreakerRegistry;
private final RetryRegistry retryRegistry;
public UserService(@LoadBalanced RestTemplate restTemplate,
CircuitBreakerRegistry circuitBreakerRegistry,
RetryRegistry retryRegistry) {
this.restTemplate = restTemplate;
this.circuitBreakerRegistry = circuitBreakerRegistry;
this.retryRegistry = retryRegistry;
}
@CircuitBreaker(name = "orderService", fallbackMethod = "getUserOrdersFallback")
@Retry(name = "orderService", fallbackMethod = "getUserOrdersFallback")
@RateLimiter(name = "orderService")
public List<Order> getUserOrders(String userId) {
String url = "http://order-service/orders?userId=" + userId;
ResponseEntity<OrderResponse> response = restTemplate.exchange(
url, HttpMethod.GET, null, OrderResponse.class);
if (response.getStatusCode().is2xxSuccessful()) {
return response.getBody().getOrders();
}
throw new ServiceUnavailableException("Order service returned: " + response.getStatusCode());
}
public List<Order> getUserOrdersFallback(String userId, Exception e) {
log.warn("Fallback triggered for user orders: {}", userId, e);
return Collections.emptyList();
}
@TimeLimiter(name = "paymentService")
@Bulkhead(name = "paymentService")
public CompletableFuture<PaymentResponse> processPayment(PaymentRequest request) {
return CompletableFuture.supplyAsync(() -> {
String url = "http://payment-service/payments";
ResponseEntity<PaymentResponse> response = restTemplate.postForEntity(
url, request, PaymentResponse.class);
if (response.getStatusCode().is2xxSuccessful()) {
return response.getBody();
}
throw new PaymentServiceException("Payment service unavailable");
});
}
}
@Configuration
public class RestTemplateConfig {
@Bean
@LoadBalanced
public RestTemplate loadBalancedRestTemplate() {
return new RestTemplate();
}
@Bean
public WebClient.Builder loadBalancedWebClientBuilder() {
return WebClient.builder();
}
}

2. Circuit Breaker and Resilience Patterns

@Service
@Slf4j
public class ResilientServiceClient {
private final WebClient webClient;
private final CircuitBreaker circuitBreaker;
private final Retry retry;
private final Timer timer;
public ResilientServiceClient(WebClient.Builder webClientBuilder,
CircuitBreakerRegistry circuitBreakerRegistry,
RetryRegistry retryRegistry,
MeterRegistry meterRegistry) {
this.webClient = webClientBuilder.build();
this.circuitBreaker = circuitBreakerRegistry.circuitBreaker("externalService");
this.retry = retryRegistry.retry("externalService");
this.timer = meterRegistry.timer("external.service.calls");
}
public Mono<UserProfile> getUserProfileWithResilience(String userId) {
return CircuitBreakerOperator.of(circuitBreaker)
.transform(
RetryOperator.of(retry)
.transform(
webClient.get()
.uri("http://profile-service/profiles/{userId}", userId)
.retrieve()
.bodyToMono(UserProfile.class)
.name("external.service.calls")
.tag("service", "profile-service")
.tag("operation", "getProfile")
.metrics()
)
)
.doOnError(error -> log.error("Failed to get user profile: {}", userId, error))
.doOnSuccess(profile -> log.debug("Successfully retrieved profile: {}", userId));
}
public Flux<Notification> getUserNotifications(String userId) {
return webClient.get()
.uri("http://notification-service/notifications/{userId}", userId)
.retrieve()
.bodyToFlux(Notification.class)
.transform(ResilienceOperator.of(retry))
.transform(ResilienceOperator.of(circuitBreaker))
.timeout(Duration.ofSeconds(10))
.onErrorResume(throwable -> {
log.warn("Failed to fetch notifications, returning empty flux", throwable);
return Flux.empty();
});
}
}

Maesh Traffic Routing Configuration

1. Traffic Splitting Configuration

# k8s/traffic-splitting.yaml
apiVersion: traefik.maesh.github.io/v1alpha1
kind: TrafficSplit
metadata:
name: user-service-split
namespace: default
spec:
service:
name: user-service
port: 80
backends:
- service:
name: user-service-v1
port: 80
weight: 90
- service:
name: user-service-v2
port: 80
weight: 10
---
# Canary deployment for user service
apiVersion: apps/v1
kind: Deployment
metadata:
name: user-service-v2
labels:
app: user-service
version: v2
spec:
replicas: 1
selector:
matchLabels:
app: user-service
version: v2
template:
metadata:
labels:
app: user-service
version: v2
annotations:
maesh.traefik.io/traffic-type: http
spec:
containers:
- name: user-service
image: myregistry/user-service:2.0.0-canary
# ... rest of container spec

2. Retry and Timeout Configuration

# k8s/retry-policy.yaml
apiVersion: traefik.maesh.github.io/v1alpha1
kind: Service
metadata:
name: order-service
annotations:
maesh.traefik.io/retry-attempts: "3"
maesh.traefik.io/retry-initial-interval: "100ms"
maesh.traefik.io/response-timeout: "30s"
maesh.traefik.io/circuit-breaker-expression: "ResponseCodeRatio(500, 600, 0, 600) > 0.5"
spec:
selector:
app: order-service
ports:
- port: 80
targetPort: 8080

Observability with Maesh

1. Distributed Tracing Integration

@Configuration
public class TracingConfig {
@Bean
public OpenTelemetry openTelemetry() {
return OpenTelemetrySdk.builder()
.setTracerProvider(
SdkTracerProvider.builder()
.addSpanProcessor(
BatchSpanProcessor.builder(
OtlpGrpcSpanExporter.builder()
.setEndpoint("http://maesh-proxy:4317")
.build()
).build()
)
.setResource(Resource.getDefault()
.merge(Resource.create(Attributes.of(
ResourceAttributes.SERVICE_NAME, "user-service"
)))
)
.build()
)
.setPropagators(ContextPropagators.create(
W3CTraceContextPropagator.getInstance()
))
.build();
}
@Bean
public Tracer tracer(OpenTelemetry openTelemetry) {
return openTelemetry.getTracer("user-service");
}
}
@Service
@Slf4j
public class TracedUserService {
private final Tracer tracer;
private final RestTemplate restTemplate;
public TracedUserService(Tracer tracer, 
@LoadBalanced RestTemplate restTemplate) {
this.tracer = tracer;
this.restTemplate = restTemplate;
}
public User getUserWithTracing(String userId) {
Span span = tracer.spanBuilder("getUser")
.setAttribute("user.id", userId)
.startSpan();
try (Scope scope = span.makeCurrent()) {
// Add tracing headers to outgoing request
HttpHeaders headers = new HttpHeaders();
W3CTraceContextPropagator.getInstance().inject(
Context.current(),
headers,
(carrier, key, value) -> carrier.add(key, value));
HttpEntity<String> entity = new HttpEntity<>(headers);
ResponseEntity<User> response = restTemplate.exchange(
"http://user-service/users/{userId}",
HttpMethod.GET,
entity,
User.class,
userId
);
span.setStatus(StatusCode.OK);
return response.getBody();
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, e.getMessage());
throw e;
} finally {
span.end();
}
}
}

2. Metrics and Monitoring

@Service
@Slf4j
public class ServiceMetrics {
private final MeterRegistry meterRegistry;
private final Counter serviceCallsCounter;
private final Timer serviceResponseTimer;
private final Gauge activeRequestsGauge;
private final AtomicInteger activeRequests = new AtomicInteger(0);
public ServiceMetrics(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.serviceCallsCounter = Counter.builder("service.calls.total")
.description("Total number of service calls")
.tags("application", "user-service")
.register(meterRegistry);
this.serviceResponseTimer = Timer.builder("service.response.time")
.description("Service response time")
.tags("application", "user-service")
.register(meterRegistry);
this.activeRequestsGauge = Gauge.builder("service.active.requests")
.description("Number of active requests")
.tags("application", "user-service")
.register(meterRegistry, activeRequests);
}
public <T> T measureServiceCall(String serviceName, String operation, Supplier<T> operation) {
activeRequests.incrementAndGet();
serviceCallsCounter.increment();
Timer.Sample sample = Timer.start(meterRegistry);
try {
T result = operation.get();
sample.stop(serviceResponseTimer
.tags("service", serviceName, "operation", operation, "status", "success"));
return result;
} catch (Exception e) {
sample.stop(serviceResponseTimer
.tags("service", serviceName, "operation", operation, "status", "error"));
throw e;
} finally {
activeRequests.decrementAndGet();
}
}
public void recordServiceError(String serviceName, String operation) {
meterRegistry.counter("service.errors.total",
"service", serviceName,
"operation", operation,
"application", "user-service"
).increment();
}
}

Security with Maesh

1. mTLS Configuration

# k8s/mtls-policy.yaml
apiVersion: traefik.maesh.github.io/v1alpha1
kind: TLSOption
metadata:
name: strict-mtls
namespace: default
spec:
minVersion: VersionTLS12
cipherSuites:
- TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
- TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
clientAuth:
clientAuthType: RequireAndVerifyClientCert
---
apiVersion: traefik.maesh.github.io/v1alpha1
kind: Middleware
metadata:
name: mtls-middleware
spec:
tls:
options:
name: strict-mtls

2. Rate Limiting

@Service
@Slf4j
public class RateLimitService {
private final RateLimiterRegistry rateLimiterRegistry;
public RateLimitService(RateLimiterRegistry rateLimiterRegistry) {
this.rateLimiterRegistry = rateLimiterRegistry;
}
public boolean isAllowed(String serviceName, String clientId) {
RateLimiter rateLimiter = rateLimiterRegistry.rateLimiter(
serviceName + "-" + clientId,
() -> RateLimiterConfig.custom()
.limitForPeriod(100)
.limitRefreshPeriod(Duration.ofMinutes(1))
.timeoutDuration(Duration.ZERO)
.build()
);
return rateLimiter.acquirePermission();
}
@RateLimiter(name = "userCreation", fallbackMethod = "userCreationRateLimited")
public User createUserWithRateLimit(CreateUserRequest request) {
// Business logic for user creation
return userService.createUser(request);
}
public User userCreationRateLimited(CreateUserRequest request, Exception e) {
log.warn("User creation rate limited for: {}", request.getEmail());
throw new RateLimitExceededException("Too many user creation requests");
}
}

Advanced Maesh Configurations

1. A/B Testing Configuration

# k8s/ab-testing.yaml
apiVersion: traefik.maesh.github.io/v1alpha1
kind: TrafficSplit
metadata:
name: payment-service-ab-test
namespace: default
spec:
service:
name: payment-service
port: 80
matches:
- headers:
x-user-type:
exact: premium
backends:
- service:
name: payment-service-v2
port: 80
weight: 100
- headers:
x-user-type:
exact: standard
backends:
- service:
name: payment-service-v1
port: 80
weight: 100
backends:
- service:
name: payment-service-v1
port: 80
weight: 50
- service:
name: payment-service-v2
port: 80
weight: 50

2. Circuit Breaker Configuration

# k8s/circuit-breaker.yaml
apiVersion: traefik.maesh.github.io/v1alpha1
kind: Service
metadata:
name: inventory-service
annotations:
maesh.traefik.io/circuit-breaker-expression: "ResponseCodeRatio(500, 600, 0, 600) > 0.5"
maesh.traefik.io/circuit-breaker-check-period: "10s"
maesh.traefik.io/circuit-breaker-fallback-status-code: "503"
spec:
selector:
app: inventory-service
ports:
- port: 80
targetPort: 8080

Java Application Best Practices

1. Health Check Endpoints

@RestController
@RequestMapping("/actuator")
@Slf4j
public class HealthController {
private final ServiceMetrics serviceMetrics;
private final CircuitBreakerRegistry circuitBreakerRegistry;
public HealthController(ServiceMetrics serviceMetrics,
CircuitBreakerRegistry circuitBreakerRegistry) {
this.serviceMetrics = serviceMetrics;
this.circuitBreakerRegistry = circuitBreakerRegistry;
}
@GetMapping("/health/readiness")
public ResponseEntity<Health> readiness() {
Health.Builder status = Health.up();
// Check circuit breaker status
circuitBreakerRegistry.getAllCircuitBreakers().forEach((name, cb) -> {
status.withDetail("circuitbreaker." + name + ".state", cb.getState());
status.withDetail("circuitbreaker." + name + ".failureRate", 
cb.getMetrics().getFailureRate());
});
// Check database connectivity
if (!isDatabaseHealthy()) {
status.down().withDetail("database", "unhealthy");
}
return ResponseEntity.ok(status.build());
}
@GetMapping("/health/liveness")
public ResponseEntity<Health> liveness() {
return ResponseEntity.ok(Health.up().build());
}
@GetMapping("/health/mesh")
public ResponseEntity<Map<String, Object>> meshHealth() {
Map<String, Object> health = new HashMap<>();
health.put("status", "UP");
health.put("mesh", "maesh");
health.put("timestamp", Instant.now().toString());
// Add service discovery information
health.put("discoveredServices", getDiscoveredServices());
health.put("trafficSplits", getActiveTrafficSplits());
return ResponseEntity.ok(health);
}
private boolean isDatabaseHealthy() {
// Implement database health check
return true;
}
private List<String> getDiscoveredServices() {
// Return list of services discovered via Maesh
return List.of("order-service", "payment-service", "inventory-service");
}
private List<String> getActiveTrafficSplits() {
// Return active traffic splits
return List.of("user-service-split", "payment-service-ab-test");
}
}

2. Configuration Management

@Configuration
@ConfigurationProperties(prefix = "maesh")
@Data
public class MaeshConfig {
private TrafficManagement traffic = new TrafficManagement();
private Observability observability = new Observability();
private Security security = new Security();
@Data
public static class TrafficManagement {
private int retryAttempts = 3;
private Duration timeout = Duration.ofSeconds(30);
private boolean circuitBreakerEnabled = true;
private String circuitBreakerExpression = "LatencyAtQuantileMS(50.0) > 100";
}
@Data
public static class Observability {
private boolean tracingEnabled = true;
private String tracingEndpoint = "http://maesh-proxy:4317";
private boolean metricsEnabled = true;
private String metricsPath = "/actuator/prometheus";
}
@Data
public static class Security {
private boolean mtlsEnabled = true;
private boolean rateLimitingEnabled = true;
private int rateLimitRequestsPerMinute = 100;
}
}
@Component
@Slf4j
public class MaeshConfigRefresher {
private final MaeshConfig maeshConfig;
private final KubernetesClient kubernetesClient;
@Scheduled(fixedRate = 30000) // Check every 30 seconds
public void refreshMaeshConfiguration() {
try {
// Check for updated Maesh configurations
ConfigMap configMap = kubernetesClient.configMaps()
.inNamespace("maesh")
.withName("maesh-config")
.get();
if (configMap != null) {
updateConfiguration(configMap.getData());
}
} catch (Exception e) {
log.warn("Failed to refresh Maesh configuration", e);
}
}
private void updateConfiguration(Map<String, String> configData) {
// Update application configuration based on Maesh config changes
log.info("Updating Maesh configuration from ConfigMap");
}
}

Testing with Maesh

1. Integration Tests

@SpringBootTest
@Testcontainers
@ActiveProfiles("test")
class MaeshIntegrationTest {
@Container
static GenericContainer maeshContainer = new GenericContainer("traefik/mesh:latest")
.withExposedPorts(8080, 8443)
.withEnv("MESH_MODE", "lightweight");
@DynamicPropertySource
static void configureProperties(DynamicPropertyRegistry registry) {
registry.add("maesh.proxy.url", 
() -> "http://" + maeshContainer.getHost() + ":" + maeshContainer.getMappedPort(8080));
}
@Test
void testServiceCommunicationThroughMaesh() {
// Test service-to-service communication through Maesh proxy
String response = restTemplate.getForObject(
"http://user-service/users/123", String.class);
assertNotNull(response);
// Verify headers added by Maesh
}
}
@WebMvcTest(UserController.class)
class UserControllerTest {
@MockBean
private UserService userService;
@Autowired
private MockMvc mockMvc;
@Test
void shouldReturnUserWhenFound() throws Exception {
User user = new User("123", "[email protected]");
when(userService.getUser("123")).thenReturn(user);
mockMvc.perform(get("/users/123")
.header("X-Request-ID", "test-request")
.header("X-User-Type", "premium"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.id").value("123"))
.andExpect(jsonPath("$.email").value("[email protected]"));
}
}

Monitoring and Dashboards

1. Grafana Dashboard Configuration

{
"dashboard": {
"title": "Maesh Service Mesh - Java Applications",
"panels": [
{
"title": "Service Response Times",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(service_response_time_seconds_bucket[5m]))",
"legendFormat": "{{service}} - P95"
}
]
},
{
"title": "Circuit Breaker States",
"type": "stat",
"targets": [
{
"expr": "resilience4j_circuitbreaker_state",
"legendFormat": "{{name}} - {{state}}"
}
]
},
{
"title": "HTTP Traffic by Service",
"type": "table",
"targets": [
{
"expr": "sum by (service) (rate(http_requests_total[5m]))"
}
]
}
]
}
}

Conclusion

Maesh provides a simple yet powerful service mesh solution for Java applications with:

  1. Zero Code Changes - Works alongside existing applications
  2. Lightweight - Minimal resource overhead compared to other service meshes
  3. Traffic Management - Advanced routing, splitting, and circuit breaking
  4. Observability - Built-in metrics, tracing, and logging
  5. Security - mTLS and rate limiting out of the box

By integrating Maesh with Java applications using the patterns shown above, teams can achieve sophisticated service mesh capabilities without the complexity of heavier alternatives like Istio or Linkerd.

Leave a Reply

Your email address will not be published. Required fields are marked *


Macro Nepal Helper