Unlocking System Visibility: Distributed Tracing Spans in Java

In modern microservices architectures, a single user request can traverse dozens of services. When something goes wrong, traditional logging becomes insufficient. Distributed Tracing provides end-to-end visibility by tracking requests as they flow through your entire system, with spans serving as the fundamental building blocks of this observability.


What is Distributed Tracing?

Distributed Tracing is a method of tracking requests as they propagate through multiple services. Each unit of work in the trace is called a span, which contains timing information, metadata, and context about the operation.

Core Concepts

  1. Trace: The entire journey of a request through your system
  2. Span: A single operation or unit of work within a trace
  3. Span Context: Propagation data that links spans together across service boundaries
  4. Tags/Attributes: Key-value pairs providing metadata about spans
  5. Events: Timed annotations within a span
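To make these relationships concrete, the sketch below models a trace as a tree of spans in plain Java. It is not the OpenTelemetry API: the SpanRecord type is hypothetical, and the ID sizes (16-byte trace IDs, 8-byte span IDs, hex-encoded) are assumed to follow the W3C Trace Context format that OpenTelemetry uses by default.

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;

// Illustrative model only: every span in a trace shares the trace ID,
// and each child span records its parent's span ID.
record SpanRecord(String traceId, String spanId, String parentSpanId, String name,
                  Map<String, String> attributes, List<String> events) {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Random ID of the given size, hex-encoded (2 chars per byte)
    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        return HexFormat.of().formatHex(buf);
    }

    // Root span: starts a new trace and has no parent
    static SpanRecord root(String name) {
        return new SpanRecord(randomHex(16), randomHex(8), null, name, Map.of(), new ArrayList<>());
    }

    // Child span: same trace ID, new span ID, parent = this span
    SpanRecord child(String name, Map<String, String> attrs) {
        return new SpanRecord(traceId, randomHex(8), spanId, name, Map.copyOf(attrs), new ArrayList<>());
    }
}
```

A root span starts a new trace; every child reuses the trace ID and points at its parent's span ID, which is what lets a backend such as Jaeger reassemble the tree.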

OpenTelemetry: The Modern Standard

OpenTelemetry has emerged as the industry standard for distributed tracing. It was formed by merging the OpenTracing and OpenCensus projects, and it provides vendor-agnostic APIs, SDKs, and instrumentation.

OpenTelemetry Architecture

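[Your App] → [OpenTelemetry API] → [OpenTelemetry SDK] → [Exporters] → [Backend]

  • Your App: instrumentation with annotations, plus custom attributes
  • OpenTelemetry API: creates spans and traces
  • OpenTelemetry SDK: processes, batches, and samples spans
  • Exporters: send telemetry to backends such as Jaeger or Zipkin (or Prometheus, for metrics)
  • Backend: visualization and analysis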
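The SDK's "process, batch" stage decouples span creation from export. The sketch below shows only the batching idea: ended spans are buffered and handed to an exporter in groups of at most maxBatchSize. The MiniBatcher class is hypothetical and single-threaded; OpenTelemetry's real BatchSpanProcessor additionally uses a bounded queue, a background worker thread, and a scheduled export delay.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, single-threaded sketch of span batching (not the SDK implementation)
class MiniBatcher<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> exporter;   // receives each completed batch
    private final List<T> buffer = new ArrayList<>();

    MiniBatcher(int maxBatchSize, Consumer<List<T>> exporter) {
        if (maxBatchSize <= 0) throw new IllegalArgumentException("maxBatchSize must be positive");
        this.maxBatchSize = maxBatchSize;
        this.exporter = exporter;
    }

    // Called when a span ends; exports as soon as a full batch is available
    void onEnd(T span) {
        buffer.add(span);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    // Exports whatever is buffered (a real processor also does this on a timer and at shutdown)
    void flush() {
        if (buffer.isEmpty()) return;
        exporter.accept(List.copyOf(buffer));
        buffer.clear();
    }
}
```

Batching trades a little export latency for far fewer network calls, which is why the configuration later in this tutorial wraps its exporters in BatchSpanProcessor.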

Hands-On Tutorial: Implementing Distributed Tracing in Java

Let's build an order service that calls user, payment, and inventory services, with distributed tracing implemented using OpenTelemetry, Spring Boot, and Jaeger.

Step 1: Project Setup and Dependencies

Maven Dependencies (pom.xml):

<properties>
    <opentelemetry.version>1.34.1</opentelemetry.version>
    <spring-boot.version>3.2.0</spring-boot.version>
</properties>

<dependencies>
    <!-- Spring Boot -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
        <version>${spring-boot.version}</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
        <version>${spring-boot.version}</version>
    </dependency>

    <!-- OpenTelemetry -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>
    <!-- JaegerGrpcSpanExporter. This exporter is deprecated in OpenTelemetry Java and has since been
         removed; Jaeger also ingests OTLP, so opentelemetry-exporter-otlp is the long-term option. -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-jaeger</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>
    <!-- LoggingSpanExporter, used in OpenTelemetryConfig -->
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-exporter-logging</artifactId>
        <version>${opentelemetry.version}</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-semconv</artifactId>
        <version>${opentelemetry.version}-alpha</version>
    </dependency>

    <!-- OpenTelemetry Instrumentation (this starter release is published with an -alpha suffix) -->
    <dependency>
        <groupId>io.opentelemetry.instrumentation</groupId>
        <artifactId>opentelemetry-spring-boot-starter</artifactId>
        <version>2.1.0-alpha</version>
    </dependency>

    <!-- HTTP Client -->
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-webflux</artifactId>
        <version>${spring-boot.version}</version>
    </dependency>
</dependencies>

Step 2: OpenTelemetry Configuration

application.yml:

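# Service identity (also used in the log pattern below)
spring:
  application:
    name: order-service  # Change per service: user-service, payment-service, etc.

# OpenTelemetry configuration (otel.* keys, read by the OpenTelemetry Spring Boot starter)
otel:
  service:
    name: order-service
  exporter:
    otlp:
      endpoint: http://localhost:4318  # Jaeger's OTLP/HTTP port
  metric:
    export:
      interval: 60s

# Spring Boot Actuator
management:
  endpoints:
    web:
      exposure:
        include: health,info,metrics
  tracing:
    sampling:
      # Used by Micrometer Tracing; the SDK built in OpenTelemetryConfig uses its own Sampler
      probability: 1.0  # Sample 100% of traces (lower this in production)

# Log correlation with traces
# traceId/spanId are Micrometer Tracing's MDC keys; OpenTelemetry's logging instrumentation uses trace_id/span_id
logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"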
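The sampling probability above is 1.0, i.e. every trace is kept. For lower ratios, a common approach (similar in spirit to the SDK's Sampler.traceIdRatioBased(...)) is to derive the decision from the trace ID itself, so every service that sees the same trace ID makes the same keep/drop choice. The RatioSampler class below is a hypothetical, simplified illustration of that idea, not the SDK's implementation.

```java
// Hypothetical sketch: a deterministic ratio sampler keyed on the trace ID.
final class RatioSampler {
    private final double ratio;
    private final long upperBound;

    RatioSampler(double ratio) {
        if (ratio < 0.0 || ratio > 1.0) throw new IllegalArgumentException("ratio must be in [0, 1]");
        this.ratio = ratio;
        this.upperBound = (long) (ratio * Long.MAX_VALUE);
    }

    // traceIdHex: 32 lowercase hex characters (16 bytes), as in W3C Trace Context
    boolean shouldSample(String traceIdHex) {
        if (ratio >= 1.0) return true;
        if (ratio <= 0.0) return false;
        // Treat the lower 64 bits of the trace ID as a random number, dropping the sign bit
        long lower = Long.parseUnsignedLong(traceIdHex.substring(16), 16);
        long value = lower & Long.MAX_VALUE;
        return value < upperBound;
    }
}
```

Because the decision depends only on the trace ID, a downstream service applying the same ratio keeps exactly the traces its caller kept, so sampled traces stay complete.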

Java Configuration:

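import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.baggage.propagation.W3CBaggagePropagator;
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.context.propagation.TextMapPropagator;
import io.opentelemetry.exporter.jaeger.JaegerGrpcSpanExporter;
import io.opentelemetry.exporter.logging.LoggingSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.sdk.trace.samplers.Sampler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class OpenTelemetryConfig {

    @Bean
    public OpenTelemetry openTelemetry(CustomSpanProcessor customSpanProcessor) {
        // Without a service.name resource attribute, Jaeger lists the service as "unknown_service:java".
        // Change the name per service, as in application.yml.
        Resource resource = Resource.getDefault().merge(
                Resource.create(Attributes.of(AttributeKey.stringKey("service.name"), "order-service")));

        return OpenTelemetrySdk.builder()
                .setTracerProvider(
                        SdkTracerProvider.builder()
                                .setResource(resource)
                                // Custom processor from Step 7: only runs if registered here
                                .addSpanProcessor(customSpanProcessor)
                                // Export finished spans to Jaeger's gRPC collector port in batches
                                .addSpanProcessor(
                                        BatchSpanProcessor.builder(
                                                JaegerGrpcSpanExporter.builder()
                                                        .setEndpoint("http://localhost:14250")
                                                        .build()
                                        ).build()
                                )
                                // Also log finished spans (useful during development)
                                .addSpanProcessor(
                                        BatchSpanProcessor.builder(
                                                LoggingSpanExporter.create()
                                        ).build()
                                )
                                .setSampler(Sampler.alwaysOn())
                                .build()
                )
                // W3C traceparent/tracestate plus baggage headers for cross-service propagation
                .setPropagators(
                        ContextPropagators.create(
                                TextMapPropagator.composite(
                                        W3CTraceContextPropagator.getInstance(),
                                        W3CBaggagePropagator.getInstance()
                                )
                        )
                )
                .build();
    }

    @Bean
    public Tracer tracer(OpenTelemetry openTelemetry) {
        return openTelemetry.getTracer("order-service");
    }
}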
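The propagators registered above include W3CTraceContextPropagator, which carries the span context between services in a single `traceparent` HTTP header: four dash-separated hex fields for version (00), trace ID (32 chars), parent span ID (16 chars), and trace flags (01 means sampled). To show what actually travels over the wire, the sketch below formats and parses that header by hand. The TraceParent record is a hypothetical helper that only accepts version 00; in the application, the configured propagator does this work.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper mirroring the W3C "traceparent" header layout (version 00 only)
record TraceParent(String traceId, String spanId, boolean sampled) {

    private static final Pattern FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }

    // Returns null for malformed headers or all-zero IDs (both invalid per the spec)
    static TraceParent parse(String header) {
        Matcher m = FORMAT.matcher(header.trim());
        if (!m.matches()) return null;
        String traceId = m.group(1);
        String spanId = m.group(2);
        if (traceId.chars().allMatch(c -> c == '0') || spanId.chars().allMatch(c -> c == '0')) {
            return null;
        }
        int flags = Integer.parseInt(m.group(3), 16);
        return new TraceParent(traceId, spanId, (flags & 0x01) == 1);
    }
}
```

For example, `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` identifies a sampled trace whose caller span is `00f067aa0ba902b7`.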

Step 3: Core Span Management Service

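import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.common.AttributesBuilder;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import java.util.Map;
import java.util.function.Supplier;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service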
public class TracingService {
private final Tracer tracer;
private static final Logger logger = LoggerFactory.getLogger(TracingService.class);
public TracingService(Tracer tracer) {
this.tracer = tracer;
}
/**
* Create a new span for a method or operation
*/
public <T> T trace(String spanName, Supplier<T> operation) {
Span span = tracer.spanBuilder(spanName).startSpan();
try (Scope scope = span.makeCurrent()) {
logger.info("Starting span: {}", spanName);
return operation.get();
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, e.getMessage());
throw e;
} finally {
span.end();
logger.info("Completed span: {}", spanName);
}
}
/**
* Create a span without return value
*/
public void trace(String spanName, Runnable operation) {
Span span = tracer.spanBuilder(spanName).startSpan();
try (Scope scope = span.makeCurrent()) {
logger.info("Starting span: {}", spanName);
operation.run();
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, e.getMessage());
throw e;
} finally {
span.end();
logger.info("Completed span: {}", spanName);
}
}
/**
* Add attributes to the current span
*/
public void addAttribute(String key, String value) {
// Span.current() never returns null; with no active span it returns a non-recording span
Span currentSpan = Span.current();
if (currentSpan.isRecording()) {
currentSpan.setAttribute(key, value);
}
}
/**
* Add event to the current span
*/
public void addEvent(String eventName) {
Span currentSpan = Span.current();
if (currentSpan != null && currentSpan.isRecording()) {
currentSpan.addEvent(eventName);
}
}
/**
* Add event with attributes
*/
public void addEvent(String eventName, Map<String, Object> attributes) {
Span currentSpan = Span.current();
if (currentSpan != null && currentSpan.isRecording()) {
AttributesBuilder attributesBuilder = Attributes.builder();
attributes.forEach((key, value) -> {
if (value instanceof String) {
attributesBuilder.put(AttributeKey.stringKey(key), (String) value);
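} else if (value instanceof Long || value instanceof Integer) {
attributesBuilder.put(AttributeKey.longKey(key), ((Number) value).longValue());
} else if (value instanceof Boolean) {
attributesBuilder.put(AttributeKey.booleanKey(key), (Boolean) value);
} else if (value instanceof Double) {
attributesBuilder.put(AttributeKey.doubleKey(key), (Double) value);
} else if (value != null) {
// Fall back to the string form instead of silently dropping other types
attributesBuilder.put(AttributeKey.stringKey(key), String.valueOf(value));
}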
});
currentSpan.addEvent(eventName, attributesBuilder.build());
}
}
/**
* Get current trace ID for logging correlation
*/
public String getCurrentTraceId() {
// With no active span, the span context is invalid (all-zero IDs) rather than null
var spanContext = Span.current().getSpanContext();
return spanContext.isValid() ? spanContext.getTraceId() : "no-trace";
}
/**
* Create a child span for complex operations
*/
public Span startChildSpan(String spanName) {
return tracer.spanBuilder(spanName).startSpan();
}
/**
* End a child span
*/
public void endSpan(Span span) {
if (span != null) {
span.end();
}
}
}
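TracingService relies on `span.makeCurrent()` returning a Scope that restores the previous current span when it is closed. Conceptually, that is a thread-local "current" slot combined with try-with-resources. The sketch below models just that mechanism with hypothetical MiniContext/MiniScope types that track span names; OpenTelemetry's default context storage is also thread-local, but this is a simplified stand-in, not its implementation. It also shows why a new thread starts with no current span, which is why reserveSingleItem passes the parent span explicitly with setParent(...).

```java
// Hypothetical model of "current span" handling with a thread-local slot.
final class MiniContext {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Name of the current span on this thread, or null if none
    static String currentSpan() {
        return CURRENT.get();
    }

    // Makes spanName current and returns a scope that restores the previous value on close
    static MiniScope makeCurrent(String spanName) {
        String previous = CURRENT.get();
        CURRENT.set(spanName);
        return () -> {
            if (previous == null) {
                CURRENT.remove();
            } else {
                CURRENT.set(previous);
            }
        };
    }

    // AutoCloseable without a checked exception, so try-with-resources needs no catch
    interface MiniScope extends AutoCloseable {
        @Override
        void close();
    }
}
```

Closing scopes in reverse order (which try-with-resources guarantees) is what keeps nested spans correctly parented.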

Step 4: Domain Models and Services

Order Service with Manual Instrumentation:

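import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import java.math.BigDecimal;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Service;

@Service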
public class OrderService {
private final Tracer tracer;
private final TracingService tracingService;
private final UserServiceClient userServiceClient;
private final PaymentServiceClient paymentServiceClient;
private final InventoryServiceClient inventoryServiceClient;
private static final Logger logger = LoggerFactory.getLogger(OrderService.class);
public OrderService(Tracer tracer, TracingService tracingService,
UserServiceClient userServiceClient, 
PaymentServiceClient paymentServiceClient,
InventoryServiceClient inventoryServiceClient) {
this.tracer = tracer;
this.tracingService = tracingService;
this.userServiceClient = userServiceClient;
this.paymentServiceClient = paymentServiceClient;
this.inventoryServiceClient = inventoryServiceClient;
}
/**
* Create an order with detailed distributed tracing
*/
public Order createOrder(OrderRequest request) {
return tracingService.trace("order.create", () -> {
// Add request attributes to span
tracingService.addAttribute("order.userId", request.getUserId());
tracingService.addAttribute("order.totalAmount", request.getTotalAmount().toString());
tracingService.addAttribute("order.itemCount", String.valueOf(request.getItems().size()));
logger.info("Creating order for user: {}", request.getUserId());
tracingService.addEvent("order.validation.started");
// Validate user
User user = validateUser(request.getUserId());
tracingService.addEvent("order.validation.completed");
// Process payment
tracingService.addEvent("order.payment.started");
PaymentResult paymentResult = processPayment(request, user);
tracingService.addAttribute("order.paymentId", paymentResult.getPaymentId());
tracingService.addEvent("order.payment.completed");
// Reserve inventory
tracingService.addEvent("order.inventory.started");
InventoryResult inventoryResult = reserveInventory(request.getItems());
tracingService.addAttribute("order.inventoryReserved", "true");
tracingService.addEvent("order.inventory.completed");
// Create order
Order order = saveOrder(request, paymentResult, inventoryResult);
tracingService.addAttribute("order.id", order.getId());
tracingService.addEvent("order.created");
logger.info("Successfully created order: {}", order.getId());
return order;
});
}
/**
* Validate user with its own span
*/
private User validateUser(String userId) {
Span userValidationSpan = tracer.spanBuilder("user.validation")
.setAttribute("user.id", userId)
.startSpan();
try (Scope scope = userValidationSpan.makeCurrent()) {
tracingService.addEvent("user.service.call");
User user = userServiceClient.getUser(userId);
userValidationSpan.setAttribute("user.exists", "true");
userValidationSpan.setAttribute("user.email", user.getEmail());
return user;
} catch (Exception e) {
userValidationSpan.recordException(e);
userValidationSpan.setStatus(StatusCode.ERROR, "User validation failed");
throw new OrderException("User validation failed: " + e.getMessage(), e);
} finally {
userValidationSpan.end();
}
}
/**
* Process payment with complex span structure
*/
private PaymentResult processPayment(OrderRequest request, User user) {
return tracingService.trace("payment.process", () -> {
tracingService.addAttribute("payment.amount", request.getTotalAmount().toString());
tracingService.addAttribute("payment.currency", "USD");
tracingService.addAttribute("payment.userEmail", user.getEmail());
// Simulate payment processing steps
tracingService.addEvent("payment.authorization.started");
PaymentResult authResult = paymentServiceClient.authorizePayment(
request.getTotalAmount(), user.getEmail()
);
tracingService.addEvent("payment.authorization.completed");
tracingService.addEvent("payment.capture.started");
PaymentResult captureResult = paymentServiceClient.capturePayment(
authResult.getPaymentId()
);
tracingService.addEvent("payment.capture.completed");
return captureResult;
});
}
/**
* Reserve inventory with batch operations
*/
private InventoryResult reserveInventory(List<OrderItem> items) {
Span inventorySpan = tracer.spanBuilder("inventory.reserve")
.setAttribute("inventory.itemCount", items.size())
.startSpan();
try (Scope scope = inventorySpan.makeCurrent()) {
List<CompletableFuture<InventoryItemResult>> futures = items.stream()
.map(item -> reserveSingleItem(item, inventorySpan))
.collect(Collectors.toList());
// Wait for all reservations to complete
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
List<InventoryItemResult> results = futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
boolean allReserved = results.stream().allMatch(InventoryItemResult::isReserved);
inventorySpan.setAttribute("inventory.allReserved", allReserved);
if (!allReserved) {
inventorySpan.setStatus(StatusCode.ERROR, "Some items could not be reserved");
throw new InventoryException("Failed to reserve all inventory items");
}
return new InventoryResult(results);
} finally {
inventorySpan.end();
}
}
/**
* Reserve single inventory item with child span
*/
private CompletableFuture<InventoryItemResult> reserveSingleItem(OrderItem item, Span parentSpan) {
return CompletableFuture.supplyAsync(() -> {
Span itemSpan = tracer.spanBuilder("inventory.reserve.item")
.setParent(Context.current().with(parentSpan))
.setAttribute("inventory.itemId", item.getItemId())
.setAttribute("inventory.quantity", item.getQuantity())
.startSpan();
try (Scope scope = itemSpan.makeCurrent()) {
tracingService.addEvent("inventory.check.availability");
InventoryItemResult result = inventoryServiceClient.reserveItem(
item.getItemId(), item.getQuantity()
);
itemSpan.setAttribute("inventory.reserved", result.isReserved());
if (!result.isReserved()) {
itemSpan.setAttribute("inventory.failureReason", result.getFailureReason());
itemSpan.setStatus(StatusCode.ERROR, result.getFailureReason());
}
return result;
} finally {
itemSpan.end();
}
});
}
private Order saveOrder(OrderRequest request, PaymentResult paymentResult, 
InventoryResult inventoryResult) {
// Simulate order persistence
return new Order(
"order-" + System.currentTimeMillis(),
request.getUserId(),
request.getTotalAmount(),
OrderStatus.CONFIRMED
);
}
}
// Domain Classes
class Order {
private String id;
private String userId;
private BigDecimal totalAmount;
private OrderStatus status;
// constructors, getters, setters
}
class OrderRequest {
private String userId;
private List<OrderItem> items;
private BigDecimal totalAmount;
// constructors, getters, setters
}
enum OrderStatus {
PENDING, CONFIRMED, SHIPPED, DELIVERED, CANCELLED
}
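OrderService and the clients also use several types that the article does not show. A minimal sketch, with fields inferred only from how the code above calls them, might look like the following. All of these classes are hypothetical; the exceptions extend RuntimeException because they are thrown from inside the Supplier lambdas passed to TracingService.trace.

```java
import java.util.List;

// Hypothetical supporting types inferred from usage in OrderService and UserServiceClient.
class OrderItem {
    private String itemId;
    private int quantity;
    OrderItem() { }                                   // no-arg constructor for JSON binding
    OrderItem(String itemId, int quantity) { this.itemId = itemId; this.quantity = quantity; }
    public String getItemId() { return itemId; }
    public int getQuantity() { return quantity; }
}

class User {
    private String id;
    private String email;
    User() { }                                        // no-arg constructor for JSON binding
    User(String id, String email) { this.id = id; this.email = email; }
    public String getId() { return id; }
    public String getEmail() { return email; }
}

class PaymentResult {
    private final String paymentId;
    PaymentResult(String paymentId) { this.paymentId = paymentId; }
    public String getPaymentId() { return paymentId; }
}

class InventoryItemResult {
    private final boolean reserved;
    private final String failureReason;               // null when the item was reserved
    InventoryItemResult(boolean reserved, String failureReason) {
        this.reserved = reserved;
        this.failureReason = failureReason;
    }
    public boolean isReserved() { return reserved; }
    public String getFailureReason() { return failureReason; }
}

class InventoryResult {
    private final List<InventoryItemResult> items;
    InventoryResult(List<InventoryItemResult> items) { this.items = List.copyOf(items); }
    public List<InventoryItemResult> getItems() { return items; }
}

class OrderException extends RuntimeException {
    OrderException(String message, Throwable cause) { super(message, cause); }
}

class InventoryException extends RuntimeException {
    InventoryException(String message) { super(message); }
}

class ServiceException extends RuntimeException {
    ServiceException(String message, Throwable cause) { super(message, cause); }
}
```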

Step 5: HTTP Client with Context Propagation

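import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.SpanKind;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Context;
import io.opentelemetry.context.Scope;
import java.util.Map;
import org.springframework.stereotype.Service;
import org.springframework.web.reactive.function.client.WebClient;
import org.springframework.web.reactive.function.client.WebClientResponseException;

@Service
public class UserServiceClient {
private final WebClient webClient;
private final Tracer tracer;
private final TracingService tracingService;
private final OpenTelemetry openTelemetry;
public UserServiceClient(WebClient.Builder webClientBuilder,
Tracer tracer,
TracingService tracingService,
OpenTelemetry openTelemetry) {
this.webClient = webClientBuilder.baseUrl("http://localhost:8081").build();
this.tracer = tracer;
this.tracingService = tracingService;
this.openTelemetry = openTelemetry;
}
/**
* HTTP call with explicit context propagation
*/
public User getUser(String userId) {
Span span = tracer.spanBuilder("http.user.service.get")
.setSpanKind(SpanKind.CLIENT)
.setAttribute("http.method", "GET")
.setAttribute("http.url", "/api/users/" + userId)
.setAttribute("user.id", userId)
.startSpan();
try (Scope scope = span.makeCurrent()) {
tracingService.addEvent("http.call.start");
User user = webClient.get()
.uri("/api/users/{userId}", userId)
.headers(headers -> {
// Inject the current trace context into the HTTP headers using the propagators
// configured on the OpenTelemetry bean (W3C traceparent and baggage)
openTelemetry.getPropagators().getTextMapPropagator()
.inject(Context.current(), headers,
(carrier, key, value) -> carrier.set(key, value));
})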
.retrieve()
.bodyToMono(User.class)
.block();
span.setAttribute("http.status_code", 200);
span.setAttribute("user.found", user != null);
tracingService.addEvent("http.call.success");
return user;
} catch (WebClientResponseException e) {
span.setStatus(StatusCode.ERROR, "HTTP error: " + e.getStatusCode());
span.setAttribute("http.status_code", e.getStatusCode().value());
span.recordException(e);
tracingService.addEvent("http.call.error", Map.of(
"error.code", String.valueOf(e.getStatusCode().value()),
"error.message", e.getStatusText()
));
throw new ServiceException("User service error: " + e.getMessage(), e);
} catch (Exception e) {
span.setStatus(StatusCode.ERROR, "HTTP call failed");
span.recordException(e);
tracingService.addEvent("http.call.failure");
throw new ServiceException("Failed to call user service", e);
} finally {
span.end();
}
}
}
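On the receiving side, the user service extracts that header and starts its server span as a child of the caller's span: same trace ID, a new span ID, and the client span's ID as parent. The sketch below shows that bookkeeping with a plain Map as the header carrier. The ServerSideContext class is hypothetical and validates the header only minimally; in the application, OpenTelemetry's server-side instrumentation or TextMapPropagator.extract(...) handles this.

```java
import java.security.SecureRandom;
import java.util.HexFormat;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: continue a trace from an incoming "traceparent" header.
final class ServerSideContext {
    private static final SecureRandom RANDOM = new SecureRandom();

    record SpanIds(String traceId, String spanId, String parentSpanId) { }

    static SpanIds startServerSpan(Map<String, String> headers) {
        String spanId = newId(8);
        // HTTP header names are case-insensitive; this sketch assumes lowercase keys
        Optional<String[]> parts = Optional.ofNullable(headers.get("traceparent"))
                .map(h -> h.split("-"))
                .filter(p -> p.length == 4 && p[1].length() == 32 && p[2].length() == 16);
        if (parts.isPresent()) {
            // Same trace, new span, parent = the caller's span
            return new SpanIds(parts.get()[1], spanId, parts.get()[2]);
        }
        // No usable header: this service becomes the root of a new trace
        return new SpanIds(newId(16), spanId, null);
    }

    private static String newId(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        return HexFormat.of().formatHex(buf);
    }
}
```

If a hop fails to forward the header, the downstream service silently starts a new trace, which shows up in Jaeger as two disconnected traces instead of one.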

Step 6: REST Controller with Automatic Instrumentation

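import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
import java.math.BigDecimal;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController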
@RequestMapping("/api/orders")
public class OrderController {
private final OrderService orderService;
private final Tracer tracer;
private static final Logger logger = LoggerFactory.getLogger(OrderController.class);
public OrderController(OrderService orderService, Tracer tracer) {
this.orderService = orderService;
this.tracer = tracer;
}
@PostMapping
public ResponseEntity<Order> createOrder(@RequestBody OrderRequest request) {
// Current span is automatically created by Spring Boot instrumentation
Span currentSpan = Span.current();
currentSpan.setAttribute("http.route", "/api/orders");
currentSpan.setAttribute("http.method", "POST");
currentSpan.updateName("POST /api/orders");
logger.info("Received order creation request for user: {}", request.getUserId());
try {
Order order = orderService.createOrder(request);
currentSpan.setAttribute("order.created", "true");
currentSpan.setAttribute("order.id", order.getId());
return ResponseEntity.status(201).body(order);
} catch (Exception e) {
currentSpan.recordException(e);
currentSpan.setStatus(StatusCode.ERROR, "Order creation failed");
throw e;
}
}
@GetMapping("/{orderId}")
public ResponseEntity<Order> getOrder(@PathVariable String orderId) {
Span span = tracer.spanBuilder("order.get")
.setAttribute("order.id", orderId)
.setAttribute("http.route", "/api/orders/{orderId}")
.startSpan();
try (Scope scope = span.makeCurrent()) {
// Simulate database lookup
span.addEvent("database.query.start");
Thread.sleep(50); // Simulate DB latency
span.addEvent("database.query.complete");
Order order = new Order(orderId, "user123", BigDecimal.valueOf(99.99), OrderStatus.CONFIRMED);
span.setAttribute("order.found", order != null);
span.setAttribute("order.status", order.getStatus().name());
return ResponseEntity.ok(order);
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
span.recordException(e);
span.setStatus(StatusCode.ERROR, "Operation interrupted");
return ResponseEntity.status(500).build();
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, "Failed to get order");
return ResponseEntity.status(500).build();
} finally {
span.end();
}
}
}

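Step 7: Custom Span Processor for Advanced Processing

A SpanProcessor only sees spans once it is registered on the SdkTracerProvider, for example with addSpanProcessor(...) in OpenTelemetryConfig. Annotating it with @Component makes it a Spring bean but does not attach it to the tracing pipeline.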

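import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.context.Context;
import io.opentelemetry.sdk.trace.ReadWriteSpan;
import io.opentelemetry.sdk.trace.ReadableSpan;
import io.opentelemetry.sdk.trace.SpanProcessor;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component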
public class CustomSpanProcessor implements SpanProcessor {
private static final Logger logger = LoggerFactory.getLogger(CustomSpanProcessor.class);
private final MeterRegistry meterRegistry;
private final Counter spanCounter;
private final Timer spanDurationTimer;
public CustomSpanProcessor(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
this.spanCounter = Counter.builder("spans.total")
.description("Total number of spans processed")
.register(meterRegistry);
this.spanDurationTimer = Timer.builder("spans.duration")
.description("Duration of spans")
.register(meterRegistry);
}
@Override
public void onStart(Context context, ReadWriteSpan span) {
String spanName = span.getName();
String spanKind = span.getKind().name();
spanCounter.increment();
logger.debug("Span started: {} [{}]", spanName, spanKind);
// Add custom attributes to all spans
span.setAttribute("deployment.environment", getEnvironment());
span.setAttribute("service.version", getServiceVersion());
// Business-specific logic
if (spanName.contains("order")) {
span.setAttribute("business.domain", "order-management");
} else if (spanName.contains("payment")) {
span.setAttribute("business.domain", "payment-processing");
} else if (spanName.contains("inventory")) {
span.setAttribute("business.domain", "inventory-management");
}
}
@Override
public boolean isStartRequired() {
return true;
}
@Override
public void onEnd(ReadableSpan span) {
String spanName = span.getName();
long latencyNanos = span.getLatencyNanos();
long durationMs = TimeUnit.NANOSECONDS.toMillis(latencyNanos);
// Record the full-precision latency rather than the truncated millisecond value
spanDurationTimer.record(latencyNanos, TimeUnit.NANOSECONDS);
// Log slow spans
if (durationMs > 1000) { // 1 second threshold
logger.warn("Slow span detected: {} took {}ms", spanName, durationMs);
}
// Log errors: the span status is read from a SpanData snapshot
var status = span.toSpanData().getStatus();
if (status.getStatusCode() == StatusCode.ERROR) {
logger.error("Span completed with error: {} - {}",
spanName, status.getDescription());
}
logger.debug("Span completed: {} [{}ms]", spanName, durationMs);
}
@Override
public boolean isEndRequired() {
return true;
}
private String getEnvironment() {
return System.getenv().getOrDefault("ENVIRONMENT", "development");
}
private String getServiceVersion() {
return System.getenv().getOrDefault("SERVICE_VERSION", "1.0.0");
}
}
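The onStart classification uses `spanName.contains(...)`, which also matches unintended names: a span called `inventory.reorder` contains "order" and would be tagged as order-management. One alternative, sketched below, keys on the first dot-separated segment of the span name instead. The BusinessDomains class and its mapping are hypothetical and assume this article's `domain.operation` naming scheme; server spans named like `POST /api/orders` would need their own rule.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical classifier: maps the first segment of a dot-separated span name to a business domain.
final class BusinessDomains {
    private static final Map<String, String> BY_PREFIX = Map.of(
            "order", "order-management",
            "payment", "payment-processing",
            "inventory", "inventory-management");

    static Optional<String> domainFor(String spanName) {
        int dot = spanName.indexOf('.');
        String prefix = dot >= 0 ? spanName.substring(0, dot) : spanName;
        return Optional.ofNullable(BY_PREFIX.get(prefix));
    }
}
```

Inside onStart, this could replace the if/else chain with `BusinessDomains.domainFor(spanName).ifPresent(d -> span.setAttribute("business.domain", d));`.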

Running the Application

  1. Start Jaeger for visualization:
     docker run -d --name jaeger \
       -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
       -e COLLECTOR_OTLP_ENABLED=true \
       -p 6831:6831/udp \
       -p 6832:6832/udp \
       -p 5778:5778 \
       -p 16686:16686 \
       -p 4317:4317 \
       -p 4318:4318 \
       -p 14250:14250 \
       -p 14268:14268 \
       -p 14269:14269 \
       -p 9411:9411 \
       jaegertracing/all-in-one:1.48
  2. Start your Spring Boot applications.
  3. Generate traffic:
     curl -X POST http://localhost:8080/api/orders \
       -H "Content-Type: application/json" \
       -d '{"userId": "user123", "totalAmount": 99.99, "items": []}'
  4. View traces in the Jaeger UI: http://localhost:16686

Best Practices

1. Span Naming Conventions

// Good span names
spanBuilder("order.create")
spanBuilder("payment.process.authorization")
spanBuilder("http.user.service.get")
// Avoid
spanBuilder("doWork") // Too vague
spanBuilder("method1") // Implementation detail
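A convention like this is easy to enforce in unit tests or review tooling. The sketch below checks the lowercase, dot-separated `domain.operation[.detail]` style used in this article with a regular expression; the SpanNames class and the exact pattern are assumptions to adapt to your own rules. HTTP spans named by instrumentation (for example `POST /api/orders`) follow a different convention and would be excluded from such a check.

```java
import java.util.regex.Pattern;

// Hypothetical check: lowercase segments that start with a letter, separated by dots, at least two segments.
final class SpanNames {
    private static final Pattern CONVENTION =
            Pattern.compile("[a-z][a-z0-9_]*(\\.[a-z][a-z0-9_]*)+");

    static boolean followsConvention(String name) {
        return CONVENTION.matcher(name).matches();
    }
}
```

Requiring each segment to start with a letter also rejects names such as `order.get.12345`, where an ID has leaked into the span name.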

2. Attribute Guidelines

// Use semantic conventions
span.setAttribute(SemanticAttributes.HTTP_METHOD, "GET");
span.setAttribute(SemanticAttributes.HTTP_ROUTE, "/api/users/{id}");
span.setAttribute(SemanticAttributes.HTTP_STATUS_CODE, 200);
// Add business context
span.setAttribute("order.id", orderId);
span.setAttribute("payment.amount", amount.toString());
span.setAttribute("user.tier", "premium");
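Values such as http.route should stay low-cardinality: one template like `/api/users/{id}`, not one value per user. When no framework route is available, one rough fallback is to replace ID-like path segments with a placeholder. The RouteTemplates class below is a hypothetical heuristic that treats all-digit segments and UUIDs as IDs; it will not fit every API (a mixed segment such as `user123` is left untouched).

```java
import java.util.regex.Pattern;

// Hypothetical heuristic: replace ID-like path segments with "{id}" to keep route values low-cardinality.
final class RouteTemplates {
    private static final Pattern NUMERIC = Pattern.compile("\\d+");
    private static final Pattern UUID_LIKE = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static String normalize(String path) {
        String[] segments = path.split("/", -1);   // -1 keeps empty leading/trailing segments
        for (int i = 0; i < segments.length; i++) {
            String s = segments[i];
            if (NUMERIC.matcher(s).matches() || UUID_LIKE.matcher(s).matches()) {
                segments[i] = "{id}";
            }
        }
        return String.join("/", segments);
    }
}
```

The specific ID still belongs on the span, but as its own attribute (for example `order.id`), where it does not multiply the number of distinct route or span-name values.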

3. Error Handling

try {
// Business logic
} catch (Exception e) {
span.recordException(e);
span.setStatus(StatusCode.ERROR, "Operation failed");
// Add custom error attributes
span.setAttribute("error.type", e.getClass().getSimpleName());
span.setAttribute("error.retryable", true); // hard-coded here; derive it from the failure type in real code
throw e;
}
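The snippet above sets error.retryable to a fixed value. A sketch of deriving both error attributes from the exception instead: the ErrorAttributes helper is hypothetical, and its policy (timeouts and I/O failures anywhere in the cause chain are retryable, everything else is not) is an assumption to replace with your own. It uses the fully qualified class name for error.type, which is more specific than getSimpleName().

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: derives error attributes from an exception instead of hard-coding them.
final class ErrorAttributes {
    private static final int MAX_CAUSE_DEPTH = 16;   // caps the walk in case of cyclic causes

    static String errorType(Throwable error) {
        return error.getClass().getName();
    }

    // Assumed policy: a TimeoutException or IOException anywhere in the cause chain is retryable
    static boolean isRetryable(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < MAX_CAUSE_DEPTH; t = t.getCause(), depth++) {
            if (t instanceof TimeoutException || t instanceof IOException) {
                return true;
            }
        }
        return false;
    }
}
```

Inside the catch block you would then call `span.setAttribute("error.type", ErrorAttributes.errorType(e))` and `span.setAttribute("error.retryable", ErrorAttributes.isRetryable(e))`. Walking the cause chain matters for code like reserveInventory, where CompletableFuture wraps the original failure in a CompletionException.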

Conclusion

Distributed Tracing with spans gives you request-level visibility across your microservices architecture. By instrumenting your services with OpenTelemetry, you can:

  • Debug complex issues across service boundaries
  • Identify performance bottlenecks and optimize latency
  • Understand service dependencies and data flow
  • Monitor business transactions end-to-end
  • Correlate logs and metrics with trace data

Start with automatic instrumentation for quick wins, then add manual instrumentation for business-critical operations. Remember: the goal is not just to collect data, but to derive actionable insights that improve your system's reliability and performance.
