Distributed Tracing with Zipkin and Cassandra Storage in Java

In modern microservices architectures, a single user request can traverse dozens of services. Debugging performance issues or failures in such a distributed system is incredibly challenging. This is where distributed tracing comes in.

Zipkin is a popular open-source distributed tracing system that helps gather timing data needed to troubleshoot latency problems in microservice architectures. While Zipkin often uses in-memory storage for quick starts, for production, you need a persistent and scalable backend like Apache Cassandra.

This article will guide you through setting up a Java application that uses Zipkin for tracing and stores trace data directly in Cassandra.


Architecture Overview

The typical integration involves three main components:

  1. Your Java Application: The microservice that generates traces.
  2. Zipkin Collector/Server: The service that receives traces from your application and stores them. (We will use the Brave library to report directly to Zipkin).
  3. Cassandra Database: The persistent storage where Zipkin writes the trace data.

[Your Java App] --(HTTP/gRPC)--> [Zipkin Server] --(Writes)--> [Cassandra Cluster]

Step 1: Project Setup (Dependencies)

We'll use Spring Boot for simplicity, but the tracing concepts apply to any Java framework. The key library is Brave, which instruments your application and sends traces to Zipkin.

Maven pom.xml Dependencies:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example</groupId>
    <artifactId>zipkin-cassandra-demo</artifactId>
    <version>1.0.0</version>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>3.2.0</version>
        <relativePath/>
    </parent>

    <properties>
        <java.version>17</java.version>
    </properties>

    <dependencies>
        <!-- Spring Boot Web for the REST controller -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <!-- Micrometer Tracing bridge to Brave instrumentation -->
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-tracing-bridge-brave</artifactId>
        </dependency>
        <!-- Reporter that sends Brave spans to Zipkin -->
        <dependency>
            <groupId>io.zipkin.reporter2</groupId>
            <artifactId>zipkin-reporter-brave</artifactId>
        </dependency>
        <!-- Optional: health checks and actuator endpoints -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
    </dependencies>
</project>
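If you build with Gradle instead of Maven, the equivalent dependency declarations look roughly like this (a sketch assuming the Spring Boot Gradle plugin is applied, which manages the versions):

```groovy
dependencies {
    // Spring Boot Web for the REST controller
    implementation 'org.springframework.boot:spring-boot-starter-web'
    // Micrometer Tracing bridge to Brave instrumentation
    implementation 'io.micrometer:micrometer-tracing-bridge-brave'
    // Reporter that sends Brave spans to Zipkin
    implementation 'io.zipkin.reporter2:zipkin-reporter-brave'
    // Optional: health checks and actuator endpoints
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
}
```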

Step 2: Configuration

Configure your application to send traces to a Zipkin server that is set up to use Cassandra.

application.yml

# Application configuration
spring:
  application:
    name: my-zipkin-service

# Zipkin configuration
management:
  tracing:
    sampling:
      probability: 1.0 # Sample 100% of traces. Set to 0.1 for 10% in production.
  zipkin:
    tracing:
      endpoint: http://localhost:9411/api/v2/spans # The endpoint of your Zipkin server

# Print the trace and span IDs in every log line (optional but recommended)
logging:
  pattern:
    level: "%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]"
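To make the `probability` setting concrete, here is a stdlib-only sketch of what a probability sampler decides; this illustrates the semantics only and is not Brave's actual sampler implementation (the class name is hypothetical):

```java
import java.util.concurrent.ThreadLocalRandom;

// Conceptual sketch of a probability-based sampling decision.
// probability = 1.0 records every trace; 0.1 records roughly 1 in 10.
public class SamplerSketch {
    private final float probability;

    public SamplerSketch(float probability) {
        if (probability < 0f || probability > 1f) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
    }

    public boolean isSampled() {
        if (probability == 1.0f) return true;   // always sample
        if (probability == 0.0f) return false;  // never sample
        return ThreadLocalRandom.current().nextFloat() < probability;
    }

    public static void main(String[] args) {
        // With probability 1.0 (as in the YAML above), every request is traced
        System.out.println(new SamplerSketch(1.0f).isSampled());
    }
}
```

Unsampled requests still carry trace IDs through the system; they are simply never reported to Zipkin, which is what keeps the overhead low in production.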

Step 3: Java Application Code

Here's a simple Spring Boot application with a traced endpoint.

ZipkinCassandraDemoApplication.java

package com.example.zipkindemo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class ZipkinCassandraDemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(ZipkinCassandraDemoApplication.class, args);
    }
}

DemoController.java

package com.example.zipkindemo.controller;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.client.RestTemplate;

@RestController
public class DemoController {

    private static final Logger logger = LoggerFactory.getLogger(DemoController.class);

    private final RestTemplate restTemplate;

    public DemoController(RestTemplate restTemplate) {
        this.restTemplate = restTemplate;
    }

    @GetMapping("/start")
    public String startWork() {
        logger.info("Starting the traced request...");
        // Simulate some internal work
        doSomeInternalWork();
        // Call another endpoint over HTTP; the instrumented RestTemplate
        // creates a new client span and propagates the trace context
        String response = restTemplate.getForObject("http://localhost:8080/other", String.class);
        logger.info("Work completed with response: {}", response);
        return "Trace completed! Check Zipkin UI at http://localhost:9411";
    }

    @GetMapping("/other")
    public String otherService() {
        logger.info("Inside the 'other' service endpoint.");
        try {
            Thread.sleep(100); // Simulate processing time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Response from other service";
    }

    private void doSomeInternalWork() {
        // This work runs inside the current span; it will not appear as a
        // separate span unless you add manual instrumentation
        logger.debug("Doing some internal work...");
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
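Conceptually, each span that ends up in Cassandra is just a named timing record that shares a trace ID with its siblings. The following stdlib-only sketch shows that data shape; it is a hypothetical class for illustration, not the Brave API (real spans also carry a parent span ID, tags, and are reported asynchronously):

```java
import java.util.UUID;

// Minimal illustration of the data a span records.
public class SpanSketch {
    final String traceId;  // shared by every span in one request
    final String spanId;   // unique per unit of work
    final String name;     // e.g. "get /start"
    long startMicros;
    long durationMicros;

    SpanSketch(String traceId, String name) {
        this.traceId = traceId;
        // Real IDs are random hex; a trimmed UUID stands in here
        this.spanId = UUID.randomUUID().toString().replace("-", "").substring(0, 16);
        this.name = name;
    }

    void start()  { startMicros = System.nanoTime() / 1000; }
    void finish() { durationMicros = System.nanoTime() / 1000 - startMicros; }

    public static void main(String[] args) throws InterruptedException {
        String traceId = UUID.randomUUID().toString().replace("-", "").substring(0, 16);
        SpanSketch server = new SpanSketch(traceId, "get /start");
        server.start();
        Thread.sleep(5); // stand-in for real work
        server.finish();
        System.out.println(server.name + " took " + server.durationMicros + "us");
    }
}
```

The Zipkin UI's waterfall view is essentially these records grouped by `traceId` and laid out by start time and duration.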

RestTemplateConfig.java

package com.example.zipkindemo.config;

import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestTemplateConfig {

    // Build the RestTemplate from the auto-configured RestTemplateBuilder so
    // Spring Boot applies its tracing customizers; a bare `new RestTemplate()`
    // would not propagate trace headers to downstream services.
    @Bean
    public RestTemplate restTemplate(RestTemplateBuilder builder) {
        return builder.build();
    }
}
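When the instrumented RestTemplate calls `/other`, Brave attaches B3 propagation headers so the receiving side can join the same trace rather than start a new one. The single-header `b3` value has the shape `{traceId}-{spanId}-{samplingState}`; the helper below is an illustrative sketch of that format, not part of Brave's API:

```java
// Builds a B3 single-header value: "{traceId}-{spanId}-{samplingState}".
// Illustrative only; Brave constructs and parses these headers for you.
public class B3HeaderSketch {

    static String b3(String traceId, String spanId, boolean sampled) {
        return traceId + "-" + spanId + "-" + (sampled ? "1" : "0");
    }

    public static void main(String[] args) {
        // A sampled request: 128-bit trace ID, 64-bit span ID, sampled flag "1"
        System.out.println(b3("463ac35c9f6413ad48485a3953bb6124", "a2fb4a1d1a96d312", true));
        // → 463ac35c9f6413ad48485a3953bb6124-a2fb4a1d1a96d312-1
    }
}
```

Because propagation happens in HTTP headers, any service that understands B3 (regardless of language) can participate in the same trace.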

Step 4: Setting Up Zipkin Server with Cassandra

You need to run a Zipkin server that is configured to use Cassandra.

Using Docker (Recommended):

The easiest way is to use the official Zipkin Docker image with Cassandra storage.

# First, ensure you have a Cassandra instance running
docker run --name some-cassandra -p 9042:9042 -d cassandra:4.1
# Then run Zipkin, configured to use the Cassandra instance
docker run -d -p 9411:9411 \
-e STORAGE_TYPE=cassandra3 \
-e CASSANDRA_KEYSPACE=zipkin \
-e CASSANDRA_CONTACT_POINTS=host.docker.internal:9042 \
--name zipkin \
openzipkin/zipkin:latest

Explanation of Environment Variables:

  • STORAGE_TYPE=cassandra3: Tells Zipkin to use the Cassandra 3.x+ storage component.
  • CASSANDRA_KEYSPACE=zipkin: The Cassandra keyspace to use (default is zipkin).
  • CASSANDRA_CONTACT_POINTS=host.docker.internal:9042: The address of your Cassandra node. host.docker.internal works on Mac/Windows to connect to the host machine. On Linux, you might need to use your machine's IP.
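The two `docker run` commands above can also be expressed as a single Compose file. The sketch below is an assumption-laden starting point: inside the Compose network the contact point is the service name `cassandra` rather than `host.docker.internal`, and in practice you may need a healthcheck or restart policy so Zipkin waits for Cassandra to finish bootstrapping:

```yaml
version: "3.8"
services:
  cassandra:
    image: cassandra:4.1
    ports:
      - "9042:9042"
  zipkin:
    image: openzipkin/zipkin:latest
    environment:
      - STORAGE_TYPE=cassandra3
      - CASSANDRA_KEYSPACE=zipkin
      - CASSANDRA_CONTACT_POINTS=cassandra:9042
    ports:
      - "9411:9411"
    depends_on:
      - cassandra
```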

Step 5: Running and Visualizing Traces

  1. Start Cassandra (if not already running via Docker).
  2. Start the Zipkin Server with the command above.
  3. Run your Java Spring Boot application.
  4. Generate a trace by visiting: http://localhost:8080/start
  5. Open Zipkin UI at http://localhost:9411

In the Zipkin UI:

  • You can search for traces by service name (my-zipkin-service).
  • Click on a trace to see a detailed waterfall view of the request.
  • You will see spans for:
    • The GET /start request
    • The internal method calls (if configured with more detailed instrumentation)
    • The GET /other request (as a separate span, showing the remote call)

Advanced Configuration

Custom Tracing Configuration

For more control, you can explicitly configure the tracing beans:

package com.example.zipkindemo.config;

import brave.Tracing;
import brave.propagation.ThreadLocalCurrentTraceContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import zipkin2.Span;
import zipkin2.reporter.AsyncReporter;
import zipkin2.reporter.okhttp3.OkHttpSender;

@Configuration
public class TracingConfig {

    @Bean
    public Tracing tracing() {
        return Tracing.newBuilder()
                .localServiceName("my-zipkin-service")
                .currentTraceContext(ThreadLocalCurrentTraceContext.create())
                .spanReporter(spanReporter())
                .build();
    }

    // Note: OkHttpSender requires the io.zipkin.reporter2:zipkin-sender-okhttp3 dependency
    @Bean
    public AsyncReporter<Span> spanReporter() {
        return AsyncReporter.create(OkHttpSender.create("http://localhost:9411/api/v2/spans"));
    }
}

Key Benefits of Cassandra for Zipkin Storage

  • Scalability: Cassandra is designed to scale horizontally, making it suitable for high-throughput tracing data.
  • High Availability: Built-in replication and no single point of failure.
  • Performance: Write-optimized database, perfect for the high-ingest workload of tracing data.
  • TTL Support: Cassandra supports Time-To-Live (TTL) on data, which Zipkin uses to automatically purge old traces.
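To make the TTL point concrete: Zipkin's Cassandra schema applies a default time-to-live to its tables, so old traces expire without any batch cleanup job. You can inspect or adjust it with CQL along these lines (the keyspace name matches the `CASSANDRA_KEYSPACE=zipkin` setting above; verify the table names against your Zipkin version's schema before changing anything):

```sql
-- Inspect the keyspace and tables Zipkin created
DESCRIBE KEYSPACE zipkin;

-- Example: set a 7-day default TTL (604800 seconds) on the span table
ALTER TABLE zipkin.span WITH default_time_to_live = 604800;
```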

Conclusion

Integrating Zipkin with Cassandra provides a production-ready distributed tracing solution for your Java microservices. The Brave library seamlessly instruments your Spring Boot applications, capturing timing data and correlation IDs. By using Cassandra as the storage backend, you ensure that your tracing data is durable, scalable, and efficiently managed.

This setup gives you the visibility needed to diagnose complex performance issues across your distributed system, ultimately leading to more reliable and performant applications.
