System Design
Observability, Splunk, Dynatrace, Micrometer, Spring Boot

Production Observability for Microservices

Logs, Metrics, Traces — Building the Three Pillars

How to instrument Spring Boot microservices for production observability: structured logging with Splunk, metrics with Micrometer and Dynatrace, and distributed tracing — learned from supporting fintech and gaming platforms in production.

11 min read · March 5, 2024

The Three Pillars of Observability

Observability is the ability to understand your system's internal state from its external outputs. The three pillars are Logs (what happened), Metrics (how much and how fast), and Traces (how a request flowed through the system). In production microservices, no single pillar is sufficient: you need all three, correlated by a shared trace ID.

Structured Logging: Making Logs Queryable

Free-text logs are hard to search at scale, because every query turns into pattern matching against message strings. Use structured JSON logging with a consistent set of fields: timestamp, trace_id, span_id, service, level, message, and domain-specific fields (account_id, event_type). In Spring Boot, configure Logback with a JSON encoder and put the trace ID into the MDC so that every log line written during a request carries it. Once Splunk indexes those fields, a single SPL search can return every event for one account across 10 services.
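To make the target shape concrete, here is a minimal sketch of one structured log line. It uses only the JDK and is not how a real service would emit logs (a Logback JSON encoder does that job); the class name `StructuredLogLine` and the sample values are assumptions for illustration, and all values are written as JSON strings to keep the sketch short.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: shows the JSON shape a Logback JSON encoder would produce.
public class StructuredLogLine {

    // Escapes characters that would otherwise break a JSON string literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // One JSON object per event: fixed fields first, then MDC and domain fields.
    static String format(Instant timestamp, String level, String message,
                         Map<String, String> mdc, Map<String, String> domainFields) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("timestamp", timestamp.toString());
        fields.put("level", level);
        fields.put("message", message);
        fields.putAll(mdc);          // e.g. trace_id, span_id, service
        fields.putAll(domainFields); // e.g. account_id, event_type

        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) json.append(',');
            json.append('"').append(escape(e.getKey())).append("\":\"")
                .append(escape(e.getValue())).append('"');
            first = false;
        }
        return json.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, String> mdc = new LinkedHashMap<>();
        mdc.put("trace_id", "hypothetical-trace-id");
        mdc.put("service", "account-catalog-service");
        Map<String, String> domain = new LinkedHashMap<>();
        domain.put("account_id", "A-1001");       // hypothetical account
        domain.put("event_type", "REGISTRATION"); // hypothetical event type
        System.out.println(format(Instant.now(), "INFO", "registration accepted", mdc, domain));
    }
}
```

Because every service writes the same field names, a search along the lines of `index=app_logs account_id="A-1001" | sort _time | table _time service level message trace_id` can follow one account across services. The index name `app_logs` is hypothetical, and the search assumes Splunk is configured to extract JSON fields.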

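TraceLoggingFilter.java
import java.io.IOException;
import java.util.Optional;
import java.util.UUID;
// Spring Boot 3.x servlet API; on Spring Boot 2.x these are javax.servlet.*
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class TraceLoggingFilter extends OncePerRequestFilter {
    @Override
    protected void doFilterInternal(HttpServletRequest req,
            HttpServletResponse res, FilterChain chain)
            throws ServletException, IOException {
        // Reuse the caller's trace ID; start a new one if the header is missing or blank.
        String traceId = Optional
            .ofNullable(req.getHeader("X-Trace-Id"))
            .filter(id -> !id.isBlank())
            .orElseGet(() -> UUID.randomUUID().toString());

        // Every log line written on this thread now carries these fields.
        MDC.put("trace_id", traceId);
        MDC.put("service", "account-catalog-service");
        res.setHeader("X-Trace-Id", traceId);

        try {
            chain.doFilter(req, res);
        } finally {
            // Remove only our keys so the pooled thread starts the next request clean.
            MDC.remove("trace_id");
            MDC.remove("service");
        }
    }
}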
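The filter only helps correlation if outbound calls forward the same header, so the next service reuses the ID instead of generating a new one. A minimal sketch using the JDK's `java.net.http` client follows; the class name `TraceHeaderPropagation` and the target URL are hypothetical, and in a real service the trace ID would come from `MDC.get("trace_id")` rather than a method parameter.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: forwards the current trace ID on an outbound request.
public class TraceHeaderPropagation {
    static final String TRACE_HEADER = "X-Trace-Id";

    // Builds a GET request that carries the given trace ID in the X-Trace-Id header.
    static HttpRequest withTrace(String url, String traceId) {
        return HttpRequest.newBuilder(URI.create(url))
                .header(TRACE_HEADER, traceId)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical downstream URL and trace ID; nothing is sent over the network here.
        HttpRequest request = withTrace(
                "http://downstream-service.example/accounts/42", "hypothetical-trace-id");
        System.out.println(request.headers().map());
    }
}
```

In a Spring application the same idea is usually applied once, in a client interceptor, rather than at every call site.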

Micrometer Metrics: Business-Level Visibility

System metrics (CPU, heap) are necessary but not sufficient. What you really need are business metrics: events processed per second, registration success rate, batch job throughput per partition, Kafka consumer lag. In Spring Boot, Micrometer with the Dynatrace registry lets you record these as named, tagged meters and export them without hand-written reporting code. We instrumented every Kafka consumer and Spring Batch step with throughput counters and latency timers, and surfaced them in Dynatrace dashboards.
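As a rough illustration of what two of these business metrics measure, the sketch below computes a registration success rate, a mean latency, and Kafka consumer lag using only the JDK. It is not Micrometer code: the class `BusinessMetricsSketch` and its method names are assumptions, and the offsets are passed in as plain maps rather than read from a Kafka client.

```java
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: the arithmetic behind a success-rate metric and consumer lag.
public class BusinessMetricsSketch {
    private final LongAdder succeeded = new LongAdder();
    private final LongAdder failed = new LongAdder();
    private final LongAdder totalLatencyMillis = new LongAdder();

    // Call once per registration attempt, whether it succeeded or not.
    void recordRegistration(boolean ok, long latencyMillis) {
        if (ok) succeeded.increment(); else failed.increment();
        totalLatencyMillis.add(latencyMillis);
    }

    // Fraction of attempts that succeeded; 0.0 when nothing has been recorded yet.
    double successRate() {
        long attempts = succeeded.sum() + failed.sum();
        return attempts == 0 ? 0.0 : (double) succeeded.sum() / attempts;
    }

    // Mean latency over all attempts; 0.0 when nothing has been recorded yet.
    double meanLatencyMillis() {
        long attempts = succeeded.sum() + failed.sum();
        return attempts == 0 ? 0.0 : (double) totalLatencyMillis.sum() / attempts;
    }

    // Lag = sum over partitions of (log end offset - committed offset), floored at zero.
    static long consumerLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        long lag = 0;
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag += Math.max(0L, e.getValue() - committed);
        }
        return lag;
    }

    public static void main(String[] args) {
        BusinessMetricsSketch metrics = new BusinessMetricsSketch();
        metrics.recordRegistration(true, 120);
        metrics.recordRegistration(false, 450);
        System.out.println("success rate: " + metrics.successRate());
        System.out.println("mean latency (ms): " + metrics.meanLatencyMillis());
        System.out.println("lag: " + consumerLag(Map.of(0, 100L, 1, 50L), Map.of(0, 90L, 1, 50L)));
    }
}
```

With Micrometer, the two tallies would typically be `Counter` meters, the latency a `Timer`, and the lag a `Gauge`, each tagged with dimensions such as topic or partition so the dashboards can slice by them.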

Tags

#observability #splunk #dynatrace #micrometer #production