Event Service Agent Kata

ADR-0009: Observability Baseline

Status: Accepted (Phase 1 complete, Phase 2 planned)

Problem

Provide minimal yet effective logs, metrics, and correlation tracking for debugging and SLOs across modules.

Context

Options

Option 1: Application-Level Correlation Only

Structured logs with tenantId, serviceCallId, and correlationId fields. The application manages correlation via the MessageMetadata Context. No automatic span linking.
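As a rough illustration of what a structured log line with these fields might look like, here is a minimal Python sketch. The three field names come from this ADR. The function name `format_log`, the JSON layout, and the `timestamp`, `level`, and `message` keys are assumptions made for the example, not the project's actual logger.

```python
import json
from datetime import datetime, timezone


def format_log(level, message, *, tenant_id, service_call_id, correlation_id):
    """Return one JSON log line carrying the three correlation fields.

    Hypothetical helper: only the tenantId / serviceCallId / correlationId
    field names are taken from the ADR.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "message": message,
        "tenantId": tenant_id,
        "serviceCallId": service_call_id,
        "correlationId": correlation_id,
    }
    return json.dumps(record)


# Example values are made up for illustration.
line = format_log(
    "info",
    "service call scheduled",
    tenant_id="tenant-a",
    service_call_id="sc-123",
    correlation_id="corr-456",
)
print(line)
```

Because every line carries the same three keys, a log query can filter on correlationId to collect the records of one workflow across modules, without any tracing infrastructure.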

Option 2: Infrastructure-Level Tracing Only

OpenTelemetry traces with automatic span propagation via traceparent headers. Relies on infrastructure tooling (broker/HTTP clients) for correlation. No explicit application-level correlation fields.
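To make the traceparent mechanism concrete, the sketch below parses and re-emits a header in the W3C Trace Context version 00 format (`version-traceid-parentid-flags`, lowercase hex). It is a hand-rolled illustration only: in practice the OpenTelemetry SDK would handle propagation, and the names `TraceParent`, `parse_traceparent`, and `child_of` are hypothetical.

```python
import re
import secrets
from typing import NamedTuple, Optional

# Version 00 layout: 2 hex version, 32 hex trace id, 16 hex parent id, 2 hex flags.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    trace_id: str
    parent_id: str
    flags: str

    def header(self) -> str:
        """Serialize back into the header value."""
        return f"00-{self.trace_id}-{self.parent_id}-{self.flags}"


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is malformed."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero trace or parent ids are invalid in the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(trace_id, parent_id, flags)


def child_of(parent: TraceParent) -> TraceParent:
    """Keep the trace id and flags, mint a fresh 8-byte span id for the next hop."""
    return TraceParent(parent.trace_id, secrets.token_hex(8), parent.flags)


# Example header value taken from the W3C Trace Context specification.
incoming = parse_traceparent(
    "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
)
outgoing = child_of(incoming)
print(outgoing.header())
```

The trace id stays constant across hops while the parent id changes at each one, which is what lets a tracing backend link spans automatically without any domain-level fields.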

Option 3: Dual-Level Observability (Hybrid)

Application-level correlation (MessageMetadata with correlationId/causationId) for domain tracing, plus infrastructure-level OpenTelemetry for automatic span linking. The two are orthogonal concerns that complement each other.
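The application-level half of this option can be sketched as follows. This assumes a common convention that the ADR does not spell out: correlationId is copied unchanged through a workflow, while causationId points at the message that directly triggered the current one. The `message_id` field and the helpers `start_workflow` and `caused_by` are hypothetical names introduced for the example.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageMetadata:
    message_id: str                # hypothetical: unique id of this message
    correlation_id: str            # shared by every message in one workflow
    causation_id: Optional[str]    # id of the message that directly caused this one


def start_workflow() -> MessageMetadata:
    """Metadata for the first message: new correlation id, no cause."""
    return MessageMetadata(
        message_id=str(uuid.uuid4()),
        correlation_id=str(uuid.uuid4()),
        causation_id=None,
    )


def caused_by(parent: MessageMetadata) -> MessageMetadata:
    """Metadata for a message emitted while handling `parent`."""
    return MessageMetadata(
        message_id=str(uuid.uuid4()),
        correlation_id=parent.correlation_id,  # inherited unchanged
        causation_id=parent.message_id,        # direct cause
    )


command = start_workflow()
event = caused_by(command)
follow_up = caused_by(event)
print(follow_up.correlation_id == command.correlation_id)
```

Under this convention, correlationId groups the whole workflow and causationId reconstructs the order in which messages triggered one another. Neither requires a trace id, which is why the two levels can be rolled out independently.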

Decision

Adopt Option 3 (Dual-Level) with a phased rollout:

Phase 1: Application-Level Correlation (PL-24) ✅ COMPLETE

Scope: Explicit correlation tracking via MessageMetadata Context

Implementation:

Benefits:

Limitations:

Phase 2: Infrastructure-Level Tracing (Future)

Scope: OpenTelemetry automatic span propagation

Implementation (planned):

Benefits:

Integration Strategy:

Consequences

Positive

Phase 1 (Current):

Phase 2 (Future):

Neutral

Negative

⚠️ Phase 1:

⚠️ Phase 2: