Event Service Agent Kata

Domain (Problem Space)

Purpose

Condense the discovery into problem-space essentials: constraints, core concepts, and high-level workflows, independent of implementation or solution structures.

Why this document exists:

Shared Understanding: Establishes ubiquitous language between stakeholders and developers
Problem-First Thinking: Defines WHAT we’re solving before HOW we solve it
Scope Boundaries: Makes explicit what’s in/out of MVP to prevent scope creep
Design Validation: Implementation decisions must trace back to problem-space requirements

Constraints (MVP)

Why these constraints:

Single attempt per Service Call; no retries/cancellations
- Why: Simplifies state machine (no retry counters, backoff strategies)
- Trade-off: Less resilient, but avoids complexity of idempotent retry logic
- Evolution path: Can add retries in Phase 2 without breaking current design
HTTP is the only supported protocol
- Why: 80% of use cases, standardized request/response semantics
- Trade-off: Can’t call gRPC/GraphQL services in MVP
- Evolution path: Protocol abstraction already in place (RequestSpec extensible)
Due time semantics: execute at/after dueAt; if dueAt <= now, eligible immediately
- Why: Prevents blocked queue if timer fires late (system resilience)
- Trade-off: “Fire ASAP” vs “Fire at exact time” — chose pragmatism
- Business impact: Acceptable for async service calls (not real-time events)
Multi-tenancy: every action is tenant-scoped; all queries filter by tenantId
- Why: Data isolation enforced at application layer (defense in depth)
- Trade-off: Every operation pays tenant-filtering cost, but prevents data leakage
- Security: A branded TenantId type helps catch accidental cross-tenant access at compile time
Minimal persistence of bodies/headers (size-limited snippets; redaction allowed)
- Why: PII/GDPR compliance, storage efficiency
- Trade-off: Can’t fully debug failed requests, but keeps storage bounded
Event-driven messaging with at-least-once delivery tolerance
- Why: Resilience over strict ordering (crash-tolerant, broker-agnostic)
- Trade-off: Consumers must be idempotent, but system survives broker restarts
DB is the source of truth; Orchestration is the only writer of domain state
- Why: Single Writer Principle prevents race conditions and conflicting writes
- Trade-off: All reads must query Orchestration’s tables (no projections in MVP)
- Correctness: Guarantees consistent state transitions (see ADR-0004)
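
To make the "minimal persistence of bodies/headers" constraint concrete, here is a minimal sketch of redaction and truncation before storage. The size limit, the list of sensitive headers, and the function names are assumptions for illustration; the source does not specify them.

```typescript
// Hypothetical limit and header list; the source does not define these values.
const MAX_SNIPPET_CHARS = 256;
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie"]);

type HeaderMap = Record<string, string>;

// Replace the values of sensitive headers. Names are compared case-insensitively.
function redactHeaders(headers: HeaderMap): HeaderMap {
  const out: HeaderMap = {};
  for (const [name, value] of Object.entries(headers)) {
    out[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[REDACTED]" : value;
  }
  return out;
}

// Keep at most `max` characters of the body and mark the cut explicitly.
function truncateBody(body: string, max: number = MAX_SNIPPET_CHARS): string {
  return body.length <= max ? body : body.slice(0, max) + "...[truncated]";
}
```

The trade-off named above is visible here: anything past the limit is gone, so a failed request cannot be replayed from the stored snippet alone.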

Core Concepts (Ubiquitous Language)

Why establish ubiquitous language:

Prevents miscommunication between domain experts and developers
Ensures code names match business vocabulary (ServiceCall, not Task/Job)
Enables non-technical stakeholders to understand code structure

Domain Entities:

Tenant: Logical owner of Service Calls
- Why separate from User: Multi-organization support (one user, many tenants)
- Security boundary: All data partitioned by TenantId
Service Call: Intention to invoke an external service with a request spec and a due time
- Why “Service Call” not “HTTP Request”: Protocol-agnostic terminology
- Aggregate root: All state transitions center on ServiceCall lifecycle
Due Time (dueAt): Earliest instant at which execution may start
- Why “earliest”: System fires at-or-after, not at-exact-time (resilience over precision)
- Type: DateTime.Utc (timezone-agnostic, prevents DST bugs)
Execution: The single attempt to perform the Service Call
- Why single attempt: MVP constraint, simplifies failure handling
Outcome/Status: Scheduled, Running, Succeeded, Failed
- Why these states: Minimal state machine, no intermediate states (simpler correctness proofs)
Tag: Label(s) associated at submission for later filtering
- Why tags: Enables batch queries without complex schema (extensibility point)
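
The entities above could be modeled roughly as follows. This is a sketch, not the project's actual types: the brand encoding, the constructor names, the `RequestSpec` fields, and the use of the built-in `Date` as a stand-in for `DateTime.Utc` are all assumptions.

```typescript
// Branded strings: structurally strings, but a plain string is not assignable to them.
type TenantId = string & { readonly __brand: "TenantId" };
type ServiceCallId = string & { readonly __brand: "ServiceCallId" };

type Status = "Scheduled" | "Running" | "Succeeded" | "Failed";

// Hypothetical shape; the source only says the request spec is HTTP-only and extensible.
interface RequestSpec {
  readonly method: string;
  readonly url: string;
}

interface ServiceCall {
  readonly tenantId: TenantId;
  readonly serviceCallId: ServiceCallId;
  readonly name: string;
  readonly requestSpec: RequestSpec;
  readonly dueAt: Date; // stand-in for DateTime.Utc
  readonly status: Status;
  readonly tags: readonly string[];
}

// Hypothetical smart constructors: the only places where a raw string becomes a branded id.
function makeTenantId(raw: string): TenantId {
  if (raw.trim() === "") throw new Error("tenantId must be non-empty");
  return raw as TenantId;
}

function makeServiceCallId(raw: string): ServiceCallId {
  if (raw.trim() === "") throw new Error("serviceCallId must be non-empty");
  return raw as ServiceCallId;
}

// Tenant scoping check: a call is visible only to the tenant that owns it.
function belongsTo(call: ServiceCall, tenantId: TenantId): boolean {
  return call.tenantId === tenantId;
}
```

With this encoding, passing a `ServiceCallId` (or a raw string) where a `TenantId` is expected is a compile-time error, which is the kind of accidental mix-up the brand is meant to catch.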

Glossary (Problem-Space)

Intention: the user’s desire to have a call executed.
Eligibility: the condition that time has reached or passed dueAt.
Attempt: one try to turn intention into an outcome.
Outcome: the terminal result of an attempt (Succeeded/Failed).
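
Eligibility as defined here reduces to a single comparison. The sketch below assumes the built-in `Date` as a stand-in for the document's `DateTime.Utc`; the function names are illustrative.

```typescript
// A call is eligible once the current time has reached or passed dueAt.
function isEligible(dueAt: Date, now: Date): boolean {
  return dueAt.getTime() <= now.getTime();
}

// Milliseconds to wait before the call becomes eligible; 0 if it is already due.
function delayUntilDue(dueAt: Date, now: Date): number {
  return Math.max(0, dueAt.getTime() - now.getTime());
}
```

A call submitted with a `dueAt` in the past gets a delay of 0, which matches the "eligible immediately" rule in the constraints.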

[!NOTE] Tenancy

Scope: All operations are tenant-scoped. Every command, event, and query includes tenantId. Cross-tenant access is not permitted.

Identity: The primary business identity is (tenantId, serviceCallId). Idempotency keys are per-tenant.

Isolation: Read models and stores are logically partitioned by tenantId. Physical isolation (DB per tenant) is not required for MVP but must remain feasible.

Storage/indexing: All primary indexes include tenantId first; common secondary indexes: (tenantId, status), (tenantId, dueAt), (tenantId, tags). API queries the domain tables directly; no projections are maintained for MVP.

Messaging: Topics/queues are either shared with tenantId in message envelopes or logically partitioned per tenant. Consumers must filter by tenantId.

API/auth: Authentication/authorization is out-of-scope for MVP; however, all API routes embed :tenantId and handlers enforce scoping and idempotency within the tenant.

Timers: Registrations carry (tenantId, serviceCallId, dueAt). Timer emissions include tenantId to preserve scoping on wakeup.

Observability: Logs/metrics/traces annotate tenantId for correlation.
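
As an illustration of the identity and messaging rules in this note, the sketch below shows a message envelope that carries `tenantId`, a consumer-side tenant filter, and a composite identity key. The `Envelope` shape and all function names are assumptions, not part of the documented design.

```typescript
// Hypothetical envelope: every message carries its tenant and a correlation id.
interface Envelope<T> {
  readonly tenantId: string;
  readonly correlationId: string;
  readonly payload: T;
}

// Consumer-side scoping: keep only the messages that belong to the given tenant.
function forTenant<T>(tenantId: string, messages: readonly Envelope<T>[]): Envelope<T>[] {
  return messages.filter((m) => m.tenantId === tenantId);
}

// Composite business identity (tenantId, serviceCallId) as a single map/index key.
// JSON encoding of the pair avoids ambiguity if either id contains a separator character.
function identityKey(tenantId: string, serviceCallId: string): string {
  return JSON.stringify([tenantId, serviceCallId]);
}
```

The same `serviceCallId` under two different tenants yields two different keys, so tenant-local ids never collide in a shared store.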

High-Level Workflows

Why document workflows at problem-space level:

Define business process independent of technical implementation
Validate that solution architecture maps back to business needs
Identify missing use cases or edge conditions early

Core Workflows:

Submission
- A tenant submits a Service Call with name, requestSpec, dueAt, and optional tags
- Orchestration validates, persists Scheduled state in the domain DB, and emits domain events after commit
- Why validation first: Fail fast before persisting invalid data (saves storage, prevents orphaned timers)
- Why events after commit: Outbox pattern prevents dual-write problem (see ADR-0008)
Scheduling
- Orchestration ensures a timer exists for dueAt (or starts immediately when dueAt <= now)
- Why separate Timer module: Decouples time-tracking from business logic (testability, maintainability)
- Why immediate execution: If already past due time, no reason to wait (UX optimization)
Becoming Due
- At/after dueAt, the Service Call becomes eligible to start; Timer emits a due signal; Orchestration decides to start
- Why Timer “signals” vs “executes”: Timer doesn’t make business decisions, only reports time passage
- Why Orchestration decides: Guards against duplicate firing, validates state transition (Scheduled → Running)
Execution
- Exactly one attempt is performed, producing either success (response metadata) or failure (error metadata)
- Orchestration is the single writer for Running and terminal states
- Why single writer: Prevents race conditions in state transitions (correctness guarantee)
- Why capture metadata: Enables debugging, latency analysis, and outcome auditing
Observation
- The tenant can list and filter calls by status, tags, and date; API serves queries directly from the domain DB with proper indexes
- Why direct DB queries: Simpler than CQRS projections for MVP (acceptable read latency)
- Why filter by status/tags: Common UX patterns (show pending, show failed, etc.)
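
The Observation workflow can be sketched as a tenant-scoped filter. This in-memory version only illustrates the query semantics; a real implementation would push the same predicates into indexed DB queries. The type and function names are hypothetical, and "date" is assumed here to mean `dueAt`.

```typescript
type CallStatus = "Scheduled" | "Running" | "Succeeded" | "Failed";

interface CallSummary {
  readonly tenantId: string;
  readonly serviceCallId: string;
  readonly status: CallStatus;
  readonly tags: readonly string[];
  readonly dueAt: Date;
}

// All criteria are optional; omitted criteria match everything.
interface ListFilter {
  readonly status?: CallStatus;
  readonly tag?: string;
  readonly dueFrom?: Date; // inclusive lower bound on dueAt
  readonly dueTo?: Date;   // inclusive upper bound on dueAt
}

// The tenant check is mandatory and comes first, mirroring tenantId-first indexes.
function listCalls(
  all: readonly CallSummary[],
  tenantId: string,
  filter: ListFilter = {},
): CallSummary[] {
  const { status, tag, dueFrom, dueTo } = filter;
  return all.filter(
    (c) =>
      c.tenantId === tenantId &&
      (status === undefined || c.status === status) &&
      (tag === undefined || c.tags.includes(tag)) &&
      (dueFrom === undefined || c.dueAt.getTime() >= dueFrom.getTime()) &&
      (dueTo === undefined || c.dueAt.getTime() <= dueTo.getTime()),
  );
}
```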

Business State Diagram (Problem-Space)

stateDiagram-v2
  [*] --> Scheduled: Submission recorded
  Scheduled --> Running: Becomes due and starts
  Running --> Succeeded: Attempt outcome (success)
  Running --> Failed: Attempt outcome (failure)
  Succeeded --> [*]
  Failed --> [*]
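
The diagram translates directly into a transition table. The sketch below encodes the four states and the legal moves between them; the names `LEGAL`, `canTransition`, `transition`, and `isTerminal` are illustrative, not taken from the codebase.

```typescript
type State = "Scheduled" | "Running" | "Succeeded" | "Failed";

// One entry per state, listing the states it may move to. Terminal states list none.
const LEGAL: Record<State, readonly State[]> = {
  Scheduled: ["Running"],
  Running: ["Succeeded", "Failed"],
  Succeeded: [],
  Failed: [],
};

function canTransition(from: State, to: State): boolean {
  return LEGAL[from].includes(to);
}

// Returns the new state, or throws if the move is not in the diagram.
function transition(from: State, to: State): State {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}

function isTerminal(state: State): boolean {
  return LEGAL[state].length === 0;
}
```

Because there is no edge back to `Scheduled` or out of a terminal state, the single-attempt constraint falls out of the table itself.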

Quality Attributes

Correctness: one attempt; legal transitions only; Orchestration is the only writer.
Observability: state transitions produce domain events published after DB commit (via outbox); correlation IDs propagate across messages.
Evolvability: protocol-agnostic value objects; IO behind ports; broker-first messaging without event sourcing (ES) or CQRS.

Out-of-Scope (MVP)

Retries/backoff, cancellation, editing requests, CRON-like schedules, authentication/egress policy.

Implementation Notes (Non-normative)

Idempotency: API computes serviceCallId deterministically from (tenantId, idempotencyKey); Orchestration enforces uniqueness on (tenantId, serviceCallId).
Due-time guard: Orchestration checks dueAt <= now when starting; Timer may deliver duplicates; transitions are conditional (e.g., only Scheduled → Running).
Privacy: bodies/headers are redacted/truncated before persistence and when included in events.
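
The idempotency and due-time guard notes can be sketched as follows. The hashing scheme (SHA-256 over a JSON-encoded pair, truncated to 32 hex characters) is an assumption chosen for illustration; the source only says the id is computed deterministically. `CallRecord` and `tryStart` are likewise hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Same (tenantId, idempotencyKey) always yields the same id, so a repeated
// submission maps onto the existing Service Call instead of creating a new one.
function deriveServiceCallId(tenantId: string, idempotencyKey: string): string {
  const input = JSON.stringify([tenantId, idempotencyKey]); // unambiguous pair encoding
  return createHash("sha256").update(input).digest("hex").slice(0, 32);
}

interface CallRecord {
  readonly status: "Scheduled" | "Running" | "Succeeded" | "Failed";
  readonly dueAt: Date;
}

// Conditional transition: only a Scheduled call that is due moves to Running.
// Duplicate or early timer signals leave the record unchanged.
function tryStart(call: CallRecord, now: Date): CallRecord {
  if (call.status !== "Scheduled") return call; // duplicate signal: already started or finished
  if (call.dueAt.getTime() > now.getTime()) return call; // not yet due
  return { ...call, status: "Running" };
}
```

In a database-backed version the same guard would typically be a conditional update (for example, an `UPDATE ... WHERE status = 'Scheduled'`), so that concurrent duplicate signals cannot both succeed.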
