Document

Architecture Decisions

Notes on major design choices and tradeoffs.

ADR-001: Reactive Redis for Active Auction State

Context

Active auction state can experience sharp increases in write concurrency, especially near auction close. That path needs low-latency reads, writes, and atomic operations, while long-term persistence can be handled separately.

Decision

Redis, accessed through Spring Data Redis Reactive, is used as the primary store for active auction state.

Consequences

Pros: Fast reads and writes, atomic primitives, and a natural fit for low-latency reactive workflows.
Cons: Active state must fit in memory.
Mitigation: Redis AOF is enabled to improve crash recovery by persisting write commands that can be replayed on restart.

ADR-002: Java Records for Domain Models

Context

Most domain types in the system are data carriers rather than behavior-heavy objects.

Decision

Java records are used for domain models.

Consequences

Pros: Concise, immutable by default, readable.
Cons: They do not align with patterns built around mutable JPA entities.
Rationale: The system uses Redis rather than JPA, so that tradeoff is acceptable.

ADR-003: ReactiveRedisTemplate with JSON Serialization

Context

This system needs direct, non-blocking access to Redis data structures and operations beyond simple CRUD, including sets, sorted sets, and Lua-backed atomic updates. Older (deprecated) Jackson-based serializer choices are also not a good foundation for a new codebase.

Decision

Redis access is written with ReactiveRedisTemplate, and serialization uses RedisSerializer.json().

Consequences

Pros: Fully non-blocking access and explicit control over persistence behavior.
Cons: Standard repository operations must be implemented manually.

ADR-004: `Mono.defer()` for Stateful Reactive Tests

Context

Retry logic is difficult to test when mocks always return the same state on every invocation.

Decision

Mono.defer() is used in tests that need to simulate state changing between retries.

Consequences

Pros: Tests reflect the behavior of a real datastore more closely.
Cons: Adds complexity.

ADR-005: Separate Sentinel State from Core Auction State

Context

The bidding engine and Sentinel currently share Redis infrastructure. Analysis workloads should not be able to interfere with the core bid path.

Decision

Sentinel state is planned to move to separate backing infrastructure. Shared Redis should remain limited to messaging and coordination where cross-service communication is required.

Consequences

Pros: Reduces the risk that analysis activity will impact auction latency.
Cons: Adds complexity.

ADR-006: Feedback Loop for Fraud Decisions

Context

Fraud decisions are only useful in the long term if mistakes can be reviewed and incorporated into future model updates.

Decision

A feedback loop is planned in which fraud outcomes are written to an audit trail, corrected when necessary, and reused during retraining.

Consequences

Pros: Improves model quality over time.
Cons: Requires an additional pipeline/interface.

ADR-007: Event Sourcing for Reliable Rollbacks

Context

Overwriting the current bid state makes fraud reversals harder because the system loses the full sequence of prior accepted actions.

Decision

An eventual transition to event sourcing is planned. Bids, reversals, and enforcement actions would be recorded as append-only events rather than only as the current snapshot.

Consequences

Pros: Makes rollback logic more accurate and improves historical traceability.
Cons: Requires a substantial architectural shift.

ADR-008: Specialized ML Service for Fraud Scoring

Context

General-purpose LLM integration is poorly suited to fast numerical fraud evaluation. The workload is structured, repetitive, and latency-sensitive.

Decision

Fraud scoring runs through a dedicated Python FastAPI service backed by a specialized model such as CatBoost, rather than through a general LLM stack.

Consequences

Pros: Lower latency, lower cost, and a better fit for tabular inference.
Cons: Introduces a polyglot architecture with Java and Python services.

ADR-009: Decouple Model Development from Network Integration

Context

Building the ML model and the Java-to-Python integration at the same time increases delivery risk.

Decision

The service boundary is validated first with simple models and synthetic data. Once the transport path is stable, the production model can be introduced behind the same interface.

Consequences

Pros: Reduces integration risk and isolates failures earlier.
Cons: Requires temporary tooling that is not part of the final product.

ADR-010: Manual Dockerfile for the Python ML Service

Context

Python ML services often depend on low-level system packages that automated container tooling does not consistently infer.

Decision

The Python service is containerized with a hand-written Dockerfile instead of an automated buildpack flow.

Consequences

Pros: The runtime image is explicit and reproducible.
Cons: Container maintenance is manual.

ADR-011: Hybrid Proxy Bidding with Atomic Redis Lua Persistence

Context

Proxy bidding and bid increments are vulnerable to race conditions when implemented as ordinary application-side read-modify-write logic.

Decision

Proxy bidding uses a hybrid design: the Java service resolves forward-path bidding outcomes, while Redis Lua scripts provide atomic state persistence and rollback recomputation.

Consequences

Pros: State transitions are atomic and do not require distributed locks.
Cons: Business logic is split across Java and Lua.

ADR-012: Testcontainers for Redis Integration Testing

Context

Mocking cannot fully validate Redis Lua behavior or script execution semantics.

Decision

Integration tests run against ephemeral Redis containers using Testcontainers.

Consequences

Pros: Tests execute against a real Redis engine and real scripts.
Cons: Test runs are slower and require Docker.

ADR-013: Epoch Time for Cross-Boundary Comparisons

Context

Soft-close logic extends auction end times inside Lua scripts. Passing high-level Java time objects across the Java-Lua boundary created serialization and comparison issues.

Decision

Time values are passed as epoch seconds.

Consequences

Pros: Comparisons remain numeric and predictable across language boundaries.
Cons: Requires explicit time conversion, increasing the risk of careless mistakes (with time units, for example).

ADR-014: Asynchronous Fraud Detection via Redis Streams

Context

Scoring each bid synchronously would add network latency and model overhead directly to the bid path. That cost is least acceptable at the exact moment bidding traffic is highest.

Decision

Fraud analysis is decoupled from bid execution through Redis Streams. The Bidding Engine writes bid events to a stream, and Sentinel consumes them asynchronously for downstream scoring and enforcement.

Consequences

Pros: Bid execution remains fast, and Sentinel can scale independently of the core auction service.
Cons: Fraud enforcement becomes eventually consistent. A malicious bid may be accepted briefly before later review and reversal.

ADR-015: Micro-Batching for ML Inference

Context

Submitting one HTTP request to the ML service for every individual bid creates unnecessary transport overhead and does not use the scoring service efficiently.

Decision

Sentinel uses micro-batching with a bounded time and size window. Bid events are accumulated briefly, then submitted to the ML service as a batch.

Consequences

Pros: Reduces per-request overhead and improves inference efficiency.
Cons: Adds a small delay to the fraud pipeline while the batch window fills.

ADR-016: Smart Default Routing for Polyglot Development

Context

A Java service and a Python service do not always run in the same environment during development. One may run in Docker while the other runs directly in an IDE.

Decision

Service URLs are configured with sensible defaults and environment overrides, allowing the same codebase to run in Docker Compose or locally with minimal setup.

Consequences

Pros: Improves developer experience and reduces setup friction across environments.
Cons: Developers still need to understand which effective route is active at runtime.

Created by Justin Walker

Architecture Decisions

ADR-001: Reactive Redis for Active Auction State

Context

Decision

Consequences

ADR-002: Java Records for Domain Models

Context

Decision

Consequences

ADR-003: ReactiveRedisTemplate with JSON Serialization

Context

Decision

Consequences

ADR-004: Mono.defer() for Stateful Reactive Tests

Context

Decision

Consequences

ADR-005: Separate Sentinel State from Core Auction State

Context

Decision

Consequences

ADR-006: Feedback Loop for Fraud Decisions

Context

Decision

Consequences

ADR-007: Event Sourcing for Reliable Rollbacks

Context

Decision

Consequences

ADR-008: Specialized ML Service for Fraud Scoring

Context

Decision

Consequences

ADR-009: Decouple Model Development from Network Integration

Context

Decision

Consequences

ADR-010: Manual Dockerfile for the Python ML Service

Context

Decision

Consequences

ADR-011: Hybrid Proxy Bidding with Atomic Redis Lua Persistence

Context

Decision

Consequences

ADR-012: Testcontainers for Redis Integration Testing

Context

Decision

Consequences

ADR-013: Epoch Time for Cross-Boundary Comparisons

Context

Decision

Consequences

ADR-014: Asynchronous Fraud Detection via Redis Streams

Context

Decision

Consequences

ADR-015: Micro-Batching for ML Inference

Context

Decision

Consequences

ADR-016: Smart Default Routing for Polyglot Development

Context

Decision

Consequences

ADR-004: `Mono.defer()` for Stateful Reactive Tests