# Sentinel Service

Component responsible for rate-limit enforcement and asynchronous fraud screening.
## Purpose
Sentinel serves two roles. On the synchronous path, it applies Redis-backed token bucket decisions before protected services absorb unnecessary traffic. Off the hot path, it consumes bid events from Redis Streams, batches them for model scoring, and records enforcement outcomes for the rest of the system.
## Responsibilities
- Enforce per-client request budgets through atomic token bucket decisions
- Share rate-limit state across multiple service instances
- Consume bid events from Redis Streams
- Forward micro-batches to the Python `sentinel-ml` service
- Write flagged actors to the `banned_users` Redis set
## Rate-Limit Decision Flow
```mermaid
sequenceDiagram
    participant C as Client
    participant S as Sentinel Service
    participant R as Redis
    C->>S: POST /check
    S->>R: Execute token bucket Lua script
    R-->>S: Allowed / blocked + remaining tokens
    S-->>C: 200 OK or 429 Too Many Requests
```
The synchronous path is intentionally short. Sentinel receives a request, delegates the bucket update to Redis, and returns the decision without additional coordination.
## Fraud Analysis Flow
```mermaid
sequenceDiagram
    participant X as Redis Stream
    participant S as Sentinel Service
    participant M as sentinel-ml
    participant R as Redis
    S->>X: Read bid events
    S->>S: Form micro-batch
    S->>M: Submit batch for scoring
    M-->>S: Fraud predictions
    S->>R: Add flagged actors to banned_users
```
This pipeline stays off the request path. Bids are scored asynchronously, and enforcement data is written back to Redis without adding latency to bid execution.
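The batching step above can be sketched in plain Java. This is an illustrative, in-memory model using a simple size-based flush trigger; the `MicroBatcher` class and its names are assumptions, and the real service reads events from Redis Streams and submits batches to `sentinel-ml` rather than collecting them in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-based micro-batching (names are hypothetical).
public class MicroBatcher {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    public MicroBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Accumulate one bid event; flush a full batch when the size threshold is hit.
    public void add(String bidEvent) {
        buffer.add(bidEvent);
        if (buffer.size() >= batchSize) {
            flushed.add(new ArrayList<>(buffer)); // in the real service: POST to sentinel-ml
            buffer.clear();
        }
    }

    public List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        MicroBatcher batcher = new MicroBatcher(3);
        for (int i = 1; i <= 7; i++) {
            batcher.add("bid-" + i);
        }
        // 7 events with batch size 3: two full batches flushed, one event still buffered
        System.out.println(batcher.flushedBatches().size()); // prints 2
    }
}
```

A production consumer would also flush on a time window so a trickle of events is not delayed indefinitely; that trigger is omitted here for brevity.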
## Token Bucket Model
Each caller is identified by the `X-User-ID` header. Redis stores the current token count and last refill time for that caller. On each request, Sentinel:
- Computes elapsed time since the last refill
- Adds newly earned tokens, capped at bucket capacity
- Deducts the request cost when enough tokens are available
- Returns a deny response when capacity is exhausted
All four steps run inside one Lua script so refill and consumption happen as a single atomic operation.
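The four steps above can be sketched as an in-memory model. This is a sketch of the token bucket arithmetic only, not the actual Lua script: in the service, the same logic runs inside Redis so that refill and consumption are atomic across instances, and the class and method names here are illustrative assumptions.

```java
// In-memory sketch of the token bucket math (illustrative, not the real script).
public class TokenBucket {
    private final double capacity;
    private final double ratePerSecond;
    private double tokens;
    private long lastRefillMillis;

    public TokenBucket(double capacity, double ratePerSecond, long nowMillis) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    // In the real service these steps run as one Lua script, so the
    // read-refill-consume sequence is a single atomic Redis operation.
    public boolean tryConsume(double cost, long nowMillis) {
        // 1. Compute elapsed time since the last refill
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        // 2. Add newly earned tokens, capped at bucket capacity
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillMillis = nowMillis;
        // 3. Deduct the request cost when enough tokens are available
        if (tokens >= cost) {
            tokens -= cost;
            return true;
        }
        // 4. Deny when capacity is exhausted
        return false;
    }
}
```

For example, a bucket with `capacity=2` and `rate=1` allows two immediate requests, denies the third, and allows one more after a one-second wait has earned back a token.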
## Runtime Stack
| Layer | Technology |
|---|---|
| Language | Java 25 |
| Framework | Spring Boot 4 / WebFlux |
| State Store | Redis |
| Coordination | Lua scripts |
| Stream Processing | Redis Streams |
| ML Integration | Python FastAPI (sentinel-ml) |
## Quick Start
```yaml
services:
  sentinel-service:
    build: ./sentinel-service
    ports:
      - "8081:8081"
    environment:
      - SPRING_DATA_REDIS_HOST=redis
    depends_on:
      - redis
  redis:
    image: redis:7.2-alpine
    ports:
      - "6379:6379"
```
```bash
docker compose up -d
```
## Local Development
### Start Redis

```bash
docker run -d -p 6379:6379 --name sentinel-redis redis:7.2-alpine
```
### Run the service

```bash
./mvnw spring-boot:run
```
## API
### Endpoint

`POST /check`
### Parameters

- `capacity` - maximum number of tokens in the bucket
- `rate` - refill rate in tokens per second
- `cost` - number of tokens consumed by the request
- `X-User-ID` header - unique identifier for the caller
### Example

```bash
curl -X POST "http://localhost:8081/check?capacity=10&rate=1&cost=1" \
  -H "X-User-ID: test_user"
```
### Response

```json
{
  "allowed": true
}
```
## API Documentation
When the service is running locally, the OpenAPI documentation is available at `http://localhost:8081/swagger-ui.html`.
Created by Justin Walker