# Sentinel Service

Component responsible for rate-limit enforcement and asynchronous fraud screening.
## Purpose
Sentinel serves two roles. On the synchronous path, it applies Redis-backed token bucket decisions before protected services absorb unnecessary traffic. Off the hot path, it consumes bid events from Redis Streams, batches them for model scoring, and records enforcement outcomes for the rest of the system.
## Responsibilities
- Enforce per-client request budgets through atomic token bucket decisions
- Share rate-limit state across multiple service instances
- Consume bid events from Redis Streams
- Forward micro-batches to the Python `sentinel-ml` service
- Write flagged actors to the `banned_users` Redis set
## Rate-Limit Decision Flow
```mermaid
sequenceDiagram
    participant C as Client
    participant S as Sentinel Service
    participant R as Redis
    C->>S: POST /check
    S->>R: Execute token bucket Lua script
    R-->>S: Allowed / blocked + remaining tokens
    S-->>C: 200 OK or 429 Too Many Requests
```
The synchronous path is intentionally short. Sentinel receives a request, delegates the bucket update to Redis, and returns the decision without additional coordination.
## Fraud Analysis Flow
```mermaid
sequenceDiagram
    participant X as Redis Stream
    participant S as Sentinel Service
    participant M as sentinel-ml
    participant R as Redis
    S->>X: Read bid events
    S->>S: Form micro-batch
    S->>M: Submit batch for scoring
    M-->>S: Fraud predictions
    S->>R: Add flagged actors to banned_users
```
This pipeline stays off the request path. Bids are scored asynchronously, and enforcement data is written back to Redis without adding latency to bid execution.
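The batching step above can be sketched in plain Java. This is an illustrative, in-memory model using a simple size-based flush trigger; the `MicroBatcher` class and its names are assumptions, and the real service reads events from Redis Streams and submits batches to `sentinel-ml` rather than collecting them in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of size-based micro-batching (names are hypothetical).
public class MicroBatcher {
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> flushed = new ArrayList<>();

    public MicroBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    // Accumulate one bid event; flush a full batch when the size threshold is hit.
    public void add(String bidEvent) {
        buffer.add(bidEvent);
        if (buffer.size() >= batchSize) {
            flushed.add(new ArrayList<>(buffer)); // in the real service: POST to sentinel-ml
            buffer.clear();
        }
    }

    public List<List<String>> flushedBatches() {
        return flushed;
    }

    public static void main(String[] args) {
        MicroBatcher batcher = new MicroBatcher(3);
        for (int i = 1; i <= 7; i++) {
            batcher.add("bid-" + i);
        }
        // 7 events with batch size 3: two full batches flushed, one event still buffered
        System.out.println(batcher.flushedBatches().size()); // prints 2
    }
}
```

A production consumer would also flush on a time window so a trickle of events is not delayed indefinitely; that trigger is omitted here for brevity.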
## Token Bucket Model
Each caller is identified by the `X-User-ID` header. Redis stores the current token count and last refill time for that caller. On each request, Sentinel:
- Computes elapsed time since the last refill
- Adds newly earned tokens, capped at bucket capacity
- Deducts the request cost when enough tokens are available
- Returns a deny response when capacity is exhausted
All four steps run inside one Lua script so refill and consumption happen as a single atomic operation.
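The four steps above can be sketched as an in-memory model. This is a sketch of the token bucket arithmetic only, not the actual Lua script: in the service, the same logic runs inside Redis so that refill and consumption are atomic across instances, and the class and method names here are illustrative assumptions.

```java
// In-memory sketch of the token bucket math (illustrative, not the real script).
public class TokenBucket {
    private final double capacity;
    private final double ratePerSecond;
    private double tokens;
    private long lastRefillMillis;

    public TokenBucket(double capacity, double ratePerSecond, long nowMillis) {
        this.capacity = capacity;
        this.ratePerSecond = ratePerSecond;
        this.tokens = capacity; // start with a full bucket
        this.lastRefillMillis = nowMillis;
    }

    // In the real service these steps run as one Lua script, so the
    // read-refill-consume sequence is a single atomic Redis operation.
    public boolean tryConsume(double cost, long nowMillis) {
        // 1. Compute elapsed time since the last refill
        double elapsedSeconds = (nowMillis - lastRefillMillis) / 1000.0;
        // 2. Add newly earned tokens, capped at bucket capacity
        tokens = Math.min(capacity, tokens + elapsedSeconds * ratePerSecond);
        lastRefillMillis = nowMillis;
        // 3. Deduct the request cost when enough tokens are available
        if (tokens >= cost) {
            tokens -= cost;
            return true;
        }
        // 4. Deny when capacity is exhausted
        return false;
    }
}
```

For example, a bucket with `capacity=2` and `rate=1` allows two immediate requests, denies the third, and allows one more after a one-second wait has earned back a token.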
## Runtime Stack
| Layer | Technology |
|---|---|
| Language | Java 25 |
| Framework | Spring Boot 4 / WebFlux |
| State Store | Redis |
| Coordination | Lua scripts |
| Stream Processing | Redis Streams |
| ML Integration | Python FastAPI (sentinel-ml) |
## Quick Start
```yaml
services:
  sentinel-service:
    build: ./sentinel-service
    ports:
      - "8081:8081"
    environment:
      - SPRING_DATA_REDIS_HOST=redis
    depends_on:
      - redis
  redis:
    image: redis:7.2-alpine
    ports:
      - "6379:6379"
```
```bash
docker compose up -d
```
## Local Development
### Start Redis

```bash
docker run -d -p 6379:6379 --name sentinel-redis redis:7.2-alpine
```
### Run the service

```bash
./mvnw spring-boot:run
```
## API
### Endpoint

`POST /check`
### Parameters

- `capacity` - maximum number of tokens in the bucket
- `rate` - refill rate in tokens per second
- `cost` - number of tokens consumed by the request
- `X-User-ID` header - unique identifier for the caller
### Example

```bash
curl -X POST "http://localhost:8081/check?capacity=10&rate=1&cost=1" \
  -H "X-User-ID: test_user"
```
### Response

```json
{
  "allowed": true
}
```
## API Documentation
When the service is running locally, the OpenAPI documentation is available at `http://localhost:8081/swagger-ui.html`.
Created by Justin Walker