Day 10: Design E-Commerce & Rate Limiter

What You'll Learn Today

  • E-commerce platform architecture
  • Product catalog and search with Elasticsearch
  • Shopping cart and order processing
  • Inventory management and preventing overselling
  • Distributed transactions with the Saga pattern
  • Rate limiter algorithms: Token Bucket, Leaky Bucket, Sliding Window
  • Distributed rate limiting
  • Final interview tips and advice

Part 1: E-Commerce Platform

Requirements

  • Browse and search products
  • Add to cart and checkout
  • Payment processing
  • Inventory management (no overselling)
  • Order tracking
  • Scale: millions of products, thousands of orders per second during peak sales

High-Level Architecture

flowchart TB
    C["Client (Web/Mobile)"]
    GW["API Gateway"]
    subgraph Services["Microservices"]
        PS["Product Service"]
        CS["Cart Service"]
        OS["Order Service"]
        PAY["Payment Service"]
        INV["Inventory Service"]
        NS["Notification Service"]
    end
    ES["Elasticsearch"]
    CA["Redis Cache"]
    MQ["Message Queue"]
    DB1[("Product DB")]
    DB2[("Order DB")]
    DB3[("Inventory DB")]

    C --> GW --> Services
    PS --> DB1
    PS --> ES
    PS --> CA
    CS --> CA
    OS --> DB2
    INV --> DB3
    OS --> MQ
    MQ --> PAY & INV & NS

    style GW fill:#f59e0b,color:#fff
    style Services fill:#3b82f6,color:#fff
    style ES fill:#8b5cf6,color:#fff
    style CA fill:#ef4444,color:#fff
    style MQ fill:#22c55e,color:#fff

Product Catalog Service

The product catalog needs to handle:

  • Millions of products with varying attributes
  • Full-text search with facets (category, price range, brand)
  • Fast reads, infrequent writes

Data model:

{
  "product_id": "p_123",
  "name": "Wireless Headphones",
  "description": "Noise-cancelling Bluetooth headphones...",
  "category": ["electronics", "audio"],
  "price": 79.99,
  "attributes": {
    "brand": "AudioTech",
    "color": "black",
    "battery_life": "30 hours"
  },
  "images": ["img1.jpg", "img2.jpg"],
  "rating": 4.5,
  "review_count": 1234
}

Search with Elasticsearch

flowchart LR
    PS["Product Service"]
    DB[("Primary DB (PostgreSQL)")]
    CDC["Change Data Capture"]
    ES["Elasticsearch"]
    C["Client Search"]

    PS --> DB
    DB --> CDC --> ES
    C --> ES

    style DB fill:#8b5cf6,color:#fff
    style CDC fill:#f59e0b,color:#fff
    style ES fill:#3b82f6,color:#fff

  • Primary DB (PostgreSQL): Source of truth for product data
  • CDC (Debezium): Captures row-level changes from the database and streams them to Elasticsearch
  • Elasticsearch: Full-text search, faceted search, autocomplete, relevance ranking
  • Search features: Fuzzy matching, synonyms, weighted fields, aggregations
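
To make the search features above concrete, here is a minimal sketch of a request body in Elasticsearch's Query DSL, built as a plain Python dictionary. The function name `build_product_search` is hypothetical, and the sketch assumes that `attributes.brand` and `category` are mapped as `keyword` fields; neither detail comes from the design above.

```python
import json
from typing import Any, Optional


def build_product_search(
    text: str,
    brand: Optional[str] = None,
    max_price: Optional[float] = None,
) -> dict[str, Any]:
    """Build a Query DSL body for a faceted product search (illustrative)."""
    filters: list[dict[str, Any]] = []
    if brand is not None:
        # Exact-match filter; assumes attributes.brand is a keyword field.
        filters.append({"term": {"attributes.brand": brand}})
    if max_price is not None:
        filters.append({"range": {"price": {"lte": max_price}}})

    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": text,
                            # Weighted fields: a hit in name counts 3x.
                            "fields": ["name^3", "description"],
                            # Fuzzy matching tolerates small typos.
                            "fuzziness": "AUTO",
                        }
                    }
                ],
                # Filters narrow results without affecting relevance scores.
                "filter": filters,
            }
        },
        # Aggregations produce the facet counts shown next to the results.
        "aggs": {
            "brands": {"terms": {"field": "attributes.brand"}},
            "categories": {"terms": {"field": "category"}},
        },
    }


if __name__ == "__main__":
    body = build_product_search("wireles headphones", brand="AudioTech", max_price=100)
    print(json.dumps(body, indent=2))
```

Putting brand and price in `filter` rather than `must` keeps them out of relevance scoring, which is usually what a facet sidebar wants.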

Shopping Cart Service

flowchart TB
    subgraph Guest["Guest User"]
        G1["Store cart in browser (localStorage)"]
        G2["On login → merge with server cart"]
    end
    subgraph LoggedIn["Logged-in User"]
        L1["Store cart in Redis"]
        L2["TTL: 7-30 days"]
        L3["Persist to DB periodically"]
    end
    style Guest fill:#f59e0b,color:#fff
    style LoggedIn fill:#3b82f6,color:#fff

Why Redis for carts?

  • Fast reads and writes
  • TTL for automatic expiration of abandoned carts
  • Supports atomic operations (increment quantity)
  • Memory-efficient for small data structures
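
The guest-to-server merge on login can be sketched as a small pure function. The merge policy here (sum quantities, cap each line at `max_qty`) is an assumption for illustration; a real store might instead prefer the server cart or ask the user.

```python
def merge_carts(
    guest: dict[str, int],
    server: dict[str, int],
    max_qty: int = 99,
) -> dict[str, int]:
    """Merge a guest cart into the server cart.

    Both carts map product_id -> quantity. Quantities for the same
    product are summed and capped at max_qty (an assumed policy).
    """
    merged = dict(server)  # copy so the caller's server cart is not mutated
    for product_id, qty in guest.items():
        if qty <= 0:
            continue  # ignore invalid lines from the browser
        merged[product_id] = min(merged.get(product_id, 0) + qty, max_qty)
    return merged


if __name__ == "__main__":
    print(merge_carts({"p_123": 2, "p_9": 1}, {"p_123": 1}))
```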

Order and Payment Processing

sequenceDiagram
    participant U as User
    participant OS as Order Service
    participant INV as Inventory Service
    participant PAY as Payment Service
    participant NS as Notification Service

    U->>OS: Place order
    OS->>INV: Reserve inventory
    INV->>OS: Reserved
    OS->>PAY: Process payment
    PAY->>OS: Payment confirmed
    OS->>INV: Confirm deduction
    OS->>NS: Send confirmation
    NS->>U: Order confirmed email

Inventory Management: Preventing Overselling

One of the hardest problems in e-commerce is preventing overselling under high traffic, such as during flash sales.

flowchart TB
    subgraph Problem["Race Condition"]
        direction TB
        P1["User A reads: stock = 1"]
        P2["User B reads: stock = 1"]
        P3["User A: stock = 1 - 1 = 0 ✓"]
        P4["User B: stock = 1 - 1 = 0 ✓"]
        P5["Result: stock = 0, but 2 units sold (OVERSOLD)"]
        P1 --> P3
        P2 --> P4
        P3 --> P5
        P4 --> P5
    end
    subgraph Solution["Solutions"]
        direction TB
        S1["Pessimistic Lock\nSELECT ... FOR UPDATE"]
        S2["Optimistic Lock\nWHERE version = N"]
        S3["Redis Atomic\nDECR with Lua script"]
    end
    style Problem fill:#ef4444,color:#fff
    style Solution fill:#22c55e,color:#fff

| Strategy | How | Pros | Cons |
|---|---|---|---|
| Pessimistic locking | SELECT ... FOR UPDATE | Strong consistency | Low throughput, deadlocks |
| Optimistic locking | Version check on update | Better throughput | Retry on conflict |
| Redis atomic ops | Lua script: check + decrement | Very fast | Data loss risk if Redis fails |
| Queue-based | Serialize requests per product | No conflicts | Higher latency |
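
As an illustration of the optimistic-locking row, the following sketch uses Python's built-in `sqlite3` module so it runs without a database server. The `inventory` table layout, the `demo_db` helper, and the retry count are assumptions made for the example, not part of the design above.

```python
import sqlite3


def demo_db(stock: int = 1) -> sqlite3.Connection:
    """Create an in-memory inventory table with one product (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE inventory ("
        "product_id TEXT PRIMARY KEY, stock INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO inventory VALUES ('p_123', ?, 0)", (stock,))
    conn.commit()
    return conn


def reserve_one(conn: sqlite3.Connection, product_id: str, max_retries: int = 3) -> bool:
    """Take one unit of stock using a version check; False if sold out or contended."""
    for _ in range(max_retries):
        row = conn.execute(
            "SELECT stock, version FROM inventory WHERE product_id = ?",
            (product_id,),
        ).fetchone()
        if row is None or row[0] <= 0:
            return False  # unknown product or out of stock
        stock, version = row
        cur = conn.execute(
            "UPDATE inventory SET stock = ?, version = ? "
            "WHERE product_id = ? AND version = ?",
            (stock - 1, version + 1, product_id, version),
        )
        conn.commit()
        if cur.rowcount == 1:
            return True  # our version was still current, so the write won
        # rowcount == 0: another writer bumped the version; re-read and retry
    return False


if __name__ == "__main__":
    conn = demo_db(stock=1)
    print(reserve_one(conn, "p_123"), reserve_one(conn, "p_123"))
```

The `WHERE version = ?` clause is what turns a lost update into a detectable conflict: a stale writer matches zero rows and must re-read.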

Recommended for flash sales: Use Redis for the fast path (decrement atomically), then asynchronously confirm with the database. This handles the burst while maintaining eventual consistency.

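-- Redis Lua script for atomic inventory check + decrement
-- KEYS[1] is the stock key; GET returns false when the key does not exist,
-- so guard against a nil value before comparing
local stock = tonumber(redis.call('GET', KEYS[1]))
if stock and stock > 0 then
    redis.call('DECR', KEYS[1])
    return 1  -- success
end
return 0  -- out of stock (or key missing)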
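
To see why making the check and the decrement one indivisible step matters, here is an in-process stand-in that needs no Redis server. A `threading.Lock` plays the role that Redis's one-script-at-a-time execution plays for the Lua script; `AtomicStock` and `run_flash_sale` are hypothetical names introduced only for this sketch.

```python
import threading


class AtomicStock:
    """In-memory stock counter with an atomic check-and-decrement."""

    def __init__(self, initial: int) -> None:
        self._stock = initial
        self._lock = threading.Lock()

    def try_decrement(self) -> bool:
        # The check and the write happen under one lock, so no other
        # thread can observe the stock between them.
        with self._lock:
            if self._stock > 0:
                self._stock -= 1
                return True
            return False

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._stock


def run_flash_sale(stock: int, buyers: int) -> int:
    """Let `buyers` threads race for `stock` units; return the number sold."""
    counter = AtomicStock(stock)
    results: list[bool] = []
    results_lock = threading.Lock()

    def buy() -> None:
        ok = counter.try_decrement()
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print(run_flash_sale(stock=10, buyers=200))
```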

Distributed Transactions: Saga Pattern

In microservices, you can't use a single database transaction across services. The Saga pattern breaks a transaction into a sequence of local transactions with compensating actions.

flowchart LR
    subgraph Forward["Happy Path"]
        direction LR
        T1["1. Create Order"]
        T2["2. Reserve Inventory"]
        T3["3. Process Payment"]
        T4["4. Confirm Order"]
        T1 --> T2 --> T3 --> T4
    end
    subgraph Compensate["Failure Compensation"]
        direction RL
        C3["3. Refund Payment"]
        C2["2. Release Inventory"]
        C1["1. Cancel Order"]
        C3 --> C2 --> C1
    end
    T3 -.->|"Payment fails"| C2
    T4 -.->|"Confirmation fails"| C3

    style Forward fill:#22c55e,color:#fff
    style Compensate fill:#ef4444,color:#fff

| Saga Type | Coordination | Pros | Cons |
|---|---|---|---|
| Choreography | Events between services | Decoupled | Hard to track |
| Orchestration | Central orchestrator | Clear flow | Single point of coordination |
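
A minimal sketch of the orchestration style, assuming each step is a pair of in-process callables (an action and its compensation). In a real system these would be calls to separate services and the orchestrator would persist its progress; `run_saga` and the step names are hypothetical.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step]) -> list[str]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns an event log so the outcome is easy to inspect.
    """
    log: list[str] = []
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # Only steps that actually completed need to be undone.
            for prev_name, _, compensate in reversed(done):
                compensate()
                log.append(f"{prev_name}: compensated")
            return log
        done.append(step)
        log.append(f"{name}: ok")
    return log


def noop() -> None:
    """Placeholder for a local transaction or its compensation."""


def decline_payment() -> None:
    raise RuntimeError("card declined")


if __name__ == "__main__":
    order_saga: list[Step] = [
        ("create_order", noop, noop),
        ("reserve_inventory", noop, noop),
        ("process_payment", decline_payment, noop),
        ("confirm_order", noop, noop),
    ]
    for line in run_saga(order_saga):
        print(line)
```

Note that the failed step itself is not compensated: a payment that never went through has nothing to refund, so the undo starts with the inventory reservation.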

Part 2: Rate Limiter

Why Rate Limiting?

  • Prevent abuse and DDoS attacks
  • Control costs
  • Ensure fair usage
  • Protect downstream services

Algorithm Comparison

flowchart TB
    subgraph TK["Token Bucket"]
        direction TB
        TK1["Bucket holds N tokens"]
        TK2["Tokens added at rate R"]
        TK3["Request takes 1 token"]
        TK4["Empty → reject"]
        TK1 --> TK2 --> TK3 --> TK4
    end
    subgraph LK["Leaky Bucket"]
        direction TB
        LK1["Queue of fixed size"]
        LK2["Process at constant rate"]
        LK3["Full queue → reject"]
        LK1 --> LK2 --> LK3
    end
    subgraph SW["Sliding Window"]
        direction TB
        SW1["Track timestamps"]
        SW2["Count in window"]
        SW3["Over limit → reject"]
        SW1 --> SW2 --> SW3
    end
    style TK fill:#3b82f6,color:#fff
    style LK fill:#8b5cf6,color:#fff
    style SW fill:#22c55e,color:#fff

Token Bucket

Bucket capacity: 10 tokens
Refill rate: 2 tokens/second

Time 0s: 10 tokens (full)
Requests: 10 → 0 tokens left (all allowed)
Time 1s: 2 tokens (refilled)
Requests: 3 → 2 allowed, 1 rejected

Pros: Allows bursts up to bucket size, simple, memory-efficient

Cons: Tuning bucket size and refill rate requires experimentation
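
The walkthrough above can be reproduced with a small single-process sketch. The `TokenBucket` class and its injectable `clock` parameter are assumptions made so the example is deterministic; a production limiter would also need per-key state and thread safety.

```python
import time
from typing import Callable


class TokenBucket:
    """Single-process token bucket with lazy refill."""

    def __init__(
        self,
        capacity: float,
        refill_rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self._clock = clock
        self._tokens = capacity  # start full
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill lazily based on elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = TokenBucket(capacity=10, refill_rate=2, clock=lambda: fake_now[0])
    print(sum(bucket.allow() for _ in range(10)))  # burst at t=0
    fake_now[0] = 1.0
    print([bucket.allow() for _ in range(3)])  # one second later
```

Refilling on each call instead of with a background timer means the only state per key is a token count and a timestamp.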

Leaky Bucket

Queue size: 5
Processing rate: 2 requests/second

Requests arrive in burst: 8 requests
→ 5 queued (processed at steady rate)
→ 3 rejected (queue full)

Pros: Smooth output rate, prevents bursts

Cons: Recent requests may be delayed, no burst tolerance
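
A minimal sketch of the same scenario, assuming a bounded in-memory queue and a worker that calls `drain()` periodically. The `LeakyBucket` class, its method names, and the injectable `clock` are hypothetical and exist only to make the example deterministic.

```python
import time
from collections import deque
from typing import Callable


class LeakyBucket:
    """Bounded queue drained at a constant rate."""

    def __init__(
        self,
        queue_size: int,
        rate: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.queue_size = queue_size
        self.rate = rate  # requests processed per second
        self._clock = clock
        self._queue: deque[str] = deque()
        self._last_drain = clock()

    def offer(self, request: str) -> bool:
        """Enqueue a request, or reject it if the queue is full."""
        if len(self._queue) >= self.queue_size:
            return False
        self._queue.append(request)
        return True

    def drain(self) -> list[str]:
        """Return the requests that are due for processing by now."""
        now = self._clock()
        due = int((now - self._last_drain) * self.rate)
        if due <= 0:
            return []
        # Advance by whole slots only, keeping any fractional remainder.
        self._last_drain += due / self.rate
        count = min(due, len(self._queue))
        return [self._queue.popleft() for _ in range(count)]


if __name__ == "__main__":
    fake_now = [0.0]
    bucket = LeakyBucket(queue_size=5, rate=2, clock=lambda: fake_now[0])
    print([bucket.offer(f"r{i}") for i in range(8)])
    fake_now[0] = 1.0
    print(bucket.drain())
```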

Sliding Window Counter

Window: 1 minute, Limit: 100

Current time: 12:01:30
Previous window (12:00-12:01): 80 requests
Current window (12:01-12:02): 30 requests so far

Weighted count = 80 * 50% + 30 = 70 (under limit → allow)
(50% because we're halfway through the current window, so half of the previous window still falls inside the trailing minute)

Pros: Smooth, no burst at window boundaries

Cons: Approximate (but good enough for most use cases)
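
The weighted-count arithmetic above can be written as a short function. The names `sliding_window_estimate` and `allow_request` are hypothetical, and the sketch assumes requests in the previous window were spread evenly, which is the approximation this algorithm makes.

```python
def sliding_window_estimate(
    prev_count: int,
    curr_count: int,
    elapsed_fraction: float,
) -> float:
    """Estimate requests in the trailing window from two fixed-window counts.

    elapsed_fraction is how far we are into the current window (0.0 to 1.0);
    the previous window contributes the part that has not yet slid out.
    """
    if not 0.0 <= elapsed_fraction <= 1.0:
        raise ValueError("elapsed_fraction must be between 0 and 1")
    return prev_count * (1.0 - elapsed_fraction) + curr_count


def allow_request(
    prev_count: int,
    curr_count: int,
    elapsed_fraction: float,
    limit: int,
) -> bool:
    """Allow the next request only while the estimate is below the limit."""
    return sliding_window_estimate(prev_count, curr_count, elapsed_fraction) < limit


if __name__ == "__main__":
    # 12:01:30 is halfway through the 12:01-12:02 window.
    print(sliding_window_estimate(80, 30, 0.5))
    print(allow_request(80, 30, 0.5, limit=100))
```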

Comparison Table

| Algorithm | Memory | Burst Allowed | Accuracy | Complexity |
|---|---|---|---|---|
| Token Bucket | Low | Yes | Good | Low |
| Leaky Bucket | Low | No | Good | Low |
| Fixed Window | Low | Edge bursts | Moderate | Low |
| Sliding Window Log | High | No | Exact | Medium |
| Sliding Window Counter | Low | Minimal | Approximate | Low |

Distributed Rate Limiting

flowchart TB
    C["Client"]
    LB["Load Balancer"]
    S1["Server 1"]
    S2["Server 2"]
    S3["Server 3"]
    RD["Redis (Centralized Counter)"]

    C --> LB
    LB --> S1 & S2 & S3
    S1 & S2 & S3 -->|"Check/increment"| RD

    style LB fill:#f59e0b,color:#fff
    style RD fill:#ef4444,color:#fff

Without a shared store, each server tracks its own count, so a user whose requests land on different servers can exceed the limit. Use Redis for centralized, atomic counter management.

Redis implementation:

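-- Per-minute counter in Redis: the building block for the sliding window counter
-- The 120-second expiry keeps the previous minute's key readable for weighting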
MULTI
INCR   user:123:minute:202501301200
EXPIRE user:123:minute:202501301200 120
EXEC
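
The per-minute key scheme above can be exercised without a Redis server. In this sketch `InMemoryCounters` is a hypothetical stand-in for Redis `INCR`/`GET`, and `check_and_count` reads the current and previous minute keys to form the sliding window estimate. The read-then-increment here is not atomic across servers; against real Redis it would typically be wrapped in a Lua script.

```python
from datetime import datetime, timedelta, timezone


def minute_key(user_id: str, when: datetime) -> str:
    """Build a key such as user:123:minute:202501301200."""
    return f"user:{user_id}:minute:{when:%Y%m%d%H%M}"


class InMemoryCounters:
    """Hypothetical stand-in for Redis INCR/GET (no expiry, demo only)."""

    def __init__(self) -> None:
        self._data: dict[str, int] = {}

    def incr(self, key: str) -> int:
        self._data[key] = self._data.get(key, 0) + 1
        return self._data[key]

    def get(self, key: str) -> int:
        return self._data.get(key, 0)


def check_and_count(
    store: InMemoryCounters,
    user_id: str,
    now: datetime,
    limit: int,
) -> bool:
    """Allow and count the request if the sliding estimate is under the limit."""
    curr_key = minute_key(user_id, now)
    prev_key = minute_key(user_id, now - timedelta(minutes=1))
    # Fraction of the current minute that has already elapsed.
    elapsed = (now.second + now.microsecond / 1_000_000) / 60.0
    estimate = store.get(prev_key) * (1.0 - elapsed) + store.get(curr_key)
    if estimate >= limit:
        return False
    store.incr(curr_key)
    return True


if __name__ == "__main__":
    store = InMemoryCounters()
    t = datetime(2025, 1, 30, 12, 0, 10, tzinfo=timezone.utc)
    print(minute_key("123", t))
    print([check_and_count(store, "123", t, limit=3) for _ in range(4)])
```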

Rate Limiting at Different Layers

| Layer | What | Example |
|---|---|---|
| Client | Prevent unnecessary requests | Debounce, disable button |
| CDN/Edge | Block at network edge | Cloudflare rate rules |
| API Gateway | Per-user/API key limits | Kong, AWS API Gateway |
| Application | Business logic limits | Order frequency |
| Database | Connection pool limits | Max connections |

Interview Tips Recap

The System Design Interview Framework

flowchart LR
    R["1. Requirements\n(5 min)"]
    E["2. Estimation\n(5 min)"]
    H["3. High-Level Design\n(10 min)"]
    D["4. Deep Dive\n(15 min)"]
    W["5. Wrap Up\n(5 min)"]

    R --> E --> H --> D --> W

    style R fill:#3b82f6,color:#fff
    style E fill:#8b5cf6,color:#fff
    style H fill:#22c55e,color:#fff
    style D fill:#f59e0b,color:#fff
    style W fill:#ef4444,color:#fff

Top Mistakes to Avoid

| Mistake | Why It Hurts | What to Do Instead |
|---|---|---|
| Jumping to solution | Shows poor communication | Ask requirements first |
| Over-engineering | Shows lack of pragmatism | Start simple, add complexity |
| Ignoring tradeoffs | Shows shallow understanding | Always discuss pros/cons |
| Monologue | Misses interviewer's focus | Check in frequently |
| No estimation | Can't justify design choices | Quick back-of-envelope math |
| Forgetting failure cases | Shows inexperience | Discuss what happens when things break |

Final Advice

  1. Practice out loud: Talk through designs as if in an interview
  2. Draw diagrams: Visuals communicate architecture better than words
  3. Know your numbers: Latency, throughput, and storage estimates
  4. Trade-offs are everything: There's no perfect design, only appropriate ones
  5. Start with what you know: Build confidence with familiar components
  6. Stay calm: It's a conversation, not an exam

Summary

| Concept | Description |
|---|---|
| Product catalog | PostgreSQL + Elasticsearch via CDC |
| Shopping cart | Redis with TTL for fast access |
| Inventory | Atomic operations to prevent overselling |
| Saga pattern | Distributed transactions with compensation |
| Token Bucket | Allows bursts, simple rate limiting |
| Leaky Bucket | Smooth output, no bursts |
| Sliding Window | Balanced accuracy and memory |
| Distributed rate limiting | Centralized Redis counter |
| Interview framework | Requirements → Estimation → Design → Deep Dive → Wrap Up |

Key Takeaways

  1. E-commerce inventory management is a concurrency problem; use atomic operations
  2. The Saga pattern handles distributed transactions through compensation, not rollback
  3. Token Bucket is one of the most widely used rate limiting algorithms (used by AWS, Stripe, and others)
  4. In interviews, the process matters more than the final answer: communicate clearly and discuss tradeoffs

Practice Problems

Problem 1: Basic

Design a simple rate limiter using the Token Bucket algorithm. Define the data model, the check-and-consume logic, and how you'd configure different limits for different API endpoints.

Problem 2: Intermediate

Design a flash sale system for an e-commerce platform where 10,000 items are available and 1 million users try to buy at the same time. How do you handle the traffic spike, prevent overselling, and provide a fair experience?

Challenge

Design a ticket booking system (like Ticketmaster). Consider: seat selection with temporary holds, payment timeout handling, waiting rooms for high-demand events, and preventing scalper bots. How do you ensure a fair and reliable booking experience?


Congratulations!

You've completed Learn System Design in 10 Days! Over the past 10 days, you've built a comprehensive foundation in system design:

  • Days 1-2: Fundamentals (scaling, load balancing, caching, databases)
  • Days 3-4: Distributed systems (CAP theorem, consistency, messaging, storage)
  • Days 5-6: Architecture patterns (design patterns, microservices, API design)
  • Days 7-10: Real-world designs (URL shortener, social media, video streaming, e-commerce)

System design is a skill that improves with practice. Keep designing, keep questioning tradeoffs, and keep learning. Every system you encounter in your daily work is an opportunity to apply these principles.

Good luck with your interviews. You've got this!