Day 7: Design a URL Shortener
What You'll Learn Today
- How to approach a full system design interview
- Requirements gathering and estimation
- URL encoding strategies: Base62, hashing, counter-based
- Database selection for key-value lookups
- Caching strategies for hot URLs
- Analytics pipeline design
- Redirect behavior: 301 vs 302
Step 1: Requirements Clarification
Always start a system design interview by asking clarifying questions.
Functional Requirements
- Shorten: Given a long URL, generate a short unique URL
- Redirect: When users access the short URL, redirect to the original
- Custom aliases: Users can optionally pick a custom short URL
- Analytics: Track click counts, geographic data, referrer
- Expiration: URLs can have an optional expiry time
Non-Functional Requirements
- High availability: The redirect service must be always up
- Low latency: Redirects should happen in milliseconds
- Short URLs should not be predictable (security)
Step 2: Capacity Estimation
flowchart LR
subgraph Write["Write Load"]
W1["1M new URLs / day"]
W2["~12 URLs / sec"]
end
subgraph Read["Read Load"]
R1["100M reads / day"]
R2["~1,200 reads / sec"]
end
subgraph Storage["Storage (5 years)"]
S1["1M Γ 365 Γ 5 = 1.8B URLs"]
S2["~500 bytes/record"]
S3["Total: ~900 GB"]
end
style Write fill:#3b82f6,color:#fff
style Read fill:#22c55e,color:#fff
style Storage fill:#8b5cf6,color:#fff
| Metric | Value |
|---|---|
| Write ratio | 1M URLs/day (~12/sec) |
| Read ratio | 100M reads/day (~1,200/sec) |
| Read:Write ratio | 100:1 |
| URL record size | ~500 bytes |
| 5-year storage | ~900 GB |
| Short URL length | 7 characters (Base62 β 62^7 β 3.5 trillion) |
Step 3: High-Level Architecture
flowchart TB
C["Client"]
LB["Load Balancer"]
subgraph App["Application Layer"]
WS["Write Service"]
RS["Redirect Service"]
end
CA["Cache (Redis)"]
DB[("Database")]
AN["Analytics Service"]
MQ["Message Queue"]
C --> LB --> App
RS -->|"1. Check cache"| CA
RS -->|"2. Cache miss"| DB
WS --> DB
RS -->|"Log click"| MQ
MQ --> AN
style LB fill:#f59e0b,color:#fff
style App fill:#3b82f6,color:#fff
style CA fill:#ef4444,color:#fff
style DB fill:#8b5cf6,color:#fff
style AN fill:#22c55e,color:#fff
Step 4: URL Encoding Strategies
Option 1: Base62 Encoding with Counter
Use a global counter (auto-increment ID) and convert to Base62.
Counter: 123456789
Base62: 8M0kX (characters: a-z, A-Z, 0-9)
62^7 = 3,521,614,606,208 (~3.5 trillion unique URLs)
Pros: No collisions, simple Cons: Predictable (sequential), single point of failure for counter
Option 2: Hash-Based (MD5/SHA-256)
Input: "https://example.com/very/long/url"
MD5: "e4d909c290d0fb1ca068ffaddf22cbd0"
Take first 7 chars: "e4d909c"
Pros: No coordination needed Cons: Collisions possible, need collision resolution
Option 3: Pre-Generated Key Service
flowchart LR
KG["Key Generator"]
KDB[("Key Database")]
WS["Write Service"]
KG -->|"Pre-generate keys"| KDB
WS -->|"Fetch unused key"| KDB
style KG fill:#22c55e,color:#fff
style KDB fill:#8b5cf6,color:#fff
style WS fill:#3b82f6,color:#fff
A separate service pre-generates unique keys and stores them. When a new short URL is needed, it fetches an unused key. This avoids collisions and is fast.
Comparison
| Approach | Collisions | Predictable | Coordination | Performance |
|---|---|---|---|---|
| Base62 + Counter | None | Yes | Counter needed | Fast |
| Hash-based | Possible | No | None | Medium |
| Pre-generated Keys | None | No | Key service | Fast |
Step 5: Database Choice
The data model is simple: a key-value mapping from short URL to long URL.
{
"shortUrl": "abc1234",
"longUrl": "https://example.com/very/long/path",
"userId": "user_123",
"createdAt": "2025-01-15T10:00:00Z",
"expiresAt": "2026-01-15T10:00:00Z",
"clickCount": 42567
}
| Database | Pros | Cons |
|---|---|---|
| DynamoDB / Cassandra | High write throughput, scales horizontally | Limited query flexibility |
| PostgreSQL | ACID, rich queries | Harder to scale writes |
| Redis | Extremely fast | Memory-limited |
| MongoDB | Flexible schema | Less predictable performance |
Recommended: NoSQL (DynamoDB or Cassandra) for the URL mapping due to the simple key-value access pattern and high read/write volume. Use Redis as a cache layer.
Step 6: Caching Hot URLs
A small percentage of URLs receive the majority of traffic (Zipf's law). Caching these in Redis dramatically reduces database load.
flowchart TB
C["Client Request: /abc1234"]
RS["Redirect Service"]
CA["Redis Cache"]
DB[("Database")]
C --> RS
RS -->|"1. Lookup"| CA
CA -->|"Cache Hit"| RS
RS -->|"2. Cache Miss"| DB
DB -->|"3. Return URL"| RS
RS -->|"4. Populate Cache"| CA
RS -->|"5. 302 Redirect"| C
style CA fill:#ef4444,color:#fff
style DB fill:#8b5cf6,color:#fff
style RS fill:#3b82f6,color:#fff
- Cache eviction: LRU (Least Recently Used)
- Cache size: 20% of daily URLs covers ~80% of traffic
- Cache hit ratio target: >90%
- TTL: Match URL expiration time
Step 7: Analytics Pipeline
Clicks need to be tracked without slowing down redirects.
flowchart LR
RS["Redirect Service"]
MQ["Kafka"]
AP["Analytics Processor"]
TS[("Time-Series DB")]
DASH["Dashboard"]
RS -->|"Async click event"| MQ
MQ --> AP
AP --> TS
TS --> DASH
style RS fill:#3b82f6,color:#fff
style MQ fill:#f59e0b,color:#fff
style AP fill:#22c55e,color:#fff
style TS fill:#8b5cf6,color:#fff
Each click event contains:
- Short URL key
- Timestamp
- IP address (for geolocation)
- User-Agent (for device/browser stats)
- Referrer URL
The redirect service publishes events to a message queue (Kafka) asynchronously so analytics processing doesn't affect redirect latency.
Step 8: Redirect β 301 vs 302
| Status Code | Meaning | Browser Behavior | Analytics Impact |
|---|---|---|---|
| 301 | Moved Permanently | Caches redirect | Misses repeat visits |
| 302 | Found (Temporary) | Always hits server | Captures all visits |
Use 302 if analytics are important (most URL shorteners do this). The browser will always call your server, allowing you to count every click.
Use 301 if you want to reduce server load and don't need accurate click tracking.
Complete Data Flow
sequenceDiagram
participant U as User
participant LB as Load Balancer
participant WS as Write Service
participant RS as Redirect Service
participant C as Cache (Redis)
participant DB as Database
participant K as Kafka
Note over U,DB: URL Creation
U->>LB: POST /shorten {longUrl}
LB->>WS: Forward request
WS->>DB: Store mapping
DB->>WS: Confirm
WS->>U: Return short URL
Note over U,K: URL Redirect
U->>LB: GET /abc1234
LB->>RS: Forward request
RS->>C: Lookup abc1234
C->>RS: Cache hit β long URL
RS->>K: Publish click event (async)
RS->>U: 302 Redirect to long URL
Summary
| Concept | Description |
|---|---|
| Requirements | Shorten, redirect, analytics, custom aliases |
| Scale | 100:1 read:write ratio, 1.2K reads/sec |
| Encoding | Base62, hash, or pre-generated keys |
| Database | NoSQL for simple key-value access pattern |
| Caching | Redis with LRU for hot URLs |
| Analytics | Async via message queue |
| Redirect | 302 for analytics, 301 for performance |
Key Takeaways
- Always start with requirements and estimation β they drive your design decisions
- The read:write ratio (100:1) tells you to optimize for reads with caching
- Pre-generated key service avoids both collisions and predictability
- Decouple analytics from the redirect path using async messaging
Practice Problems
Problem 1: Basic
Calculate the minimum short URL length needed if you expect 10 billion unique URLs over the system lifetime using Base62 encoding.
Problem 2: Intermediate
Design the URL expiration mechanism. How would you handle expired URLs efficiently? Consider both eager deletion and lazy deletion approaches.
Challenge
Extend the URL shortener to support rate limiting per user, A/B testing (redirect to different URLs based on percentage), and geographic-based redirects (different destination URLs per country).
References
- System Design Interview β Alex Xu, Chapter 8
- URL Shortening at Scale β Engineering Blog
- Base62 Encoding
- HTTP 301 vs 302
Next up: In Day 8, we'll tackle two classic problems β designing a Social Media News Feed and a Chat/Messaging System.