Data Tiering & Archival Strategy (100k CCU / 100M MAU)
1. High-Level Strategy
To support 100M MAU with 100k Concurrent Users (CCU) without scaling the primary database to hold the entire ~11 TB player dataset, we will implement a Time-Based Tiering Architecture.
Core Concept: "Active" players live in the fast, expensive DB. "Inactive" players are moved to cheaper, slower storage by background workers.
2. Storage Tiers
Capacity Breakdown (100M Users)
- Hot Tier (100k CCU + Buffer): ~1M users (1% of total).
  - Size: ~100 GB (fits in RAM).
  - Load: 100% of write traffic.
- Warm Tier (Churned/Casual): ~9M users.
  - Size: ~1 TB.
  - Load: occasional reads (login checks), low writes (worker migrations).
- Cold Tier (Long Tail): ~90M users.
  - Size: ~10 TB.
  - Load: rare reads (reactivation), write-once (archival).
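The per-player footprint is implied rather than stated: 1M hot users at ~100 GB works out to roughly 100 KB of state per player, and the same figure reproduces the warm and cold sizes. A quick sanity check (the 100 KB/player constant is an inference from the hot-tier numbers, not a figure from this doc):

```python
# Assumed per-player state size, inferred from "1M users ~= 100 GB".
PER_PLAYER_BYTES = 100 * 1024  # ~100 KB

def tier_size_gb(users: int) -> float:
    """Projected tier size in GiB for a given user count."""
    return users * PER_PLAYER_BYTES / (1024 ** 3)

sizes = {tier: round(tier_size_gb(n)) for tier, n in
         [("hot", 1_000_000), ("warm", 9_000_000), ("cold", 90_000_000)]}
# hot ~ 95 GB, warm ~ 858 GB (~1 TB), cold ~ 8583 GB (~10 TB)
```

The computed values line up with the rounded tier sizes above, which suggests the breakdown is internally consistent at ~100 KB/player.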
3. Worker Architecture
We will run a set of dedicated Archival Workers separate from the Game Servers.
A. The "Cool Down" Worker (Hot -> Warm)
- Trigger: Scheduled Job (e.g., every hour).
- Query:
  SELECT id FROM players WHERE last_login < NOW() - INTERVAL '3 days' AND tier = 'hot'
- Action:
  - Move rows to players_warm table (or separate DB partition).
  - Delete from players_hot.
  - Update index / lookup table.
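The selection step of the Cool Down worker can be sketched as a pure function, leaving the actual SQL and row movement to the worker's DB driver. The row shape and default batch size below are illustrative assumptions:

```python
from datetime import datetime, timedelta

COOL_DOWN_AFTER = timedelta(days=3)  # mirrors INTERVAL '3 days' in the query

def cool_down_batches(rows, now, batch_size=1000):
    """Select hot-tier players idle past the cutoff, in migration batches.

    `rows` is an iterable of (player_id, last_login, tier) tuples,
    mirroring: SELECT id FROM players
               WHERE last_login < NOW() - INTERVAL '3 days' AND tier = 'hot'
    """
    cutoff = now - COOL_DOWN_AFTER
    ids = [pid for pid, last_login, tier in rows
           if tier == "hot" and last_login < cutoff]
    # Batch the IDs so each move/delete transaction stays small.
    return [ids[i:i + batch_size] for i in range(0, len(ids), batch_size)]
```

Batching keeps each hot-tier transaction short, so the hourly job never holds long locks against live gameplay writes.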
B. The "Deep Freeze" Worker (Warm -> Cold)
- Trigger: Daily Job.
- Query:
  SELECT id FROM players WHERE last_login < NOW() - INTERVAL '30 days' AND tier = 'warm'
- Action:
  - Serialize full player state to JSON/Protobuf.
  - Upload to S3: s3://game-archives/players/{uuid}.json.
  - Update Lookup Table: SET status='archived', s3_path='...'.
  - Delete from players_warm.
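A sketch of the freeze step, assuming player state is a JSON-serializable dict; the S3 PUT itself and the final DELETE from players_warm are elided, since those depend on the client and driver in use:

```python
import json

ARCHIVE_BUCKET = "game-archives"  # bucket name from the design above

def archive_key(player_id: str) -> str:
    """S3 object key matching the players/{uuid}.json layout."""
    return f"players/{player_id}.json"

def freeze(player: dict):
    """Serialize a warm-tier player for upload.

    Returns (key, body, lookup_update). The lookup-table update should be
    committed only AFTER the S3 PUT succeeds, so a crashed worker never
    leaves a lookup row pointing at a missing object.
    """
    body = json.dumps(player, sort_keys=True).encode("utf-8")
    key = archive_key(player["id"])
    lookup_update = {
        "id": player["id"],
        "status": "archived",
        "s3_path": f"s3://{ARCHIVE_BUCKET}/{key}",
    }
    return key, body, lookup_update
```

Ordering the writes as upload, then lookup update, then delete makes the worker idempotent: a re-run after a crash just re-uploads the same key.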
C. The "Thaw" Process (On Login)
- Trigger: Player logs in.
- Logic:
  - Check Hot DB -> Found? Login.
  - Check Warm DB -> Found? Move to Hot (async or sync) -> Login.
  - Check Cold Lookup -> Found?
    - Blocking: user sees "Restoring Account..." spinner (2-3s).
    - Action: Download S3 JSON -> Insert into Hot DB -> Login.
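The thaw steps above can be prototyped with dict-like stores standing in for the three tiers; `fetch_archive` is a hypothetical callback that downloads and parses the S3 object:

```python
def thaw_on_login(player_id, hot, warm, cold_lookup, fetch_archive):
    """Tiered lookup on login.

    `hot` and `warm` are dict-like stores, `cold_lookup` maps id -> s3_path,
    `fetch_archive(s3_path)` returns the deserialized player state.
    Returns (player_state, tier_found).
    """
    if player_id in hot:
        return hot[player_id], "hot"
    if player_id in warm:
        # Promote: insert into hot first, then remove from warm.
        hot[player_id] = warm.pop(player_id)
        return hot[player_id], "warm"
    if player_id in cold_lookup:
        # Blocking path: client shows "Restoring Account..." meanwhile.
        state = fetch_archive(cold_lookup[player_id])
        hot[player_id] = state
        return state, "cold"
    return None, "new"  # no record in any tier: brand-new account
```

In production the warm promotion could be done asynchronously (serve the login from warm, enqueue the move), which this synchronous sketch does not show.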
4. 100k CCU Hardware Implications
With this strategy, the Hot Database only needs to handle the 100k Active Users + 1M Recent Users.
- Hot DB Specs (100k CCU):
  - Writes: ~25k-50k TPS (heavy load).
  - Size: small (~100 GB).
  - Hardware: 1x Primary + 2x Replicas.
  - CPU: high frequency (Ryzen/Core i9 equivalent or Xeon Gold).
  - RAM: 128 GB (fit entire Hot DB in RAM).
  - Disk: 2x 1 TB NVMe, RAID 1.
- Workers:
  - Can run on cheap commodity hardware / spot instances.
  - Queue-based (RabbitMQ/Redis) to throttle load on the DB.
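One simple way to implement the throttling is a rate limiter that paces migration batches; a queue consumer would call `wait()` before each batch. The batches-per-second knob is an assumed tuning parameter, not a number from this design:

```python
import time

class BatchThrottle:
    """Caps archival batches per second so workers cannot saturate the
    hot DB. Clock/sleep are injectable for testing."""

    def __init__(self, max_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / max_per_sec
        self.clock, self.sleep = clock, sleep
        self.next_at = clock()

    def wait(self):
        """Block until the next batch is allowed to start."""
        now = self.clock()
        if now < self.next_at:
            self.sleep(self.next_at - now)
        self.next_at = max(now, self.next_at) + self.interval
```

Because the limit lives in the worker rather than the DB, spot-instance workers can be added or removed without retuning the database.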
5. Schema Strategy
Option A: Partitioned Tables (Postgres Native)
Use Postgres declarative partitioning:
CREATE TABLE players (
id UUID,
last_login TIMESTAMP,
...
) PARTITION BY RANGE (last_login);
-- Caveat: re-tiering requires rows to physically move between
-- partitions whenever last_login changes, and Postgres executes a
-- cross-partition UPDATE as a DELETE + INSERT -- costly when it
-- happens on every login.
Option B: Application-Level Tables (Recommended)
Explicit tables give finer control over storage (tablespaces, per-table tuning).
- players_active (on NVMe tablespace)
- players_archive_lookup (pointer to S3)
- player_data_blob (optional: intermediate blob store)
Next Steps
- Define the PlayerArchive capabilities (S3-compatible store).
- Prototype the "Thaw" logic in the Login Server.
- Implement the ArchivalWorker service.