Backend Scalability & Archival System Walkthrough
This document outlines the implementation of the scalable backend infrastructure, specifically focusing on Data Tiering (to support 100M+ MAU) and the SRE Logging stack.
1. Data Tiering & Archival System
To keep the primary "hot" database small while supporting a very large user base, we implemented a three-tier strategy: hot, warm, and cold (archived to S3).
Tier Definitions
Implementation Details
- Worker Binary: api-ops/src/bin/archival_worker.rs
- Logic Module: api-ops/src/archival_worker.rs
- Snapshot Logic: api-ops/src/player_snapshot.rs
- Storage Adapter: api-ops/src/storage_adapter.rs (S3 implementation)
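To illustrate why the storage layer sits behind an adapter, here is a minimal sketch of what such an abstraction could look like. The trait name `StorageAdapter`, its `put`/`get` methods, the `MemoryStorage` type, and the `archive_key` helper are all hypothetical and are not taken from storage_adapter.rs; only the players/{uuid}.json key layout comes from this document.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction; the real trait in
/// storage_adapter.rs may have a different shape.
trait StorageAdapter {
    fn put(&mut self, key: &str, body: Vec<u8>) -> Result<(), String>;
    fn get(&self, key: &str) -> Result<Vec<u8>, String>;
}

/// In-memory stand-in for the S3 implementation, e.g. for unit tests.
#[derive(Default)]
struct MemoryStorage {
    objects: HashMap<String, Vec<u8>>,
}

impl StorageAdapter for MemoryStorage {
    fn put(&mut self, key: &str, body: Vec<u8>) -> Result<(), String> {
        // Overwrites any existing object stored under the same key.
        self.objects.insert(key.to_string(), body);
        Ok(())
    }

    fn get(&self, key: &str) -> Result<Vec<u8>, String> {
        self.objects
            .get(key)
            .cloned()
            .ok_or_else(|| format!("no object at {}", key))
    }
}

/// Builds the object key following the players/{uuid}.json layout.
fn archive_key(player_uuid: &str) -> String {
    format!("players/{}.json", player_uuid)
}
```

With a shape like this, the archival logic can be exercised against `MemoryStorage` without network access, while the S3-backed implementation is used in production.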
Archive Lifecycle
- Cooldown: process_hot_to_warm runs hourly. Moves users inactive > 3 days to 'warm' tier.
- Deep Freeze: process_warm_to_cold runs hourly.
  - Fetches full player state (Stats, Inventory, Bank, Friends).
  - Serializes to JSON.
  - Uploads to S3 (bucket/players/{uuid}.json).
  - Deletes rows from bank_storage, player_equipment, friends.
  - Updates players table row to tier='archived', sets archive_url.
- Thaw: process_all_restores runs hourly (or on demand).
  - Finds players with tier='restoring'.
  - Downloads JSON from S3.
  - Restores all data to PostgreSQL tables.
  - Sets tier='hot'.
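The lifecycle above amounts to a small state machine, which can be sketched as follows. This is an illustration under assumptions, not the code in archival_worker.rs: the `Tier` enum, `WARM_AFTER`, `should_cool_down`, and `is_job_transition` are hypothetical names. Only the tier values, the 3-day threshold, and the three job names come from this document. How a player enters 'restoring' is not described here, so that transition is left out.

```rust
use std::time::Duration;

/// Tier values named in the lifecycle; the enum itself is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Hot,
    Warm,
    Archived,
    Restoring,
}

/// Cooldown threshold from the text: inactive for more than 3 days.
const WARM_AFTER: Duration = Duration::from_secs(3 * 24 * 60 * 60);

/// Cooldown check: a hot player becomes eligible for the warm tier
/// once idle strictly longer than the threshold.
fn should_cool_down(tier: Tier, idle: Duration) -> bool {
    tier == Tier::Hot && idle > WARM_AFTER
}

/// Returns true only for the transitions the three hourly jobs perform.
fn is_job_transition(from: Tier, to: Tier) -> bool {
    matches!(
        (from, to),
        (Tier::Hot, Tier::Warm)              // process_hot_to_warm
            | (Tier::Warm, Tier::Archived)   // process_warm_to_cold
            | (Tier::Restoring, Tier::Hot)   // process_all_restores
    )
}
```

Encoding the allowed transitions in one place makes it harder for a job to skip a step, for example archiving a player straight from the hot tier.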
Verification
Ran cargo check -p api-ops to confirm that the new worker and S3 integration compile:
Finished dev profile [unoptimized + debuginfo] target(s) in 0.58s
2. Infrastructure & Logging (PLG Stack)
To handle the log volume of 100k CCU (est. 1TB+/month), we moved from ELK to the lighter-weight PLG stack (Promtail, Loki, Grafana).
- Promtail: Scrapes Docker container logs natively. Defined in loh-devops/infrastructure/game/config/promtail.yml.
- Loki: Stores logs with high compression (S3-ready). Defined in loh-devops/infrastructure/game/config/loki.yml.
- Grafana: Visualizes logs. Datasource auto-provisioned in loh-devops/infrastructure/backend/grafana/provisioning/datasources/loki_datasource.yml.
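To put the 1TB/month estimate in perspective, the following sketch converts it into an average ingest rate. It assumes decimal terabytes (10^12 bytes), a 30-day month, and a constant 100k CCU; the function names are illustrative. Under those assumptions, 1 TB/month works out to roughly 386 KB/s on average, or about 3.9 bytes per second per concurrent user.

```rust
/// Average ingest rate in bytes per second for a given monthly volume.
/// Assumes decimal terabytes (10^12 bytes) and a 30-day month.
fn avg_bytes_per_sec(tb_per_month: f64) -> f64 {
    let bytes = tb_per_month * 1e12;
    let seconds_per_month = 30.0 * 24.0 * 60.0 * 60.0;
    bytes / seconds_per_month
}

/// Average bytes per second attributable to each concurrent user,
/// assuming the CCU count stays constant over the month.
fn bytes_per_sec_per_user(tb_per_month: f64, ccu: f64) -> f64 {
    avg_bytes_per_sec(tb_per_month) / ccu
}
```

Real traffic is bursty, so peak ingest will sit well above this average; the figure is only a baseline for sizing Loki storage and retention.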
3. Hardware Strategy
For the 100k CCU / 100M MAU target:
- Database: Managed Postgres (cloud) is recommended for reliability.
- Compute: Refurbished mini-PC cluster (e.g., Lenovo ThinkCentre Tiny) for cost-effective self-hosting of game servers and archival workers.