Security & Infrastructure TODO - Production Deployment

Target: 2500 Concurrent Users via Cloudflare Tunnel
Last Updated: 2026-01-05

Phase 1: IMMEDIATE SECURITY FIXES (Week 1)

Critical Security Issues

  • Replace hardcoded WebSocket URL (4h)
    • File: loh-game/src/systems/network/mod.rs:99
    • Action: Use environment variable GAME_SERVER_URL
    • Add .env support with dotenv crate
    • Create .env.example template
  • Implement TLS for WebSocket (2h)
    • Change ws:// to wss://
    • Behind Cloudflare Tunnel, server can use self-signed cert
    • Cloudflare handles public TLS termination
  • Add password hashing (6h)
    • File: loh-game/src/systems/ui/login.rs
    • Verify the salted PBKDF2 (HMAC-SHA256) implementation
    • CONSTRAINT: Do NOT use Argon2id (CPU limit)
    • NEVER send plaintext passwords over network
    • Use challenge-response or JWT-based auth
  • Fix panic vulnerabilities (8h)
    • Replace .unwrap() in network code (35+ instances)
    • Priority files:
      • src/systems/network/mod.rs (lines 97, 99)
      • src/systems/editor/ui.rs (JSON serialization)
      • src/plugin_api/manager.rs
    • Use proper error handling: Result<T, E> with ? operator
  • Add input validation (4h)
    • Implement message size limits: 1MB max
    • Add rate limiting per connection
    • Sanitize player names/chat inputs
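
The input-validation item above could start from something like the following sketch. It uses only the standard library. The 1 MB limit comes from the list; the 24-character name cap, the allowed character set, and every type and function name are illustrative assumptions, not existing loh-game code.

```rust
/// Upper bound on a single inbound message (the 1 MB limit from the plan).
pub const MAX_MESSAGE_BYTES: usize = 1024 * 1024;
/// Hypothetical cap on display-name length; the plan does not specify one.
pub const MAX_NAME_CHARS: usize = 24;

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    MessageTooLarge { len: usize },
    EmptyName,
}

/// Reject oversized frames before any parsing happens.
pub fn check_message_size(payload: &[u8]) -> Result<(), ValidationError> {
    if payload.len() > MAX_MESSAGE_BYTES {
        return Err(ValidationError::MessageTooLarge { len: payload.len() });
    }
    Ok(())
}

/// Keep only (Unicode) alphanumerics, space, '_' and '-', then trim and
/// truncate. Control characters and markup delimiters are dropped.
pub fn sanitize_player_name(raw: &str) -> Result<String, ValidationError> {
    let cleaned: String = raw
        .chars()
        .filter(|c| c.is_alphanumeric() || matches!(c, ' ' | '_' | '-'))
        .collect();
    let truncated: String = cleaned.trim().chars().take(MAX_NAME_CHARS).collect();
    // Truncation can expose a trailing space, so trim once more.
    let name = truncated.trim_end().to_string();
    if name.is_empty() {
        return Err(ValidationError::EmptyName);
    }
    Ok(name)
}
```

Calling check_message_size on the raw frame before deserialization keeps an oversized payload from reaching the parser at all. Chat text would likely need a looser character set than player names.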
Estimated Time: 24 hours (3 days)
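
The first and fourth Phase 1 items (reading GAME_SERVER_URL from the environment and replacing .unwrap() with Result and the ? operator) touch the same code, so they can be sketched together. This is a minimal standard-library example: ConfigError, parse_server_url, and load_server_url are hypothetical names, and .env loading through the dotenv crate is left out.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
    InsecureScheme(String),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(name) => write!(f, "{} is not set", name),
            ConfigError::InsecureScheme(url) => {
                write!(f, "expected a wss:// URL, got {:?}", url)
            }
        }
    }
}

impl std::error::Error for ConfigError {}

/// Validate a raw value; kept separate from the environment lookup so it
/// can be tested without touching process state.
pub fn parse_server_url(raw: Option<String>) -> Result<String, ConfigError> {
    // `?` propagates the error to the caller instead of panicking.
    let url = raw.ok_or(ConfigError::Missing("GAME_SERVER_URL"))?;
    let url = url.trim().to_string();
    if !url.starts_with("wss://") {
        return Err(ConfigError::InsecureScheme(url));
    }
    Ok(url)
}

/// Read GAME_SERVER_URL from the process environment.
pub fn load_server_url() -> Result<String, ConfigError> {
    parse_server_url(std::env::var("GAME_SERVER_URL").ok())
}
```

A caller can surface the error in the login UI or log it and retry, rather than crashing the client. Rejecting ws:// outright is a deliberate choice here; a development build might reasonably relax it.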

Phase 2: INFRASTRUCTURE SETUP (Week 1-2)

Cloudflare Configuration

  • Set up Cloudflare Tunnel (2h)
    • Install cloudflared on game server
    • Configure tunnel for WebSocket endpoint
    • Create tunnel with: cloudflared tunnel create loh-game
    • Route: wss://game.yourdomain.com -> localhost:3000/ws
  • Configure Cloudflare Security (4h)
    • Enable Bot Fight Mode
    • Set up WAF rules for WebSocket
    • Configure rate limiting (2500 concurrent, ~10k requests/min)
    • Enable DDoS protection
    • Set up IP Access Rules if needed

Environment & Secrets Management

  • Create environment structure (3h)
    loh-game/
     .env.development
     .env.staging  
     .env.production
     .env.example (committed to git)
    • Add to .gitignore: .env, .env.* (except .env.example)
    • Use dotenv crate to load environment-specific configs
  • Set up secrets (2h)
    • GAME_SERVER_URL
    • DATABASE_URL
    • JWT_SECRET (generate: openssl rand -base64 64)
    • SENTRY_DSN
    • CLOUDFLARE_TUNNEL_TOKEN
Estimated Time: 11 hours (1.5 days)
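
One possible shape for loading the secrets listed above at startup: check them all at once so a misconfigured environment reports every missing variable in a single pass, and wrap values so they cannot leak through debug logging. The Secret wrapper and the load_secrets function are assumptions made for illustration; only the variable names come from the list.

```rust
use std::fmt;

/// Variable names taken from the secrets list in this phase.
pub const REQUIRED_SECRETS: [&str; 5] = [
    "GAME_SERVER_URL",
    "DATABASE_URL",
    "JWT_SECRET",
    "SENTRY_DSN",
    "CLOUDFLARE_TUNNEL_TOKEN",
];

/// Wrapper whose Debug output never prints the value.
pub struct Secret(String);

impl Secret {
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret(<redacted>)")
    }
}

/// Look up every required secret through `lookup`. On failure, return the
/// names of all missing or blank variables rather than only the first.
pub fn load_secrets<F>(lookup: F) -> Result<Vec<(&'static str, Secret)>, Vec<&'static str>>
where
    F: Fn(&str) -> Option<String>,
{
    let mut found = Vec::new();
    let mut missing = Vec::new();
    for name in REQUIRED_SECRETS.iter().copied() {
        match lookup(name) {
            Some(value) if !value.trim().is_empty() => found.push((name, Secret(value))),
            _ => missing.push(name),
        }
    }
    if missing.is_empty() {
        Ok(found)
    } else {
        Err(missing)
    }
}
```

In production the lookup would typically be `|key| std::env::var(key).ok()`; passing it as a closure keeps the function testable without mutating the process environment.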

Phase 3: OBSERVABILITY & MONITORING (Week 2)

Error Tracking

  • Integrate Sentry (4h)
    • Add to Cargo.toml: sentry = "0.32"
    • Initialize in main.rs:
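      // Keep the guard alive for the whole of main so queued events flush on exit.
      // With SENTRY_DSN unset the DSN is None and Sentry stays disabled,
      // instead of panicking at startup as .unwrap() would.
      let _guard = sentry::init((
          std::env::var("SENTRY_DSN").ok(),
          sentry::ClientOptions {
              release: sentry::release_name!(),
              ..Default::default()
          },
      ));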
    • Cost: $26/month (Team plan, 50k events/month)

Structured Logging

  • Implement tracing (6h)
    • Replace log with tracing + tracing-subscriber
    • Add correlation IDs to network requests
    • Log to JSON for structured parsing
    • Configure log levels per environment
  • Set up log aggregation (4h)
    • Options:
      • Low-cost: Cloudflare Logpush to R2 ($15/TB/month storage)
      • Paid: Datadog ($15/host/month), New Relic ($25/month)
    • Recommendation: Start with Cloudflare R2 + custom dashboard

Metrics & Monitoring

  • Add metrics instrumentation (8h)
    • Use metrics crate
    • Track:
      • Active WebSocket connections
      • Messages/second
      • Authentication attempts
      • Player actions/second
      • Memory usage per connection
    • Export to Prometheus
  • Create monitoring dashboard (4h)
    • Cloudflare Analytics (free with tunnel)
    • Grafana Cloud (free tier: 10k series) or self-hosted
    • Key metrics:
      • Concurrent connections (target: 2500)
      • CPU/Memory usage
      • Network bandwidth
      • Error rates
Estimated Time: 26 hours (~3 days)
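
For the "Active WebSocket connections" metric, a guard that decrements on drop keeps the count correct even when a connection task exits early on an error. The sketch below uses only std atomics and is not the metrics crate API; ConnectionGauge and ConnectionGuard are hypothetical names, and the admission cap is an added assumption that would let the same counter enforce the 2500-connection target.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared counter of live connections; cheap to clone into each task.
#[derive(Clone, Default)]
pub struct ConnectionGauge {
    active: Arc<AtomicUsize>,
}

/// Held for the lifetime of one connection; dropping it decrements the gauge.
pub struct ConnectionGuard {
    active: Arc<AtomicUsize>,
}

impl ConnectionGauge {
    pub fn new() -> Self {
        Self::default()
    }

    /// Admit a connection only while fewer than `max` are active.
    pub fn try_acquire(&self, max: usize) -> Option<ConnectionGuard> {
        let mut current = self.active.load(Ordering::Relaxed);
        loop {
            if current >= max {
                return None;
            }
            // Compare-and-swap so two racing accepts cannot both take the last slot.
            match self.active.compare_exchange_weak(
                current,
                current + 1,
                Ordering::AcqRel,
                Ordering::Relaxed,
            ) {
                Ok(_) => {
                    return Some(ConnectionGuard {
                        active: Arc::clone(&self.active),
                    })
                }
                Err(observed) => current = observed,
            }
        }
    }

    /// Current value, suitable for exporting as a gauge.
    pub fn active(&self) -> usize {
        self.active.load(Ordering::Relaxed)
    }
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        self.active.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Each connection handler would hold its guard until the socket closes; a periodic task could then publish `active()` to whichever exporter is chosen.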

Phase 4: BACKEND HARDENING (Week 3)

Database Security

  • Review database access (4h)
    • Confirm prepared statements usage
    • Add connection pool limits (max: 100 connections)
    • Enable SSL for database connections
    • Implement backup encryption
  • Set up database backups (3h)
    • Automated daily backups
    • Retain: 7 dailies, 4 weeklies, 3 monthlies
    • Test restore procedure
    • Store backups in separate region

Rate Limiting & Anti-Abuse

  • Implement server-side rate limiting (6h)
    • Per-IP: 100 requests/minute
    • Per-user: 1000 actions/minute
    • Connection limit per IP: 5 concurrent
    • Use governor crate or Redis-based limiting
  • Add abuse detection (4h)
    • Detect spam patterns
    • Flag suspicious behavior
    • Implement temporary bans
    • Log all moderation actions
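
To make the per-IP limit concrete, here is a hand-rolled token bucket using only the standard library. It is an illustration of the technique, not the governor crate's API, and IpRateLimiter is a hypothetical name. Each IP starts with a full bucket equal to the per-minute limit and refills continuously at limit/60 tokens per second.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::Instant;

struct Bucket {
    tokens: f64,
    last_refill: Instant,
}

pub struct IpRateLimiter {
    capacity: f64,
    refill_per_sec: f64,
    buckets: HashMap<IpAddr, Bucket>,
}

impl IpRateLimiter {
    /// A limit of 100 means bursts of up to 100 requests, refilled at 100/min.
    pub fn per_minute(limit: u32) -> Self {
        Self {
            capacity: f64::from(limit),
            refill_per_sec: f64::from(limit) / 60.0,
            buckets: HashMap::new(),
        }
    }

    /// Returns true if the request is allowed. `now` is passed in so the
    /// limiter can be tested deterministically.
    pub fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        let capacity = self.capacity;
        let refill_per_sec = self.refill_per_sec;
        let bucket = self.buckets.entry(ip).or_insert(Bucket {
            tokens: capacity,
            last_refill: now,
        });
        // Refill in proportion to elapsed time, never beyond capacity.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * refill_per_sec).min(capacity);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}
```

This sketch never evicts idle buckets, so a real deployment would need periodic cleanup or a bounded map. The per-user limit of 1000 actions/minute could reuse the same structure keyed by user ID.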

Session Management

  • Implement JWT tokens (8h)
    • Short-lived access tokens (15 min)
    • Refresh tokens (7 days)
    • Secure token storage on client
    • Token revocation mechanism
Estimated Time: 25 hours (3 days)
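
The token lifetimes and the revocation requirement in the session-management list can be modeled separately from signing. The sketch below covers only expiry and revocation checks with the standard library; signature creation and verification (for example with the jsonwebtoken crate) is deliberately not shown, and TokenClaims, RevocationList, and the error variants are hypothetical names.

```rust
use std::collections::HashSet;
use std::time::{Duration, SystemTime};

/// Lifetimes from the plan: 15-minute access tokens, 7-day refresh tokens.
pub const ACCESS_TTL: Duration = Duration::from_secs(15 * 60);
pub const REFRESH_TTL: Duration = Duration::from_secs(7 * 24 * 60 * 60);

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    Access,
    Refresh,
}

#[derive(Debug, PartialEq)]
pub enum TokenError {
    Expired,
    Revoked,
    NotYetValid,
}

/// The subset of claims needed for lifetime checks.
pub struct TokenClaims {
    pub id: String,
    pub kind: TokenKind,
    pub issued_at: SystemTime,
}

/// In-memory revocation set keyed by token ID.
#[derive(Default)]
pub struct RevocationList {
    revoked_ids: HashSet<String>,
}

impl RevocationList {
    pub fn revoke(&mut self, id: &str) {
        self.revoked_ids.insert(id.to_string());
    }

    /// Check revocation first, then age against the TTL for the token kind.
    pub fn check(&self, claims: &TokenClaims, now: SystemTime) -> Result<(), TokenError> {
        if self.revoked_ids.contains(&claims.id) {
            return Err(TokenError::Revoked);
        }
        let ttl = match claims.kind {
            TokenKind::Access => ACCESS_TTL,
            TokenKind::Refresh => REFRESH_TTL,
        };
        // A token issued in the future indicates clock skew or tampering.
        let age = now
            .duration_since(claims.issued_at)
            .map_err(|_| TokenError::NotYetValid)?;
        if age >= ttl {
            Err(TokenError::Expired)
        } else {
            Ok(())
        }
    }
}
```

Because access tokens expire within 15 minutes, revoking only refresh tokens may be enough in practice; with several server instances the revocation set would need shared storage.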

Phase 5: CI/CD & AUTOMATION (Week 3-4)

GitHub Actions Setup

  • Create security pipeline (6h)
    • .github/workflows/security.yml:
      • cargo audit on every PR
      • cargo deny for license compliance
      • Dependency vulnerability scanning
      • SAST with Semgrep
  • Create build pipeline (4h)
    • .github/workflows/build.yml:
      • Build on push to main
      • Run all tests
      • Generate artifacts
      • Upload to release
  • Create deployment pipeline (8h)
    • .github/workflows/deploy.yml:
      • Deploy to staging on merge to develop
      • Deploy to production on tag v*
      • Automated rollback on failure
      • Slack/Discord notifications

Testing & Quality

  • Add security tests (6h)
    • Network layer fuzz testing
    • Authentication bypass tests
    • SQL injection tests (if applicable)
    • XSS/injection in chat system
Estimated Time: 24 hours (3 days)

Phase 6: PLUGIN SYSTEM SECURITY (Week 4)

  • Audit Lua sandbox (8h)
    • Review mlua security settings
    • Disable dangerous functions (io, os, debug)
    • Implement resource limits (CPU, memory)
    • Test for sandbox escapes
  • Audit WASM runtime (8h)
    • Review wasmtime configuration
    • Implement capability-based security
    • Add memory/CPU limits
    • Test for privilege escalation
  • Plugin signature verification (6h)
    • Sign official plugins with GPG
    • Verify signatures before loading
    • Implement plugin allowlist
Estimated Time: 22 hours (3 days)

Phase 7: COMPLIANCE & DOCUMENTATION (Week 5)

  • GDPR compliance (8h)
    • Implement data export API
    • Add account deletion flow
    • Create privacy policy
    • Add consent management
  • Security documentation (6h)
    • Document threat model
    • Create incident response playbook
    • Write security policies
    • Prepare security.txt file
  • Runbook creation (8h)
    • Deployment procedures
    • Rollback procedures
    • Disaster recovery plan
    • On-call runbook
Estimated Time: 22 hours (3 days)

TOTAL ESTIMATED TIME

  • Phase 1 (Critical): 3 days
  • Phase 2 (Infra): 1.5 days
  • Phase 3 (Observability): 3 days
  • Phase 4 (Hardening): 3 days
  • Phase 5 (CI/CD): 3 days
  • Phase 6 (Plugins): 3 days
  • Phase 7 (Compliance): 3 days
Total: ~19.5 days of focused work (about 4 engineering-weeks)

DEPENDENCIES TO ADD

# Security
# PBKDF2-HMAC-SHA256, per the Phase 1 constraint against Argon2id
pbkdf2 = "0.12"
sha2 = "0.10"
jsonwebtoken = "9.2"

# Configuration
dotenv = "0.15"

# Observability
sentry = "0.32"
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
metrics = "0.21"
metrics-exporter-prometheus = "0.13"

# Rate Limiting
governor = "0.6"

# Code Quality: CLI tools, installed with cargo install rather than listed under [dependencies]
# cargo-audit (0.18)
# cargo-deny (0.14)

MONTHLY COST ESTIMATE (2500 CCU)

Infrastructure (Cloudflare Tunnel + Backend)

  • Cloudflare Tunnel: $0 (no additional charge on any plan)
  • Cloudflare Pro Plan: $20/month
  • Game Server (see specs below): $80-160/month
  • Database (managed Postgres): $25-75/month

Observability

  • Sentry (Team): $26/month
  • Grafana Cloud (free tier): $0
  • Log storage (Cloudflare R2): $5-15/month

Other

  • Backups (S3-compatible): $10-20/month
  • Domain + SSL: $15/year (~$1.25/month)
TOTAL: $167-317/month (starting conservative: ~$200/month)
As you scale to 5000+ CCU, expect costs to rise roughly in proportion to server resources.

ROLLOUT STRATEGY

  1. Week 1-2: Fix critical security issues, basic infra
  2. Week 3: Beta testing with 100 users
  3. Week 4: Staged rollout to 500 users
  4. Week 5: Scale to 1500 users, monitor
  5. Week 6+: Full capacity (2500 CCU)
  6. Ongoing: Continuous monitoring and optimization

SUCCESS METRICS

  • Zero critical vulnerabilities in cargo audit
  • < 0.1% error rate in production
  • < 100ms p95 WebSocket latency
  • 99.9% uptime SLA
  • All secrets in environment, none in code
  • Automated deployments with < 5min downtime
  • < 1 hour MTTR (Mean Time To Recovery)
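
The uptime and deployment-downtime targets interact, and a small calculation shows how. Assuming a 30-day measurement window (the list does not state one) and treating 5 minutes as the worst-case downtime per deployment, the sketch below derives the monthly downtime budget in integer arithmetic. The function names are illustrative.

```rust
/// Allowed downtime in seconds for a window, with availability given in
/// thousandths of a percent (99.9% is 99_900) to avoid float rounding.
pub fn downtime_budget_secs(window_secs: u64, availability_milli_pct: u64) -> u64 {
    window_secs * (100_000 - availability_milli_pct) / 100_000
}

/// How many deployments fit in the budget if each one costs the given
/// downtime and nothing else goes wrong.
pub fn deployments_within_budget(budget_secs: u64, downtime_per_deploy_secs: u64) -> u64 {
    budget_secs / downtime_per_deploy_secs
}

/// Seconds in the assumed 30-day window.
pub const THIRTY_DAYS_SECS: u64 = 30 * 24 * 60 * 60;
```

With these inputs, 99.9% over 30 days allows 2592 seconds (43.2 minutes) of downtime, so eight worst-case 5-minute deployments would consume almost the whole budget and leave little room for incidents. That suggests aiming for zero-downtime deploys rather than relying on the 5-minute ceiling.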