Emergency Migration Plan - AWS

For: Legends of Himavat - Disaster Recovery
Scenario: Primary server failure, need to migrate to AWS immediately
Target: Restore service within 2-4 hours
Last Updated: 2026-01-05

🚨 WHEN TO USE THIS PLAN

Emergency Scenarios:
  • ✅ Primary server hardware failure
  • ✅ Hosting provider outage (Hetzner/OVH down)
  • ✅ DDoS attack overwhelming current infrastructure
  • ✅ Data center network failure
  • ✅ Security breach requiring immediate isolation
This is NOT for:
  • āŒ Planned migrations (use migration guide instead)
  • āŒ Performance optimization
  • āŒ Cost reduction
Time to Recovery: 2-4 hours (assuming backups available)

📋 PRE-REQUISITES (PREPARE NOW)

Critical Items to Have Ready:

  1. Latest Backups (automated daily)
    • PostgreSQL backup (last 24h)
    • Game data files backup
    • Docker images backup/registry
    • Environment variables documented
  2. AWS Account
    • AWS account created
    • IAM user with admin permissions
    • Payment method configured
    • Service limits checked (EC2, RDS)
  3. Access Credentials
    • AWS Access Key ID
    • AWS Secret Access Key
    • Domain registrar access
    • Cloudflare account access
  4. Documentation
    • Current .env.production file backed up
    • docker-compose.yml backed up
    • Cloudflare tunnel credentials
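
The checklist above can be turned into a quick preflight check to run during drills. This is a minimal sketch: `preflight` is a hypothetical helper, and the tool names and file paths in the usage example are assumptions based on this plan's layout, so adjust them to your setup.

```shell
# preflight: report missing commands and missing/empty files.
# Usage: preflight "cmd1 cmd2 ..." "file1 file2 ..."
# Returns non-zero if anything is missing.
preflight() {
  missing=0
  for cmd in $1; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "MISSING command: $cmd"
      missing=1
    fi
  done
  for f in $2; do
    if [ ! -s "$f" ]; then   # -s: file exists and is non-empty
      echo "MISSING file: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Example (paths are assumptions):
# preflight "aws psql docker" "./.env.production ./docker-compose.yml"
```

A non-zero exit status means at least one item on the list is not ready.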

🎯 SERVICES MAPPING

Current Stack                | AWS Equivalent                    | Purpose
Hetzner AX52 (8 cores, 32GB) | EC2 c6i.2xlarge (8 vCPU, 16GB)    | Game server
PostgreSQL (Docker)          | RDS PostgreSQL 15 (db.t4g.large)  | Database
Cloudflare Tunnel            | Application Load Balancer + EC2   | Traffic routing
Local backups                | S3 + RDS Snapshots                | Backup storage
Docker Compose               | ECS Fargate OR EC2 + Docker       | Container orchestration
Cost Estimate: ~$300-400/month (vs $280 current)

⚔ EMERGENCY DEPLOYMENT (2-4 HOURS)

PHASE 1: AWS FOUNDATION (30 minutes)

Step 1.1: Launch EC2 Instance

# Via AWS Console or CLI
# AMI: Ubuntu 22.04 LTS. AMI IDs are region-specific, so confirm this ID in your region first.
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type c6i.2xlarge \
  --key-name your-emergency-key \
  --security-group-ids sg-emergency \
  --subnet-id subnet-your-subnet \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=loh-game-emergency}]'
Manual via Console:
  1. EC2 → Launch Instance
  2. AMI: Ubuntu Server 22.04 LTS
  3. Instance type: c6i.2xlarge (8 vCPU, 16 GiB RAM)
  4. Storage: 100 GB gp3 SSD
  5. Security Group: Create new:
    • SSH (22) from your IP
    • HTTP (80) from 0.0.0.0/0
    • HTTPS (443) from 0.0.0.0/0
    • Custom TCP (3000) from 0.0.0.0/0
  6. Launch
Elastic IP: Allocate one, associate it with the instance, and note it down for DNS
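
The Elastic IP step can also be done from the CLI. A minimal sketch, assuming the AWS CLI is configured for the target region; `attach_elastic_ip` is a hypothetical helper name, not an AWS command.

```shell
# attach_elastic_ip: allocate a new Elastic IP and associate it with an instance.
# $1 = EC2 instance ID. Prints the public IP on success.
attach_elastic_ip() {
  instance_id=$1
  # Text output puts both queried fields on one tab-separated line
  alloc=$(aws ec2 allocate-address --domain vpc \
    --query '[AllocationId,PublicIp]' --output text) || return 1
  alloc_id=$(printf '%s\n' "$alloc" | awk '{print $1}')
  public_ip=$(printf '%s\n' "$alloc" | awk '{print $2}')
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id" >/dev/null || return 1
  echo "$public_ip"
}

# Example: EC2_ELASTIC_IP=$(attach_elastic_ip <EC2_INSTANCE_ID>)
```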

Step 1.2: Launch RDS Instance

# Via CLI
aws rds create-db-instance \
  --db-instance-identifier loh-game-db \
  --db-instance-class db.t4g.large \
  --engine postgres \
  --engine-version 15.4 \
  --master-username lohadmin \
  --master-user-password <SECURE_PASSWORD> \
  --allocated-storage 100 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-rds \
  --backup-retention-period 7 \
  --preferred-backup-window "03:00-04:00" \
  --no-publicly-accessible
Manual via Console:
  1. RDS → Create Database
  2. Engine: PostgreSQL 15.4
  3. Template: Production
  4. DB instance: db.t4g.large (2 vCPU, 8 GB RAM)
  5. Storage: 100 GB gp3, auto-scaling enabled
  6. Credentials: lohadmin / <SECURE_PASSWORD>
  7. VPC: Same as EC2
  8. Public access: No
  9. Backup: 7 days retention
  10. Create
Wait 10-15 minutes for RDS to be available
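
Rather than refreshing the console, the CLI waiter can block until the instance is ready and then print the endpoint needed in Phase 2. A minimal sketch; `rds_endpoint_when_ready` is a hypothetical helper name.

```shell
# rds_endpoint_when_ready: block until the DB instance is available,
# then print its endpoint hostname.
# $1 = DB instance identifier (loh-game-db in this plan)
rds_endpoint_when_ready() {
  db_id=$1
  # The waiter polls until the instance status is "available"
  aws rds wait db-instance-available --db-instance-identifier "$db_id" || return 1
  aws rds describe-db-instances --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}

# Example: RDS_ENDPOINT=$(rds_endpoint_when_ready loh-game-db)
```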

PHASE 2: DATA MIGRATION (45-60 minutes)

Step 2.1: Upload Backup to S3

# Create S3 bucket
aws s3 mb s3://loh-game-emergency-backups --region us-east-1

# Upload latest database backup (run from whichever machine holds the backups; the primary server may be unreachable)
aws s3 cp /opt/loh-game/backups/postgres/backup_latest.sql.gz \
  s3://loh-game-emergency-backups/db-backup.sql.gz

# Upload game data
aws s3 sync /opt/loh-game/data/game-data \
  s3://loh-game-emergency-backups/game-data/
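
Before uploading, it is worth confirming the dump is intact, since a truncated archive only fails later during the restore. A minimal sketch, assuming GNU coreutils (`sha256sum`) is available; `verify_backup` is a hypothetical helper name.

```shell
# verify_backup: fail if the gzip archive is missing, empty, or corrupt;
# otherwise print its SHA-256 so the copy on EC2 can be compared later.
verify_backup() {
  file=$1
  [ -s "$file" ] || { echo "backup missing or empty: $file" >&2; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "backup is not valid gzip: $file" >&2; return 1; }
  sha256sum "$file" | awk '{print $1}'
}

# Example: verify_backup /opt/loh-game/backups/postgres/backup_latest.sql.gz
```

Running the same function on the EC2 host after the S3 download should print the same checksum.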

Step 2.2: SSH into EC2 and Setup

# SSH into EC2
ssh -i your-emergency-key.pem ubuntu@<EC2_ELASTIC_IP>

# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu
# Log out and back in (or run `newgrp docker`) so the group change takes effect

# Install Docker Compose (release assets use lowercase OS names, e.g. linux-x86_64)
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose

# Install AWS CLI
sudo apt install awscli -y

# Download backups from S3 (needs AWS credentials: an instance role or `aws configure`)
sudo mkdir -p /opt/loh-game/{data,backups}
sudo chown -R ubuntu:ubuntu /opt/loh-game
aws s3 sync s3://loh-game-emergency-backups/game-data/ /opt/loh-game/data/game-data/
aws s3 cp s3://loh-game-emergency-backups/db-backup.sql.gz /opt/loh-game/backups/

Step 2.3: Restore Database to RDS

# Get RDS endpoint from AWS Console (e.g., loh-game-db.xxxxx.us-east-1.rds.amazonaws.com)

# Install PostgreSQL client
sudo apt install postgresql-client -y

# Restore backup: pick ONE of the two options below

# Option 1: the dump creates its own database (e.g. made with pg_dump --create)
gunzip -c /opt/loh-game/backups/db-backup.sql.gz | \
  psql -h <RDS_ENDPOINT> -U lohadmin -d postgres

# Option 2: create the database first, then restore into it
psql -h <RDS_ENDPOINT> -U lohadmin -d postgres -c "CREATE DATABASE loh_production;"
gunzip -c /opt/loh-game/backups/db-backup.sql.gz | \
  psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production
Verify restore:
psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production -c "SELECT COUNT(*) FROM users;"

PHASE 3: DEPLOY GAME SERVER (30-45 minutes)

Step 3.1: Create Environment File

nano /opt/loh-game/.env.production
# Database (RDS)
DATABASE_URL=postgresql://lohadmin:<PASSWORD>@<RDS_ENDPOINT>:5432/loh_production
DB_USER=lohadmin
DB_PASSWORD=<PASSWORD>
DB_NAME=loh_production

# Game Server
GAME_SERVER_URL=wss://game.yourdomain.com/ws
BIND_ADDRESS=0.0.0.0:3000
RUST_LOG=info

# Security (use same secrets from backup)
JWT_SECRET=<FROM_BACKUP>
PASSWORD_SALT=<FROM_BACKUP>

# Monitoring
SENTRY_DSN=<YOUR_SENTRY_DSN>
ENVIRONMENT=production-aws-emergency
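
Under time pressure it is easy to leave a `<PLACEHOLDER>` unfilled in the environment file. A minimal sketch of a check to run before starting the container; `check_env_file` is a hypothetical helper, and it assumes placeholders follow the `<UPPER_CASE>` style used in this plan.

```shell
# check_env_file: print any non-comment lines that still contain a
# <PLACEHOLDER> token and restrict the file to its owner.
# Returns non-zero if placeholders remain or the file is missing.
check_env_file() {
  env_file=$1
  [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
  chmod 600 "$env_file"   # the file holds secrets: owner read/write only
  # Match <UPPER_CASE> tokens that are not preceded by a '#' on the same line
  if grep -n '^[^#]*<[A-Z_][A-Z_]*>' "$env_file"; then
    return 1
  fi
  return 0
}

# Example: check_env_file /opt/loh-game/.env.production
```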

Step 3.2: Pull/Deploy Game Server

# Option A: Pull from Docker Hub (if you pushed images)
docker pull your-dockerhub/loh-game:latest
docker run -d \
  --name loh-game-server \
  --env-file /opt/loh-game/.env.production \
  -p 3000:3000 \
  -v /opt/loh-game/data/game-data:/app/data:ro \
  --restart unless-stopped \
  your-dockerhub/loh-game:latest

# Option B: Build from source (if no registry)
cd /tmp
git clone https://github.com/your-org/loh-game.git
cd loh-game
docker build -f Dockerfile.prod -t loh-game:latest .
docker run -d \
  --name loh-game-server \
  --env-file /opt/loh-game/.env.production \
  -p 3000:3000 \
  -v /opt/loh-game/data/game-data:/app/data:ro \
  --restart unless-stopped \
  loh-game:latest

Step 3.3: Verify Server Running

# Check container
docker ps

# Check logs (Ctrl+C to stop following)
docker logs -f loh-game-server

# Test health endpoint
curl http://localhost:3000/health
Expected: {"status":"healthy"}
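
The server may need a few seconds after `docker run` before the health endpoint answers, so a single curl can fail spuriously. A minimal sketch of a retry loop; `wait_for_health` is a hypothetical helper, and the default attempt count and delay are assumptions.

```shell
# wait_for_health: poll a URL until it returns a successful HTTP status
# or the attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 2)
wait_for_health() {
  url=$1; attempts=${2:-30}; delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: fail on HTTP errors, -s: silent, --max-time: never hang on one request
    if curl -fs --max-time 5 "$url" >/dev/null; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts" >&2
  return 1
}

# Example: wait_for_health http://localhost:3000/health
```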

PHASE 4: TRAFFIC ROUTING (30 minutes)

Option A: Update Cloudflare Tunnel (Fastest - 10 min)

# On EC2, install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o cloudflared
sudo mv cloudflared /usr/local/bin/
sudo chmod +x /usr/local/bin/cloudflared

# Copy tunnel credentials from backup (or from the old server, if it is still reachable)
scp your-old-server:/home/user/.cloudflared/<TUNNEL-ID>.json .
mkdir -p ~/.cloudflared
mv <TUNNEL-ID>.json ~/.cloudflared/

# Create config
sudo mkdir -p /etc/cloudflared
sudo nano /etc/cloudflared/config.yml
tunnel: <TUNNEL-ID>
credentials-file: /home/ubuntu/.cloudflared/<TUNNEL-ID>.json

ingress:
  - hostname: game.yourdomain.com
    service: ws://localhost:3000
    originRequest:
      noTLSVerify: true
  - service: http_status:404
# Run tunnel
cloudflared tunnel --config /etc/cloudflared/config.yml run <TUNNEL-NAME>

# Or as service
sudo cloudflared service install
sudo systemctl start cloudflared
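No DNS change is needed: the existing CNAME targets the tunnel (<TUNNEL-ID>.cfargotunnel.com), so traffic follows the tunnel to the new host once cloudflared connects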

Option B: Direct DNS Update (If Cloudflare Tunnel lost - 20 min)

# Update Cloudflare DNS to point to EC2 Elastic IP
# Via Cloudflare Dashboard:
1. DNS → Records → Edit "game" record
2. Change from CNAME to A record
3. Point to EC2 Elastic IP
4. Proxy: ON (orange cloud)
5. Save

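# With the proxy on, Cloudflare reaches the origin on port 80/443, not 3000:
# publish the container on one of those ports (e.g. -p 80:3000) or put a reverse proxy in front.

# Wait for DNS propagation (1-5 minutes)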

Option C: AWS ALB (Most robust - 30 min)

# Create Application Load Balancer
aws elbv2 create-load-balancer \
  --name loh-game-alb \
  --subnets subnet-xxxxx subnet-yyyyy \
  --security-groups sg-alb \
  --scheme internet-facing \
  --type application

# Create target group
aws elbv2 create-target-group \
  --name loh-game-targets \
  --protocol HTTP \
  --port 3000 \
  --vpc-id vpc-xxxxx \
  --health-check-path /health

# Register EC2 instance
aws elbv2 register-targets \
  --target-group-arn <TARGET_GROUP_ARN> \
  --targets Id=<EC2_INSTANCE_ID>

# Create listener (WebSocket support)
aws elbv2 create-listener \
  --load-balancer-arn <ALB_ARN> \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=<ACM_CERT_ARN> \
  --default-actions Type=forward,TargetGroupArn=<TARGET_GROUP_ARN>

# Update Cloudflare: point the "game" record (CNAME) at the ALB DNS name
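
Before switching DNS to the ALB, confirm the registered instance is passing the /health check, otherwise the ALB returns errors to players. A minimal sketch; `target_is_healthy` is a hypothetical helper name.

```shell
# target_is_healthy: succeed only if at least one target is registered
# and every registered target reports the state "healthy".
# $1 = target group ARN
target_is_healthy() {
  states=$(aws elbv2 describe-target-health --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].TargetHealth.State' --output text) || return 1
  [ -n "$states" ] || { echo "no targets registered" >&2; return 1; }
  for s in $states; do
    [ "$s" = "healthy" ] || { echo "target state: $s" >&2; return 1; }
  done
  return 0
}

# Example: target_is_healthy <TARGET_GROUP_ARN> && echo "safe to switch DNS"
```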

PHASE 5: VERIFICATION (15 minutes)

Test Connection

# From local machine (wscat is an npm package: npm install -g wscat)
wscat -c wss://game.yourdomain.com/ws

# Expected: Connected

Monitor Metrics

# On EC2
docker stats

# Check logs (Ctrl+C to stop following)
docker logs -f loh-game-server

# Database connections
psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production -c "SELECT count(*) FROM pg_stat_activity;"

Announce to Users

🚨 Emergency maintenance complete!
Server migrated to new infrastructure.
All data preserved. Please reconnect.

💰 EMERGENCY COST ESTIMATE

Monthly Costs (AWS):

Service          | Spec              | Monthly Cost
EC2 c6i.2xlarge  | 8 vCPU, 16 GB RAM | $122 (on-demand)
RDS db.t4g.large | 2 vCPU, 8 GB RAM  | $73 (on-demand)
EBS gp3 Storage  | 100 GB            | $8
RDS Storage      | 100 GB gp3        | $12
Data Transfer    | ~500 GB/month     | $45
S3 Backups       | 50 GB             | $1
CloudWatch       | Basic monitoring  | $10
ALB (if used)    | Load balancer     | $22
Reserve capacity | 10% buffer        | $29
TOTAL            |                   | $322/month
Cost Optimization (apply after emergency):
  • Reserved Instances: Save ~40% on the ~$195/month EC2 + RDS compute (about $78/month)
  • Savings Plans: Save 30-40%
  • Right-size instances: db.t4g.medium ($36 vs $73)
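
The total and the Reserved Instance saving can be re-derived when any line item changes. A minimal sketch using the figures from the table above; the 40% discount is the plan's own estimate, not a quoted AWS rate.

```shell
# Line items from the cost table, in USD per month (buffer excluded)
items="122 73 8 12 45 1 10 22"
subtotal=0
for cost in $items; do
  subtotal=$((subtotal + cost))
done
buffer=$((subtotal / 10))            # 10% buffer, rounded down to whole dollars
total=$((subtotal + buffer))
compute=$((122 + 73))                # EC2 + RDS on-demand
ri_savings=$((compute * 40 / 100))   # assumed 40% Reserved Instance discount
echo "subtotal=$subtotal buffer=$buffer total=$total ri_savings=$ri_savings"
# prints: subtotal=293 buffer=29 total=322 ri_savings=78
```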

🔄 ROLLBACK PROCEDURE

If AWS migration fails or old server recovers:
# 1. Update DNS back to old server
# Via Cloudflare: Point "game" record to old server IP

# 2. Stop AWS services
aws ec2 stop-instances --instance-ids <INSTANCE_ID>
aws rds stop-db-instance --db-instance-identifier loh-game-db

# 3. Verify old server operational
curl http://old-server-ip:3000/health

# 4. Announce rollback to users
# 5. Keep AWS resources for 48h before termination (in case of issues)

🔄 ALTERNATIVE CLOUD PROVIDERS

DigitalOcean (Simpler than AWS)

  • Droplet: CPU-Optimized 16GB ($144/month)
  • Managed PostgreSQL: 4GB ($60/month)
  • Total: ~$204/month
  • Migration time: 2 hours (simpler than AWS)

Google Cloud Platform

  • Compute Engine: n2-standard-8 ($194/month)
  • Cloud SQL PostgreSQL: db-n1-standard-2 ($94/month)
  • Total: ~$288/month
  • Migration time: 2.5 hours

Hetzner Cloud (Cheapest)

  • CX51: 16 vCPU, 32 GB RAM (€44/month ~$48)
  • Postgres: Managed DB (€20/month ~$22)
  • Total: ~$70/month
  • Migration time: 1.5 hours (if already familiar)
Recommendation: DigitalOcean for emergencies (simplest) or Hetzner if cost-critical.

📋 POST-MIGRATION CHECKLIST

Within 24 hours:
  • Set up automated backups (RDS snapshots)
  • Configure CloudWatch alarms
  • Update monitoring (Sentry, Grafana)
  • Test backup/restore procedure
  • Document new infrastructure
  • Update .env.production in git (encrypted)
Within 1 week:
  • Apply cost optimizations (Reserved Instances)
  • Set up multi-AZ for RDS (high availability)
  • Configure Auto Scaling (if needed)
  • Conduct post-mortem on outage
  • Update disaster recovery plan

🛡️ PREVENTION (Avoid Future Emergencies)

  1. Automated Backups
    • Database: Daily to S3
    • Docker images: Push to registry daily
    • Config files: Git repository
  2. Monitoring & Alerts
    • Uptime monitoring (UptimeRobot, Pingdom)
    • Server health (CPU, RAM, disk)
    • Alert after 5 minutes of downtime
  3. Redundancy
    • Keep AWS account ready (don't wait for emergency)
    • Test migration quarterly
    • Document all credentials securely (1Password, Vault)
  4. Disaster Recovery Testing
    • Quarterly: Full migration drill
    • Monthly: Backup restore test
    • Weekly: Verify backups exist
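
The weekly "verify backups exist" item is easy to automate from cron. A minimal sketch, assuming a `find` that supports `-mmin` (GNU and BSD both do); `backup_is_fresh` is a hypothetical helper, and the 24-hour default mirrors the daily backup schedule in this plan.

```shell
# backup_is_fresh: succeed if the file exists and was modified
# within the last N minutes.
# $1 = backup file, $2 = max age in minutes (default 1440 = 24 hours)
backup_is_fresh() {
  file=$1; max_age=${2:-1440}
  [ -f "$file" ] || return 1
  # find prints the path only when its mtime is newer than max_age minutes
  [ -n "$(find "$file" -mmin "-$max_age" -print)" ]
}

# Example:
# backup_is_fresh /opt/loh-game/backups/postgres/backup_latest.sql.gz \
#   || echo "ALERT: database backup is stale or missing"
```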

📞 EMERGENCY CONTACTS

Save these NOW:
  • AWS Support: +1-877-742-2911 (24/7 for Premium Support)
  • Cloudflare Support: Dashboard → Support → Create ticket
  • Your team leads: [FILL IN]
  • Database expert: [FILL IN]

SUCCESS CRITERIA

Emergency migration complete when:
  • Game server responding to WebSocket connections
  • Database restored and accessible
  • DNS pointing to new infrastructure
  • Players can connect and play
  • Data verified (spot check user accounts)
  • Monitoring operational
  • Backups configured
  • Team notified
Expected Downtime: 2-4 hours (depending on backup size and preparation)

Document Version: 1.0
Last Tested: Never (test quarterly!)
Next Test Date: [SCHEDULE THIS]
CRITICAL: Test this plan quarterly. An untested disaster recovery plan is worthless.