Emergency Migration Plan - AWS

For: Legends of Himavat - Disaster Recovery
Scenario: Primary server failure, need to migrate to AWS immediately
Target: Restore service within 2-4 hours
Last Updated: 2026-01-05

🚨 WHEN TO USE THIS PLAN

Emergency Scenarios:

✅ Primary server hardware failure
✅ Hosting provider outage (Hetzner/OVH down)
✅ DDoS attack overwhelming current infrastructure
✅ Data center network failure
✅ Security breach requiring immediate isolation

This is NOT for:

❌ Planned migrations (use migration guide instead)
❌ Performance optimization
❌ Cost reduction

Time to Recovery: 2-4 hours (assuming backups available)

📋 PRE-REQUISITES (PREPARE NOW)

Critical Items to Have Ready:

Latest Backups (automated daily)
- PostgreSQL backup (last 24h)
- Game data files backup
- Docker images backup/registry
- Environment variables documented
AWS Account
- AWS account created
- IAM user with admin permissions
- Payment method configured
- Service limits checked (EC2, RDS)
Access Credentials
- AWS Access Key ID
- AWS Secret Access Key
- Domain registrar access
- Cloudflare account access
Documentation
- Current .env.production file backed up
- docker-compose.yml backed up
- Cloudflare tunnel credentials
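
Part of the checklist above can be verified mechanically. The following is a minimal sketch, assuming the backup and config copies are kept together in one local directory; the function name check_prereqs and that directory layout are hypothetical, while the file names come from this guide.

```shell
# Hypothetical preflight check: confirms that the files named in the
# checklist exist and are non-empty in a directory of your choosing.
check_prereqs() {
  dir="$1"
  missing=0
  for f in backup_latest.sql.gz .env.production docker-compose.yml; do
    if [ ! -s "$dir/$f" ]; then
      echo "MISSING: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  # The AWS CLI must be available on the machine that drives the migration.
  if ! command -v aws >/dev/null 2>&1; then
    echo "MISSING: aws CLI"
    missing=$((missing + 1))
  fi
  [ "$missing" -eq 0 ]
}

# Usage: check_prereqs /path/to/emergency-kit && echo "ready"
```

Running this from a weekly scheduled job would surface a missing item before an emergency does.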

🎯 SERVICES MAPPING

Current Stack	AWS Equivalent	Purpose
Hetzner AX52 (8 cores, 32GB)	EC2 c6i.2xlarge (8 vCPU, 16GB)	Game server
PostgreSQL (Docker)	RDS PostgreSQL 15 (db.t4g.large)	Database
Cloudflare Tunnel	Application Load Balancer + EC2	Traffic routing
Local backups	S3 + RDS Snapshots	Backup storage
Docker Compose	ECS Fargate OR EC2 + Docker	Container orchestration

Cost Estimate: ~$300-400/month (vs $280 current)

⚡ EMERGENCY DEPLOYMENT (2-4 HOURS)

PHASE 1: AWS FOUNDATION (30 minutes)

Step 1.1: Launch EC2 Instance

# Via AWS Console or CLI
# AMI IDs are region-specific: confirm this ID resolves to Ubuntu 22.04 LTS in your region
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type c6i.2xlarge \
  --key-name your-emergency-key \
  --security-group-ids sg-emergency \
  --subnet-id subnet-your-subnet \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=loh-game-emergency}]'

Manual via Console:

EC2 → Launch Instance
AMI: Ubuntu Server 22.04 LTS
Instance type: c6i.2xlarge (8 vCPU, 16 GiB RAM)
Storage: 100 GB gp3 SSD
Security Group: Create new:
- SSH (22) from your IP
- HTTP (80) from 0.0.0.0/0
- HTTPS (443) from 0.0.0.0/0
- Custom TCP (3000) from 0.0.0.0/0
Launch
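
The security group from the console steps above could also be created from the CLI. This is a sketch under assumptions: the group name loh-game-emergency and the function name are made up for illustration, and the VPC ID and your public IP must be supplied by you.

```shell
# Sketch of the console security-group steps as CLI calls.
# Arguments: VPC ID, operator's public IPv4 address (without /32).
create_emergency_sg() {
  vpc_id="$1"
  my_ip="$2"
  sg_id=$(aws ec2 create-security-group \
    --group-name loh-game-emergency \
    --description "Emergency game server access" \
    --vpc-id "$vpc_id" \
    --query GroupId --output text) || return 1
  # SSH only from the operator's address.
  aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --protocol tcp --port 22 --cidr "$my_ip/32" >/dev/null || return 1
  # Public ports listed in this guide.
  for port in 80 443 3000; do
    aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
      --protocol tcp --port "$port" --cidr 0.0.0.0/0 >/dev/null || return 1
  done
  echo "$sg_id"
}

# Usage: SG_ID=$(create_emergency_sg vpc-xxxxx 203.0.113.7)
```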

Elastic IP: Allocate one, associate it with the instance, and note the address for DNS
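
For the CLI path, allocating and attaching the Elastic IP might look like the sketch below. The function name is hypothetical; the instance ID is whatever run-instances returned.

```shell
# Allocates an Elastic IP, attaches it to the instance, and prints
# the public address to use in DNS.
attach_elastic_ip() {
  instance_id="$1"
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query AllocationId --output text) || return 1
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id" >/dev/null || return 1
  aws ec2 describe-addresses --allocation-ids "$alloc_id" \
    --query 'Addresses[0].PublicIp' --output text
}

# Usage: EC2_ELASTIC_IP=$(attach_elastic_ip i-xxxxxxxx)
```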

Step 1.2: Launch RDS Instance

# Via CLI
aws rds create-db-instance \
  --db-instance-identifier loh-game-db \
  --db-instance-class db.t4g.large \
  --engine postgres \
  --engine-version 15.4 \
  --master-username lohadmin \
  --master-user-password <SECURE_PASSWORD> \
  --allocated-storage 100 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-rds \
  --backup-retention-period 7 \
  --preferred-backup-window "03:00-04:00" \
  --no-publicly-accessible

Manual via Console:

RDS → Create Database
Engine: PostgreSQL 15.4
Template: Production
DB instance: db.t4g.large (2 vCPU, 8 GB RAM)
Storage: 100 GB gp3, auto-scaling enabled
Credentials: lohadmin / <SECURE_PASSWORD>
VPC: Same as EC2
Public access: No
Backup: 7 days retention
Create

Wait 10-15 minutes for RDS to be available
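
Rather than polling the console, the wait can be scripted with the CLI's built-in waiter, which also lets you capture the endpoint for later steps. A minimal sketch; the function name is an assumption, the instance identifier is the one used above.

```shell
# Blocks until the RDS instance is available, then prints its endpoint.
wait_for_rds_endpoint() {
  db_id="${1:-loh-game-db}"
  aws rds wait db-instance-available \
    --db-instance-identifier "$db_id" || return 1
  aws rds describe-db-instances --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}

# Usage: RDS_ENDPOINT=$(wait_for_rds_endpoint loh-game-db)
```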

PHASE 2: DATA MIGRATION (45-60 minutes)

Step 2.1: Upload Backup to S3

# Create S3 bucket
aws s3 mb s3://loh-game-emergency-backups --region us-east-1

# Upload latest database backup (run from a machine that still holds the backups; the failed primary may be unreachable)
aws s3 cp /opt/loh-game/backups/postgres/backup_latest.sql.gz \
  s3://loh-game-emergency-backups/db-backup.sql.gz

# Upload game data
aws s3 sync /opt/loh-game/data/game-data \
  s3://loh-game-emergency-backups/game-data/
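
Uploading a corrupt dump wastes the most time later in the restore, so it may be worth checking the archive first. A sketch, assuming a Linux host with gzip and sha256sum; the function name is hypothetical.

```shell
# Verifies that a gzip backup is present and intact, then prints a
# checksum that can be compared against the copy downloaded on EC2.
verify_backup() {
  file="$1"
  [ -s "$file" ] || { echo "empty or missing: $file"; return 1; }
  gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file"; return 1; }
  sha256sum "$file"
}

# Usage: verify_backup /opt/loh-game/backups/postgres/backup_latest.sql.gz
```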

Step 2.2: SSH into EC2 and Setup

# SSH into EC2
ssh -i your-emergency-key.pem ubuntu@<EC2_ELASTIC_IP>

# Update system
sudo apt update && sudo apt upgrade -y

# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
sudo usermod -aG docker ubuntu

# Docker Compose: the get.docker.com script installs the Compose plugin (run as "docker compose")
# Confirm it is present; if not, install it from the Docker apt repository the script configured
docker compose version || sudo apt install -y docker-compose-plugin

# Install AWS CLI
sudo apt install awscli -y

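# Download backups from S3
# (the instance needs AWS credentials: run "aws configure" or attach an IAM role with read access to the bucket)
sudo mkdir -p /opt/loh-game/data /opt/loh-game/backups
sudo chown -R ubuntu:ubuntu /opt/loh-game
aws s3 sync s3://loh-game-emergency-backups/game-data/ /opt/loh-game/data/game-data/
aws s3 cp s3://loh-game-emergency-backups/db-backup.sql.gz /opt/loh-game/backups/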

Step 2.3: Restore Database to RDS

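# Get RDS endpoint from AWS Console (e.g., loh-game-db.xxxxx.us-east-1.rds.amazonaws.com)
# The RDS security group (sg-rds) must allow inbound PostgreSQL (5432) from the EC2 instance's security group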

# Install PostgreSQL client
sudo apt install postgresql-client -y

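# Restore backup: use ONE of the two options below, not both

# Option 1: the dump creates its own database (e.g. it was made with pg_dump --create or pg_dumpall)
gunzip -c /opt/loh-game/backups/db-backup.sql.gz | \
  psql -h <RDS_ENDPOINT> -U lohadmin -d postgres

# Option 2: the dump holds only one database's contents; create the database first, then restore into it
psql -h <RDS_ENDPOINT> -U lohadmin -d postgres -c "CREATE DATABASE loh_production;"
gunzip -c /opt/loh-game/backups/db-backup.sql.gz | \
  psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production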

Verify restore:

psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production -c "SELECT COUNT(*) FROM users;"
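
Each psql call above prompts for the password. PostgreSQL clients can read it from a password file instead, in the format hostname:port:database:username:password, provided the file is readable only by its owner. A sketch with a hypothetical function name; it assumes the password contains no ":" or "\" characters, which would need escaping.

```shell
# Appends an entry for the RDS endpoint to a PostgreSQL password file
# and restricts its permissions, as libpq requires.
write_pgpass() {
  host="$1"
  password="$2"
  file="${3:-$HOME/.pgpass}"
  # "*" in the database field matches any database on that host.
  printf '%s:5432:*:lohadmin:%s\n' "$host" "$password" >> "$file"
  chmod 600 "$file"
}

# Usage: write_pgpass "$RDS_ENDPOINT" "$DB_PASSWORD"
```

Remove the entry once the emergency is over if the instance is shared with other operators.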

PHASE 3: DEPLOY GAME SERVER (30-45 minutes)

Step 3.1: Create Environment File

nano /opt/loh-game/.env.production

# Database (RDS)
DATABASE_URL=postgresql://lohadmin:<PASSWORD>@<RDS_ENDPOINT>:5432/loh_production
DB_USER=lohadmin
DB_PASSWORD=<PASSWORD>
DB_NAME=loh_production

# Game Server
GAME_SERVER_URL=wss://game.yourdomain.com/ws
BIND_ADDRESS=0.0.0.0:3000
RUST_LOG=info

# Security (use same secrets from backup)
JWT_SECRET=<FROM_BACKUP>
PASSWORD_SALT=<FROM_BACKUP>

# Monitoring
SENTRY_DSN=<YOUR_SENTRY_DSN>
ENVIRONMENT=production-aws-emergency
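
A placeholder left in this file is an easy mistake under time pressure. The check below is a sketch, not part of the original procedure: the function name and the choice of required keys are assumptions based on the variables listed above.

```shell
# Fails if the env file still contains <PLACEHOLDER> values or lacks a
# required key; on success, restricts the file to its owner.
check_env_file() {
  file="$1"
  # Print any line that still holds an unfilled <UPPER_CASE> placeholder.
  if grep -n '<[A-Z_]*>' "$file"; then
    echo "unfilled placeholders in $file"
    return 1
  fi
  for key in DATABASE_URL JWT_SECRET PASSWORD_SALT BIND_ADDRESS; do
    grep -q "^$key=." "$file" || { echo "missing value: $key"; return 1; }
  done
  chmod 600 "$file"
}

# Usage: check_env_file /opt/loh-game/.env.production
```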

Step 3.2: Pull/Deploy Game Server

# Option A: Pull from Docker Hub (if you pushed images)
docker pull your-dockerhub/loh-game:latest
docker run -d \
  --name loh-game-server \
  --env-file /opt/loh-game/.env.production \
  -p 3000:3000 \
  -v /opt/loh-game/data/game-data:/app/data:ro \
  --restart unless-stopped \
  your-dockerhub/loh-game:latest

# Option B: Build from source (if no registry)
cd /tmp
git clone https://github.com/your-org/loh-game.git
cd loh-game
docker build -f Dockerfile.prod -t loh-game:latest .
docker run -d \
  --name loh-game-server \
  --env-file /opt/loh-game/.env.production \
  -p 3000:3000 \
  -v /opt/loh-game/data/game-data:/app/data:ro \
  --restart unless-stopped \
  loh-game:latest

Step 3.3: Verify Server Running

# Check container
docker ps

# Check logs
docker logs -f loh-game-server

# Test health endpoint
curl http://localhost:3000/health

Expected: {"status":"healthy"}
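
The server may need some seconds after start before it answers, so a single curl can report a false failure. A small retry loop is sketched below; the function name and defaults are assumptions, and the match string assumes the response body is exactly the JSON shown above, without extra whitespace.

```shell
# Polls the health endpoint until it reports healthy or attempts run out.
wait_for_health() {
  url="${1:-http://localhost:3000/health}"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl fail on HTTP errors; -s hides progress output.
    if curl -fs "$url" | grep -q '"status":"healthy"'; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not healthy after $attempts attempts"
  return 1
}

# Usage: wait_for_health http://localhost:3000/health 30 2
```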

PHASE 4: TRAFFIC ROUTING (30 minutes)

Option A: Update Cloudflare Tunnel (Fastest - 10 min)

# On EC2, install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o cloudflared
sudo mv cloudflared /usr/local/bin/
sudo chmod +x /usr/local/bin/cloudflared

# Copy tunnel credentials (from the old server if it is still reachable, otherwise from your backup copy)
scp your-old-server:/home/user/.cloudflared/<TUNNEL-ID>.json .
mkdir -p ~/.cloudflared
mv <TUNNEL-ID>.json ~/.cloudflared/

# Create config
sudo mkdir -p /etc/cloudflared
sudo nano /etc/cloudflared/config.yml

tunnel: <TUNNEL-ID>
credentials-file: /home/ubuntu/.cloudflared/<TUNNEL-ID>.json

ingress:
  - hostname: game.yourdomain.com
    service: ws://localhost:3000
    originRequest:
      noTLSVerify: true
  - service: http_status:404

# Run tunnel
cloudflared tunnel --config /etc/cloudflared/config.yml run <TUNNEL-NAME>

# Or as service
sudo cloudflared service install
sudo systemctl start cloudflared

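No DNS change is needed: the hostname's existing CNAME already points at the tunnel, and the tunnel now terminates on EC2. If cloudflared is still running on the old server, stop it so that only the new host serves the tunnel.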
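
Before starting the tunnel it can help to let cloudflared check the config file. The sketch below wraps its ingress validate and ingress rule subcommands; the function name is hypothetical, and passing --config ahead of the subcommand is assumed to select the file written above.

```shell
# Validates the ingress rules and shows which rule matches the game hostname.
check_tunnel_config() {
  config="${1:-/etc/cloudflared/config.yml}"
  hostname="${2:-game.yourdomain.com}"
  cloudflared tunnel --config "$config" ingress validate || return 1
  cloudflared tunnel --config "$config" ingress rule "https://$hostname"
}

# Usage: check_tunnel_config /etc/cloudflared/config.yml game.yourdomain.com
```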

Option B: Direct DNS Update (If Cloudflare Tunnel lost - 20 min)

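# Update Cloudflare DNS to point to EC2 Elastic IP
# Via Cloudflare Dashboard:
1. DNS → Records → Edit "game" record
2. Change from CNAME to A record
3. Point to EC2 Elastic IP
4. Proxy: ON (orange cloud)
5. Save

# With the proxy on, Cloudflare reaches the origin on the port the client used (443 for wss://, or 80 in Flexible SSL mode), not 3000.
# Make the game server reachable on that port, e.g. publish the container with -p 80:3000 or put a reverse proxy in front of it.

# Wait for DNS propagation (1-5 minutes)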

Option C: AWS ALB (Most robust - 30 min)

# Create Application Load Balancer
aws elbv2 create-load-balancer \
  --name loh-game-alb \
  --subnets subnet-xxxxx subnet-yyyyy \
  --security-groups sg-alb \
  --scheme internet-facing \
  --type application

# Create target group
aws elbv2 create-target-group \
  --name loh-game-targets \
  --protocol HTTP \
  --port 3000 \
  --vpc-id vpc-xxxxx \
  --health-check-path /health

# Register EC2 instance
aws elbv2 register-targets \
  --target-group-arn <TARGET_GROUP_ARN> \
  --targets Id=<EC2_INSTANCE_ID>

# Create listener (WebSocket support)
aws elbv2 create-listener \
  --load-balancer-arn <ALB_ARN> \
  --protocol HTTPS \
  --port 443 \
  --certificates CertificateArn=<ACM_CERT_ARN> \
  --default-actions Type=forward,TargetGroupArn=<TARGET_GROUP_ARN>

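# Update Cloudflare: point the "game" record (CNAME) at the ALB DNS name
# sg-alb must allow inbound 443, and the EC2 security group must allow port 3000 from sg-alb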
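
Two follow-up steps are easy to miss with an ALB in front of WebSockets: the load balancer closes connections that stay idle past its timeout (60 seconds by default), and the target must pass health checks before traffic flows. A sketch with a hypothetical function name; the 3600-second value is an assumption to tune for your game's heartbeat interval.

```shell
# Raises the ALB idle timeout, waits for the target to become healthy,
# and prints the ALB DNS name to use in the Cloudflare record.
finish_alb_setup() {
  alb_arn="$1"
  tg_arn="$2"
  aws elbv2 modify-load-balancer-attributes --load-balancer-arn "$alb_arn" \
    --attributes Key=idle_timeout.timeout_seconds,Value=3600 >/dev/null || return 1
  aws elbv2 wait target-in-service --target-group-arn "$tg_arn" || return 1
  aws elbv2 describe-load-balancers --load-balancer-arns "$alb_arn" \
    --query 'LoadBalancers[0].DNSName' --output text
}

# Usage: ALB_DNS=$(finish_alb_setup "$ALB_ARN" "$TARGET_GROUP_ARN")
```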

PHASE 5: VERIFICATION (15 minutes)

Test Connection

# From local machine
wscat -c wss://game.yourdomain.com/ws

# Expected: Connected

Monitor Metrics

# On EC2
docker stats

# Check logs
docker logs -f loh-game-server

# Database connections
psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production -c "SELECT count(*) FROM pg_stat_activity;"

Announce to Users

🚨 Emergency maintenance complete!
Server migrated to new infrastructure.
All data preserved. Please reconnect.

💰 EMERGENCY COST ESTIMATE

Monthly Costs (AWS, estimates; confirm current on-demand rates for your region in the AWS Pricing Calculator):

Service	Spec	Monthly Cost
EC2 c6i.2xlarge	8 vCPU, 16 GB RAM	$122 (on-demand)
RDS db.t4g.large	2 vCPU, 8 GB RAM	$73 (on-demand)
EBS gp3 Storage	100 GB	$8
RDS Storage	100 GB gp3	$12
Data Transfer	~500 GB/month	$45
S3 Backups	50 GB	$1
CloudWatch	Basic monitoring	$10
ALB (if used)	Load balancer	$22
Reserve capacity	10% buffer	$29
TOTAL		$322/month

Cost Optimization (apply after emergency):

Reserved Instances: Save about 40% (total drops to roughly $195/month)
Savings Plans: Save 30-40%
Right-size instances: db.t4g.medium ($36 vs $73)
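
The arithmetic behind the table total and the Reserved Instances estimate can be re-run when any line item changes. This sketch uses the guide's own figures; the function names are hypothetical, and applying one flat discount to the whole bill is a simplification, since reservations only reduce compute and database charges.

```shell
# Sums the monthly line items from the cost table (USD).
monthly_total() {
  printf '%s\n' 122 73 8 12 45 1 10 22 29 |
    awk '{ sum += $1 } END { print sum }'
}

# Applies a percentage discount to a total and rounds to whole dollars.
reserved_estimate() {
  awk -v total="$1" -v pct="$2" \
    'BEGIN { printf "%.0f\n", total * (100 - pct) / 100 }'
}

# Usage: reserved_estimate "$(monthly_total)" 40
```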

🔄 ROLLBACK PROCEDURE

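If AWS migration fails or old server recovers (if players were active on AWS in the meantime, copy the RDS data back to the old server first, or that progress is lost):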

# 1. Update DNS back to old server
# Via Cloudflare: Point "game" record to old server IP

# 2. Stop AWS services
aws ec2 stop-instances --instance-ids <INSTANCE_ID>
aws rds stop-db-instance --db-instance-identifier loh-game-db

# 3. Verify old server operational
curl http://old-server-ip:3000/health

# 4. Announce rollback to users
# 5. Keep AWS resources for 48h before termination (in case of issues)
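
If players wrote data while the game ran on AWS, that data needs to leave RDS before it is stopped. One way to capture it is sketched below, assuming pg_dump from the postgresql-client package installed earlier; the function name and output path are hypothetical.

```shell
# Dumps the production database from RDS into a gzip file.
# A temporary file is used so a failed pg_dump is not masked by the pipe.
dump_rds_for_rollback() {
  host="$1"
  out="$2"
  pg_dump -h "$host" -U lohadmin -d loh_production --no-owner \
    > "$out.tmp" || { rm -f "$out.tmp"; return 1; }
  gzip -c "$out.tmp" > "$out" && rm -f "$out.tmp"
}

# Usage: dump_rds_for_rollback "$RDS_ENDPOINT" /opt/loh-game/backups/rollback.sql.gz
```

The resulting file can then be restored on the old server with the same gunzip and psql pipeline used in Step 2.3.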

🔥 ALTERNATIVE CLOUD PROVIDERS

DigitalOcean (Simpler than AWS)

Droplet: CPU-Optimized 16GB ($144/month)
Managed PostgreSQL: 4GB ($60/month)
Total: ~$204/month
Migration time: 2 hours (simpler than AWS)

Google Cloud Platform

Compute Engine: n2-standard-8 ($194/month)
Cloud SQL PostgreSQL: db-n1-standard-2 ($94/month)
Total: ~$288/month
Migration time: 2.5 hours

Hetzner Cloud (Cheapest)

CX51: 8 vCPU, 32 GB RAM (€44/month ~$48)
Postgres: Managed DB (€20/month ~$22)
Total: ~$70/month
Migration time: 1.5 hours (if already familiar)

Recommendation: DigitalOcean for emergencies (simplest) or Hetzner if cost-critical.

📋 POST-MIGRATION CHECKLIST

Within 24 hours:

Set up automated backups (RDS snapshots)
Configure CloudWatch alarms
Update monitoring (Sentry, Grafana)
Test backup/restore procedure
Document new infrastructure
Update .env.production in git (encrypted)

Within 1 week:

Apply cost optimizations (Reserved Instances)
Set up multi-AZ for RDS (high availability)
Configure Auto Scaling (if needed)
Conduct post-mortem on outage
Update disaster recovery plan

🛡️ PREVENTION (Avoid Future Emergencies)

Automated Backups
- Database: Daily to S3
- Docker images: Push to registry daily
- Config files: Git repository
Monitoring & Alerts
- Uptime monitoring (UptimeRobot, Pingdom)
- Server health (CPU, RAM, disk)
- Alert on 5min downtime
Redundancy
- Keep AWS account ready (don't wait for emergency)
- Test migration quarterly
- Document all credentials securely (1Password, Vault)
Disaster Recovery Testing
- Quarterly: Full migration drill
- Monthly: Backup restore test
- Weekly: Verify backups exist
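
The weekly "verify backups exist" item can be reduced to a freshness check suitable for cron. A minimal sketch; the function name is hypothetical and the 24-hour default mirrors the daily backup schedule stated in the pre-requisites.

```shell
# Succeeds only if the file exists and was modified within the window
# (in minutes; default 1440 = 24 hours).
backup_is_fresh() {
  file="$1"
  max_age_min="${2:-1440}"
  [ -f "$file" ] || return 1
  # find prints the path only when the modification time is recent enough.
  [ -n "$(find "$file" -mmin "-$max_age_min")" ]
}

# Usage: backup_is_fresh /opt/loh-game/backups/postgres/backup_latest.sql.gz \
#          || echo "ALERT: database backup is stale or missing"
```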

📞 EMERGENCY CONTACTS

Save these NOW:

AWS Support: +1-877-742-2911 (24/7 for Premium Support)
Cloudflare Support: Dashboard → Support → Create ticket
Your team leads: [FILL IN]
Database expert: [FILL IN]

SUCCESS CRITERIA

Emergency migration complete when:

Game server responding to WebSocket connections
Database restored and accessible
DNS pointing to new infrastructure
Players can connect and play
Data verified (spot check user accounts)
Monitoring operational
Backups configured
Team notified

Expected Downtime: 2-4 hours (depending on backup size and preparation)

Document Version: 1.0
Last Tested: Never (test quarterly!)
Next Test Date: [SCHEDULE THIS]

CRITICAL: Test this plan quarterly. An untested disaster recovery plan is worthless.