Emergency Migration Plan - AWS
For: Legends of Himavat - Disaster Recovery
Scenario: Primary server failure, need to migrate to AWS immediately
Target: Restore service within 2-4 hours
Last Updated: 2026-01-05
WHEN TO USE THIS PLAN
Emergency Scenarios:
- Primary server hardware failure
- Hosting provider outage (Hetzner/OVH down)
- DDoS attack overwhelming current infrastructure
- Data center network failure
- Security breach requiring immediate isolation
This is NOT for:
- Planned migrations (use migration guide instead)
- Performance optimization
- Cost reduction
Time to Recovery: 2-4 hours (assuming backups available)
PRE-REQUISITES (PREPARE NOW)
Critical Items to Have Ready:
- Latest Backups (automated daily)
  - PostgreSQL backup (last 24h)
  - Game data files backup
  - Docker images backup/registry
  - Environment variables documented
- AWS Account
  - AWS account created
  - IAM user with admin permissions
  - Payment method configured
  - Service limits checked (EC2, RDS)
- Access Credentials
  - AWS Access Key ID
  - AWS Secret Access Key
  - Domain registrar access
  - Cloudflare account access
- Documentation
  - Current .env.production file backed up
  - Current docker-compose.yml backed up
  - Cloudflare tunnel credentials
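The "last 24h" requirement in the list above can be checked mechanically before an emergency ever happens. The following is a minimal sketch, not part of the original plan: the function name `backup_is_fresh` is hypothetical, and the 24-hour default simply mirrors the checklist.

```shell
#!/usr/bin/env bash
# Preflight check (illustrative): is a backup file present, non-empty,
# and modified within the last N hours?
# Usage: backup_is_fresh <file> [max_age_hours]   (default: 24)
backup_is_fresh() {
  local file="$1"
  local max_age_hours="${2:-24}"
  # A missing or zero-byte file is never an acceptable backup.
  [ -s "$file" ] || return 1
  # find prints the path only if it was modified inside the time window.
  [ -n "$(find "$file" -mmin "-$((max_age_hours * 60))" -print 2>/dev/null)" ]
}
```

For example, `backup_is_fresh /opt/loh-game/backups/postgres/backup_latest.sql.gz || echo "STALE BACKUP"` could run from cron on whichever host stores the backups.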
SERVICES MAPPING
Cost Estimate: ~$300-400/month (vs $280 current)
EMERGENCY DEPLOYMENT (2-4 HOURS)
PHASE 1: AWS FOUNDATION (30 minutes)
Step 1.1: Launch EC2 Instance
# Via AWS Console or CLI
# Note: AMI IDs are region-specific; confirm this ID is Ubuntu 22.04 LTS in your target region
aws ec2 run-instances \
  --image-id ami-0c55b159cbfafe1f0 \
  --instance-type c6i.2xlarge \
  --key-name your-emergency-key \
  --security-group-ids sg-emergency \
  --subnet-id subnet-your-subnet \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":100,"VolumeType":"gp3"}}]' \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=loh-game-emergency}]'
Manual via Console:
- EC2 → Launch Instance
- AMI: Ubuntu Server 22.04 LTS
- Instance type: c6i.2xlarge (8 vCPU, 16 GiB RAM)
- Storage: 100 GB gp3 SSD
- Security Group: Create new:
  - SSH (22) from your IP
  - HTTP (80) from 0.0.0.0/0
  - HTTPS (443) from 0.0.0.0/0
  - Custom TCP (3000) from 0.0.0.0/0
- Launch
Elastic IP: Allocate one, associate it with the instance, and note it down for DNS.
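The Elastic IP step can also be done from the CLI. This is a sketch under assumptions: the wrapper name `assign_elastic_ip` is hypothetical, and it expects the instance ID returned by `run-instances`; the three `aws ec2` subcommands are the standard ones for allocating, attaching, and reading back an address.

```shell
#!/usr/bin/env bash
# Allocate an Elastic IP, attach it to an instance, and print the public IP.
# Usage: assign_elastic_ip <instance-id>
assign_elastic_ip() {
  local instance_id="$1"
  local alloc_id
  # Allocate a new address in the VPC scope and capture its allocation ID.
  alloc_id=$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text) || return 1
  # Attach the address to the emergency instance.
  aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$alloc_id" >/dev/null || return 1
  # Print the public IP so it can be used for SSH and DNS.
  aws ec2 describe-addresses --allocation-ids "$alloc_id" \
    --query 'Addresses[0].PublicIp' --output text
}
```

Typical use would be `EC2_ELASTIC_IP=$(assign_elastic_ip <EC2_INSTANCE_ID>)`, run once the instance is in the running state.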
Step 1.2: Launch RDS Instance
# Via CLI
aws rds create-db-instance \
  --db-instance-identifier loh-game-db \
  --db-instance-class db.t4g.large \
  --engine postgres \
  --engine-version 15.4 \
  --master-username lohadmin \
  --master-user-password <SECURE_PASSWORD> \
  --allocated-storage 100 \
  --storage-type gp3 \
  --vpc-security-group-ids sg-rds \
  --backup-retention-period 7 \
  --preferred-backup-window "03:00-04:00" \
  --no-publicly-accessible
Manual via Console:
- RDS → Create Database
- Engine: PostgreSQL 15.4
- Template: Production
- DB instance: db.t4g.large (2 vCPU, 8 GB RAM)
- Storage: 100 GB gp3, auto-scaling enabled
- Credentials: lohadmin / <SECURE_PASSWORD>
- VPC: Same as EC2
- Public access: No
- Backup: 7 days retention
- Create
Wait 10-15 minutes for RDS to be available
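Rather than polling the console during that wait, the CLI can block until the instance is ready and then print the endpoint needed in Phase 2. A minimal sketch, assuming the `loh-game-db` identifier used above; the function name `rds_endpoint_when_ready` is hypothetical.

```shell
#!/usr/bin/env bash
# Block until the RDS instance is available, then print its endpoint hostname.
# Usage: rds_endpoint_when_ready [db-instance-identifier]
rds_endpoint_when_ready() {
  local db_id="${1:-loh-game-db}"
  # The built-in waiter polls until the instance status is "available".
  aws rds wait db-instance-available \
    --db-instance-identifier "$db_id" || return 1
  # Read the endpoint address for use as <RDS_ENDPOINT>.
  aws rds describe-db-instances --db-instance-identifier "$db_id" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}
```

For example, `RDS_ENDPOINT=$(rds_endpoint_when_ready)` can run in one terminal while the backup upload proceeds in another.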
PHASE 2: DATA MIGRATION (45-60 minutes)
Step 2.1: Upload Backup to S3
# Run these from a machine that can still read the latest backups (the failed primary may be unreachable)
# Create S3 bucket
aws s3 mb s3://loh-game-emergency-backups --region us-east-1
# Upload latest database backup
aws s3 cp /opt/loh-game/backups/postgres/backup_latest.sql.gz \
  s3://loh-game-emergency-backups/db-backup.sql.gz
# Upload game data
aws s3 sync /opt/loh-game/data/game-data \
  s3://loh-game-emergency-backups/game-data/
Step 2.2: SSH into EC2 and Setup
# SSH into EC2
ssh -i your-emergency-key.pem ubuntu@<EC2_ELASTIC_IP>
# Update system
sudo apt update && sudo apt upgrade -y
# Install Docker
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
# Log out and back in afterwards so the group change takes effect
sudo usermod -aG docker ubuntu
# Install Docker Compose (release asset names are lowercase, e.g. docker-compose-linux-x86_64)
sudo curl -L "https://github.com/docker/compose/releases/latest/download/docker-compose-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m)" -o /usr/local/bin/docker-compose
sudo chmod +x /usr/local/bin/docker-compose
# Install AWS CLI (needs credentials: run `aws configure` or attach an instance role with S3 read access)
sudo apt install awscli -y
# Download backups from S3 (/opt is root-owned, so create the tree with sudo and hand it to ubuntu)
sudo mkdir -p /opt/loh-game/{data,backups}
sudo chown -R ubuntu:ubuntu /opt/loh-game
aws s3 sync s3://loh-game-emergency-backups/game-data/ /opt/loh-game/data/game-data/
aws s3 cp s3://loh-game-emergency-backups/db-backup.sql.gz /opt/loh-game/backups/
Step 2.3: Restore Database to RDS
# Get RDS endpoint from AWS Console (e.g., loh-game-db.xxxxx.us-east-1.rds.amazonaws.com)
# Install PostgreSQL client
sudo apt install postgresql-client -y
# Restore backup (use this form only if the dump itself creates loh_production)
gunzip -c /opt/loh-game/backups/db-backup.sql.gz | \
  psql -h <RDS_ENDPOINT> -U lohadmin -d postgres
# Alternative: Create DB first then restore (run one of the two restores, not both)
psql -h <RDS_ENDPOINT> -U lohadmin -d postgres -c "CREATE DATABASE loh_production;"
gunzip -c /opt/loh-game/backups/db-backup.sql.gz | \
  psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production
Verify restore:
psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production -c "SELECT COUNT(*) FROM users;"
PHASE 3: DEPLOY GAME SERVER (30-45 minutes)
Step 3.1: Create Environment File
nano /opt/loh-game/.env.production
# Database (RDS)
DATABASE_URL=postgresql://lohadmin:<PASSWORD>@<RDS_ENDPOINT>:5432/loh_production
DB_USER=lohadmin
DB_PASSWORD=<PASSWORD>
DB_NAME=loh_production
# Game Server
GAME_SERVER_URL=wss://game.yourdomain.com/ws
BIND_ADDRESS=0.0.0.0:3000
RUST_LOG=info
# Security (use same secrets from backup)
JWT_SECRET=<FROM_BACKUP>
PASSWORD_SALT=<FROM_BACKUP>
# Monitoring
SENTRY_DSN=<YOUR_SENTRY_DSN>
ENVIRONMENT=production-aws-emergency
Step 3.2: Pull/Deploy Game Server
# Option A: Pull from Docker Hub (if you pushed images)
docker pull your-dockerhub/loh-game:latest
docker run -d \
--name loh-game-server \
--env-file /opt/loh-game/.env.production \
-p 3000:3000 \
-v /opt/loh-game/data/game-data:/app/data:ro \
--restart unless-stopped \
your-dockerhub/loh-game:latest
# Option B: Build from source (if no registry)
cd /tmp
git clone https://github.com/your-org/loh-game.git
cd loh-game
docker build -f Dockerfile.prod -t loh-game:latest .
docker run -d \
--name loh-game-server \
--env-file /opt/loh-game/.env.production \
-p 3000:3000 \
-v /opt/loh-game/data/game-data:/app/data:ro \
--restart unless-stopped \
  loh-game:latest
Step 3.3: Verify Server Running
# Check container
docker ps
# Check logs
docker logs -f loh-game-server
# Test health endpoint
curl http://localhost:3000/health
Expected:
{"status":"healthy"}
PHASE 4: TRAFFIC ROUTING (30 minutes)
Option A: Update Cloudflare Tunnel (Fastest - 10 min)
# On EC2, install cloudflared
curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o cloudflared
sudo mv cloudflared /usr/local/bin/
sudo chmod +x /usr/local/bin/cloudflared
# Copy tunnel credentials from backup
scp your-old-server:/home/user/.cloudflared/<TUNNEL-ID>.json .
mkdir -p ~/.cloudflared
mv <TUNNEL-ID>.json ~/.cloudflared/
# Create config
sudo mkdir -p /etc/cloudflared
sudo nano /etc/cloudflared/config.yml
tunnel: <TUNNEL-ID>
credentials-file: /home/ubuntu/.cloudflared/<TUNNEL-ID>.json
ingress:
  - hostname: game.yourdomain.com
    service: ws://localhost:3000
    originRequest:
      noTLSVerify: true
  - service: http_status:404
# Run tunnel
cloudflared tunnel --config /etc/cloudflared/config.yml run <TUNNEL-NAME>
# Or as service
sudo cloudflared service install
sudo systemctl start cloudflared
No DNS change is needed: the existing record already points at the tunnel.
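Before starting the tunnel, a malformed config.yml can be caught with cloudflared's ingress checks. This is a sketch under assumptions: the wrapper name `validate_tunnel_config` is hypothetical, and it assumes the installed cloudflared version provides the `tunnel ingress validate` and `tunnel ingress rule` subcommands and accepts `--config` on `tunnel`, as in the run command above.

```shell
#!/usr/bin/env bash
# Validate the ingress rules in a cloudflared config and show which rule
# would handle the game hostname.
# Usage: validate_tunnel_config [config-path] [url]
validate_tunnel_config() {
  local config="${1:-/etc/cloudflared/config.yml}"
  local url="${2:-https://game.yourdomain.com}"
  # Fails with a non-zero status if the ingress section is invalid.
  cloudflared tunnel --config "$config" ingress validate || return 1
  # Reports which ingress rule matches the given URL.
  cloudflared tunnel --config "$config" ingress rule "$url"
}
```

If validation fails, fix config.yml before running the tunnel or installing it as a service, since an invalid ingress section stops the tunnel from starting.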
Option B: Direct DNS Update (If Cloudflare Tunnel lost - 20 min)
# Update Cloudflare DNS to point to EC2 Elastic IP
# Via Cloudflare Dashboard:
1. DNS → Records → Edit "game" record
2. Change from CNAME to A record
3. Point to EC2 Elastic IP
4. Proxy: ON (orange cloud)
5. Save
# Wait for DNS propagation (1-5 minutes)
Option C: AWS ALB (Most robust - 30 min)
# Create Application Load Balancer
aws elbv2 create-load-balancer \
--name loh-game-alb \
--subnets subnet-xxxxx subnet-yyyyy \
--security-groups sg-alb \
--scheme internet-facing \
--type application
# Create target group
aws elbv2 create-target-group \
--name loh-game-targets \
--protocol HTTP \
--port 3000 \
--vpc-id vpc-xxxxx \
--health-check-path /health
# Register EC2 instance
aws elbv2 register-targets \
--target-group-arn <TARGET_GROUP_ARN> \
--targets Id=<EC2_INSTANCE_ID>
# Create listener (WebSocket support)
aws elbv2 create-listener \
--load-balancer-arn <ALB_ARN> \
--protocol HTTPS \
--port 443 \
--certificates CertificateArn=<ACM_CERT_ARN> \
--default-actions Type=forward,TargetGroupArn=<TARGET_GROUP_ARN>
# Update Cloudflare to point to ALB DNS
PHASE 5: VERIFICATION (15 minutes)
Test Connection
# From local machine
wscat -c wss://game.yourdomain.com/ws
# Expected: Connected
Monitor Metrics
# On EC2
docker stats
# Check logs
docker logs -f loh-game-server
# Database connections
psql -h <RDS_ENDPOINT> -U lohadmin -d loh_production -c "SELECT count(*) FROM pg_stat_activity;"
Announce to Users
Emergency maintenance complete!
Server migrated to new infrastructure.
All data preserved. Please reconnect.
EMERGENCY COST ESTIMATE
Monthly Costs (AWS):
Cost Optimization (apply after emergency):
- Reserved Instances: Save 40% (~$195/month)
- Savings Plans: Save 30-40%
- Right-size instances: db.t4g.medium ($36 vs $73)
ROLLBACK PROCEDURE
If AWS migration fails or old server recovers:
# 1. Update DNS back to old server
# Via Cloudflare: Point "game" record to old server IP
# 2. Stop AWS services
aws ec2 stop-instances --instance-ids <INSTANCE_ID>
aws rds stop-db-instance --db-instance-identifier loh-game-db
# 3. Verify old server operational
curl http://old-server-ip:3000/health
# 4. Announce rollback to users
# 5. Keep AWS resources for 48h before termination (in case of issues)
ALTERNATIVE CLOUD PROVIDERS
DigitalOcean (Simpler than AWS)
- Droplet: CPU-Optimized 16GB ($144/month)
- Managed PostgreSQL: 4GB ($60/month)
- Total: ~$204/month
- Migration time: 2 hours (simpler than AWS)
Google Cloud Platform
- Compute Engine: n2-standard-8 ($194/month)
- Cloud SQL PostgreSQL: db-n1-standard-2 ($94/month)
- Total: ~$288/month
- Migration time: 2.5 hours
Hetzner Cloud (Cheapest)
- CX51: 16 vCPU, 32 GB RAM (€44/month ~$48)
- Postgres: Managed DB (€20/month ~$22)
- Total: ~$70/month
- Migration time: 1.5 hours (if already familiar)
Recommendation: DigitalOcean for emergencies (simplest) or Hetzner if cost-critical.
POST-MIGRATION CHECKLIST
Within 24 hours:
- Set up automated backups (RDS snapshots)
- Configure CloudWatch alarms
- Update monitoring (Sentry, Grafana)
- Test backup/restore procedure
- Document new infrastructure
- Update .env.production in git (encrypted)
Within 1 week:
- Apply cost optimizations (Reserved Instances)
- Set up multi-AZ for RDS (high availability)
- Configure Auto Scaling (if needed)
- Conduct post-mortem on outage
- Update disaster recovery plan
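Two of the checklist items above (CloudWatch alarms and RDS multi-AZ) map onto single CLI calls. The sketch below is illustrative only: the function names, the alarm name `loh-game-high-cpu`, the 80% threshold over two 5-minute periods, and the SNS topic ARN argument are all assumptions, not values from the original plan.

```shell
#!/usr/bin/env bash
# Create a CPU alarm for the emergency EC2 instance.
# Usage: create_cpu_alarm <instance-id> <sns-topic-arn>
create_cpu_alarm() {
  local instance_id="$1"
  local sns_topic_arn="$2"
  # Alarm when average CPU stays above 80% for two consecutive 5-minute periods.
  aws cloudwatch put-metric-alarm \
    --alarm-name "loh-game-high-cpu" \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --statistic Average \
    --period 300 \
    --evaluation-periods 2 \
    --threshold 80 \
    --comparison-operator GreaterThanThreshold \
    --alarm-actions "$sns_topic_arn"
}

# Convert the RDS instance to a multi-AZ deployment.
# Usage: enable_rds_multi_az [db-instance-identifier]
enable_rds_multi_az() {
  local db_id="${1:-loh-game-db}"
  aws rds modify-db-instance --db-instance-identifier "$db_id" \
    --multi-az --apply-immediately
}
```

Multi-AZ roughly doubles the database instance cost, so it fits the one-week list better than the first 24 hours.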
PREVENTION (Avoid Future Emergencies)
- Automated Backups
  - Database: Daily to S3
  - Docker images: Push to registry daily
  - Config files: Git repository
- Monitoring & Alerts
  - Uptime monitoring (UptimeRobot, Pingdom)
  - Server health (CPU, RAM, disk)
  - Alert on 5min downtime
- Redundancy
  - Keep AWS account ready (don't wait for emergency)
  - Test migration quarterly
  - Document all credentials securely (1Password, Vault)
- Disaster Recovery Testing
  - Quarterly: Full migration drill
  - Monthly: Backup restore test
  - Weekly: Verify backups exist
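The "Database: Daily to S3" item could look like the following sketch. It assumes `DATABASE_URL` is exported as in .env.production and that `pg_dump` and the AWS CLI are installed; the function names and the dated key layout (`postgres/<date>/loh_production.sql.gz`) are hypothetical.

```shell
#!/usr/bin/env bash
# Build a dated S3 object key for a database dump.
# Usage: backup_key [YYYY-MM-DD]   (defaults to today's UTC date)
backup_key() {
  local day="${1:-$(date -u +%Y-%m-%d)}"
  printf 'postgres/%s/loh_production.sql.gz\n' "$day"
}

# Stream a compressed dump straight to S3 without a local temp file.
# Runs in a subshell so pipefail does not leak into the calling shell.
# Usage: run_daily_backup <bucket> [YYYY-MM-DD]
run_daily_backup() (
  set -o pipefail
  bucket="$1"
  key="$(backup_key "${2:-}")"
  # "aws s3 cp -" reads the object body from standard input.
  pg_dump "$DATABASE_URL" | gzip | aws s3 cp - "s3://${bucket}/${key}"
)
```

A daily cron entry could then call `run_daily_backup loh-game-emergency-backups`; with pipefail set, a failing `pg_dump` makes the whole job exit non-zero instead of silently uploading a truncated dump.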
EMERGENCY CONTACTS
Save these NOW:
- AWS Support: +1-877-742-2911 (24/7 for Premium Support)
- Cloudflare Support: Dashboard → Support → Create ticket
- Your team leads: [FILL IN]
- Database expert: [FILL IN]
SUCCESS CRITERIA
Emergency migration complete when:
- Game server responding to WebSocket connections
- Database restored and accessible
- DNS pointing to new infrastructure
- Players can connect and play
- Data verified (spot check user accounts)
- Monitoring operational
- Backups configured
- Team notified
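The first few criteria can be checked with a small smoke test that polls the health endpoint until it returns the expected `{"status":"healthy"}` body. A minimal sketch: the function names, retry count, and delay are assumptions.

```shell
#!/usr/bin/env bash
# Succeeds if the JSON on stdin reports "status":"healthy".
health_is_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Poll a health URL until it reports healthy or the attempts run out.
# Usage: wait_for_health <url> [attempts] [delay_seconds]
wait_for_health() {
  local url="$1"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f turns HTTP errors into failures; --max-time bounds each request.
    if curl -fsS --max-time 5 "$url" 2>/dev/null | health_is_ok; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

For example, `wait_for_health http://localhost:3000/health` on the EC2 host checks the container, and the same call against the public hostname checks the routing chosen in Phase 4.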
Expected Downtime: 2-4 hours (depending on backup size and preparation)
Document Version: 1.0
Last Tested: Never (test quarterly!)
Next Test Date: [SCHEDULE THIS]
CRITICAL: Test this plan quarterly. An untested disaster recovery plan is worthless.