Rollback Runbook
When to Roll Back
Immediate Rollback (no questions asked):
- Critical security vulnerability introduced
- Data corruption detected
- Error rate > 10%
- Core gameplay broken for >50% of users
- Payment processing broken
Evaluate First (assess if hotfix is faster):
- Minor feature broken
- Performance degradation (but stable)
- UI glitches
- Non-critical bugs
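The two lists above can be read as a simple decision rule. The sketch below encodes it as a shell function. The function name, argument order, and the yes/no flag are assumptions made for illustration and are not part of any existing tooling.

```shell
# Hypothetical helper: classify an incident using the thresholds listed above.
# Arguments (all assumptions for illustration):
#   $1  error rate in percent (e.g. 12.5)
#   $2  percent of users with broken core gameplay (e.g. 60)
#   $3  "yes" if an unconditional trigger applies (security vulnerability,
#       data corruption, or broken payments), otherwise "no"
rollback_decision() {
  local error_rate="$1" gameplay_broken="$2" hard_trigger="$3"

  if [ "$hard_trigger" = "yes" ]; then
    echo "rollback"
    return 0
  fi

  # awk handles the decimal comparison; shell arithmetic is integer-only.
  if awk -v e="$error_rate" -v g="$gameplay_broken" \
       'BEGIN { exit !(e > 10 || g > 50) }'; then
    echo "rollback"
  else
    echo "evaluate"
  fi
}
```

For example, `rollback_decision 12.5 0 no` prints `rollback`, while `rollback_decision 2 10 no` prints `evaluate`, meaning a hotfix should be considered first.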
Rollback Decision Matrix
Rollback Procedures
Application Rollback (Simple)
Use when: Code changes only, no database migrations
# 1. SSH to production server and enter the deploy directory
ssh loh-prod
cd /opt/loh-backend
# 2. Check current version
git describe --tags # e.g., v1.2.3
# 3. Roll back to previous version
git checkout v1.2.2
cargo build --release
# 4. Restart services
sudo systemctl restart loh-backend
sudo systemctl restart loh-game-server
# 5. Verify
curl https://api.loh.game/health
Duration: 5-10 minutes
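The procedure above hard-codes the previous tag. If the previous tag is not known offhand, it can be derived from the tag list. The helper below is a minimal sketch: `previous_tag` is a hypothetical name, and it assumes release tags sort correctly under git's version sort.

```shell
# Hypothetical helper: print the tag that precedes "$1" in a version-sorted
# list of tags read from stdin. Exits non-zero if "$1" is missing from the
# list or has no predecessor.
previous_tag() {
  awk -v cur="$1" '
    $0 == cur { found = 1; exit }   # stop at the current tag
    { prev = $0 }                   # remember the last tag seen before it
    END { if (found && prev != "") print prev; else exit 1 }
  '
}

# Illustrative use on the server (not executed here):
#   current=$(git describe --tags --abbrev=0)
#   git tag --sort=v:refname | previous_tag "$current"
```

Failing with a non-zero exit status when there is no earlier tag keeps the helper from silently producing an empty argument for `git checkout`.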
Database Rollback (Complex)
Use when: Database migrations were run in deployment
⚠️ CAUTION: Database rollbacks can cause data loss. Evaluate carefully.
# 1. Stop application first
sudo systemctl stop loh-backend
# 2. Check migration status
diesel migration list
# 3. Revert migration (one at a time)
diesel migration revert --database-url="$DATABASE_URL"
# 4. Roll back application code
git checkout v1.2.2
cargo build --release
# 5. Restart
sudo systemctl start loh-backend
Duration: 15-30 minutes (depending on data volume)
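Because each revert undoes only the most recently applied migration, it helps to confirm which one that is before running it. The sketch below parses the output of `diesel migration list`, assuming applied migrations are shown as lines beginning with `[X]`. The function name `last_applied_migration` is hypothetical.

```shell
# Hypothetical helper: read `diesel migration list` output on stdin and print
# the name of the last migration marked as applied ("[X]").
# Exits non-zero if no applied migration is found.
last_applied_migration() {
  awk '
    /^[[:space:]]*\[X\]/ { last = $2 }   # $2 is the migration name after "[X]"
    END { if (last != "") print last; else exit 1 }
  '
}

# Illustrative use (not executed here):
#   diesel migration list | last_applied_migration
```

If the name printed is not the migration introduced by the bad deploy, stop and involve the database admin before reverting anything.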
Alternative: Forward Fix (Hotfix)
When faster than rollback:
- Small code change (<10 lines)
- Bug fix can be merged immediately
- No database changes needed
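The "<10 lines" criterion can be checked mechanically once a candidate fix exists. The sketch below sums added and deleted lines from `git diff --numstat` output. The function names and the way the threshold is applied are assumptions for illustration.

```shell
# Hypothetical helpers: measure the size of a change from
# `git diff --numstat` output on stdin (columns: added, deleted, path).
changed_lines() {
  awk '{ total += $1 + $2 } END { print total + 0 }'
}

# Succeeds when the change touches fewer than 10 lines in total.
is_small_change() {
  [ "$(changed_lines)" -lt 10 ]
}

# Illustrative use (not executed here):
#   git diff --numstat v1.2.3 hotfix/critical-fix | is_small_change \
#     && echo "small enough to consider a hotfix"
```

Line count is only a proxy for risk: a one-line change to a migration or to payment code still fails the other criteria in the list.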
# 1. Create hotfix branch
git checkout -b hotfix/critical-fix v1.2.3
# 2. Apply fix
# (edit code)
# 3. Commit, push, and fast-track review (skip non-critical checks)
git commit -am "hotfix: fix critical bug"
git push origin hotfix/critical-fix
# 4. Deploy immediately
ssh loh-prod
cd /opt/loh-backend
git fetch
git checkout hotfix/critical-fix
cargo build --release
sudo systemctl restart loh-backend
Post-Rollback Actions
1. Communication (CRITICAL)
User Announcement:
⚠️ Service Restored
We rolled back to the previous version due to technical issues.
All services are now stable.
We apologize for the inconvenience and are investigating the root cause.
2. Preserve Evidence
# Save logs before they rotate
sudo journalctl -u loh-backend --since "1 hour ago" > /tmp/incident_logs.txt
# Database snapshot (if relevant)
pg_dump loh_production > /tmp/post_rollback_dump.sql
# Sentry export
# (capture error IDs and stack traces)
3. Root Cause Analysis
- Schedule post-mortem within 24 hours
- Review what went wrong
- Update deploy checklist if needed
- Add tests to prevent recurrence
Rollback Verification Checklist
After rollback, verify:
- Application starts successfully
- Health endpoint returns 200 OK
- WebSocket connections working
- Authentication working
- Database queries succeeding
- Error rate back to baseline (<0.1%)
- No data corruption (spot check recent transactions)
- Monitoring dashboards show green
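Two of the checks above, the health endpoint and the error rate, are easy to script. The sketch below covers only the comparison logic. The function names are hypothetical, and where the error and request counts come from depends on your monitoring stack.

```shell
# Hypothetical helper: succeeds only when the HTTP status code is 200.
health_ok() {
  [ "$1" = "200" ]
}

# Hypothetical helper: succeeds when errors/total is below the 0.1% baseline.
#   $1  number of failed requests in the sample window
#   $2  total number of requests in the same window
error_rate_ok() {
  awk -v e="$1" -v t="$2" \
    'BEGIN { exit !(t > 0 && (e / t) * 100 < 0.1) }'
}

# Illustrative use (not executed here):
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://api.loh.game/health)
#   health_ok "$code" && error_rate_ok 5 10000 && echo "checks passed"
```

An empty sample window (zero total requests) is treated as a failure, since no traffic is not evidence of a healthy service.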
Common Rollback Scenarios
Scenario 1: Breaking API Change
Problem: New version changed API contract, mobile clients crashing
Solution:
- Roll back immediately
- Implement API versioning in next release
- Maintain backward compatibility
Scenario 2: Performance Regression
Problem: New version doubles response times
Solution:
- Check whether a quick optimization is possible (hotfix)
- If not, roll back
- Profile code offline to find bottleneck
Scenario 3: Database Migration Failed Partway
Problem: Migration started but errored midway
Solution:
- DO NOT roll back the application yet
- Manually fix migration state in database
- Either complete or revert the migration
- Then roll back the application if needed
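Before fixing migration state by hand, inspect what Diesel believes has been applied. By default Diesel records applied migrations in the `__diesel_schema_migrations` table, with `version` and `run_on` columns. The sketch below builds a read-only query against it. The function name is hypothetical, and it assumes the default table is in use.

```shell
# Hypothetical helper: print a read-only query listing the most recently
# applied migrations. "$1" is the row limit (default 5); printf's %d rejects
# non-numeric input, so the value cannot inject SQL.
recent_migrations_sql() {
  local limit="${1:-5}"
  printf 'SELECT version, run_on FROM __diesel_schema_migrations ORDER BY version DESC LIMIT %d;\n' "$limit"
}

# Illustrative use (not executed here):
#   psql "$DATABASE_URL" -c "$(recent_migrations_sql 5)"
```

Compare the result with the schema itself: a migration that failed midway may have altered tables without being recorded here, or the reverse, and that mismatch is what needs to be resolved with the database admin.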
Emergency Contacts
- On-Call Engineer: PagerDuty
- Database Admin: @dba on Slack (for migration issues)
- Engineering Lead: (for rollback approval if unclear)