Prometheus Port Conflict Issue
Problem
When restarting the game client after a crash or manual termination, the application panics:
thread 'main' (21677) panicked at src/systems/telemetry.rs:57:10:
failed to install Prometheus recorder: FailedToCreateHTTPListener("Address already in use (os error 98)")
Root Cause
The Prometheus HTTP listener binds to port 9000, and that port can stay unavailable for a short time after the process terminates.
The typical sequence is:
- The game process terminates (crash, Ctrl+C, or normal exit)
- The OS keeps the TCP socket in TIME_WAIT state for 30-120 seconds
- Restarting the game immediately tries to bind to port 9000 again
- The bind fails because the port is still in use
The same error also appears when a previous game process (or an unrelated service) is still running and listening on port 9000.
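The failure mode can be reproduced without the game or the exporter. The following is a minimal sketch that uses only the Rust standard library; the function name second_bind_error_kind is made up for illustration, and the sketch binds an OS-chosen port rather than port 9000 so it does not interfere with a running client.

```rust
use std::io::ErrorKind;
use std::net::TcpListener;

/// Tries to bind a second listener to a port that is already held and
/// returns the kind of error the OS reports (hypothetical helper).
fn second_bind_error_kind() -> Option<ErrorKind> {
    // Port 0 asks the OS for any free port, so this never touches port 9000.
    let first = TcpListener::bind(("127.0.0.1", 0)).expect("first bind should succeed");
    let port = first
        .local_addr()
        .expect("listener should have a local address")
        .port();

    // `first` is still alive here, so the port is genuinely in use.
    let second = TcpListener::bind(("127.0.0.1", port));
    second.err().map(|e| e.kind())
}

fn main() {
    match second_bind_error_kind() {
        Some(kind) => println!("second bind failed: {kind:?}"),
        None => println!("second bind unexpectedly succeeded"),
    }
}
```

On Linux the second bind is expected to fail with ErrorKind::AddrInUse, which is the same condition as the "os error 98" wrapped in the panic message.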
File: src/systems/telemetry.rs (lines 53-57)
let builder = PrometheusBuilder::new();
builder
    .with_http_listener(([0, 0, 0, 0], 9000))
    .install()
    .expect("failed to install Prometheus recorder"); // ← Panics here
Solutions
Solution 1: Graceful Error Handling (Recommended for Development)
Don't panic if the port is already in use. Log a warning instead and continue without metrics.
File: src/systems/telemetry.rs
Before (lines 53-57):
let builder = PrometheusBuilder::new();
builder
    .with_http_listener(([0, 0, 0, 0], 9000))
    .install()
    .expect("failed to install Prometheus recorder");
After:
let builder = PrometheusBuilder::new();
match builder.with_http_listener(([0, 0, 0, 0], 9000)).install() {
    Ok(_) => {
        info!("Telemetry initialized. Metrics at http://localhost:9000/metrics");
    }
    Err(e) => {
        warn!("Failed to install Prometheus recorder: {}. Metrics will be unavailable.", e);
        warn!("This usually means port 9000 is already in use. Wait 30s or kill the process using the port.");
    }
}
Pros:
- Game doesn't crash on restart
- Clear warning message to developers
- Metrics are optional, not critical for gameplay
Cons:
- Metrics unavailable until port is freed
- Silent failure if you don't check logs
Solution 2: Dynamic Port Allocation
Let the OS choose an available port automatically.
File: src/systems/telemetry.rs
let builder = PrometheusBuilder::new();
match builder.with_http_listener(([0, 0, 0, 0], 0)).install() { // Port 0 = dynamic
    Ok(_) => {
        // Note: You won't know which port was assigned without additional code
        info!("Telemetry initialized. Metrics available on a dynamic port.");
    }
    Err(e) => {
        warn!("Failed to install Prometheus recorder: {}", e);
    }
}
Pros:
- Never conflicts with a port that is already bound
- Works for multiple instances
Cons:
- Port number is unknown (need to query the listener to find it)
- Harder to configure Prometheus scraper
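To illustrate what "port 0" means and how an assigned port can be read back, here is a minimal standard-library sketch. It does not use the exporter at all: pick_free_port is a hypothetical helper, and passing its result to with_http_listener would be an assumption about how you wire it in, not something this guide has verified.

```rust
use std::net::{SocketAddr, TcpListener};

/// Hypothetical helper: asks the OS for a currently free TCP port.
fn pick_free_port() -> std::io::Result<u16> {
    // Binding to port 0 lets the OS choose an unused port.
    let listener = TcpListener::bind(SocketAddr::from(([0, 0, 0, 0], 0)))?;
    // local_addr() reports the port that was actually assigned.
    let port = listener.local_addr()?.port();
    // The listener is dropped on return, which releases the port again.
    Ok(port)
}

fn main() -> std::io::Result<()> {
    let port = pick_free_port()?;
    println!("OS-assigned port: {port}");
    Ok(())
}
```

Because the probe listener is released before anything else binds the port, another process could claim it in between. That window is small, but it means this approach lowers the chance of a conflict rather than removing it. The upside is that the chosen port can be logged or written to a file for the Prometheus scraper to pick up.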
Solution 3: SO_REUSEADDR Socket Option
Allow the socket to be reused immediately (requires changes to metrics-exporter-prometheus).
Not directly available in the current API, but you could:
- Fork metrics-exporter-prometheus
- Add SO_REUSEADDR to the socket before binding
- Use your forked version
Pros:
- Immediate port reuse
- Consistent port number
Cons:
- Requires maintaining a fork
- More complex
Solution 4: Kill Existing Process on Port 9000
Manually or automatically kill any process using port 9000 before starting.
Manual:
# Find process using port 9000
lsof -ti:9000
# Kill it
kill -9 $(lsof -ti:9000)
# Then run the game
cargo run --bin legends_client
Automatic (in a startup script):
#!/bin/bash
# kill_and_run.sh
lsof -ti:9000 | xargs -r kill -9
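cargo run --bin legends_client
Pros:
- Frees the port whenever a leftover process is still holding it
- No code changes needed
Cons:
- Might kill unrelated processes (lsof -ti:9000 also matches clients connected to port 9000)
- Does not help while the port is only held by sockets in TIME_WAIT, which belong to no process
- Platform-specific (Linux/macOS only)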
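Where lsof is not available, the port can at least be checked portably before launching. The sketch below uses only the standard library; port_is_free is a hypothetical helper, and it only detects a conflict, it does not kill anything.

```rust
use std::net::TcpListener;

/// Hypothetical helper: returns true if a listener can currently be bound to `port`.
fn port_is_free(port: u16) -> bool {
    // The probe listener is dropped immediately, releasing the port again.
    TcpListener::bind(("0.0.0.0", port)).is_ok()
}

fn main() {
    let port = 9000; // port used by the exporter in this guide
    if port_is_free(port) {
        println!("port {port} is free");
    } else {
        println!("port {port} is in use; the metrics listener would fail to start");
    }
}
```

The probe reflects how the standard library binds sockets, which may differ from how the exporter binds its listener, so treat the result as a hint rather than a guarantee.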
Solution 5: Wait for Port to Be Released
Add retry logic with a delay between attempts (a fixed 2 seconds here; the delay could also grow exponentially).
File: src/systems/telemetry.rs
let max_retries = 3;
let mut retries = 0;
loop {
    // with_http_listener() consumes the builder, so create a fresh one for each attempt
    let builder = PrometheusBuilder::new();
    match builder.with_http_listener(([0, 0, 0, 0], 9000)).install() {
        Ok(_) => {
            info!("Telemetry initialized. Metrics at http://localhost:9000/metrics");
            break;
        }
        Err(e) if retries < max_retries => {
            retries += 1;
            warn!("Failed to bind port 9000 (attempt {}/{}): {}. Retrying in 2s...", retries, max_retries, e);
            std::thread::sleep(std::time::Duration::from_secs(2));
        }
        Err(e) => {
            warn!("Failed to install Prometheus recorder after {} retries: {}", max_retries, e);
            break;
        }
    }
}
Pros:
- Automatic recovery
- No manual intervention
Cons:
- Delays startup
- Still might fail if port is held by another service
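If exponential backoff is preferred over a fixed 2-second wait, the retry loop can be factored into a small generic helper. This is a sketch under assumptions: retry_with_backoff is a hypothetical name, and the closure in main is a stand-in that fails twice and then succeeds, not the real exporter call.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper: runs `op` until it succeeds or `max_retries` retries
/// are used up, doubling the delay after every failed attempt.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut retries = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(_) if retries < max_retries => {
                retries += 1;
                sleep(delay);
                delay *= 2; // exponential growth: d, 2d, 4d, ...
            }
            // Out of retries: hand the last error back to the caller.
            Err(e) => return Err(e),
        }
    }
}

fn main() {
    // Stand-in for the install call: fails twice, then succeeds.
    let mut attempts = 0;
    let result: Result<u32, String> =
        retry_with_backoff(3, Duration::from_millis(10), || {
            attempts += 1;
            if attempts < 3 {
                Err(format!("attempt {attempts} failed"))
            } else {
                Ok(attempts)
            }
        });
    println!("{result:?}");
}
```

In telemetry.rs the closure would build a fresh PrometheusBuilder and call with_http_listener(...).install() on each attempt, with the warn! and info! logging kept at the call site.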
Recommended Approach
For Development: Use Solution 1 (Graceful Error Handling)
- Fast iteration
- Non-blocking
- Clear error messages
For Production: Use Solution 2 (Dynamic Port) + Service Discovery
- Deploy with a sidecar that queries the dynamic port
- Register with Prometheus via service discovery (Kubernetes, Consul, etc.)
For Local Testing: Use Solution 4 (Kill Script)
- Convenient for rapid testing
- No code changes
Implemented Solution
We implemented Solution 1 (Graceful Error Handling) for the development environment.
File: src/systems/telemetry.rs
Before (lines 53-59):
let builder = PrometheusBuilder::new();
builder
    .with_http_listener(([0, 0, 0, 0], 9000))
    .install()
    .expect("failed to install Prometheus recorder");
info!("Telemetry initialized. Metrics at http://localhost:9000/metrics");
After (lines 53-60):
let builder = PrometheusBuilder::new();
builder
    .with_http_listener(([0, 0, 0, 0], 9000))
    .install()
    .unwrap_or_else(|e| {
        warn!("Failed to start Prometheus exporter (port 9000 likely in use): {}", e);
    });
info!("Telemetry initialized. Metrics attempted at http://localhost:9000/metrics");
Changes:
- Replaced .expect() with .unwrap_or_else() to handle errors gracefully
- Log a warning instead of panicking when port binding fails
- Updated success message to say "attempted" instead of "at" to reflect that metrics may not be available
Verification
After implementing Solution 1:
# First run
cargo run --bin legends_client
# Output: Telemetry initialized. Metrics attempted at http://localhost:9000/metrics
# Ctrl+C to stop
# Immediate second run
cargo run --bin legends_client
# Output (if port still in use):
# WARN Failed to start Prometheus exporter (port 9000 likely in use): FailedToCreateHTTPListener("Address already in use (os error 98)")
# INFO Telemetry initialized. Metrics attempted at http://localhost:9000/metrics
# Game continues to run normally ✅
Verification Status: ✅ CONFIRMED - The game no longer crashes when port 9000 is in use.
Related Issues
- Tracing panic: See tracing_panic_fix.md
- Duplicate plugin panic: See duplicate_plugin_panic.md
Platform-Specific Notes
Linux
- Default TIME_WAIT duration: 60 seconds
- Check port usage: lsof -i :9000 or ss -tuln | grep 9000
macOS
- Default TIME_WAIT duration: about 30 seconds (twice the maximum segment lifetime, which defaults to 15 seconds)
- Check port usage: lsof -i :9000
Windows
- Default TIME_WAIT duration: 120 seconds
- Check port usage: netstat -ano | findstr :9000