Error Message Reference
Project: Hermes Speed Monitor
Last Updated: 2026-05-01
Purpose: Centralized error message reference with troubleshooting steps
Table of Contents
- API Errors
- Configuration Errors
- Exporter Errors
- Alert Provider Errors
- Speedtest Errors
- Dispatcher Errors
- Database Errors
API Errors
Authentication & Authorization
Missing X-Api-Key header
HTTP Status: 401 Unauthorized
Source: src/api/auth.py
Cause: API request missing required X-Api-Key header
Troubleshooting:
- Verify API key is configured in .env file (API_KEY=your-key-here)
- Ensure API client is sending X-Api-Key header with all requests
- Check API key is not empty or whitespace-only
Example Fix:
# Add API key to .env
echo "API_KEY=$(openssl rand -hex 32)" >> .env
# Test with curl
curl -H "X-Api-Key: your-key-here" http://localhost:8080/api/health
Invalid API key
HTTP Status: 403 Forbidden
Source: src/api/auth.py
Cause: API key provided does not match configured key
Troubleshooting:
- Verify API key matches value in .env file
- Check for leading/trailing whitespace in API key
- Ensure API key has not been rotated without updating clients
- Verify environment variables are loaded correctly (docker-compose logs hermes-api)
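Client Sketch: A minimal example of the client-side checks above, assuming a Python client built on the standard library. The helper name build_health_request and the base URL are illustrative and not part of Hermes; only the X-Api-Key header and the /api/health path come from this document.

```python
import urllib.request


def build_health_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for /api/health carrying the X-Api-Key header.

    Hypothetical helper: strips stray whitespace from the key and refuses
    to send an empty one, the two client-side mistakes listed above.
    """
    key = api_key.strip()
    if not key:
        raise ValueError("API key is empty or whitespace-only")
    url = base_url.rstrip("/") + "/api/health"
    return urllib.request.Request(url, headers={"X-Api-Key": key})


if __name__ == "__main__":
    req = build_health_request("http://localhost:8080", "  your-key-here\n")
    # urllib stores header names capitalized, so the lookup key is "X-api-key".
    print(req.full_url, req.get_header("X-api-key"))
```

Passing the request to urllib.request.urlopen would send it. A 401 or 403 response then surfaces as urllib.error.HTTPError.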
Rate limit exceeded
HTTP Status: 429 Too Many Requests
Source: src/api/auth.py
Cause: Client exceeded configured rate limit
Troubleshooting:
- Check RATE_LIMIT_PER_MINUTE configuration (default: 60 requests/minute)
- Reduce request frequency or batch operations
- Implement exponential backoff in client code
- Consider increasing rate limit if legitimate use case requires it
Configuration:
# Increase rate limit (use with caution)
RATE_LIMIT_PER_MINUTE=120
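Backoff Sketch: One way a client could implement the exponential backoff suggested above. This is a sketch under assumptions: send is a hypothetical callable that performs the request and returns the HTTP status code, and the delay values are illustrative rather than anything Hermes prescribes.

```python
import random
import time
from typing import Callable, Optional


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> int:
    """Call send() until it returns a status other than 429 or attempts run out.

    The wait before retry n (counting from 0) is a random value in
    [0, min(max_delay, base_delay * 2**n)], so clients do not retry in lockstep.
    """
    rng = rng or random.Random()
    status = send()
    for attempt in range(max_attempts - 1):
        if status != 429:
            break
        cap = min(max_delay, base_delay * (2 ** attempt))
        sleep(rng.uniform(0, cap))
        status = send()
    return status
```

The sleep and rng parameters exist only so the helper can be exercised without real waiting. The last status is returned even if it is still 429, leaving the caller to decide what to do next.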
Resource Errors
No database found yet
HTTP Status: 503 Service Unavailable
Source: src/api/routes/results.py
Cause: SQLite database not initialized (no speedtests have run yet)
Troubleshooting:
- Wait for first speedtest to complete (check logs)
- Manually trigger test: POST /api/trigger
- Verify SQLite exporter is enabled in runtime config
- Check filesystem permissions on data directory
Expected Behavior: This error is normal on first startup before the first speedtest completes.
Failed to start manual test thread
HTTP Status: 500 Internal Server Error
Source: src/api/routes/trigger.py
Cause: Unable to start background thread for manual speedtest
Troubleshooting:
- Check system resources (CPU, memory)
- Review logs for thread creation errors
- Verify no resource exhaustion (too many threads)
- Restart container if thread pool is exhausted
Configuration Endpoint Errors
Invalid configuration payload
HTTP Status: 400 Bad Request
Source: src/api/routes/config.py
Cause: Configuration update contains invalid values
Troubleshooting:
- Verify interval_minutes is between 5 and 1440
- Ensure enabled_exporters is an array of valid exporter names
- Check JSON syntax is valid
- Review validation error details in response body
Valid Exporter Names:
- csv
- prometheus
- loki
- sqlite
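Validation Sketch: The payload rules above can be checked on the client before sending an update. This is a rough sketch, not the validation code in src/api/routes/config.py. The function name and error strings are hypothetical, and it assumes each field is optional in an update.

```python
VALID_EXPORTERS = {"csv", "prometheus", "loki", "sqlite"}
MIN_INTERVAL, MAX_INTERVAL = 5, 1440


def validate_config_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    if "interval_minutes" in payload:
        interval = payload["interval_minutes"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(interval, bool) or not isinstance(interval, int):
            errors.append("interval_minutes must be an integer")
        elif not MIN_INTERVAL <= interval <= MAX_INTERVAL:
            errors.append(
                f"interval_minutes must be between {MIN_INTERVAL} and {MAX_INTERVAL}"
            )
    if "enabled_exporters" in payload:
        exporters = payload["enabled_exporters"]
        if not isinstance(exporters, list):
            errors.append("enabled_exporters must be an array")
        else:
            unknown = sorted(set(map(str, exporters)) - VALID_EXPORTERS)
            if unknown:
                errors.append("unknown exporters: " + ", ".join(unknown))
    return errors
```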
Alert Configuration Errors
Threshold must be a positive integer
HTTP Status: 400 Bad Request
Source: src/api/routes/alerts.py
Cause: Alert threshold value is invalid
Troubleshooting:
- Ensure threshold is an integer ≥ 1
- Recommended range: 2-5 consecutive failures
- Too low (1) may cause false positives
- Too high (>10) may delay critical alerts
Cooldown must be a positive integer
HTTP Status: 400 Bad Request
Source: src/api/routes/alerts.py
Cause: Alert cooldown period is invalid
Troubleshooting:
- Ensure cooldown is an integer ≥ 1 (minutes)
- Recommended: 60 minutes to avoid alert storms
- Minimum: 1 minute (for testing only)
Webhook URL must start with http:// or https://
HTTP Status: 400 Bad Request
Source: src/api/routes/alerts.py
Cause: Webhook URL has invalid scheme
Troubleshooting:
- Ensure URL starts with http:// or https://
- No other schemes (ftp, file, etc.) are supported
- Check for typos in URL
Gotify/ntfy URL is required
HTTP Status: 400 Bad Request
Source: src/api/routes/alerts.py
Cause: Alert provider enabled but URL not configured
Troubleshooting:
- Provide valid URL for alert provider
- Verify URL is accessible from container
- Check network connectivity to alert service
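Validation Sketch: The alert configuration rules in this section (positive integer threshold and cooldown, http/https URL) could be pre-checked along these lines. The function names are hypothetical and the real checks in src/api/routes/alerts.py may differ. Note that this sketch also requires a host in the URL, which is slightly stricter than a prefix check.

```python
from urllib.parse import urlparse


def is_positive_int(value) -> bool:
    """True for integers >= 1; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 1


def has_http_scheme(url: str) -> bool:
    """True when the URL uses http or https and names a host."""
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def validate_alert_config(threshold, cooldown, webhook_url: str) -> list[str]:
    """Collect the error messages from this section that would apply."""
    errors = []
    if not is_positive_int(threshold):
        errors.append("Threshold must be a positive integer")
    if not is_positive_int(cooldown):
        errors.append("Cooldown must be a positive integer")
    if not has_http_scheme(webhook_url):
        errors.append("Webhook URL must start with http:// or https://")
    return errors
```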
Configuration Errors
Runtime Configuration
Runtime config validation failed
Level: WARNING
Source: src/runtime_config.py
Cause: Stored configuration contains invalid values
Troubleshooting:
- Check data/runtime_config.json for malformed JSON
- Review validation warnings in logs
- Delete config file to reset to defaults (make backup first)
- Verify file permissions allow read/write
Manual Reset:
# Backup current config
cp data/runtime_config.json data/runtime_config.json.bak
# Reset to defaults (container will regenerate)
rm data/runtime_config.json
docker-compose restart hermes-api hermes-scheduler
Could not save runtime config
Level: ERROR
Source: src/runtime_config.py
Cause: Failed to write configuration file
Troubleshooting:
- Check filesystem permissions on data/ directory
- Verify disk space is available (df -h)
- Check for read-only filesystem
- Review logs for underlying OS errors
interval_minutes out of range
Level: WARNING
Source: src/runtime_config.py
Cause: Interval value outside allowed range (5-1440)
Troubleshooting:
- Ensure interval is between 5 minutes and 24 hours
- Minimum: 5 minutes (to avoid ISP throttling)
- Maximum: 1440 minutes (24 hours)
- Default will be used if invalid
Exporter Errors
CSV Exporter
CSV write failed
Level: ERROR
Source: src/exporters/csv_exporter.py
Cause: Unable to write to CSV file
Troubleshooting:
- Check filesystem permissions on logs directory
- Verify disk space available
- Ensure CSV path is valid and writable
- Check for file locks from other processes
CSV prune failed
Level: WARNING
Source: src/exporters/csv_exporter.py
Cause: Unable to prune old rows (non-fatal)
Troubleshooting:
- Check CSV file is not corrupted
- Verify sufficient disk space for temp file
- Review retention configuration
- Note: This warning does not prevent new writes
Note: CSV pruning failures are non-fatal. New results will still be written.
Prometheus Exporter
Invalid port number
Level: ERROR
Source: src/exporters/prometheus_exporter.py
Cause: Prometheus port is not a valid integer or is out of range
Troubleshooting:
- Verify PROMETHEUS_PORT is an integer (1-65535)
- Common ports: 9090 (Prometheus), 9100 (Node Exporter), 8000-9999 (custom)
- Ensure port is not already in use
- Check for typos in environment variable
Prometheus port already in use
Level: ERROR
Source: src/exporters/prometheus_exporter.py
Cause: Another process is listening on configured port
Troubleshooting:
- Check for duplicate Hermes instances
- Use different port: PROMETHEUS_PORT=9091
- Identify conflicting process: lsof -i :9090 (Linux) or netstat -ano | findstr :9090 (Windows)
- Kill conflicting process or choose different port
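Port Check Sketch: Both Prometheus port errors above can be checked before starting the exporter. The helpers below are an illustrative sketch using only the standard library; parse_port and is_port_in_use are hypothetical names, not functions from src/exporters/prometheus_exporter.py.

```python
import socket


def parse_port(raw: str) -> int:
    """Parse a port string such as the PROMETHEUS_PORT value; raise ValueError if invalid."""
    port = int(raw.strip())  # int() raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1-65535")
    return port


def is_port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True when something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return probe.connect_ex((host, port)) == 0
```

A connection probe only detects listeners reachable on the given host. It does not replace lsof or netstat for identifying which process owns the port.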
Failed to update Prometheus gauges
Level: ERROR
Source: src/exporters/prometheus_exporter.py
Cause: Error updating metrics (likely label mismatch)
Troubleshooting:
- Check result data contains all required fields
- Review logs for label cardinality warnings
- Verify ISP field is not excessively long
- Restart exporter to reset metrics
Loki Exporter
Loki URL is required
Level: ERROR
Source: src/exporters/loki_exporter.py
Cause: Loki exporter enabled but URL not configured
Troubleshooting:
- Set LOKI_URL environment variable
- Example: LOKI_URL=http://loki:3100
- Verify Loki is accessible from container
- Test connectivity: curl http://loki:3100/ready
Loki URL must use http or https
Level: ERROR
Source: src/exporters/loki_exporter.py
Cause: Invalid URL scheme
Troubleshooting:
- Ensure URL starts with http:// or https://
- No other schemes supported (file, ftp, etc.)
- Check for typos
Loki push connection error
Level: ERROR
Source: src/exporters/loki_exporter.py
Cause: Network connectivity issue to Loki
Troubleshooting:
- Verify Loki service is running
- Check network connectivity: docker exec hermes-scheduler curl http://loki:3100/ready
- Verify Docker network configuration
- Check firewall rules
- Review Loki logs for errors
Loki push timed out
Level: ERROR
Source: src/exporters/loki_exporter.py
Cause: Loki did not respond within timeout period
Troubleshooting:
- Check Loki service health and load
- Increase timeout: LOKI_TIMEOUT=10 (default: 5)
- Verify network latency is acceptable
- Review Loki resource usage (CPU, memory)
- Check for Loki ingester backlogs
Loki push rejected
Level: ERROR
Source: src/exporters/loki_exporter.py
Cause: Loki returned HTTP error
Troubleshooting:
- Check Loki authentication if required
- Verify push path is correct (default: /loki/api/v1/push)
- Review Loki error response in logs
- Check Loki tenant configuration if multi-tenant
- Verify log labels are valid (no special characters)
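Payload Sketch: When a push is rejected, it can help to compare what is being sent against the JSON shape Loki's push endpoint accepts. The sketch below builds one such body and rejects label names that Loki would refuse. The function name and the example labels are assumptions, and the payload Hermes actually sends may carry different labels.

```python
import json
import re
import time

# Loki label names follow the Prometheus rule: letters, digits and
# underscores, not starting with a digit.
LABEL_NAME = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def build_loki_payload(labels: dict, line: str, ts_ns=None) -> str:
    """Return the JSON body for one log line in Loki's push format."""
    bad = [name for name in labels if not LABEL_NAME.match(name)]
    if bad:
        raise ValueError(f"invalid label names: {bad}")
    ts_ns = time.time_ns() if ts_ns is None else ts_ns
    stream = {
        "stream": {k: str(v) for k, v in labels.items()},
        # Each value is [timestamp in nanoseconds as a string, log line].
        "values": [[str(ts_ns), line]],
    }
    return json.dumps({"streams": [stream]})


if __name__ == "__main__":
    # "job" and "hermes" are placeholder label values for illustration.
    print(build_loki_payload({"job": "hermes"}, "download_mbps=100"))
```

The body would be sent with POST and Content-Type: application/json to the push path noted above.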
SQLite Exporter
SQLite write failed
Level: ERROR
Source: src/exporters/sqlite_exporter.py
Cause: Failed to insert row into database
Troubleshooting:
- Check filesystem permissions on data directory
- Verify disk space available
- Check database is not corrupted: sqlite3 data/results.db "PRAGMA integrity_check;"
- Review schema migration status
SQLite database locked (timeout after Xs)
Level: ERROR
Source: src/exporters/sqlite_exporter.py
Cause: Database locked by another process
Troubleshooting:
- Check for long-running queries (API result endpoint with large page size)
- Reduce page_size in API queries
- Verify WAL mode is enabled (better concurrency)
- Check for zombie processes holding locks: fuser data/results.db
- Restart container if lock persists
Note: Lock timeout is configured in code (default: 30 seconds)
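Connection Sketch: The lock timeout and WAL mode mentioned above map onto Python's sqlite3 module roughly as follows. This is a sketch of the technique, not the code in src/exporters/sqlite_exporter.py. open_results_db is a hypothetical helper, and the 30-second default simply mirrors the note above.

```python
import os
import sqlite3
import tempfile


def open_results_db(path: str, lock_timeout: float = 30.0) -> sqlite3.Connection:
    """Open a SQLite file with a lock timeout and WAL journaling."""
    # timeout: seconds to wait on a locked database before sqlite3 raises
    # OperationalError ("database is locked").
    conn = sqlite3.connect(path, timeout=lock_timeout)
    # WAL lets readers proceed while a single writer is active.
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, journal_mode is {mode}")
    return conn


if __name__ == "__main__":
    # Demonstrate against a throwaway file rather than data/results.db.
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_results_db(os.path.join(tmp, "results.db"))
        print(conn.execute("PRAGMA journal_mode;").fetchone()[0])
        conn.close()
```

To verify WAL on an existing database from the shell, run sqlite3 data/results.db "PRAGMA journal_mode;".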
Alert Provider Errors
Generic Alert Errors
Alert provider [name] failed to send
Level: ERROR
Source: src/services/alert_manager.py
Cause: Alert provider failed to send notification
Troubleshooting:
- Check provider-specific logs for details
- Verify provider URL is accessible
- Test provider manually (curl/API client)
- Check network connectivity
- Review provider authentication credentials
Note: Alert sending is async. Failures are logged but do not block speedtest execution.
Webhook Provider
Webhook delivery failed
Level: ERROR
Source: src/services/alert_providers.py
Cause: HTTP request to webhook URL failed
Troubleshooting:
- Verify webhook URL is correct and accessible
- Check webhook endpoint logs for errors
- Verify endpoint accepts POST with JSON payload
- Test manually: curl -X POST -H "Content-Type: application/json" -d '{"test": true}' YOUR_WEBHOOK_URL
- Check for SSRF protection blocking private IPs
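SSRF Check Sketch: This document does not describe how Hermes implements its SSRF protection. As an assumption-laden illustration, filters of this kind usually refuse addresses in private, loopback, and link-local ranges, which can be tested with the standard ipaddress module. is_blocked_address is a hypothetical name.

```python
import ipaddress


def is_blocked_address(ip_text: str) -> bool:
    """True for addresses an SSRF filter would typically refuse."""
    ip = ipaddress.ip_address(ip_text)
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


if __name__ == "__main__":
    for candidate in ("192.168.1.10", "169.254.169.254", "8.8.8.8"):
        print(candidate, is_blocked_address(candidate))
```

A hostname must first be resolved, and every resolved address checked. If a webhook target on the local network is rejected, a check of this kind is the likely reason.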
Gotify Provider
Gotify notification failed
Level: ERROR
Source: src/services/alert_providers.py
Cause: Failed to send notification to Gotify
Troubleshooting:
- Verify Gotify URL and app token
- Check Gotify service is running
- Test token: curl "http://gotify:80/message?token=YOUR_TOKEN" -F "message=test"
- Review Gotify logs for errors
- Verify network connectivity from container
ntfy Provider
ntfy notification failed
Level: ERROR
Source: src/services/alert_providers.py
Cause: Failed to send notification to ntfy
Troubleshooting:
- Verify ntfy URL and topic name
- Check ntfy service is running
- Test: curl -d "test message" http://ntfy:80/YOUR_TOPIC
- Review ntfy logs for errors
- Verify topic name does not contain invalid characters
Speedtest Errors
Runner Errors
Ookla CLI not found
Level: ERROR
Source: src/providers/ookla.py
Cause: The Ookla speedtest binary is not installed or is not in PATH
Troubleshooting:
- Install from https://www.speedtest.net/apps/cli
- Verify: speedtest --version (should print “Speedtest by Ookla”)
- If using only ndt7 or custom providers, the Ookla CLI is not required; remove ookla from SPEEDTEST_PROVIDERS
Speedtest execution failed
Level: ERROR
Source: src/providers/ookla.py
Cause: Ookla CLI returned a non-zero exit code
Troubleshooting:
- Verify CLI is installed: speedtest --version
- Check internet connectivity
- Test manually: speedtest --format=json
- Review stderr output in logs
- Check for ISP blocking speedtest traffic
Speedtest timed out
Level: ERROR
Source: src/providers/ookla.py
Cause: Ookla CLI did not complete within 120 seconds
Troubleshooting:
- Check for network congestion
- Pin to a faster server via SPEEDTEST_SERVER_ID
- Verify sufficient bandwidth available
- Check for ISP throttling
- Consider adding ndt7 as a fallback: SPEEDTEST_PROVIDERS=ookla,ndt7
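Timeout Sketch: The Ookla failure modes in this section (non-zero exit code, timeout) correspond to how a CLI is typically wrapped with subprocess. The helper below is a sketch under that assumption, not the code in src/providers/ookla.py. run_with_timeout is a hypothetical name, and the 120-second default mirrors the limit stated above.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout_s: float = 120.0):
    """Run cmd; return (exit_code, stdout, stderr), or None if it timed out."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired.
        return None
    return done.returncode, done.stdout, done.stderr


if __name__ == "__main__":
    # Stand-in command; the real call would be ["speedtest", "--format=json"].
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], timeout_s=30))
```

If the binary is missing from PATH, subprocess.run raises FileNotFoundError, which corresponds to the "Ookla CLI not found" case.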
NDT7 test failed
Level: ERROR
Source: src/providers/ndt7.py
Cause: M-Lab Locate API unreachable, or WebSocket connection dropped
Troubleshooting:
- Confirm outbound HTTPS is allowed (Locate API at locate.measurementlab.net)
- Confirm outbound WSS (port 443) is not blocked by firewall
- M-Lab may be temporarily at capacity; check https://www.measurementlab.net/status/
Custom HTTP test failed
Level: ERROR
Source: src/providers/custom_http.py
Cause: Request to the configured download or upload URL failed
Troubleshooting:
- Verify SPEEDTEST_CUSTOM_URL_DOWNLOAD is reachable from the container
- Confirm the URL scheme is http:// or https:// (other schemes are rejected for security)
- For upload: verify the upload endpoint accepts POST with application/octet-stream
- Check firewall rules between the Hermes container and the custom endpoint
All providers exhausted
Level: ERROR
Source: src/services/speedtest_runner.py
Cause: Every provider in the SPEEDTEST_PROVIDERS chain failed; the primary provider was retried once before falling back
Troubleshooting:
- Review per-provider error entries above for root cause
- Check consecutive failure count for alerting
- Verify internet connectivity is stable
- Add a resilient fallback, e.g., SPEEDTEST_PROVIDERS=ookla,ndt7
Note: The primary provider is retried once on failure. Subsequent providers in the chain are each attempted once. The error from the last failed provider is surfaced.
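Fallback Sketch: The retry and fallback order described in the note can be pictured as follows. This is a simplified model, not the code in src/services/speedtest_runner.py. Providers are represented as hypothetical (name, callable) pairs, and the last error is attached as the cause of the raised exception.

```python
from typing import Callable, Sequence, Tuple

Provider = Tuple[str, Callable[[], dict]]


def run_provider_chain(providers: Sequence[Provider]) -> dict:
    """Try providers in order; the first (primary) one gets one retry."""
    if not providers:
        raise ValueError("no providers configured")
    last_error = None
    for index, (_name, run) in enumerate(providers):
        attempts = 2 if index == 0 else 1  # primary: initial try plus one retry
        for _ in range(attempts):
            try:
                return run()
            except Exception as exc:  # a real runner would catch narrower errors
                last_error = exc
    # Surface the error from the last provider that failed.
    raise RuntimeError("All providers exhausted") from last_error
```

With SPEEDTEST_PROVIDERS=ookla,ndt7 this model makes at most three attempts: ookla, ookla again, then ndt7.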
Dispatcher Errors
Dispatch Failures
Exporter '[name]' failed
Level: ERROR
Source: src/result_dispatcher.py
Cause: One or more exporters failed during dispatch
Troubleshooting:
- Review exporter-specific error in logs
- Check exporter configuration
- Verify exporter dependencies are running (Loki, Prometheus)
- Test exporter individually
- Disable failing exporter if blocking progress
Note: Dispatcher continues even if one exporter fails. Partial success is possible.
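Dispatch Sketch: The "continue on failure" behavior in the note above can be modeled as a loop that isolates each exporter. This is an illustrative sketch, not the code in src/result_dispatcher.py. Exporters are represented here as plain callables keyed by name, which is an assumption.

```python
import logging

logger = logging.getLogger("dispatcher_sketch")


def dispatch(result: dict, exporters: dict) -> dict:
    """Send result to every exporter; return {exporter_name: exception} for failures."""
    failures = {}
    for name, export in exporters.items():
        try:
            export(result)
        except Exception as exc:  # one failing exporter must not stop the rest
            logger.error("Exporter '%s' failed: %s", name, exc)
            failures[name] = exc
    return failures
```

An empty return value means every exporter succeeded. A non-empty one is the partial-success case, where some destinations hold the result and others do not.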
Dispatch called but no exporters are registered
Level: WARNING
Source: src/result_dispatcher.py
Cause: No exporters configured
Troubleshooting:
- Enable at least one exporter in runtime config
- Verify enabled_exporters list is not empty
- Check exporter registration in main.py
- Review startup logs for exporter initialization
Note: This is a configuration issue, not an error. Add at least one exporter to store results.
Invalid result type passed to register
Level: ERROR
Source: src/result_dispatcher.py
Cause: Attempted to register a non-exporter object
Troubleshooting:
- This is a programming error, not a configuration issue
- Verify custom exporters inherit from BaseExporter
- Review exporter registration code
- Report bug if using only built-in exporters
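Registration Sketch: A type check of the kind that produces this error might look like the following. The BaseExporter class here is a stand-in written for the example; the project's real base class and its method names may differ.

```python
from abc import ABC, abstractmethod


class BaseExporter(ABC):
    """Stand-in for the project's BaseExporter; the real interface may differ."""

    @abstractmethod
    def export(self, result: dict) -> None: ...


class Registry:
    """Hypothetical holder for registered exporters."""

    def __init__(self) -> None:
        self.exporters: list[BaseExporter] = []

    def register(self, exporter: object) -> None:
        # Reject anything that does not inherit from BaseExporter.
        if not isinstance(exporter, BaseExporter):
            raise TypeError(f"{type(exporter).__name__} is not a BaseExporter")
        self.exporters.append(exporter)


class PrintExporter(BaseExporter):
    """Minimal custom exporter that satisfies the check."""

    def export(self, result: dict) -> None:
        print(result)
```

A custom exporter passed as a bare function or an unrelated class fails this check, even if it happens to have an export method.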
Database Errors
Migration Errors
SQLite migration failed
Level: ERROR
Source: src/exporters/sqlite_exporter.py
Cause: Failed to initialize or migrate database schema
Troubleshooting:
- Check database file is not corrupted
- Verify filesystem permissions
- Review migration logic for errors
- Backup and delete database to recreate: mv data/results.db data/results.db.bak
SQLite integrity check failed
Level: ERROR
Source: Manual troubleshooting
Cause: Database file is corrupted
Troubleshooting:
- Stop container
- Backup database: cp data/results.db data/results.db.bak
- Attempt repair: sqlite3 data/results.db ".recover" | sqlite3 data/results_repaired.db
- If unrecoverable, restore from backup or start fresh
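Integrity Check Sketch: The same PRAGMA integrity_check used with the sqlite3 CLI can be run from Python, for example in a scheduled health script. integrity_problems is a hypothetical helper, not part of Hermes.

```python
import sqlite3


def integrity_problems(path: str) -> list[str]:
    """Return the problems reported by PRAGMA integrity_check (empty list if healthy)."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check;").fetchall()
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    # A healthy database yields the single row "ok".
    return [] if messages == ["ok"] else messages
```

A badly damaged file may instead make sqlite3 raise DatabaseError, which should be treated as a failed check as well.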
General Troubleshooting Tips
Logs
- Scheduler logs: docker-compose logs -f hermes-scheduler
- API logs: docker-compose logs -f hermes-api
- Combined: docker-compose logs -f
Health Checks
- API health: curl http://localhost:8080/api/health
- Prometheus metrics: curl http://localhost:9090/metrics
- Loki ready: curl http://loki:3100/ready
Configuration Validation
# Check runtime config
cat data/runtime_config.json | jq .
# Validate JSON syntax
jq empty data/runtime_config.json && echo "Valid JSON" || echo "Invalid JSON"
Database Inspection
# Connect to SQLite database
sqlite3 data/results.db
# Check schema
.schema
# Count rows
SELECT COUNT(*) FROM speed_results;
# View latest result
SELECT * FROM speed_results ORDER BY timestamp DESC LIMIT 1;
Network Debugging
# Test connectivity from scheduler container
docker exec hermes-scheduler curl -v http://loki:3100/ready
# Test external connectivity
docker exec hermes-scheduler curl -v https://www.google.com
# Check DNS resolution
docker exec hermes-scheduler nslookup loki
Related Documentation
- Error Handling Conventions — Exception patterns and guidelines
- Error Catalog — Complete list of errors by module
- Monitoring Runbook — Production monitoring guide
- Architecture — System design and component interaction
- API Reference — Complete API documentation
Document History
- 2026-05-01: Initial documentation (M2 from ERROR-HANDLING-REVIEW.md)