Architecture

Hermes is designed as a distributed two-container system that periodically runs internet speed tests and exports results to multiple observability destinations.

System Architecture

Data Flow

flowchart TD
    subgraph API_CONTAINER["hermes-api container"]
        API["**FastAPI REST API**\nsrc/api/main.py\n:8080"]
        REACT["**React Frontend**\nfrontend/dist\n(served by FastAPI)"]
    end

    subgraph SCHED_CONTAINER["hermes-scheduler container"]
        MAIN["**main.py**\nEntry point + scheduler"]
        RUNNER["**SpeedtestRunner**\nsrc/services/speedtest_runner.py"]
        MODEL["**SpeedResult**\nsrc/models/speed_result.py"]
        DISP["**ResultDispatcher**\nsrc/result_dispatcher.py"]
        ALERT["**AlertManager**\nsrc/services/alert_manager.py"]
        CSV["CSVExporter"]
        SQLITE["SQLiteExporter"]
        PROM["PrometheusExporter\n:8000/metrics"]
        LOKI["LokiExporter"]
    end

    SHARED_VOL[("**Shared Volume**\nruntime_config.json\n.run_trigger\nresults.csv")]
    DATA_VOL[("**Data Volume**\nhermes.db")]
    ALERT_DEST[("**Alert Destinations**\nWebhook, Gotify,\nntfy, Apprise")]

    REACT -- "HTTP GET/POST" --> API
    API -- "POST /api/trigger\nPUT /api/config\nPUT /api/alerts" --> SHARED_VOL
    API -- "GET /api/results" --> DATA_VOL
    SHARED_VOL -- "polls every 30s" --> MAIN
    MAIN -- "scheduled / triggered" --> RUNNER
    RUNNER -- "success" --> MODEL
    RUNNER -- "success/failure" --> ALERT
    MODEL --> DISP
    DISP --> CSV
    DISP --> SQLITE
    DISP --> PROM
    DISP --> LOKI
    CSV -- "writes" --> SHARED_VOL
    SQLITE -- "writes" --> DATA_VOL
    ALERT -- "on threshold" --> ALERT_DEST

    style API fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style REACT fill:#1565c0,stroke:#90caf9,color:#ffffff
    style MAIN fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style RUNNER fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style MODEL fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style DISP fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style ALERT fill:#c62828,stroke:#ef5350,color:#ffffff
    style CSV fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style SQLITE fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style PROM fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style LOKI fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style SHARED_VOL fill:#f57c00,stroke:#ffb74d,color:#ffffff
    style DATA_VOL fill:#f57c00,stroke:#ffb74d,color:#ffffff
    style ALERT_DEST fill:#c62828,stroke:#ef5350,color:#ffffff

Deployment Topology

flowchart LR
    subgraph HERMES_HOST["Hermes Host"]
        subgraph SCHED_CONTAINER["hermes-scheduler"]
            PROM_EP["PrometheusExporter\n:8000/metrics"]
            LOKI_EXP["LokiExporter"]
            ALERT_MGR["AlertManager"]
        end
        subgraph API_CONTAINER["hermes-api"]
            API["FastAPI + React UI\n:8080"]
        end
    end

    subgraph OBS_HOST["Observability Host"]
        PROMETHEUS["Prometheus\n:9090"]
        LOKI["Loki\n:3100"]
        GRAFANA["Grafana\n:3000"]
    end

    subgraph ALERT_HOST["Alert Services (optional)"]
        GOTIFY["Gotify"]
        NTFY["ntfy"]
        APPRISE["Apprise API"]
        WEBHOOK["Custom Webhook"]
    end

    PROMETHEUS -- "scrapes :8000/metrics\nevery 15s" --> PROM_EP
    LOKI_EXP -- "HTTP push\n/loki/api/v1/push" --> LOKI
    GRAFANA -- "PromQL queries" --> PROMETHEUS
    GRAFANA -- "LogQL queries" --> LOKI
    ALERT_MGR -- "HTTP POST\non failure threshold" --> GOTIFY
    ALERT_MGR -- "HTTP POST\non failure threshold" --> NTFY
    ALERT_MGR -- "HTTP POST\non failure threshold" --> APPRISE
    ALERT_MGR -- "HTTP POST\non failure threshold" --> WEBHOOK

    style PROM_EP fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style LOKI_EXP fill:#2e7d32,stroke:#a5d6a7,color:#ffffff
    style ALERT_MGR fill:#c62828,stroke:#ef5350,color:#ffffff
    style API fill:#1565c0,stroke:#90caf9,color:#ffffff
    style PROMETHEUS fill:#e65100,stroke:#ffcc80,color:#ffffff
    style LOKI fill:#1565c0,stroke:#90caf9,color:#ffffff
    style GRAFANA fill:#4a148c,stroke:#ce93d8,color:#ffffff
    style GOTIFY fill:#c62828,stroke:#ef5350,color:#ffffff
    style NTFY fill:#c62828,stroke:#ef5350,color:#ffffff
    style APPRISE fill:#c62828,stroke:#ef5350,color:#ffffff
    style WEBHOOK fill:#c62828,stroke:#ef5350,color:#ffffff

Note: Both containers use the same Docker image (hermes:latest) but with different entry points:

hermes-scheduler: Runs python -m src.main (background worker with scheduler)
hermes-api: Uses the default CMD which starts the FastAPI server

This single-image design simplifies builds and deployments, and each service can still be scaled independently.

Component Details

hermes-scheduler Container

Purpose: Background worker that runs speed tests on schedule and exports results.

Key Components:

main.py — Entry point, wires scheduler and watches for trigger file
SpeedtestRunner — Orchestrates speed test providers with automatic fallback; primary provider is retried once before falling back to the next in the configured chain
src/providers/ — Pluggable test provider implementations: OoklaProvider (Ookla CLI), NDT7Provider (M-Lab WebSocket), CustomHttpProvider (user-supplied HTTP endpoints)
SpeedResult — Data model capturing download, upload, ping, jitter, ISP, timestamp
ResultDispatcher — Fans out results to all enabled exporters
AlertManager — Tracks consecutive failures and sends notifications
Exporters: CSV, SQLite, Prometheus, Loki
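
The fallback policy described for SpeedtestRunner can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `run_with_fallback`, the callable-based provider type, and the dict result are hypothetical names, and the sketch assumes that only the primary provider is retried while later providers get a single attempt.

```python
from typing import Callable, Optional, Sequence

# Hypothetical provider type: a callable that returns a result dict or raises.
Provider = Callable[[], dict]


def run_with_fallback(providers: Sequence[Provider]) -> Optional[dict]:
    """Try providers in order; the first (primary) one gets one retry."""
    for index, provider in enumerate(providers):
        # Assumption: only the primary provider is retried.
        attempts = 2 if index == 0 else 1
        for _ in range(attempts):
            try:
                return provider()
            except Exception:
                continue  # try again, or move on to the next provider
    return None  # every provider in the chain failed


calls = []


def failing_ookla() -> dict:
    calls.append("ookla")
    raise RuntimeError("ookla unavailable")


def working_ndt7() -> dict:
    calls.append("ndt7")
    return {"download_mbps": 250.5}


result = run_with_fallback([failing_ookla, working_ndt7])
```

Returning `None` when the whole chain fails leaves it to the caller to report the failure, for example to the alert component.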

Exposed Ports:

:8000 — Prometheus /metrics scrape endpoint

Volumes:

/app/logs — CSV results and application logs
/app/data — SQLite database, runtime config, trigger file

Environment Variables:

SPEEDTEST_INTERVAL_MINUTES — Test frequency (default: 60)
RUN_ON_STARTUP — Run test immediately on start (default: true)
SPEEDTEST_PROVIDERS — Ordered, comma-separated provider chain (default: ookla); values: ookla, ndt7, custom
SPEEDTEST_SERVER_ID — Pin Ookla tests to a specific server ID (default: auto-select)
SPEEDTEST_CUSTOM_URL_DOWNLOAD — Download URL for the custom provider (required when custom is in the chain)
SPEEDTEST_CUSTOM_URL_UPLOAD — Upload URL for the custom provider (optional)
SPEEDTEST_CUSTOM_DURATION_S — Download stream duration in seconds (default: 10)
SPEEDTEST_CUSTOM_CONNECTIONS — Parallel download connections (default: 1)
SPEEDTEST_CUSTOM_CHUNK_SIZE_MB — Upload payload size in MB (default: 25)
ENABLED_EXPORTERS — Comma-separated list: csv, sqlite, prometheus, loki
PROMETHEUS_PORT — Port for metrics endpoint (default: 8000)
LOKI_URL — Loki push endpoint (e.g., http://loki:3100)
Alert configuration (see Alerts)
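
As a rough sketch of how the provider chain variable might be read, the snippet below parses `SPEEDTEST_PROVIDERS` into an ordered list. The function name `parse_provider_chain` and the choice to lowercase values and reject unknown names are assumptions made for illustration; the actual parsing lives in src/config.py and may behave differently.

```python
import os
from typing import Mapping

KNOWN_PROVIDERS = ("ookla", "ndt7", "custom")


def parse_provider_chain(env: Mapping[str, str] = os.environ) -> list[str]:
    """Parse SPEEDTEST_PROVIDERS into an ordered list, defaulting to ookla."""
    raw = env.get("SPEEDTEST_PROVIDERS", "ookla")
    chain = [name.strip().lower() for name in raw.split(",") if name.strip()]
    unknown = [name for name in chain if name not in KNOWN_PROVIDERS]
    if unknown:
        raise ValueError(f"unknown providers: {unknown}")
    # The custom provider needs a download URL, as listed above.
    if "custom" in chain and not env.get("SPEEDTEST_CUSTOM_URL_DOWNLOAD"):
        raise ValueError("SPEEDTEST_CUSTOM_URL_DOWNLOAD is required for 'custom'")
    return chain or ["ookla"]


chain = parse_provider_chain({"SPEEDTEST_PROVIDERS": "ookla, ndt7"})
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.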

hermes-api Container

Purpose: FastAPI REST API serving the React frontend and providing programmatic access.

Key Components:

src/api/main.py — FastAPI application with middleware
src/api/auth.py — API key authentication and rate limiting
src/api/routes/ — Endpoint modules (config, results, trigger, alerts)
frontend/dist — Built React SPA served as static files

Exposed Ports:

:8080 — HTTP API and frontend

Volumes:

/app/logs — Shared CSV results (for fallback reads)
/app/data — Shared SQLite database and runtime config

Environment Variables:

API_KEY — Optional API key for auth (32+ chars; authentication is disabled if unset)
RATE_LIMIT_PER_MINUTE — Max write requests per API key per 60s (default: 60)
CORS_ORIGINS — Comma-separated allowed origins (default: http://localhost:5173,http://localhost:4173)
MAX_REQUEST_BODY_SIZE — Request size limit in bytes (default: 1048576 = 1 MB)
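
One way to picture a per-key limit over a 60-second window is a sliding-window counter. The class below is a hypothetical sketch (the name `SlidingWindowLimiter` and the injectable clock are not from the project); the real limiter in src/api/auth.py may use a different algorithm.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within `window_s` seconds."""

    def __init__(self, limit: int = 60, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, api_key: str) -> bool:
        now = self.clock()
        hits = self._hits[api_key]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_s=60, clock=lambda: fake_now[0])
first_two = [limiter.allow("key-a"), limiter.allow("key-a")]
third = limiter.allow("key-a")
```

The clock is injected so the window can be advanced in tests without sleeping.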

Middleware:

RequestSizeLimitMiddleware — Rejects requests > MAX_REQUEST_BODY_SIZE
SecurityHeadersMiddleware — Adds X-Frame-Options, X-Content-Type-Options, Cross-Origin-Resource-Policy, Referrer-Policy
CORSMiddleware — Validates origins, restricts methods (GET, POST, PUT) and headers
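
To illustrate the size-limit idea, here is a minimal pure-ASGI sketch that needs no framework imports. The class name `RequestSizeLimit` and the helper `probe` are hypothetical, and the sketch only inspects the declared Content-Length header; it does not count streamed bytes, which the project's RequestSizeLimitMiddleware may well do.

```python
import asyncio


class RequestSizeLimit:
    """ASGI middleware that rejects requests whose declared body is too large."""

    def __init__(self, app, max_body_size: int = 1_048_576) -> None:
        self.app = app
        self.max_body_size = max_body_size

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http":
            headers = dict(scope.get("headers", []))
            declared = headers.get(b"content-length", b"0")
            if declared.isdigit() and int(declared) > self.max_body_size:
                await send({"type": "http.response.start", "status": 413,
                            "headers": [(b"content-type", b"text/plain")]})
                await send({"type": "http.response.body", "body": b"Payload too large"})
                return
        await self.app(scope, receive, send)


async def ok_app(scope, receive, send):
    """Stand-in for the real application."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def probe(content_length: int, limit: int = 10) -> int:
    """Drive the middleware once and return the response status."""
    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    scope = {"type": "http",
             "headers": [(b"content-length", str(content_length).encode())]}
    asyncio.run(RequestSizeLimit(ok_app, max_body_size=limit)(scope, receive, send))
    return sent[0]["status"]
```

Calling `probe(5)` exercises the pass-through path and `probe(11)` the rejection path.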

Data Models

SpeedResult

Shared data contract between all components:

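from dataclasses import dataclass


@dataclass
class SpeedResult:
    download_mbps: float  # download throughput in megabits per second
    upload_mbps: float    # upload throughput in megabits per second
    ping_ms: float        # latency in milliseconds
    jitter_ms: float      # latency variation in milliseconds
    isp: str              # provider name reported by the test
    timestamp: str        # ISO 8601 format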

Runtime Configuration

Stored in data/runtime_config.json, managed via API or UI:

{
  "speedtest_interval_minutes": 60,
  "enabled_exporters": ["csv", "sqlite", "prometheus"],
  "alerts": {
    "enabled": true,
    "failure_threshold": 3,
    "cooldown_minutes": 60,
    "providers": {
      "webhook": { "enabled": false, "url": "" },
      "gotify": { "enabled": false, "url": "", "token": "", "priority": 5 },
      "ntfy": { "enabled": true, "url": "https://ntfy.sh", "topic": "hermes_alerts", "token": "", "priority": 3, "tags": "warning,rotating_light" },
      "apprise": { "enabled": false, "url": "", "urls": [] }
    }
  }
}

Exporters

CSV Exporter

Path: logs/results.csv

Format:

timestamp,download_mbps,upload_mbps,ping_ms,jitter_ms,isp
2026-04-29T12:00:00Z,250.5,35.2,15.3,2.1,Comcast

Configuration:

CSV_MAX_ROWS — Limit total rows (oldest removed first)
CSV_RETENTION_DAYS — Delete rows older than N days
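
The row-limit behaviour can be sketched as below. This is an assumption-laden illustration rather than the CSVExporter itself: `append_row` is a hypothetical helper, it rewrites the whole file on each append, and it only covers the CSV_MAX_ROWS style of retention.

```python
import csv
import tempfile
from pathlib import Path
from typing import Optional

FIELDS = ["timestamp", "download_mbps", "upload_mbps", "ping_ms", "jitter_ms", "isp"]


def append_row(path: Path, row: dict, max_rows: Optional[int] = None) -> None:
    """Append one result and keep only the newest `max_rows` data rows."""
    rows = []
    if path.exists():
        with path.open(newline="") as handle:
            rows = list(csv.DictReader(handle))
    rows.append({field: str(row[field]) for field in FIELDS})
    if max_rows is not None and max_rows > 0:
        rows = rows[-max_rows:]  # oldest rows are dropped first
    tmp = path.with_suffix(".tmp")
    with tmp.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    tmp.replace(path)  # swap in the new file in one step


csv_path = Path(tempfile.mkdtemp()) / "results.csv"
for hour in range(3):
    append_row(csv_path, {
        "timestamp": f"2026-04-29T{hour:02d}:00:00Z",
        "download_mbps": 250.5, "upload_mbps": 35.2,
        "ping_ms": 15.3, "jitter_ms": 2.1, "isp": "Comcast",
    }, max_rows=2)
lines = csv_path.read_text().splitlines()
```

Writing to a temporary file and renaming it means a reader in the other container should not observe a half-written CSV.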

Use Case: Simple log file for manual inspection or external processing.

SQLite Exporter

Path: data/hermes.db

Table Schema:

CREATE TABLE results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    download_mbps REAL NOT NULL,
    upload_mbps REAL NOT NULL,
    ping_ms REAL NOT NULL,
    jitter_ms REAL NOT NULL,
    isp TEXT NOT NULL
);
CREATE INDEX idx_timestamp ON results(timestamp DESC);

Configuration:

SQLITE_MAX_ROWS — Limit total rows (oldest removed first)
SQLITE_RETENTION_DAYS — Delete rows older than N days
Uses WAL (Write-Ahead Logging) mode for concurrency
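
A minimal sketch of the storage pattern, using only the standard `sqlite3` module, might look like this. The helper names `open_db` and `insert_and_prune` are hypothetical, and pruning by `id` is an assumption about how "oldest removed first" could be implemented.

```python
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    download_mbps REAL NOT NULL,
    upload_mbps REAL NOT NULL,
    ping_ms REAL NOT NULL,
    jitter_ms REAL NOT NULL,
    isp TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_timestamp ON results(timestamp DESC);
"""


def open_db(path: Path) -> sqlite3.Connection:
    conn = sqlite3.connect(str(path))
    conn.execute("PRAGMA journal_mode=WAL")  # readers do not block the writer
    conn.executescript(SCHEMA)
    return conn


def insert_and_prune(conn: sqlite3.Connection, row: tuple, max_rows: int) -> None:
    with conn:  # one transaction: insert, then trim the oldest rows
        conn.execute(
            "INSERT INTO results "
            "(timestamp, download_mbps, upload_mbps, ping_ms, jitter_ms, isp) "
            "VALUES (?, ?, ?, ?, ?, ?)", row)
        conn.execute(
            "DELETE FROM results WHERE id NOT IN "
            "(SELECT id FROM results ORDER BY id DESC LIMIT ?)", (max_rows,))


db_path = Path(tempfile.mkdtemp()) / "hermes.db"
conn = open_db(db_path)
for hour in range(5):
    insert_and_prune(
        conn, (f"2026-04-29T{hour:02d}:00:00Z", 250.5, 35.2, 15.3, 2.1, "Comcast"),
        max_rows=3)
kept = [r[0] for r in conn.execute("SELECT timestamp FROM results ORDER BY id")]
mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
```

WAL mode matters here because the API container reads the database while the scheduler container writes to it.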

Use Case: Primary storage for API and UI queries. Best performance for dashboard charts.

Prometheus Exporter

Endpoint: http://<hermes-host>:8000/metrics

Metrics:

hermes_download_mbps{isp="Comcast"} 250.5
hermes_upload_mbps{isp="Comcast"} 35.2
hermes_ping_ms{isp="Comcast"} 15.3
hermes_jitter_ms{isp="Comcast"} 2.1
hermes_last_test_timestamp{isp="Comcast"} 1777464000
hermes_test_failure{isp="Comcast"} 0

Configuration:

PROMETHEUS_PORT — Port for /metrics endpoint (default: 8000)

Integration:

scrape_configs:
  - job_name: 'hermes'
    scrape_interval: 15s
    static_configs:
      - targets: ['hermes-scheduler:8000']

Use Case: Time-series metrics for Grafana dashboards, alerting rules, long-term retention.

Loki Exporter

Endpoint: Pushes to http://<loki-host>:3100/loki/api/v1/push

Log Format:

{
  "streams": [{
    "stream": {
      "job": "hermes_speedtest",
      "isp": "Comcast",
      "level": "info"
    },
    "values": [
      ["1777464000000000000", "{\"download_mbps\": 250.5, \"upload_mbps\": 35.2, \"ping_ms\": 15.3, \"jitter_ms\": 2.1, \"isp\": \"Comcast\", \"timestamp\": \"2026-04-29T12:00:00Z\"}"]
    ]
  }]
}

Configuration:

LOKI_URL — Loki push endpoint (e.g., http://loki:3100)
LOKI_JOB_LABEL — Job label for log entries (default: hermes_speedtest)
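
The payload shape above could be assembled along these lines. `build_loki_payload` is a hypothetical helper written for illustration; it builds the request body only and leaves the HTTP POST to whichever client the exporter actually uses.

```python
import json
from datetime import datetime


def build_loki_payload(result: dict, job: str = "hermes_speedtest") -> dict:
    """Wrap one result in the streams/values structure of Loki's push API."""
    # Loki expects the entry timestamp as a string of nanoseconds since the epoch.
    moment = datetime.fromisoformat(result["timestamp"].replace("Z", "+00:00"))
    timestamp_ns = str(int(moment.timestamp()) * 1_000_000_000)
    return {
        "streams": [{
            "stream": {"job": job, "isp": result["isp"], "level": "info"},
            "values": [[timestamp_ns, json.dumps(result)]],
        }]
    }


payload = build_loki_payload({
    "download_mbps": 250.5, "upload_mbps": 35.2, "ping_ms": 15.3,
    "jitter_ms": 2.1, "isp": "Comcast", "timestamp": "2026-04-29T12:00:00Z",
})
```

Keeping the label set small (job, isp, level) and putting the measurements in the JSON log line avoids creating a new Loki stream for every distinct value.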

Use Case: Structured logs for LogQL queries in Grafana, correlation with other application logs.

Alert System

AlertManager Component

Purpose: Tracks consecutive test failures and sends notifications when threshold is met.

Behavior:

Maintains failure counter across test runs
Resets counter on successful test
Sends alerts when failure_count >= ALERT_FAILURE_THRESHOLD
Enforces cooldown period (ALERT_COOLDOWN_MINUTES) after each alert
Supports multiple simultaneous providers
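
The counting and cooldown rules listed above can be sketched as a small state machine. `FailureTracker` is a hypothetical name and the injectable clock is an illustrative choice; the real AlertManager also handles provider dispatch, which is omitted here.

```python
import time
from typing import Callable, Optional


class FailureTracker:
    """Counts consecutive failures and decides when an alert is due."""

    def __init__(self, threshold: int = 3, cooldown_minutes: float = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.cooldown_s = cooldown_minutes * 60
        self.clock = clock
        self.failure_count = 0
        self._last_alert: Optional[float] = None

    def record_success(self) -> None:
        self.failure_count = 0  # any successful test resets the streak

    def record_failure(self) -> bool:
        """Return True when the caller should send an alert now."""
        self.failure_count += 1
        if self.failure_count < self.threshold:
            return False
        now = self.clock()
        if self._last_alert is not None and now - self._last_alert < self.cooldown_s:
            return False  # still cooling down after the previous alert
        self._last_alert = now
        return True


fake_now = [0.0]
tracker = FailureTracker(threshold=3, cooldown_minutes=60, clock=lambda: fake_now[0])
decisions = [tracker.record_failure() for _ in range(4)]
```

In this sketch the fourth consecutive failure does not alert again because it falls inside the cooldown window.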

Providers:

Webhook — POST JSON to custom HTTP endpoint
Gotify — Self-hosted push notifications
ntfy — Simple pub-sub notifications
Apprise — 100+ services via Apprise API

Configuration:

Via UI Settings page (recommended)
Via environment variables
Stored in data/runtime_config.json

See Alert Configuration for detailed setup guides.

Integration Points

Prometheus Integration

Hermes exposes metrics for Prometheus scraping:

Configure Prometheus scrape job:

scrape_configs:
  - job_name: 'hermes'
    scrape_interval: 15s
    static_configs:
      - targets: ['hermes-scheduler:8000']

Verify metrics:
```
curl http://localhost:8000/metrics
```
Query in Grafana with PromQL:
```
hermes_download_mbps{job="hermes"}
```

Loki Integration

Hermes pushes logs directly to Loki:

Set LOKI_URL environment variable:
```
LOKI_URL=http://loki:3100
```
Enable loki exporter:
```
ENABLED_EXPORTERS=csv,sqlite,loki
```

Query in Grafana with LogQL:

{job="hermes_speedtest"} | json | download_mbps > 100

Grafana Dashboard

Import pre-built dashboard:

Download docs/grafana-dashboard.json
In Grafana: + → Import → Upload JSON file
Select Prometheus and Loki datasources
Dashboard includes:
- Download/upload trend charts
- Ping/jitter statistics
- Test failure annotations
- ISP labels

Project Structure

Hermes/
├── src/
│   ├── main.py                        # Entry point — wires scheduler, dispatcher, and exporters
│   ├── config.py                      # Static config loaded from environment variables
│   ├── constants.py                   # Centralized constants for exporters and alert providers
│   ├── runtime_config.py              # Persistent runtime state (interval, enabled exporters)
│   ├── shared_state.py                # Shared state for alert_manager access across API
│   ├── result_dispatcher.py           # ResultDispatcher — fans out SpeedResult to exporters
│   ├── api/
│   │   ├── main.py                    # FastAPI app — REST API + React frontend serving
│   │   ├── auth.py                    # API key authentication and rate limiting
│   │   └── routes/                    # API endpoint modules (config, results, trigger, alerts)
│   ├── models/
│   │   └── speed_result.py            # SpeedResult dataclass — shared data contract
│   ├── providers/
│   │   ├── base.py                    # BaseTestProvider ABC — provider interface
│   │   ├── ookla.py                   # OoklaProvider — Ookla CLI subprocess runner
│   │   ├── ndt7.py                    # NDT7Provider — M-Lab WebSocket speed test
│   │   └── custom_http.py             # CustomHttpProvider — user-supplied HTTP endpoints
│   ├── services/
│   │   ├── speedtest_runner.py        # SpeedtestRunner — orchestrates providers with fallback
│   │   ├── alert_manager.py           # AlertManager — tracks failures and sends alerts
│   │   ├── alert_providers.py         # Alert provider implementations (Webhook, Gotify, ntfy, Apprise)
│   │   ├── alert_provider_factory.py  # Shared alert provider registration logic
│   │   ├── health_server.py           # Health check endpoint
│   │   └── log_service.py             # Logging configuration
│   ├── exporters/
│   │   ├── base_exporter.py           # Abstract BaseExporter interface
│   │   ├── csv_exporter.py            # CSVExporter — appends rows to CSV log
│   │   ├── prometheus_exporter.py     # PrometheusExporter — updates Gauges, /metrics endpoint
│   │   ├── loki_exporter.py           # LokiExporter — ships JSON log events via HTTP push
│   │   └── sqlite_exporter.py         # SQLiteExporter — stores results in hermes.db (WAL mode)
├── frontend/
│   ├── src/
│   │   ├── main.tsx                   # React app entry point
│   │   ├── pages/                     # Dashboard and Settings pages
│   │   ├── components/                # Reusable UI components
│   │   ├── context/                   # React context for global state
│   │   └── lib/                       # API client and utilities
│   ├── package.json                   # Frontend dependencies
│   └── vite.config.ts                 # Vite build configuration
├── tests/
│   ├── test_main.py
│   ├── test_api_*.py                  # FastAPI endpoint tests (including alerts)
│   ├── test_alert_manager.py
│   ├── test_alert_providers.py
│   ├── test_csv_exporter.py
│   ├── test_loki_exporter.py
│   ├── test_prometheus_exporter.py
│   ├── test_result_dispatcher.py
│   ├── test_runtime_config.py
│   └── test_sqlite_exporter.py
├── .env.example                       # Example environment variables
├── docker-compose.yml                 # Production deployment (two-container architecture)
├── Dockerfile                         # Multi-stage build (Python + Node.js)
├── requirements.txt                   # Python dependencies
├── pytest.ini                         # pytest configuration
└── README.md

Design Principles

Separation of Concerns

Scheduler container handles test execution and data export
API container handles user interaction and REST access
Clear boundaries via shared volumes

Multi-Destination Export

Single test result dispatched to multiple exporters
Exporters are independent — one failure doesn’t affect others
Configurable via runtime settings
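
The isolation property can be illustrated with a short sketch. The `dispatch` function, the `Exporter` protocol, and the two demo exporter classes are hypothetical stand-ins, not the project's ResultDispatcher or BaseExporter.

```python
import logging
from typing import Iterable, Protocol

logger = logging.getLogger("dispatcher_sketch")


class Exporter(Protocol):
    def export(self, result: dict) -> None: ...


def dispatch(result: dict, exporters: Iterable[Exporter]) -> list[str]:
    """Send one result to every exporter; return the names of those that failed."""
    failed = []
    for exporter in exporters:
        try:
            exporter.export(result)
        except Exception:
            # Isolate the failure so the remaining exporters still run.
            logger.exception("exporter %s failed", type(exporter).__name__)
            failed.append(type(exporter).__name__)
    return failed


class ListExporter:
    def __init__(self) -> None:
        self.received = []

    def export(self, result: dict) -> None:
        self.received.append(result)


class BrokenExporter:
    def export(self, result: dict) -> None:
        raise ConnectionError("destination unreachable")


before, after = ListExporter(), ListExporter()
failed = dispatch({"download_mbps": 250.5}, [before, BrokenExporter(), after])
```

The exporter placed after the broken one still receives the result, which is the behaviour the principle describes.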

Stateless API

API reads shared data (SQLite, CSV) but doesn’t run tests
Triggers are file-based (.run_trigger) for container isolation
Runtime config stored in JSON, not in-memory
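
A file-based trigger of this kind can be sketched in a few lines. The function names and the `data_dir` parameter are hypothetical; the sketch assumes the API creates `.run_trigger` on a shared volume and the scheduler deletes it when it notices it on its next poll.

```python
import tempfile
from pathlib import Path


def request_run(data_dir: Path) -> None:
    """API side: ask for a test by creating the trigger file."""
    (data_dir / ".run_trigger").touch()


def consume_trigger(data_dir: Path) -> bool:
    """Scheduler side: return True once per trigger, removing the file."""
    try:
        (data_dir / ".run_trigger").unlink()
        return True
    except FileNotFoundError:
        return False


shared = Path(tempfile.mkdtemp())
before = consume_trigger(shared)   # nothing requested yet
request_run(shared)
first = consume_trigger(shared)    # trigger seen and cleared
second = consume_trigger(shared)   # already consumed
```

Deleting the file and treating a missing file as "no request" keeps the check to a single filesystem call, so repeated polls do not start duplicate runs.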

Defense in Depth

Multiple security layers (auth, rate limiting, SSRF protection, size limits)
Input validation at API boundary
Security headers on all responses

Observability First

Native Prometheus metrics exposure
Native Loki log pushing
Pre-built Grafana dashboard
Health check endpoints
