How can we help?

Monitoring & Observability

What metrics do you collect and how is the platform monitored?

Every node in the cluster runs a Prometheus node exporter and a Datadog agent. The kube-prometheus-stack continuously collects:

Node-level: CPU, memory, disk I/O, network throughput
Kubernetes: pod health, deployment status, PVC usage, API server latency, kubelet metrics
Application (per dyno): CPU %, memory % against allocated limits, over configurable rolling windows
Addon-level:
- PostgreSQL: active connections, transaction commits, cache hit ratio, database size
- Elasticsearch: query rate, fetch time, indexing time, JVM memory used, cluster health, document count
- Redis: memory usage, connections

Grafana dashboards provide real-time visibility into all of the above.
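To make the per-dyno figures more concrete, here is a minimal sketch of how a "memory % against allocated limit over a rolling window" value could be derived from raw samples. It is an illustration only, not the platform's implementation: the class name `RollingUsage`, its parameters, and the choice of a simple mean over the window are all assumptions.

```python
from collections import deque


class RollingUsage:
    """Hypothetical helper: mean usage over the last `window` samples,
    expressed as a percentage of an allocated limit."""

    def __init__(self, limit: float, window: int) -> None:
        if limit <= 0 or window <= 0:
            raise ValueError("limit and window must be positive")
        self.limit = limit
        # deque with maxlen drops the oldest sample automatically
        self.samples = deque(maxlen=window)

    def add(self, used: float) -> None:
        """Record one usage sample, in the same unit as the limit."""
        self.samples.append(used)

    def percent(self) -> float:
        """Return the windowed mean as a percentage of the limit."""
        if not self.samples:
            return 0.0
        mean = sum(self.samples) / len(self.samples)
        return 100.0 * mean / self.limit


# Example: a dyno with an assumed 512 MB limit and a 3-sample window.
mem = RollingUsage(limit=512, window=3)
for used_mb in (128, 256, 384, 512):
    mem.add(used_mb)
rolling_percent = mem.percent()  # mean of the last three samples vs. the limit
```

A shorter window reacts faster to spikes, while a longer one smooths them out, which is why the window length is worth tuning per workload.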

How does alerting work?

Honeybadger monitors application errors and infrastructure health. Alerts are routed to PagerDuty (for on-call paging) and Slack (for team visibility). The Deploy team is on call for infrastructure incidents.

Users can also set their own consumption alerts (configurable CPU, memory, and storage thresholds per app or addon), with email, Slack, or PagerDuty notifications at frequencies from every 5 minutes to weekly.
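As a rough illustration of how a consumption alert behaves, the sketch below models one alert with a threshold, a notification channel, and a notification frequency limited to the 5-minute-to-weekly range described above. The class `ConsumptionAlert`, its field names, and the percentage-based threshold are hypothetical and do not reflect the platform's actual API or configuration format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Bounds and options taken from the description above; the names are assumed.
MIN_FREQUENCY = timedelta(minutes=5)
MAX_FREQUENCY = timedelta(weeks=1)
METRICS = {"cpu", "memory", "storage"}
CHANNELS = {"email", "slack", "pagerduty"}


@dataclass
class ConsumptionAlert:
    """Hypothetical model of a user-defined consumption alert."""

    target: str               # app or addon name
    metric: str               # one of METRICS
    threshold_percent: float  # notify at or above this usage
    channel: str              # one of CHANNELS
    frequency: timedelta      # minimum gap between notifications
    last_sent: Optional[datetime] = None

    def __post_init__(self) -> None:
        if self.metric not in METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.channel not in CHANNELS:
            raise ValueError(f"unknown channel: {self.channel}")
        if not MIN_FREQUENCY <= self.frequency <= MAX_FREQUENCY:
            raise ValueError("frequency must be between 5 minutes and 1 week")

    def should_notify(self, usage_percent: float, now: datetime) -> bool:
        """Return True (and record the send time) when the threshold is
        met and the configured frequency has elapsed since the last send."""
        if usage_percent < self.threshold_percent:
            return False
        if self.last_sent is not None and now - self.last_sent < self.frequency:
            return False
        self.last_sent = now
        return True


# Example: alert on memory at 80% for a hypothetical app, at most every 5 minutes.
alert = ConsumptionAlert("my-app", "memory", 80.0, "slack", timedelta(minutes=5))
```

The frequency acts as a rate limit in this sketch: usage that stays above the threshold produces one notification per interval rather than one per sample.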
