docs: Monitoring and metrics (#86)

* metrics: Adding docs for metrics Signed-off-by: Dave Henderson <dhenderson@gmail.com> * Addressing review comments Signed-off-by: Dave Henderson <dhenderson@gmail.com> * Apply suggestions from code review Co-authored-by: Matt Holt <mholt@users.noreply.github.com> * Update src/docs/markdown/metrics.md Co-authored-by: Dave Henderson <dhenderson@gmail.com> Co-authored-by: Matt Holt <mholt@users.noreply.github.com>
2025-06-17 11:44:57 -04:00 · 2020-09-22 22:12:24 -04:00 · 2020-09-22 22:12:24 -04:00 · 30084b98b0
commit 30084b98b0
parent 2c6a8a0f9f
4 changed files with 303 additions and 2 deletions
--- a/src/docs/markdown/caddyfile/directives.md
+++ b/src/docs/markdown/caddyfile/directives.md
@ -19,6 +19,7 @@ Directive | Description
 **[header](/docs/caddyfile/directives/header)** | Sets or removes response headers
 **[import](/docs/caddyfile/directives/import)** | Include snippets or files
 **[log](/docs/caddyfile/directives/log)** | Enables access/request logging
+**[metrics](/docs/caddyfile/directives/metrics)** | Configures the Prometheus metrics exposition endpoint
 **[php_fastcgi](/docs/caddyfile/directives/php_fastcgi)** | Serve PHP sites over FastCGI
 **[redir](/docs/caddyfile/directives/redir)** | Issues an HTTP redirect to the client
 **[request_header](/docs/caddyfile/directives/request_header)** | Manipulates request headers
@ -90,10 +91,11 @@ handle_path
 route

 respond
+metrics
 reverse_proxy
 php_fastcgi
 file_server
 acme_server
 ```

-You can override/customize this ordering by using the [`order` global option](/docs/caddyfile/options) or the [`route` directive](/docs/caddyfile/directives/route).
+You can override/customize this ordering by using the [`order` global option](/docs/caddyfile/options) or the [`route` directive](/docs/caddyfile/directives/route).
--- a/src/docs/markdown/caddyfile/directives/metrics.md
+++ b/src/docs/markdown/caddyfile/directives/metrics.md
@ -0,0 +1,45 @@
+---
+title: metrics (Caddyfile directive)
+---
+
+# metrics
+
+Configures a Prometheus metrics exposition endpoint so the gathered metrics can
+be exposed for scraping.
+
+Note that a `/metrics` endpoint is also attached to the [admin API](/docs/api),
+which is not configurable, and is not available when the admin API is disabled.
+
+This endpoint will return metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format)
+or, if negotiated, in the [OpenMetrics exposition format](https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus/promhttp#HandlerOpts)
+(`application/openmetrics-text`).
+
+See also [Monitoring Caddy with Prometheus metrics](/docs/metrics).
+
+## Syntax
+
+```caddy-d
+metrics [<matcher>]
+```
+
+## Examples
+
+Expose metrics at the default `/metrics` path:
+
+```caddy-d
+metrics /metrics
+```
+
+Expose metrics at another path:
+
+```caddy-d
+metrics /foo/bar/baz
+```
+
+Serve metrics at a separate subdomain:
+
+```caddy
+metrics.example.com {
+	metrics
+}
+```
--- a/src/docs/markdown/metrics.md
+++ b/src/docs/markdown/metrics.md
@ -0,0 +1,253 @@
+---
+title: Monitoring Caddy with Prometheus metrics
+---
+
+# Monitoring Caddy with Prometheus metrics
+
+Whether you're running thousands of Caddy instances in the cloud, or a single
+Caddy server on an embedded device, it's likely that at some point you'll want
+to have a high-level overview of what Caddy is doing, and how long it's taking.
+In other words, you're going to want to be able to _monitor_ Caddy.
+
+## Prometheus
+
+[Prometheus](https://prometheus.io) is a monitoring platform that collects
+metrics from monitored targets by scraping metrics HTTP endpoints on these
+targets. As well as helping you to display metrics with a dashboarding tool like [Grafana](https://grafana.com/docs/grafana/latest/getting-started/what-is-grafana/), Prometheus is also used for [alerting](https://prometheus.io/docs/alerting/latest/overview/).
+
+Like Caddy, Prometheus is written in Go and distributed as a single binary. To
+install it, see the [Prometheus Installation docs](https://prometheus.io/docs/prometheus/latest/installation/),
+or on MacOS just run `brew install prometheus`.
+
+Read the [Prometheus docs](https://prometheus.io/docs/introduction/first_steps/)
+if you're brand new to Prometheus, otherwise read on!
+
+To configure Prometheus to scrape from Caddy you'll need a YAML configuration
+file similar to this:
+
+```yaml
+# prometheus.yaml
+global:
+  scrape_interval: 15s # default is 1 minute
+
+scrape_configs:
+  - job_name: caddy
+    static_configs:
+      - targets: ['localhost:2019']
+```
+
+You can then start up Prometheus like this:
+
+```console
+$ prometheus --config.file=prometheus.yaml
+```
+
+## Caddy's metrics
+
+Like any process monitored with Prometheus, Caddy exposes an HTTP endpoint
+that responds in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format).
+Caddy's Prometheus client is also configured to respond with the [OpenMetrics exposition format](https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus/promhttp#HandlerOpts)
+if negotiated (that is, if the `Accept` header is set to
+`application/openmetrics-text; version=0.0.1`).
+
+By default, there is a `/metrics` endpoint available at the [admin API](/docs/api)
+(i.e. http://localhost:2019/metrics). But if the admin API is
+disabled or you wish to listen on a different port or path, you can use the
+[`metrics` handler](/docs/caddyfile/directives/metrics) to configure this.
+
+You can see the metrics with any browser or HTTP client like `curl`:
+
+```console
+$ curl http://localhost:2019/metrics
+# HELP caddy_admin_http_requests_total Counter of requests made to the Admin API's HTTP endpoints.
+# TYPE caddy_admin_http_requests_total counter
+caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 2
+# HELP caddy_http_request_duration_seconds Histogram of round-trip request durations.
+# TYPE caddy_http_request_duration_seconds histogram
+caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.005"} 1
+caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.01"} 1
+caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.025"} 1
+...
+```
+
+There are a number of metrics you'll see, that broadly fall under 3 categories:
+
+- Runtime metrics
+- Admin API metrics
+- HTTP Middleware metrics
+
+### Runtime metrics
+
+These metrics cover the internals of the Caddy process, and are provided
+automatically by the Prometheus Go Client. They are prefixed with `go_*` and
+`process_*`.
+
+Note that the `process_*` metrics are only collected on Linux and Windows.
+
+See the documentation for the [Go Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewGoCollector),
+[Process Colletor](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewProcessCollector),
+and [BuildInfo Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewBuildInfoCollector).
+
+### Admin API metrics
+
+These are metrics that help to monitor the Caddy admin API. Each of the admin
+endpoints is instrumented to track request counts and errors.
+
+These metrics are prefixed with `caddy_admin_*`.
+
+For example:
+
+```console
+$ curl -s http://localhost:2019/metrics | grep ^caddy_admin
+caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/config/"} 1
+caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/"} 2
+caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/cmdline"} 1
+caddy_admin_http_requests_total{code="200",handler="load",method="POST",path="/load"} 1
+caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 3
+```
+
+#### `caddy_admin_http_requests_total`
+
+A counter of the number of requests handled by admin endpoints, including
+modules in the `admin.api.*` namespace.
+
+Label  | Description
+-------|------------
+`code` | HTTP status code
+`handler` | The handler or module name
+`method` | The HTTP method
+`path` | The URL path the admin endpoint was mounted to
+
+#### `caddy_admin_http_request_errors_total`
+
+A counter of the number of errors encountered in admin endpoints, including
+modules in the `admin.api.*` namespace.
+
+Label  | Description
+-------|------------
+`handler` | The handler or module name
+`method` | The HTTP method
+`path` | The URL path the admin endpoint was mounted to
+
+### HTTP Middleware metrics
+
+All Caddy HTTP middleware handlers are instrumented automatically for
+determining request latency, time-to-first-byte, errors, and request/response
+body sizes.
+
+<aside class="tip">Because all middleware handlers are instrumented, and many
+requests are handled by multiple handlers, make sure not to simply sum
+all the counters together.</aside>
+
+For the histogram metrics below, the buckets are currently not configurable.
+For durations, the default ([`prometheus.DefBuckets`](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables)
+set of buckets is used (5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, and 10s).
+For sizes, the buckets are 256b, 1kiB, 4kiB, 16kiB, 64kiB, 256kiB, 1MiB, and 4MiB.
+
+#### `caddy_http_requests_in_flight`
+
+A gauge of the number of requests currently being handled by this server.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+
+#### `caddy_http_request_errors_total`
+
+A counter of middleware errors encountered while handling requests.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+
+#### `caddy_http_requests_total`
+
+A counter of HTTP(S) requests made.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+
+#### `caddy_http_request_duration_seconds`
+
+A histogram of the round-trip request durations.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+#### `caddy_http_request_size_bytes`
+
+A histogram of the total (estimated) size of the request. Includes body.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+#### `caddy_http_response_size_bytes`
+
+A histogram of the size of the returned response body.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+#### `caddy_http_response_duration_seconds`
+
+A histogram of time-to-first-byte for responses.
+
+Label  | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+## Sample Queries
+
+Once you have Prometheus scraping Caddy's metrics, you can start to see some
+interesting metrics about how Caddy's performing.
+
+<aside class="tip">If you've started up a Prometheus server to scrape Caddy with
+the config above, try pasting these queries into the Prometheus UI at
+<a href="http://localhost:9090/graph">http://localhost:9090/graph</a></aside>
+
+For example, to see the per-second request rate, as averaged over 5 minutes:
+
+```
+rate(caddy_http_requests_total{handler="file_server"}[5m])
+```
+
+To see the rate at which your latency threshold of 100ms is being exceeded:
+
+```
+sum(rate(caddy_http_request_duration_seconds_count{server="srv0"}[5m])) by (handler)
+-
+sum(rate(caddy_http_request_duration_seconds_bucket{le="0.100", server="srv0"}[5m])) by (handler)
+```
+
+To find the 95th percentile request duration on the `file_server`
+handler, you can use a query like this:
+
+```
+histogram_quantile(0.95, sum(caddy_http_request_duration_seconds_bucket{handler="file_server"}) by (le))
+```
+
+Or to see the median response size in bytes for successful `GET` requests on the
+`file_server` handler:
+
+```
+histogram_quantile(0.5, caddy_http_response_size_bytes_bucket{method="GET", handler="file_server", code="200"})
+```