From 30084b98b0631f421a15da8891885b1461fdf118 Mon Sep 17 00:00:00 2001
From: Dave Henderson
Date: Tue, 22 Sep 2020 22:12:24 -0400
Subject: [PATCH] docs: Monitoring and metrics (#86)

* metrics: Adding docs for metrics

Signed-off-by: Dave Henderson

* Addressing review comments

Signed-off-by: Dave Henderson

* Apply suggestions from code review

Co-authored-by: Matt Holt

* Update src/docs/markdown/metrics.md

Co-authored-by: Dave Henderson
Co-authored-by: Matt Holt
---
 src/docs/markdown/caddyfile/directives.md     |   4 +-
 .../markdown/caddyfile/directives/metrics.md  |  45 ++++
 src/docs/markdown/metrics.md                  | 253 ++++++++++++++++++
 src/includes/docs-nav.html                    |   3 +-
 4 files changed, 303 insertions(+), 2 deletions(-)
 create mode 100644 src/docs/markdown/caddyfile/directives/metrics.md
 create mode 100644 src/docs/markdown/metrics.md

diff --git a/src/docs/markdown/caddyfile/directives.md b/src/docs/markdown/caddyfile/directives.md
index b47e5d6..ce1c4d9 100644
--- a/src/docs/markdown/caddyfile/directives.md
+++ b/src/docs/markdown/caddyfile/directives.md
@@ -19,6 +19,7 @@ Directive | Description
 **[header](/docs/caddyfile/directives/header)** | Sets or removes response headers
 **[import](/docs/caddyfile/directives/import)** | Include snippets or files
 **[log](/docs/caddyfile/directives/log)** | Enables access/request logging
+**[metrics](/docs/caddyfile/directives/metrics)** | Configures the Prometheus metrics exposition endpoint
 **[php_fastcgi](/docs/caddyfile/directives/php_fastcgi)** | Serve PHP sites over FastCGI
 **[redir](/docs/caddyfile/directives/redir)** | Issues an HTTP redirect to the client
 **[request_header](/docs/caddyfile/directives/request_header)** | Manipulates request headers
@@ -90,10 +91,11 @@
 handle_path
 route
 respond
+metrics
 reverse_proxy
 php_fastcgi
 file_server
 acme_server
 ```
 
-You can override/customize this ordering by using the [`order` global option](/docs/caddyfile/options) or the [`route` directive](/docs/caddyfile/directives/route).
\ No newline at end of file
+You can override/customize this ordering by using the [`order` global option](/docs/caddyfile/options) or the [`route` directive](/docs/caddyfile/directives/route).
diff --git a/src/docs/markdown/caddyfile/directives/metrics.md b/src/docs/markdown/caddyfile/directives/metrics.md
new file mode 100644
index 0000000..40c38d6
--- /dev/null
+++ b/src/docs/markdown/caddyfile/directives/metrics.md
@@ -0,0 +1,45 @@
+---
+title: metrics (Caddyfile directive)
+---
+
+# metrics
+
+Configures a Prometheus metrics exposition endpoint so the gathered metrics can
+be exposed for scraping.
+
+Note that a `/metrics` endpoint is also attached to the [admin API](/docs/api),
+which is not configurable, and is not available when the admin API is disabled.
+
+This endpoint will return metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format)
+or, if negotiated, in the [OpenMetrics exposition format](https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus/promhttp#HandlerOpts)
+(`application/openmetrics-text`).
+
+See also [Monitoring Caddy with Prometheus metrics](/docs/metrics).
+
+## Syntax
+
+```caddy-d
+metrics [<matcher>]
+```
+
+## Examples
+
+Expose metrics at the default `/metrics` path:
+
+```caddy-d
+metrics /metrics
+```
+
+Expose metrics at another path:
+
+```caddy-d
+metrics /foo/bar/baz
+```
+
+Serve metrics at a separate subdomain:
+
+```caddy
+metrics.example.com {
+	metrics
+}
+```
diff --git a/src/docs/markdown/metrics.md b/src/docs/markdown/metrics.md
new file mode 100644
index 0000000..ca4c366
--- /dev/null
+++ b/src/docs/markdown/metrics.md
@@ -0,0 +1,253 @@
+---
+title: Monitoring Caddy with Prometheus metrics
+---
+
+# Monitoring Caddy with Prometheus metrics
+
+Whether you're running thousands of Caddy instances in the cloud, or a single
+Caddy server on an embedded device, it's likely that at some point you'll want
+to have a high-level overview of what Caddy is doing, and how long it's taking.
+In other words, you're going to want to be able to _monitor_ Caddy.
+
+## Prometheus
+
+[Prometheus](https://prometheus.io) is a monitoring platform that collects
+metrics from monitored targets by scraping metrics HTTP endpoints on these
+targets. As well as helping you to display metrics with a dashboarding tool like [Grafana](https://grafana.com/docs/grafana/latest/getting-started/what-is-grafana/), Prometheus is also used for [alerting](https://prometheus.io/docs/alerting/latest/overview/).
+
+Like Caddy, Prometheus is written in Go and distributed as a single binary. To
+install it, see the [Prometheus Installation docs](https://prometheus.io/docs/prometheus/latest/installation/),
+or on macOS just run `brew install prometheus`.
+
+Read the [Prometheus docs](https://prometheus.io/docs/introduction/first_steps/)
+if you're brand new to Prometheus, otherwise read on!
+
+To configure Prometheus to scrape from Caddy you'll need a YAML configuration
+file similar to this:
+
+```yaml
+# prometheus.yaml
+global:
+  scrape_interval: 15s # default is 1 minute
+
+scrape_configs:
+  - job_name: caddy
+    static_configs:
+      - targets: ['localhost:2019']
+```
+
+You can then start up Prometheus like this:
+
+```console
+$ prometheus --config.file=prometheus.yaml
+```
+
+## Caddy's metrics
+
+Like any process monitored with Prometheus, Caddy exposes an HTTP endpoint
+that responds in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format).
+Caddy's Prometheus client is also configured to respond with the [OpenMetrics exposition format](https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus/promhttp#HandlerOpts)
+if negotiated (that is, if the `Accept` header is set to
+`application/openmetrics-text; version=0.0.1`).
+
+By default, there is a `/metrics` endpoint available at the [admin API](/docs/api)
+(i.e. http://localhost:2019/metrics). But if the admin API is
+disabled or you wish to listen on a different port or path, you can use the
+[`metrics` handler](/docs/caddyfile/directives/metrics) to configure this.
+
+You can see the metrics with any browser or HTTP client like `curl`:
+
+```console
+$ curl http://localhost:2019/metrics
+# HELP caddy_admin_http_requests_total Counter of requests made to the Admin API's HTTP endpoints.
+# TYPE caddy_admin_http_requests_total counter
+caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 2
+# HELP caddy_http_request_duration_seconds Histogram of round-trip request durations.
+# TYPE caddy_http_request_duration_seconds histogram
+caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.005"} 1
+caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.01"} 1
+caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.025"} 1
+...
+```
+
+You'll see a number of metrics, which broadly fall into three categories:
+
+- Runtime metrics
+- Admin API metrics
+- HTTP Middleware metrics
+
+### Runtime metrics
+
+These metrics cover the internals of the Caddy process, and are provided
+automatically by the Prometheus Go Client. They are prefixed with `go_*` and
+`process_*`.
+
+Note that the `process_*` metrics are only collected on Linux and Windows.
+
+See the documentation for the [Go Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewGoCollector),
+[Process Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewProcessCollector),
+and [BuildInfo Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewBuildInfoCollector).
+
+### Admin API metrics
+
+These are metrics that help to monitor the Caddy admin API. Each of the admin
+endpoints is instrumented to track request counts and errors.
+
+These metrics are prefixed with `caddy_admin_*`.
+
+For example:
+
+```console
+$ curl -s http://localhost:2019/metrics | grep ^caddy_admin
+caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/config/"} 1
+caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/"} 2
+caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/cmdline"} 1
+caddy_admin_http_requests_total{code="200",handler="load",method="POST",path="/load"} 1
+caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 3
+```
+
+#### `caddy_admin_http_requests_total`
+
+A counter of the number of requests handled by admin endpoints, including
+modules in the `admin.api.*` namespace.
+
+Label | Description
+-------|------------
+`code` | HTTP status code
+`handler` | The handler or module name
+`method` | The HTTP method
+`path` | The URL path the admin endpoint was mounted to
+
+#### `caddy_admin_http_request_errors_total`
+
+A counter of the number of errors encountered in admin endpoints, including
+modules in the `admin.api.*` namespace.
+
+Label | Description
+-------|------------
+`handler` | The handler or module name
+`method` | The HTTP method
+`path` | The URL path the admin endpoint was mounted to
+
+### HTTP Middleware metrics
+
+All Caddy HTTP middleware handlers are instrumented automatically for
+determining request latency, time-to-first-byte, errors, and request/response
+body sizes.
+
+
+
+For the histogram metrics below, the buckets are currently not configurable.
+For durations, the default set of buckets, [`prometheus.DefBuckets`](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables),
+is used (5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, and 10s).
+For sizes, the buckets are 256B, 1KiB, 4KiB, 16KiB, 64KiB, 256KiB, 1MiB, and 4MiB.
+
+#### `caddy_http_requests_in_flight`
+
+A gauge of the number of requests currently being handled by this server.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+
+#### `caddy_http_request_errors_total`
+
+A counter of middleware errors encountered while handling requests.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+
+#### `caddy_http_requests_total`
+
+A counter of HTTP(S) requests made.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+
+#### `caddy_http_request_duration_seconds`
+
+A histogram of the round-trip request durations.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+#### `caddy_http_request_size_bytes`
+
+A histogram of the total (estimated) size of the request. Includes body.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+#### `caddy_http_response_size_bytes`
+
+A histogram of the size of the returned response body.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+#### `caddy_http_response_duration_seconds`
+
+A histogram of time-to-first-byte for responses.
+
+Label | Description
+-------|------------
+`server` | The server name
+`handler` | The handler or module name
+`code` | HTTP status code
+`method` | The HTTP method
+
+## Sample Queries
+
+Once you have Prometheus scraping Caddy's metrics, you can start to see some
+interesting metrics about how Caddy's performing.
+
+
+
+For example, to see the per-second request rate, as averaged over 5 minutes:
+
+```
+rate(caddy_http_requests_total{handler="file_server"}[5m])
+```
+
+To see the rate at which your latency threshold of 100ms is being exceeded:
+
+```
+sum(rate(caddy_http_request_duration_seconds_count{server="srv0"}[5m])) by (handler)
+-
+sum(rate(caddy_http_request_duration_seconds_bucket{le="0.1", server="srv0"}[5m])) by (handler)
+```
+
+To find the 95th percentile request duration on the `file_server`
+handler, you can use a query like this:
+
+```
+histogram_quantile(0.95, sum(caddy_http_request_duration_seconds_bucket{handler="file_server"}) by (le))
+```
+
+Or to see the median response size in bytes for successful `GET` requests on the
+`file_server` handler:
+
+```
+histogram_quantile(0.5, caddy_http_response_size_bytes_bucket{method="GET", handler="file_server", code="200"})
+```
diff --git a/src/includes/docs-nav.html b/src/includes/docs-nav.html
index 720922a..77acbbd 100644
--- a/src/includes/docs-nav.html
+++ b/src/includes/docs-nav.html
@@ -45,6 +45,7 @@
  • Conventions
  • Config Adapters
  • How Logging Works
+  • Monitoring Caddy
  • Caddy Architecture
  • Developers
  • @@ -59,4 +60,4 @@
  • v1 Docs
  • - \ No newline at end of file +
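
The `histogram_quantile` sample queries added to `src/docs/markdown/metrics.md` in this patch estimate percentiles from cumulative bucket counts. Below is a minimal Python sketch of that kind of estimate, assuming linear interpolation inside the bucket that contains the target rank. It illustrates the idea and is not Prometheus's actual implementation. The function name `estimate_quantile` and the sample counts are hypothetical; only the bucket bounds are taken from the duration buckets listed in the docs above.

```python
import math

# Duration bucket upper bounds in seconds, as listed in the docs
# (5ms ... 10s), plus the implicit +Inf bucket.
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, math.inf]


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: list of (upper_bound, cumulative_count) pairs, sorted by
    upper_bound and ending with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: the best available
                # answer is the highest finite upper bound.
                return buckets[-2][0]
            if count == prev_count:
                return prev_bound  # empty bucket, avoid dividing by zero
            # Linear interpolation between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical cumulative counts for 100 observed requests.
sample = list(zip(BOUNDS, [50, 80, 90, 95, 98, 99, 100, 100, 100, 100, 100, 100]))
p95 = estimate_quantile(0.95, sample)
```

With these made-up counts, the 95th request falls exactly on the upper edge of the 50ms bucket, so the estimate is that bucket's bound. Because the result is interpolated, its accuracy is limited by how wide the buckets are around the quantile of interest.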