docs: Monitoring and metrics (#86)

* metrics: Adding docs for metrics

Signed-off-by: Dave Henderson <dhenderson@gmail.com>

* Addressing review comments

Signed-off-by: Dave Henderson <dhenderson@gmail.com>

* Apply suggestions from code review

Co-authored-by: Matt Holt <mholt@users.noreply.github.com>

* Update src/docs/markdown/metrics.md

Co-authored-by: Dave Henderson <dhenderson@gmail.com>

Co-authored-by: Matt Holt <mholt@users.noreply.github.com>
Dave Henderson 2020-09-22 22:12:24 -04:00 committed by GitHub
parent 2c6a8a0f9f
commit 30084b98b0
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
4 changed files with 303 additions and 2 deletions


@ -19,6 +19,7 @@ Directive | Description
**[header](/docs/caddyfile/directives/header)** | Sets or removes response headers
**[import](/docs/caddyfile/directives/import)** | Include snippets or files
**[log](/docs/caddyfile/directives/log)** | Enables access/request logging
**[metrics](/docs/caddyfile/directives/metrics)** | Configures the Prometheus metrics exposition endpoint
**[php_fastcgi](/docs/caddyfile/directives/php_fastcgi)** | Serve PHP sites over FastCGI
**[redir](/docs/caddyfile/directives/redir)** | Issues an HTTP redirect to the client
**[request_header](/docs/caddyfile/directives/request_header)** | Manipulates request headers
@ -90,10 +91,11 @@ handle_path
route
respond
metrics
reverse_proxy
php_fastcgi
file_server
acme_server
```
You can override/customize this ordering by using the [`order` global option](/docs/caddyfile/options) or the [`route` directive](/docs/caddyfile/directives/route).


@ -0,0 +1,45 @@
---
title: metrics (Caddyfile directive)
---
# metrics
Configures a Prometheus metrics exposition endpoint so the gathered metrics can
be exposed for scraping.
Note that a `/metrics` endpoint is also attached to the [admin API](/docs/api).
That endpoint is not configurable, and is not available when the admin API is disabled.
This endpoint will return metrics in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format)
or, if negotiated, in the [OpenMetrics exposition format](https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus/promhttp#HandlerOpts)
(`application/openmetrics-text`).
See also [Monitoring Caddy with Prometheus metrics](/docs/metrics).
## Syntax
```caddy-d
metrics [<matcher>]
```
## Examples
Expose metrics at the default `/metrics` path:
```caddy-d
metrics /metrics
```
Expose metrics at another path:
```caddy-d
metrics /foo/bar/baz
```
Serve metrics at a separate subdomain:
```caddy
metrics.example.com {
	metrics
}
```


@ -0,0 +1,253 @@
---
title: Monitoring Caddy with Prometheus metrics
---
# Monitoring Caddy with Prometheus metrics
Whether you're running thousands of Caddy instances in the cloud, or a single
Caddy server on an embedded device, it's likely that at some point you'll want
to have a high-level overview of what Caddy is doing, and how long it's taking.
In other words, you're going to want to be able to _monitor_ Caddy.
## Prometheus
[Prometheus](https://prometheus.io) is a monitoring platform that collects
metrics from monitored targets by scraping metrics HTTP endpoints on these
targets. In addition to letting you display metrics in a dashboarding tool like [Grafana](https://grafana.com/docs/grafana/latest/getting-started/what-is-grafana/), Prometheus is used for [alerting](https://prometheus.io/docs/alerting/latest/overview/).
Like Caddy, Prometheus is written in Go and distributed as a single binary. To
install it, see the [Prometheus Installation docs](https://prometheus.io/docs/prometheus/latest/installation/),
or on macOS just run `brew install prometheus`.
Read the [Prometheus docs](https://prometheus.io/docs/introduction/first_steps/)
if you're brand new to Prometheus, otherwise read on!
To configure Prometheus to scrape from Caddy, you'll need a YAML configuration
file similar to this:
```yaml
# prometheus.yaml
global:
  scrape_interval: 15s # default is 1 minute
scrape_configs:
  - job_name: caddy
    static_configs:
      - targets: ['localhost:2019']
```
You can then start up Prometheus like this:
```console
$ prometheus --config.file=prometheus.yaml
```
## Caddy's metrics
Like any process monitored with Prometheus, Caddy exposes an HTTP endpoint
that responds in the [Prometheus exposition format](https://prometheus.io/docs/instrumenting/exposition_formats/#text-based-format).
Caddy's Prometheus client is also configured to respond with the [OpenMetrics exposition format](https://pkg.go.dev/github.com/prometheus/client_golang@v1.7.1/prometheus/promhttp#HandlerOpts)
if negotiated (that is, if the `Accept` header is set to
`application/openmetrics-text; version=0.0.1`).
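If you'd like to see the negotiation from the client side, here's a minimal Go sketch that builds a scrape request asking for OpenMetrics, using only the standard library. The helper name `newScrapeRequest` is hypothetical. The URL assumes the admin API is listening on its default `localhost:2019`. The sketch only constructs the request. If you send it, check the response's `Content-Type` header to see which format you actually got.

```go
package main

import (
	"fmt"
	"net/http"
)

// openMetricsAccept is the Accept value described above for negotiating
// the OpenMetrics exposition format.
const openMetricsAccept = "application/openmetrics-text; version=0.0.1"

// newScrapeRequest (hypothetical helper) builds, but does not send, a GET
// request for a metrics endpoint. When openMetrics is true it sets the
// Accept header so the server may answer in OpenMetrics instead of the
// Prometheus text format.
func newScrapeRequest(target string, openMetrics bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	if openMetrics {
		req.Header.Set("Accept", openMetricsAccept)
	}
	return req, nil
}

func main() {
	// Assumes the default admin API address; nothing is sent here.
	req, err := newScrapeRequest("http://localhost:2019/metrics", true)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL, "Accept:", req.Header.Get("Accept"))
	// To actually scrape: resp, err := http.DefaultClient.Do(req), then
	// inspect resp.Header.Get("Content-Type").
}
```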
By default, a `/metrics` endpoint is available on the [admin API](/docs/api)
(i.e. http://localhost:2019/metrics). If the admin API is disabled, or you want
to serve metrics on a different port or path, use the
[`metrics` directive](/docs/caddyfile/directives/metrics) instead.
You can see the metrics with any browser or HTTP client like `curl`:
```console
$ curl http://localhost:2019/metrics
# HELP caddy_admin_http_requests_total Counter of requests made to the Admin API's HTTP endpoints.
# TYPE caddy_admin_http_requests_total counter
caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 2
# HELP caddy_http_request_duration_seconds Histogram of round-trip request durations.
# TYPE caddy_http_request_duration_seconds histogram
caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.005"} 1
caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.01"} 1
caddy_http_request_duration_seconds_bucket{code="308",handler="static_response",method="GET",server="remaining_auto_https_redirects",le="0.025"} 1
...
```
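Each non-comment line of that output is one sample. A sample is a metric name, an optional set of `key="value"` labels in braces, and a value. To illustrate the format (this is not how Prometheus itself parses it), here's a simplified Go parser for a single sample line. The `sample` type and `parseSample` function are hypothetical. The sketch unescapes label values, but it ignores timestamps and doesn't validate names. For real use, the `github.com/prometheus/common/expfmt` package is the more robust choice.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sample is one data line of the text format: name{labels} value.
type sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample (hypothetical) parses one sample line. Simplified sketch: an
// optional trailing timestamp is ignored and names are not validated.
func parseSample(line string) (sample, error) {
	s := sample{Labels: map[string]string{}}
	line = strings.TrimSpace(line)
	if line == "" || strings.HasPrefix(line, "#") {
		return s, fmt.Errorf("not a sample line: %q", line)
	}
	// The metric name ends at the first '{' or space.
	end := strings.IndexAny(line, "{ ")
	if end <= 0 {
		return s, fmt.Errorf("malformed line: %q", line)
	}
	s.Name = line[:end]
	rest := line[end:]
	if rest[0] == '{' {
		i := 1
		for i < len(rest) && rest[i] != '}' {
			eq := strings.IndexByte(rest[i:], '=')
			if eq < 0 || i+eq+1 >= len(rest) || rest[i+eq+1] != '"' {
				return s, fmt.Errorf("malformed label in %q", line)
			}
			key := strings.TrimSpace(rest[i : i+eq])
			i += eq + 2 // skip past `="`
			var val strings.Builder
			for i < len(rest) && rest[i] != '"' {
				c := rest[i]
				if c == '\\' && i+1 < len(rest) {
					i++ // unescape \\, \" and \n
					c = rest[i]
					if c == 'n' {
						c = '\n'
					}
				}
				val.WriteByte(c)
				i++
			}
			if i >= len(rest) {
				return s, fmt.Errorf("unterminated label value in %q", line)
			}
			s.Labels[key] = val.String()
			i++ // skip the closing quote
			if i < len(rest) && rest[i] == ',' {
				i++
			}
		}
		if i >= len(rest) {
			return s, fmt.Errorf("unterminated label set in %q", line)
		}
		rest = rest[i+1:] // everything after '}'
	}
	fields := strings.Fields(rest)
	if len(fields) == 0 {
		return s, fmt.Errorf("missing value in %q", line)
	}
	v, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return s, err
	}
	s.Value = v
	return s, nil
}

func main() {
	s, err := parseSample(`caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 2`)
	if err != nil {
		panic(err)
	}
	fmt.Println(s.Name, s.Labels["handler"], s.Value)
}
```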
The metrics you'll see broadly fall into three categories:
- Runtime metrics
- Admin API metrics
- HTTP Middleware metrics
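Each category has its own name prefix, so you can sort metrics into these groups by name alone:

- runtime metrics use `go_*` and `process_*`
- admin API metrics use `caddy_admin_*`
- HTTP middleware metrics use `caddy_http_*`, as the metric names below show

Here's a small Go sketch of that grouping. `category` is a hypothetical helper, not part of Caddy.

```go
package main

import (
	"fmt"
	"strings"
)

// category (hypothetical) maps a metric name to one of the three groups
// described above, using only its prefix.
func category(name string) string {
	switch {
	case strings.HasPrefix(name, "go_"), strings.HasPrefix(name, "process_"):
		return "runtime"
	case strings.HasPrefix(name, "caddy_admin_"):
		return "admin"
	case strings.HasPrefix(name, "caddy_http_"):
		return "http"
	default:
		return "other"
	}
}

func main() {
	for _, n := range []string{
		"go_goroutines",
		"process_resident_memory_bytes",
		"caddy_admin_http_requests_total",
		"caddy_http_request_duration_seconds_bucket",
	} {
		fmt.Printf("%-45s %s\n", n, category(n))
	}
}
```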
### Runtime metrics
These metrics cover the internals of the Caddy process, and are provided
automatically by the Prometheus Go Client. They are prefixed with `go_*` and
`process_*`.
Note that the `process_*` metrics are only collected on Linux and Windows.
See the documentation for the [Go Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewGoCollector),
[Process Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewProcessCollector),
and [BuildInfo Collector](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#NewBuildInfoCollector).
### Admin API metrics
These are metrics that help to monitor the Caddy admin API. Each of the admin
endpoints is instrumented to track request counts and errors.
These metrics are prefixed with `caddy_admin_*`.
For example:
```console
$ curl -s http://localhost:2019/metrics | grep ^caddy_admin
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/config/"} 1
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/"} 2
caddy_admin_http_requests_total{code="200",handler="admin",method="GET",path="/debug/pprof/cmdline"} 1
caddy_admin_http_requests_total{code="200",handler="load",method="POST",path="/load"} 1
caddy_admin_http_requests_total{code="200",handler="metrics",method="GET",path="/metrics"} 3
```
#### `caddy_admin_http_requests_total`
A counter of the number of requests handled by admin endpoints, including
modules in the `admin.api.*` namespace.
Label | Description
-------|------------
`code` | HTTP status code
`handler` | The handler or module name
`method` | The HTTP method
`path` | The URL path the admin endpoint was mounted to
#### `caddy_admin_http_request_errors_total`
A counter of the number of errors encountered in admin endpoints, including
modules in the `admin.api.*` namespace.
Label | Description
-------|------------
`handler` | The handler or module name
`method` | The HTTP method
`path` | The URL path the admin endpoint was mounted to
### HTTP Middleware metrics
All Caddy HTTP middleware handlers are instrumented automatically for
determining request latency, time-to-first-byte, errors, and request/response
body sizes.
<aside class="tip">Because all middleware handlers are instrumented, and many
requests are handled by multiple handlers, make sure not to simply sum
all the counters together.</aside>
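To see why, imagine a request that passes through an `encode` handler and then a `file_server` handler. Because every handler is instrumented, each handler's `caddy_http_requests_total` series counts that one request. The Go sketch below uses made-up numbers to contrast summing everything with summing a single handler. The `counter` type and `requestsFor` function are hypothetical. In PromQL, the equivalent is selecting one handler with a label matcher such as `{handler="file_server"}`, as the sample queries on this page do.

```go
package main

import "fmt"

// counter (hypothetical) is a simplified view of one
// caddy_http_requests_total series.
type counter struct {
	Server, Handler string
	Value           float64
}

// requestsFor sums the counter over all series for a single handler.
// Summing across every handler would count a request once per handler
// that touched it.
func requestsFor(series []counter, handler string) float64 {
	var total float64
	for _, c := range series {
		if c.Handler == handler {
			total += c.Value
		}
	}
	return total
}

func main() {
	// Hypothetical: 10 requests, each handled by both "encode" and "file_server".
	series := []counter{
		{Server: "srv0", Handler: "encode", Value: 10},
		{Server: "srv0", Handler: "file_server", Value: 10},
	}
	var naive float64
	for _, c := range series {
		naive += c.Value
	}
	fmt.Println("sum over all handlers:", naive)                         // 20 (double-counted)
	fmt.Println("file_server only:", requestsFor(series, "file_server")) // 10
}
```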
For the histogram metrics below, the buckets are currently not configurable.
For durations, the default set of buckets ([`prometheus.DefBuckets`](https://pkg.go.dev/github.com/prometheus/client_golang/prometheus#pkg-variables))
is used (5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, and 10s).
For sizes, the buckets are 256B, 1KiB, 4KiB, 16KiB, 64KiB, 256KiB, 1MiB, and 4MiB.
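The size buckets grow by a factor of four from 256 bytes, so they can be generated instead of listed. The Go sketch below reproduces both bucket lists and shows which bucket a given observation is counted in.

`exponentialBuckets` is a standalone helper in the spirit of client_golang's `prometheus.ExponentialBuckets`. This page doesn't say whether Caddy builds its size buckets that way.

Keep in mind that Prometheus histogram buckets are cumulative. An observation counts toward its smallest matching `le` bucket and every larger one.

```go
package main

import "fmt"

// durationBuckets are the upper bounds (seconds) listed above, matching
// prometheus.DefBuckets.
var durationBuckets = []float64{0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10}

// exponentialBuckets returns count upper bounds starting at start, each one
// factor times the previous.
func exponentialBuckets(start, factor float64, count int) []float64 {
	b := make([]float64, count)
	for i := range b {
		b[i] = start
		start *= factor
	}
	return b
}

// bucketFor returns the smallest upper bound le with v <= le, i.e. the first
// cumulative bucket that counts v. ok is false if v only fits in +Inf.
func bucketFor(bounds []float64, v float64) (le float64, ok bool) {
	for _, b := range bounds {
		if v <= b {
			return b, true
		}
	}
	return 0, false
}

func main() {
	for _, b := range exponentialBuckets(256, 4, 8) {
		fmt.Printf("%.0f ", b) // byte sizes: 256 up to 4194304 (4MiB)
	}
	fmt.Println()
	// A hypothetical 30ms request is first counted in the 0.05s bucket.
	if le, ok := bucketFor(durationBuckets, 0.030); ok {
		fmt.Println("30ms falls in le =", le)
	}
}
```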
#### `caddy_http_requests_in_flight`
A gauge of the number of requests currently being handled by this server.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
#### `caddy_http_request_errors_total`
A counter of middleware errors encountered while handling requests.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
#### `caddy_http_requests_total`
A counter of HTTP(S) requests made.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
#### `caddy_http_request_duration_seconds`
A histogram of the round-trip request durations.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
`code` | HTTP status code
`method` | The HTTP method
#### `caddy_http_request_size_bytes`
A histogram of the total (estimated) size of the request, including the body.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
`code` | HTTP status code
`method` | The HTTP method
#### `caddy_http_response_size_bytes`
A histogram of the size of the returned response body.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
`code` | HTTP status code
`method` | The HTTP method
#### `caddy_http_response_duration_seconds`
A histogram of time-to-first-byte for responses.
Label | Description
-------|------------
`server` | The server name
`handler` | The handler or module name
`code` | HTTP status code
`method` | The HTTP method
## Sample Queries
Once you have Prometheus scraping Caddy's metrics, you can start to see some
interesting metrics about how Caddy's performing.
<aside class="tip">If you've started up a Prometheus server to scrape Caddy with
the config above, try pasting these queries into the Prometheus UI at
<a href="http://localhost:9090/graph">http://localhost:9090/graph</a></aside>
For example, to see the per-second request rate, as averaged over 5 minutes:
```
rate(caddy_http_requests_total{handler="file_server"}[5m])
```
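`rate()` turns an ever-increasing counter into a per-second rate. Roughly, it takes the counter's increase across the time window and divides by the window's length. Any drop in value is treated as a counter reset, for example after Caddy restarts. Here's a simplified Go sketch of that arithmetic with hypothetical scrape data. `point` and `perSecondRate` are illustrative names. Prometheus's real `rate()` also extrapolates to the edges of the window, so its results can differ slightly.

```go
package main

import "fmt"

// point is one scraped counter value and its timestamp in seconds.
type point struct {
	T float64 // timestamp, seconds
	V float64 // counter value
}

// perSecondRate approximates rate() from the points inside a window: it
// sums the increases between consecutive points, treating a decrease as a
// reset to zero, then divides by the time the points span.
func perSecondRate(pts []point) float64 {
	if len(pts) < 2 {
		return 0
	}
	var inc float64
	for i := 1; i < len(pts); i++ {
		d := pts[i].V - pts[i-1].V
		if d < 0 {
			// Counter reset: it restarted from 0 and reached pts[i].V.
			d = pts[i].V
		}
		inc += d
	}
	return inc / (pts[len(pts)-1].T - pts[0].T)
}

func main() {
	// Hypothetical scrapes 15s apart, with a restart between t=30 and t=45.
	pts := []point{{0, 100}, {15, 130}, {30, 160}, {45, 10}, {60, 40}}
	fmt.Printf("%.3f requests/second\n", perSecondRate(pts))
}
```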
To see the rate at which your latency threshold of 100ms is being exceeded:
```
sum(rate(caddy_http_request_duration_seconds_count{server="srv0"}[5m])) by (handler)
-
sum(rate(caddy_http_request_duration_seconds_bucket{le="0.1", server="srv0"}[5m])) by (handler)
```
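This works because histogram buckets are cumulative. The 100ms bucket (upper bound `0.1` seconds) counts every request that took 100ms or less, and `_count` counts all requests. So the difference is the number of requests slower than 100ms. The query performs the same subtraction on per-second rates. It is exact only when the threshold is one of the bucket boundaries.

A minimal Go sketch of that subtraction, with hypothetical counts and a hypothetical `slowerThan` helper:

```go
package main

import "fmt"

// slowerThan returns how many observations exceeded threshold, given a
// histogram's cumulative bucket counts keyed by upper bound (le) and its
// total _count. ok is false when threshold is not a bucket boundary, since
// the histogram can't answer that question exactly.
func slowerThan(buckets map[float64]float64, count, threshold float64) (n float64, ok bool) {
	atOrBelow, found := buckets[threshold]
	if !found {
		return 0, false
	}
	return count - atOrBelow, true
}

func main() {
	// Hypothetical totals: 1000 requests, 950 of which took <= 100ms.
	buckets := map[float64]float64{0.05: 800, 0.1: 950, 0.25: 990}
	if n, ok := slowerThan(buckets, 1000, 0.1); ok {
		fmt.Println("requests slower than 100ms:", n) // 50
	}
}
```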
To find the 95th percentile request duration on the `file_server`
handler, you can use a query like this:
```
histogram_quantile(0.95, sum(caddy_http_request_duration_seconds_bucket{handler="file_server"}) by (le))
```
Or to see the median response size in bytes for successful `GET` requests on the
`file_server` handler:
```
histogram_quantile(0.5, caddy_http_response_size_bytes_bucket{method="GET", handler="file_server", code="200"})
```