docs: New logging & arch articles; various minor improvements

Matthew Holt 2020-03-30 15:15:23 -06:00
parent e918275b63
commit fe58da0269
9 changed files with 365 additions and 19 deletions

View file

@ -0,0 +1,145 @@
---
title: Architecture
---
Architecture
============
Caddy is a single, self-contained, static binary with zero external dependencies because it's written in Go. These values comprise important parts of the project's vision because they simplify deployment and reduce tedious troubleshooting in production environments.
If there's no dynamic linking, then how can it be extended? Caddy sports a novel plugin architecture that expands its capabilities far beyond that of any other web server, even those with external (dynamically-linked) dependencies.
Our philosophy of "fewer moving parts" ultimately results in more reliable, more manageable, less expensive sites—especially at scale. This semi-technical document describes how we achieve that goal through software engineering.
## Overview
Caddy consists of a command, core library, and modules.
The **command** provides the [command line interface](/docs/command-line) you are hopefully familiar with. It's how you launch the process from your operating system. The amount of code and logic here is fairly minimal, and has only what is needed to bootstrap the core in the user's desired way. We intentionally avoid using flags and environment variables for configuration except as they pertain to bootstrapping config.
<aside class="tip">
Modules can add subcommands to the command line interface! For instance, that's where the <a href="/docs/command-line#caddy-file-server">caddy file-server</a> command comes from. These added commands may have any flags or use any environment variables they want, even though the core Caddy commands minimize their use.
</aside>
The **[core library](https://pkg.go.dev/github.com/caddyserver/caddy/v2?tab=doc)**, or "core" of Caddy, primarily manages configuration. It can [`Run()`](https://pkg.go.dev/github.com/caddyserver/caddy/v2?tab=doc#Run) a new configuration or [`Stop()`](https://pkg.go.dev/github.com/caddyserver/caddy/v2?tab=doc#Stop) a running config. It also provides various utilities, types, and values for modules to use.
**Modules** do everything else. Many modules come built into Caddy, which are called the _standard modules_. These are determined to be the most useful to the most users.
<aside class="tip">
Sometimes the terms <i>module</i>, <i>plugin</i>, and <i>extension</i> get used interchangeably, and usually that's OK. Technically, all modules are plugins, but not all plugins are modules. Modules are specifically a kind of plugin that extends Caddy's <a href="/docs/json/">config structure</a>.
</aside>
## Caddy core
At its core, Caddy merely loads an initial configuration ("config") or, if there isn't one, opens a socket to accept new configuration later on.
A [Caddy configuration](/docs/json/) is a JSON document, with some fields at its top level:
```json
{
"admin": {},
"logging": {},
"apps": {•••},
...
}
```
The core of Caddy knows how to work with some of these fields natively:
- [`admin`](/docs/json/admin/) so it can set up the [admin API](/docs/api) and manage the process
- [`logging`](/docs/json/logging/) so it can [emit logs](/docs/logging)
But other top-level fields (like [`apps`](/docs/json/apps/)) are opaque to the core of Caddy. In fact, all Caddy knows how to do with the bytes in `apps` is deserialize them into an interface type that it can call two methods on:
1. `Start()`
2. `Stop()`
... and that's it. It calls `Start()` on each app when a config is loaded, and `Stop()` on each app when a config is unloaded.
When an app module is started, it initiates the app's module lifecycle.
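To make that contract concrete, here is a minimal sketch of a two-method app interface in Go. The `fakeApp` type and the `startAll` helper are hypothetical and exist only for illustration; in particular, stopping already-started apps when a later one fails is an assumption of this sketch, not a description of what the core does.

```go
package main

import "fmt"

// App mirrors the two-method contract described above.
type App interface {
	Start() error
	Stop() error
}

// fakeApp is a hypothetical app that records what happened to it.
type fakeApp struct {
	name   string
	events []string
}

func (a *fakeApp) Start() error { a.events = append(a.events, "start"); return nil }
func (a *fakeApp) Stop() error  { a.events = append(a.events, "stop"); return nil }

// startAll starts each app in order. If one fails, the apps that were
// already started are stopped again (an assumption of this sketch).
func startAll(apps []App) error {
	for i, app := range apps {
		if err := app.Start(); err != nil {
			for _, started := range apps[:i] {
				started.Stop()
			}
			return fmt.Errorf("starting app %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	a := &fakeApp{name: "http"}
	if err := startAll([]App{a}); err != nil {
		fmt.Println("error:", err)
		return
	}
	a.Stop()
	fmt.Println(a.name, a.events)
}
```

The point is only that the caller needs no knowledge of what an app does; anything that can be started and stopped fits.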
<aside class="tip">
If you are a programmer who is building Caddy modules, you can find analogous information in our <a href="/docs/extending-caddy">Extending Caddy</a> guide, but with more focus on code.
</aside>
## Module lifecycle
There are two kinds of modules: _host modules_ and _guest modules_.
**Host modules** (or "parent" modules) are those that load other modules.
**Guest modules** (or "child" modules) are those that get loaded. All modules are guest modules -- even app modules.
Modules get loaded, are provisioned and validated, get used, then are cleaned up, in this sequence:
1. Loaded
2. Provisioned and validated
3. Used
4. Cleaned up
Caddy kicks off the module lifecycle when a config is loaded, starting by initializing all the configured app modules. From there, it's turtles all the way down as each app module takes it the rest of the way.
### Load phase
Loading a module involves deserializing its JSON bytes into a typed value in memory. That's... basically it. It's just decoding JSON into a value.
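As a rough illustration, the load phase amounts to something like the following. The `exampleHandler` type, its fields, and the `loadModule` helper are made up for this sketch; they are not an actual Caddy module.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exampleHandler is a hypothetical module type; its fields are illustrative.
type exampleHandler struct {
	Root  string   `json:"root,omitempty"`
	Index []string `json:"index_names,omitempty"`
}

// loadModule decodes a module's raw JSON into a typed value.
// No other setup happens here; that is left to the provision phase.
func loadModule(raw []byte) (*exampleHandler, error) {
	h := new(exampleHandler)
	if err := json.Unmarshal(raw, h); err != nil {
		return nil, fmt.Errorf("decoding module config: %w", err)
	}
	return h, nil
}

func main() {
	h, err := loadModule([]byte(`{"root": "/srv", "index_names": ["index.html"]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(h.Root, h.Index)
}
```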
### Provision phase
This phase is where most of the setup work goes. All modules get a chance to provision themselves after being loaded.
Since any properties from the JSON encoding will already have been decoded, only additional setup needs to take place here. The most common task during provisioning is setting up guest modules. In other words, provisioning a host module also results in provisioning its guest modules, all the way down.
You can get a sense for this by [traversing Caddy's JSON structure in our docs](/docs/json/). Anywhere you see `{•••}` is where guest modules may be used; and as you click into one, you can continue exploring all the way down until there are no more guest modules.
Other common provisioning tasks are setting up internal values that will be used during the module's lifetime, or standardizing inputs. For example, the [http.matchers.remote_ip](/docs/modules/http.matchers.remote_ip) module uses the provisioning phase to parse CIDR values out of the string inputs it received from the JSON. That way, it doesn't have to do this during every HTTP request, and is more efficient as a result.
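The idea can be sketched as follows. This is not the code of the real matcher: the `remoteIPMatcher` type, its field names, and its methods are assumptions made for illustration. It parses CIDR strings once during provisioning so that matching only has to do cheap containment checks, and it returns an error for invalid input.

```go
package main

import (
	"fmt"
	"net"
)

// remoteIPMatcher is a hypothetical stand-in for a matcher module.
type remoteIPMatcher struct {
	Ranges []string `json:"ranges,omitempty"` // filled in by the load phase

	cidrs []*net.IPNet // filled in by provision
}

// provision parses the string inputs once, up front.
// An invalid range produces an error, which would abort the config load.
func (m *remoteIPMatcher) provision() error {
	for _, r := range m.Ranges {
		_, ipnet, err := net.ParseCIDR(r)
		if err != nil {
			return fmt.Errorf("parsing CIDR %q: %w", r, err)
		}
		m.cidrs = append(m.cidrs, ipnet)
	}
	return nil
}

// match runs on the hot path and only uses the pre-parsed values.
func (m *remoteIPMatcher) match(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false
	}
	for _, c := range m.cidrs {
		if c.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	m := &remoteIPMatcher{Ranges: []string{"192.168.0.0/16"}}
	if err := m.provision(); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(m.match("192.168.1.5"), m.match("10.0.0.1"))
}
```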
Validation also can take place in the provision phase. If a module's resulting config is invalid, an error can be returned here which aborts the entire config load process.
### Use phase
Once a guest module is provisioned and validated, it can be used by its host module. What exactly this means is up to each host module.
Each module has an ID, which consists of a namespace and a name in that namespace. For example, [`http.handlers.reverse_proxy`](/docs/modules/http.handlers.reverse_proxy) is an HTTP handler because it is in the `http.handlers` namespace, and its name is `reverse_proxy`. All modules in the `http.handlers` namespace satisfy the same interface, known to the host module. Thus, the `http` app knows how to load and use these kinds of modules.
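One way to picture the relationship between an ID, its namespace, and its name is to split the ID at its last dot. The `splitModuleID` helper below is a hypothetical sketch written for this page, not a function from the core library.

```go
package main

import (
	"fmt"
	"strings"
)

// splitModuleID splits a module ID at its last dot into namespace and name.
// An ID without a dot is treated as a name with an empty namespace
// (an assumption of this sketch).
func splitModuleID(id string) (namespace, name string) {
	i := strings.LastIndex(id, ".")
	if i < 0 {
		return "", id
	}
	return id[:i], id[i+1:]
}

func main() {
	ns, name := splitModuleID("http.handlers.reverse_proxy")
	fmt.Printf("namespace=%s name=%s\n", ns, name)
}
```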
### Cleanup phase
When it is time for a config to be stopped, all modules get unloaded. If a module allocated any resources that should be freed, it has an opportunity to do so in the cleanup phase.
## Plugging in
A module -- or any Caddy plugin -- gets "plugged in" to Caddy by adding an `import` for the module's package. By importing the package, [the module registers itself](https://pkg.go.dev/github.com/caddyserver/caddy/v2?tab=doc#RegisterModule) with the Caddy core, so when the Caddy process starts, it knows each module by name. It can even associate between module values and names, and vice-versa.
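The registration pattern looks roughly like this. Everything here (the `registry` map, `registerModule`, `moduleIDs`, and the `http.handlers.example` ID) is a hypothetical, single-file sketch; the real registration function is the one linked above. In a real plugin, the `init` function would live in the plugin's own package and run as a side effect of the `import`.

```go
package main

import (
	"fmt"
	"sort"
)

// registry maps module IDs to constructors (hypothetical, for illustration).
var registry = map[string]func() interface{}{}

// registerModule adds a module under its ID and refuses duplicates.
func registerModule(id string, ctor func() interface{}) {
	if _, dup := registry[id]; dup {
		panic("module already registered: " + id)
	}
	registry[id] = ctor
}

// exampleHandler is a made-up module type.
type exampleHandler struct{}

// init runs before main, so the module is known by the time the
// process starts doing real work.
func init() {
	registerModule("http.handlers.example", func() interface{} {
		return new(exampleHandler)
	})
}

// moduleIDs returns all registered IDs in sorted order.
func moduleIDs() []string {
	ids := make([]string, 0, len(registry))
	for id := range registry {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	fmt.Println(moduleIDs())
}
```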
<aside class="tip">
Plugins can be added without modifying the Caddy code base at all. There are instructions <a href="https://github.com/caddyserver/caddy/#with-version-information-andor-plugins">in the readme</a> for doing this!
</aside>
## Managing configuration
Changing a running server's active configuration (often called a "reload") can be tricky with the high levels of concurrency and thousands of parameters that servers require. Caddy solves this problem elegantly using a design that has many benefits:
- No interruption to running services
- Granular config changes are possible
- Only one lock required (in the background)
- All reloads are atomic, consistent, isolated, and mostly durable ("ACID")
- Minimal global state
You can [watch a video about the design of Caddy 2 here](https://www.youtube.com/watch?v=EhJO8giOqQs).
A config reload works by provisioning the new modules, and if all succeed, the old ones are cleaned up. For a brief period, two configs are operational at the same time.
Each configuration is associated with a [context](https://pkg.go.dev/github.com/caddyserver/caddy/v2?tab=doc#Context) which holds all the module state, so most state never escapes the scope of a config. This is good news for correctness, performance, and simplicity!
However, sometimes truly global state is necessary. For example, the reverse proxy may keep track of the health of its upstreams; since there is only one of each upstream globally, it would be bad if it forgot about them every time a minor config change was made. Fortunately, Caddy [provides facilities](https://pkg.go.dev/github.com/caddyserver/caddy/v2?tab=doc#UsagePool) similar to a language runtime's garbage collector to keep global state tidy.
One obvious approach to on-line config updates is to synchronize access to every single config parameter, even in hot paths. This is unbelievably bad in terms of performance and complexity&mdash;especially at scale&mdash;so Caddy does not use this approach.
Instead, configs are treated as immutable, atomic units: either the whole thing is replaced, or nothing gets changed. The [admin API endpoints](/docs/api)&mdash;which permit granular changes by traversing into the structure&mdash;mutate only an in-memory representation of the config, from which a whole new config document is generated and loaded. This approach has vast benefits in terms of simplicity, performance, and consistency. Since there is only one lock, it is easy for Caddy to process rapid reloads.
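A minimal sketch of the "replace the whole thing or change nothing" idea, assuming a single mutex around reloads: the `config` type, its `provision` and `cleanup` methods, and the `reload` function are all hypothetical, and real provisioning does far more than check for an empty string.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// config is a hypothetical stand-in for a fully loaded configuration.
type config struct {
	raw     string
	cleaned bool
}

func (c *config) provision() error {
	if c.raw == "" {
		return errors.New("config is empty")
	}
	return nil
}

func (c *config) cleanup() { c.cleaned = true }

var (
	mu      sync.Mutex // the only lock; held for the duration of a reload
	current *config
)

// reload provisions the new config first. Only if that succeeds is the
// old config swapped out and cleaned up; on failure nothing changes.
func reload(raw string) error {
	mu.Lock()
	defer mu.Unlock()

	next := &config{raw: raw}
	if err := next.provision(); err != nil {
		return fmt.Errorf("new config rejected, old config still active: %w", err)
	}
	old := current
	current = next
	if old != nil {
		old.cleanup()
	}
	return nil
}

func main() {
	fmt.Println(reload(`{"apps": {}}`))
	fmt.Println(reload(""))
	fmt.Println(current.raw)
}
```

Because a failed reload returns before the swap, readers of `current` only ever see a config that provisioned successfully.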

View file

@ -112,6 +112,32 @@ reverse_proxy localhost:9000 localhost:9001 {
Here, `lb_policy` is a subdirective to `reverse_proxy` (it sets the load balancing policy to use between backends).
### Tokens and quotes
The Caddyfile is lexed into tokens before being parsed. Whitespace is significant in the Caddyfile, because tokens are separated by whitespace.
Often, directives expect a certain number of arguments; if a single argument has a value with whitespace, it would be lexed as two separate tokens:
```
directive abc def
```
This could be problematic and return errors or unexpected behavior.
If `abc def` is supposed to be the value of a single argument, it needs to be quoted:
```
directive "abc def"
```
Quotes can be escaped if you need to use quotes in quoted tokens, too:
```
directive "\"abc def\""
```
Inside quoted tokens, all other characters are treated literally, including spaces, tabs, and newlines.
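To see why quoting matters, here is a deliberately simplified tokenizer written in Go. It is a sketch for illustration only, not Caddy's actual lexer: the `tokenize` function is hypothetical and ignores comments, braces, and line handling. It splits on whitespace, keeps quoted text together, and lets `\"` produce a literal quote inside a quoted token.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits input into whitespace-separated tokens, treating
// double-quoted spans as part of a single token.
func tokenize(input string) []string {
	var tokens []string
	var cur strings.Builder
	inQuotes, escaped, started := false, false, false

	for _, r := range input {
		switch {
		case escaped:
			// Only \" is an escape; any other backslash stays literal.
			if r != '"' {
				cur.WriteRune('\\')
			}
			cur.WriteRune(r)
			escaped = false
		case inQuotes && r == '\\':
			escaped = true
		case r == '"':
			inQuotes = !inQuotes
			started = true
		case !inQuotes && unicode.IsSpace(r):
			if started {
				tokens = append(tokens, cur.String())
				cur.Reset()
				started = false
			}
		default:
			cur.WriteRune(r)
			started = true
		}
	}
	if escaped {
		cur.WriteRune('\\')
	}
	if started {
		tokens = append(tokens, cur.String())
	}
	return tokens
}

func main() {
	fmt.Printf("%q\n", tokenize(`directive abc def`))
	fmt.Printf("%q\n", tokenize(`directive "abc def"`))
	fmt.Printf("%q\n", tokenize(`directive "\"abc def\""`))
}
```

With the three inputs from the examples above, this sketch yields three tokens, two tokens, and two tokens respectively; in the last case the second token keeps its literal quotes.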
## Addresses
@ -120,10 +146,6 @@ An address always appears at the top of the site block, and is usually the first
These are examples of valid addresses:
<aside class="tip">
<a href="/docs/automatic-https">Automatic HTTPS</a> is enabled if your site's address contains a hostname or IP address. This behavior is purely implicit, however, so it never overrides any explicit configuration. For example, if the site's address is <code>http://example.com</code>, auto-HTTPS will not activate because the scheme is explicitly <code>http://</code>.
</aside>
- `localhost`
- `example.com`
- `:443`
@ -133,6 +155,10 @@ These are examples of valid addresses:
- `[::1]:2015`
- `example.com/foo/*`
<aside class="tip">
<a href="/docs/automatic-https">Automatic HTTPS</a> is enabled if your site's address contains a hostname or IP address. This behavior is purely implicit, however, so it never overrides any explicit configuration. For example, if the site's address is <code>http://example.com</code>, auto-HTTPS will not activate because the scheme is explicitly <code>http://</code>.
</aside>
From the address, Caddy can potentially infer the scheme, host, port, and path of your site.
If you specify a hostname, only requests with a matching Host header will be honored. In other words, if the site address is `localhost`, then Caddy will not match requests to `127.0.0.1`.

View file

@ -67,13 +67,13 @@ Because matcher tokens all work the same, the various possibilities for the matc
Many directives manipulate the HTTP handler chain. The order in which those directives are evaluated matters, so a default ordering is hard-coded into Caddy:
```
root
header
redir
rewrite
root
uri
try_files

View file

@ -138,14 +138,14 @@ Full matcher documentation can be found [in each respective matcher module's doc
### expression
⚠️ _This module is still experimental and, as such, may experience breaking changes._
```
expression <cel...>
```
By any [CEL (Common Expression Language)](https://github.com/google/cel-spec) expression that returns `true` or `false`.
⚠️ This module is still experimental and, as such, may experience breaking changes.
As a special case, Caddy [placeholders](/docs/conventions#placeholders) (or [Caddyfile shorthands](/docs/caddyfile/concepts#placeholders)) may be used in these CEL expressions, as they are preprocessed and converted to regular CEL function calls before being interpreted by the CEL environment.
Examples:
@ -167,7 +167,7 @@ file {
By files.
- `root` defines the directory in which to look for files. Default is the current working directory, or the `root` [variable](/docs/modules/http.handlers.vars) (`{http.vars.root}`) if set.
- `root` defines the directory in which to look for files. Default is the current working directory, or the `root` [variable](/docs/modules/http.handlers.vars) (`{http.vars.root}`) if set (can be set via the [`root` directive](/docs/caddyfile/directives/root)).
- `try_files` checks files in its list that match the try_policy.
- `try_policy` specifies how to choose a file. Default is `first_exist`.
- `first_exist` checks for file existence. The first file that exists is selected.
@ -175,6 +175,8 @@ By files.
- `largest_size` chooses the file with the largest size.
- `most_recent_modified` chooses the file that was most recently modified.
An empty `file` matcher will see if the requested file (verbatim from the URI, relative to the [site root](/docs/caddyfile/directives/root)) exists.
### header

View file

@ -0,0 +1,168 @@
---
title: How Logging Works
---
How Logging Works
=================
Caddy has powerful and flexible logging facilities, but they may be different than what you're used to, especially if you're coming from more archaic shared hosting or other legacy web servers.
## Overview
There are two main aspects of logging: emission and consumption.
**Emission** means to produce messages. It consists of three steps:
1. Gathering relevant information (context)
2. Building a useful representation (encoding)
3. Sending that representation to an output (writing)
This functionality is baked into the core of Caddy, enabling any part of the Caddy code base or that of modules (plugins) to emit logs.
**Consumption** is the intake &amp; processing of messages. In order to be useful, emitted logs must be consumed. Logs that are merely written but never read provide no value. Consuming logs can be as simple as an administrator reading console output, or as advanced as attaching a log aggregation tool or cloud service to filter, count, and index log messages.
### Caddy's role
_Caddy is a log emitter_. It does not consume logs, except for the minimum processing required to encode and write logs. This is important because it keeps Caddy's core simpler, leading to fewer bugs and edge cases, while reducing maintenance burden. Ultimately, log processing is out of the scope of Caddy core.
However, there's always the possibility for a Caddy app module that consumes logs. (It just doesn't exist yet, to our knowledge.)
## Structured logs
As with most modern applications, Caddy's logs are _structured_. This means that the information in a message is not simply an opaque string or byte slice. Instead, data remains strongly typed and is keyed by individual _field names_ until it is time to encode the message and write it out.
Compare traditional unstructured logs&mdash;like the archaic Common Log Format (CLF)&mdash;commonly used with traditional HTTP servers:
```
127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.1" 200 2326
```
This format "has structure" but is not "structured": it can only be used to log HTTP requests. There is no (efficient) way to encode it differently, because it is an opaque string of bytes. It is also missing a lot of information. It does not even include the Host header of the request! This log format is only useful when hosting a single site, and for getting the most basic of information about requests.
<aside class="tip">
The lack of host information in CLF is why these logs usually need to be written to separate files when hosting more than one site: there is no way to know the Host header from the request otherwise!
</aside>
Now compare an equivalent structured log message from Caddy, encoded as JSON and formatted nicely for display:
```json
{
"level": "info",
"ts": 1585597114.7687502,
"logger": "http.log.access",
"msg": "handled request",
"request": {
"method": "GET",
"uri": "/",
"proto": "HTTP/2.0",
"remote_addr": "127.0.0.1:50876",
"host": "example.com",
"headers": {
"User-Agent": [
"curl/7.64.1"
],
"Accept": [
"*/*"
]
},
"tls": {
"resumed": false,
"version": 771,
"ciphersuite": 49196,
"proto": "h2",
"proto_mutual": true,
"server_name": "example.com"
}
},
"latency": 0.000014711,
"size": 2326,
"status": 200,
"resp_headers": {
"Server": [
"Caddy"
],
"Content-Type": ["text/html"]
}
}
```
<aside class="tip">
In actual access logs emitted from Caddy, another field called "common_log" is also present. The purpose of this field is just to help people transition from legacy systems to more modern ones.
</aside>
You can see how the structured log is much more useful and contains much more information. The abundance of information in this log message is not only useful, but it comes at virtually no performance overhead: Caddy's logs are zero-allocation. Structured logs have no restrictions on data types or context: they can be used in any code path and include any kind of information.
Because the logs are structured and strongly-typed, they can be encoded into any format. So if you don't want to work with JSON, logs can be encoded into any other representation. Caddy supports [logfmt](https://brandur.org/logfmt) and others through [log encoder modules](/docs/json/logging/logs/encoder/), and even more can be added.
**Most importantly** in the distinction between structured logs and legacy formats, a structured log can be encoded as Common Log Format (or anything else!), but not the other way around. It is non-trivial (or at least inefficient) to go from CLF to structured formats, and impossible considering the lack of information.
In essence, efficient, structured logging generally promotes these philosophies:
- Too many logs are better than too few
- Filtering is better than discarding
- Defer encoding for greater flexibility and interoperability
## Emission
In code, a log emission resembles the following:
```go
logger.Debug("proxy roundtrip",
zap.String("upstream", di.Upstream.String()),
zap.Object("request", caddyhttp.LoggableHTTPRequest{Request: req}),
zap.Object("headers", caddyhttp.LoggableHTTPHeader(res.Header)),
zap.Duration("duration", duration),
zap.Int("status", res.StatusCode),
)
```
<aside class="tip">
This is an actual line of code from Caddy's reverse proxy. This line is what allows you to inspect requests to configured upstreams when you have debug logging enabled. It is an invaluable piece of data when troubleshooting!
</aside>
You can see that this one function call contains the log level, a message, and several fields of data. All these are strongly-typed, and Caddy uses a zero-allocation logging library so log emissions are quick and efficient with almost no overhead.
The `logger` variable is a `zap.Logger` that may have any amount of context associated with it, which includes both a name and fields of data. This allows loggers to "inherit" from parent contexts quite nicely, enabling advanced tracing and metrics.
From there, the message is sent through a highly efficient processing pipeline where it is encoded and written.
## Logging pipeline
As you saw above, messages are emitted by **loggers**. The messages are then sent to **logs** for processing.
Caddy lets you [configure multiple logs](/docs/json/logging/logs/) which can process messages. A log consists of an encoder, writer, minimum level, sampling ratio, and a list of loggers to include or exclude. In Caddy, there is always a default log named `default`. You can customize it by specifying a log keyed as `"default"` in [this object](/docs/json/logging/logs/) in the config.
<aside class="tip">
Now would be a good time to <a href="/docs/json/logging/">explore Caddy's logging docs</a> so you can become familiar with the structure and parameters we're talking about.
</aside>
- **Encoder:** The format for the log. Transforms the in-memory data representation into a byte slice. Encoders have access to all fields of a log message.
- **Writer:** The log output. Can be any log writer module, such as one that writes to a file or a network socket. It simply writes bytes.
- **Level:** Logs have various levels, from DEBUG to FATAL. Messages lower than the specified level will be ignored by the log.
- **Sampling:** Extremely hot paths may emit more logs than can be processed effectively; enabling sampling is a way to reduce the load while still yielding a representative sample of messages.
- **Include/exclude:** Each message is emitted by a logger, which has a name (usually derived from the module ID). Logs can include or exclude messages from certain loggers.
When a log message is emitted from Caddy:
- The originating logger's name is checked against each log's include/exclude list; if included (or not excluded), it is admitted into that log.
- If sampling is enabled, a quick calculation determines whether to keep the log message.
- The message is encoded using the log's configured encoder.
- The encoded bytes are then written to the log's configured writer.
By default, all messages go to all configured logs. This adheres to the values of structured logging described above. You can limit which messages go to which logs by setting their include/exclude lists, but this is mostly for filtering messages from different modules; it is not intended to be used like a log aggregation service. To keep Caddy's logging pipeline streamlined and efficient, advanced processing of log messages is deferred to consumption.
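The flow described above can be sketched in a few dozen lines of Go. This is a simplified model, not Caddy's implementation: the `entry` and `logSink` types, the prefix-based include/exclude rule, and the helper names are assumptions for illustration, and sampling is left out entirely.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

const (
	levelDebug = iota
	levelInfo
	levelError
)

var levelNames = []string{"debug", "info", "error"}

// entry is a hypothetical in-memory log message.
type entry struct {
	Level  int
	Logger string
	Msg    string
}

// logSink models one configured log: a filter plus a writer.
type logSink struct {
	minLevel int
	include  []string // logger name prefixes; empty means "everything"
	exclude  []string
	w        io.Writer
}

// inNamespace reports whether name equals a prefix or is nested under it.
func inNamespace(name string, prefixes []string) bool {
	for _, p := range prefixes {
		if name == p || strings.HasPrefix(name, p+".") {
			return true
		}
	}
	return false
}

// admits applies the level and include/exclude checks.
func (l *logSink) admits(e entry) bool {
	if e.Level < l.minLevel || inNamespace(e.Logger, l.exclude) {
		return false
	}
	return len(l.include) == 0 || inNamespace(e.Logger, l.include)
}

// emit offers the entry to every log; each admitting log encodes it
// (as JSON here) and writes the bytes to its writer.
func emit(logs []*logSink, e entry) error {
	for _, l := range logs {
		if !l.admits(e) {
			continue
		}
		b, err := json.Marshal(map[string]interface{}{
			"level": levelNames[e.Level], "logger": e.Logger, "msg": e.Msg,
		})
		if err != nil {
			return err
		}
		if _, err := l.w.Write(append(b, '\n')); err != nil {
			return err
		}
	}
	return nil
}

// newBufferLog builds a log that writes to an in-memory buffer.
func newBufferLog(minLevel int, include, exclude []string) (*logSink, *bytes.Buffer) {
	buf := new(bytes.Buffer)
	return &logSink{minLevel: minLevel, include: include, exclude: exclude, w: buf}, buf
}

func main() {
	access, accessOut := newBufferLog(levelInfo, []string{"http.log.access"}, nil)
	rest, restOut := newBufferLog(levelInfo, nil, []string{"http.log.access"})
	logs := []*logSink{access, rest}

	emit(logs, entry{levelInfo, "http.log.access", "handled request"})
	emit(logs, entry{levelInfo, "tls", "certificate renewed"})
	fmt.Print("access log:\n", accessOut.String(), "other log:\n", restOut.String())
}
```

Here one log receives only access messages and the other receives everything else, which is the kind of per-module filtering the include/exclude lists are meant for.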
## Consumption
After messages are sent to an output, a consumer will read them in, parse them, and handle them accordingly.
This is a very different problem domain from emitting logs, and the core of Caddy does not handle consumption (although a Caddy app module certainly could). There are numerous tools you can use for processing streams of JSON messages (or other formats) and viewing, filtering, indexing, and querying logs. You could even write or implement your own.
For example, if you run legacy software that requires CLF separated into different files based on a particular field (e.g. hostname), you could use or write a simple tool that reads in the JSON, calls `sprintf()` to create a CLF string, then writes it to a file based on the value in the `request.host` field.
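Such a tool could start from something like the sketch below, written in Go. The `accessEntry` struct covers only the fields needed for CLF, taken from the example log message earlier on this page; the `toCLF` function, rendering the timestamp in UTC, and leaving the actual file writing to the caller are choices made for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"time"
)

// accessEntry holds the subset of access log fields that CLF needs.
type accessEntry struct {
	TS      float64 `json:"ts"`
	Request struct {
		Method     string `json:"method"`
		URI        string `json:"uri"`
		Proto      string `json:"proto"`
		RemoteAddr string `json:"remote_addr"`
		Host       string `json:"host"`
	} `json:"request"`
	Size   int `json:"size"`
	Status int `json:"status"`
}

// toCLF converts one JSON log line to a CLF string and also returns
// the request host, which a caller could use to pick an output file.
func toCLF(line []byte) (host, clf string, err error) {
	var e accessEntry
	if err = json.Unmarshal(line, &e); err != nil {
		return "", "", err
	}
	// remote_addr is "ip:port" in the example; fall back to the raw value.
	ip, _, splitErr := net.SplitHostPort(e.Request.RemoteAddr)
	if splitErr != nil {
		ip = e.Request.RemoteAddr
	}
	// Sub-second precision is dropped; the time is rendered in UTC.
	ts := time.Unix(int64(e.TS), 0).UTC().Format("02/Jan/2006:15:04:05 -0700")
	clf = fmt.Sprintf(`%s - - [%s] "%s %s %s" %d %d`,
		ip, ts, e.Request.Method, e.Request.URI, e.Request.Proto, e.Status, e.Size)
	return e.Request.Host, clf, nil
}

func main() {
	line := []byte(`{"ts":1585597114.7687502,"request":{"method":"GET","uri":"/",` +
		`"proto":"HTTP/2.0","remote_addr":"127.0.0.1:50876","host":"example.com"},` +
		`"size":2326,"status":200}`)
	host, clf, err := toCLF(line)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s: %s\n", host, clf)
}
```

Going the other direction, from a CLF line back to this JSON, would not be possible, since CLF never recorded the host in the first place.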
Caddy's logging facilities can be used to implement metrics and tracing as well: metrics basically count messages with certain characteristics, and tracing links multiple messages together based on commonalities between them.
There are countless possibilities for what you can do by consuming Caddy's logs!

View file

@ -2,7 +2,7 @@
title: Caddyfile Quick-start
---
# Caddyfile quick-start
# Caddyfile Quick-start
Create a new text file named `Caddyfile` (no extension).
@ -24,11 +24,12 @@ Save this and run Caddy from the same folder that contains your Caddyfile:
<pre><code class="cmd bash">caddy start</code></pre>
You will probably be asked for your password, because Caddy serves all sites -- even local ones -- over HTTPS by default. (The password prompt should only happen the first time!)
<aside class="tip">
For local HTTPS, Caddy automatically generates certificates and unique private keys for you. The root certificate is added to your system's trust store, which is why the password prompt is necessary. It allows you to develop locally over HTTPS without certificate errors. Just don't share your root key!
For local HTTPS, Caddy automatically generates certificates and unique private keys for you. The root certificate is added to your system's trust store, which is why the password prompt is necessary. It allows you to develop locally over HTTPS without certificate errors.
</aside>
You will probably be asked for your password, because Caddy will serve all sites -- even local ones -- over HTTPS. (The password prompt should only happen the first time!)
Either open your browser to [localhost](http://localhost) or `curl` it:

View file

@ -1,8 +1,8 @@
---
title: File server quick-start
title: Static files quick-start
---
# File server quick-start
# Static files quick-start
This guide will show you how to get a production-ready static file server up and running quickly.
@ -21,7 +21,7 @@ In your terminal, change to the root directory of your site and run:
<pre><code class="cmd bash">caddy file-server</code></pre>
If you get a permission error, it probably means your OS does not allow you to bind to low ports -- so use a high port instead:
If you get a permissions error, it probably means your OS does not allow you to bind to low ports -- so use a high port instead:
<pre><code class="cmd bash">caddy file-server --listen :2015</code></pre>
@ -53,7 +53,7 @@ Then, from the same directory, run:
<pre><code class="cmd bash">caddy run</code></pre>
You can then load [localhost](https://localhost) (or whatever the address in your config is) to see your site!
You can then load [localhost](https://localhost) (or whatever the address is in your config) to see your site!
The [`file_server` directive](/docs/caddyfile/directives/file_server) has more options for you to customize your site. Make sure to [reload](/docs/command-line#caddy-reload) Caddy (or stop and start it again) when you change the Caddyfile!

View file

@ -67,7 +67,7 @@ This guide won't delve into the new features available -- which are really cool,
Caddy's default port is no longer `:2015`. Caddy 2's default port is `:443` or, if no hostname/IP is known, port `:80`. You can always customize the ports in your config.
Caddy 2's default protocol is [_always_ HTTPS if a hostname or IP is known](/docs/automatic-https#tldr). This is different from Caddy 1, where only public-looking domains used HTTPS by default. Now, _every_ site uses HTTPS (unless you disable it by explicitly specifying port `:80` or `http://`).
Caddy 2's default protocol is [_always_ HTTPS if a hostname or IP is known](/docs/automatic-https#overview). This is different from Caddy 1, where only public-looking domains used HTTPS by default. Now, _every_ site uses HTTPS (unless you disable it by explicitly specifying port `:80` or `http://`).
IP addresses and localhost domains will be issued certificates from a [locally-trusted, embedded CA](/docs/automatic-https#local-https). All other domains will use Let's Encrypt. (This is all configurable.)
@ -136,11 +136,13 @@ Implied file extensions can be done with [`try_files`](/docs/caddyfile/directive
Assuming you're serving PHP, the v2 equivalent is [`php_fastcgi`](/docs/caddyfile/directives/php_fastcgi).
- **v1:** `fastcgi / localhost:9005`
- **v1:** `fastcgi / localhost:9005 php`
- **v2:** `php_fastcgi localhost:9005`
Note that the `fastcgi` directive from v1 did a lot under the hood, including trying files on disk, rewriting requests, and even redirecting. The v2 `php_fastcgi` directive also does these things for you, but the docs give its [expanded form](/docs/caddyfile/directives/php_fastcgi#expanded-form) that you can modify if your requirements are different.
There is no `php` preset needed in v2, since the `php_fastcgi` directive assumes PHP by default. A line such as `php_fastcgi 127.0.0.1:9000 php` will cause the reverse proxy to think that there is a second backend called `php`, leading to connection errors.
### gzip

View file

@ -10,7 +10,7 @@
<ul>
<li><a href="/docs/quick-starts/api">Using the API</a></li>
<li><a href="/docs/quick-starts/caddyfile">Using a Caddyfile</a></li>
<li><a href="/docs/quick-starts/file-server">File server</a></li>
<li><a href="/docs/quick-starts/static-files">Static files</a></li>
<li><a href="/docs/quick-starts/reverse-proxy">Reverse proxy</a></li>
<li><a href="/docs/quick-starts/https">HTTPS</a></li>
</ul>
@ -39,5 +39,7 @@
<li><a href="/docs/conventions">Conventions</a></li>
<li><a href="/docs/config-adapters">Config Adapters</a></li>
<li><a href="/docs/extending-caddy">Extending Caddy</a></li>
<li><a href="/docs/logging">How Logging Works</a></li>
<li><a href="/docs/architecture">Caddy Architecture</a></li>
</ul>
</nav>