---
title: Profiling Caddy
---

Profiling Caddy
================

A **program profile** is a snapshot of a program's use of resources at runtime. Profiles can be extremely helpful to identify problem areas, troubleshoot bugs and crashes, and optimize code.

Caddy uses Go's tooling for capturing profiles, which is called [pprof](https://github.com/google/pprof), and it is built into the `go` command.

Profiles report on consumers of CPU and memory, show stack traces of goroutines, and help track down deadlocks or high-contention synchronization primitives.

When reporting certain bugs in Caddy, we may ask for a profile. This article can help. It describes both how to obtain profiles with Caddy, and how to use and interpret the resulting pprof profiles in general.

Two things to know before getting started:

1. **Caddy profiles are NOT security-sensitive.** They contain benign technical readouts, not the contents of memory. They do not give attackers access to your system. They are safe to share.
2. **Profiles are lightweight and can be collected in production.** In fact, this is a recommended best practice for many users; see later in this article.

## Obtaining profiles

Profiles are available via the [admin interface](/docs/api) at `/debug/pprof/`. On a machine running Caddy, open it in your browser:

```
http://localhost:2019/debug/pprof/
```

You'll notice a simple table of counts and links, such as:

Count | Profile
----- | --------------------
79    | allocs
0     | block
0     | cmdline
22    | goroutine
79    | heap
0     | mutex
0     | profile
29    | threadcreate
0     | trace
      | full goroutine stack dump

The counts are a handy way to quickly identify leaks. If you suspect a leak, refresh the page repeatedly and you'll see one or more of those counts constantly increasing. If the heap count grows, it's a possible memory leak; if the goroutine count grows, it's a possible goroutine leak.

Click through the profiles and see what they look like. Some may be empty, and that's normal a lot of the time. The most commonly-used ones are goroutine (function stacks), heap (memory), and profile (CPU). Other profiles are useful for troubleshooting mutex contention or deadlocks.

At the bottom, there's a simple description of each profile:

- **allocs:** A sampling of all past memory allocations
- **block:** Stack traces that led to blocking on synchronization primitives
- **cmdline:** The command line invocation of the current program
- **goroutine:** Stack traces of all current goroutines. Use `debug=2` as a query parameter to export in the same format as an unrecovered panic.
- **heap:** A sampling of memory allocations of live objects. You can specify the `gc` GET parameter to run GC before taking the heap sample.
- **mutex:** Stack traces of holders of contended mutexes
- **profile:** CPU profile. You can specify the duration in the `seconds` GET parameter. After you get the profile file, use the `go tool pprof` command to investigate the profile.
- **threadcreate:** Stack traces that led to the creation of new OS threads
- **trace:** A trace of execution of the current program. You can specify the duration in the `seconds` GET parameter. After you get the trace file, use the `go tool trace` command to investigate the trace.

### Downloading profiles

Clicking the links on the pprof index page above will give you profiles in text format. This is useful for debugging, and it's what we on the Caddy team prefer because we can scan it to look for obvious clues without needing extra tooling. But binary is actually the default format.
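For example, you could download the heap profile with curl, either in the default binary form (suitable for `go tool pprof`) or as plaintext via the `debug` parameter described next; the output filenames here are arbitrary:

```
# binary (default), for use with `go tool pprof`
curl -o heap.pprof "http://localhost:2019/debug/pprof/heap"

# plaintext, human-readable
curl -o heap.txt "http://localhost:2019/debug/pprof/heap?debug=1"
```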
The HTML links append the `?debug=` query string parameter to format them as text, except for the (CPU) "profile" link, which does not have a textual representation.

These are the query string parameters you can set (from [the Go docs](https://pkg.go.dev/net/http/pprof#hdr-Parameters)):

- **`debug=N` (all profiles except cpu):** response format: N = 0: binary (default), N > 0: plaintext
- **`gc=N` (heap profile):** N > 0: run a garbage collection cycle before profiling
- **`seconds=N` (allocs, block, goroutine, heap, mutex, threadcreate profiles):** return a delta profile
- **`seconds=N` (cpu, trace profiles):** profile for the given duration

Because these are HTTP endpoints, you can also use any HTTP client like curl or wget to download profiles.

Once your profiles are downloaded, you can upload them to a GitHub issue comment or use a site like [pprof.me](https://pprof.me/). For CPU profiles specifically, [flamegraph.com](https://flamegraph.com/) is another option.

## Accessing remotely

_If you're already able to access the admin API locally, skip this section._

By default, Caddy's admin API is only accessible over the loopback socket. However, there are at least 3 ways you can access Caddy's `/debug/pprof` endpoints remotely:

### Reverse proxy through your site

One easy option is to simply reverse proxy to it from your site:

```caddy-d
reverse_proxy /debug/pprof/* localhost:2019 {
	header_up Host {upstream_hostport}
}
```

This will, of course, make profiles available to anyone who can connect to your site. If that's not desired, you can add some authentication using an HTTP auth module of your choice.

(Don't forget the `/debug/pprof/*` matcher, otherwise you'll proxy the entire admin API!)

### SSH tunnel

Another way is to use an SSH tunnel. This is an encrypted connection using the SSH protocol between your computer and your server. Run a command like this on your computer:
```
ssh -N username@example.com -L 8123:localhost:2019
```
This forwards `localhost:8123` on your computer to `localhost:2019` on `example.com`. Make sure to replace `username`, `example.com`, and the ports as necessary.
Then in another terminal you can run `curl` like so:
```
curl -v http://localhost:8123/debug/pprof/ -H "Host: localhost:2019"
```
You can avoid the need for `-H "Host: ..."` by using port 2019 on both sides of the tunnel (but this requires that port 2019 is not already taken on your own computer).
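Through the same tunnel you could, for example, save a 30-second CPU profile to a local file (the duration and output filename here are only illustrative):

```
curl -o profile "http://localhost:8123/debug/pprof/profile?seconds=30" -H "Host: localhost:2019"
```

The resulting file can then be opened with `go tool pprof`, as shown later in this article.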
While the tunnel is active, you can access any and all of the admin API. Type Ctrl+C on the `ssh` command to close the tunnel.
### Remote admin API
You can also configure the admin API to accept remote connections to authorized clients.
(TODO: Write article about this.)
## Goroutine profiles
The goroutine dump is useful for knowing what goroutines exist and what their call stacks are. In other words, it gives us an idea of code that is either currently executing or is blocking/waiting.
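You can also capture the same data from the command line; for example, with curl against the default local admin address (`debug=2` gives the full, panic-style dump described earlier):

```
# grouped stacks with counts
curl "http://localhost:2019/debug/pprof/goroutine?debug=1"

# full dump of every goroutine, in the same format as an unrecovered panic
curl "http://localhost:2019/debug/pprof/goroutine?debug=2"
```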
If you click "goroutines" or go to `/debug/pprof/goroutine?debug=1`, you'll see a list of goroutines and their call stacks. For example:
```
goroutine profile: total 88
23 @ 0x43e50e 0x436d37 0x46bda5 0x4e1327 0x4e261a 0x4e2608 0x545a65 0x5590c5 0x6b2e9b 0x50ddb8 0x6b307e 0x6b0650 0x6b6918 0x6b6921 0x4b8570 0xb11a05 0xb119d4 0xb12145 0xb1d087 0x4719c1
# 0x46bda4 internal/poll.runtime_pollWait+0x84 runtime/netpoll.go:343
# 0x4e1326 internal/poll.(*pollDesc).wait+0x26 internal/poll/fd_poll_runtime.go:84
# 0x4e2619 internal/poll.(*pollDesc).waitRead+0x279 internal/poll/fd_poll_runtime.go:89
# 0x4e2607 internal/poll.(*FD).Read+0x267 internal/poll/fd_unix.go:164
# 0x545a64 net.(*netFD).Read+0x24 net/fd_posix.go:55
# 0x5590c4 net.(*conn).Read+0x44 net/net.go:179
# 0x6b2e9a crypto/tls.(*atLeastReader).Read+0x3a crypto/tls/conn.go:805
# 0x50ddb7 bytes.(*Buffer).ReadFrom+0x97 bytes/buffer.go:211
# 0x6b307d crypto/tls.(*Conn).readFromUntil+0xdd crypto/tls/conn.go:827
# 0x6b064f crypto/tls.(*Conn).readRecordOrCCS+0x24f crypto/tls/conn.go:625
# 0x6b6917 crypto/tls.(*Conn).readRecord+0x157 crypto/tls/conn.go:587
# 0x6b6920 crypto/tls.(*Conn).Read+0x160 crypto/tls/conn.go:1369
# 0x4b856f io.ReadAtLeast+0x8f io/io.go:335
# 0xb11a04 io.ReadFull+0x64 io/io.go:354
# 0xb119d3 golang.org/x/net/http2.readFrameHeader+0x33 golang.org/x/net@v0.14.0/http2/frame.go:237
# 0xb12144 golang.org/x/net/http2.(*Framer).ReadFrame+0x84 golang.org/x/net@v0.14.0/http2/frame.go:498
# 0xb1d086 golang.org/x/net/http2.(*serverConn).readFrames+0x86 golang.org/x/net@v0.14.0/http2/server.go:818
1 @ 0x43e50e 0x44e286 0xafeeb3 0xb0af86 0x5c29fc 0x5c3225 0xb0365b 0xb03650 0x15cb6af 0x43e09b 0x4719c1
# 0xafeeb2 github.com/caddyserver/caddy/v2/cmd.cmdRun+0xcd2 github.com/caddyserver/caddy/v2@v2.7.4/cmd/commandfuncs.go:277
# 0xb0af85 github.com/caddyserver/caddy/v2/cmd.init.1.func2.WrapCommandFuncForCobra.func1+0x25 github.com/caddyserver/caddy/v2@v2.7.4/cmd/cobra.go:126
# 0x5c29fb github.com/spf13/cobra.(*Command).execute+0x87b github.com/spf13/cobra@v1.7.0/command.go:940
# 0x5c3224 github.com/spf13/cobra.(*Command).ExecuteC+0x3a4 github.com/spf13/cobra@v1.7.0/command.go:1068
# 0xb0365a github.com/spf13/cobra.(*Command).Execute+0x5a github.com/spf13/cobra@v1.7.0/command.go:992
# 0xb0364f github.com/caddyserver/caddy/v2/cmd.Main+0x4f github.com/caddyserver/caddy/v2@v2.7.4/cmd/main.go:65
# 0x15cb6ae main.main+0xe caddy/main.go:11
# 0x43e09a runtime.main+0x2ba runtime/proc.go:267
1 @ 0x43e50e 0x44e9c5 0x8ec085 0x4719c1
# 0x8ec084 github.com/caddyserver/certmagic.(*Cache).maintainAssets+0x304 github.com/caddyserver/certmagic@v0.19.2/maintain.go:67
...
```
The first line, `goroutine profile: total 88`, tells us what we're looking at and how many goroutines there are.
The list of goroutines follows. They are grouped by their call stacks in descending order of frequency.
A goroutine line has the syntax `N @ address1 address2 ...`, where `N` is how many goroutines share the call stack given by those addresses; the resolved function names and source locations for each frame follow on the `#` lines beneath it.

## CPU profiles

CPU profiles can't be viewed as plaintext in the browser; download one (the "profile" link, or `/debug/pprof/profile?seconds=N`), then open the resulting file with `go tool pprof`:

```
$ go tool pprof profile
Type: cpu
Time: Nov 4, 2023 at 6:32pm (MDT)
Duration: 5.12s, Total samples = 92.30s (1801.83%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof)
```
This is something you can explore. Entering `help` gives you a list of commands and `o` will show you current options.
There are a lot of commands, but some common ones are:
- `top`: Show what used the most CPU. You can append a number like `top 20` to see more.
- `web`: Open the call graph in your web browser. This is a great way to visually see CPU usage.
- `svg`: Generate an SVG image of the call graph. It's the same as `web` except it doesn't open your web browser and the SVG is saved locally.
- `tree`: More of a tabular view of the call stack.
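You don't have to use the interactive prompt at all: the same reports are available through standard pprof flags. For instance, `-top` prints the summary that the interactive `top` command produces, and `-http` serves an interactive web UI (with graph and flame graph views) on an address of your choosing; the port below is arbitrary:

```
# print the top consumers and exit
go tool pprof -top profile

# open an interactive web UI at http://localhost:8081
go tool pprof -http=:8081 profile
```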
If we run `top`, we see output like:
```
Showing nodes accounting for 120ms, 85.71% of 140ms total
Showing top 10 nodes out of 124
flat flat% sum% cum cum%
30ms 21.43% 21.43% 30ms 21.43% runtime.madvise
10ms 7.14% 28.57% 70ms 50.00% github.com/caddyserver/caddy/v2/modules/caddyhttp/rewrite.Rewrite.ServeHTTP
10ms 7.14% 35.71% 20ms 14.29% github.com/dlclark/regexp2.(*Regexp).run
10ms 7.14% 42.86% 10ms 7.14% runtime.(*bmap).setoverflow (inline)
10ms 7.14% 50.00% 10ms 7.14% runtime.memclrNoHeapPointers
10ms 7.14% 57.14% 10ms 7.14% runtime.pthread_cond_wait
10ms 7.14% 64.29% 10ms 7.14% runtime.pthread_kill
10ms 7.14% 71.43% 10ms 7.14% runtime.scanobject
10ms 7.14% 78.57% 10ms 7.14% runtime.step
10ms 7.14% 85.71% 10ms 7.14% runtime.walltime
```
Interesting! Ignoring the `madvise` system call by the runtime, we can instantly see that the most time was spent in the `rewrite` handler.
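To dig deeper into a specific entry, most pprof commands accept a function-name regex. For example, `web` can focus the graph on matching nodes, and `list` (when pprof can find the matching source on your machine) shows line-by-line samples; the name used here is just taken from the output above:

```
(pprof) web Rewrite
(pprof) list Rewrite.ServeHTTP
```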