feat(registry): add support for custom CA certificates and TLS validation

- Introduced `--registry-ca` and `--registry-ca-validate` flags for configuring TLS verification with private registries.
- Implemented in-memory token caching with expiration handling.
- Updated documentation to reflect new CLI options and usage examples.
- Added tests for token cache concurrency and expiry behavior.
kalvinparker 2025-11-14 14:30:37 +00:00
parent 76f9cea516
commit e1f67fc3d0
18 changed files with 738 additions and 17 deletions

@ -0,0 +1,29 @@
# Summary Checkpoint
This file marks a checkpoint for summarizing repository changes.
All future requests that ask to "summarise all the changes thus far" should consider
only changes made after this checkpoint was created.
Checkpoint timestamp (UTC): 2025-11-13T12:00:00Z
Notes:
- Purpose: act as a stable anchor so that subsequent "summarise all the changes thus far"
requests will include only modifications after this point.
- Location: `docs/SUMMARY_CHECKPOINT.md`
Recent delta (since previous checkpoint):
- Added CLI flags and wiring: `--registry-ca` and `--registry-ca-validate` (startup validation).
- Implemented secure-by-default registry transport behavior and support for a custom CA bundle.
- Introduced an in-memory bearer token cache (honors `expires_in`) and refactored time usage
to allow deterministic tests via an injectable `now` function.
- Added deterministic unit tests for the token cache (`pkg/registry/auth/auth_cache_test.go`).
- Added quickstart documentation snippets to `README.md`, `docs/index.md`, and
`docs/private-registries.md` showing `--registry-ca` + `--registry-ca-validate`.
- Created `CHANGELOG.md` with an Unreleased entry for the new `--registry-ca-validate` flag.
- Ran package tests locally for `pkg/registry/auth` and `pkg/registry/digest`; tests passed
(some integration tests were skipped due to missing credentials).
If you want the next checkpoint after more changes (e.g., mapping the update call chain,
documenting data shapes, or adding concurrency tests), request another summary break.

@ -460,8 +460,34 @@ Alias for:
--notification-report
--notification-template porcelain.VERSION.summary-no-log
Argument: --porcelain, -P
Environment Variable: WATCHTOWER_PORCELAIN
Possible values: v1
Default: -
```
## Registry TLS options
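Options to configure TLS verification when Watchtower talks to image registries (digest `HEAD` requests and token requests). Verification is on by default. `--registry-ca` adds a PEM-encoded CA bundle (a path inside the container) to the system roots, `--registry-ca-validate` makes Watchtower fail at startup if that bundle is missing or invalid, and `--insecure-registry` disables certificate verification entirely and should only be used for testing.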
```text
Argument: --insecure-registry
Environment Variable: WATCHTOWER_INSECURE_REGISTRY
Type: Boolean
Default: false
```
```text
Argument: --registry-ca
Environment Variable: WATCHTOWER_REGISTRY_CA
Type: String (path to PEM bundle inside container)
Default: -
```
```text
Argument: --registry-ca-validate
Environment Variable: WATCHTOWER_REGISTRY_CA_VALIDATE
Type: Boolean
Default: false
```

@ -0,0 +1,46 @@
@startuml
title Watchtower Update Flow
actor User as CLI
participant "cmd (root)" as CMD
participant "internal/actions.Update" as ACT
participant "container.Client" as CLIENT
participant "pkg/registry/digest" as DIG
participant "pkg/registry/auth" as AUTH
participant "pkg/registry" as REG
database "Docker Engine" as DOCKER
CLI -> CMD: trigger runUpdatesWithNotifications()
CMD -> ACT: Update(client, UpdateParams)
ACT -> CLIENT: ListContainers(filter)
loop per container
ACT -> CLIENT: IsContainerStale(container, params)
CLIENT -> CLIENT: PullImage (maybe)
CLIENT -> DIG: CompareDigest(container, registryAuth)
DIG -> AUTH: GetToken(challenge)
AUTH -> AUTH: getCachedToken / storeToken
DIG -> REG: newTransport() (uses --insecure-registry / --registry-ca)
DIG -> RemoteRegistry: HEAD manifest with token
alt digest matches
CLIENT --> ACT: no pull needed
else
CLIENT -> DOCKER: ImagePull(image)
end
CLIENT --> ACT: HasNewImage -> stale/newestImage
end
ACT -> ACT: SortByDependencies
ACT -> CLIENT: StopContainer / StartContainer (with lifecycle hooks)
ACT -> CLIENT: RemoveImageByID (cleanup)
ACT --> CMD: progress.Report()
note right of AUTH
Tokens are cached by auth URL (realm+service+scope)
ExpiresIn (seconds) sets TTL when provided
end note
note left of REG
TLS is secure-by-default
`--registry-ca` provides PEM bundle
`--registry-ca-validate` fails startup on invalid bundle
end note
@enduml

@ -63,3 +63,17 @@ the following command:
volumes:
- /var/run/docker.sock:/var/run/docker.sock
```
Quick note: if your registry uses a custom TLS certificate, mount the CA bundle and enable startup validation so Watchtower fails fast on misconfiguration:
```bash
docker run --detach \
--name watchtower \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /etc/ssl/private-certs:/certs \
containrrr/watchtower \
--registry-ca /certs/my-registry-ca.pem \
--registry-ca-validate=true
```
Prefer this over `--insecure-registry` for production.

@ -205,3 +205,45 @@ A few additional notes:
4. An alternative to adding the various variables is to create `~/.aws/config` and `~/.aws/credentials` files,
place the settings there, and then mount the `~/.aws` directory to / in the container.
## Token caching and required scopes
Watchtower attempts to minimize calls to registry auth endpoints by caching short-lived bearer tokens when available.
- Token cache: When Watchtower requests a bearer token from a registry auth endpoint, it will cache the token in-memory keyed by the auth realm + service + scope. If the token response includes an `expires_in` field, Watchtower will honor it and refresh the token only after expiry. This reduces load and rate-limit pressure on registry auth servers.
- Required scope: Watchtower requests tokens with the following scope format: `repository:<image-path>:pull`. This is sufficient for read-only operations required by Watchtower (HEAD or pull). For registries enforcing fine-grained scopes, ensure the provided credentials can request tokens with `pull` scope for the repositories you want to monitor.
- Credential sources: Watchtower supports these sources (in priority order):
  1. Environment variables: `REPO_USER` and `REPO_PASS`.
  2. Docker config file (`DOCKER_CONFIG` path or default location, typically `/root/.docker/config.json` when running in a container), including support for credential helpers and native stores.

When possible, prefer short-lived tokens or credential helpers, and avoid embedding long-lived plaintext credentials in environment variables.
### Providing a custom CA bundle
For private registries using certificates signed by an internal CA, provide a PEM-encoded CA bundle rather than bypassing certificate verification. Use the `--registry-ca` flag or the `WATCHTOWER_REGISTRY_CA` environment variable to point to a file inside the container containing one or more PEM-encoded certificates. Watchtower merges the provided bundle with the system roots and validates registry certificates against the combined pool.
Example (docker run):
```bash
docker run --detach \
--name watchtower \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /etc/ssl/private-certs:/certs \
--env WATCHTOWER_REGISTRY_CA=/certs/my-registry-ca.pem \
containrrr/watchtower
```
This is the recommended approach instead of `--insecure-registry` for production deployments.
#### Quick example: validate CA at startup
If you want Watchtower to fail fast when the provided CA bundle is invalid or missing, mount the CA into the container and enable validation:
```bash
docker run --detach \
--name watchtower \
--volume /var/run/docker.sock:/var/run/docker.sock \
--volume /etc/ssl/private-certs:/certs \
containrrr/watchtower \
--registry-ca /certs/my-registry-ca.pem \
--registry-ca-validate=true
```
This makes misconfiguration explicit during startup and is recommended for unattended deployments.

docs/update-flow.md

@ -0,0 +1,166 @@
<!--
DO NOT EDIT: Generated documentation describing the Watchtower update flow.
This file contains the end-to-end flow, data shapes, and a Mermaid diagram.
-->
# Watchtower Update Flow
This document explains the end-to-end update flow in the Watchtower codebase, including the main function call chain, the key data shapes, and diagrams (Mermaid & PlantUML).
## Quick Summary
- Trigger: CLI (`watchtower` start / scheduler / HTTP API update) constructs `types.UpdateParams` and calls `internal/actions.Update`.
- `internal/actions.Update` orchestrates discovery, stale detection, lifecycle hooks, stopping/restarting containers, cleanup and reporting.
- Image pull optimization uses a digest HEAD request (`pkg/registry/digest`) and a token flow (`pkg/registry/auth`) with an in-memory token cache.
- TLS for HEAD/token requests is secure-by-default and configurable via `--insecure-registry`, `--registry-ca`, and `--registry-ca-validate`.
---
## Call Chain (step-by-step)
1. CLI start / scheduler / HTTP API
   - Entry points: `main()` -> `cmd.Execute()` -> Cobra command `Run` / `PreRun`.
   - `cmd.PreRun` reads flags and config, and sets `registry.InsecureSkipVerify` and `registry.RegistryCABundle`.
2. Run update
   - `cmd.runUpdatesWithNotifications` builds `types.UpdateParams` and calls `internal/actions.Update(client, updateParams)`.
3. Orchestration: `internal/actions.Update`
   - If `params.LifecycleHooks` -> `lifecycle.ExecutePreChecks(client, params)`
   - Discover containers: `client.ListContainers(params.Filter)`
   - For each container:
     - `client.IsContainerStale(container, params)`
       - calls `client.PullImage(ctx, container)` unless `container.IsNoPull(params)` is true
         - `PullImage` obtains `types.ImagePullOptions` via `pkg/registry.GetPullOptions(image)`
         - tries the digest optimization: `pkg/registry/digest.CompareDigest(container, opts.RegistryAuth)`
           - `auth.GetToken(container, registryAuth)` obtains a token:
             - sends a GET to the challenge URL (`/v2/`) and inspects `WWW-Authenticate`
             - for `Bearer`: constructs the auth URL with `realm`, `service`, and `scope` (`repository:<path>:pull`)
             - checks the in-memory cache (`auth.getCachedToken(cacheKey)`) keyed by the auth URL
             - if missing, requests a token from the auth URL (with a Basic auth header when Docker credentials are present), parses `types.TokenResponse`, and calls `auth.storeToken(cacheKey, token, ExpiresIn)`
           - `digest.GetDigest(manifestURL, token)` performs an HTTP `HEAD` using a transport created by `digest.newTransport()`
             - the transport respects `registry.InsecureSkipVerify` and uses `registry.GetRegistryCertPool()` when a CA bundle is provided
         - if the remote digest matches a local digest, `PullImage` skips the pull
       - `client.HasNewImage(ctx, container)` compares the container's current image ID with the ID of the image now tagged with that name locally (possibly just pulled)
     - `targetContainer.VerifyConfiguration()` (fail/skip logic)
     - Mark scanned/skipped in `session.Progress` and set `container.SetStale(stale)`
   - Sort containers: `sorter.SortByDependencies(containers)`
   - `UpdateImplicitRestart(containers)` sets `LinkedToRestarting` flags
   - Build `containersToUpdate` and mark them for update in `Progress`
   - Update strategy:
     - Rolling restart: `performRollingRestart(containersToUpdate, client, params)`
       - `stopStaleContainer(c)` -> `restartStaleContainer(c)` per container
     - Normal: `stopContainersInReversedOrder(...)` -> `restartContainersInSortedOrder(...)`
     - `stopStaleContainer` runs `lifecycle.ExecutePreUpdateCommand` and `client.StopContainer`
     - `restartStaleContainer` may call `client.RenameContainer` (when the container is Watchtower itself), then `client.StartContainer` and `lifecycle.ExecutePostUpdateCommand`
   - If `params.Cleanup` -> `cleanupImages(client, imageIDs)` calls `client.RemoveImageByID`
   - If `params.LifecycleHooks` -> `lifecycle.ExecutePostChecks(client, params)`
   - Return `progress.Report()` (a `types.Report` implementation built from `session.Progress`)
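
The `Bearer` branch of step 3 (parse the `WWW-Authenticate` challenge, then build an auth URL with `realm`, `service`, and a pull-only `scope`) is sketched below in Go. `parseBearerChallenge` and `buildAuthURL` are hypothetical helpers, not functions from `pkg/registry/auth`. The parser is deliberately simplified and assumes parameter values contain no commas or escaped quotes, which a production parser would need to handle.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseBearerChallenge extracts key="value" parameters from a header such as
//   Bearer realm="https://auth.example.com/token",service="registry.example.com"
// Simplification: values must not contain commas or escaped quotes.
func parseBearerChallenge(header string) (map[string]string, error) {
	scheme, rest, ok := strings.Cut(strings.TrimSpace(header), " ")
	if !ok || !strings.EqualFold(scheme, "Bearer") {
		return nil, fmt.Errorf("not a Bearer challenge: %q", header)
	}
	params := make(map[string]string)
	for _, part := range strings.Split(rest, ",") {
		k, v, found := strings.Cut(strings.TrimSpace(part), "=")
		if !found {
			continue
		}
		params[strings.ToLower(k)] = strings.Trim(v, `"`)
	}
	if params["realm"] == "" {
		return nil, fmt.Errorf("challenge has no realm: %q", header)
	}
	return params, nil
}

// buildAuthURL appends service and a pull-only scope to the realm. The
// resulting string is deterministic, so it can double as a cache key.
func buildAuthURL(params map[string]string, imagePath string) (string, error) {
	u, err := url.Parse(params["realm"])
	if err != nil {
		return "", err
	}
	q := u.Query()
	if svc := params["service"]; svc != "" {
		q.Set("service", svc)
	}
	q.Set("scope", fmt.Sprintf("repository:%s:pull", imagePath))
	u.RawQuery = q.Encode() // Encode sorts keys
	return u.String(), nil
}
```

Because `url.Values.Encode` sorts keys, the same challenge and image always produce the same URL string, which keeps cache lookups stable.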
---
## Key data shapes
- `types.UpdateParams` (created in `cmd.runUpdatesWithNotifications`)
  - `Filter types.Filter`
  - `Cleanup bool`
  - `NoRestart bool`
  - `Timeout time.Duration`
  - `MonitorOnly bool`
  - `NoPull bool`
  - `LifecycleHooks bool`
  - `RollingRestart bool`
  - `LabelPrecedence bool`
- `container.Client` interface (in `pkg/container/client.go`), used by `actions.Update`:
  - `ListContainers(Filter) ([]types.Container, error)`
  - `GetContainer(containerID) (types.Container, error)`
  - `StopContainer(types.Container, time.Duration) error`
  - `StartContainer(types.Container) (types.ContainerID, error)`
  - `RenameContainer(types.Container, string) error`
  - `IsContainerStale(types.Container, types.UpdateParams) (bool, types.ImageID, error)`
  - `ExecuteCommand(containerID types.ContainerID, command string, timeout int) (SkipUpdate bool, err error)`
  - `RemoveImageByID(types.ImageID) error`
  - `WarnOnHeadPullFailed(types.Container) bool`
- `types.Container` interface (in `pkg/types/container.go`); methods used include:
  - `ID(), Name(), ImageName(), ImageID(), SafeImageID(), IsRunning(), IsRestarting()`
  - `VerifyConfiguration() error`, `HasImageInfo() bool`, `ImageInfo() *types.ImageInspect`
  - lifecycle hooks: `GetLifecyclePreUpdateCommand(), GetLifecyclePostUpdateCommand(), PreUpdateTimeout(), PostUpdateTimeout()`
  - flags: `IsNoPull(UpdateParams), IsMonitorOnly(UpdateParams), ToRestart(), IsWatchtower()`
- `session.Progress` and `session.ContainerStatus` (reporting)
  - `Progress` is a map `map[types.ContainerID]*ContainerStatus`
  - `ContainerStatus` fields: `containerID, containerName, imageName, oldImage, newImage, error, state`
  - `Progress.Report()` returns a `types.Report` implementation
- `types.TokenResponse` (used by `pkg/registry/auth`) contains `Token string` and `ExpiresIn int` (seconds)
---
## Diagrams
Mermaid sequence diagram (embedded):
```mermaid
sequenceDiagram
participant CLI as CLI / Scheduler / HTTP API
participant CMD as cmd
participant ACT as internal/actions.Update
participant CLIENT as container.Client (docker wrapper)
participant DIG as pkg/registry/digest
participant AUTH as pkg/registry/auth
participant REG as pkg/registry (TLS config)
participant DOCKER as Docker Engine
CLI->>CMD: trigger runUpdatesWithNotifications()
CMD->>ACT: Update(client, UpdateParams)
ACT->>CLIENT: ListContainers(filter)
loop per container
ACT->>CLIENT: IsContainerStale(container, params)
CLIENT->>CLIENT: PullImage (maybe)
CLIENT->>DIG: CompareDigest(container, registryAuth)
DIG->>AUTH: GetToken(challenge)
AUTH->>AUTH: getCachedToken / storeToken
DIG->>REG: newTransport() (uses --insecure-registry / --registry-ca)
DIG->>RemoteRegistry: HEAD manifest with token
alt digest matches
CLIENT-->>ACT: no pull needed
else
CLIENT->>DOCKER: ImagePull(image)
end
CLIENT-->>ACT: HasNewImage -> stale/newestImage
end
ACT->>ACT: SortByDependencies
ACT->>CLIENT: StopContainer / StartContainer (with lifecycle hooks)
ACT->>CLIENT: RemoveImageByID (cleanup)
ACT-->>CMD: progress.Report()
```
For reference, a PlantUML source for the same sequence is available in `docs/diagrams/update-flow.puml`.
---
## Security & operational notes
- TLS: registry HEAD and token requests are secure-by-default. Use `--registry-ca` to add private CAs, and `--registry-ca-validate` to fail fast on bad bundles. Avoid `--insecure-registry` except for testing.
- Token cache: tokens are cached per auth URL (realm+service+scope). Tokens with `ExpiresIn` are cached for that TTL. No persistent or distributed cache is provided.
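- Digest HEAD optimization avoids pulls and unnecessary rate-limit consumption when possible. For registries where a pull is expected to count against API rate limits (the heuristic covers Docker Hub and GHCR), the code uses `WarnOnAPIConsumption` to decide whether a failed HEAD request should be reported as a warning, since the fallback is a full pull.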
---
## Where to look in the code
- Orchestration: `internal/actions/update.go`
- CLI wiring: `cmd/root.go`, `internal/flags/flags.go`
- Container wrapper: `pkg/container/client.go`, `pkg/container/container.go`
- Digest & transport: `pkg/registry/digest/digest.go`
- Token & auth handling: `pkg/registry/auth/auth.go`
- TLS helpers: `pkg/registry/registry.go`
- Lifecycle hooks: `pkg/lifecycle/lifecycle.go`
- Session/reporting: `pkg/session/*`, `pkg/types/report.go`
---