## [Prometheus](https://github.com/prometheus/prometheus) and [node_exporter](https://github.com/prometheus/node_exporter)
Podcheck can export metrics to Prometheus via the textfile collector provided by node_exporter.
To enable this, pass the `-c` flag followed by the directory path that is configured for the textfile collector of node_exporter.
A simple cron job can export these metrics at a regular interval, as shown in the sample below:
```
0 1 * * * /root/podcheck.sh -n -c /var/lib/node_exporter/textfile_collector
```
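For the cron job to have any effect, node_exporter has to read the same directory, which is normally set with its `--collector.textfile.directory` flag. The following is a minimal sketch, not part of podcheck, of a pre-flight check that the directory passed to `-c` exists and is writable. The helper name `check_collector_dir` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of podcheck): verify that the textfile
# collector directory exists and is writable by the user running the cron job.
# node_exporter is assumed to be started with something like:
#   node_exporter --collector.textfile.directory=/var/lib/node_exporter/textfile_collector

check_collector_dir() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir"
    return 1
  fi
  echo "ok: $dir"
}
```

Calling `check_collector_dir /var/lib/node_exporter/textfile_collector` before installing the cron entry catches the common case where the directory was never created.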
The following metrics are exported to Prometheus:
```
# HELP podcheck_images_analyzed Podman images that have been analyzed
# TYPE podcheck_images_analyzed gauge
podcheck_images_analyzed 22
# HELP podcheck_images_outdated Podman images that are outdated
# TYPE podcheck_images_outdated gauge
podcheck_images_outdated 7
# HELP podcheck_images_latest Podman images that are up to date
# TYPE podcheck_images_latest gauge
podcheck_images_latest 14
# HELP podcheck_images_error Podman images with analysis errors
# TYPE podcheck_images_error gauge
podcheck_images_error 1
# HELP podcheck_images_analyze_timestamp_seconds Last podcheck run time
# TYPE podcheck_images_analyze_timestamp_seconds gauge
podcheck_images_analyze_timestamp_seconds 1737924029
```
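In the sample above the per-state counts add up to the analyzed total (7 + 14 + 1 = 22). Assuming that relationship holds for every run, a small sanity check on the exported file could look like the sketch below. The function names `metric_value` and `check_totals` are hypothetical, and the file passed in is whichever `.prom` file podcheck writes into the collector directory.

```shell
#!/bin/sh
# Sketch: read podcheck metrics from a textfile collector file and check
# that outdated + latest + error equals the analyzed total.

# Print the value of one metric. Comment lines (# HELP / # TYPE) never
# match because their first field is "#".
metric_value() {
  awk -v name="$2" '$1 == name { print $2 }' "$1"
}

# Return 0 when the counts are consistent, non-zero otherwise.
check_totals() {
  file="$1"
  analyzed=$(metric_value "$file" podcheck_images_analyzed)
  outdated=$(metric_value "$file" podcheck_images_outdated)
  latest=$(metric_value "$file" podcheck_images_latest)
  errors=$(metric_value "$file" podcheck_images_error)
  # A missing total is treated as a failure; missing counts count as 0.
  [ -n "$analyzed" ] &&
    [ "$analyzed" -eq $(( ${outdated:-0} + ${latest:-0} + ${errors:-0} )) ]
}
```

A mismatch would suggest that the file was truncated or that a metric is missing from the export.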
Once the metrics are exported, they can be used to define alerts, as shown below:
```
- alert: podcheck_images_outdated
  expr: sum by(instance) (podcheck_images_outdated) > 0
  for: 15s
  labels:
    severity: warning
  annotations:
    summary: "{{ $labels.instance }} has {{ $value }} outdated podman images."
    description: "{{ $labels.instance }} has {{ $value }} outdated podman images."
- alert: podcheck_images_error
  expr: sum by(instance) (podcheck_images_error) > 0
  for: 15s
  labels:
    severity: warning
  annotations:
    summary: "{{ $labels.instance }} has {{ $value }} podman images having an error."
    description: "{{ $labels.instance }} has {{ $value }} podman images having an error."
- alert: podcheck_image_last_analyze
  expr: (time() - podcheck_images_analyze_timestamp_seconds) > (3600 * 24 * 3)
  for: 15s
  labels:
    severity: warning
  annotations:
    summary: "{{ $labels.instance }} has not updated the podcheck statistics for more than 3 days."
    description: "{{ $labels.instance }} has not updated the podcheck statistics for more than 3 days."
```
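The threshold in the `podcheck_image_last_analyze` rule, `3600 * 24 * 3`, works out to 259200 seconds (3 days). As a rough illustration of the same comparison outside Prometheus, the sketch below reproduces it in shell. The names `MAX_AGE` and `is_stale` are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch of the staleness test from the podcheck_image_last_analyze rule:
#   (time() - podcheck_images_analyze_timestamp_seconds) > (3600 * 24 * 3)

MAX_AGE=$((3600 * 24 * 3))   # 259200 seconds = 3 days

# is_stale LAST_RUN NOW
# Succeeds when LAST_RUN is more than MAX_AGE seconds before NOW
# (both given as epoch seconds).
is_stale() {
  [ $(( $2 - $1 )) -gt "$MAX_AGE" ]
}
```

For example, `is_stale 1737924029 "$(date +%s)" && echo "podcheck has not run for 3 days"` applies the check to the sample timestamp shown earlier.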
There is a reference Grafana dashboard in [grafana/grafana_dashboard.json](./grafana/grafana_dashboard.json).
![Grafana dashboard](./grafana/grafana_dashboard.png)