Velero alerts provided by Avisi Cloud
AME Kubernetes comes with a set of default alerts for Velero. This page is a reference for when one of these alerts fires in your cluster. Each entry gives a brief description of what the alert means, a list of possible causes, and suggestions for resolving the issue.
A backup created by Velero has one or more errors
This alert means that Velero was only able to create a partial backup: certain resources and/or persistent disks were not included in the backup. To determine which resources failed, run:
velero backup describe <backup-name>
velero backup logs <backup-name>
Possible causes:
- A namespace that is included in the backup does not exist. Adjust your backup schedule if necessary.
- The backup could not be completed due to memory constraints (e.g. an OOM event for restic).
- Restic backups took too long and timed out.
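The two commands above can be run for every partially failed backup in one go. The following is a minimal shell sketch, not an official tool: it assumes kubectl and the velero CLI are configured for the affected cluster and that Velero is installed in the velero namespace (overridable through VELERO_NAMESPACE, a variable introduced here for illustration). It selects Backup objects whose status.phase is PartiallyFailed and prints the describe output plus any log lines that mention an error.

```shell
#!/usr/bin/env bash
# Sketch: print diagnostics for every partially failed Velero backup.
# Assumption: Velero runs in $VELERO_NAMESPACE (hypothetical variable, default "velero").

# Print the name of each Backup object whose status.phase is PartiallyFailed.
list_partially_failed_backups() {
  local ns="${VELERO_NAMESPACE:-velero}"
  kubectl get backups.velero.io -n "$ns" \
    -o jsonpath='{range .items[?(@.status.phase=="PartiallyFailed")]}{.metadata.name}{"\n"}{end}'
}

# Run "velero backup describe" and "velero backup logs" for each of those backups.
inspect_partially_failed_backups() {
  local ns="${VELERO_NAMESPACE:-velero}"
  local name
  list_partially_failed_backups | while read -r name; do
    [ -n "$name" ] || continue
    echo "=== $name ==="
    # </dev/null keeps the CLI from consuming the remaining backup names on stdin.
    velero backup describe "$name" --details --namespace "$ns" </dev/null
    # Keep only log lines that mention an error to shorten the output.
    velero backup logs "$name" --namespace "$ns" </dev/null | grep -i 'error'
  done
}
```

Source the file and run inspect_partially_failed_backups. Filtering on status.phase instead of parsing the table printed by velero backup get keeps the selection independent of the CLI's column layout.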
Failed to create the backup entirely
The backup failed entirely and no resources were stored in S3.
Possible causes:
- No access to S3 object storage due to incorrect authentication
- No network connection to the S3 object storage
- The Velero service account has incorrect RBAC permissions
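To narrow down which of these causes applies, a rough shell sketch can check the backup storage locations and the service account's permissions. It assumes Velero is installed in the velero namespace with a service account named velero; both can be overridden through VELERO_NAMESPACE and VELERO_SA, variables introduced here for illustration. The permission checks are illustrative examples, not the full set of permissions Velero needs. A storage location reported as Unavailable usually means Velero could not reach or authenticate to the bucket.

```shell
#!/usr/bin/env bash
# Sketch: quick checks for backups that failed entirely.
# Assumptions: namespace $VELERO_NAMESPACE (default "velero") and
# service account $VELERO_SA (default "velero"); both variables are hypothetical.

# Print each BackupStorageLocation with its phase (e.g. Available or Unavailable).
check_storage_locations() {
  local ns="${VELERO_NAMESPACE:-velero}"
  kubectl get backupstoragelocations.velero.io -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.phase}{"\n"}{end}'
}

# Ask the API server whether the Velero service account may perform a few
# illustrative actions; kubectl prints "yes" or "no" for each check.
check_velero_rbac() {
  local ns="${VELERO_NAMESPACE:-velero}"
  local sa="system:serviceaccount:${ns}:${VELERO_SA:-velero}"
  local check
  for check in "list pods --all-namespaces" \
               "update backups.velero.io -n $ns" \
               "get secrets -n $ns"; do
    # $check is left unquoted on purpose: it holds the verb, resource and scope flags.
    printf '%s: %s\n' "$check" "$(kubectl auth can-i $check --as="$sa")"
  done
}
```

After sourcing the file, run check_storage_locations and check_velero_rbac; any "no" answer or Unavailable location points at the corresponding cause in the list above.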
Failed to delete a backup that should have been removed
Either a backup could not be removed entirely, or Velero was unable to acquire a lock on the restic repository in S3.
This can result in unnecessary storage usage in S3 or orphaned backup files.
Possible causes:
- No access to S3 object storage due to incorrect authentication, no network permissions, …
- OOM events for restic in the Velero pod (memory constraints)
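A similar sketch can help with failed deletions. It assumes Velero runs in the velero namespace (overridable through the hypothetical VELERO_NAMESPACE variable). It lists DeleteBackupRequest objects that recorded errors together with the backup they target, and reports pods in the namespace with a container whose last termination reason was OOMKilled, which would point at the memory constraints mentioned above.

```shell
#!/usr/bin/env bash
# Sketch: diagnostics for backups that Velero failed to delete.
# Assumption: Velero runs in $VELERO_NAMESPACE (hypothetical variable, default "velero").

# Print "<backup name><TAB><errors>" for every DeleteBackupRequest with recorded errors.
list_failed_deletions() {
  local ns="${VELERO_NAMESPACE:-velero}"
  kubectl get deletebackuprequests.velero.io -n "$ns" \
    -o jsonpath='{range .items[*]}{.spec.backupName}{"\t"}{.status.errors}{"\n"}{end}' |
    awk -F'\t' '$2 != ""'   # drop requests whose errors field is empty
}

# Print pods that have at least one container last terminated with reason OOMKilled.
list_oomkilled_containers() {
  local ns="${VELERO_NAMESPACE:-velero}"
  kubectl get pods -n "$ns" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .status.containerStatuses[*]}{.name}{"="}{.lastState.terminated.reason}{" "}{end}{"\n"}{end}' |
    grep 'OOMKilled'
}
```

After sourcing the file, run list_failed_deletions and list_oomkilled_containers; empty output from both suggests looking at S3 access instead.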