Velero alerts provided by Avisi Cloud

Default Alerts

AME Kubernetes comes with a set of default alerts for Velero. This page serves as a reference for when one of these alerts fires within your cluster. For each alert it gives a brief description of what the alert means, a list of possible causes and suggestions on how to resolve the issue.


Velero Alerts

VeleroBackupPartialFailures [warning]

A backup created by Velero has one or more errors.

This alert means that Velero was only able to create a partial backup: certain resources and/or persistent disks were not included in the backup. Investigate with velero backup describe <backup-name> and velero backup logs <backup-name> to determine which resources failed.
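The two investigation commands can be wrapped in a small helper. This is a minimal sketch, assuming the velero CLI is installed and configured for the affected cluster; the function name investigate_backup is hypothetical, and filtering on level=error and level=warning assumes Velero's default text log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather diagnostics for a backup that ended in PartiallyFailed.
# Assumes the velero CLI is installed and configured for the affected cluster.
investigate_backup() {
  local backup="$1"
  if [ -z "$backup" ]; then
    echo "usage: investigate_backup <backup-name>" >&2
    return 1
  fi
  # Summary of the backup, including per-resource and per-volume details.
  velero backup describe "$backup" --details
  # Full log, reduced to error and warning lines (assumes the default text log format).
  velero backup logs "$backup" | grep -Ei 'level=(error|warning)' \
    || echo "no error or warning lines found in the log of $backup"
}
```

Usage: investigate_backup <backup-name>. The error lines in the log usually name the namespace, resource or volume that could not be backed up.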

Possible causes

  • A namespace that is included in the backup does not exist. Adjust your backup schedule if necessary.
  • Memory constraints caused part of the backup to fail (e.g. an OOM event for restic).
  • Timeouts caused by restic backups taking too long.
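For the missing-namespace cause, the following sketch lists namespaces that a schedule includes but that no longer exist in the cluster. It assumes kubectl access, Velero installed in the velero namespace (override with VELERO_NS), and a schedule that names its namespaces explicitly in spec.template.includedNamespaces; the function name missing_schedule_namespaces is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Velero runs in the "velero" namespace; override with VELERO_NS.
VELERO_NS="${VELERO_NS:-velero}"

# Hypothetical helper: print every namespace a schedule includes that does not exist.
missing_schedule_namespaces() {
  local schedule="$1" ns raw
  local -a namespaces
  raw="$(kubectl -n "$VELERO_NS" get schedules.velero.io "$schedule" \
    -o jsonpath='{.spec.template.includedNamespaces[*]}')" || return 1
  # read -a splits on whitespace without glob-expanding a literal "*".
  read -r -a namespaces <<< "$raw"
  for ns in "${namespaces[@]}"; do
    # "*" selects all namespaces, so it can never be missing.
    [ "$ns" = "*" ] && continue
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      echo "$ns"
    fi
  done
}
```

Any namespace printed by missing_schedule_namespaces <schedule-name> is a candidate to remove from the schedule.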

VeleroBackupFailures [warning]

Velero failed to create the backup entirely.

The backup failed entirely and no resources were stored in S3.

Possible causes

  • No access to the S3 object storage due to incorrect authentication
  • No network connection
  • The Velero service account has incorrect RBAC permissions
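The storage and RBAC causes can be checked quickly from a workstation. This is a sketch, not an official procedure: it assumes Velero runs in the velero namespace under a service account named velero (the defaults of the upstream velero install, which may differ on AME; override with VELERO_NS and VELERO_SA), that your own user may impersonate that service account, and that your Velero version reports a PHASE column in velero backup-location get. The function name check_backup_prereqs is hypothetical.

```shell
#!/usr/bin/env bash
# Assumptions: Velero runs in the "velero" namespace under a service account named
# "velero"; override with VELERO_NS / VELERO_SA if your installation differs.
VELERO_NS="${VELERO_NS:-velero}"
VELERO_SA="${VELERO_SA:-velero}"

# Hypothetical helper: quick checks for a backup that failed entirely.
check_backup_prereqs() {
  local verb
  # 1. Storage: recent Velero versions report a PHASE (Available/Unavailable) per
  #    backup storage location; Unavailable points at credentials or network access.
  velero backup-location get
  # 2. RBAC: ask the API server whether the service account may manage backups.
  for verb in get list create update; do
    printf 'backups.velero.io %s: ' "$verb"
    # can-i prints yes/no and exits non-zero on "no"; keep checking the other verbs.
    kubectl auth can-i "$verb" backups.velero.io -n "$VELERO_NS" \
      --as="system:serviceaccount:${VELERO_NS}:${VELERO_SA}" || true
  done
}
```

A location that is not Available, or any "no" answer, narrows the failure down to storage access or RBAC respectively.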

VeleroBackupRemovalFailures [warning]

Velero failed to delete a backup that should have been removed.

Either a backup could not be removed at all, or Velero was unable to acquire a lock to access the restic repository in S3.

This can result in unnecessary storage usage in S3 or orphaned backup files.

Possible causes

  • No access to the S3 object storage due to incorrect authentication, missing network permissions, …
  • OOM events for restic in the Velero pod (memory constraints)
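To check for the causes listed for this alert, the sketch below lists backups whose deletion has started but not finished (phase Deleting) and pods in the Velero namespace with a container whose last termination reason was OOMKilled. It assumes kubectl access and that Velero and its restic pods run in the velero namespace (override with VELERO_NS); the function name check_removal_failures is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: Velero and its restic pods run in the "velero" namespace; override with VELERO_NS.
VELERO_NS="${VELERO_NS:-velero}"

# Hypothetical helper: look for the usual suspects behind VeleroBackupRemovalFailures.
check_removal_failures() {
  # A backup whose deletion has started but not finished reports phase "Deleting".
  echo "Backups in phase Deleting:"
  kubectl -n "$VELERO_NS" get backups.velero.io \
    -o jsonpath='{range .items[?(@.status.phase=="Deleting")]}{.metadata.name}{"\n"}{end}'
  # A container that ran out of memory reports OOMKilled as its last termination reason.
  echo "Pods with an OOMKilled container:"
  kubectl -n "$VELERO_NS" get pods \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' \
    | awk '/OOMKilled/ { print $1 }'
}
```

Backups that remain in phase Deleting across several runs, or repeated OOMKilled entries, are worth following up with velero backup describe <backup-name> or by raising the memory limits of the affected pods.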