Upgrading a PostgreSQL pod to the next major version on Kubernetes
Situation
You need to upgrade a PostgreSQL pod / StatefulSet from 9.6 to 11, and you manage your own StatefulSet, either through Helm or plain Kubernetes manifests. This may not be the right approach when you are using a PostgreSQL operator.
This approach works between any two PostgreSQL major versions (such as 11 to 14).
Goals:
- Upgrade from Postgres 9.6 to Postgres 11 with minimal risk and downtime (or from 11 to 12).
- Repeatable.
- Easy to automate at large scale.
- Works on Kubernetes
Non-goals:
- Data validation scripts
- Provide production ready automation scripts to run this for large scale database deployments
Before you start
- Make sure you have back-ups of your data. Check how to create a persistent volume snapshot.
- This needs to be tweaked to work for your Postgres set-up (mostly the correct names of the `pvc`, the correct `subPath`, and the Postgres versions).
- Please test this procedure in a test environment before updating your production PostgreSQL pods!
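If your cluster has a CSI driver with snapshot support installed, a persistent volume snapshot can be created with a manifest along these lines. This is a sketch: the PVC name `data-postgres-0` and the snapshot class name are assumptions, adjust them to your set-up.

```shell
# Sketch: snapshot the Postgres data PVC before touching anything.
# Requires a CSI snapshot controller; names below are examples.
cat <<'EOF' | kubectl apply -f -
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-pre-upgrade
spec:
  volumeSnapshotClassName: csi-snapclass      # your snapshot class
  source:
    persistentVolumeClaimName: data-postgres-0  # your Postgres PVC
EOF

# Verify the snapshot reached ReadyToUse before continuing.
kubectl get volumesnapshot pg-data-pre-upgrade
```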
Step by step
This is based on prior work by tianon. Please note that you will need to verify this approach on your set-up before attempting to upgrade production databases. Also make sure you have a working back-up before starting.
Preparation
You will need two separate volumes:
- The old data directory for postgres
- A new data directory for the new version of postgres
!! Pay attention to the logging output during an upgrade. It may generate a post-upgrade.sql file that needs to be run. !!
If any post-upgrade processing is required, pg_upgrade will issue warnings as it completes. It will also generate script files that must be run by the administrator. The script files will connect to each database that needs post-upgrade processing. (See step 13, "Post-upgrade processing", in the pg_upgrade docs.)
The upgrade step should handle this for you, but be sure to pay attention to its output.
Demo set-up
This section describes a demo set-up on how to upgrade a Postgres 9.6 deployment to Postgres 11 on Kubernetes. Adjust based on your own deployment. This is environment specific.
You can probably adapt the upgrade and upgrade-post-migration jobs to aid you in this.
Important
Before attempting this on a production database, make sure you have a back-up. The process documented in this section uses a validation step before removing the old database data, but failures may happen and data loss can occur.
First create the demo database. For this, we assume a Gitlab PostgreSQL database.
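The demo manifests are not included in this extract; a minimal sketch of such a deployment follows. The StatefulSet name `postgres`, the `gitlab` user, the `subPath`, and the storage size are all assumptions to adjust for your environment.

```shell
# Sketch: a minimal Postgres 9.6 StatefulSet for the demo.
cat > postgres-demo.yaml <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:9.6
          env:
            - name: POSTGRES_USER
              value: gitlab            # assumption: database owner
            - name: POSTGRES_PASSWORD
              value: demo-password     # demo only, use a Secret in practice
          volumeMounts:
            - name: pg-data
              mountPath: /var/lib/postgresql/data
              subPath: pgdata          # adjust to your subPath
  volumeClaimTemplates:
    - metadata:
        name: pg-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF
```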
Apply the manifests from above:
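Assuming the manifests were saved to a file such as `postgres-demo.yaml` (the file name is an example):

```shell
# Create the demo StatefulSet and its PVC.
kubectl apply -f postgres-demo.yaml
```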
This should initialize a new database.
Next, scale the database to zero:
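Assuming the StatefulSet is named `postgres` (adjust to your set-up), the scale-down can be done with:

```shell
# Scale the StatefulSet down so no process holds the data directory.
kubectl scale statefulset/postgres --replicas=0

# Block until the pod is actually gone before continuing.
kubectl wait --for=delete pod/postgres-0 --timeout=120s
```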
Wait until the database has shut down before moving to the next step.
Upgrade job
Save the following into `upgrade-job.yaml`:
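The original job YAML is not included in this extract; below is a sketch based on tianon's `postgres-upgrade` image, which runs `pg_upgrade` against an old and a new data directory. The PVC names, the `subPath`, and the `gitlab` owner are assumptions; the mount paths follow the image's `9.6/data` and `11/data` convention.

```shell
# Sketch: pg_upgrade as a Kubernetes Job using tianon/postgres-upgrade.
cat > upgrade-job.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: postgres-upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pg-upgrade
          # Pick the tag matching your old and new major versions.
          image: tianon/postgres-upgrade:9.6-to-11
          env:
            - name: PGUSER
              value: gitlab                    # must match the database owner
            - name: POSTGRES_INITDB_ARGS
              value: "--username=gitlab"       # must match the original initdb
          volumeMounts:
            - name: pg-data
              mountPath: /var/lib/postgresql/9.6/data
              subPath: pgdata                  # adjust to your subPath
            - name: pg-data-temp
              mountPath: /var/lib/postgresql/11/data
      volumes:
        - name: pg-data
          persistentVolumeClaim:
            claimName: pg-data-postgres-0      # old 9.6 data PVC
        - name: pg-data-temp
          persistentVolumeClaim:
            claimName: pg-data-temp            # new, empty PVC
EOF
```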
Start the upgrade job:
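Assuming the job is named `postgres-upgrade` as in the sketch above:

```shell
kubectl apply -f upgrade-job.yaml

# Follow the pg_upgrade output; watch for generated post-upgrade scripts.
kubectl logs -f job/postgres-upgrade
```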
Make sure that the `PGUSER` and `POSTGRES_INITDB_ARGS` match the owner of the database. See `k8s/upgrade-job.yaml` for an example. If you do not do this correctly, the job will fail.
Inspect the job logs for the pg_upgrade output to confirm the upgrade completed.
Next, we can validate that the postgres 11 upgrade succeeded. Use this step to run any validation query before moving on and removing the temp volumes and old pg 9.6 data.
Save the following to `validation-database.yaml`:
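The original YAML is not included in this extract; a sketch of such a validation StatefulSet follows. It runs a plain `postgres:11` container against the temp PVC; the names are assumptions matching the earlier examples.

```shell
# Sketch: a throwaway PG 11 StatefulSet on top of the upgraded temp volume.
cat > validation-database.yaml <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres-validation
spec:
  serviceName: postgres-validation
  replicas: 1
  selector:
    matchLabels:
      app: postgres-validation
  template:
    metadata:
      labels:
        app: postgres-validation
    spec:
      containers:
        - name: postgres
          image: postgres:11
          volumeMounts:
            - name: pg-data-temp
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: pg-data-temp
          persistentVolumeClaim:
            claimName: pg-data-temp    # the upgraded PG 11 data
EOF
```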
Apply it to the cluster:
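Assuming the file name from the previous step:

```shell
kubectl apply -f validation-database.yaml
```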
This will start a separate StatefulSet with a PostgreSQL container running PG 11 on the temp data directory.
Use `kubectl logs` to validate that the database started successfully and did not initialize a new database.
Run any validation queries you see fit, but note that any modifications made in this database will end up becoming live after the next step.
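For example, a quick read-only check against the validation pod (pod name and `gitlab` user are assumptions from the earlier sketches):

```shell
# Confirm the server reports PostgreSQL 11 and the data is present.
kubectl exec -it pod/postgres-validation-0 -- psql -U gitlab -c 'SELECT version();'
kubectl exec -it pod/postgres-validation-0 -- psql -U gitlab -c '\l'
```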
Once done, tear down the validation database:
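Assuming the manifest file from above:

```shell
kubectl delete -f validation-database.yaml

# Block until the validation pod is gone.
kubectl wait --for=delete pod/postgres-validation-0 --timeout=120s
```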
Wait until the database has shut down before moving on to the next step.
Switching the persistent volume
Here we have two options:
- Move the data from one PVC to the other.
- Switch the Persistent Volumes used by the PVC.
Alternatively, simply start up the Postgres 11 database on a new volume and continue to use that instead. Remove the old PVC once you go live. You will be done after the previous step (`validation-database`).
Option 1) Move data between PVCs
This method will destroy the old data.
Start the upgrade-post-steps job:
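The job manifest is not included in this extract; a sketch of what such a data-move job could look like follows. The image, PVC names, `subPath`, and file name are assumptions, and note that the `rm -rf` irreversibly destroys the old 9.6 data.

```shell
# Sketch: replace the old data directory contents with the upgraded data.
cat > upgrade-post-steps.yaml <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: upgrade-post-steps
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: move-data
          image: alpine:3
          command:
            - sh
            - -c
            - |
              set -e
              rm -rf /old/*       # destroys the 9.6 data
              cp -a /new/. /old/  # copy upgraded data onto the original volume
          volumeMounts:
            - name: pg-data
              mountPath: /old
              subPath: pgdata              # adjust to your subPath
            - name: pg-data-temp
              mountPath: /new
      volumes:
        - name: pg-data
          persistentVolumeClaim:
            claimName: pg-data-postgres-0  # original PVC
        - name: pg-data-temp
          persistentVolumeClaim:
            claimName: pg-data-temp        # upgraded PG 11 data
EOF

kubectl apply -f upgrade-post-steps.yaml
kubectl wait --for=condition=complete job/upgrade-post-steps --timeout=600s
```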
Wait until this job has run to completion.
Option 2) Switch PV used by PVC
This method will preserve the old data, but requires manual work and is more complex to execute.
- Edit the PV of the Postgres 11 PVC to have `persistentVolumeReclaimPolicy` with a value of `Retain`.
- Make sure you have a copy of the references between PVCs and PVs (`kubectl get pvc` should be sufficient).
- Delete the PVC holding the PV `pg-data-temp`.
- The PV should now have the status `Released`.
- In order to not nuke the old PV holding the Postgres 9.6 data, also modify its `persistentVolumeReclaimPolicy` to `Retain`.
- Remove the `claimRef` from the Postgres 11 `PV`, so the `PV` will move to status `Available`.
- Remove the old Postgres 9.6 `PVC`.
- Create a new PVC with the exact same name, using a `volumeName` reference to the Postgres 11 `PV`.
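The steps above can be sketched as kubectl commands. The `<pg11-pv-name>` / `<pg96-pv-name>` placeholders must be filled in from your own `kubectl get pvc` output; PVC names and storage size are assumptions from the earlier examples.

```shell
# Retain both PVs so nothing is deleted automatically.
kubectl patch pv <pg11-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv <pg96-pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Release the Postgres 11 PV from its temp claim (PV becomes Released,
# then Available once the claimRef is removed).
kubectl delete pvc pg-data-temp
kubectl patch pv <pg11-pv-name> --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'

# Re-point the original claim name at the Postgres 11 PV.
kubectl delete pvc pg-data-postgres-0
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-postgres-0
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  volumeName: <pg11-pv-name>
EOF
```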
You can also follow along with our guide on how to fix a PVC binding to the wrong PV, as you will be doing the same thing here.
Starting Postgres 11
Now we can start postgres with the postgres:11 image:
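Assuming the StatefulSet and container names from the earlier sketches:

```shell
# Point the StatefulSet at the new major version and scale it back up.
kubectl set image statefulset/postgres postgres=postgres:11
kubectl scale statefulset/postgres --replicas=1
```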
The postgres database should now be running PG 11.
To clean up, remove the following resources:
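With the example names used throughout this guide, the temporary upgrade resources can be removed with:

```shell
kubectl delete job postgres-upgrade upgrade-post-steps --ignore-not-found
kubectl delete pvc pg-data-temp --ignore-not-found
```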
A note on pg_hba
The `pg_hba.conf` configuration file is not properly transferred when performing an upgrade job. You will most likely need to validate and/or reconfigure it.
When running with the official postgres images, you may need to execute the following command to allow connectivity to the database by regular users:
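A sketch of what that can look like, mirroring the host rule the official image's entrypoint normally appends on first init (pod name and auth method are assumptions; pick `md5` or `scram-sha-256` to match your set-up):

```shell
# Allow password-authenticated connections from other hosts.
kubectl exec -it pod/postgres-0 -- \
  sh -c 'echo "host all all all md5" >> "$PGDATA/pg_hba.conf"'
```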
After modifying `pg_hba.conf`, reload the configuration (e.g. `SELECT pg_reload_conf();`) or restart the database for the change to take effect.
Troubleshooting
- If no applications can connect after you have started the database, verify that you have configured `pg_hba` correctly. Use `kubectl port-forward` to manually connect to the database if needed.
- You can validate disk usage in the pod by using `kubectl exec pod/<db> -- df -h`.
- Failures that occurred during the migration job are most often due to an incorrect `POSTGRES_INITDB_ARGS` configuration.
- If the migration job failed to start right away, double-check that you have the correct `subPath` for your volumes.
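A manual connection via port-forward can look like this (pod name and `gitlab` user are assumptions from the earlier examples):

```shell
# Forward the Postgres port to localhost, then connect with psql.
kubectl port-forward pod/postgres-0 5432:5432 &
psql -h 127.0.0.1 -p 5432 -U gitlab
```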