Upgrade a PostgreSQL pod to the next major version

Upgrading a PostgreSQL pod to the next major version on Kubernetes

Situation

You need to upgrade a PostgreSQL pod / StatefulSet from 9.6 to 11, and you manage the StatefulSet yourself, either through Helm or plain Kubernetes manifests. This may not be the right approach when using a PostgreSQL operator.

This approach works between any pair of PostgreSQL major versions (such as 11 to 14).

Goals:

  • Upgrade from Postgres 9.6 to Postgres 11 (or, for example, from 11 to 12) with minimal risk and downtime.
  • Repeatable.
  • Easy to automate at large scale.
  • Works on Kubernetes.

Non-goals:

  • Data validation scripts
  • Provide production-ready automation scripts for running this across large-scale database deployments

Before you start

  • Make sure you have backups of your data. Check how to create a persistent volume snapshot; a sketch follows this list.
  • This needs to be tweaked to work for your PostgreSQL setup (mostly the correct PVC names, the correct subPath, and the PostgreSQL versions).
  • Please test this procedure in a test environment before updating your production PostgreSQL pods!
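
For reference, a minimal VolumeSnapshot sketch. This assumes a recent cluster with the CSI snapshot CRDs installed; the class name csi-snapclass is an assumption, and the PVC name matches the demo setup used later in this guide:

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: pg-data-pre-upgrade
  namespace: default
spec:
  # Assumed class name; list yours with: kubectl get volumesnapshotclass
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: pg-data-gitlab-database-0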

Step by step

This is based on prior work by tianon. Please note that you will need to verify this approach on your own setup before attempting to upgrade production databases. Also make sure you have a working backup before starting.

Preparation

You will need two separate volumes:

  1. The old data directory for postgres
  2. A new data directory for the new version of postgres

!! Pay attention to the logging output during an upgrade. It may generate a post-upgrade.sql file that needs to be run. !!

If any post-upgrade processing is required, pg_upgrade will issue warnings as it completes. It will also generate script files that must be run by the administrator. The script files will connect to each database that needs post-upgrade processing. (See step 13, "Post-upgrade processing", in the pg_upgrade docs.)

The upgrade step should handle this for you, but be sure to pay attention to its output. Example:

upgrade_1  | fixing permissions on existing directory /var/lib/postgresql/11/data ... ok
upgrade_1  | creating subdirectories ... ok
upgrade_1  | selecting default max_connections ... 100
upgrade_1  | selecting default shared_buffers ... 128MB
upgrade_1  | selecting default timezone ... Etc/UTC
upgrade_1  | selecting dynamic shared memory implementation ... posix
upgrade_1  | creating configuration files ... ok
upgrade_1  | running bootstrap script ... ok
upgrade_1  | performing post-bootstrap initialization ... ok
upgrade_1  | syncing data to disk ... ok

Demo setup

This section describes a demo setup for upgrading a Postgres 9.6 deployment to Postgres 11 on Kubernetes. Adjust it based on your own deployment; this is environment specific.

You can probably adapt the upgrade and upgrade-post-migration jobs to aid you in this.

First, create the demo database. For this, we assume a GitLab PostgreSQL database. Save the following to k8s/database.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gitlab-database
  namespace: default
  labels:
    app.kubernetes.io/name: gitlab
    app.kubernetes.io/component: database
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: gitlab
      app.kubernetes.io/component: database
  serviceName: gitlab-database
  replicas: 1
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: pg-data
      labels:
        app.kubernetes.io/name: gitlab
        app.kubernetes.io/component: database
    spec:
      containers:
      - name: pg
        image: postgres:9.6
        ports:
        - name: pg
          containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: gitlab
        - name: POSTGRES_USER
          value: 'gitlab'
        - name: POSTGRES_PASSWORD
          value: gitlab
        volumeMounts:
        - mountPath: /var/lib/postgresql/data
          name: pg-data
          subPath: data
  volumeClaimTemplates:
  - metadata:
      name: pg-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab-database
  namespace: default
spec:
  selector:
    app.kubernetes.io/name: gitlab
    app.kubernetes.io/component: database
  type: ClusterIP
  ports:
  - name: pg
    port: 5432
    targetPort: pg

Apply the manifests from above:

kubectl apply -f k8s/database.yaml
kubectl rollout status sts/gitlab-database -n default

This should initialize a new database.
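
Optionally, seed a marker row now that you can check again after the upgrade. The table name upgrade_canary is made up for this demo:

kubectl exec -n default gitlab-database-0 -- psql -U gitlab -d gitlab -c "CREATE TABLE upgrade_canary (id int); INSERT INTO upgrade_canary VALUES (1);"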

Next, scale the database to zero:

kubectl scale -n default sts/gitlab-database --replicas 0

Wait until the database has shut down before moving to the next step.
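
One way to wait, assuming the single-replica pod is named gitlab-database-0:

kubectl wait --for=delete pod/gitlab-database-0 -n default --timeout=120s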

Upgrade job

Save the following into k8s/upgrade-job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: upgrade-postgres
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: upgrade-postgres
    spec:
      containers:
      - image: tianon/postgres-upgrade:9.6-to-11
        name: upgrade-postgres
        env:
        - name: PGUSER
          value: gitlab
        - name: POSTGRES_INITDB_ARGS
          value: '-U gitlab'
        volumeMounts:
        - mountPath: /var/lib/postgresql/9.6/data
          name: pg-data
          subPath: data
        - mountPath: /var/lib/postgresql/11/data
          name: pg-data-temp
          subPath: data
      restartPolicy: Never
      volumes:
        - name: pg-data
          persistentVolumeClaim:
            claimName: pg-data-gitlab-database-0
        - name: pg-data-temp
          persistentVolumeClaim:
            claimName: pg-data-temp
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pg-data-temp
  namespace: default
spec:
  accessModes:
    # ReadWriteOnce is enough: only one pod mounts this volume at a time.
    - ReadWriteOnce
  # Note: persistentVolumeReclaimPolicy is a PV field, not a PVC field.
  # If you plan to switch this PV to a new PVC later (Option 2), patch the
  # bound PV to Retain once it has been provisioned.
  resources:
    requests:
      storage: 10Gi

Start the upgrade job:

kubectl apply -f k8s/upgrade-job.yaml 
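
To follow the job and block until it finishes (the timeout is an arbitrary example; large databases can take much longer):

kubectl logs -n default -f job/upgrade-postgres
kubectl wait --for=condition=complete job/upgrade-postgres -n default --timeout=600s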

Example expected output:

$ k logs -f upgrade-postgres-mtgtd
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/11/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... Etc/UTC
selecting dynamic shared memory implementation ... posix
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the option -A, or
--auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    pg_ctl -D /var/lib/postgresql/11/data -l logfile start

Performing Consistency Checks
-----------------------------
Checking cluster versions                                   ok
Checking database user is the install user                  ok
Checking database connection settings                       ok
Checking for prepared transactions                          ok
Checking for reg* data types in user tables                 ok
Checking for contrib/isn with bigint-passing mismatch       ok
Checking for invalid "unknown" user columns                 ok
Creating dump of global objects                             ok
Creating dump of database schemas
  gitlab
  postgres
  template1
                                                            ok
Checking for presence of required libraries                 ok
Checking database user is the install user                  ok
Checking for prepared transactions                          ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Analyzing all rows in the new cluster                       ok
Freezing all rows in the new cluster                        ok
Deleting files from new pg_xact                             ok
Copying old pg_clog to new server                           ok
Setting next transaction ID and epoch for new cluster       ok
Deleting files from new pg_multixact/offsets                ok
Copying old pg_multixact/offsets to new server              ok
Deleting files from new pg_multixact/members                ok
Copying old pg_multixact/members to new server              ok
Setting next multixact ID and offset for new cluster        ok
Resetting WAL archives                                      ok
Setting frozenxid and minmxid counters in new cluster       ok
Restoring global objects in the new cluster                 ok
Restoring database schemas in the new cluster
  template1
  gitlab
  postgres
                                                            ok
Copying user relation files
  /var/lib/postgresql/9.6/data/base/16384/2613
  /var/lib/postgresql/9.6/data/base/16384/2683
  /var/lib/postgresql/9.6/data/base/16384/2995
  /var/lib/postgresql/9.6/data/base/16384/2996
  /var/lib/postgresql/9.6/data/base/12407/2613
  /var/lib/postgresql/9.6/data/base/12407/2683
  /var/lib/postgresql/9.6/data/base/12407/2995
  /var/lib/postgresql/9.6/data/base/12407/2996
  /var/lib/postgresql/9.6/data/base/1/2613
  /var/lib/postgresql/9.6/data/base/1/2683
  /var/lib/postgresql/9.6/data/base/1/2995
  /var/lib/postgresql/9.6/data/base/1/2996
                                                            ok
Setting next OID for new cluster                            ok
Sync data directory to disk                                 ok
Creating script to analyze new cluster                      ok
Creating script to delete old cluster                       ok
Checking for hash indexes                                   ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade so,
once you start the new server, consider running:
    ./analyze_new_cluster.sh

Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh

Next, we can validate that the Postgres 11 upgrade succeeded. Use this step to run any validation queries before moving on and removing the temp volume and the old Postgres 9.6 data.

Save the following to k8s/validation-database.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gitlab-database-validation
  namespace: default
  labels:
    app.kubernetes.io/name: gitlab
    app.kubernetes.io/component: database-validation
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: gitlab
      app.kubernetes.io/component: database-validation
  serviceName: gitlab-database-validation
  replicas: 1
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: pg-data
      labels:
        app.kubernetes.io/name: gitlab
        app.kubernetes.io/component: database-validation
    spec:
      containers:
      - name: pg
        image: postgres:11
        ports:
        - name: pg
          containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: gitlab
        - name: POSTGRES_USER
          value: 'gitlab'
        - name: POSTGRES_PASSWORD
          value: gitlab
        volumeMounts:
        - mountPath: /var/lib/postgresql/data
          name: pg-data-temp
          subPath: data
      volumes:
      - name: pg-data-temp
        persistentVolumeClaim:
          claimName: pg-data-temp
---
apiVersion: v1
kind: Service
metadata:
  name: gitlab-database-validation
  namespace: default
spec:
  selector:
    app.kubernetes.io/name: gitlab
    app.kubernetes.io/component: database-validation
  type: ClusterIP
  ports:
  - name: pg
    port: 5432
    targetPort: pg

Apply it to the cluster:

kubectl apply -f k8s/validation-database.yaml

This will start a separate StatefulSet with a PostgreSQL container running Postgres 11 on the temp data directory.

Use kubectl logs to validate that the database started successfully and did not initialize a new database. Run any validation queries you see fit, but note that any modifications made in this database will end up in the live database after the next step.
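
For example, to check the server version and the marker row seeded earlier (upgrade_canary is the hypothetical demo table; substitute your own checks):

kubectl exec -n default gitlab-database-validation-0 -- psql -U gitlab -d gitlab -c "SELECT version();" -c "SELECT count(*) FROM upgrade_canary;"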

Once done, tear down the validation database:

kubectl delete -f k8s/validation-database.yaml

Wait until the database has shut down before moving on to the next step.

Switching the persistent volume

Here we have two options:

  1. Move the data from one PVC to the other.
  2. Switch the Persistent Volumes used by the PVC.

Alternatively, simply start up the Postgres 11 database on a new volume and continue to use that instead, removing the old PVC once you go live. In that case you are already done after the previous step (the validation database).

Option 1) Move data between PVCs

This method will destroy the old data. Save the following to k8s/upgrade-post-migration.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: upgrade-post-migration
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: upgrade-post-migration
    spec:
      containers:
      - image: postgres:11
        name: transfer-directory
        command:
          - /bin/bash
          - -c
          - |
            # Sanity-check both data directories, then replace the old 9.6
            # data with the upgraded 11 data and fix ownership.
            ls -la /var/lib/postgresql/9.6/data && ls -la /var/lib/postgresql/11/data && \
            rm -rf /var/lib/postgresql/9.6/data && cp -r /var/lib/postgresql/11/data /var/lib/postgresql/9.6/data && \
            chown -R postgres:postgres /var/lib/postgresql/
        volumeMounts:
        - mountPath: /var/lib/postgresql/9.6/data
          name: pg-data
          subPath: data
        - mountPath: /var/lib/postgresql/11/data
          name: pg-data-temp
          subPath: data
      restartPolicy: Never
      volumes:
        - name: pg-data
          persistentVolumeClaim:
            claimName: pg-data-gitlab-database-0
        - name: pg-data-temp
          persistentVolumeClaim:
            claimName: pg-data-temp

Start the upgrade-post-migration job:

kubectl apply -f k8s/upgrade-post-migration.yaml

Wait until this job has run to completion.
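
As before, kubectl wait can block until the job finishes:

kubectl wait --for=condition=complete job/upgrade-post-migration -n default --timeout=600s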

Option 2) Switch PV used by PVC

This method will preserve the old data, but requires manual work and is more complex to execute. A command sketch follows the list below.

  1. Edit the PV bound to the Postgres 11 PVC (pg-data-temp) and set its persistentVolumeReclaimPolicy to Retain.
  2. Make sure you have a copy of the references between PVCs and PVs (kubectl get pvc should be sufficient).
  3. Delete the pg-data-temp PVC.
  4. The PV should now have the status Released.
  5. To avoid destroying the old PV holding the Postgres 9.6 data, set its persistentVolumeReclaimPolicy to Retain as well.
  6. Remove the claimRef from the Postgres 11 PV, so the PV moves to the status Available.
  7. Remove the old Postgres 9.6 PVC.
  8. Create a new PVC with the exact same name, using a volumeName reference to the Postgres 11 PV.
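
A hedged sketch of these steps using kubectl, reusing the demo names from this guide; verify each PV name before deleting anything:

# Look up the PV names behind both PVCs first.
NEW_PV=$(kubectl get pvc pg-data-temp -n default -o jsonpath='{.spec.volumeName}')
OLD_PV=$(kubectl get pvc pg-data-gitlab-database-0 -n default -o jsonpath='{.spec.volumeName}')

# Steps 1 and 5: retain both PVs so deleting a PVC never deletes the data.
kubectl patch pv "$NEW_PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
kubectl patch pv "$OLD_PV" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'

# Steps 3 and 7: delete both PVCs; the PVs become Released.
kubectl delete pvc pg-data-temp pg-data-gitlab-database-0 -n default

# Step 6: clear the claimRef so the Postgres 11 PV becomes Available.
kubectl patch pv "$NEW_PV" --type json -p '[{"op":"remove","path":"/spec/claimRef"}]'

# Step 8: re-create the PVC under the StatefulSet's expected name,
# pinned to the Postgres 11 PV via volumeName.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pg-data-gitlab-database-0
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: $NEW_PV
EOF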

You can also follow along with our guide on how to fix a PVC binding to the wrong PV, as you will be doing the same thing here.

Starting Postgres 11

Now we can start Postgres with the postgres:11 image. Save the following to k8s/database11.yaml:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: gitlab-database
  namespace: default
  labels:
    app.kubernetes.io/name: gitlab
    app.kubernetes.io/component: database
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: gitlab
      app.kubernetes.io/component: database
  serviceName: gitlab-database
  replicas: 1
  template:
    metadata:
      annotations:
        backup.velero.io/backup-volumes: pg-data
      labels:
        app.kubernetes.io/name: gitlab
        app.kubernetes.io/component: database
    spec:
      containers:
      - name: pg
        image: postgres:11
        ports:
        - name: pg
          containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: gitlab
        - name: POSTGRES_USER
          value: 'gitlab'
        - name: POSTGRES_PASSWORD
          value: gitlab
        volumeMounts:
        - mountPath: /var/lib/postgresql/data
          name: pg-data
          subPath: data
  volumeClaimTemplates:
  - metadata:
      name: pg-data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi

Apply it:

kubectl apply -f k8s/database11.yaml

The database should now be running Postgres 11.
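
You can verify the version from inside the pod, for example:

kubectl exec -n default gitlab-database-0 -- psql -U gitlab -d gitlab -c "SHOW server_version;"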

To clean up, remove the following resources:

kubectl delete -f k8s/upgrade-job.yaml
kubectl delete -f k8s/upgrade-post-migration.yaml

A note on pg_hba

The pg_hba.conf configuration file is not carried over by the upgrade job; initdb creates a fresh default one for the new cluster. You will most likely need to validate and/or reconfigure it. When running the official postgres images, you may need to execute the following command to allow regular users to connect to the database:

echo "host all all all md5" >> /var/lib/postgresql/data/pg_hba.conf

Troubleshooting

  • If no applications can connect after you start the database, verify that you have configured pg_hba correctly. Use kubectl port-forward to manually connect to the database if needed; see the example after this list.
  • You can validate disk usage in the pod by using kubectl exec pod/<db> -- df -h
  • Failures that occurred during the migration job are most often due to an incorrect POSTGRES_INITDB_ARGS configuration.
  • If the migration job failed to start right away, double-check that you have the correct subPath for your volumes.
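
For example, to connect from your workstation (assuming psql is installed locally):

kubectl port-forward -n default svc/gitlab-database 5432:5432
# in a second terminal:
psql -h 127.0.0.1 -p 5432 -U gitlab -d gitlab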