Leverage a New Way to Import an Existing Postgres Database to Kubernetes

August 15, 2022

Are you thinking about moving your PostgreSQL databases to Kubernetes but wondering how to do it? What about importing databases from RDS for PostgreSQL or another database as a service?

Release 1.16 of the CloudNativePG open source operator introduces a new feature which makes it easier to import inside Kubernetes an existing Postgres database, from any location as long as it can be reached via the network.

The same feature also enables major version upgrades of PostgreSQL, as well as migrating to CloudNativePG any existing PostgreSQL database that you are already running inside Kubernetes with a different operator—or without one, using a pure statefulset based deployment.

This feature enhances the initdb bootstrap, by introducing a new subsection called import. Such a section is evaluated only if it is not empty, after the cluster has been initialized from scratch, and it defines which data to import from an existing Postgres instance. Such a Postgres instance can be running in a virtual machine, or on bare metal, or even as a service—like Amazon RDS. The important thing is that objects can be exported using logical backup from the source, and subsequently imported in the target instance.


Our scenario: from Postgres 10 to 14

Let’s suppose you have an existing PostgreSQL 10 cluster in Amazon RDS, hosting a few databases, each reserved to a different application.

Your organization has finally decided to move your database alongside applications inside Kubernetes and on Day 0 you have decided to trust CloudNativePG for Day 1 and Day 2 operations involving PostgreSQL to setup, configure and then run your future production environment (see the talk I presented at Postgres Vision 2022 for more details). At the same time, you want to upgrade the application to PostgreSQL 14, given that PostgreSQL 10 will not be supported anymore by the PGDG Community at the end of the year.

The temptation is to lift and shift the database architecture, and map it in Kubernetes the same way it was outside. In other words, create a single cluster hosting all the existing databases and roles.

Our recommendation though is to take the opportunity to move to a microservice architecture for the database too. This means that each database that exists in the PostgreSQL 10 cluster can be moved, one at a time, into a separate cluster managed by CloudNativePG inside Kubernetes (more on this in the “How many databases should be hosted in a single PostgreSQL instance?” FAQ).

More importantly, especially in the case of a major upgrade of PostgreSQL, you will need to perform this migration/import in a few stages. It is always recommended to run a first import and take the opportunity to:

  • Test the process
  • Test the database with your applications
  • Measure the time of export/import

Based on my experience, you will find yourself repeating this process a few times, until you master it, and you have built solid data about the amount of time required to import all objects.


Configuring the Postgres cluster to import data

Let’s suppose that your PostgreSQL 10 cluster is available on the host “gabriele.XXXXXXXX.eu-central-1.rds.amazonaws.com” (this could be anything, including an IP address), and you want to import an existing database called “freddie” as a separate 3 node CloudNativePG “Cluster”, with 4GB of RAM, 8 cores, and 50GB of storage using the default storage class. It also configures Postgres for 1GB of shared buffers, data checksums, and sets the size of a WAL segment to 32MB.

Given that CloudNativePG is designed to be fully declarative, the following YAML file called “freddie.yaml” will take care for everything we just described above:

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: freddie
spec:
  # One primary and two standby servers
  instances: 3
  # Sets Postgres shared buffers to 1GB
  postgresql:
    parameters:
      shared_buffers: "1GB"

  # Sets the resources for Guaranteed QoS
  resources:
    requests:
      memory: "4Gi"
      cpu: 8
    limits:
      memory: "4Gi"
      cpu: 8

  # Sets the 50GB storage for PGDATA
  # This volume will also be used by the import
  # process to temporarily store the custom format
  # dump coming from the source database
  storage:
    size: 50Gi

  # Initializes the cluster from scratch (initdb bootstrap)
  bootstrap:
    initdb:
      # Enables data checksums
      dataChecksums: true
      # Sets WAL segment size to 32MB
      walSegmentSize: 32
      # Defines the import strategy, using the microservice
      # method to import a single database, called freddie,
      # from the source external cluster called cluster-pg10 
      import:
        type: microservice
        databases:
          - freddie
        source:
          externalCluster: cluster-pg10

  # Defines the cluster-pg10 external cluster
  # by providing information on how to connect to the Postgres
  # instance, including user and password (contained in a
  # separate secret that you need to create).
  externalClusters:
    - name: cluster-pg10
      connectionParameters:
        host: gabriele.XXXXXXXX.eu-central-1.rds.amazonaws.com
          user: postgres
        dbname: postgres
      password:
        name: cluster-pg10-superuser
        key: password

# Secret containing the password for the Postgres user in RDS
apiVersion: v1
stringData:
  username: postgres
  password: MYPASSWORD
kind: Secret
metadata:
  name: cluster-pg10-superuser
type: kubernetes.io/basic-auth

---

IMPORTANT: You need to make sure that you provide the right credentials for the new cluster to access the external Postgres instance via the network from inside Kubernetes. Connecting to the source cluster goes beyond the scope of this article.

Another important aspect to consider is that, at the moment, CloudNativePG uses the same volume where the PGDATA resides to temporarily store the dump file generated by pg_dump. This means that you need to properly size the storage section in the “freddie.yaml” file (we are thinking of improving this by adding in the future the possibility to specify a temporary volume for the export).


Creating the new Postgres cluster with imported data

Once configuration is ready, all you have to do is run “kubectl apply -f freddie.yaml” to start the initialization of the new cluster.

It is important to note that when you run this process, the effects on the current source database will be the same as any logical backup taken with “pg_dump”: nothing more than an increase in I/O and the possible creation of long running transactions - so dear to VACUUM!

Anyway, once you run the above command, a new job called “freddie-1-import-XXXXX” will be started with the following duties:

  • Initialize the new instance by running PostgreSQL’s “initdb” command on the freddie-1 PVC
  • Start the “microservice” import phase to export the “freddie” database from the cluster-pg10 PostgreSQL instance

Just plug yourself to the job using “kubectl logs” for details. While the first task is a standard initdb bootstrap, the second one represents the main feature included in version 1.16. Under the hood, CloudNativePG invokes “pg_dump” to export the “freddie” database in the custom format (“-Fc” option of pg_dump) inside the “/var/lib/postgresql/data/pgdata/dumps” folder of the volume, within PGDATA. This is the relevant excerpt from the job log:

{"level":"info","ts":1658225066.0524056,"msg":"starting microservice clone process","logging_pod":"freddie-1"}
{"level":"info","ts":1658225066.0525804,"msg":"exporting database","logging_pod":"freddie-1","databaseName":"freddie"}
{"level":"info","ts":1658225066.0526383,"msg":"Running pg_dump","logging_pod":"freddie-1","cmd":"pg_dump","options":["-Fc","-f","/var/lib/postgresql/data/pgdata/dumps/freddie.dump","-d","dbname='postgres' host='gabriele.XXXXXXXX.eu-central-1.rds.amazonaws.com' user='postgres' dbname=freddie","-v"]}

Then, it connects to the application database (default “app”), with the application user (default “app”) in the new “freddie” cluster to run “pg_restore”, import data from the dump (pre-data, data, and post-data), and complete the process by running “ANALYZE”. The JSON log shows the following entries:

{"level":"info","ts":1658225073.7432182,"msg":"temporarily granting superuser permission to owner user","logging_pod":"freddie-1","owner":"app"}
{"level":"info","ts":1658225073.7466717,"msg":"executing database importing section","logging_pod":"freddie-1","databaseName":"freddie","section":"pre-data"}
{"level":"info","ts":1658225073.7467177,"msg":"Running pg_restore","logging_pod":"freddie-1","cmd":"pg_restore","options":["--no-owner","--no-privileges","--role=app","-d","app","--section","pre-data","/var/lib/postgresql/data/pgdata/dumps/freddie.dump"]}
{"level":"info","ts":1658225073.7845924,"msg":"executing database importing section","logging_pod":"freddie-1","databaseName":"freddie","section":"data"}
{"level":"info","ts":1658225073.7846496,"msg":"Running pg_restore","logging_pod":"freddie-1","cmd":"pg_restore","options":["--no-owner","--no-privileges","--role=app","-d","app","--section","data","/var/lib/postgresql/data/pgdata/dumps/freddie.dump"]}
{"level":"info","ts":1658225074.0922115,"msg":"executing database importing section","logging_pod":"freddie-1","databaseName":"freddie","section":"post-data"}
{"level":"info","ts":1658225074.092287,"msg":"Running pg_restore","logging_pod":"freddie-1","cmd":"pg_restore","options":["--no-owner","--no-privileges","--role=app","-d","app","--section","post-data","/var/lib/postgresql/data/pgdata/dumps/freddie.dump"]}
{"level":"info","ts":1658225074.2391593,"msg":"removing superuser permission from owner user","logging_pod":"freddie-1","owner":"app"}
{"level":"info","ts":1658225074.2424512,"msg":"running analyze for database: app","logging_pod":"freddie-1"}

As you can see, the major difference here is that in the original cluster, the database was called “freddie” and owned by “freddie”. The “microservice” import remaps this into the default application user and database (both called “app”) with the idea to reserve the entire PostgreSQL cluster to one single database. Hence why we call the cluster “freddie”.


Verifying the import

It’s now time to connect to the new database and verify that everything works as expected, by connecting to the database and running queries. The best way to do this is to connect using the “freddie-rw” service that CloudNativePG automatically provides. As a quick workaround though, you can connect using “psql” inside the pod, with:

k exec -ti freddie-1 -- psql app


There’s more …

If your application requires different database names or users that own the database, you can simply change them in the configuration file. However, there shouldn’t be the need for that as the identifier for your applications will be the cluster name and, consequently, the service name (which is built from the cluster name, adding the “-rw” suffix for the primary, or the “-ro” suffix for the read replicas).

You can customize your instance and database if needed, by looking for example at:

  • bootstrap.initdb.postInitSQL: to create additional roles for your applications or assign permissions globally
  • bootstrap.initdb.import.postImportApplicationSQL: to execute GRANT operations to schemas or tables after the import operation

The microservice method is our recommended approach for importing databases in a microservice environment. However, CloudNativePG provides another method, called “monolith” that enables importing multiple databases and/or roles in Kubernetes. This will be covered in the next article!

Please spend some time reading the “Importing Postgres Databases” section from the official documentation if you are interested in more information. I encourage you to try and import databases by yourself following the instructions you find here, and report any problems you experience by opening issues or starting new discussions in the Github project.

CloudNativePG is available for Kubernetes 1.22, 1.23, and 1.24 and is supported by the CloudNativePG community for PostgreSQL 10, 11, 12, 13, and 14.


Get support from EDB!

We at EDB are proud to be the company that originally created CloudNativePG and open sourced it with the multi-year goal to become a CNCF project. We are a Kubernetes Certified Service Provider (KCSP) and an active contributor of the Data on Kubernetes Community.

We provide professional support on the entire open source stack (Kubernetes, CloudNativePG, and PostgreSQL) through the Community 360 subscription plan.

Additionally, we distribute EDB Postgres for Kubernetes, an operator that is based on CloudNativePG and that introduces support for EDB Postgres Advanced to facilitate migrations from Oracle through its compatibility layer. EDB Postgres for Kubernetes is also available on the Red Hat OpenShift platform as a certified operator.
 

Share this