What is Database as a Service (DBaaS)?
A recent IDC survey revealed that 63% of enterprises are actively moving their databases to the cloud, with another 29% considering it within the next three years. Database as a Service (DBaaS) is emerging as a popular solution for this cloud migration. In 2022, an EDB survey found that 50% of participants planned to use a DBaaS for their Postgres cloud migration; 39% were looking into containers and Kubernetes, and 11% aimed to migrate to virtual machines. This article will focus on DBaaS as a leading cloud migration option.
DBaaS is a cloud-based service model that enables users and organizations to create, manage, maintain, and query databases in the cloud. It eliminates the need for physical hardware and software installations and minimizes operational tasks like monitoring, backup, recovery, and patching. These tasks are handled by the Cloud Database Operator (CDO), who also manages the hardware and software and ensures availability in line with a committed service level agreement (SLA).
With DBaaS, users can concentrate on their business priorities and optimize their databases for applications, instead of spending resources on managing and monitoring tasks, high availability, backups, security and compliance, OS and database patching, or network configuration.
DBaaS is also known as Database Platform as a Service (DBPaaS) or fully managed database. Its adoption is growing rapidly, with all major cloud platforms offering DBaaS solutions, including AWS RDS, Azure Database, and Google Cloud SQL. Specialized vendors like MongoDB's Atlas, EDB's Postgres® AI Cloud Service and Crunchy Data's Crunchy Bridge also provide advanced DBaaS options.
While DBaaS focuses on databases, understanding the “as a Service” aspect is crucial. The National Institute of Standards and Technology (NIST) outlines five essential characteristics of cloud computing:
- On-demand self-service: Automated provisioning without human intervention
- Broad network access: Access from anywhere across various platforms
- Resource pooling: Multi-tenant use of hardware and software (with boundaries) for greater flexibility and resource utilization vs. idle servers sitting in a datacenter
- Rapid elasticity: Seamless and rapid scaling and releasing of provisioned resources based on demand
- Measured service: Monitoring, controlling, and reporting resource utilization for transparency and billing based on usage
These features suit the needs of most businesses considering cloud migration:
- Focus on value-added activities rather than basic maintenance tasks
- Support for an agile model with quick resource availability
- Pay for what you need
How Does Database as a Service Work?
Understanding how a Database as a Service (DBaaS) works involves two key components:
- Shared Responsibility Model: Defines the responsibilities of both the Cloud Database Operator (CDO) and the user.
- Control Plane: Describes how the CDO’s management tools interact with the user’s databases to ensure they are available, patched, backed up, and monitored.
The Shared Responsibility Model
In the shared responsibility model, both the CDO and the user share the tasks of deploying, configuring, and maintaining the DBaaS. The cloud provider handles various maintenance activities like database patching, hardware upgrades, ensuring high availability, and managing backups, while the consumer/end-user focuses on query performance, password management, and selecting appropriate resources for their workload. Although the exact responsibilities can vary between different cloud providers, the idea is that both parties share the responsibility to make the DBaaS work as intended.
For example, in most DBaaS setups, the provider takes care of hardware maintenance, networking, OS installation and patching, database software installation and updates, monitoring, and availability of network access (ping), performance (pipe) and power. Additionally, they ensure compliance with key standards like SOC 1 and 2, PCI, HIPAA, and FedRAMP. Meanwhile, users are responsible for creating and maintaining databases, tables, queries, indexes, stored procedures, and other data-related tasks.
The Control Plane
The control plane serves as a bridge between the user's and the CDO's domains. It allows the CDO's Site Reliability Engineers (SREs) to manage the user's databases and underlying cloud resources, such as compute, storage, and networking. It also securely runs commands that users enter through the DBaaS GUI or API.
The Control Plane of EDB’s Postgres® AI Cloud Service DBaaS uses EDB’s Kubernetes operators
The control plane is essential to the DBaaS operation, as it ensures the implementation of the shared responsibility model and supports essential “as a Service” characteristics (on-demand, rapid elasticity, pooled resources, and metering).
How is Database as a Service Different from On-Premises Database Management Systems?
On-premises database systems are typically installed on virtual machines or physical servers. These setups often use Infrastructure as Code (IaC) tools like Ansible, Terraform, Chef, or Puppet for installation and configuration. While these tools expedite the initial setup, they do not monitor the hardware or software or automatically address issues – capabilities that DBaaS provides to relieve users of operational burdens. On-premises systems also usually have limited resource pools, resulting in limited elasticity.
IaC approaches fall short of the fundamental reasons businesses switch to DBaaS: agility, innovation, data center closures, global markets, and focusing on value-added activities.
Key reasons businesses move to cloud-based DBaaS
- Agility: On-premises systems aren't truly agile. While IaC can speed up deployment, users still need to set up compute, storage, and networking first, which limits self-service and scalability. These setups also rarely support consumption-based licensing.
- Innovation: Cloud DBaaS lets teams quickly test and adopt specialized technologies for projects, enabling a "fail fast" approach. On-premises experimentation, by contrast, requires significant investments in licenses, hardware, and operational skills, making it tough to experiment quickly.
- Globalization: To meet performance and compliance requirements, data must be stored close to users. On-premises solutions for this are costly and complex, and cannot efficiently support globalization demands.
- Focus on value-added activities: On-premises deployments can automate many tasks using reference architectures and IaC, but in-house teams are still needed for monitoring, managing, maintaining, and updating the systems. Enterprises prefer their staff to focus on innovation and business value, like data analysis, instead of routine maintenance.
Is DBaaS Considered to be SaaS, PaaS, or IaaS?
In this section, we will compare DBaaS to Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS). According to NIST SP 800-145, these models are defined as follows:
- Software as a Service (SaaS): Allows users to access applications hosted by a service provider on the cloud. They can use these applications through either a thin client interface, such as a web browser (e.g., web-based email), or a program interface, but do not manage the cloud infrastructure, network, servers, operating systems, or storage, except for some application configuration settings.
- Platform as a Service (PaaS): Enables users to deploy their own applications onto the cloud infrastructure using the provider’s programming languages, libraries, services, and tools. They have control over their applications and can adjust settings for the application-hosting environment, but do not manage the underlying cloud infrastructure, network, servers, operating systems, or storage.
- Infrastructure as a Service (IaaS): Users can access fundamental computing resources like processing, storage, and networks. They can deploy and run various software, including operating systems and applications, and have control over these installations. However, they don't manage the underlying infrastructure but may have limited control over networking components like host firewalls.
Databases, especially with SQL, are often considered a type of PaaS. While some features, like GUI-based access, might fit into SaaS, overall, DBaaS is a good match for the PaaS category.
Comparing Virtual Machines, Kubernetes, and DBaaS
There are three main ways to migrate a database to the public cloud:
1. Virtual Machines: Can be combined with Infrastructure as Code (IaC) solutions like Ansible. Examples include:
- Postgres with RepMgr on AWS Elastic Compute Cloud (EC2)
- EDB Postgres Advanced Server with EDB Failover Manager
- Postgres Enterprise Manager on Google Compute Engine
2. Containers Orchestrated with Kubernetes: This approach includes:
- Postgres with a Patroni-based operator on Amazon Elastic Kubernetes Service (EKS)
- EDB Cloud Native Postgres on Azure Kubernetes Service (AKS)
3. Database as a Service (DBaaS): This features:
- Postgres® AI Cloud Service by EDB
- AWS Relational Database Service (RDS) for Postgres
By evaluating these options, you can choose the best fit for your cloud database needs.
Three options to take a database to the cloud: Virtual machines, containers or DBaaS
A recent LinkedIn survey found that Database as a Service (DBaaS) is the most popular choice for cloud migration. However, using Virtual Machines (VMs) on IaaS and Containers with Kubernetes (K8s) are also doing well in the market.
Q: How are you moving your databases to the cloud?
| Answer | % of responses |
| --- | --- |
| Virtual Machines on IaaS | 10% |
| K8s and Containers | 35% |
| Database as a Service | 55% |
While DBaaS is the most popular option, we want to clarify how well each of these services meets the “as a Service” requirements mentioned earlier. Here’s a breakdown:
Only DBaaS truly meets the requirements of a cloud service. While K8s is better than VMs in terms of “as a Service” requirements, it still lacks the central control plane that is key to implementing the shared responsibility model.
However, there are other trade-offs to consider:
| | Advantages | Things to Consider |
| --- | --- | --- |
| VMs on IaaS | | |
| K8s on Containers | | |
| Managed Database Service | | |
What Are the Advantages and Disadvantages of DBaaS?
In a DBaaS, the CDO takes on key responsibilities, including the management of software, hardware, and network resources, ensuring service availability.
This has an advantage and a drawback:
- Advantage: DBaaS customers don’t have to worry about managing these tasks.
- Drawback: Customers become completely dependent on the CDO to perform these tasks in a timely manner. Timeliness is crucial to avoid interruptions in database service, but it is not always delivered. For example, at the time of writing, one of the leading DBaaS providers for Postgres was over two years behind on delivering the latest version of open-source Postgres.
Another challenge arises from the segregation of duties:
- Since the CDO is responsible for the maintenance of the operating system and the installation of database software, users will not be granted operating system-level access. This restriction means that certain configuration parameters, logs, and software add-ons will be inaccessible for tuning and diagnostics.
Understanding these advantages and disadvantages is crucial for organizations considering a shift to a DBaaS model, as the right choice can significantly affect their operational efficiency and overall performance.
What Are the DBaaS Service Variations?
There are two major DBaaS variations: those that operate within a customer’s cloud account and those that run in the service provider's account (CDO’s account).
1. Customer Account DBaaS: Examples include AWS RDS, Google Cloud SQL, and EDB’s Postgres® AI Cloud Service. In this model, the database runs in the customer’s own cloud account. This allows customers to monitor how much they are spending on DBaaS resources and helps them manage their budget more effectively, especially when negotiating volume purchases with cloud providers.
2. CDO Account DBaaS: An example is Mongo Atlas, which operates from the service provider's account. When the database runs in the CDO’s account, the management of resources is handled by the provider, which can simplify operations but may limit the customer's visibility over usage.
Some services, like Couchbase Capella and ScyllaDB, offer both options, giving consumers the flexibility to choose based on their needs. Many organizations prefer keeping their databases and data within their own cloud accounts to easily meet compliance requirements.
What is PostgreSQL as a Service?
PostgreSQL as a Service (PGaaS) is a specific form of Database as a Service (DBaaS) that enables users to easily create, manage, and use Postgres databases in the cloud. Various cloud service providers offer PGaaS options, including AWS with RDS for Postgres, Microsoft's Azure Database for Postgres, and Google Cloud's Postgres service.
Specialty vendors like EDB and Crunchy Data also provide Postgres DBaaS solutions on public clouds. These vendors bring extensive PostgreSQL expertise to enhance the service quality.
One notable PGaaS offering is EDB's Postgres® AI Cloud Service, which combines EDB's PostgreSQL knowledge with the native Oracle compatibility of EDB's Postgres Advanced Server. This service ensures continuous availability through EDB's Cloud Native Postgres Operator. Postgres® AI Cloud Service is available on Microsoft Azure, AWS, and Google Cloud Platform.
Common Risks When Moving to the Cloud
This section covers the potential risks customers might face when migrating their database to a cloud-based database service (DBaaS), particularly if it uses open-source software like Apache Cassandra, MariaDB, MySQL, Postgres, or Redis.
We categorize these risks into five areas: support, service, technology stagnation, cost, and lock-in. If businesses move to the cloud without careful planning and risk management, they could experience high costs and project delays. Most importantly, they might not achieve the expected benefits from cloud migration.
1. Support risk
Customers using software for production applications need reliable support, whether their systems are in the cloud or on premises. Enterprise software support should focus on two main areas: providing expert guidance on proper product use, especially in challenging situations, and quickly addressing bugs that affect production.
For commercial software, basic support is usually included with the license. However, many open-source database providers may not invest enough in the community to fix bugs or offer support for open-source databases when operating a DBaaS.
To assess a cloud database provider's support capabilities during a cloud migration, customers can review the release notes of open-source software and look for team members who contribute to the project. For instance, PostgreSQL's release notes are freely available and list everyone who has added new features or fixed bugs. Other open source communities have similar practices.
If open source cloud database providers are not actively involved in development and bug fixing, they may struggle to offer the necessary support – both advice and rapid response to problems – putting cloud migration at risk.
2. Service risk
Databases can be complex software systems and many users require expert advice and hands-on help to set them up for optimal performance and high availability. This need is especially critical when transitioning from on-premises setups to the cloud.
Cloud database providers that don't offer consultative and expert services can introduce risks into this process. They often leave customers to manage coordination between the DBaaS provider and any additional professional services needed, making it harder to achieve a smooth deployment. Instead of having one point of contact for support, customers may find themselves caught in the middle, dealing with multiple vendors and potential issues.
To minimize these risks, customers should be clear about who is responsible for the success of their deployment and ensure that this party can effectively manage the entire project. EDB’s DBA Services have proven to be a great resource, providing an integrated end-to-end solution that ensures very high availability.
3. Technology stagnation risk
The shared responsibility model is a key component of a DBaaS. While the users handle schema definitions and query tuning, the cloud database providers take care of minor version updates and major upgrades. However, not all providers are prompt with upgrades; some can fall years behind. For example, one major Postgres DBaaS provider was nearly three years behind the open-source community in updating to the latest Postgres versions. Delayed updates can mean customers miss out on new database capabilities for an extended period, so it’s important for users to check a provider’s historical track record of applying updates.
Another risk arises when a proprietary cloud database provider tries to create their own version of well-known open-source software. This might be done to optimize the software for cloud use or to address license restrictions. But these forked versions can deviate significantly from the better-known parent or fall behind the open source version. Examples include Aurora Postgres (a variant of Postgres), Amazon DocumentDB (compatible with MongoDB), and Amazon OpenSearch Service (based on Elasticsearch).
Users need to be careful when adopting cloud-specific versions or forks of open source software, as features may deviate over time, and the cloud database provider may not keep up with the latest capabilities of the open source version.
4. Cost risk
Leading cloud database services have yet to experience meaningful direct price hikes. However, there is growing awareness that the nature of cloud services can drive significant cost risk, especially when self-service and rapid elasticity are combined with an unclear cost model. In on-premises environments, database administrators (DBAs) and developers must optimize their code to get the best performance from the hardware they have. In the cloud, it's often easier to simply ask the provider to boost input/output operations per second (IOPS), compute, or memory. While this quick fix can improve speed, it can also lead to higher costs in the long run.
To manage these cost risks, users can take two main approaches:
- Closely monitor the increases in IOPS, CPU, and memory to ensure they align with the costs of optimizing applications
- Carefully evaluate the pricing models of DBaaS with vendors to steer clear of those with complex and unpredictable pricing structures
5. Lock-in risk
Cloud database services can create a "Hotel California" effect, making it hard to move data out of the cloud once it's in. While data egress cost is often mentioned, general data gravity and integration of other cloud-specific tools for data management and analysis are more impactful. Essentially, once a business data set is on a cloud platform, it's likely that more applications will use that data there, making it harder to move it elsewhere without causing problems for the business.
Cloud-specific tools are also a meaningful driver for lock-in, as each cloud platform offers specialized tools for managing and analyzing data. While they help derive business value quickly, they can also create lock-in.
To avoid being locked into a specific cloud provider, users should be cautious about relying on proprietary tools and should choose DBaaS solutions that allow for efficient data replication to other cloud services.
6. Planning for risk
Many organizations aim to move databases to the cloud, but this approach comes with risks. Businesses should thoroughly assess the potential weaknesses of cloud database providers in areas like support, services, technology stagnation, costs, and lock-in. While these risks shouldn’t deter organizations from moving to the cloud, it's crucial to tackle them upfront and integrate them into a well-planned cloud migration strategy.
EDB's Postgres® AI Cloud Service is designed to tackle these concerns, backed by top contributors to the Postgres project, a dedicated services team, and a commitment to maintaining the latest Postgres versions using a clear cost structure.
For further insights on challenges related to PostgreSQL in a cloud environment, check out EDB's blog post on 5 Common PostgreSQL Challenges in the Cloud.
DBaaS Tools
In this section, we will explore development tools, APIs and CLIs, CI/CD tools, and Infrastructure as Code environments.
Developmental tools
Postgres comes with a handy command line client called psql, which is used by many developers. This client operates in both interactive and batch modes, allowing users to run complex scripted operations. It is often the preferred tool for experienced PostgreSQL DBAs and works effectively with Postgres in the cloud.
pgAdmin’s ERD diagramming Tool
The most popular GUI-based development tool for PostgreSQL is pgAdmin. Individual developers can use it as a standalone tool or configure it in server mode for team collaboration across multiple database servers. The pgAdmin environment facilitates the browsing of database objects, configuration management, query development, ERD diagramming, and schema diffing to identify necessary patches for production deployment. Additionally, pgAdmin seamlessly connects to all major Postgres DBaaS platforms.
For developers moving from Oracle, Toad Edge is another familiar option. While there are other tools for Postgres, pgAdmin is the top choice among the Postgres development community.
For a complete list of tools that work with Postgres, check out The Expert’s Guide to Integrating PostgreSQL.
Database compatibility
Not every cloud migration is a simple lift-and-shift. Often, the database platform changes as well. In EDB’s experience, Oracle is the most common source when moving to Postgres in the cloud. It's crucial to understand how to migrate the schema, data, stored procedures, and APIs to a DBaaS.
EDB Postgres Advanced Server was specifically designed to facilitate migration from Oracle and is available on Postgres® AI Cloud Service as one of its Postgres distribution options.
For a clear, step-by-step guide on the migration process and the tools available for Oracle migration, please refer to our whitepaper.
APIs and CLIs
When deploying DBaaS at scale, especially in DevOps or CI/CD settings where app and database changes must be synchronized, managing databases using an API (application programming interface) or CLI (command line interface) is key. Relying on manual deployments is not only slow and unscalable but also prone to errors.
Cloud providers such as AWS (for RDS) offer their own APIs and CLIs. Knowing how to use these tools effectively, including programming with them in languages like Python or Go, is vital for successfully managing DBaaS at scale.
The following description is based on a series of blogs by Doug Ortiz about the use of APIs and CLIs to manage Postgres databases on Postgres® AI Cloud Service.
Postgres® AI Cloud Service API cluster management
Postgres® AI Cloud Service APIs provide a streamlined way to manage various components and resources, including billing, cloud providers, clusters, events, organizations, permissions, Postgres types, Postgres versions, provider, roles, status, users, and authentication.
A Postgres® AI Cloud Service cluster consists of databases, users, roles, and tablespaces, all managed by a group of nodes designed to run Postgres/EDB Postgres Advanced Server. The key API for managing these clusters is called CLUSTERS. This API enables you to easily create, list, and delete clusters – streamlining your management tasks.
Authentication
Before we get started with cluster management in Postgres® AI Cloud Service, we must provide our credentials and authenticate against it. We will focus on using the get-token.sh script available in the Postgres® AI Cloud Service Cloud Utilities GitHub repository.
This method will consist of the following steps:
1. Download and execute get-token.sh
2. Authenticate with the URL provided by the get-token.sh script
- Navigate to the indicated URL in a browser
- Click the “Confirm” button for the indicated device code
- Confirm the successful login
- Retrieve the provided token after the successful login
3. Assign the authentication token to a variable for use across the command line
TOKEN='<token>'
Creating clusters
The creation of a cluster requires multiple parameters to be sent via the command line. To avoid typing all of those parameters each time, create a data.json file with the following content and pass it to the clusters API.
{ "clusterName": "do-cluster-1", "instanceType": { "cpu": 2, "id": "azure:Standard_E2s_v3", "instanceType": "E2s v3", "ram": 16 }, "password": "SuperSecretPassword", "pgConfigMap": [ [ "max_connections", "100" ] ], "postgresType": { "dockerImage": "", "id": "epas" }, "postgresVersion": "13", "privateNetworking": true, "provider": { "cloudProviderId": "azure" }, "region": { "regionId": "westus2" } }
Once data.json is created, we can refer to it in the command line.
The command line statement below will create a cluster.
curl -X 'POST' 'https://portal.biganimal.com/api/v1/clusters' -H "Authorization: Bearer ${TOKEN}" -H 'accept: application/json' -H 'Content-Type: application/json' -d '@data.json'
A successful cluster creation will return a json array with cluster details.
[{"name":"do-cluster-1","instance":3,"currentPrimary":"p-c6b6543ehcerrn8s6rg-1", "ehcerrn8s6rg-r","phase":"Creating a new replica","phaseReason","Creating replica",0dc168873bea9bddacc45e63fdea17bdc28a7c8b450","postgresConfig":"","maintenanceWi
Listing clusters
Listing clusters can be accomplished by using various parameters defined in the data.json file used for cluster creation. These parameters enable result filtering, with examples including name, provider, pgtype, and pgVersion.
Below is the command used to list clusters based on name, provider, pgtype, and pgVersion, along with the parameter used for sorting.
curl -X 'GET' 'https://portal.biganimal.com/api/v1/clusters?name=DO&provider=azure&pgType=epas&pgVersion=13&sort=%2Bname' -H 'accept: application/json' -H "Authorization: Bearer ${TOKEN}"
Deleting clusters
Deleting clusters is accomplished by providing the pgID, which can be retrieved by listing the cluster details and seeking the “pgId” value.
curl -X 'DELETE' 'https://portal.biganimal.com/api/v1/clusters/<pgId>' -H 'accept: */*' -H "Authorization: Bearer ${TOKEN}"
Continuous integration/continuous delivery
Continuous Integration and Continuous Delivery (CI/CD) are essential for agile development. Two top tools for managing CI/CD with Postgres are Flyway and Liquibase. These tools can help manage database schema and reference data changes, promoting these changes from development to production through various stages in the release pipeline. They also offer features like dry-runs and rollbacks to ensure everything runs smoothly.
The following example shows the use of Liquibase with EDB’s Postgres® AI Cloud Service DBaaS.
Creating a project in Liquibase hub
In this example, we'll create a demo Liquibase project to track all data changes using the registerChangeLog subcommand. The first step is to set up the Liquibase project:
- Navigate and log in to Liquibase Hub (a registered account is required)
- Click the Projects link
- Click the Create Project link
- Type the project Name and Description
- Click the Create Project button
Downloading your Liquibase jar file
- Go to the Liquibase download page
- Review the system requirements
- Choose the desired method of installation
- Download the file that matches your environment and specific use case
Creating your liquibase.properties file
A liquibase.properties file contains the information needed to authenticate and connect using the JDBC driver. Before filling out this file, it is recommended to have all the required details ready.
If the driver jar file is not located alongside the liquibase.properties file, the classpath entry should contain the full path to its location.
changeLogFile: dbchangelog.xml
url: jdbc:postgresql://<db_url>:<port>/<database>
username: <username>
password: <password>
classpath: postgresql-42.3.0.jar
liquibaseProLicenseKey: <licensekey>
Generating changelog
The subcommand to generate the changelog file is:
liquibase generateChangeLog
The generateChangeLog subcommand creates a script file that lists all objects in the database as a changeset (a group of changes) when you run it. This file is useful for fully rolling back the database in case of a major failure.
When the generateChangeLog runs successfully, it creates a file named according to the changeLogFile parameter in 'liquibase.properties' and shows the following output in the command line:
Liquibase command 'generateChangeLog' was executed successfully.
Creating the changeset file
Create a template file to outline all the data changes to be applied to the database. Remember, with Liquibase Pro Edition, you can roll back changes for each changeset, as shown in the example format. Each data change should follow this structure:
<changeSet author="<user> (generated)" id="<id>-<numberofchange>">
<insert catalogName="<databasename>" dbms="postgresql" schemaName="public" tableName="departments">
<column name="deptno" value="40"/>
<column name="dname" value="Sales"/>
<column name="location" value="USA"/>
</insert>
<rollback> DELETE FROM public.departments WHERE deptno=40; </rollback>
</changeSet>
ChangeLog files are supported in multiple formats, including SQL, XML, JSON, and YAML.
Update dbchangelog.xml file
Before you run the registerChangeLog subcommand, make sure to complete the following steps to update dbchangelog.xml:
- Create a copy of the generated dbchangelog.xml file
- Remove all of the <changeSet> tags except the first one
- Copy all of the data changes to be applied after the first <changeSet> tag
- Copy the ID generated in the original dbchangelog.xml
- Apply that ID followed by a dash and the number of the data change, then remove the first <changeSet> tag
<id>-<numberofchange>
Registering the changelog
The Liquibase subcommand to register the changeLog is:
liquibase registerChangeLog
A prompt will appear listing the Liquibase projects for you to choose from, allowing you to select the desired project to attach.
Upon successful selection, a confirmation message will appear in the terminal output:
Liquibase command 'registerChangeLog' was executed successfully.
Pushing changes to the DB
At this point, we are ready to push our data changes to the database with the Liquibase subcommand:
liquibase update
A successful update will look like this:
Liquibase command 'update' was executed successfully.
Rolling back changes to the DB
Rolling back specific changes is only available in the Liquibase Pro version. To roll back the most recent changes, specify the number of changesets to undo, as shown here:
liquibase rollbackCount <count>
A successful rollbackCount message will be displayed:
Liquibase command 'rollbackCount' was executed successfully.
Infrastructure as Code
Infrastructure as Code (IaC) is a way to manage and provision data centers using machine-readable definition files, rather than physical hardware configuration or interactive configuration tools. In a DBaaS environment, the CDO provides compute, storage, and networking, but this infrastructure still needs to be connected and configured to form a complete data management solution. Various IaC tools, such as Ansible and Terraform, work well with most DBaaS environments. For example, you can find AWS Terraform modules on Terraform.io, and Ansible collections for RDS are available on the Ansible Galaxy website.
Security and Compliance
Securing Postgres in the cloud
Securing Postgres in the cloud, especially with DBaaS, is much simpler than securing it on premises or in an IaaS environment, because the CDO takes care of things like physical access control, on-disk encryption, network access, and identity management. Check out Dave Page's blog post on best practices and tips for security hardening, How to Secure PostgreSQL: Security Hardening Best Practices and Tips, to learn more about securing Postgres for on-premises or IaaS deployments.
When setting up any security configuration, it's important to follow the principle of least privilege. This means giving users only the access they need to make the system work, and nothing extra.
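As a minimal illustration of least privilege, the grants below give a hypothetical reporting role read-only access to a single table and nothing more (the role, database, and table names are examples, not part of any particular DBaaS setup):

CREATE ROLE reporting_app LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE salesdb TO reporting_app;
GRANT USAGE ON SCHEMA public TO reporting_app;
GRANT SELECT ON public.orders TO reporting_app;
-- No INSERT, UPDATE, DELETE, or DDL rights are granted; widen access only when a concrete need arises.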
Connecting
There are two main ways to connect to a Postgres server: using a Unix Domain Socket or a TCP/IP Socket. In a DBaaS environment, only TCP/IP connections are allowed.
In a DBaaS setup, the CDO does not give access to the Postgres host-based access configuration file (pg_hba.conf). Instead, the CDO uses IP address filtering to restrict network access from public networks and utilizes Virtual Private Clouds (VPCs) to keep the database isolated from any public internet access. Most cloud providers suggest using built-in firewalls, which simplify management by allowing users to create reusable rule sets for multiple servers. These firewalls can be managed via web interfaces, command line tools, and REST APIs.
Many cloud providers also provide Virtual Private Clouds (VPCs), where multiple servers can exist within a private network. This configuration is often the default choice, making it easier to deploy multi-tiered systems on the public cloud while keeping non-public tiers separate from the internet. Using multiple subnets in a VPC allows for further segregation, such as placing public servers in a "DMZ" subnet and limiting access to database servers located in a private subnet without direct internet connection.
CDOs also offer a range of approaches to establish secure and encrypted connections. For example, Postgres® AI Cloud Service offers guidelines for securely connecting to your database cluster.
Transport Encryption
If traffic is moving to the database server over the network, it's important to encrypt that traffic for security. Postgres uses OpenSSL to secure connections with TLS (previously known as SSL).
While on-premises deployments require Postgres configuration to encrypt the connection, DBaaS providers make it much easier by offering predefined Certificate Authorities (CAs) and step-by-step guidance for securely connecting to the database cluster.
Authentication
In on-premises or IaaS deployments of Postgres, the pg_hba.conf file is used to select authentication methods (like trust, peer, md5, scram, LDAP, Kerberos, and TLS) and set parameters like the authentication timeout period. However, this is simplified in a DBaaS. The CDO provides integration with the cloud's identity and access methods. For example, Postgres® AI Cloud Service leverages Azure Active Directory for single sign-on.
Roles
Very old – almost prehistoric – versions of PostgreSQL used separate users and user groups to organize accounts. Starting with PostgreSQL 8.1, this system was replaced by the SQL standard-compliant roles system.
A role can include other roles as members, and can itself be a member of other roles. This process is often called "granting" a role to another role. Roles come with various attributes, some of which allow them to function like user accounts for logging into the database server. Below is an example of how to grant a role to another role:
GRANT pg_monitor TO nagios;
This makes the nagios role a member of pg_monitor, giving it access to monitoring features that are otherwise restricted to superusers and pg_monitor members.
Role attributes
Roles have several established attributes that can be set:
- LOGIN: Can this role be used to log in to the database server?
- SUPERUSER: Is this role a superuser?
- CREATEDB: Can this role create databases?
- CREATEROLE: Can this role create new roles?
- REPLICATION: Can this role initiate streaming replication?
- PASSWORD: Does this role require a password?
- BYPASSRLS: Can this role bypass Row Level Security checks?
- VALID UNTIL: An optional timestamp after which the password will no longer be valid.
Roles with the SUPERUSER flag automatically bypass all permission checks, except for the right to log in.
Several less commonly used role attributes can also be configured. For further details, please refer to the Postgres documentation.
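For illustration, here is a hypothetical role showing how several of these attributes can be set at creation time (the role name and values are examples only):

CREATE ROLE reporting_user
    LOGIN                            -- may be used to log in to the server
    NOSUPERUSER NOCREATEDB NOCREATEROLE
    PASSWORD 'change-me'
    VALID UNTIL '2026-01-01';        -- password stops being valid after this timestamp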
Password complexity
PostgreSQL, unlike EDB Postgres Advanced Server, doesn’t enforce password complexity rules by default. It does offer a way to add a module for checking password complexity, but this won’t work if a user changes their password using a pre-hashed string.
Password profiles
EDB Postgres Advanced Server has a password profile feature that works with password authentication (note: don’t use this option as it sends passwords in plain text), md5, and scram-sha-256 methods set in pg_hba.conf. Superusers can set up password profiles and apply them to multiple roles. A profile allows you to define the following options:
- FAILED_LOGIN_ATTEMPTS: Refers to the number of failed login attempts allowed before the account is locked in the duration specified in the PASSWORD_LOCK_TIME parameter
- PASSWORD_LIFE_TIME: Number of days a password can be used before the user is required to change it
- PASSWORD_GRACE_TIME: The length of grace period after a password expires, during which the user can still connect but must change their expired password to execute any commands
- PASSWORD_REUSE_TIME: Number of days a user must wait before they can reuse a password
- PASSWORD_REUSE_MAX: Number of password changes that must happen before a password can be reused
If a user changes their password by providing a new one in a pre-hashed format, it's impossible to check its complexity using the PASSWORD_VERIFY_FUNCTION option or manage reuse with the PASSWORD_REUSE_MAX option. To address this, you can set the PASSWORD_ALLOW_HASHED option to false in the password profile.
If you're using EDB Postgres Advanced Server, consider implementing password profiles to help ensure your users keep strong and regularly updated passwords.
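As a rough sketch of what this looks like in EDB Postgres Advanced Server, a superuser could create a profile and attach it to a role along these lines (the profile and role names are hypothetical; check the EPAS documentation for the full syntax and default values):

CREATE PROFILE payments_profile LIMIT
    FAILED_LOGIN_ATTEMPTS 5
    PASSWORD_LOCK_TIME 1
    PASSWORD_LIFE_TIME 90
    PASSWORD_GRACE_TIME 7
    PASSWORD_REUSE_TIME 180
    PASSWORD_REUSE_MAX 5
    PASSWORD_ALLOW_HASHED FALSE;     -- reject pre-hashed passwords so complexity and reuse can be checked

ALTER ROLE payment_clerk PROFILE payments_profile;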
SET ROLE
The SET ROLE SQL command allows a user to change their session's user identifier to any role they belong to. This can be used to add or limit privileges for the session and can be reset with the RESET ROLE command, making SET ROLE less suitable for multi-tenancy solutions. It’s similar to using the command sudo su - <user> on a Unix-like system, allowing you to execute SQL commands as that other user.
By default, when a role is a member of another role, it automatically inherits the privileges of that role. To use SET ROLE effectively, it’s recommended to use the NOINHERIT keyword when creating the role, so that privileges must be explicitly gained through SET ROLE when needed.
There is also a SET SESSION AUTHORIZATION command, available only to superusers. The key difference is that SET ROLE changes the current_user value but not the session_user, while SET SESSION AUTHORIZATION changes both. This means that after executing SET SESSION AUTHORIZATION, any subsequent SET ROLE commands will only be permitted if the new session_user could perform them, regardless of the original session_user's superuser status. This feature helps superusers imitate other users more accurately.
Consider using SET ROLE to let users temporarily elevate their privileges only when necessary for potentially risky tasks.
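A brief sketch of this pattern, using hypothetical role and table names: the elevated role is created with NOINHERIT, so its privileges apply only after an explicit SET ROLE.

CREATE ROLE order_cleanup NOINHERIT;
GRANT DELETE ON orders TO order_cleanup;
GRANT order_cleanup TO alice;

-- alice has no DELETE privilege during normal work; she elevates only when needed:
SET ROLE order_cleanup;
DELETE FROM orders WHERE status = 'cancelled';
RESET ROLE;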
Monitoring roles
Postgres has several built-in monitoring roles that let you access features previously only available to superusers. These roles help you give specific privileges to users monitoring the system, without needing to give them full superuser access:
The pg_monitor role includes the following permissions:
- pg_read_all_settings: Allows reading all configuration variables, including those typically visible only to superusers.
- pg_read_all_stats: Grants access to all pg_stat_* views and various statistics-related extensions, again including those usually restricted to superusers.
- pg_stat_scan_tables: Enables execution of monitoring functions that may acquire ACCESS SHARE locks on tables, potentially for extended periods.
Use the pg_monitor role to provide elevated privileges to roles responsible for monitoring your database servers. This approach helps avoid the need for granting superuser access. Always ensure that implemented roles have only the minimum privileges needed to perform their functions.
Data Access Control
Access Control Lists (ACLs) are cryptic-looking strings attached to objects in Postgres, such as tables, functions, views, and columns. They list the privileges (like select, insert, execute) that different roles can use. Each privilege can have an optional flag (*) that shows whether a role can give that privilege to others. ACLs also include the name of the role that granted the privileges.
For example, an ACL for a table created by Joe might look like this:
joe=arwdDxt/joe =r/joe sales_team=arw/joe
The second section indicates that Joe has granted read access to PUBLIC (a special pseudo-role meaning everyone), while the third section shows that the Sales Team has been given INSERT, SELECT, and UPDATE privileges, also by Joe.
Privilege flags in ACLs can vary significantly depending on the type of object, so be sure to review the documentation for more details.
Understanding how ACLs are structured in Postgres is important, especially if you prefer using command line tools, which display them in an internal format. Graphical tools like pgAdmin will present the ACL in a more user-friendly visual format.
A well-designed system should use roles alongside ACLs to secure the schema and data in the database. It’s best practice for the schema (such as tables and other objects) to be owned by a non-superuser role, which the application shouldn't use to connect to the database or grant privileges to other roles. Create group roles that match the permissions needed within your application and grant those roles to login roles as necessary. Generally, it’s not advisable to grant privileges directly to login roles used by end users, as this can quickly become difficult to manage.
Take the time to fully understand the privileges necessary for your system so that users and applications can perform their tasks effectively. Limit privileges to only those required, separate schema ownership from data, and make use of group roles to simplify privilege management for individual login roles.
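The sketch below illustrates this separation with hypothetical names: a non-login owner role holds the schema, a group role carries the application privileges, and a login role simply joins the group.

CREATE ROLE app_owner NOLOGIN;                     -- owns schema and tables; never used to connect
CREATE ROLE app_readwrite NOLOGIN;                 -- group role matching application needs
CREATE ROLE web_user LOGIN PASSWORD 'change-me';   -- login role used by the application

CREATE SCHEMA app AUTHORIZATION app_owner;
GRANT USAGE ON SCHEMA app TO app_readwrite;
GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA app TO app_readwrite;
GRANT app_readwrite TO web_user;                   -- privileges flow via group membership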
GRANT and REVOKE
ACLs (Access Control Lists) in Postgres are managed using the GRANT and REVOKE SQL commands. When an object is created, typically only the owner has the rights to use or interact with it. However, there are exceptions: for instance, PUBLIC is granted EXECUTE permission on functions and procedures, CONNECT and TEMPORARY permissions on databases, and USAGE permission on languages, data types, and domains. These privileges can be revoked if needed.
The ability to modify or delete an object is always reserved for the owner and superusers. You can reassign object ownership using the ALTER SQL command.
Default privileges can be changed with the ALTER DEFAULT PRIVILEGES command for certain object types. This allows you to configure the system so that specific privileges are automatically granted to roles for any new objects created. For example, Joe could issue a command to grant the Sales Team insert, select, and update privileges on any new tables (but not on existing ones, which may need manual updates).
ALTER DEFAULT PRIVILEGES
GRANT INSERT, SELECT, UPDATE
ON TABLES
TO sales_team;
When a new object is created without the necessary privileges in its ACL, we can use the GRANT and REVOKE commands to configure the ACL as needed. For example, Joe could use the following SQL command to grant the Sales Team access to the orders table:
GRANT INSERT, SELECT, UPDATE ON orders TO sales_team;
To revoke automatically granted privileges or to adjust previously granted privileges to meet changing business needs, we can use the REVOKE SQL command. For example:
REVOKE UPDATE ON orders FROM sales_team;
Assuming the Sales Team previously had INSERT, SELECT, and UPDATE privileges, removing the UPDATE privilege will allow them to view and add orders, but not modify them.
It’s worth noting that using ACLs on columns can sometimes lock users out. This happens because a wildcard in a SELECT * FROM query includes all columns, even those the user doesn't have access to, leading to an access denied message. To avoid this, users should explicitly list the columns they are allowed to SELECT.
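For example, with a hypothetical column-level grant like the one below, a wildcard query fails while an explicit column list succeeds:

GRANT SELECT (id, product, quantity) ON orders TO sales_team;

-- SELECT * FROM orders;                        -- fails: the wildcard includes columns that were not granted
SELECT id, product, quantity FROM orders;       -- succeeds: only the granted columns are referenced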
Once group roles are created to organize logged-in users, use the GRANT and REVOKE SQL commands to assign the minimum necessary privileges to these group roles. Use default privileges where appropriate to save time but be careful not to grant excessive privileges in the future. Use GRANT to give privileges to the necessary login roles by making them members of the group roles.
Row Level Security (RLS)
Row Level Security (RLS) is a feature in Postgres that lets you create policies to restrict which rows in a table are visible to certain roles. Before we dive into how to set up an RLS policy, here are two important points to keep in mind:
1. Superusers and roles with the BYPASSRLS attribute can always bypass RLS policies, as can table owners unless they enforce the policy on themselves.
2. Users might still infer the existence of a row through "covert channels." For example, if there is a unique constraint on a field such as a social security number, a user who tries to insert a duplicate row will receive an error even though they cannot see the existing row, allowing them to deduce that a record with that social security number already exists.
By default, RLS is disabled on tables in Postgres. You can enable it using the command ALTER TABLE...ENABLE ROW LEVEL SECURITY, which sets a restrictive policy that prevents access to all data until additional policies are created.
The policy itself includes a name, the table it applies to, an optional role and a USING clause that defines how to identify matching or allowed rows. For instance, we might restrict access to orders to only the Sales Team member who created them:
CREATE POLICY sales_team_policy ON orders TO sales_team
USING (sales_person = current_user);
We can also specify which operations the policy applies to. The following example would allow all members of the Sales Team to select any orders, but only the original salesperson would be able to update or delete an order:
CREATE POLICY sales_team_select_policy ON orders FOR SELECT USING (true);
CREATE POLICY sales_team_modify_policy ON orders USING (sales_person = current_user);
By default, permissive policies are used, meaning that when multiple policies apply, they are combined using a Boolean OR. You can also use restrictive policies, where a Boolean AND is applied to determine if access to a row meets the combined policies.
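As a brief sketch using the orders table from above (the region column and setting name are hypothetical): enabling RLS and then combining a permissive policy with a restrictive one means a row is visible only if the permissive policy passes AND the restrictive policy passes.

ALTER TABLE orders ENABLE ROW LEVEL SECURITY;

-- Permissive policy: members of sales_team may read any order...
CREATE POLICY sales_read ON orders FOR SELECT TO sales_team USING (true);

-- ...but this restrictive policy is ANDed with it, limiting rows to the user's own region.
CREATE POLICY region_limit ON orders AS RESTRICTIVE
    USING (region = current_setting('app.current_region', true));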
Setting up RLS policies can take some effort, and you must also consider index design. However, in certain situations like a medical records system, it's crucial to restrict access to patient records solely to the medical staff responsible for the patient's care due to legal requirements.
When designing your tables, consider the legal and ethical obligations to limit access to specific rows. Implement RLS policies to comply with these requirements where necessary and be cautious to minimize covert channels by avoiding the use of sensitive data in your constraints.
Views
Views are valuable for simplifying commonly executed queries by allowing them to be treated like tables. They also help prevent unauthorized access to data by ensuring that users with certain roles cannot select from the underlying tables but must access the data through the view instead. A classic example is found in Postgres with the pg_catalog.pg_authid table, which holds a row for each role in the database, including a column for the hashed password if set. Since this hash is sensitive information, only the superuser has SELECT privileges on this table.
Instead, a view named pg_catalog.pg_roles is available for all users. When accessed, the password appears as ********, making it more convenient than applying access controls directly on the password column in the underlying table, which would trigger a permissions error if someone tried to use SELECT * FROM.
For updateable views, a CHECK OPTION can be defined. When omitted, the view allows users to insert or update records that wouldn’t be visible through it. However, if specified, the insert or update is permitted only if the row is visible in the view. The LOCAL CHECK OPTION checks visibility against only the conditions of the view directly used, while the CASCADED CHECK OPTION (the default) checks against both the view and any underlying views.
Consider using views over secured tables to grant access to a limited set of columns in the underlying table for appropriate roles.
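A minimal sketch of this pattern with hypothetical objects: the view exposes only non-sensitive columns, and only the view is granted to the role.

CREATE VIEW staff_directory AS
    SELECT username, full_name, department       -- salary and ssn stay hidden
    FROM employees;

GRANT SELECT ON staff_directory TO office_staff;
REVOKE ALL ON employees FROM office_staff;        -- access must go through the view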
Security barriers
Using views to restrict access to a column is common, but many also use them to limit access to certain rows. While this can be valuable, it's important to be aware of one tricky side effect: the Postgres optimizer might inadvertently expose hidden data!
This isn't a bug; it's how the system is designed. When a user runs a query against a view and includes a low-cost function in that query, the optimizer might execute the query on every row in the underlying data before applying selectivity clauses in the view. This can allow the function to access restricted data. Robert Haas from EDB explains this well in a blog post.
To solve this problem, we use a security barrier. This is an option set when creating the view that tells Postgres to always apply the view qualifiers first, ensuring the function never accesses hidden rows.
There’s also the LEAKPROOF parameter for functions. Only superusers can use this when creating a function, and it certifies that the function won't leak any information other than its intended return value. This helps Postgres better optimize queries involving functions and security barrier views, confident that no extra information will be leaked.
Be careful when using views to hide rows, and make sure they are marked as security barriers to prevent data leaks. Also, consider whether RLS might be a better option for restricting access to specific rows.
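A short sketch of a security barrier view, assuming the orders table and sales_person column used earlier: the security_barrier option forces the view's qualifier to be applied before any user-supplied functions run.

CREATE VIEW my_orders WITH (security_barrier) AS
    SELECT * FROM orders WHERE sales_person = current_user;

GRANT SELECT ON my_orders TO sales_team;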
SECURITY DEFINER functions
By default, functions and procedures in Postgres are known as SECURITY INVOKER functions. This means they run with the privileges of the role that calls them.
If you use the SECURITY DEFINER option when creating a function, it will execute with the privileges of its owner instead of the calling role. This is similar to the setuid bit in Unix, which allows a program to run with the owner's permissions rather than those of the user who executed it.
This ability can be helpful in various situations. For example, a function might be called by a trigger on a table to log records in an audit log that all login and group roles are restricted from accessing. However, it’s crucial to think carefully about the implications of using SECURITY DEFINER functions. They should be kept as simple as possible, performing only one task and avoiding parameters that could allow for unintended use.
Consider using SECURITY DEFINER functions to provide specific capabilities to roles that cannot perform those tasks directly. Be mindful of potential misuse and ensure these functions are limited to their intended purposes.
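The sketch below shows the audit-log idea described above with hypothetical names; the function runs with its owner's privileges, so callers can write audit records without any direct access to the audit table. Pinning search_path is a common precaution for SECURITY DEFINER functions.

CREATE FUNCTION log_audit_event(p_event text)
RETURNS void
LANGUAGE sql
SECURITY DEFINER
SET search_path = audit, pg_temp                  -- prevent object lookups from being hijacked
AS $$
    INSERT INTO audit.events (logged_at, logged_by, event)
    VALUES (now(), session_user, p_event);        -- session_user records the caller, not the function owner
$$;

REVOKE ALL ON FUNCTION log_audit_event(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION log_audit_event(text) TO app_readwrite;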
Data redaction
Data redaction is a technique that hides specific pieces of sensitive information from users by altering the displayed value. While this can be partially achieved with views in PostgreSQL as described above, EDB Postgres Advanced Server offers native data redaction capabilities.
In EPAS, redaction is implemented through data redaction policies applied to tables. These policies define which columns to modify, the conditions under which the policy applies, the function used for redaction, the scope, and any exceptions. Refer to the EDB Postgres Advanced Server documentation for details on how to create these policies.
When using EDB Postgres Advanced Server to handle sensitive data like credit card numbers, it's advisable to use data redaction policies to change the displayed data to a redacted format, such as "XXXX XXXX XXXX 8397," to prevent unnecessary access to sensitive information.
Encryption
In a DBaaS implementation, the CDO is responsible for encrypting data on disk and managing backups as part of the shared responsibility model. The CDO also ensures that data is encrypted while being transmitted over the network.
Some CDOs, like AWS RDS, even support additional encryption tools, such as pgcrypto, for encrypting specific columns.
pgcrypto
pgcrypto is a standard extension for PostgreSQL and EPAS that offers SQL functions for encryption and hashing. Use the pgcrypto extension in your databases when you need to hash or encrypt specific pieces of data to comply with regulations or similar needs.
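Assuming your DBaaS plan allows it (most managed Postgres services list pgcrypto among their supported extensions), the extension is enabled per database with:

CREATE EXTENSION IF NOT EXISTS pgcrypto;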
Hashing in pgcrypto
Hashing is a method for generating a cryptographically secure representation of data, usually with a fixed length that depends on the algorithm used. It's important to note that hashing is non-reversible, meaning you can't retrieve the original data from the hash value. However, because the same input always produces the same hash, it can serve as a checksum to check whether the data has changed or to verify that a user-provided value matches the original.
Hashing is commonly used to store passwords and other sensitive information that needs to be verified but not returned. For example, we can use pgcrypto to hash a password that the user will use in the future:
INSERT INTO users
(username, email,password)
VALUES
('pgsnake', 'dave.page@enterprisedb.com', crypt( 'new password', gen_salt('md5')));
To verify this password later, SELECT the user record from the table:
SELECT
*
FROM
users
WHERE
username = 'pgsnake' AND
password = crypt( 'entered password', password)
If a record is returned, the password was entered correctly – otherwise, it was incorrect.
It's crucial to remember that including passwords in SQL commands, as shown above, can lead to them being written to log files on the database server. Network communications may also expose these commands if not secured with encryption.
Never store user passwords in plain text or in a form that can be easily reversed in the database. Avoid using reversible encryption unless absolutely necessary for the application's functionality, like in a password manager application. Whenever possible, use non-reversible hashing for application passwords and other information that needs to be verified but not retrieved.
Encryption in pgcrypto
pgcrypto offers features for encrypting data, which is helpful when storing information that needs to be securely retrieved later. It provides both "raw" encryption/decryption functions and PGP functions. It's recommended to use PGP functions instead of the raw ones, as the latter rely on a user-provided key for encryption, lack integrity checking, require users to manage all encryption parameters, and only work with bytea data rather than text.
Symmetric key encryption is the simplest option since it doesn't need a PGP key. For instance, you can encrypt and decrypt data using this straightforward SQL command, where the inner function encrypts the data and the outer one decrypts it:
SELECT pgp_sym_decrypt(
pgp_sym_encrypt('Hi There', 'password'),
'password') ;
Please note that the cipher text returned by the encryption function and passed to the decryption function is in bytea format.
To utilize public key functionality, you first need a key. You can generate one using GnuPG with a command like this:
gpg --gen-key
The PostgreSQL documentation recommends using "DSA and Elgamal" as the preferred key type. After generating the key, you'll need to export it:
# List the keys in the keyring:
gpg --list-secret-keys
# Export a public key in ASCII armor format:
gpg -a --export KEYID > public.key
# Export a secret key in ASCII armor format:
gpg -a --export-secret-keys KEYID > secret.key
The public key can now be used to encrypt data with the SQL encryption function:
pgp_pub_encrypt('<data>', '<public key>')
Similarly, the data can later be decrypted using:
pgp_pub_decrypt(<cipher text>, '<private key>')
Again, note that the cipher text is in bytea format.
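Because the keys exported above are ASCII-armored, they are typically converted to bytea with pgcrypto's dearmor() function before use. A rough sketch, with the key contents left as placeholders and a hypothetical secrets table holding the ciphertext in a bytea column named payload:

-- Encrypt with the public key (store the bytea result, e.g. in a bytea column):
SELECT pgp_pub_encrypt('account: 12345678', dearmor('<contents of public.key>'));

-- Decrypt later with the corresponding secret key:
SELECT pgp_pub_decrypt(payload, dearmor('<contents of secret.key>'))
FROM secrets;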
Use encryption to store sensitive data in the database that you may need to access later. Think carefully about whether symmetric or public key encryption is best for your situation. Public keys are usually better for sharing data with others since there's no need for a shared secret, while symmetric keys are often more suitable for self-contained applications.
Compliance considerations
General considerations: SOC 1, SOC 2, SOC 3, and ISO 27001
System and Organization Control (SOC) reports are administered by the Auditing Standards Board (ASB) of the American Institute of Certified Public Accountants (AICPA). These reports are audits that highlight key controls in data centers and cloud operations. SOC 1 focuses on internal financial controls, while SOC 2 emphasizes security, availability, confidentiality, privacy, and processing integrity.
- Security: How is information protected from unauthorized access?
- Availability: Is the technology designed to be reliably available?
- Confidentiality: Is the confidentiality of customer data monitored and ensured?
- Privacy: Is customer data privacy guaranteed?
- Processing Integrity: Is the system processing complete, valid, accurate, timely, and authorized to meet the entity’s objectives?
The specifics of the SOC 2 controls are outlined in the AICPA's Trust Services Criteria.
SOC 2 reports are more detailed than SOC 3 reports and are typically shared with customers under a non-disclosure agreement (NDA). While SOC 3 reports cover the same controls as SOC 2, they provide less detail and are meant for general distribution and marketing. SOC 3 is also known as the “Trust Services Criteria for General Use Report.”
ISO/IEC 27001 focuses on managing information security risks by developing and implementing an Information Security Management System, making it comparable in scope to SOC 2. SOC compliance is more common in North America, whereas ISO/IEC 27001 tends to be more popular outside of North America.
SOC 2 compliance is critical for many cloud services. It's advisable to choose a service that is either SOC 2 compliant or ISO/IEC 27001 certified.
Industry or use case-specific considerations: GDPR DPA, HIPAA BAA, PCI DSS, FedRAMP
The Federal Risk and Authorization Management Program (FedRAMP) is a security framework designed to facilitate the adoption of cloud services by the federal government. Currently, Postgres DBaaS options are available for “FedRAMP High” or “FedRAMP Moderate” levels.
General Data Protection Regulation (GDPR) Data Processing Agreements (DPA) are necessary for CDOs to process or store personally identifiable information (PII) for data processors or controllers that need to comply with GDPR.
Health Insurance Portability and Accountability Act (HIPAA) Business Associate Agreements (BAA) must be established for any provider accessing protected health information (PHI). A CDO managing PHI on its DBaaS must have a HIPAA BAA in place.
Additionally, a Payment Card Industry Data Security Standard (PCI DSS) certification is required for a CDO to handle payment transactions as part of their DBaaS.
Is it safe to use Postgres in the cloud?
Yes, provided that:
- The CDO possesses SOC 2, SOC 3, or ISO/IEC 27001 certification
- The CDO has extensive Postgres knowledge available to the customer for planning, support, and advice
- The customer understands the shared responsibility model
- Security and compliance guidelines are adhered to
- The CDO has a proven track record of maintaining software, hardware, storage, and networking infrastructure to industry standards
Reputable and experienced Postgres DBaaS providers, such as AWS, Azure, Crunchy Data, and EDB, are often more likely to deliver secure and reliable Postgres services compared to many in-house database operations. Here are a few reasons why:
- The CDO’s SRE team continuously monitors database operations to uphold SLAs
- Encryption at rest and in transit is integrated into the infrastructure
- SOC 2 and ISO/IEC 27001 standards are established and audited regularly, which is not always the case with in-house database teams
How to Choose the Best DBaaS for Your Needs?
A successful DBaaS combines two essential operational competencies:
- Provisioning and managing the infrastructure for CPU, storage, and networking
- Provisioning and managing the database software, such as Postgres
It’s essential to choose a vendor who excels in both areas and has a strong track record. Here are some key questions to ask:
- What is the SLA track record for the overall service in each region where it's offered? CDOs that can't provide this record likely don't manage it effectively and should be avoided.
- Does the service utilize the latest technology to ensure maximum performance in compute, storage, and networking? Since databases are highly demanding in terms of I/O and compute, it’s crucial to select a vendor committed to updating their technology for optimal performance.
- Does the vendor have the in-depth software expertise to provide support, maintenance, and bug fixes? It's tempting to use open-source software on a cloud platform, but without the right expertise, it can be hard to meet enterprise standards. For example, with Postgres, the vendor should have enough staff who are active Postgres contributors or who have contributed to recent Postgres releases. Avoid CDOs that lack a direct support agreement with the software vendor or are not engaged with the open-source community.
- How easy and cost-effective is it to move data in and out of the DBaaS? Some DBaaS options make it easy to upload data but can be very costly and complicated to extract it. This can create a “Hotel California” scenario, where you can't easily leave the DBaaS due to high downtime or costs.
- Does the vendor have a clear service roadmap and a strong history of delivering on it? If technology stagnates and data extraction is costly, you can end up with a zombie database that hinders the business.
- Are all pricing structures clear? Some vendors may charge heavily for input/output (IO) without offering tools to manage or limit those costs, which can result in unexpected expenses.
These guidelines can help you identify a suitable CDO, and we recommend regularly reassessing a CDO based on these criteria.
Migrating to the Cloud
The following section is based on a blog post by Bruce Momjian that outlines the general process of migrating databases to the cloud; see the original post for the full discussion.
What is being moved?
Databases rarely operate in isolation. Typically, there are one or more applications that interact with the database, along with middleware, monitoring, backup, failover, and other tools. This section will focus on data movement and replacing data management tools, but it’s important to recognize that a database cannot be viewed in isolation. Most databases are closely connected to the broader enterprise.
How is it currently hosted?
As mentioned earlier, the database can be hosted on bare metal, virtual machines, or Kubernetes/containers. Why is this important? You’ll want to replicate your current setup in the cloud as closely as possible. If you use Kubernetes/containers, you’ve already abstracted the database dependencies, allowing you to potentially move Kubernetes pods directly to cloud infrastructure. If you’re using virtual machines, you likely have tools to recreate these in the cloud. For bare metal, the database may be closely tied to the operating system, sometimes in ways the staff doesn’t fully understand. During migration, it’s essential to identify all database dependencies to ensure they can be replicated in the cloud environment.
Migrating databases to the cloud: the moving process
Moving databases is never an easy task. First, they hold a lot of data – often terabytes – which can make the transfer slow. Second, databases are usually part of critical enterprise infrastructure, so minimizing downtime is essential.
The process of moving a database consists of two main parts: transferring the data and handling everything else, like binaries, configurations, and extensions.
Postgres provides several methods for moving data to another system:
- File system snapshot
- Logical dump (e.g., pg_dumpall)
- Binary replication
- Logical replication
The first method requires the server to be down, while the last two methods allow for continuous synchronization, which helps reduce downtime. If you choose the first or third options, you'll need to ensure that the same versions of Postgres and the operating system are used in the cloud. The second option allows for moving between different software versions, but it may take longer to restore since SQL statements need to be executed to set up the schema and data. Moving the other components is mostly a straightforward mechanical process.
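For the logical replication route, a minimal sketch looks like the following. The publication and subscription names, host, database, and credentials are placeholders; the source server must have wal_level = logical, and the schema must already exist on the target (for example, restored with pg_dump --schema-only). Some DBaaS offerings require their own migration tooling instead, so check what your CDO supports.
-- On the source (on-premises) database:
CREATE PUBLICATION cloud_migration FOR ALL TABLES;

-- On the target (cloud) database, once the schema is in place:
CREATE SUBSCRIPTION cloud_migration
    CONNECTION 'host=onprem-db.example.com dbname=appdb user=replicator password=<password>'
    PUBLICATION cloud_migration;
Once the subscription has caught up, applications can be repointed to the cloud database during a short cutover window and the subscription dropped.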
Hardware considerations
Cloud providers offer a wide range of storage and compute options, which is a significant advantage over on-premises deployments where adjusting compute or storage can be challenging or even impossible. Many cloud vendors allow for annual or multi-year pre-purchases of compute capacity. We recommend entering into these agreements only after the cloud deployment has stabilized and the capacity requirements are clearly understood.
DBaaS Is Not an All-or-Nothing Solution
Most of our customers use hybrid approaches for their database deployments. While DBaaS is by far the most popular option, solutions based on containers and virtual machines (VMs) remain important – especially in industries that aren't ready to fully commit to the public cloud or in regions with strict data locality requirements that necessitate on-premises deployments.
Increasingly, enterprises are choosing to program with the Postgres API, enabling them to deploy across virtually every cloud, all major operating systems and key hardware platforms. Postgres supports bare metal, VM-based, container-based, and DBaaS-based deployments through a consistent API, making it the ideal database platform for hybrid deployment strategies.
Conclusion
DBaaS, or cloud databases, is quickly becoming the standard for enterprise databases. Many vendors offer these services, from cloud service providers managing multiple databases to specialty vendors focused on database development. When choosing a DBaaS vendor, customers should be cautious, as databases are critical to operations, not just commodities.
Security is a major consideration. Industry standards like the AICPA's SOC reports and ISO/IEC 27001 are valuable tools for assessing and ensuring cloud security, especially when combined with strong built-in database security features. Often, trusting a reputable cloud provider with your databases can be more secure than managing them in-house.
DBaaS adoption, particularly for Postgres, is rapidly increasing. We highly recommend considering Postgres database services from specialty vendors like Crunchy Data or EDB. It's important to note that not all databases will move to the cloud, and many enterprises will require consistent Postgres solutions across on-premises, private cloud, and public cloud environments.
Acronyms and Abbreviations
AICPA: American Institute of Certified Public Accountants
AKS: Azure Kubernetes Service
API: Application Programming Interface
ASB: Auditing Standards Board
AWS: Amazon Web Services
CDO: Cloud Database Operator
CLI: Command Line Interface
DBaaS: Database as a Service
DBPaaS: Database Platform as a Service
EC2: Elastic Compute Cloud
EKS: Elastic Kubernetes Service
FedRAMP: Federal Risk and Authorization Management Program
GCE: Google Compute Engine
GDPR: General Data Protection Regulation
GDPR DPA: GDPR Data Processing Agreement
GUI: Graphical User Interface
HIPAA: Health Insurance Portability and Accountability Act
HIPAA BAA: HIPAA Business Associate Agreement
IaaS: Infrastructure as a Service
IaC: Infrastructure as Code
IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
NIST: National Institute of Standards and Technology
OS: Operating System
PaaS: Platform as a Service
PCI: Payment Card Industry
PCI DSS: Payment Card Industry Data Security Standard
PHI: Protected Health Information
PII: Personally Identifiable Information
RDS: Relational Database Service
SaaS: Software as a Service
SLA: Service Level Availability
SOC: System and Organization Control
SRE: Site Reliability Engineer
VPC: Virtual Private Cloud
¹ DBaaS implementations commonly do not provide users access to the SUPERUSER role, as this would interfere with the shared responsibility model.
² DBaaS may not provide access to the superuser role.
Frequently Asked Questions
What is DBaaS?
DBaaS stands for Database as a Service, a cloud-based service model that enables users and organizations to create, manage, maintain, and query databases in the cloud.
How does DBaaS work?
DBaaS operates by providing a cloud-based environment where database instances are hosted and managed by the service provider, allowing users to focus on development and data management rather than infrastructure setup.
What are the benefits of DBaaS?
DBaaS offers scalability, automated backups, high availability, and reduced management overhead, making it easier for enterprises to handle their database needs without extensive in-house resources.
What types of databases can DBaaS host?
DBaaS can host various types of databases, including relational databases like PostgreSQL and MySQL, as well as non-relational databases such as MongoDB and Cassandra.
Can DBaaS be used across multiple clouds?
Yes, many DBaaS providers offer capabilities to access databases across multiple cloud environments, facilitating hybrid deployment strategies.
What security measures do DBaaS providers offer?
Security measures typically include encryption, access controls, compliance with industry standards, and regular security audits conducted by the service provider.
How does cloud migration affect database performance?
Cloud migration can enhance database performance through scalable resources, but it requires careful planning to ensure that network latency and storage configurations are optimized.
How important are backups in DBaaS?
Regular backups are crucial in DBaaS to safeguard data against loss or corruption, with many service providers offering automated backup solutions.
What are the limitations of DBaaS?
Potential limitations may include reduced control over the hardware and software stack, vendor lock-in, and potential performance constraints based on the provider's infrastructure.
Can DBaaS configurations be customized?
While most DBaaS offerings come with predefined configurations, many providers allow for some level of customization regarding database parameters and resource allocations.
How much does DBaaS cost?
Costs can vary widely depending on the provider, based on factors such as storage, compute resources, and additional features. It's essential to assess your needs and the provider's pricing model.
Can I migrate an on-premises database to DBaaS?
Yes, data migration from on-premises databases to DBaaS is a common practice and typically involves data transfer methods such as logical dumps or replication.
What tools are available for managing a DBaaS?
Most DBaaS platforms provide integrated management tools for monitoring, administration, and performance tuning, allowing users to effectively manage their databases.
How can I ensure compliance when using DBaaS?
To ensure compliance, select a DBaaS provider that adheres to relevant data protection regulations and offers features that support compliance auditing and reporting.
How do I choose a DBaaS provider?
Evaluate factors like security features, performance guarantees, support services, pricing, scalability, and the provider's reputation in the industry before making a decision.