Lakehouse Sync

Overview

A Lakehouse Sync captures data from a transactional database at a point in time and syncs that data to a Managed Storage Location (MSL).

The Lakehouse Sync process organizes the transactional database's data into Lakehouse tables stored in the MSL. A Lakehouse node can then query this data. Lakehouse nodes are optimized for higher-performance queries and use a vectorized query engine designed for Lakehouse tables.

Performing a Lakehouse Sync

Prerequisites

  • A Postgres cluster hosted and managed by EDB Postgres AI® Cloud Service

  1. Go to the EDB Postgres AI Console.

  2. From the landing page, select the project with the database instance you want to sync. If it's not shown on the landing page, select the View Projects link in the Projects section and select your project from there.

  3. Select the Migrate dropdown in the left navigation bar and then select Migrations.

  4. Select the Create New Migration button.

Define Lakehouse Sync

  1. Give the sync a Name, then select a Source Cluster and the Database you want to sync.

  2. If you have already created an MSL you want to use, select it from the list of available MSLs and skip ahead to Selecting Tables. Otherwise, select the Create New Managed Storage Location button to open the Add Managed Storage Location dialog.

  3. Select the AWS region for the new MSL.

  4. To complete the definition of the MSL, enter a location prefix in the field near the bottom of the Add Managed Storage Location dialog. A location prefix is a unique name that identifies the resources and assets associated with the MSL.

[Figure: List of MSLs]

  5. Select the Create Managed Storage Location button to create the MSL.

Selecting Tables

  1. Select the Tables tab, next to the Get Started tab near the top of the page, and select the tables and columns to include in the sync.

Start Lakehouse Sync

  1. Select the Start Lakehouse Sync button.

  2. If the sync starts successfully, it appears with the Creating status under MOST RECENT migrations on the Migrations page. How long a sync takes depends on how much data is being synchronized; it can take several hours.

Note

The first sync in a project takes a couple of hours because the required infrastructure must be provisioned.
