Pipelines - Preparers - Usage

Creating a new preparer

The source data Preparer can come from a Postgres table or a PGFS volume. Given the different nature of the data sources and the options required for each, we have different functions to create them.

Note

The behavior of the data preparation operation for the preparer can be customized with different options. The API for these options is identical between the primitives and the preparer, so you can prototype options with the aidb.chunk_text() primitive for use with a scalable Preparer that performs the ChunkText operation. Learn more in the primitives section.

Preparer for a table data source

The aidb.create_preparer_for_table function is used to create a preparer for a table data source. This is the function signature:

aidb.create_preparer_for_table(
-------------------------------------------------------------------------------
    name                        TEXT,
    operation                   aidb.DataPreparationOperation,
    source_table                TEXT,
    source_data_column          TEXT,
    destination_table           TEXT,
    destination_data_column     TEXT,
    source_key_column           TEXT DEFAULT 'id',
    destination_key_column      TEXT DEFAULT 'id',
    options                     JSONB DEFAULT '{}'::JSONB
)

Example: Creating a preparer

SELECT aidb.create_preparer_for_table(
    name => 'test_preparer',
    operation => 'ChunkText',
    source_table => 'test_source_table',
    source_data_column => 'content',
    destination_table => 'chunked_data_destination_table',
    destination_data_column => 'chunks',
    source_key_column => 'id',
    destination_key_column => 'id',
    options => '{"desired_length": 100}'::JSONB  -- Configuration for the ChunkText operation
);

Preparer for a volume data source

The aidb.create_preparer_for_volume function is used to create a preparer for a volume data source. This is the function signature:

aidb.create_preparer_for_volume(
-------------------------------------------------------------------------------
    name                        TEXT,
    operation                   aidb.DataPreparationOperation,
    source_volume_name          TEXT,
    destination_table           TEXT,
    destination_data_column     TEXT,
    destination_key_column      TEXT DEFAULT 'id',
    options                     JSONB DEFAULT '{}'::JSONB
)

Example: Creating a preparer for a volume

SELECT aidb.create_preparer_for_volume(
    name => 'test_preparer_volume',
    operation => 'ChunkText',
    source_volume_name => 'test_volume',
    destination_table => 'chunked_data_destination_table',
    destination_data_column => 'chunks',
    destination_key_column => 'id',
    options => '{"desired_length": 100}'::JSONB  -- Configuration for the ChunkText operation
);

Executing the data preparation

You can use bulk data preparation to execute the processing operation for all existing data in the source location.

SELECT aidb.bulk_data_preparation('test_preparer');

Listing the preparers

A view is available that lists all the preparers. aidb.preparers also includes some of the preparers configuration:

SELECT * FROM aidb.preparers;
Output
 id |     name      | operation | destination_schema |       destination_table        | destination_key_column | destination_data_column |         options         | source_type | source_schema |   source_table    | source_data_column | source_key_column | source_volume_name
----+---------------+-----------+--------------------+--------------------------------+------------------------+-------------------------+-------------------------+-------------+---------------+-------------------+--------------------+-------------------+--------------------
  1 | test_preparer | ChunkText | public             | chunked_data_destination_table | id                     | chunks                  | {"desired_length": 100} | Table       | public        | test_source_table | content            | id                |
(1 row)

Deleting a preparer

SELECT aidb.delete_preparer(<name>);
Note

This does not delete the destination table or any data in it.

End to end examples

You can find end-to-end examples for preparers on the examples page.


Could this page be better? Report a problem or suggest an addition!