Preparers - Concepts

Preparers are used to perform common pre-processing steps on source data from either a Table or Volume source. The processed data is stored in a destination table and can be used by other Preparers, or by Retrievers for embedding generation.

Concepts

Data preparation operation

The data preparation operation defines how the Preparer transforms the source data. The supported operations are encoded as variants of the aidb.DataPreparationOperation enum.

Note

Each operation has its own set of parameters that are used to customize the operation. Learn more in the primitives section.

Data sources

A data source is the input data for the data preparation operation. The aidb extension supports two types of data sources for preparers:

  • Table: a column in a table in the Postgres database.
  • Volume: a PGFS "volume", which is a wrapper for accessing an S3 object store or a local file system.

Execution

Primitive functions facilitate the testing of operations and their configurations on individual inputs with minimal setup. This is useful for quick experimentation before scaling up with a Preparer for bulk data preparation.

Bulk data preparation runs a Preparer's associated operation over all of that Preparer's source data and stores the results in its destination table.
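The difference between the two execution modes can be illustrated with a small conceptual sketch. This is plain Python, not the aidb API: the `chunk_text` and `bulk_prepare` functions, the `max_length` parameter, and the `(source_key, part_index, value)` row layout are all hypothetical names chosen for illustration. It only shows the idea of trying an operation on one input first, then applying the same operation to every source row.

```python
from typing import Callable, Dict, List, Tuple


def chunk_text(text: str, max_length: int) -> List[str]:
    """Toy stand-in for an operation: split text into word-aligned chunks.

    Each chunk is at most max_length characters, except that a single word
    longer than max_length is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_length or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def bulk_prepare(
    source_rows: Dict[int, str],
    operation: Callable[[str], List[str]],
) -> List[Tuple[int, int, str]]:
    """Apply the operation to every source row.

    Returns hypothetical destination rows as (source_key, part_index, value).
    """
    destination: List[Tuple[int, int, str]] = []
    for key, value in sorted(source_rows.items()):
        for part_index, part in enumerate(operation(value)):
            destination.append((key, part_index, part))
    return destination


# Primitive-style call: test the operation and its configuration on one input.
sample = chunk_text("the quick brown fox jumps", max_length=10)

# Bulk-style call: run the same operation over all source rows.
source = {1: "the quick brown fox jumps", 2: "hello world"}
dest = bulk_prepare(source, lambda text: chunk_text(text, max_length=10))
```

In this sketch, the single-input call is the quick experiment, and `bulk_prepare` plays the role of a Preparer that pairs one operation with one data source and one destination.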

