Getting started with Pipelines

Where to start

The best place to start is with the Pipelines overview to get an understanding of what Pipelines is and how it works.

Installation

Pipelines is included with the EDB Postgres AI - AI Accelerator suite of tools. To install Pipelines, follow the instructions in the AI Accelerator Installation Guide.

Using Pipelines

After you have Pipelines installed, you can start using it to work with your data.

Log into your Postgres server and ensure the Pipelines extension is installed:

CREATE EXTENSION aidb CASCADE;

This example works solely with Postgres table data, so you don't need to install the pgfs extension.

For the example, create a table to work with:

CREATE TABLE products (
   id SERIAL PRIMARY KEY,
   product_name TEXT NOT NULL,
   description TEXT,
   last_updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP
);

Output

CREATE TABLE

Then insert some data into it:

INSERT INTO products (product_name, description) VALUES
      ('Hamburger', 'A delicious combination of bread and meat'),
      ('Cheesburger', 'Improving on a classic, the cheese brings favorite flavors'),
      ('Fish n Chips', 'The fish is a little greasy and the chips do not help'),
      ('Fries', 'Never sure about these on their own, needs seasoning'),
      ('Burrito', 'Always ready for this parcel of edible wonder'),
      ('Pizza', 'It is very much a staple, but the rolled dough with toppings does not inspire'),
      ('Sandwich', 'The blandest of offerings, the sandwich is predominantly boring bread'),
      ('Veggie Burger', 'The ultra-processed vegetable product in this is neither healthy nor delicious'),
      ('Kebab', 'Maybe one of the great edible treats, sliced lamb, salad and crisp pitta');

Output

INSERT 0 9

The result is a table with some data in it: food products and some very personal opinions about them.

Note

This example uses simple, processed data. Pipelines can also handle multiple preprocessing steps via preparers.

Creating a retriever

The core of Pipelines is the retriever. A retriever is a way to access the data in the table and use it in AI workflows.

select aidb.create_retriever_for_table('products_retriever', 't5', 'products', 'description', 'Text');

Output

 create_retriever_for_table
------------------------------
 products_retriever
(1 row)

Querying the retriever

After creating a retriever, you can query it to get similar results based on the data in the table.

select * from aidb.retrieve_key('products_retriever','I like it',5);

Output

ERROR:  Query returned no data. Hint: The "products_retriever_vector" table is likely empty. Make sure the embeddings have been computed.

This example hasn't computed embeddings for the retriever yet.

The products_retriever_vector table is where aidb keeps the computed embeddings for the retriever. Compute those embeddings now using aidb.bulk_embedding:

select aidb.bulk_embedding('products_retriever');

Output

INFO:  bulk_embedding_text found 9 rows in retriever products_retriever
 bulk_embedding
----------------

(1 row)

Now you can query the retriever again:

select * from aidb.retrieve_key('products_retriever','I like it',4);

Output

key |      distance
-----+--------------------
 4   | 1.0369428080621286
 3   |   1.03737124138149
 2   | 1.0839594107837638
 5   | 1.0869412071766262
(4 rows)

This query produces some results. The key column is the primary key of the row in the products table, and the distance column is the distance between the query and the result. The lower the distance, the more similar the result is to the query.

What you really want is the actual matching text, not just the key. You can use aidb.retrieve_text for that:

select * from aidb.retrieve_text('products_retriever','I like it',4);

Output

key |                           value                            |      distance
-----+------------------------------------------------------------+--------------------
 4   | Never sure about these on their own, needs seasoning       | 1.0369428080621286
 3   | The fish is a little greasy and the chips do not help      |   1.03737124138149
 2   | Improving on a classic, the cheese brings favorite flavors | 1.0839594107837638
 5   | Always ready for this parcel of edible wonder              | 1.0869412071766262
(4 rows)

Now you have the actual data from the table that matches the query.

You may want the row data from the products table instead of the products_retriever_vector table. You can do that by joining the two tables:

 select * from aidb.retrieve_key('products_retriever','I like it',4) as a
            left join products as b
            on a.key=b.id;

Output

key |      distance      | id | product_name |                        description                         |         last_updated_at
-----+--------------------+----+--------------+------------------------------------------------------------+----------------------------------
 2   | 1.0839594107837638 |  2 | Cheesburger  | Improving on a classic, the cheese brings favorite flavors | 04-DEC-24 16:48:52.599806 +00:00
 3   |   1.03737124138149 |  3 | Fish n Chips | The fish is a little greasy and the chips do not help      | 04-DEC-24 16:48:52.599806 +00:00
 4   | 1.0369428080621286 |  4 | Fries        | Never sure about these on their own, needs seasoning       | 04-DEC-24 16:48:52.599806 +00:00
 5   | 1.0869412071766262 |  5 | Burrito      | Always ready for this parcel of edible wonder              | 04-DEC-24 16:48:52.599806 +00:00
 (4 rows)

Now you have the actual data from the products table that matches the query. As you can see, the full power of Postgres is available to you to work with your AI workflows.

One more thing: auto-embedding

As it stands, vectors were calculated for the data, but if you add data to the table, it isn't automatically embedded. The retriever would go out of sync.

To keep the embeddings up to date, enable auto-embedding:

select aidb.set_retriever_auto_processing('products_retriever', 'Live');

Output

 set_retriever_auto_processing
-------------------------------

(1 row)

Now, if you add data to the table, the embeddings are automatically calculated. You can quickly test this:

INSERT INTO products (product_name, description) VALUES
      ('Pasta', 'A carb-heavy delight that is always welcome, especially with a good sauce'),
      ('Salad', 'Meh, it is what it is and it is not much. Occasionally saved by a good dressing');

Output

NOTICE:  Running auto embedding for retriever products. key: "10" content: "A carb-heavy delight that is always welcome, especially with a good sauce"
NOTICE:  Running auto embedding for retriever products. key: "11" content: "Meh, it is what it is and it is not much. Occasionally saved by a good dressing"
INSERT 0 2

select * from aidb.retrieve_key('products_retriever','I like it',4) as a
            left join products as b
            on a.key=b.id;

Output

key |      distance      | id | product_name |                                   description                                   |         last_updated_at
-----+--------------------+----+--------------+---------------------------------------------------------------------------------+----------------------------------
 10  | 1.0351907976251493 | 10 | Pasta        | A carb-heavy delight that is always welcome, especially with a good sauce       | 04-DEC-24 17:09:44.97484 +00:00
 11  |  0.979874632270706 | 11 | Salad        | Meh, it is what it is and it is not much. Occasionally saved by a good dressing | 04-DEC-24 17:09:44.97484 +00:00
 3   |   1.03737124138149 |  3 | Fish n Chips | The fish is a little greasy and the chips do not help                           | 04-DEC-24 16:48:52.599806 +00:00
 4   | 1.0369428080621286 |  4 | Fries        | Never sure about these on their own, needs seasoning                            | 04-DEC-24 16:48:52.599806 +00:00
(4 rows)

Getting started with Pipelines

Where to start

Installation

Using Pipelines

Note

Creating a retriever

Querying the retriever

One more thing: auto-embedding

See also

← Prev

↑ Up