The best place to start is the Pipelines Overview, which explains what Pipelines is and how it works.
Installation
Pipelines is included with the EDB Postgres AI - AI Accelerator suite of tools. To install Pipelines, follow the instructions in the AI Accelerator Installation Guide.
Using Pipelines
Once you have Pipelines installed, you can start using it to work with your data.
Log in to your Postgres server and ensure the Pipelines extension is installed:
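For example, assuming the extension is named aidb (as in the rest of this guide):

```sql
-- Install the Pipelines extension; CASCADE also installs any
-- dependencies (such as pgvector) that aren't already present.
CREATE EXTENSION IF NOT EXISTS aidb CASCADE;
```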
We'll be working solely with Postgres table data for this example, so we won't need to install the pgfs extension.
Let's also create an example table to work with:
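Something along these lines works; the table and column names here are just for the example, but the later steps assume an `id` primary key and a `description` text column:

```sql
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    product_name TEXT NOT NULL,
    description TEXT,
    last_updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
```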
And let's insert some data:
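A few illustrative rows (feel free to substitute your own opinions):

```sql
INSERT INTO products (product_name, description) VALUES
    ('Hamburger', 'A juicy, greasy classic. Hard to beat.'),
    ('Cheeseburger', 'Like a hamburger, but better, because cheese.'),
    ('Fish and chips', 'Perfectly fine, if a little soggy.'),
    ('Chicken curry', 'Hot, spicy, and full of flavour.'),
    ('Caesar salad', 'Technically food. Mostly lettuce.');
```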
We now have a table of food products, along with some very personal opinions about them.
Registering a retriever
The first step to using Pipelines with this data is to register a retriever. A retriever is a way to access the data in the table and use it in AI workflows.
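A sketch of registering a retriever over the products table; the function name, parameter names, and the all-minilm-l6-v2 model shown here are assumptions, so check the reference for your installed version:

```sql
SELECT aidb.create_retriever_for_table(
    name => 'products_retriever',         -- the name we'll use when querying
    model_name => 'all-minilm-l6-v2',     -- assumed local embedding model
    source_table => 'products',
    source_data_column => 'description',  -- the text column to embed
    source_data_type => 'Text'
);
```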
Querying the retriever
Now that we have a retriever registered, we can query it to get similar results based on the data in the table.
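For example, assuming a key-retrieval function such as aidb.retrieve_key that takes the retriever name, a query string, and the number of results to return:

```sql
SELECT * FROM aidb.retrieve_key('products_retriever', 'I like it hot and spicy', 5);
```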
There's a catch, though: we haven't computed embeddings for our retriever yet, so there's nothing for the query to match against.
The products_retriever_vector table is where aidb keeps the computed embeddings for the retriever.
Let's compute those embeddings now using aidb.bulk_embedding:
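Assuming it takes the retriever name as its argument:

```sql
SELECT aidb.bulk_embedding('products_retriever');
```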
Now we can query the retriever again:
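Running the same query as before:

```sql
SELECT * FROM aidb.retrieve_key('products_retriever', 'I like it hot and spicy', 5);
```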
Now we have some results. The key column is the primary key of the matching row in the products table, and the distance column is the vector distance between the query's embedding and that row's embedding. The lower the distance, the more similar the result is to the query.
What we really want is the actual matching text, not just the key. We can use aidb.retrieve_text for that:
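Assuming it takes the same arguments as the key lookup above:

```sql
SELECT * FROM aidb.retrieve_text('products_retriever', 'I like it hot and spicy', 5);
```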
Now we have the actual data from the table that matches the query.
You may want the row data from the products table instead of the products_retriever_vector table. You can do that by joining the two tables:
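A sketch of that join, assuming the retrieved key is the text form of the products primary key:

```sql
SELECT p.*, r.distance
FROM aidb.retrieve_key('products_retriever', 'I like it hot and spicy', 5) AS r
JOIN products AS p ON p.id = r.key::INTEGER
ORDER BY r.distance;
```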
Now you have the actual rows from the products table that match the query. As you can see, the full power of Postgres is available for your AI workflows.
One more thing: auto-embedding
As it stands, embeddings have been calculated for the data already in the table, but any rows we add later wouldn't be embedded automatically, and the retriever would fall out of sync.
To keep the embeddings up to date, we can enable auto-embedding:
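Assuming an enabling function along these lines (the exact name may differ by version):

```sql
SELECT aidb.enable_auto_embedding_for_table('products_retriever');
```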
Now, if we add data to the table, the embeddings will be automatically calculated. We can quickly test this:
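For example, insert a new (illustrative) row and run the query again:

```sql
INSERT INTO products (product_name, description)
VALUES ('Chilli con carne', 'A properly hot and spicy beef and bean stew.');

SELECT * FROM aidb.retrieve_text('products_retriever', 'I like it hot and spicy', 5);
```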
Further reading
In the Models section, you can learn how to register more models with Pipelines, including external models from OpenAI API compatible services.
In the Retrievers section, you can learn more about using retrievers with external data sources such as local files or S3 storage, and how to use the retriever functions to get the data you need.