Using an OpenAI compatible API with Pipelines

To make use of an OpenAI compatible API, you can use the embeddings or completions model providers. Note that a retriever needs to encode text first, so only the embeddings model provider can be used with a retriever.

Why use an OpenAI compatible API?

Some examples of why you might want to use an OpenAI compatible API include:

  • If you have a local system running Ollama, configured to run as a server, you may want that local system to handle embeddings.

  • If you have access to a service that provides different or specifically tuned models, you can use it instead of the default models.

Creating the model

The starting point for this process is creating a model. When you create a model, you can pass options and credentials to the registration. The defaults point to the OpenAI service endpoint, so by overriding that, you can point to any service. Here is an example that creates a model which uses a local Ollama server:

select aidb.create_model(
'my_local_ollama',
'embeddings',
'{"model":"llama3.2", "url":"http://llama.local:11434/v1/embeddings", "dimensions":3072}'::JSONB);

Model name and model provider

The first parameter is the model name, set here to “my_local_ollama”, which we will use later when calling the model.

The second parameter is the model provider, “embeddings”. This provider defaults to the OpenAI service endpoint, but the configuration (the next parameter) can override that to talk to any compliant server.
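The same pattern applies to the completions provider mentioned at the top of this page. As a sketch only, assuming the completions provider accepts the same model and url configuration keys and that your server exposes an OpenAI-style chat completions path, registering a chat model against the same local server might look like this (the name my_local_ollama_chat is purely illustrative):

select aidb.create_model(
'my_local_ollama_chat',
'completions',
'{"model":"llama3.2", "url":"http://llama.local:11434/v1/chat/completions"}'::JSONB);

The walkthrough below continues with the embeddings model.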

Configuration

The next parameter is the configuration. This is a JSON string which, when expanded, has three settings: the model, the url, and the dimensions.

'{"model":"llama3.2", "url":"http://llama.local:11434/v1/embeddings", "dimensions":3072}'::JSONB

In this case, we are setting the model to “llama3.2”, a relatively new and powerful model. Remember to run ollama run llama3.2 on the server to pull and start the model.

The next JSON setting is the important one: the url, which overrides the endpoint that the aidb model will use.

  • Our server is running on a machine called llama.local.
  • It has port 11434 (the default port for Ollama) open to service requests over HTTP (not HTTPS in this case).
  • The path to the endpoint on the server is /v1/embeddings, the same as OpenAI.

Putting those components together, we get http://llama.local:11434/v1/embeddings as our endpoint.

The last JSON setting in this example is “dimensions”, which is a hint to the system about how many vector values to expect from the model. If we look up llama3.2’s properties, we can see that the llama.embedding_length value is 3072. The provider defaults to 1536 (with some hard-wired exceptions depending on the model), but it doesn’t know about llama3.2’s embedding length, so in this case we need to pass "dimensions":3072 to configure aidb accordingly.
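If you want to verify the embedding length your server actually returns, a quick check is possible using the aidb.encode_text function shown later on this page. This is a sketch only, and assumes the result of aidb.encode_text can be cast to pgvector’s vector type:

-- Should return 3072 if the model emits 3072-dimensional embeddings.
select vector_dims(aidb.encode_text('my_local_ollama', 'dimension check')::vector);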

That completes the configuration parameter.

If the endpoint requires an API key, that would be passed in the credentials parameter. As this is a local model, we don’t need to pass any credentials.
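If you were registering a model against a hosted service that does require a key, the call might look something like the following sketch. The fourth argument carries the credentials; the api_key field name, the model name, and the placeholder key value are assumptions for illustration, and no url is given because the defaults already point at the OpenAI endpoint:

select aidb.create_model(
'my_hosted_embeddings',
'embeddings',
'{"model":"text-embedding-3-small"}'::JSONB,
'{"api_key":"sk-placeholder"}'::JSONB);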

Note

When indexing vector columns with pgvector, consider the pgvector indexing limitations. Aidb does not automatically configure an index today, but if you add one manually, make sure it supports the number of dimensions your model produces.
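In particular, pgvector’s HNSW and IVFFlat indexes support at most 2000 dimensions for the vector type, so a 3072-dimension column can’t be indexed directly. One workaround, sketched here with a hypothetical documents table holding an embedding vector(3072) column, is to index a halfvec cast instead (halfvec indexing supports up to 4000 dimensions and requires pgvector 0.7.0 or later):

-- Hypothetical table and column names; halfvec_cosine_ops matches cosine-distance queries.
create index on documents using hnsw ((embedding::halfvec(3072)) halfvec_cosine_ops);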

Using the model

Use the model name you registered earlier to call the model just like any other Pipelines model. Here is an example of using the model to get an embedding:

select aidb.encode_text('my_local_ollama','I like it');

Pipelines takes care of all the connection management, leaving you to focus on your data and the model results.
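The returned embedding can also be used directly in similarity queries with pgvector operators. This sketch assumes a hypothetical documents table whose embedding vector(3072) column was populated with the same model, and again that the result of aidb.encode_text can be cast to the vector type:

-- Find the five rows closest to the query text by cosine distance.
select id, body
from documents
order by embedding <=> aidb.encode_text('my_local_ollama', 'I like it')::vector(3072)
limit 5;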

