Using an Nvidia NIM model in your environment

To use an Nvidia NIM that is hosted in your own environment, you first need an instance of the model. This tutorial shows how to configure an AWS-hosted instance running the Nvidia NIM model llama3-8b-instruct.

Prerequisites

  • A system capable of running the Nvidia CUDA Toolkit. For this tutorial, we suggest an EC2 g5.8xlarge instance with 1024 GB of gp3 storage running Ubuntu 24.04 LTS, although smaller instance sizes may also work. An AWS CLI sketch for provisioning such an instance follows this list.
  • An Nvidia NGC account. (If you don't have one, you can create one here)
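
If you prefer to provision the EC2 instance from the command line, the following AWS CLI sketch shows one way to do it. The AMI ID, key pair, and security group are placeholders you must supply, and the exact options may need adjusting for your account and region.

# Launch a g5.8xlarge instance with a 1024 GB gp3 root volume
aws ec2 run-instances \
  --image-id <ubuntu-24.04-ami-id> \
  --instance-type g5.8xlarge \
  --key-name <your-key-pair> \
  --security-group-ids <sg-allowing-ssh-and-port-8000> \
  --block-device-mappings '[{"DeviceName":"/dev/sda1","Ebs":{"VolumeSize":1024,"VolumeType":"gp3"}}]'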

Configuring the system

1. Install Nvidia CUDA Toolkit

Download and install the CUDA Toolkit from Nvidia's official page.
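
For reference, the network-repository installation on Ubuntu 24.04 looks roughly like the sketch below. Package and keyring names can change between releases, so treat Nvidia's page as authoritative.

# Add Nvidia's CUDA repository keyring for Ubuntu 24.04
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
# Install the toolkit and, if not already present, the GPU driver
sudo apt-get install -y cuda-toolkit cuda-drivers
# Reboot if the driver was newly installed, then verify the GPU is visible
nvidia-smi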

2. Install Docker

If your system does not have Docker installed, download and install Docker. Instructions for Ubuntu are here; instructions for other Linux platforms are also available.
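
A minimal installation sketch for Ubuntu follows, using Docker's convenience script. Note that running the NIM container with --runtime=nvidia in step 5 also requires the NVIDIA Container Toolkit; the commands below follow Nvidia's install guide but may change, so check the current instructions.

# Install Docker using the convenience script
curl -fsSL https://get.docker.com | sudo sh
# Add the NVIDIA Container Toolkit repository and install the toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the nvidia runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker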

3. Generate an NGC API Key

Obtain an API key from Nvidia NGC. This key is referred to as <NGC API KEY> in the following steps.

Log in to the NGC container registry:

docker login nvcr.io

Use the following credentials:

  • Username: $oauthtoken
  • Password: <NGC API KEY>
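
If you prefer a non-interactive login (for example, in a provisioning script), you can pipe the key to docker login. This sketch assumes the key is stored in the NGC_API_KEY environment variable; the username is the literal string $oauthtoken, so it is single-quoted.

export NGC_API_KEY="<NGC API KEY>"
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin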

4. Install Nvidia NGC CLI

Download and install the Nvidia NGC CLI from here.
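
The CLI ships as a zip archive for Linux; a rough install sketch is shown below. The download URL is a placeholder for the versioned link on the page above, and ngc config set prompts for the API key you generated.

# Download and unpack the Linux CLI archive
wget -O ngccli_linux.zip "<ngc-cli-download-url>"
unzip ngccli_linux.zip
# Put the binary on your PATH and configure it with your NGC API key
export PATH="$PATH:$(pwd)/ngc-cli"
ngc config set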

5. Run the NIM Model

Save the following script as a shell script and execute it (more information is available in Nvidia's documentation). Remember to substitute <NGC API KEY> with the API key you generated:

# Choose a container name
export CONTAINER_NAME=Llama3-8B-Instruct
# Set your NGC API key
NGC_API_KEY="<NGC API KEY>"
# Define the repository and image
Repository=nim/meta/llama3-8b-instruct
export IMG_NAME="nvcr.io/${Repository}:latest"
# Set local cache path
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
# Start the LLM NIM container
docker run -it --rm --name="$CONTAINER_NAME" \
  --runtime=nvidia \
  --gpus all \
  --shm-size=16GB \
  -e NGC_API_KEY="$NGC_API_KEY" \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
  -u "$(id -u)" \
  -p 8000:8000 \
  "$IMG_NAME"
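
The container runs in the foreground and takes some time to download the model and start. From a second terminal, you can poll the server before running the test below; the /v1/health/ready and /v1/models paths are what current NIM images expose, but check the container logs if your image differs.

# Wait until the NIM reports it is ready to serve requests
curl http://0.0.0.0:8000/v1/health/ready
# List the models the server is exposing
curl http://0.0.0.0:8000/v1/models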

6. Test the Deployment

Run the following command to verify the model is working:

curl -X POST "http://0.0.0.0:8000/v1/chat/completions" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "meta/llama3-8b-instruct",
        "messages": [
        {
            "role": "user",
            "content": "Tell me a story"
        }
        ]
    }'
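
The response comes back as an OpenAI-style JSON document. If jq is installed, you can pull out just the generated text; this is only a convenience sketch.

curl -s -X POST "http://0.0.0.0:8000/v1/chat/completions" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{"model": "meta/llama3-8b-instruct", "messages": [{"role": "user", "content": "Tell me a story"}]}' \
    | jq -r '.choices[0].message.content'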

Once you have verified the model is working, you can use it with EDB Postgres AI Accelerator.

Make a note of the public IP address of the EC2 instance, as you will need it to connect to the model from EDB Postgres AI Accelerator. This will be referred to as <NIM_HOST> in the following steps.
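
Before moving on, it is worth confirming that the Postgres host can actually reach the NIM over the network (for example, that the instance's security group allows inbound traffic on port 8000). A quick check from the Postgres host, with <NIM_HOST> substituted, might look like this:

# Run this from the machine where Postgres is installed
curl http://<NIM_HOST>:8000/v1/models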

Integrating the Model with AI Accelerator

1. Enable AI Accelerator in EDB Postgres AI

With an EDB Postgres Advanced Server, EDB Postgres Extended Server, or community PostgreSQL instance running, connect to the database and enable the AI Accelerator extension by running the following SQL command:

CREATE EXTENSION aidb CASCADE;
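
To confirm that the extension (and anything it pulls in via CASCADE) was created, you can query the pg_extension catalog. This sketch assumes you connect with psql; substitute your database name.

psql -d <your_database> -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'aidb';"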

2. Register the Model

Register the model you deployed, substituting <NIM_HOST> with the IP address you noted earlier:

SELECT aidb.create_model(
    'my_nim_llm',
    'nim_completions',
    '{"model": "meta/llama3-8b-instruct", "url": "http://<NIM_HOST>:8000/v1/chat/completions"}'::JSONB
);

3. Run the Model

Execute the following query to interact with the model:

SELECT aidb.decode_text('my_nim_llm', 'Tell me a short, one sentence story');
Output

                                       decode_text
-----------------------------------------------------------------------------------------
 As the clock struck midnight, a single tear fell from the porcelain doll's glassy eye.

Your output may vary, but if you receive a response like this, you have successfully used an Nvidia NIM model via EDB's AI Accelerator.

