How to implement faceted search with Django and PostgreSQL

January 19, 2023

What is faceted search?

Faceted search is a technique for searching or narrowing down search results by applying different filters. Faceted search is also called guided navigation, faceted browsing, or faceted navigation. 

Let’s understand faceted search by looking at an example.

Suppose a user is buying a mobile phone on an ecommerce website. When the user searches on the site for mobiles (just by clicking on the mobiles category), it shows some results that are not useful. With faceted search, the user selects the company name of the mobile manufacturer—say, Nokia, Samsung, or Apple. They can then drill down the results by clicking on Apple, for example, and can apply price and specifications filters to get more appropriate results.

 

Benefits of faceted search

Faceted search provides an effective way to refine search results until the user gets the right product. This is useful if we have a large product base on our website. If we have a few hundred products, then basic search or filters are more useful. But if we have a large product base, basic filters are not enough, even if our customer knows what they want.

  

Faceted search using Django and PostgreSQL

We can achieve faceted search in PostgreSQL by using the three following techniques:

  • Full-text search
  • JSON support
  • Window function 

In this post we will cover faceted search using full-text search.

Django uses the module django.contrib.postgres.search to perform PostgreSQL’s full-text search. Let’s quickly create a Django project and app using the django-admin utility:

 

1. Start a project:

$ django-admin startproject edb-blog

  This will create some directories:

  edb-blog/

    manage.py

    edb-blog/

        __init__.py

        settings.py

        urls.py

        asgi.py

        wsgi.py

 

2. Start an app:

$ django-admin startapp blog

 

This will create the directory blog:

blog/

    __init__.py

    admin.py

    apps.py

    migrations/

        __init__.py

    models.py

    tests.py

    Views.py

(Novice users can refer to the Django documentation for more details.)

Prerequisites

Edit /edb-blog/edb-blog/settings.py

 

1. Setup the database:

DATABASES = {

   'default': {

       'ENGINE': 'django.db.backends.postgresql',

       'NAME': '<db_name>',

       'USER': '<db_user>',

       'PASSWORD': '<password>',

       'HOST': '127.0.0.1',

       'PORT': '<db_port>',

   }

}

 

2. Add ‘django.contrib.postgres’ and ‘blog’ app in INSTALLED_APPS:

INSTALLED_APPS = [

   'django.contrib.admin',

   'django.contrib.auth',

   'django.contrib.contenttypes',

   'django.contrib.sessions',

   'django.contrib.messages',

   'django.contrib.staticfiles',

   'django.contrib.postgres',

   ‘blog’

…

]

 

Example

Implement a faceted search for blogs using Django and PostgreSQL. Assume that we have some blogs written and have entries in the PostgreSQL database. We can search for a blog by name or tagline from the database using some search term or string. 

For example, search all blogs having the text “recipes” in the blog name or its tagline.

I have created three model classes: Blog, Author, and Entry. Each class has its appropriate fields (the columns for each table). 

 

# models.py:

from django.db import models



class Blog(models.Model):

    name = models.CharField(max_length=100)

    tagline = models.TextField()

    lang = models.CharField(max_length=100, default='english')



    def __str__(self):

        return self.name



class Author(models.Model):

    name = models.CharField(max_length=200)

    email = models.EmailField()



    def __str__(self):

        return self.name



class Entry(models.Model):

    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)

    headline = models.CharField(max_length=255)

    body_text = models.TextField()

    pub_date = models.DateField(auto_now_add=True)

    mod_date = models.DateField(auto_now=True)

    authors = models.ManyToManyField(Author)

    n_comments = models.IntegerField(default=0)

    n_pingbacks = models.IntegerField(default=0)

    rating = models.IntegerField(default=5)



    def __str__(self):

        return self.headline

 

Simple Search

The basic approach to text searching is the LIKE keyword with the % wildcard operator:

SELECT * FROM blog WHERE name LIKE '%recipes%'

 

This query find all records for which the column contains the  ‘recipes’ string.

This can be achieved in Django using the filters contains or icontains (for case insensitivity). The limitation of using contains or icontains is that we cannot use an index on the columns. When we use contains or icontains Django ORM internally uses the LIKE query with the % operator:

>>> Blog.objects.filter(name__contains='recipes')



output:

<QuerySet [<Blog: recipes>]>

 

This query does a sequential scan on each row for columns containing ‘recipe’. Let’s check the explain plan for how this is executed using the Queryset.explain() method:

>>> Blog.objects.filter(name__contains='recipe').explain()



output:

"Seq Scan on blog_blog  (cost=0.00..12.00 rows=1 width=472)\n  Filter: ((name)::text ~~ '%recipe%'::text)"

 

Filters like contains, icontains, and startswith are useful when we want to do a simple keyword search. For this we need to know the column name that contains the word, but it cannot be used if we are searching for strings that appear in the middle of the text.

 

Search Vectors

Using simple search we are able to search in the table for specific columns or with a single field. But this is still limiting the search results if we are creating any ecommerce website where a user gives a single search string in the product filters and expects multiple appropriate results.

The filter below is used in a same way that we used contains or icontains but it produces a different query:

>>>Blog.objects.filter(tagline__search='recipes')



output:

<QuerySet [<Blog: veggie>]>

 

When we start using search filters Django ORM internally uses two functions: 

  • to_tsvector (wraps the columns for which we need to search the string)
  • plainto_tsquery (wraps the search term)

Let’s check the explain plan in Django:

>>> Blog.objects.filter(tagline__search='recipes').explain()



Output:

"Seq Scan on blog_blog  (cost=0.00..92.00 rows=1 width=472)\n  Filter: (to_tsvector(COALESCE(tagline, ''::text)) @@ plainto_tsquery('recipes'::text))"

 

We can see Django internally uses the to_tsvector and plainto_tsquery functions when we use a search(__search) filter.

 

Using annotate queryset

In case we need to search for multiple fields, then we need to annotate the queryset using Django’s SearchVector:

>>>from django.contrib.postgres.search import SearchVector

>>>Blog.objects.annotate(search=SearchVector('name','tagline')).filter(search='recipes')



output:

<QuerySet [<Blog: recipes>, <Blog: veggie>]>

 

Let’s check the explain plan in Django:

>>>Blog.objects.annotate(search=SearchVector('name','tagline')).filter(search='recipes').explain()

"Seq Scan on blog_blog  (cost=0.00..93.06 rows=1 width=504)\n  Filter: (to_tsvector((((COALESCE(name, ''::character varying))::text || ' '::text) || COALESCE(tagline, ''::text))) @@ plainto_tsquery('recipes'::text))"

 

We can observe that sequential scan is used again and according to the query plan it is slower than the sequential scan of the LIKE query used before.

Now, we have a nice approach to have faceted search using full-text search, except for performance. We can improve the performance in two ways:

  • Precompute the to_tsvector
  • Add an index

 

Precompute the to_tsvector and add an index

When we say we want to precompute the to_tsvector that means we know the fields on which we will search. To precompute to_tsvector Django provides the SearchVectorField we need to add to the Django model:

# models.py

from django.contrib.postgres.search import SearchVectorField



class Blog(models.Model):

   name = models.CharField(max_length=100)

   tagline = models.TextField()

   lang = models.CharField(max_length=100, default='english')

   search = SearchVectorField(null=True)



   def __str__(self):

       return self.name

 

Now, create the migrations and apply them:

$ python manage.py makemigrations

makemigrations will create migration_number.py files in your app’s migrations folders. These lines of code are nothing but statements that help in creating actual fields in your respective database, similar to SQL statements.

We have to apply migrations created by the above command. By this command migrations get reflected into the database.

   $ python manage.py migrate

 

Update the existing records:

>>> from django.contrib.postgres.search import SearchVector

>>> Blog.objects.all().update(search=SearchVector('name'))

 

The preferred index type for searching columns with text (or JSON) is a Generalized Inverted Index or GIN index.

Add the GIN index to model meta class:

# models.py

from django.contrib.postgres.indexes import GinIndex



class Blog(models.Model):

   name = models.CharField(max_length=100)

   tagline = models.TextField()

   lang = models.CharField(max_length=100, default='english')

   search = SearchVectorField(null=True)



   class Meta:

        indexes = [

            GinIndex(fields=['search']),

        ]



   def __str__(self):

       return self.name







>>> from blog.models import Blog

>>> Blog.objects.filter(search='recipes')



output:

<QuerySet [<Blog: recipes>]>

 

Let’s check the explain plan in Django:

>>> Blog.objects.filter(search='recipes').explain()



output:

"Seq Scan on blog_blog  (cost=0.00..4.15 rows=1 width=504)\n  Filter: (search @@ plainto_tsquery('recipes'::text))"

 

As we can see, when we apply an index on the columns, we get much better performance while searching for the records. 

In this way we have implemented faceted search using PostgreSQL by full-text search technique.

Tech stack used:

Python = 3.7

Django = 3.0.2

PostgreSQL = 12

 

References:

https://docs.djangoproject.com/en/3.0/ref/contrib/postgres/search/

https://www.postgresql.org/docs/current/textsearch.html

 

Share this