Multiple Column PostgreSQL Full Text Search

In Part 1 of this tutorial about full text search using PostgreSQL, we talked about using functions such as to_tsquery and websearch_to_tsquery, which builds upon PostgreSQL ability to decompose longer sentences to smaller chunks called lexemes using tsquery.

Now we’re going to expand our knowledge and implement multi-column full text search and add indexes for a more performant search. In this next step we want to search film titles AND film descriptions using our dvdrental database. In the stock database multi-column indexed search is already implemented but we’re going to scrap that for the sake of this tutorial:

drop index idx_fulltext;

alter table film drop column fulltext;

Creating a combined tsvector column

Doing a full text search through multiple columns is easy as combining the to_tsvector results of wanted columns into one concatenated result:

select to_tsvector(title) || ' ' ||  to_tsvector(description) as fulltext 
from film;

1000 rows

But in this approach each column has the same ‘importance’ in our search. To optimize this we want to add more importance to our title column in front of the description. Fortunately, Postgres has this covered with the setweight function! This functions accepts two parameters, a tsvector value and a letter from A to D which determines the importance, or weight, of the column. Combining all this we get the following query:

select setweight(to_tsvector(title), 'A') || ' ' ||  setweight(to_tsvector(description), 'B') as fulltext
from film;

1000 rows

We could add websearch_to_tsquery to the mix from the Part 1 but the query would be too messy and hard to read. Instead, lets add a new column which is a calculated tsvector column from the combination of title and description and has a search weight added for prioritizing results. The best candidate are generated columns:

A generated column is a special column that is always computed from other columns. Thus, it is for columns what a view is for tables. - PostgreSQL documentation

alter table film 
add fulltext tsvector 
generated always as	(
	setweight(to_tsvector('simple', title), 'A') || ' ' ||  setweight(to_tsvector('simple', description), 'B') :: tsvector
) stored;

Note the added simple value in our to_tsvector function. We add this because PostgreSQL throws an exception if we explicitly don’t define the ‘regional configuration’ or a dictionary which will be used for deriving lexemes. Other options are english, spanish etc. Read more about that in the official PostgreSQL documentation.

Now we can easily search through both of our columns, giving priority to the title column using websearch_to_tsquery:

select title, ts_rank(fulltext, websearch_to_tsquery('desert')) as rank
from film
where fulltext @@ websearch_to_tsquery('desert')
order by rank desc;

44 rows

Indexes

When we do a ANALYZE EXPLAIN we can see that the PostgreSQL query profiler is not happy:

Profiler is not happy

To optimize this query we can add an index to our fulltext column. The most optimized index for a tsvector data type is called GIN and the following query will create one:

CREATE INDEX search_fulltext ON film USING GIN(fulltext);

Now when we check the profiler we can see the results are much much better:

Much better!

We got from 1.609ms to 0.144ms, that’s a huge performance improvement!

What next?

In the next part we’ll cover how to setup a REST API direct on top of our PostgreSQL db and do full text search on our data.