API with NestJS #83. Text search with tsvector and raw SQL

JavaScript NestJS SQL

This entry is part 83 of 168 in the API with NestJS

It is very common to implement a feature of searching through the contents of the database. In one of the previous articles, we learned how to implement it in a simple way using pattern matching.

Today we take it a step further and learn about the data types explicitly designed for full-text search.

Text Search Types

PostgreSQL provides two data types that help us implement full-text search. They allow us to search through a collection of texts and find the ones that match a given query the most.

tsvector

The column stores the text in a format optimized for search. To parse a string into the format, we need the function.

When we look at the result of the above query, we notice a set of optimizations. One of the most apparent is grouping duplicates. Thanks to using the English dictionary, PostgreSQL noticed that “quick” and “quickly” are two variants of the same word.

Also, using the type can help us filter out stop words. They are very common, appear in almost every sentence, and don’t have much value when searching through text. Since we used the English dictionary in the above example, PostgreSQL filtered out the words “the” and “over”.

tsquery

The data type stores the text we want to search for. To transform a string into the format, we can use the function.

To check if a certain matches the , we need to use the operator.

true

When doing the above, we can play with the , , and boolean operators. For example, we can use the operator to make sure a given text does not contain a particular word.

true

Check out the official documentation for a good explanation of all available operators.

Another handy function is . It takes an unformatted phrase and inserts the operator between words. Because of that, it is an excellent choice to handle the input from the user.

true

Transforming the existing data

Let’s take a look at our  table.

Unfortunately, it does not contain a column. The most straightforward solution to the above problem is to convert our data to on the fly.

We can take the above even further and combine the contents of the and columns to search through both.

The crucial issue with the above approach is that it causes PostgreSQL to transform the text from every record of the database, which can take a substantial amount of time.

Instead, I suggest defining a generated column that contains the data transformed into the format.

If you want to know moure about generated columns, check out Defining generated columns with PostgreSQL and TypeORM

Since we use the keyword, we define a stored generated column that is saved in our database. PostgreSQL updates it automatically every time we modify the and columns.

We can now use our generated column when making a query to improve its performance drastically.

Ordering the results

So far, we haven’t paid attention to the order of the results of our query. Sorting the search results based on relevance could help the users quite a bit.

For example, we can indicate that the text from the column is more important than the column. To do that, let’s change how we create our column and use the function.

Let’s compare the two following posts after modifying the column:

The combined value of the and is the same in both posts. However, the takes into account that the column is more important.

Thanks to the above, we can now use the function to order our results based on the weight of each column.

Implementing full-text search with NestJS

Let’s create a migration first to implement the above functionalities in our NestJS project.

20221113211441_add_post_tsvector.ts

The crucial thing to notice above is that we are creating a Generalized Inverted Index (GIN). It works well with text searching and is appropriate when a column contains more than one value. Doing that can speed up our queries very significantly.

If you want to know more about indexes, check out API with NestJS #82. Introduction to indexes with raw SQL queries

In one of the previous parts of this series, we implemented the support for the query parameter.

posts.controller.ts

Finally, we need to modify the SQL queries that we make in our repository.

posts.repository.ts

Above, we use the keyset pagination that prevents us from sorting the results in a straightforward way. If you want to know more, check out API with NestJS #77. Offset and keyset pagination with raw SQL queries

Summary

In this article, we’ve gone through implementing full-text search in PostgreSQL. To do that, we had to learn about the and data types. In addition, we’ve created a stored generated column and a Generalized Inverted Index to improve the performance. By doing all of the above, we’ve created a fast search mechanism that is a good fit for many applications.

Series Navigation<< API with NestJS #82. Introduction to indexes with raw SQL queriesAPI with NestJS #84. Implementing filtering using subqueries with raw SQL >>
Subscribe
Notify of
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Choonghee Lee
Choonghee Lee
1 year ago

Thank You. It was very interesting 😀