API with NestJS #82. Introduction to indexes with raw SQL queries

November 7, 2022

This entry is part 82 of 187 in the API with NestJS

As our database grows, we need to put more and more emphasis on performance. A popular approach to dealing with this problem is by implementing indexes. In this article, we look into doing that with raw SQL in a project that does not use ORM.

Introducing indexes

So far, in this series, we’ve created a table for storing posts.

CREATE TABLE posts (

id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,

title text NOT NULL,

post_content text NOT NULL,

author_id int REFERENCES users(id) NOT NULL

)

We’ve also implemented getting a list of posts written by a particular author.

posts.repository.ts

import {

Injectable,

} from '@nestjs/common';

import DatabaseService from '../database/database.service';

import PostModel from './post.model';

@Injectable()

class PostsRepository {

constructor(private readonly databaseService: DatabaseService) {}

async getByAuthorId(

authorId: number,

offset = 0,

limit: number | null = null,

idsToSkip = 0,

) {

const databaseResponse = await this.databaseService.runQuery(

WITH selected_posts AS (

SELECT * FROM posts

WHERE author_id=$1 AND id > $4

ORDER BY id ASC

OFFSET $2

LIMIT $3

total_posts_count_response AS (

SELECT COUNT(*)::int AS total_posts_count FROM posts

WHERE author_id=$1 AND id > $4

)

SELECT * FROM selected_posts, total_posts_count_response

[authorId, offset, limit, idsToSkip],

);

const items = databaseResponse.rows.map(

(databaseRow) => new PostModel(databaseRow),

);

const count = databaseResponse.rows[0]?.total_posts_count || 0;

return {

items,

count,

};

}

// ...

}

export default PostsRepository;

The above method also implements pagination. If you want to know more, check out API with NestJS #77. Offset and keyset pagination with raw SQL queries

In our SQL query, we add WHERE author_id=$1. Because of that, PostgreSQL needs to scan the entire posts table to find matching records. The bigger the table, the more it affects the performance. Let’s run a simple query and see how fast it is

EXPLAIN ANALYZE

SELECT * FROM posts

WHERE author_id = 14

The crucial part of the above query plan is that PostgreSQL uses the parallel sequential scan that takes quite a lot of time. We can deal with this problem by adding an index.

Adding an index

An index can make some of our queries faster by organizing a table using a particular column. Let’s create a migration that adds an index using the author_id column.

1	npx knex migrate:make add_post_author_id_index

20221105181728_add_post_author_id_index.ts

import { Knex } from 'knex';

export async function up(knex: Knex): Promise<void> {

return knex.raw(`

CREATE INDEX post_author_id_index ON posts (author_id)

`);

}

export async function down(knex: Knex): Promise<void> {

return knex.raw(`

DROP INDEX post_author_id_index

`);

}

After running the npx knex migrate:latest command, our database is ready to go. Let’s rerun the same query and compare.

EXPLAIN ANALYZE

SELECT * FROM posts

WHERE author_id = 14

As we can see above, the execution time of our query dropped drastically. Now, PostgreSQL can run an index scan instead of a parallel sequential one.

When we create an index, PostgreSQL maintains a data structure organized using a certain column. We can imagine the index as key and value pairs. In the case of the above example, the keys are author ids, and the values point to particular posts.

author_id	post_id
1	1
2	2
2	3
2	4
3	5

To be more precise, the actual data structures PostgreSQL uses for indexing are more intricate. By default, PostgreSQL implements the B-tree data structure with each leaf poiting to a particular table row.

Thanks to having a data structure sorted by the author id, PostgreSQL can quickly find all posts written by a particular author. However, besides the obvious advantages when fetching data, indexes have some important downsides.

Indexes can speed up our SELECT queries but make our inserts and updates slower. This is because PostgreSQL needs to update the indexes each time we modify the data. Indexes also require additional space in our database.

Multi-column indexes

Making queries with multiple conditions is a popular case. Let’s take a look at our comments table:

CREATE TABLE comments (

id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,

content text NOT NULL,

post_id int REFERENCES posts(id) NOT NULL,

author_id int REFERENCES posts(id) NOT NULL,

deletion_date timestamptz

);

A common query would be selecting comments that are not deleted and are related to a particular post.

1 2	SELECT * FROM comments WHERE post_id=1 AND deletion_date IS NULL

Creating an index just on the post_id would speed up the above query. However, if we want to take it even further, we could create a multi-column index.

1 2	CREATE INDEX post_id_deletion_date_comments_index ON comments (post_id, deletion_date)

Unique indexes

In one of the previous articles in this series, we’ve defined a table of users.

CREATE TABLE users (

id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,

email text NOT NULL UNIQUE,

name text NOT NULL,

password text NOT NULL

)

We’ve decided to mark the email column as UNIQUE. Because of that, every time we insert a new row into the users table, PostgreSQL checks if the given email is already in the table.

The important thing is that creating a unique constraint causes PostgreSQL to create an index. Thanks to that, the database can quickly search through all existing records to determine if a given value is unique.

The index created through the UNIQUE keyword can also be helpful in the SELECT queries if the database decides it might cause a performance boost.

Types of indexes

All of the above indexes so far have used the B-tree structure. This is because it fits most use cases, but there are other options too.

Hash indexes

The hash indexes use the hash table under the hood, which might prove to be beneficial in some use cases.

1	CREATE INDEX posts_title_index ON posts USING hash (title)

Generalized Inverted Indexes (GIN)

For the GIN indexes to work, we might need to enable the pg_trim and btree_gin extensions.

1 2	CREATE EXTENSION pg_trgm; CREATE EXTENSION btree_gin;

Once we do the above, we can use the USING GIN keyword to create the index.

1	CREATE INDEX posts_title_index ON posts USING GIN (title)

Generalized Search Tree (GIST)

The GIST indexes might be preferable over GIN in some cases. They might be useful when indexing geometric data and implementing text search.

For it to work, we might need to enable the btree_gist extension.

1	CREATE EXTENSION btree_gist;

We can now use the USING GIST keyword.

1	CREATE INDEX posts_title_index ON posts USING GIST (title)

Block Range Indexes (BRIN)

When dealing with data types with a linear sort order, the Block Range Indexes might prove to be useful.

1	CREATE INDEX posts_title_index ON posts USING BRIN (title)

Summary

In this article, we’ve gone through the basics of indexes and considered their advantages and disadvantages. Then, we implemented an example that improves the performance of our SELECT queries.

Besides the regular indexes, we’ve also mentioned multi-column indexes and indexes using data structures different from the B-tree. Some of them are worth a separate article, so stay tuned!

Series Navigation<< API with NestJS #81. Soft deletes with raw SQL queriesAPI with NestJS #83. Text search with tsvector and raw SQL >>