API with NestJS #126. Improving the database performance with indexes and Kysely

September 25, 2023

This entry is part 126 of 187 in the API with NestJS

SQL indexes act like guides in our database and help us retrieve the data faster. The bigger our database is, the more emphasis we need to put on its performance. By using indexes, we can help our PostgreSQL database retrieve the data faster.

In this article, we learn what indexes are and how to create them with PostgreSQL, Kysely, and NestJS.

The idea behind indexes

Recently, we’ve added the articles table to our project.

articlesTable.ts

import { Generated } from 'kysely';

export interface ArticlesTable {

id: Generated<number>;

title: string;

article_content: string;

author_id: number;

}

In our repository, we implemented a method to get a list of articles written by an author with a particular id.

articlesTable.ts

import { Database } from '../database/database';

import { Article } from './article.model';

import { Injectable } from '@nestjs/common';

@Injectable()

export class ArticlesRepository {

constructor(private readonly database: Database) {}

async getByAuthorId(

authorId: number,

offset: number,

limit: number | null,

idsToSkip: number,

) {

const { data, count } = await this.database

.transaction()

.execute(async (transaction) => {

let articlesQuery = transaction

.selectFrom('articles')

.where('author_id', '=', authorId)

.where('id', '>', idsToSkip)

.orderBy('id')

.offset(offset)

.selectAll();

if (limit !== null) {

articlesQuery = articlesQuery.limit(limit);

}

const articlesResponse = await articlesQuery.execute();

const { count } = await transaction

.selectFrom('articles')

.where('author_id', '=', authorId)

.select((expressionBuilder) => {

return expressionBuilder.fn.countAll().as('count');

})

.executeTakeFirstOrThrow();

return {

data: articlesResponse,

count,

};

});

const items = data.map((articleData) => new Article(articleData));

return {

items,

count,

};

}

// ...

}

Above, we use pagination. If you want to learn more about it, check out API with NestJS #125. Offset and keyset pagination with Kysely

In the getByAuthorId method, we use where('author_id', '=', authorId). PostgreSQL needs to scan the entire articles table to find the matching records. Let’s visualize that using the EXPLAIN ANALYZE query.

EXPLAIN ANALYZE

SELECT * FROM articles

WHERE author_id = 1

In the query plan above, we can see that PostgreSQL does the sequential scan. While performing the sequential scan, the database reads all rows in the table one by one to find the data that matches the criteria. Sequential scans can be slow and resource-intensive, especially on large data sets. We can improve this situation by adding an index.

Introducing indexes

The SQL index acts similarly to a book’s index and helps the database find the information quickly. Let’s add an index on the author_id column to make the above query faster faster.

20230924200603_add_author_id_index.ts

import { Kysely } from 'kysely';

export async function up(database: Kysely<unknown>): Promise<void> {

await database.schema

.createIndex('article_author_id')

.on('articles')

.column('author_id')

.execute();

}

export async function down(database: Kysely<unknown>): Promise<void> {

await database.schema.dropIndex('article_author_id').execute();

}

Let’s run the above migration and try to analyze our SELECT query.

EXPLAIN ANALYZE

SELECT * FROM articles

WHERE author_id = 1

PostgreSQL considers quite a few factors when deciding whether to use an index. If it does not work for you, check out the VACUUM command.

The moment we create an index, PostgreSQL starts maintaining a data structure organized around a particular column. We can think of it as key and value pairs.

author_id	article_id
1	1
2	2
3	3
3	4
3	5
4	6

Under the hood, PostgreSQL uses a B-tree data structure where each leaf points to a particular row.

Now, PostgreSQL can quickly find all articles written by a particular author thanks to having a structure sorted by the author’s id. Unfortunately, indexes have some disadvantages, too.

Having to maintain an additional data structure takes extra space in our database. While it speeds up the queries that fetch data, maintaining indexes includes additional work for the database when inserting, updating, or deleting records from our database. Therefore, we must think our indexes through to avoid hurting the overall performance due to increased overhead.

Unique index

When working with Kysely, we’ve created the users table with the email column.

20230813165809_add_users_table.ts

import { Kysely } from 'kysely';

export async function up(database: Kysely<unknown>): Promise<void> {

await database.schema

.createTable('users')

.addColumn('id', 'serial', (column) => {

return column.primaryKey();

})

.addColumn('email', 'text', (column) => {

return column.notNull().unique();

})

.addColumn('name', 'text', (column) => {

return column.notNull();

})

.addColumn('password', 'text', (column) => {

return column.notNull();

})

.execute();

}

export async function down(database: Kysely<unknown>): Promise<void> {

await database.schema.dropTable('users').execute();

}

By marking the email column as unique, we tell PostgreSQL to look for email duplicates every time we insert or modify records in the articles table.

Since going through all elements in the table might be time-consuming, PostgreSQL creates indexes whenever we create a unique constraint. We can verify that with a simple SQL query.

SELECT tablename, indexname

FROM pg_indexes

WHERE tablename='articles';

Defining a primary key also creates a unique index. Because of that, the above screenshot also contains the articles_pkey index.

Thanks to that, the database can quickly search the existing users to determine if a particular email is unique. This can also benefit various SELECT queries and give them a performance boost.

EXPLAIN ANALYZE

SELECT * FROM users

WHERE email = 'john@smith.com'

Multi-column indexes

We can create queries that include multiple conditions. A good example is finding an article written by a particular user and containing a specific title.

const articles = await this.database

.selectFrom('articles')

.where('author_id', '=', 1)

.where('title', '=', 'Hello world!')

.selectAll()

.execute();

If we create an index on the author_id or the title columns, we would speed up the above query. However, we can move it up a notch and create a multi-column index.

export async function up(database: Kysely<unknown>): Promise<void> {

await database.schema

.createIndex('article_author_id_title')

.on('articles')

.columns(['author_id', 'title'])

.execute();

}

By creating a multi-column index, we can improve the performance of queries that use a specific combination of column values.

Index types

All of the indexes we mentioned so far used the B-tree structure. While it works fine for most cases, we also have other options.

Generalized Inverted Indexes (GIN)

The GIN indexes can help us when querying complex data types such as arrays or JSON. It might also come in handy when implementing text searching.

await database.schema

.createIndex('article_title')

.on('articles')

.using('gin')

.column('title')

.execute();

To ensure that the GIN index is available in our database, we might need to enable the btree_gin and pg_trim extensions.

1 2	CREATE EXTENSION btree_gin; CREATE EXTENSION pg_trgm;

Hash indexes

The hash SQL index uses hashes to locate specific values quickly. It might be a good fit for some use cases.

await database.schema

.createIndex('article_title')

.on('articles')

.using('hash')

.column('title')

.execute();

Block Range Indexes (BRIN)

The Block Range Indexes (BRIN) are designed to handle very large tables.

await database.schema

.createIndex('article_title')

.on('articles')

.using('brin')

.column('title')

.execute();

Generalized Search Tree (GIST)

The GIST indexes can be helpful for indexing geometric data and implementing text search. They might be preferable over GIN indexes in some cases.

await database.schema

.createIndex('article_title')

.on('articles')

.using('gist')

.column('title')

.execute();

For them to work, we might need to enable the btree_gist extension.

1	CREATE EXTENSION btree_gist;

Summary

In this article, we’ve covered the basic principles behind indexes and implemented examples that improved the performance of our SELECT queries. We also considered the disadvantages that come with an incorrect use of indexes.

The default type of index in PostgreSQL is a B-tree index. It works well for a wide range of queries and is the most commonly used index type. While that’s the case, we also mentioned other types of indexes, such as GIN and hash indexes.

Thanks to the above, we now know how to handle indexes when working with Kysely.

Series Navigation<< API with NestJS #125. Offset and keyset pagination with KyselyAPI with NestJS #127. Arrays with PostgreSQL and Kysely >>