- 1. API with NestJS #1. Controllers, routing and the module structure
- 2. API with NestJS #2. Setting up a PostgreSQL database with TypeORM
- 3. API with NestJS #3. Authenticating users with bcrypt, Passport, JWT, and cookies
- 4. API with NestJS #4. Error handling and data validation
- 5. API with NestJS #5. Serializing the response with interceptors
- 6. API with NestJS #6. Looking into dependency injection and modules
- 7. API with NestJS #7. Creating relationships with Postgres and TypeORM
- 8. API with NestJS #8. Writing unit tests
- 9. API with NestJS #9. Testing services and controllers with integration tests
- 10. API with NestJS #10. Uploading public files to Amazon S3
- 11. API with NestJS #11. Managing private files with Amazon S3
- 12. API with NestJS #12. Introduction to Elasticsearch
- 13. API with NestJS #13. Implementing refresh tokens using JWT
- 14. API with NestJS #14. Improving performance of our Postgres database with indexes
- 15. API with NestJS #15. Defining transactions with PostgreSQL and TypeORM
- 16. API with NestJS #16. Using the array data type with PostgreSQL and TypeORM
- 17. API with NestJS #17. Offset and keyset pagination with PostgreSQL and TypeORM
- 18. API with NestJS #18. Exploring the idea of microservices
- 19. API with NestJS #19. Using RabbitMQ to communicate with microservices
- 20. API with NestJS #20. Communicating with microservices using the gRPC framework
- 21. API with NestJS #21. An introduction to CQRS
- 22. API with NestJS #22. Storing JSON with PostgreSQL and TypeORM
- 23. API with NestJS #23. Implementing in-memory cache to increase the performance
- 24. API with NestJS #24. Cache with Redis. Running the app in a Node.js cluster
- 25. API with NestJS #25. Sending scheduled emails with cron and Nodemailer
- 26. API with NestJS #26. Real-time chat with WebSockets
- 27. API with NestJS #27. Introduction to GraphQL. Queries, mutations, and authentication
- 28. API with NestJS #28. Dealing in the N + 1 problem in GraphQL
- 29. API with NestJS #29. Real-time updates with GraphQL subscriptions
- 30. API with NestJS #30. Scalar types in GraphQL
As our system grows, certain queries on our database might fail us in terms of performance. One of the popular ways of dealing with this issue are indexes. This article explores how we can use them both through TypeORM and writing our own Postgres queries.
Introduction to indexes
When we store information on a disk, we do so with blocks of data. When searching through it, we need to scan the entirety of it to find matching entries. Iterating over it from cover to cover does not seem like the most performant approach.
In the second part of this series, we’ve created a table of posts.
One of the most common queries that we might want to run here is to find posts of a
1 | SELECT * FROM post WHERE "authorId" = 1; |
Unfortunately, this means scanning the entire post table to find matching entries. As our table grows, this is going to take more and more time. We can improve this with the help of indexes.
The job of indexes is to make our queries faster. It requires quite a bit of disk space by holding a copy of the indexed field values and pointing to the record they relate to.
1 | CREATE INDEX post_authorId_index ON post ("authorId"); |
Postgres folds column names that we don’t put in double quotes to lower case. This is why we need to write "authorId" above
We can imagine them as key and value pairs. In our case, the keys would be ids of the authors, and the values would be pointers to the posts. This way, Postgres has a lot easier time finding all of the posts of a certain author.
This information is stored in a separate data structure. Whenever we query the data, Postgres can use it under the hood to increase the speed.
Unfortunately, it takes a noticeable amount of space, and Postgres needs to keep it synchronized. Every time we insert or update the data, Postgres needs to update the indexes too. When thinking about adding indexes, we need to consider the pros and cons.
Indexes could benefit our update queries if they have some search conditions, though.
Types of scans
Please note that the above select query that we perform needs to extract the data because we want to access the posts’ contents. Postgres has a concept of index-only scans when the index contains all information required by a query. For example, when we count the number of posts, we might experience an even greater improvement in speed because Postgres does not need to read our table’s contents.
The type of scan is chosen under the hood by Postgres. We can inspect it using the EXPLAIN command.
1 | EXPLAIN SELECT * FROM post WHERE authorId = 1; |
We can expect one of a few different scans to be applied:
- sequential scan
- sequentially scanning all items of a table
- index scan
- uses indexes to increase the performance of the scan. Accesses the data from the index and uses it to fetch the data from the actual table
- index-only scan
- also uses indexes but only scans the index data structure
- bitmap scan
- a process between an index scan and sequential scan
For a more detailed comparison of various scan methods, check out this article.
Implementing Indexes with TypeORM
So far, in this series, we’ve been using TypeORM. We can use it to generate indexes for certain columns using the @Index() decorator.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | import { Entity, ManyToOne, PrimaryGeneratedColumn, Index } from 'typeorm'; import User from '../users/user.entity'; @Entity() class Post { @PrimaryGeneratedColumn() public id: number; // ... @Index() @ManyToOne(() => User, (author: User) => author.posts) public author: User; } export default Post; |
After firing up pgAdmin, we can see that TypeORM generated a name for our index.
We can avoid the above behavior by providing a name when using the @Index() decorator.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 | import { Entity, ManyToOne, PrimaryGeneratedColumn, Index } from 'typeorm'; import User from '../users/user.entity'; @Entity() class Post { @PrimaryGeneratedColumn() public id: number; // ... @Index('post_authorId_index') @ManyToOne(() => User, (author: User) => author.posts) public author: User; } export default Post; |
Multicolumn indexes
We might sometimes find ourselves making queries with multiple conditions, such as:
1 | SELECT * FROM post WHERE "authorId" = 1 AND "categoryId" = 2 |
The performance of the above might be improved by creating an index that uses two columns.
1 | CREATE INDEX post_authorId_columnId_index ON post ("authorId", "columnId"); |
TypeORM also supports indexes with multiple columns. To specify it, we need to use the @Index() decorator on the entity.
1 2 3 4 5 | @Entity() @Index(['postId', 'authorId']) class Post { // ... } |
Keep in mind that Postgres states in its documentation that multicolumn indexes should be used sparingly. Usually, an index on a single column is enough, and using more than three columns probably won’t be helpful.
Index types
Postgres has a few index types available under the hood. By default, it uses B-tree indexes that fit most cases. We also have a few other options:
- Generalized Inverted Indexes (GIN)
- designed to handle cases where the values contain more than one key – for example, arrays
- Hash indexes
- can only handle simple equality checks
- Block Range Indexes (BRIN)
- used for large tables with columns that have a linear sort order
- Generalized Search Try (GIST)
- useful for indexing geometric data and text search
Unfortunately, TypeORM does not support creating indexes with custom types. If we’d need one of the above types, we would have to write the query ourselves. For example, we could write a migration with it.
Summary
In this article, we’ve looked into the basics of creating indexes in the Postgres database. We’ve also briefly touched on the subject of various index types. To better understand how our database works, we also used the EXPLAIN command to see how effective our indexes are. Since indexes can substantially improve our application’s performance if used currently, they are definitely worth checking out.