Database normalization with PostgreSQL, NestJS, and Prisma

November 13, 2023

This entry is part 133 of 187 in the API with NestJS

Database normalization is a common topic in database design discussions. However, it’s usually explained using complicated terms, making it hard to understand. In this article, we’ll explain what normalization means and give examples using Prisma and PostgreSQL.

The goal of database normalization is to enhance the accuracy and minimize the duplication of our data. We do that by arranging the data according to specific rules known as normal forms. These rules might sound complex initially, but they are logical and easy to understand when explained in plain language.

1NF – first normal form

The main rule of the first normal form is that each field in a table should hold just a single piece of information. Let’s look at an example to understand this:

schema.prisma

model User {

id Int @id @default(autoincrement())

fullName String

address String

}

Upon closer inspection, it becomes clear that the entity mentioned violates the first normal form rule.

In the given example, we’re putting more than one piece of information in a single column. For instance, the fullName field holds both first and last names. It’s important to consider whether we’ll need to access just part of that information. A good example is when we want to find everyone with the last name “Williams”.

Watch out, because some names have prefixes. A common example is van in the name Ludwig van Beethoven.

Similarly, keeping it as one string could be fine if we only need to show the address. But, if we ever need to retrieve users from a specific country, we’d have a problem. That’s why we must carefully consider how we set up our columns. It might be wise to play it safe and divide the data into several fields.

schema.prisma

model User {

id Int @id @default(autoincrement())

firstName String

lastName String

buildingNumber String

apartmentNumber String

city String

zipCode String

country String

}

Designing for scalability

It’s important to steer clear of making groups of columns with very similar names and purposes. For instance, consider a situation with articles that can have multiple authors.

schema.prisma

model Article {

id Int @id @default(autoincrement())

title String

content String

firstAuthor User @relation(fields: [firstAuthorId], references: [id])

firstAuthorId Int

secondAuthor User @relation(fields: [secondAuthorId], references: [id])

secondAuthorId Int

}

If you’re interested in learning more about setting up relationships, take a look at API with NestJS #33. Managing PostgreSQL relationships with Prisma

This approach doesn’t scale well, unfortunately. For instance, if an article requires a third author, we’d need to add another column. Also, we’d have to check each column to find articles written by a specific user.

To address these problems, we can implement a many-to-many relationship.

schema.prisma

model User {

id Int @id @default(autoincrement())

firstName String

lastName String

buildingNumber String

apartmentNumber String

city String

zipCode String

country String

articles Article[]

}

model Article {

id Int @id @default(autoincrement())

title String

content String?

authors User[]

}

Using Array and JSON Columns with PostgreSQL

In this series, we have articles about how to store arrays, and the JSON data with PostgreSQL and Prisma. While using JSON and array columns can come in handy, they can be perceived as breaking the first normal form rule. However, they don’t automatically mean bad database design, because we should choose the appropriate tool for the task. In some cases, it’s better to use an array instead of creating separate tables, as this can make the database smaller and simplify queries by eliminating the need for joins.

2NF – second normal form

The first requirement of the second normal form is that the data must already comply with the first normal form. Additionally, the table should not have any partial dependencies.

Let’s imagine a table with a composite primary key. If some other column relies only on a part of our primary key, it means a partial dependency.

A primary key is a unique identifier for each record in a database table, ensuring that no two rows have the same key value. If the primary key consists of more than one column, we refer to it as a composite primary key.

schema.prisma

model ProductPurchase {

clientName String

productName String

productPrice Int

@@id([clientName, productName])

}

In our table, we have a composite primary key that consists of the clientName and productName columns. We can see a partial dependency because the productPrice depends only on one of the primary key fields – the productName. This can lead to duplicating our data.

Above, two rows describe people who purchased microwaves at 100 per unit. This shows a violation of the second normal form, leading to repeated information about the microwave’s price. This uses extra space in the database and creates problems when updating data. For instance, if the price of the microwave changes, it might require changes in several rows.

3NF – third normal form

To achieve the third normal form, a table must first satisfy the requirements of the second normal form. Additionally, all attributes should be functionally dependent only on the primary key.

schema.prisma

model ProductPurchase {

clientName String

productName String

productPrice Int

courierCarBrand String

courierCarModel String

@@id([clientName, productName])

}

In the above example, we add the courierCarBrand and courierCarModel properties that depend on each other. If a model is “Prius”, it tells us the brand is “Toyota”. Unfortunately, this breaks the third normal form.

Having the data set up like that could cause problems with keeping the information consistent. For example, if we change the brand of a car in just one row and not everywhere, the same car model might show up with different brands in different spots.

Key takeaways from 2NF and 3NF

The second and third normal forms tell us each table should describe one specific entity. We can make several related tables instead of putting all the data in one table. Also, it’s usually better to have an automatically created ID column than to use keys made from combining several columns.

Let’s use this knowledge to create multiple tables for the products, clients, card brands, and car models. We can then use those tables through a many-to-one relationship with the ProductPurchaseModel.

schema.prisma

model Product {

id Int @id @default(autoincrement())

name String

price Int

ProductPurchase ProductPurchase[]

}

model Client {

id Int @id @default(autoincrement())

name String

ProductPurchase ProductPurchase[]

}

model CarBrand {

id Int @id @default(autoincrement())

name String

CarModel CarModel[]

}

model CarModel {

id Int @id @default(autoincrement())

name String

brand CarBrand @relation(fields: [brandId], references: [id])

brandId Int

ProductPurchase ProductPurchase[]

}

model ProductPurchase {

id Int @id @default(autoincrement())

client Client @relation(fields: [clientId], references: [id])

product Product @relation(fields: [productId], references: [id])

courierCarModel CarModel @relation(fields: [courierModelId], references: [id])

clientId Int

productId Int

courierModelId Int

}

Thanks to this approach, the ProductPurchase model only contains IDs that relate to various other tables.

Summary

In this article, we’ve covered database normalization basics, using examples with Prisma and exploring different normal forms. Even though the official definitions can seem complex, the fundamental ideas behind data normalizations are pretty simple. Remembering these concepts will help ensure our database stays efficient and is scalable.

Series Navigation<< API with NestJS #132. Handling date and time in PostgreSQL with KyselyAPI with NestJS #134. Aggregating statistics with PostgreSQL and Prisma >>