Database normalization is a term often used when discussing database architecture design. Unfortunately, it is often presented with the use of complex definitions that make it difficult to grasp. In this article, we explain what normalization is and provide examples using TypeORM and PostgreSQL.
Database normalization aims to improve the integrity and reduce the redundancy of our data. It achieves that by organizing the data following sets of rules called the normal forms. Unfortunately, their definitions can seem quite complicated. Thankfully, when going through them, we notice that they make a lot of sense when explained in simple words.
First normal form (1NF)
The most fundamental rule of the first normal form is that every table field can contain only one item. Let’s consider the following example:
1 2 3 4 5 6 7 8 9 10 11 12 13 |
import { Column, Entity, PrimaryGeneratedColumn } from 'typeorm'; @Entity() export class User { @PrimaryGeneratedColumn() public id: number; @Column() public fullName: string; @Column() public address: string; } |
While the above entity might seem correct at first glance, one could speculate that it violates the first normal form.
In the above example, we store more than one field per column. For example, we keep both the first name and the last name in the fullName field. When considering a column like that, it is worth asking whether it ever makes sense to access the part of the value in our case. A good example is fetching every person with the last name “Smith”.
Some names have prefixes. A good example is van in the name Ludwig van Beethoven.
The same goes for the address. If our only use case is to display the address, storing it as a single string might be acceptable. However, if at any time a use-case for displaying users only from a specific country would pop up, we would be in trouble. Therefore, we need to think hard about our choices in terms of columns. Sometimes it might be a good idea to play it safe and split the data into multiple fields.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
import { Column, Entity, PrimaryGeneratedColumn } from 'typeorm'; @Entity() export class User { @PrimaryGeneratedColumn() public id: number; @Column() public firstName: string; @Column() public secondName: string; @Column() public buildingNumber: string; @Column() public apartmentNumber: string; @Column() public city: string; @Column() public zipCode: string; @Column() public country: string; } |
Also, we should avoid creating groups of very similar column names. For example, let’s imagine posts that can have more than one author.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
import { Column, Entity, ManyToOne, PrimaryGeneratedColumn, } from 'typeorm'; import User from '../users/user.entity'; @Entity() export class Post { @PrimaryGeneratedColumn() public id: number; @Column() public title: string; @Column() public content: string; @ManyToOne( () => User, (author: User) => author.posts ) public firstAuthor: User @ManyToOne( () => User, (author: User) => author.posts, { nullable: true } ) public secondAuthor?: User } |
If you want to know more about defining relationships, check out API with NestJS #7. Creating relationships with Postgres and TypeORM
The above approach is still not a good design because it lacks in terms of scalability. For example, if we ever have a post that needs a third author, we must create a third column. Also, if we want to find the posts written by a particular user, we would need to consider all of the columns.
To deal with those issues, we can use the many-to-many relationship.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
import { Column, Entity, JoinTable, ManyToMany, PrimaryGeneratedColumn, } from 'typeorm'; import User from '../users/user.entity'; @Entity() export class Post { @PrimaryGeneratedColumn() public id: number; @Column() public title: string; @Column() public content: string; @ManyToMany(() => User) @JoinTable() public authors: User[] } |
Array and JSON columns with PostgreSQL
This blog introduced the array data type and described how to store JSON in our PostgreSQL database. While both of the above can be very useful, using them usually violates the first normal form. Using arrays and JSON does not make for a lousy database design right off the bat, and we should use the right tool for the job. Sometimes it makes sense to create an array instead of defining separate tables to reduce the size of the database and simplify queries by avoiding joins.
Second normal form (2NF)
The first rule of the second normal form is that the data needs to meet the definition of the 1NF first. Also, the table can’t contain any partial dependencies.
We can spot the partial dependencies in a table with composite primary keys.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
import { Column, Entity, } from 'typeorm'; @Entity() export class ProductPurchase { @Column({ primary: true }) public clientName: string; @Column({ primary: true }) public productName: string; @Column() public productPrice: number; } |
In the above entity, we can see that we have a composite primary key that consists of the clientName and the productName. Unfortunately, the ProductPurchase has a partial dependency because the productPrice depends only on one of the primary key fields. This can lead to data redundancy:
In the above example, we can see that two people bought fridges that cost 200 per unit. Because we’ve violated the second normal form, the information about the price of the fridge is duplicated. This not only causes the database to take more space but introduces issues while modifying the data. If the fridge ever changes its price, we might need to modify it in multiple rows.
Third normal form (3NF)
For the data to be in the third normal form, it needs to meet the definition of the 2NF. Also, all of the attributes need to be functionally dependent only on the primary key.
To visualize this issue, let’s add more properties to our previous example.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
import { Column, Entity, } from 'typeorm'; @Entity() export class ProductPurchase { @Column({ primary: true }) public clientName: string; @Column({ primary: true }) public productName: string; @Column() public productPrice: number; @Column() public courierCarBrand: string; @Column() public courierCarModel: string; } |
Above, we’ve added two columns with the information about the courier that delivered the product. Unfortunately, the courierCarBrand and the courierCarModel depend on each other. Therefore, it breaks the third normal form and can cause various issues.
Having the above data structure can lead to data inconsistency. For example, we could lose the data integrity by changing the courierCarBrand in only one of the rows. By doing that, the particular car model could have a different brand in another row.
What we can learn from 2NF and 3NF
The second and third normal forms conclude that each table should focus on a particular entity. Instead of keeping all of the above data in one table, we can create multiple ones that relate to each other. Also, it often makes more sense to create an autogenerated id column instead of using composite primary keys.
To deal with the above issues, let’s create the entities for the product, the client, and the car.
1 2 3 4 5 6 7 8 9 10 11 |
@Entity() export class Product { @PrimaryGeneratedColumn() public id: number; @Column() public name: string; @Column() public price: number; } |
1 2 3 4 5 6 7 8 |
@Entity() export class Client { @PrimaryGeneratedColumn() public id: number; @Column() public name: string; } |
1 2 3 4 5 6 7 8 9 10 11 |
@Entity() export class CourierCar { @PrimaryGeneratedColumn() public id: number; @Column() public brand: string; @Column() public model: string; } |
Now, we need to establish many-to-one relationships when creating the ProductPurchase entity:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 |
import { Entity, ManyToOne, PrimaryGeneratedColumn, } from 'typeorm'; import { Client } from './client.entity'; import { Product } from './product.entity'; import { CourierCar } from './courierCar.entity'; @Entity() export class ProductPurchase { @PrimaryGeneratedColumn() public id: number; @ManyToOne(() => Client) public client: Client; @ManyToOne(() => Product) public product: Product; @ManyToOne(() => CourierCar) public courierCar: CourierCar; } |
Doing the above creates a product_purchase table that contains ids relating to various other tables.
Summary
In this article, we’ve gone through the basics of database normalization. We’ve used examples with TypeORM and learned about various normal forms. While the formal definitions might often appear complicated, the basic concepts of data normalization are rather straightforward. Keeping them in mind will make sure that our database avoids redundancy and is scalable.
Hi there. Great article as always!
I noticed a typo in the Second NF section. The primary key columns talked about in the first sentence (immediately after the code snippet) doesn’t correspond with the actual columns defined in the snippet.
In the final statement above “Summary”: Doing the above creates a product_purchase column . . . shouldn’t that be product_purchase table?
That’s a great catch, thank you.