Select API with NestJS #141. Getting distinct records with Prisma and PostgreSQL

January 15, 2024

This entry is part 141 of 148 in the API with NestJS

PostgreSQL allows us to filter a query’s results and ensure we don’t get duplicate rows. This can be helpful when your table has many rows where the data in the columns is the same. In this article, we explore two ways PostgreSQL helps us solve this problem. We also learn how to do that with Prisma and configure it to use the native database features instead of filtering the duplicates in memory.

The DISTINCT keyword

Let’s say we have the following model created with Prisma.

schema.prisma

model Address {

id Int @id @default(autoincrement())

street String

city String

country String

user User?

}

model User {

id Int @id @default(autoincrement())

email String @unique

name String

password String

address Address? @relation(fields: [addressId], references: [id])

addressId Int? @unique

}

Above, we define a relationship between the users and the addresses. If you want to know more, check out API with NestJS #33. Managing PostgreSQL relationships with Prisma

Let’s say that we have the following addresses in our table:

id	street	city	country
1	350 5th Ave	New York	United States
2	290 Bremner Blvd	Toronto	Canada
3	Westminster	London	United Kingdom
4	Great Russell St	London	United Kingdom
6	1600 Pennsylvania Ave NW	Washington, D.C.	United States
7	1000 5th Ave	New York	United States
8	Hill Rise	Richmond	United Kingdom
9	East Main Street	Richmond	United States

First, let’s create a query to get a list of all city names from our database.

1	SELECT city FROM "Address";

city
New York
Toronto
London
London
Washington, D.C.
New York
Richmond
Richmond

We can modify the above query using the DISTINCT keyword to get a list of unique cities.

1	SELECT DISTINCT city FROM "Address";

city
New York
Washington, D.C.
London
Toronto
Richmond

Using multiple columns

When we look closer at the list of addresses, we can see a city called Richmond, both in the UK and the United States. Fortunately, we can use the DISTINCT keyword to get a unique combination of the city and country.

1	SELECT DISTINCT city, country FROM "Address";

city	country
New York	United States
Toronto	Canada
London	United Kingdom
Washington, D.C.	United States
Richmond	United Kingdom
Richmond	United States

The DISTINCT ON keyword

With PostgreSQL, we can access another handy tool called the DISTINCT ON. It is similar to DISTINCT but allows us to retain other columns from the row.

1	SELECT DISTINCT ON (city) * FROM "Address";

id	street	city	country
3	Westminster	London	United Kingdom
7	1000 5th Ave	New York	United States
9	East Main Street	Richmond	United States
2	290 Bremner Blvd	Toronto	Canada
6	1600 Pennsylvania Ave NW	Washington, D.C.	United States

Above, we selected distinct cities, but we also see ids, countries, and streets.

The catch is that DISTINCT ON selected one row per each distinct value, but it is unpredictable. In the case of New York, it removed 350 5th Ave (the Empire State Building) and selected 1000 5th Ave (The Metropolitan Museum of Art).

The DISTINCT ON keyword makes the most sense when combined with the ORDER BY clause. If we order the results before applying the DISTINCT ON filter, we can be sure that PostgreSQL will always choose the first row based on the order we specified. Thanks to that, we can have predictable results.

Let’s use DISTINCT ON to find the first user who signed up with an address from a particular country.

First, we need to join the User and Address table.

SELECT "User".id AS userId, "Address".country AS country

FROM "User"

JOIN "Address" ON "User"."addressId" = "Address".id

If you want to learn more about joins, check out API with NestJS #90. Using various types of SQL joins

Let’s make sure each country is unique using the DISTINCT ON clause. By ordering the rows by the user ID, we ensure we get the users with the lowest ID possible. Thanks to that, we get the first users who signed up from a particular country.

SELECT DISTINCT ON(country) "User".id AS "firstRegisteredUserId", "Address".country AS country

FROM "User"

JOIN "Address" ON "User"."addressId" = "Address".id

ORDER BY "Address".country, "firstRegisteredUserId"

firstRegisteredUserId	country
3	Canada
4	United Kingdom
2	United States

Finding distinct values using Prisma

Prisma allows us to filter duplicate rows when using the findMany query.

const distinctCities = await this.prismaService.address.findMany({

distinct: ['city'],

select: {

city: true

}

})

We can take it further and replicate the example we wrote before using the DISTINCT ON keyword.

const countriesWithFirstUser = await this.prismaService.address.findMany({

include: {

user: true,

distinct: ['country'],

orderBy: {

user: {

id: 'asc',

});

How it works under the hood

Let’s configure Prisma to log all SQL queries to the console.

prisma.service.ts

import { Injectable, OnModuleInit } from '@nestjs/common';

import { PrismaClient, Prisma } from '@prisma/client';

@Injectable()

export class PrismaService

extends PrismaClient<Prisma.PrismaClientOptions, Prisma.LogLevel>

implements OnModuleInit

{

constructor() {

super({

log: [

{

emit: 'event',

level: 'query',

});

}

async onModuleInit() {

await this.$connect();

this.$on('query', (event) => {

console.log(`Query: ${event.query}`);

});

}

If you want to know more about logging with Prisma and NestJS, take a look at API with NestJS #113. Logging with Prisma

Let’s run our findMany query that finds all distinct cities and inspect the SQL query that Prisma makes under the hood:

1	SELECT "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1

Unfortunately, there is a catch. By default, Prisma makes separate SELECT queries and processes the data in memory when we use distinct. This can result in a performance that is not as good as a native SQL query that uses the DISTINCT keyword. Since Prisma 5.7.0, we can affect that by using a preview feature.

schema.prisma

generator client {

provider = "prisma-client-js"

previewFeatures = ["nativeDistinct"]

}

When we add the nativeDistinct preview feature and run npx prisma generate, Prisma starts using the DISTINCT ON clause.

1	SELECT DISTINCT ON ("public"."Address"."city") "public"."Address"."id", "public"."Address"."city" FROM "public"."Address" WHERE 1=1 OFFSET $1

Unfortunately, it only works with unordered queries right now. If we run our findMany query that finds the first users who signed up from a particular country, Prisma still parses the data in memory instead of using DISTINCT ON:

SELECT "public"."Address"."id", "public"."Address"."street", "public"."Address"."city", "public"."Address"."country" FROM "public"."Address" LEFT JOIN "public"."User" AS "orderby_1" ON ("orderby_1"."addressId") = ("public"."Address"."id") WHERE 1=1 ORDER BY "orderby_1"."id" ASC OFFSET $1;

SELECT "public"."User"."id", "public"."User"."email", "public"."User"."name", "public"."User"."password", "public"."User"."addressId" FROM "public"."User" WHERE "public"."User"."addressId" IN ($1,$2,$3) OFFSET $4;

Hopefully, Prisma will improve its distinct implementation soon and use the native DISTINCT ON in more cases. We can track the progress of this issue on GitHub.

Summary

PostgreSQL offers efficient methods to filter out duplicate rows in queries, which helps handle large datasets with repetitive data. In this article, we explored the DISTINCT and DISTINCT ON keywords in PostgreSQL and compared them using various examples.

We also learned how to filter out duplicate values through Prisma. Moreover, we’ve looked under the hood and learned that Prisma filters out the data in memory by default. Finally, we’ve learned how to change the default behavior and configure Prisma to rely more on the native features built into PostgreSQL.

Series Navigation<< API with NestJS #140. Using multiple PostgreSQL schemas with PrismaAPI with NestJS #142. A video chat with WebRTC and React >>