API with NestJS #141. Getting distinct records with Prisma and PostgreSQL

NestJS SQL

This entry is part 141 of 177 in the API with NestJS

PostgreSQL allows us to filter a query’s results and ensure we don’t get duplicate rows. This can be helpful when your table has many rows where the data in the columns is the same. In this article, we explore two ways PostgreSQL helps us solve this problem. We also learn how to do that with Prisma and configure it to use the native database features instead of filtering the duplicates in memory.

The DISTINCT keyword

Let’s say we have the following model created with Prisma.

schema.prisma

Above, we define a relationship between the users and the addresses. If you want to know more, check out API with NestJS #33. Managing PostgreSQL relationships with Prisma

Let’s say that we have the following addresses in our table:

idstreetcitycountry
1350 5th AveNew YorkUnited States
2290 Bremner BlvdTorontoCanada
3WestminsterLondonUnited Kingdom
4Great Russell StLondonUnited Kingdom
61600 Pennsylvania Ave NWWashington, D.C.United States
71000 5th AveNew YorkUnited States
8Hill RiseRichmondUnited Kingdom
9East Main StreetRichmondUnited States

First, let’s create a query to get a list of all city names from our database.

city
New York
Toronto
London
London
Washington, D.C.
New York
Richmond
Richmond

We can modify the above query using the keyword to get a list of unique cities.

city
New York
Washington, D.C.
London
Toronto
Richmond

Using multiple columns

When we look closer at the list of addresses, we can see a city called Richmond, both in the UK and the United States. Fortunately, we can use the keyword to get a unique combination of the city and country.

citycountry
New YorkUnited States
TorontoCanada
LondonUnited Kingdom
Washington, D.C.United States
RichmondUnited Kingdom
RichmondUnited States

The DISTINCT ON keyword

With PostgreSQL, we can access another handy tool called the . It is similar to  but allows us to retain other columns from the row.

idstreetcitycountry
3WestminsterLondonUnited Kingdom
71000 5th AveNew YorkUnited States
9East Main StreetRichmondUnited States
2290 Bremner BlvdTorontoCanada
61600 Pennsylvania Ave NWWashington, D.C.United States

Above, we selected distinct cities, but we also see ids, countries, and streets.

The catch is that selected one row per each distinct value, but it is unpredictable. In the case of New York, it removed 350 5th Ave (the Empire State Building) and selected 1000 5th Ave (The Metropolitan Museum of Art).

The keyword makes the most sense when combined with the clause. If we order the results before applying the filter, we can be sure that PostgreSQL will always choose the first row based on the order we specified. Thanks to that, we can have predictable results.

Let’s use to find the first user who signed up with an address from a particular country.

First, we need to join the  and table.

If you want to learn more about joins, check out API with NestJS #90. Using various types of SQL joins

Let’s make sure each country is unique using the clause. By ordering the rows by the user ID, we ensure we get the users with the lowest ID possible. Thanks to that, we get the first users who signed up from a particular country.

firstRegisteredUserIdcountry
3Canada
4United Kingdom
2United States

Finding distinct values using Prisma

Prisma allows us to filter duplicate rows when using the query.

We can take it further and replicate the example we wrote before using the keyword.

How it works under the hood

Let’s configure Prisma to log all SQL queries to the console.

prisma.service.ts

If you want to know more about logging with Prisma and NestJS, take a look at API with NestJS #113. Logging with Prisma

Let’s run our query that finds all distinct cities and inspect the SQL query that Prisma makes under the hood:

Unfortunately, there is a catch. By default, Prisma makes separate queries and processes the data in memory when we use . This can result in a performance that is not as good as a native SQL query that uses the keyword. Since Prisma 5.7.0, we can affect that by using a preview feature.

schema.prisma

When we add the preview feature and run , Prisma starts using the clause.

Unfortunately, it only works with unordered queries right now. If we run our query that finds the first users who signed up from a particular country, Prisma still parses the data in memory instead of using :

Hopefully, Prisma will improve its  implementation soon and use the native in more cases. We can track the progress of this issue on GitHub.

Summary

PostgreSQL offers efficient methods to filter out duplicate rows in queries, which helps handle large datasets with repetitive data. In this article, we explored the and keywords in PostgreSQL and compared them using various examples.

We also learned how to filter out duplicate values through Prisma. Moreover, we’ve looked under the hood and learned that Prisma filters out the data in memory by default. Finally, we’ve learned how to change the default behavior and configure Prisma to rely more on the native features built into PostgreSQL.

Series Navigation<< API with NestJS #140. Using multiple PostgreSQL schemas with PrismaAPI with NestJS #142. A video chat with WebRTC and React >>
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments