SQL DISTINCT: A Detailed Guide With Examples


SQL, or Structured Query Language, is an essential tool for managing and manipulating data within relational databases.

One common challenge database users face is working with duplicate values within tables. To address this issue, SQL provides a powerful keyword called DISTINCT, which allows users to filter out duplicate records and display only unique values in their query results.

The DISTINCT keyword is used within the SELECT statement to eliminate duplicate rows in the result set, ensuring that each value is displayed only once, regardless of how many times it might appear in the table. This feature is particularly useful when working with large datasets where identifying unique values can be time-consuming and cumbersome.

Furthermore, by using DISTINCT in a query, database users can quickly and efficiently obtain the desired results without having to deal with unnecessary duplicates.


In this guide, we will provide an overview of how to use DISTINCT in SQL queries, including SELECT DISTINCT examples. You will learn the syntax for applying DISTINCT on single and multiple columns, how it differs from the ALL keyword, and how to combine DISTINCT with other SQL clauses like WHERE, ORDER BY, and GROUP BY.

By the end, you will have a solid understanding of this simple but powerful tool for managing duplicate data. So let’s get started.

Understanding SQL DISTINCT

The DISTINCT clause is used in SQL SELECT statements to return only the unique values from a specified column or set of columns in a table. It is used to eliminate duplicate rows and display a unique list of values.

1. Basic SELECT DISTINCT Syntax

The basic syntax for using SELECT DISTINCT is as follows:

SELECT DISTINCT column_name
FROM table_name;

For example, consider the table invoices from the Chinook sample database:

Invoices table from the Chinook database

To retrieve the unique customer IDs from the table, you can use the following SQL query:

SELECT DISTINCT CustomerId
FROM invoices;
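
If you do not have the Chinook database at hand, the same idea can be tried with a minimal sketch that uses Python's built-in sqlite3 module and an in-memory table. The rows below are made up for illustration, and the column is named CustomerId as in the later Chinook queries in this guide.

```python
import sqlite3

# In-memory database with a tiny, made-up invoices table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER)"
)
conn.executemany(
    "INSERT INTO invoices (CustomerId) VALUES (?)",
    [(1,), (2,), (1,), (3,), (2,)],
)

# DISTINCT collapses repeated CustomerId values into one row each.
rows = conn.execute(
    "SELECT DISTINCT CustomerId FROM invoices ORDER BY CustomerId"
).fetchall()
unique_customers = [row[0] for row in rows]
print(unique_customers)  # [1, 2, 3]
```

Five invoice rows go in, but only three customer IDs come out, because two customers appear twice.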

2. DISTINCT vs ALL

The DISTINCT keyword contrasts with the ALL keyword. By default, a SELECT query returns all rows, including duplicates. When you use the DISTINCT keyword, the query filters the result set to display only unique values, removing any duplicates.

The ALL keyword, on the other hand, is the default behavior. Writing SELECT ALL explicitly returns the same rows as a plain SELECT, duplicates included.

Let’s see an example of SELECT DISTINCT using the employees table in the HR Data sample database. The table contains records for 39 employees.

employees table from the HR Data sample database

If we want to find out the unique salaries of the employees, here’s how it can be done using SELECT DISTINCT:

SELECT DISTINCT salary
FROM employees;

This query retrieves all unique salary values from the employees table with a total of 32 results after excluding the duplicate salary records.

Output of the SELECT DISTINCT query

When using the ALL keyword, instead of DISTINCT:

SELECT ALL salary
FROM employees;

This query retrieves all 39 salary values from the employees table, including duplicate records.

Output of the SELECT ALL query
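
The difference between the two keywords is easy to check on a small table. The sketch below uses Python's sqlite3 module with five made-up salary rows (not the HR Data sample) and compares the row counts returned by SELECT ALL, a plain SELECT, and SELECT DISTINCT.

```python
import sqlite3

# Made-up employees table: five rows, three different salaries.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees (salary) VALUES (?)",
    [(4200,), (6000,), (4200,), (9000,), (6000,)],
)

all_rows = conn.execute("SELECT ALL salary FROM employees").fetchall()
plain_rows = conn.execute("SELECT salary FROM employees").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT salary FROM employees").fetchall()

# ALL and a plain SELECT keep duplicates; DISTINCT removes them.
print(len(all_rows), len(plain_rows), len(distinct_rows))  # 5 5 3
```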

3. Using DISTINCT with Multiple Columns

When using the SQL SELECT DISTINCT statement, you are essentially telling the database to return only unique values in the specified columns. The database engine compares each row in the result set, and if it finds a row that matches another row in all specified columns, it discards the duplicate row.

When you want to retrieve unique combinations of values from multiple columns, simply list the columns after the DISTINCT keyword, separated by commas:

SELECT DISTINCT column1, column2, ...
FROM table_name;

For example, when you run the following SQL statement for the employees table:

SELECT DISTINCT first_name, last_name, department_id
FROM employees;

The result set would include all 39 rows because each row is distinct when comparing all the specified columns.

SQL SELECT DISTINCT finds distinct records by comparing all columns specified in the query.

However, when running a similar query for a particular column:

SELECT DISTINCT department_id
FROM employees;

The result set would include only 11 rows since there are only 11 unique entries in the department_id column.

Output of SQL SELECT DISTINCT statement for column department_id
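
To see how the column list changes what counts as a duplicate, here is a small sketch with Python's sqlite3 module. The four employee rows are invented for illustration: one pair differs only in department_id, and one pair is identical in every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (first_name TEXT, last_name TEXT, department_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ana", "Lee", 10),
        ("Ana", "Lee", 20),    # same name, different department: kept
        ("Ben", "Ortiz", 10),
        ("Ben", "Ortiz", 10),  # identical in all three columns: removed
    ],
)

# Rows are compared across every selected column.
combos = conn.execute(
    "SELECT DISTINCT first_name, last_name, department_id FROM employees"
).fetchall()

# With a single column, only that column's values are compared.
departments = conn.execute(
    "SELECT DISTINCT department_id FROM employees"
).fetchall()

print(len(combos), len(departments))  # 3 2
```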

4. Using DISTINCT with Expressions

You can also use the DISTINCT keyword with expressions or calculations involving one or more columns. In this case, the expression is placed after the DISTINCT keyword:

SELECT DISTINCT expression
FROM table_name;

For example, let’s assume that a products table contains columns for price and quantity.

Dummy products table

If you want to retrieve the distinct total amounts based on the expression quantity * price, you can use the following SELECT query:

SELECT DISTINCT quantity * price AS total_amount
FROM products;

In this query, the expression quantity * price calculates the total amount for each row of the products table, and the DISTINCT keyword ensures that only unique total amounts are returned.

SELECT DISTINCT can be used to find unique values of expressions calculated using multiple columns
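
As a quick check that DISTINCT is applied to the computed value rather than to the underlying columns, the sketch below builds a made-up products table with Python's sqlite3 module. Three rows have different prices and quantities but the same product of the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price INTEGER, quantity INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("pen", 2, 6),       # 12
        ("notebook", 4, 3),  # 12
        ("stapler", 12, 1),  # 12
        ("ruler", 3, 2),     # 6
    ],
)

# Duplicates are removed after quantity * price is evaluated for each row.
rows = conn.execute(
    """
    SELECT DISTINCT quantity * price AS total_amount
    FROM products
    ORDER BY total_amount
    """
).fetchall()
totals = [row[0] for row in rows]
print(totals)  # [6, 12]
```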

SQL DISTINCT Uses in Queries

In this section, we will explore the use cases and examples of SQL DISTINCT.

Specifically, we will discuss the following:

  1. Selecting Distinct Values
  2. Removing Duplicate Rows
  3. Counting Distinct Values

1. How to Select Distinct Values Using SQL DISTINCT

The SQL DISTINCT keyword is useful when you want to eliminate duplicate values and list only the different ones.

The syntax for selecting unique values is as follows:

SELECT DISTINCT column1, column2, ...
FROM table_name;

For example, you can select unique values from the salary column in an employees table like this:

SELECT DISTINCT salary
FROM employees
ORDER BY salary DESC;

This will return 32 unique or distinct values in the column salary, arranged in descending order:

Selecting Distinct Values using SQL SELECT DISTINCT

2. How to Remove Duplicate Rows Using SQL DISTINCT

When using SQL databases, it is common to encounter duplicate rows in the result set. The SELECT DISTINCT clause can help filter out these duplicates and return a clean set of unique rows. This applies to both single and multiple-column selections.

For instance, suppose you want to view the list of cities from which you receive orders. You can write the query as follows:

SELECT DISTINCT BillingCity
FROM invoices;

This query returns 52 unique rows containing the cities from where orders were received.

Removing Duplicate Rows using SELECT DISTINCT

3. How to Count Distinct Values Using SQL DISTINCT

You can also use the DISTINCT keyword within aggregate functions, such as COUNT(), to count the number of unique values present in a column.

The syntax for this operation is:

SELECT COUNT(DISTINCT column) AS AggregateName
FROM table_name;

If you want to count the number of distinct cities in the invoices table, you can use the following query:

SELECT COUNT(DISTINCT BillingCity) AS UniqueCities
FROM invoices;

This query will return the count of unique cities:

Counting unique values by combining the DISTINCT clause with COUNT()
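
It can help to compare COUNT(DISTINCT column) with the other forms of COUNT() side by side. The following sketch uses Python's sqlite3 module and a made-up invoices table that includes one NULL city. COUNT(*) counts every row, COUNT(BillingCity) skips NULLs, and COUNT(DISTINCT BillingCity) skips NULLs and duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, BillingCity TEXT)"
)
# Illustrative rows: Berlin appears twice and one city is missing (NULL).
conn.executemany(
    "INSERT INTO invoices (BillingCity) VALUES (?)",
    [("Berlin",), ("Paris",), ("Berlin",), ("Oslo",), (None,)],
)

total_rows, non_null_cities, unique_cities = conn.execute(
    """
    SELECT COUNT(*), COUNT(BillingCity), COUNT(DISTINCT BillingCity)
    FROM invoices
    """
).fetchone()

print(total_rows, non_null_cities, unique_cities)  # 5 4 3
```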

Combining SQL DISTINCT with Other Clauses

In this section, we will explore how to combine the SQL DISTINCT clause with various other clauses.

We will go over the following:

  1. DISTINCT with WHERE
  2. DISTINCT with ORDER BY
  3. DISTINCT with GROUP BY
  4. DISTINCT with JOIN

1. How to Combine SQL DISTINCT With WHERE

The DISTINCT clause can be combined with the WHERE clause to filter and show only unique values based on specific conditions. This is useful when you want to return a list of unique values that meet certain criteria.

For example, let’s see how you can get the list of unique order totals for a particular city from the invoices table:

SELECT DISTINCT Total
FROM invoices
WHERE BillingCity = 'Berlin';

In this example, the query returns unique values from Total where the value in BillingCity is equal to ‘Berlin’.

SELECT DISTINCT can be combined with the WHERE clause to filter rows before finding distinct values

2. How to Combine SQL DISTINCT With ORDER BY

When you want to order the unique results from a query, you can use DISTINCT with the ORDER BY clause. This combination ensures the distinct results are displayed in the desired sorting order.

Let’s extend the example above in which we find out the list of unique order totals coming from Berlin:

SELECT DISTINCT Total
FROM invoices
WHERE BillingCity = 'Berlin'
ORDER BY Total ASC;

In this example, the query fetches the unique values from Total for Berlin and sorts them in ascending order.

SELECT DISTINCT can be combined with the ORDER BY clause to sort the unique results in ascending or descending order
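
The filter, de-duplicate, and sort steps can be tried together in a small sketch with Python's sqlite3 module. The invoice rows are invented for illustration, and the city is passed as a query parameter instead of being written into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (BillingCity TEXT, Total REAL)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [
        ("Berlin", 1.98),
        ("Berlin", 3.96),
        ("Berlin", 1.98),  # duplicate total for Berlin
        ("Paris", 1.98),   # filtered out by WHERE
        ("Berlin", 13.86),
        ("Paris", 8.91),
    ],
)

# WHERE narrows the rows, DISTINCT removes repeated totals, ORDER BY sorts the rest.
rows = conn.execute(
    """
    SELECT DISTINCT Total
    FROM invoices
    WHERE BillingCity = ?
    ORDER BY Total ASC
    """,
    ("Berlin",),
).fetchall()
berlin_totals = [row[0] for row in rows]
print(berlin_totals)  # [1.98, 3.96, 13.86]
```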

3. How to Combine SQL DISTINCT With GROUP BY

DISTINCT and GROUP BY both collapse repeated values, so they are often used to answer similar questions. GROUP BY already returns one row per unique combination of the grouping columns, so adding DISTINCT on those columns is redundant and can be omitted. GROUP BY is the better fit when you also want to apply aggregate functions to each group.

Here’s an example:

SELECT BillingCity, COUNT(CustomerId) AS TotalCustomers
FROM invoices
GROUP BY BillingCity
ORDER BY TotalCustomers DESC;

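In this example, the query groups the rows of the invoices table by BillingCity and counts the non-NULL CustomerId values in each group, which is the number of invoices per city rather than the number of unique customers. To count unique customers per city, use COUNT(DISTINCT CustomerId) instead. The output contains one row per distinct BillingCity value along with its count.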

GROUP BY can be used to apply aggregate functions on a column based on another column
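
One detail worth checking in a grouped query is the difference between COUNT(CustomerId) and COUNT(DISTINCT CustomerId). The sketch below, using Python's sqlite3 module and made-up invoice rows, computes both per city. The first counts rows with a non-NULL CustomerId, while the second counts unique customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (BillingCity TEXT, CustomerId INTEGER)")
# Illustrative rows: customer 1 has two Berlin invoices, customer 3 has two Paris invoices.
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [("Berlin", 1), ("Berlin", 1), ("Berlin", 2), ("Paris", 3), ("Paris", 3)],
)

per_city = conn.execute(
    """
    SELECT BillingCity,
           COUNT(CustomerId) AS InvoiceRows,
           COUNT(DISTINCT CustomerId) AS UniqueCustomers
    FROM invoices
    GROUP BY BillingCity
    ORDER BY BillingCity
    """
).fetchall()

print(per_city)  # [('Berlin', 3, 2), ('Paris', 2, 1)]
```

Each city appears once without any DISTINCT on BillingCity, because GROUP BY already produces one row per group.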

4. How to Combine SQL DISTINCT with JOIN

You can combine DISTINCT with JOINs to retrieve unique values from multiple tables that meet specific joining conditions. This is helpful when you want to avoid duplicate values from the joined tables.

Here’s an example:

SELECT DISTINCT t1.column1, t2.column2
FROM table1 t1
JOIN table2 t2 ON t1.column3 = t2.column3;

Let’s break this down:

SELECT DISTINCT can be applied to multiple tables using the JOIN clause

In this example, the query returns the unique combinations of column1 from table1 and column2 from table2 for rows where column3 matches in both tables.
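
For a concrete version of the generic query above, here is a sketch with Python's sqlite3 module. The customers and invoices tables, their columns, and their rows are hypothetical. A customer with several invoices in the same city produces repeated rows after the join, and DISTINCT collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables standing in for table1 and table2.
conn.execute("CREATE TABLE customers (CustomerId INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE invoices (CustomerId INTEGER, BillingCity TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "Berlin"), (1, "Berlin"), (1, "Paris"), (2, "Oslo"), (2, "Oslo")],
)

join_sql = """
    SELECT {keyword} c.Name, i.BillingCity
    FROM customers c
    JOIN invoices i ON i.CustomerId = c.CustomerId
    ORDER BY c.Name, i.BillingCity
"""
joined_all = conn.execute(join_sql.format(keyword="ALL")).fetchall()
joined_distinct = conn.execute(join_sql.format(keyword="DISTINCT")).fetchall()

print(len(joined_all))   # 5
print(joined_distinct)   # [('Ana', 'Berlin'), ('Ana', 'Paris'), ('Ben', 'Oslo')]
```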

How to Handle NULL Values in SQL DISTINCT

In this section, we will explore how to handle NULL values in SQL DISTINCT. We will do this in two steps: first understanding how DISTINCT treats NULL values, and then filtering them out.

1. Distinct and Null Values in the Result Set

When using the DISTINCT keyword in SQL, it’s important to understand how it handles NULL values. The purpose of DISTINCT is to eliminate duplicate records in a result set and only return unique values.

However, NULL values are not dropped. For the purpose of duplicate elimination, DISTINCT treats all NULLs as equal to one another, so the result set contains a single NULL no matter how many NULLs the column holds.

Consider the following example using the employees table:

SELECT DISTINCT manager_id 
FROM employees;

The result set includes a single NULL row (displayed as NaN in the output below):

Using SELECT DISTINCT on data with NULL values

2. Filtering Null Values

To exclude NULL values from the result set when using DISTINCT, you can use the WHERE clause with the IS NOT NULL operator.

Let’s consider the same example as above, and run the following query to filter out NULL values:

SELECT DISTINCT manager_id 
FROM employees
WHERE manager_id IS NOT NULL;

As a result, the NULL values will not appear in the result set:

Use the WHERE column_name IS NOT NULL with SELECT DISTINCT to filter out null values
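
Both behaviors can be confirmed on a small table. The sketch below uses Python's sqlite3 module with made-up manager_id values, two of which are NULL (None in Python). The unfiltered query keeps exactly one NULL, and the filtered query keeps none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, manager_id INTEGER)"
)
# Illustrative rows: two employees have no manager.
conn.executemany(
    "INSERT INTO employees (manager_id) VALUES (?)",
    [(100,), (None,), (100,), (None,), (101,)],
)

# Without a filter, the two NULLs collapse into a single NULL row.
with_nulls = [
    row[0] for row in conn.execute("SELECT DISTINCT manager_id FROM employees")
]

# IS NOT NULL removes NULL rows before DISTINCT is applied.
without_nulls = [
    row[0]
    for row in conn.execute(
        """
        SELECT DISTINCT manager_id
        FROM employees
        WHERE manager_id IS NOT NULL
        ORDER BY manager_id
        """
    )
]

print(with_nulls.count(None))  # 1
print(without_nulls)           # [100, 101]
```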

Final Thoughts

The DISTINCT keyword is an indispensable tool for efficient data retrieval and analysis.

As we have seen in this guide, its simple addition to SQL queries can vastly simplify working with duplicate information in large databases.

By using DISTINCT, you can cut through redundant data and focus on the unique values relevant to your analysis.

Also, by combining it thoughtfully with filtering, sorting, and grouping clauses, you can fine-tune your results to contain precisely the distinct information you need.

And, by carefully incorporating the SELECT DISTINCT statement into properly constructed queries, you can streamline your database management within your organization.

Remember, better queries also make data analysis work better, not just in SQL databases, but in tools like Power BI too.

Frequently Asked Questions

In this section, you’ll find answers to some frequently asked questions about working with SQL DISTINCT.


What is the difference between DISTINCT and UNIQUE in SQL?

In SQL, the DISTINCT keyword is used in the SELECT statement to eliminate duplicate rows from the result set, allowing only unique values to be displayed. The UNIQUE keyword is a constraint applied to a table column that ensures that all values in that column are unique across all rows in the table.

How do I combine DISTINCT and IN operators in SQL?

To combine DISTINCT and IN operators in an SQL query, you can use a subquery to select distinct values and then use the IN operator to filter results in the main query. For example:

SELECT employee_id, first_name, last_name, department_id
FROM employees
WHERE department_id IN (
  SELECT DISTINCT department_id
  FROM employees
  WHERE manager_id = 100
);

This query returns all employees who work in departments where at least one employee has a manager_id of 100. This would be useful if, for example, you were interested in looking at departments managed by a specific manager.

Here is the output:

By using SELECT DISTINCT with the IN operator, you can create complex and powerful queries
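
A point worth verifying here is that IN only tests membership, so duplicates inside the subquery do not change which rows the outer query returns. The sketch below, using Python's sqlite3 module and a made-up employees table, runs the query with and without DISTINCT in the subquery and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        first_name TEXT,
        department_id INTEGER,
        manager_id INTEGER
    )
    """
)
# Illustrative rows: departments 10 and 20 each contain someone managed by 100.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Ana", 10, 100),
        (2, "Ben", 10, 101),
        (3, "Cy", 20, 100),
        (4, "Di", 30, 102),
        (5, "Eli", 10, 100),
    ],
)

query = """
    SELECT employee_id
    FROM employees
    WHERE department_id IN (
        SELECT {keyword} department_id
        FROM employees
        WHERE manager_id = 100
    )
    ORDER BY employee_id
"""
ids_with_distinct = [r[0] for r in conn.execute(query.format(keyword="DISTINCT"))]
ids_without_distinct = [r[0] for r in conn.execute(query.format(keyword="ALL"))]

print(ids_with_distinct)                          # [1, 2, 3, 5]
print(ids_with_distinct == ids_without_distinct)  # True
```

In this sketch DISTINCT mainly documents intent: the subquery is meant to produce a set of department IDs.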

How does DISTINCT work with NULL values in SQL?

Although NULL = NULL does not evaluate to true in ordinary comparisons, the DISTINCT keyword treats NULL values as duplicates of one another. Therefore, if there are multiple NULL values in the column being selected, DISTINCT will return only one NULL.

Sam McKay, CFA
Sam is Enterprise DNA's CEO & Founder. He helps individuals and organizations develop data driven cultures and create enterprise value by delivering business intelligence training and education.
