How to Remove Duplicates in Tableau: 2 Best Methods

Duplicate data is a common challenge when analyzing or presenting information in Tableau.

Clean, accurate, and reliable data is important as it is the foundation on which effective analysis and decision-making are built.

To remove duplicates in Tableau, you can use either a Level of Detail (LOD) calculation or a table calculation such as WINDOW_MIN(). With an LOD calculation, you create a calculated field that returns only one value per unique identifier. With WINDOW_MIN(), you create a calculated field that flags duplicates and then set up a filter to hide them from the view.

But, wait, there’s more; let us show how to do this properly.

Read on.

This article will provide you with a solid understanding of how to remove duplicates to help ensure the integrity of your data and the reliability of your analysis.

Let’s get into it!

What Are LOD Calculations?

Level of Detail (LOD) calculations in Tableau are used to perform aggregations that are independent of the view's level of granularity.

Essentially, they allow you to control which dimensions are used in an aggregation, enabling more complex and customized calculations.

LOD calculations come in three main types: FIXED, INCLUDE, and EXCLUDE, each serving different use cases.

These calculations offer a powerful way to create more insightful visualizations and analyses by specifying the level of granularity you need without altering the structure of your data or view.

2 Methods to Remove Duplicates in Tableau

In this section, we’ll discuss two ways of removing duplicates in Tableau.

Specifically, we’ll go over the following:

  1. Using LOD calculations to remove duplicates
  2. Using window functions to remove duplicates

1. How to Use LOD Calculations to Remove Duplicates

The type of LOD calculation used for removing duplicates is the FIXED calculation.

Suppose we have a dataset with Customer ID, Order ID, and Amount fields that contains duplicate values.

To remove duplicates from this dataset, go to the worksheet and create a new calculated field.

Enter the following formula into the field:

{ FIXED [Customer ID]: MIN([Amount]) }

Click OK to create the field.

Now, drag and drop Customer ID and Order ID to the Rows shelf, and drag the calculated field you just created (rather than the raw Amount field) to Text on the Marks card.

The view now shows a single Amount value for each Customer ID. Note that the calculation does not delete rows from the data source; it returns one value per Customer ID so that duplicate records no longer distort the result.
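To make the logic of the FIXED expression concrete outside Tableau, here is a minimal Python sketch of what { FIXED [Customer ID]: MIN([Amount]) } computes. The sample rows and the helper name `fixed_min` are hypothetical and are not taken from the dataset used in this article; the sketch only illustrates that the expression yields one value per Customer ID and attaches it to every row for that customer.

```python
# Hypothetical sample rows: (customer_id, order_id, amount).
# Order 101 appears twice, simulating a duplicated record.
rows = [
    ("C1", 101, 250.0),
    ("C1", 101, 250.0),
    ("C2", 102, 400.0),
]


def fixed_min(rows):
    """Emulate { FIXED [Customer ID]: MIN([Amount]) }: one minimum per customer."""
    minimum = {}
    for customer_id, _order_id, amount in rows:
        if customer_id not in minimum or amount < minimum[customer_id]:
            minimum[customer_id] = amount
    return minimum


per_customer = fixed_min(rows)

# Every row receives its customer's value; no rows are deleted.
annotated = [(c, o, a, per_customer[c]) for c, o, a in rows]
```

The number of rows stays the same; what changes is that each customer now carries a single, stable value that the view can display.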

Next, we’ll explore another method to remove duplicates in Tableau. We recommend method 1 for your projects, however, as it is more straightforward than the second method.

2. How to Use Window Functions to Remove Duplicates

You can also use window functions (table calculations such as WINDOW_MIN()) to remove duplicate values in Tableau.

Let’s say you have a dataset with a ‘CustomerID’, ‘OrderID’, and ‘Amount’, and you want to remove duplicate ‘OrderID’ entries.

To achieve the above, you can follow the steps given below:

1. Identify Unique Values

First, you’ll need a unique identifier for each row.

You can create a calculated field named ‘UniqueOrderID’ with the formula STR([OrderID]).

2. Create a Calculated Field to Flag Min Index

Create another calculated field, and name it ‘MinIndex’.

In this calculated field, use the formula: WINDOW_MIN(INDEX())

3. Add Fields to Worksheet

Drag ‘OrderID’, ‘CustomerID’, and ‘Amount’ into the Rows or Columns shelf to set up your worksheet.

4. Place MinIndex into the View

Now drag your ‘MinIndex’ calculated field into the view.

You will likely place it on the same Rows or Columns shelf, but it could also go on Detail on the Marks card, depending on your needs.

5. Configure ‘Compute Using’ Option

Click the ‘MinIndex’ pill on the shelf, and under ‘Compute Using’, select the fields you want to be unique (in this case, ‘UniqueOrderID’).

6. Create Filter

Create a calculated field named ‘FilterUnique’ with the formula INDEX() = [MinIndex].

7. Apply Filter

Drag ‘FilterUnique’ into the Filters shelf and set it to ‘True’.

Now, your view should only display the first occurrence of each unique ‘OrderID’. The duplicate entries are effectively removed from the view.
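As a rough illustration of the steps above, the following Python sketch emulates the combination of INDEX(), WINDOW_MIN(INDEX()), and the ‘FilterUnique’ filter. It assumes the table calculation is partitioned by OrderID, which is our reading of the ‘Compute Using’ step; in Tableau, the actual result depends on how addressing and partitioning are configured. The sample rows and the function name `keep_first_per_order` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in view order: (order_id, customer_id, amount).
rows = [
    (101, "C1", 250.0),
    (101, "C1", 250.0),  # duplicate of the first row
    (102, "C2", 400.0),
    (103, "C1", 75.0),
    (103, "C1", 75.0),   # duplicate of the fourth row
]


def keep_first_per_order(rows):
    """Keep rows whose position in their OrderID partition equals the partition minimum."""
    # INDEX(): 1-based position of each row inside its partition.
    counters = defaultdict(int)
    indexed = []
    for row in rows:
        counters[row[0]] += 1
        indexed.append((counters[row[0]], row))

    # WINDOW_MIN(INDEX()): smallest index found in each partition.
    min_index = {}
    for index, row in indexed:
        min_index[row[0]] = min(index, min_index.get(row[0], index))

    # FilterUnique: INDEX() = [MinIndex], filtered to True.
    return [row for index, row in indexed if index == min_index[row[0]]]


unique_rows = keep_first_per_order(rows)
```

Only the first row of each OrderID partition survives the filter, which mirrors the idea of showing the first occurrence of each order in the view.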

Final Thoughts

In wrapping up, understanding how to remove duplicated entries in Tableau is crucial for any data analyst or business professional. This is because dirty data leads to faulty insights.

Duplicates can skew your numbers, making metrics like averages or sums inaccurate. By learning techniques to remove or flag duplicates, you’re ensuring that your analyses are built on a foundation of reliable data.

When you clean up duplicates, your data becomes a clearer, more accurate reflection of reality. This makes your visualizations more compelling and your insights more actionable.

Frequently Asked Questions

In this section, you’ll find some frequently asked questions you may have when removing duplicates in Tableau.

How do I eliminate duplicate dimensions in Tableau?

To eliminate duplicate dimensions in Tableau, first identify the unique identifier in your dataset (such as an ID or date/time field).

Then, create a fixed expression to remove duplicates based on this unique identifier.

For example, you can use { FIXED [Unique Identifier]: MAX([Dimension]) } to retain only the maximum value of the dimension for each unique identifier.

What is the method to identify and remove duplicate rows?

To identify and remove duplicate rows in your dataset, examine your data source to locate any duplicate records.

Next, use Tableau Prep or other data cleansing tools to clean your data and remove duplicates.

You can also create calculated fields with FIXED expressions in Tableau to eliminate duplicate records based on unique identifiers.

How can I prevent double counting in Tableau?

Preventing double counting in Tableau can be achieved by ensuring clean data and using the appropriate aggregations in your calculations.

If needed, create a calculated field using the SUM([Measure]) / COUNTD([Unique Identifier]) formula. Because COUNTD() counts each identifier only once, the measure is divided by the number of unique identifiers rather than by the number of rows, so repeated identifiers do not inflate the count.
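To see how these quantities differ on duplicated data, here is a small Python sketch. The rows are hypothetical, and the de-duplicated total at the end is only comparable in spirit to summing a FIXED expression per order; it is shown for contrast, not as the formula above.

```python
# Hypothetical rows: (order_id, amount). Order 101 is duplicated.
rows = [(101, 250.0), (101, 250.0), (102, 400.0)]

# Plain SUM([Amount]) counts order 101 twice.
naive_sum = sum(amount for _, amount in rows)

# COUNTD([Order ID]): each order is counted once.
distinct_orders = {order_id for order_id, _ in rows}

# SUM([Amount]) / COUNTD([Order ID])
sum_per_distinct = naive_sum / len(distinct_orders)

# De-duplicated total: one amount per order, then summed.
amount_per_order = {}
for order_id, amount in rows:
    amount_per_order[order_id] = min(amount, amount_per_order.get(order_id, amount))
deduplicated_sum = sum(amount_per_order.values())
```

In this sample, the ratio is an amount per distinct order, while the de-duplicated total counts each order's amount once, so it is worth checking which of the two your analysis actually needs.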

What’s the process for grouping and summing without duplicates?

First, to group and sum data without duplicates in Tableau, exclude duplicates by using a calculated field containing a FIXED expression based on your unique identifier.

Then, use this calculated field in your view to group the data and calculate the sum as desired. Remember to use the appropriate aggregations like SUM(), COUNTD(), or AVG() as needed.

How can I use Tableau LOD calculations to address duplicates?

Tableau LOD (Level of Detail) calculations can address duplicate data by defining which dimensions should be considered when aggregating measures.

You can use a FIXED expression to specify the dimension(s) in your dataset that uniquely identify each row.

For example, use { FIXED [Dimension1], [Dimension2]: SUM([Measure]) } to calculate the sum of a measure without duplicating data based on the specified dimensions.
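A minimal Python sketch of the same idea, assuming two hypothetical dimensions (region and category) and a hypothetical sales measure: the FIXED expression corresponds to summing the measure once per combination of the listed dimensions, regardless of what else is in the view.

```python
from collections import defaultdict

# Hypothetical rows: (region, category, sales).
rows = [
    ("East", "Chairs", 100.0),
    ("East", "Chairs", 50.0),
    ("East", "Desks", 200.0),
    ("West", "Chairs", 80.0),
]


def fixed_sum(rows):
    """Emulate { FIXED [Dimension1], [Dimension2]: SUM([Measure]) }."""
    totals = defaultdict(float)
    for region, category, sales in rows:
        totals[(region, category)] += sales
    return dict(totals)


totals = fixed_sum(rows)
```

Each (region, category) pair ends up with exactly one total, which is the value the expression returns for every row sharing that pair.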

Is there a formula or filter to obtain unique values in Tableau?

You can obtain unique values in Tableau by using calculated fields and filters. Create a calculated field using the COUNTD([Dimension]) function to count the distinct values of the dimension.

Then, apply a filter to this calculated field to show only the unique values in your view.

You can also use FIXED expressions with COUNTD() or MAX() functions to ensure that unique values are displayed.
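For intuition, the distinction between counting distinct values and listing them can be sketched in Python. The values below are hypothetical; `len(set(...))` plays the role of COUNTD([Dimension]), and the de-duplicated list is what a view shows when the dimension itself is placed on a shelf.

```python
# Hypothetical dimension values, in data-source order.
values = ["Chairs", "Desks", "Chairs", "Lamps", "Desks"]

# Analogous to COUNTD([Dimension]): the number of distinct values.
distinct_count = len(set(values))

# Distinct values in first-seen order (dict keys preserve insertion order).
unique_values = list(dict.fromkeys(values))
```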
