In this tutorial, you’ll learn a DAX optimization technique using the UNION function. This tutorial will compare the performance of the UNION function versus Power Query and other techniques for combining tables.
The UNION function combines two tables into a single table by combining their rows and removing any duplicate rows.
The example in this tutorial is a Sales table that’s been split into three tables for each year (2007, 2008, 2009).
The UNION Function In DAX
Open DAX Studio and launch the Server Timings and Query Plan.
The DAX code below is used as the example:
After you run the code, go to the Server Timings tab. You can see that the code takes a total of 5.4 seconds to execute. Most of the execution time was spent in the Storage Engine. Each of these three queries contains the same operation, but the years are different.
After these three queries are returned by the Storage Engine, the Formula Engine performs a UNION, which can be seen in the Logical Query Plan. The Query Plan also performs a SUMX to compute for Total Sales.
In the Physical Query Plan, you can see the three data caches containing the operations performed for each year in the Sales table.
Comparing DAX Performance
To build the comparison, let’s create a measure for Total Sales using the original Sales table.
If you run this code and go to the Server Timings tab, you can see that this is significantly more efficient than the UNION function.
The UNION function consumes 97.9 kilobytes whereas the new measure only consumes 1KB.
When you use UNION, the DAX Engine returns a huge amount of data cache back to the Formula Engine. This puts a lot of pressure on the RAM.
So if you’re dealing with a data model containing more than a million rows, it isn’t recommended to use the UNION function
UNION DAX Function Alternatives
Instead of using UNION, you can use three SUMX functions for this example. That is, one SUMX function for each year.
When you run this code and go to the Server Timings tab, you can see that the execution time is significantly lower. It went from 5,400 ms to 33 ms while still executing three different queries.
Another way is to append the Data Model table in Power Query or in the data source itself.
But the best option is to create a calculated table that will perform the union of those three tables. But this option has its pros and cons.
Using a calculated table allows you to save the overall processing time as opposed to performing an append in Power Query. However, this will cause the database size in VertiPaq to increase.
When you use a calculated table, you’re creating a new table that combines the three tables. But these three split tables will continue to remain in the database.
In this example, the three tables consume a total of 25 MB. If you create a calculated table, it will consume another 25 MB in your RAM.
For sensitive cases like these, you need to make an educated decision on what would be the best course of action to take.
***** Related Links *****
Power BI Modulo And Integer-Divide DAX Functions
DAX Function COLUMNSTATISTICS In Power BI
The Difference Between ALL And ALLSELECTED DAX Functions In Power BI
Conclusion
The UNION function in DAX combines two tables into one, making it easier to analyze data. It works by taking two tables as input and returning a new table that contains all the rows from both input tables. However, this function has its own limitations.
The UNION function can be slower than other methods of combining data, especially if the input tables are large. It can also consume more space in the storage as opposed to other alternatives.
Whenever a situation requires tables to be combined, it’s important to first weigh the pros and cons before deciding to use the UNION function. Your alternatives include the SUMX function, a Data Model table, or a calculated table.
All the best,
Enterprise DNA Experts