There is still a lot of confusion about the difference between SUM vs SUMX in Power BI. This is key knowledge that users have to master because both functions can be used across different scenarios, but there are cases where one is more efficient than the other. You may watch the full video of this tutorial at the bottom of this blog.
I’m going to focus on one example here that would show the distinction between the two. But before I jump into that example, it is important to understand the difference between an aggregating function and an iterating function.
Aggregators Vs Iterators
When it comes to DAX, there are two types of calculation engines – the aggregators and iterators.
Iterating functions go through every single row of a table to add logic to each of these rows.
Aggregating functions look at the entire column left over after the context is placed in a formula. From there, a single aggregation is done for the entire column at a single time.
SUM As An Aggregator
How is SUM used as an aggregator?
In this example, I’m going to compute for the Total Revenue in the sample data given.
The context is always important here. In this case, each specific date is the context of each specific result.
If I dig deeper into this table, it will show that there is a direct relationship flowing from the Date going into the Sales table.
Then if I look at the data working underneath this model, this is how everything fits together.
So the relationship is linked to the Order Date column here. Once specific dates from this column are filtered, the corresponding results are shown under the Revenue column.
From there, the SUM would just do one big calculation of the filtered results.
SUMX As An Iterator
Now, I’m going to use SUMX on the same sample data so that you can see the difference. I can actually calculate for that Revenue without touching the Revenue column.
When the SUMX function is used, it will always ask for a table. Note that either a physical table or a virtual table can be used here.
To come up with the Revenue, I’m going to choose the Sales table. Then, I’ll place an expression, which can be a measure or a specific column from that table into this formula so that it can start running logic on every row. The expression, as explained here, returns the sum of an expression evaluated for each row of the table.
Since the sample data includes the Order Quantity, I’m going to use that here to get the Total. I’m also going to use the Unit Price.
Once I drag that formula into the report, the results are exactly the same.
Of course, they’re both showing the same results because they are both deriving data from the same two columns – the Order Quantity and the Unit Price.
Why use the SUMX if it yields the same result as the SUM anyway?
With the SUMX, the logic is applied not just to an entire column, but to every single row within that column. In fact, I could delete the Revenue column and still be able to retrieve specific results.
So imagine that logic being applied at every row. It multiplies the Order Quantity and Unit Price for the 1st row then saves that into the memory. It does the same thing to the 2nd row and all the other rows after that, saving each individual result.
This means that at the end, what’s being used to calculate the SUMX is not the physical data on the table, but the results saved in the memory.
Hopefully I was able to explain the main difference between SUM vs SUMX in Power BI, especially to those who are still getting the hang of what Power BI can really do.
The rule of thumb is if it’s a simple, straightforward scenario that can be solved by an aggregating function, use SUM. But if more advanced logic is required, use SUMX.
SUMX will also be useful in cases where you have thousands to millions of rows. As long as the tables and columns referenced in your measures are there, using iterating functions would make the process more efficient.
All the best,