GroupBy In Power BI: An Implied Function In VertiPaq

by | Power BI

This tutorial will discuss about the storage engine and implied GroupBy in Power BI. You’ll learn what’s happening behind the performance and see potential bottlenecks. This will allow you to create ideas and strategies to mitigate those problems for faster DAX performance.

The storage engine queries are expressed in a language called xmSQL. It’s not a real programming language but a visual representation of what the storage engine is doing. It allows the users to see what the calculation is doing in the background when running various queries.

Implied GroupBy In Power BI

The first thing you need to look at is the implied GroupBy which is found in every xmSQL.

This is a sample query that has the EVALUATE and VALUES function:

groupby in power bi

The query gets all the values for the year from the DimCalendar table. You can see that it returned 10 rows of data.

The process is shown in the SQL statement where the query selects the Calendar Year from the DimCalendar and then groups it by year. The statement implied a GroupBy function in the calculation because there’s nothing else in the query. Implied GroupBy is more evident when you use aggregations.

Aggregations Inside Storage Engine

To have an idea of what aggregation is, go to DAX Studio and turn on the Server Timings and Query Plan. Next, run this sample measure:

groupby in power bi

After running the query, you can see that unlike the previous sample query that has one scan, this measure has two. This is because the first sample only has the values of DimCalendar Year. This sample query has the Calendar Year and an aggregation.

The first scan is the actual query.

It selects the DimCalendar Year and sums the Sales Quantity from the Sales Data table, which is following the relationships of the data model.

groupby in power bi

In the data model, the DimCalendar table follows the Purchase Date column of the Fact table to the Date column of the Calendar table.

The second scan ensures that all missing values are accounted for.

You can see in the results that the Sales amount only appear from 2015 to 2018. However, it still shows the years 2012 to 2014 even if they’re blank.

Conclusion

The storage engine in DAX Studio shows you the whole process behind a calculation. It allows you to monitor each function and see possible hits that may slow the performance of your DAX. With such information, you can create solutions to mitigate potential bottlenecks in the execution of your DAX code.

If your DAX is running slow, look at the storage engine and find out what you can do to improve the performance of your report.

Nick

***** Related Links *****
VertiPaq Storage Engine On Multi-Threaded Environment
VertiPaq Compression And Storage In Power BI
What Is DAX Studio In Power BI Desktop

***** Related Course Modules *****
Ultimate Beginners Guide To DAX
Mastering DAX Calculations
DAX Formula Deep Dives

***** Related Support Forum Posts *****
Context Transition – Behind The Scenes
DAX – Switch Optimisation
Data Model Optimisation
For more vertipaq support queries to review see here….

Related Posts