Materialization Ideas For Data Caches In DAX Studio

by | Power BI

This tutorial will discuss about materialization for data caches in DAX Studio. This post will show how it affects your calculation while generating results with your query. You’ll learn which materialization is better in order to maximize DAX’s performance.

There are two ideas of materialization: Early Materialization and Late Materialization.

Early Materialization

Early Materialization is when the number of materialized rows is greater than the number of rows needed for the output.

If your output is a table by calendar year and you have five years of data, bringing a lot of materialized rows is unnecessary. The best practice is to have the number of materialized rows equal to the number of rows needed. However, that’s not always the case because it depends on the complexity of your DAX and data setup.

Always remember that data consists of columns. So when the formula engine needs to work on a table, columns are brought back together in a table format, which is the materialization. In an Early Materialization idea, the storage engine sends data cache to the formula engine. The formula engine then works on that cache and performs complex processes.

Early materialization can be caused by having complex joins or a many-to-many relationship in your data models. It can also be caused by having complex filters or iterators.

It’s not wrong to have complex measures; you just have to set them up correctly to leverage the storage engine as much as possible.

Late Materialization

Late Materialization is when you have the number of materialized rows equal or close to the number of rows needed for your output.

The storage engine does almost all the work leaving nothing for the formula engine. This makes the whole calculation faster.

These are example queries of Early and Late Materialization:

materialization

Before you run these queries, make sure to clear the cache. This will make your queries run against the cold cache. If you run your DAX without clearing the cache, you will get this result:

The total execution time is only 1861 milliseconds because it already used a cache. So always clear your cache before you run your queries.

If you run the Early Materialization query, you can see that the total time it took is 9485 milliseconds. It generated one data cache and a one-row output but returned 25 million rows.

You don’t need to materialize 25 million rows to get a one-row result because it takes time and slows down DAX’s performance.

If you run the Late Materialization query, you can see that the whole calculation only took 1340 milliseconds. It also generated 2 data caches, with each returning 5003 rows.

If you look at the Physical Query Plan, you can see 5000 rows. They don’t have equal number of rows. The Server Timings’ result will sometimes have a little discrepancy to the accurate number of rows in the Query Plan.

materialization

So from working on one data cache with 25 million rows, you now have two data caches with 5003 rows. That’s why working with Later Materialization gets faster results than Early Materialization.

Inside Each Materialization Query

By just looking at the query, you can already see that the Later Materialization is faster. The Early Materialization query counts the rows of a summarized table.

materialization

The table that is summarized and materialized is bigger than just doing a CROSSJOIN of VALUES on the table, which is what the Late Materialization query is doing.

If your query is slow, start by looking at how many rows are being pulled in your query and how many are needed for the output. You can also go to the Logical Query Plan and follow the flow of work that’s happening within the calculation.

***** Related Links *****
What Is DAX Studio In Power BI Desktop
DAX Tutorial: Context & The Calculation Engine
Measure In Power BI: Optimization Tips And Techniques

Conclusion

Materialization occurs when your queries pull columns or rows from your data model. This process happens naturally when your engine gets data from your data model in order to provide results for your query.

However, problems may occur if your DAX pulls too many rows than what is needed for the result. This issue can be solved by simplifying your DAX to perform specific processes.

Nick

Related Posts

Comprehensive Data Analysis using Power BI and DAX

Data Model Discovery Library

An interactive web-based application to explore and understand various data model examples across multiple industries and business functions.