This tutorial will discuss about materialization for data caches in DAX Studio. This post will show how it affects your calculation while generating results with your query. You’ll learn which materialization is better in order to maximize DAX’s performance.
There are two ideas of materialization: Early Materialization and Late Materialization.
Early Materialization is when the number of materialized rows is greater than the number of rows needed for the output.
If your output is a table by calendar year and you have five years of data, bringing a lot of materialized rows is unnecessary. The best practice is to have the number of materialized rows equal to the number of rows needed. However, that’s not always the case because it depends on the complexity of your DAX and data setup.
Always remember that data consists of columns. So when the formula engine needs to work on a table, columns are brought back together in a table format, which is the materialization. In an Early Materialization idea, the storage engine sends data cache to the formula engine. The formula engine then works on that cache and performs complex processes.
Early materialization can be caused by having complex joins or a many-to-many relationship in your data models. It can also be caused by having complex filters or iterators.
It’s not wrong to have complex measures; you just have to set them up correctly to leverage the storage engine as much as possible.
Late Materialization is when you have the number of materialized rows equal or close to the number of rows needed for your output.
The storage engine does almost all the work leaving nothing for the formula engine. This makes the whole calculation faster.
These are example queries of Early and Late Materialization:
Before you run these queries, make sure to clear the cache. This will make your queries run against the cold cache. If you run your DAX without clearing the cache, you will get this result:
The total execution time is only 1861 milliseconds because it already used a cache. So always clear your cache before you run your queries.
If you run the Early Materialization query, you can see that the total time it took is 9485 milliseconds. It generated one data cache and a one-row output but returned 25 million rows.
You don’t need to materialize 25 million rows to get a one-row result because it takes time and slows down DAX’s performance.
If you run the Late Materialization query, you can see that the whole calculation only took 1340 milliseconds. It also generated 2 data caches, with each returning 5003 rows.
If you look at the Physical Query Plan, you can see 5000 rows. They don’t have equal number of rows. The Server Timings’ result will sometimes have a little discrepancy to the accurate number of rows in the Query Plan.
So from working on one data cache with 25 million rows, you now have two data caches with 5003 rows. That’s why working with Later Materialization gets faster results than Early Materialization.
Inside Each Materialization Query
By just looking at the query, you can already see that the Later Materialization is faster. The Early Materialization query counts the rows of a summarized table.
If your query is slow, start by looking at how many rows are being pulled in your query and how many are needed for the output. You can also go to the Logical Query Plan and follow the flow of work that’s happening within the calculation.
Materialization occurs when your queries pull columns or rows from your data model. This process happens naturally when your engine gets data from your data model in order to provide results for your query.
However, problems may occur if your DAX pulls too many rows than what is needed for the result. This issue can be solved by simplifying your DAX to perform specific processes.