Next Gen Data Learning – Amplify Your Skills

Blog Home

Blog

Data Cache: What It Is And How It Helps To Optimize Queries

by | DAX, DAX Studio, Power BI

In this tutorial, you’ll learn what a data cache is and why it’s important. A data cache stores bits of data that could become valuable for future use.

If you need the same pieces of information in a future measure or action, the data cache will provide the information you need. It eliminates making the system fetch again the same sets of data.

In this tutorial, you’ll go through eight important things to remember when dealing with data caches in Power BI:

data cache

Definition & Purpose Of A Data Cache

data cache

A data cache is a temporary table created in memory by the storage engine. This temporary table is sent to the formula engine which then iterates over this information and performs different operations following the query.

All the communication between the formula engine and the storage engine always happens in the form of a data cache. The formula engine prepares the query plans and then sends the instructions to the storage engine. The storage engine then sends it back to the formula engine which returns the result back to the user.

Since the formula engine doesn’t have direct access to the data stored in DirectQuery and VertiPaq, it relies on the storage engine for the data caches.

data cache

The data cache sent by the storage engine is in an uncompressed format. This means that the size of the data stored in the Vertipaq tool can be reduced.

But when it’s time to return the data cache back to the formula engine, it can’t be compressed because the formula engine doesn’t work with compressed datasets.

What Is A Data Cache’s Impact To The RAM

data cache

If the storage engine is sending huge chunks of data caches, this will take a lot of space on the software’s RAM. It’s also going to consume a significant amount of time for the formula engine to perform iterations and produce a result.

So when you write your DAX code, you need to understand that it should be written in a way that minimizes the rows in the data cache.

The best case scenario would be to match the number of rows shown in the visual with the number of rows returned in the data cache.

Most of the time, it’s difficult to return the same amount of rows. If this happens, you’ll need to review your DAX code and break it down in a way that optimizes the data cache produced by the storage engine.

How A Data Cache Optimizes Queries

Unoptimized Functions

data cache

Some of the functions in the DAX language aren’t optimized for the storage engine. To better understand this, let’s look at an example.

For example, in your code, instead of using SUMMARIZE, you can use the GROUPBY function. The SUMMARIZE and GROUPBY functions perform the same operation. However, there are notable differences depending on the scenario.

The GROUPBY function isn’t optimized for the Storage Engine. If you use the GROUPBY function on a fact table with 12 million rows, it will produce a data cache with the same number of rows. This can be as large as 600mb up to 1gb.

This size can put a lot of pressure on your RAM. So instead of using GROUPBY, use SUMMARIZE. The SUMMARIZE function results in a smaller data cache size. You can can pair it with GROUPBY as a top level function.

Reduced Reiterations

When you write your DAX measures or queries, you need to make sure that the code produced behind the scenes is suitable for the storage engine.

You need to write your DAX code in a way that produces data cache that can also be used in subsequent queries. The benefit of doing this is that instead of performing a separate scan using VertiPaq, the storage engine will check whether a similar query was already received.

If this is true, then it means that its data cache is available in memory. So instead of scanning the query again, VertiPaq will simply return it from the cache memory to the formula engine.

If you see a CALLBACKDATAID operator in your storage engine query, it means that those queries can’t be cached. If in case a similar query is sent to the storage engine, it will perform a reiteration.

Thus, VertiPaq will have to call the formula engine to resolve the query. This increases the overall execution time because the cache memory can’t be used in this case.

***** Related Links *****
Materialization Ideas For Data Caches In DAX Studio
Data Caches In A DAX Calculation
Storage Engine In Power BI – VertiPaq Operators & Queries

Conclusion

The data cache is your best friend in the DAX language. Remember to write your DAX code in a way that maximizes the storage space and reduces the VertiPaq tool’s number of iteration. You can reuse the data cache already produced by the Storage Engine to reduce the query execution time.

Enterprise DNA Experts

Related Posts