In this tutorial, you’ll learn about what data caches do in a DAX calculation. You’ll understand how they work with the formula and storage engines to provide answers to your queries.
What Are Data Caches?
The storage engine interacts with the formula engine in the form of a data cache.
If you recall from the previous blog posts about the DAX Query Structure, you’ll know that the formula engine requests data from the storage engine. The latter then goes to the database, gets different types of data, and then returns it back to the formula engine in the form of a data cache.
The storage engine can do simple math and aggregations which makes it faster. It also offers parallelism which can fill multiple data caches at once. One storage engine query can take time in generating an answer. But if you add up all different storage engine queries, the results will not be the same because it’s filling more than one data cache.
The formula engine, on the other hand, works on complex queries. Since it’s a single-threaded engine, it can only use one final consolidated data cache. It starts working on the data cache once all requests are filled by the storage engine.
If the formula engine request is exactly the same as the previous request, the cache that was previously returned will be used again. The storage engine is smart enough to return the same data cache if it receives the same request. This makes the DAX’s performance faster since the data is already cached.
When you run a measure, it’s important to run it against a cold cache; always avoid running your queries in the warm cache. A warm cache means that the storage engine data cache is already formed and ready to be returned.
What Are The Key Features?
Data caches are tables materialized in memory which takes time and eats RAM. Always keep in mind that the materialization should be as low as possible.
Another key feature is that these caches are uncompressed. Once it comes out of the storage engine and sent to the formula engine, the data cache becomes an uncompressed full table in memory. This execution is much slower than a peer storage engine request which is all in memory and compressed.
There’s no problem with using the formula engine; you just need to get the storage engine to do the work as much as possible.
Data caches are also temporary. They exist for awhile and then they’re gone. Even though they are materialized in memory, they’re not persistent so they will get cleared out.
Where To Find Data Caches?
You can find this information in the Physical Query Plans’ Records Column and in DAX Studio’s Server Timings in the Rows Column. It will you show how many rows are being produced by the storage engine and being sent to the formula engine.
If you run certain tests in the server timings, you’ll notice that the rows count may be off by a little bit. This is just a guideline of how many rows you’re pulling up.
The main thing to keep in mind with data caches is that the number of rows materialized should be equal to the number of rows needed for the output as close as possible. If there’s a big difference in the number of rows, it means that the order operation is off.
If the amount of materialized rows is equal to the amount of rows needed for the output, it’s called Late Materialization. However, if the number of materialized row is greater than the number of rows needed, it’s called Early Materialization. Since these are uncompressed data, you should not let both engines materialize them because it will slow down DAX’s performance.
This is an example of an Early Materialization:
In the Physical Query Plan, you can see 25 million rows being pulled up in the Records Column. However, the sample query doesn’t need 25 million rows of output. This means that the query pulled too many rows than needed.
If you look at the Server Timings, you’ll see the same amount of rows being pulled in the Rows Column.
In the SE CPU visuals, you’ll see a number if there are parallelisms in your queries. Since this example has none, it showed 0.
A slow performing DAX may mean that your query is pulling unnecessary caches for your engines to work. Data caches are responsible for the speed and accuracy of your DAX because the engines work on them to generate answers for different requests. For a faster DAX performance, you can solve this issue by properly building your data model and DAX measures.