When you’re optimizing your data model and your DAX for Power BI, it helps to remember that there are two engines running behind the scenes — the formula engine and the storage engine.
These two engines work together to return the results needed back to the user. They provide the results set in the form of a table or a scale value.
The Role Of The Formula Engine
In this article, we’re going to focus on the formula engine.
The formula engine is the top-level engine that manages the whole architecture of a query.
Whatever you do while executing a query or while creating a calculated column or a calculated table, the formula engine kicks in.
The formula engine understands the queries received not just in DAX, but also in MDX. For example, when you’re using Excel, you create pivot models based on pivot tables, and pivot tables actually use MDX language.
This is where the formula engine kicks in. Because it understands both DAX and MDX, you can use another Power BI model. The queries sent from this data model will be using DAX, while the queries coming from the pivot model will be using MDX.
How The Formula Engine Works
Understanding the queries originating either from the MDX source or the DAX source is just one factor in the formula engine’s performance. It also goes through other steps to help you work on your queries.
First, it has to generate a query plan that the storage engine understands. As the lower level engine, the storage engine doesn’t actually know if a pivot table inside Excel or a visual inside Power BI executed the query.
In creating a query plan, the formula engine builds a logical query plan first, then it creates a physical query plan. Both the logical and the physical query plan should look like each other.
After completing the physical query plan, the user or the client tool gets the results set. From there, the client tool dictates how it will use the results or data set to display the end result to the user.
Main Limitations Of The Formula Engine
The formula engine plays a huge role when you’re optimizing DAX for Power BI using DAX Studio. However, it also comes with some limitations.
First, the formula engine always uses a single core to manage your entire operation regardless of how many CPU cores are available in your system.
When I open up my Task Manager, you’ll see that my CPU has six cores.
Despite this fact, the formula engine will still choose one of those six cores to work with, while it leaves the storage engine to use the rest.
Second, the formula engine doesn’t have direct access to your database. Because of this limitation, you’d have to store data either inside the internal memory, or you can store in the database and create a DirectQuery over that database.
It also relies on the storage engine to get the data in the form of a data cache. The formula engine then iterates over that data cache to proceed before it can provide results to the end user.
How The Formula Engine Helps Optimize DAX For Power BI
Sometimes, when you’re trying to optimize your data model or your DAX for Power BI, you’ll notice the storage engine slowing down as it executes queries. In this case, you can use the formula engine in completing the entire calculation to optimize the process.
Note that if the formula engine ends up building the entire table on its own, the results set created in the process will not be used in the subsequent queries that will be generated by the client tool if they are similar to the queries previously executed. But in case the storage engine provides the data cache, then that particular data cache can be kept in the memory so that it can optimize requests coming from the client tool.
How will that work?
Let’s say we write a query to be executed completely by the storage engine, but with some aspects to be computed by the formula engine. In this case, the storage engine will send a data cache back to the formula engine and that data cache (in case it’s a simple one) can be kept in the memory.
Now, if a similar query comes in a few seconds later, instead of scanning the whole data model once again, the formula engine or the storage engine will use that particular data cache so that the query time can be reduced and the user experience can be improved.
Reducing The Load On The Formula Engine
Again, the formula engine understands DAX and MDX codes.
So if you use CALCULATE, CROSSJOIN and other tasks, the formula engine knows all the functions that you’re using within the DAX language and will be able to solve all of it on its own. However, this could impact the performance of the query especially since the formula engine tries to do everything using a single core.
This is why we also need to be mindful about reducing the load on the formula engine and increasing the load on the storage engine.
A good example of how different the computation is within the formula engine versus the storage engine would be the use of CROSSJOIN versus SUMMARIZE.
When you use CROSSJOIN, only a limited amount of data will be coursed through the storage engine. As for the rest of the computation, it will all be performed by the formula engine. In comparison, using SUMMARIZE means executing the entire query within the storage engine.
Of course, this would also depend on how SUMMARIZE is used.
If SUMMARIZE is used as the innermost function, then it will definitely be pushed down to the storage engine. But in some cases, SUMMARIZE is used in conjunction with a different code that requires the formula engine to work. In this case, the formula engine will step in and do the work.
Knowing where the load goes can help you optimize your query’s performance, especially when you keep in mind what the capacity and limitations are for the engines in your disposal.
Tools That Can Help Optimize DAX For Power BI
Optimizing DAX will be easier if you maximize the tools that make it possible. DAX Studio is one of those tools.
Let’s take the example earlier where the formula engine takes on the entire load of completing the computation when the SUMMARIZE function is used. DAX Studio can help you rewrite your code in a way that pushes SUMMARIZE into the storage engine, freeing up the load on the formula engine. This potentially improves the performance of your data model and queries.
Of course, this should work seamlessly for simple queries. There are also more complicated scenarios.
For example, you can build composite models that use two storage engines within the data model itself. One could be for Vertipaq, while the other would be for DirectQuery. In this case, your database has to be optimized in a way that the queries generated will be built for that specific relational data model.
Let’s say you have two tables inside your date model — the Products table and the Sales table. The Products table is in the Vertipaq storage, while the Sales table, being a larger table, is stored directly inside the database through a DirectQuery connection.
Let’s also assume that you’re dragging the Brand column into the matrix, as well as the Total Sales measure that’s coming from the Sales table within the DirectQuery connection.
In this case, the formula engine will have to perform the operation between both engines.
First, it will retrieve the Product Key and Brand columns from the Products table. Then, it will retrieve the Net Price, Quantity, and Product Key columns from the Sales table. Once it has the data cache from both Vertipaq and DirectQuery, the formula engine will have to take the data cache and try to join them to provide the results to the end user.
The formula engine truly plays a huge role when you’re working with DAX. However, it’s critical to understand how it works first before jumping in. Mastering how to make the formula engine and storage engine work seamlessly together is key, especially if you want your DAX queries to perform better.
As the top-level engine inside the analysis services architecture, the formula engine has to perform a lot of operations. But once you also understand the ins and outs of the storage engine, you can figure out which operation to push into the storage engine and which one to leave within the formula engine.
We’ll dig deeper into the storage engine in a separate tutorial.
All the best,
Enterprise DNA Experts