VertiPaq Power BI: How Columns Are Encoded

by | Power BI

In this tutorial, you’ll learn about the methods of encoding columns using VertiPaq in Power BI. You’ll understand how each method works and how it affects the performance of your DAX.

Each method has its own style of compressing data so that it can give you faster and smoother transactions.

Value Encoding With VertiPaq In Power BI

The first data compression type is Value Encoding. It occurs on integer-type columns.

This is an example of an original cost column:

You can see that there are values inside.

Now, if VertiPaq stores these data in the column as is, it would need 12 bits of memory.

However, VertiPaq looks for ways to minimize how much memory it needs to store data. It will find the minimum value of a column and then encode that column by subtracting 3004 from each value. By doing so, it would then only need 9 bits of memory.

vertipaq power bi

VertiPaq lessened the memory it needs, making your code perform faster.

Dictionary Encoding

The next data compression method is Dictionary or Hash Encoding which occurs on non-integers or text values columns.

This is an example of a column with repeating product names:

VertiPaq has a feature that helps compress non-integer columns. It builds a list of distinct values in that column and then assigns a number to them.

vertipaq power bi

You can see that it applies a number that corresponds to the product names.

Next, it replaces the values of the product names with the integer values.

vertipaq power bi

Storing integer values requires lesser bits than storing text values.

This is something that DAX and the Storage Engine does when they’re encoding.

Run Length Encoding With VertiPaq In Power BI

The last encoding type is called Run Length Encoding. It looks for repeating values and stores a start and an end.

This is a sample table with Sales Person ID and Rows:

And this is the table that is encoded using the Run Length Encoding:

vertipaq power bi

It knows that the salesperson with an ID of EMP1017 goes from row 1 to 363.

You’ll see that it has the Row Start, Row End, and it also shows how many repeating values there are in between. So, there’s no need to store EMP1017 362 times because it knows where it starts and where it ends.

Run Length Encoding Using Encoded Columns

Another great thing about VertiPaq is that it could Run Length Encoding on columns that were encoded with Value or Dictionary Encoding.

This is a table that’s encoded using the Dictionary Encoding:

You’ll notice that the NameID column is repeating. It means that you can use the Run Length Encoding with the column.

In this case, VertiPaq will only look for the NameID, Row Start, and Row End.

vertipaq power bi

So instead of storing the integer values 3 times, it now stores it once.

***** Related Links *****
Optimize DAX Functions With This New Course
Measure In Power BI: Optimization Tips And Techniques
Advanced Tips To Optimize Your Power BI Table

Conclusion

Inside the VertiPaq Storage Engine, there are 3 data compression methods which are responsible for making your DAX code run smoother.

Once you’ve understood how each method works in your calculations, you can use and maximize any of them to improve the performance of your code and easily retrieve the data you need in your reports.

Nick

Related Posts

Comprehensive Data Analysis using Power BI and DAX

Data Model Discovery Library

An interactive web-based application to explore and understand various data model examples across multiple industries and business functions.