In this post, we’re going to show you how to use the Chi-square test in Excel to compare variables. You can watch the full video of this tutorial at the bottom of this blog.
The Chi-square test of independence has two categorical variables. We will find out if there’s any relationship between those variables.
We will be using the hypothesis testing framework. In case you are not familiar with this, check out the Advancing Into Analytics book. This is an advanced analytics technique using Excel, R, and Python.
For our demo, we will be using a housing data set in Excel. We need to get the actual values using the pivot table and then compare this to the expected values using formulas.
We will also check the 95% significance level and if there’s a relationship between the air conditioning and the preferred area.
Using Chi-square Test In Excel To Compare Variables
Let’s jump into Excel. These are the housing prices that came from the Journal of Applied Econometrics.
Insert a Pivot table, then click OK to use an index column to count observations.
From the Pivot table fields, drag ‘id‘ into Values section. Select Count from the Value Field Settings since these are all unique values.
Next is to drag ‘prefarea‘ into Columns and then drag ‘airco‘ into Rows sections to see the actual values.
We have a total of 546 observations. The 298 are neither air conditioning nor preferred area, while 53 are both and so on.
Let’s create two tables and label them as Actual and Expected.
To get the expected values for neither air conditioning nor preferred area, multiply D5 by B7, then divide it by D7.
Next is to get the expected value of either the air conditioning or preferred area, multiply D5 by C7, then divide it by D7.
Now for the expected value of either the preferred area or air conditioning, multiply D6 by B7, then divide it by D7.
Lastly, let’s compute the expected value of both air conditioning and the preferred area. Multiply D6 by C7, then divide it by D7.
Highlight the range of the expected values to find out the sum, which should be 546.
Now we will find the relationship between the actual and expected values. Let’s check if they tend to move together in a way.
To find the Significance, type =CHISQ.TEST. Highlight the range of the actual values, add a comma, then highlight the range of the expected values to get the p-value.
Based on the result, this is a slight random chance. There seems to be some relationship between the preferred area variable and the air conditioning variable.
The Chi-square test is a common technique that helps us know the difference by category, and is a good example of A/B testing.
Another thing to remember is that the observations need to be independent. Based on our example, we are looking at two binary variables, a specific type of categorical variable.
Lastly, let’s be aware that it can get tricky when we have a lot of categories.
I hope this is something that you can use for A/B testing, whether in Excel or Python.
All the best,