Automating your data filtering and cleaning processes can be such a huge time saver. For today’s tutorial, I’m going to demonstrate how you can compare and merge queries by using Anti Join in Excel with Power Query. You can watch the full video of this tutorial at the bottom of this blog.
Let me start with the scenario. We’ve got two lists of experts, which are fictitiously generated although some of the names may be familiar. Let’s say, I’ve got on the left Python experts, and on my right are Power BI experts. What I want to know is I can see that there are some shared names, but I want to find out who is only a Python expert and who is only a Power BI expert. That’s what we’re going to be able to do with Anti Joins.
Compare And Merge Queries Using Anti Joins In Power BI
We can do this in two directions. The left Anti Join is going to be table 1 here to the left. We’ll find out the values that are only to be found on that left table. Who only does Python in this case? Right Anti Join on the other side will have the same idea. Who only does Power BI?
Let’s run this in Excel. In my demonstration, we’re going to have two tables. We are going to have the population of the US in 1950, and then again in 2020. I want to find out which cities are only found on one of those tables. So, let’s try this out.
These are the top 10 cities by population for those census years. I want to know which are unique. We know New York is big. We know Chicago’s big, but which of these weren’t on the 1950 census or vice versa? That’s what I’m going to be able to do with Power Query.
Let’s get into my Query Editor. I’m going to edit this query and go to Merge Queries as New.
We have our population of 1950 and I’m going to select my population for 2020. The common relationship is going to be City and we want a Joint Kind of Left Anti Join, then click OK.
As you can see, there are matches that are only available In 1950. We’re going to see cities like Detroit, Cleveland, and St. Louis. These Midwestern towns have lost population since then.
If you want to bring in information from the 2020 census, you can click on the upper right of the table column (pop_2020), and you’ll find these columns that you can select. I’m not really concerned about these things, in this case. All I wanted is the list of names, so we’re just going to leave this as is right now.
With that in mind, I’m going to go back to my 1950 table, and let’s run this again. We’re going to select Merge Queries as New. We’re going to do 2020 now. The same idea, the relationship is going to be City on City. But this time we’re going to do a Right Anti Join. A Right Anti Join is going to show the cities that are only found in 2020 and not 1950?
We’re going to see the opposite here. All we have to do, in this case, is click on the table and we’re going to get the information from that right table. Right Anti Join is a little less common than Left Anti Join.
What I mean by this is, in theory, it does the same thing. In this case, what it’s telling us is that here are the cities that were only found in the 2020 census. So, if we look at this, we see a lot of Texas, we see California, etc. The Sunbelt idea, people moving to the Southern and Western states seems to hold up in the data.
Finally, we close and load this, we’ll select Only Create Connection.
Anti Joins are really helpful. If you’ve got two data tables (or you could do multiple), and you want to see what the changes are from the two, this could be useful. If you’re trying to figure out what customers haven’t placed an order, for example, you could use an Anti Join as well.
Maybe you want to figure out any unique values, or one specific location, or if there are products selling in one place but don’t in another, etc. These are interesting ways to use a Join. Anti Joins are almost like filters on your table.
I hope it got you thinking about how you might be able to use it. If you’re using Anti Joins, let us know how you’re doing it. If you see a use case for Anti Joins, let us know what it is.
All the best!