Linking Datasets via the Merge Command

Introduction to PowerQuery Linking and Organizing Data
9 minutes
Share the link to this page
Copied
  Completed
One of the most frequently used formulas in Excel is the VLOOKUP formula. Next on the list is likely the INDEX/MATCH combination for doing the same thing. These functions are great for singular or one-off calculations; however, they frequently get used for large sets of data. In this lesson, we'll look at how to replace many lookup calculations with a simpler and stronger relationship match through the Get & Transform interface.  Refer to 2-4-practice-file-base.xlsx for the practice set associated with this lesson.

Transcript

In the previous lesson, we appended data from multiple data sets together in a top to bottom approach. More often, you'll need to link data together based on relationships than a more left to right type of approach. While the append command is akin to copying one data set below another data set, merging data sets is more like the traditional v lookup index match approach. But even that description doesn't quite capture the capabilities of power queries merge functions. Let's start with a simple example. We have two tables, one has a list of stores with their fruits and their quantities.

The other has the fruits, the wholesale price and the country of origin. Suppose we wanted to calculate the total expenditure based upon the wholesale price. We could link that wholesale price into the store details table using the lookup formula just like so. This is fairly quick and easy and I think duplicate my efforts to bring the country of origin information into the table as well. But this duplicates the efforts through the match function. Now, this is a one to many relationship between our fruit descriptor table and our store details table.

And our use of functions is akin to creating a left join between the tables. Much of this terminology is common when working with databases, but it's not used very frequently when using Excel. Don't worry about focusing on or remembering these specific terms too much. The query editor will help us out a lot. And as you use power query, he'll become more accustomed to what the terms mean. Now that we have the brief example out of the way, let's revert to a prior version of this workbook without any formulas or connections.

Here we have our base query set up and ready to go. I'm going to edit the store query to bring up the query editor. In the editor, I will click on merge queries on the Home tab to bring up our merge wizard. Within the wizard we can see our current table at the top and there will be A blank drop down and a blank box in the middle of the wizard. Clicking on that drop down will show a list of the queries within our workbook. And I will select the fruits table to populate that center box.

Below both of these tables, there's a drop down with the join kind, the initial default selection will always be left outer. In addition to the type of join, the item will include a brief description of what each join means. For the left outer. This means we get all from the first 10 matching from the second. What this actually means is that our left outer join will pull all of the rows from the top table, but will only pull matching items from the second table. If we look at our list of options, we can see that the Right Outer pulls all from the second and matches from the first, the full pulls all from both the inner pulls only matching rows and then the anti joins only pull the unmatched rows from each table.

We'll cover the anti joins again soon. Additionally, I'll share a Venn diagram explaining each of the different join types later in this Lesson. Once we have the join cans selected, the next step is to define the relationship between the two tables. And to do that, I need to click on each of the table columns that I want to merge to match across. I'm going to click on the item field in my stores table. And I want to merge this with the values that are in the item field on the fruits table, so I can click on it as well.

Once the merge is defined, the editor will tell me how many rows of data match my defined criteria at the bottom. In this case, I have 15 matches out of the first 15 rows. So in other words, all of my data points match up. This can be useful for quickly figuring out how many data points may not have a match or how closely aligned the two data sets are. Once I press OK, this will add a new column to the query that will be populated with table values. So what are those table values?

The result of the merge command is an entry for each row that represents all of the rows from the associated table where the relationship holds true. With this table type is a different column button. When we click on will have an option to expand or aggregate our fields. In this case, we want to expand and I can select or deselect any columns I want to show, so I'll deselect the item, since I don't need a second copy of the fruit names. When I press OK, I'll see the full list of prices and the country of origin for each fruit. This duplicates our efforts from the excel formula approach with one quick command.

Imagine needing to connect six, eight, or even 10 or more columns to a table. This could save you tons of time over the traditional Excel lookup approach. So this is a left join. Let's close and load our table to excel below our existing data sets. This will let us make some adjustments and see how the query join responds to different changes. The first change I want to test is to add a misspelling to our apples data entry above.

With the formula approach this would generate pound na errors where the lookup could not be found. Now to update the query, I will right click on the table table below, and I will choose refresh. Instead of getting that pound and error, the entries where we have apples show up as blank as if there's no associated data. We've seen how our left outer join handles some missing data in action. Now let's edit the query to see how the different join types handle the results differently. Once I'm in the query editor, I can click on the gear button next to the merge command to modify it.

Once here, I'm going to change the left outer join to an inner join. pressing OK will show us a much smaller set of results. And in these results, we should see that we no longer have any Apple entries whatsoever. Our INNER JOIN drops the apple rows from both tables, because it only keeps those rows that have at least one match on each side of the relationship. This differs from the left join where the Apple's entries in our stores table would show a blank entry when we had no matches. But we're not done with our merge scenarios yet.

So let's edit this again. This time I'm going to select the left anti join. The anti join looks for those places where there are no matches between the data. So that can be particularly useful when looking for what is missing. When I press OK with this type of join, I'll get a list of entries that contains only the apples. Since those are the only part of our stores table does not have a corresponding match in our other table.

When we have mismatches like this, looking at both ante joins can help us figure out what the mismatch might be. So I can reiterate the merge and choose the right entity join to see where my error is. For this query, I'll re add the item from the expansion to see the results. The Right Outer Join is analogous to the left outer join, but we displayed the misspelled apples result with empty data, rather than vice versa, just like sell. The full outer would show all of the data combination with blanks on both sides where appropriate. If I further modify this query to use the full OUTER JOIN Look at the results, we can see that the blank entries from both of our left and our right outer joins plus all of our matches are displayed.

In terms of the other joins, our yellow entries are the left anti join results. The green entries are the right anti join results, and the white entries are the inner join results. With that in mind, let's go look at a Venn diagram that describes how these joints relate together. If you label each circle as a left table, or right table, then the circle represented by the whole left circle would be your left outer join. The data set represented by the whole right circle would be your Right Outer Join. The intersection of these two tables is where you'll have the data that's related between them.

And this is your inner join section. These are the most common joins that look at the data connections between two tables. When talking about any joints, we're talking about the data in each circle that is not included in that overlap. So our left table is the outside scope. On the right table, the outside scope will be the anti join there. Finally, the full OUTER JOIN is the entirety of the data from both of our circles, intersecting and non intersecting.

The power query editor descriptions for these make memorizing such differences unnecessary, but doing so can help you improve your communication with your database administrators. Ultimately, if all else fails, it's easy enough to just trial and error your way through them. You can also download this Venn diagram as a PDF and print it out for reference whenever you need to. Beyond these joins, there's one additional type of join we haven't considered called the cross join. The cross join is not handled by the merge tool, I will show you how to create a proper cross join in chapter four. So don't worry, it hasn't been forgotten.

The merge command is very probably my favorite and most frequently used Power Query command. Once you become comfortable with them, they can make mixing datasets together so fast and easy. I use Inner Joins as a way to create data filters, and I'll use outer joins to populate one data set with information from another. And finally, I'll use anti joins to find gaps or errors across common data sets. Thanks for joining me for this lesson. Now go replace some v lookups.

Sign Up

Share

Share with friends, get 20% off
Invite your friends to LearnDesk learning marketplace. For each purchase they make, you get 20% off (upto $10) on your next purchase.