So now one of the key pitfalls that you could run into, and this is why we talk a lot about eliminating those duplicative records, is doing what we call a many-to-many JOIN. In a many-to-many JOIN, for example, if we had not gone through the exercise for our stations and weather temperature readings exercise, what we could have done is joined on a key that isn't necessarily distinct or unique. So you have five records on the right-hand side and you have five records on the left-hand side and they all match each other. So how many output rows would actually be returned? Would it be zero? Would it be five? Or could it potentially be way more than five? And the answer is way more than five. You'd actually get what's called the Cartesian product of five times five, so you get 25 output rows even though your input tables only had five rows each. And if you imagine this at scale, if you don't get that joining key correct, that's when you really blow out all the resources that you're using. Because with a million-record table joined against even a 10,000-record table, you could be exploding by many, many factors of ten and getting a dataset that's outputting way, way more rows than what you're expecting. So the end result is a situation that you want to avoid, and the way you can avoid it, as the third bullet point here says, is knowing the relationship between the tables in your data before you do that JOIN. So let's walk through an example. This is an IRS example where instead of joining on the unique identifier, which is the EIN, we joined on the tax period. And there are many common tax period values that can be shared across many of those EINs. So let's walk through what it actually looks like conceptually. Here we have our two different tables. On the left-hand side, we have the 2015 filings, and on the right-hand side is the organization or charity details table, which just has a tax period and when they last filed.
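The five-matching-five scenario described above can be sketched with a quick, self-contained demo using Python's built-in sqlite3 module. The table and column names here are purely illustrative: both tables get five rows that all share the same join key, so the JOIN returns the full Cartesian product.

```python
import sqlite3

# Illustrative demo of an accidental many-to-many JOIN.
# Both tables have five rows, and every row shares the same join key,
# so every left row matches every right row: 5 x 5 = 25 output rows.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE left_table (join_key TEXT, left_val INTEGER)")
cur.execute("CREATE TABLE right_table (join_key TEXT, right_val INTEGER)")
cur.executemany("INSERT INTO left_table VALUES (?, ?)",
                [("2014-12", i) for i in range(5)])
cur.executemany("INSERT INTO right_table VALUES (?, ?)",
                [("2014-12", i) for i in range(5)])

cur.execute("""
    SELECT COUNT(*)
    FROM left_table l
    JOIN right_table r ON l.join_key = r.join_key
""")
row_count = cur.fetchone()[0]
print(row_count)  # 25, the Cartesian product, not the 5 you might expect
```

The same arithmetic is what makes this so dangerous at scale: the output grows with the product of the duplicate counts on each side, not their sum.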
So instead of joining on EIN, where you would normally expect your EINs to match up one for one, one EIN's organizational details per one filing, you've incorrectly put the tax period as your joining key. What you've done here is basically said, all right, well, this tax period in the organizational details table on the right matches five records there on the left. So that's great. What we're going to be returning here is just this EIN number, 345352, as an additional column for all five of those. So it's not that bad, and we just have one record here. This query doesn't make too much sense joining on tax period. But what happens if we join on another EIN that also has the same tax period of December 2014? Now you see we're starting to multiply, in a very bad way, the number of output rows that we have. So this one also matches those five records, and what this looks like in your resulting dataset is, again, the cross product, or what's called an unintentional cross join, of your data. And again, this is a very small example where you just have two rows cross joining against five and you get the result there. But you can imagine if you have this on a million-row dataset, you could potentially have even a billion or a trillion rows accidentally output. And this is generally where BigQuery will run for more than 60 seconds or a couple of minutes and then you realize, wait a minute, there's something going terribly wrong with our queries. And again, the way to cure this is to really understand what the unique identifying fields in your dataset are and what the relationship is between all of your different data tables. So you could have one-to-one. For example, you could have one charity that corresponds to one tax year of filing for 2015. Or you could have many-to-one, which is what that N represents. Many-to-one could be, if you had multiple years of tax filings, many tax filings per one organization.
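This two-rows-against-five scenario can also be sketched concretely. This is not the real IRS schema, just an illustrative mock-up with Python's sqlite3 module: five filings and two organizations that happen to share the same tax period, joined on the non-unique tax_period column instead of the unique EIN.

```python
import sqlite3

# Illustrative mock-up (not the real IRS schema): five 2015 filings
# and two organizations, all sharing the same tax_period of Dec 2014.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE filings (filing_id INTEGER, tax_period TEXT)")
cur.execute("CREATE TABLE org_details (ein TEXT, tax_period TEXT)")
cur.executemany("INSERT INTO filings VALUES (?, ?)",
                [(i, "201412") for i in range(1, 6)])
cur.executemany("INSERT INTO org_details VALUES (?, ?)",
                [("345352", "201412"), ("999999", "201412")])

# Joining on the non-unique tax_period: every filing matches every
# organization, so 5 filings x 2 orgs = 10 output rows instead of 5.
cur.execute("""
    SELECT COUNT(*)
    FROM filings f
    JOIN org_details o ON f.tax_period = o.tax_period
""")
bad_count = cur.fetchone()[0]
print(bad_count)  # 10, an unintentional cross join
```

With just two duplicate organizations the damage is only double; at a million rows a side, the same mistake can blow out to billions of rows.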
Or we could have the opposite, one-to-many. Or you could have that many-to-many scenario, where you might need to create, through concatenation or another method, a unique identifier or key. And this is why the very first thing that we taught in the first course here is practicing COUNT and COUNT DISTINCT to see which fields in your dataset are the actual unique identifiers for those rows. Understanding when and how to use JOINs and UNIONs in SQL is a concept that's easy to pick up, but honestly it takes a while to truly master. The best advice I can give you when starting out is to really understand how your data tables are supposed to be related to each other, like customers to orders or suppliers to inventory, and to be able to back that up by verifying those relationships through SQL. Remember, all data is dirty data, and it's your job to investigate it and interrogate it before it potentially pollutes your larger dataset with JOINs and UNIONs. And once you understand the relationships between your tables, use UNIONs to append records into a consolidated table and JOINs to enrich your data with other data sources. Let's practice these concepts and pitfalls in our next lab.
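That COUNT versus COUNT DISTINCT check can be sketched as follows, again with illustrative table and column names via sqlite3: if COUNT(*) equals COUNT(DISTINCT field), that field uniquely identifies the rows and is a safe join key; if not, joining on it risks a many-to-many blow-up.

```python
import sqlite3

# Sanity-check a candidate join key before joining: a field is a safe,
# unique identifier only when COUNT(*) == COUNT(DISTINCT field).
# Table and column names here are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE org_details (ein TEXT, tax_period TEXT)")
cur.executemany("INSERT INTO org_details VALUES (?, ?)",
                [("345352", "201412"),
                 ("999999", "201412"),
                 ("111111", "201506")])

cur.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT ein),
           COUNT(DISTINCT tax_period)
    FROM org_details
""")
total, distinct_ein, distinct_period = cur.fetchone()
print(total, distinct_ein, distinct_period)  # 3 3 2
# ein is unique (3 == 3), so it's a safe join key;
# tax_period is not (2 < 3), so joining on it would duplicate rows.
```

Running this check on both sides of a planned JOIN tells you up front whether the relationship is one-to-one, one-to-many, or many-to-many.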