So, this notion of sets versus bags: the duplicate question. First of all, what is a set? A set is a collection of objects with no duplicates, and a bag is a collection of objects where there can be duplicates. So, right up here, A is not repeated at all in a set, but it may be repeated in a bag. Whether that's legal or illegal is what gives you the semantics of a set versus a bag. You can define a relational algebra in terms of either of these two semantics: in terms of sets, or in terms of bags. The notion of an extended relational algebra comes from the need to work with bags, as well as other things like sorting, as I mentioned.

Okay, and so the rule of thumb, and this is the last time I'm really going to mention it, is that every paper you read, if you end up reading some of the papers we talk about in the course or beyond, will assume set semantics unless it explicitly says otherwise. So be prepared for that. Meanwhile, every implementation, every commercial database, will assume bag semantics, and we'll see where that comes up in the language. So I just wanted to put that out there up front: I may play fast and loose with the difference between sets and bags, but it can be important in practice.

So, one lifted set operation: you can define the union of two sets in the standard way. The union of two relations is natural, given that a relation is a set of tuples. In relational algebra notation I'd write it like this, and I can also write it in SQL with the UNION keyword. And here's where set versus bag comes up. An unqualified UNION does indeed remove duplicates, in which case the union of this relation, with a1 b1 and a2 b1 as tuples, and this one, with a1 b1 and a3 b4, is these three tuples. The duplicate of a1 b1 didn't get passed through.
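As a rough illustration of the set-versus-bag distinction for union, here is a minimal Python sketch. The tuple values follow the spoken example, but the names `R1`, `R2`, `union_set`, and `union_bag`, and the assignment of tuples to each relation, are assumptions made for illustration. A Python `set` stands in for set semantics and a `Counter` stands in for a bag.

```python
from collections import Counter

# Assumed example relations: the tuple values come from the spoken example,
# but which tuple belongs to which relation is an assumption.
R1 = [("a1", "b1"), ("a2", "b1")]
R2 = [("a1", "b1"), ("a3", "b4")]

def union_set(r, s):
    """Set semantics (like SQL UNION): duplicates are removed."""
    return set(r) | set(s)

def union_bag(r, s):
    """Bag semantics (like SQL UNION ALL): multiplicities add up."""
    return Counter(r) + Counter(s)

set_result = union_set(R1, R2)
bag_result = union_bag(R1, R2)

print(sorted(set_result))        # three distinct tuples
print(sorted(bag_result.elements()))  # four tuples, a1 b1 twice
```

The only difference between the two functions is whether a tuple's multiplicity is tracked, which is the whole difference between the two semantics.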
To express this in bag semantics, to make sure we do include duplicates, you can say UNION ALL, and that would include all four tuples.

You can define the difference operation in the same way, in the sense that you're lifting it from the natural definition over sets: find every tuple in this set and remove any tuples that also appear in this other set. We see that a1 b1, as we saw before, appears in both relations, so you get rid of it, and all you're left with is this tuple. All right, so why isn't this one in there? Well, a3 b4 doesn't appear in R1, so we know it's not in the result. All we want is everything that's in R1, removing things that also appear in R2.

So what about intersection? That's another set operation that we could lift up. You can indeed define intersection, but you don't necessarily need to have it as a fundamental operator, because you can re-express it in terms of difference. Right? If I want the intersection of R1 and R2, I can take everything in R1 that is not in R2, and then I can take everything in R1 that is not in that result. Think about this for a second. The inner expression returns everything that is only in R1. Then the overall expression removes from R1 everything that is only in R1, leaving the things that are in both R1 and R2, and that's what an intersection is. Okay. We'll touch on this later, but you can also express intersection in terms of join, an operator we haven't defined yet.

The selection operator is how we take the tuples that satisfy a certain condition. We write it with sigma, and we put C to express the condition. This notation, honestly, we won't necessarily use too much throughout this course, but I think it's good to be familiar with it when it does come up. I'm more interested in you recognizing the English names of the operators: select, union, join, and so on. The Greek notation is, perhaps, less important.
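The difference and intersection-via-difference argument above can be sketched in a few lines of Python under set semantics. The relation contents and the function names `difference` and `intersection` are assumptions for illustration; the identity being shown is that the intersection of R1 and R2 equals R1 minus (R1 minus R2).

```python
# Assumed example relations under set semantics (tuple placement is an assumption).
R1 = {("a1", "b1"), ("a2", "b1")}
R2 = {("a1", "b1"), ("a3", "b4")}

def difference(r, s):
    """Every tuple in r that does not also appear in s."""
    return {t for t in r if t not in s}

def intersection(r, s):
    """Intersection written only in terms of difference: r - (r - s)."""
    only_in_r = difference(r, s)       # tuples that are only in r
    return difference(r, only_in_r)    # drop those, leaving tuples in both

only_in_r1 = difference(R1, R2)
both = intersection(R1, R2)

print(only_in_r1)
print(both)
```

Comparing `both` with Python's built-in `R1 & R2` is a quick way to check that the rewritten form agrees with the direct definition on a given input.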
Okay, so if we want to find the employees where the salary is greater than 4,000, or where the name is equal to Smith, that's an instance of a selection operator. And, let's see, it says that condition C can involve equals, less than, greater than, and so on, but it can be more than this, right? It can be any sort of Boolean expression. In fact, as we'll see in maybe a segment or two, it can be any arbitrary function that returns a Boolean value. So it doesn't necessarily have to be just a less than b, or a equals b. It could be some complicated function, okay?

But in between some complicated user-defined function and a simple condition like this, where you just say salary > 4000, you can have arbitrary Boolean expressions. So you could have conjunctions: you could say where salary is greater than 4,000 and name equals Smith. And of course you could say or, and you could say not. All of these are legal.

Okay. So, as an example, if we want a selection over employee where the salary is greater than 4,000, which ones pass this test? Well, I've been saying 4,000 this whole time, and that's 40,000, excuse me. So John has a salary less than 40,000, so he can be removed from the set. And so the result of this expression is this table. We have tables in and tables out: the result of this expression is this table, with the same three columns and only two tuples in it.
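Selection with an arbitrary Boolean condition can be sketched as a filter that takes a predicate function. Everything in the data below is hypothetical: the transcript only tells us there are three columns, that John earns less than 40,000, and that two tuples survive, so the `dept` column, the other names, and all salary figures are invented for illustration, as are the names `Employee`, `EMPLOYEES`, and `select`.

```python
from typing import Callable, List, NamedTuple

class Employee(NamedTuple):
    name: str
    dept: str      # hypothetical column; the actual third column is not given
    salary: int

# Hypothetical rows: only "John earns less than 40,000" comes from the lecture.
EMPLOYEES = [
    Employee("John", "Sales", 20000),
    Employee("Smith", "Engineering", 60000),
    Employee("Fred", "Engineering", 50000),
]

def select(relation: List[Employee],
           condition: Callable[[Employee], bool]) -> List[Employee]:
    """Selection (sigma): keep the tuples for which the condition holds."""
    return [t for t in relation if condition(t)]

# A simple comparison, a conjunction, and a negation are all just predicates.
high_paid = select(EMPLOYEES, lambda e: e.salary > 40000)
high_paid_smith = select(EMPLOYEES, lambda e: e.salary > 40000 and e.name == "Smith")
not_smith = select(EMPLOYEES, lambda e: not e.name == "Smith")

print(high_paid)
```

Because the condition is just a function returning a Boolean, the same `select` covers simple comparisons, and/or/not combinations, and arbitrary user-defined tests. The result has the same columns as the input: a table in, a table out.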