Aggregations in Big Data Pipelines. After this video, you will be able to compare and select the aggregation operation that you require to solve your problem, explain how you can use aggregations to compact your dataset and, in many cases, reduce its volume, and design complex operations in your pipeline using a series of aggregations.

An aggregation is any operation on a data set that performs a specific transformation, taking all the related data elements into consideration. Let's say we have a bunch of stars of different colors. Different colors denote diversity, or variety, in the data. To keep things simple, we will use the letter 'f' to denote a transformation. In the following slides, we will see examples of how 'f' can take the shape of different transformations. If we apply a transformation that does something using the information from all the stars here, we are performing an aggregation. Loosely speaking, we can say that applying a transformation 'f' that takes all the elements of the data as input is called aggregation.

One of the simplest aggregations is summation over all the data elements. In this case, let's say every star counts as 1. Summing over all the stars gives 14: 3 yellow stars, 5 green stars, and 6 pink stars.

Another aggregation you could perform is a summation per star color, that is, grouping the sums by color. If each star is a 1, adding up each group results in 3 for yellow stars, 5 for green stars, and 6 for pink stars. In this case, the aggregation function 'f' outputs 3 tuples of star colors and counts. In a sales scenario, each color could denote a different product type, and the number 1 could be replaced by the revenue generated by a product in each city where the product is sold. In fact, we will keep coming back to this analogy.

You can also compute an average over items of a similar kind, such as the sums grouped by color.
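These three aggregations (total sum, sum grouped by color, and average per group) can be sketched in plain Python. The star data below is a stand-in for the slides' example; the variable names are illustrative, not from the video.

```python
from collections import defaultdict

# Each star counts as 1, labeled by color, as in the slides.
# In the sales analogy, the 1 would become per-city revenue.
stars = [("yellow", 1)] * 3 + [("green", 1)] * 5 + [("pink", 1)] * 6

# Aggregation f = summation over all data elements.
total = sum(value for _, value in stars)  # 3 + 5 + 6 = 14

# Aggregation f = summation grouped by color, yielding (color, count) tuples.
sums_by_color = defaultdict(int)
counts_by_color = defaultdict(int)
for color, value in stars:
    sums_by_color[color] += value
    counts_by_color[color] += 1
# sums_by_color -> {'yellow': 3, 'green': 5, 'pink': 6}

# Aggregation f = average per group (average revenue per product type
# in the sales analogy).
averages = {c: sums_by_color[c] / counts_by_color[c] for c in sums_by_color}
```

Because every star is worth exactly 1 here, each group's average is 1.0; with real revenue figures per city, the same group-then-average pattern would give average revenue per product type.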
Continuing the earlier example, you can calculate the average revenue per product type using this aggregation. Other simple yet useful aggregation operations that help you extract meaning from large data sets are maximum, minimum, and standard deviation.

Remember, you can always perform aggregation as a series of operations, such as the maximum of the sums per product, that is, summation followed by maximum. If you first sum the sales over each city for each product, you can then take the maximum by applying the maximum function to the result of the summation. In this case, you get the product with the maximum sales in the country.

Aggregation over Boolean data sets, which can have true-false or one-zero values, could be a complex mixture of AND, OR, and NOT logical operations.

A lot of problems become easy to manipulate using sets, because sets do not allow duplicate values. Depending on your application, this can be very useful. For example, to count the number of distinct products across several sales tables, you can simply create a set of the products in each table and take the union of these sets.

To summarize, by choosing the right aggregation, you can generate compact and meaningful insights that enable faster and more effective decision making in business. You will find that, in most cases, aggregation results in smaller output data sets. Hence, aggregation is an important tool to keep in your pocket when dealing with large data sets and big data pipelines.
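The series of aggregations (summation followed by maximum) and the set-union trick for counting distinct products can be sketched in plain Python. The sales rows, product names, and city names below are made up for illustration.

```python
# Hypothetical per-city sales rows: (product, city, revenue).
sales = [
    ("widget", "Austin", 120.0),
    ("widget", "Boston", 80.0),
    ("gadget", "Austin", 150.0),
    ("gadget", "Boston", 90.0),
    ("gizmo", "Austin", 60.0),
]

# Step 1: summation -- total sales per product across all cities.
totals = {}
for product, _, revenue in sales:
    totals[product] = totals.get(product, 0.0) + revenue

# Step 2: maximum -- applied to the result of the summation, giving
# the product with the highest sales in the country.
best_product = max(totals, key=totals.get)

# Set union: count distinct products across several sales tables.
# Here the one list stands in for two tables; sets drop duplicates.
table_a = {row[0] for row in sales[:3]}   # products in first table
table_b = {row[0] for row in sales[3:]}   # products in second table
distinct_products = table_a | table_b
```

Note that both results are far smaller than the input: one product name from the chained aggregation, and a single count from the union, which is exactly the volume reduction the video describes.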