Now, obviously a course on data analysis could include a lot of different topics, right? We would have means, percentages, totals, how to compute them, how to estimate differences in means, how to do regressions, how to do clustering techniques. Lots of stats courses cover these. Some of this is covered in Stats 101, some of it in advanced courses. Some of it your software package knows how to do and gives you good guidance on how to get through it. But in many data contexts there are challenges involved with any of these standard procedures that you need to know about and then know how to deal with. One, for example, comes up when you start linking different data sources. You might have them at different hierarchy levels, which is much more complex and much more difficult to deal with in the analysis. You would want to take those kinds of things into account.
But more importantly, you need to capture variability correctly in order to provide your estimate of a percentage, a mean, or a total, or even to talk about the size of a treatment effect or the precision of a prediction. In each of these cases, because you make an inference, because you infer from the data at hand to something larger, across people, across time, across space, or all of these, you need to make sure that the way you communicate the variability in your estimate, in your statistic, is actually the proper one.
So let's say we take a sample of three little faces here. You could have in one sample all blue, right? The next one could be one of each color. Obviously, each subsample might vary, and that's the kind of variability you do want to capture properly in your analysis. That's what our analysis course is about. You might remember this from your intro stats courses, in case you took those. Confidence intervals, standard errors, vaguely remember that, maybe? Here is a visualization of 100 samples taken from the German Socio-Economic Panel data set.
And what is displayed here is the average year of birth in that population, where the population in this case is the GSOEP data set. And then the bands that you see, the little stripes around this midpoint (looks like a Tannenbaum to me), have in their middle the estimate for that particular sample, just like the sample of all blue heads I had. So here is the estimate of that particular sample, and around it is an error band. For now we don't care how to estimate that particular error band; that's what the analysis class shows you later on. But what you see here is that most of the time the error band is constructed in a way that it does span the mean value in the middle. Sometimes, marked in these black ones here, that's not the case, right? Depending on how you construct the bands, the number of repeated samples that you could hypothetically take that will or will not capture the true population mean will vary. So that's the issue of variability that you try to capture.
Now, you might take subsamples of the whole population: data from a particular part of the population, at a particular time point, or drawn in a fashion that is either what we call a simple random sample, or a sample grouped by geographies, or done in some other form. It's important to know about that piece of sampling, because when you plot, as you see here, the mean for each of these samples and pile them all up, you see the distribution of these varying sample means. Depending on the technique, how you draw the sample, that distribution can be either wider or narrower than the population distribution. It can also be biased, with the center being away from the true center. Now I know that some of you might not know these terms, and that's good, because we do want you to take the sampling course as well. This is just a teaser for why it's worthwhile to learn that.
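To make the two ideas above a bit more concrete, here is a minimal simulation sketch. It is not the GSOEP analysis behind the figure: the population of birth years is made up, the function names (`make_population`, `srs_interval`, `cluster_mean`, `simulate`) are hypothetical, and the error band is a plain normal-approximation 95% interval, which is only one of several ways to construct it. It draws 100 simple random samples, counts how many bands span the true mean, and then compares the spread of the sample means with what you get when the same number of people is sampled as whole groups.

```python
import random
import statistics


def make_population(n_clusters=200, cluster_size=50, seed=0):
    """Made-up population of birth years, grouped into clusters (e.g. regions)."""
    rng = random.Random(seed)
    clusters = []
    for _ in range(n_clusters):
        center = rng.gauss(1965, 8)  # people in the same cluster tend to be similar
        clusters.append([center + rng.gauss(0, 12) for _ in range(cluster_size)])
    return clusters


def srs_interval(values, n, rng):
    """Simple random sample of size n: returns (estimate, lower, upper)."""
    sample = rng.sample(values, n)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
    return mean, mean - 1.96 * se, mean + 1.96 * se  # assumed 95% normal band


def cluster_mean(clusters, n_clusters_sampled, rng):
    """Sample whole clusters and average everyone inside them."""
    chosen = rng.sample(clusters, n_clusters_sampled)
    return statistics.fmean(v for cluster in chosen for v in cluster)


def simulate(n_samples=100, n=200, cluster_size=50, seed=1):
    rng = random.Random(seed)
    clusters = make_population(cluster_size=cluster_size)
    values = [v for cluster in clusters for v in cluster]
    true_mean = statistics.fmean(values)

    # 100 repeated simple random samples, each with its own error band.
    intervals = [srs_interval(values, n, rng) for _ in range(n_samples)]
    covered = sum(lo <= true_mean <= hi for _, lo, hi in intervals)

    # Same number of people per sample, but drawn as whole clusters.
    srs_means = [m for m, _, _ in intervals]
    cl_means = [cluster_mean(clusters, n // cluster_size, rng)
                for _ in range(n_samples)]

    return {
        "true_mean": true_mean,
        "covered": covered,  # how many bands span the true mean
        "spread_srs": statistics.stdev(srs_means),
        "spread_cluster": statistics.stdev(cl_means),
    }


if __name__ == "__main__":
    result = simulate()
    print(f"true mean: {result['true_mean']:.1f}")
    print(f"bands spanning the true mean: {result['covered']} of 100")
    print(f"spread of sample means, simple random: {result['spread_srs']:.2f}")
    print(f"spread of sample means, clustered:     {result['spread_cluster']:.2f}")
```

Under these assumptions most, but not all, of the 100 bands should span the true mean, and the clustered samples should give a visibly wider pile of sample means than the simple random samples, even though each sample contains the same number of people.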
So depending on the technique, your variability might change, and no matter what you do, you have to reflect the variability when you put out your results. So that's what these courses later on are about. As for resources, as I said, Combining and Analyzing Complex Data is one. Our sampling course itself is another one. And then there are these two excellent books here, in particular the one on the left, Applied Survey Data Analysis, from three of our colleagues from Michigan, actually: Steve Heeringa, Pat Berglund, and Brady West. Highly recommended.