So, in this segment I'm talking about what can go wrong with your data. Sample bias, this is something to be on the lookout for. It's a bias or offset in the data, also known as selection bias. A famous example was the 1936 US presidential election, Alfred Landon versus Franklin Delano Roosevelt. A respectable magazine at the time pulled 10 million names from telephone directories, magazine subscription lists, and club rosters, polled them all, and got 2.4 million responses. From that data, the poll indicated that 57% would vote for Landon, and yet Roosevelt won with 62% of the vote. So, what happened? It was the largest error ever for a public opinion poll. Any idea what might have gone wrong there? In 1936, in the middle to the end of the Depression, having a phone, magazine subscriptions, and club memberships meant that you were affluent, you were wealthy, and so the samples were drawn from only affluent voters. That entire dataset was skewed in favor of Landon, which is why the poll predicted he would win, while Roosevelt actually won with 62%. That's an example of how bias in your data can affect your outcome. The point is, if all your training data is biased, your results are going to be biased, and you're not going to be satisfied at all with your results.

Sample variance, this is another challenge. It's the average of the squared differences from the mean; all of you have had statistics and seen this before. Here μ is the mean, and these are all the values in your data: you take each value, subtract the mean, square the difference, sum that all up, and multiply by one over n, and that's how you calculate the variance, (1/n) Σ (xᵢ − μ)². So, used together, the mean and variance measure how far a dataset is spread out. A dataset with low variance is very tightly packed together, and a dataset with high variance is spread out.
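The variance calculation described above can be sketched directly; a minimal example in plain Python (the sample values are made up for illustration):

```python
def variance(data):
    """Population variance: the average of squared differences from the mean."""
    n = len(data)
    mu = sum(data) / n                         # the mean (mu)
    return sum((x - mu) ** 2 for x in data) / n  # (1/n) * sum of (x - mu)^2

tight = [9.8, 10.0, 10.1, 10.2, 9.9]    # low variance: tightly packed
spread = [2.0, 18.0, 5.0, 25.0, 10.0]   # high variance: spread out

print(variance(tight))   # small number
print(variance(spread))  # much larger number
```

The standard library's `statistics.pvariance` computes the same quantity and would normally be used in practice.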
It's important to be aware of this to understand which features may be the important ones in a multi-dimensional dataset. Even along a single axis of a multi-dimensional dataset, you may have samples called outliers; that's coming up here in a second. If the model starts to behave erratically, that may indicate it has sensitivities to variance. It may also indicate that some samples are just completely invalid.