In this video, we will be discussing how biased training data leads to biased answers out of your question-answering machine. Remember our machine learning process: the learning data that is used to build your QuAM is all that your QuAM knows about the world and the entire basis it has for answering your question. If that data is biased in some way, that is, if the learning data does not accurately represent the operational data that you want your QuAM to give you answers about, the answers will be biased.

Let's talk about some examples, starting with Betty the farmer. She has a farm in Alberta, but she's collected a huge dataset to train her QuAM: data from other farms in Alberta, but also farms in British Columbia, Oregon, and California. If Betty's question is strongly influenced by climate, this dataset might be biased, because it includes data from regions that are much warmer than Betty's farm, ones that are more mountainous, and ones that have higher rainfall than hers. She may choose to use the data only from Alberta because the climate in the other areas is just too different from the climate in Alberta. But maybe climate isn't that important a feature after all. Betty could train one QuAM using only data from Alberta, then train another using the larger dataset, and compare the two to see which one performs better. The point here is that you should be aware of the differences between your data sources and investigate the consequences of those differences.

For Betty, the main consequence of her training data being different from her operational data is the performance of her QuAM. In other circumstances, though, biases in the training data can have very real and very negative consequences. There have been multiple examples of facial recognition systems, trained on datasets that consist mostly of white male faces, that have a very hard time recognizing female faces, and especially the faces of women of color. The biased training data resulted in very biased answers, something that the developers should have been able to see coming. When facial recognition systems with this bias are used for security access, for example, this can result in serious problems for women of color. This is something that both the developers of these systems and the users are ethically obligated to address.

Other times the biases are more subtle and harder to anticipate. An example of this is a smartphone app that was developed for pothole detection in a major city. The app could automatically detect potholes as the user drove around the city. This led to repairs being skewed toward the more affluent parts of the city, where there were more smartphone users. Unfortunately, this meant that there were also parts of the city where potholes went unreported and thus ignored, until the city recognized the bias and put measures in place to compensate.

To put it plainly, your QuAM can't claim to work in all situations if your data does not actually represent all situations. This means that you need to be mindful of representation bias in your training data and pay attention to collecting data from under-represented groups. This concern is not unique to machine learning, but it is one that must be a top priority in your data collection process.
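
To make Betty's comparison concrete, here is a minimal sketch of training one QuAM on Alberta-only data and another on the full multi-region dataset, then scoring both on held-out Alberta data. The video does not say what Betty's question is, so this sketch assumes, purely for illustration, that she is predicting crop yield; the file name (farm_data.csv) and column names (region, rainfall, temperature, soil_ph, crop_yield) are hypothetical placeholders.

```python
# Sketch: compare an Alberta-only QuAM against an all-regions QuAM,
# evaluating both on held-out Alberta data (Betty's operational setting).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split


def evaluate(train_df, test_df, features, target):
    """Train a model on train_df and return its error on test_df."""
    model = RandomForestRegressor(random_state=0)
    model.fit(train_df[features], train_df[target])
    predictions = model.predict(test_df[features])
    return mean_absolute_error(test_df[target], predictions)


df = pd.read_csv("farm_data.csv")                    # hypothetical dataset
features = ["rainfall", "temperature", "soil_ph"]    # hypothetical features
target = "crop_yield"                                # hypothetical target

# Hold out part of the Alberta data; both QuAMs are tested on it,
# because Alberta is the operational data Betty cares about.
alberta = df[df["region"] == "Alberta"]
alberta_train, alberta_test = train_test_split(alberta, test_size=0.3, random_state=0)

# QuAM 1: trained on Alberta data only.
error_alberta_only = evaluate(alberta_train, alberta_test, features, target)

# QuAM 2: trained on Alberta data plus all other regions.
other_regions = df[df["region"] != "Alberta"]
full_train = pd.concat([alberta_train, other_regions])
error_all_regions = evaluate(full_train, alberta_test, features, target)

print(f"Alberta-only QuAM error: {error_alberta_only:.2f}")
print(f"All-regions QuAM error:  {error_all_regions:.2f}")
```

Whichever QuAM has the lower error on the Alberta test set is the better choice for Betty, and the gap between the two errors tells her how much the climate differences in the larger dataset actually matter.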
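
A quick representation check on your training data can also surface this kind of bias before you build anything. The sketch below assumes a hypothetical training file (training_faces.csv) with a hypothetical "group" column and made-up reference proportions; in practice you would use the demographic groups and operational population relevant to your own question.

```python
# Sketch: compare group proportions in the training data
# against the proportions expected in the operational population.
import pandas as pd

train = pd.read_csv("training_faces.csv")   # hypothetical training dataset

# Assumed shares of each group in the operational population (illustrative only).
reference = {
    "women of color": 0.20,
    "white women": 0.30,
    "men of color": 0.20,
    "white men": 0.30,
}

observed = train["group"].value_counts(normalize=True)
for group, expected in reference.items():
    actual = observed.get(group, 0.0)
    # Simple illustrative threshold: flag groups at less than half their expected share.
    flag = "  <-- under-represented" if actual < 0.5 * expected else ""
    print(f"{group:15s} expected {expected:.0%}, observed {actual:.0%}{flag}")
```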