[MUSIC] Hi. The ID3 decision tree algorithm is used for classification when the target attribute is binary. In order to explain the ID3 algorithm, we need to learn some basic concepts. Information entropy is defined as the average amount of information produced by a stochastic source of data. The amount of information associated with each possible data value is the negative logarithm of its probability mass function. When the data source emits a low-probability value, the event carries more information than when it emits a high-probability value. Consider a training set S, where p+ is the proportion of positive examples and p- is the proportion of negative examples. Entropy, Entropy(S) = -p+ log2(p+) - p- log2(p-), is a measure of the impurity, or noise, of the sample S. Now we can measure the effectiveness of an attribute for classifying the training data. The information gain of an attribute A relative to a training set S is defined as the entropy of the sample minus the weighted sum of the entropies of the subsets obtained for each value v that the attribute takes: Gain(S, A) = Entropy(S) - sum over v of (|Sv|/|S|) Entropy(Sv). The ID3 algorithm has four main steps. First, calculate the entropy of every attribute using the data set S. Second, split the set S into subsets using the attribute for which the resulting entropy is minimum, which is the attribute whose information gain is maximum. Third, make a decision tree node containing that attribute. Fourth, recurse on the subsets using the remaining attributes. Here we show the pseudocode. We will learn the ID3 algorithm through an example: generate the ID3 decision tree that tells us whether, under given environmental conditions such as outlook, temperature, humidity, and wind, we can play tennis. The following table is the training sample S, with 9 positive and 5 negative examples of the binary target attribute PlayTennis (yes or no). As a first step, we compute the entropy of the sample S: with p+ = 9/14 and p- = 5/14, Entropy(S) is about 0.940. Then we compute the entropy of Outlook for each of its values: when Outlook is Sunny, when Outlook is Overcast, and when Outlook is Rain.
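The entropy and gain computations described above can be sketched in Python. The 14-row table here is the classic play-tennis training set (9 yes, 5 no); I am assuming it matches the table shown in the video, so treat it as an illustrative sketch rather than the exact slide data:

```python
import math
from collections import Counter

# Classic 14-example play-tennis training set (9 yes / 5 no),
# assumed to match the table in the lesson.
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temp", "Humidity", "Wind"]

def entropy(rows):
    """Entropy of the class label (last column) over a list of rows."""
    total = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, attr_index):
    """Information gain of splitting rows on the given attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [row for row in rows if row[attr_index] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

print(f"Entropy(S) = {entropy(data):.3f}")
for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(data, i):.3f}")
```

On this table, Entropy(S) comes out to about 0.940 and Outlook has the largest gain (about 0.247), which is why it becomes the root of the tree.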
The gain of Outlook is the entropy of the entire sample minus the weighted entropy for each Outlook value. We proceed in exactly the same way with Humidity: the entropy for High, the entropy for Normal, and then the gain of Humidity. Now the entropy for Wind: the entropy for Weak, the entropy for Strong, and the gain of Wind. Finally, the entropies of Temperature for its Hot, Mild, and Cool values, and its gain. [MUSIC] Now that we have computed the gain of each attribute (Outlook, Temperature, Humidity, and Wind), the attribute with the highest gain is Outlook. Therefore, that attribute becomes the first node, in this case the root of the tree. [MUSIC] Under the premise Outlook = Sunny, and using the remaining attributes, we calculate the next node. Checking the gains obtained, the attribute with the highest gain, and hence the next node, is Humidity. Under the premises Outlook = Sunny and each value of Humidity, we use the remaining attributes to calculate the following nodes. The next node on the Rain branch is Wind. The resulting ID3 decision tree is as follows. The decision tree can be expressed as these rules: if Outlook is Sunny and Humidity is High, then No; if Outlook is Sunny and Humidity is Normal, then Yes; if Outlook is Overcast, then Yes; if Outlook is Rain and Wind is Strong, then No; if Outlook is Rain and Wind is Weak, then Yes. One of the most common data mining techniques used for prediction tasks is the k-nearest neighbor algorithm. Its purpose is to use a database in which data points are separated into several classes to predict the classification of a new sample point. The nearest neighbor technique is a classification method as well: the purpose of a nearest neighbor analysis is to search for and locate either the nearest point in space or the nearest numerical value, depending on the attribute you use as the basis of comparison. The k-nearest neighbor algorithm is as follows. First, determine the parameter K, the number of nearest neighbors. Second, calculate the distance between the query instance and all the training samples.
Third, sort the distances and determine the nearest neighbors based on the K-th minimum distance. Fourth, gather the category Y of the nearest neighbors. Fifth, use the simple majority of the categories of the nearest neighbors as the prediction for the query instance. As an example, we have two attributes, acid durability and strength, to classify whether a special paper tissue is good or not, and here are four training samples. Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. How can we guess the classification of this new tissue? With k-nearest neighbor, first, determine the parameter K, the number of nearest neighbors; suppose we use K = 3. Second, calculate the distance between the query instance and all the training samples. The query instance is the point (3, 7), and we compute the squared distance, which is faster to calculate because it avoids the square root while preserving the ranking. The image shows the distance calculations. Third, sort the distances and determine the nearest neighbors based on the K-th minimum distance. Fourth, gather the categories of the nearest neighbors. Notice in the second row, last column, that the category Y of that neighbor is not included, because the rank of that sample is greater than 3 (= K). Fifth, use the simple majority of the categories of the nearest neighbors as the prediction for the query instance. We have 2 Good and 1 Bad; since 2 is greater than 1, we conclude that the new paper tissue that passed the laboratory test with X1 = 3 and X2 = 7 belongs to the Good category. In the context of Bayesian statistics, the probability of a hypothesis makes it possible to solve problems of scientific interest that would otherwise be unapproachable. Bayesian methods are called eager learners: when given a training set, eager learners immediately analyze the data and build a model, and when asked to classify an instance, they use this internal model. Eager learners tend to classify instances faster than lazy learners.
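The five k-NN steps above can be sketched as a short Python function. The four training samples here follow the widely used version of this tissue example; I am assuming they match the table shown in the video:

```python
from collections import Counter

# Four training samples: ((acid durability X1, strength X2), class),
# assumed to match the table in the lesson.
train = [
    ((7, 7), "Bad"),
    ((7, 4), "Bad"),
    ((3, 4), "Good"),
    ((1, 4), "Good"),
]

def knn_classify(query, samples, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    # Squared Euclidean distance: the ranking is the same, no sqrt needed.
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Steps 2-3: compute distances to all samples and sort.
    ranked = sorted(samples, key=lambda s: dist2(query, s[0]))
    # Steps 4-5: gather the categories of the k nearest and take the majority.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((3, 7), train))  # neighbors are {Good, Good, Bad} -> "Good"
```

With K = 3, the sample at squared distance 25 is ranked fourth and excluded, and the 2-to-1 majority classifies the new tissue as Good, matching the walkthrough.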
As an example, if I tell you I picked a random 19-year-old and ask you the probability of that person being female, then without doing any research you would say 50%. This is an example of what is called a prior probability, denoted P(h), the probability of hypothesis h: if there is an equal number of 19-year-old males and females, the probability of female is 0.5. Now suppose I give you some additional information: the person attends the Frank Lloyd Wright School, whose student body is 86% female. Then we could say that the probability the person is female, given that the person attends the Frank Lloyd Wright School, is 0.86. The naive Bayes formula is as follows. In our attempt to construct a Bayesian classifier, we will need two additional probabilities, P(D) and P(D|h); Bayes' theorem describes the relationship between them: P(h|D) = P(D|h) P(h) / P(D). Now, let's have a look at an exercise. Say we have data on 1,000 pieces of fruit: bananas, oranges, and other fruits. We know three characteristics of each fruit, whether it is long or not, sweet or not, and yellow or not, as shown in the following table, which is the training set. What do we know so far? Well, 50% of the fruits are bananas, 30% are oranges, and 20% are other fruits. Based on the training set, we can say the following. Of the 500 bananas, 400 (0.8) are long, 350 (0.7) are sweet, and 450 (0.9) are yellow. Of the 300 oranges, 0 are long, 150 (0.5) are sweet, and all 300 are yellow. Of the remaining 200 fruits, 100 (half) are long, 150 (three quarters) are sweet, and 50 (one quarter) are yellow. The training set should provide enough evidence to predict the class of a newly introduced fruit: given the features of a piece of fruit, we need to predict its class.
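The fruit table above can be turned directly into a naive Bayes scorer. This is a minimal sketch using the counts quoted in the lesson; following the usual shortcut, it drops the common denominator P(D) and compares the unnormalized scores P(h) * P(D|h):

```python
# Counts from the training table: per class, how many fruits are
# long, sweet, and yellow, plus the class total. These are the
# numbers quoted in the lesson (1,000 fruits overall).
counts = {
    "Banana": {"long": 400, "sweet": 350, "yellow": 450, "total": 500},
    "Orange": {"long": 0,   "sweet": 150, "yellow": 300, "total": 300},
    "Other":  {"long": 100, "sweet": 150, "yellow": 50,  "total": 200},
}
N = 1000  # total number of fruits in the training set

def score(cls, features):
    """Unnormalized naive Bayes score: P(cls) * product of P(feature|cls)."""
    c = counts[cls]
    result = c["total"] / N  # the prior P(cls)
    for f in features:
        result *= c[f] / c["total"]  # the likelihood P(f|cls)
    return result

features = ["long", "sweet", "yellow"]
for cls in counts:
    print(cls, score(cls, features))
```

For a long, sweet, yellow fruit this gives 0.252 for banana, 0 for orange (no orange in the table is long), and about 0.019 for other fruit, so banana wins, as the next step of the lesson confirms.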
We are told that the additional fruit is long, sweet, and yellow, and it can be classified using Bayes' theorem: we compute the score for each class to decide whether it is a banana, an orange, or another fruit, and the class with the highest probability, or score, is the winner. Let's check the probability that the fruit we took is a banana, given that it is long, sweet, and yellow. [MUSIC] In the case of an orange, the probability that the fruit we took is an orange, given that it is long, sweet, and yellow. [MUSIC] And the last possible case, other fruit: the probability that the fruit we took is any other fruit, given that it is long, sweet, and yellow. [MUSIC] In this case, based on the highest score, we can conclude that this long, sweet, and yellow fruit is in fact a banana. Finally, let's take a quick look at software for data mining. There are many tools that implement many techniques to solve problems and support decisions with data mining; one of them is RapidMiner. Next session, we will learn about some types of data according to their structure, and how to integrate, store, and analyze unstructured data. See you next session. [MUSIC]