Let's now discuss how CatBoost reduces overfitting. CatBoost, like all standard gradient boosting implementations, builds each new tree to approximate the gradients (or the Newton step) of the current model. The gradient value in a leaf is estimated using all objects in this leaf and is calculated as the average gradient of these objects. Thus, gradients are estimated using the same data points the current model was built on. This leads to a bias in the gradient estimates, which leads to overfitting. To overcome this problem, CatBoost uses the same permutations that we have used for the categorical feature statistics. The leaf values are calculated for each object separately, using only the objects before the given one in the permutation. These values are then used to score the split candidates when building a new tree.

CatBoost has an efficient CPU training implementation, and we did a lot of speed-ups after the release. Currently, training on large datasets with many numerical features will be around four times faster than XGBoost and will have similar speed to LightGBM. On small datasets, CatBoost will be around two times faster than XGBoost and two times slower than LightGBM. This comparison was done on the Epsilon dataset. There might be cases in which these numbers change; for example, if there are fewer than 10 features, the bottleneck of the algorithm is different. It is also important to note that CatBoost doesn't have sparse data support yet, so for sparse data it will train relatively slowly. I'm talking now about CatBoost version 0.6.2. The project is evolving and we plan many more speedups, so stay tuned.

Now let's talk about some ways to control training speed. The first parameter I want to talk about is called rsm, the random subspace method. This parameter controls the fraction of the features that are used to select the next split. By default, it is set to one, but if you have many features, changing it to a smaller value will probably not affect the quality. It will slow down the convergence, though, so you need to run more iterations of gradient boosting. Setting this parameter to 0.1 might speed up each iteration up to 10 times, while the resulting number of iterations will increase much less, by about 20 to 30 percent, so the overall training time will be smaller.

We optimize the algorithm to give the best possible quality, which comes at the cost of additional computations. For example, ordered boosting, the usage of several random permutations, and feature combinations all require additional time. But there is a set of options to control that. To disable ordered boosting, you can set boosting_type to Plain. To disable categorical feature combinations, you can set max_ctr_complexity to one, or to two if you don't want large combinations.

An important feature of CatBoost is GPU support. To use GPU training, you need to set the task_type training parameter to GPU. CatBoost GPU training is about two times faster than LightGBM and 20 times faster than XGBoost, and it is very easy to use. GPU training should be used for large datasets. Here you can see how the relative speedup changes with the number of objects in a dataset: the larger the dataset, the larger the speedup. On a K40 GPU the speedup is up to six times, and on newer GPUs the speedup is up to almost 50 times, so it will be much faster.
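As a rough illustration of these speed options, here is a minimal sketch using CatBoost's Python API. The data variables (X_train, y_train, categorical_indices) and the concrete parameter values are placeholders, not recommendations, and in recent versions task_type is passed when constructing the model:

```python
from catboost import CatBoostClassifier

# Speed-oriented CPU configuration: sample 10% of the features per split,
# disable ordered boosting and categorical feature combinations.
fast_cpu_model = CatBoostClassifier(
    iterations=2000,         # more iterations to compensate for rsm < 1
    rsm=0.1,                 # random subspace method: fraction of features per split
    boosting_type='Plain',   # turn off ordered boosting
    max_ctr_complexity=1,    # turn off categorical feature combinations
)

# GPU training: switch the task type (recent versions take it in the constructor).
gpu_model = CatBoostClassifier(iterations=2000, task_type='GPU')

# Hypothetical placeholders: X_train, y_train, categorical_indices.
# fast_cpu_model.fit(X_train, y_train, cat_features=categorical_indices)
```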
Let's now discuss some useful features of the library you should know about. The first one is the overfitting detector. Gradient boosting trains the model in an iterative fashion. During the training, the error on your training dataset keeps reducing, but starting from some iteration the generalization error no longer reduces. To detect when this happens, you need to have an eval dataset. When the error starts growing on the eval dataset, you need to stop training, and the overfitting detector does that for you.

The second useful feature is custom metrics evaluation. During the training process, you can look not only at the values of the optimized loss function, but also at the values of some other metrics which you can specify. For example, you could optimize logloss for binary classification and look at the values of accuracy and AUC. The nice thing is that you can configure the overfitting detector to use a metric other than the optimized one. For example, you could again optimize logloss and stop training when AUC stops improving.

The values of the metrics and of the optimized loss function can also be seen with CatBoost viewer. If you use a Jupyter notebook, you can use the plot=True parameter to see them. You can see an example here. On this graph, the optimized loss function, logloss, is plotted by default, but you can switch to the other metrics to see how they change in the process of training. The best eval iteration is shown by a dot. You can also see the passed time and the estimated time left; it is possible for CatBoost to estimate the training time because it grows trees of similar size on each iteration. The visualization supports zooming and switching to a logarithmic scale. If you are experimenting using R or the command-line utility, you can use the standalone CatBoost viewer tool. It has the same functionality, but opens in a separate browser tab. The same graphs can be plotted using TensorBoard. It is also possible to write your own metric or loss function in Python or C++; it is more efficient to write it in C++. Our GitHub repo contains tutorials for that.

The next important thing to know about is missing (NaN) feature values support. A very common situation is that your data has some missing feature values. The algorithm deals with this in the following way. For categorical features, NaN values are treated as a separate category. For numeric features, NaN values are substituted with a value that is either greater or less than all other values, and it is guaranteed that during the split selection procedure, the algorithm will consider putting the objects with NaN values into a separate leaf.

The library also implements cross-validation. What you need to know is that it allows for early stopping of the training with the overfitting detector, when the average eval error among all folds stops improving.

Another useful feature is staged predict and the metric evaluation mode. Staged predict calculates the predictions of the model on every iteration for a given dataset, and the metric evaluation mode calculates the metric values on every iteration for this dataset.

In CatBoost, there are several ways to explore the feature importances. First of all, it is possible to see which features are the most important in the model. You can also see which features are the most influential for a given object or for a given set of objects. Finally, you can look at feature interactions for pairs of features.
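A minimal sketch of these features with CatBoost's Python API is shown below. The toy dataset is a hypothetical stand-in for real data, and some arguments, such as early_stopping_rounds, appeared in versions later than 0.6.2:

```python
import numpy as np
from catboost import CatBoostClassifier

# Toy data as a stand-in for a real dataset (hypothetical placeholder).
rng = np.random.RandomState(0)
X = rng.rand(1000, 10)
y = (X[:, 0] + rng.rand(1000) > 1.0).astype(int)
X_train, y_train = X[:800], y[:800]
X_val, y_val = X[800:], y[800:]

model = CatBoostClassifier(
    iterations=200,
    loss_function='Logloss',     # the optimized loss
    eval_metric='AUC',           # metric watched by the overfitting detector
    custom_metric=['Accuracy'],  # extra metric to monitor during training
    use_best_model=True,
    verbose=False,
)

model.fit(
    X_train, y_train,
    eval_set=(X_val, y_val),     # eval dataset for the overfitting detector
    early_stopping_rounds=30,    # stop when eval_metric stops improving
    # plot=True,                 # live training plot inside a Jupyter notebook
)

# Staged predict: model predictions after every iteration for a given dataset.
staged_predictions = list(model.staged_predict_proba(X_val))

# Feature importances of the trained model.
print(model.get_feature_importance(prettified=True))
```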
So, now let's talk about some parameters that can affect the resulting quality of a model. The first one is the learning rate. The rule is the following: the lower the learning rate, the more iterations you need, and the better the quality you get in the end. The second parameter is the depth of the tree. The default value is six and it is a good value, but for some datasets you need deeper trees. L2 regularization is also important; you can play with it to reduce overfitting. The next parameter controls bagging intensity: the higher the bagging temperature, the more aggressive the sampling is. You can also change the bootstrap type. Finally, there is a parameter called random strength, which is also important for model quality. When selecting the best split for a tree, the algorithm uses some randomness, which helps to reduce overfitting. This parameter regulates how much randomness should be used; to turn the randomness off, you can set it to zero. These parameters are summarized in the short sketch below. The most recent information about parameter tuning can be found on our documentation page. Thank you for watching this video, I hope the library will be useful for you. Good luck.
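For reference, here is a minimal sketch of the quality-related parameters mentioned above, using CatBoost's Python API; the concrete values are illustrative assumptions, not tuned recommendations:

```python
from catboost import CatBoostClassifier

# Quality-related parameters discussed above; values are only examples.
model = CatBoostClassifier(
    learning_rate=0.03,         # lower rate -> more iterations, often better quality
    iterations=3000,            # compensate the lower learning rate
    depth=6,                    # default tree depth; some datasets need deeper trees
    l2_leaf_reg=3.0,            # L2 regularization, tune it to reduce overfitting
    bootstrap_type='Bayesian',  # bootstrap (sampling) scheme used for bagging
    bagging_temperature=1.0,    # higher temperature -> more aggressive sampling
    random_strength=1.0,        # randomness in split scoring; 0 turns it off
)
```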