[MUSIC] So, let's take a look at a more complicated example. The tougher challenge comes from the fact that in the SHiP setting we will not have access to the origin of the shower. Here we can explore different methods to find the shower in the volume of a brick, knowing only that the shower is somewhere inside. Some approaches can be based on clustering methods; others on conditional random fields, or on processing everything at once, as in message passing neural networks. The solution I'm going to describe follows this strategy. First, we find neighbors for selected tracks on neighboring plates. Then we build chains of five-track candidates and train a classification algorithm to deal with those chains. After this, we cluster showers using the DBSCAN algorithm: we group tracks together to identify where exactly in the volume of the brick the showers are located. Let's take a look at the first stage: how do we find a neighbor for the track that sits in the bottom left corner? The grey vertical bars represent plates, two neighboring plates, and on the second plate, in the middle of the picture, there are three different basetracks. Imagine we extend the line of the left basetrack to the second plate. We then have to look at a combination of features, like distances and angles, to find the most suitable continuation of the original track. The picture shows that we have to be careful to find a good combination of distance and angles when searching for the next basetrack. Then we can compute different features of the tracks that we have joined together: the angle between directions, impact parameters, the mixed product of the two directions and the vector connecting the positions of the basetracks, and some projections. In contrast to the case with a known origin, we do not rely on the position of the origin; we just compute those features.
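The pair features listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's actual feature code: the representation of a basetrack as a 3D position plus a unit direction vector, and the function name `pair_features`, are assumptions made for the example.

```python
import numpy as np

def pair_features(p1, d1, p2, d2):
    """Features for a candidate (basetrack, continuation) pair.

    p1, p2 : 3D positions of the two basetracks (hypothetical layout)
    d1, d2 : unit direction vectors of the two basetracks
    """
    dp = p2 - p1                                   # vector connecting positions
    cos_angle = np.clip(d1 @ d2, -1.0, 1.0)
    angle = np.arccos(cos_angle)                   # angle between directions
    # impact parameter: distance from p2 to the line through p1 along d1
    impact = np.linalg.norm(dp - (dp @ d1) * d1)
    # mixed (scalar triple) product of the two directions and the connector
    mixed = np.dot(np.cross(d1, d2), dp)
    # projections of the connecting vector onto each direction
    proj1, proj2 = dp @ d1, dp @ d2
    return {"angle": angle, "impact": impact, "mixed": mixed,
            "proj1": proj1, "proj2": proj2}
```

Note that none of these features reference a shower origin; they depend only on the two basetracks themselves, which is exactly what the origin-free setting requires.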
And once we connect a chain of five tracks and apply the same XGBoost or random forest, we reach a precision comparable to the algorithm that knew the origin of the shower. It is actually fascinating that these algorithms operate at the same level of accuracy, even though in this case we have much less information about the signal we are looking for. Here is an illustration of how the density of tracks goes down with the classifier threshold: we range from 100% of the tracks down to 10 or 5% of the signal, and you can see how drastically the number of background tracks, depicted in red, is reduced. Going even further, we might wonder how we can find several different showers in the volume. This is a realistic scenario, and here we can apply a clustering algorithm. The most basic of these is the K-means algorithm. The idea behind it is that for each point in a cluster we can find a center nearby. Here you can see an illustration, which is actually interactive if you follow the link below. The algorithm starts with a given number of clusters to find. Then we randomly throw a bunch of centers onto the picture, or select random centers among the points. Iteratively we update the centers of the clusters and the cluster assignment (the color) of every point, and eventually the procedure converges to some configuration. Generally it works, but in some cases it produces slightly funny pictures, like the image of a smiley face you see here: K-means did not capture the complicated structure. For example, it cannot build a separate cluster for the circle shape and another for the eyes. So, another approach that we can apply here is called density-based spatial clustering of applications with noise, or DBSCAN for short. It relies on two parameters.
The first one is epsilon, the maximum distance to neighbors; the second is the minimum number of points required to form a cluster. It works the following way: we start with a random point, find all the neighbors that sit within distance epsilon of it, and iterate until the cluster stops growing. Then we pick a new arbitrary point that hasn't been assigned a color yet and repeat the process. If a point has fewer neighbors within its epsilon vicinity than the minimum number of points, it is dropped as noise. We repeat this process until no points are left. If we apply such an algorithm to the tracks preselected by the previous algorithms, we get a somewhat more interesting picture, with showers emerging from the volume. The problem is that it's not very realistic, because the distance metric used by this clustering algorithm, when computing the epsilon ball around every given track, relies only on Euclidean distance. But we learned before that we have to take care not only of relative positions, but also of relative angles between tracks. So, to improve the quality of the algorithm, we have to add to the metric an additional term that corresponds to the distance between angles, a kind of improved version of the metric. In the illustration you see that the basetracks closest according to the Euclidean metric do not always belong to the same shower, so you have to take angles into account. As a result, the same volume of the brick looks much more realistic: the showers are better visible here, although there is still room for improvement, because in this case some plates are missing, and one also has to account for the directional alignment of the tracks. Further ideas for improvement might include more advanced techniques like conditional random fields or recurrent neural networks.
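The combined position-plus-angle metric can be sketched by passing a custom distance function to scikit-learn's `DBSCAN`. This is a toy illustration under assumptions of mine: each track is represented as `(x, y, tx, ty)` (position plus angular slopes), the weight `w_angle` trading off spatial against angular distance is hypothetical, and the two Gaussian "showers" are synthetic.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def track_distance(a, b, w_angle=10.0):
    """Distance between tracks (x, y, tx, ty): Euclidean distance in
    position plus a weighted Euclidean distance in angle."""
    d_pos = np.linalg.norm(a[:2] - b[:2])
    d_ang = np.linalg.norm(a[2:] - b[2:])
    return d_pos + w_angle * d_ang

rng = np.random.default_rng(0)
# Two toy "showers": each tight both in position and in direction.
shower1 = np.column_stack([rng.normal(0.0, 1.0, (50, 2)),
                           rng.normal(0.0, 0.01, (50, 2))])
shower2 = np.column_stack([rng.normal(10.0, 1.0, (50, 2)),
                           rng.normal(0.3, 0.01, (50, 2))])
X = np.vstack([shower1, shower2])

labels = DBSCAN(eps=3.0, min_samples=5, metric=track_distance).fit_predict(X)
```

With the angular term in the metric, two overlapping showers that differ mainly in direction would also be pulled apart, which plain Euclidean DBSCAN cannot do.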
But I encourage you to explore these ideas when completing your home tasks, and I hope that you will be able to beat the baseline quality on this problem. [MUSIC]