Most likely, you are already aware of the energy and momentum conservation laws: the energy and momentum of an isolated system should stay constant, or, as they say, be conserved. There are plenty of other quantities in physics that are kept constant during the evolution of an isolated system. Thanks to one of the most profound theorems of the first half of the 20th century, proven by Emmy Noether, every conservation law is tightly connected to a certain kind of symmetry of the universe. For example, energy conservation is connected to the uniformity of time: it does not matter whether you start an experiment today or tomorrow, it will develop the same way, under the assumption of an isolated system, of course.

Let's recall what lepton flavor is. For every decay, we can compute the number of first-generation leptons, second-generation leptons, and third-generation leptons. Similarly, there is a quark flavor number, and it is known not to be conserved, that is, to be violated. Look at the decay of the Bs meson: it consists of an anti-b quark and an s quark. After the emission of a W boson, we end up with a charm, anti-charm pair that forms the J/psi, and a strange, anti-strange pair that joins into a particle called the phi. Neutrinos can also transform into each other; the 2015 Nobel Prize was awarded to Takaaki Kajita and Arthur McDonald exactly for the discovery of this possibility. However, the Standard Model prediction for charged lepton flavor violation is negligible, while the new-physics expectations for it are much higher. One of the simplest decays that conserves energy, charge, and lepton number, but violates the lepton flavor number, is shown at the bottom of the slide: the transformation of a muon into an electron and a gamma quantum. On this slide you also see examples of more complicated decays that violate lepton flavor; for example, in the top-right diagram, a tau particle decays into three muons.
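The flavor-number bookkeeping is easy to make concrete. The sketch below (the particle table and function are my own illustration, not from the slides) counts the per-generation lepton numbers for the mu -> e gamma decay just mentioned, showing that the total lepton number is conserved while the per-generation flavor numbers are not:

```python
# Per-particle lepton-family numbers (L_e, L_mu, L_tau); antiparticles
# would carry the opposite sign, and the photon carries none. The table
# covers only the particles needed for this example.
FLAVOR = {
    "mu-":   (0, 1, 0),
    "e-":    (1, 0, 0),
    "gamma": (0, 0, 0),
}

def flavor_numbers(particles):
    """Sum the (L_e, L_mu, L_tau) numbers over a list of particles."""
    return tuple(map(sum, zip(*(FLAVOR[p] for p in particles))))

before = flavor_numbers(["mu-"])           # the decaying muon: (0, 1, 0)
after = flavor_numbers(["e-", "gamma"])    # the decay products: (1, 0, 0)

# Total lepton number is conserved (1 on both sides)...
assert sum(before) == sum(after) == 1
# ...but the per-generation numbers change: lepton flavor is violated.
assert before != after
```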
The probability of this decay in the Standard Model is very low, less than 10^-40, so we cannot measure a process of such probability using the LHC or other existing technologies. But according to new-physics predictions, for example in supersymmetry, such a decay can happen thanks to particles or bosons that do not exist in the regular Standard Model. You can see examples of such decays on the right and below, and their probabilities might be much higher than the Standard Model predictions.

The analysis strategy for the decay of a tau to three muons could be the following. Since we are looking for a decay like the one depicted below, we want our trigger to catch muons that come from a single vertex; requiring three of them reduces the amount of background quite drastically. An additional constraint comes from the fact that the tau flies some distance from the proton collision point, so the source of those muons should be displaced from the primary vertex. There might be other restrictions on the muon momentum and energy. Then we have to design an event selection technique based on machine learning. But, not to spoil the fun, we have to hide a particular part of the data until we are happy with the classifier; this is called blinding. We train our classifier on a mixture of real data and simulated data. Once we are satisfied with its properties, we can apply it to the signal region and estimate the number of events passing the selection. But then we have to convert this number into a branching fraction somehow, to compare it with the predictions of the Standard Model. That is why we need a normalization and calibration channel. If we apply the same selection to this channel, we get a number of events that corresponds to a well-known branching fraction. Since the topology of the calibration channel is very similar, it is assumed that the ratio of counted events to branching fraction should be the same.

So why is blinding useful?
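As a minimal sketch of the blinding step, here is how one might remove a mass window around the tau mass from a toy dataset. The window half-width follows the 20 MeV quoted later in the lecture; the flat dataset and mass range are entirely made up for illustration:

```python
import numpy as np

M_TAU = 1776.86     # tau mass in MeV (PDG value)
HALF_WINDOW = 20.0  # +-20 MeV blinding window, as in the lecture

def blind(masses):
    """Drop every candidate whose invariant mass falls in the blinded window."""
    masses = np.asarray(masses, dtype=float)
    return masses[np.abs(masses - M_TAU) > HALF_WINDOW]

# A made-up flat "dataset" of candidate masses in MeV, for illustration only.
rng = np.random.default_rng(0)
sample = rng.uniform(1600.0, 1950.0, size=10_000)
visible = blind(sample)   # this is all the analysts are allowed to look at
```

The strategy and the classifier are then developed on `visible` only; the hidden candidates are inspected just once, after the analysis is frozen.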
Let's recap what the signal region is: it is the region of the mass spectrum with a high probability of signal, where the probability of signal is very different from the probability of background; this follows from the Feynman diagram and the nature of the decay process. This region is hidden during the analysis to avoid psychological or experimental bias, that is, to avoid making decisions on which cuts to apply, when to stop the analysis, or when to search for bugs by looking at the very data from which we will draw the final conclusion. The plot shows hypothetical distributions: the signal in blue, the background in black; the innermost region is the signal region. The distribution in the outer region is used to interpolate the background contribution into the signal region, and the narrow regions are used for analysis optimization.

In rare-decay searches, blinding is done by defining the entire analysis prior to evaluating the part of the data in which the signal is sought; this part is also referred to as the signal region. In the case of tau to three muons, the candidates with invariant mass between M_tau - 20 MeV and M_tau + 20 MeV were removed from the dataset for the development of the strategy and the optimization of the classifier. Once the analysis is defined, the signal region is analyzed: the number of candidates is evaluated and compared to the expectation.

To discriminate between signal and background, we should include such features as the vertex fit quality (how well the muon tracks actually come together), the displacement from the primary vertex (how distant the secondary vertex is from the primary one), the track quality, and the track isolation. The samples used for training the classifier are taken from Monte Carlo simulation for the signal and from real data for the background.
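To illustrate how features like vertex-fit quality, displacement, and isolation can be combined into a single discriminant, here is a toy Fisher linear discriminant on simulated feature vectors. All distributions are invented for illustration, and the real analysis used a more powerful multivariate classifier, but the idea of projecting several features onto one score is the same:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
# Toy feature vectors (vertex-fit chi2, vertex displacement, isolation).
# The numbers are invented purely for illustration.
sig = np.column_stack([rng.normal(1.0, 0.5, n),    # good vertex fit
                       rng.normal(5.0, 1.0, n),    # displaced secondary vertex
                       rng.normal(0.2, 0.1, n)])   # well-isolated tracks
bkg = np.column_stack([rng.normal(3.0, 1.0, n),
                       rng.normal(1.0, 1.0, n),
                       rng.normal(0.8, 0.3, n)])

# Fisher linear discriminant: w ~ (S_sig + S_bkg)^-1 (mu_sig - mu_bkg).
within = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
w = np.linalg.solve(within, sig.mean(axis=0) - bkg.mean(axis=0))
score_sig = sig @ w    # one score per signal candidate
score_bkg = bkg @ w    # one score per background candidate
```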
A channel with a similar topology, the Ds meson decaying into a phi and a pion, is used for the calibration and normalization of the classifier. As a metric, or rather a proxy metric (we are ultimately interested in the branching fraction, which might not be directly related to it), we use the area under the ROC curve (AUC).

As we mentioned before, the signal in the signal region has a mass shape very different from the background. You can probably spot the problem: if we give the mass, or a feature that correlates with the mass, to the classifier, we might get a biased estimate of the number of background events. We will come back to this point a little later, but let's continue with the strategy. Suppose we got the classifier that gives the best area under the ROC curve we can imagine. Of course, there is a question: if two classifiers give the same area under the ROC curve, which one should we choose? For the educational purposes of this lecture, let's leave this question aside. We pick the threshold for the classifier that maximizes the fraction on the slide: essentially, the true positive rate squared over the number of background events misclassified as signal, which gives a rough estimate of the efficiency of our classifier. Then we apply the classifier with this threshold to our real-data sample, with the signal region still hidden, and estimate the amount of background in the signal region by extrapolating the sidebands into it; we will show this on the next slide. Then we unblind the signal region, apply the classifier to it, and count the number of events there. We also apply the classifier to the corresponding region of the normalization channel, and count the number of events there, N_cal. Then we check the hypothesis p-value, and depending on it, we estimate a branching fraction (branching ratio) or an upper limit.
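The threshold optimization can be sketched as a simple scan over cut values, maximizing the "true positive rate squared over background" figure of merit from the slide. The score distributions below are hypothetical stand-ins for classifier outputs on signal MC and sideband data:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical classifier scores: signal peaks near 1, background near 0.
# Beta distributions are stand-ins, not real classifier outputs.
score_sig = rng.beta(5, 2, size=100_000)   # plays the role of signal MC scores
score_bkg = rng.beta(2, 5, size=100_000)   # plays the role of sideband data

best_t, best_fom = None, -np.inf
for t in np.linspace(0.0, 0.99, 100):
    eff = np.mean(score_sig > t)     # signal efficiency (true positive rate)
    n_bkg = np.sum(score_bkg > t)    # background events passing the cut
    if n_bkg == 0:
        continue                     # figure of merit undefined at zero background
    fom = eff**2 / n_bkg             # the eps_S^2 / B figure of merit
    if fom > best_fom:
        best_fom, best_t = fom, t
```

The chosen `best_t` is then frozen before the signal region is ever looked at, which is exactly what blinding enforces.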
So how do we count the expected number of background events? First, we apply the selection of our classifier to the sidebands. Then we assume a parametric PDF for the combinatorial background, for example an exponential, fit this model to the real data in the sidebands, and check that the probability density function performs well using one of the goodness-of-fit criteria. Then we extrapolate the model into the blinded region and compute the area under this extrapolation, which gives the expected number of background events in that region. Look at this slide: shown in blue is a rough estimate of the number of background events we expect to observe.

After we get those numbers, we can compute the branching fraction. The slide shows the formula used to estimate the branching fraction from experimentally countable numbers and quantities you can get out of the literature, together with the numbers substituted into it. It essentially connects two things: the branching fraction and the number of signal events that you estimated by counting events in the signal region after applying the classifier. Schematically, since the ratio of counted events to branching fraction is assumed to be the same for the signal and calibration channels, B_sig = B_cal * N_sig / N_cal, up to efficiency corrections.

In the case of this analysis, no significant evidence for an excess of events was observed. As we mentioned before, the Standard Model is both remarkably simple and very powerful: nearly every quantity that has been measured in particle physics falls right on the predicted value, and the tau to three muons decay is something that still awaits discovery. There were not enough data to say how many signal events there are. In such cases, we can only estimate how likely it is that the true value is not higher than a certain threshold: a technique called confidence level estimation puts an upper limit on the measurement with a specified confidence level.
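The sideband extrapolation can be sketched as follows: generate a toy exponentially falling background, blind the window around the tau mass, fit log(counts) with a straight line in the sidebands (a binned stand-in for the exponential fit), and sum the fitted curve over the blinded bins. Every number here (sample size, slope, binning, fit range) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
M_TAU, HW = 1776.86, 20.0       # blinding window as quoted in the lecture
lo, hi = 1600.0, 1950.0         # made-up fit range in MeV

# Toy combinatorial background: exponentially falling mass spectrum.
m = lo + rng.exponential(scale=300.0, size=50_000)
m = m[m < hi]
sideband = m[np.abs(m - M_TAU) > HW]     # blinded data: window removed

# Binned exponential fit: log(counts) = a + b * mass, sidebands only.
counts, edges = np.histogram(sideband, bins=np.linspace(lo, hi, 70))
centers = 0.5 * (edges[:-1] + edges[1:])
in_window = np.abs(centers - M_TAU) <= HW
fit_mask = ~in_window & (counts > 0)
b, a = np.polyfit(centers[fit_mask], np.log(counts[fit_mask]), 1)

# Extrapolate the fitted curve into the blind region and sum over its bins.
expected_bkg = float(np.exp(a + b * centers[in_window]).sum())
```

Comparing `expected_bkg` with the (still hidden) true count in the window is exactly the comparison that is made only after unblinding.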
You can see the colorful plot on the right, which shows the different confidence levels (vertical axis) for various values of the branching fraction (horizontal axis). Usually, the upper limit is set at 90 or 95 percent confidence. The upper limit set by LHCb in this 2013 analysis is that the branching fraction of the decay is not above 8.0 x 10^-8 at 90 percent confidence.
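For intuition about what such an upper limit means, here is a simplified counting-experiment limit: the smallest signal yield s such that observing n_obs or fewer events would have probability at most 10 percent. This is a classical textbook stand-in, not the full CLs machinery used in the real LHCb analysis:

```python
from math import exp, factorial

def poisson_cdf(n, mu):
    """P(N <= n) for N ~ Poisson(mu)."""
    return sum(mu**k * exp(-mu) / factorial(k) for k in range(n + 1))

def upper_limit(n_obs, bkg, cl=0.90):
    """Smallest signal yield s with P(N <= n_obs | s + bkg) <= 1 - cl.
    A simplified classical counting-experiment limit, found by a coarse
    scan; not the full CLs procedure used by LHCb."""
    s = 0.0
    while poisson_cdf(n_obs, s + bkg) > 1.0 - cl:
        s += 0.001
    return s

# With zero observed events and zero background, the 90% CL limit is ln(10):
limit = upper_limit(0, 0.0)   # ~2.30 expected signal events
```

Dividing such a limit on the signal yield by the normalization factor from the calibration channel is what turns it into a limit on the branching fraction.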