I now introduce Gibbs sampling, another randomized algorithm for motif finding. Gibbs sampling works somewhat similarly to Randomized Motif Search. The biggest difference is that Randomized Motif Search is a rather reckless algorithm: it may change all of the chosen motifs at every iteration, depending on what the Profile-most probable k-mers turn out to be. This allows it to move very quickly through the space of solutions, but there is a danger that, having already come close to the correct motif, it jumps away from it, precisely because it moves so rapidly through the space of all possible motifs. The Gibbs Sampler is more cautious: it changes only one of the chosen motifs at every step. This slide shows the key difference between Randomized Motif Search and Gibbs Sampling.

Instead of showing you the pseudocode for the Gibbs Sampler, let me first show you how it works. Let's randomly select motifs, shown in bold on this slide. In the set of Dna strings, as before, blue elements show the implanted motifs, and bold elements show the motifs that we randomly selected. After we've done that, let's do something strange: let's remove one of the chosen motifs along with the sequence it belongs to. In our example, we remove the third motif and take the third Dna sequence out of consideration. We then consider the resulting set of just four motifs, build the Count matrix based on this set, and construct the Profile matrix from this Count matrix. Once we have constructed the Profile matrix, we calculate the probability of every k-mer in the deleted string. Afterwards, we do the following, again relying heavily on rolling a die. We computed seven probabilities, so we roll a seven-sided die loaded according to these probabilities: the probability of each side is proportional to the probability we computed for the corresponding k-mer. After we roll this die, we choose a new starting position for the motif in the deleted sequence; this becomes the new, substituted motif in that sequence. That's how it works, and it then iterates as long as the motifs keep improving, or for a fixed number of iterations (let's say 100,000 iterations).

So, here is the outline of the Gibbs Sampler; a sketch in code follows below. We randomly select a k-mer in each sequence, as before. We randomly choose one of the selected k-mers, from a sequence we call Sequence*, and remove it from the set of motifs. We create a Profile from the remaining motifs. For each k-mer starting at position i in Sequence*, we compute its probability of being generated by this Profile, resulting in n - k + 1 probabilities. Afterwards, we generate a die with n - k + 1 sides whose probabilities are proportional to the probabilities we computed. We roll this die and, depending on the outcome, we put the k-mer starting at the chosen position back as the motif for Sequence*. And we reiterate.
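To make this outline concrete, here is a minimal Python sketch of one run of the Gibbs Sampler. The function names (gibbs_sampler, profile_from, profile_random_kmer, score) are my own; the profile here uses plain frequencies without pseudocounts (we will revisit that in a moment), and motifs are scored by their distance from the column-wise consensus. Treat it as an illustration of the steps above, not a definitive implementation.

```python
import random

def profile_from(motifs, k):
    # Column-wise nucleotide frequencies of the remaining motifs (no pseudocounts yet).
    profile = [{nuc: 0.0 for nuc in "ACGT"} for _ in range(k)]
    for motif in motifs:
        for j, nuc in enumerate(motif):
            profile[j][nuc] += 1.0 / len(motifs)
    return profile

def kmer_probability(kmer, profile):
    # Probability that the profile generates this k-mer: product of column probabilities.
    p = 1.0
    for j, nuc in enumerate(kmer):
        p *= profile[j][nuc]
    return p

def profile_random_kmer(text, k, profile):
    # Roll a loaded (n - k + 1)-sided die: each side is a starting position,
    # weighted by the profile probability of the k-mer that starts there.
    probs = [kmer_probability(text[i:i + k], profile) for i in range(len(text) - k + 1)]
    if sum(probs) == 0:
        i = random.randrange(len(probs))  # all-zero die; see Cromwell's rule below
    else:
        i = random.choices(range(len(probs)), weights=probs)[0]
    return text[i:i + k]

def score(motifs):
    # Total number of mismatches from the column-wise consensus; lower is better.
    k = len(motifs[0])
    total = 0
    for j in range(k):
        column = [m[j] for m in motifs]
        total += len(column) - max(column.count(nuc) for nuc in "ACGT")
    return total

def gibbs_sampler(dna, k, iterations=1000):
    # One run: random start, then repeatedly resample the motif of one randomly chosen sequence.
    motifs = []
    for text in dna:
        start = random.randrange(len(text) - k + 1)
        motifs.append(text[start:start + k])
    best = list(motifs)
    for _ in range(iterations):
        star = random.randrange(len(dna))                           # index of Sequence*
        profile = profile_from(motifs[:star] + motifs[star + 1:], k)
        motifs[star] = profile_random_kmer(dna[star], k, profile)   # roll the loaded die
        if score(motifs) < score(best):
            best = list(motifs)
    return best
```

In practice one would run gibbs_sampler many times from different random starting motifs and keep the best-scoring set of motifs found over all runs.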
He said: "I beseech you, in the bowels of Christ, think it possible that you may be mistaken." Cromwell Rule in statistics implies that the use of probabilities 0 and 1 should be avoided, unless we talk about purely logical statements. Thus, we should actually give a little probability even to extremely unlikely event. For example, we should give small probability to the fact that the sun won't rise, tomorrow.