We continue on our journey toward better algorithms for solving the reed alignment problem. We studied naive exact matching which was very simple. It consisted of two nested loops, a loop that iterates over possible alignments and then a loop that iterates over character comparisons. Next we will look at an algorithm called Boyer-Moore. Boyer-Moore is similar to naive exact matching, in some ways. It's still going to try alignments, and for each alignment, it will try character comparisons, but it's going to be clever by skipping many alignments that it doesn't need to examine. Boyer-Moore is a very popular algorithm. It's been around for a long time, and it's really the benchmark exact matching algorithm. There have been many variants on Boyer-Moore proposed, and it's implemented in many different systems. Chances are very good that at some point you have done a search in a web document or file on your hard drive, and that search was implemented using Boyer-Moore. So, we'll start with this insight. Here's a snapshot of one moment in the naive algorithm. We're trying to align word, the pattern, with these characters in the text, and eventually we hit a mismatch where the character r in our pattern mismatches a u in the text. So normally, if we were doing the naive exact matching algorithm, we would now, once we reach this mismatch, we would break out of the inner loop and we would move onto the next iteration of the outer loop, which is going to try the next alignment to the right. But we actually now have a hint that tells us that the next alignment over also is not going to match. So how do we know that? Well, because this u that we mismatch in the text doesn't occur anywhere in the pattern P. So any way of lining up P opposite this u in the text is going to be fruitless. So it can't possibly result in a match, because the character u doesn't occur anywhere within P. So we know that we can skip the next two alignments. Both of those alignments will still place P such that it overlaps the u in the text, so we don't have to try them. We can skip them. We know that they're not going to result in matches. So this can be generalized into a principle that we can use in a lot of situations, and in fact, the next algorithm that we discuss, Boyer-Moore, uses this principle to the maximum possible extent. The principle is that we're going to learn from the character comparisons that we do in order to skip alignments that provably won't resolve in a match. They won't be fruitful. So the first thing to know about Boyer-Moore is that we're going to try alignments in left to right order as we did with naive exact matching, but we're going to try character comparisons in the opposite order, in right to left order. So here's an illustration. We'll try alignments in left to right order, but here's a given alignment. For this given alignment we're going to do the character comparisons. Starting from the right-hand side. So in this case we compare a d in the pattern to a d in the text and they match. And then we go to the left and try an r in the pattern and it mismatches an l in the text. The next component of Boyer-Moore is something called the bad character rule. And this rule says that upon mismatch, if we do some character comparisons, and then eventually we mismatch, upon mismatch, we're going to skip alignments until one of two things happens. Either the mismatch becomes a match, or the pattern P moves all the way past the mismatched text character. So here is an example. Here is a text T and a pattern P and we're trying this alignment here which is the very first alignment, the left most alignment. And we're going to try character comparisons again starting from the right-hand side. So first we'll compare a C to a C and find a match. And we'll compare a G to a G. And then we'll compare a T to a T. And then, we hit a mismatch. We compare a T in the text, sorry a T in the pattern to a C in the text and they mismatch. So the question is how far do we skip before this mismatch becomes a match? So the question is how far do we shift P over before this C in the text is matching a corresponding C in the pattern? Well we can tell by looking to the left within P. So if we look to the left within P we're looking for the next C over to the left and we find it here. So the amount that we have to move P in order for this mismatch to become a match is one, two, three. We have to shift it over by three positions, or in other words, we're skipping two alignments. If we shift P over by three, then we're skipping two alignments. So let's go ahead and skip those two alignments, and now let's do this process again. Let's use the bad character rule again. So here's our new alignment. We're going to start at the righthand side. We're going to find that these Cs match and then we're going to reach a mismatch. A G in the pattern mismatches an A in the text. So we apply the bad character rule again, but in this case, the mismatched character in the text, which is A, does not occur anywhere to the left within P. There are no occurrences of A over here within the pattern P. And that means we're going to move P all the way past the mismatched character. If there's no As to the left, then we don't have to try any of the alignments that have character of P opposite that A. We are going to shift is over like this, all the way past the mismatched character. So those are two steps of using the bad character rule, which is one of the rules that we'll use in Boyer-Moore. The last component of Boyer-Moore is another rule. It's called the good suffix rule. So this rule states that we're going to let T equal the substring that's matched by our inner loop. So we're going to use small t to represent this substring that's matched. So in this example, we did the character comparison starting from the right again and we matched a C, and then we matched an A, and then we matched a T. And then we reached a mismatch. A T in the pattern mismatched a C in the text. So we're going to call this substring right here, small t. So, we're going to skip until there are no mismatches between the pattern P and small t. Or until P moves all the way past small t. So, you can see how this is related to the bad character rule. The bad character rule said that upon mismatch, we want to shift P such that the mismatch becomes a match. And the good character rule is saying, essentially, that for the characters we did match, we want to shift P such that we don't turn any of those matches into a mismatch at the end of that shift. So in this particular case, when we apply the good suffix rule, how far are we going to skip? Well, let's look for another occurrence of small t, which is T A C, further to the left within P. You can see one right here. So the shortest amount that we can shift P is such that this T A C in the pattern P comes under this T A C in the text T. So in that case, we'll have skipped three alignments. And once we've done that, here's our new alignment. We've now shifted P, and here's our next alignment. And we have several matches, so that again, here's our small t, the portion of the text T that we matched. And so now, how far do we shift P before there are no mismatches, no new mismatches between P and the characters and small t? This one's a little harder to see at first, but you'll see that there is a prefix of P, C T T A C, that matches a suffix of small t. So C T T A C right here matches C T T A C right here, right? So that's when we should stop shifting. That's how far the good suffix rule is telling us to shift. So, we just skipped three alignments there. And so, that's two steps of the good suffix rule. So, now we have two rules. We have the bad character rule, which tries to turn a mismatch into a match, and we have the good suffix rule, which tries to keep the matches matches and not have them turn into mismatches.