At the end of the previous lecture, we wrote down this recursive function for finding the edit distance between two strings. But there's a problem. It's very, very slow. So here for example, is some Python code that runs in our function as you can see in red. To find the added distance between two strings. And then this sort of gray code on either side, sort of above here and below here, is just there to measure how much time it takes to run our function. So at the end, you see the output of the function, which is three. Which is the added distance between those two strings. It's not hard to see that. And then the number you see here is the number of seconds that it took to run the function. So it took about 31.5 seconds to run this function, and that's a very long time to wait to try to find the edit distance between these two pretty short strings. So why would it be taking that long? So here's a little hint. Let's say that this is the initial call that we make to the recursive function, to the edit distance function. We want to find the edit distance between these two strings. And so this initial call will make three recursive calls. Each with different arguments, like you can see here. Now this recursive call, this one over here, will make three more recursive calls, right like this, etc, etc, etc. So for simplicity, I've only really drawn a tiny little bit of the overall tree of recursive calls that would happen. But just looking at these seven calls that are visible right now, you'll notice that two of them are the same. So they're the same in the sense that two arguments to the function are the same. And so, of course, the result returned by those two calls will also be the same. And that's rather wasteful, right? If we're going to through the trouble of running the function, and it seems like we should at least remember what the answer is. So that if later on we make the same exact call, we call the function with the same arguments, then we can just remember what the answer is. Instead of running the function all over again, which might take a lot of time. So here's another demonstration that gives a clearer idea of just how big the problem is. Here's our editDistance function again. I'm writing it out again. But there's a little bit of added code, which I'm showing here in red. And all that added code is doing is it's just keeping track of how many times the function gets called with a particular set of arguments. So a particular pair of arguments, this pair of arguments here, right. So we're essentially just adding up how many times this function is called for that particular pair of arguments. And it's storing the result in this variable called n. So if we find that n is large, that means that we're being very wasteful, right. We're doing the same call over and over again. So if we call this new function and we give it our two arguments that we gave it before, we get the answer three, which is correct. But then now we can also print out the value of this variable n, which tells us how many times it was called with those particular arguments. And we get this large number, right. It's almost 9,000 this number here. So each time, for each of those 9,000 times or so, we had to call and run the function all over again which was very wasteful. We only really needed to run it once. And then for all of those subsequent times when we wanted to know that edit distance, we could have just returned the value that we remembered from before. So let's rewrite the edit distance function, in a way that avoids doing any of this redundant work. So the way that we're going to rewrite the function is in terms of a matrix that looks like this. So in this matrix, the characters of the string x are labeling the rows of the matrix, like you see here. And the characters of the string y are labeling the columns of the matrix, like you see here. And each element of the matrix corresponds to a particular prefix of x and a particular prefix of y. The first row and the first column are both labeled with epsilon, which just stands for the empty string. So this is because we want to consider all prefixes, including the empty prefix, the prefix that has length zero. Over the course of our algorithm, our edit distance algorithm that we're going to write down, we're going to fill in every element of this matrix. And we're going to do it, we're going to fill in each element with the edit distance between the corresponding prefixes. So for example, the value that will go in this element here is going to be the edit distance between the corresponding prefix of x, which is GCG, and the corresponding prefix of y which is GCT. So this element will contain one after we're finished filling in this matrix. And the element that's all the way down here in the lower right hand corner, that element is special because we will eventually write into that cell into that element. The edist distance between the entire string X and the entire string Y, which is what we're after. That's the edist distance the edist distance between X and Y. So we can think of each element of the matrix as a different pair of arguments to our recursive edist distance function. And every possible pair of arguments is only going to be represented once in this matrix. We're only going to calculate it once. And that's how we're going to avoid the problem of redundant work that we saw before. So we're not done yet of course. We need a way of figuring out what are the edist distances that we should put in the elements of this matrix. So we're going to do so with the help of that same expression that we wrote down in the previous lecture, this expression down here. And, we'll see in a moment exactly how we use it. But when we fill the matrix, we'll start by filling the elements that correspond to short prefixes of X and Y up here. Fill in these first. And ultimately and finally we'll fill in the elements that correspond to the long prefixes of X and Y down here. So let's think about how to use this expression to fill in a particular element of the matrix. So let's pick an element. Let's pick this element right here, the one highlighted in blue. And in this expression, the three cases, these three things that we're taking the minimum of, the three terms. They correspond to three other elements of the matrix. The first case course responds to the element that's above and to the left, diagonally from the element. Because we're recursively calling the edist distance a function with a prefix of the first string that's one character shorter. And a prefix of the second string that's one character shorter. So that term corresponds to shorter prefixes of X and Y, shorter by one character. And that corresponds to the element of the matrix that's highlighted in red. The second term here, corresponds to the element of the matrix that's highlighted in green. So that is taking the edit distance of the entire string X and then it is a shorter prefix of the string Y. So that's the element that's to the left. And then this term corresponds to the entire string y, but a prefix of the string x. So that corresponds to the cell above. So this color coding that you see here, shows you the correspondents between terms in the expression and elements of the matrix. So say that the matrix has already been partly filled in with edist distance for all possible prefixes shorter than the ones that correspond to our blue element of the matrix here. In other words, we've already filled in all these elements like this and now we'd like to calculate the value of the blue element right here. So we can start by taking the value that's in the cell diagonally above and to the left, which is highlighted in red. And then we're going to add the delta function of the corresponding characters from x and y. So that's going to be the delta function evaluated of t and g. Now T and G are not equal, so the delta function is going to evaluate to 1. So that diagonal term is going to equal the edist distance in the diagonal cell, which is 0, plus the delta function, which is 1. So that equals 1. So this second term tells us to take the value of the element to the left, and to add one. And the value of the element to the left is one. So when we add one we get two. And likewise, this third term is telling us to take the value of the element above, and add 1, which again is 2. So when we take the minimum of these three things, we're going to get 1. And that 1 corresponds to if we had gone diagonally. In other words, if we had taken the edit distance between the prefixes and then added 1, corresponding to the substitution that we would need to turn T into G. So, that's how we fill in the cells of the matrix, except I haven't yet told you how to fill in the first row and the first column. But that's not too hard, because the first row and the first column correspond to the edist distance between the empty string and something. And the edist distance between the empty string and some other string is always equal to the length of the other string. So in other words, the way we would initialize the first column would be with ascending integers. 0, 1, 2, 3, 4, 5, etc. And same thing for the first row. 0, 1, 2, 3, 4, 5, etc. So here's the matrix with all of the elements filled in. And this matrix is now telling us what the edist distance between the two strings is. It's just the value that's all the way in the lower right hand corner, which in this case is 2. So the edist distance between x and y is 2. So how fast is our new algorithm for finding the edit distance? The recursive algorithm was slow. How fast is this algorithm? It's very very fast. So the run time is just going to be a fraction of a second, way faster than the recursive formulation that we were using at the beginning which took over 30 seconds to do the same thing. And why is it so much faster? Well, it's really because our new method for calculating edit distance avoids the problem we had before. Which was that the function might be called over and over again for the exact same arguments. And it would run the function all over again, even though we already knew the answer for that particular pair of arguments. So with our new method, for any given pair of prefixes of x and y, we will calculate the edit distance of those prefixes once and only once. And we'll record the answer in the matrix and we'll never recalculate that same edist distance. So this kind of algorithm, where we decompose the overall problem into smaller problems, while also avoiding ever redundantly recalculating one of those smaller problems. This kind of algorithm is called a dynamic programming algorithm. And dynamic programming is a common algorithm design technique. But this dynamic programming paradigm is actually going to be very useful. It's useful for more than just finding the edist distance between two strings. It's also useful in other biosequence analysis applications, as we'll discuss in future lectures.