So before we move on from read alignment, in this lecture we'll discuss what read alignment tools actually do in practice. So the ideas behind these tools are for the most part clever combinations of the ideas that we've seen so far. Especially indexing and dynamic programming alignment. These are two ideas that are very complimentary. They work very well together. So why do I say that? Well let's start with the index. The index allows us to rapidly home in on a small set of candidate locations in the genome where the pattern might have a good approximate match. In particular, what the index tells you is a list of places in the genome that share a sub-string with our pattern string. So the index really acts almost like a filter. So if the initial picture looks like this, after we get our index hits, it turns out we only really have to look at a few places within the genome or within the text, T. So instead of having to search for our needle in the entire haystack, the index presents us with a small set of candidate needles, and then we only really have to look carefully at those candidates. So, what if we didn't have an index though? If we didn't have anything acting as a filter? What if we just did dynamic programming alignment, used that to install the entire realignment problem. We can do this, of course. We saw how to use dynamic programming to find approximate occurrences of a pattern within a text. Let's say that here our text is the genome and the pattern is the read. To do this we would fill in a dynamic programming matrix that looks like this, where the rows are labeled with characters from the read and the columns are labeled with characters from the genome. Now, how big is this matrix? Is it a small matrix? Is that a large matrix? It's really big. It's a very big matrix. It's big because first of all because the read is long. Right? The read is something like 100, 200, 300 or so bases long, and so, you have to have at least that many rows. But then what's particularly bad is that the genome labeling the columns of this matrix is very very long. So for example the human reference genome is about 3 billion bases long, so that's why in this picture I'm showing the matrix sort of disappearing off into the distance off to the right. You have to imagine this matrix is very, very, very, very wide, has many columns. So if we had to fill in a matrix like this for every read in our data set, it would take us years, definitely years to analyze a data set that a DNA sequencer can produce in about a week or so that's why we need an index. We need to be able to home in rapidly on just those portions of the reference genome, where we really need to do our dynamic programming, or our verification step. So now, why do we need dynamic programming? Basically this is because indexes don't deal very well with mismatches and gaps, right? Indexes are really good at finding exact matches, and that's why after we get our list of hits back from the index, we have to use something else, some other algorithm to perform our verification step. To figure out whether the pattern as a whole has an approximate match to the vicinity of that index hit. So this is an ideal application of dynamic programming, algorithms like global alignment and local alignment as we saw are very flexible. They very naturally and flexibly allow things like insertions and deletions. They even allow you to specify different penalties for different substitutions. So dynamic programming is really the ideal way to verify whether a particular index hit corresponds to an approximate match of a read within the genome. So these ideas really go well together, they sort of cover for each other's flaws, and do what they do very well. On the one hand, the index is very fast and good at narrowing down the space of places to look, but it doesn't handle mismatches and gaps naturally at all. And on the other hand, dynamic programming does very naturally handle mismatches and gaps. But if you relied only on dynamic programming, it would be very, very slow. So if you understand these points, you understand a lot about how real read alignment tools work in practice.