In this module, we'll continue on the path of studying algorithms and

data structures to help us solve the read alignment problem.

By the end of this module, you will have seen and worked with a range of ideas

that underly fast pattern matching algorithms, both offline and

online algorithms, and both exact and approximate matching algorithms.

So far we've seen the naive exact matching algorithm, which isn't particularly fast,

and which isn't able to find approximate matches, but a useful solution to

the read alignment problem will have to be both fast and approximate.

So in this module, we'll push in both directions.

First, we'll discuss a very practical and a very fast and fairly simple alternative

to the naive exact matching algorithm, which is called Boyer-Moore.

Then we'll discuss indexing which we'll adapt to work with genomes and

the read alignment problem.

Indexing is a very powerful idea.

Indexing is the reason why when you type a Corey String into an Internet

search engine, you can get an answer back immediately.

And then finally we'll introduce an idea based on the pigeon hole principle,

that will allow us to look, not just for exact occurrences, but

also for approximate occurrences of a pattern in a text.

And, fortunately for us, this pigeon hole principle is going to allow us to reuse,

basically, all the algorithms that we learned for exact matching and

apply them to approximate matching problems.

So let's get started.