Let's define a mathematical framework for dealing with this. We define Skew(k) as simply the number of G minus the number of C in the first k nucleotides of the genome. And the skew diagram is simply a plot showing Skew(k) against k.

Now, let's construct the skew diagram for the E. coli genome. It looks beautiful. Where do we think the origin of replication is in E. coli? Something that we had almost no hope of finding. Of course, it's at the point where the skew was decreasing and suddenly starts increasing, which is here. We found the replication origin of E. coli.

And now, let's try to find frequent words in this origin, and hopefully we will find DnaA boxes in this region. Well, after we run our frequent words algorithm, unfortunately it turns out there are no frequent words that appear even three times in this replication origin, which means that we failed. There are many reasons why we may fail. Maybe the origin of replication as derived from the skew diagram doesn't point precisely to the area that we want to see. Or maybe, just maybe, we don't have a good grip on what hidden messages in the E. coli origin look like. Should we give up? Let's try to proceed further and figure out what else can be done.

When we were looking for frequent words, our view of frequent words was very naive. We assumed that frequent words are simply exact copies of a k-mer. But maybe the hidden messages that the cell uses to initiate replication are more elusive, more subtle. Let's look at the origin of replication in Vibrio cholerae and try to see it. We already found six nine-mers in this region, but maybe, just maybe, there is something else that deserves our attention. And if you look carefully, you will see that in addition to these six hidden messages there are two more. There are actually two nine-mers that look almost like the canonical nine-mers. They differ by just a single mutation.
And they also represent DnaA boxes, because DnaA can bind not only to the perfect nine-mer but also to variations of this nine-mer. To find this type of more elusive frequent words, we need to solve the Frequent Words with Mismatches Problem. The input is a string Text and integers k and d, and we want to find all most frequent k-mers with at most d mismatches in Text. And finally, after we run our frequent words with mismatches algorithm, we do find somewhat more elusive frequent words in the E. coli genome. And they do turn out to be real DnaA boxes in E. coli.

Now, what I described is a very idealistic view of how the origin of replication can be found. In reality, some bacteria have fewer DnaA boxes, and our frequent words algorithm won't work for finding them. The terminus of replication is often not located directly opposite the origin of replication. And the skew diagram is often more complex than in the case of E. coli.
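
To make the two definitions from this segment concrete, here is a minimal Python sketch of Skew(k) and of the Frequent Words with Mismatches Problem. It is an illustration under stated assumptions, not the lecture's own implementation: the function names (`skew_values`, `minimum_skew_positions`, `neighbors`, `frequent_words_with_mismatches`) are made up for this example, the input is assumed to be an uppercase string over A, C, G, T, and reverse complements are not considered. The mismatch search uses one possible approach, which is to generate every string within d mismatches of each window and count them.

```python
from collections import Counter


def skew_values(genome):
    """Return [Skew(0), Skew(1), ..., Skew(len(genome))].

    Skew(k) = (number of G) - (number of C) in the first k nucleotides.
    """
    values = [0]
    for nucleotide in genome:
        step = 1 if nucleotide == "G" else -1 if nucleotide == "C" else 0
        values.append(values[-1] + step)
    return values


def minimum_skew_positions(genome):
    """Positions k where Skew(k) is lowest: candidate origin locations."""
    values = skew_values(genome)
    lowest = min(values)
    return [k for k, value in enumerate(values) if value == lowest]


def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neighbors(pattern, d):
    """All strings over ACGT within Hamming distance d of pattern."""
    if d == 0:
        return {pattern}
    if len(pattern) == 0:
        return {""}
    result = set()
    for suffix in neighbors(pattern[1:], d):
        if hamming_distance(pattern[1:], suffix) < d:
            # One mismatch is still available, so any first letter works.
            for base in "ACGT":
                result.add(base + suffix)
        else:
            # The mismatch budget is used up, so keep the first letter.
            result.add(pattern[0] + suffix)
    return result


def frequent_words_with_mismatches(text, k, d):
    """Most frequent k-mers in text, allowing at most d mismatches."""
    counts = Counter()
    for i in range(len(text) - k + 1):
        # Each window counts as an occurrence of every close-enough k-mer.
        for candidate in neighbors(text[i:i + k], d):
            counts[candidate] += 1
    if not counts:
        return []
    best = max(counts.values())
    return sorted(p for p, c in counts.items() if c == best)
```

For example, `minimum_skew_positions("CCGG")` returns `[2]`, the point where the skew stops decreasing and starts increasing. For a real genome, one would take a window of a few hundred nucleotides around such a position and pass it to `frequent_words_with_mismatches` with k = 9 and d = 1.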