DNA is the kind of molecule that encodes your genome, the sum total of all your genetic information, all your genes. So let me explain this with an analogy. Your genome is sort of like a book of recipes, with a separate recipe for each type of molecule in your body: each little physical piece that makes up your brain cells and your skin cells and your heart cells, etc. These are the machines that do the work of building and maintaining you. This recipe book is not written in English. It's written in a different language that you've probably seen before, where the alphabet consists of the four letters A, C, G, and T. These letters stand for different kinds of molecules, different bases: A for adenine, C for cytosine, G for guanine, and T for thymine. A DNA molecule is shaped like a double helix, this thing that looks like a twisted ladder. And the rungs of this ladder are made up of pairs of bases. Specifically, these are complementary base pairs. A is complementary to T and vice versa. And C is complementary to G and vice versa. And so if we look at one of the rungs of this ladder here, for instance, we see two colors, orange and red, which correspond to the two complementary bases, C and G. If we wanted to write down the sequence of bases that describes this molecule, we might do it like this. We might start all the way up at the top and then work our way down one side of the ladder, reading off each base as we go. If we do it this way, then we can take the DNA molecule and turn it into a sequence of letters, a string. And the fact that we can write a DNA molecule as a string has profound implications for how we can analyze it. And we'll return to this point later. Now really, DNA molecules are much longer than what I'm showing here on this slide. Human chromosomes, for example, are on the order of hundreds of millions of bases long. But even so, we can still think of these chromosomes as strings. They're very, very long strings, but they're still strings.
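To make the "DNA as a string" idea concrete, here is a minimal Python sketch of the complementary pairing rule described above. The function names (`partner_strand`, `rungs`) and the example sequence are made up for illustration; they are not from the lecture. The sketch simply looks up the base sitting across each rung, in the same top-to-bottom order as the side we read.

```python
# Complementary base pairs as described above: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def partner_strand(strand: str) -> str:
    """Return the base across each rung, in the same top-to-bottom order."""
    return "".join(COMPLEMENT[base] for base in strand)


def rungs(strand: str) -> list:
    """Return the ladder as a list of (base, partner) pairs, one per rung."""
    return list(zip(strand, partner_strand(strand)))


if __name__ == "__main__":
    example = "ACGTTGCA"  # made-up sequence, read down one side of the ladder
    print(partner_strand(example))  # TGCAACGT
    print(rungs(example)[:2])       # [('A', 'T'), ('C', 'G')]
```

Because one side of the ladder determines the other, writing down a single side as a string is enough to describe the whole molecule.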
So, here for example is a tiny snippet of the human genome. Again we're writing it as a string, and this string wraps around from one line to the next, sort of like if you were reading a book. So you would start in the upper left and then read the first line, and then go down to the next line, etc. Now when we look at this string, we don't really understand it, right? We don't know what this means. Is this one of those recipes we talked about before? Is this maybe many recipes together? It's not really clear. In fact, there is one gene in the middle of this sequence here, which I've highlighted in red, and this gene is called HBB. And you can see it's spread across the genome in a few different pieces. So, we can sequence this short bit of DNA just sort of by eye, by looking at the colors of the rungs of this ladder. So how does a DNA sequencer sequence a genome? A crucial point is that DNA sequencers are not actually very good at reading long stretches of DNA. They're very good at reading short stretches of DNA, but lots and lots and lots of them. So this is what DNA sequencers do well. They read lots of short stretches of DNA. So let's look at an illustration. We start out with the DNA that we'd like to sequence, represented by this string at the bottom of the slide here. This is the input DNA. This input DNA might be, for example, your genome. And the DNA sequencer works by repeatedly reading off randomly selected substrings from the input DNA, randomly selected snippets from the middle of the input DNA, many, many, many, many of them. So here are many of them. These snippets are called sequencing reads, or simply reads for short. But these reads are themselves very, very short compared to the length of the input DNA. So, for example, like we said before, one human chromosome is on the order of about a hundred million bases long.
But massively parallel sequencers, these second-generation sequencers, produce reads that are closer to about 150, or a couple hundred, bases long. So these reads are many orders of magnitude shorter than the input DNA. But the good news is that we get lots and lots and lots of these short snippets of DNA. Usually we have enough of these reads to cover the whole genome many times over. In other words, we have redundant information about any given base of the genome that we are sequencing.
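The idea of reads as randomly selected substrings, and of covering the genome many times over, can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a real sequencer works: reads here are error-free, all the same length, and taken from one strand, and the function names (`sample_reads`, `per_base_coverage`, `average_coverage`) and the tiny sizes are made up for illustration. A real sequencer also does not report where each read came from; the start position is kept here only so we can count coverage.

```python
import random


def sample_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list:
    """Draw n_reads random substrings of length read_len from genome.

    Returns (start, read) pairs; the start is kept only for bookkeeping.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((start, genome[start:start + read_len]))
    return reads


def per_base_coverage(genome_len: int, reads: list) -> list:
    """Count how many reads overlap each position of the genome."""
    coverage = [0] * genome_len
    for start, read in reads:
        for pos in range(start, start + len(read)):
            coverage[pos] += 1
    return coverage


def average_coverage(genome_len: int, read_len: int, n_reads: int) -> float:
    """Total bases read divided by genome length."""
    return n_reads * read_len / genome_len


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy "genome": 1,000 random bases (a real chromosome is vastly longer).
    genome = "".join(rng.choice("ACGT") for _ in range(1000))
    reads = sample_reads(genome, read_len=50, n_reads=400)
    coverage = per_base_coverage(len(genome), reads)
    print(average_coverage(len(genome), 50, 400))  # 20.0
    print(min(coverage), max(coverage))  # varies by position
```

With 400 reads of 50 bases over a 1,000-base toy genome, each base is read about 20 times on average, which is the redundancy described above. Individual positions get more or fewer reads than that because the reads are placed at random.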