Here we're looking at a real sequencing read from a real DNA sequencer. This read is encoded in a common file format called FASTQ, and we see four lines of information.
The first line contains the name of the read. The name might encode some information about the experiment the read came from: maybe the kind of sequencing machine that was used, maybe where on the slide this particular cluster was located. But none of this has a bearing on what we do with the sequencing read, so we're more or less going to ignore the name line for now. The second line is the sequence of bases as reported by the base caller. This is a very important line: it is the sequence of the read. The third line can also be ignored; it's more or less a placeholder line. And the fourth line is a sequence of base qualities. It looks pretty strange the first time you look at it, but as we'll see, this string encodes the base qualities for the corresponding bases in the sequence line.
By the way, a typical FASTQ file is just a bunch of records like this, all concatenated together into one big file. I'm showing you a picture of the beginning of a FASTQ file here, just the first five reads' worth. As you can see, the FASTQ file is really just read after read after read, one set of four lines after another.
Let's learn more about that fourth line, the base quality line. The characters in the base quality line match up with the corresponding characters in the sequence line, but what does one of those base qualities mean? For instance, what does this H right here mean in the base quality string? Each base quality is an ASCII-encoded representation of a value Q. If you watched the optional videos just prior to this one, then you already know what Q is.
But the short version is that Q is just a rescaled version of P, the probability that the base call is incorrect: Q = -10 * log10(P). So when Q is higher, we can be more confident that the base is correct, and when Q is lower, we're less confident. If you want to see more details about why we use this particular scale for Q, or how we go about estimating the probability P, you should watch the optional lectures that come right before this one.
So, I said that the quality values are ASCII-encoded. In other words, we're using a character to encode an integer. This table here is called the ASCII table, and it shows the mapping between characters and integers. The letters on this diagram are pretty small, but for example, if you look right here, you can see that one of the characters, lowercase m, maps to the integer 109.
Base qualities are numbers, not necessarily integers, but we want to encode them as characters somehow. So how exactly do we do that? The particular method that we'll use, the method used by all the data we'll look at in this class, is called Phred+33. What Phred+33 says is that if you want to take a base quality Q and convert it to the corresponding character, you round Q to the nearest integer, add 33 to it, and then turn the result into the corresponding character according to the ASCII mapping that we saw on the previous slide.
What I have here are two Python functions that will help you do this. The first one is called QtoPhred33. If you give it a quality value Q that is already rounded to the nearest integer, it adds 33 and turns the result into a character according to the ASCII table I showed before. The second function is called phred33ToQ. This function does the opposite: if you give it a character that is a Phred+33-encoded quality value, it will turn it back into Q for you.
It first takes that character and turns it into an integer, and then it subtracts 33. So these two functions are inverses of each other. You don't have to memorize these functions; they'll be shown to you and used in the following practical session.
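To make the encoding concrete, here is a minimal sketch of what those two functions might look like. The names QtoPhred33 and phred33ToQ come from the lecture, but the bodies are a reconstruction from the description, not a copy of the slide. The helpers QtoP and readFastq are hypothetical additions for illustration: QtoP assumes the standard Phred relationship Q = -10 * log10(P), and readFastq assumes every record is exactly four lines. The example record is made up, not real sequencing data.

```python
from itertools import islice


def QtoPhred33(Q):
    """Turn an integer quality Q into its Phred+33 character."""
    return chr(Q + 33)


def phred33ToQ(qual):
    """Turn a Phred+33 character back into the integer quality Q."""
    return ord(qual) - 33


def QtoP(Q):
    """Hypothetical helper: error probability P, assuming Q = -10 * log10(P)."""
    return 10 ** (-Q / 10)


def readFastq(lines):
    """Hypothetical helper: yield (name, sequence, qualities) per record.

    Assumes strict four-line records: name, sequence, placeholder, qualities.
    """
    it = iter(lines)
    while True:
        record = [line.rstrip('\n') for line in islice(it, 4)]
        if len(record) < 4:
            break  # end of input (or a truncated final record)
        name, seq, _placeholder, qual = record
        yield name[1:], seq, qual  # drop the leading '@' from the name


# A made-up record, for illustration only.
example = [
    "@read1\n",
    "ACGTN\n",
    "+\n",
    "HHI5#\n",
]

for name, seq, qual in readFastq(example):
    qs = [phred33ToQ(c) for c in qual]
    print(name, seq, qs)  # read1 ACGTN [39, 39, 40, 20, 2]
```

So an H in the quality string is ASCII 72, and 72 minus 33 gives Q = 39, which is a high-confidence base call. A Q of 20 would correspond to an error probability of about 1 in 100 under the assumed formula.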