We now present the performance analysis of this random coding scheme. We will show that the probability of the error event, that is, the event {Ŵ ≠ W}, can be made arbitrarily small by a suitable choice of parameters.

Consider the probability of the error event Err, which can be written as the summation over w from 1 to M of Pr{Err | W = w} Pr{W = w}. Now Pr{Err | W = w} is equal to Pr{Err | W = 1} for every w. This is so by the symmetry of the construction of the random coding scheme: all the codewords are generated randomly in exactly the same fashion. Because Pr{Err | W = 1} does not depend on w, it can be taken outside the summation, and the probabilities Pr{W = w} sum to 1 for w from 1 to M. Therefore we have shown that Pr{Err} = Pr{Err | W = 1}, so without loss of generality we assume that message 1 is chosen. This is illustrated in the figure, in which the first codeword tilde{X}(1) is sent through the channel to obtain the received sequence Y.

Now for w from 1 to M, define the event E_w = {tilde{X}(w) and Y are jointly δ-typical}, that is, the event that the codeword for message w and the received sequence Y are jointly δ-typical. Observe that if E_1 occurs but E_w does not occur for any w from 2 to M, then no decoding error occurs, because according to our decoding rule the received sequence Y is decoded to message 1.
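The symmetry reduction just described can be written out as a short chain of equalities, with Err denoting the error event {Ŵ ≠ W}:

```latex
\begin{align*}
\Pr\{\mathrm{Err}\}
  &= \sum_{w=1}^{M} \Pr\{\mathrm{Err}\mid W=w\}\,\Pr\{W=w\} \\
  &= \Pr\{\mathrm{Err}\mid W=1\} \sum_{w=1}^{M} \Pr\{W=w\}
     && \text{(symmetry of the codebook construction)} \\
  &= \Pr\{\mathrm{Err}\mid W=1\}.
     && \text{(the probabilities } \Pr\{W=w\} \text{ sum to } 1\text{)}
\end{align*}
```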
Therefore Pr{E_1 ∩ E_2^c ∩ E_3^c ∩ ... ∩ E_M^c | W = 1} is upper bounded by the probability that the error event does not occur given W = 1.

Now consider Pr{Err | W = 1}. This is equal to 1 minus the probability that the error event does not occur given W = 1. Using the above inequality, we obtain the upper bound 1 − Pr{E_1 ∩ E_2^c ∩ E_3^c ∩ ... ∩ E_M^c | W = 1}, which is simply the probability of the complement of E_1 ∩ E_2^c ∩ E_3^c ∩ ... ∩ E_M^c given W = 1. By De Morgan's law, this complement is the event E_1^c ∪ E_2 ∪ E_3 ∪ ... ∪ E_M. Then by the union bound, Pr{Err | W = 1} is upper bounded by Pr{E_1^c | W = 1} plus the summation over w from 2 to M of Pr{E_w | W = 1}.

By the strong joint AEP, Pr{E_1^c | W = 1}, that is, the probability that the first codeword tilde{X}(1) and the received sequence Y are not jointly δ-typical given W = 1, is less than some small quantity ν that goes to 0 as δ goes to 0. This is illustrated in the figure.

Now, conditioning on W = 1, that is, on the first codeword having been sent, for w from 2 to M, tilde{X}(w) and the received sequence Y are n i.i.d. copies of a pair of generic random variables (X′, Y′), where X′ has the same distribution as X and Y′ has the same distribution as Y.
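The chain of bounds just described can be displayed in a few lines of algebra:

```latex
\begin{align*}
\Pr\{\mathrm{Err}\mid W=1\}
  &= 1 - \Pr\{\mathrm{Err}^{\,c} \mid W=1\} \\
  &\le 1 - \Pr\{E_1 \cap E_2^c \cap \cdots \cap E_M^c \mid W=1\} \\
  &= \Pr\{E_1^c \cup E_2 \cup \cdots \cup E_M \mid W=1\}
     && \text{(De Morgan's law)} \\
  &\le \Pr\{E_1^c \mid W=1\} + \sum_{w=2}^{M} \Pr\{E_w \mid W=1\}.
     && \text{(union bound)}
\end{align*}
```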
This is because all the codewords are generated in exactly the same way, so they all have the same distribution, namely p(x)^n. Since the DMC is memoryless, the pairs of symbols are i.i.d. across time; and since the codewords are generated independently of one another, X′ and Y′ are independent. The idea here is that the codeword tilde{X}(1) is independent of all of the codewords tilde{X}(2), ..., tilde{X}(M). Because the received sequence Y is obtained through the transmission of tilde{X}(1), intuitively tilde{X}(2), ..., tilde{X}(M) are independent of Y.

Now for w from 2 to M, Pr{E_w | W = 1} is the probability that the codeword tilde{X}(w) and the received sequence Y are jointly δ-typical given W = 1. By Lemma 7.17, this is upper bounded by 2^{−n(I(X;Y) − τ)}, where τ goes to 0 as δ goes to 0.

Note that our choice of M satisfies (1/n) log M < I(X;Y) − ε/4, which implies that M, the total number of messages, is less than 2^{n(I(X;Y) − ε/4)}. Therefore the probability of the error event, which equals Pr{Err | W = 1}, is upper bounded by Pr{E_1^c | W = 1}, which is less than ν, plus the summation over w from 2 to M of Pr{E_w | W = 1}, each term of which is at most 2^{−n(I(X;Y) − τ)}. In this summation, the number of terms is upper bounded by 2^{n(I(X;Y) − ε/4)}, so the two I(X;Y) exponents cancel with each other, leaving ν + 2^{−n(ε/4 − τ)}.

Now let us take a closer look at the term 2^{−n(ε/4 − τ)}. Recall that ε is fixed.
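Putting the union bound together with Lemma 7.17 and the bound on M gives the following chain:

```latex
\begin{align*}
\Pr\{\mathrm{Err}\}
  &\le \Pr\{E_1^c \mid W=1\} + \sum_{w=2}^{M} \Pr\{E_w \mid W=1\} \\
  &< \nu + (M-1)\, 2^{-n\left(I(X;Y)-\tau\right)} \\
  &< \nu + 2^{\,n\left(I(X;Y)-\epsilon/4\right)}\, 2^{-n\left(I(X;Y)-\tau\right)}
     && \left(M < 2^{\,n\left(I(X;Y)-\epsilon/4\right)}\right) \\
  &= \nu + 2^{-n\left(\epsilon/4-\tau\right)}.
\end{align*}
```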
Since τ tends to 0 as δ tends to 0, we can choose δ sufficiently small that ε/4 − τ > 0. If so, 2^{−n(ε/4 − τ)} tends to 0 as n tends to infinity. Now let ν be less than ε/3, which holds for sufficiently small δ, and let n be sufficiently large that 2^{−n(ε/4 − τ)} < ε/6. Then we obtain Pr{Err} < ε/3 + ε/6 = ε/2. Thus we have shown that, with a suitable choice of parameters for the random coding scheme, the probability of the error event is less than ε/2 for any arbitrarily small ε, when n is sufficiently large.

The idea of the analysis is the following. First, let the block length n be large. Then by the joint AEP, the probability that tilde{X}(1), the codeword being sent, is jointly typical with the received sequence Y tends to 1. At the same time, the probability that any other particular codeword is jointly typical with Y is approximately 2^{−n I(X;Y)}. If the size of the codebook, that is M, grows at a rate strictly less than I(X;Y), then the probability that some codeword tilde{X}(w) with w ≠ 1 is jointly typical with Y can be made arbitrarily small. Then the probability of the error event {Ŵ ≠ W} can be made arbitrarily small.

We have already shown that for the random coding scheme the probability of the error event can be made arbitrarily small. We now show the existence of a deterministic coding scheme that can do the same.
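The qualitative behavior just described can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the proof: it assumes a binary symmetric channel with crossover probability p_flip, draws a fresh i.i.d. uniform codebook for every trial, and uses minimum-Hamming-distance decoding as a convenient stand-in for joint-typicality decoding on this channel. The function name random_coding_error_rate and all parameter choices are illustrative.

```python
import numpy as np

def random_coding_error_rate(n, rate, p_flip, trials=300, seed=0):
    """Estimate the average error probability of random coding on a BSC.

    For each trial, a codebook of M = 2^(n*rate) codewords is drawn
    i.i.d. uniform over {0,1}^n, message 1 is sent through a BSC with
    crossover probability p_flip, and the receiver decodes to the codeword
    closest to the output in Hamming distance (a stand-in for
    joint-typicality decoding on this channel).
    """
    rng = np.random.default_rng(seed)
    M = int(2 ** (n * rate))
    errors = 0
    for _ in range(trials):
        codebook = rng.integers(0, 2, size=(M, n))   # fresh random codebook
        noise = rng.random(n) < p_flip               # i.i.d. bit flips
        y = codebook[0] ^ noise                      # message 1 is sent
        dists = np.count_nonzero(codebook ^ y, axis=1)
        if np.argmin(dists) != 0:                    # decoded wrongly
            errors += 1
    return errors / trials
```

For a BSC with p_flip = 0.1, the capacity is 1 − H_b(0.1) ≈ 0.53 bits per use, so with these assumed parameters the estimated error rate at rate 0.25 should come out noticeably smaller than at a rate above capacity such as 0.9, illustrating the role of the condition (1/n) log M < I(X;Y).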
According to the random coding scheme, the probability of the error event can be written as the summation over all codebooks C of the probability of codebook C being chosen times the probability of the error event for that codebook. In other words, the probability of the error event is a weighted average of the error probabilities over all the codebooks. Then there exists at least one codebook C*, such that its probability of error P_e is less than or equal to this weighted average, which we have shown is less than ε/2. By construction, this codebook has rate (1/n) log M > I(X;Y) − ε/2. By letting p(x) be the input distribution that achieves the channel capacity, the rate of this code can be made arbitrarily close to the channel capacity.

To complete the proof of achievability in the channel coding theorem, we now show that from this deterministic code, whose average probability of error P_e is less than ε/2, we can construct another deterministic code whose maximal conditional probability of error λ_max is less than ε. Without loss of generality, assume that the conditional probabilities of error are in ascending order, that is, λ_1 ≤ λ_2 ≤ ... ≤ λ_M. From P_e < ε/2, we have (1/M) times the summation over w from 1 to M of λ_w less than ε/2, which implies that the summation of all the λ_w's is less than (M/2)ε.

Since M is even, M/2 is an integer. Then the second half of the summation, namely the summation over w from M/2 + 1 to M of λ_w, which is upper bounded by the summation over w from 1 to M of λ_w, is less than (M/2)ε.
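The averaging argument rests on the fact that a weighted average cannot be strictly smaller than every term it averages:

```latex
\Pr\{\mathrm{Err}\}
  = \sum_{\mathcal{C}} \Pr\{\mathcal{C}\}\,\Pr\{\mathrm{Err}\mid\mathcal{C}\}
  < \frac{\epsilon}{2}
\quad\Longrightarrow\quad
\exists\,\mathcal{C}^{*}:\;
P_e(\mathcal{C}^{*}) = \Pr\{\mathrm{Err}\mid\mathcal{C}^{*}\} < \frac{\epsilon}{2},
\qquad
\frac{1}{n}\log M > I(X;Y) - \frac{\epsilon}{2}.
```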
Thus we have shown that the second half of the summation is less than (M/2)ε. Dividing by M/2, which is exactly the number of terms in the second half, we find that the average of the λ_w's for w from M/2 + 1 to M is less than ε.

Since the λ_w's are in ascending order, the smallest value in the second half, namely λ_{M/2+1}, is less than ε, and this implies that λ_{M/2}, which is less than or equal to λ_{M/2+1}, is also less than ε.

The conclusion is that if the average probability of error P_e is less than ε/2, then λ_1, λ_2, ..., λ_{M/2} are all less than ε. Thus, by discarding the worst half of the codewords in the codebook C*, namely those codewords that correspond to λ_{M/2+1}, ..., λ_M, we obtain a new codebook with λ_max < ε.

After discarding the worst half of C*, the rate of the code becomes (1/n) log(M/2), because the size of the message set is now M/2 instead of M. This is equal to (1/n) log M − 1/n, where (1/n) log M > I(X;Y) − ε/2. By letting n be sufficiently large so that 1/n < ε/2, the rate is greater than I(X;Y) − ε. Here we assume that the decoding function is unchanged, so that deleting the worst half of the codewords does not affect the conditional probabilities of error of the remaining codewords. This completes the proof of achievability in the channel coding theorem.
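The expurgation step is simple enough to state as executable pseudocode. This is an illustrative sketch, not part of the formal proof: the function expurgate and its interface are hypothetical. It takes the list of conditional error probabilities λ_1, ..., λ_M (in any order) and returns the indices of the better half, whose maximum is guaranteed to be below ε whenever the average is below ε/2 and M is even.

```python
def expurgate(lambdas, eps):
    """Discard the worst half of the codewords.

    lambdas[w] is the conditional probability of error of codeword w.
    If the average of lambdas is below eps/2 and len(lambdas) is even,
    then every one of the M/2 codewords kept here has conditional error
    probability below eps, by the argument in the text.
    """
    M = len(lambdas)
    assert M % 2 == 0, "the argument assumes M is even"
    assert sum(lambdas) / M < eps / 2, "average error probability too large"
    # Sort codeword indices by conditional error probability, keep the best half.
    order = sorted(range(M), key=lambda w: lambdas[w])
    kept = order[: M // 2]
    assert max(lambdas[w] for w in kept) < eps  # guaranteed by the proof
    return kept
```

Note that the code only reorders and truncates; it never changes any λ_w, which mirrors the assumption in the text that the decoding function is left unchanged.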