We now describe the Blahut-Arimoto algorithm for computing the channel capacity C, which is obtained by specializing the general alternating optimization algorithm. Recall the double supremum in Section 9.1,

sup_{u1 ∈ A1} sup_{u2 ∈ A2} f(u1, u2),

where A_i is a convex subset of n_i-dimensional Euclidean space for i = 1, 2, and f: A1 × A2 → R is bounded from above, continuous, and has continuous partial derivatives on A1 × A2. Moreover, for every u2 ∈ A2 there exists a unique u1 ∈ A1 that maximizes f, and for every u1 ∈ A1 there exists a unique u2 ∈ A2 that maximizes f.

We now cast the computation of C as an instance of this optimization problem. Let

f(r, q) = Σ_x Σ_y r(x) p(y|x) log [q(x|y) / r(x)],

where r plays the role of u1 and q plays the role of u2 in the double supremum problem. Let A1 be the set of all strictly positive input distributions r, that is, r(x) > 0 for all x and Σ_x r(x) = 1, so that A1 is a subset of the Euclidean space of dimension |X|. Let A2 be the set of all reverse transition matrices q under consideration, that is, q(x|y) ≥ 0 for all x and y, q(x|y) > 0 if and only if p(y|x) > 0, and Σ_x q(x|y) = 1 for all y, so that A2 is a subset of the Euclidean space of dimension |X||Y|.

We now check that the function f and the sets A1 and A2 so defined satisfy the requirements of the double supremum problem. First, A1 and A2 are convex. Second, f is bounded from above: for a fixed r, the double summation defining f is maximized when q = q*.
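To make the definition of f concrete, here is a minimal sketch, assuming finite alphabets with the forward transition matrix stored as a NumPy array `p[x, y] = p(y|x)`; the function name `f_value` is my own:

```python
import numpy as np

def f_value(r, p, q):
    """Evaluate f(r, q) = sum_{x,y} r(x) p(y|x) log(q(x|y) / r(x)).

    r: input distribution, shape (nx,)
    p: forward transition matrix, p[x, y] = p(y|x), shape (nx, ny)
    q: reverse transition matrix, q[x, y] = q(x|y), shape (nx, ny)
    By convention, the sum is over x with r(x) > 0 and y with p(y|x) > 0,
    so every probability appearing inside the logarithm is positive.
    """
    total = 0.0
    nx, ny = p.shape
    for x in range(nx):
        for y in range(ny):
            if r[x] > 0 and p[x, y] > 0:
                total += r[x] * p[x, y] * np.log(q[x, y] / r[x])
    return total
```

When q is the reverse transition matrix q* induced by r and p, this value equals the mutual information I(X;Y) in nats.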
In that case, the double summation equals the mutual information I(X;Y), which is upper bounded by H(X), and in turn by log |X|. In f(r, q), the double summation is by convention over all x such that r(x) > 0 and all y such that p(y|x) > 0. Since q(x|y) > 0 whenever p(y|x) > 0, all the probabilities involved in the double summation are positive. Therefore, f is continuous and has continuous partial derivatives on A = A1 × A2. We have now checked all the requirements of the double supremum problem except that f is uniquely maximized when one of its components is fixed; we will deal with that later.

The double supremum now becomes

sup_{r ∈ A1} sup_{q ∈ A2} f(r, q) = sup_{r ∈ A1} sup_{q ∈ A2} Σ_x Σ_y r(x) p(y|x) log [q(x|y) / r(x)],

where, by Lemma 9.1, the supremum over all q ∈ A2 is in fact a maximum, equal to I(r, p). By Theorem 9.2, f*, the value of this double supremum, is equal to C.

We now fill in the details of the algorithm. By Lemma 9.1, for any given r ∈ A1, the unique q ∈ A2 that maximizes f is

q(x|y) = r(x) p(y|x) / Σ_{x'} r(x') p(y|x'),    (1)

which is the reverse transition matrix corresponding to r as the input distribution and p as the forward transition matrix. We will show in a moment, by Lagrange multipliers, that for any given q ∈ A2, the unique input distribution r that maximizes f is

r(x) = Π_y q(x|y)^{p(y|x)} / Σ_{x'} Π_y q(x'|y)^{p(y|x')},    (2)

where the denominator is the normalizing constant, because r is a probability distribution, and the product is over all y such that p(y|x) > 0. Let r^(0) be an arbitrarily chosen strictly positive input distribution in A1. Then q^(0) ∈ A2 can be computed according to equation (1), forming the pair (r^(0), q^(0)).
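The two updates just described might be sketched as follows, assuming NumPy arrays with `p[x, y] = p(y|x)` and every output symbol reachable from some input; the function names `update_q` and `update_r` are my own:

```python
import numpy as np

def update_q(r, p):
    """Equation (1): q(x|y) = r(x) p(y|x) / sum_{x'} r(x') p(y|x')."""
    joint = r[:, None] * p             # joint[x, y] = r(x) p(y|x)
    return joint / joint.sum(axis=0)   # normalize each column (each y)

def update_r(q, p):
    """Equation (2): r(x) proportional to prod_y q(x|y)^p(y|x),
    the product taken over y with p(y|x) > 0."""
    safe_q = np.where(p > 0, q, 1.0)   # dummy 1 where p(y|x) = 0
    log_r = (np.where(p > 0, p * np.log(safe_q), 0.0)).sum(axis=1)
    r = np.exp(log_r)
    return r / r.sum()                 # normalize to a distribution
```

Working in the log domain in `update_r` keeps the convention of the text: terms with p(y|x) = 0 are simply excluded from the product.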
Then compute r^(1), q^(1), r^(2), q^(2), ... iteratively by applying equations (2) and (1) alternately. It can easily be verified from equation (1) that if r^(k) ∈ A1, that is, r^(k) is strictly positive, then q^(k)(x|y) > 0 if and only if p(y|x) > 0, that is, q^(k) ∈ A2. Likewise, it can be verified from equation (2) that if q^(k) ∈ A2, then r^(k+1) is strictly positive, that is, r^(k+1) ∈ A1. Therefore, r^(k) ∈ A1 and q^(k) ∈ A2 for all k ≥ 0. In other words, if we start with any strictly positive r^(0), then the subsequent r's and q's satisfy the required constraints. It will be shown in Section 9.3 that f^(k) = f(r^(k), q^(k)) converges to f*, which is equal to C.

We now discuss maximizing f(r, q) for a fixed q. In this maximization problem, q(x|y) is given (highlighted in blue on the slide), and the constraints on r are Σ_x r(x) = 1 and r(x) > 0 for all x. We use the method of Lagrange multipliers to find the best r, temporarily ignoring the positivity constraint on r. Let J be the double summation in the maximization problem minus λ, the Lagrange multiplier, times Σ_x r(x), the latter coming from the equality constraint on r. For convenience, we take the logarithm to be the natural logarithm; this makes no difference because we are doing maximization. Differentiating with respect to r(x), we have

∂J/∂r(x) = Σ_y p(y|x) log q(x|y) − log r(x) − 1 − λ.

Setting ∂J/∂r(x) to zero, we have

log r(x) = Σ_y p(y|x) log q(x|y) − 1 − λ,

or

r(x) = e^{−(λ+1)} Π_y q(x|y)^{p(y|x)}.

By the normalization constraint Σ_x r(x) = 1, we see that e^{−(λ+1)} is actually the normalizing constant.
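Putting the two updates together gives a minimal sketch of the whole iteration; the function name, the uniform starting distribution, and the fixed iteration count are my own choices, and the capacity estimate is returned in nats:

```python
import numpy as np

def blahut_arimoto(p, n_iter=2000):
    """Alternate equations (1) and (2), starting from the uniform r^(0).

    p: forward transition matrix, p[x, y] = p(y|x), shape (nx, ny);
    every column of p is assumed to have at least one positive entry.
    Returns an estimate of the channel capacity C in nats.
    """
    nx, _ = p.shape
    r = np.full(nx, 1.0 / nx)                  # r^(0): strictly positive
    for _ in range(n_iter):
        joint = r[:, None] * p                 # joint[x, y] = r(x) p(y|x)
        q = joint / joint.sum(axis=0)          # equation (1): q^(k) from r^(k)
        safe_q = np.where(p > 0, q, 1.0)       # dummy 1 where p(y|x) = 0
        r = np.exp((np.where(p > 0, p * np.log(safe_q), 0.0)).sum(axis=1))
        r /= r.sum()                           # equation (2): r^(k+1) from q^(k)
    # Report I(r; p), the mutual information for the final input distribution.
    joint = r[:, None] * p
    p_y = joint.sum(axis=0)
    ratio = np.where(joint > 0, joint / (r[:, None] * p_y[None, :]), 1.0)
    return (joint * np.log(ratio)).sum()
```

For a binary symmetric channel the uniform input is already capacity-achieving, so the iteration is stationary from the start; for an asymmetric channel such as the Z-channel, the r^(k) move toward the non-uniform capacity-achieving distribution.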
And so we immediately obtain

r(x) = Π_y q(x|y)^{p(y|x)} / Σ_{x'} Π_y q(x'|y)^{p(y|x')},    (3)

where the product is over all y such that p(y|x) > 0, and for all such y, q(x|y) > 0. This implies that both the numerator and the denominator on the right-hand side are positive, and therefore r(x) > 0. In other words, the r thus obtained happens to satisfy the positivity constraint, although this constraint was ignored when we set up the Lagrange multipliers. However, it is not yet clear whether the r given in equation (3) is a maximum or a minimum. We will show in Section 9.3.2 that f is concave. It then follows that r as given in (3), which is unique, indeed achieves the maximum of f for a given q ∈ A2, because r is in the interior of A1.
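The claim that the r of equation (3) maximizes f for a fixed q can also be checked numerically. The sketch below (the names `f_value` and `best_r`, the example channel, and the choice of q are my own) compares it against randomly drawn input distributions:

```python
import numpy as np

def f_value(r, p, q):
    """f(r, q), summing over x with r(x) > 0 and y with p(y|x) > 0."""
    total = 0.0
    for x in range(p.shape[0]):
        for y in range(p.shape[1]):
            if r[x] > 0 and p[x, y] > 0:
                total += r[x] * p[x, y] * np.log(q[x, y] / r[x])
    return total

def best_r(q, p):
    """Equation (3): r(x) proportional to prod_y q(x|y)^p(y|x)."""
    safe_q = np.where(p > 0, q, 1.0)          # dummy 1 where p(y|x) = 0
    r = np.exp((np.where(p > 0, p * np.log(safe_q), 0.0)).sum(axis=1))
    return r / r.sum()

# A Z-channel as a small example, and a q in A_2 induced by some input r0.
p = np.array([[1.0, 0.0], [0.5, 0.5]])
r0 = np.array([0.3, 0.7])
joint = r0[:, None] * p
q = joint / joint.sum(axis=0)

r_star = best_r(q, p)
rng = np.random.default_rng(0)
for _ in range(1000):
    r = rng.dirichlet(np.ones(p.shape[0]))    # random competing distribution
    assert f_value(r_star, p, q) >= f_value(r, p, q) - 1e-9
```

For a fixed q, f is a linear function of r plus the entropy H(r), hence strictly concave in r, which is why the unique stationary point found above wins every comparison.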