Hi! And welcome to this class. As we can see from the figure it is true that the Smith-Waterman Algorithm is composed by four phases but, at the end of the day, considering that the first one is used to read all the necessary inputs and the last one has the objective of giving back the computed results, it can be basically considered as a 2 phases, or steps, algorithm. We are going to refer to the first phase as matrices calculation phase, while to the second one as traceback step. We know that, given: a query N, a database M, a similarity system and an affine gap function, the algorithm calculates the scoring matrix S and the Traceback Matrix T, whose dimension is NxM each. Finally it traces back the path of elements starting from the index identifying the maximum element in the similarity matrix, and terminating whenever a value with value zero is found. Let us now start defining what happens in the Matrices calculation step, where the similarity and traceback matrix are created. It's worthy to mention that, during the calculation of the similarity matrix, it is as well needed to keep track of the index of the maximum element found, as this values is going to be used for the second step of the algorithm. Each element of the similarity matrix is populated by using the relation presented in the following formula, where we can find: - the capital "S" is used to identify the Similarity Matrix, - capital "S(i,J)", is the i, j element in the Similarity Matrix, - "s(N_i,M_j)" represents the similarity function over the two strings N_i and M_j, - "gap del" is the gap scoring value in case of deletion, and finally - "gap ins" is the gap scoring value in case of an insertion. As an example of matrix calculation, we can consider the one in the figure. For each element of the matrix it is necessary to compute its three dependency values first. The dependency values we are referring to, are the ones highlight in yellow in the figure, and are: - the value above the current one, or north value, the value on the left of the one we are considering, or west value, and the value that is positioned in the upper left position, or north-west value. Given that we computed these three values, it is now possible to compute the current one by performing some basic operation. For all the cell of the matrix where there is not one of the dependency value, as for the first row and first column, the dependency values are supposed to be equal to zero. While computing the similarity matrix as described so far, it is necessary to keep track of what is the maximum value inside the matrix. In particular we are interested in knowing what is the index of this value so that it can be used in the traceback. The similarity matrix in fact, is not the only matrix that needs to be computed at this stage. It is also necessary to compute the traceback matrix in parallel to the first one. This second matrix is used to store all the directions that the second step will have to follow so that the final string can be obtained. In this figure we can see an example of a traceback matrix. As it can be understood from the formula presented before, the similarity store will hold the maximum value between zero and the three dependency values. The first one is a value composed of the sum of the element that lies on the anti-diagonal and another value that depends on the similarity function applied. The second element, is the sum of the value on the left of the one considered and the gap value in case of deletion. The third value is composed of the gap value in case of insertion and the element over the one considered. Once identified the maximum between this three values, the algorithm stores in the traceback matrix the position of the value that originated the identified maximum. As an example, in case of the maximum value is the first one, the program will store in the traceback matrix the value north-west, in the second case it will store west and in the third one north. In case none of the previous is the maximum, it means that the maximum value is a zero, then the program will store center. At the end of the first step of the algorithm two matrices were produced: the Similarity Matrix and the TraceBack Matrix. After this, we can than move to the second step of the computation, the TraceBack. This is the final step and produces the real output: the optimal local alignment between the query and the database. This step, starts from the maximum index identified before and iteratively passes through the elements in the TraceBack matrix by following the stored directions. The process continues until a cell where the stored value is center is encountered. Finally it outputs how the query sequence aligns to the database one. The output can be generated/provided by using multiple formats. An output example can be represented by the cigar string, which specifies the length of the bases aligned as well as the associated operation that can be base alignment, both match and mismatch, deletion and insertion of bases. Now that we have the basic knowledge of the Smith-Waterman algorithm, we are ready use them to properly accelerate the Smith-Waterman algorithm on an FPGA.