In this video, we're going to learn what the re-identification problem in computer vision is and what the common deep learning approaches to it are. Person re-identification is the problem of identifying people across images that have been taken using different cameras, or across time using a single camera. Re-identification is an important capability for surveillance systems as well as human-computer interaction systems.
In the slide, you may already notice certain difficulties associated with this task. Let us look at these more closely. Person re-identification usually needs to match person images captured by surveillance cameras working in wide-angle mode. Therefore, the resolution of person images is very low, for instance around 48 by 128 pixels, and the lighting conditions are unstable too. Furthermore, the direction of the cameras and the pose of the persons are arbitrary. So despite years of effort, person re-ID remains challenging due to dramatic variations in visual appearance and ambient environment caused by different viewpoints from different cameras, significant changes in human pose across time and space, background clutter and occlusions, and of course different individuals who share similar appearances.
A standard approach to re-identification generally follows the pipeline set by face verification algorithms. A typical re-identification system takes as input two images, each of which usually contains a person's full body, and outputs either a similarity score between the two images or a classification of the pair as same, if the two images depict the same person, or different, if the images are of different people. The two person images are sent to a Siamese convolutional neural network, a neural network architecture which consists of two copies of the same network. For two images, x and y, a Siamese network can predict a label that denotes whether the image pair comes from the same subject or not.
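To make the idea of two copies of the same network concrete, here is a minimal sketch in plain Python. It is not the model from the slide: the single linear layer with ReLU is only a toy stand-in for a CNN, and the names `make_weights`, `embed`, `siamese_score`, `same_person`, the random weights, and the threshold value are all assumptions made for illustration.

```python
import math
import random

def make_weights(in_dim, out_dim, seed=0):
    """Random weight matrix standing in for a trained CNN (hypothetical)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

def embed(image, weights):
    """One linear layer followed by ReLU: a toy stand-in for CNN feature extraction."""
    return [max(0.0, sum(w * p for w, p in zip(row, image))) for row in weights]

def siamese_score(x, y, weights):
    """Embed both inputs with the SAME weights, then return a similarity score.

    The score is the negative Euclidean distance, so larger means more similar.
    """
    fx, fy = embed(x, weights), embed(y, weights)
    return -math.sqrt(sum((a - b) ** 2 for a, b in zip(fx, fy)))

def same_person(x, y, weights, threshold=-0.5):
    """Turn the score into a same/different label (threshold is an assumed value)."""
    return siamese_score(x, y, weights) >= threshold

weights = make_weights(in_dim=4, out_dim=3)
x = [0.2, 0.8, 0.1, 0.5]  # flattened toy "image"
y = [0.9, 0.1, 0.7, 0.3]  # a different toy "image"

print(siamese_score(x, x, weights))  # identical inputs: zero distance
print(siamese_score(x, y, weights))  # different inputs: score is at most zero
print(same_person(x, x, weights))
```

The point of the sketch is that `embed` is called twice with one shared `weights` object, so both branches always compute the same function; in a real system those weights would be learned rather than random.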
Many applications need to rank the images in the gallery based on their similarity to a probe image. Therefore, a typical Siamese net outputs a similarity score instead. The structure of the Siamese net is shown in the slide. It is composed of two convolutional neural networks connected by a connection function. The connection function is used to evaluate the relationship between two samples, and the cost function is used to convert that relationship into a cost. The choice of connection function and cost function is closely related to the performance of the re-identification model. There are many distance, similarity, or other functions that can be used as candidates to connect two vectors, such as Euclidean distance, cosine similarity, absolute difference, vector concatenation, and so on.
One detail specific to the person re-ID problem is that the aspect ratio of width to height of cropped images containing persons is on the order of one to three. Thus it makes sense to crop overlapping parts of the person image in the way depicted in the slide. The crops are then passed into a number, say three, of sub-networks to perform effective feature extraction at the original scale. Without the crops, one would need to resize the image to a square, for instance, and that would change the scale of the features.
This model architecture results in a part-based network. First, the three parts may share the first convolutional layer, since the lowest-level features are supposed to be the same for all parts of the image. Second, each part may have its own set of convolutional layers, which can help to learn part-specific filters. Third, the higher-level features of all parts may be fused at a fully connected layer by some rule. Then the similarity of the fused features is evaluated by the connection function. Driven by a common cost function, the three parts can contribute to the training process jointly.
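The candidate connection functions named above, and the ranking of a gallery against a probe, can be sketched as follows. This is an illustration only: the feature vectors are made-up numbers rather than CNN outputs, and the function names, the gallery keys, and the choice of cosine similarity for ranking are assumptions, not something the lecture prescribes.

```python
import math

def euclidean_distance(u, v):
    """Smaller value means the two feature vectors are closer."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Larger value means more similar; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def absolute_difference(u, v):
    """Element-wise |u - v|; returns a vector, not a single score."""
    return [abs(a - b) for a, b in zip(u, v)]

def concatenation(u, v):
    """Joins the two vectors; returns a vector, not a single score."""
    return list(u) + list(v)

def rank_gallery(probe, gallery):
    """Return gallery names sorted from most to least similar to the probe."""
    scores = {name: cosine_similarity(probe, feat) for name, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up feature vectors for illustration only.
probe = [1.0, 0.0, 1.0]
gallery = {
    "person_a": [0.9, 0.1, 1.1],
    "person_b": [0.0, 1.0, 0.0],
    "person_c": [1.0, 1.0, 0.0],
}

ranking = rank_gallery(probe, gallery)
print(ranking)  # ['person_a', 'person_c', 'person_b']
```

Note that Euclidean distance and cosine similarity already produce a single number, while absolute difference and concatenation produce a vector, so in a network they would typically be followed by further layers before the cost function is applied.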
So, person re-identification aims to establish person identity using images captured from multiple cameras with non-overlapping views. This problem is really difficult and, in fact, not fully solved, due to large pose and lighting variations, clutter, and occlusions. A standard approach to re-ID is to learn a similarity score between two images, which is similar to face verification. CNNs are a natural choice for solving re-ID, and furthermore, you can approach this problem within the part-based framework.