In this lecture, we're going to talk about how to estimate the pose of the camera from the projections of n points that we know in the scene. Imagine that we have a camera mounted behind the windshield of a car, and that we have a three-dimensional map of the environment, like a map on Google Earth. The tips of these arrows are points with known GPS position and altitude: fully known X, Y, Z. From the projections of those points on the camera, together with the calibration, we can establish these rays, these vectors in the scene. The question is: from the angles between these rays, how can we find where the car is?

To study this problem, we're going to go back to the idea of bearing-based navigation. Imagine that we have two lighthouses, and that a ship sees both of them and knows their positions on the map. The only thing the ship can measure in the night is the angle between the two lighthouses, which we call the bearing. Imagine also that we don't have a compass, so we don't know where north is. Where is the ship, if we know the lighthouses on the map and the angle they span as seen from the ship? We see in the movie that the geometric locus of points that span exactly the same angle to two fixed points is a circle. This angle stays the same while the point is on one arc, in front of or above the two points, and it changes to 180 degrees minus this angle when the point moves to the other arc, behind them, as at the bottom. So by measuring only the image of the two lighthouses, and thus the angle between them, it is impossible to constrain the position of the camera: the camera can lie anywhere on the circle. Now, what happens in 3D? In the previous case the ship was constrained to lie on the sea surface, but a camera can lie anywhere.
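This inscribed-angle behaviour is easy to check numerically. The sketch below uses made-up coordinates for illustration: two "lighthouses" are placed on the unit circle, and every ship position sampled on the same arc sees them under the same angle.

```python
import numpy as np

# Two hypothetical lighthouses on the unit circle (illustrative values).
A = np.array([1.0, 0.0])                      # at polar angle 0
B = np.array([np.cos(2.0), np.sin(2.0)])      # at polar angle 2.0 rad

def bearing_angle(P, A, B):
    """Angle subtended at P by the segment AB (the measured bearing)."""
    u, v = A - P, B - P
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(c, -1.0, 1.0))

# Ship positions on the major arc (the arc not containing A and B):
angles = [bearing_angle(np.array([np.cos(t), np.sin(t)]), A, B)
          for t in np.linspace(2.5, 6.0, 5)]

print(np.allclose(angles, angles[0]))  # True: same inscribed angle everywhere
print(np.isclose(angles[0], 1.0))      # True: half of the central angle 2.0
```

By the inscribed angle theorem, every point on that arc measures half the central angle, so the bearing alone only pins the camera to a circle.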
But we can generalize this: when we rotate the circle around the axis through these two points, the segment connecting the two lighthouses, the camera can lie anywhere on the resulting surface of revolution, a torus. So two points are not sufficient in 3D either. If we add a third point, we're going to see that we can estimate the position of the camera. As a matter of fact, the solution will not be unique, but there will be a finite number of solutions.

This is a very old problem in photogrammetry called the resection problem. It has a very long history, going back to the 17th century: the first researchers who addressed it were Snellius and Pothenot, and the full solution was formulated in 1841 by Grunert in Germany. It was known as the Snellius-Pothenot problem. In computer vision, it corresponds exactly to the following problem: we have the projections of three points, and from these projections, if we know the calibration, we can find the angles between the rays d1, d2 and d3 in this picture. And we know the triangle in the world, which means we know the distances d12, d23 and d13. If only two points were known, we know the camera would lie on a torus. We can imagine that with one more point and one more measurement we would have the intersection of two such surfaces, two tori; and with three points, the intersection of three tori, which leaves just a finite number of points, like four or eight. What can we say now about the triangles that are formed in this picture, in this tetrahedron? To any face of this tetrahedron we can apply the cosine law that we learned in trigonometry in high school. The cosine law says that the sum of the squares of two sides, minus twice the product of these two sides times the cosine of the angle between them, equals the square of the third side.
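The cosine law on one face of the tetrahedron can be sanity-checked with a small sketch. The camera sits at the origin and the two world points below are arbitrary illustrative values:

```python
import numpy as np

# Hypothetical setup: camera at the origin, two made-up world points.
P1 = np.array([1.0, 2.0, 5.0])
P2 = np.array([-1.0, 0.5, 4.0])

d1, d2 = np.linalg.norm(P1), np.linalg.norm(P2)   # unknown depths in P3P
# Angle delta_12 between the two viewing rays (what calibration gives us):
cos_d12 = np.dot(P1, P2) / (d1 * d2)

# Cosine law: d12^2 = d1^2 + d2^2 - 2*d1*d2*cos(delta_12)
d12_sq = d1**2 + d2**2 - 2 * d1 * d2 * cos_d12

print(np.isclose(d12_sq, np.dot(P1 - P2, P1 - P2)))  # True
```

In the real problem d1 and d2 are the unknowns, while d12 (the world distance) and the angle are known; here we compute everything from ground truth just to verify the identity.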
If we find the lengths of the sides, then we put them back into the rigid displacement equation, the localization equation of the camera, where we know the capital Pj, which are the points in the scene, and the little pj, which are their projections. And then we know the lambdas, which are nothing else than the di in this equation, and we can solve for rotation and translation.

So first, we're going to find these lambdas, which in the tetrahedron we have seen are the di. For every face of the tetrahedron we can write a cosine law. This gives us three equations. In these three equations, we apply the following trick: we try to eliminate one unknown. The way to do it is by setting d2 = u·d1 and d3 = v·d1. Then we can rearrange all three equations so that on the left we have d1 squared, and on the right we have expressions that are free of d1. We put these three equations in pairs, say the first and the third, then the first and the second. Each pair yields an equation in u and v, like the ones at the top of the slide. Observe again that we have delta12, delta13, delta23, which are the bearings we get from the pixels after calibration, and we have d12, d13, d23, which are the side lengths of the triangle in the world; the unknowns here are u and v. So we have two quadratic equations in two unknowns. What we can do is solve the first equation for u squared and insert it into the second. If we do this, the second equation becomes linear with respect to u. Then we solve for u and substitute it back into the first equation, getting a quartic, a fourth-degree equation, in v. This equation can have as many as four real solutions for v. And because we obtained it by first eliminating u, we're also going to get two candidate solutions for u for every v. So in total, we'll have up to eight solutions.
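The substitution d2 = u·d1, d3 = v·d1 can be checked numerically on synthetic data. In the sketch below (camera at the origin, made-up world points), each of the three cosine laws is rearranged to express d1 squared free of d1, and all three expressions must agree:

```python
import numpy as np

# Hypothetical camera at the origin; three made-up world points.
P = np.array([[1.0, 0.0, 4.0],
              [0.5, 1.5, 5.0],
              [-1.0, 0.8, 4.5]])

d = np.linalg.norm(P, axis=1)           # true depths d1, d2, d3
rays = P / d[:, None]                   # unit bearing vectors from calibration
cos12, cos13, cos23 = rays[0] @ rays[1], rays[0] @ rays[2], rays[1] @ rays[2]
D12 = np.linalg.norm(P[0] - P[1])       # known triangle side lengths
D13 = np.linalg.norm(P[0] - P[2])
D23 = np.linalg.norm(P[1] - P[2])

u, v = d[1] / d[0], d[2] / d[0]         # substitution d2 = u*d1, d3 = v*d1
# Each cosine law gives d1^2 as an expression free of d1; all must agree:
d1sq_a = D12**2 / (1 + u**2 - 2 * u * cos12)
d1sq_b = D13**2 / (1 + v**2 - 2 * v * cos13)
d1sq_c = D23**2 / (u**2 + v**2 - 2 * u * v * cos23)

print(np.allclose([d1sq_a, d1sq_b, d1sq_c], d[0]**2))  # True
```

Equating these three expressions pairwise eliminates d1 and produces the two quadratic equations in u and v from the slide.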
In the next lecture, we're going to see how, after we get these di, we can solve for rotation and translation. Now, there is a more direct way to solve the PnP problem. Let's say that we have the pixel positions ui and vi; we apply the inverse calibration and we again get xi and yi, the calibrated coordinates. So we have five quantities for every point: the capital Xi, Yi, Zi, the known position in the scene, and the little xi, yi, the calibrated coordinates. If we divide through by the third coordinate, the Z coordinate, we get two equations. The left-hand side is known: the calibrated coordinates. The right-hand side contains the known X, Y, Z and the unknown rotation and translation. This is a nonlinear problem for two reasons. First, we have unknowns in the denominator. Second, the rotation matrix is a function of the Euler angles, or it can be written as a function of a quaternion, and this is also a nonlinear function. Instead, we're going to apply the following dirty trick: we multiply through by the denominators, and we regard all 12 unknowns as independent, nine unknowns for the rotation matrix and three for the translation. So we have a total of 12 unknowns, for which we can solve linearly. After we solve linearly for these unknowns, we can find the rotation matrix closest to the solution R', which does not necessarily have to be orthogonal. We have seen how to do this using the singular value decomposition. This is the end of the lecture on PnP, the Perspective-n-Point problem.
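The last step, projecting the linear solution onto the closest rotation matrix via the SVD, can be sketched as follows; R_prime here is a made-up noisy matrix standing in for the linear PnP solution:

```python
import numpy as np

def nearest_rotation(R_prime):
    """Closest rotation matrix to R_prime in the Frobenius norm, via SVD."""
    U, _, Vt = np.linalg.svd(R_prime)
    R = U @ Vt
    if np.linalg.det(R) < 0:        # enforce det = +1 (proper rotation)
        U[:, -1] *= -1
        R = U @ Vt
    return R

# Hypothetical non-orthogonal output of the linear 12-unknown solve:
rng = np.random.default_rng(0)
R_prime = np.eye(3) + 0.05 * rng.standard_normal((3, 3))
R = nearest_rotation(R_prime)

print(np.allclose(R @ R.T, np.eye(3)))    # True: orthogonal
print(np.isclose(np.linalg.det(R), 1.0))  # True: proper rotation
```

Dropping the singular values and keeping only U·Vᵀ is exactly the orthogonal-projection step mentioned in the lecture; the determinant check guards against landing on a reflection.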