I’m interested in knowing the position of the camera with respect to a known object (a calibration object). This is useful because it lets me normalize lots of pictures taken of the same place, in such a way that the camera position is similar (to a certain degree) in all of them. Also, fiddling with OpenCV has been, up until now, cool :)

At the moment I have finished the first phase of this mini-project. I have managed to calculate the intrinsic camera values from my laptop webcam, and I can calculate the extrinsic values of a known object, specifically a chessboard printout, moving around in front of the webcam. After calculating the intrinsic values, which I don’t care much about, the algorithm outputs the extrinsic values to stdout. I can see that movements in pitch, roll and yaw are consistently reflected in the output, as are the translations along the three directional axes.

I used OpenCV’s chessboard detection algorithms and its solvePnP and calibrateCamera functions. The command accepts a list of images or a stream from a camera. I prefer to use a camera stream for testing, but the final objective is to use it with a list of images. The gist of the process goes something like this:

- Calibrate camera (get intrinsic values): The algorithm detects some points in the chessboard image and relates them to the “real” object points. By using these two sets of information, the algorithm can calculate the camera distortion information and the camera matrix information [1][2]. The calibration takes 20 images/frames.
- Even though I get intrinsic values after the camera calibration, I am only interested in the extrinsic values. So I take the found intrinsics and pass them to solvePnP to get only the extrinsic values [3].
- I output each set of extrinsic values I find.

My next move is to use the re-projection error to improve the intrinsic values calculation. Hopefully that will increase the accuracy of the calculated extrinsic values. I also want to put my code in some kind of git repository so I can keep track of it better.

The following were links that helped me find my way through the math and the coding:

- http://www.vision.caltech.edu/bouguetj/calib_doc/ (It has Matlab code and an extensive explanation of what is happening under the covers.)
- http://www.youtube.com/watch?v=DrXIQfQHFv0 (Cool video that shows what can be done with the extrinsic parameters.)
- http://www.amazon.co.uk/Learning-OpenCV-Computer-Vision-Library/dp/0596516134/ (Chapter 11, on camera models and calibration. Very good explanation and more code goodies.)

I’ll put my code on my ITU page for now (until I get something better on the research group server, or until I put it on GitHub). Comments and patches are greatly appreciated:

http://www.itu.dk/people/jogr/opencv/imageadjust.tar.gz

[1] http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/EPSRC_SSAZ/node3.html
[2] http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html#calibratecamera2
[3] http://opencv.willowgarage.com/documentation/cpp/camera_calibration_and_3d_reconstruction.html#cv-solvepnp
Can you please tell me: the [R|t] values are with respect to which coordinate system? Is it the translation and orientation of the camera w.r.t. a fixed coordinate system in the room, or the calibration chessboard’s origin w.r.t. the camera/frame coordinate system? And is the way suggested by OpenCV to go from (X,Y,Z) to (u,v) correct (http://opencv.willowgarage.com/documentation/camera_calibration_and_3d_reconstruction.html)? It would mean the world to me if somebody could answer this question once and for all.

Thank you :)

Hey Mohamad.

Thx for the comment :)

1. The [R|t] matrix is a transformation matrix; its values relate the two coordinate systems (camera and world) rather than belonging to either one. Notice in the link that you gave me that the vector with the real-world coordinates is [X,Y,Z], and the vector with the camera coordinates is [u,v]. Further notice that you can use [R|t] to go from real to camera coordinates or from camera to real. The equation in the link goes from real to camera. To go from camera to real, you need to put [R|t] and A on the other side of the equality.
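To make the two directions of the mapping concrete, here is a small numpy sketch with a made-up R (a 90-degree rotation about the Z axis) and t. Going from camera back to world is just moving R and t to the other side:

```python
import numpy as np

# A made-up [R|t] pair: rotate 90 degrees about Z, then translate.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
t = np.array([0.0, 0.0, 5.0])

X_world = np.array([1.0, 2.0, 0.0])

# World -> camera coordinates (the direction used in the OpenCV docs).
X_cam = R @ X_world + t

# Camera -> world: put R and t on the other side of the equality.
X_back = R.T @ (X_cam - t)
```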

2. Is it correct? I have used this model with very good results. In my experiments, this representation is correct.

Hope this helps…

Thank you very much for your fast response, it was very helpful, cheers.

I am getting good results at last (i.e. shape from silhouette: the visual cones from 4 cameras are approximately intersecting where the object of observation is, but the intersection is far from resembling that object :( ). I think that the calibration parameters are not accurate enough!

My question is how accurate the chessboard should be (mm or 10ths of mm…) in order to calibrate cameras fixed to a 3 m ceiling. Also, is it possible to get the intrinsic parameters of a camera with one chessboard grid (35 mm squares), and then use another one (200 mm squares) set on the floor in the middle of the room to get the extrinsic parameters of that camera? I hope the problem is interesting enough for you to answer :). Thanks again.

Hey Mohamad.

Glad my first response helped :). To answer your other questions:

1. Size of the squares in the chessboard. The physical size is not the only thing that matters. What matters (in my experiments) is the number of pixels that represent a square. This depends on the size of the chessboard and on the distance of the camera from the chessboard. The lowest I went (with good results) was 30×30 pixels per chessboard square. That is, each square was represented in the image plane by a 30 by 30 matrix of pixels. I’m guessing you could go lower, but I did not test :).

2. Calculations with different sizes. To begin with, note that you will not get exactly the same values for two different calculations, even if you do them with the same chessboard. The noise from the setup (lighting, lens aberrations, camera sensor characteristics…) is too much for that. With that said, I think the algorithm should receive a size parameter. Look at OpenCV’s example (http://code.opencv.org/svn/opencv/trunk/opencv/samples/cpp/calibration.cpp) and notice how they have the “squareSize” parameter in calcChessboardCorners. This size parameter should adjust for the change in the square size.
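A sketch of what such a size parameter does. The helper name chessboard_object_points is hypothetical (it is not OpenCV’s calcChessboardCorners), but it mirrors the same idea: the square size just scales the “real world” corner grid, which is why a 35 mm board and a 200 mm board can both be handled by the same code.

```python
import numpy as np

def chessboard_object_points(board_w, board_h, square_size):
    """Build the 'real world' corner grid on the Z=0 plane.

    The units of the output are whatever units square_size is given in,
    which is also what determines the units of the resulting tvec.
    """
    pts = np.zeros((board_h * board_w, 3), np.float32)
    pts[:, :2] = np.mgrid[0:board_w, 0:board_h].T.reshape(-1, 2) * square_size
    return pts

small = chessboard_object_points(7, 6, 35.0)   # 35 mm squares
large = chessboard_object_points(7, 6, 200.0)  # 200 mm squares
```

Because the two grids differ only by a scale factor, swapping boards only requires passing the correct squareSize; the detected image points do not change their meaning.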

Not sure if I answered your questions…

Don’t hesitate to keep on asking.

Cheers

Hey, I have found the intrinsic parameters… how can I find the extrinsic ones? I’m using an (integrated) webcam.

Hey abdullahmjl11.

Sorry for the late reply. OpenCV allows you to find the extrinsic parameters (where the objects are located relative to the camera sensor) with cvCalibrateCamera2 or cvFindExtrinsicCameraParams2. The names of the methods vary depending on whether you are using OpenCV’s Python or C++ interface.

Hope it helps.

Hi there.

I want to develop a tool using openCv (Python version) that does, pretty much, exactly what you described here. It was really reassuring to realise that my idea has been successfully implemented before, however I still have some problems.

Camera calibration is fine, it didn’t take me too long to get that done; however, I have problems with the values returned by solvePnP. They seem wrong regardless of whether I look at the camera pose or just at the rvec and tvec (compared with the physical dimensions of the setup I made to test the approach). The distance from the chessboard (which I also use for location finding, not only calibration) is off by 10-25%, and the x & y coordinates by up to several hundred %.

I wanted to ask about the definition of the coordinate systems used by solvePnP (camera and world), primarily the directions of the axes. I’ve read that the world coordinate system can be specified however you like (for me its origin is one of the corners of the chessboard). Is that true?

Also, what units does solvePnP output the tvec in? I’ve read that these are the units of the calibration objectPoints (i.e. mm in my case), but maybe my problem is in this conversion?

Thanks a lot, I’ll appreciate any help.

Hey Alek

Thx for the comment.

Before I try to answer: though it was a long time ago, I remember that I gave up on solvePnP because I realized that there was an easier way of re-projecting images into common viewpoints by using OpenCV’s warpPerspective function. While before I was trying to correct for the rotation and translation by using solvePnP, I ended up just warping the image with four known image points (https://github.com/Joelgranados/EcoIS/blob/master/src/ilacImage.cpp#L186). So if your objective is to re-project images, you might want to try the warpPerspective function instead of solvePnP (it was easier for me in the end).
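For the curious, the four-point warp boils down to solving for a 3×3 homography, which is what cv2.getPerspectiveTransform computes before warpPerspective applies it. A numpy-only sketch with made-up point coordinates (this is an illustration of the math, not the code linked above):

```python
import numpy as np

def homography_from_4pts(src, dst):
    """Solve for H (3x3, with H[2,2] fixed to 1) so that dst ~ H @ src,
    from 4 point correspondences: 8 linear equations in 8 unknowns."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply H to a 2D point in homogeneous coordinates."""
    p = H @ np.array([pt[0], pt[1], 1.0])
    return p[:2] / p[2]

# Four known image points and where we want them in the common viewpoint.
src = [(10, 10), (300, 20), (310, 250), (5, 240)]
dst = [(0, 0), (320, 0), (320, 240), (0, 240)]
H = homography_from_4pts(src, dst)
```

In practice you would pass the equivalent H (from cv2.getPerspectiveTransform) straight to cv2.warpPerspective to resample the whole image.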

From your question I understand (and correct me if I am wrong) that you are using solvePnP function of OpenCV and you are passing it one objectPoint with its corresponding imagePoint. After the calculation the distances in tvec and rvec are not consistent with your setup. You also want to know about the units of tvec.

I would suggest you use more than one point; use all the chessboard corners you have at your disposal. Remember that the objectPoints need to correspond to the imagePoints. The world coordinates can indeed be specified in whatever order, but you need to make sure that you use the same order in the imagePoints. The units do depend on how you call the calibration function; I think they are implied by how you define your objectPoints for the chessboard.

Not sure if any of this helps. Feel free to post another question if you think that I have not understood what you are asking.

Hey, thanks a lot for a quick reply. Yes, I do use all of the chessboard corners. I’m trying to determine the location of the object with respect to the camera only, but I’ll look into this function that you mentioned, thanks. Same with the order in which object and image points are passed into the function (I did check it some time ago but just to make sure…).

Dear Sir,

I am trying to implement monocular visual odometry in opencv python. I need to calculate the distance moved by the camera in real world.

I obtained the fundamental matrix, rotation matrix and translation matrix for each image frame separately (if I take a pair of images, then I get TWO rotation and translation matrices). I got this using the cv2.calibrateCamera function in OpenCV. The outputs are the intrinsic and extrinsic parameters. Or is that the wrong way to obtain R and T (I plan to use this for images other than the chessboard, e.g. a room)?

The rotation matrix is a 3×3 matrix and the translation vector is a 3×1 matrix. Can you tell me how I could use these to calculate the real-world distance travelled by the camera between successive frames? Also, what exactly do the rotation and translation matrices signify? If they define the motion of the camera, then why do they exist for a single image (there’s no motion in a single image, right)? Can you point me in the right direction? Any help would be appreciated. Thank you.

I think that you have misinterpreted the results from calibrateCamera. The matrices that you refer to (translation and rotation matrix) are the values that express how the world coordinates map to the projected (on the sensor) coordinates. If you are looking for a way to define the location of the camera, you should start with visual SLAM (http://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping), which might involve a bit more than just localization, but it’s a good starting point.

Hope it helps

Hi again,

I was wondering if it would be possible to do the reverse process. If the world coordinates can be mapped to the sensor coordinates using R and T, then would it not be possible to find the world coordinates if we know the sensor coordinates? I assume that sensor coordinates refer to the pixel coordinates of an interest point in the image.

Hi,

I have noticed that you have also used the Extrinsic values to determine the camera’s position. I think our requirements are similar. If I know how my camera moves with respect to an interest point in the image scene, I could plot the camera movement. Would it also be possible to find the actual distance in metres travelled by the camera?

Thanks

Hey Clive.

In theory your idea should work, but you need to be careful of your assumptions:

1. You would need to see the same chessboard at all times.

2. The camera position would only be relative to the chessboard.

3. Your movement is always with respect to the chessboard, so if the chessboard moves, your camera position calculations will be compromised.

4. Notice that if you want to use patterns other than the chessboard, you will have to use some other method besides calibrateCamera. That method (AFAIK) assumes a chessboard.

I still think you might be better off searching for a visual SLAM algorithm which addresses your problem in a general way.

Cheers

Thanks for the reply. Why can’t I use something other than a chessboard? I could get the fundamental matrix and distortion coefficients from camera calibration using the chessboard images. The rotation and translation matrices can be obtained from solvePnP, right (which takes 3D points, 2D points, the fundamental matrix and distortion coefficients as input parameters)? The 2D points would be my (u,v) corners detected from a corner detection algorithm like cv2.goodFeaturesToTrack. But I don’t know what 3D points I should pass. How can I get those? I noticed that the camera calibration program in OpenCV uses cv2.calibrateCamera, which also needs 3D points (objpoints) and 2D points (imgpoints) as input parameters. The imgpoints are obtained from cv2.findChessboardCorners. But I don’t understand how the objpoints are obtained and why it is done that way. The beginning of the code has objp = np.zeros((6*7,3), np.float32) and objp[:,:2] = np.mgrid[0:7,0:6].T.reshape(-1,2). Why do we do this? Shouldn’t the 3D points be actual points in the real world and not just some orderly set of points that we define?

PS: code for camera calibration is as given on opencv docs http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_calib3d/py_calibration/py_calibration.html

Hey Clive.

All awesome questions :)

Why can’t I use something other than chessboard?

– You can use something different. The chessboard is just easier.

But I don’t know what 3d points I should pass. How can I get that?

– The “real” points of the chessboard are actual points in the “real world”. We first need to select an arbitrary (0,0,0) point in the “real world”; for convenience it is going to be one of the corners of the chessboard. Then we need to give all of the chessboard corners “real coordinates”; notice that all these coordinates are going to lie in one 2D plane of the 3D world (because it’s a flat chessboard). The first coordinate is (0,1,0), the second is (0,2,0), the third is (0,3,0)… then (1,0,0), (1,1,0)…. This is why we define the world coordinates ourselves: because we know what they are. They are just points on a chessboard separated by the same unit of measurement (any value, 1 in this case). This is the reason why it’s easier to use solvePnP with the chessboard (or something you know the coordinates for). But you want to use something arbitrary (like a room), for which it is difficult to get the 3D coordinates, especially if that arbitrary object is going to change every time. This again takes you to your general question: how to find a reference point for the camera in an arbitrary scene? I still think that visual SLAM is the place to begin.
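Concretely, the two lines from the tutorial build exactly this grid of “real” coordinates on the Z=0 plane:

```python
import numpy as np

# The two lines from the tutorial, for a board with 7x6 inner corners.
objp = np.zeros((6 * 7, 3), np.float32)
objp[:, :2] = np.mgrid[0:7, 0:6].T.reshape(-1, 2)

# Each row is one corner's "real world" coordinate: integer grid steps
# on the Z=0 plane, with (0,0,0) chosen as one corner of the board.
print(objp[:4])
```

The 42 rows walk the grid one row of corners at a time; the third column stays 0 everywhere because the flat board defines the Z=0 plane of the world coordinate system.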

Thank you for all your quick responses. Implementing visual SLAM is out of the question as I am short of time (a week or so until the deadline :( …). So I understand that using a chessboard, I could measure movement in a 2D plane and not depth. But the Z in the chessboard is 0, so this is an initial point. What happens when I move the chessboard backward? Would Z be incremented from 0 to 1 or something? (I noticed that one of the YouTube links you posted above shows the chessboard being moved back and forth while the movement was being reflected in another window. How is the depth being calculated there?) Or could I solve this problem by calibrating using a chessboard that spans a 3D volume, like a staircase or something (I know this sounds weird)? Then the world points could be given as (0,0,0), (1,0,1), (2,0,1)… (1,1,1), (2,1,1) etc.

You *can* measure depth with the chessboard even though it’s a 2D object. When the 2D object becomes smaller, you are moving away, and when it becomes bigger, you are moving towards it. This is the third coordinate of the tvec returned from the cvFindExtrinsicCameraParams2 function. Are you getting an unexpected value?

2. The chessboard sample images given with OpenCV are not always taken perpendicular to the camera; they are taken at different angles. So how can we put Z=0? Even though the chessboard is a 2D plane, the chessboard orientation with respect to the camera has varying Z, right? (If you still don’t get what I’m trying to convey, picture a chessboard that’s slanting backwards.)

Indeed. You need to move the chessboard around so you get a good sense of the camera system: lens distortion, imager distortion, focal length…. But if the chessboard is just slanting, then it’s only a rotational change, not a translational one. You would actually have to move the chessboard forward to get a translation.

Hi,

With respect to which point on the chessboard is the rotation and translation calculated (since every point on the chessboard will have a different rotation and translation)?

First, remember that the chessboard needs to have special characteristics. As stated in “Learning OpenCV”: “In practice, it is often more convenient to use a chessboard grid that is asymmetric and of even and odd dimensions—for example, (5, 6). Using such even-odd asymmetry yields a chessboard that has only one symmetry axis, so the board orientation can always be defined uniquely.”

This is how the book answers your question: “You can envision the chessboard’s location as being expressed by (1) “creating” a chessboard at the origin of your camera coordinates, (2) rotating that chessboard by some amount around some axis, and (3) moving that oriented chessboard to a particular place.”

The book also mentions that if you want to get a rotation matrix out of the rotation vector you need to use the Rodrigues transform: “Each of these rotation vectors can be converted to a 3-by-3 rotation matrix by calling cvRodrigues2()”.
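For illustration, the Rodrigues formula itself is short. This numpy sketch is the textbook formula (cv2.Rodrigues / cvRodrigues2 does this for you, so in practice you would call that instead):

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector -> 3x3 rotation matrix via the Rodrigues formula.

    The vector's direction is the rotation axis; its norm is the angle.
    """
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)  # zero rotation
    k = np.asarray(rvec, float) / theta          # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])           # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

R = rodrigues([0.0, 0.0, np.pi / 2])  # 90 degrees about the Z axis
```

A quick sanity check is that the result is orthonormal with determinant 1, and that a 90-degree rotation about Z sends the x axis to the y axis.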

I strongly suggest you get “Learning OpenCV” (http://www.amazon.com/Learning-OpenCV-Computer-Vision-Library/dp/0596516134/ref=sr_1_1?ie=UTF8&qid=1421834697&sr=8-1&keywords=learning+opencv). It has a whole chapter on camera calibration, with all the mathematical explanation. Additionally, it points you to other papers that might be of interest.

Hope it helps

ret, corners = cv2.findChessboardCorners(gray, (9,6), None) finds the corners of the chessboard starting from the bottom right to the top left. So if we assign the objpoints as (0,0,0),(1,0,0),(2,0,0)…(0,1,0),(1,1,0),(2,1,0) etc., would it be correct? If the first corner (bottom right) is assigned (0,0,0), then the next point (one step left) should be (-1,0,0), right?

I have not tried assigning negative values to the “real” object points, but it should work. But why would you want to do this? If you give them positive values, the algorithm will still calculate the correct rotation vector, even if the board is rotated 180 degrees.

hi,

I am in deep trouble with my project work. Can you help me with this doubt?

I have to find how much distance my camera moved. So can I use the rotation and translation matrices of images taken from that camera in order to know how much it moved?

Thanks a lot. I’ll appreciate any help.

Hey Febin.

As mentioned before, I think that a visual SLAM algorithm is more fit for camera localization than OpenCV’s camera calibration code.

Cheers

hi,

How do you find the translation and rotation matrices of an image (not a chessboard), and also how do I find the X, Y, Z real-world coordinates?

Hey Elsa.

Visual Odometry is what you are looking for. There is tons of literature on this, and there should be some library out there that you can begin tinkering with. Here (http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1315094) is a paper that might get you started.

Hope it helps

Hi…

Is there any way to find out how much my camera moved in the real world without using visual SLAM? Or is there a way by using calibration?

Hi again,

I want to know how the rotation and translation matrices are calculated. A chessboard has many corners, and the rotation and translation matrices describe a single rotation and translation. Which point’s rotation and translation is considered here? Is it the center point of the image?