Cali Cam Methods
Methods

A set of images is processed in two distinct stages--point detection and calibration. Point detection is inherently a heuristic-driven procedural task in which solutions are easy to verify but difficult to generate. To make Cali Cam more useful, it is desirable to place as few restrictions on the environment as possible and to require no user interaction for each individual picture. The environment should only require that the checkerboard calibration target be unobstructed and well lit. The background is allowed to be cluttered as long as it does not contain other checkered objects. The program should be able to find the target without the help of the user, and since the calibration procedure requires several views of the same target, the calibration points must be identifiable across different orientations of the target. Calibration, on the other hand, is mathematically rigorous, but care must be taken to avoid numerical instabilities.

Point Detection

To perform the calibration, the calibration points--the internal corners of the checkerboard--must be found and arranged into a grid structure resembling the calibration target. The user should merely have to provide the number of rows and columns of the target. The process proceeds in several steps: identification and refinement of candidate points, reduction to a probable set, and arrangement of the points into a grid structure with adaptation for missing points.


Point Identification


scorePoint(Point P){
  Let N be a 13x13 window of the image
  with center element n_66 ∈ N
  such that n_66 = P

  Let M1, M2 be the matrices with entries
  m1_ij ∈ M1, m2_ij ∈ M2 specified below.

  score = MAX( ∑_ij if(m1_ij != 0){ |(n_ij - 128) - 127*m1_ij| },
               ∑_ij if(m2_ij != 0){ |(n_ij - 128) - 127*m2_ij| } )
  return score
}

As stated previously, the calibration target appears in the foreground of a given image, possibly against a cluttered background, and the program must identify the calibration points without help from the user. The image, if not already, is converted to grayscale, ensuring the calibration target appears as a black and white pattern. A calibration point can be thought of as the meeting point of four surfaces--two white and two black--or as the intersection of two lines. Because of noise and blur, the intersection itself is often ill defined, and a determination must instead be made from the neighborhood of the intersection. Initial versions of Cali Cam used the intersection of two lines as the criterion; however, special cases such as parallel lines and phantom lines shadowing actual edges made performance uncertain. Currently Cali Cam seeks the nexus of four surfaces of alternating colors.

To score a point, its neighborhood is evaluated against the following two matrices. Each matrix is centered on the point, and each pixel of the 13x13 neighborhood is compared with the corresponding weight: for every non-zero weight, the absolute difference between the pixel value offset by 128 (half the maximum value) and the weight scaled by 127 is accumulated. The score of the point is the larger of the two matrix responses. The calibration points will then be a collection of the highest scored local maxima. It should be noted that the size of the neighborhood (13x13) and the form and number of the matrices are not necessarily optimal, but they perform well enough. The multiple matrices and null weights enable detection of calibration points when the target is skewed or the image is distorted.

M1:
 1  1  1  1  1  0  0  0 -1 -1 -1 -1 -1
 1  1  1  1  1  0  0  0 -1 -1 -1 -1 -1
 1  1  1  1  1  0  0  0 -1 -1 -1 -1 -1
 1  1  1  1  1  0  0  0 -1 -1 -1 -1 -1
 1  1  1  1  1  0  0  0 -1 -1 -1 -1 -1
 0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0  0  0  0  0
-1 -1 -1 -1 -1  0  0  0  1  1  1  1  1
-1 -1 -1 -1 -1  0  0  0  1  1  1  1  1
-1 -1 -1 -1 -1  0  0  0  1  1  1  1  1
-1 -1 -1 -1 -1  0  0  0  1  1  1  1  1
-1 -1 -1 -1 -1  0  0  0  1  1  1  1  1

M2:
 0  0 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0
 0  0  0 -1 -1 -1 -1 -1 -1 -1  0  0  0
 1  0  0  0 -1 -1 -1 -1 -1  0  0  0  1
 1  1  0  0  0 -1 -1 -1  0  0  0  1  1
 1  1  1  0  0  0 -1  0  0  0  1  1  1
 1  1  1  1  0  0  0  0  0  1  1  1  1
 1  1  1  1  1  0  0  0  1  1  1  1  1
 1  1  1  1  0  0  0  0  0  1  1  1  1
 1  1  1  0  0  0 -1  0  0  0  1  1  1
 1  1  0  0  0 -1 -1 -1  0  0  0  1  1
 1  0  0  0 -1 -1 -1 -1 -1  0  0  0  1
 0  0  0 -1 -1 -1 -1 -1 -1 -1  0  0  0
 0  0 -1 -1 -1 -1 -1 -1 -1 -1 -1  0  0
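
As a concrete illustration, the scoring step could be sketched as follows in Python/NumPy (the function names and matrix-construction code here are illustrative and not taken from the Cali Cam source; the weight values reproduce the two tables above):

import numpy as np

def build_weight_matrices():
    # M1: +1/-1 quadrants separated by a three-pixel null cross (axis-aligned corner)
    M1 = np.zeros((13, 13), dtype=int)
    M1[0:5, 0:5] = 1
    M1[0:5, 8:13] = -1
    M1[8:13, 0:5] = -1
    M1[8:13, 8:13] = 1
    # M2: the same idea rotated 45 degrees (diagonal corner); a null band runs along
    # the diagonals, +1 fills the left/right wedges and -1 the top/bottom wedges
    M2 = np.zeros((13, 13), dtype=int)
    for i in range(13):
        for j in range(13):
            a, b = i - 6, j - 6
            if abs(abs(a) - abs(b)) <= 1:
                M2[i, j] = 0
            else:
                M2[i, j] = 1 if abs(b) > abs(a) else -1
    return M1, M2

def score_point(gray, x, y, M1, M2):
    # Assumes (x, y) lies at least 6 pixels from the image border
    N = gray[y - 6:y + 7, x - 6:x + 7].astype(float)   # 13x13 neighborhood centered on P
    s1 = np.abs((N - 128) - 127 * M1)[M1 != 0].sum()    # only non-zero weights contribute
    s2 = np.abs((N - 128) - 127 * M2)[M2 != 0].sum()
    return max(s1, s2)                                  # MAX over the two matrix responses

Candidate calibration points would then be taken from local maxima of the resulting score map, as described above.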

Refinement

The above method produces a set of points that correlates highly with the calibration points. However, the positions of the points need to be refined to sub-pixel accuracy, and spurious points need to be eliminated. To accomplish this, the neighborhood of a point can be thought of as a surface with color denoting height. If the neighborhood contains a calibration point, the surface will take the form of a saddle. If a quadratic surface is fit to the neighborhood, the critical point of the surface will be the location of the calibration point. To achieve better accuracy, the neighborhood is recentered on the updated point, the quadratic surface is refit, and the critical point is reevaluated. If the position of the point converges, it is kept as a possible calibration point; if it diverges, the point is removed from consideration. Once all the candidate points are refined, the set is checked and any duplicate entries are removed.
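
A minimal sketch of this refinement, assuming the quadratic surface z = a*x^2 + b*x*y + c*y^2 + d*x + e*y + f is fit by least squares over an 11x11 window (the window size, iteration limit, and tolerance here are illustrative choices):

import numpy as np

def refine_saddle(gray, x0, y0, half=5, iters=10, tol=1e-3):
    # Returns a sub-pixel (x, y) position, or None if the update diverges
    x, y = float(x0), float(y0)
    for _ in range(iters):
        xi, yi = int(round(x)), int(round(y))
        win = gray[yi - half:yi + half + 1, xi - half:xi + half + 1].astype(float)
        if win.shape != (2 * half + 1, 2 * half + 1):
            return None                                  # too close to the image border
        ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
        X = np.column_stack([xs.ravel()**2, xs.ravel() * ys.ravel(), ys.ravel()**2,
                             xs.ravel(), ys.ravel(), np.ones(xs.size)])
        a, b, c, d, e, f = np.linalg.lstsq(X, win.ravel(), rcond=None)[0]
        # Critical (saddle) point of the fitted surface: solve grad = 0
        try:
            dx, dy = np.linalg.solve([[2 * a, b], [b, 2 * c]], [-d, -e])
        except np.linalg.LinAlgError:
            return None
        if abs(dx) > half or abs(dy) > half:
            return None                                  # diverged; discard the candidate
        x_new, y_new = xi + dx, yi + dy
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new                          # converged
        x, y = x_new, y_new
    return x, y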


Reduction to a Probable Subset

Let S be a set of Points
Let n be the desired number of points

While( |S| > n ){
  minAvg = ( ∑_(p ∈ S) dist(p, p') ) / |S|
     where p' ∈ S is the nearest neighbor of p:
     dist(p, p') <= min_(t ∈ S | t != p) dist(p, t)

  for each point p ∈ S define the score as
    s1 = ∑_(t ∈ S) dist(p, t)
    p'  ∈ S | dist(p, p')  <= min_(t ∈ S | t != p) dist(p, t)
    p'' ∈ S | dist(p, p'') <= min_(t ∈ S | t != p, t != p') dist(p, t)
    s2 = |dist(p, p')  - minAvg| / minAvg + 1
    s3 = |dist(p, p'') - minAvg| / minAvg + 1
    score = s1 * s2 * s3

  eliminate the point with the highest score from S
}
    

The set of remaining points needs to be further culled of outliers that may still be present, reducing it to the number of calibration points on the target. The number of ways of choosing K points from a set of N grows combinatorially with N, so searching for an optimal subset is unlikely to be tractable. Instead, in a method akin to hill climbing, the least fit point is eliminated one at a time until the desired number of points is reached. To evaluate the points for fitness, the characteristics of the calibration target must be considered: the set of points that corresponds to calibration points will be fairly compact, and for each point in the set, the spacing to its neighboring points will be nearly uniform. The distance between every pair of points is calculated, and for each point the distances to its two nearest neighbors are recorded. To measure compactness, the sum of the distances from a point to all others is computed. To measure uniformity, a slightly more involved measure is used: the absolute difference between the distance to a neighbor and the average nearest-neighbor distance is divided by that average and incremented by one, and this is computed for the point's two nearest neighbors. The point eliminated is the one with the highest product of the three measures. After a point is eliminated, the distances and averages are updated for the reduced set.
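
In code, the elimination loop above might look like the following (a sketch, assuming the points are an N x 2 NumPy array; it simply restates the pseudocode and makes no attempt at efficiency):

import numpy as np

def reduce_to_subset(points, n):
    # Greedily remove the least-fit point until only n remain (see pseudocode above)
    S = np.asarray(points, dtype=float)
    while len(S) > n:
        D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)   # pairwise distances
        np.fill_diagonal(D, np.inf)
        nearest = np.sort(D, axis=1)[:, :2]           # distances to 1st and 2nd nearest neighbors
        min_avg = nearest[:, 0].mean()                # average nearest-neighbor distance
        s1 = np.where(np.isinf(D), 0, D).sum(axis=1)  # compactness: sum of distances to all others
        s2 = np.abs(nearest[:, 0] - min_avg) / min_avg + 1
        s3 = np.abs(nearest[:, 1] - min_avg) / min_avg + 1
        score = s1 * s2 * s3
        S = np.delete(S, np.argmax(score), axis=0)    # drop the least-fit (highest-scoring) point
    return S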


Grid Arrangement

stepList=NULL

for each pair of points a,b ∈ S
   a, b are neighbors if !∃ c ∈ S such that
      dist(a,c) < dist(a,b) && dist(b,c) < dist(a,b)
   if a,b are neighbors
      rise=a.y-b.y
      run=a.x-b.x
      if rise < 0
         rise= -rise
         run= -run
      stepList.add( (rise,run) )

stepList.sort_rise_then_run()
horiz=stepList[ |stepList|/4 ]
vert=stepList[ |stepList|* 3/4 ]

find a Point P ∈ S such that,
   for a given error threshold ε and offsets |δi| < ε,
     P + horiz + δ1 ∈ S
     P - horiz + δ2 ∈ S
     P + vert + δ3 ∈ S
     P - vert + δ4 ∈ S
     P + horiz + vert + δ5 ∈ S
     P + horiz - vert + δ6 ∈ S
     P - horiz + vert + δ7 ∈ S
     P - horiz - vert + δ8 ∈ S
else
  fail

Start a grid G such that 
  P ∈ G
  8 neighboring points of P ∈ G

A calibration target has a number of rows and columns
Let the lower bound lb=min( #rows, #columns)
Let the upper bound ub=max( #rows, #columns)
while G.rowSize < lb && G.colSize < lb
   G.addRow
   G.addCol
if G can add more rows
   while G.colSize < ub
      G.addRow
else (G can add more columns)
   while G.rowSize < ub
      G.addCol

In order to perform calibration, the set of calibration points must be arranged so that the points are in one-to-one correspondence with an abstract model of the calibration target. In order to fit the points to a grid, characteristics of their arrangement must be recovered. The grid will be roughly rectangular (distortion and skew affect the projection), and as such there should be a general spacing between rows and between columns. For lack of better terms, these spacings will be referred to as the vertical and horizontal steps respectively. To recover these general directions, the set of points is analyzed for the displacements between neighboring points. In this context, two points are considered neighbors if no third point in the set is closer to both of them than they are to each other. These displacements--recorded as a rise and run (the change in y and x), with the sign flipped when necessary so that the rise is non-negative--are collected into an array sorted primarily by rise and secondarily by run. The general horizontal and vertical steps are then taken from the first and third quartiles of the array (the entries at indices 1/4 and 3/4 of the array's length).

Once the general directions have been recovered, the set of points is searched for a 3x3 block whose entries are separated by the horizontal and vertical steps. After the initial 3x3 block is found, rows and columns are added one at a time until the desired dimensions are reached. Proposed points for the side being added are predicted from the nearest existing point, the difference between that point and its parent, and the rate of change of that difference. Proposed points are then matched against the set of detected points or refined via the quadratic surface fitting described in the refinement section.
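
A sketch of the step-recovery portion just described (neighbor test plus quartile lookup), under the same NumPy assumptions as the earlier sketches:

import numpy as np

def estimate_steps(points):
    # Estimate the general horizontal and vertical steps from the rise/run of
    # neighboring pairs (first and third quartiles, as described above)
    S = np.asarray(points, dtype=float)
    D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=2)
    steps = []
    for a in range(len(S)):
        for b in range(a + 1, len(S)):
            d = D[a, b]
            # a and b are neighbors if no third point is closer to both of them
            others = [c for c in range(len(S)) if c not in (a, b)]
            if any(D[a, c] < d and D[b, c] < d for c in others):
                continue
            rise, run = S[a, 1] - S[b, 1], S[a, 0] - S[b, 0]
            if rise < 0:
                rise, run = -rise, -run              # normalize so the rise is non-negative
            steps.append((rise, run))
    steps.sort()                                     # sort by rise, then run
    horiz = steps[len(steps) // 4]                   # first quartile  -> horizontal step
    vert = steps[3 * len(steps) // 4]                # third quartile  -> vertical step
    # Each step is a (rise, run) pair; use it as an (x, y) offset of (run, rise)
    return horiz, vert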


Calibration

Calibration is performed as described by Zhengyou Zhang. Since the methodology is fully described in his paper, I will only provide a brief overview and note a few points of numerical importance.

Before performing calibration, the model and image points need to be normalized for numerical stability. Model points should be offset and scaled such that the centroid of the model is at the origin and the average magnitude of the coordinates is 1. The image points need to be offset so that the center of the image is at the origin and, assuming a nearly square picture, scaled by half the length of the smallest side.
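
For example (a sketch; the scale and offset conventions here are one reasonable reading of the description above and must match whatever un-normalization is applied later):

import numpy as np

def normalize_model_points(model):
    # Shift the centroid to the origin and scale so the average point magnitude is 1
    M = np.asarray(model, dtype=float)
    M = M - M.mean(axis=0)
    s = np.mean(np.linalg.norm(M, axis=1))
    return M / s, s                        # keep s so the translation vectors can be rescaled later

def normalize_image_points(image_pts, width, height):
    # Shift so the image center is at the origin, scale by half the smallest side
    P = np.asarray(image_pts, dtype=float)
    offset = np.array([width / 2.0, height / 2.0])
    s = min(width, height) / 2.0
    return (P - offset) / s, offset, s     # offset and s are needed to correct A afterwards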


Creating the homography H for a set of points


Let Mi, mi represent a model point and an image point respectively,
written in homogeneous coordinates with mi = (u, v, 1).

Let H be a 3 by 3 matrix such that

s*mi = H*Mi
for some scalar s

Let H = [ h1 h2 h3 ], where h1, h2, h3 are the columns of H
Let x be the 9-vector of the entries of H taken row by row

then for each correspondence (Mi, mi)
[ Mi^T   0^T   -u*Mi^T ]
[ 0^T    Mi^T  -v*Mi^T ] x = 0

for the n points in a set, stack the n equations to form
Lx = 0
where L is a 2n by 9 matrix
x, defined up to a scale factor, is the right singular vector of L
associated with the smallest singular value

Construct H from x and optimize H with the LM algorithm.
Since H is only defined up to a scalar, it is numerically advantageous to scale the entries of H so that their average magnitude is 1.
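
A sketch of the direct linear solution with NumPy (the LM refinement of H is omitted; model points are the planar target coordinates, image points the detected corners):

import numpy as np

def homography_dlt(model, image):
    # Estimate H (up to scale) such that s*m_i = H*M_i, via the stacked 2n x 9 system
    M = np.asarray(model, dtype=float)
    m = np.asarray(image, dtype=float)
    rows = []
    for (X, Y), (u, v) in zip(M, m):
        Mh = [X, Y, 1.0]                                 # model point in homogeneous coordinates
        rows.append(Mh + [0, 0, 0] + [-u * c for c in Mh])
        rows.append([0, 0, 0] + Mh + [-v * c for c in Mh])
    L = np.array(rows)                                   # 2n x 9
    _, _, Vt = np.linalg.svd(L)
    x = Vt[-1]                                           # right singular vector of the smallest singular value
    H = x.reshape(3, 3)                                  # entries of H in row order
    return H / np.mean(np.abs(H))                        # rescale so the average magnitude is 1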

Extracting Camera Parameters


Let B = (A*A^T)^-1, with Bij denoting the (i,j) element of B
Let b = [ B11, B12, B22, B13, B23, B33 ]^T
Let the ith column vector of H be hi = [ hi1, hi2, hi3 ]^T

Let vij = [ hi1*hj1, hi1*hj2 + hi2*hj1, hi2*hj2, hi3*hj1 + hi1*hj3, hi3*hj2 + hi2*hj3, hi3*hj3 ]^T

for each homography
[ v12^T         ]
[ (v11 - v22)^T ] b = 0


with n homographies (from n pictures), stack the two equations for each to form
V b = 0
where V is a 2n by 6 matrix;
as before, b is the right singular vector of V associated with the smallest singular value

Create B from b, calculate B^-1, and extract the calibration matrix A via the equations below.
Alternatively, Zhang uses another method to extract the variables--see his paper for details.
B^-1 = s*A*A^T

      [ alpha  skew  u0 ]   [ alpha   0     0 ]
 = s* [   0    beta  v0 ] * [ skew   beta   0 ]
      [   0     0     1 ]   [  u0     v0    1 ]

      [ (alpha^2 + skew^2 + u0^2)   (skew*beta + u0*v0)   (u0) ]
 = s* [ (skew*beta + u0*v0)         (beta^2 + v0^2)       (v0) ]
      [ (u0)                        (v0)                  (1)  ]

It should be noted that, while (B^-1)/s should theoretically be a positive definite matrix, thereby enabling the extraction of A via Cholesky decomposition, in practice, due to finite precision arithmetic and least squares approximations, this is not necessarily the case. Thus, in extracting A, one may have to take liberties with some of the values to avoid complex numbers. Since these values will be optimized in subsequent steps, errors in calculating A can be mitigated.
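
One way to carry this out is to solve the entries of the product above directly, clamping the arguments of the square roots to stay real (a sketch; Zhang's closed-form expressions in terms of B itself are an equivalent alternative):

import numpy as np

def intrinsics_from_B(B, eps=1e-12):
    # Recover A from B = (A*A^T)^-1 by reading the entries of B^-1 = s*A*A^T.
    # The max(..., eps) guards mirror the caution above: with noisy data the
    # matrix may not be exactly positive definite, so small negatives are clamped.
    C = np.linalg.inv(B)
    s = C[2, 2]
    u0 = C[0, 2] / s
    v0 = C[1, 2] / s
    beta = np.sqrt(max(C[1, 1] / s - v0**2, eps))
    skew = (C[0, 1] / s - u0 * v0) / beta
    alpha = np.sqrt(max(C[0, 0] / s - skew**2 - u0**2, eps))
    return np.array([[alpha, skew, u0],
                     [0.0,   beta, v0],
                     [0.0,   0.0,  1.0]])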


Extracting Extrinsic Parameters

Given A, the rotation matrix and translation vector for each image can be calculated as follows.
for each homography H with columns h1, h2, h3
lambda = 1/||A^-1*h1|| = 1/||A^-1*h2||
r1 = lambda*A^-1*h1
r2 = lambda*A^-1*h2
r3 = r1 x r2
t = lambda*A^-1*h3
However, the matrix assembled from these vectors may not fulfill the constraints of a rotation matrix.
Let Q = [ r1 r2 r3 ]
Let the singular value decomposition of Q be U*S*V^T
Then the nearest rotation matrix is R = U*V^T
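
A sketch of this step with NumPy (h1, h2, h3 are the columns of H as above):

import numpy as np

def extrinsics_from_homography(A, H):
    # Recover the rotation matrix and translation vector for one image
    A_inv = np.linalg.inv(A)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    lam = 1.0 / np.linalg.norm(A_inv @ h1)     # could also average with 1/||A^-1*h2||
    r1 = lam * (A_inv @ h1)
    r2 = lam * (A_inv @ h2)
    r3 = np.cross(r1, r2)
    t = lam * (A_inv @ h3)
    Q = np.column_stack([r1, r2, r3])
    U, _, Vt = np.linalg.svd(Q)                # snap Q to the nearest rotation matrix
    R = U @ Vt
    return R, t
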
Restoring Image and Model Points

At this point, the model and image points need to be restored to their unnormalized values. Additionally, the camera parameters and translation vectors need to be updated to reflect the change in coordinates.


To compensate for the scaled and offset image points, the camera parameters need to be modified.
Let s be the original scalar by which the points were scaled
Let x_offset, y_offset be the x and y offsets by which the points were originally translated to put the image center at the origin.
Let A' be the corrected camera matrix

     [ alpha/s   skew/s   u0/s - x_offset ]
A' = [ 0         beta/s   v0/s - y_offset ]
     [ 0         0        1               ]

To compensate for the scaled model points, the translation vector for each image needs to be scaled.
Let s' be the scalar by which the original points were scaled.
Let t be the previous translation vector, t' be the corrected vector.
t' = (1/s')*t

Optimization and Distortion Calculation

Once the points have been restored from their normalized values and the camera parameters and translation vectors have been adjusted, all the variables are optimized together via the LM algorithm. If n images are used, there are 5+6n variables: the 5 camera parameters, plus, for each image, three variables representing the translation vector and three representing the rotation matrix encoded according to the Rodrigues formula (angle-axis representation).
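
If, for instance, SciPy were used for the Levenberg-Marquardt step (scipy.optimize.least_squares with method='lm'), the parameter vector might be packed and unpacked along these lines (a sketch; the residual function, not shown, would reproject the model points through A, Ri, ti and compare against the detected corners):

import numpy as np
from scipy.spatial.transform import Rotation

def pack_params(A, rotations, translations):
    # 5 intrinsics, then per image a 3-vector rotation (angle-axis) and a translation
    p = [A[0, 0], A[0, 1], A[0, 2], A[1, 1], A[1, 2]]    # alpha, skew, u0, beta, v0
    for R, t in zip(rotations, translations):
        p.extend(Rotation.from_matrix(R).as_rotvec())     # Rodrigues / angle-axis encoding
        p.extend(t)
    return np.array(p)

def unpack_params(p, n_images):
    alpha, skew, u0, beta, v0 = p[:5]
    A = np.array([[alpha, skew, u0], [0, beta, v0], [0, 0, 1.0]])
    rotations, translations = [], []
    for i in range(n_images):
        block = p[5 + 6 * i: 5 + 6 * (i + 1)]
        rotations.append(Rotation.from_rotvec(block[:3]).as_matrix())
        translations.append(block[3:])
    return A, rotations, translations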

Once these values are optimized, the variables are optimized again with barrel distortion taken into account. If n images are used, there are then 7+6n variables, the two additional variables describing the distortion.
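
Assuming the two additional variables are the radial distortion coefficients k1 and k2 of Zhang's model, the distorted projection used in this second optimization could be sketched as:

import numpy as np

def project_with_distortion(A, R, t, model_pts, k1, k2):
    # Project planar model points and apply two-term radial (barrel) distortion
    # before mapping to pixel coordinates with A
    M = np.asarray(model_pts, dtype=float)
    ones = np.ones((len(M), 1))
    # Planar model: points (X, Y) map into the camera frame via [r1 r2 t]
    P = np.hstack([M, ones]) @ np.column_stack([R[:, 0], R[:, 1], t]).T
    x, y = P[:, 0] / P[:, 2], P[:, 1] / P[:, 2]           # ideal normalized image coordinates
    r2 = x**2 + y**2
    factor = 1 + k1 * r2 + k2 * r2**2                     # radial distortion factor
    xd, yd = x * factor, y * factor
    pix = (A @ np.vstack([xd, yd, np.ones_like(xd)])).T   # to pixel coordinates
    return pix[:, :2]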