REALTIME LOCALIZATION OF A CENTRAL CATADIOPTRIC
CAMERA USING VERTICAL LINES
Bertrand Vandeportaele, Michel Cattoen, Philippe Marthon
IRIT & LEN7
Enseeiht, 2 rue Camichel, 31071 Toulouse, France
Keywords:
Omnidirectional vision, localization, orientation, lines.
Abstract:
Catadioptric sensors are used in mobile robot localization because of their panoramic field of view. However, most existing systems require a constant camera orientation and planar motion, so localization cannot in general be achieved for a person carrying the camera. In this paper, we use the images of the vertical lines of an indoor environment to localize, in realtime, the orientation and 2D position of a central catadioptric camera. The pose detection is done in two steps. First, an absolute rotation around two axes is computed to bring the vertical line images into vertical position on the viewing sphere. Then the 2D pose is estimated using a 2D map of the site.
1 INTRODUCTION
Our goal is to localize in realtime a central catadioptric camera held by a person inside a known building. Benosman and Kang gave a detailed description of these cameras in (Benosman 2001). They acquire panoramic images instantaneously and have a single viewpoint. They are frequently used in robotic applications, where the 360° field of view is useful for image-based localization and 3D reconstruction. Image-based localization methods can be divided into two categories:
A: methods requiring a priori knowledge about the place where the camera is to be localized, for example a database of appearance images corresponding to different locations (Pajdla 2001). These methods require the acquisition of many images and constrain the orientation of the sensor. Moreover, the robust realtime matching of occluded images is a relatively complex problem, even if the panoramic field of view makes it easier than in the perspective case.
B: methods using Simultaneous Localization And Mapping (Shakernia 2003). They do not require a database to be acquired before localization. Invariant points of the scene are detected in many images acquired at different locations, using the Scale Invariant Feature Transform, the KLT tracker or the Harris detector, and Structure From Motion is then performed in order to estimate the position of the camera in the reconstructed scene.
We propose a new method to achieve the localization of a central catadioptric camera using the images of vertical lines. We use this method to localize persons. In this application, the required accuracy is moderate (0.3 m is sufficient), but the sensor can be held in different orientations and many occlusions can occur.
First, the vertical lines provide information about the absolute orientation of the sensor around two axes thanks to their common vanishing points. Second, they can be used for localization with a simple 2D map because they are invariant to translations along the vertical axis. Vertical lines are numerous inside buildings and difficult to occlude completely; a method based on vertical lines therefore particularly fits our application.
Our method allows realtime matching with varying orientation around every axis. The a priori knowledge it uses is a 2D map, which is easy to acquire; the camera to be localized therefore does not have to acquire images continuously from a known starting position. Moreover, vertical line images are well detected under varying poses, which compensates for the fact that they can be less numerous than corners in the image.
In this paper, we first give a method to retrieve the two-axis orientation using the vertical line images acquired with a calibrated camera. Then we show
how to determine the 2D pose using a simple 2D map
of the scene and present some experimental results.
2 THE VERTICAL LINES
IMAGES DETECTION
The catadioptric camera is calibrated using the Geyer
& Daniilidis method (Geyer 1999) in order to know
the correspondence between the viewing sphere and
the image. The detection of a central catadioptric line
image di can be accomplished by finding the orien-
tation of the plane π
i
containing dli (the line image
di lifted on the viewing sphere) and the viewpoint
F as shown on fig. 1. Ying & Hu (Ying 2004) and
Vasseur & Mouaddib (Vasseur 2004) proposed robust
methods to detect the line images using a two parame-
ters Hough transform. For our experiments, we use
our own realtime line image detector which is based
on robust least squares fitting of planes on the image
points projected on the mirror surface. It provides the
same kind of results than the Hough based methods
but with increased speed.
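For illustration, the principle of this fitting step can be sketched as follows (Python/NumPy; the function names are ours, and the robust weighting and realtime optimizations of our actual detector are omitted). Since the plane passes through the viewpoint F, its normal is the right singular vector associated with the smallest singular value of the matrix of lifted points.

import numpy as np

def fit_great_circle_plane(sphere_pts):
    """Least-squares normal of the plane through the viewpoint F (origin)
    that best contains the lifted image points of one candidate line image.

    sphere_pts: (N, 3) array of unit vectors on the viewing sphere.
    Returns the unit normal n_i of the plane pi_i.
    """
    # The plane passes through the origin, so we minimize sum_k (n . p_k)^2
    # subject to ||n|| = 1: n is the right singular vector associated with
    # the smallest singular value of the point matrix.
    _, _, vt = np.linalg.svd(sphere_pts, full_matrices=False)
    n = vt[-1]
    return n / np.linalg.norm(n)

def plane_fit_residuals(sphere_pts, n):
    """Angular distance of each lifted point to the great circle, usable for
    robust reweighting and for counting the inlier pixels (score sc_i)."""
    return np.abs(np.pi / 2 - np.arccos(np.clip(sphere_pts @ n, -1.0, 1.0)))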
Figure 1: Central catadioptric projection of parallel lines to portions of great circles on the viewing sphere, and the paracatadioptric projection on the image.
The parallel lines can be detected from the image by finding the two vanishing points located at the intersection of the parallel line images, which are conics (Geyer 2001). In this case, the detection of vanishing points is difficult because the conic parameters are not accurately estimated from small conic sections on noisy data, and thus the parallel line images do not intersect precisely at the same points (see fig. 6.g for an example).
We propose a method to gather the potentially parallel lines in space, based on the detection of the line D_v which is common to the set of planes π_i defined by parallel lines D_i in space: D_v defines a bundle of planes. In practice, the detected planes π_i do not intersect exactly in D_v. Nevertheless, our method allows the gathering using a criterion based on an angular measurement.
The intersection of two planes π_i and π_j with respective normals n_i and n_j is a line in the direction h(π_i, π_j) passing through F. We have chosen to use g(π_i, π_j), a normalized vector in the top hemisphere, defined by:

g(π_i, π_j) = (n_i × n_j) / ‖n_i × n_j‖ · sign((n_i × n_j)_z).
We give a criterion which measures how close to a single line three planes π_i, π_j and π_k intersect. It uses the angular measurement ω(π_i, π_j, π_k) between g(π_i, π_j) and g(π_i, π_k):

ω(π_i, π_j, π_k) = acos( g(π_i, π_j) · g(π_i, π_k) ).
To deal with planes whose normal is near the axis z, a "close" intersection between three planes π_i, π_j, π_k is detected in the following two cases: ω(π_i, π_j, π_k) ≤ ω_thres or ω(π_i, π_j, π_k) ≥ π − ω_thres.
The parameter ω_thres is an angular threshold and has to be set in accordance with the accuracy of the sensor. In figure 2, a small angle between g(π_1, π_2), g(π_1, π_3) and g(π_1, π_4) indicates that π_1, π_2, π_3 and π_4 almost define a bundle of planes, so we can conclude that D_1, D_2, D_3 and D_4 are potentially parallel in space.
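A minimal sketch of this gathering criterion is given below (Python/NumPy, with helper names of our own choosing); it assumes the unit normals n_i of the detected planes are available and that ω_thres is given in radians.

import numpy as np

def g_vec(n_i, n_j):
    """Direction of the intersection line of planes pi_i and pi_j,
    normalized and flipped into the top hemisphere (positive z component)."""
    h = np.cross(n_i, n_j)
    h = h / np.linalg.norm(h)
    if h[2] < 0:
        h = -h
    return h

def omega(n_i, n_j, n_k):
    """Angle between the intersection directions g(pi_i,pi_j) and g(pi_i,pi_k)."""
    c = np.clip(np.dot(g_vec(n_i, n_j), g_vec(n_i, n_k)), -1.0, 1.0)
    return np.arccos(c)

def close_intersection(n_i, n_j, n_k, omega_thres=np.radians(4.0)):
    """True when the three planes nearly intersect in a single line D_v
    (angles near 0 or near pi both indicate a near-common direction)."""
    w = omega(n_i, n_j, n_k)
    return w <= omega_thres or w >= np.pi - omega_thres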
Figure 2: Angles between planes intersections.
A score is given to each gathered solution by summing the scores sc_i corresponding to the planes π_i inside the bundle. The score sc_i is computed for each line image by counting the number of corresponding pixels. The gathered solution with the best score thus fits the greatest number of line points and corresponds to the main orientation inside the scene. This method is robust to erroneous detections of vertical lines because the normals of such lines do not fit inside the threshold and thus are not taken into account.
Once the planes have been gathered, a more precise intersection is computed using the iterative Levenberg-Marquardt (LM) algorithm. Let α and β be respectively the azimuth and elevation of the line P_(α,β) corresponding to the best intersection. Its 3D vector coordinates are P_3D(α,β) = [P_x, P_y, P_z] with P_x = cos(α)·cos(β), P_y = sin(α)·cos(β) and P_z = sin(β). The intersection of two random planes from the gathered set is used as an initialization value. The different planes are weighted by their scores sc_i and P_3D(α,β) is computed to be the line most perpendicular to all the weighted normalized plane normals n_i, by minimizing the following criterion:

C(α, β) = Σ_i sc_i ( acos( P_3D(α,β) · n_i ) − π/2 )²   (1)
(α̂, β̂) = argmin_(α,β) C(α, β)   (2)
As the vertical lines are often the most numerous inside buildings, this method is used to detect their relative orientation and thus the sensor's orientation around two axes. To increase the robustness of the vertical line detection, we can restrict the potential orientations by asking the user to hold the camera within an interval of angular values (±45°). It is thus easy to discard the candidate orientations that cannot be vertical. The two-axis orientation of the sensor is obtained by computing the 3D rotation R(α̂, β̂) that brings P_3D(α̂,β̂) to a vertical line.
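The refinement and the construction of R(α̂, β̂) can be sketched as follows (Python, using SciPy's Levenberg-Marquardt solver as a stand-in for our LM implementation and the Rodrigues formula for the rotation; the function names are illustrative).

import numpy as np
from scipy.optimize import least_squares

def p3d(alpha, beta):
    """Unit direction of the candidate common line from azimuth/elevation."""
    return np.array([np.cos(alpha) * np.cos(beta),
                     np.sin(alpha) * np.cos(beta),
                     np.sin(beta)])

def refine_bundle_direction(normals, scores, alpha0, beta0):
    """LM refinement of (alpha, beta): the line should be as perpendicular as
    possible to every weighted plane normal (criterion (1))."""
    normals = np.asarray(normals, dtype=float)
    w = np.sqrt(np.asarray(scores, dtype=float))

    def residuals(x):
        d = p3d(x[0], x[1])
        ang = np.arccos(np.clip(normals @ d, -1.0, 1.0))
        return w * (ang - np.pi / 2)

    res = least_squares(residuals, x0=[alpha0, beta0], method='lm')
    return res.x  # (alpha_hat, beta_hat)

def rotation_to_vertical(alpha, beta):
    """Rotation that maps P_3D(alpha, beta) onto the vertical axis z
    (Rodrigues formula about the axis d x z)."""
    d = p3d(alpha, beta)
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(d, z)
    s, c = np.linalg.norm(v), np.dot(d, z)
    if s < 1e-12:                       # already (anti-)vertical
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]],
                   [v[2], 0, -v[0]],
                   [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx * ((1 - c) / s**2)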
3 THE 2D POSE DETECTION
Once the rotation R(α̂, β̂) has been detected, the localization can be achieved in 2D. The altitude is not estimated, as it can vary between different users and is not needed in our application, where only an approximate localization of the camera inside the 2D map is required. For buildings with multiple floors, a single 2D map can be created by placing the maps of the different floors at different locations on the same plane. A 2D map contains the positions of vertical lines (x_i, y_i) and the occlusive segments joining these points.
In the noiseless and non-degenerate cases (no more than two map points and the camera position lie on the same line), 3 points are sufficient to localize the camera. However, due to the limited accuracy of the vertical line detector, more points should be used to achieve an accurate localization.
Figure 3 shows an example of a simple 2D map made of 8 points and 8 occlusive segments.
Figure 3: The 2D map (in black), inlier line images (long red lines with circles) and outliers (short red lines with crosses) after localization (the sensor is located at the intersection of all the red lines). The scale is in meters.
Each plane π_i corresponding to a detected vertical line is defined by a value γ_i: γ_i is the angle around the vertical axis corresponding to the intersection of the plane π_i with the horizontal plane. The plane π_i is not sufficient to know on which side of the viewing sphere a vertical line D_i lies. Let p_i be the image points corresponding to the line image d_i. These points are lifted onto the viewing sphere in P_i and then rotated to RP_i = R(α̂, β̂)·P_i in order to align the viewing sphere with the vertical. As the p_i correspond to a vertical line of the scene, the RP_i lie on the same meridian of the viewing sphere. We use the center of mass of the RP_i to know on which side of the sphere the line is located, and thus can compute the corresponding angular value γ_i in the range [0, 2π[. If the RP_i lie on both sides of the sphere, then we detect a line on each side of the camera. A descriptor of a location is defined by a vector of different γ_i.
Let us first consider that the correspondences between every γ_i and 2D scene point i (corresponding to a vertical line located at (x_i, y_i)) are known. Let the pose be defined by three parameters: x_c and y_c are the 2D position of the camera in the map and γ_c is its rotation.
Let γ_a be the angular direction corresponding to the point i in the map viewed from the pose (x_c, y_c, γ_c):

γ_a = atan2(y_i − y_c, x_i − x_c) − γ_c   (3)
The correct pose best fits the different angular values γ_a for every γ_i. The angular error has to be expressed in the interval [−π, π[ in order to be near zero for small deviations in both directions. Finally, E(i, x_c, y_c, γ_c) corresponds for each point i to the squared angular difference:

E(i, x_c, y_c, γ_c) = ( |γ_i − γ_a|_[−π,π[ )²   (4)
The pose is then estimated using Levenberg-Marquardt to minimize the following criterion:

(x̂_c, ŷ_c, γ̂_c) = argmin_(x_c, y_c, γ_c) Σ_i V_i sc_i E(i, x_c, y_c, γ_c)   (5)

where V_i is equal to 1 if the point i is not occluded by segments of the map from (x_c, y_c), and 0 otherwise. sc_i is the score corresponding to the plane π_i, as described in the previous section.
In practice, all the V_i are equal to 1 if the correspondences are all correct, because only the non-occluded lines are detected in the image. When occlusions and noise do not have to be dealt with and a sufficient number of points (> 4) is used, convergence is generally obtained for any initialization value.
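Under these assumptions (known correspondences, all V_i = 1), the minimization of equation (5) can be sketched as follows (Python, with SciPy's Levenberg-Marquardt solver as a stand-in and helper names of our own choosing).

import numpy as np
from scipy.optimize import least_squares

def wrap_pi(a):
    """Wrap an angle (or array of angles) into [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def estimate_pose_2d(map_pts, gammas, scores, x0):
    """Estimate (x_c, y_c, gamma_c) from matched map points (x_i, y_i) and
    measured directions gamma_i, minimizing criterion (5) with LM.
    Assumes every matched point is visible (V_i = 1)."""
    map_pts = np.asarray(map_pts, dtype=float)     # (N, 2)
    gammas = np.asarray(gammas, dtype=float)       # (N,)
    w = np.sqrt(np.asarray(scores, dtype=float))

    def residuals(p):
        xc, yc, gc = p
        gamma_a = np.arctan2(map_pts[:, 1] - yc, map_pts[:, 0] - xc) - gc  # eq. (3)
        return w * wrap_pi(gammas - gamma_a)                               # eq. (4)

    res = least_squares(residuals, x0=x0, method='lm')
    return res.x  # (x_c_hat, y_c_hat, gamma_c_hat)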
Let us now consider the more complex problem of detecting inlier and outlier γ_i and finding the correspondences with points of the map. A simple RANSAC scheme would result in a long processing time, even if only a few of the possible correspondences have to be tested, as at least triplets (3 correspondences) of couples (one map point and one γ_i) are needed to compute a pose P_R.
In order to reduce the computational complexity, we propose to compute a Delaunay triangulation of the 2D map (as shown in fig. 4). We then use the three corners of each triangle as a triplet of points to match, as they are never occluded (by map segments) from inside the triangle. The correspondences between the triangle corner points i and the γ_i have to respect the order constraint. If there are N map points, M values γ_i and P Delaunay triangles, the number of possible solutions to test is reduced from (N·M)³ to less than P·M³, as there are fewer than M³ combinations of correspondences verifying the order constraint.
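A possible sketch of this enumeration is given below (Python, using scipy.spatial.Delaunay; the order constraint is checked here with a simple cyclic-order test and the helper names are ours).

import numpy as np
from itertools import permutations
from scipy.spatial import Delaunay

def in_ccw_order(ga, gb, gc):
    """True if the directions ga -> gb -> gc appear in counter-clockwise
    cyclic order (angles in radians)."""
    two_pi = 2.0 * np.pi
    return ((gb - ga) % two_pi) + ((gc - gb) % two_pi) < two_pi

def candidate_triplets(map_pts, gammas):
    """For every Delaunay triangle of the 2D map, enumerate the ordered
    triplets of measured directions gamma_i that respect the order
    constraint; far fewer than the (N*M)^3 brute-force combinations."""
    pts = np.asarray(map_pts, dtype=float)
    tri = Delaunay(pts)
    for simplex in tri.simplices:
        i, j, k = simplex
        # take the triangle corners in counter-clockwise order
        area2 = ((pts[j, 0] - pts[i, 0]) * (pts[k, 1] - pts[i, 1])
                 - (pts[j, 1] - pts[i, 1]) * (pts[k, 0] - pts[i, 0]))
        if area2 < 0:
            j, k = k, j
        for a, b, c in permutations(range(len(gammas)), 3):
            # seen from inside the triangle, the corners also appear in CCW order
            if in_ccw_order(gammas[a], gammas[b], gammas[c]):
                yield (int(i), int(j), int(k)), (a, b, c)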
Figure 4: The Delaunay triangulation of the 2D map and the corresponding initialization values (red crosses).
We now show how to quickly discard an invalid triplet of correspondences. Two points i and j and a relative angle γ_i − γ_j define a circle of potential locations for (x_c, y_c), as shown on fig. 5. The third point k should lie on a line related to the direction (x_i, y_i) − (x_c, y_c) by the angle γ_k − γ_i, and to the direction (x_j, y_j) − (x_c, y_c) by γ_j − γ_k. So, if the two extremum positions are considered for the camera position (x_c, y_c), then (x_k, y_k) should lie inside the zone between the two lines L_i and L_j defined by the tangents to the circle at point i (resp. j) and the angles γ_k − γ_i (resp. γ_j − γ_k). Thus, if (x_k, y_k) is inside the circle or outside the zone delimited by L_i and L_j, the triplet is discarded.
Figure 5: Fast detection of valid triplets (i, j, k).
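The circle of potential locations follows from the inscribed-angle theorem; the sketch below (Python/NumPy, illustrative helper) computes its center and radius from two map points and the relative angle, the side of the arc depending on the sign convention chosen for the angle.

import numpy as np

def camera_circle(p_i, p_j, delta):
    """Circle of camera positions from which the segment (p_i, p_j) is seen
    under the relative angle delta (inscribed-angle theorem). Returns
    (center, radius); the side of the arc depends on the sign convention
    for delta, which must not be a multiple of pi."""
    p_i = np.asarray(p_i, dtype=float)
    p_j = np.asarray(p_j, dtype=float)
    chord = p_j - p_i
    length = np.linalg.norm(chord)
    mid = 0.5 * (p_i + p_j)
    # unit normal to the chord: the center lies on the perpendicular bisector
    n = np.array([-chord[1], chord[0]]) / length
    radius = length / (2.0 * abs(np.sin(delta)))
    offset = length / (2.0 * np.tan(delta))
    return mid + offset * n, radius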
The non-linear optimization of Eq. 5 is applied to the retained triplets of points in order to estimate the pose. The center of the Delaunay triangle is used as initial value (fig. 4). The process is applied to several different triplets, and the pose P_R which fits the greatest number of γ_i within an angular threshold thres_γ and without occlusion due to occluding segments is kept.
Once the best pose P_R has been computed from three points, the optimization is applied to all the detected inliers using the pose P_R as initialization value.
Figure 3 shows the localization of a pose in the 2D map. The inliers are shown as long red lines ending with a circle and the outliers as short red lines ending with a cross.
Once the camera has been localized, it is easy to track it inside the map if three line images are visible in two successive images. Each vertical line projection γ_i is tracked individually to keep the correspondence information, and the new pose is estimated using the last pose as initialization value. Some correspondences are removed and others are added during the process.
4 RESULTS
In our experiments, we use 568×426 pixel images, a paracatadioptric sensor and a Pentium IV based computer running at 1.8 GHz. Figure 1 shows parallel lines imaged by this sensor. In a perfect image plane, with zero skew and square pixels, these projections are circles having two common points (the projections of the vanishing points p_v1 and p_v2).
In order to validate the two-axis orientation detection, the camera is mounted on a rotating table moved by a high-accuracy step motor (0.03°/step). We apply the method to images of a scene containing two main sets of parallel lines disposed on the 3D pattern shown in figure 6.a, the parameter ω_thres being set to 4°. In this example, 45 line images have been detected. The best set of potentially parallel lines contains 23 lines and is shown in figure 6.b. Figure 6.e shows the normals of the planes π_i and the corresponding computed orientation (α, β) in 3D. Figures 6.f and 6.g show the two estimated vanishing points on the paracatadioptric image plane. The orientation of the sensor is closely related to the relative position of the vanishing points around the projection of the viewpoint.
We apply the method for different rotations and plot the orientation error as a function of the real orientation in figure 6.c. 10 tests are done on different images for 9 different values of rotation, from −60° to 60° with a 15° step. The plot shows the maximum, minimum and mean error. Figure 6.d shows the result for the other axis, obtained by rotating the 3D pattern by π/2. These curves show that the estimation is not biased and that the absolute maximum deviation is less than 2°.

Figure 6: (a) An image of the 3D pattern. (b) Detected line images. (c) Maximum, mean and minimum error in degrees versus ground truth orientation. (d) Same plot as (c) around the other axis of rotation. (e) The normalized normals to the different planes of the bundle and the retrieved orientation (orthogonal line). (f) One computed vanishing point (big cross) and the projection of the sphere center on the image plane (small cross). (g) The second vanishing point (big cross).
Once the orientation around the two axes has been determined, a rectification of the image is performed for validation purposes, in order to obtain an image from the same viewpoint but with a fixed sensor orientation. Figure 7.a shows an image acquired by our camera projected onto a cylinder whose axis is the revolution axis of the paraboloidal mirror. Figure 7.b shows the same image projected onto a vertical cylinder whose relative orientation is estimated using the vertical line projections. The images of the vertical lines of the scene are approximately vertical in the cylindrical projection, as the orientation of the sensor has been well estimated. A completely rotation-invariant image can be generated by shifting the cylindrical image by the detected orientation γ̂_c.
Figure 7: The cylindrical projection of the original (a) and rectified (b) image and some of the detected vertical lines.
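For illustration, such a rectified cylindrical image can be generated as sketched below (Python/NumPy; project_to_image stands for the calibrated sphere-to-image projection of the sensor and is assumed to be available, and nearest-neighbour sampling is used for brevity).

import numpy as np

def rectified_cylinder(image, project_to_image, R, width=720, height=240,
                       cyl_half_height=0.5):
    """Resample a catadioptric image onto a vertical cylinder of unit radius
    after applying the estimated rotation R.
    project_to_image(ray) -> (x, y) is the calibrated sphere-to-image
    projection of the sensor and is assumed to be given."""
    pano = np.zeros((height, width) + image.shape[2:], dtype=image.dtype)
    for v in range(height):
        h = cyl_half_height * (1.0 - 2.0 * v / height)   # height on the cylinder
        for u in range(width):
            theta = 2.0 * np.pi * u / width              # azimuth of the column
            ray = np.array([np.cos(theta), np.sin(theta), h])
            ray /= np.linalg.norm(ray)
            x, y = project_to_image(R.T @ ray)           # back to the sensor frame
            xi, yi = int(round(x)), int(round(y))
            if 0 <= yi < image.shape[0] and 0 <= xi < image.shape[1]:
                pano[v, u] = image[yi, xi]
    return pano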
We now validate the 2D pose detection using the synthetic map of fig. 3. 100 random poses are generated inside this map; 6 random inliers are kept while 5 random outliers are added. The first two sub-rows of table 1 show the mean and max absolute deviation of the pose parameters without noise. In the first part of the table, the γ_i are noised. For a random angular maximum deviation of 2° (which corresponds to the accuracy of our vertical line detector), the max pose error is less than 10 cm and 2.3°. The second part of the table shows the influence of errors in the 2D map (positions of the vertical lines) using a 2° deviation for the γ_i. Errors of about 20 cm do not degrade the accuracy of the pose estimation very much. The last column shows how many poses have been completely erroneously estimated because of a bad choice of inliers. These false matches are not taken into account in the computation of the deviations.
Table 1: Varying deviations of γ_i or (x_i, y_i) over 100 tests and their influence on the absolute deviation of the pose estimation parameters (sub-row 1 = mean, sub-row 2 = max). The right column shows the number of images that have been erroneously localized.

Δγ_i      x_c (m)   y_c (m)   γ_c (°)   bad
0°        0.0007    0.0060    0.0917    0
          0.0677    0.0597    1.7762
2°        0.0110    0.0127    1.0485    1
          0.0523    0.0998    2.3090
          0.0294    0.0305    2.4981    3
          0.1743    0.1163    4.2628
10°       0.0501    0.0647    4.8415    4
          0.1820    0.2830    7.3625

Δ(x_i, y_i)  x_c (m)   y_c (m)   γ_c (°)   bad
0.2 m        0.0467    0.0516    1.1115    2
             0.1895    0.1645    3.6326
0.5 m        0.1321    0.1455    1.6100    4
             0.7256    0.4124    7.7750
The method is now used to localize a real camera inside three different rooms (approximate surface: 60 m²) containing 63 main vertical lines (wall corners, window and door borders, racks and desks), whose positions are measured by hand. Fig. 7.b shows a part of the indoor environment used in this experiment. We processed 40 images randomly selected from a sequence of 150 frames acquired at 15 fps. 32 images were correctly localized, the position of the camera being estimated within a 20 cm tolerance (we use the tiled floor to localize the camera approximately as ground truth). γ_c is compared with the angle given by an electronic compass whose accuracy is about 2°. The maximum detected orientation error for γ_c in the 32 images is less than 4°. The 8 images which have been erroneously localized can geometrically correspond to different locations due to outliers. However, when we use the complete sequence to track the pose and the correspondences between vertical lines and γ_i, all the 150 images are correctly localized.
The detection of the line images takes about 40 ms (mean time over 150 images, for approximately 20000 contour points and up to 100 lines to detect). The computation of (α̂, β̂) is generally achieved in less than 1 ms. The 2D pose estimation time greatly depends on the complexity of the 2D map and on the number of detected vertical lines. In our experiments, the localization of the first image of the sequence required 1.2 s. The tracking of the following poses, however, has been achieved in a few ms. During the first two seconds, the computer processes the first pose and caches the incoming images. Then it processes the cached images more quickly than the acquisition rate and can thus localize in realtime at 15 fps after about three seconds of initialization.
5 CONCLUSION
In this paper, we have proposed an original method to detect the pose of a central catadioptric camera from an image of an indoor environment containing vertical lines. The two-axis orientation detection which is applied first can be used in other applications to detect arbitrary sets of parallel lines. The 2D pose estimation, in spite of its apparent simplicity, has exhibited a high computational complexity due to the presence of outliers and unknown matches. We have proposed improvements allowing the pose estimation to be achieved in realtime using a smart selection of the correspondences between the lines of the 2D map and the detected vertical lines. Realtime operation is also obtained thanks to a caching of the images and a tracking of the correspondences inside the sequence. As future work, we plan to integrate colorimetric information to avoid false detections that are geometrically correct, and to accelerate the search for correspondences by discarding incompatible matches. Methods based on 1D panoramas (Briggs 2005) will also be investigated. Then, experiments on entire buildings will be performed to validate the approach at a wide scale.
REFERENCES
X. Ying and Z. Hu (2004). Catadioptric Line Features Detection using Hough Transform. In Proceedings of the 17th International Conference on Pattern Recognition, volume 4, pp. 839-842, 2004.

P. Vasseur and E. M. Mouaddib (2004). Central Catadioptric Line Detection. In BMVC, Kingston, September 2004.

T. Pajdla and V. Hlavac (2001). Image-based self-localization by means of zero phase representation in panoramic images. In Proceedings of the 2nd International Conference on Advanced Pattern Recognition, March 2001.

C. Geyer and K. Daniilidis (2001). Catadioptric Projective Geometry. International Journal of Computer Vision, 45(3), pp. 223-243, 2001.

C. Geyer and K. Daniilidis (1999). Catadioptric Camera Calibration. In Proceedings of the 7th International Conference on Computer Vision, volume 1, p. 398, 1999.

R. Benosman and S. B. Kang (2001). Panoramic Vision. Springer, 2001.

A. Briggs, Y. Li and D. Scharstein (2005). Feature Matching Across 1D Panoramas. In Proceedings of OMNIVIS, 2005.

O. Shakernia, R. Vidal and S. Sastry (2003). Structure from Small Baseline Motion with Central Panoramic Cameras. In the Fourth Workshop on Omnidirectional Vision, 2003.