ACTIVE STEREO VISION-BASED MOBILE ROBOT NAVIGATION
FOR PERSON TRACKING
V. Enescu, G. De Cubber, K. Cauwerts, S. A. Berrabah, H. Sahli
Vrije Universiteit Brussel
Pleinlaan 2, B-1050 Brussels, Belgium
M. Nuttin
Katholieke Universiteit Leuven
Celestijnenlaan 300B, B-3001 Heverlee-Leuven, Belgium
Keywords:
active vision, person following, color tracking, particle filter, proposal distribution, mean shift, measurement
model update, robot navigation, fuzzy logic behavior-based navigation, hybrid behavior-based navigation.
Abstract:
In this paper, we propose a mobile robot architecture for person tracking, consisting of an active stereo vision
module (ASVM) and a navigation module (NM). The first tracks the person in stereo images and controls the
pan/tilt unit to keep the target in the visual field. Its output, i.e. the 3D position of the person, is fed to the
NM, which drives the robot towards the target while avoiding obstacles. As a peculiarity of the system, there
is no feedback from the NM or the robot motion controller (RMC) to the ASVM. While this imparts flexibility
in combining the ASVM with a wide range of robot platforms, it puts considerable strain on the ASVM.
Indeed, besides the changes in the target dynamics, it has to cope with the robot motion during obstacle
avoidance. These disturbances are accommodated by generating target location hypotheses in an efficient
manner. Robustness against outliers and occlusions is achieved by employing a multi-hypothesis tracking
method - the particle filter - based on a color model of the target. Moreover, to deal with illumination changes,
the system adaptively updates the color model of the target. The main contributions of this paper lie in (1)
devising a stereo, color-based target tracking method using the stereo geometry constraint and (2) integrating
it with a robotic agent in a loosely coupled manner.
1 INTRODUCTION
In the past, robot navigation was commonly based
upon data coming from classical sensory equipment
like ultrasonic and infrared sensors or laser range
scanners. This approach stands in sharp contrast to nearly all biological examples (e.g. humans), where
vision is the primary sensing modality. This biologi-
cal example inspired scientists (Beardsley et al., 1995;
Davison and Murray, 1998) to tackle the visual nav-
igation problem and, during the last decade, visual
navigation has gained significant importance.
The main problem in setting up a global control
architecture for a mobile robot with an active vision
control loop is that the frequency of the robot con-
trol loop (and certainly that of any manipulator installed on the mobile agent) differs from the fre-
quency of the vision control loop. This also leads to a
second problem: the reusability of the developed con-
trol architecture on different robotic platforms. Due
to the difficulties with the timing between different
loops in the systems, most researchers (Davison and
Murray, 1998) tune their control processes such that
they work well for one specific robot. Unfortunately,
the high coupling between the visual and the robot
control loop yields a robot-dependent control archi-
tecture.
In this paper, we set up a global system architecture
for a visually guided robot, which is independent of the specific robot hardware and kinematics. We do
this by separating the visual processing (ASVM), the
navigation control (NM) and the robot motion con-
troller (RMC). As an application for such a robotic
system, we chose the person following task, for which
several problems have to be solved: person tracking,
coping with the erratic motion of the target, stereo
head control, 3D position estimation, robot naviga-
tion and the robot motion control. In the following,
we address each of these problems.
For most tracking applications a Kalman filter is
used, as it is a reliable and efficient tool. As a disad-
vantage, the Kalman filter cannot handle multi-modal distributions such as those present in our problem. Therefore, we
resort to particle filtering, a Monte Carlo method able
to maintain multiple hypotheses about the target state
in the presence of non-linearity and non-Gaussian dis-
turbances.
In general, the motion of the target in a target track-
ing scheme is modeled using a predefined model such
as a constant speed or a constant acceleration model
(Strens and Gregory, 2003). This leads to problems when humans need to be tracked, as their motion may follow either model or be entirely unpredictable. There-
fore, to cater for the erratic target motion, we propose
an effective mechanism for generating target location
hypotheses in the particle filter. Based on the current
estimate of the system state, a PID controller deter-
mines the control signal to be applied to the pan/tilt
stereo head to keep it aligned with the target.
For the 3D position estimation, some authors (Ping
et al., 2001) use scaling as a means of retrieving depth information, while others (Ghita and Whelan, 2003) use
the depth from defocus measure. The most popular
approach is however to make use of a stereo setup to
estimate the distance to a target (Arsenio and Banks,
1999; Schlegel et al., 2000; Kuniyoshi and Rougeaux,
1999; Wilhelm et al., 2004). This is also the method
we use here.
In the context of robot navigation, many algorithms
have been proposed to solve the path planning prob-
lem, ranging from simple potential field methods (Ko-
ren and Borenstein, 1991) to biologically inspired
neural networks (Franz and Mallot, 2000). We opted
for a behavior based control architecture for the navi-
gation module.
The RMC requires careful consideration of the ro-
bot kinematics and dynamics. As we wanted to build
a system which is easily portable from one robot sys-
tem to another, we decoupled the platform-dependent
RMC from the ASVM and NM. The RMC is there-
fore not considered part of the system architecture and
is not pursued further.
The remainder of this paper is organized as fol-
lows. First, an overview of the system architecture
is given in Section 2. Then, in Section 3, the active
vision module is extensively explained. Here the top-
ics of color histogram matching, stereo geometry, the
particle-filter based target tracking and camera control
are discussed. The navigation module is introduced in
Section 4, after which, in Section 5, we present some
results. Finally, we conclude the paper in Section 6.
2 OVERVIEW
To achieve the task of person following, two main
problems need to be solved:
- The vision system has to track the target person.
- The robot has to navigate to the target person without bumping into obstacles.
In fact, these two problems can both be considered as
a coordinate system alignment problem. This can be
Figure 1: Definition of the coordinate systems: $(OXYZ)$ is the stereo head coordinate system, $(O_{l,r}X_{l,r}Y_{l,r}Z_{l,r})$ is the left/right camera coordinate system, and $(O_R X_R Y_R Z_R)$ is the robot coordinate system.
explained using Figure 1. The objective of the visual
tracking system is to align the coordinate system of
the stereo camera such that the Z-axis points straight
at the target, thus minimizing the relative pan and tilt
angles of the stereo vision system, α and β respec-
tively. On the other hand, the objective of the robot
is to move towards the goal, and hence to align the
robot coordinate system such that the $Z_R$-axis points straight at the target. This is in general not the case due to the movement of the target person, the inertia of the robot, and because the robot has to avoid obstacles on its way. Thus, most of the time, $\alpha_R > 0$, where $\alpha_R = \angle(O_R Z_R, OZ)$. To be able to navigate
in a complex environment with obstacles, a behavior
based robot navigation was adopted, where one be-
havior leads the robot to the target, whereas another
behavior enables the robot to avoid obstacles.
The general system architecture which integrates
all the capabilities discussed above is sketched in
Figure 2. On the left, one can observe the ASVM,
which receives its input from the two cameras in-
stalled on the stereo head. At the heart of ASVM
is a particle filter-based visual tracker which gener-
ates at each time step hypotheses (particles) about the
3D target position (in spherical coordinates) relative
to the $(XYZ)$ frame. These hypotheses are "projected" onto the image plane, resulting in a set of
candidate target regions within the left and right im-
ages. For each pair of candidate regions, we com-
pute two color histograms which we compare with
the base color histograms (left and right) serving as model for the tracked person. The likelihood of each hypothesis is then quantified by the matching degree between these histograms.

Figure 2: Overview of the system architecture: the Active Stereo Vision Module (ASVM) comprises the stereo head with two cameras, the color-based target tracker using particle filtering, the target model update, the pan & tilt controller, and the 3D target position estimation; the Navigation Module (NM) contains the navigation controller; the robot-specific Robot Motion Controller (RMC) groups the motion controller, motor controller, and state observer on top of the robot actuators, odometry, and the proprioceptive and exteroceptive sensors.

The outcome of the track-
ing process is an estimate of the target position in the
form of a probabilistic mixture of the target hypothe-
ses. This estimate consists of the azimuth ($\hat\alpha$), elevation ($\hat\beta$), and range ($\hat\lambda$) of the target, and its use is
threefold. First, it serves for updating the base color
histograms in view of coping with changing illumi-
nation conditions. Second, the target pose estimate is
fed to the pan & tilt controller, which employs two
PID controllers to ensure smooth and robust stereo
camera control. Finally, the estimate is used to re-
cover the absolute 3D position (relative to the robot
frame) of the target person. This position estimator
links the ASVM to the NM (see Figure 2). There, a
navigation controller will distill a heading direction
and speed from the absolute pan angle (i.e. $\alpha_R + \hat\alpha$), the target range ($\hat\lambda$), and the input from the propriocep-
tive and exteroceptive sensors. We have tested differ-
ent behavior-based navigation controllers: a simple
dual-behavior fuzzy logic-based navigation controller
and a more elaborate hybrid architecture consisting
of a deliberative and a reactive part. The final output
on this level of the robot navigation module, a head-
ing direction and a speed setpoint, is compatible with
most robotic platforms, no matter what their kinemat-
ics and dynamics are. What follows further are robot-
specific modules, indicated by the shaded blocks in
Figure 2, which are not part of the presented architec-
ture.
3 ACTIVE VISION
The ASVM accomplishes the following tasks:
1. tracking the 3D position of a person over time by
means of the color properties of a region of his
body; the color model of the target is updated in
time to account for the variations in illumination;
2. controlling the stereo head such that the person always stays in its field of view.
These tasks are detailed in the sequel of this section.
3.1 Dynamic model
The state of the target at time $k$ is described by the vector $x_k = (\alpha_k, \beta_k, \lambda_k)$ containing the spherical coordinates of the target with respect to the frame $(OXYZ)$ attached to the stereo head. $\alpha$ is the azimuth angle relative to $OZ$, $\beta$ is the elevation angle relative to the plane $(OXY)$, and $\lambda$ is the target range.
Since a person may move in an unpredictable way,
we adopt a weak state evolution model (inspired by
(Pérez et al., 2004)) for the stereo head-target system.
More specifically, we assume that the state vector
components evolve according to mutually indepen-
dent Gaussian random walk models, which we aug-
ment with uniform components to capture the possi-
bly erratic motion of the target. Thus the state evolu-
tion model can be written as
\[
p(\alpha_k|\alpha_{k-1}) = \varphi_1\, \mathcal{N}(\alpha_k;\, \alpha_{k-1} + c\,u^p_{k-1},\, \sigma_1^2) + (1-\varphi_1)\, \mathcal{U}(\alpha_k;\, -\alpha_m,\, \alpha_m) \tag{1}
\]
\[
p(\beta_k|\beta_{k-1}) = \varphi_2\, \mathcal{N}(\beta_k;\, \beta_{k-1} + c\,u^t_{k-1},\, \sigma_2^2) + (1-\varphi_2)\, \mathcal{U}(\beta_k;\, -\beta_m,\, \beta_m) \tag{2}
\]
\[
p(\lambda_k|\lambda_{k-1}) = \varphi_3\, \mathcal{N}(\lambda_k;\, \lambda_{k-1},\, \sigma_3^2) + (1-\varphi_3)\, \mathcal{U}(\lambda_k;\, \lambda_{\min},\, \lambda_{\max}) \tag{3}
\]
where $u^p_k$ and $u^t_k$ are the pan and tilt control inputs, $c$ is a known coefficient, $\{\varphi_i\}_{i=1}^{3} \in (0, 1)$ are known mixing coefficients, $\mathcal{N}(x; \mu, \sigma^2)$ denotes a Gaussian distribution of variable $x$ with mean $\mu$ and variance $\sigma^2$, and $\mathcal{U}(x; x_{\min}, x_{\max})$ signifies that $x$ is uniformly distributed between $x_{\min}$ and $x_{\max}$. Note that $\sigma_{1,2,3}$, $\alpha_m$, $\beta_m$, $\lambda_{\min}$ and $\lambda_{\max}$ are known by design. Since $\alpha_k$, $\beta_k$, $\lambda_k$ are independent variables, it follows that the state evolution distribution factorizes as:
\[
p(x_k|x_{k-1}) = p(\alpha_k|\alpha_{k-1})\, p(\beta_k|\beta_{k-1})\, p(\lambda_k|\lambda_{k-1}). \tag{4}
\]
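For concreteness, the following Python sketch draws one state sample from the mixture models (1)-(3). It is illustrative only; all numerical values (mixing coefficients, standard deviations, angle bounds, range limits) are placeholders rather than the constants used in the actual system.

```python
import numpy as np

def sample_state_evolution(alpha, beta, lam, u_pan, u_tilt, c=1.0,
                           phi=(0.8, 0.8, 0.8), sigma=(0.02, 0.02, 0.10),
                           alpha_m=0.6, beta_m=0.4, lam_bounds=(0.5, 5.0),
                           rng=None):
    """Draw (alpha_k, beta_k, lambda_k) from eqs. (1)-(3): with probability
    phi_i a Gaussian random walk (compensated for the pan/tilt commands),
    otherwise a uniform draw over the admissible interval.
    All numeric defaults are illustrative placeholders."""
    rng = rng or np.random.default_rng()
    # Azimuth, eq. (1)
    if rng.random() < phi[0]:
        alpha_k = rng.normal(alpha + c * u_pan, sigma[0])
    else:
        alpha_k = rng.uniform(-alpha_m, alpha_m)
    # Elevation, eq. (2)
    if rng.random() < phi[1]:
        beta_k = rng.normal(beta + c * u_tilt, sigma[1])
    else:
        beta_k = rng.uniform(-beta_m, beta_m)
    # Range, eq. (3)
    if rng.random() < phi[2]:
        lam_k = rng.normal(lam, sigma[2])
    else:
        lam_k = rng.uniform(*lam_bounds)
    return alpha_k, beta_k, lam_k
```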
3.2 Stereo Geometry
The geometry of the stereo vision system is sketched
in Figure 1. We track the 3D position of the target rel-
ative to the frame XY Z, (α, β, λ), using color mea-
surements in the image plane. Hence, we need to find
a relationship between (α, β, λ) and the 2D position
of the point where the target projects on the image
plane, for each image in a stereo pair. To this end, a
first step is to compute the azimuth and elevation an-
gles of the target with respect to the coordinate frame
attached to each camera, i.e. $(\alpha_l, \beta_l)$ for the left image and $(\alpha_r, \beta_r)$ for the right image. As this derivation is
identical for α and β, we only present the solution for
the azimuth angles here. From (Vieville, 1997, p. 30),
we have that
\[
\alpha_{l,r} = \arctan\!\left( u_0 + f \tan(\alpha) \mp \frac{b}{2\lambda\cos(\alpha)} \right) \tag{5}
\]
where $u_0$ is the optical center of the camera, $b$ is the baseline, and $f$ is the focal length.
Now, we shall relate $(\alpha_l, \beta_l, \alpha_r, \beta_r)$ to the position of the target projection in each image. Let $p_l = (u_l, v_l)$ and $p_r = (u_r, v_r)$ denote the positions of the target projection on the left image $I_l$ and on the right image $I_r$, respectively. Given the geometry of the image formation, the following relations hold:
\[
u_{l,r} = f \cdot D_u \cdot \tan(\alpha_{l,r}) \tag{6}
\]
\[
v_{l,r} = f \cdot D_v \cdot \tan(\beta_{l,r}) \tag{7}
\]
where $D_u$ and $D_v$ represent the number of pixels per meter in the horizontal and vertical directions, respectively. Thus, starting from $(\alpha, \beta, \lambda)$, we can determine $p_l$ and $p_r$ based on (5), (6), and (7). Let
\[
(p_l, p_r) = T2S(\alpha, \beta, \lambda)
\]
(T2S stands for "3D to stereo") be the function corresponding to these transformations. It is useful to also define $S2T()$, the inverse function of $T2S()$:
\[
(\alpha, \beta, \lambda) = S2T(p_l, p_r). \tag{8}
\]
Alternatively, we refer to $T2S$ as "projection" and to $S2T$ as "back-projection".
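To illustrate how the T2S/S2T pair can be realized, the sketch below projects a spherical target hypothesis onto a parallel-axis stereo pair and triangulates it back. It is a simplified variant under stated assumptions: the camera constants are placeholders, pixel coordinates are taken relative to the principal points (so the $u_0$ term of (5) drops out), and the spherical convention is assumed rather than transcribed from (5)-(8).

```python
import numpy as np

# Illustrative camera parameters (placeholders, not the system's values)
F_DU = 800.0   # f * D_u: focal length expressed in horizontal pixels
F_DV = 800.0   # f * D_v: focal length expressed in vertical pixels
B    = 0.12    # stereo baseline [m]

def t2s(alpha, beta, lam):
    """'3D to stereo': project a target at (azimuth alpha, elevation beta,
    range lam) in the stereo head frame onto the left/right image planes
    (pixel offsets from the principal points). Cameras are assumed parallel,
    offset by +/- B/2 along the X axis."""
    x = lam * np.cos(beta) * np.sin(alpha)
    y = lam * np.sin(beta)
    z = lam * np.cos(beta) * np.cos(alpha)
    u_l = F_DU * (x + B / 2) / z          # left camera sits at X = -B/2
    u_r = F_DU * (x - B / 2) / z          # right camera sits at X = +B/2
    v = F_DV * y / z                      # same vertical coordinate in both views
    return (u_l, v), (u_r, v)

def s2t(p_l, p_r):
    """'Stereo to 3D': back-project a pair of image points to
    (alpha, beta, lam) by triangulating over the disparity."""
    (u_l, v_l), (u_r, v_r) = p_l, p_r
    z = F_DU * B / (u_l - u_r)            # depth from disparity
    x = z * u_l / F_DU - B / 2
    y = z * 0.5 * (v_l + v_r) / F_DV
    lam = np.sqrt(x**2 + y**2 + z**2)
    alpha = np.arctan2(x, z)
    beta = np.arcsin(y / lam)
    return alpha, beta, lam
```

Round-tripping `s2t(*t2s(alpha, beta, lam))` recovers the input up to floating-point error, which is a convenient sanity check for any concrete implementation of the projection pair.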
3.3 Color-based measurement model
Initially, at time $k = 0$, the projections of the target on the image planes are delineated manually and described by means of two elliptical regions with the half axes $H_u$ and $H_v$. Let these regions be denoted by $R_{l,0} = R(p_{l,0}, 1)$ (in the left image) and $R_{r,0} = R(p_{r,0}, 1)$ (in the right image), where $R(p, s)$ is an elliptical region of center $p$ and scale factor $s$ with respect to the initial ellipse $(H_u, H_v)$. Subsequently, tracking the object throughout the stereo image sequence localizes the object at time $k$ within the regions $R_{l,k} = R(p_{l,k}, s_k)$ and $R_{r,k} = R(p_{r,k}, s_k)$, where $s_k$ is the scale at time $k$. Note that the scale of the object in the image is inversely proportional to $\lambda$ and can therefore be estimated by
\[
s_k = \lambda_0/\lambda_k, \tag{9}
\]
where $\lambda_0$ is the initial target range, found by back-projecting $p_{l,0}$ and $p_{r,0}$.
The appearance of a target confined to the image region $R(p, s)$ is described by means of a spatially weighted color (RGB) histogram (Comaniciu et al., 2003) with $B$ bins:
\[
h_R(u) = c \sum_{r \in R} \phi\!\left(\frac{\|r - p\|}{H}\right)\delta[b(r) - u], \quad u = 1, \dots, B
\]
where $c$ is a constant such that $\sum_{u=1}^{B} h(u) = 1$, $b(r)$ is a function mapping the color of the point $r$ into a color bin, $H = s\sqrt{H_u^2 + H_v^2}$, and $\phi$ is the kernel function
\[
\phi(r) = \begin{cases} 1 - r^2, & r < 1 \\ 0, & \text{otherwise} \end{cases} \tag{10}
\]
The target model consists of the color histograms of the elliptical regions $R_{l,0}$ and $R_{r,0}$. For simplicity, let these histograms be denoted by
\[
q_l(u) \triangleq h_{R_{l,0}}(u), \qquad q_r(u) \triangleq h_{R_{r,0}}(u) \tag{11}
\]
For $k > 0$, the similarity between the target model $q$ and the color model $h$ of a target candidate (in one image) is assessed using the Bhattacharyya distance, defined as
\[
d[q, h] = \sqrt{1 - \rho[q, h]}, \tag{12}
\]
where $\rho[q, h] = \sum_{u=1}^{B} \sqrt{q(u)h(u)}$.
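A minimal sketch of the spatially weighted histogram above and of the Bhattacharyya distance (12) is given below, assuming 8-bit RGB images stored as numpy arrays; the bin layout (8 bins per channel, as in the experiments of Section 5) and the region handling are illustrative.

```python
import numpy as np

def color_histogram(image, center, half_axes, scale, bins_per_channel=8):
    """Spatially weighted RGB histogram of the region R(p, s): each pixel
    contributes to its color bin b(r) with weight phi(||r - p|| / H),
    cf. eq. (10), where H = s * sqrt(Hu^2 + Hv^2). `image` is HxWx3 uint8."""
    h, w, _ = image.shape
    cu, cv = center                                  # region center p = (u, v)
    hu, hv = scale * half_axes[0], scale * half_axes[1]
    H = np.hypot(hu, hv)                             # kernel bandwidth
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    r = np.hypot(us - cu, vs - cv) / H               # normalized distance to p
    weights = np.where(r < 1.0, 1.0 - r**2, 0.0)     # kernel phi of eq. (10)
    # Joint RGB bin index b(r), bins_per_channel bins per channel
    step = 256 // bins_per_channel
    chans = image.astype(np.int64) // step
    idx = (chans[..., 0] * bins_per_channel + chans[..., 1]) * bins_per_channel + chans[..., 2]
    hist = np.bincount(idx.ravel(), weights=weights.ravel(),
                       minlength=bins_per_channel**3)
    return hist / max(hist.sum(), 1e-12)             # normalize: sum_u h(u) = 1

def bhattacharyya_distance(q, h):
    """Eq. (12): d[q, h] = sqrt(1 - sum_u sqrt(q(u) h(u)))."""
    rho = float(np.sum(np.sqrt(q * h)))
    return np.sqrt(max(1.0 - rho, 0.0))
```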
The Bayesian estimation paradigm entails specifying the likelihood function $p(z_k|x_k)$ of the state $x_k$ given the measurement $z_k$. Note that, in our case, the measurement consists of color stereo images, $z_k = \{I_{l,k}, I_{r,k}\}$. Dropping the time index $k$ for convenience, the likelihood can be expressed as
\[
p(I_l, I_r|x) = p(I_l, I_r|\alpha, \beta, \lambda) = p(I_l|\Delta, I_r)\, p(I_r|\Delta), \tag{13}
\]
where $\Delta \triangleq (p_l, p_r, s)$ and $(p_l, p_r) = T2S(x)$. For the partial likelihood $p(I_r|\Delta)$, we use the formulation from (Perez et al., 2002; Nummiaro et al., 2003):
\[
p(I_r|\Delta) = p(I_r|R(p_r, s)) \propto \exp\!\left(-\frac{d^2[q_r, h_{R_r}]}{2\sigma_r^2}\right), \tag{14}
\]
where $d$ is given by (12) and $\sigma_r^2$ is a design parameter which plays the role of a measurement error variance.
The image correlation likelihood $p(I_l|\Delta, I_r)$ quantifies the matching between the Bhattacharyya distances in the left and right images and is modeled here as a Gaussian function of the distance difference:
\[
p(I_l|\Delta, I_r) = p(I_l|R(p_l, s), R(p_r, s), I_r) \propto \exp\!\left(-\frac{(d[q_l, h_{R_l}] - d[q_r, h_{R_r}])^2}{2\sigma_c^2}\right) \tag{15}
\]
where $\sigma_c$ is a design parameter. Plugging (14) and (15) into (13) yields
\[
p(I_l, I_r|x) \propto \exp\!\left(-\frac{d^2[q_r, h_{R_r}]}{2\sigma_r^2} - \frac{(d[q_l, h_{R_l}] - d[q_r, h_{R_r}])^2}{2\sigma_c^2}\right) \tag{16}
\]
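Given the two Bhattacharyya distances of a candidate, the combined likelihood (16) is a one-liner; in the sketch below the variances are illustrative design values, not the ones used in the actual system.

```python
import numpy as np

def stereo_likelihood(d_left, d_right, sigma_r=0.2, sigma_c=0.1):
    """Unnormalized stereo likelihood of eq. (16). d_left / d_right are the
    Bhattacharyya distances between the candidate regions and the left/right
    target models; sigma_r and sigma_c are placeholder design parameters."""
    range_term = d_right**2 / (2.0 * sigma_r**2)                  # from eq. (14)
    stereo_term = (d_left - d_right)**2 / (2.0 * sigma_c**2)      # from eq. (15)
    return np.exp(-(range_term + stereo_term))
```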
3.4 Particle filter algorithm
For non-linear, non-Gaussian and multi-modal mod-
els, as the one described here, the particle filter (Aru-
lampalam et al., 2002) provides a Monte Carlo solu-
tion to the recursive filtering equation $p(x_k|z_{1:k}) \propto p(z_k|x_k)\int p(x_k|x_{k-1})\, p(x_{k-1}|z_{1:k-1})\, dx_{k-1}$ necessary for
tracking. Starting with a weighted particle set
$\{(x^{(i)}_{k-1}, \tilde w^{(i)}_{k-1})\}_{i=1}^N$ approximately distributed according to $p(x_{k-1}|z_{1:k-1})$, the particle filter proceeds by predicting new samples from a suitably chosen proposal distribution which may depend on the old state and the current and previous measurements, i.e. $x^{(i)}_k \sim q(x_k|x^{(i)}_{k-1}, z_{1:k})$. To maintain a consistent sample, the new particle weights are set to
\[
w^{(i)}_k \propto \frac{p(z_k|x^{(i)}_k)\, p(x^{(i)}_k|x^{(i)}_{k-1})}{q(x^{(i)}_k|x^{(i)}_{k-1}, z_{1:k})}\, \tilde w^{(i)}_{k-1}. \tag{17}
\]
After weight normalization, the new particle set
$\{(x^{(i)}_k, \tilde w^{(i)}_k)\}_{i=1}^N$ is then approximately distributed according to $p(x_k|z_{1:k})$. The particles are resampled
according to their weights to avoid degeneracy.
Particle filters suffer from the curse of dimension-
ality, i.e., as the dimension of the state-space in-
creases an exponentially increasing number of parti-
cles is required to maintain the same estimation ac-
curacy. To mitigate this phenomenon, we choose a
proposal density which biases the generation of the
particles towards the most-likely 3D location, while
it maintains predictive particles to handle the back-
ground clutter and recover from failure or tempo-
rary occlusion. More specifically, $q(x_k|x_{k-1}, z_{1:k})$ is a mixture between the state evolution distribution, $p(x_k|x_{k-1})$, and a Gaussian distribution whose mean $(\hat\alpha^{ms}_k, \hat\beta^{ms}_k, \hat\lambda^{ms}_k)$ is derived via stereo mean-shift tracking and back-projection:
\[
q(x_k|x^{(i)}_{k-1}) = (1-\gamma)\, p(x_k|x^{(i)}_{k-1}) + \gamma\, \mathcal{N}_3(x_k) \tag{18}
\]
where $\gamma \in (0, 1)$ is a mixing factor. The mean-shift algorithm (Comaniciu et al., 2003) minimizes (12), thereby finding a highly likely location of the target in the image. The coefficient $\gamma$ expresses our belief in the mean-shift derived 3D target hypotheses, sampled according to the density
\[
\mathcal{N}_3(x_k) = \mathcal{N}(\alpha_k;\, \hat\alpha^{ms}_k, \sigma^2)\, \mathcal{N}(\beta_k;\, \hat\beta^{ms}_k, \sigma^2)\, \mathcal{N}(\lambda_k;\, \hat\lambda^{ms}_k, \sigma^2). \tag{19}
\]
The vector $(\hat\alpha^{ms}_k, \hat\beta^{ms}_k, \hat\lambda^{ms}_k)$ is obtained as follows: starting from the mean target positions in the left and right image at time $k-1$, $\hat p_l(k-1)$ and $\hat p_r(k-1)$, we find via mean-shift the positions of the target at time $k$, respectively $\hat p^{ms}_l(k)$ and $\hat p^{ms}_r(k)$, in the current stereo images $I_l$ and $I_r$. Further, from these two
Given the sample set $S_{k-1} = \{(x^{(i)}_{k-1}, \tilde w^{(i)}_{k-1})\}_{i=1}^N$ at time $k-1$, obtain $S_k = \{(x^{(i)}_k, \tilde w^{(i)}_k)\}_{i=1}^N$ as follows:

1. Importance sampling. For $i = 1, \dots, N$, sample $x^{(i)}_k$ based on $x^{(i)}_{k-1}$ and $q(x_k|x_{k-1}, z_{1:k})$:
   - mean-shift tracking and back-projection:
     $\hat p^{ms}_l = \mathrm{MeanShift}(\hat p_l(k-1), I_l)$,
     $\hat p^{ms}_r = \mathrm{MeanShift}(\hat p_r(k-1), I_r)$,
     $(\hat\alpha^{ms}, \hat\beta^{ms}, \hat\lambda^{ms}) = S2T(\hat p^{ms}_l, \hat p^{ms}_r)$
   - sample $u \sim \mathcal{U}(u; 0, 1)$
   - if $u > \gamma$, sample $x^{(i)}_k = (\alpha^{(i)}, \beta^{(i)}, \lambda^{(i)})$ as follows:
     $\alpha^{(i)} \sim p(\alpha_k|\alpha^{(i)}_{k-1})$ [see eq. (1)],
     $\beta^{(i)} \sim p(\beta_k|\beta^{(i)}_{k-1})$ [see eq. (2)],
     $\lambda^{(i)} \sim p(\lambda_k|\lambda^{(i)}_{k-1})$ [see eq. (3)]
   - else, sample $x^{(i)}_k = (\alpha^{(i)}, \beta^{(i)}, \lambda^{(i)})$ as follows:
     $\alpha^{(i)} \sim \mathcal{N}(\alpha_k; \hat\alpha^{ms}, \sigma^2)$,
     $\beta^{(i)} \sim \mathcal{N}(\beta_k; \hat\beta^{ms}, \sigma^2)$,
     $\lambda^{(i)} \sim \mathcal{N}(\lambda_k; \hat\lambda^{ms}, \sigma^2)$ [see eq. (19)]
   - project $x^{(i)}_k$ to the image locations $p^{(i)}_l$, $p^{(i)}_r$:
     $(p^{(i)}_l, p^{(i)}_r) = T2S(\alpha^{(i)}, \beta^{(i)}, \lambda^{(i)})$, $s^{(i)} = \lambda_0/\lambda^{(i)}$
   - compute $w^{(i)}_k$, the unnormalized weight of $x^{(i)}_k$, according to (17), (4), (16), (18); the likelihood $p(I_l, I_r|x^{(i)}_k) = p(I_l, I_r|p^{(i)}_r, p^{(i)}_l, s^{(i)})$ (16) is computed based on the histograms
     $h^{(i)}_r \triangleq h_{R_r}(u)$, where $R_r = R(p^{(i)}_r, s^{(i)}) \subset I_r$, and
     $h^{(i)}_l \triangleq h_{R_l}(u)$, where $R_l = R(p^{(i)}_l, s^{(i)}) \subset I_l$

2. Weight normalization: $\tilde w^{(i)}_k = w^{(i)}_k / \sum_{i=1}^N w^{(i)}_k$

3. Estimation. Compute the mean state of the set $S_k$, the scale estimate, and the mean target positions in the left and right image:
   $\hat x_k = (\hat\alpha_k, \hat\beta_k, \hat\lambda_k) = \sum_{i=1}^N \tilde w^{(i)}_k x^{(i)}_k$, $\hat s_k = \lambda_0/\hat\lambda_k$,
   $\hat p_l(k) = \sum_{i=1}^N \tilde w^{(i)} p^{(i)}_l$, $\hat p_r(k) = \sum_{i=1}^N \tilde w^{(i)} p^{(i)}_r$

4. Target model update. Compute the occlusion/outlier indicator $o = p(I_l|\hat p_l, \hat s) + p(I_r|\hat p_r, \hat s)$ as the sum of the likelihoods (defined by (14)) of the elliptical regions of scale $\hat s$, centered at $\hat p_l$ and $\hat p_r$ respectively; if $o$ exceeds a threshold $th_1$, proceed to the target model update (see Section 3.5).

5. Selective resampling: if the effective sample size
   $N_{eff} = \left[\sum_{i=1}^N (\tilde w^{(i)}_k)^2\right]^{-1}$
   is below a threshold $th_2$, apply a systematic resampling step; see (Arulampalam et al., 2002).

Figure 3: Particle filter algorithm
image locations, we can determine the 3D location $(\hat\alpha^{ms}_k, \hat\beta^{ms}_k, \hat\lambda^{ms}_k)$ by back-projection (8).
The outline of the particle filter algorithm for color-
based stereo target tracking is presented in Figure 3.
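To show how the ingredients of Figure 3 fit together, the sketch below implements one filter update in Python. It is a schematic rendering rather than the implementation used here: the image-dependent parts (the mean-shift estimate and the color likelihood, e.g. built from T2S and eq. (16) as sketched above) are passed in as callables, step 4 (the model update gate) is omitted, and all numerical defaults, including the resampling threshold, are placeholders.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Elementwise Gaussian density N(x; mu, sigma^2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mixture_pdf(x, mean, sigma, lo, hi, phi):
    """Density of the Gaussian random walk + uniform mixture of eqs. (1)-(3)."""
    inside = (lo <= x) & (x <= hi)
    return phi * gauss_pdf(x, mean, sigma) + (1.0 - phi) * inside / (hi - lo)

def particle_filter_step(particles, weights, ms_state, evolve, likelihood,
                         drift, sigma, lo, hi, phi,
                         gamma=0.3, sigma_ms=0.05, rng=None):
    """One update of the color-based stereo particle filter (cf. Figure 3).
    particles: (N, 3) array of (alpha, beta, lambda); weights: (N,) normalized;
    ms_state: mean-shift + back-projection estimate (alpha, beta, lambda);
    evolve(state, rng): sample from p(x_k | x_{k-1}); likelihood(state): eq. (16);
    drift/sigma/lo/hi/phi: length-3 arrays with the constants of eqs. (1)-(3),
    e.g. drift = (c*u_pan, c*u_tilt, 0). Defaults are illustrative placeholders."""
    rng = rng or np.random.default_rng()
    N = len(particles)
    ms_state = np.asarray(ms_state, dtype=float)
    new_particles = np.empty_like(particles)
    new_weights = np.empty(N)

    for i in range(N):
        prev = particles[i]
        # Importance sampling from the mixture proposal (18)
        if rng.random() > gamma:
            x = np.asarray(evolve(prev, rng), dtype=float)   # state-evolution branch
        else:
            x = rng.normal(ms_state, sigma_ms)               # mean-shift branch, eq. (19)
        # Weight update (17): likelihood * prior / proposal * previous weight
        prior = np.prod(mixture_pdf(x, prev + drift, sigma, lo, hi, phi))
        proposal = (1.0 - gamma) * prior + gamma * np.prod(gauss_pdf(x, ms_state, sigma_ms))
        new_particles[i] = x
        new_weights[i] = likelihood(x) * prior / max(proposal, 1e-300) * weights[i]

    new_weights /= new_weights.sum()                 # step 2: normalization
    estimate = new_weights @ new_particles           # step 3: weighted mean state

    # Step 5: selective systematic resampling when N_eff drops too low
    if 1.0 / np.sum(new_weights ** 2) < N / 2:
        positions = (rng.random() + np.arange(N)) / N
        idx = np.searchsorted(np.cumsum(new_weights), positions)
        new_particles, new_weights = new_particles[idx], np.full(N, 1.0 / N)

    return new_particles, new_weights, estimate
```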
3.5 Target model update
Let $\hat p_l$ and $\hat p_r$ denote the estimates of the centers of the target regions in the left and right image, respectively. If the sum of the likelihoods (14) at $\hat p_l$ and $\hat p_r$ is higher than a threshold, it means that there is no outlier or occlusion at the estimated target position in the image. Therefore, we can update the target model to cope with illumination variations resulting in appearance changes. The target models, $q_l(u)$ and $q_r(u)$, are updated as in (Nummiaro et al., 2003):
\[
q_l(u) = (1 - a)\, q_l(u) + a\, h_{\hat R_l}(u), \tag{20}
\]
\[
q_r(u) = (1 - a)\, q_r(u) + a\, h_{\hat R_r}(u), \tag{21}
\]
where $u = 1, \dots, B$, $\hat R_l = R(\hat p_l, \hat s)$, $\hat R_r = R(\hat p_r, \hat s)$, $\hat s$ is the scale estimate, and $a \in (0, 1)$ is a factor weighting the color model of the target at the estimated positions $\hat p_r$ and $\hat p_l$. This evokes a forgetting process whereby the contribution of a specific frame decreases exponentially in time.
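A sketch of the gated update (20)-(21), combined with the occlusion/outlier test of step 4 in Figure 3, is given below; the blending factor and the threshold are placeholders.

```python
import numpy as np

def update_target_models(q_left, q_right, h_left, h_right,
                         lik_left, lik_right, a=0.1, th1=1.0):
    """Blend the histograms observed at the estimated positions into the
    target models, eqs. (20)-(21), but only when the occlusion/outlier
    indicator o = p(I_l|...) + p(I_r|...) exceeds th1 (step 4 of Figure 3).
    `a` and `th1` are illustrative design values."""
    if lik_left + lik_right > th1:
        q_left = (1.0 - a) * np.asarray(q_left) + a * np.asarray(h_left)
        q_right = (1.0 - a) * np.asarray(q_right) + a * np.asarray(h_right)
    return q_left, q_right
```

With blending factor $a$, the weight of the frame observed $n$ steps in the past decays as $a(1-a)^n$, which is exactly the exponential forgetting described above.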
3.6 Camera control
The control scheme refers here only to the pan (az-
imuth) angle α. The control of the tilt (elevation)
angle β can be done in the same manner. We use
a discrete Proportional-Integral-Derivative (PID) con-
troller given by
\[
u^p_k = K_p e_k + K_i \frac{T_s}{T_i} \sum_{i=0}^{k} e_i + K_d \frac{T_d}{T_s}(e_k - e_{k-1}) + u^p_0
\]
where $e_k$ is the estimate of the azimuth angle at time $k$ as delivered by the particle filter, $e_k = \hat\alpha_k$. The parameters $K_p$, $K_i$, $K_d$, $T_i$, $T_d$ are design constants and $T_s$ is the sampling period.
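The discrete PID law above translates directly into a small controller; the gains, time constants, and sampling period in the sketch are placeholders, and the same class can be instantiated a second time for the tilt axis.

```python
class DiscretePID:
    """Discrete PID of the form used for the pan axis:
    u_k = Kp*e_k + Ki*(Ts/Ti)*sum(e_i) + Kd*(Td/Ts)*(e_k - e_{k-1}) + u0.
    All gains and periods below are illustrative, not the system's values."""

    def __init__(self, kp=0.8, ki=0.2, kd=0.1, ti=1.0, td=0.05, ts=0.04, u0=0.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.ti, self.td, self.ts, self.u0 = ti, td, ts, u0
        self.err_sum = 0.0     # running sum of the error for the integral term
        self.prev_err = 0.0    # previous error for the difference term

    def step(self, error):
        """error = estimated azimuth (the setpoint is zero: target centered)."""
        self.err_sum += error
        u = (self.kp * error
             + self.ki * (self.ts / self.ti) * self.err_sum
             + self.kd * (self.td / self.ts) * (error - self.prev_err)
             + self.u0)
        self.prev_err = error
        return u

# e.g. at each filter update: pan_cmd = pan_pid.step(alpha_hat)
```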
4 ROBOT NAVIGATION
Two behavior-based architectures for robot naviga-
tion are presented here. The general idea behind
behavior-based approaches is to decompose a task in
simpler tasks that are easier to implement and test.
The challenge of this approach remains in how to
combine these different subtasks such that the global
task is executed in a robust manner. The robot naviga-
tion problem can be subdivided into two main parts:
- How to reach the goal location?
- How to avoid obstacles?

Figure 4: Fuzzy-logic behavior-based navigation controller: the goal (person) following behavior and the obstacle avoidance behavior map the 3D goal position and the exteroceptive sensor data to the velocity commands $v$, $\omega$.

The presented solutions preserve this two-fold structure by providing a behavior-based navigation strategy where two main behaviors each address one of the questions raised above.
The robot navigation module produces as output a
heading direction for the robot and a speed setpoint,
usable on any mobile robotic platform. The speed set-
point depends directly on the distance to the target
person: if the robot is far away, it needs to accelerate in order not to lose the person; when it approaches the target, it must move more cautiously so as not to harm the human. When the robot comes within one meter of the target person, it will stop automatically for safety reasons.
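As an illustration of this distance-dependent speed policy, one possible (hypothetical) mapping is sketched below; only the one-meter stopping distance comes from the description above, the remaining constants are assumptions.

```python
def speed_setpoint(target_range, stop_dist=1.0, max_speed=0.8, gain=0.5):
    """Map the estimated distance to the person [m] to a forward speed
    setpoint [m/s]: stop inside the safety distance, otherwise ramp the
    speed up with distance and saturate. Numeric values are illustrative."""
    if target_range <= stop_dist:
        return 0.0                                   # safety stop near the person
    return min(max_speed, gain * (target_range - stop_dist))
```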
4.1 Fuzzy-logic behavior based
navigation controller
In this setup, depicted in Figure 4, each of the be-
haviors consists of a fuzzy logic controller relating
its inputs, i.e. the sensory data, to output
commands for the robot actuators. For the person-
following or goal-seeking behavior, the input comes
from the active stereo vision system, delivering the
3D position of the target person, whereas the obstacle
avoidance behavior uses exteroceptive sensor data to
find a path without colliding with obstacles.
4.2 Hybrid behavior based
navigation controller
We also exploit a hybrid architecture used for moving
in human-centered environments (Nuttin et al., 2003).
This architecture consists of a deliberative and a re-
active part. In this way, the advantages of both ap-
proaches are combined. The robot is able to reason
about how to reach a certain goal position, taking a
priori knowledge about the environment into account
if this is available. At the same time, it is able to re-
act very quickly to unmodeled obstacles in the envi-
ronment, by adopting a more direct coupling between
sensors and actuators. A multi-agent framework in
which behaviors can be specified conveniently was
developed for this goal, as in (Waarsing et al., 2003).
Figure 5 depicts the proposed architecture. The
navigation module as a whole calculates the linear
velocity and heading direction $v$ and $\omega$ of the robot, given the current robot location $(x, y, \theta)$ and its uncertainty, the robot's global goal given by the active vision module, the measured ranges from the exteroceptive sensors, and the odometry values. During navigation, a global planner and a behavioral execution unit co-operate.

Figure 5: Hybrid behavior-based navigation controller: a planner and a behavioral execution unit combine the 3D goal position, the robot position, odometry, and the exteroceptive sensor data into the velocity commands $v$, $\omega$.

Figure 6: The Nomad200 and wheelchair with the active stereo vision system
5 RESULTS AND DISCUSSION
The overall control strategy was tested on a No-
mad200 robot, which is purely a laboratory robot used for testing purposes. As such, the experimental results shown in this section are obtained with this robot. For more real-world applications, we make
use of a mobile wheelchair platform. Both mobile ro-
bot platforms are shown in Fig. 6. For this research
project, we developed a totally independent stereo vi-
sion platform consisting of a PC, with a small LCD
screen. On top of the platform, a Biclops stereo head
is installed, carrying two high-resolution Pulnix color
cameras. The whole system is totally self-sustaining, as it runs on its own power resources (six 10 Ah batteries).

Figure 7: Target tracking results: the white ellipse indicates the goal which is tracked, the small circles represent the different particles of the particle filter. The columns show (1, 2) stereo head tracking, (3) the robot advancing to the target.

The stereo vision platform can be seen on top
of the Nomad robot in Figure 6, while a model of the
stereo vision subsystem is shown in Figure 1.
The results of the person following application are
illustrated in Figure 7. With $N = 70$ particles and $B = 8 \times 8 \times 8$ histogram bins, the system is able to run in real time. As can be noticed, the tracker succeeds in aiming the stereo head towards the target person, withstanding illumination changes even though the movement of this person was not easily predictable. The robot navigates towards this person while avoiding the obstacles on its way, coming to a stop 2 meters in front of the person.
To assess the ASVM's performance in target range estimation, we conducted an experiment in which a colored object is rotated at constant angular speed in a plane parallel to the ground, at a height corresponding to that of the stereo head. In this case, the
range varies in a sinusoidal manner (see Figure 8). At
the beginning, there is a short stationary period nec-
essary for the ASVM to center the target in its field
of view. Note that the target range is tracked quite
accurately, with a small lag.
Figure 8: Estimated target range (blue line) vs. ground truth (gray line); distance [m] plotted against the time step.
6 CONCLUSION
We have presented the active stereo vision (ASVM)
and navigation (NM) modules of a mobile robot sys-
tem designed for person following. The ASVM con-
trols a stereo head for tracking a target by means of a
color-based particle filter, robust to illumination vari-
ations, erratic target motions, and short occlusions.
To enforce the stereo constraint (the target regions
in the stereo images are correlated through the stereo
head-target 3D geometry), the measurement process
is formulated in the image plane, whereas the system
dynamics is based on the 3D position of the target.
Keeping the target in the ASVM’s field of view is
achieved by adjusting the pose of the stereo head via
a PID pan/tilt controller. Further, the estimate of the
3D target position is fed to the NM, which consists
of a behavior-based navigation controller. Two dif-
ferent navigation controllers were presented. Finally,
the concept was demonstrated by implementing it on
both a Nomad200 and a wheelchair platform.
ACKNOWLEDGMENT
This research has been conducted within the frame-
work of the Interuniversity Attraction Poles pro-
gram number IAP 5/06 Advanced Mechatronic Sys-
tems, funded by the Belgian Federal Office for Scien-
tific, Technical and Cultural Affairs.
REFERENCES
Arsenio, A. M. and Banks, J. L. (1999). People detection
and tracking by a humanoid robot. Technical report,
MIT.
Arulampalam, S., Maskell, S., Gordon, N., and Clapp, T.
(2002). A tutorial on particle filters for on-line non-
linear/non-Gaussian Bayesian tracking. IEEE Trans-
actions on Signal Processing, 50(2):174–188.
Beardsley, P., Reid, I., Zisserman, A., and Murray, D.
(1995). Active visual navigation using non-metric
structure. In 5th International Conference on Com-
puter Vision, pages 58–64, Cambridge, MA, USA.
Comaniciu, D., Ramesh, V., and Meer, P. (2003). Kernel-
based object tracking. IEEE Transactions on Pattern
Analysis and Machine Intelligence, 25:564–577.
Davison, A. and Murray, D. (1998). Mobile robot local-
ization using active vision. In 5th European Confer-
ence on Computer Vision, volume 2, pages 809–825,
Freiburg, Germany.
Franz, M. O. and Mallot, H. A. (2000). Biomimetic robot nav-
igation. Robotics and Autonomous Systems, 30:133–
153.
Ghita, O. and Whelan, P. F. (2003). Real-time 3D estimation us-
ing depth from defocus. Vision, MVA (SME), 16(3):1–
6.
Koren, Y. and Borenstein, J. (1991). Potential field methods
and their inherent limitations for mobile robot naviga-
tion. In IEEE Conference on Robotics and Automa-
tion, pages 1398–1404, Sacramento, California.
Kuniyoshi, Y. and Rougeaux, S. (1999). A humanoid vision
system for interactive robots. In 1st Asian Symposium
on Industrial Automation and Robotics, pages 13–21.
Nummiaro, K., Koller-Meier, E., and Gool, L. V. (2003). An
adaptive color-based particle filter. Image and Vision
Computing, 21:99–110.
Nuttin, M., Vanhooydonck, D., Demeester, E., Brus-
sel, H. V., Buijsse, K., Desimpelaere, L., Ramon,
P., and Verschelden, T. (2003). A robotic assis-
tant for ambient intelligent meeting rooms. In 1st
European Symposium on Ambient Intelligence (EU-
SAI), pages 304–317, Veldhoven, The Netherlands.
http://www.mech.kuleuven.be/pma/research/mlr.
Perez, P., Hue, C., Vermaak, J., and Gangnet, M. (2002).
Color-based probabilistic tracking. In European Conf.
Computer Vision (ECCV), volume 1, pages 661–675.
Pérez, P., Vermaak, J., and Blake, A. (2004). Data fusion for
visual tracking with particles. Proc. IEEE, 92:495–
513.
Ping, H., Sahli, H., Colon, E., and Baudoin, Y. (2001). Vi-
sual servoing for robot navigation. In 3rd Interna-
tional Conference on Climbing and Walking Robots:
Clawar 2001, pages 255–264, Karlsruhe, Germany.
Schlegel, C., Illmann, J., Jaberg, H., Schuster, M., and
Worz, R. (2000). Integrating vision based behaviors
with an autonomous robot. Videre: Journal of Com-
puter Vision Research, 1(4):32–60.
Strens, M.-J.-A. and Gregory, I.-N. (2003). Tracking
in cluttered images. Image and Vision Computing,
21(10):891–911.
Vieville, T. (1997). A few steps towards 3D active vision.
Springer, Berlin.
Waarsing, B., Nuttin, M., and Brussel, H. V. (2003). A
software framework for control of multi-sensor, multi-
actuator systems. In International Conference on Ad-
vanced Robotics (ICAR), Coimbra, Portugal.
Wilhelm, T., Bohme, H.-J., and Gross, H.-M. (2004). A
multi-modal system for tracking and analyzing faces
on a mobile robot. Robotics and Autonomous Systems,
48:31–40.