Strategic Searching Approaches in a Multi-Robot System
Yan Meng, Ke Cao
Department of Electrical and Computer Engineering
Stevens Institute of Technology, Hoboken, New Jersey, USA
Abstract. In a partially known dynamic environment, two multi-robot strategic
searching
approaches are proposed in this paper: utility greedy approach and
game theoretic approach. It is assumed that a-priori probabilities of the targets’
distributions are provided. A one-step dynamic-programming is used to for-
malize the utility functions for both approaches, which not only depends on the
targets’ distribution probabilities, but also on travel cost. Extensive simulation
results shows that the proposed approaches are more efficient and robust com-
pared to the other heuristic searching strategies, and game theoretic approach
guaranteed better worst-case performance and be more robust to handle the en-
vironmental uncertainty.
1 Introduction
To be more efficient for the searching task in an Urban Search and Rescue (USAR), it
is reasonable to assume that some partial information are available either through
distributed sensors installed in the area, or based on some heuristics from human
beings in the emergency situation. One natural way to capture the available informa-
tion is to represent it as the likelihood of the target presence in the search space. [1]
[2] [3] proposed different searching strategies based on the priori probabilities of the
target distribution. However, these strategies are applied only to a single individual
robot. As we know, multi-robot systems are more desirable in some scenarios, such
as exploration, USAR, and hazardous environments, due to the robustness, stability,
adaptability, and scalability.
Some researches have been conducted on the m
ulti-robot searching. The interac-
tion between the robots is relative simple in [4][5][6] due to the special configura-
tions. An alternative approach that proved to be more efficient consists of discretiz-
ing time and partitioning the continuous space into a finite collection of cells. The
search problem is then reduced to deciding which cell to visit at each time interval.
For a multi-robot system, the mutual interactions between individual robots sharing
a
common workspace could be much more complex in general cases. The game
theory seems to be a convenient tool for modeling and solving multi-robot interaction
problems. In principle game theory can be applied to solve the coordination problem
and some researches have conducted using the game theory [7], [8], [9]. A pursuit-
evasion problem as a Markov game is described in [9], which is the generalization of
a Markov decision process to the case when the system evolution is governed by a
transition probability function depending on two or more player’s actions. This prob-
abilistic setting makes it possible to model the uncertainty affecting the player’s mo-
tion.
In this paper, we propose game-theory based searching strategy for a m
ulti-robot
system in a partially known dynamic searching area. The searching area is partitioned
into different regions, where the initial probability of the target distribution in each
Meng Y. and Cao K. (2006).
Strategic Searching Approaches in a Multi-Robot System.
In Proceedings of the 2nd International Workshop on Multi-Agent Robotic Systems, pages 106-111
Copyright
c
SciTePress
region is given. When the searching task starts, the probability for each region will be
updated dynamically based on the new searching status. A one-step dynamic pro-
gramming is applied to formalize the utility function for each robot to reduce the
computation time. Based on this utility function, the decision making approach and a
non-zero-sum game theory are proposed to coordinate a team of robots for the search-
ing task. In addition, event-based discretisized approach rather that the fixed-time
iteration is applied to further decrease the computation complexity. Compared to
other multi-robot searching algorithms, the proposed approaches are more efficient
and robust than other heuristic algorithms, especially under USAR environment
where the wireless network is tend to be unreliable.
2 Searching Strategies
Assume there are N robots searching for a single target in an indoor area with J dif-
ferent regions. The initially priori probability of the target distribution is provided.
The searching area is discretized and partitioned into a finite collection of cells. The
robots can only be allowed to move to the adjacent ones. Initially, N robots start from
the entrance of the searching area. The team of robots can be homogeneous or het-
erogeneous in terms of their searching capabilities.
To make the decision making procedure to be numerical tractable, the discretiza-
tion of the searching procedure does not depend on a pre-defined fixed time interval,
instead, it only depends on the event. A robot is busy when it is searching inside a
region. Otherwise, it is free. Initially all robots are set as free. A new event happens
when a robot enters a region or finishes searching its current region, which can trig-
ger the update of the robot state. With this event-triggered discretization, the search-
ing time is updated at each event. Since the robots only communicate with each other
upon new event it can reduce the communication overhead significantly compared
with the fixed-time-interval discretization method.
2.1 Utility Function
The utility can be defined as the searching payoff value by selecting which region to
search on the next discrete time. For a multi-robot system, to improve the collective
searching efficiency, the utility value of each robot does not only depend on its own
payoff value, but also on other robots’ decisions.
Obviously, the utility associated with each robot depends on the probability of the
target at each region. The higher the probability, the higher the utility value should
be. When the searching task starts, the probabilities of all regions are updated dy-
namically based on the current searching results. For example, if one robot finishes
searching in region 1 without detecting the target, then the initial probability of the
target in region 1 is evenly distributed by all of unsearched regions on next discrete
time, which can be expressed in the following equation.
1+n
i
p
= 0; =
1+n
j
p
n
i
n
j
p
p
1
, j
i , j=1, 2, …, J. (1)
where represents the priori probability of the target in each region, J is the maxi-
mum region number and n is the current discrete time. However, this priority-only
i
p
107
based approach may tend to achieve the highest priority irrespective of the difficulty
of that goal. Therefore, we add the travel cost to the utility calculations.
The set of decisions made by the robot from 1 to N is denoted
by . The set of probabilities of the target from region 1 to region
J is denoted by , where represents the priori probability of the
target in each region, and i represents the region number. To obtain the optimal solu-
tion, a Dynamic Programming Equation (DPE) is applied to define the utility function
for robot n as follows.
[
Nn
ddd ,...,,...,
1
=D
]
][
J
ppp ,...,,
21
=P
i
p
),,,(),...,,(
21 cnnNn
fdddU TTPD=
=
{}
+
=
=
otherwisefpTTpgh
pTTpg
p
cnn
d
ddndd
ddndd
d
n
nnnn
nnnn
n
] ),
ˆ
,
ˆ
,
ˆ
(max)1(),,,()[(
1 if ),,,(
0 if ,0
ˆ
TTPDDD
D
(2)
where is the utility function of robot n, represent the decisions
made by robot 1 to robot N. represents the probability of target detection in
region by robot n. represents the payoff gain of robot n
searching the region , which is defined as follows:
n
U
N
ddd ,...,,
21
n
d
p
n
d
),,,(
nnn
dndd
pg TTD
n
d
),,,(
nnn
dndd
pg TTD
=
nn
n
dnd
d
TkTk
p
21
+
, (3)
where and represent the time required for robot n to navigate from its current
position to region , and the time required for a robot to cover region , respec-
tively. and are scale factors which can be adjusted based on different envi-
ronmental structures.
n
nd
T
n
d
T
n
d
n
d
1
k
2
k
{
}
),
ˆ
,
ˆ
,
ˆ
(max)1(
ˆ
cnn
d
d
fp
n
n
TTPD
represents the maximum ex-
pected utility of robot n by selecting different for the rest of the unsearched re-
gions after finishing the region with the assumption that other robots keep their
current decision during this recursive procedure.
n
d
ˆ
n
d
In general, the utility is zero if the probability of target detection in region is
zero. If the probability of the target detection in region is 1, which means this is
the last room need to be searched. In this case, the utility function is only related to
the payoff value by searching region . Otherwise, the utility is a recursive function
defined in (2). The dynamic programming is intractable for large-scale region num-
bers. To reduce the computational time, one-step dynamic programming solution is
applied in Equation (2). The average expected teammate contribution is computed as
the contribution that the teammate would make from its current pose. This approxi-
mation is reasonable when each step is relatively small.
n
d
n
d
n
d
Considering the situation that several robots may choose the same region simulta-
neously based on their own utility functions, which may decrease the overall search-
ing performance. We define a factor as follows:
)(Dh
==
+
=
otherwise
Nmnmddif
TT
T
h
nm
mini
ni
1
.,...,2,1,, ,
)(D
, (4)
108
where is the travel time for robot n from its current position to the selected region
i, is the total travel time for robot m (m can be multiple robots from 1 to N,
except n) from their current positions to region i. The definition of actually
embeds the coordination between the multiple robots by cutting down the utility
value, eventually it helps to prevent multiple robots picking up the same region simul-
taneously and improves the overall searching efficiency.
ni
T
mi
TΣ
)(Dh
2.2 Searching Strategies
Utility greedy (UG) approach is to select the next searching region with the highest
utility value calculated by (2). If more than one region has the same highest utility
values, the robot will randomly pick one from them. For the game-theory based (GT)
approach, it is modeled as a multi-player cooperative non-zero-sum game since the
coordination is embedded into the utility function through (4). The players choose
their strategies simultaneously at the beginning of the game. Although the overall
process of the searching is dynamic, we can treat it as a sequence of static game at
each discrete time.
According to the current positions of robots, obstacles, and the probability of the
target in each region, the utility matrix is calculated. Then the Nash Equilibrium (NE
) is applied for this nonzero-sum game. When no pure Nash equilibrium strategy
exits, a max-min method is applied to calculate the mixed-strategy equilibrium.
Let denotes the probability of robot n choosing region j, and
, we have .
(j)p
n
[]
T
nnnn
Jppp )()2()1( =P
Nn(j)p
J
1j
n
,...,2,1 ,1 =
=
=
Utility matrix is a N-dimensional matrix for N robots, where there are J (region
number) units at each dimension, and each cell of the matrix consists of N utility
values for each robot at the corresponding position. can be estimated by solving
the above linear equation, which can be simplified as
n
U
n
P
[
]
T
n
m
n
1000=×PU
.
Since the game is a finite strategic-form game, the existence of the mixed-strategy
equilibrium can be guaranteed. The region with the highest probability is chosen for
the next step.
3 Simulation Results
To evaluate the performance of the proposed game strategic searching approaches,
the simulations with two robots using MATLAB have been conducted. The search-
ing area is sketched as a square of 100 x 100 cells with multiple regions distributed. It
is assumed that each robot takes 1 time unit to traverse one cell in the simulation.
The searching time is calculated by the time units from the starting time till the target
is found.
Two heuristic searching strategies are proposed for comparison. First one is called
randomly selection (RS) approach, where each robot randomly selects the next region
to search from the unsearched regions. Second one is called probability-based (PB)
approach, where each robot only picks the region with the highest probability at any
109
discrete time as its next objective region. If more than one region has the same high-
est probability, the robot randomly picks one from them.
First set of simulation is conducted, where 1000 runs for all four approaches are
implemented. The simulation results of mean value and the corresponding variance
of searching times are shown in Fig. 1. It is obviously that the searching time of the
UG and GT approaches are much less than that of the PB approach since the travel
and searching time was ignored in the latter case, and the RS has the worst perform-
ance.
The performance of GT and UG approach are competitive, which is mainly depends
how the game theory is applied in the simulation. When both robots are free, the
utility matrix is calculated and their searching decisions are computed based on the
game strategy. However, if one robot is busy and the other one is free, the free robot
will make its decision only based on the utility value instead of starting a new game.
The motivation for this simplified procedure is to reduce the computational time. In
addition, since the other robot is busy in searching a region, it would make more
sense to let it finish its current searching instead of reselecting the searching region
again due to the new status. If the second case happens very often, then the overall
performance of game strategy tends to close to that of the utility greedy.
0
100
200
300
400
500
600
700
800
6 101520253040
Region number
Mean searching time (uni t)
GT
UG
PB
RS
0
50
100
150
200
250
300
350
400
6 101520253040
Region number
Searching time Variance (unit)
GT
UG
PB
RS
Fig. 1. Searching time (mean and variance) with different configurations.
In the real world, the prior probability of the target distribution is not accurate. To
explore the system robustness to the prior probability variations, another 1000 runs of
simulations are conducted for each strategy, where the target is distributed in the
searching regions with the variations of the priori probability from 10% to 50%. The
simulation results in the configuration of 10-room are shown in Fig. 2. As can be
seen, the PB approach is very sensitive to the probability variation since the probabil-
ity is the only criteria for the robot to make searching decisions. It makes sense that
the probability variation has no effect on the RS approach at all. The GT and UG
approach are much more robust than the PB approach, where the GT beats UG in
both mean searching time and variance. This indicates that the GT is more robust to
handle environmental uncertainty compared to UG, although we have to pay the pen-
alty of more computational time for GT.
0
20
40
60
80
100
120
140
160
180
200
10% 20% 30% 40% 50%
Variation of probability
Mean searching time (unit)
GT
UG
PB
RS
0
20
40
60
80
100
120
10% 20% 30% 40% 50%
Variation of probability
Searching time variance (unit)
GT
UG
PB
RS
Fig. 2. Searching time (mean and variance) with different variations in probability of target
distribution with a 10-room configuration.
110
5 Conclusions
Utility greedy and game-theory based strategic searching approaches are proposed in
this paper for a cooperative multi-robot searching task. Comparing to other heuristic
searching strategies, the simulation results demonstrated that the proposed two ap-
proaches are more efficient and robust. In addition, the game theory has guaranteed
better worst-case performance and be more robust to handle the environmental uncer-
tainty. Another advantage of using game theory based approach is that the explicit
communication between the robots can be reduced significantly due to their mutual
rationality. Therefore, it can be applied to some emergency scenarios, such as USAR,
where the RF communication tends to attenuate or even broken.
Our preliminary simulation only contains two homogeneous robots and one target
in the searching task. The proposed algorithm can easily be extended to the heteroge-
neous robots with different moving speeds and local sensing capabilities by setting up
different travel and covering time for each robot. Our future research will focus on
multi-target searching by a large scale robot ream.
References
1. Murphy, R.R.: Biomimetic search for urbane search and rescue. IEEE/RSJ International
Conference on Intelligent Robots and Systems, vol. 3, pp. 2073-2078 (2000).
2. Bourgault, F, Furukawa, T., and Durrant-Whyte, H.F.: Coordinated decentralized search
for a lost target in a bayesian world. IEEE/RSJ International Conference on Intelligent Ro-
bots and Systems (2003).
3. Lau, H, Huang, S., and Dissanayake, G. :Optimal search for multiple targets in a built
environment. IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS'05), Edmonton, Alberta, Canada (2005).
4. Lopez-Ortiz1, A., and Schuierer, S. : Online parallel heuristics and robot searching under
the competitive framework. 8th Scandinavian Workshop on Algorithm Theory, Turku,
Finland, July 3-5 (2002).
5. Kao, M., Reif, J. H., and Tate, S. R.:Searching in an unknown environment: An optimal
randomized algorithm for the cow-path problem. In Proc. 4th ACM-SIAM Sympos. Dis-
crete Algorithms, pages 441–447 (1993).
6. Baeza-Yates, R., Culberson, J., and Rawlins, R.: Searching in the plane. Informationand
Computation, 106:234–252 (1993).
7. Skrzypczyk, K.: Game theory based target following by a team of robots. Fourth Interna-
tional Workshop on Robot Motion and Control, June 17-20 (2004).
8. LaValle, S., and Hutchinson, S.: Path selection and coordination for multiple robots via
Nash equilibria. Proc. IEEE Int. Conf. Robot and Automation, pp. 1847-1852 (1994).
9. Hespanha, J. P., Prandini, M., and Sastry, S.:Probabilistic pursuit-evasion games: a one-
step Nash approach. In proc. of the 39
th
conf. on decision and control, vol. 3, (2000).
111