
significant difference between these values is clear 
although the system considered here can be regarded 
as a simple one. Therefore, considering both the task 
completion ratios and the computational loads, the 
proposed approach, strategy-planned distributed 
Q-learning, yields appropriate and useful results. 
8 CONCLUSIONS 
In this paper, a new learning-based task allocation 
approach, Strategy-Planned Distributed Q-Learning, 
is proposed. The traditional Q-learning algorithm is 
defined for MDP environments. However, MRS 
environments are no longer Markovian because of 
the unpredictable behaviours of other robots and the 
presence of uncertainties. There are two major 
approaches to Q-learning for multi-agent systems: 
distributed and centralized. The proposed algorithm 
combines the advantages of both. It is a distributed 
learning approach in nature, but it assigns different 
learning strategies to the robots in a centralized 
manner. The experimental results show that the 
completion ratio of high-priority tasks increases for 
all three learning approaches, because the robots use 
their past task allocation experience in future task 
execution through their learning ability. The 
centralized learning approach produces the best task 
completion ratios for both high-priority and 
low-priority tasks. The proposed approach yields 
slightly lower task completion ratios than the 
centralized approach. However, the results indicate 
that the proposed algorithm provides reasonable 
solutions with a low learning space dimension and 
computational load. 
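As a rough illustration of why the learning space dimension matters, the sketch below compares Q-table sizes under an assumed tabular setting. It assumes a shared state space of `n_states` states and `n_actions` actions per robot, with the centralized learner indexing its table by joint action and each distributed learner keeping its own table. The function names and the example sizes are hypothetical and are not taken from the experiments reported here.

```python
def centralized_table_size(n_states: int, n_actions: int, n_robots: int) -> int:
    """Entries in a single Q-table indexed by state and joint action.

    Assumption: every robot has n_actions actions, so the joint action
    space has n_actions ** n_robots elements.
    """
    return n_states * n_actions ** n_robots


def distributed_table_size(n_states: int, n_actions: int, n_robots: int) -> int:
    """Total entries when each robot keeps its own state-action Q-table."""
    return n_robots * n_states * n_actions


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the growth trend.
    for n_robots in (2, 3, 4):
        central = centralized_table_size(10, 5, n_robots)
        distributed = distributed_table_size(10, 5, n_robots)
        print(f"robots={n_robots} centralized={central} distributed={distributed}")
```

Under these assumptions the centralized table grows exponentially with the number of robots, while the total size of the distributed tables grows linearly, which is consistent with the lower computational load noted above.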
ICINCO 2014 - 11th International Conference on Informatics in Control, Automation and Robotics