2 METHODS 
2.1  Soar and Reinforcement Learning 
Soar (Laird, 2012) is a cognitive architecture that 
scalably integrates a rule-based system with many 
other capabilities, including RL and long-term 
memory. The main decision cycle involves rules that 
propose new operators, as well as preferences for 
selecting amongst them; an architectural operator-
selection process; and application rules that modify 
agent state. The reinforcement learning module 
(Soar-RL) modifies numeric preferences for 
selecting operators based on a reward signal from 
an internal or external source. Soar has been 
used to model large-scale, complex cognitive 
functions for warfighting processes such as those in a 
kill chain (Jones et al., 1999). 
In this paper, we show how to use Soar, and 
specifically its reinforcement learning module 
(Soar-RL), to learn an effective combination of existing 
CID features for decision-making, as identified by 
experts and systems, in an operational environment. 
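The idea of adjusting numeric operator preferences from a reward signal can be illustrated with a minimal Python sketch. It is not Soar code and not the implementation used in this paper: it assumes a temporal-difference update, epsilon-greedy selection, and a toy one-step task. The operator names, the state label, the reward values, and the parameters alpha, gamma, and epsilon are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical operator names; none of these come from the paper.
OPERATORS = ["declare-friendly", "declare-foe", "declare-unknown"]

def select_operator(prefs, state, epsilon, rng):
    """Epsilon-greedy selection over the numeric preferences of proposed operators."""
    if rng.random() < epsilon:
        return rng.choice(OPERATORS)
    return max(OPERATORS, key=lambda op: prefs[(state, op)])

def td_update(prefs, state, op, reward, next_value=0.0, alpha=0.1, gamma=0.9):
    """Move one numeric preference toward reward + gamma * next_value."""
    target = reward + gamma * next_value
    prefs[(state, op)] += alpha * (target - prefs[(state, op)])

def train_toy(episodes=200, epsilon=0.2, seed=0):
    """One-step episodes: the assumed environment rewards only 'declare-foe'."""
    rng = random.Random(seed)
    prefs = defaultdict(float)  # numeric preference per (state, operator), initially 0
    state = "track-near-A"      # hypothetical symbolic state
    for _ in range(episodes):
        op = select_operator(prefs, state, epsilon, rng)
        reward = 1.0 if op == "declare-foe" else -1.0  # assumed reward signal
        td_update(prefs, state, op, reward)  # terminal step, so next_value stays 0
    return prefs

prefs = train_toy()
best = max(OPERATORS, key=lambda op: prefs[("track-near-A", op)])
print(best, round(prefs[("track-near-A", best)], 3))
```

In this toy setting the preference for the rewarded operator ends up highest, while the preferences for the penalized operators never rise above zero. In a multi-step task, the value of the next selected operator would be passed as next_value.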
2.2 Combat ID 
There are many challenges in the CID process, 
including 1) an extremely short time for fusion, 
decision-making, and targeting; 2) uncertain and/or 
missing data outside sensor (e.g., radar, radio) 
ranges; 3) manual decision-making; 4) 
heterogeneous data sources for decision-making; and 
5) multiple decision-makers in the loop. 
Existing CID methods, sensors, and systems 
fall into the following basic categories and 
methodologies: 
1.  Procedural. Procedural methods involve 
analysis of a target’s “behaviors,” to include 
such things as flight profile and point of origin. 
2.  Non-cooperative. These methods gather ID 
information on a target without that target’s 
intentional cooperation/participation.  
3.  Cooperative. Cooperative CID requires active 
participation on the part of the target. A 
common example would be an identification 
friend or foe (IFF) transponder.  
4.  Intelligence and ID Fusion methods. 
Information derived from various networks 
comprises the final CID method. 
The existing methods involve a wide range of 
participating platforms, such as destroyers, cruisers, 
carriers, F/A-18s, and E-2Ds; participating sensors, 
such as radar, Forward Looking Infrared (FLIR), 
Identification Friend or Foe (IFF), Precise 
Participant Location and Identification (PPLI), and 
National Technical Means (NTM); and participating 
networks and systems, such as the Aegis combat 
system, Cooperative Engagement Capability (CEC), 
and Link-16. The current process also draws on 
diverse doctrines, rules of engagement (ROE), 
knowledge databases, and expert systems as smart 
data. Many existing rules, expert systems, and 
smart data may be obsolete, incomplete, or have low 
confidence levels. Some models may conflict 
with each other, be wrong, or fail to adapt to a 
local environment. There is a critical need to 
research methodologies that better use, fuse, and 
improve on all these models to advance the art of 
CID at a higher symbolic level. 
This paper evaluates Soar-RL as a tool for this 
purpose because it can train and fuse the 
system at a symbolic level. The complex CID 
cognitive functions are first mapped to models 
of decision-making, sensor fusion, analytic 
processes, and workflow; Soar-RL is then 
applied to integrate them. 
CID decision-making requires a fusion of 
existing rules.  For example, as shown in Figure 2, a 
state at time t can be a track profile of a flying object 
with observable data containing longitude/latitude 
(x/y position), altitude (z), speed, acceleration, IFF, 
point of origin, heading, type, class, etc. The goal is 
to classify the object’s CID as friendly, foe, or 
unknown. One existing model might state: “if an 
unknown object is at position (x, y), there is a 
probability of $p_{11}$, $p_{12}$, or $p_{13}$ that the object’s point of 
origin is A, B, or C, respectively.” Another 
model might state: “if an unknown object’s point 
of origin is A, B, or C, there is a probability of $p_{21}$, 
$p_{22}$, or $p_{23}$, respectively, that the object is a foe.” 
When an object is observed at (x, y), the 
probability of the object being a foe is then taken as the maximum 
of the nine combined products $p_{11}p_{21}$, $p_{11}p_{22}$, $p_{11}p_{23}$, 
$p_{12}p_{21}$, $p_{12}p_{22}$, $p_{12}p_{23}$, $p_{13}p_{21}$, $p_{13}p_{22}$, and $p_{13}p_{23}$: 

$$P(\text{foe} \mid x, y) = \max_{i,\, j \in \{1, 2, 3\}} p_{1i}\, p_{2j}$$
 
 
Figure 2: Example of CID decision-making that requires a 
fusion of existing rules.
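The rule combination in this example, which takes the maximum over all pairwise products of the two models' probabilities, can be written as a short Python sketch. The function name fuse_max_product and the numeric values are hypothetical, chosen only for illustration; the paper gives no concrete probabilities.

```python
from itertools import product

def fuse_max_product(p_origin, p_foe):
    """Return the largest product p_1i * p_2j and the (i, j) index pair achieving it."""
    pairs = product(range(len(p_origin)), range(len(p_foe)))
    i, j = max(pairs, key=lambda ij: p_origin[ij[0]] * p_foe[ij[1]])
    return p_origin[i] * p_foe[j], (i, j)

# Hypothetical values, not taken from the paper.
p_origin = [0.6, 0.3, 0.1]  # p11, p12, p13: origin is A, B, or C given position (x, y)
p_foe = [0.2, 0.7, 0.5]     # p21, p22, p23: foe given origin A, B, or C

score, (i, j) = fuse_max_product(p_origin, p_foe)
print(f"max product = {score:.2f} from p1{i + 1} * p2{j + 1}")
```

With these assumed inputs, the largest of the nine products is p11 * p22, approximately 0.42. Returning the index pair along with the score makes it possible to see which two rules produced the fused value.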