MINING ANOMALIES IN OBJECT-ORIENTED

IMPLEMENTATIONS THROUGH EXECUTION TRACES

Paria Parsamanesh, Amir Abdollahi Foumani

IBM Rational Software Group

Montreal, Quebec, Canada

Constantinos Constantinides

Department of Computer Science and Software Engineering, Concordia University

Montreal, Quebec, Canada

Keywords:

Software quality, anomalies, program comprehension, aspect mining, dynamic programming.

Abstract:

In the context of a computer program, the term “anomaly” is used to refer to any phenomenon that can neg-

atively affect software quality. Examples of anomalies in object-oriented programs include low cohesion of

modular units, high coupling between modular units and the phenomenon of crosscutting. In this paper we

discuss the theoretical component of a technique to identifying anomalies in object-oriented implementations

based on observation of patterns of messages (invoked operations). Our technique is based on the capturing of

execution traces (paths) into a relational database in order to extract knowledge of anomalies in the system, fo-

cusing on potential crosscutting concerns (aspects). In order to resolve ambiguities between candidate aspects

we deploy dynamic programming to identify optimal solutions.

1 INTRODUCTION

In the context of object-oriented development, the

term “anomaly” refers to any phenomenon that can

negatively affect the quality of software. Examples

include low cohesion of modular units, and high cou-

pling between modular units. Another example which

sometimes lies at the root of the previous two quality

problems is the phenomenon of crosscutting, whereby

the implementation of certain concerns is not well lo-

calized but it cuts across the class hierarchy of the sys-

tem, resulting in scattering of behavior and tangling of

code.

In this paper, we discuss the theoretical compo-

nent of a technique to identifying anomalies in object-

oriented implementations. Our proposal is based on

running use-case scenarios and capturing the corre-

sponding execution traces in a relational database

where we can then make observations over cer-

tain patterns of messages (invoked operations). The

knowledge we extract can provide information on po-

tential crosscutting concerns (aspect candidates). In

order to identify an optimal solution while choosing

an aspect among a collection of aspect candidates, we

deploy dynamic programming algorithms.

The rest of this paper is organized as follows: In

Section 2 we discuss the necessary theoretical back-

ground to this research. In Section 3 we discuss the

problem and motivation behind this research. In Sec-

tion 4 we present our proposal and in Section 5 we

present our methodology. In Section 6 we discuss

related work and comparisons to our proposal. We

conclude our work in Section 7 with a summary, dis-

cussion and pointers to future research directions.

2 THEORETICAL BACKGROUND

The principle of separation of concerns (Parnas,

1972; Dijkstra, 1976) refers to the realization of sys-

tem concepts into separate software units and it is a

fundamental principle to software development. The

associated beneﬁts include better analysis and un-

derstanding of systems, readability of code, a high-

level of design-level reuse, easy adaptability and good

maintainability. To this end, the notions of cohesion

and coupling are the fundamental measures of qual-

ity in an object-oriented software system. Cohesion

is the degree to which a module performs a single re-

sponsibility (i.e. it addresses a single concern). Cou-

pling (or dependency) is the degree to which each pro-

gram module relies on another module. Two objects

are said to be coupled if they act upon one another

(Chidamber and Kemerer, 1991). While structuring

object-oriented programs, aiming for high-cohesion

and low coupling are two general goals in order to

177

Parsamanesh P., Abdollahi Foumani A. and Constantinides C. (2006).

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES.

In Proceedings of the First International Conference on Software and Data Technologies, pages 177-189

DOI: 10.5220/0001321101770189

 SciTePress

support separation of concerns, thus providing sys-

tems which are easier to understand and maintain.

Despite the success of object-orientation in the ef-

fort to achieve separation of concerns, certain proper-

ties cannot be directly mapped in a one-to-one fash-

ion from the problem domain to the solution space,

and thus cannot be localized in single modular units.

Their implementation ends up cutting across the in-

heritance hierarchy of the system. Crosscutting con-

cerns (or “aspects”) include persistence, authentica-

tion, synchronization and logging. The “crosscutting

phenomenon” creates two implications: 1) the scatter-

ing of concerns over a number of modular units and 2)

the tangling of code in modular units. As a result, de-

velopers are faced with a number of problems includ-

ing a low level of cohesion of modular units, strong

coupling between modular units and difﬁcult compre-

hensibility, resulting in programs that are more error

prone.

Aspect-Oriented Programming (AOP) (Kiczales

et al., 1997; Elrad et al., 2001) explicitly addresses

those concerns which “can not be cleanly encapsu-

lated in a generalized procedure (i.e. object, method,

procedure, API)” (Kiczales et al., 1997) by introduc-

ing the notion of an aspect deﬁnition, which is a mod-

ular unit of decomposition. There is currently a grow-

ing number of approaches and technologies to support

AOP. One such notable technology is AspectJ (Kicza-

les et al., 2001), a general-purpose aspect-oriented

language, which has inﬂuenced the design dimen-

sions of several other general-purpose aspect-oriented

languages, and has provided the community with a

common vocabulary based on its own linguistic con-

structs. In the AspectJ model, an aspect deﬁnition is

a new unit of modularity providing behaviour to be

inserted over functional components. This behaviour

is deﬁned in method-like blocks called advice blocks.

However, unlike a method, an advice block is never

explicitly called. Instead, it is activated by an associ-

ated construct called a pointcut expression. A point-

cut expression is a predicate over well-deﬁned points

in the execution of the program which are referred to

as join points. When the program execution reaches

a join point captured by a pointcut expression, the as-

sociated advice block is executed. Even though the

speciﬁcation and level of granularity of the join point

model differ from one language to another, common

join points in current language speciﬁcations include

calls to methods and execution of methods. Most

aspect-oriented languages provide a level of granu-

larity which speciﬁes exactly when an advice block

should be executed, such as executing before, after,

or instead of the code deﬁned at the associated join

point. Furthermore, much like a class, an aspect def-

inition may contain state and behavior. It is also im-

portant to note that AOP is neither limited to object-

oriented programming nor to the imperative program-

ming paradigm. However, we will restrict this discus-

sion to the context of object-oriented programming.

3 PROBLEM AND MOTIVATION

As the complexity of software systems increases, it

becomes more challenging to build software that is

free of certain quality problems, collectively referred

to as “anomalies.” Given an object-oriented program,

we would like to be able to achieve program com-

prehension in order to identify anomalies at speciﬁc

points over the implementation so that we can then

make decisions about possible transformations such

as refactoring or reengineering activities.

4 PROPOSAL

In this section we discuss our proposal to aid in the

provision of high-quality systems. Several techniques

have already been proposed in the literature for de-

tecting low cohesion, high coupling and crosscutting

in object-oriented implementations (Robillard and

Murphy, 2001; Robillard and Murphy, 2002; Krinke

and Breu, 2004; Breu and Krinke, 2004; Moldovan

and Serban, 2006). In this work we build on current

proposals by discussing the theoretical component of

a technique to identify anomalies in object-oriented

programs based on the monitoring of execution traces

and the identiﬁcation of patterns of invoked opera-

tions. More speciﬁcally, our proposal is based on run-

ning use-case scenarios and capturing the correspond-

ing execution traces into a relational database. We can

then apply certain strategies which can help us iden-

tify anomalies. In cases of the existence of several

candidate aspects, we can deploy dynamic program-

ming to resolve ambiguities.

5 METHODOLOGY:

EXTRACTING KNOWLEDGE

In this section we discuss how our proposal is being

realized.

Consider the relational database schema in Figure 1

which can store message sequences at run time. The

main entities of the schema are deﬁned as follows:

• Class: A class encapsulates state and behavior.

Class instances (objects) send and receive mes-

sages.

• Method: Methods collectively make the behavior

of a class. In this model we only consider those

methods which appear in execution paths.

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES

178

• Message: A message is an invocation of a method

involving a caller (sender) object and a callee (re-

ceiver) object. A message is received and acted

upon by a method inside a class deﬁnition, i.e.

< message >::= object − name.operation −

name. A message implements a partial responsi-

bility of a system operation and it forms part of a

use-case scenario.

• Execution path: A use-case scenario is described as

a sequence of interactions between entities. During

requirements analysis, this sequence is described

at a high-level of abstraction in terms of the inter-

action between an actor and the system where the

latter is viewed as a black box. During design, a

ﬁne grained model captures sequences of interac-

tions between class instances through an ordered

sequence of message passing. We deﬁne this or-

dered sequence of message passing as an execution

path, i.e. < executionpath >::=< message

ExecutionPath

PKPATH_ ID

NUMBER_OF_EXECUTION

Message

PK MESSAGE_ID

FK1 PATH_ID

FK2 METHOD_ID

MESSAGE_ORDER

Class

PKCLASS_ID

CLASS_NAME

Method

PK METHOD_ID

FK1 CLASS_ID

METHOD_NAME

Figure 1: Relational database schema for storing execution

paths.

Main idea: Our technique to identify anomalies in

object-oriented implementations is based on captur-

ing execution paths as data from the runtime environ-

ment and populating a relational database. Data can

then be analyzed in order to provide information in

the form of meaningful statistics. In doing so, we can

proceed as follows:

1. Generate data: Execute all possible use-case sce-

narios. Each use-case scenario is comprised by a

sequence of execution paths.

2. Capture data: Capture execution traces as data and

store them into a relational database schema.

3. Analyze data: We identify patterns of messages

in execution paths. By applying certain strategies

which we have developed (to be discussed next),

we can identify anomalies based on the result sets.

In the next subsections we will describe the three

different types of knowledge on anomalies we wish to

acquire from the relational database schema, namely

low cohesion, high coupling and crosscuttings. We

deﬁne algorithms to identify the three types of anoma-

lies.

5.1 Identifying Low Cohesion

We provide the following deﬁnitions:

Deﬁnition 5.1 The term “fanin” is used to deﬁne the

number of messages that a given object receives over

a single execution path. The term is also deﬁned for

methods.

Deﬁnition 5.2 The term “fanout” is used to deﬁne

the number of messages that a given object sends to

other objects over a single execution path. The term

is also deﬁned for methods.

Deﬁnition 5.3 A class is deﬁned to be independent

in the context of a use-case scenario, if 1) the class

forms an integral part of the use-case scenario and 2)

the class is not dependent on any other classes within

the use-case scenario. We can consider independent

classes as potential aspects (Foumani and Constan-

tinides, 2005a; Foumani and Constantinides, 2005b).

We would like to obtain the following knowledge:

• Classes that are involved in (short length) scenarios

that have large fanin and low fanout can be consid-

ered as having low-cohesion or having an indepen-

dent role in a use-case scenario.

• Methods that are involved in (short length) sce-

narios that have large fanin and low fanout can

be considered as having low-cohesion or having

an independent role in a use-case scenario. These

methods can be separated from their original class

through refactoring and they can be captured as ad-

vice blocks inside aspect deﬁnitions.

Deﬁnition 5.4 We deﬁne “class independency fac-

tor” (CIF ) as

CIF =

(class fanin)

(class fanout)

In the case where class f anout = 0 (in which

case CIF → ∞), the class is considered as one with

an independent role and it can be a candidate aspect.

To calculate this factor we need to execute statements

to measure class fanin, and class fanout which is de-

scribed in the subsequent subsections.

5.1.1 Measuring Class Fanin

In the following query we iterate over all messages

in all execution paths where a unique identiﬁer for a

given class, CLASS

ID, is provided, and we count the

number of references to the class:

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES

179

Select count(mth.CLASS_ID)

From Message msg

Join Method mth

On mth.METHOD_ID = msg.METHOD_ID And

mth.CLASS_ID = [CLASS ID]

5.1.2 Measuring Class Fanout

In the following query we count the number of classes

referenced by a given class.

Select count(Distinct mth_out.CLASS_ID)

From Message msg

Join Message msg_out

On msg_out.PATH_ID = msg.PATH_ID

And msg_out.MESSAGE_Id <> msg.MESSAGE_ID

And msg_out.MESSAGE_ORDER = msg.MESSAGE_ORDER + 1

Join Method mth

On mth.METHOD_ID = msg.METHOD_ID

And mth.CLASS_ID = [CLASS ID]

Join Method mth_out

On mth_out.METHOD_ID = msg_out.METHOD_ID

And mth_out.CLASS_ID <> [CLASS ID]

The queries in 5.1.1 and 5.1.2 enable the calcu-

lation of the CIF factor for any given class. We are

interested in classes with high CIF values (percent-

ages). These classes can be considered as a bottle-

neck in an object-oriented system for the following

two reasons:

1. They may have many responsibilities (low cohe-

sion). In this case we can follow a refactoring

strategy to break the responsibilities and achieve a

higher separation of concerns.

2. They may have an independent role (and can be po-

tential aspects).

Deﬁnition 5.5 We deﬁne “method independency fac-

tor” (MIF), as

MIF =

(method fanin)

(method fanout)

In the case where method fanout = 0 (in which

case M IF → ∞), the method can be modeled as an

advice block inside an aspect deﬁnition as this is an

independent entity. In order to calculate this factor

we need to execute similar queries to those we de-

scribed above to calculate the fanin and fanout values

of each method. We are interested in methods with

high MIF values (percentages). These methods can

also be considered as a bottleneck for the same rea-

sons described above for classes. As in the case with

classes, object-oriented refactoring or migration to an

aspect-oriented context can provide a viable solution.

Deﬁnition 5.6 To identify classes with low cohesion,

we deﬁne class cohesion factor (CCH),

CCH =

(number of classes in execution path)

(number of methods in execution path)

The CCH factor deﬁnes the distribution of invoked

methods in an execution path between classes. In

other words, CCH deﬁnes the distribution of respon-

sibilities between classes. According to this deﬁnition

we may have the following cases:

• CCH ≪ 1: Many methods of a few classes are

called in a given execution path. This implies that

we have classes with low cohesion.

• CCH ≈ 1: Logical distribution of responsibility

between methods of classes. To maintain high co-

hesion we need to have a CCH value very close

to 1. This would imply that the number of in-

voked methods and the number of classes involved

in a use-case scenario should be very close to each

other. We can identify an exception when there is

only one class and one method referenced in the ex-

ecution path. In this case, even though CCH = 1,

we would still have low cohesion of the given class

and its corresponding method.

Table 1 provides different illustrative cases for cal-

culating and interpreting CCH.

5.2 Identifying High Coupling

To investigate coupling and identify classes with high

coupling we need to calculate two factors:

1. CCH: class cohesion factor. Recall that CCH ≪

1 implies that many methods of a few classes are

invoked in a given execution path.

2. The fanin of each method. This can be provided by

the count of the number of invocations originated

from (methods of) other classes.

Deﬁnition 5.7 For two classes C

and C

with meth-

ods M

and M

respectively, we deﬁne “class cou-

pling” factor (CCP ) as

CCP =

Ci.fanin(M

, C

)

Cj.fanin(M

, C

)

CCP =

fanin(Ci, Cj)

fanin(Cj, Ci)

The term C

.fanin(M

, C

) refers to the number

of invocations for method C

that are originated

from C

. The term f anin(C

, C

) refers to the num-

ber of messages received in C

from C

. If CCP ≈ 1

(the summation of fanin of methods of class Ci is

very close to the summation of fanin of methods of

class Cj) we can say that the degree of coupling be-

tween these two classes is high. If one of these sum-

mations is zero, then we will have low coupling. If

.fanin(M

, C

) ≫ 0, or Cj.f anin(M

, C

) ≫ 0,

then we identify another type of anomaly which we

call a “high dependency” of C

over C

or vice versa.

Table 2 provides different illustrative cases for calcu-

lating and interpreting CCP .

Example: Consider Figure 2 which illustrates an

example execution path. Initially, we need to obtain

CCH to see whether many methods of few classes

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES

180

Table 1: Examples of class cohesion through the CCH factor with interpretations.

# of classes # of methods CCH Interpetation

1 1 1 Since this is the only class in the execution path, this implies

that the class has low cohesion.

1 5 0.2 Low cohesion at the class level.

2 20 0.1 Low cohesion at the class level.

4 4 1 High cohesion at the class level; Logical distribution of

responsibilities between classes.

5 7 0.7 High cohesion at the class level. In this case we still may have

low cohesion at the method level; To identify low cohesion at

the method level we need to calculate the MIF factor for each

method. We consider methods with high M IF value as

candidate methods with low cohesion.

are invoked during execution. CCH is obtained as

follows:

CCH =

= 0.25 ≪ 1

The very small value of CCH indicates a potential

high coupling between the involved classes. To inves-

tigate high coupling further, we must read the CCP

value, which can be obtained as follows:

.fanin(M

, C

) = fanin(C

, C

) +

fanin(C

, C

) +

fanin(C

, C

) +

fanin(C

, C

)

= 0 + 1 + 1 + 2

= 4

.fanin(M

, C

) = fanin(C

) +

fanin(C

) +

fanin(C

)

= 2 + 1 + 1

= 4

CCP =

= 1

The value of CCP indicates that there is indeed

high coupling between the two classes C

and C

M4

M5

M6

M7

Class Ci Class Cj

M3

M2

M1

Figure 2: Illustration of high coupling.

5.3 Identifying Crosscutting

To identify crosscutting we ﬁrst need to obtain a set

of execution paths. Consider the following:

<path1> ::=

<...,obj2.op2,obj3.op3,obj4.op4,obj5.op5,...>

<path2> ::=

<...,obj2.op2,obj3.op3,obj4.op4,obj5.op5,...>

<path3> ::=

<...,obj2.op2,obj3.op3,obj4.op4,obj5.op5,...>

<path4> ::=

<...,obj1.op1,obj2.op2,...>

<path5> ::=

<...,obj1.op1,obj2.op2,...>

<path6> ::=

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES

181

Table 2: Interpretation of class dependency.

.fanin(M

, C

)

.fanin(M

, C

)

CCP Interpretation

0 ≫ 1 ∞ High dependency of C

over C

0 1 ∞ Low coupling.

6 7 ≈ 1 or > 1 High coupling.

<...,obj1.op1,obj2.op2,...>

<path7> ::=

<...,obj4.op4,obj5.op5,obj6.op6,...>

<path8> ::=

<...,obj4.op4,obj5.op5,obj6.op6,...>

<path9> ::=

<...,obj3.op3,obj4.op4,obj5.op5,obj6.op6,...>

<path10> ::=

<...,obj3.op3,obj4.op4,obj5.op5,obj6.op6,...>

A crosscutting path, indicated by S

, is one whose

sequence of messages occurs in several execution

paths. Consider the following crosscutting paths

based on the above execution paths S

1−10

Sa(path 1,2,3):

<obj2.op2, obj3.op3, obj4.op4, obj5.op5>

Sb(path 4,5,6):

<obj1.op1, obj2.op2>

Sc(path 7,8):

<obj4.op4, obj5.op5, obj6.op6>

Sd(path 9,10):

<obj3.op3, obj4.op4, obj5.op5, obj6.op6>

We deﬁne S as the set of all sequences S

. Essen-

tially S is a set of candidate aspects. Aspect candi-

dates should not contain common operations because

this will violate the semantics (logic) of the initial

object-oriented system. As a result, intersections be-

tween the aspect candidate sets constitute redundan-

cies. We, therefore, need to identify which one among

those sets (or subsets of them) can constitute better

candidate aspect sets. In the following subsections we

describe an algorithm in order to identify aspect can-

didates in situations when selecting aspect candidates

is ambiguous.

5.3.1 Identifying Orthogonal Aspect Candidate

Sets

This subsection presents an algorithm to address the

issue of redundancies and the identiﬁcation of strong

candidate aspects. The goal is to eliminate redun-

dancies by identifying aspect candidate sets that do

not have common operations. The algorithm has two

steps. In the ﬁrst step the algorithm eliminates the

longest common sets from the list of message se-

quences. The longest common set (there could be

more than one) is one which can be obtained by the

union of others. As a result, we can remove the

longest common sets from S, the set of aspect candi-

dates. In the second step the algorithm ﬁnds orthogo-

nal sets. An orthogonal set is one which does not have

any common element with any other set.

Suppose S = {S

, S

, . . . , S

} is a set

of message sequences and we suppose S

is a mes-

sage sequence in the set with the longest common set.

Step 1: Eliminating longest common sets. In order

to identify a longest common set, we deﬁne S

′

= S −

. S

is the longest common set if the following

conditions hold:

∃ S

∈ S

′

: S

6= ∅

[

i=1

= S

, ∀ S

∈ S

′

The ﬁrst condition satisﬁes that a given set has

common elements with some of the other message se-

quences. The second condition satisﬁes that S

can

be covered by the union of some of the other message

sequences. We can now deﬁne set S

′

as a new set

of aspect candidates by removing S

from set S. We

repeat this process for all the other members of the as-

pect candidate set until there is no member that can be

covered by a union of other members. In the example,

we identify S

as the longest common set (Figure 3),

since:

6= ∅

= ∅

6= ∅

= S

[

Figure 4 illustrates S, the set of all message se-

quences and set of candidate aspects, after having re-

moved S

, the longest common message sequence.

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES

182

op2

Sa

op3 op4 op5

op2

Sa

op3 op4 op5

op2

Sa

op3 op4 op5

op1

Sb

op2

op1

Sb

op2

Sc op4 op5 op6

Sd op4 op5 op6op3

op1

Sb

op2

Figure 3: Illustration of execution paths. S

is the longest

common message sequence.

op2

Sa

op3 op4 op5

op2

Sa

op3 op4 op5

op2

Sa

op3 op4 op5

op1

Sb

op2

op1

Sb

op2

Sc op4 op5 op6

Sc

op4 op5 op6

A1 A2 A3 A4 A5

op1

Sb

op2

Figure 4: Orthogonal sets.

Step 2: Find orthogonal sets. This would imply

eliminating the vertical redundancies of operations in

Figure 4. We deﬁne S

′

= S − S

such that S

one of the longest message sequence and the follow-

ing conditions hold:

∃ S

∈ S

′

: S

6= ∅

[

i=1

6= S

, ∀ S

∈ S

′

The ﬁrst condition satisﬁes that a given set has

common elements with at least one other set. The

second condition satisﬁes that S

cannot be deﬁned

by the union of other sets. We can use the following

algorithm to identify orthogonal sets (which are con-

sidered aspect candidates):

1. Find common members of S

with S

as C

set

∃ S

∈ S

′

: S

= C

2. Deﬁne C

′

as a set of all members of S

that are

not in S

− C

= C

′

, ∀ S

∈ S

′

3. Deﬁne S

′

as a set of S

members except the

members that belong to C

′

= S

−

[

i=1

, ∀ S

∈ S

′

4. Replace set S with a new set of aspect candidates

such that each S

in S is replaced by C

and C

′

and

also S

replace with S

′

5. We repeat steps 1 to 4 until we have reached the

following condition:

= ∅, ∀ S

, S

∈ S and i 6= j

This condition satisﬁes that there are no two sets

with common members in S. In this case, the mem-

bers of S can be deﬁned as aspect candidates.

Applying this algorithm to the example (Figure 4)

where S = { S

, S

}, and starting with sequence

, we obtain the following:

1. Find common members of S

with S

and S

and C

sets:

= S

= {op2}

= S

= {op4, op5}

2. Deﬁne C

′

and C

′

as sets of all members of S

and S

that are not in S

′

= S

− C

= {op1}

′

= S

− C

= {op6}

3. Deﬁne S

′

as a set of S

members except the

members that belong to C

and C

′

= S

− (C

) = {op3}

4. Replace set S with a new set of aspect candidates

such that each S

in S is replaced by C

and C

′

, S

S is replaced by C

and C

′

and S

replace with S

′

S = {S

′

, C

′

, C

′

}

5. Since S holds orthogonal sets, we do not need to

repeat the algorithm.

In Figure 4 we identify A

1−5

as orthogonal sets,

all these being potential aspects. In general, it is

highly unlikely that aspects are completely orthogo-

nal of each other. When aspects affect the same join

point in one particular sequence, then the order of

aspect (advice) execution is important, as this must

preserve the semantics of the original object-oriented

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES

183

system. For example for the aspect candidates in Fig-

ure 4, aspects should be deﬁned with the following

precedence:

−→ A

This algorithm identiﬁes a set of aspects by break-

ing down the clone message sequences into the lowest

level of their method invocations such that there are

no conﬂicts between these sets. However, we believe

that this cannot be an optimal solution for the problem

since it does not consider the following parameters:

1. Number of identiﬁed aspects.

2. Number of different types of concerns addressed by

each aspect deﬁnition. An aspect deﬁnition may be

deﬁned with low cohesion.

3. Number of messages in a given execution path.

4. Number of execution paths in a given use-case sce-

nario.

5. Number of runs of all execution paths.

In order to provide optimal solution for the second

step of the algorithm, we deploy dynamic program-

ming which is discussed in the next subsections.

5.3.2 Redundancy of Adjacent Invocations (Rai)

In this subsection we introduce a dynamic program-

ming algorithm to identify aspect/advice candidates

based on a set of given parameters. Dynamic pro-

gramming (Cormen et al., 2002) is typically applied

to optimization problems. In such problems there can

be many possible solutions. Each solution has a value,

and we wish to ﬁnd a solution with the optimal (mini-

mum or maximum) value. In this example, we wish to

deﬁne parts of message sequences as an aspect (such

that the selected operations can be modeled as an ad-

vice block) in such a way that the message sequence

satisﬁes the following conditions:

1. It is used in the maximum number of paths.

2. It has the maximum RAI value.

3. It contains the minimum number of classes.

4. It contains the minimum number of operations.

We believe this strategy can identify a stronger set

of aspect candidates since it selects the most utilized

operations. By minimizing the number of classes and

operations we achieve a better modularity. We ini-

tially deﬁne two new factors to detect clone message

sequence and also we introduce new algorithms to

identify aspect candidates.

Deﬁnition 5.8 We deﬁne “redundancy of adjacent in-

vocations” (RAI) of a given execution path to be the

Table 3: Statistics for message sequences in Figure 3.

Message Sequence NC NO NX RAI

4 4 1000

2 2 100

2 3 100

ratio of the recurrence of a sequence S

of adjacent

messages, over the total number of execution paths.

RAI(path) =

(recurrence of a message sequence)

(total number of paths)

As a result, for each execution path in the example

of Section 5.3 we have the following RAI measures:

RAI(S

) =

RAI(S

) =

RAI(S

) =

From the set S = {S

, S

}, we need to select

a message sequence or sequences that will be the best

candidate for an aspect. Table 3 provides illustrative

cases for the following parameters:

NC: Number of classes in a given execution path.

NO: Number of different operations in a given exe-

cution path.

NX: Number of execution times of given execution

paths.

We deﬁne the following recurrence equation to se-

lect the next aspect candidate from set S and we de-

ﬁne S

′

as a new set of aspect candidates. Each time

we will add S

that is selected from the following

equation into set S

′

= ∅, ∀ S

, S

∈ S.

Max(NX),

Max(RAI),

Min(NC),

Min(NO)

otherwise.

According to this recurrence equation, we select

that does not have any common member with

the other S

’s in set S. For S

’s that have common

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES

184

members we recursively select S

that has a max-

imum NX value, maximum RAI value, minimum

NC value and minimum OP value. For the selected

we apply the following formula to identify aspect

candidates:

= S

−

[

i=1

, ∀ S

∈ S

′

In the example, by applying this algorithm we

select S

as the ﬁrst candidate aspect (as it has

MAX(N X)). The second aspect candidate accord-

ing to Table 3 would be S

and since there is no in-

tersection between S

and S

, we identify the latter

as a new aspect candidate. The last aspect candidate,

, has an intersection with both S

and S

. For this

reason we apply the above formula for S

to obtain a

strong aspect candidate, i.e.

= S

− (S

[

) = op3

Our RAI optimization strategy has decreased the

number of candidate aspects from ﬁve down to three.

5.3.3 Family Related Dependency (Frd)

Consider the following paths that contain method in-

vocations in the same class, disregarding any possible

gaps between them:

<path1> ::=

<...,obj1.op1,...,obj1.op2,...,obj1.op3,...>

<path2> ::=

<...,obj1.op1,...,obj1.op2,...>

<path3> ::=

<...,obj1.op1,...,ob1.op2,...,obj1.op3,...>

<path4> ::=

<...,obj2.op1,...,obj2.op2,...>

<path5> ::=

<...,obj2.op1,...,obj2.op2,...,obj2.op3,...>

<path6> ::=

<...,obj1.op5,...,obj1.op6,...,obj1.op7,...,obj1.op8,...>

<path7> ::=

<...,obj1.op5,...,obj1.op6,...,obj1.op7,...,obj1.op8,...>

<path8> ::=

<...,obj1.op5,...,obj1.op6,...,obj1.op7,...,obj1.op8,...>

<path9> ::=

<...,obj1.op9,...,obj1.op10,...,obj1.op111,...>

<path10> ::=

<...,obj1.op9,...,obj1.op10,...,obj1.op111,...>

We deﬁne S

as a sequence of messages that occurs

in several paths. For example,

Sa(path 1, 3): <obj1.op1, obj1.op2, obj1.op3>

Sb(path 1, 2, 3):<obj1.op1, obj1.op2>

Sc(path 4, 5):<obj2.op1, obj2.op2>

Sd(path 6, 7, 8):<obj1.op5, obj1.op6, obj1.op7, obj1.op8>

Se(path 9, 10):<obj1.op9, obj1.op10, obj1.op111>

Deﬁnition 5.9 We deﬁne “family-related depen-

dency” (FRD) on a given execution path as the ratio

of the recurrence of a sequence S

of messages sent

to the same class, over the total number of execution

paths.

F RD(path) =

(number of selected paths)

(total number of paths)

As a result, for each execution path we have the

following F RD measures:

F RD(S

) =

F RD(S

) =

F RD(S

) =

F RD(S

) =

F RD(S

) =

From the set S = {S

, S

}, we need to

select a sequence or sequences that will be the best

aspect candidate. Identifying orthogonal aspect algo-

rithm as mentioned before cannot ﬁnd the optimal set

of aspects. In the case of family independent depen-

dency, we introduce another dynamic programming

algorithm that uses the following parameters. Table 4

provides illustrative cases for these parameters:

1. Number of different classes in each message se-

quence (NC)(this parameter in F RD is always

equal to 1).

2. Number of different operations in each message se-

quence (NO).

3. F RD value for message sequences.

We deﬁne the following recurrence equation to se-

lect the next aspect candidate from Set S and we de-

ﬁne S

′

as a new set of aspect candidates. Each time

we will add S

that is selected from the following for-

mula into Set S

′

.obj /∈ S

.obj, ∀ S

∈ S, j 6= i.

Max(FRD),

Min(NO)

otherwise

This algorithm selects all the message sequences

that use an object that is not used in any other mes-

sage sequences; For example, S

in Table 4 uses obj2

that is not used in the other message sequences. For

the other message sequences we recursively select the

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES

185

Table 4: Parameters for F RD.

Message Sequence NC NO FRD obj

1 3

obj1

1 2

obj1

1 2

obj2

1 4

obj1

1 3

obj1

message sequence with maximum F RD and mini-

mum number of OP . For each selected S

we apply

the following formula to identify advice candidates:

= S

−

[

i=1

, ∀ S

∈ S

′

In the example, by applying this algorithm we se-

lect S

as the ﬁrst candidate aspect (as S

.obj(=

obj2) /∈

(S.obj − S

.obj)(= obj1)). The second

aspect candidate according to Table 4 would be S

We apply the above formula for S

to obtain a strong

aspect candidate:

= S

− S

= {obj1.op1, obj1.op2}

As the third aspect candidate we identify S

that

has MAX(F RD):

= S

− (S

[

)

= {obj1.op5, obj1.op6, obj1.op7, obj1.op8}

Nest aspect candidate is S

or S

since both has the

same F RD and N O, In this case, we will select one

of the message sequences randomly. for example, we

select S

= S

− (S

[

)

= {obj1.op9, obj1.op10, obj1.op111}

and ﬁnally we apply the above formula for S

since S

has some common members with S

, we just

select the memebers in S

that are not in S

= S

− (S

[

)

= {obj1.op3}

5.3.4 Combining Strategies: Rai and Frd

In Section 5.3.2 we described that with the calculation

of the RAI metric we are able to identify aspects by

detecting clones based on method invocations. These

methods belong to the different classes and are called

one after another in order to play a speciﬁc part of

the algorithms in the different scenarios. We claim

that we can extract these methods, and model them

as an aspect. In Section 5.3.3, we discussed how to

detect clones based on method invocations such that

the methods belong to the same class but can be called

in different scenarios in the same order and pattern.

Family-related dependency methods can be modeled

as different advices of an aspect. In this subsection

we combine RAI and FRD. Consider the execution

paths illustrated in Table 5 (the result of calculating

RAI values) and in Table 6 (the result of calculating

F RD values). In Table 7 we show F RD and RAI as

well as certain other parameters such as the number

of classes, and the number of operations.

<path1> ::=

<..., obj1.op1, obj2.op2,

..., obj1.op11,

..., obj2.op22, obj1.op111, obj3.op3,

...>

<path2> ::=

<..., obj1.op1, obj2.op2,

..., obj2.op22, obj1.op111,

...>

<path3> ::=

<..., obj1.op1, obj2.op2,

..., obj1.op11,

..., obj2.op22, obj1.op111, obj3.op3,

...>

We introduce a three-step algorithm to identify as-

pects in these cases:

Step 1: We deploy the F RD algorithm to identify

aspects and their corresponding advices. In our exam-

ple, we identify the following aspect/advices:

Aspect{Advice{obj3.op3}}

Aspect{Advice{obj2.op2}}, Advice{obj2.op22}}

Aspect{Advice{obj1.op1}}, Advice{obj1.op111}}

Aspect{Advice{obj1.op11}}

Step 2: We remove operations selected in the ﬁrst

step from sets identiﬁed for RAI. In this case we will

have three different situations:

1. Selected operations at the beginning of a message

sequence in the RAI set. In this case we can re-

move them from the set. The remaining operations

in the set will deﬁne a new set.

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES

186

Table 5: RAI metric for paths.

Path Recurring Sequence RAI

{path1, path2, path3} S1:{obj1.op1, obj2.op2}

{path1, path2, path3} S2:{obj2.op22, obj1.op111}

{path1, path3} S3:{obj2.op22, obj1.op111, obj3.op3}

Table 6: F RD metric for paths.

Path Recurring Sequence FRD

{path1, path2, path3} S4:{obj1.op1, obj1.op111}

{path1, path3} S5:{obj1.op1, obj1.op11, obj1.op11}

{path1, path3} S6:{obj3.op3}

{path1, path2, path3} S7:{obj2.op2, obj2.op22}

Table 7: RAI & F RD metrics for paths.

Path RAI FRD NC NO Obj

S1:{obj1.op1, obj2.op2}

- 2 2 -

S2:{obj2.op22, obj1.op111}

- 2 2 -

S3:{obj2.op22, obj1.op111, obj3.op3}

- 3 3 -

S4:{obj1.op1, obj1.op111} -

1 2 obj1

S5:{obj1.op1, obj1.op11, obj1.op11} -

1 3 obj1

S6:{obj3.op3} -

1 1 obj1

S7:{obj2.op2, obj2.op22} -

1 2 obj1

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES

187

2. Selected operations at the end of a message se-

quence in the RAI set. In this case we can remove

them from the set the same as case 1

3. Selected operations are in the middle of a message

sequence in the RAI set. In this case we can divide

the RAI set into two sets (or more) by removing

selected operations from step 1.

Step 3: We deploy the RAI algorithm to the new

set of message sequences. Here, we stress the impor-

tance of maintaining the precedence of aspect execu-

tion in order to preserve the semantics of the original

object-oriented system.

6 RELATED WORK

A number of studies of software metrics such as cou-

pling and cohesion to evaluate the complexity of soft-

ware have been carried out. However, none of these

proposals directly address the calculation software

metrics at run-time in order to identify aspects.

In (Gupta and Rao, 2001), the authors measure

module cohesion in legacy software. They compared

statically calculated metrics against a program execu-

tion based approach of measuring the levels of mod-

ule cohesion. In (Mitchell and Power, 2003), the au-

thors adapt two common object-oriented metrics, cou-

pling and cohesion, and apply them at run-time. In

(Moldovan and Serban, 2006), the authors describe

a new approach in aspect mining that uses clustering

to identify the methods that have the code scattering

symptom. In this method, for a method, they consider

a large numbers of calling methods and a large num-

bers of calling classes as indications of code scatter-

ing. In order to group the best methods (candidates)

they use a vector-space model for deﬁning the similar-

ity between methods. In (Breu and Krinke, 2004), the

authors describe an automatic dynamic aspect min-

ing approach which deploys program traces gener-

ated in different program executions. These traces

are then investigated for recurring execution patterns

based on different constraints, such as the requirement

that the patterns have to exist in a different calling

context in the program trace. In (Krinke and Breu,

2004), the authors describe an automatic static as-

pect mining approach, where the control ﬂow graphs

of a program are investigated for recurring execu-

tions based on different constraints, such as the re-

quirement that the patterns have to exist in a differ-

ent calling context. In (Robillard and Murphy, 2002),

the authors introduce a concern graph representation

that abstracts the implementation details of a con-

cern and it makes explicit the relationships between

different elements of the concern for the purpose of

documenting and analyzing concerns. To investigate

the practical tradeoffs related to this approach, they

developed the Feature Exploration and Analysis tool

(FEAT) that allows a developer to manipulate a con-

cern representation extracted from a Java system, and

to analyze the relationships of that concern to the code

base. In (Robillard and Murphy, 2001), the authors

describe concerns based on class members. This de-

scription involves three levels of concern elements:

use of classes, use of class members, and class mem-

ber behavior elements (use of ﬁelds and classes within

method bodies). Use of classes is expressed by the

class-use production rules. The rules specify that

a concern either uses the entire class to implement

its behavior or only part of a class, as well as what

parts of the class participate in the implementation of

the particular concern. In (Bruntink, 2004), the au-

thors deﬁne certain clone class metrics that measure

known maintainability problems such as code dupli-

cation and code scattering. Subsequently, these clone

class metrics are combined into a grading scheme

designed to identify interesting clone classes for the

purpose of improving maintainability using aspects.

In (Baxter et al., 1998), the authors use an abstract

syntax tree (AST) to detect duplicated code (clones).

This technique uses parsers to ﬁrst obtain a syntac-

tical representation of the source code, typically an

AST. The clone detection algorithms then search for

similar subtrees in the AST. In (Jr., 2002), the au-

thors introduce a general-purpose, multidimensional,

concern-space modeling schema that can be used to

model early-stage concerns.

7 CONCLUSION AND FUTURE

WORK

In this paper we described the theoretical component

of an approach to identifying anomalies in object-

oriented implementations based on observations of

patterns of messages in a legacy object-oriented sys-

tem. The term “anomaly” is used to refer to any phe-

nomenon that can negatively affect software quality

such as low cohesion, high coupling and crosscutting.

Our technique is based on the capture of execution

traces (paths) into a relational database in order to ex-

tract knowledge of anomalies in the system. We de-

veloped strategies to identify anomalies. In the case of

ambiguities in the presence of multiple candidate as-

pects, we deployed dynamic programming to identify

optimal solutions in order to group the strongest as-

pect candidates. We believe that our work can aid de-

velopers to ﬁnd potential anomalies in object-oriented

systems. Equally, the work can aid maintainers.

For future work, we intend to provide an imple-

mentation through a case study. A tracing mecha-

nism (perhaps deploying AOP) can be used to capture

execution traces into a relational database schema.

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES

188

SQL queries executed over the relational database

schema should be used to calculating the parameters

of the various algorithms. We also plan to investigate

the use of multidimensional schema, data warehouse,

and data mining techniques to discover knowledge by

considering different dimensions.

REFERENCES

Baxter, I. D., Yahin, A., Moura, L., Sant’Anna, M., and

Bier, L. (1998). Clone detection using abstract syntax

trees. In International Conference on Software Main-

tenance.

Breu, S. and Krinke, J. (2004). Aspect mining using event

traces. In IEEE International Conference on Auto-

mated Software Engineering.

Bruntink, M. (2004). Aspect mining using clone class met-

rics. In WCRE Workshop on Aspect Reverse Engineer-

ing.

Chidamber, S. R. and Kemerer, C. F. (1991). Towards a

metrics suite for object-oriented design. In Object-

Oriented Programming: Systems, Languages and Ap-

plications.

Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C.

(2002). Introduction to Algorithms. The MIT Press.

Dijkstra, E. W. (1976). A Discipline of Programming. Pren-

tice Hall, Englewood Cliffs, NJ.

Elrad, T., Filman, R. E., and Bader, A. (2001). Aspect-

oriented programming. Communications of the ACM,

44(10):29–32.

Foumani, A. A. and Constantinides, C. (2005a). Aspect-

oriented reverse engineering. In World Multiconfer-

ence on Systemics, Cybernetics and Informatics.

Foumani, A. A. and Constantinides, C. (2005b). Reengi-

neering object-oriented designs by analyzing depen-

dency graphs and production rules. In IASTED In-

ternational Conference on Software Engineering and

Applications.

Gupta, N. and Rao, P. (2001). Program execution based

module cohesion measurement. In International Con-

ference on Automated Software Engineering.

Jr., S. M. S. (2002). Early-stage concern modeling. In

AOSD Workshop on Early Aspects.

Kiczales, G., Hilsdale, E., Hugunin, J., Kersten, M., Palm,

J., and Griswold, W. G. (2001). An overview of As-

pectJ. Lecture Notes in Computer Science, 2072:327–

355.

Kiczales, G., Lamping, J., Menhdhekar, A., Maeda, C.,

Lopes, C., Loingtier, J.-M., and Irwin, J. (1997).

Aspect-oriented programming. In Aks¸it, M. and Mat-

suoka, S., editors, Proceedings of the 11th Euro-

pean Conference on Object-Oriented Programming

(ECOOP), volume 1241, pages 220–242. Springer-

Verlag.

Krinke, J. and Breu, S. (2004). Control-ﬂow-graph-based

aspect mining. In WCRE Workshop on Aspect Reverse

Engineering.

Mitchell, A. and Power, J. F. (2003). Toward a deﬁnition of

run-time object-oriented metrics. In ECOOP Work-

shop on Quantitative Approaches in Object-Oriented

Software Engineering.

Moldovan, G. S. and Serban, G. (2006). Aspect mining

using a vector-space model based clustering approach.

In AOSD Workshop on Linking Aspect Technology and

Evolution Revisited.

Parnas, D. L. (1972). On the criteria to be used in decom-

posing systems into modules. Communications of the

ACM, 15:1053–1058.

Robillard, M. P. and Murphy, G. C. (2001). Analyzing

concerns using class member dependencies. In ICSE

Workshop on Advanced Separation of Concerns in

Software Engineering.

Robillard, M. P. and Murphy, G. C. (2002). Concern graphs:

Finding and describing concerns using structural pro-

gram dependencies. In International Conference on

Software Engineering.

MINING ANOMALIES IN OBJECT-ORIENTED IMPLEMENTATIONS THROUGH EXECUTION TRACES

189