FORMAL FRAMEWORK FOR SEMANTIC INTEROPERABILITY

Nadia Yaacoubi Ayadi, Mohamed Ben Ahmed

RIADI-ENSI

Campus Universitaire, 2010 La Manouba

Yann Pollet

Chaire d’intégration des systèmes, CNAM

292, Rue Saint Martin

Keywords:

Semantic Interoperability, Logical Formalization, Mapping Approaches, Ontology.

Abstract:

The semantics of schema models is not explicit but hidden in their structures and labels. To obtain semantic interoperability, we need to make this semantics explicit by taking into account both the interpretation of the labels and the structures described by the arcs. In this paper, we address the issue of semantic interoperability between systems relying on semantically heterogeneous hierarchies that were designed independently, for specific goals and activities. Given a set of generalization hierarchies, our approach emphasizes the added value of semantics by making the intended informal meaning of concepts emerge; to this end, we rely on the Wordnet lexical repository. In the first part of the paper, we provide a rigorous logical framework for representing and automatically reasoning on generalization hierarchies regardless of their formalism (UML, ER diagram, etc.). Then, we describe the SEM-INTEROP algorithm, which consists of two main steps: semantic interpretation and semantic comparison.

1 INTRODUCTION

Knowledge sharing between heterogeneous sources

is a signiﬁcant challenge, which has been the focus

of much research but remains an open problem. En-

abling the cooperation of heterogeneous information

systems is not easy to achieve because related knowl-

edge is disparate and described in different terms and

using different assumptions. Heterogeneity may arise

from syntactic, structural and semantic discrepancies

in information systems. Syntactic heterogeneity is

due to the use of diverse database models (object-

oriented vs relational), structural heterogeneity arises

from different conceptual choices during the concep-

tualization phase (modelling as a class, as a relation-

ship, or as an attribute), and semantic heterogene-

ity comes from differences between the terms used

to represent information and their intended meaning

(Kashyap and Sheth, 1996).

In this paper, we focus on semantic heterogeneity and on the interoperability solutions that address it. Of course, the presence of a variety of conceptual models is unavoidable

both because humans think differently and because

the applications of these models were designed for

different needs. Thus, the fundamental question in

any approach to interoperability of information sys-

tems is that of identifying concepts or a set of con-

cepts in different information systems that are seman-

tically related, and then resolving the schematic dif-

ferences among semantically related concepts (Sheth

and Kashyap, 1993). By schematic differences, we mean different partial representations of the same concept, descriptions at different levels of granularity, or perspective-dependent representations that encode a spatio-temporal, logical, or cognitive point of view.

Two main categories of frameworks have been proposed for cooperative information systems: federation of information systems (Sheth and Larson, 1990) and mediation (Wiederhold, 1992; Chawathe et al., 1994), which relies on the definition of wrappers and mediators. Mediation-based architectures facilitate evolution through the addition of new data sources. They support the cooperation of large information systems and are thus more suitable in a web environment. Federation-based architectures are best suited for small-scale cooperation.

Irrespective of the system architecture, a fundamental task in integration is to recognize an a-priori agreement on the knowledge shared by communities, by describing mappings between them and supporting access to the existing data instances.


Yaacoubi Ayadi N., Pollet Y. and Ben Ahmed M. (2006).

FORMAL FRAMEWORK FOR SEMANTIC INTEROPERABILITY.

In Proceedings of the First International Conference on Software and Data Technologies, pages 139-144

DOI: 10.5220/0001319301390144

 SciTePress

A large number of papers have investigated various

facets of mapping, such as mapping discovery,

mapping definition or mapping usage (for a survey, see Rahm and Bernstein, 2001).

In such a distributed setting, we believe that an

a-priori agreement on knowledge and knowledge

exchange is very hard to achieve. Indeed, if we try

to achieve integration or interoperation of large and

disparate information systems, the current standard

approach of creating large-scale shared knowledge

will hardly scale up to the size of the (semantic) Web,

and is also conceptually problematic because in our

opinion knowledge is never context-free (Yaacoubi

and BenAhmed, 2003; Stoermer and Stecher, 2005),

and can thus never be perfectly shared.

In this work, our objective is to propose a complete approach for the semantic integration of generalization hierarchies. We adapt previous results on schema and ontology integration (ontology fusion, ontology mapping, ontology alignment; for a survey, see (Wache et al., 2001)) to tackle the different kinds of heterogeneity one might encounter during the interoperation of information systems. Indeed, we think that the semantics of schema models is not explicit but is hidden in their structures and in the labels of their concepts. Given a set of generalization hierarchies, our approach emphasizes the added value of semantics by making the intended informal meaning of their concepts emerge, both by mapping them to the Wordnet ontology and by interpreting their structural position.

The aim of this paper is to describe an algorithm that analyses this implicit knowledge in order to provide correct mappings between concepts. First, we propose a logical formalization of class hierarchies. We thus provide a rigorous logical framework for representing and automatically reasoning on generalization hierarchies regardless of their formalism (UML, ER diagram, etc.). The SEM-INTEROP algorithm then performs two main steps: semantic interpretation and semantic comparison.

Compared to other related works, our proposal falls within the scope of approaches that aim at defining a formalism or methodology to specify and use inter-schema correspondences. We assume that an initial set of inter-schema correspondences may be given by the designer; however, we do not consider query reformulation, which is outside the scope of this paper. The proposal contributes to this area of research on the following original topics:

• A semantic interpretation approach combining linguistic, structural and contextual knowledge is proposed in order to compare concept hierarchies semantically,

• We propose a mapping algebra that can be of interest for realizing schema transformations.

Wordnet is available at http://wordnet.princeton.edu.

The paper is structured as follows: Section 2 presents logical constructs for generalization hierarchies. In Section 3, we present our semantic-based approach for interoperability and describe the first version of the SEM-INTEROP algorithm. Finally, Section 4 concludes the paper and identifies future work.

2 BASICS OF THE APPROACH

Let us first clarify our terminology. In the literature, we identify four levels of abstraction. At the bottom

level we have actual data (or instances) organized

according to a variety of (semi) structured formats

(relational tables, XML documents, HTML ﬁles, sci-

entiﬁc data, and so on). At the second level we have

schemes, which describe the structure of instances (a

relational schema, a DTD, an XML schema or one of

its dialects, etc.). Then, we have different formalisms

for the description of schemes that we call models

(e.g. conceptual model like the ER model or UML

class diagram). Finally, we use the term metamodel

to mean a general formalism for the deﬁnition of

various models. Speciﬁcally, a metamodel is made

of a set of metaprimitives. Each metaprimitive

captures a class of constructs of different data

models that share a common characteristic or, more

precisely, that implement, possibly with different

names, the same basic abstraction principle (Torlone

and Atzeni, 2001). Examples of metaprimitives :

class, attribute, deﬁnition domain, relationship, gen-

eralization, disjoint union, key, foreign key, and so on.

Here, we introduce the terms of our problem more specifically and formally. As our conceptual model, we opt for generalization hierarchies (specialization being the inverse of generalization). We propose a logical formalism that allows us to represent heterogeneous hierarchies uniformly.

Definition 1 (Generalization hierarchy) We define a class hierarchy $H$ as a triple $\langle C, E, \Phi \rangle$:

• $C$ is a finite set of classes, $C = \{c_i\}$. Each class $c_i$ is characterized by a name and a set of attributes, $c_i = \langle n_i, A(c_i) \rangle$. Each attribute $a_h \in A(c_i)$, with $h = 1, \ldots, n$, is defined as a pair $a_h = \langle n_h, d_h \rangle$, where $n_h$ is a name and $d_h$ is the domain associated with $a_h$.

• $E$ is a set of arcs on $C$; for instance, $E$ is a set of subsumption relationships (ISA relationships) between classes.

ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES


• $\Phi$ is a logical interpretation of $H$ which makes explicit all the knowledge embedded in $H$, such as attributes, value domains or constraints, by means of a consistent logical formulation. We assign to each parent class an AND/OR logical formula expressing constraints among instances of its child classes.
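As an illustration only, Definition 1 could be held in memory as follows. This is a minimal Python sketch, not the authors' implementation: the names `Attribute`, `ClassNode` and `Hierarchy` and their methods are hypothetical, arcs are stored as (subclass, superclass) pairs, and Φ is reduced to a per-superclass list of constraint tuples. The class names are borrowed from the plane example used in this section.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """An attribute: a name paired with its domain."""
    name: str
    domain: str


@dataclass
class ClassNode:
    """A class: a name together with a set of attributes."""
    name: str
    attributes: frozenset = frozenset()


@dataclass
class Hierarchy:
    """A generalization hierarchy <C, E, Phi> (hypothetical representation)."""
    classes: dict = field(default_factory=dict)      # C, keyed by class name
    isa: set = field(default_factory=set)            # E, as (subclass, superclass) pairs
    constraints: dict = field(default_factory=dict)  # Phi, superclass -> list of constraints

    def add_class(self, name, attributes=()):
        self.classes[name] = ClassNode(name, frozenset(attributes))

    def add_isa(self, sub, sup):
        # Both ends of an arc must already belong to C.
        if sub not in self.classes or sup not in self.classes:
            raise KeyError("both classes must be declared before adding an ISA arc")
        self.isa.add((sub, sup))

    def children(self, sup):
        """Direct subclasses of `sup`, in alphabetical order."""
        return sorted(sub for (sub, s) in self.isa if s == sup)


plane = Hierarchy()
for name in ("Plane", "JetPlane", "HelixPlane"):
    plane.add_class(name)
plane.add_isa("JetPlane", "Plane")
plane.add_isa("HelixPlane", "Plane")
# Phi: a disjointness (XOR) constraint attached to the parent class.
plane.constraints["Plane"] = [("XOR", ("JetPlane", "HelixPlane"))]
```

Keeping the constraints keyed by the parent class mirrors the statement that each parent class is assigned a formula over its child classes.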

Informally, one can use a generalization between two classes to specify that each instance of the subclass is also an instance of the superclass. Hence, instances of the subclass inherit the properties of the superclass, but they typically satisfy additional properties that do not hold in general for the superclass. Figure 1 shows an example of a generalization hierarchy represented with Unified Modelling Language (UML) constructs.

Figure 1: Example of a UML generalization hierarchy.

In our approach, a class $C$ generalizing a class $C_i$ can be captured by means of the following logical assertion:

$$\mathrm{ISA}(C_i, C) \Rightarrow \forall x,\ C_i(x) \subset C(x)$$

With regard to generalization hierarchies, semantic constraints on the intersection of sibling classes (that is, classes having a common superclass) are often proposed, introducing the notions of disjointness and completeness constraints. In particular, a generalization is disjoint or overlapping depending on whether the intersection of the sibling classes is empty or not, respectively. These constraints may be captured by means of logical assertions of the following form:

ISA-ASSERT(C, [Constraint])

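The disjointness constraint among $C_1, C_2, \ldots, C_n$ can be expressed by the following predicate, assigned to the superclass $C$:

$$\forall i = 1, \ldots, n:\quad C_1 \ \mathrm{XOR}\ C_2 \ \ldots\ \mathrm{XOR}\ C_n \ \Rightarrow\ \forall x,\ C_i(x) \supset \bigwedge_{j \in \{1..n\} \setminus \{i\}} \neg C_j(x)$$

The completeness constraint, expressing that each instance of $C$ is an instance of at least one of $C_1, \ldots, C_n$, is expressed by:

$$\forall x,\ C(x) \supset \bigvee_{j=1}^{n} C_j(x)$$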
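To make the two constraints concrete, they can be checked against explicit class extensions. The following Python sketch assumes that each class is given as a finite set of instance identifiers; the function names and the sample data are hypothetical and serve only as an illustration.

```python
def is_disjoint(extensions, siblings):
    """True if no instance belongs to two different sibling classes."""
    seen = set()
    for name in siblings:
        if seen & extensions[name]:
            return False
        seen |= extensions[name]
    return True


def is_complete(extensions, parent, siblings):
    """True if every instance of the parent belongs to at least one sibling."""
    covered = set().union(*(extensions[name] for name in siblings))
    return extensions[parent] <= covered


# Hypothetical extensions: class name -> set of instance identifiers.
extensions = {
    "Plane": {"p1", "p2", "p3"},
    "JetPlane": {"p1", "p2"},
    "HelixPlane": {"p3"},
    "CivilPlane": {"p1", "p3"},
    "MilitaryPlane": {"p1"},
}
```

With this sample data, the JetPlane/HelixPlane partition satisfies both constraints, while the CivilPlane/MilitaryPlane partition violates both, because `p1` is in both classes and `p2` is in neither.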

See the latest specification of UML at http://www.uml.org.

Referring to Figure 1, the specific constraints of the hierarchy can be captured by means of logical expressions:

Example 2
ISA (JetPlane, Plane)
ISA (HelixPlane, Plane)
ISA (CivilPlane, Plane)
ISA (MilitaryPlane, Plane)
ISA-ASSERT (Plane, {JetPlane XOR HelixPlane}, {CivilPlane XOR MilitaryPlane})

Example 3 Referring to ﬁgure 2, we can deﬁne the

following predicates considering C

as a subclass

of C

and C

(respectively to C

ISA(C

)

ISA(C

)

ISA-ASSERT(C

, C

⊂ C

)

Figure 2: Multiple level hierarchy.

Disjointness and completeness constraints are in practice the most commonly used constraints in generalization hierarchies. Finally, we may express additional constraints specifying, for example, restrictions on domain values.

The logical formulation of generalization hierarchies allows us to go well beyond a mere description of their structure, since it supports the reasoning tasks described next. However, this logical formulation must be consistent.

Consistency of generalization hierarchies. A generalization hierarchy is consistent if its classes can be populated without violating any of the constraints. By exploiting this logical formalization, the consistency of the hierarchy can be checked by checking the satisfiability of the corresponding knowledge base (the set of logical assertions).

Class subsumption. A class $C_1$ is subsumed by a class $C_2$ if, whenever the constraints imposed by the generalization hierarchy are satisfied, the extension of $C_1$ is a subset of the extension of $C_2$. Such a subsumption allows one to deduce that the properties that hold for $C_2$ also hold for $C_1$.

Class equivalence. Two classes are equivalent if

they denote the same set of instances whenever the

constraints imposed by the generalization hierarchy


are satisﬁed. Determining equivalence of two classes

allows for their merging.
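The three reasoning tasks above can be sketched by brute-force model enumeration. The Python code below is a simplified propositional reading, stated here as an assumption: it considers a single instance x, treats each class as a Boolean variable (x belongs to the class or not), and encodes constraints as predicates over such assignments. The helper names and the class `Aircraft` are hypothetical; a real system would rather use a description logic reasoner or a SAT solver.

```python
from itertools import product


def models(classes, constraints):
    """Enumerate the truth assignments (for one instance x) satisfying every constraint."""
    for values in product([False, True], repeat=len(classes)):
        world = dict(zip(classes, values))
        if all(check(world) for check in constraints):
            yield world


def is_consistent(classes, constraints):
    """Every class can be populated by some instance without violating a constraint."""
    found = list(models(classes, constraints))
    return all(any(w[c] for w in found) for c in classes)


def is_subsumed(c1, c2, classes, constraints):
    """c1 is subsumed by c2: every admissible instance of c1 is also an instance of c2."""
    return all(w[c2] for w in models(classes, constraints) if w[c1])


def is_equivalent(c1, c2, classes, constraints):
    """Two classes are equivalent when each one subsumes the other."""
    return (is_subsumed(c1, c2, classes, constraints)
            and is_subsumed(c2, c1, classes, constraints))


def isa(sub, sup):
    """ISA arc: membership in `sub` implies membership in `sup`."""
    return lambda w: (not w[sub]) or w[sup]


def disjoint(a, b):
    """Disjointness (XOR in the paper's notation): no instance is in both classes."""
    return lambda w: not (w[a] and w[b])


classes = ["Plane", "JetPlane", "HelixPlane", "Aircraft"]
constraints = [
    isa("JetPlane", "Plane"),
    isa("HelixPlane", "Plane"),
    disjoint("JetPlane", "HelixPlane"),
    isa("Plane", "Aircraft"),
    isa("Aircraft", "Plane"),
]
```

Adding `isa("HelixPlane", "JetPlane")` to the constraints makes the hierarchy inconsistent in this reading, since HelixPlane can then no longer be populated without violating the disjointness constraint.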

In the next section, we describe our interoperability approach, which is based on this logical formalization and also on linguistic and contextual knowledge.

3 INTEROPERABILITY APPROACH

We have seen in the previous section how a logical formulation can be associated with a given hierarchy H based on the constraints expressed in conceptual models. However, no model has meaning in isolation. Only through a semantic space (e.g. a domain ontology) are its elements linked to context, language, situation, actor, role, etc. The semantic space represents knowledge on a domain, while each model asserts a single proposition related to a specific context.

In common with (Bouquet et al., 2004), we can identify at least three distinct levels of knowledge which can be used to elicit a schema’s semantics:

• Lexical knowledge. Knowledge about the meaning of the words used to label classes and attributes. Word senses can be automatically retrieved from a Lexical Knowledge Base (LKB). Wordnet (Fellbaum, 1998) has been adopted in the current work because it is the largest repository of word senses and semantic relations currently available. However, Wordnet could be replaced by another combination of a linguistic resource and a domain knowledge resource.

• Structural knowledge. Knowledge deriving from the arrangement of classes in the generalization hierarchy. Our analysis considers the implicit information deriving from the structural relations of a class with the other concepts of the hierarchy.

• Domain knowledge. Knowledge describing the

logical structure of a speciﬁc domain, its concepts

and the relations between them. For instance,

Wordnet assigns a domain label (e.g., tourism, zo-

ology, sport, etc.) to most synsets.

In the current version of the algorithm, SEM-INTEROP takes two generalization hierarchies H and H′ as input and returns mappings between their structures. The algorithm performs the following main steps: Semantic Interpretation and Semantic Comparison.

Adapted from Sowa: "a conceptual graph has no meaning in isolation. Only through the semantic network are its concepts and relations linked to context, language, emotion, and perception".

Wordnet 2.0 also provides domain labels. However, we preferred the label data set described in (Magnini and Cavaglia, 2000).

3.1 Semantic Interpretation

In this phase, we make explicit the meaning of each

class based on a linguistic interpretation. Compared

with other approaches to schema matching such as

(Madhavan et al., 2001; Bergamaschi et al., 1999), we

do not limit ourselves to a linguistic analysis of labels.

Instead, we extend this analysis by considering the

implicit knowledge deriving from the context where

the class appears. Then, we interpret constraints like

Disjointness, Covering, negation in order to exhibit

new abstractions of classes.

Linguistic Interpretation. Let $H$ be a generalization hierarchy and $C$ the set of classes occurring in $H$. Each class $c_i \in C$ is described by a label, which in turn is composed of words and, possibly, separators between them. We define the lexicon of a given hierarchy $H$ as the set $L = \{l_1, l_2, \ldots, l_n\}$ of valid labels belonging to $H$. The process of interpretation associates the appropriate WordNet synsets $S_i$ with each label $l_i$ in $L$. The sense of $L$ is thus defined as:

$$S(L) = \{\, S_i \mid S_i \in \mathrm{Synset}(l_i),\ l_i \in L \,\}$$

where $\mathrm{Synset}(l_i)$ is the set of senses provided by WordNet for a label $l_i$. For instance, S(Plane) = {{Airplane#1}, {Sheet#2}, {stage#3}, {planing machine#4}, {Carpenter’s plane#5}}.
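The linguistic interpretation step can be sketched as follows. This Python example does not query WordNet: `SENSES` is a hypothetical in-memory stand-in whose only real entry reproduces the senses listed above for "plane", and the entry for "civil" is a placeholder. The function names are assumptions as well. The sketch only collects candidate senses; choosing the appropriate one for each label is left to the later contextualization.

```python
import re

# Hypothetical stand-in for a lexical knowledge base: word -> candidate senses.
SENSES = {
    "plane": ["airplane#1", "sheet#2", "stage#3",
              "planing machine#4", "carpenter's plane#5"],
    "civil": ["civil#placeholder"],  # placeholder, not actual WordNet data
}


def words(label):
    """Split a label such as 'Civil_Plane' or 'CivilPlane' into lower-case words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", label)
    return [p.lower() for p in parts]


def synsets(label, senses=SENSES):
    """Synset(l): all candidate senses for the words composing one label."""
    found = []
    for word in words(label):
        found.extend(senses.get(word, []))
    return found


def interpret(lexicon, senses=SENSES):
    """S(L): map each label of the lexicon L to its candidate senses."""
    return {label: synsets(label, senses) for label in lexicon}


interpretation = interpret(["Plane", "CivilPlane"])
```

Labels whose words are missing from the knowledge base simply receive an empty list of senses, which makes unknown labels easy to spot.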

Contextualization. Contexts appear in many disciplines as meta-information to characterize the specific situation of an entity, to describe a group of conceptual entities, to partition a knowledge base into manageable sets, or as a set of logical constructs to facilitate reasoning services (Dey and Abowd, 1999). In the current work, we make use of the following meta-level properties (Guarino, 1998): TYPE, for synsets representing rigid properties (e.g. person); ROLE, for synsets representing anti-rigid properties (e.g. student); and ATTRIBUTION, for synsets representing possible values of attributes (e.g. employee, as an attribute value for activity). These semantic constructs allow us to express Contextualized Concepts, considering their structural and contextual features, in terms of logical assertions.

Example 4 An employee is a person who has the role of a worker and necessarily has a salary.

Employee(x) := Person(x) ⊓ (∃ Role(x).worker) ⊓ (∃ Attribution(x).Salary) ⊓ (¬ Employer(x))

A student is a person who has the role of a learner and is enrolled in one level.


Student(x) := Person(x) ⊓ (∃ Role(x).learner) ⊓ (∃ Attribution(x).level) ⊓ (∃ EnrolledIn(x).level)

An Adult Citizen is a person who takes an active role and is an adult.

Adult Citizen(x) := Person(x) ⊓ (∃ Role(x).Activity) ⊓ (∃ Attribution(x).Adult) ⊓ (¬ Attribution(x).Juvenile)

Implicit Constraints Interpretation. Implicit structural constraints can lead to the derivation of new classes. For instance, a covering constraint is interpreted as a meet operation among classes (↓); the resulting class represents the greatest common lower bound, possibly equal to ⊥, the least element in the hierarchy. We may obtain a semi-lattice, as illustrated in Figure 3, by considering:

• the ISA relationship as a partial order relation, that is, a reflexive, antisymmetric and transitive relation,

• the existence, for each pair of classes, of a greatest common lower bound.

Figure 3: A semi-Lattice structure.

Conjunction between classes may be expressed as a logical formula, for instance: Worker-student := Employee ⊓ Student
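The meet operation can be illustrated on a small ISA graph. The Python sketch below is an assumption-laden illustration: arcs are (subclass, superclass) pairs, the function names are hypothetical, the class `Employer` is added only to show the ⊥ case, and ⊥ is returned whenever two classes have no greatest common lower bound among the declared classes.

```python
BOTTOM = "⊥"  # least element, returned when no declared class lies below both arguments


def lower_set(cls, isa):
    """All classes below `cls` in the ISA order, including `cls` itself (reflexivity)."""
    result, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, sup in isa:
            if sup == current and sub not in result:
                result.add(sub)
                frontier.append(sub)
    return result


def meet(a, b, isa):
    """Greatest common lower bound of a and b, or BOTTOM if there is none."""
    common = lower_set(a, isa) & lower_set(b, isa)
    # The greatest common lower bound lies above every other common lower bound.
    for candidate in common:
        if common <= lower_set(candidate, isa):
            return candidate
    return BOTTOM


ISA = {
    ("Employee", "Person"),
    ("Student", "Person"),
    ("Employer", "Person"),
    ("WorkerStudent", "Employee"),
    ("WorkerStudent", "Student"),
}
```

Here `meet("Employee", "Student", ISA)` yields the class corresponding to Worker-student, while two classes without any common subclass, such as Employee and Employer, meet at ⊥.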

3.2 Semantic Comparison

Intuitively, the problem of semantic interoperabil-

ity arises when one needs to ﬁnd relations between

classes belonging to distinct (and thus typically het-

erogeneous) hierarchies. Formally, we deﬁne the

problem of semantic interoperability as the problem

of discovering mappings between classes in two distinct hierarchies $H$ and $H'$.

Definition 5 (Mapping) A mapping $M$ from $H = \langle C, E, \Phi \rangle$ to $H' = \langle C', E', \Phi' \rangle$ is a function $M : C \times C' \rightarrow R$, where $R$ is the set of possible relations.

We may distinguish two forms of mappings: classical mappings and rule-based mappings. The first form is widely used to express semantic relations between classes, such as equivalence and disjointness mappings.

Example 6 (Classical Mappings)

Voiture(x) ⇒ Car(x)

Car(x) ⇒ Voiture(x)

Male(x) ⇒ ¬ Female(x)

A rule-based mapping can be used to represent com-

plex mappings such as generalization/specialization

mappings.

Example 7 (Rule-based Mapping)

CarOwner(x) ⇒ Person(x) ⊓ (Attribution(x).Car) ⊓

(Role(x).Owner)
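A mapping in the sense of Definition 5 can be sketched as a lookup table over pairs of classes. In the Python example below, the content of the relation set R is an assumption (the paper leaves it open), and the names `Relation` and `Mapping` are hypothetical. The rule-based mapping of Example 7 is only approximated by its subsumption part, CarOwner being subsumed by Person; the role and attribution conditions are not represented.

```python
from enum import Enum


class Relation(Enum):
    """An assumed set R of possible relations."""
    EQUIVALENT = "equivalent"
    SUBSUMED_BY = "subsumed_by"
    SUBSUMES = "subsumes"
    DISJOINT = "disjoint"
    UNKNOWN = "unknown"


_INVERSE = {
    Relation.SUBSUMED_BY: Relation.SUBSUMES,
    Relation.SUBSUMES: Relation.SUBSUMED_BY,
}


class Mapping:
    """M : C x C' -> R, stored as a sparse table of asserted relations."""

    def __init__(self):
        self._table = {}

    def assert_relation(self, source, target, relation):
        self._table[(source, target)] = relation

    def __call__(self, source, target):
        # Pairs with no asserted relation are reported as UNKNOWN.
        return self._table.get((source, target), Relation.UNKNOWN)

    def inverse(self):
        """The mapping read in the other direction: swap pairs, flip subsumption."""
        inv = Mapping()
        for (source, target), rel in self._table.items():
            inv.assert_relation(target, source, _INVERSE.get(rel, rel))
        return inv


m = Mapping()
m.assert_relation("Voiture", "Car", Relation.EQUIVALENT)        # Example 6, both implications
m.assert_relation("Male", "Female", Relation.DISJOINT)          # Example 6, negation
m.assert_relation("CarOwner", "Person", Relation.SUBSUMED_BY)   # subsumption part of Example 7
```

Storing only the asserted pairs keeps the table small, since most pairs of classes from two hierarchies are unrelated.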

Mapping Algebra. Unfortunately, few research works propose mathematical foundations for the mapping problem. Mapping classes belonging to different hierarchies is important but not sufficient. The question is how, given these mappings, we can restructure the internal organisation of the given hierarchies to obtain the "interoperation structure" that represents their greatest common lower bound. For example, considering hierarchies as partially ordered sets, two hierarchies can be considered equivalent if there exists a bijective function between these sets which preserves the order in both directions (i.e. which is monotonic and has a monotonic inverse). In this case, being monotonic means that a function respects the internal structure of partially ordered sets, while bijectivity indicates the equivalence of the two ordered sets. Structure-preserving functions are a typical instance of what is called a morphism.

Two partially ordered sets $H$ and $H'$ are equivalent, or isomorphic, whenever there is a monotone function $f : H \rightarrow H'$ that has a monotone inverse, i.e. for which there is a monotone function $g : H' \rightarrow H$ with $g \circ f = id_H$ and $f \circ g = id_{H'}$. We call a morphism an isomorphism if it has a (necessarily unique) inverse morphism.
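For small hierarchies, the existence of such an isomorphism can be tested exhaustively. The Python sketch below is illustrative only: it tries every bijection, so it is exponential in the number of classes, and the function names and the sample hierarchies (apart from the pair Voiture/Car used earlier) are hypothetical.

```python
from itertools import permutations


def order_relation(elements, isa):
    """Reflexive-transitive closure of the ISA arcs, as a set of (lower, upper) pairs."""
    leq = {(e, e) for e in elements} | set(isa)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(leq):
            for (c, d) in list(leq):
                if b == c and (a, d) not in leq:
                    leq.add((a, d))
                    changed = True
    return leq


def find_isomorphism(elems1, isa1, elems2, isa2):
    """Return a bijection f with a <= b iff f(a) <= f(b), or None if none exists."""
    if len(elems1) != len(elems2):
        return None
    leq1 = order_relation(elems1, isa1)
    leq2 = order_relation(elems2, isa2)
    for image in permutations(elems2):
        f = dict(zip(elems1, image))
        # Order must be preserved in both directions (monotone with monotone inverse).
        if all(((a, b) in leq1) == ((f[a], f[b]) in leq2)
               for a in elems1 for b in elems1):
            return f
    return None


h1 = ["Vehicle", "Car", "Truck"]
isa1 = {("Car", "Vehicle"), ("Truck", "Vehicle")}
h2 = ["Vehicule", "Voiture", "Camion"]
isa2 = {("Voiture", "Vehicule"), ("Camion", "Vehicule")}
chain = ["A", "B", "C"]
isa_chain = {("A", "B"), ("B", "C")}

f = find_isomorphism(h1, isa1, h2, isa2)
```

The two sample hierarchies are isomorphic, and any isomorphism must send the root Vehicle to the root Vehicule; the three-element chain, in contrast, is not isomorphic to either of them.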

To this end, we may develop a mapping algebra including operators such as S-join (Semantic Join), S-meet (Semantic Meet) and S-Project (Semantic Projection).

4 CONCLUSION AND FUTURE

WORK

In this paper, we have provided a formal semantics

for generalization hierarchies and then used that for-

mal framework to explore a number of linguistic and

semantic issues crucial for interpreting the knowledge


implicitly represented in such hierarchies. The algorithm we have proposed performs a linguistic interpretation of the labels provided in the hierarchy, based on the Wordnet ontology. The interpretation of labels is extended with a contextualization process, a progressive construction of logical expressions whose predicates are based on three meta-properties: TYPE, ROLE and ATTRIBUTION. Next, we perform a semantic comparison that consists of discovering mappings between classes. Besides classical mappings, we introduce rule-based mappings that express constrained complex mappings.

We think that mapping two hierarchies H and H′ means, at least, finding a sub-hierarchy of H′ that is isomorphic to H. Therefore, in the future, we plan to work on a mapping algebra that could include operators such as S-join, S-meet and S-Project. Developing such operators would allow us to restructure hierarchies, given a set of mappings, while preserving semantics.

REFERENCES

Bergamaschi, S., Castano, S., and Vincini, M. (1999). Se-

mantic integration of semistructured and structured

data sources. SIGMOD Record, 28(1):54–59.

Bouquet, P., Ehrig, M., Euzenat, J., Franconi, E., Hitzler, P., et al. (2004). D2.2.1 specification of a common framework for characterizing alignment. Technical report.

Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K.,

Papakonstantinou, Y., Ullman, J. D., and Widom, J.

(1994). The TSIMMIS project: Integration of hetero-

geneous information sources. In 16th Meeting of the

Information Processing Society of Japan, pages 7–18,

Tokyo, Japan.

Dey, A. and Abowd, G. (1999). The context toolkit: Aiding the development of context-aware applications. In Proceedings of Human Factors in Computing Systems: CHI 99, pages 434–441, Pittsburgh, PA, May 15-20, 1999. ACM Press.

Fellbaum, C., editor (1998). WordNet: An Electronic Lexical Database. MIT Press.

Guarino, N. (1998). Some ontological principles for designing upper level lexical resources. In Proc. of the First International Conference on Language Resources and Evaluation, Granada, Spain, May 1998.

Kashyap, V. and Sheth, A. P. (1996). Semantic and

schematic similarities between database objects: A

context-based approach. VLDB Journal: Very Large

Data Bases, 5(4):276–304.

Madhavan, J., Bernstein, P. A., and Rahm, E. (2001).

Generic schema matching with cupid. In The VLDB

Journal, pages 49–58.

Magnini, B. and Cavaglia, G. (2000). Integrating subject

ﬁeld codes into wordnet. In Proceedings of Language

Resources and Evaluation (LREC 2000), pages 1413–

1418.

Rahm, E. and Bernstein, P. A. (2001). A survey of ap-

proaches to automatic schema matching. The VLDB

Journal, 10(4):334–350.

Sheth, A. P. and Kashyap, V. (1993). So far (schemati-

cally) yet so near (semantically). In Proceedings of

the IFIP WG 2.6 Database Semantics Conference on

Interoperable Database Systems (DS-5), pages 283–

312. North-Holland.

Sheth, A. P. and Larson, J. A. (1990). Federated data-

base systems for managing distributed, heteroge-

neous, and autonomous databases. ACM Comput.

Surv., 22(3):183–236.

Stoermer, H. and Stecher, R. (2005). An approach for

context-based schema integration in virtual informa-

tion environments. In Doctoral Consortium in CON-

TEXT 05 - Fifth International and Interdisciplinary

Conference on Modeling and Using Context, Paris -

France.

Torlone, R. and Atzeni, P. (2001). A uniﬁed framework for

data translation over the web. In WISE (1), pages 350–

358.

Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H., Schuster, G., Neumann, H., and Hübner, S. (2001).

Ontology-based integration of information — a sur-

vey of existing approaches. In Stuckenschmidt, H.,

editor, IJCAI–01 Workshop: Ontologies and Informa-

tion Sharing, pages 108–117.

Wiederhold, G. (1992). Mediators in the architecture of fu-

ture information systems. Computer, 25(3):38–49.

Yaacoubi, N. and BenAhmed, M. (2003). Integrating smart

communities in knowledge portals. In IKE, pages

523–528.
