FORMAL FRAMEWORK FOR SEMANTIC INTEROPERABILITY
Nadia Yaacoubi Ayadi, Mohamed Ben Ahmed
RIADI-ENSI
Campus Universitaire, 2010 La Manouba
Yann Pollet
Chaire d'intégration des systèmes, CNAM
292, Rue Saint Martin
Keywords:
Semantic Interoperability, Logical Formalization, Mapping Approaches, Ontology.
Abstract:
The semantics of schema models is not explicit but hidden in their structures and labels. To obtain semantic
interoperability, we need to make this semantics explicit by taking into account both the interpretation of the
labels and the structures described by the arcs. In this paper, we address the issue of semantic interoperability
between systems relying on semantically heterogeneous hierarchies that were designed independently, for
specific goals and activities. Given a set of generalization hierarchies, our approach emphasizes the added
value of semantics by making the intended informal meaning of concepts emerge; to do so, we rely on the
WordNet lexical repository. In the first part of the paper, we provide a rigorous logical framework
for representing and automatically reasoning on generalization hierarchies irrespective of their formalism
(UML, ER diagram, etc.). Then, we describe the SEM-INTEROP algorithm, which consists of two main steps:
semantic interpretation and semantic comparison.
1 INTRODUCTION
Knowledge sharing between heterogeneous sources
is a significant challenge, which has been the focus
of much research but remains an open problem. En-
abling the cooperation of heterogeneous information
systems is not easy to achieve because related knowl-
edge is disparate and described in different terms and
using different assumptions. Heterogeneity may arise
from syntactic, structural and semantic discrepancies
in information systems. Syntactic heterogeneity is
due to the use of diverse database models (object-
oriented vs relational), structural heterogeneity arises
from different conceptual choices during the concep-
tualization phase (modelling as a class, as a relation-
ship, or as an attribute), and semantic heterogene-
ity comes from differences between the terms used
to represent information and their intended meaning
(Kashyap and Sheth, 1996).
In this paper, we focus on semantic heterogeneity
and on interoperability solutions that address it.
Of course, the presence of a variety of conceptual
models is unavoidable
both because humans think differently and because
the applications of these models were designed for
different needs. Thus, the fundamental question in
any approach to interoperability of information sys-
tems is that of identifying concepts or a set of con-
cepts in different information systems that are seman-
tically related, and then resolving the schematic dif-
ferences among semantically related concepts (Sheth
and Kashyap, 1993). By schematic differences, we
mean different partial representations of the same
concept, descriptions at different levels of granularity,
or a perspective-dependent representation that encodes
a spatio-temporal, logical, or cognitive point of view.
Two main categories of frameworks have been
proposed for cooperative information systems:
federation of information systems (Sheth and Larson,
1990) and mediation (Wiederhold, 1992; Chawathe
et al., 1994), which relies on the definition of
wrappers and mediators. Mediation-based architectures
facilitate evolution through the addition of new
data sources. They support the cooperation of large
information systems and are thus more suitable in
a web environment. Federation-based architectures are
best suited for small-scale cooperation.
Irrespective of the system architecture, a fundamental
task in integration is the ability to recognize
an a-priori agreement on knowledge shared by
communities by describing mappings between them
and supporting access to the existing data instances.
Yaacoubi Ayadi N., Pollet Y. and Ben Ahmed M. (2006).
FORMAL FRAMEWORK FOR SEMANTIC INTEROPERABILITY.
In Proceedings of the First International Conference on Software and Data Technologies, pages 139-144
DOI: 10.5220/0001319301390144
Copyright © SciTePress
A large number of papers have investigated various
facets of mapping, such as mapping discovery,
mapping definition or mapping usage (for a survey,
see (Rahm and Bernstein, 2001)).
In such a distributed setting, we believe that an
a-priori agreement on knowledge and knowledge
exchange is very hard to achieve. Indeed, if we try
to achieve integration or interoperation of large and
disparate information systems, the current standard
approach of creating large-scale shared knowledge
will hardly scale up to the size of the (semantic) Web,
and is also conceptually problematic because in our
opinion knowledge is never context-free (Yaacoubi
and BenAhmed, 2003; Stoermer and Stecher, 2005),
and can thus never be perfectly shared.
In this work, our objective is to propose a complete
approach for the semantic integration of generalization
hierarchies. We adapt previous results on schema and
ontology integration (ontology fusion, ontology mapping,
ontology alignment; for a survey, see (Wache et al., 2001))
to tackle the different kinds of heterogeneity one might
encounter during the interoperation of information systems.
Indeed, we think that the semantics of schema models is not
explicit but is hidden in their structures and in the labels
of their concepts. Given a set of generalization hierarchies,
our approach emphasizes the added value of semantics by
making the intended informal meaning of their concepts
emerge, both by mapping them to the WordNet¹ ontology
and by interpreting their structural position.
The aim of this paper is to describe an algorithm
that analyses this implicit knowledge in order to
provide correct mappings between concepts. First, we
propose a logical formalization of class hierarchies.
We thus provide a rigorous logical framework for
representing and automatically reasoning on
generalization hierarchies irrespective of their formalism
(UML, ER diagram, etc.). The SEM-INTEROP algorithm then
performs two main steps: semantic interpretation and
semantic comparison.
Compared to other related works, our proposal falls
within the scope of approaches that aim at defining a
formalism or methodology to specify and use
inter-schema correspondences. We assume that an initial
set of inter-schema correspondences may be given by the
designer; however, we do not consider query
reformulation, which is out of the scope of this
paper. The proposal contributes to this area of research
on the following original topics:
• A semantic interpretation approach combining
linguistic, structural and contextual knowledge is
proposed in order to compare concept hierarchies
semantically,
• We propose a mapping algebra that can be of
interest for carrying out schema transformations.
¹ WordNet is available at http://wordnet.princeton.edu.
The paper is structured as follows: Section 2 presents
logical constructs for generalization hierarchies. In
Section 3, we present our semantic-based approach to
interoperability and describe the first version of the
SEM-INTEROP algorithm. Finally, Section 4 concludes
the paper and identifies future work.
2 BASICS OF THE APPROACH
Let us first clarify our terminology. In the literature,
we identify four levels of abstraction. At the bottom
level we have actual data (or instances) organized
according to a variety of (semi) structured formats
(relational tables, XML documents, HTML files, sci-
entific data, and so on). At the second level we have
schemes, which describe the structure of instances (a
relational schema, a DTD, an XML schema or one of
its dialects, etc.). Then, we have different formalisms
for the description of schemes that we call models
(e.g. conceptual model like the ER model or UML
class diagram). Finally, we use the term metamodel
to mean a general formalism for the definition of
various models. Specifically, a metamodel is made
of a set of metaprimitives. Each metaprimitive
captures a class of constructs of different data
models that share a common characteristic or, more
precisely, that implement, possibly with different
names, the same basic abstraction principle (Torlone
and Atzeni, 2001). Examples of metaprimitives include
class, attribute, definition domain, relationship,
generalization, disjoint union, key, foreign key, and so on.
Here, we introduce the terms of our problem more
specifically and formally. As conceptual model, we
opt for generalization hierarchies (the inverse of
generalization being specialization). We propose a
logical formalism that allows us to represent
heterogeneous hierarchies uniformly.
Definition 1 (Generalization hierarchy) We define
a class hierarchy $H$ as a triple $\langle C, E, \Phi \rangle$:
• $C$ is a finite set of classes, $C = \{c_i\}$. Each class $c_i$
is characterized by a name and a set of attributes,
$c_i = \langle n_{c_i}, A(c_i) \rangle$. Each attribute $a_h \in A(c_i)$,
with $h = 1, \ldots, n$, is defined as a pair
$a_h = \langle n_h, d_h \rangle$, where $n_h$ is a name and $d_h$
is the domain associated with $a_h$.
• $E$ is a set of arcs on $C$; that is, $E$ is a set of
subsumption relationships (ISA relationships)
between classes.
ICSOFT 2006 - INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES
• $\Phi$ is a logical interpretation of $H$ which makes
explicit all the knowledge embedded in $H$, such as
attributes, value domains or constraints, by means of a
consistent logical formulation. We assign to each
parent class an AND/OR logical formula expressing
constraints among the instances of its child classes.
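As an illustration of Definition 1, the triple $\langle C, E, \Phi \rangle$ can be held in a small data structure. The following Python sketch is only one possible encoding: the names Attribute, ClassDef and Hierarchy, the attribute wingspan and the constraint label "disjoint" are assumptions made for the example and are not part of the formalism.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    """An attribute a_h: a pair (name n_h, domain d_h)."""
    name: str
    domain: str


@dataclass(frozen=True)
class ClassDef:
    """A class c_i: a name together with its set of attributes A(c_i)."""
    name: str
    attributes: frozenset = frozenset()


@dataclass
class Hierarchy:
    """A generalization hierarchy H made of C, E and Phi."""
    classes: dict      # C: maps a class name to its ClassDef
    isa: set           # E: set of (subclass, superclass) pairs
    constraints: dict = field(default_factory=dict)  # Phi: parent name to list of (kind, children)

    def children(self, parent):
        """Direct subclasses of `parent`, in alphabetical order."""
        return sorted(sub for (sub, sup) in self.isa if sup == parent)


# Hypothetical content, loosely following the plane example of the paper.
plane = ClassDef("Plane", frozenset({Attribute("wingspan", "float")}))
h = Hierarchy(
    classes={c.name: c for c in (plane, ClassDef("JetPlane"), ClassDef("HelixPlane"))},
    isa={("JetPlane", "Plane"), ("HelixPlane", "Plane")},
    constraints={"Plane": [("disjoint", ("JetPlane", "HelixPlane"))]},
)
print(h.children("Plane"))
```

Keeping E as explicit (subclass, superclass) pairs mirrors the argument order of the ISA assertions used in the rest of this section.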
Informally, one can use a generalization between two
classes to specify that each instance of the subclass is also
an instance of the superclass. Hence, instances of the
subclass inherit the properties of the superclass, but
typically they satisfy additional properties that in
general do not hold for the superclass. Figure 1 shows
an example of a generalization hierarchy represented with
Unified Modelling Language (UML) constructs².
Figure 1: Example of a UML generalization hierarchy.
In our approach, a class $C$ generalizing a class $C_1$
can be captured by means of the following logical
assertion:
\[ \mathrm{ISA}(C_1, C) \equiv \forall x,\; C_1(x) \rightarrow C(x) \]
With regard to generalization hierarchies, semantic
constraints related to the intersection of sibling
classes (that is, classes having a common superclass)
are often proposed, allowing the notions of disjointness
and completeness constraints to be introduced. In
particular, a generalization is disjoint or overlapping
depending on whether the intersection of the sibling
classes is empty or not, respectively. These
constraints may be captured by means of the following
logical assertions:
ISA-ASSERT(C,[Constraint])
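The disjointness constraint among $C_1, C_2, \ldots, C_n$ can
be expressed by the following predicate and assigned
to the superclass $C$:
\[ \forall i = 1, \ldots, n,\quad C_1 \;\mathrm{XOR}\; C_2 \;\ldots\; \mathrm{XOR}\; C_n \;\equiv\; \forall x,\; C_i(x) \rightarrow \bigwedge_{j \in \{1..n\} \setminus \{i\}} \neg C_j(x) \]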
The completeness constraint, expressing that each instance
of $C$ is an instance of at least one of $C_1, \ldots, C_n$,
is expressed by:
\[ \forall x,\; C(x) \rightarrow \bigvee_{j=1}^{n} C_j(x) \]
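To make the two constraints concrete, they can be checked over explicit class extensions (sets of instances). This is a minimal Python sketch under that extensional reading; the function names and the instance identifiers p1 to p3 are hypothetical, and the paper itself works at the level of logical assertions rather than populated classes.

```python
def is_disjoint(extensions, children):
    """True if no instance belongs to two of the given sibling classes."""
    seen = set()
    for child in children:
        if seen & extensions[child]:
            return False
        seen |= extensions[child]
    return True


def is_complete(extensions, parent, children):
    """True if every instance of `parent` belongs to at least one child."""
    covered = set()
    for child in children:
        covered |= extensions[child]
    return extensions[parent].issubset(covered)


# Hypothetical population of a plane hierarchy.
extensions = {
    "Plane": {"p1", "p2", "p3"},
    "JetPlane": {"p1", "p2"},
    "HelixPlane": {"p3"},
    "CivilPlane": {"p1", "p3"},
    "MilitaryPlane": {"p2"},
}
print(is_disjoint(extensions, ["JetPlane", "HelixPlane"]))
print(is_complete(extensions, "Plane", ["JetPlane", "HelixPlane"]))
print(is_disjoint(extensions, ["JetPlane", "CivilPlane"]))
```

The last call mixes classes from two different sibling groups, which are allowed to overlap, so it is expected to fail the disjointness test.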
Referring to Figure 1, the specific constraints of the
hierarchy can be captured by means of the following
logical expressions:
Example 2 ISA (JetPlane, Plane)
ISA (HelixPlane, Plane)
ISA (CivilPlane, Plane)
ISA (MilitaryPlane, Plane)
ISA-ASSERT (Plane, {JetPlane XOR HelixPlane} $\cup$ {CivilPlane XOR MilitaryPlane})
² See the latest UML specification at http://www.uml.org.
Example 3 Referring to Figure 2, we can define the
following predicates, considering $C_{13}$ as a subclass
of $C_1$ and $C_3$ (and similarly for $C_{24}$):
ISA($C_{13}$, $C_1$)
ISA($C_{13}$, $C_3$)
ISA-ASSERT($C_{13}$, $C_{13} \equiv C_1 \wedge C_3$)
Figure 2: Multiple level hierarchy.
Disjointness and completeness constraints are in
practice the most commonly used constraints in
generalization hierarchies. Finally, we may express
additional constraints specifying, for example, restrictions
on domain values.
The logical formulation of generalization hierarchies
goes beyond mere representation: it supports the
reasoning tasks described below. However, this
logical formulation must be consistent.
Consistency of generalization hierarchies. A
generalization hierarchy is consistent if its classes can be
populated without violating any of the constraints. By
exploiting this logical formalization, the consistency
of the hierarchy can be verified by checking the
satisfiability of the corresponding knowledge base (the set
of logical assertions).
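For small hierarchies, this satisfiability check can be sketched by brute force. The Python fragment below enumerates every combination of classes a single individual could belong to and keeps those that respect the ISA, disjointness and completeness assertions; a class that appears in no such combination cannot be populated. This is one possible reading of consistency, not the authors' procedure, and the class JetHelix is a hypothetical addition used to provoke an inconsistency.

```python
from itertools import product


def models(classes, isa, disjoint_groups, complete_groups):
    """Yield each set of classes a single individual may belong to."""
    for bits in product([False, True], repeat=len(classes)):
        member = dict(zip(classes, bits))
        # ISA(sub, sup): membership in sub implies membership in sup.
        ok_isa = all(member[sup] or not member[sub] for sub, sup in isa)
        # Disjointness: at most one class of each group holds.
        ok_disjoint = all(sum(member[c] for c in group) in (0, 1)
                          for group in disjoint_groups)
        # Completeness: a member of the parent is in at least one child.
        ok_complete = all(not member[parent] or any(member[c] for c in kids)
                          for parent, kids in complete_groups)
        if ok_isa and ok_disjoint and ok_complete:
            yield {c for c in classes if member[c]}


def unsatisfiable_classes(classes, isa, disjoint_groups, complete_groups):
    """Classes that cannot be populated without violating a constraint."""
    populated = set()
    for model in models(classes, isa, disjoint_groups, complete_groups):
        populated |= model
    return sorted(set(classes) - populated)


classes = ["Plane", "JetPlane", "HelixPlane", "JetHelix"]
isa = [("JetPlane", "Plane"), ("HelixPlane", "Plane"),
       ("JetHelix", "JetPlane"), ("JetHelix", "HelixPlane")]
disjoint_groups = [("JetPlane", "HelixPlane")]
complete_groups = [("Plane", ("JetPlane", "HelixPlane"))]
print(unsatisfiable_classes(classes, isa, disjoint_groups, complete_groups))
```

The enumeration is exponential in the number of classes, so it only serves to illustrate the definition; a real implementation would delegate the check to a SAT solver or a description logic reasoner.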
Class subsumption. A class $C_1$ is subsumed by a
class $C_2$ if, whenever the constraints imposed by the
generalization hierarchy are satisfied, the extension of
$C_1$ is a subset of the extension of $C_2$. Such a
subsumption allows one to deduce that the properties
holding for $C_2$ also hold for $C_1$.
Class equivalence. Two classes are equivalent if
they denote the same set of instances whenever the
constraints imposed by the generalization hierarchy
are satisfied. Determining equivalence of two classes
allows for their merging.
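A simple approximation of these two reasoning tasks follows the ISA arcs only. The sketch below, in Python, treats a class as subsumed by another when the latter is reachable through ISA arcs, and two classes as equivalent when each subsumes the other. It deliberately ignores subsumptions that are entailed only by disjointness or completeness constraints, and all class names are hypothetical.

```python
def ancestors(isa, cls):
    """All classes reachable from `cls` through ISA arcs, including `cls`."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in isa:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def is_subsumed(isa, c1, c2):
    """True if c1 is subsumed by c2 according to the stated ISA arcs."""
    return c2 in ancestors(isa, c1)


def are_equivalent(isa, c1, c2):
    """True if c1 and c2 subsume each other."""
    return is_subsumed(isa, c1, c2) and is_subsumed(isa, c2, c1)


isa = [
    ("WorkerStudent", "Employee"),
    ("Employee", "Person"),
    ("Student", "Person"),
    ("Car", "Automobile"),
    ("Automobile", "Car"),
]
print(is_subsumed(isa, "WorkerStudent", "Person"))
print(are_equivalent(isa, "Car", "Automobile"))
```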
In the next section, we describe our interoperability
approach, which is based on this logical formalization
and also on linguistic and contextual knowledge.
3 INTEROPERABILITY APPROACH
We have seen in the previous section how a logical
formulation can be associated with a given hierarchy H
based on the constraints expressed in conceptual models.
However, no model has meaning in isolation. Only
through a semantic space (e.g. a domain ontology) are
its elements linked to context, language, situation,
actor, role, etc.³ The semantic space represents
knowledge on a domain, while each model asserts a
single proposition related to a specific context.
In line with (Bouquet et al., 2004), we can identify
at least three distinct levels of knowledge which
can be used to elicit a schema's semantics:
• Lexical knowledge. Knowledge about the meaning
of words used to label classes and attributes.
Indeed, word senses can be automatically generated
from a Lexical Knowledge Base (LKB). WordNet
(Fellbaum, 1998) has been adopted in the current
work because it is the largest repository of word
senses and semantic relations currently available.
However, WordNet could be replaced by another
combination of a linguistic resource and a domain
knowledge resource.
• Structural knowledge. Knowledge deriving from
the arrangement of classes in the generalization
hierarchy. Our analysis considers the implicit
information deriving from the structural relations
with other concepts of the hierarchy.
• Domain knowledge. Knowledge describing the
logical structure of a specific domain, its concepts
and the relations between them. For instance,
WordNet assigns a domain label (e.g., tourism,
zoology, sport, etc.) to most synsets.⁴
In the current version of the algorithm, SEM-INTEROP
takes two generalization hierarchies $H_1$ and $H_2$
as input and returns mappings between their structures.
The algorithm performs the following main steps:
Semantic Interpretation and Semantic Comparison.
³ Adapted from Sowa: "a conceptual graph has no meaning
in isolation. Only through the semantic network are its
concepts and relations linked to context, language, emotion,
and perception".
⁴ WordNet 2.0 also provides domain labels. However,
we preferred the label data set described in (Magnini and
Cavaglia, 2000).
3.1 Semantic Interpretation
In this phase, we make explicit the meaning of each
class based on a linguistic interpretation. Compared
with other approaches to schema matching such as
(Madhavan et al., 2001; Bergamaschi et al., 1999), we
do not limit ourselves to a linguistic analysis of labels.
Instead, we extend this analysis by considering the
implicit knowledge deriving from the context where
the class appears. Then, we interpret constraints such as
disjointness, covering and negation in order to exhibit
new abstractions of classes.
Linguistic Interpretation. Let $H$ be a generalization
hierarchy, and let $C$ be the set of classes occurring in $H$.
Each class $c_i \in H$ is described by labels, which in turn
are composed of words and, possibly, separators between
them. We define the lexicon of a given hierarchy $H$
as the set $L = \{l_1, l_2, \ldots, l_n\}$ of valid labels
belonging to $H$. The process of interpretation
associates the appropriate WordNet synsets $S^i_k$ to each
label $l_k$ in $L$. So, the sense of $L$ is defined as:
\[ S(L) = \{\, S^i_k \mid S^i_k \in \mathrm{Synset}(l_k),\; l_k \in L \,\} \]
where $\mathrm{Synset}(l_k)$ is the set of senses provided by
WordNet for a label $l_k$. For instance, S(Plane) =
{{Airplane#1}, {Sheet#2}, {stage#3}, {planing machine#4},
{Carpenter's plane#5}}.
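The lookup of candidate senses for a lexicon can be sketched as follows. To keep the example self-contained, a small dictionary stands in for WordNet: the entry for "plane" copies the senses listed above, while the entry for "jet" and the helper names words and senses are assumptions made for illustration only.

```python
import re

# Toy sense inventory standing in for WordNet. The "plane" entry copies the
# senses quoted in the text; the "jet" entry is invented for the example.
TOY_SENSES = {
    "plane": ["airplane#1", "sheet#2", "stage#3",
              "planing machine#4", "carpenter's plane#5"],
    "jet": ["jet#1"],
}


def words(label):
    """Split a label such as 'JetPlane' or 'jet_plane' into lower-case words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", label)
    return [part.lower() for part in parts]


def senses(lexicon, inventory=TOY_SENSES):
    """Map each label of the lexicon to the candidate senses of its words."""
    return {label: {w: inventory.get(w, []) for w in words(label)}
            for label in lexicon}


result = senses(["Plane", "JetPlane"])
print(result["JetPlane"])
```

Words missing from the inventory receive an empty sense list, which leaves the choice of the appropriate sense to the later contextualization step.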
Contextualization. Contexts appear in many
disciplines as meta-information to characterize the
specific situation of an entity, to describe a group of
conceptual entities, to partition a knowledge base into
manageable sets, or as a set of logical constructs to
facilitate reasoning services (Dey and Abowd, 1999). In
the current work, we make use of the following
meta-level properties (Guarino, 1998): TYPE, for synsets
representing rigid properties (e.g. person); ROLE, for
synsets representing anti-rigid properties (e.g. student);
and ATTRIBUTION, for synsets representing possible
values of attributes (e.g. employee, as an attribute
value for activity). These semantic constructs allow us
to express Contextualized Concepts, considering their
structural and contextual features, in terms of logical
assertions.
Example 4 An employee is a person who has the role
of a worker and necessarily has a salary.
\[ \mathrm{Employee}(x) := \mathrm{Person}(x) \wedge (\mathrm{Role}(x).\mathrm{worker}) \wedge (\mathrm{Attribution}(x).\mathrm{Salary}) \wedge (\neg \mathrm{Employer}(x)) \]
A student is a person who has the role of a learner
and is enrolled in one level.
\[ \mathrm{Student}(x) := \mathrm{Person}(x) \wedge (\mathrm{Role}(x).\mathrm{learner}) \wedge (\mathrm{Attribution}(x).\mathrm{level}) \wedge (\mathrm{EnrolledIn}(x).\mathrm{level}) \]
An adult citizen is a person who takes an active role
and is an adult.
\[ \mathrm{AdultCitizen}(x) := \mathrm{Person}(x) \wedge (\mathrm{Role}(x).\mathrm{Activity}) \wedge (\mathrm{Attribution}(x).\mathrm{Adult}) \wedge (\neg \mathrm{Attribution}(x).\mathrm{Juvenile}) \]
Implicit Constraints Interpretation. Implicit
structural constraints can lead to the derivation of new
classes. For instance, a covering constraint is interpreted
as a meet operation among classes ($\wedge$); the resulting
class represents the greatest common lower bound,
possibly equal to the least element of the hierarchy.
We may obtain a semi-lattice, as illustrated in Figure
3, considering:
• the ISA relationship as a partial order relation, that is,
a reflexive, antisymmetric and transitive relation,
• the existence, for each pair of classes, of a greatest
common lower bound.
Figure 3: A semi-lattice structure.
Conjunction between classes may be expressed in
a logical formula, for instance: Worker-student :=
Employee $\wedge$ Student
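The meet of two classes can be computed directly from the ISA arcs. The Python sketch below collects the common descendants of both classes and returns the one that lies above all the others, if any; the function names are hypothetical and the hierarchy reuses the Employee and Student classes of the example above.

```python
def descendants(isa, cls):
    """All classes below `cls` in the ISA order, including `cls` itself."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in isa:
            if sup == current and sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


def meet(isa, a, b):
    """Greatest common lower bound of a and b, or None if there is none."""
    common = descendants(isa, a) & descendants(isa, b)
    for candidate in sorted(common):
        # The meet is the common lower bound that dominates all the others.
        if common.issubset(descendants(isa, candidate)):
            return candidate
    return None


isa = [
    ("Employee", "Person"),
    ("Student", "Person"),
    ("Worker-student", "Employee"),
    ("Worker-student", "Student"),
]
print(meet(isa, "Employee", "Student"))
print(meet(isa, "Employee", "Person"))
```

When the function returns None for some pair, the hierarchy is not a meet semi-lattice as it stands, and a new class (such as Worker-student) would have to be derived to close it.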
3.2 Semantic Comparison
Intuitively, the problem of semantic interoperabil-
ity arises when one needs to find relations between
classes belonging to distinct (and thus typically het-
erogeneous) hierarchies. Formally, we define the
problem of semantic interoperability as the problem
of discovering mappings between classes in two
distinct hierarchies $H$ and $H'$:
Definition 5 (Mapping) A mapping $M$ from
$H = \langle C, E, \Phi \rangle$ to $H' = \langle C', E', \Phi' \rangle$ is a
function $M : C \times C' \rightarrow R$, where $R$ is the set
of possible relations.
We may distinguish two forms of mappings: classical
mappings and rule-based mappings. The first
form is widely used to express semantic relations
between classes, such as equivalence mappings and
disjointness mappings.
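Example 6 (Classical Mappings)
\[ \mathrm{Voiture}(x) \rightarrow \mathrm{Car}(x) \]
\[ \mathrm{Car}(x) \rightarrow \mathrm{Voiture}(x) \]
\[ \mathrm{Male}(x) \rightarrow \neg \mathrm{Female}(x) \]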
A rule-based mapping can be used to represent com-
plex mappings such as generalization/specialization
mappings.
Example 7 (Rule-based Mapping)
\[ \mathrm{CarOwner}(x) \rightarrow \mathrm{Person}(x) \wedge (\mathrm{Attribution}(x).\mathrm{Car}) \wedge (\mathrm{Role}(x).\mathrm{Owner}) \]
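A mapping in the sense of Definition 5 can be represented as a table indexed by pairs of classes. The Python sketch below fills such a table by comparing class extensions, which is only an extensional stand-in for the semantic comparison discussed here; the Relation values, the function names and the instance identifiers are all assumptions, since the paper does not enumerate the set R.

```python
from enum import Enum


class Relation(Enum):
    """A hypothetical choice for the set R of possible relations."""
    EQUIVALENT = "equivalent"
    MORE_SPECIFIC = "more specific"
    MORE_GENERAL = "more general"
    DISJOINT = "disjoint"
    OVERLAP = "overlap"


def relate(ext_a, ext_b):
    """Relation between two classes, judged from their instance sets."""
    if ext_a == ext_b:
        return Relation.EQUIVALENT
    if ext_a.issubset(ext_b):
        return Relation.MORE_SPECIFIC
    if ext_b.issubset(ext_a):
        return Relation.MORE_GENERAL
    if ext_a.isdisjoint(ext_b):
        return Relation.DISJOINT
    return Relation.OVERLAP


def build_mapping(ext_h, ext_h2):
    """The mapping M as a dictionary from class pairs to relations."""
    return {(c, c2): relate(e, e2)
            for c, e in ext_h.items() for c2, e2 in ext_h2.items()}


ext_h = {"Voiture": {"v1", "v2"}, "Male": {"m1"}}
ext_h2 = {"Car": {"v1", "v2"}, "Female": {"f1"}}
M = build_mapping(ext_h, ext_h2)
print(M[("Voiture", "Car")], M[("Male", "Female")])
```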
Mapping Algebra. Unfortunately, few research works
propose mathematical foundations for the mapping
problem. Mapping classes belonging to different
hierarchies is important but not sufficient: depending on
these mappings, how can we restructure the internal
organisation of the given hierarchies to obtain the
"interoperation structure" that represents their
greatest common lower bound? For example, considering
hierarchies as partially ordered sets, they can be
considered equivalent if there exists a bijective function
between these sets which also preserves the order
(i.e. which is monotonic) and whose inverse is monotonic
as well. In this case, being monotonic means that a
function respects the internal structure of partially
ordered sets, while bijectivity indicates the equivalence
of two ordered sets. Structure-preserving functions are
a typical example of what is called a morphism.
Two partially ordered sets $H$ and $H'$ are equivalent,
or isomorphic, whenever there is a monotone function
$f : H \rightarrow H'$ that has a monotone inverse, i.e. for
which there is a monotone function $g : H' \rightarrow H$ with
$g \circ f = id_H$ and $f \circ g = id_{H'}$. We call a morphism an
isomorphism if it has a (necessarily unique) inverse
morphism.
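For small hierarchies, the existence of such an isomorphism can be tested exhaustively. The Python sketch below computes the order relation induced by the ISA arcs and searches for a bijection f such that a is below b exactly when f(a) is below f(b), which is equivalent to f being monotone with a monotone inverse. The function names and the French class names are hypothetical, and the search is factorial in the number of classes.

```python
from itertools import permutations


def order_relation(elements, arcs):
    """Reflexive-transitive closure of the arcs, as (lower, upper) pairs."""
    le = {(e, e) for e in elements} | set(arcs)
    changed = True
    while changed:
        changed = False
        for a, b in list(le):
            for c, d in list(le):
                if b == c and (a, d) not in le:
                    le.add((a, d))
                    changed = True
    return le


def find_isomorphism(elements1, arcs1, elements2, arcs2):
    """Return an order isomorphism as a dict, or None if none exists."""
    if len(elements1) != len(elements2):
        return None
    le1 = order_relation(elements1, arcs1)
    le2 = order_relation(elements2, arcs2)
    for image in permutations(elements2):
        f = dict(zip(elements1, image))
        # f must preserve and reflect the order.
        if all(((a, b) in le1) == ((f[a], f[b]) in le2)
               for a in elements1 for b in elements1):
            return f
    return None


h1 = ["Person", "Employee", "Student"]
arcs1 = [("Employee", "Person"), ("Student", "Person")]
h2 = ["Personne", "Salarie", "Etudiant"]
arcs2 = [("Salarie", "Personne"), ("Etudiant", "Personne")]
chain = ["A", "B", "C"]
chain_arcs = [("A", "B"), ("B", "C")]
print(find_isomorphism(h1, arcs1, h2, arcs2))
print(find_isomorphism(h1, arcs1, chain, chain_arcs))
```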
To this end, we may develop a mapping algebra
including operators such as S-join (Semantic Join),
S-meet (Semantic Meet) and S-Project (Semantic
Projection).
4 CONCLUSION AND FUTURE
WORK
In this paper, we have provided a formal semantics
for generalization hierarchies and then used that
formal framework to explore a number of linguistic and
semantic issues crucial for interpreting the knowledge
implicitly represented in such hierarchies. The
algorithm we have proposed performs a linguistic
interpretation of the labels provided in the hierarchy, based
on the WordNet ontology. The process of interpreting
labels is extended with a contextualization process,
which is a progressive construction of logical
expressions whose predicates are based on three
meta-properties: TYPE, ROLE and ATTRIBUTION.
Next, we perform a semantic comparison that consists
of discovering mappings between classes. Besides
classical mappings, we introduce rule-based mappings
that express constrained complex mappings.
We think that mapping two hierarchies $H$ and $H'$
means, at least, finding a sub-hierarchy of $H'$ that is
isomorphic to $H$. Therefore, in the future, we
plan to work on a mapping algebra that could include
operators such as S-join, S-meet and S-Project.
Developing such operators would allow us to restructure
hierarchies given a set of mappings while preserving
semantics.
REFERENCES
Bergamaschi, S., Castano, S., and Vincini, M. (1999). Se-
mantic integration of semistructured and structured
data sources. SIGMOD Record, 28(1):54–59.
Bouquet, P., Ehrig, M., Euzenat, J., Franconi, E., Hitzler, P.,
et al. (2004). D2.2.1 specification of a common
framework for characterizing alignment. Technical
report.
Chawathe, S., Garcia-Molina, H., Hammer, J., Ireland, K.,
Papakonstantinou, Y., Ullman, J. D., and Widom, J.
(1994). The TSIMMIS project: Integration of hetero-
geneous information sources. In 16th Meeting of the
Information Processing Society of Japan, pages 7–18,
Tokyo, Japan.
Dey, A. K. and Abowd, G. D. (1999). The context toolkit:
Aiding the development of context-aware applications.
In Proceedings of Human Factors in Computing
Systems: CHI 99, pages 434–441, Pittsburgh, PA.
ACM Press.
Fellbaum, C., editor (1998). WordNet: An Electronic Lexical
Database. MIT Press.
Guarino, N. (1998). Some ontological principles for
designing upper level lexical resources. In Proc. of
the First International Conference on Language
Resources and Evaluation, Granada, Spain, May 1998.
Kashyap, V. and Sheth, A. P. (1996). Semantic and
schematic similarities between database objects: A
context-based approach. VLDB Journal: Very Large
Data Bases, 5(4):276–304.
Madhavan, J., Bernstein, P. A., and Rahm, E. (2001).
Generic schema matching with cupid. In The VLDB
Journal, pages 49–58.
Magnini, B. and Cavaglia, G. (2000). Integrating subject
field codes into wordnet. In Proceedings of Language
Resources and Evaluation (LREC 2000), pages 1413–
1418.
Rahm, E. and Bernstein, P. A. (2001). A survey of ap-
proaches to automatic schema matching. The VLDB
Journal, 10(4):334–350.
Sheth, A. P. and Kashyap, V. (1993). So far (schemati-
cally) yet so near (semantically). In Proceedings of
the IFIP WG 2.6 Database Semantics Conference on
Interoperable Database Systems (DS-5), pages 283–
312. North-Holland.
Sheth, A. P. and Larson, J. A. (1990). Federated data-
base systems for managing distributed, heteroge-
neous, and autonomous databases. ACM Comput.
Surv., 22(3):183–236.
Stoermer, H. and Stecher, R. (2005). An approach for
context-based schema integration in virtual information
environments. In Doctoral Consortium in CONTEXT 05 -
Fifth International and Interdisciplinary
Conference on Modeling and Using Context, Paris,
France.
Torlone, R. and Atzeni, P. (2001). A unified framework for
data translation over the web. In WISE (1), pages 350–
358.
Wache, H., Vögele, T., Visser, U., Stuckenschmidt, H.,
Schuster, G., Neumann, H., and Hübner, S. (2001).
Ontology-based integration of information - a survey
of existing approaches. In Stuckenschmidt, H.,
editor, IJCAI–01 Workshop: Ontologies and Information
Sharing, pages 108–117.
Wiederhold, G. (1992). Mediators in the architecture of fu-
ture information systems. Computer, 25(3):38–49.
Yaacoubi, N. and BenAhmed, M. (2003). Integrating smart
communities in knowledge portals. In IKE, pages
523–528.