EFFICIENT MANAGEMENT OF MULTI-VERSION XML
DOCUMENTS FOR E-GOVERNMENT APPLICATIONS
Federica Mandreoli and Riccardo Martoglia
Dip. di Ingegneria dell’Informazione, Universit
`
a di Modena e Reggio Emilia
Via Vignolese 905/b, I-41100, Modena, Italy
Fabio Grandi and Maria Rita Scalas
Dip. di Elettronica, Informatica e Sistemistica, Alma Mater Studiorum - Universit
`
a di Bologna
Viale Risorgimento 2, I-40136, Bologna, Italy
Keywords:
e-Government, XML, document retrieval, temporal database, semantic Web.
Abstract:
This paper describes our research activities in developing efficient systems for the management of multi-
version XML documents in an e-Government scenario. The application aim is to enable citizens to access
personalized versions of resources, like norm texts and information made available on the Web by public
administrations. In the first system developed, four temporal dimensions (publication, validity, efficacy and
transaction times) were used to represent the evolution of norms in time and their resulting versioning and a
stratum approach was used for its implementation on top of a relational DBMS. Recently, the multi-version
management system has migrated to a different architecture (“native” approach) based on a multi-version XML
query processor developed on purpose. Moreover, a new semantic dimension has been added to the versioning
mechanism, in order to represent applicability of norms to different classes of citizens according to their digital
identity. Classification of citizens is based on the management of an ontology with the deployment of semantic
Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to
the stratum approach and a good scalability behaviour. Current work includes a more accurate modeling of the
citizen’s ontology, which could also require a redesign of the document storage scheme, and the development
of a complete infrastructure for the management of the citizen’s digital identity.
1 INTRODUCTION
In this paper we present our research activities
concerning the implementation of Web information
systems for e-Government applications (EC E-Gov,
2004; US E-Gov, 2004). More precisely, our work
makes use of temporal database and semantic Web
techniques to provide personalized access to multi-
version resources and services provided by the Pub-
lic Administration (PA). The offering of personalized
versions is aimed at improving and optimizing the in-
volvement of citizens in the e-Governance process. In
particular, we consider the selective access to norm
texts and documents made available on Web reposito-
ries in XML format (XML, 2004).
First of all, the fast dynamics involved in normative
systems implies the coexistence of multiple versions
of the norm texts stored in a repository, since laws are
continually subject to amendments and modifications.
This work has been supported by the MIUR-
PRIN Project: “The European citizen in e-Governance:
philosophical-juridical, legal, information and economic
profiles”.
In fact, it is crucial to reconstruct the consolidated
version of a norm as produced by the application of
all the modifications it underwent so far, that is the
form in which it currently belongs to the regulations
and must be enforced today. However, also past ver-
sions are still important, not only for historical rea-
sons: for example, if a Court has to pass judgment
today on some fact committed in the past, the version
of norms which must be applied to the case is the one
that was in force then.
In other words, temporal concerns are widespread
in the e-Government domain and a legal information
system should be able to retrieve or reconstruct on de-
mand any version of a given document to meet com-
mon application requirements. Moreover, another
kind of versioning plays an important role, because
some norms or some of their parts have or acquire
a limited applicability. For example, a given norm
(e.g. defining tax treatment) may contain some arti-
cles which are applicable to different classes of cit-
izens: one article is applicable to unemployed per-
sons, one article to self-employed persons, one article
to public servants only and so on. Hence, a citizen
409
Mandreoli F., Martoglia R., Grandi F. and Rita Scalas M. (2005).
EFFICIENT MANAGEMENT OF MULTI-VERSION XML DOCUMENTS FOR E-GOVERNMENT APPLICATIONS.
In Proceedings of the First International Conference on Web Information Systems and Technologies, pages 409-416
DOI: 10.5220/0001230504090416
Copyright
c
SciTePress
accessing the retrieval service may be interested in
finding a personalized version of the norm, that is a
version only containing articles which are applicable
to his/her personal case. Finally, notice that temporal
and limited applicability aspects, though orthogonal,
may also interplay in the production and management
of versions. For instance, a new norm might state a
modification to a preexisting norm, such as the modi-
fied norm becomes applicable to a limited category of
citizens only (e.g. retired persons), whereas the rest of
the citizens remain subject to the unmodified norm.
In this context, we defined data models for multi-
version XML documents and built prototype systems
for their efficient management in a Web-based e-
Government application scenario. In particular, in
this work we will describe and compare two manage-
ment systems, meeting different application require-
ments, that we recently developed using different ar-
chitectures and implementation techniques.
The first system is based on multi-dimensional tem-
poral versioning, where temporal aspects are cap-
tured by adding timestamping attributes to the XML
markup. The prototype was implemented using a
“stratum” approach on top of a commercial DBMS
and will be briefly described in Section 2 (a more de-
tailed description and evaluation has also been pub-
lished before as (Grandi et al., 2003a; Grandi et al.,
2003b; Grandi et al., 2005)).
The second system is the current outcome of an on-
going research, which is introduced in (Grandi et al.,
2004), and represents the original contribution of the
present work. The XML data model on which it
is based includes semantic annotations in the multi-
versioning mechanism, in order to capture limited ap-
plicability and to support personalized access, and
will be described in Section 3. The prototype is im-
plemented following a “native” approach, which will
be presented in Section 4 and is currently under eval-
uation.
Developments and extensions of the system which
are planned for the near future will be described in
Section 5. These include an improvement in the
ontological modeling of citizens to meet more ad-
vanced application requirements and a completion of
the technological infrastructure needed to make our
system fully operational in a real e-Government envi-
ronment.
Conclusions will finally be found in Section 6.
2 TEMPORAL VERSIONING IN
THE “STRATUM” APPROACH
In a first phase of the research we focused on temporal
aspects and on the effective and efficient management
of time-varying norm texts. To this purpose, we de-
veloped a temporal XML data model which uses four
time dimensions to correctly represent the evolution
of norms in time and their resulting versioning. The
considered dimensions are:
Publication time. It is the time of publication of the
norm on the Official Journal. It has the same se-
mantics as event time (and availability time, as the
two time dimensions, in such a context, coincide).
It is a global and unchangeable property for the
whole norm contents and, thus, it is not used as a
versioning dimension inside text.
Validity time. It is the time the norm is in force. It
has the same semantics of valid time as in temporal
databases (Jensen et al., 1998), since it represents
the time the norm actually belongs to the regula-
tions in the real world.
Efficacy time. It is the time the norm can be applied
to a concrete case. It usually corresponds to the va-
lidity of norms, but it can be the case that an abro-
gated norm continues to be applicable to a limited
number of cases. Until such cases cease to exist,
the norm continues its efficacy though no longer in
force.
Transaction time. It is the time (some part of) the
norm is stored in a computer system. It has the
same semantics of transaction time as in temporal
databases (Jensen et al., 1998).
The data model was defined via an XML Schema
(XMLSchema, 2004), where the structure of norms
is defined by means of a contents-section-article-
paragraph hierarchy and multiple content versions can
be defined at each level of the hierarchy. Each ver-
sion is characterized by timestamp attributes defining
its temporal pertinence with respect to each of the va-
lidity, efficacy and transaction time dimensions.
The model is also equipped with two basic opera-
tors for the management of norm modifications: one
is devoted to change the textual content of a norm por-
tion and the other allows modifications to the tempo-
ral pertinence of a given version. The former can be
used for deletion of (a part of) the norm (abrogation),
or the introduction of a new part of the norm (inte-
gration), or the replacement of (a part of) the norm
(substitution). The latter can be used to deal with the
time extension or the suspension of (part of) the norm.
FOR $a IN path
WHERE constraints on $a
RETURN const-tree(document($a),
temporal specs)
Figure 1: An XQuery-equivalent query executable on our
first system.
WEBIST 2005 - SOCIETY AND E-BUSINESS
410
Citizen
EmployeeUnemployed Retired
Self-employedSubordinate
PrivatePublic
(2,1) (3,6) (8,7)
(4,4) (7,5)
(5,2) (6,3)
<article num="1">
<ver num="1">
<aa applies_to="3"/>
[ Temporal attributes ]
<paragraph num="1">
<ver num="1"> [ Text ]
<aa applies_to="4"/>
[ Temporal attributes ]
</ver>
</paragraph>
<paragraph num="2">
<ver num="1"> [ Text ]
<aa applies_also="8"/>
[ Temporal attributes ]
</ver>
</paragraph>
</ver>
</article>
A fragment of an XML document supporting
personalized access
The sample ontology
(1,8)
Figure 2: An example of civic ontology, where each class has a name and is associated to a (pre,post) pair, and a fragment of
a XML norm containing applicability annotations.
Legal text repositories are usually managed by tra-
ditional information retrieval systems where users
are allowed to access their contents by means of
keyword-based queries expressing the subjects they
are interested in. We extended such a framework by
offering to users the possibility of expressing tempo-
ral specifications for the reconstruction of a consistent
version of the retrieved normative acts (consolidated
act). More precisely, the system is able to answer
queries having the XQuery (XQuery, 2004) form as
in Fig. 1.
The statement following the standard XQuery
FLWR syntax allows users to express selection
constraints on the variable $a iterating over the
nodes returned by the path expression path.
Search keywords can be specified by means of
the function contains in the WHERE clause (e.g.
contains($a,’sea’)). In the RETURN clause,
the operator const-tree is devoted to the recon-
struction of the temporally consistent versions of the
XML documents containing the selected nodes (con-
solidated norms). The temporal specs expres-
sion is the conjunction of temporal selection predi-
cates on the four supported temporal dimensions.
Our approach is the first to provide full search and
reconstruction functionalities with respect to all time
dimensions, whereas previous approaches only pro-
vided a limited support. For example, the tempo-
ral XML markup adopted in the Norma-System de-
scribed in (Palmirani and Brighi, 2002) includes pub-
lication, validity and efficacy time but reconstruction
of consolidated versions is made with respect to va-
lidity only (other time dimensions can be used as ad-
ditional search fields in full-text search).
Our temporal data model with the modification and
query operators was implemented in a prototype sys-
tem for the management and maintenance of a col-
lection of time-varying norms. The system is able
to store norms encoded as XML documents and ef-
ficiently access them by answering queries which can
involve both temporal constraints and search key-
words.
The system architecture is based on two different
components: the former consists of the XML doc-
ument management facilities offered by Oracle 9i
(Oracle, 2004) to handle structural and textual con-
straints, the latter is a software stratum that we built
on top of the former to handle the temporal aspects.
Extensive experimental results on the system behav-
iour show good performance and the ability to man-
age large collections of XML multi-version docu-
ments. A discussion of such architectural solution,
named the “stratum” approach, in comparison with
our new implementation solution, named the “native”
approach, is carried out in Section 4.
A detailed description of the stratum approach and
an account of its evaluation can be found in (Grandi
et al., 2003b; Grandi et al., 2005).
3 INTRODUCTION OF THE
SEMANTIC VERSIONING
In a second phase of the research, the multi-version
model based on temporal dimensions was extended
to include a semantic versioning dimension in order
to provide personalized access to norm texts. In gen-
eral, machine-understanding of the information avail-
able on the Semantic Web requires a semantic markup
of the contents and the availability of automated rea-
soning tools. In order to let information and its inter-
pretation be shared by several agents including auto-
matic tools, the introduction of common reference on-
tologies becomes necessary (Guarino, 1998; WebOnt,
EFFICIENT MANAGEMENT OF MULTI-VERSION XML DOCUMENTS FOR E-GOVERNMENT APPLICATIONS
411
FOR $a IN norm
WHERE textConstr ($a//paragraph//text(), ’health AND care’)
AND tempConstr (’vTime OVERLAPS PERIOD(’2002-01-01’,’2004-12-31’)’)
AND tempConstr (’eTime OVERLAPS PERIOD(’2002-01-01’,’2004-12-31’)’)
AND applConstr (’class
7’)
RETURN $a
Figure 3: An XQuery-equivalent query executable on our second system.
2004). In our case, we defined a civic ontology, which
corresponds to a classification of citizens based on the
distinctions introduced by successive norms (found-
ing acts) that imply some limitation, total or partial, in
their applicability. Hence, in our extended model, the
new versioning dimension encodes information about
the applicability of different parts of a norm text to the
relevant classes of the civic ontology.
Consider, for instance, Fig. 2. The left part of the
figure depicts a simple civic ontology built from a
small corpus of norms ruling the status of citizens
with respect to their work position. The right part
of the figure shows a fragment of a multi-version
XML norm text supporting personalized access with
respect to this ontology. Notice that, at this stage of
the project, we manage “tree-like” ontologies defined
as class taxonomies induced by the IS-A relationship.
This allows us to exploit the pre-order and post-order
properties of trees in order to enumerate the nodes
and check ancestor-descendant relationships between
the classes; such codes are displayed in the upper left
corner of the ontology classes in the Figure, in the
form: (pre-order,post-order). For instance, the class
“Employee” has pre-order “3” which is also its iden-
tifier, whereas its post order is “6”. The article in the
XML fragment on the right of Fig. 2 is composed of
two paragraphs and contains applicability annotations
(aa). Notice that applicability is inherited by descen-
dant nodes unless locally redefined. Hence, by means
of redefinitions we can also introduce, for each part
of a document, complex applicability properties in-
cluding extensions or restrictions with respect to an-
cestors. For instance, the whole article in the Fig-
ure is applicable to civic class “3” (applies
to) and
by default to all its descendants. However, its first
paragraph is applicable to class “4”, which is a re-
striction, whereas the second one is also applicable to
class “8” (applies
also), which is an extension. The
reconstruction of pertinent versions of the norm based
on its applicability annotations is very important in an
e-Government scenario. The representation of exten-
sions and restrictions gives rise to high expressiveness
and flexibility in such a context.
As to the queries that can be submitted by a user in
the new system, they can contain four types of con-
straints: temporal, structural, textual and applicabil-
ity. Such constraints are completely orthogonal and
allow him/her to perform very accurate searches in
the XML norm repository. Let us focus first on the
applicability constraint. Consider again the ontology
and norm fragment in Fig. 2: for John Smith, a “self-
employed” citizen (i.e. belonging to class “7”), the
sample article in the Figure will be selected as perti-
nent, but only the second paragraph will be actually
presented as applicable. Furthermore, the applicabil-
ity constraint can be combined with the other three
ones in order to fully support a multi-dimensional re-
trieval. For instance, John Smith could be interested
in all the norms ...
... which contain paragraphs (structural constraint)
dealing with health care (textual constraint), ...
... which were valid and in effect between 2002 and
2004 (temporal constraint), ...
... and which are applicable to his class (applica-
bility constraint).
Such a query can be issued to our system using
the standard XQuery FLWR syntax in Fig. 3, where
textConstr, tempConstr, and applConstr
are suitable functions allowing the specification of
the textual, temporal and applicability constraints, re-
spectively (the structural constraint is implicit in the
XPath expressions used in the XQuery query). No-
tice that the temporal constraints can involve all the
four available time dimensions (publication, validity,
efficacy and transaction), allowing high flexibility in
satisfying the information needs of citizens in the e-
Government scenario. In particular, by means of va-
lidity and efficacy time constraints, a user is able to
extract consolidated current versions from the multi-
version corpora, or to access past versions of partic-
ular norm texts, all consistently reconstructed by the
system on the basis of its needs and personalized on
the basis of his/her identity.
4 THE “NATIVE” APPROACH
All the multi-version and personalized-access XML
norm querying features have been implemented in our
second prototype system. The system architecture is
shown on the right part of Fig. 4 and is based on
WEBIST 2005 - SOCIETY AND E-BUSINESS
412
Current native approachPrevious stratum approach
User
XML
Repository
DBMS
Stratum
XML Engine
Temporal
Structural
+
Textual
Query
Constraints
User
XML
Repository
Temporal XML
Query Processor
Temporal,
Structural,
Textual,
Applicability
Query
Constraints
XML Docs
Ad-hoc
tuples
XML Docs
Figure 4: First (“stratum”) versus second (“native”) system architecture.
an “XML-native” approach, since it is composed of
a Temporal XML Query Processor designed on pur-
pose, which is able to manage the XML data repos-
itory and to provide all the temporal, structural, tex-
tual and applicability query facilities in a single com-
ponent. The prototype is implemented in Java JDK
1.5 and exploits ad-hoc data structures (relying on
embedded “light” DBMS libraries) and algorithms
which allows users to store and reconstruct on-the-
fly XML norm texts satisfying the four types of con-
straints. Differently from the stratum approach we
used in our previous prototype (see the left part of
Fig. 4), where temporal constraints were processed
separately, all the structural, textual and temporal con-
straints are simultaneously handled by the Temporal
XML Query Processor. Such a component stores the
XML norms not as entire documents but by convert-
ing them into a collection of ad-hoc temporal tuples,
representing each of its multi-version parts (i.e., para-
graphs, articles, and so on); these data structures are
then exploited to efficiently perform structural join
algorithms (Al-Khalifa et al., 2002) we specifically
devised and tuned for the temporal/semantic multi-
version context. Textual constraints, like in the stra-
tum approach, are handled by means of an inverted
index. Furthermore, the current architecture also pro-
vides support to personalized access by handling the
new applicability constraints as required by the refer-
ence e-Government application scenario. The benefits
of our native approach over the stratum one are man-
ifold:
by querying ad-hoc and temporally-enhanced
structures (which have a finer granularity than the
entire documents managed by standard XML en-
gines), the native approach is able to access and re-
trieve only the strictly necessary data;
only the parts which are required and which sat-
isfy the temporal constraints are used for the recon-
struction of the retrieved documents;
there is no need to retrieve whole XML docu-
ments and build space-consuming structures such
as DOM trees, as required in the stratum approach.
As a consequence, we expect the query processing
efficiency could greatly be enhanced and the mem-
ory requirements dramatically reduced. In order to
evaluate the effectiveness of the “native” approach,
we compared its performance with our previous “stra-
tum” implementation on a common query bench-
mark and also conducted a number of exploratory
experiments to analyse its behaviour in perform-
ing personalized access through applicability con-
straints. The experiments have been effected on a
Pentium 4 2.5Ghz Windows XP Professional work-
station, equipped with 512MB RAM and a RAID0
cluster of 2 80GB EIDE disks with NT file system
(NTFS). We performed the tests on three XML doc-
ument sets of increasing size: collection C1 (5,000
XML norm text documents), C2 (10,000 documents)
and C3 (20,000 documents). In this paper, due to
space requirements, we will present in detail the re-
sults obtained on the collection C1, then we will
briefly describe the scalability performance shown on
the other two collections. The total size of the collec-
tions is 120MB, 240MB, and 480MB, respectively. In
all collections the documents were synthetically gen-
erated by means of an ad-hoc XML generator we de-
veloped, which is able to produce different documents
compliant to our multi-version and personalized ac-
cess model. For each collection the average, mini-
mum and maximum document sizes are 24KB, 2KB
and 125KB, respectively.
EFFICIENT MANAGEMENT OF MULTI-VERSION XML DOCUMENTS FOR E-GOVERNMENT APPLICATIONS
413
Experiments were conducted by submitting queries
of ve different types (Q1-Q5). Table 1 presents the
features of the test queries and the query execution
time for both the “stratum” and the “native” archi-
tectures. All the queries require structural support
(St constraint); types Q1 and Q2 also involve tex-
tual searches by keywords (Tx constraint), with dif-
ferent selectivities; type Q3 contains temporal condi-
tions (Tm constraint) on three time dimensions: trans-
action, valid and publication time; types Q4 and Q5
mix the previous ones since they contain both key-
word searches and temporal conditions. For each of
those query types, we also present a personalized ac-
cess variant involving an additional applicability con-
straint (denoted as Qx-A in Table 1).
Let us first focus on the upper part of the table,
and in particular on the comparison of the perfor-
mances offered by the two approaches. The native
approach shows to be faster in every context, provid-
ing a shorter response time (including query analysis,
retrieval of the qualifying norm parts and reconstruc-
tion of the result) of approximately one or two sec-
onds for most of the queries. Notice that, while the
response time of the stratum approach is not too dif-
ferent for query types Q1, Q4, Q5, for the other query
types the performance gap is quite important (for in-
stance, query Q2 is answered approximately 15 times
slower in the stratum approach). The reason is that
the selectivity of the query predicates strongly influ-
ences the performance of the stratum approach, seri-
ously impairing its performance when large amounts
of documents containing some (typically small) rele-
vant portions have to be retrieved, as it happens for
queries Q2 and Q3. On the other hand, the native
approach is able to deliver a faster and more reliable
performance in all cases, since it practically avoids
the retrieval of useless document parts. Further, con-
sider that, for the same reasons, the main memory
requirements of the Temporal XML Query Processor
embedded in the native approach are, on average, 5%
or less of the stratum approach. This property is also
very promising towards future extensions to cope with
concurrent multi-user query processing.
The lower part of the table presents the perfor-
mance of our second system with respect to the
queries involving additional applicability constraints,
enabling personalized access. Thanks to the proper-
ties of the adopted pre-order and post-order encoding
of the civic classes, the system is able to solve person-
alization problems by means of simple comparisons
involving such encodings and, thus, with a very high
efficiency. The time needed to answer the personal-
ized access versions of the Q1–Q5 queries is approx-
imately 0.5-1% more than for the original versions.
Moreover, since the applicability annotations of each
part of an XML document are stored as simple inte-
gers, also the size of the applicability annotated tu-
ples, as stored in the system, is practically unchanged
(only a 3-4% storage space overhead is required with
respect to documents without semantic versioning),
even with quite complex annotations involving sev-
eral applicability extensions and restrictions.
Finally, we only report a comment about the perfor-
mance of our current prototype in querying the other
two collections C2 and C3 and, therefore, concern-
ing the the scalability of the system. We ran the same
queries of the previous tests on the larger collections
and saw that the computing time always grew sub-
linearly with the number of documents. For instance,
query Q1 executed on the 10,000 documents of col-
lection C2 (which is as double as C1) took 1,366 msec
(i.e. the system was only 30% slower); similarly, on
the 20,000 documents of collection C3, the average
response time was 1,741 msec (i.e. the system was
less than 30% slower than with C2). Also with the
other queries the measured trend was the same, thus
showing the good scalability of the system in every
type of query context.
5 FUTURE DEVELOPMENTS
Our current research work is devoted to the extensions
of the current framework and to the development of
a complete technological infrastructure to enable our
approach to be self-contained and usable in a large
Web-based e-Government scenario, as envisioned in
(Grandi et al., 2004).
The adoption of a tree-like civic ontology
that is based on a taxonomy induced by the IS-A
relationship– is sufficient to satisfy basic application
requirements as far as applicability constraints and
personalization services are concerned. However,
more advanced application requirements may include
a more sophisticated ontology definition. As a mat-
ter of fact, real-world norm corpora, if analyzed in
full detail, can lead to the formalization of complex
relationships between civic subclasses giving rise to
ontologies structured, in general, as a graph. Hence,
extensions to the framework are required in order to
overcome the limitations of dealing with a tree-like
civic ontology in our current approach: the XML stor-
age organization and the query processing algorithm
must be revisited, since the solutions adopted so far
rely both on the ontology and document tree structure
(e.g. decomposition in temporal tuples and exploita-
tion of pre- and post-order numbering).
On the other hand, the development of a com-
plete infrastructure is needed to make our approach
self-contained and fully operational in a real-world e-
Government environment. In fact, in addition to the
availability on the Web of the query answering sys-
tem and of the civic ontology, several other compo-
WEBIST 2005 - SOCIETY AND E-BUSINESS
414
Table 1: Features of the test queries and query execution time of the “stratum” and “native” approaches (time in milliseconds,
collection C1)
Query Constraints Selectivity Performance (msec)
Tm St Tx Ap Stratum Native
Q1 - X X - 0.6% 2891 1046
Q2 - X X - 4.02% 43240 2970
Q3 X X - - 2.9% 47638 6523
Q4 X X X - 0.68% 2151 1015
Q5 X X X - 1.46% 3130 2550
Q1-A - X X X 0.23% n/a 1095
Q2-A - X X X 1.65% n/a 3004
Q3-A X X - X 1.3% n/a 6760
Q4-A X X X X 0.31% n/a 1020
Q5-A X X X X 0.77% n/a 2602
nents are needed for a full operativeness of the whole
system, including administration and maintenance fa-
cilities.
First of all, the citizen’s digital identity is defined as
the total amount of information concerning him/her,
which is necessary for the sake of classification with
respect to the ontology. All such information must
be retrievable in an automatic way from the PA data-
bases. To this purpose, facilities for querying PA
databases must be provided and implemented through
standardized access services. In order to supply the
desired services, the digital identity is to be modelled
and represented within the system in a form such that
it can be translated into the same language used for
the ontology (e.g. a Description Logic (Baader et al.,
2002)). In this way, during the classification proce-
dure, the matching between the civic ontology classes
and the citizen’s digital identity can be reduced to a
standard reasoning task (e.g. ontology entailment for
the underlying Description Logic). The reconstruc-
tion and the classification operations will be encapsu-
lated into suitable Web services.
Moreover, each time a new founding act is en-
forced, the civic ontology needs to be updated and its
consistency re-checked. Actually, the ontology up-
date process cannot be fully automated, since it is a
delicate task which needs advice by human experts
and “official validation” of the outcomes. However,
computer tools and graphic environments (e.g. based
on the Prot
´
eg
´
e platform (Prot
´
eg
´
e, 2004)) could be
provided to assist the human experts to perform this
task. Moreover, the introduction of a new founding
act must also trigger the specification of a new Web
service aimed at retrieving from the network the in-
formation necessary to verify the position of a citizen
with respect to the distinguishing features newly in-
troduced by the founding act. For example, if the new
law states some benefit for former public servants re-
tired since 2001, the ontology must be enriched with
a new subclass corresponding to such a description,
the norm will be annotated with a reference to the
new subclass and, at the same time, a new Web ser-
vice must be specified in order to verify whether a
citizen belongs to the new subclass by querying the
database of the public body paying out pensions. The
specification of such services could be completely au-
tomated or, more likely, should be effected through
a semi-automated process involving a human expert
by means of an “intelligent” interactive editor, to be
used for the recording of the new laws in legal data-
bases. Once formally specified (and “officially” vali-
dated), such services will anyway allow a completely
automated verification, by effectively and efficiently
supplying the fragment of the citizen’s digital identity
which can be used for the desired high-level services.
For the specification of reconstruction, classifica-
tion and identification services, we intend to adopt
a standard declarative formalism (e.g. based on
XML/SOAP (SOAP, 2004), like WSDL, DAML-S,
BPEL4WS). The study of services and of the mecha-
nisms necessary to their semi-automatic specification
will be dealt with in future research.
6 CONCLUSIONS
In this paper we presented our research work con-
cerning the design and implementation of efficient
Web-based information systems for e-Government
applications. Recent activities include the develop-
ment of a platform (“stratum” approach) for tempo-
ral management of multi-version norm texts on top
of a commercial DBMS and the migration of such
a system towards a more efficient platform (“native”
approach) for which a specialized Temporal XML
Query Processor has been designed. The new system
also offers advanced functionalities, as it provides a
personalized access to resources on the basis of the
digital identity of citizens. While the first system em-
ploys temporal database techniques for the manage-
EFFICIENT MANAGEMENT OF MULTI-VERSION XML DOCUMENTS FOR E-GOVERNMENT APPLICATIONS
415
ment and maintenance of multi-version XML data,
the second system also employs Semantic Web tech-
niques, including the adoption of an ontology, for the
management of applicability constraints and person-
alized access.
Preliminary experimental work on query perfor-
mance, with repositories of syntectic XML docu-
ments, showed encouraging results. In particular, the
native approach proved to be very efficient in a large
set of experimental situations and showed excellent
scale-up figures with varying load configurations.
Future work will consider the improvement of the
approach to cope with more advanced application
requirements and the completion of the technologi-
cal infrastructure required with the implementation of
auxiliary services. Further work will also include the
assessment of our developed systems in a concrete
working environment, with real users and in the pres-
ence of a large repository of real legal documents.
REFERENCES
Al-Khalifa, S., Jagadish, H., Patel, J. M., Wu, Y., Koudas,
N., and Srivastava, D. (2002). Structural joins: A
primitive for efficient xml query pattern matching. In
Proc. of 18th International Conference on Data Engi-
neering (ICDE 2002), pages 141–154, San Jose, CA.
Baader, F., Horrocks, I., and Sattler, U. (2002). Description
logics for the semantic web. K
¨
unstliche Intelligenz,
16(4):57–59.
EC E-Gov (2004). European commission e-government
home page: http://europa.eu.int/information
society/
eeurope/2005/all
about/egovernment/index en.htm.
Grandi, F., Mandreoli, F., Scalas, M. R., and Tiberio, P.
(2004). Management of the citizen’s digital iden-
tity and access to multi-version norm texts on the
semantic web. In Proc. of the Intl’ Symposium on
Challenges in the Internet and Interdisciplinary (IPSI
2004), Pescara, Italy.
Grandi, F., Mandreoli, F., and Tiberio, P. (2005). Temporal
modelling and management of normative documents
in xml format. Data & Knowledge Engineering, 47
(in press).
Grandi, F., Mandreoli, F., Tiberio, P., and Bergonzini, M.
(2003b). A temporal data model and management
system for normative texts in xml format. In Proc.
of the 15th ACM Intl’ Workshop on Web Information
and Data Management (WIDM), pages 29–36, New
Orleans, LA.
Grandi, F., Mandreoli, F., Tiberio, P., and Bergonzini, M.
(2003a). A temporal data model and system architec-
ture for the management of normative texts. In Proc.
of the 11th Natlional Conf. on Advanced Database
Systems (SEBD), pages 169–178, Cetraro, Italy.
Guarino, N., editor (1998). Formal Ontology in Information
Systems. IOS Press, Amsterdam.
Jensen, C. S., Dyreson, C. E., and et al., E. (1998). The
Consensus Glossary of Temporal Database Concepts -
February 1998 Version. In Etzion, O., Jajodia, S., and
Sripada, S., editors, Temporal Databases — Research
and Practice, pages 367–405. Springer-Verlag. LNCS
No. 1399.
Oracle (2004). The Oracle 9i database home page. Oracle
corporation:
http://www.oracle.com/technology/products/oracle9i/.
Palmirani, M. and Brighi, R. (2002). Norma-system: A le-
gal document system for managing consolidated acts.
In Proc. of 13th Intl’ Conf. on Database and Expert
Systems Applications (DEXA), pages 310–320, Aix-
en-Provence, France.
Prot
´
eg
´
e (2004). The OWL plugin for Prot
´
eg
´
e home page:
Stanford University,
http://protege.stanford.edu/plugins/owl/.
SOAP (2004). The web services activity home page: W3C
Consortium, http://www.w3.org/2000/xp/Group/.
US E-Gov (2004). U.S. president’s e-government initia-
tives: http://www.whitehouse.gov/omb/egov/.
WebOnt (2004). The web ontology group home page, W3C
Consortium: http://www.w3.org/2001/sw/WebOnt/.
XML (2004). The extensible markup language home page,
W3C Consortium: http://www.w3c.org/XML/.
XMLSchema (2004). The xml schema home page, W3C
Consortium: http://www.w3c.org/XML/Schema/.
XQuery (2004). The xml query home page, W3C Consor-
tium: http://www.w3c.org/XML/Query.
WEBIST 2005 - SOCIETY AND E-BUSINESS
416