EFFICIENT MANAGEMENT OF MULTI-VERSION XML DOCUMENTS FOR E-GOVERNMENT APPLICATIONS ∗

Federica Mandreoli and Riccardo Martoglia

Dip. di Ingegneria dell’Informazione, Università di Modena e Reggio Emilia

Via Vignolese 905/b, I-41100, Modena, Italy

Fabio Grandi and Maria Rita Scalas

Dip. di Elettronica, Informatica e Sistemistica, Alma Mater Studiorum - Università di Bologna

Viale Risorgimento 2, I-40136, Bologna, Italy

Keywords:

e-Government, XML, document retrieval, temporal database, semantic Web.

Abstract:

This paper describes our research activities in developing efficient systems for the management of multi-version XML documents in an e-Government scenario. The application aim is to enable citizens to access personalized versions of resources, like norm texts and information made available on the Web by public administrations. In the first system developed, four temporal dimensions (publication, validity, efficacy and transaction times) were used to represent the evolution of norms in time and their resulting versioning, and a stratum approach was used for its implementation on top of a relational DBMS. Recently, the multi-version management system has migrated to a different architecture (“native” approach) based on a purpose-built multi-version XML query processor. Moreover, a new semantic dimension has been added to the versioning mechanism, in order to represent applicability of norms to different classes of citizens according to their digital identity. Classification of citizens is based on the management of an ontology with the deployment of semantic Web techniques. Preliminary experiments showed an encouraging performance improvement with respect to the stratum approach and good scalability. Current work includes a more accurate modeling of the citizen’s ontology, which could also require a redesign of the document storage scheme, and the development of a complete infrastructure for the management of the citizen’s digital identity.

1 INTRODUCTION

In this paper we present our research activities

concerning the implementation of Web information

systems for e-Government applications (EC E-Gov,

2004; US E-Gov, 2004). More precisely, our work

makes use of temporal database and semantic Web

techniques to provide personalized access to multi-

version resources and services provided by the Pub-

lic Administration (PA). The offering of personalized

versions is aimed at improving and optimizing the in-

volvement of citizens in the e-Governance process. In

particular, we consider the selective access to norm

texts and documents made available on Web reposito-

ries in XML format (XML, 2004).

First of all, the fast dynamics involved in normative systems implies the coexistence of multiple versions of the norm texts stored in a repository, since laws are continually subject to amendments and modifications. In fact, it is crucial to reconstruct the consolidated version of a norm as produced by the application of all the modifications it has undergone so far, that is, the form in which it currently belongs to the regulations and must be enforced today. However, past versions are also still important, and not only for historical reasons: for example, if a Court has to pass judgment today on some fact committed in the past, the version of norms which must be applied to the case is the one that was in force then.

∗ This work has been supported by the MIUR-PRIN Project: “The European citizen in e-Governance: philosophical-juridical, legal, information and economic profiles”.

In other words, temporal concerns are widespread in the e-Government domain and a legal information system should be able to retrieve or reconstruct on demand any version of a given document to meet common application requirements. Moreover, another kind of versioning plays an important role, because some norms or some of their parts have or acquire a limited applicability. For example, a given norm (e.g. defining tax treatment) may contain some articles which are applicable to different classes of citizens: one article is applicable to unemployed persons, one article to self-employed persons, one article to public servants only and so on. Hence, a citizen accessing the retrieval service may be interested in finding a personalized version of the norm, that is, a version only containing articles which are applicable to his/her personal case. Finally, notice that temporal and limited applicability aspects, though orthogonal, may also interplay in the production and management of versions. For instance, a new norm might state a modification to a preexisting norm, such that the modified norm becomes applicable to a limited category of citizens only (e.g. retired persons), whereas the rest of the citizens remain subject to the unmodified norm.

Mandreoli F., Martoglia R., Grandi F. and Rita Scalas M. (2005). EFFICIENT MANAGEMENT OF MULTI-VERSION XML DOCUMENTS FOR E-GOVERNMENT APPLICATIONS. In Proceedings of the First International Conference on Web Information Systems and Technologies, pages 409-416. DOI: 10.5220/0001230504090416. SciTePress.

In this context, we deﬁned data models for multi-

version XML documents and built prototype systems

for their efﬁcient management in a Web-based e-

Government application scenario. In particular, in

this work we will describe and compare two manage-

ment systems, meeting different application require-

ments, that we recently developed using different ar-

chitectures and implementation techniques.

The ﬁrst system is based on multi-dimensional tem-

poral versioning, where temporal aspects are cap-

tured by adding timestamping attributes to the XML

markup. The prototype was implemented using a

“stratum” approach on top of a commercial DBMS

and will be brieﬂy described in Section 2 (a more de-

tailed description and evaluation has also been pub-

lished before as (Grandi et al., 2003a; Grandi et al.,

2003b; Grandi et al., 2005)).

The second system is the current outcome of an on-

going research, which is introduced in (Grandi et al.,

2004), and represents the original contribution of the

present work. The XML data model on which it

is based includes semantic annotations in the multi-

versioning mechanism, in order to capture limited ap-

plicability and to support personalized access, and

will be described in Section 3. The prototype is im-

plemented following a “native” approach, which will

be presented in Section 4 and is currently under eval-

uation.

Developments and extensions of the system which

are planned for the near future will be described in

Section 5. These include an improvement in the

ontological modeling of citizens to meet more ad-

vanced application requirements and a completion of

the technological infrastructure needed to make our

system fully operational in a real e-Government envi-

ronment.

Conclusions will ﬁnally be found in Section 6.

2 TEMPORAL VERSIONING IN THE “STRATUM” APPROACH

In a ﬁrst phase of the research we focused on temporal

aspects and on the effective and efﬁcient management

of time-varying norm texts. To this purpose, we de-

veloped a temporal XML data model which uses four

time dimensions to correctly represent the evolution

of norms in time and their resulting versioning. The

considered dimensions are:

Publication time. It is the time of publication of the

norm on the Ofﬁcial Journal. It has the same se-

mantics as event time (and availability time, as the

two time dimensions, in such a context, coincide).

It is a global and unchangeable property for the

whole norm contents and, thus, it is not used as a

versioning dimension inside text.

Validity time. It is the time the norm is in force. It

has the same semantics of valid time as in temporal

databases (Jensen et al., 1998), since it represents

the time the norm actually belongs to the regula-

tions in the real world.

Efﬁcacy time. It is the time the norm can be applied

to a concrete case. It usually corresponds to the va-

lidity of norms, but it can be the case that an abro-

gated norm continues to be applicable to a limited

number of cases. Until such cases cease to exist,

the norm continues its efﬁcacy though no longer in

force.

Transaction time. It is the time (some part of) the

norm is stored in a computer system. It has the

same semantics of transaction time as in temporal

databases (Jensen et al., 1998).

The data model was deﬁned via an XML Schema

(XMLSchema, 2004), where the structure of norms

is deﬁned by means of a contents-section-article-

paragraph hierarchy and multiple content versions can

be deﬁned at each level of the hierarchy. Each ver-

sion is characterized by timestamp attributes deﬁning

its temporal pertinence with respect to each of the va-

lidity, efﬁcacy and transaction time dimensions.

The model is also equipped with two basic opera-

tors for the management of norm modiﬁcations: one

is devoted to change the textual content of a norm por-

tion and the other allows modiﬁcations to the tempo-

ral pertinence of a given version. The former can be

used for deletion of (a part of) the norm (abrogation),

or the introduction of a new part of the norm (inte-

gration), or the replacement of (a part of) the norm

(substitution). The latter can be used to deal with the

time extension or the suspension of (part of) the norm.
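The two modification operators can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Version` record, its field names and the functions `change_text` and `change_time` are hypothetical, and only the validity dimension is modeled to keep the example short.

```python
from dataclasses import dataclass, replace
from datetime import date

FOREVER = date.max  # stands for "until changed"


# Hypothetical record for one version of a norm portion; the field names
# are illustrative and are not taken from the authors' XML Schema.
@dataclass(frozen=True)
class Version:
    num: int
    text: str
    valid_from: date
    valid_to: date  # exclusive end of the validity period


def change_text(versions, new_text, when):
    """Substitution: close the current version at `when` and append a new one."""
    current = versions[-1]
    closed = replace(current, valid_to=when)
    added = Version(current.num + 1, new_text, when, FOREVER)
    return versions[:-1] + [closed, added]


def change_time(versions, num, valid_from, valid_to):
    """Change the temporal pertinence of version `num` (extension or suspension)."""
    return [
        replace(v, valid_from=valid_from, valid_to=valid_to) if v.num == num else v
        for v in versions
    ]


history = [Version(1, "original text", date(2000, 1, 1), FOREVER)]
history = change_text(history, "amended text", date(2003, 1, 1))
history = change_time(history, 2, date(2003, 1, 1), date(2005, 1, 1))
```

In this reading, an abrogation would close the current version without appending a new one, and an integration would append a version to a part that had none.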

FOR $a IN path
WHERE constraints on $a
RETURN const-tree(document($a), temporal specs)
Figure 1: An XQuery-equivalent query executable on our first system.

WEBIST 2005 - SOCIETY AND E-BUSINESS


[Figure 2, left panel, “The sample ontology”: Citizen (1,8) has subclasses Unemployed (2,1), Employee (3,6) and Retired (8,7); Employee has subclasses Subordinate (4,4) and Self-employed (7,5); Subordinate has subclasses Private and Public, coded (5,2) and (6,3).]
[Figure 2, right panel, “A fragment of an XML document supporting personalized access”: an article whose version (ver) contains two paragraph elements, each with its own ver element carrying text and temporal attributes.]
Figure 2: An example of civic ontology, where each class has a name and is associated to a (pre,post) pair, and a fragment of an XML norm containing applicability annotations.

Legal text repositories are usually managed by tra-

ditional information retrieval systems where users

are allowed to access their contents by means of

keyword-based queries expressing the subjects they

are interested in. We extended such a framework by

offering to users the possibility of expressing tempo-

ral speciﬁcations for the reconstruction of a consistent

version of the retrieved normative acts (consolidated

act). More precisely, the system is able to answer

queries having the XQuery (XQuery, 2004) form as

in Fig. 1.

The statement following the standard XQuery

FLWR syntax allows users to express selection

constraints on the variable $a iterating over the

nodes returned by the path expression path.

Search keywords can be speciﬁed by means of

the function contains in the WHERE clause (e.g.

contains($a,’sea’)). In the RETURN clause,

the operator const-tree is devoted to the recon-

struction of the temporally consistent versions of the

XML documents containing the selected nodes (con-

solidated norms). The temporal specs expres-

sion is the conjunction of temporal selection predi-

cates on the four supported temporal dimensions.
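As an illustration of what a reconstruction operator such as const-tree has to do, the following Python sketch selects, at each level of a small multi-version hierarchy, the version whose validity and efficacy periods contain the requested instants. The classes `Node` and `Ver` and the function `const_tree` are assumptions made for this example; the actual operator also handles the other time dimensions and full temporal predicates.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Ver:
    text: str
    validity: tuple   # (start, end), end exclusive
    efficacy: tuple   # (start, end), end exclusive
    children: list = field(default_factory=list)  # list of Node


@dataclass
class Node:
    tag: str
    versions: list    # list of Ver


def holds_at(period, instant):
    start, end = period
    return start <= instant < end


def const_tree(node, v_instant, e_instant):
    """Return (tag, text, children) for the version matching both instants,
    or None when no version of this node qualifies."""
    for ver in node.versions:
        if holds_at(ver.validity, v_instant) and holds_at(ver.efficacy, e_instant):
            kids = [const_tree(c, v_instant, e_instant) for c in ver.children]
            return (node.tag, ver.text, [k for k in kids if k is not None])
    return None


always = (date(2000, 1, 1), date.max)
paragraph = Node("paragraph", [
    Ver("first wording", (date(2000, 1, 1), date(2003, 1, 1)), always),
    Ver("amended wording", (date(2003, 1, 1), date.max), always),
])
article = Node("article", [Ver("", always, always, [paragraph])])

print(const_tree(article, date(2001, 6, 1), date(2001, 6, 1)))
print(const_tree(article, date(2004, 6, 1), date(2004, 6, 1)))
```

The first call reconstructs the article with the first wording of the paragraph, the second one with the amended wording.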

Our approach is the ﬁrst to provide full search and

reconstruction functionalities with respect to all time

dimensions, whereas previous approaches only pro-

vided a limited support. For example, the tempo-

ral XML markup adopted in the Norma-System de-

scribed in (Palmirani and Brighi, 2002) includes pub-

lication, validity and efﬁcacy time but reconstruction

of consolidated versions is made with respect to va-

lidity only (other time dimensions can be used as ad-

ditional search ﬁelds in full-text search).

Our temporal data model with the modiﬁcation and

query operators was implemented in a prototype sys-

tem for the management and maintenance of a col-

lection of time-varying norms. The system is able

to store norms encoded as XML documents and ef-

ﬁciently access them by answering queries which can

involve both temporal constraints and search key-

words.

The system architecture is based on two different

components: the former consists of the XML doc-

ument management facilities offered by Oracle 9i

(Oracle, 2004) to handle structural and textual con-

straints, the latter is a software stratum that we built

on top of the former to handle the temporal aspects.

Extensive experimental results on the system behav-

iour show good performance and the ability to man-

age large collections of XML multi-version docu-

ments. A discussion of such architectural solution,

named the “stratum” approach, in comparison with

our new implementation solution, named the “native”

approach, is carried out in Section 4.

A detailed description of the stratum approach and

an account of its evaluation can be found in (Grandi

et al., 2003b; Grandi et al., 2005).

3 INTRODUCTION OF THE SEMANTIC VERSIONING

In a second phase of the research, the multi-version model based on temporal dimensions was extended to include a semantic versioning dimension in order to provide personalized access to norm texts. In general, machine-understanding of the information available on the Semantic Web requires a semantic markup of the contents and the availability of automated reasoning tools. In order to let information and its interpretation be shared by several agents including automatic tools, the introduction of common reference ontologies becomes necessary (Guarino, 1998; WebOnt, 2004). In our case, we defined a civic ontology, which corresponds to a classification of citizens based on the distinctions introduced by successive norms (founding acts) that imply some limitation, total or partial, in their applicability. Hence, in our extended model, the new versioning dimension encodes information about the applicability of different parts of a norm text to the relevant classes of the civic ontology.

FOR $a IN norm
WHERE textConstr ($a//paragraph//text(), ’health AND care’)
AND tempConstr (’vTime OVERLAPS PERIOD(’2002-01-01’,’2004-12-31’)’)
AND tempConstr (’eTime OVERLAPS PERIOD(’2002-01-01’,’2004-12-31’)’)
AND applConstr (’class_7’)
RETURN $a
Figure 3: An XQuery-equivalent query executable on our second system.

Consider, for instance, Fig. 2. The left part of the

ﬁgure depicts a simple civic ontology built from a

small corpus of norms ruling the status of citizens

with respect to their work position. The right part

of the ﬁgure shows a fragment of a multi-version

XML norm text supporting personalized access with

respect to this ontology. Notice that, at this stage of

the project, we manage “tree-like” ontologies deﬁned

as class taxonomies induced by the IS-A relationship.

This allows us to exploit the pre-order and post-order

properties of trees in order to enumerate the nodes

and check ancestor-descendant relationships between

the classes; such codes are displayed in the upper left

corner of the ontology classes in the Figure, in the

form: (pre-order,post-order). For instance, the class

“Employee” has pre-order “3” which is also its iden-

tiﬁer, whereas its post order is “6”. The article in the

XML fragment on the right of Fig. 2 is composed of

two paragraphs and contains applicability annotations

(aa). Notice that applicability is inherited by descen-

dant nodes unless locally redeﬁned. Hence, by means

of redeﬁnitions we can also introduce, for each part

of a document, complex applicability properties including extensions or restrictions with respect to ancestors. For instance, the whole article in the Figure is applicable to civic class “3” (applies_to) and by default to all its descendants. However, its first paragraph is applicable to class “4”, which is a restriction, whereas the second one is also applicable to class “8” (applies_also), which is an extension. The

reconstruction of pertinent versions of the norm based

on its applicability annotations is very important in an

e-Government scenario. The representation of exten-

sions and restrictions gives rise to high expressiveness

and ﬂexibility in such a context.
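The pre-order/post-order test and the inheritance of applicability can be sketched in a few lines of Python. The (pre,post) codes are those of Fig. 2; the function names and the exact redefinition semantics (a local restriction replaces the inherited classes, an extension adds to them) are assumptions made for illustration.

```python
# (pre-order -> post-order) codes of the civic classes, as listed in Fig. 2.
# The pre-order number doubles as the class identifier.
POST_ORDER = {1: 8, 2: 1, 3: 6, 4: 4, 5: 2, 6: 3, 7: 5, 8: 7}


def is_a(cls, ancestor):
    """True if cls is `ancestor` itself or one of its descendants:
    a descendant has a larger pre-order and a smaller post-order."""
    return ancestor <= cls and POST_ORDER[cls] <= POST_ORDER[ancestor]


def effective_classes(inherited, applies_to=None, applies_also=()):
    """Assumed semantics: a local applies_to replaces the inherited classes,
    while applies_also adds further classes to them."""
    base = list(applies_to) if applies_to is not None else list(inherited)
    return base + list(applies_also)


def applicable(citizen_cls, classes):
    return any(is_a(citizen_cls, c) for c in classes)


article = effective_classes([], applies_to=[3])
par1 = effective_classes(article, applies_to=[4])     # restriction
par2 = effective_classes(article, applies_also=[8])   # extension

for name, classes in [("article", article), ("paragraph 1", par1), ("paragraph 2", par2)]:
    print(name, applicable(7, classes))
```

For a citizen of class 7 (self-employed) the article and its second paragraph qualify, while the first paragraph does not, since class 7 is not a descendant of class 4. Each check costs two integer comparisons per annotated class.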

As to the queries that can be submitted by a user in

the new system, they can contain four types of con-

straints: temporal, structural, textual and applicabil-

ity. Such constraints are completely orthogonal and

allow him/her to perform very accurate searches in

the XML norm repository. Let us focus ﬁrst on the

applicability constraint. Consider again the ontology

and norm fragment in Fig. 2: for John Smith, a “self-

employed” citizen (i.e. belonging to class “7”), the

sample article in the Figure will be selected as perti-

nent, but only the second paragraph will be actually

presented as applicable. Furthermore, the applicabil-

ity constraint can be combined with the other three

ones in order to fully support a multi-dimensional re-

trieval. For instance, John Smith could be interested

in all the norms ...

• ... which contain paragraphs (structural constraint)

dealing with health care (textual constraint), ...

• ... which were valid and in effect between 2002 and

2004 (temporal constraint), ...

• ... and which are applicable to his class (applica-

bility constraint).

Such a query can be issued to our system using

the standard XQuery FLWR syntax in Fig. 3, where

textConstr, tempConstr, and applConstr

are suitable functions allowing the speciﬁcation of

the textual, temporal and applicability constraints, re-

spectively (the structural constraint is implicit in the

XPath expressions used in the XQuery query). No-

tice that the temporal constraints can involve all the

four available time dimensions (publication, validity,

efﬁcacy and transaction), allowing high ﬂexibility in

satisfying the information needs of citizens in the e-

Government scenario. In particular, by means of va-

lidity and efﬁcacy time constraints, a user is able to

extract consolidated current versions from the multi-version corpora, or to access past versions of particular norm texts, all consistently reconstructed by the system on the basis of his/her needs and personalized on the basis of his/her identity.

4 THE “NATIVE” APPROACH

[Figure 4. Left panel, Previous “stratum” approach: User; query constraints (temporal, structural, textual); Stratum; DBMS; XML Engine; XML Repository; XML Docs. Right panel, Current “native” approach: User; query constraints (temporal, structural, textual, applicability); Temporal XML Query Processor; XML Repository; ad-hoc tuples; XML Docs.]
Figure 4: First (“stratum”) versus second (“native”) system architecture.

All the multi-version and personalized-access XML norm querying features have been implemented in our second prototype system. The system architecture is shown on the right part of Fig. 4 and is based on an “XML-native” approach, since it is composed of a purpose-built Temporal XML Query Processor, which is able to manage the XML data repository and to provide all the temporal, structural, textual and applicability query facilities in a single component. The prototype is implemented in Java JDK 1.5 and exploits ad-hoc data structures (relying on embedded “light” DBMS libraries) and algorithms which allow users to store and reconstruct on-the-fly XML norm texts satisfying the four types of constraints. Unlike the stratum approach we used in our previous prototype (see the left part of Fig. 4), where temporal constraints were processed separately, all the structural, textual and temporal constraints are simultaneously handled by the Temporal XML Query Processor. Such a component stores the XML norms not as entire documents but by converting them into a collection of ad-hoc temporal tuples, each representing one of their multi-version parts (i.e., paragraphs, articles, and so on); these data structures are then exploited to efficiently perform structural join algorithms (Al-Khalifa et al., 2002) we specifically devised and tuned for the temporal/semantic multi-version context. Textual constraints, like in the stratum approach, are handled by means of an inverted index. Furthermore, the current architecture also provides support to personalized access by handling the new applicability constraints as required by the reference e-Government application scenario. The benefits of our native approach over the stratum one are manifold:

• by querying ad-hoc and temporally-enhanced

structures (which have a ﬁner granularity than the

entire documents managed by standard XML en-

gines), the native approach is able to access and re-

trieve only the strictly necessary data;

• only the parts which are required and which sat-

isfy the temporal constraints are used for the recon-

struction of the retrieved documents;

• there is no need to retrieve whole XML docu-

ments and build space-consuming structures such

as DOM trees, as required in the stratum approach.
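The idea of querying fine-grained temporal tuples instead of whole documents can be illustrated with the following Python sketch. The `Part` tuple layout is hypothetical, years stand in for full timestamps, and the join is a plain nested loop rather than the stack-based structural join algorithms cited above; it only shows that containment and temporal checks can be answered from the tuples alone, without building a document tree.

```python
from collections import namedtuple

# Hypothetical flat tuple for one version of one document part.
# pre/post number the nodes of the document tree; [vt_start, vt_end)
# is the validity period of the version (years, for brevity).
Part = namedtuple("Part", "doc pre post tag vt_start vt_end text")


def overlaps(part, start, end):
    """True if the validity of `part` overlaps the period [start, end)."""
    return part.vt_start < end and start < part.vt_end


def contains(anc, desc):
    """Ancestor-descendant test on the document tree via pre/post numbers."""
    return anc.doc == desc.doc and anc.pre < desc.pre and desc.post < anc.post


def articles_with_matching_paragraphs(parts, keyword, start, end):
    arts = [p for p in parts if p.tag == "article" and overlaps(p, start, end)]
    pars = [p for p in parts
            if p.tag == "paragraph" and overlaps(p, start, end) and keyword in p.text]
    return [(a, d) for a in arts for d in pars if contains(a, d)]


parts = [
    Part("n1", 1, 3, "article", 2000, 2010, ""),
    Part("n1", 2, 1, "paragraph", 2000, 2003, "health care for all"),
    Part("n1", 2, 1, "paragraph", 2003, 2010, "taxes"),
    Part("n1", 3, 2, "paragraph", 2000, 2010, "health care for employees"),
    Part("n2", 1, 2, "article", 1990, 1995, ""),
    Part("n2", 2, 1, "paragraph", 1990, 1995, "health care"),
]

result = articles_with_matching_paragraphs(parts, "health", 2004, 2005)
```

Only the tuples that satisfy the temporal and textual constraints take part in the join, so norm n2 and the outdated paragraph version of n1 are never touched during reconstruction.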

As a consequence, we expect query processing efficiency to be greatly enhanced and memory requirements to be dramatically reduced. In order to evaluate the effectiveness of the “native” approach, we compared its performance with our previous “stratum” implementation on a common query benchmark and also conducted a number of exploratory experiments to analyse its behaviour in performing personalized access through applicability constraints. The experiments were run on a Pentium 4 2.5 GHz Windows XP Professional workstation, equipped with 512 MB RAM and a RAID0 cluster of two 80 GB EIDE disks with NT file system (NTFS). We performed the tests on three XML document sets of increasing size: collection C1 (5,000 XML norm text documents), C2 (10,000 documents) and C3 (20,000 documents). In this paper, due to space constraints, we present in detail the results obtained on collection C1, and then briefly describe the scalability performance shown on the other two collections. The total size of the collections is 120 MB, 240 MB, and 480 MB, respectively. In all collections the documents were synthetically generated by means of an ad-hoc XML generator we developed, which is able to produce different documents compliant with our multi-version and personalized access model. For each collection the average, minimum and maximum document sizes are 24 KB, 2 KB and 125 KB, respectively.


Experiments were conducted by submitting queries

of ﬁve different types (Q1-Q5). Table 1 presents the

features of the test queries and the query execution

time for both the “stratum” and the “native” archi-

tectures. All the queries require structural support

(St constraint); types Q1 and Q2 also involve tex-

tual searches by keywords (Tx constraint), with dif-

ferent selectivities; type Q3 contains temporal condi-

tions (Tm constraint) on three time dimensions: trans-

action, valid and publication time; types Q4 and Q5

mix the previous ones since they contain both key-

word searches and temporal conditions. For each of

those query types, we also present a personalized ac-

cess variant involving an additional applicability con-

straint (denoted as Qx-A in Table 1).

Let us first focus on the upper part of the table, and in particular on the comparison of the performance offered by the two approaches. The native approach proves to be faster in every context, providing a shorter response time (including query analysis, retrieval of the qualifying norm parts and reconstruction of the result) of approximately one or two seconds for most of the queries. Notice that, while the response time of the stratum approach is not too different for query types Q1, Q4 and Q5, for the other query types the performance gap is substantial (for instance, query Q2 is answered approximately 15 times slower in the stratum approach). The reason is that the selectivity of the query predicates strongly influences the performance of the stratum approach, seriously impairing it when large amounts of documents containing some (typically small) relevant portions have to be retrieved, as happens for queries Q2 and Q3. On the other hand, the native approach is able to deliver a faster and more reliable performance in all cases, since it practically avoids the retrieval of useless document parts. Further, consider that, for the same reasons, the main memory requirements of the Temporal XML Query Processor embedded in the native approach are, on average, 5% or less of those of the stratum approach. This property is also very promising towards future extensions to cope with concurrent multi-user query processing.
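The performance gap can be made explicit by computing the stratum/native time ratio for each query type. The short Python sketch below only restates the figures of Table 1 (collection C1); the variable names are illustrative.

```python
# Execution times in msec on collection C1, copied from Table 1.
stratum = {"Q1": 2891, "Q2": 43240, "Q3": 47638, "Q4": 2151, "Q5": 3130}
native = {"Q1": 1046, "Q2": 2970, "Q3": 6523, "Q4": 1015, "Q5": 2550}

# Ratio > 1 means the native approach is faster.
speedup = {q: stratum[q] / native[q] for q in stratum}

for q, s in speedup.items():
    print(f"{q}: {s:.1f}x")
```

The ratios range from about 1.2 (Q5) to about 14.6 (Q2), the latter being the "approximately 15 times" mentioned in the text; Q3 comes out at about 7.3.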

The lower part of the table presents the perfor-

mance of our second system with respect to the

queries involving additional applicability constraints,

enabling personalized access. Thanks to the proper-

ties of the adopted pre-order and post-order encoding

of the civic classes, the system is able to solve person-

alization problems by means of simple comparisons

involving such encodings and, thus, with a very high

efficiency. The time needed to answer the personalized access versions of the Q1–Q5 queries is only slightly higher than for the original versions: from approximately 0.5% (Q4) to 4.7% (Q1) more, according to Table 1.

Moreover, since the applicability annotations of each

part of an XML document are stored as simple inte-

gers, also the size of the applicability annotated tu-

ples, as stored in the system, is practically unchanged

(only a 3-4% storage space overhead is required with

respect to documents without semantic versioning),

even with quite complex annotations involving sev-

eral applicability extensions and restrictions.

Finally, we only report a comment about the performance of our current prototype in querying the other two collections C2 and C3 and, therefore, concerning the scalability of the system. We ran the same queries of the previous tests on the larger collections and saw that the computing time always grew sub-linearly with the number of documents. For instance, query Q1 executed on the 10,000 documents of collection C2 (which is twice the size of C1) took 1,366 msec (i.e. the system was only 30% slower); similarly, on the 20,000 documents of collection C3, the average response time was 1,741 msec (i.e. the system was less than 30% slower than with C2). With the other queries the measured trend was the same, thus showing the good scalability of the system in every type of query context.
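The sub-linear trend reported for query Q1 can be checked with a small Python calculation on the three measured response times. The log-log exponent is an additional summary introduced here for illustration only: a value of 1 would mean linear growth, and values below 1 mean sub-linear growth.

```python
import math

# Collection sizes (documents) and Q1 response times (msec) reported above.
docs = [5000, 10000, 20000]
q1_msec = [1046, 1366, 1741]

steps = []
for i in range(1, len(docs)):
    size_ratio = docs[i] / docs[i - 1]
    time_ratio = q1_msec[i] / q1_msec[i - 1]
    # Exponent k such that time grows like size**k over this step.
    exponent = math.log(time_ratio) / math.log(size_ratio)
    steps.append((size_ratio, time_ratio, exponent))
    print(f"x{size_ratio:.0f} documents -> x{time_ratio:.2f} time, exponent {exponent:.2f}")
```

In both steps the collection doubles while the response time grows by roughly 30% or less, so the exponent stays well below 1.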

5 FUTURE DEVELOPMENTS

Our current research work is devoted to the extensions

of the current framework and to the development of

a complete technological infrastructure to enable our

approach to be self-contained and usable in a large

Web-based e-Government scenario, as envisioned in

(Grandi et al., 2004).

The adoption of a tree-like civic ontology –

that is based on a taxonomy induced by the IS-A

relationship– is sufﬁcient to satisfy basic application

requirements as far as applicability constraints and

personalization services are concerned. However,

more advanced application requirements may include

a more sophisticated ontology deﬁnition. As a mat-

ter of fact, real-world norm corpora, if analyzed in

full detail, can lead to the formalization of complex

relationships between civic subclasses giving rise to

ontologies structured, in general, as a graph. Hence,

extensions to the framework are required in order to

overcome the limitations of dealing with a tree-like

civic ontology in our current approach: the XML stor-

age organization and the query processing algorithm

must be revisited, since the solutions adopted so far

rely both on the ontology and document tree structure

(e.g. decomposition in temporal tuples and exploita-

tion of pre- and post-order numbering).

On the other hand, the development of a complete infrastructure is needed to make our approach self-contained and fully operational in a real-world e-Government environment. In fact, in addition to the availability on the Web of the query answering system and of the civic ontology, several other components are needed to make the whole system fully operational, including administration and maintenance facilities.

Table 1: Features of the test queries and query execution time of the “stratum” and “native” approaches (time in milliseconds, collection C1)

Query   Constraints       Selectivity   Performance (msec)
        Tm  St  Tx  Ap                  Stratum   Native
Q1      -   X   X   -     0.6%          2891      1046
Q2      -   X   X   -     4.02%         43240     2970
Q3      X   X   -   -     2.9%          47638     6523
Q4      X   X   X   -     0.68%         2151      1015
Q5      X   X   X   -     1.46%         3130      2550
Q1-A    -   X   X   X     0.23%         n/a       1095
Q2-A    -   X   X   X     1.65%         n/a       3004
Q3-A    X   X   -   X     1.3%          n/a       6760
Q4-A    X   X   X   X     0.31%         n/a       1020
Q5-A    X   X   X   X     0.77%         n/a       2602

First of all, the citizen’s digital identity is deﬁned as

the total amount of information concerning him/her,

which is necessary for the sake of classiﬁcation with

respect to the ontology. All such information must

be retrievable in an automatic way from the PA data-

bases. To this purpose, facilities for querying PA

databases must be provided and implemented through

standardized access services. In order to supply the

desired services, the digital identity is to be modelled

and represented within the system in a form such that

it can be translated into the same language used for

the ontology (e.g. a Description Logic (Baader et al.,

2002)). In this way, during the classiﬁcation proce-

dure, the matching between the civic ontology classes

and the citizen’s digital identity can be reduced to a

standard reasoning task (e.g. ontology entailment for

the underlying Description Logic). The reconstruc-

tion and the classiﬁcation operations will be encapsu-

lated into suitable Web services.

Moreover, each time a new founding act is en-

forced, the civic ontology needs to be updated and its

consistency re-checked. Actually, the ontology up-

date process cannot be fully automated, since it is a

delicate task which needs advice by human experts

and “ofﬁcial validation” of the outcomes. However,

computer tools and graphic environments (e.g. based

on the Protégé platform (Protégé, 2004)) could be

provided to assist the human experts to perform this

task. Moreover, the introduction of a new founding

act must also trigger the speciﬁcation of a new Web

service aimed at retrieving from the network the in-

formation necessary to verify the position of a citizen

with respect to the distinguishing features newly in-

troduced by the founding act. For example, if the new

law states some beneﬁt for former public servants re-

tired since 2001, the ontology must be enriched with

a new subclass corresponding to such a description,

the norm will be annotated with a reference to the

new subclass and, at the same time, a new Web ser-

vice must be speciﬁed in order to verify whether a

citizen belongs to the new subclass by querying the

database of the public body paying out pensions. The

speciﬁcation of such services could be completely au-

tomated or, more likely, should be effected through

a semi-automated process involving a human expert

by means of an “intelligent” interactive editor, to be

used for the recording of the new laws in legal data-

bases. Once formally speciﬁed (and “ofﬁcially” vali-

dated), such services will anyway allow a completely

automated veriﬁcation, by effectively and efﬁciently

supplying the fragment of the citizen’s digital identity

which can be used for the desired high-level services.
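
The retired public servant example can be sketched in Python as follows. This is only an illustration under stated assumptions: the in-memory `PENSION_DB` stands in for the database of the public body paying out pensions, the citizen identifiers are made up, the class name `FormerPublicServantRetiredSince2001` is hypothetical, and "retired since 2001" is read as a retirement year of 2001 or later. In the envisaged architecture each verifier would be a Web service rather than a local function.

```python
from typing import Callable, Dict, Set

# Hypothetical stand-in for the pension body's database (made-up records).
PENSION_DB = {
    "citizen-001": {"former_public_servant": True, "retired_since": 2003},
    "citizen-002": {"former_public_servant": True, "retired_since": 1998},
    "citizen-003": {"former_public_servant": False, "retired_since": 2004},
}

# Registry: ontology class name -> verification service for that class.
VERIFIERS: Dict[str, Callable[[str], bool]] = {}


def register_verifier(class_name: str):
    """Bind a verification function to a (newly introduced) ontology class."""
    def decorator(fn: Callable[[str], bool]) -> Callable[[str], bool]:
        VERIFIERS[class_name] = fn
        return fn
    return decorator


# Specified when the new founding act is recorded (hypothetical class name).
@register_verifier("FormerPublicServantRetiredSince2001")
def verify_retired_public_servant(citizen_id: str) -> bool:
    record = PENSION_DB.get(citizen_id)
    if record is None:
        return False  # unknown to the pension body: not in the subclass
    # Assumption: "retired since 2001" means retirement year >= 2001.
    return record["former_public_servant"] and record["retired_since"] >= 2001


def identity_fragment(citizen_id: str) -> Set[str]:
    """Run every registered verifier and collect the classes the citizen belongs to."""
    return {cls for cls, check in VERIFIERS.items() if check(citizen_id)}
```

Once a verifier is registered, `identity_fragment` needs no further human intervention, which mirrors the distinction drawn above between semi-automated specification and fully automated verification.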

For the speciﬁcation of reconstruction, classiﬁca-

tion and identiﬁcation services, we intend to adopt

a standard declarative formalism (e.g. based on

XML/SOAP (SOAP, 2004), like WSDL, DAML-S,

BPEL4WS). The study of services and of the mecha-

nisms necessary for their semi-automatic specification

will be dealt with in future research.

6 CONCLUSIONS

In this paper we presented our research work con-

cerning the design and implementation of efﬁcient

Web-based information systems for e-Government

applications. Recent activities include the develop-

ment of a platform (“stratum” approach) for tempo-

ral management of multi-version norm texts on top

of a commercial DBMS and the migration of such

a system towards a more efﬁcient platform (“native”

approach) for which a specialized Temporal XML

Query Processor has been designed. The new system

also offers advanced functionalities, as it provides

personalized access to resources on the basis of the

digital identity of citizens. While the ﬁrst system em-

ploys temporal database techniques for the manage-

ment and maintenance of multi-version XML data,

the second system also employs Semantic Web tech-

niques, including the adoption of an ontology, for the

management of applicability constraints and person-

alized access.

Preliminary experimental work on query perfor-

mance, with repositories of synthetic XML docu-

ments, showed encouraging results. In particular, the

native approach proved to be very efﬁcient in a large

set of experimental situations and showed excellent

scale-up ﬁgures with varying load conﬁgurations.

Future work will consider the improvement of the

approach to cope with more advanced application

requirements and the completion of the technologi-

cal infrastructure required with the implementation of

auxiliary services. Further work will also include the

assessment of our developed systems in a concrete

working environment, with real users and in the pres-

ence of a large repository of real legal documents.

REFERENCES

Al-Khalifa, S., Jagadish, H., Patel, J. M., Wu, Y., Koudas,

N., and Srivastava, D. (2002). Structural joins: A

primitive for efﬁcient xml query pattern matching. In

Proc. of 18th International Conference on Data Engi-

neering (ICDE 2002), pages 141–154, San Jose, CA.

Baader, F., Horrocks, I., and Sattler, U. (2002). Description

logics for the semantic web. Künstliche Intelligenz,

16(4):57–59.

EC E-Gov (2004). European commission e-government

home page: http://europa.eu.int/information_society/eeurope/2005/all_about/egovernment/index_en.htm.

Grandi, F., Mandreoli, F., Scalas, M. R., and Tiberio, P.

(2004). Management of the citizen’s digital iden-

tity and access to multi-version norm texts on the

semantic web. In Proc. of the Intl’ Symposium on

Challenges in the Internet and Interdisciplinary (IPSI

2004), Pescara, Italy.

Grandi, F., Mandreoli, F., and Tiberio, P. (2005). Temporal

modelling and management of normative documents

in xml format. Data & Knowledge Engineering, 47

(in press).

Grandi, F., Mandreoli, F., Tiberio, P., and Bergonzini, M.

(2003b). A temporal data model and management

system for normative texts in xml format. In Proc.

of the 15th ACM Intl’ Workshop on Web Information

and Data Management (WIDM), pages 29–36, New

Orleans, LA.

Grandi, F., Mandreoli, F., Tiberio, P., and Bergonzini, M.

(2003a). A temporal data model and system architec-

ture for the management of normative texts. In Proc.

of the 11th National Conf. on Advanced Database

Systems (SEBD), pages 169–178, Cetraro, Italy.

Guarino, N., editor (1998). Formal Ontology in Information

Systems. IOS Press, Amsterdam.

Jensen, C. S., Dyreson, C. E., et al. (1998). The

Consensus Glossary of Temporal Database Concepts -

February 1998 Version. In Etzion, O., Jajodia, S., and

Sripada, S., editors, Temporal Databases — Research

and Practice, pages 367–405. Springer-Verlag. LNCS

No. 1399.

Oracle (2004). The Oracle 9i database home page. Oracle

corporation:

http://www.oracle.com/technology/products/oracle9i/.

Palmirani, M. and Brighi, R. (2002). Norma-system: A le-

gal document system for managing consolidated acts.

In Proc. of 13th Intl’ Conf. on Database and Expert

Systems Applications (DEXA), pages 310–320, Aix-

en-Provence, France.

Protégé (2004). The OWL plugin for Protégé home page:

Stanford University,

http://protege.stanford.edu/plugins/owl/.

SOAP (2004). The web services activity home page: W3C

Consortium, http://www.w3.org/2000/xp/Group/.

US E-Gov (2004). U.S. president’s e-government initia-

tives: http://www.whitehouse.gov/omb/egov/.

WebOnt (2004). The web ontology group home page, W3C

Consortium: http://www.w3.org/2001/sw/WebOnt/.

XML (2004). The extensible markup language home page,

W3C Consortium: http://www.w3c.org/XML/.

XMLSchema (2004). The xml schema home page, W3C

Consortium: http://www.w3c.org/XML/Schema/.

XQuery (2004). The xml query home page, W3C Consor-

tium: http://www.w3c.org/XML/Query.

WEBIST 2005 - SOCIETY AND E-BUSINESS
