2·
[Systems]: Relational Databases; H.2.4 [Systems]: Query processing
General Terms: Algorithms, Theory
Additional Key Words and Phrases: Data exchange, data integration, composition, schema map-
ping, certain answers, conjunctive queries, dependencies, chase, computational complexity, query
answering, second-order logic, universal solution, metadata model management
1. INTRODUCTION & SUMMARY OF RESULTS
The problem of transforming data structured under one schema into data struc-
tured under a different schema is an old, but persistent problem, arising in several
different areas of database management systems. In recent years, this problem
has received considerable attention in the context of information integration, where
data from various heterogeneous sources has to be transformed into data structured
under a mediated schema. To achieve interoperability, data-sharing architectures
use schema mappings to describe how data is to be transformed from one represen-
tation to another. These schema mappings are typically specified using high-level
declarative formalisms that make it possible to describe the correspondence be-
tween different schemas at a logical level, without having to specify physical details
that may be relevant only for the implementation (run-time) phase. In particular,
declarative schema mappings in the form of GAV (global-as-view), LAV (local-as-
view), and, more generally, GLAV (global-and-local-as-view) assertions have been
used in data integration systems [Lenzerini 2002]. Similarly, source-to-target tuple-
generating dependencies (source-to-target tgds) have been used for specifying data
exchange between a relational source and a relational target [Fagin, Kolaitis, Miller
and Popa 2005; Fagin, Kolaitis and Popa 2003]; moreover, nested (XML-style)
source-to-target dependencies have been used in the Clio data exchange system
[Popa et al. 2002].
The extensive use of schema mappings has motivated the need to develop a frame-
work for managing these schema mappings and other related metadata. Bernstein
[Bernstein 2003] has introduced such a framework, called model management, in
which the main abstractions are schemas and mappings between schemas, as well as
operators for manipulating schemas and mappings. One of the most fundamental
operators in this framework is the composition operator, which combines successive
schema mappings into a single schema mapping. The composition operator can play
a useful role each time the target of a schema mapping is also the source of another
schema mapping. This scenario occurs, for instance, in schema evolution, where a
schema may undergo several successive changes. It also occurs in peer-to-peer data
management systems, such as the Piazza System [Halevy, Ives, Mork and Tatari-
nov 2003], and in extract-transform-load (ETL) processes in which the output of
a transformation may be input to another [Vassiliadis, Simitsis and Skiadopoulos
2002]. A model management system should be able to figure out automatically how
to compose two or more successive schema mappings into a single schema mapping
between the first schema and the last schema in a way that captures the interaction
of the schema mappings in the entire sequence. The resulting single schema map-
ping can then be used during the run-time phase for various purposes, such as query
ACM Transactions on Database Systems, Vol. , No. , 20.