http://rw4.cs.uni-saarland.de/teaching/spa10/papers/shapeanalysis3vl.pdf

Parametric Shape Analysis via
3-Valued Logic
MOOLY SAGIV, THOMAS REPS, and REINHARD WILHELM
Shape analysis concerns the problem of determining “shape invariants” for programs that per-
form destructive updating on dynamically allocated storage. This article presents a parametric
framework for shape analysis that can be instantiated in different ways to create different shape-
analysis algorithms that provide varying degrees of efficiency and precision. A key innovation of
the work is that the stores that can possibly arise during execution are represented (conservatively)
using 3-valued logical structures. The framework is instantiated in different ways by varying the
predicates used in the 3-valued logic. The class of programs to which a given instantiation of the
framework can be applied is not limited a priori (i.e., as in some work on shape analysis, to pro-
grams that manipulate only lists, trees, DAGS, etc.); each instantiation of the framework can be
applied to any program, but may produce imprecise results (albeit conservative ones) due to the
set of predicates employed.
Categories and Subject Descriptors: D.2.5 [Software Engineering]: Testing and Debugging—
symbolic execution; D.3.3 [Programming Languages]: Language Constructs and Features—
data types and structures;dynamic storage management; E.1 [Data]: Data Structures—graphs;
lists;trees; E.2 [Data]: Data Storage Representations—composite structures;linked representa-
tions; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about
Programs—assertions;invariants
General Terms: Algorithms, Languages, Theory, Verification
Additional Key Words and Phrases: Abstract interpretation, alias analysis, constraint solving,
destructive updating, pointer analysis, shape analysis, static analysis, 3-valued logic
A preliminary version of this paper appeared in the Proceedings of the 1999 ACM Symposium on
Principles of Programming Languages. Part of this research was carried out while M. Sagiv was at
the University of Chicago. M. Sagiv was supported in part by the National Science Foundation under
grant CCR-9619219 and by the U.S.–Israel Binational Science Foundation under grant 96-00337.
T. Reps was supported in part by the National Science Foundation under grants CCR-9625667
and CCR-9619219, by the U.S.–Israel Binational Science Foundation under grant 96-00337, by a
Vilas Associate Award from the University of Wisconsin, by the Office of Naval Research under
contract N00014-00-1-0607, by the Alexander von Humboldt Foundation, and by the John Simon
Guggenheim Memorial Foundation. R. Wilhelm was supported in part by a DAAD-NSF Collabora-
tive Research Grant.
Authors’ addresses: M. Sagiv, Department of Computer Science, School of Mathematical Sci-
ence, Tel-Aviv University, Tel-Aviv 69978, Israel; email: [email protected].il; T. Reps, Computer
Sciences Department, University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706;
email: [email protected].edu; R. Wilhelm, Fachrichtung Inf., Univ. des Saarlandes, 66123 Saarbr ¨
ucken,
Germany; email: [email protected].de.
Permission to make digital/hard copy of all or part of this material without fee for personal or
classroom use provided that the copies are not made or distributed for profit or commercial advan-
tage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice
is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on
servers, or to redistribute to lists requires prior specific permission and/or a fee.
C
2002 ACM 0164-0925/02/0500–0217 $5.00
ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002, Pages 217–298.
218 M. Sagiv et al.
1. INTRODUCTION
In the past two decades, many “shape-analysis” algorithms have been devel-
oped that can automatically create different classes of “shape descriptors” for
programs that perform destructive updating on dynamically allocated storage
[Jones and Muchnick 1981, 1982; Larus and Hilfinger 1988; Horwitz et al. 1989;
Chase et al. 1990; Stransky 1992; Assmann and Weinhardt 1993; Plevyak et al.
1993; Wang 1994; Sagiv et al. 1998]. A common feature of these algorithms is
that they represent the set of possible memory states (“stores”) that arise at a
given point in the program by shape graphs, in which heap cells are represented
by shape-graph nodes and, in particular, sets of “indistinguishable” heap cells
are represented by a single shape-graph node (often called a summary-node
[Chase et al. 1990]).
This article presents a parametric framework for shape analysis. The frame-
work can be instantiated in different ways to create shape-analysis algorithms
that provide different degrees of precision. The essence of a number of pre-
vious shape-analysis algorithms, including Jones and Muchnick [1981, 1982],
Horwitz et al. [1989], Chase et al. [1990], Stransky [1992], Plevyak et al. [1993],
Wang [1994], and Sagiv et al. [1998], can be viewed as instances of this frame-
work. Other instantiations of the framework yield new shape-analysis algo-
rithms that obtain more precise information than previous work.
A parametric framework must address the following issues:
(i) What is the language for specifying (a) the properties of stores that are to
be tracked, and (b) how such properties are affected by the execution of the
different kinds of statements in the programming language?
(ii) How is a shape-analysis algorithm generated from such a specification?
Issue (i) concerns the specification language of the framework. A key innovation
of our work is the way in which it makes use of 2-valued and 3-valued logic:
2-valued and 3-valued logical structures are used to represent concrete and
abstract stores, respectively (i.e., interpretations of unary and binary pred-
icates encode the contents of variables and pointer-valued structure fields);
first-order formulae with transitive closure are used to specify properties such
as sharing, cyclicity, reachability, and the like. Formulae are also used to specify
how the store is affected by the execution of the different kinds of statements
in the programming language. The analysis framework can be instantiated in
different ways by varying the predicates that are used. The specified set of
predicates determines the set of data-structure properties that can be tracked,
and consequently what properties of stores can be “discovered” to hold at the
different points in the program by the corresponding instance of the analy-
sis. Issue (ii) concerns how to create an actual analyzer from the specification.
In our work, the analysis algorithm is an abstract interpretation; it finds the
least fixed point of a set of equations that are generated from the analysis
specification.
The ideal is to have a fully automatic parametric framework, a yacc for shape
analysis, so to speak. The designer of a shape-analysis algorithm would sup-
ply only the specification, and the shape-analysis algorithm would be created
ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.
Parametric Shape Analysis via 3-Valued Logic 219
automatically from this specification. A prototype version of such a system,
based on the methods presented in this article, has been implemented by
T. Lev-Ami [Lev-Ami 2000; Lev-Ami and Sagiv 2000]. (See also Section 7.4.1.)
The class of programs to which a given instantiation of the framework can
be applied is not limited a priori (i.e., as in some work on shape analysis, to
programs that manipulate only lists, trees, DAGS, etc.). Each instantiation of
the framework can be applied to any program, but may produce conservative
results due to the set of predicates employed; that is, the attempt to analyze a
particular program with an instantiation created using an inappropriate set of
predicates may produce imprecise, albeit conservative, results. Thus, depending
on the kinds of linked data structures used in a program, and on the link-
rearrangement operations performed by the program’s statements, a different
instantiation—using a different set of predicates—may be needed in order to
obtain more useful results.
The framework allows one to create algorithms that are more precise than
the shape-analysis algorithms cited earlier. In particular, by tracking which
heap cells are reachable from which program variables, it is often possible
to determine precise shape information for programs that manipulate several
(possibly cyclic) data structures (see Sections 2.6 and 5.3). Other static-analysis
techniques yield very imprecise information on these programs. So that reach-
ability properties can be specified, the specification language of the framework
includes a transitive-closure operator.
The key features of the approach described in this article are as follows.
The use of 2-valued logical structures to represent concrete stores. Interpre-
tations of unary and binary predicates encode the contents of variables and
pointer-valued structure fields (see Section 2.2).
The use of a 2-valued first-order logic with transitive closure to specify proper-
ties of stores such as sharing, cyclicity, reachability, and so on (see Sections 2,
3, and 5).
The use of Kleene’s 3-valued logic [Kleene 1987] to relate the concrete
(2-valued) world and the abstract (3-valued) world. Kleene’s logic has a
third truth value that signifies “unknown,” which is useful for shape anal-
ysis because we only have partial information about summary nodes. For
these nodes, predicates may have the value unknown (see Sections 2
and 4).
The development of a systematic way to construct abstract domains that
are suitable for shape analysis. This is based on a general notion of “truth-
blurring” embeddings that map from a 2-valued world to a corresponding
3-valued one (see Sections 2.5, 4.2, and 4.3).
The use of the Embedding Theorem (Theorem 4.9) to ensure that the meaning
of a formula in the “blurred” (3-valued) world is compatible with the formula’s
meaning in the original (2-valued) world. The consequence of the Embedding
Theorem is that it allows us to extract information from either the concrete
world or the abstract world via a single formula: the same syntactic expres-
sion can be interpreted either in the 2-valued world or the 3-valued world.
The Embedding Theorem ensures that the information obtained from the
ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.
220 M. Sagiv et al.
3-valued world is compatible (i.e., safe) with that obtained from the 2-valued
world. This eases soundness proofs considerably.
New insight into the issue of “materialization.” This is known to be very
important for maintaining accuracy in the analysis of loops that advance
pointers through linked data structures [Chase et al. 1990; Plevyak et al.
1993; Sagiv et al. 1998]. (Materialization involves the splitting of a summary-
node into two separate nodes by the abstract transfer function that expresses
the semantics of a statement of the form x = y->n.) This article develops a
new approach to materialization:
The essence of materialization involves a step (called focus in Section 6.3)
that forces the values of certain formulae from unknown to true or false.
This has the effect of converting an abstract store into several abstract
stores, each of which is more precise than the original one.
Materialization is complicated because various properties of a store
are interdependent. We introduce a mechanism based on a constraint-
satisfaction system to capture the effects of such dependences (see
Section 6.4).
In this article, we address the problem of shape analysis for a single pro-
cedure. This has allowed us to concentrate on foundational aspects of shape-
analysis methods. The application of our techniques to the problem of interpro-
cedural shape analysis, including shape analysis for programs with recursive
procedures, is addressed in Rinetskey and Sagiv [2001] (see also Section 7.4.3).
The remainder of the article is organized as follows. Section 2 provides an
overview of the shape-analysis framework. Section 3 shows how 2-valued logic
can be used as a metalanguage for expressing the concrete operational seman-
tics of programs (and programming languages). Section 4 provides the tech-
nical details about the representation of stores using 3-valued logic. Section 5
defines the notion of instrumentation predicates, which are used to specify the
abstract domain that a specific instantiation of the shape-analysis framework
will use. Section 6 formulates the abstract semantics for program statements
and conditions, and defines the iterative algorithm for computing a (safe) set of
3-valued structures for each program point. Section 7 discusses related work.
Section 8 makes some final observations.
In Appendix A, we sketch how the framework can be used to analyze pro-
grams that manipulate doubly linked lists, which has been posed as a chal-
lenging problem for program analysis [Aiken 1996]. The proof of the Em-
bedding Theorem and other technical proofs are presented in Appendices B
and C.
2. AN OVERVIEW OF THE PARAMETRIC FRAMEWORK
This section provides an overview of the main ideas used in the article. The
presentation is at a semi-technical level; a more detailed treatment of this
material, as well as several elaborations on the ideas covered here, is presented
in the later sections of the paper.
ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.
Parametric Shape Analysis via 3-Valued Logic 221
2.1 Shape Invariants and Data Structures
Constituents of shape invariants that can be used to characterize a data struc-
ture include
(i) anchor pointer variables, that is, information about which pointer variables
point into the data structure;
(ii) the types of the data-structure elements, and in particular, which fields
hold pointers;
(iii) connectivity properties, such as
whether all elements of the data structure are reachable from a root
pointer variable,
whether any data-structure elements are shared,
whether there are cycles in the data structure, and
whether an element vpointed to by a “forward” pointer of another ele-
ment v0has its “backward” pointer pointing to v0; and
(iv) other properties, for instance, whether an element of an ordered list is in
the correct position.
Each data structure can be characterized by a certain set of such properties.
Most semantics track the values of pointer variables and pointer-valued
fields using a pair of functions, often called the environment and the store.
Constituents (i) and (ii) above are parts of any such semantics; consequently,
we refer to them as core properties.
Connectivity and other properties, such as those mentioned in (iii) and (iv),
are usually not explicitly part of the semantics of pointers in a language, but
instead are properties derived from this core semantics. They are essential
ingredients in program verification, however, as well as in our approach to shape
analysis of programs. Noncore properties are called instrumentation properties
(for reasons that become clear shortly).
Let us start by taking a Platonic view, namely, that ideas exist without regard
to their physical realization. Concepts such as “is shared,” “lies on a cycle,” and
“is reachable” can be defined either in graph-theoretic terms, using properties
of paths, or in terms of the programming-language concept of pointers. The
definitions of these concepts can be stated in a way that is independent of any
particular data structure.
Example 2.1.A heap cell is heap-shared if it is the target of two pointers,
either from two different heap cells, or from two different pointer components
of the same heap cell.
Data structures can now be characterized using sets of such properties, where
“data structure” here is still independent of a particular implementation.
Example 2.2.An acyclic singly linked list is a set of objects, each with one
pointer component. The objects are reachable from a root pointer variable either
directly or by following pointer components. No object lies on a cycle, that is, is
reachable from itself by following pointer components.
ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.
1 / 82 100%
La catégorie de ce document est-elle correcte?
Merci pour votre participation!

Faire une suggestion

Avez-vous trouvé des erreurs dans linterface ou les textes ? Ou savez-vous comment améliorer linterface utilisateur de StudyLib ? Nhésitez pas à envoyer vos suggestions. Cest très important pour nous !