http://rw4.cs.uni-saarland.de/teaching/spa10/papers/shapeanalysis3vl.pdf

Parametric Shape Analysis via

3-Valued Logic

MOOLY SAGIV, THOMAS REPS, and REINHARD WILHELM

Shape analysis concerns the problem of determining “shape invariants” for programs that per-

form destructive updating on dynamically allocated storage. This article presents a parametric

framework for shape analysis that can be instantiated in different ways to create different shape-

analysis algorithms that provide varying degrees of efﬁciency and precision. A key innovation of

the work is that the stores that can possibly arise during execution are represented (conservatively)

using 3-valued logical structures. The framework is instantiated in different ways by varying the

predicates used in the 3-valued logic. The class of programs to which a given instantiation of the

framework can be applied is not limited a priori (i.e., as in some work on shape analysis, to pro-

grams that manipulate only lists, trees, DAGS, etc.); each instantiation of the framework can be

applied to any program, but may produce imprecise results (albeit conservative ones) due to the

set of predicates employed.

Categories and Subject Descriptors: D.2.5 [Software Engineering]: Testing and Debugging—

symbolic execution; D.3.3 [Programming Languages]: Language Constructs and Features—

data types and structures;dynamic storage management; E.1 [Data]: Data Structures—graphs;

lists;trees; E.2 [Data]: Data Storage Representations—composite structures;linked representa-

tions; F.3.1 [Logics and Meanings of Programs]: Specifying and Verifying and Reasoning about

Programs—assertions;invariants

General Terms: Algorithms, Languages, Theory, Veriﬁcation

Additional Key Words and Phrases: Abstract interpretation, alias analysis, constraint solving,

destructive updating, pointer analysis, shape analysis, static analysis, 3-valued logic

A preliminary version of this paper appeared in the Proceedings of the 1999 ACM Symposium on

Principles of Programming Languages. Part of this research was carried out while M. Sagiv was at

the University of Chicago. M. Sagiv was supported in part by the National Science Foundation under

grant CCR-9619219 and by the U.S.–Israel Binational Science Foundation under grant 96-00337.

T. Reps was supported in part by the National Science Foundation under grants CCR-9625667

and CCR-9619219, by the U.S.–Israel Binational Science Foundation under grant 96-00337, by a

Vilas Associate Award from the University of Wisconsin, by the Ofﬁce of Naval Research under

contract N00014-00-1-0607, by the Alexander von Humboldt Foundation, and by the John Simon

Guggenheim Memorial Foundation. R. Wilhelm was supported in part by a DAAD-NSF Collabora-

tive Research Grant.

Authors’ addresses: M. Sagiv, Department of Computer Science, School of Mathematical Sci-

ence, Tel-Aviv University, Tel-Aviv 69978, Israel; email: [email protected].il; T. Reps, Computer

Sciences Department, University of Wisconsin, 1210 West Dayton Street, Madison, WI 53706;

email: [email protected].edu; R. Wilhelm, Fachrichtung Inf., Univ. des Saarlandes, 66123 Saarbr ¨

ucken,

Germany; email: [email protected].de.

Permission to make digital/hard copy of all or part of this material without fee for personal or

classroom use provided that the copies are not made or distributed for proﬁt or commercial advan-

tage, the ACM copyright/server notice, the title of the publication, and its date appear, and notice

is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on

servers, or to redistribute to lists requires prior speciﬁc permission and/or a fee.

C

2002 ACM 0164-0925/02/0500–0217 $5.00

ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002, Pages 217–298.

218 •M. Sagiv et al.

1. INTRODUCTION

In the past two decades, many “shape-analysis” algorithms have been devel-

oped that can automatically create different classes of “shape descriptors” for

programs that perform destructive updating on dynamically allocated storage

[Jones and Muchnick 1981, 1982; Larus and Hilﬁnger 1988; Horwitz et al. 1989;

Chase et al. 1990; Stransky 1992; Assmann and Weinhardt 1993; Plevyak et al.

1993; Wang 1994; Sagiv et al. 1998]. A common feature of these algorithms is

that they represent the set of possible memory states (“stores”) that arise at a

given point in the program by shape graphs, in which heap cells are represented

by shape-graph nodes and, in particular, sets of “indistinguishable” heap cells

are represented by a single shape-graph node (often called a summary-node

[Chase et al. 1990]).

This article presents a parametric framework for shape analysis. The frame-

work can be instantiated in different ways to create shape-analysis algorithms

that provide different degrees of precision. The essence of a number of pre-

vious shape-analysis algorithms, including Jones and Muchnick [1981, 1982],

Horwitz et al. [1989], Chase et al. [1990], Stransky [1992], Plevyak et al. [1993],

Wang [1994], and Sagiv et al. [1998], can be viewed as instances of this frame-

work. Other instantiations of the framework yield new shape-analysis algo-

rithms that obtain more precise information than previous work.

A parametric framework must address the following issues:

(i) What is the language for specifying (a) the properties of stores that are to

be tracked, and (b) how such properties are affected by the execution of the

different kinds of statements in the programming language?

(ii) How is a shape-analysis algorithm generated from such a speciﬁcation?

Issue (i) concerns the speciﬁcation language of the framework. A key innovation

of our work is the way in which it makes use of 2-valued and 3-valued logic:

2-valued and 3-valued logical structures are used to represent concrete and

abstract stores, respectively (i.e., interpretations of unary and binary pred-

icates encode the contents of variables and pointer-valued structure ﬁelds);

ﬁrst-order formulae with transitive closure are used to specify properties such

as sharing, cyclicity, reachability, and the like. Formulae are also used to specify

how the store is affected by the execution of the different kinds of statements

in the programming language. The analysis framework can be instantiated in

different ways by varying the predicates that are used. The speciﬁed set of

predicates determines the set of data-structure properties that can be tracked,

and consequently what properties of stores can be “discovered” to hold at the

different points in the program by the corresponding instance of the analy-

sis. Issue (ii) concerns how to create an actual analyzer from the speciﬁcation.

In our work, the analysis algorithm is an abstract interpretation; it ﬁnds the

least ﬁxed point of a set of equations that are generated from the analysis

speciﬁcation.

The ideal is to have a fully automatic parametric framework, a yacc for shape

analysis, so to speak. The designer of a shape-analysis algorithm would sup-

ply only the speciﬁcation, and the shape-analysis algorithm would be created

ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.

Parametric Shape Analysis via 3-Valued Logic •219

automatically from this speciﬁcation. A prototype version of such a system,

based on the methods presented in this article, has been implemented by

T. Lev-Ami [Lev-Ami 2000; Lev-Ami and Sagiv 2000]. (See also Section 7.4.1.)

The class of programs to which a given instantiation of the framework can

be applied is not limited a priori (i.e., as in some work on shape analysis, to

programs that manipulate only lists, trees, DAGS, etc.). Each instantiation of

the framework can be applied to any program, but may produce conservative

results due to the set of predicates employed; that is, the attempt to analyze a

particular program with an instantiation created using an inappropriate set of

predicates may produce imprecise, albeit conservative, results. Thus, depending

on the kinds of linked data structures used in a program, and on the link-

rearrangement operations performed by the program’s statements, a different

instantiation—using a different set of predicates—may be needed in order to

obtain more useful results.

The framework allows one to create algorithms that are more precise than

the shape-analysis algorithms cited earlier. In particular, by tracking which

heap cells are reachable from which program variables, it is often possible

to determine precise shape information for programs that manipulate several

(possibly cyclic) data structures (see Sections 2.6 and 5.3). Other static-analysis

techniques yield very imprecise information on these programs. So that reach-

ability properties can be speciﬁed, the speciﬁcation language of the framework

includes a transitive-closure operator.

The key features of the approach described in this article are as follows.

—The use of 2-valued logical structures to represent concrete stores. Interpre-

tations of unary and binary predicates encode the contents of variables and

pointer-valued structure ﬁelds (see Section 2.2).

—The use of a 2-valued ﬁrst-order logic with transitive closure to specify proper-

ties of stores such as sharing, cyclicity, reachability, and so on (see Sections 2,

3, and 5).

—The use of Kleene’s 3-valued logic [Kleene 1987] to relate the concrete

(2-valued) world and the abstract (3-valued) world. Kleene’s logic has a

third truth value that signiﬁes “unknown,” which is useful for shape anal-

ysis because we only have partial information about summary nodes. For

these nodes, predicates may have the value unknown (see Sections 2

and 4).

—The development of a systematic way to construct abstract domains that

are suitable for shape analysis. This is based on a general notion of “truth-

blurring” embeddings that map from a 2-valued world to a corresponding

3-valued one (see Sections 2.5, 4.2, and 4.3).

—The use of the Embedding Theorem (Theorem 4.9) to ensure that the meaning

of a formula in the “blurred” (3-valued) world is compatible with the formula’s

meaning in the original (2-valued) world. The consequence of the Embedding

Theorem is that it allows us to extract information from either the concrete

world or the abstract world via a single formula: the same syntactic expres-

sion can be interpreted either in the 2-valued world or the 3-valued world.

The Embedding Theorem ensures that the information obtained from the

ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.

220 •M. Sagiv et al.

3-valued world is compatible (i.e., safe) with that obtained from the 2-valued

world. This eases soundness proofs considerably.

—New insight into the issue of “materialization.” This is known to be very

important for maintaining accuracy in the analysis of loops that advance

pointers through linked data structures [Chase et al. 1990; Plevyak et al.

1993; Sagiv et al. 1998]. (Materialization involves the splitting of a summary-

node into two separate nodes by the abstract transfer function that expresses

the semantics of a statement of the form x = y->n.) This article develops a

new approach to materialization:

—The essence of materialization involves a step (called focus in Section 6.3)

that forces the values of certain formulae from unknown to true or false.

This has the effect of converting an abstract store into several abstract

stores, each of which is more precise than the original one.

—Materialization is complicated because various properties of a store

are interdependent. We introduce a mechanism based on a constraint-

satisfaction system to capture the effects of such dependences (see

Section 6.4).

In this article, we address the problem of shape analysis for a single pro-

cedure. This has allowed us to concentrate on foundational aspects of shape-

analysis methods. The application of our techniques to the problem of interpro-

cedural shape analysis, including shape analysis for programs with recursive

procedures, is addressed in Rinetskey and Sagiv [2001] (see also Section 7.4.3).

The remainder of the article is organized as follows. Section 2 provides an

overview of the shape-analysis framework. Section 3 shows how 2-valued logic

can be used as a metalanguage for expressing the concrete operational seman-

tics of programs (and programming languages). Section 4 provides the tech-

nical details about the representation of stores using 3-valued logic. Section 5

deﬁnes the notion of instrumentation predicates, which are used to specify the

abstract domain that a speciﬁc instantiation of the shape-analysis framework

will use. Section 6 formulates the abstract semantics for program statements

and conditions, and deﬁnes the iterative algorithm for computing a (safe) set of

3-valued structures for each program point. Section 7 discusses related work.

Section 8 makes some ﬁnal observations.

In Appendix A, we sketch how the framework can be used to analyze pro-

grams that manipulate doubly linked lists, which has been posed as a chal-

lenging problem for program analysis [Aiken 1996]. The proof of the Em-

bedding Theorem and other technical proofs are presented in Appendices B

and C.

2. AN OVERVIEW OF THE PARAMETRIC FRAMEWORK

This section provides an overview of the main ideas used in the article. The

presentation is at a semi-technical level; a more detailed treatment of this

material, as well as several elaborations on the ideas covered here, is presented

in the later sections of the paper.

ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.

Parametric Shape Analysis via 3-Valued Logic •221

2.1 Shape Invariants and Data Structures

Constituents of shape invariants that can be used to characterize a data struc-

ture include

(i) anchor pointer variables, that is, information about which pointer variables

point into the data structure;

(ii) the types of the data-structure elements, and in particular, which ﬁelds

hold pointers;

(iii) connectivity properties, such as

—whether all elements of the data structure are reachable from a root

pointer variable,

—whether any data-structure elements are shared,

—whether there are cycles in the data structure, and

—whether an element vpointed to by a “forward” pointer of another ele-

ment v0has its “backward” pointer pointing to v0; and

(iv) other properties, for instance, whether an element of an ordered list is in

the correct position.

Each data structure can be characterized by a certain set of such properties.

Most semantics track the values of pointer variables and pointer-valued

ﬁelds using a pair of functions, often called the environment and the store.

Constituents (i) and (ii) above are parts of any such semantics; consequently,

we refer to them as core properties.

Connectivity and other properties, such as those mentioned in (iii) and (iv),

are usually not explicitly part of the semantics of pointers in a language, but

instead are properties derived from this core semantics. They are essential

ingredients in program veriﬁcation, however, as well as in our approach to shape

analysis of programs. Noncore properties are called instrumentation properties

(for reasons that become clear shortly).

Let us start by taking a Platonic view, namely, that ideas exist without regard

to their physical realization. Concepts such as “is shared,” “lies on a cycle,” and

“is reachable” can be deﬁned either in graph-theoretic terms, using properties

of paths, or in terms of the programming-language concept of pointers. The

deﬁnitions of these concepts can be stated in a way that is independent of any

particular data structure.

Example 2.1.A heap cell is heap-shared if it is the target of two pointers,

either from two different heap cells, or from two different pointer components

of the same heap cell.

Data structures can now be characterized using sets of such properties, where

“data structure” here is still independent of a particular implementation.

Example 2.2.An acyclic singly linked list is a set of objects, each with one

pointer component. The objects are reachable from a root pointer variable either

directly or by following pointer components. No object lies on a cycle, that is, is

reachable from itself by following pointer components.

ACM Transactions on Programming Languages and Systems, Vol. 24, No. 3, May 2002.

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

36

37

38

39

40

41

42

43

44

45

46

47

48

49

50

51

52

53

54

55

56

57

58

59

60

61

62

63

64

65

66

67

68

69

70

71

72

73

74

75

76

77

78

79

80

81

82

http://rw4.cs.uni-saarland.de/teaching/spa10/papers/shapeanalysis3vl.pdf

Documents connexes

Faire une suggestion

Produits

Assistance

Produits

Assistance

http://rw4.cs.uni-saarland.de/teaching/spa10/papers/shapeanalysis3vl.pdf

Documents connexes

Faire une suggestion

Produits

Assistance

Ajouter ce document à la (aux) collections

Ajouter ce document à enregistré

Suggérez-nous comment améliorer StudyLib