Effective Use of Boolean Satisfiability Procedures in the Formal Verification of Superscalar and VLIW Microprocessors

Miroslav N. Velev* ([email protected], http://www.ece.cmu.edu/~mvelev)
Randal E. Bryant‡,* ([email protected], http://www.cs.cmu.edu/~bryant)
*Department of Electrical and Computer Engineering
‡School of Computer Science
Carnegie Mellon University, Pittsburgh, PA 15213, U.S.A.

This research was supported by the SRC under contract 00-DC-684.

Abstract

We compare SAT-checkers and decision diagrams on the evaluation of Boolean formulas produced in the formal verification of both correct and buggy versions of superscalar and VLIW microprocessors. We identify one SAT-checker that significantly outperforms the rest. We evaluate ways to enhance its performance by variations in the generation of the Boolean correctness formulas. We reassess optimizations previously used to speed up the formal verification and probe future challenges.

1 Introduction

In the past few years, SAT-checkers have made a dramatic improvement in both their speed and capacity. We compare 28 of them with decision diagrams—BDDs [7] and BEDs [61]—as well as with ATPG tools [21][52] when used as Boolean Satisfiability (SAT) procedures in the formal verification of microprocessors. The comparison is based on two benchmark suites, each of 101 Boolean formulas generated in the verification of 1 correct and 100 buggy versions of the same design—a superscalar and a VLIW microprocessor, respectively. Unlike existing benchmark suites, e.g., ISCAS 85 [5] and ISCAS 89 [6], which are collections of circuits that have nothing in common, our suites are based on the same correct design and hence provide a point for consistent comparison of different evaluation methods.

The correctness condition that we use is expressed in a decidable subset of First-Order Logic [10]. That allows it either to be checked directly with a customized decision procedure [51] or to be translated to an equivalent Boolean formula [55] that can be evaluated with SAT engines for either proving correctness or finding a counterexample. The latter approach can directly benefit from improvements in the SAT tools. We identify Chaff [38] as the most efficient SAT-checker for the second verification strategy when applied to both correct and buggy designs. Chaff significantly outperforms BDDs [7] and the SAT-checker DLM-2 [48], the previously most efficient SAT procedures for, respectively, correct and buggy processors. We reevaluate optimizations used to enhance the performance of BDDs and DLM-2 and conclude that many of them are no longer crucial on the same benchmark suites. Our study allows us to eliminate conservative approximations that might result in false negatives and thus consume precious user time for analysis. We also prioritize the optimizations that are still useful with Chaff in the order of their impact on the efficiency of the formal verification.

2 Background

The formal verification is done by correspondence checking—comparison of the superscalar/VLIW Implementation against a non-pipelined Specification, based on the Burch and Dill flushing technique [10]. The correctness criterion is expressed as a formula in the logic of Equality with Uninterpreted Functions and Memories (EUFM) [10] and states that all user-visible state elements in the processor should be updated in sync by either 0, or 1, or up to k instructions after each clock cycle, where k is the issue width of the design. The correctness formula is then translated to a Boolean formula by an automatic tool [55] that exploits the properties of Positive Equality [8], the eij encoding [18], and a number of conservative approximations. The resulting Boolean formula should be a tautology in order for the processor to be correct, and can be evaluated by any SAT procedure.

The syntax of EUFM [10] includes terms and formulas. Terms are used to abstract word-level values of data, register identifiers, and memory addresses, as well as the entire states of memory arrays. A term can be an Uninterpreted Function (UF) applied to a list of argument terms, a domain variable, or an ITE operator selecting between two argument terms based on a controlling formula, such that ITE(formula, term1, term2) evaluates to term1 when formula = true and to term2 when formula = false. The syntax for terms can be extended to model memories by means of the functions read and write [10][59]. Formulas are used to model the control path of a microprocessor, as well as to express the correctness condition. A formula can be an Uninterpreted Predicate (UP) applied to a list of argument terms, a propositional variable, an ITE operator selecting between two argument formulas based on a controlling formula, or an equation (equality comparison) of two terms. Formulas can be negated and connected by Boolean connectives. We will refer to both terms and formulas as expressions.

UFs and UPs are used to abstract away the implementation details of functional units by replacing them with "black boxes" that satisfy no particular property other than functional consistency: the same combination of values at the inputs of a UF (or UP) produces the same output value. It then no longer matters whether the original functional unit is an adder, a multiplier, etc., as long as the same UF (or UP) replaces it in both the Implementation and the Specification. Note that in this way we prove a more general problem—that the processor is correct for any implementation of its functional units. However, that more general problem is easier to prove.

Two possible ways to impose the property of functional consistency of UFs and UPs are Ackermann constraints [1] and nested ITEs [3][4][21]. The Ackermann scheme replaces each UF (UP) application in the EUFM formula F with a new domain variable (propositional variable) and then adds external consistency constraints. For example, the UF application f(a1, b1) will be replaced by a new domain variable c1, and another application of the same UF, f(a2, b2), will be replaced by a new domain variable c2. Then, the resulting EUFM formula F′ will be extended as [(a1 = a2) ∧ (b1 = b2) ⇒ (c1 = c2)] ⇒ F′. In the nested-ITEs scheme, the first application of the UF above will still be replaced by a new domain variable c1. However, the second one will be replaced by ITE((a2 = a1) ∧ (b2 = b1), c1, c2), where c2 is a new domain variable. A third one, f(a3, b3), will be replaced by ITE((a3 = a1) ∧ (b3 = b1), c1, ITE((a3 = a2) ∧ (b3 = b2), c2, c3)), where c3 is a new domain variable, and so on. UPs are eliminated similarly.

Positive Equality allows the identification of two types of terms in the structure of an EUFM formula—those which appear only in positive equations, called p-terms (for positive terms), and those which appear in both positive and negative equations, called g-terms (for general terms). A negative equation is one which appears under an odd number of negations or as part of the controlling formula for an ITE operator. The efficiency from exploiting Positive Equality is due to the observation that the truth of an EUFM formula under a maximally diverse interpretation of the p-terms implies the truth of the formula under any interpretation. A maximally diverse interpretation is one where the equality comparison of a domain variable with itself evaluates to true; that of a p-term domain variable with a syntactically distinct domain variable evaluates to false; and that of a g-term domain variable with a syntactically distinct g-term domain variable (a g-equation) could evaluate to either true or false and can be encoded with Boolean variables [18][40].
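The nested-ITEs elimination scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the actual translation tool [55]: the tuple-based term representation and all function names are ours, and only the nested-ITEs scheme is shown (the Ackermann scheme would instead emit the external constraint [(a1 = a2) ∧ (b1 = b2) ⇒ (c1 = c2)]).

```python
# Minimal sketch of UF elimination by nested ITEs.
# Terms are tuples: ('ITE', cond, t_then, t_else), ('EQ', t1, t2),
# ('AND', f1, f2), or a string naming a domain variable. These
# conventions are ours, for illustration only.

def nested_ite_replacements(apps):
    """apps: argument pairs (a_i, b_i) of successive applications of one
    UF f. Returns one replacement term per application: the i-th term is
    ITE((a_i = a_0) & (b_i = b_0), c0, ITE(..., c1, ... c_i)), which
    enforces functional consistency by construction."""
    terms = []
    for i, (a, b) in enumerate(apps):
        term = f'c{i}'                       # fresh domain variable c_i
        for j in range(i - 1, -1, -1):       # compare against earlier apps
            aj, bj = apps[j]
            cond = ('AND', ('EQ', a, aj), ('EQ', b, bj))
            term = ('ITE', cond, f'c{j}', term)
        terms.append(term)
    return terms

def evaluate(e, env):
    """Evaluate a term/formula under a concrete assignment env."""
    if isinstance(e, str):
        return env[e]
    op = e[0]
    if op == 'EQ':
        return evaluate(e[1], env) == evaluate(e[2], env)
    if op == 'AND':
        return evaluate(e[1], env) and evaluate(e[2], env)
    if op == 'ITE':
        return evaluate(e[2], env) if evaluate(e[1], env) else evaluate(e[3], env)
    raise ValueError(op)

# Two applications f(a0, b0) and f(a1, b1):
t0, t1 = nested_ite_replacements([('a0', 'b0'), ('a1', 'b1')])
env = {'a0': 1, 'b0': 2, 'a1': 1, 'b1': 2, 'c0': 10, 'c1': 20}
# Equal arguments force equal results (functional consistency):
assert evaluate(t0, env) == evaluate(t1, env) == 10
# Distinct arguments leave the second result unconstrained (c1):
assert evaluate(t1, dict(env, a1=3)) == 20
```

With k applications, the scheme introduces k fresh domain variables and a chain of at most k−1 ITEs per application, and, unlike Ackermann constraints, it adds no negated equations over the UF outputs—which is what keeps those outputs eligible for the Positive Equality treatment.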
3 Microprocessor Benchmarks

We base our comparison of SAT procedures on a set of high-level microprocessors, ranging from a single-issue 5-stage pipelined DLX [23], 1×DLX-C, to a dual-issue superscalar DLX with multicycle functional units, exceptions, and branch prediction, 2×DLX-CC-MC-EX-BP [56], to a 9-wide VLIW architecture, 9VLIW-MC-BP [57], that imitates the Intel Itanium [25][49] in speculative features such as predicated execution, speculative register remapping, advanced loads, and branch prediction. The VLIW design is far more complex than any other that has been formally verified previously in an automatic way. It has a fetch engine that supplies the execution engine with a packet of 9 instructions, with no internal data dependencies. Each of these instructions is already matched with one of 9 execution pipelines of 4 stages: 4 integer pipelines, two of which can perform both integer and floating-point memory accesses; 2 floating-point pipelines; and 3 branch-address computation pipelines. Every instruction is predicated with a qualifying predicate identifier, such that the result of that instruction affects user-visible state only when the predicate evaluates to 1. Data values are stored in 4 register files: integer, floating-point, predicate, and branch-address. The two floating-point ALUs, as well as the Instruction and Data Memories, can each take multiple cycles for computing a result or completing a fetch, respectively. There can be up to 42 instructions in flight. An extended version, 9VLIW-MC-BP-EX, also implements exceptions.

We created 100 incorrect versions of both 2×DLX-CC-MC-EX-BP and 9VLIW-MC-BP. The bugs were variants of actual errors made in the design of the correct versions and also coincided with the types of bugs that Van Campenhout et al. [54] found to be among the most frequent design errors. The injected bugs included omitted inputs to logic gates, e.g., an instruction is not squashed when a preceding branch is taken, or a stalling condition for the load interlock does not fully account for the cases when the dependent data operand will be used. Other bugs used incorrect inputs to logic gates, functional units, or memories, e.g., an input with the same name but a different index. Finally, some versions lacked a mechanism to correct a speculative update of a user-visible state element when the speculation is incorrect. Hence, the variations introduced were not completely random, as done in other efforts to generate benchmark suites [22][26][27][36]. The bugs were spread over the entire designs and occurred either as single or multiple errors.

4 Comparison of SAT Procedures

We evaluated 28 SAT-checkers: SATO.3.2.1 [44][63]; GRASP [17][32][33], used both with a single strategy and with restarts, randomization, and recursive learning [2]; CGRASP [12][34], a version of GRASP that exploits structural information; DLM-2 and DLM-3 [48], as well as DLM-2000 [62], all incomplete SAT-checkers (i.e., they cannot prove unsatisfiability) based on global random search and discrete Lagrangian Multipliers as a mechanism to not only get the search out of local minima, but also steer it in the direction towards a global minimum—a satisfying assignment; satz [30][45], satz.v213 [30][45], satz-rand.v4.6 [19][45], eqsatz.v20 [31]; GSAT.v41 [45][47], WalkSAT.v37 [45][46]; posit [16][45]; ntab [13][45]; rel_sat.1.0 and rel_sat.2.1 [3][45]; rel_sat_rand1.0 [19][45]; ASAT and C-SAT [15]; CLS [41]; QSAT [39] and QBF [42], two SAT-checkers for quantified Boolean formulas; ZRes [11], a SAT-checker combining Zero-Suppressed BDDs (ZBDDs) with the original Davis-Putnam procedure; BSAT and IS-USAT, both based on BDDs and exploiting the properties of unate Boolean functions [29]; Prover, a commercial SAT-checker based on Stålmarck's method [50]; HeerHugo [20], also based on the same method; and Chaff [38], a complete SAT-checker exploiting lazy Boolean constraint propagation, non-chronological backtracking, restarts, randomization, and many optimizations. Additionally, we experimented with 2 of the fastest (and publicly available) ATPG tools—ATOM [21] and TIP [52]—used in a mode that tests the output of a benchmark for being stuck-at-0, which triggers the justification of value 1 at the output, turning the ATPG tool into a SAT-checker. We also used Binary Decision Diagrams (BDDs) [7] and Boolean Expression Diagrams (BEDs) [61]—the latter not being a canonical representation of Boolean functions, but shown to be extremely efficient when formally verifying multipliers [60].

The translation to the CNF format [28], used as input to most SAT-checkers, was done after inserting a negation at the top of the Boolean correctness formula that has to be a tautology in order for the processor to be correct. If the formula is indeed a tautology, its negation will be false, so that a complete SAT-checker will be able to prove unsatisfiability. Else, a satisfying assignment for the negation will be a counterexample. In translating to CNF, we introduced a new auxiliary Boolean variable for the output of every AND, OR, or ITE gate in the Boolean correctness formula and then imposed disjunctive constraints (clauses) that the value of a variable at the output of a gate be consistent with the values of the variables at the inputs, given the function of the gate. Inverters were subsumed in the clauses for the driven gates. All clauses were conjoined, including a constraint that the only primary output (the negation of the Boolean correctness formula) is true. The variables in the support of the Boolean correctness formula before its translation to CNF will be called primary Boolean variables.

The experiments were performed on a 336 MHz Sun4 with 1.2 GB of memory and 1 GB of swap space. CUDD [14] and the sifting dynamic variable reordering heuristic [43] were used for the BDD-based runs. In the BED evaluations, we experimented with converting the final BED into a BDD with both the up_one() and up_all() functions [61] by employing 4 different variable ordering heuristics—variants of the depth-first and fanin [37] heuristics—that were the most efficient in the verification of multipliers [60][61].

The SAT procedures that scaled for the 100 buggy variants of 2×DLX-CC-MC-EX-BP are listed in Table 1. The rest of the SAT solvers had trouble even with the single-issue processor, 1×DLX-C, or could not scale for its dual-issue version, 2×DLX-CC (without exceptions, multicycle functional units, and branch prediction). The SAT-checker Chaff had the best performance, finding a satisfying assignment for each benchmark in less than 40 seconds (indeed, less than 37 seconds). We ran the rest of the SAT procedures for 400 and 4,000 seconds—one and two orders of magnitude more, respectively. DLM-2 was the second most efficient SAT-checker for this suite, closely followed by DLM-3. CGRASP was next, solving only half of the benchmarks in 400 seconds, followed by QSAT with 49 of the benchmarks under 400 seconds. The rest of the SAT procedures, including BDDs, performed significantly worse. DLM-2000 is slower than DLM-2 and DLM-3 because of extensive analysis before each decision.

Table 1: Comparison of SAT procedures on 100 buggy versions of 2×DLX-CC-MC-EX-BP.

                    |        % Satisfiable in
SAT Procedure       | < 40 sec | < 400 sec | < 4,000 sec
Chaff               |   100    |    100    |     100
DLM-2               |    61    |     90    |      98
DLM-3               |    58    |     86    |      99
CGRASP              |    46    |     50    |      71
QSAT                |    40    |     49    |      52
SATO                |    22    |     39    |      71
rel_sat.1.0         |    13    |     20    |      22
WalkSAT             |    13    |     18    |      32
rel_sat_rand        |    10    |     27    |      34
DLM-2000            |     9    |     37    |      70
GRASP               |     6    |     27    |      48
GRASP + restarts    |     6    |     11    |      18
CLS                 |     5    |      8    |      10
rel_sat.2.1         |     4    |     71    |      99
eqsatz              |     3    |      4    |       5
BDDs                |     2    |      2    |       5

When verifying the correct 2×DLX-CC-MC-EX-BP, Chaff again had the best performance, requiring 40 seconds of CPU time, followed by BDDs with 2,635 seconds [56], and QSAT with 14 hours and 37 minutes. CGRASP, SATO, GRASP, and GRASP with restarts, randomization, and recursive learning could not prove the CNF formula unsatisfiable in 24 hours.

We then compared Chaff and DLM-2 on the 100 buggy VLIW designs: Chaff was better in 77 cases, with DLM-2 being faster by more than 60 seconds on only 10 benchmarks. However, Chaff took at most 355 seconds, and 79 seconds on average, while DLM-2 did not complete 2 of the benchmarks in 3,600 seconds (we tried 4 different parameter sets). When verifying the correct 9VLIW-MC-BP, Chaff required 1,644 seconds, compared to the 31.5 hours by BDDs [57], using a monolithic correctness criterion in both cases. Figure 1 compares Chaff and BDDs on the 100 buggy VLIW designs, such that Chaff is evaluating only one monolithic correctness criterion, while BDDs evaluate 16 weak (and easier) criteria in parallel [57]. The assumption is that there are enough computing resources to support parallel runs of the tool. As soon as one of these parallel runs comes up with a counterexample, we terminate the rest, and consider the minimum time as the verification time. As shown, the difference between BDDs and Chaff is up to 4 orders of magnitude.

Figure 1: Comparison of Chaff and BDDs on 100 buggy versions of 9VLIW-MC-BP. The benchmarks are sorted in ascending order of their times for the BDD-based experiment. (Log-scale plot of CPU time in seconds: "BDDs: 16 runs" vs. "Chaff: 1 run".)

Applying the script simplify [35] in order to perform algebraic simplifications on the CNF formula for one of the buggy VLIW designs required more than 47,000 seconds, while Chaff took only 14 seconds to find a satisfying assignment without simplifications. This is not surprising, given the CNF formula sizes of up to 450,000 clauses with up to 25,000 variables.

Hence, based on experiments with two suites consisting of 100 buggy designs and their correct counterpart, we identified Chaff as the most efficient SAT procedure—more than 2 orders of magnitude faster than the other SAT solvers—for evaluating Boolean formulas generated in the formal verification of complex microprocessors with realistic features. How does this change the frontier of possibilities? The rest of the paper examines ways to increase the productivity in formal verification of microprocessors by using Chaff as the back-end SAT-checker.

5 Impact of Structural Variations in Generating the Boolean Correctness Formulas

Early reduction of p-equations. When eliminating UFs and UPs that take only p-terms as arguments, the translation algorithm introduces equations between argument terms in order to enforce the functional consistency property by nested ITEs [8][55]. The argument terms consist of only nested ITEs that select one among a set of supporting domain variables. If the terms on both sides of an equation have disjoint supports of p-term domain variables, then the two compared terms will not be equal under a maximally diverse interpretation and their equation can be replaced with false. This is already done in the final step of the translation algorithm [55]. However, an early reduction of such equations will result in a different structure of the DAG for the final Boolean formula, i.e., in a different (but equivalent) CNF formula to be evaluated by SAT-checkers.

Eliminating UPs with Ackermann constraints. Ackermann constraints [1] result in a negated equation for the outputs of the eliminated UF or UP: [(a1 = a2) ∧ (b1 = b2) ⇒ (c1 = c2)] ⇒ F′, which is equivalent to: (a1 = a2) ∧ (b1 = b2) ∧ ¬(c1 = c2) ∨ F′. The negated equation for the output values c1 and c2 means that they cannot be p-terms—something that we want to avoid in order to exploit the computational efficiency of Positive Equality. Therefore, Ackermann constraints should not be used for eliminating UFs whose results appear only in positive equations. However, they can be used when eliminating UPs—then the negated equations will be over Boolean variables, which is not a problem when using Positive Equality. Hence, Ackermann constraints can be used instead of nested ITEs for eliminating UPs.

The data points for 4 runs with structural variations, shown in Fig. 2, are the minimum times among 4 parallel runs: one with no structural variations (the data plotted for 1 run), one for each of the above variations used alone, and one for both variations combined. The data for 4 runs with parameter variations are the minimum among the 1 run with no structural variations and 3 additional runs where some of the input parameters to Chaff were changed. The average time for finding a satisfying assignment when using structural variations is 45.8 seconds, with the maximum being 278 seconds, compared to 45 and 254 seconds, respectively, with parameter variations. Therefore, the effect of structural variations is almost identical to that of parameter variations, as can be seen in the figure. Running them in parallel (7 runs) reduces the average time to 37 seconds and the maximum to 218 seconds. Hence, only a few parallel runs with different structural and/or parameter variations can help reduce the time for SAT checking with Chaff. Structural variations also accelerated the verification of correct designs by up to 20%.

Figure 2: Using structural vs. parameter variations in Chaff. The benchmarks are sorted in ascending order of their times for the experiment with 1 run. (Log-scale plot of CPU time in seconds for 1 run, 4 runs with structural variations, and 4 runs with parameter variations.)

6 Encoding G-Equations

The eij encoding. The equation gi = gj, where gi and gj are g-term domain variables, is replaced by a unique Boolean variable eij [18]. Transitivity of equality, (gi = gj) ∧ (gj = gk) ⇒ (gi = gk), has to be enforced additionally, e.g., by triangulating the comparison graph of those eij variables that affect the final Boolean formula and then enforcing transitivity for each of the resulting triangles—sparse transitivity [9]. Although not every correct microprocessor requires transitivity for its correctness proof, that property is needed in order to avoid false negatives for buggy processors or for designs that do need transitivity.

The small domains encoding. Every g-term domain variable is assigned a set of constant values that it can take on, in a way that allows it to be either equal to or different from any other g-term domain variable that it can be transitively compared for equality with [40]. If the set of constants for a g-term variable consists of N values, those can be indexed with log2(N) Boolean variables. Then two g-term domain variables are equal if their indexing Boolean variables simultaneously select a common constant. Note that transitivity is automatically enforced in this encoding. Depending on the structure of the g-term variable comparison graphs, the small domains encoding might introduce fewer primary Boolean variables than the eij encoding. That would mean a smaller search space. However, the equality comparison of two g-term domain variables now gets replaced with a Boolean formula—a disjunction of conjuncts, each consisting of many Boolean variables or their complements and encoding the possibility that the two g-term domain variables evaluate to the same common constant—instead of just a single Boolean variable.

The two encodings are compared on the 100 buggy VLIW designs in Fig. 3. In a single run of the small domains encoding, the maximum CPU time for detecting a bug is 3,633 seconds and the average is 394 seconds, compared to 355 and 79 seconds, respectively, for the eij encoding (which was used for the experiments before this section). Constraints for transitivity of equality were included when using the eij encoding. Structural variations with 4 runs reduced the maximum time with the small domains encoding to 1,240 seconds, and the average to 154 seconds, compared to 154 and 46 seconds, respectively, for the eij encoding.

Figure 3: Comparison of the eij and small domains encodings on 100 buggy versions of 9VLIW-MC-BP, using Chaff. The benchmarks are sorted in ascending order of their times for the experiment with the small domains encoding. (Log-scale plot of CPU time in seconds for 1 run of each encoding.)

When verifying the correct 9VLIW-MC-BP, the small domains encoding resulted in 1,152 primary Boolean variables, with 890 of them being indexing variables, and required 6,008 seconds of CPU time. On the other hand, the eij encoding resulted in 2,615 primary Boolean variables, with 2,353 of them being eij variables, and required 1,644 seconds of CPU time. Since this design does not need transitivity of equality for its correctness proof, such constraints were not included in the formula generated with the eij encoding. Adding these constraints resulted in 705 extra eij variables due to triangulating the g-term comparison graph, and in 2,680 seconds of CPU time—an increase of over 1,000 seconds. Hence, including transitivity constraints for a design that does not need them for its correctness proof might result in an increase of the verification time.

We also compared the two encodings on correct designs that do require transitivity of equality for their correctness proofs—superscalar processors with out-of-order execution that can execute register-register and load instructions.
Because instructions are dispatched when they do not have Write-After-Write (in addition to Write-After-Read and Read-After-Write) dependencies [23] on instructions that are earlier in the program order but are stalled due to data dependencies, transitivity of equality is required in proving the equality of the final states of the Register File reached along the Implementation and the Specification sides of the commutative correctness diagram.

Table 2: Comparison of the eij and small domains encodings on correct out-of-order superscalar microprocessors that do require transitivity of equality for their correctness proofs.

            |         eij encoding          |    small domains encoding
Issue Width | Primary Boolean |  CPU Time   | Primary Boolean |  CPU Time
            |    Variables    |    [sec]    |    Variables    |    [sec]
     2      |        95       |      3.5    |        81       |      3.7
     3      |       201       |       54    |       127       |       64
     4      |       346       |      810    |       194       |    2,358
     5      |       530       |    2,500    |       249       |    3,804

While the small domains encoding introduced fewer Boolean variables—less than half of those required by the eij encoding for the 5-wide design—it resulted in longer CPU times. Chaff could not prove the unsatisfiability of the CNF formula for the 6-wide superscalar processor with either encoding in less than 24 hours of CPU time—a direction for future work.

The efficiency of the eij encoding can be explained by the impact of g-equations on the instruction flow, and hence on the correctness formula. Such equations determine forwarding and stalling conditions, based on equality comparisons of register identifiers, as well as instruction squashing conditions for correcting branch mispredictions, based on equality comparisons of actual and predicted branch targets. Therefore, g-equations affect the execution of many instructions. A single Boolean variable, introduced in the eij encoding, naturally fits the purpose of accounting for both cases—that the equality comparison is either true or false.
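To see why the small domains encoding can need fewer primary Boolean variables, consider the extreme case of n g-term domain variables that are all mutually comparable (a clique in the comparison graph). The eij encoding introduces one variable per pair, while a sufficient small-domains allocation gives the i-th variable the constant range {1, ..., i} [40], costing only ⌈log2 i⌉ indexing bits each. The sketch below is ours and covers only this clique case; real comparison graphs are sparser, which is why the actual counts in Table 2 differ.

```python
import math

def eij_variable_count(n):
    """One eij Boolean variable per pair of the n mutually comparable g-terms."""
    return n * (n - 1) // 2

def small_domains_variable_count(n):
    """Indexing bits when the i-th g-term of a clique is allocated the
    constant range {1, ..., i} (a sufficient allocation per [40])."""
    return sum(math.ceil(math.log2(i)) for i in range(2, n + 1))

# For 5 mutually comparable g-terms:
assert eij_variable_count(5) == 10            # e12, e13, ..., e45
assert small_domains_variable_count(5) == 8   # 1 + 2 + 2 + 3 indexing bits
```

Fewer variables do not translate into a faster search here, though: as Table 2 shows, the pairwise eij variables match the structure of the forwarding and stalling logic and yield an easier search space.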
Transitivity of equality is never violated—as soon as two eij variables in a triangle become true, the third eij variable in that triangle immediately becomes true, due to the imposed transitivity constraints and the effect of the unit clause rule in SAT-checkers, and this immediately extends to any cycle of eij variables [9]—which avoids wasteful exploration of infeasible portions of the search space. On the other hand, the small domains encoding enumerates all mappings of g-term domain variables to a sufficient set of distinct constants, thus introducing more information than actually required to solve the problem. Now, an auxiliary Boolean variable fij is introduced in place of each primary eij Boolean variable from the previous encoding, such that fij represents the value of a Boolean formula enumerating the cases when g-terms i and j evaluate to the same common constant. Therefore, fij depends on the indexing Boolean variables xil that encode the mapping of g-term i to its set of possible constants, and on the indexing Boolean variables xjk that encode the mapping of g-term j to its set of possible constants. Note that the indexing variables xil will affect the value of every fim auxiliary variable that encodes the equality between g-term i and some g-term m. If a SAT-checker assigns values to fij variables before all of their supporting indexing variables, then the fij values might violate transitivity of equality. Furthermore, it might take a while before enough indexing variables get assigned for the violation to be detected and corrected by backtracking. The work done in the meantime will be wasted. On the other hand, if all supporting indexing variables get assigned before the fij variables that they affect, then those fij variables will flip every time a single indexing variable in their support flips.
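The unit-clause-rule argument above can be made concrete with a toy propagation over one triangle of eij variables. The clause shape follows the sparse-transitivity construction [9]; the propagation routine is our simplified stand-in for what a real SAT-checker does.

```python
# Sparse-transitivity clauses for one triangle of eij variables, in the
# DIMACS convention (positive int = variable, negative int = its negation).
def triangle_clauses(e12, e23, e13):
    return [[-e12, -e23, e13],   # e12 & e23 -> e13
            [-e12, -e13, e23],   # e12 & e13 -> e23
            [-e23, -e13, e12]]   # e23 & e13 -> e12

def unit_propagate(clauses, assignment):
    """Repeatedly apply the unit clause rule: if all but one literal of a
    clause are false, the remaining literal must be true."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            truths = [None if abs(lit) not in assignment
                      else assignment[abs(lit)] == (lit > 0)
                      for lit in clause]
            if True in truths:
                continue                      # clause already satisfied
            open_lits = [l for l, t in zip(clause, truths) if t is None]
            if len(open_lits) == 1:           # unit clause found
                lit = open_lits[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment

# Variables 1, 2, 3 stand for e12, e23, e13. Asserting two edges of the
# triangle immediately forces the third, before any case split:
result = unit_propagate(triangle_clauses(1, 2, 3), {1: True, 2: True})
assert result[3] is True
```

With the fij variables of the small domains encoding, by contrast, the third comparison is only forced once enough indexing variables are assigned, so a solver can first wander into transitivity-violating territory.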
Note that each legal assignment to eij variables is a legal assignment to fij variables, except that now it can be justified with many possible assignments to the indexing variables. Hence, multiple branches in the formula will be revisited for what would be just one visit with the eij encoding. As a result, the small domains encoding is less efficient than the eij encoding. In a different application—encoding constraint satisfaction problems as SAT instances—Hoos [24] similarly found that better performance is achieved with an encoding that introduces more variables but results in a conceptually simpler search space.

7 Benefits of Conservative Approximations and Positive Equality

Conservative approximations, such as manually inserted translation boxes (dummy UFs or UPs with one input) [56] or automatically abstracted memories [57], have the potential to speed up the verification of correct designs, but might result in false negatives that will require manual user intervention and analysis. Not exploiting such optimizations in the verification of 9VLIW-MC-BP-EX resulted in CPU time of 2,542 seconds with monolithic evaluation of the correctness criterion and the eij encoding, compared to 1,513 seconds with the optimizations. However, exploiting structural variations in only one run—combining early reductions of p-equations and Ackermann constraints for eliminating UPs—resulted in CPU time of 1,964 seconds. This is a negligible overhead, compared with the burden of manual analysis necessary to identify potential false negatives that might result when using these optimizations.

We then evaluated the benefits of exploiting Positive Equality, given the extremely efficient SAT-checker Chaff. This was implemented by introducing an eij Boolean variable for the equality comparison of two distinct p-term domain variables, as done originally by Goel et al. [18], instead of treating these p-terms as different. We started with a buggy version of 1×DLX-C: the bug was detected in 0.02 seconds with Positive Equality, compared to 20 seconds without. Verifying the correct 1×DLX-C took 0.17 seconds with Positive Equality, compared to 3,111 seconds without. The bug in an erroneous version of 2×DLX-CC-MC-EX-BP was detected in 1.6 seconds with Positive Equality, compared with 661 seconds without. The correct 2×DLX-CC-MC-EX-BP was verified in 40 seconds with Positive Equality, consuming 36 MB of memory, but ran out of memory after 77,668 seconds without exploiting Positive Equality. Finally, a bug in an incorrect version of 9VLIW-MC-BP was detected in 173 seconds using 96 MB, compared to running out of memory after 6,351 seconds without Positive Equality. Therefore, exploiting Positive Equality is still the major reason for our success in formally verifying complex microprocessors.

8 Conclusions

We found the SAT-checker Chaff [38] to be the most efficient means for evaluating Boolean formulas generated in the formal verification of both correct and buggy microprocessors, dramatically outperforming 27 SAT-checkers, 2 ATPG tools, and 2 types of decision diagrams—BDDs [7] and BEDs [61]. Reassessing various optimizations that can be applied when producing the Boolean formula for the microprocessor correctness, we conclude that the single most important step is exploiting Positive Equality [8]. Without it, Chaff would not have scaled for realistic superscalar and VLIW microprocessors with exceptions, multicycle functional units, branch prediction, and other speculative features.

Exploiting the eij encoding [18] of g-equations resulted in a speedup of a factor of 4 for our most complex VLIW benchmarks compared to the small domains encoding [40] when verifying correct designs, and consistently performed better on buggy versions. Although the eij encoding results in more than twice as many primary Boolean variables, its efficiency can be explained by the conceptual simplicity of the resulting search space—with each eij Boolean variable naturally encoding the equality between a pair of g-term domain variables. Transitivity of equality is never violated, which avoids wasteful exploration of infeasible portions of the search space. In contrast, the small domains encoding enumerates all mappings of g-term domain variables to a sufficient set of distinct constants, thus introducing more information than actually required to solve the problem. This results in revisiting portions of the search space for what would be just one visit with the eij encoding. Transitivity of equality is also not guaranteed to always be satisfied, again allowing wasteful work.

Conservative approximations, such as automatic abstraction of memories [57] and manually inserted translation boxes [56], are not as essential to the fast verification of correct VLIW and dual-issue superscalar processors when using Chaff as these optimizations were when using BDDs—previously the most efficient SAT procedure for correct designs. Structural variations in generating the Boolean correctness formulas—early reductions of p-equations and using Ackermann constraints for eliminating uninterpreted predicates—as well as parameter variations for Chaff can somewhat accelerate the SAT checking, although no single variation performs best. Applying algebraic simplifications [35] to the CNF formulas resulting from realistic microprocessors is impractical, due to the large number of clauses—hundreds of thousands.

To conclude, we showed that Chaff can easily handle very hard and big CNF formulas, produced in the formal verification of microprocessors without applying conservative transformations that were previously needed in BDD-based evaluations but have the potential to result in false negatives and to take extensive human effort to analyze. We identified the optimizations that do help increase the performance of Chaff on realistic dual-issue superscalar and VLIW designs—Positive Equality, combined with the eij encoding, and possibly with structural/parameter variations in multiple parallel runs. Our study will increase the productivity of microprocessor design engineers and shorten the time-to-market for VLIW and DSP architectures that constitute a significant portion of the microprocessor market [53]. The benchmarks used in this paper are available as [58].

Intelligence (IJCAI '97), August 1997, pp. 366-371.
[31] C.M. Li, "Integrating Equivalency Reasoning into Davis-Putnam Procedure," 17th National Conference on Artificial Intelligence (AAAI '00), July–August 2000, pp. 291-296.
[32] J.P. Marques-Silva and K.A. Sakallah, "GRASP: A Search Algorithm for Propositional Satisfiability," IEEE Transactions on Computers, Vol. 48, No. 5 (May 1999), pp. 506-521.
[33] J.P. Marques-Silva, "The Impact of Branching Heuristics in Propositional Satisfiability Algorithms," 9th Portuguese Conference on Artificial Intelligence (EPIA), September 1999.
[34] J.P. Marques-Silva and L.G. e Silva, "Algorithms for Satisfiability in Combinational Circuits Based on Backtrack Search and Recursive Learning," 12th Symposium on Integrated Circuits and Systems Design (SBCCI '99), September–October 1999, pp. 192-195.
[35] J.P. Marques-Silva, "Algebraic Simplification Techniques for Propositional Satisfiability," Principles and Practice of Constraint Programming (CP '00), September 2000, pp. 537-542.
[36] D. Mitchell, B. Selman, and H. Levesque, "Hard and Easy Distributions of SAT Problems," 10th National Conference on Artificial Intelligence (AAAI '92), July 1992, pp. 459-465.
[37] S. Malik, A.R. Wang, R.K. Brayton, and A. Sangiovanni-Vincentelli, "Logic Verification Using Binary Decision Diagrams in a Logic Synthesis Environment," International Conference on Computer-Aided Design (ICCAD '88), November 1988, pp. 6-9.
[38] M.W. Moskewicz, C.F. Madigan, Y. Zhao, L.
Zhang, and S. Malik, “Engineering a Highly Efficient SAT Solver,” 38th Design Automation Conference (DAC ’01), June 2001. [39] D.A. Plaisted, A. Biere, and Y. Zhu, “A Satisfiability Procedure for Quantified Boolean Formulae,” submitted for publication, 2000. [40] A. Pnueli, Y. Rodeh, O. Shtrichman, and M. Siegel, “Deciding Equality Formulas by Small-Domain Instantiations,” Computer-Aided Verification (CAV ’99), N. Halbwachs and D. Peled, eds., LNCS 1633, Springer-Verlag, June 1999, pp. 455-469. [41] S.D. Prestwich, “Stochastic Local Search in Constrained Spaces,” Practical Application of Constraint Technology and Logic Programming (PACLP ‘00), April 2000, pp. 27-39. [42] J. Rintanen, “Improvements to the Evaluation of Quantified Boolean Formulae,” International Joint Conference on Artificial Intelligence (IJCAI ’99), August 1999, pp. 1192-1197. [43] R. Rudell, “Dynamic Variable Ordering for Ordered Binary Decision Diagrams,” International Conference on Computer-Aided Design (ICCAD ’93), November 1993, pp. 42-47. [44] SATO.3.2.1, http://www.cs.uiowa.edu/~hzhang/sato. [45] SATLIB—Solvers, http://www.satlib.org/solvers.html. [46] B. Selman, H. Kautz, B. Cohen, “Local Search Strategies for Satisfiability Testing,” DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26 (1996), pp. 521-532. [47] B. Selman, and H. Kautz, “Domain-Independent Extensions to GSAT: Solving Large Structured Satisfiability Problems,” International Joint Conference on Artificial Intelligence (IJCAI ‘93), August – September 1993, pp. 290-295. [48] Y. Shang, and B.W. Wah, “A Descrete Lagrangian-Based Global-Search Method for Solving Satisfiability Problems,” Journal of Global Optimization, Vol. 12, No. 1, (January 1998), pp. 61-99. [49] H. Sharangpani, “Intel Itanium Processor Michroarchitecture Overview,”5 Microprocessor Forum, October 1999. [50] G. 
Stålmarck, “A System for Determining Propositional Logic Theorems by Applying Values and Rules to Triplets that are Generated from a Formula,” Swedish Patent No. 467 076 (approved 1992), U.S. Patent No. 5 276 897 (1994), European Patent No. 0403 454 (1995), 1989. [51] Stanford Validity Checker (SVC), http://sprout.Stanford.EDU/SVC. [52] P. Tafertshofer, A. Ganz, and K.J. Antreich, “GRAINE—An Implication GRaph-bAsed engINE for Fast Implication, Justification, and Propagation,” IEEE Transactions on CAD, Vol. 19, No. 8 (August 2000). [53] D. Tennenhouse, “Proactive Computing,” Communications of the ACM, Vol. 43, No. 5 (May 2000), pp. 43-50. [54] D. Van Campenhout, T. Mudge, and J.P. Hayes, “Collection and Analysis of Microprocessor Design Errors,” IEEE Design & Test of Computers, Vol. 17, No. 4 (October – December 2000), pp. 51-60. [55] M.N. Velev, and R.E. Bryant, “Superscalar Processor Verification Using Efficient Reductions of the Logic of Equality with Uninterpreted Functions to Propositional Logic,”4 Correct Hardware Design and Verification Methods (CHARME ‘99), L. Pierre and T. Kropf, eds., LNCS 1703, Springer-Verlag, September 1999, pp. 37-53. [56] M.N. Velev, and R.E. Bryant, “Formal Verification of Superscalar Microprocessors with Multicycle Functional Units, Exceptions, and Branch Prediction,”4 37th Design Automation Conference (DAC ’00), June 2000. [57] M.N. Velev, “Formal Verification of VLIW Microprocessors with Speculative Execution,”4 Computer-Aided Verification (CAV ‘00), E.A. Emerson and A.P. Sistla, eds., LNCS 1855,4 Springer-Verlag, July 2000. [58] M.N. Velev, Benchmark suites SSS-SAT.1.0, VLIW-SAT.1.0, FVP-UNSAT.1.0, and FVP-UNSAT.2.0, October 2000. [59] M.N. Velev, “Automatic Abstraction of Memories in the Formal Verification of Superscalar Microprocessors,”4 Tools and Algorithms for the Construction and Analysis of Systems (TACAS ’01), T. Margaria and W. Yi, eds., LNCS 2031, Springer-Verlag, April 2001. [60] P.F. Williams, A. Biere, E.M. 
Clarke, A. Gupta, “Combining Decision3 Diagrams and SAT Procedures for Efficient Symbolic Model Checking,” Computer-Aided Verification (CAV ‘00), E.A. Emerson and A.P. Sistla, eds., LNCS 1855, Springer-Verlag, July 2000, pp. 124-138. [61] P.F. Williams, “Formal Verification Based on Boolean Expression Diagrams,”3 Ph.D. thesis, Department of Information Technology, Technical University of Denmark, Lyngby, Denmark, August 2000. [62] Z. Wu, and B.W. Wah, “Solving Hard Satisfiability Problems: A Unified Algorithm Based on Discrete Lagrange Multipliers,” 11th IEEE International Conference on Tools with Artificial Intelligence (ICTAI ‘99), November 1999, pp. 210-217. [63] H. Zhang, “SATO: An Efficient Propositional Prover,” International Conference on Automated Deduction (CADE ’97), LNAI 1249, Springer-Verlag, 1997, pp. 272-275. Acknowledgments We thank M. Moskewicz for providing us with Chaff and for fine-tuning it on our benchmarks. References [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] 2. 3. 4. 5. W. Ackermann, Solvable Cases of the Decision Problem, North-Holland, Amsterdam, 1954. L. Baptista, and J.P. Marques-Silva, “Using Randomization and Learning to Solve Hard Real-World Instances of Satisfiability,”2 Principles and Practice of Constraint Programming (CP ‘00), September 2000. R.J. Bayardo, Jr., and R. Schrag, “Using CSP look-back techniques to solve real world SAT instances,” 14th National Conference on Artificial Intelligence (AAAI ‘97), July 1997, pp. 203-208. BED Package3, version 2.5, October 2000. F. Brglez, and H. Fujiwara, “A Neutral Netlist of 10 Combinational Benchmark Circuits,” International Symposium on Circuits and Systems (ISCAS ’85), 1985. F. Brglez, D. Bryan, and K. Kozminski, “Combinational Profiles of Sequential Benchmark Circuits,” International Symposium on Circuits and Systems (ISCAS ’89), 1989. R.E. 
Bryant, “Symbolic Boolean Manipulation with Ordered BinaryDecision Diagrams,” ACM Computing Surveys, Vol. 24, No. 3 (September 1992), pp. 293-318. R.E. Bryant, S. German, and M.N. Velev, “Processor Verification Using Efficient Reductions of the Logic of Uninterpreted Functions to Propositional Logic,”4 ACM Transactions on Computational Logic (TOCL), Vol. 2, No. 1 (January 2001). R.E. Bryant,4 and M.N. Velev, “Boolean Satisfiability with Transitivity Constraints,” Computer-Aided Verification (CAV ‘00), E.A. Emerson and A.P. Sistla, eds., LNCS 1855, Springer-Verlag, July 2000, pp. 86-98. J.R. Burch, and D.L. Dill, “Automated Verification of Pipelined Microprocessor Control,” Computer-Aided Verification (CAV ‘94), D.L. Dill, ed., LNCS 818, Springer-Verlag, June 1994, pp. 68-80. P. Chatalic, and L. Simon, “Multi-Resolution on Compressed Sets of Clauses,” 12th International Conference on Tools with Artificial Intelligence (ICTAI ‘00), November 2000, pp. 2-10. CGRASP, http://vinci.inesc.pt/~lgs/cgrasp. J.M. Crawford, and L.D. Auton, “Experimental Results on the Crossover Point in Random 3SAT,” Frontiers in Problem Solving: Phase Transitions and Complexity, Artificial Intelligence, T. Hogg, B. A. Huberman and C. Williams, eds., Vol. 81, Nos. 1-2 (March 1996), pp. 31-57. CUDD-2.3.0, http://vlsi.colorado.edu/~fabio. O. Dubois, “Can a Very Simple Algorithm be Efficient for SAT?”, ftp://dimacs.rutgers.edu/pub/challenge/satisfiability/contributed/dubois/. J.W. Freeman, “Improvements to Propositional Satisfiability Search Algorithms,” Ph.D. thesis, Department of Computer and Information Science, University of Pennsylvania, 1995. GRASP, http://vinci.inesc.pt/~jpms/grasp. A. Goel, K. Sajid, H. Zhou, A. Aziz, and V. Singhal, “BDD Based Procedures for a Theory of Equality with Uninterpreted Functions,” ComputerAided Verification (CAV ‘98), A.J. Hu and M.Y. Vardi, eds., LNCS 1427, Springer-Verlag, June 1998, pp. 244-255. C.P. Gomes, B. Selman, N. Crator, and H.A. 
Kautz, “Heavy-Tailed Phenomena in Satisfiability and Constraint Satisfaction Problems”, Journal of Automated Reasoning, Vol. 24, Nos. 1-2 (February 2000), pp. 67-100. J.F. Groote, and J.P. Warners, “The propositional formula checker HeerHugo,” Journal of Automated Reasoning, Vol. 24, Nos. 1-2 (February 2000), pp. 101-125. I. Hamzaoglu, and J.H. Patel, “New Techniques for Deterministic Test Pattern Generation,” Journal of Electronic Testing: Theory and Applications, Vol. 15, Nos. 1-2 (August 1999), pp. 63-73. J.E. Harlow III, and F. Brglez, “Design of Experiments for Evaluation of BDD Packages Using Controlled Circuit Mutations,” Formal Methods in Computer-Aided Design (FMCAD ’98), G. Gopalakrishnan and P. Windley, eds., LNCS 1522, Springer-Verlag, November 1998, pp. 64-81. J.L. Hennessy, and D.A. Patterson, Computer Architecture: A Quantitative Approach, 2nd edition, Morgan Kaufmann Publishers, San Francisco, CA, 1996. H.H. Hoos, “SAT-Encodings, Search Space Structure, and Local Search Performance,” International Joint Conference on Artificial Intelligence (IJCAI ’99), August 1999, pp. 296-302. IA-64 Application Developer’s Architecture Guide,5 Intel Corporation, May 1999. K. Iwama, H. Abeta, and E. Miyano, “Random Generation of Satisfiable and Unsatisfiable CNF Predicates,” Information Processing 92, Vol. 1: Algorithms, Software, Architecture, J. Van Leeuwen, ed., Elsevier Science Publishers B.V., 1992, pp. 322-328. K. Iwama, and K. Hino, “Random Generation of Test Instances for Logic Optimizers,” 31st Design Automation Conference (DAC ’94), June 1994. D.S. Johnson, M.A. Trick, eds., The Second DIMACS Implementation Challenge, DIMACS Series in Discrete Mathematics and Theoretical Computer Science. http://dimacs.rutgers.edu/challenges. P. Kalla, Z. Zeng, M.J. Ciesielski, and C. Huang, “A BDD-Based Satisfiability Infrastructure Using the Unate Recursive Paradigm,” Design, Automation and Test in Europe (DATE ’00), March 2000, pp. 232 -236. C.M. 
Li, and Anbulagan, “Heurisitics Based on Unit Propoagation for Satisfiability Problems,” International Joint Conference on Artificial http://vinci.inesc.pt/~jpms http://www.it-c.dk/research/bed http://www.ece.cmu.edu/~mvelev http://developer.intel.com/design/ia-64/architecture.htm 6 7
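As an illustrative aside, the search-space argument made above for the eij encoding over the small domains encoding can be checked by a small enumeration. The following Python sketch is not part of the paper's tool flow, and the names (eij_assignments, small_domain_patterns) are illustrative: for three g-term domain variables it enumerates the eij assignments that respect transitivity of equality, and the equality patterns induced by all mappings of the three variables to a sufficient domain of three constants.

```python
from itertools import product
from collections import Counter

# Illustrative sketch: three g-term domain variables a, b, c.
# Under the eij encoding there is one Boolean variable per pair of
# terms (e_ab, e_bc, e_ac), constrained so that transitivity of
# equality is never violated.
def eij_assignments():
    legal = []
    for e_ab, e_bc, e_ac in product([False, True], repeat=3):
        # Transitivity: any two equalities must imply the third,
        # so exactly two of the three being true is infeasible.
        if [e_ab, e_bc, e_ac].count(True) == 2:
            continue
        legal.append((e_ab, e_bc, e_ac))
    return legal

# Under the small domains encoding, each of the 3 variables ranges
# over a domain of 3 distinct constants; the equality pattern is
# derived from the chosen indices rather than asserted directly.
def small_domain_patterns():
    return [((a == b), (b == c), (a == c))
            for a, b, c in product(range(3), repeat=3)]

if __name__ == "__main__":
    eij = eij_assignments()
    sd = Counter(small_domain_patterns())
    print(len(eij))          # legal eij assignments: 5
    print(sum(sd.values()))  # index assignments: 27
    print(dict(sd))          # index assignments per eij assignment
```

All 27 index assignments project onto the 5 legal eij assignments, each justified by 3 or 6 different index choices, which matches the observation that the small domains encoding revisits branches the eij encoding would visit only once.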