SOFTWARE VERIFICATION COURSE
MASTER II 2018-2019
Teacher's name: Pr Atsa Etoundi Roger
Abstract
This document presents the work carried out by the MASTER 2 students of the 2017-2018 class for the software verification course. It is divided into five parts:
- statistical and function approaches to testing,
- test data analysis, testability,
- static analysis techniques,
- dynamic analysis techniques,
- selected state-of-the-art results and real-world applications.
statistical and function approaches to testing
Introduction
Software testing is the process of analyzing software to find the differences between the required and the actual behavior. It is performed throughout the software development cycle in order to build quality software. Two basic testing approaches are used for this purpose: white box testing and black box testing. Here we present black box testing techniques, and then statistical testing.
I. Black box testing
Black box testing is an integral part of correctness testing, but its ideas are not limited to correctness testing only. In black box testing, the tester only knows the input (processed by the system) and the required output; in other words, the tester need not know the internal workings of the system.
II. DIFFERENT FORMS OF BLACK BOX TESTING TECHNIQUES
The different forms of black box testing techniques are the following (see figure). For each form we answer three questions: what? why? how?
1. Equivalence Partitioning
What?
- Equivalence partitioning is a black box testing method that divides the input data of a software unit into partitions of data from which test cases can be derived.
- In equivalence class partitioning, an equivalence class is formed of the inputs for which the behavior of the system is specified or expected to be similar.
- An equivalence class represents a set of valid or invalid states for input conditions. See the figure.
Why?
The issue is to select the test cases suitably. Since the behavior of the system is specified or expected to be similar for all inputs of an equivalence class, testing one representative per class limits the number of test cases.
How ?
Some of the guidelines for equivalence partitioning are given
below :
1) One valid and two invalid equivalence classes are defined if an input
condition specifies a range.
2) One valid and two invalid equivalence classes are defined if an input
condition requires a specific value.
3) One valid and one invalid equivalence class are defined if an input
condition specifies a no. of a set.
4) One valid and one invalid equivalence class are defined if an input
condition is Boolean
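Guideline 1 can be sketched in a few lines of Python: for an input condition that specifies a range, derive one valid and two invalid classes and pick one representative per class. The range 1..100 and the class names are illustrative, not from the course.

```python
# Equivalence partitioning for a range input (guideline 1):
# one valid class and two invalid classes, one representative each.

def partition_range(low, high):
    """Return one representative per equivalence class for a range input."""
    return {
        "below_range (invalid)": low - 1,          # just below the range
        "in_range (valid)": (low + high) // 2,     # a typical in-range value
        "above_range (invalid)": high + 1,         # just above the range
    }

classes = partition_range(1, 100)
for name, representative in classes.items():
    print(name, "->", representative)
```

Each representative then becomes one test case, instead of testing every value in 1..100.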
2. Boundary Value Analysis
What?
Boundary value analysis is a testing technique that focuses on testing at the boundaries, where the extreme boundary values are chosen. Boundary values include the maximum, the minimum, values just inside/outside the boundaries, typical values and error values. See figure.
Why?
Systems tend to fail at boundaries, as programmers often make mistakes at the boundaries of the equivalence classes/input domain.
How?
In boundary value analysis, for each input value with a defined range, test:
1) the extreme ends of the range,
2) values just beyond the ends,
3) values just before the ends.
3. Cause-Effect Graph
What?
- A cause-effect graph is a black box testing technique that graphically illustrates the relationship between a given outcome and all the factors that influence the outcome.
- It is also known as an Ishikawa diagram, after its inventor Kaoru Ishikawa, or a fishbone diagram because of the way it looks.
- The focus is on causes and effects, not necessarily on the input data. See figure.
Why?
- Equivalence partitioning partitions the input space into equivalence classes and develops a test case for each class of inputs.
- Boundary value analysis develops test cases at the boundaries of the possible input ranges (minimum and maximum values).
Cause-effect graphing complements these techniques by relating outcomes to the factors that influence them; all of these techniques are important for data-processing-intensive applications.
How?
Step 1: Identify and define the effect.
Step 2: Fill in the effect box and draw the spine.
Step 3: Identify the main causes contributing to the effect being studied.
Step 4: For each major branch, identify other specific factors which may be causes of the effect.
Step 5: Categorize relative causes and provide detailed levels of causes.
4. Fuzzing
Why?
- Fuzz testing has enjoyed great success at discovering security-critical bugs in real software.
- Fuzzing is also used to test for security problems in software.
What?
Fuzz testing is often employed as a black box software testing technique, used for finding implementation bugs through malformed/semi-malformed data injection in an automated or semi-automated fashion.
The two forms of fuzzing programs are:
1) Mutation-based: mutation-based fuzzers mutate existing data samples to create test data.
2) Generation-based: generation-based fuzzers define new test data based on models of the input.
How ?
Using tools, they have each procedure. for example : AFL(american fuzz
loop). Steps of fuzzing :
1. Compile/install AFL (once)
2.
Compile target project with AFL
 afl‐gcc / afl‐g++ / afl‐clang / afl‐clang++ / (afl‐as)
3. Chose target binary to fuzz in project
 Chose its command line options to make it run fast
4. Chose valid input files that cover a wide variety of
possible input files
 afl‐cmin / (afl‐showmap)
5.Fuzzing
 afl‐fuzz
6. Check how your fuzzer is doing
 command line UI / afl‐whatsup / afl‐plot / afl‐gotcpu
7. Analyze crashes
 afltmin / triage_crashes.sh / peruvian were rabit
• ASAN / valgrind / exploitable gdb plugin / …
8 Have a lot more work than before
 CVE assignment / responsible disclosure / …
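To make the mutation-based form concrete, here is a toy mutation fuzzer, not AFL: it flips random bytes of an existing sample and records inputs that crash the target. The `parser` target and its planted bug are entirely hypothetical.

```python
import random

def mutate(sample: bytes, n_flips: int = 4) -> bytes:
    """Mutation-based fuzzing: randomize a few bytes of an existing sample."""
    data = bytearray(sample)
    for _ in range(n_flips):
        pos = random.randrange(len(data))
        data[pos] = random.randrange(256)
    return bytes(data)

def fuzz(target, seed: bytes, iterations: int = 1000):
    """Run the target on mutated inputs and collect the crashing inputs."""
    crashes = []
    for _ in range(iterations):
        case = mutate(seed)
        try:
            target(case)
        except Exception:
            crashes.append(case)
    return crashes

# Hypothetical target with a planted bug: crashes on input starting with b'!'
def parser(data: bytes):
    if data[:1] == b"!":
        raise ValueError("parser bug")

random.seed(0)  # make the run reproducible
found = fuzz(parser, b"hello world!")
print(len(found), "crashing inputs found")
```

Real fuzzers such as AFL add coverage feedback: mutated inputs that reach new code paths are kept as new seeds.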
5. Orthogonal Array Testing
What?
Orthogonal Array Testing (OAT) is a software testing strategy that uses orthogonal arrays, especially when the system to be tested has huge data inputs. Pairing or combining inputs in this way, and testing the system with them to save time, is called pairwise testing. Pairwise testing, also known as all-pairs testing, is a combinatorial method: it tests all the possible discrete combinations of every pair of the parameters involved.
Why?
In any scenario, delivering a quality software product to the customer has become challenging due to the complexity of the code. In the conventional method, test suites include test cases derived from all combinations of input values and pre-conditions; as a result, an exponential number of test cases has to be covered. OAT reduces this number while still covering every pair of levels.
How?
How is an OAT represented?
- Runs (N): the number of rows in the array, which translates into the number of test cases that will be generated.
- Factors (K): the number of columns in the array, which translates into the maximum number of variables that can be handled.
- Levels (V): the maximum number of values that can be taken by any single factor. A single factor typically has 2 to 3 inputs to be tested; that maximum number of inputs decides the levels.
How to do Orthogonal Array Testing?
1. Identify the independent variables for the scenario.
2. Find the smallest array with the required number of runs.
3. Map the factors to the array.
4. Choose the values for any "leftover" levels.
5. Transcribe the runs into test cases, adding any particularly suspicious combinations that aren't generated.
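As a sketch of steps 2-3 and 5, the standard L4(2^3) orthogonal array handles three two-level factors in only 4 runs instead of 2^3 = 8 exhaustive combinations, while still covering every pair of levels across any two factors. The factor names and values below are illustrative, not from the course.

```python
# The L4(2^3) orthogonal array: 4 runs, 3 factors, 2 levels each.
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Hypothetical factors for a login scenario (names are ours).
factors = {"browser": ["Chrome", "Firefox"],
           "os": ["Linux", "Windows"],
           "protocol": ["http", "https"]}

# Map each factor to one column of the array (step 3), then transcribe
# each run into a concrete test case (step 5).
names = list(factors)
test_cases = [{name: factors[name][level] for name, level in zip(names, run)}
              for run in L4]
for tc in test_cases:
    print(tc)
```

For any two columns of L4, all four level pairs (0,0), (0,1), (1,0), (1,1) occur, which is exactly the orthogonality property the technique relies on.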
6. All-Pairs Testing
What?
All-pairs testing, also known as pairwise testing, is a combinatorial testing approach: it tests all the possible discrete combinations of every pair of the parameters involved.
Why?
Assume we have a piece of software to be tested which has 10 input fields and 10 possible settings for each input field. There are then 10^10 possible inputs to be tested; in this case, exhaustive testing is impossible even if we wish to test all combinations.
How?
Step 1: Order the variables so that the one with the most values comes first and the one with the fewest is placed last.
Step 2: Start filling the table column by column. The list box can take 2 values.
Step 3: The next column under discussion is the check box. Again, the check box can take 2 values.
Step 4: Ensure that all combinations between the list box and the check box are covered.
Step 5: Use the same strategy for the radio button, which can also take 2 values.
Step 6: Verify that all the pair values are covered, as shown in the table below.
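Step 6, verifying that every pair of values is covered, can be automated. The sketch below checks a candidate suite for the list box / check box / radio button example; the concrete values are illustrative.

```python
from itertools import combinations, product

def uncovered_pairs(parameters, suite):
    """Return the value pairs not covered by any test case in the suite."""
    needed = set()
    names = list(parameters)
    # Enumerate every pair of values for every pair of parameters.
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            needed.add(((a, va), (b, vb)))
    # Remove each pair that some test case exercises.
    for test in suite:
        for a, b in combinations(names, 2):
            needed.discard(((a, test[a]), (b, test[b])))
    return needed

params = {"list_box": ["on", "off"],
          "check_box": ["yes", "no"],
          "radio": ["A", "B"]}
# Four test cases suffice to cover all 12 pairs (instead of 8 combinations).
suite = [{"list_box": "on",  "check_box": "yes", "radio": "A"},
         {"list_box": "on",  "check_box": "no",  "radio": "B"},
         {"list_box": "off", "check_box": "yes", "radio": "B"},
         {"list_box": "off", "check_box": "no",  "radio": "A"}]
print("uncovered:", uncovered_pairs(params, suite))
```

An empty result means the suite is pairwise-complete; any remaining pairs point at the test cases still to add.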
7. State Transition Testing
What?
State transition testing is a black box testing technique in which outputs are triggered by changes to the input conditions or changes to the 'state' of the system. In other words, tests are designed to execute valid and invalid state transitions.
Why?
A very large or infinite number of test scenarios plus a finite amount of time means it is impossible to test everything; a state model focuses the tests on the transitions of the system.
How?
The tester builds the model as follows:
1) States → the distinct configurations the system can be in.
2) Transitions → how the state of the system changes from one state to another.
3) Events → inputs to the system.
4) Actions → outputs for the events.
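The four model elements above can be sketched as a transition table plus a step function; tests then drive valid paths through the model and check that invalid transitions are rejected. The ATM card session and its events are a hypothetical example.

```python
# (state, event) -> next state: the model built by the tester.
TRANSITIONS = {
    ("idle", "insert_card"): "card_inserted",
    ("card_inserted", "enter_pin"): "authenticated",
    ("authenticated", "eject_card"): "idle",
}

def step(state, event):
    """Apply an event to a state; raise on an invalid transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {event} in state {state}")

# Valid transition test: a full session returns the system to 'idle'.
s = "idle"
for e in ["insert_card", "enter_pin", "eject_card"]:
    s = step(s, e)
print("final state:", s)

# Invalid transition test: entering a PIN with no card must be rejected.
try:
    step("idle", "enter_pin")
except ValueError as err:
    print("rejected:", err)
```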
III. Statistical Testing
What?
Statistical testing is based on a probabilistic generation of test patterns: structural or functional criteria serve as guides for defining an input profile and a test size. The method is intended to compensate for the imperfect connection of criteria with software faults, and should not be confused with random testing, a blind approach that uses a uniform profile over the input domain.
Why?
- To find bugs.
- To estimate the reliability of the software.
How to perform statistical testing?
Step 1: Construct the statistical models based on actual usage scenarios and related frequencies.
Step 2: Use these models for test case generation and execution.
Step 3: Analyze the test results for reliability assessment and predictions, and to help with decision-making.
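Steps 1 and 2 can be sketched as follows: an operational profile assigns usage frequencies to scenarios, and test cases are drawn at random according to that profile rather than uniformly. The scenarios and frequencies are hypothetical.

```python
import random

# Step 1: a usage model — scenarios with their observed frequencies.
profile = {"view_balance": 0.6, "withdraw": 0.3, "transfer": 0.1}

def generate_test_cases(profile, n, seed=42):
    """Step 2: draw n test scenarios according to the operational profile."""
    rng = random.Random(seed)            # fixed seed for a reproducible suite
    scenarios = list(profile)
    weights = [profile[s] for s in scenarios]
    return [rng.choices(scenarios, weights)[0] for _ in range(n)]

cases = generate_test_cases(profile, 1000)
for s in profile:
    print(s, cases.count(s) / len(cases))  # should roughly match the profile
```

Because the test mix mirrors real usage, the observed failure rate can then feed a reliability estimate (step 3).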
Conclusion
To conclude, black box testing is a testing technique that ignores the internal mechanism or structure of a system and focuses on the outputs generated in response to selected inputs and execution conditions. Statistical testing is a suitable means to compensate for the tricky link between test criteria and software design faults.
References:
S. Akers. Probabilistic Techniques for Test Generation from Functional Descriptions. 1979.
J. Whittaker. Markov chain techniques for software testing and reliability analysis. Thesis, University of Tennessee, May 1992.
J. A. Whittaker. Stochastic Software Testing. Annals of Software Engineering, vol. 4, pp. 115-131, 1997.
J. A. Whittaker and M. G. Thomason. A Markov Chain Model for Statistical Software Testing. IEEE Transactions on Software Engineering, 20(10):812-824, 1994.
Zhang Youzhi. Research and application of hidden Markov model in data mining. Second IITA International Conference on Geoscience and Remote Sensing (IITA-GRS), Aug. 2010, pp. 459-462.
Test Data Analysis, Testability
Outline:
I. Definition of test data
   - Example
II. Partition analysis of input data domains and boundary testing
   1. Equivalence classes
      - Domain partitioning
      - Partition analysis: method
   2. Boundary testing
      - Principle
      - Boundary value analysis
      - Example
      - Data types
      - Partition analysis and boundary testing: summary
      - Combination of input values
I. Definition: test data
A test datum is a tuple DT(SC, DE, f, g, RA, tol) together with a measure N, such that:
- SC is the scenario to be tested.
- DE is the input datum representative of this scenario.
- f is the function under test.
- g is the test function: the function to apply to the result obtained by calling the function under test, in order to obtain a value telling us whether the result is correct or not.
- RA is the expected result, and the following condition must hold:
  N(g(f(DE)) - RA) < tol
In the most common cases, we choose N = absolute value, and g = id or g = f^-1.
Remarks:
1. The test function should normally be simpler than the function under test. It should be simple enough that it does not need to be tested itself.
2. As soon as one element of the tuple is missing, the test case is incomplete. In particular, there is no test case without an expected result.
3. The quality of the test depends on the relevance of the choice of test data.
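As a sketch, the definition above translates directly into code, here with N = absolute value and g = id; the square-root example and the tolerance are illustrative, not from the course.

```python
import math

def run_test(SC, DE, f, g, RA, tol, N=abs):
    """A test datum DT(SC, DE, f, g, RA, tol): pass iff N(g(f(DE)) - RA) < tol."""
    verdict = N(g(f(DE)) - RA) < tol
    print(f"{SC}: {'PASS' if verdict else 'FAIL'}")
    return verdict

identity = lambda x: x   # the common choice g = id

# Scenario: nominal square root; expected result known to 1e-6.
run_test("nominal square root", DE=2.0, f=math.sqrt,
         g=identity, RA=1.4142135, tol=1e-6)
```

Note how remark 2 is enforced structurally: `run_test` cannot be called without an expected result RA.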
II. Partition analysis of the input data domains and boundary testing
1. Test equivalence classes (input data domains)
Definition: test equivalence class
An equivalence class corresponds to a set of test data assumed to test the same behavior of the function under test; that is, if the function works for one of them, it works for all the others.
Equivalence classes allow us to limit the number of test cases: we avoid testing two data from the same equivalence class.
Domain partitioning: example
Consider the following program: it reads three real numbers corresponding to the lengths of the sides of a triangle. If these three numbers do not form a triangle, it prints an appropriate message. In the case of a triangle, the program determines whether it is isosceles, equilateral or scalene, and whether its largest angle is acute, right or obtuse (< 90°, = 90°, > 90°), and returns the corresponding answer.
Equivalence classes and test data:

         | Scalene  | Isosceles  | Equilateral
  Acute  | 6, 5, 3  | 6, 1, 6    | 4, 4, 4
  Obtuse | 5, 6, 10 | 7, 4, 4    | impossible
  Right  | 3, 4, 5  | √2, 2, √2  | impossible

Plus the non-triangle class, e.g. -1, 2, 8.
Partition analysis: method
Three phases:
- For each input datum, compute equivalence classes over the value domains.
- Choose a representative of each equivalence class.
- Compose by Cartesian product over the set of input data to establish the test data.
Letting Ci be the equivalence classes:
∪ Ci = E ∧ ∀i ≠ j, Ci ∩ Cj = ∅
Rules for partitioning domains:
- If the value belongs to an interval, build:
  - one class for the values below the interval,
  - one class for the values above it,
  - N valid classes.
- If the datum is a set of values, build:
  - one class with the empty set,
  - one class with too many values,
  - N valid classes.
- If the datum is an obligation or a constraint (form, direction, syntax), build:
  - one class with the constraint respected,
  - one class with the constraint violated.
2. Test aux limites
Principe : on s’intéresse aux bornes des intervalles partitionnant les domaines des variables
d’entrées :
-
-
-
Pour chaque intervalle, on garde les 2 valeurs correspondant aux 2 limites, et les 4 valeurs
correspondant aux valeurs des limites ±le plus petit delta possible
n ∈ 3 .. 15 ⇒ v1 = 3, v2 = 15, v3 = 2, v4 = 4, v5 = 14, v6 = 16
Si la variable appartient à un ensemble ordonné de valeurs, on choisit le premier, le second,
l’avant dernier et le dernier
n ∈ {-7, 2, 3, 157, 200} ⇒ v1 = -7, v2 = 2, v3 = 157, v4 = 200
Si une condition d’entrée spécifie un nombre de valeurs, définir les cas de test à partir du
nombre minimum et maximum de valeurs, et des tests pour des nombres de valeurs hors
limites invalides.
Un fichier d’entrée contient 1 - 255 records, produire un cas de test pour 0, 1, 255 et 256.
Boundary value analysis: example (see figure)
Data types:
- Input data are not only numerical values: characters, booleans, images, sound, ...
- These categories can generally lend themselves to partition analysis and to the examination of boundary conditions:
  - true / false
  - full file / empty file
  - full frame / empty frame
  - color shades
  - largest / smallest
  - ...
Partition analysis and boundary testing: summary
- Partition analysis is a method that aims to reduce the number of test cases by computing equivalence classes.
  - The choice of equivalence classes matters: a poor choice risks failing to reveal a defect.
- Choosing input conditions at the boundaries is a solid heuristic for choosing input data within the equivalence classes.
  - This heuristic is only usable when there is an order relation on the input datum considered.
- Boundary testing produces both nominal test cases (inside the interval) and robustness test cases (outside the interval).
Combination of input values
- The problem of combining the values of the input data: test data for variable Xi: DTxi = {di1, ..., din}.
- The Cartesian product DTx1 × DTx2 × ... × DTxn leads to an explosion of the number of test cases.
- Hence the choice of equivalence classes over the whole set of input data.
References:
Bernard Homès. Fundamentals of Software Testing.
Kshirasagar Naik. Software Testing and Quality Assurance: Theory and Practice.
Static techniques
Table of contents:
Introduction
1) Definition
2) Advantages of static techniques
3) Disadvantages of static techniques
4) Activities of static techniques
   4.1) Reviews
   4.2) Types of reviews
   4.3) Management reviews
   4.4) Technical reviews
   4.5) Inspections
   4.6) Walk-throughs
   4.7) Audits
5) Static analysis
   5.1) Types of static analysis
Conclusion
References
Introduction
Verification and validation designate any activity or process aiming to ensure that a piece of software conforms to its specifications and meets the customer's needs. To reach the objectives of the verification and validation process, system analysis and verification techniques should be used. These are notably the static and dynamic techniques. Static analysis techniques are system verification techniques that do not involve executing a program, whereas dynamic analysis techniques require executing the application in order to experiment with and observe the system's behavior. In this presentation, we focus on static analysis techniques.
1) Definition
Static techniques consist in examining a system without executing it. They aim to inspect/analyze a system statically (without execution) in order to discover problems or prove its correctness. They can be manual or automatic. Static techniques can be applied to many deliverables, such as software components, high-level or detailed specifications, requirements lists, contracts, development plans or test plans.
2) Advantages of static techniques
The main advantages of static techniques are:
- They do not require an executable of a working application; they can be applied to documents or to a part of the software.
- They are less costly than dynamic techniques and their return on investment is high, since the earlier errors are identified, the easier they are to correct.
3) Disadvantages of static techniques
The main disadvantages of static techniques are:
- They take a fair amount of time, especially when they must be done manually.
- The automated tools that perform them do not work with every programming language.
- Automated tools can report both correct and incorrect results (false positives).
4) Activities of static techniques
Static techniques cover two major activities, reviews and static analysis, illustrated in the figure below:
4.1) Reviews
The objectives of a review can be of four kinds:
- verification of conformity with the higher-level documents that were used to create the product,
- verification against project documents at the same level,
- verification against norms, standards and recommended best practices, to ensure the conformity of the product submitted to review and static analysis,
- verification of usage and fitness for use, for example in order to design more detailed components.
4.2) Types of reviews
A review is a process or meeting during which the documents of a software product are examined systematically by one or more people, with the main goal of detecting and removing errors early in the software development cycle. Reviews are used to verify documents such as requirements specifications, system designs, code, test plans and test scenarios. From this definition, it follows that reviewing software is important for software testing: the time and cost of testing are reduced, since enough time is spent in the initial phase. Developer productivity also improves and lead times shrink, because detecting errors very early in the software life cycle helps guarantee that the documents are clear and unambiguous.
Reviews can be grouped according to their level of formality, from the most formal to the least formal: formal reviews and informal reviews. An informal review is a type of review that does not follow any formal process for finding errors in the document: you simply go through the document and give informal comments on it.
The IEEE 1028-2008 standard identifies several types of reviews, among which:
1. Management reviews
2. Technical reviews
3. Inspections
4. Walk-throughs
5. Audits
4.3) Management reviews
The objective of this review is to track progress, establish the status of plans and schedules, or evaluate the effectiveness of the management methods used and their fit with the objectives. It identifies conformity with, and deviations from, management plans or procedures. Technical knowledge may be necessary to conduct these reviews successfully.
4.4) Technical reviews
The objective of these reviews is to evaluate a piece of software with a team of qualified people, to determine whether it is suited to its intended use, and to identify deviations from the applicable norms and specifications. These reviews provide management with evidence concerning the technical status of a project; they can also provide recommendations and evaluate alternatives.
4.5) Inspections
Inspection is considered the most formal review method, often managed by a well-trained moderator. In this method, each document is strictly checked against a checklist, which helps detect, identify and document defects.
4.6) Walk-throughs
This type of review is often led by the author so that the team members understand the project, particularly in terms of changes to the requirements, and to help gather more details. It can also be organized to educate the audience about the product.
4.7) Audits
An audit is performed by personnel outside the project. It evaluates the software against specifications, norms, guidelines and other criteria.
5) Static analysis
Here, the source code written by the developers is analyzed (generally by tools) to look for structural flaws that may lead to defects. It is an important aid for inspectors, but does not completely replace code inspection. This technique is used in compilers. Classes of defects that static analysis checks for include:
1. Data faults, for example:
   - variables used before initialization,
   - undeclared variables,
   - variables declared but never used.
2. Control faults:
   - dead code,
   - infinite loops.
3. Input/output faults:
   - a variable output twice with no intervening assignment.
4. Interface faults:
   - wrong number of parameters,
   - unused functions.
5. Memory management faults:
   - memory not freed,
   - complex pointer arithmetic.
5.1) Types of static analysis
There are different types of static analysis, each with its strengths and weaknesses:
1. Data flow analysis: evaluates the variables and checks whether they are correctly defined and initialized before being used (referenced).
2. Information flow analysis: identifies the dependencies of the output variables. It does not detect anomalies as such, but highlights information for code inspection or review.
3. Path analysis: identifies the paths through the program and lists the statements executed on each path. Again, potentially useful in the review process.
4. Interface analysis: checks the consistency of routine and procedure declarations and their use.
5. Control flow analysis: checks for loops with multiple entry or exit points, finds unreachable code, etc.
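As an illustrative sketch of the first category, Python's `ast` module can find variables that are assigned but never used, without ever executing the program; the helper name and sample program are ours, and a real data-flow analyzer would also handle scopes, attributes and control flow.

```python
import ast

def unused_variables(source: str):
    """Statically find names that are assigned but never read."""
    tree = ast.parse(source)          # parse only — the code is never run
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)  # appears on the left of an assignment
            else:
                used.add(node.id)      # appears in a read (Load) context
    return assigned - used

program = "x = 1\ny = 2\nprint(x)\n"   # y is declared but never used
print(unused_variables(program))
```

The same walk-the-syntax-tree pattern underlies the other analyses listed above; only the property being checked changes.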
Conclusion
In short, our purpose was to discuss static techniques. We have seen that these techniques consist in examining the system without executing it. Their advantages are undeniable: they reduce testing time, since a number of faults are detected and corrected very early in the software development cycle. However, these techniques cannot, for example, test robustness or security aspects; hence the use of dynamic techniques.
References
[1] Bernard Homès. Fundamentals of Software Testing. John Wiley & Sons, 2013.
[2] Ian Sommerville et al. Software Engineering. Addison-Wesley, 2007.
DYNAMIC PROGRAM ANALYSIS
Table of contents:
Introduction
I. Program analysis
   1. Definition
   2. Purposes of program analysis
   3. Applications of program analysis
   4. Program analysis flavors
      a. Static analysis
      b. Dynamic analysis
      c. Limitations
II. Dynamic analysis
   1. Definition
   2. Dynamic analysis goals
   3. Advantages
   4. Limitations
III. Dynamic analysis techniques
   1. Program instrumentation
      a. Static instrumentation
      b. Dynamic instrumentation
   2. Program tracing
      a. What is tracing
      b. Why tracing
      c. How to trace
   3. Program profiling
      a. What is profiling
      b. Why profiling
      c. How to do profiling
IV. Dynamic analysis tools
References
Introduction
Program analysis is a collection of techniques for computing approximate information about a program. Program analysis finds several applications: in compilers, in tools that help programmers understand and modify programs, and in tools that help programmers verify that programs satisfy certain properties of interest. It is subdivided into two major parts. Static analysis has long been used for analysing the dynamic behavior of programs because it is simple and does not require running the program. Dynamic analysis, on the other hand, is the analysis of the properties of a running program: it investigates the properties of a program using information gathered at run time. The deployment of software nowadays as a collection of dynamically linked libraries is rendering static analysis imprecise. In the remainder of our work, we focus on dynamic analysis, organised as follows: first an overview of program analysis (definition, purposes, applications and types), followed by dynamic analysis (definition, purpose, advantages and limitations), then some techniques used in dynamic analysis. Finally, in the last part, we present some tools for conducting dynamic analysis.
I. Program analysis
A program is a list of simple instructions to be executed in order to perform a more complex task.
Example: a washing machine
1. Check that the cover is closed
2. Start the drum motor
3. Open the water valve
4. Close the valve as soon as there is enough water
5. Switch on the heating resistor
6. 45 minutes after starting, drain the water and it's all over.
While writing a program, a human can easily forget details that seem obvious to him (e.g. open
the water valve before switching on the heater), or poorly anticipate exceptional situations
(e.g. the water supply is cut off). So we have to study the behavior of the program to check that it does what
it is supposed to do, does not do what it should not do, and solves the expected problem in an
efficient way (optimization).
1. Definition
Program analysis is the process of automatically analyzing the behavior of computer
programs regarding a property such as correctness, robustness, safety and liveness.
Program analysis focuses on two major areas:
• Program optimization: the process of modifying a software system to make some
aspect of it work more efficiently or use fewer resources. In general, a computer program
may be optimized so that it executes more rapidly, so that it is capable of operating with
less memory storage or other resources, or so that it draws less power.
• Program correctness: ensuring that the program does what it is supposed to
do. In theoretical computer science, an algorithm is said to be correct when it behaves
as its specification requires. Functional correctness refers
to the input-output behaviour of the algorithm (i.e., for each input it produces the
expected output).
2. Purposes of Program Analysis
Since programs are written by humans, several errors can be made during this step:
details that seem obvious are easily forgotten, exceptional situations are poorly anticipated,
the wrong data structure is used, a less efficient alternative is chosen for a subtask, or the
programmer is distracted while programming. We therefore have to develop techniques to reduce these errors.
Program analysis aims to ensure that:
• The program does what it is supposed to do.
• The program does not do what it should not do.
E.g.: the heating resistor is never switched on when the tank is empty.
• The program does what it is supposed to do efficiently.
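The example above (the heating resistor is never switched on when the tank is empty) is a safety property that can be checked at run time with simple assertions. A minimal Python sketch, with an entirely hypothetical WashingMachine model:

```python
class WashingMachine:
    """Toy model of the washing-machine controller (hypothetical names)."""

    def __init__(self):
        self.water_level = 0      # litres currently in the tank
        self.heater_on = False

    def _check_safety(self):
        # The safety property from the text: the heating resistor
        # must never be on while the tank is empty.
        assert not (self.heater_on and self.water_level == 0), \
            "safety violation: heater on with empty tank"

    def fill(self, litres):
        self.water_level += litres
        self._check_safety()

    def switch_heater(self, on):
        self.heater_on = on
        self._check_safety()   # a violation is detected at the faulty step

machine = WashingMachine()
machine.fill(20)
machine.switch_heater(True)    # fine: the tank holds water
machine.switch_heater(False)
```

Calling switch_heater(True) on an empty machine raises an AssertionError at exactly the faulty step; detecting such violations automatically is what program analysis aims for.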
3. Applications of Program Analysis
Program analysis has several applications, which we subdivide into:
• Optimization : In general, a computer program may be optimized so that it executes
more rapidly, or to make it capable of operating with less memory storage or other
resources.
• Avoid redundant / unnecessary computation
• Compute in a more efficient way
• Verifying correctness : correctness of an algorithm is asserted when it is said that the
algorithm is correct with respect to a specification.
• Software Quality Assurance activities : To ensure that the software under
development or modification will meet desired quality requirements
• Finding bugs
• Determining properties :
• Performance
• Security and reliability
• Design and architecture
4. Program Analysis Flavors
There are two main flavors of program analysis:
• Dynamic analysis
• Static analysis
a. Static analysis
In the context of program correctness, static analysis can discover vulnerabilities during
the development phase of the program. These vulnerabilities are easier to correct than the ones
found during the testing phase since static analysis leads to the root of the vulnerability.
Incorrect optimizations are highly undesirable. So, in the context of program optimization, there
are two main strategies to handle computationally undecidable analysis.
b. Advantages
• It can find weaknesses in the code at the exact location.
• Source code can be easily understood by other or future developers.
• Weaknesses are found earlier in the development life cycle, reducing the cost to fix.
• It allows a quicker turnaround for fixes.
c. Limitations
• It is time consuming if conducted manually.
• Automated tools produce false positives and false negatives.
• It does not find vulnerabilities introduced in the runtime environment.
• There are not enough trained personnel to thoroughly conduct static code analysis.
Program analysis often seems restricted to static analysis, i.e. the analysis of program source code
only. Although static analysis has advantages, it is not always the most appropriate type of
analysis. As shown by the popularity of prototyping, people often have a
better understanding of what a program should do than of how it should be written. The following pages
will only discuss dynamic analysis.
II. Dynamic analysis
Static approaches typically concern (semi-)automatic analyses of source code. An
important advantage of static analysis is its completeness: a system’s source code essentially
represents a full description of the system. One of the major drawbacks is that static analyses
often do not capture the system’s behavioral aspects: in object-oriented code, for example,
occurrences of late binding and polymorphism are difficult to grasp if runtime information is
missing.
1. Definition
Dynamic program analysis is the analysis of computer software that is performed by
executing programs built from that software on a real or virtual processor. Dynamic program
analysis tools may require the loading of special libraries or even recompilation of program code.
Dynamic analysis is in contrast to static program analysis.
2. Dynamic Analysis goals
Dynamic analysis is conducted for many reasons:
• Collect runtime execution information.
• Resource usage, execution profiles.
• Program comprehension.
• Find bugs in applications, identify hotspots
• Program transformation
• Optimize or obfuscate programs.
• Insert debugging or monitoring code.
• Modify program behaviors on the fly.
3. Advantages
The advantages that we consider are:
• It is able to detect dependencies that are not possible to detect in static analysis.
• The precision with regard to the actual behavior of the software system, for example, in
the context of object-oriented software with its late binding mechanism.
• The fact that a goal-oriented strategy can be used, which entails the definition of an
execution scenario such that only the parts of interest of the software system are analyzed
4. Limitations
The drawbacks that we distinguish are:
• The inherent incompleteness of dynamic analysis, as the behavior or execution traces
under analysis capture only a small fraction of the usually infinite execution domain of
the program under study.
• The difficulty of determining which scenarios to execute in order to trigger the program
elements of interest. In practice, test suites can be used, or recorded executions involving
user interaction with the system.
• The scalability of dynamic analysis due to the large amounts of data that may be produced
by dynamic analysis, affecting performance, storage, and the cognitive load humans can
deal with.
• The observer effect, i.e., the phenomenon in which software acts differently when under
observation, might pose a problem in multithreaded or multi-process software because of
timing issues.
III. Dynamic Analysis Techniques
Dynamic analysis techniques reason over the run-time behavior of systems. In general,
dynamic analysis involves recording a program's dynamic state. This dynamic state is also
called a profile or trace. A program profile measures occurrences of events during program
execution. The measured event is the execution of a local portion of the program, such as lines of code,
basic blocks, control edges, routines, etc.
1. Program instrumentation
In its simplest form, instrumentation involves adding extra code to a program’s text. The
intent usually is to monitor some kind of program behavior—either for debugging or for
optimization purposes. For example, a programmer who wants to optimize the execution time
of a large program might first like to know the regions of code that are executed most of the
time, and then go on to optimize those frequently executed sections.
Instrumentation techniques are usually classified based on how and when the instrumentation
code is added inside a program in this compilation-execution process. Tools that add
instrumentation code before execution are called static instrumenters, whereas those modifying
a program during execution are known as dynamic instrumenters.
a. Static instrumentation
As mentioned earlier, static instrumentation techniques generally insert
instrumentation code inside a program statically, i.e. during or after compilation but definitely
prior to execution. Compiling the source code of a program to the linked binary executable
generally involves three steps (as shown in Figure 0.1):
1. Compilation. In this step, the source code written in a high-level programming language
(such as C, C++, Java, or Pascal) is translated to the assembly language representation for
a target machine.
2. Assembling. The assembly language representation of the program is converted to binary
object code in this step. For languages like C# or Java, this means translation into some
form of bytecode.
3. Linking. In this step, a set of separately compiled and assembled binary object files are
linked together to produce the target executable. For languages like C# and Java, linking
can mean merging of several bytecode files into one single bytecode that can be executed
on a VM.
Instrumentation code can be inserted during each of the above three steps. It is also possible to
instrument the source code before it is handed over to the compiler (source-to-source
instrumentation), or the linked executable produced by the compiler tool chain (binary
instrumentation/rewriting).
Figure 0.1: Instrumentation techniques in the compiler tool chain.
b. Dynamic instrumentation
Historically, static instrumentation techniques appeared first; so far, they have been mainly used
in program analysis, debugging, testing, and code-coverage tools. However, they suffer from
some major limitations, which are sketched below:
1. Static instrumentation techniques modify the software executable.
2. Static instrumentation can only cover code that is statically linked.
3. Static binary instrumentation is often difficult for binary formats that allow mixing of
code and data.
Figure 0.2: Dynamic instrumentation
These limitations have forced the development of dynamic instrumenters that perform
instrumentation at runtime (i.e., when the program is executing). Unlike static instrumenters,
dynamic instrumenters can only work on compiled executable binaries (or, bytecodes).
Therefore, dynamic instrumentation is also called Dynamic Binary Instrumentation (DBI).
Dynamic binary instrumentation (DBI) occurs at run-time. The analysis code can be
injected by a program grafted onto the client process, or by an external process. If the client
uses dynamically-linked code the analysis code must be added after the dynamic linker has
done its job.
Dynamic binary instrumentation has two distinct advantages. First, it usually does not
require the client program to be prepared in any way, which makes it very convenient for users.
Second, it naturally covers all client code; instrumenting all code statically can be difficult if
code and data are mixed or different modules are used, and is impossible if the client uses
dynamically generated code.
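The principle can be illustrated at a much higher level than machine code: in Python, analysis code can be grafted onto a running program by replacing a function object after execution has started. This is only a sketch of the idea of adding analysis code at run time (the names are invented here), not DBI itself, which operates on compiled binaries:

```python
import functools

call_counts = {}  # analysis data gathered while the program runs

def instrument(func):
    """Wrap an existing function with analysis code at run time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The injected analysis code: count each invocation.
        call_counts[func.__name__] = call_counts.get(func.__name__, 0) + 1
        return func(*args, **kwargs)
    return wrapper

def compute(x):          # "client" code, unmodified at build time
    return x * x

# Instrumentation happens while the program is running: the original
# definition is replaced with the wrapped version on the fly.
compute = instrument(compute)

results = [compute(i) for i in range(5)]
```

As with real DBI, the client code needed no preparation; the analysis was attached after the program had started.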
2. Program tracing
a. What is Tracing
Tracing is a process that faithfully records detailed information of program execution (lossless).
• Control flow tracing: the sequence of executed statements.
• Dependence tracing: the sequence of exercised dependences.
• Value tracing: the sequence of values that are produced by each instruction.
• Memory access tracing: the sequence of memory references during an execution.
b. Why Tracing
Debugging, Code optimizations, Security
c. How to Trace
• Tracing by printf
• Tracing by source-level instrumentation:
• Read a source file and parse it into ASTs.
• Annotate the parse trees with instrumentation.
• Translate the annotated trees to a new source file.
• Compile the new source.
• Execute the program; a trace is produced.
Figure 0.3: An example
• Tracing by binary instrumentation:
• Given a binary executable, parse it into an intermediate representation. More
advanced representations such as control flow graphs may also be generated.
• Tracing instrumentation is added to the intermediate representation.
• A lightweight compiler compiles the instrumented representation into a new
executable.
Static: takes an executable and generates an instrumented executable that can be
executed with many different inputs.
Dynamic: given the original binary and an input, starts executing the binary with the
input; during execution, an instrumented binary is generated on the fly; essentially the
instrumented binary is executed.
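The source-level instrumentation steps listed above (parse into ASTs, annotate, compile, execute) can be sketched with Python's standard ast module; the absval example and the trace format are invented for illustration:

```python
import ast

TRACE = []  # the produced control-flow trace (executed line numbers)

def _trace(lineno):
    TRACE.append(lineno)

class Tracer(ast.NodeTransformer):
    """Annotate the parse tree: insert _trace(lineno) before each statement."""
    def generic_visit(self, node):
        super().generic_visit(node)          # instrument children first
        for field in ("body", "orelse", "finalbody"):
            old = getattr(node, field, None)
            if isinstance(old, list) and old and isinstance(old[0], ast.stmt):
                new = []
                for stmt in old:
                    call = ast.Expr(ast.Call(
                        func=ast.Name("_trace", ast.Load()),
                        args=[ast.Constant(stmt.lineno)], keywords=[]))
                    new.append(call)         # trace call, then the statement
                    new.append(stmt)
                setattr(node, field, new)
        return node

source = """
def absval(x):
    if x < 0:
        x = -x
    return x
"""

tree = Tracer().visit(ast.parse(source))     # read and annotate the ASTs
ast.fix_missing_locations(tree)              # give injected nodes positions
env = {"_trace": _trace}
exec(compile(tree, "<instrumented>", "exec"), env)   # compile the new source
env["absval"](-3)     # executing the instrumented code produces a trace
```

After the run, TRACE holds the sequence of executed line numbers, i.e. a control-flow trace of the program.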
3. Program profiling
a. What is profiling
• Tracing is lossless, recording every detail of a program execution. Thus, it is
expensive, and potentially infinite.
• Profiling is lossy, meaning that it aggregates execution information onto finite entries.
• Control flow profiling - Instruction/Edge/Function: frequency
• Value profiling - Value: frequency
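Python's built-in cProfile module is a concrete example of such lossy aggregation: instead of a full trace it records, per function, a call count and accumulated time. A small sketch (the function names are invented):

```python
import cProfile
import pstats
import io

def slow_square(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    return [slow_square(10_000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Aggregated (lossy) view: each function maps to a call count and
# total time, not to a full record of every executed instruction.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

However long the run, the profile stays bounded: one entry per function, which is exactly the finite-entries property described above.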
b. Why profiling
• Debugging
• Enable time travel to understand what has happened
• Code optimizations
• Identify hot program paths;
• Data compression;
• Value speculation;
• Data locality that help cache design;
• Performance tuning
c. How to do profiling
Path profiling counts how often each path through a function is taken at runtime. Path profiles
can be used by optimizing compilers: functions can be compiled such that the "hot path", the
most frequently taken path through the function, executes particularly fast (Ammons & Larus,
1998).
• Goal: Count how often a path through a function is executed
• Interesting for various applications
• Profile-directed compiler optimizations
• Performance tuning: Which paths are worth optimizing?
• Test coverage: Which paths are not yet tested?
There are some challenges that path profiling faces:
• Runtime overhead: Limit slowdown of program
• Accuracy: Ideally, precise profiles (not heuristics, no approximations)
• Infinitely many paths: Cycles in control flow graph
Edge profiling is a naive approach to path profiling:
• Instrument each branching point
• Count how often each CFG edge is executed
• Estimate most frequent path: Always follow most frequent edge
Fails to uniquely identify most frequent path.
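A crude approximation of edge profiling can be sketched in Python with the sys.settrace hook, counting how often control flows from one source line to the next (line-to-line transitions stand in for CFG edges here; the classify example is invented):

```python
import sys
from collections import Counter

edge_counts = Counter()   # (previous line, current line) -> frequency
_prev = {}                # last executed line, per code object

def edge_profiler(frame, event, arg):
    if event == "call":
        return edge_profiler              # trace lines inside this frame
    if event == "line":
        code = frame.f_code
        prev = _prev.get(code)
        if prev is not None:
            # Record the control-flow edge taken to reach this line.
            edge_counts[(prev, frame.f_lineno)] += 1
        _prev[code] = frame.f_lineno
    return edge_profiler

def classify(xs):
    out = []
    for x in xs:
        if x >= 0:
            out.append("pos")
        else:
            out.append("neg")
    return out

sys.settrace(edge_profiler)
classify([1, -2, 3, 4, -5])
sys.settrace(None)

# The hottest edge is the loop-header -> branch-test transition.
hot_edge, hot_count = edge_counts.most_common(1)[0]
```

As the text notes, following the most frequent edge at each branching point does not uniquely identify the most frequent path: the edge counts here lose the correlation between successive branch outcomes.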
Theoretically, path profiles contain strictly more information than edge profiles. A
precise edge profile can be derived from a path profile, but the
converse does not hold. Stated another way, there are many different path profiles that induce
the same edge profile, but different edge profiles imply different path profiles. Thus, path
profiles provide the most information about control flow behavior, subsumed only by the
profiling of longer and longer paths. The ultimate form of a path profile is a trace of an entire
execution, which is extremely costly to collect and store.
Today, edge profiles are the control flow profile of choice for compiler optimization.
We identify three main reasons for this. First, edge profiles provide a clear advantage over
vertex profiles, as they provide more accurate information about the behavior of branches, which
is crucial to many optimizations. Second, edge profiles are generally thought to be easier and
cheaper to collect than path profiles.
IV. Dynamic Analysis Tools
Dynamic analysis tools have been widely used for memory analysis, invariant detection,
deadlock and race detection, and metric computation. These tools are being used by companies
for their benefits. For example, Pin is a tool which provides the underlying infrastructure for
commercial products like Intel Parallel Studio suite of performance analysis tools. A summary
of dynamic analysis tools is provided in Figure 0.4
Figure 0.4: Dynamic Analysis Tools.
1. Valgrind is an instrumentation framework for building dynamic analysis tools. It can
automatically detect many memory management and threading bugs, and profile a
program in detail. Purify and Insure++ have similar functionality as Valgrind. Whereas
Valgrind and Purify instrument at the executables, Insure++ directly instruments the
source code. Pin is a tool for dynamic binary instrumentation of programs. Pin adds code
dynamically while the executable is running. Pin provides an API to write customized
instrumentation code (in C/C++), called Pintools. Pin can be used to observe low level
events like memory references, instruction execution, and control flow as well as higher
level abstractions such as procedure invocations, shared library loading, thread creation,
and system call execution.
2. Javana runs a dynamic binary instrumentation tool underneath the virtual machine. The
virtual machine communicates with the instrumentation layer through an event handling
mechanism for building a vertical map that links low-level native instruction pointers and
memory addresses to high-level language concepts such as objects, methods, threads,
lines of code, etc. The dynamic binary instrumentation tool then intercepts all memory
accesses and instructions executed and provides the Javana end user with high-level
language information for all memory accesses and natively executed instructions.
3. Daikon and DIDUCE are two of the most popular tools for invariant detection. The former is
an offline tool while the latter is an online tool. The major difference between the two is
that while Daikon generates all the invariants and then prunes them depending on a
property; DIDUCE dynamically hypothesizes invariants at each program point and only
presents those invariants which have been found to satisfy a property. Another major
difference is that Daikon collects tracing information by modifying the program abstract
syntax tree, while DIDUCE uses BCEL to instrument the class JAR files.
References
• https://en.wikipedia.org/wiki/Program_analysis
• https://www.cis.upenn.edu/~alur/CIS673/isil-plmw.pdf
• A Survey of Dynamic Program Analysis Techniques and Tools, Anjana Gosain and Ganga Sharma, 2015, Springer.
Software Verification and Validation
Selected state-of-the-art results and real-world
applications
Contents
1. Introduction
2. Method
2-1. GP Method
2-2. ML Method
2-3. Datasets
2-4. Experiment design
3. Results
4. Summary and Conclusions
5. References
1. INTRODUCTION
In this paper we provide a broad benchmarking of recent genetic programming approaches to symbolic
regression in the context of state-of-the-art machine learning approaches. Since the beginning of the field,
the genetic programming (GP) community has considered the task of symbolic regression (SR) as a basis
for methodology research and as a primary application area. GP-based SR (GPSR) has produced a number
of notable results in real-world regression applications, for example dynamical system modeling in physics ,
biology , industrial wind turbines , fluid dynamics , robotics , climate change forecasting , and financial
trading , among others. However, the most prevalent use of GPSR is in the experimental analysis of new
methods, for which SR provides a convenient platform for benchmarking. Despite this persistent use,
several shortcomings of SR benchmarking are notable. First, the GP community lacks a unified standard
for SR benchmark datasets, as noted previously [21]. Several SR benchmarks have been proposed, critiqued,
and blacklisted, leading to inconsistencies in the experimental design of papers. We contend that the lack of
focus in the GP community on rigorous benchmarking makes it hard to know how GPSR methods fit into
the broader machine learning (ML) community.
2. METHODS
We compare four recent GPSR methods and ten well-established ML regression methods in this
benchmark. In this section we briefly present the selected methods and describe the design of the experiment.
2.1 GP methods
A number of factors impacted our choice of these methods. Two key elements were open-source
implementations and ease of use. In addition, we wished to test different research thrusts in GP literature.
The four methods encompass different innovations to standard GPSR, including the incorporation of constant
optimization, semantic search drivers, and Pareto optimization. Each method is described briefly below.
Multiple regression genetic programming (MRGP). MRGP combines Lasso regression with the tree search
afforded by GP. A weight is attached to each node in each program.
ε-Lexicase selection (EPLEX). ε-lexicase selection adapts the lexicase selection method for
regression. Rather than aggregating performance on the training set into a single fitness score, EPLEX
selects parents by filtering the population through randomized orderings of training samples and removing
individuals that are not within ε of the best performance in the pool. We use the EPLEX method
implemented in Ellyn. Ellyn is a stack-based GP system written in C++ with a Python interface for use
with scikit-learn. It uses point mutation and subtree crossover. Weights in the programs are trained each
generation via stochastic hill climbing. A Pareto archive of trade-offs between mean squared error and
complexity is kept during each run, and a small internal validation fold is used to select the final model
returned by the search process.
Age-fitness Pareto Optimization (AFP). AFP is a selection scheme based on the concept of age-layered
populations introduced by Hornby. AFP introduces a new individual each generation with an age of 0.
An individual's age is updated each generation to reflect the number of generations since its oldest node
(gene) entered the population. Parent selection is random and Pareto tournaments are used for survival on
the basis of age and fitness. We use the version of AFP implemented in Ellyn, with the same settings
described above.
Geometric Semantic Genetic Programming (GSGP). GSGP is a recent method that has shown
many promising results for SR and other tasks. The main concept behind GSGP is the use of semantic
variation operators that produce offspring whose semantics lie on the vector between the semantics of the
parent and the target semantics (i.e. target labels). Use of these variation operators has the advantage of
creating a unimodal fitness landscape. On the downside, the variation operators result in exponential growth
of programs. We use the version of GSGP implemented in C++ by Castelli, which is optimized to minimize
memory usage. It is available from SourceForge.
2.2 ML methods
We use scikit-learn [26] implementations of the following methods in this study.
Linear Regression. Linear Regression is a simple regression model that minimizes the sum of the squared
errors of a linear model of the inputs. The model is defined by y = b + w^T x, where y is the dependent
variable (target), x are the explanatory variables, b and w are the intercept and slope parameters, and
the minimized function is given by (1).
Kernel Ridge. Kernel Ridge performs Ridge regression using a linear function in the space of the respective
kernel. Least squares with l2-norm regularization is applied in order to prevent overfitting. The minimized
function is given by (2), where φ is a kernel function and λ is the regularization parameter.
Least-angle regression with Lasso. Lasso (Least absolute shrinkage and selection operator) is a popular
method of regression that applies both feature selection and regularization. Similarly to Kernel Ridge, high
values of w are penalized. The use of the l1-norm on w in the minimization function (see (3)) improves the
ability to push individual weights to zero, effectively performing feature selection.
Least-angle regression with Lasso, a.k.a. Lars [10], is an efficient algorithm for producing a family of Lasso
solutions. It is able to compute the exact values of λ for new variables entering the model.
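The minimized functions (1), (2) and (3) referenced above are not reproduced in this excerpt; in their standard textbook forms they would presumably read:

```latex
% (1) Linear regression: ordinary least squares over intercept b and weights w
\min_{b,\,w} \; \sum_{i=1}^{n} \left( y_i - b - w^{T} x_i \right)^2

% (2) Kernel ridge regression: squared error in the kernel feature space
%     \varphi, with l2-norm regularization weighted by \lambda
\min_{w} \; \sum_{i=1}^{n} \left( y_i - w^{T} \varphi(x_i) \right)^2
  + \lambda \, \lVert w \rVert_2^2

% (3) Lasso: squared error with l1-norm regularization, whose penalty
%     pushes individual weights to exactly zero (feature selection)
\min_{b,\,w} \; \sum_{i=1}^{n} \left( y_i - b - w^{T} x_i \right)^2
  + \lambda \, \lVert w \rVert_1
```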
Linear SVR. Linear Support Vector Regression extends the concept of Support Vector Classifiers
(SVC) to the task of regression, i.e. to predict real values instead of classes. Its objective is to minimize an
ε-insensitive loss function with a regularization penalty (1/2||w||2) in order to improve generalization.
SGD Regression. SGD Regression implements stochastic gradient descent and is especially well
suited for larger problems with over 10,000 instances. We nevertheless include this regression method to
compare its performance on smaller datasets.
MLP Regressor. Neural networks have been applied to regression problems for almost three
decades. We include multilayer perceptrons (MLPs) as one of the benchmarked algorithms. We decided to
benchmark a neural network with a single hidden layer with a fixed number of neurons (100) and compare
different activation functions, learning functions and solvers.
AdaBoost regression. Adaptive Boosting, also called AdaBoost, is a flexible technique for
combining a set of weak learners into a single stronger regressor. By changing the distribution (i.e. weights)
of instances in the data, previously mispredicted instances are favored in consecutive iterations. The final
prediction is obtained by a weighted sum or weighted majority vote. As a result, the final regressor has
smaller prediction errors. The method is considered sensitive to outliers.
Random Forest regression. Random Forests are a very popular ensemble method based on combining
multiple decision trees into a single stronger predictor. Each tree is trained independently with a randomly
selected subset of the instances, in a process known as bootstrap aggregating or bagging. The resulting
prediction is an average of multiple predictions. Random Forests try to reduce variance by not allowing
decision trees to grow large, making them harder to overfit.
Gradient Boosting regression. Gradient Boosting is an ensemble method that is based on regression
trees. It shares the AdaBoost concept of iteratively improving the system performance on its weakest points.
In contrast to AdaBoost, the distribution of the samples remains the same. Instead, consecutively created
trees correct the errors of the previous ones. Gradient Boosting minimizes bias (not variance like in Random
Forests). In comparison to Random Forests, Gradient Boosting is sequential (thus slower), more difficult
to train, but is reported to perform better than Random Forest.
Extreme Gradient Boosting. Extreme Gradient Boosting, also known as XGBoost, incorporates
regularization into the Gradient Boosting algorithm in order to control overfitting. Its objective function
combines the optimization of training loss with model complexity. This brings the predictor closer to the
underlying distribution of the data, while encouraging simple models, which have smaller variance. Extreme
Gradient Boosting is considered a state-of-the-art method in ML.
2.3 Datasets
We pulled the benchmark datasets from the Penn Machine Learning Benchmark (PMLB) [25]
repository, which contains a large collection of standardized datasets for classification and regression
problems.
2.4 Experiment design
In order to benchmark different regression methods, an effort was made to measure performance
of each of the methods in as similar an environment as possible. First, we decided to treat each of the GP
methods as a classical ML approach and used the scikit-learn library [26] for cross validation and
hyperparameter optimization. This required some source code modifications to allow GSGP and MRGP to
communicate with the wrapper. Second, instead of reimplementing the algorithms, we relied on the original
implementations with as few modifications as possible. Wrapping each method allowed us to keep a
common benchmarking framework based on the scikit-learn functions.
Table 1. Analyzed algorithms with their parameters settings. The parameters in quotations refer to their
names in the scikit-learn implementations.
Data preprocessing. We decided to feed benchmarked algorithms with scaled data using
StandardScaler function from scikit-learn. The reason for this is our effort to keep the format of the input
data consistent across multiple algorithms for the purpose of benchmarking.
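What StandardScaler computes per feature column is simply a shift to zero mean followed by a division by the population standard deviation; a pure-Python sketch of that transformation:

```python
from statistics import fmean

def standardize(column):
    """Rescale one feature column to zero mean and unit variance,
    mirroring what scikit-learn's StandardScaler computes."""
    mean = fmean(column)
    # Population standard deviation (ddof = 0), as StandardScaler uses.
    std = fmean([(x - mean) ** 2 for x in column]) ** 0.5
    return [(x - mean) / std for x in column]

scaled = standardize([2.0, 4.0, 6.0])
```

Feeding every algorithm columns transformed this way keeps the input format consistent across methods, which is the stated purpose of the preprocessing step.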
Initialization of the algorithms.
We initially considered starting each of the methods with the same random seed, but eventually decided to
make all data splits randomly. In our belief both approaches have disadvantages: the results will either be
biased by the choice of the random seed, or by using different splits for different methods. By taking a
median of the scores we became independent of the initial split of the data.
Wrappers for the GP methods. Some modifications had to be made to each of the GP methods. For EPLEX
and AFP, an existing Python wrapper provided by Ellyn was used. For the other methods we implemented a
class derived from the scikit-learn BaseEstimator, which implements two methods: fit(), used for training
the regressor, and predict(), used for testing the performance of the regressor. The source code of MRGP
and GSGP had to be modified so that the algorithms could communicate with the wrapper.
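The wrapper contract described above can be sketched as follows. In the actual study the class derives from scikit-learn's BaseEstimator; here a plain class with the same fit()/predict() interface is shown, and the GP back end is replaced by a trivial mean predictor purely for illustration:

```python
from statistics import fmean

class GPWrapper:
    """Minimal scikit-learn-style wrapper: in the real benchmark,
    fit() would launch the external GP implementation and predict()
    would evaluate the evolved model; both are stubbed here."""

    def __init__(self, population_size=500, generations=100):
        # Hyperparameters exposed as constructor arguments, so a tuner
        # such as GridSearchCV could search over them.
        self.population_size = population_size
        self.generations = generations

    def fit(self, X, y):
        # Stand-in for running the GP search on (X, y).
        self.model_ = fmean(y)
        return self

    def predict(self, X):
        # Stand-in for evaluating the evolved symbolic model on X.
        return [self.model_ for _ in X]

reg = GPWrapper(population_size=100, generations=10)
reg.fit([[0], [1], [2]], [1.0, 2.0, 3.0])
preds = reg.predict([[5], [6]])
```

Because only fit() and predict() are required, the same cross-validation and scoring machinery can drive the GP methods and the standard ML regressors alike.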
Parameters for the algorithms. The settings of the input parameters for the algorithms were
determined based on the available recommendations for the given method, as well as previous experience
of the authors. For GP-based methods we applied from 6 to 9 different settings (mainly: population size x
number of generations and crossover and mutation rates). For the ML algorithms the number of settings
was method dependent.
3. RESULTS
The relative performance of the algorithms was determined as the ability to make the best
predictions on the training and testing data, using the mean squared error (MSE) of the samples. The
performance on the testing dataset is of primary importance, as it shows how well the methods can
generalize to previously unseen data [7]. However, we include the training comparisons as a way to assess
the predilection for overfitting among methods.
We first analyze the results for each of the regression tasks on the training data. The relative rankings of
each method in terms of MSE are presented in Fig. 2. The best training performance was obtained with
gradient boosting, which finished in the top two for the vast majority of the benchmarked datasets.
4. CONCLUSIONS
In this paper we evaluated four recent GPSR methods in comparison to ten state-of-the-art ML
methods on a set of 94 real-world regression problems. We consider hyper-parameter optimization for each
method using nested cross validation, and compare the methods in terms of the MSE they produce on
training and testing sets, and their runtime. The analysis includes some interesting results. Two of the
GP-based methods, namely EPLEX and MRGP, produce competitive results compared to state-of-the-art ML
regression approaches. The downside of the GP-based methods is their computational complexity when run
on a single thread, which contributes to much higher runtimes. Parallelism is likely to be a key factor in
allowing GP-based approaches to become competitive with leading ML methods with respect to running
times. We also should note some shortcomings of this study that motivate further analysis. First, a guiding
motivation for the use of GPSR is often its ability to produce legible symbolic models. Our analysis did not
attempt to quantify the complexity of the models produced by any of the methods. Ultimately the relative
value of explainability versus predictive power will depend on the application domain. Second, we have
considered real-world datasets as the source of our benchmarks. Simulation studies could also be used, and
have the advantage of providing ground truth about the underlying process, as well as the ability to scale
complexity or difficulty. It should also be noted that the datasets used for this study were of relatively small
sizes (up to 1,000 instances). Future work should consider larger dataset sizes, though this will come with a larger
computational burden. We have also limited our initial analysis to looking at bulk performance of algorithms
over many datasets. Further analysis of these results should provide insight into the properties of datasets
that make them amenable to, or difficult for, GP-based regression. Such an analysis can provide suggestions
for new problem sub-types that may be of interest to the GP community. We hope this study will provide
the ML community with a data-driven sense of how state-of-the-art SR methods compare broadly to other
popular ML approaches to regression.
5. References
[1] Aurum, A., Petersson, H. and Wohlin, C., “State-of-the-Art: Software Inspections
after 25 Years”, Software Testing, Verification and Reliability, 12(3):133-154, 2002.
[2] Basili, V. R. and Selby, R. W., “Comparing the Effectiveness of Software Testing Strategies”,
IEEE Transaction on Software Engineering, 13(12):1278-1296, 1987.
[3] Software Validation and Verification – a State of the Art Report (panel discussion),
ACM, New York, NY, USA, ©1978.
```