SOFTWARE VERIFICATION COURSE
MASTER II 2018-2019
Teacher's name: Pr Atsa Etoundi Roger
Abstract
This document presents the work carried out by the MASTER 2 students of the 2017-2018 class for the software verification course. It is divided into five parts:
- statistical and function approaches to testing,
- test data analysis, testability,
- static analysis techniques,
- dynamic analysis techniques,
- selected state-of-the-art results and real-world applications.
statistical and function approaches to testing
Introduction
Software testing is the process of analyzing software to find the differences between the required and the actual behavior. It is performed throughout the software development cycle in order to build quality software. Two basic testing approaches are used for this purpose: white box testing and black box testing. Here we present black box testing techniques, and then statistical testing.
I. Black box testing
Black box testing is an integral part of correctness testing, but its ideas are not limited to correctness testing only. In black box testing, the tester only knows the input (processed by the system) and the required output; in other words, the tester need not know the internal workings of the system.
II. DIFFERENT FORMS OF BLACK BOX TESTING TECHNIQUES
The different forms of black box testing techniques are the following (see figure). For each form we answer three questions: what? why? how?
1. Equivalence Partitioning
What?
- Equivalence partitioning is a black box testing method that divides the input data of a software unit into partitions of data from which test cases can be derived.
- In equivalence class partitioning, an equivalence class is formed of the inputs for which the behavior of the system is specified or expected to be similar.
- An equivalence class represents a set of valid or invalid states for input conditions. See the figure.
Why?
The issue is to select the test cases suitably. Since the behavior of the system is specified or expected to be similar for all inputs of an equivalence class, testing one representative per class limits the number of test cases.
How ?
Some of the guidelines for equivalence partitioning are given
below :
1) One valid and two invalid equivalence classes are defined if an input
condition specifies a range.
2) One valid and two invalid equivalence classes are defined if an input
condition requires a specific value.
3) One valid and one invalid equivalence class are defined if an input
condition specifies a no. of a set.
4) One valid and one invalid equivalence class are defined if an input
condition is Boolean
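Guideline 1 can be sketched in a few lines of Python: for an input condition that specifies a range, derive one valid and two invalid classes and pick one representative per class. The range 1..100 and the class names are illustrative, not from the course.

```python
# Equivalence partitioning for a range input (guideline 1):
# one valid class and two invalid classes, one representative each.

def partition_range(low, high):
    """Return one representative per equivalence class for a range input."""
    return {
        "below_range (invalid)": low - 1,          # just below the range
        "in_range (valid)": (low + high) // 2,     # a typical in-range value
        "above_range (invalid)": high + 1,         # just above the range
    }

classes = partition_range(1, 100)
for name, representative in classes.items():
    print(name, "->", representative)
```

Each representative then becomes one test case, instead of testing every value in 1..100.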
2. Boundary Value Analysis
What?
Boundary value analysis is a testing technique that focuses on testing at the boundaries, where the extreme boundary values are chosen. Boundary values include the maximum, the minimum, values just inside/outside the boundaries, typical values and error values. See figure.
Why?
Systems tend to fail at boundaries, as programmers often make mistakes at the boundaries of the equivalence classes/input domain.
How?
In boundary value analysis, for each input value with a defined range, test:
1) the extreme ends of the range,
2) values just beyond the ends,
3) values just before the ends.
3. Cause-Effect Graph
What?
- A cause-effect graph is a black box testing technique that graphically illustrates the relationship between a given outcome and all the factors that influence the outcome.
- It is also known as an Ishikawa diagram, after its inventor Kaoru Ishikawa, or a fishbone diagram because of the way it looks.
- The focus is on causes and effects, not necessarily on the input data. See figure.
Why?
- Equivalence partitioning partitions the input space into equivalence classes and develops a test case for each class of inputs.
- Boundary value analysis develops test cases at the boundaries of the possible input ranges (minimum and maximum values).
Cause-effect graphing complements these techniques by relating outcomes to the factors that influence them; all of these techniques are important for data-processing-intensive applications.
How?
Step 1: Identify and define the effect.
Step 2: Fill in the effect box and draw the spine.
Step 3: Identify the main causes contributing to the effect being studied.
Step 4: For each major branch, identify other specific factors which may be causes of the effect.
Step 5: Categorize relative causes and provide detailed levels of causes.
4. Fuzzing
Why?
- Fuzz testing has enjoyed great success at discovering security-critical bugs in real software.
- Fuzzing is also used to test for security problems in software.
What?
Fuzz testing is often employed as a black box software testing technique, used for finding implementation bugs through malformed/semi-malformed data injection in an automated or semi-automated fashion.
The two forms of fuzzing programs are:
1) Mutation-based: mutation-based fuzzers mutate existing data samples to create test data.
2) Generation-based: generation-based fuzzers define new test data based on models of the input.
How ?
Using tools, they have each procedure. for example : AFL(american fuzz
loop). Steps of fuzzing :
1. Compile/install AFL (once)
2.
Compile target project with AFL
 afl‐gcc / afl‐g++ / afl‐clang / afl‐clang++ / (afl‐as)
3. Chose target binary to fuzz in project
 Chose its command line options to make it run fast
4. Chose valid input files that cover a wide variety of
possible input files
 afl‐cmin / (afl‐showmap)
5.Fuzzing
 afl‐fuzz
6. Check how your fuzzer is doing
 command line UI / afl‐whatsup / afl‐plot / afl‐gotcpu
7. Analyze crashes
 afltmin / triage_crashes.sh / peruvian were rabit
• ASAN / valgrind / exploitable gdb plugin / …
8 Have a lot more work than before
 CVE assignment / responsible disclosure / …
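To make the mutation-based form concrete, here is a toy mutation fuzzer, not AFL: it flips random bytes of an existing sample and records inputs that crash the target. The `parser` target and its planted bug are entirely hypothetical.

```python
import random

def mutate(sample: bytes, n_flips: int = 4) -> bytes:
    """Mutation-based fuzzing: randomize a few bytes of an existing sample."""
    data = bytearray(sample)
    for _ in range(n_flips):
        pos = random.randrange(len(data))
        data[pos] = random.randrange(256)
    return bytes(data)

def fuzz(target, seed: bytes, iterations: int = 1000):
    """Run the target on mutated inputs and collect the crashing inputs."""
    crashes = []
    for _ in range(iterations):
        case = mutate(seed)
        try:
            target(case)
        except Exception:
            crashes.append(case)
    return crashes

# Hypothetical target with a planted bug: crashes on input starting with b'!'
def parser(data: bytes):
    if data[:1] == b"!":
        raise ValueError("parser bug")

random.seed(0)  # make the run reproducible
found = fuzz(parser, b"hello world!")
print(len(found), "crashing inputs found")
```

Real fuzzers such as AFL add coverage feedback: mutated inputs that reach new code paths are kept as new seeds.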
5. Orthogonal Array Testing
What?
Orthogonal Array Testing (OAT) is a software testing strategy that uses orthogonal arrays, especially when the system to be tested has huge data inputs. Pairing or combining inputs in this way, and testing the system with them to save time, is called pairwise testing. Pairwise testing, also known as all-pairs testing, is a combinatorial method: it tests all the possible discrete combinations of every pair of the parameters involved.
Why?
In any scenario, delivering a quality software product to the customer has become challenging due to the complexity of the code. In the conventional method, test suites include test cases derived from all combinations of input values and pre-conditions; as a result, an exponential number of test cases has to be covered. OAT reduces this number while still covering every pair of levels.
How?
How is an OAT represented?
- Runs (N): the number of rows in the array, which translates into the number of test cases that will be generated.
- Factors (K): the number of columns in the array, which translates into the maximum number of variables that can be handled.
- Levels (V): the maximum number of values that can be taken by any single factor. A single factor typically has 2 to 3 inputs to be tested; that maximum number of inputs decides the levels.
How to do Orthogonal Array Testing?
1. Identify the independent variables for the scenario.
2. Find the smallest array with the required number of runs.
3. Map the factors to the array.
4. Choose the values for any "leftover" levels.
5. Transcribe the runs into test cases, adding any particularly suspicious combinations that aren't generated.
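As a sketch of steps 2-3 and 5, the standard L4(2^3) orthogonal array handles three two-level factors in only 4 runs instead of 2^3 = 8 exhaustive combinations, while still covering every pair of levels across any two factors. The factor names and values below are illustrative, not from the course.

```python
# The L4(2^3) orthogonal array: 4 runs, 3 factors, 2 levels each.
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

# Hypothetical factors for a login scenario (names are ours).
factors = {"browser": ["Chrome", "Firefox"],
           "os": ["Linux", "Windows"],
           "protocol": ["http", "https"]}

# Map each factor to one column of the array (step 3), then transcribe
# each run into a concrete test case (step 5).
names = list(factors)
test_cases = [{name: factors[name][level] for name, level in zip(names, run)}
              for run in L4]
for tc in test_cases:
    print(tc)
```

For any two columns of L4, all four level pairs (0,0), (0,1), (1,0), (1,1) occur, which is exactly the orthogonality property the technique relies on.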
6. All-Pairs Testing
What?
All-pairs testing, also known as pairwise testing, is a combinatorial testing approach: it tests all the possible discrete combinations of every pair of the parameters involved.
Why?
Assume we have a piece of software to be tested which has 10 input fields and 10 possible settings for each input field. There are then 10^10 possible inputs to be tested; in this case, exhaustive testing is impossible even if we wish to test all combinations.
How?
Step 1: Order the variables so that the one with the most values comes first and the one with the fewest is placed last.
Step 2: Start filling the table column by column. The list box can take 2 values.
Step 3: The next column under discussion is the check box. Again, the check box can take 2 values.
Step 4: Ensure that all combinations between the list box and the check box are covered.
Step 5: Use the same strategy for the radio button, which can also take 2 values.
Step 6: Verify that all the pair values are covered, as shown in the table below.
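Step 6, verifying that every pair of values is covered, can be automated. The sketch below checks a candidate suite for the list box / check box / radio button example; the concrete values are illustrative.

```python
from itertools import combinations, product

def uncovered_pairs(parameters, suite):
    """Return the value pairs not covered by any test case in the suite."""
    needed = set()
    names = list(parameters)
    # Enumerate every pair of values for every pair of parameters.
    for a, b in combinations(names, 2):
        for va, vb in product(parameters[a], parameters[b]):
            needed.add(((a, va), (b, vb)))
    # Remove each pair that some test case exercises.
    for test in suite:
        for a, b in combinations(names, 2):
            needed.discard(((a, test[a]), (b, test[b])))
    return needed

params = {"list_box": ["on", "off"],
          "check_box": ["yes", "no"],
          "radio": ["A", "B"]}
# Four test cases suffice to cover all 12 pairs (instead of 8 combinations).
suite = [{"list_box": "on",  "check_box": "yes", "radio": "A"},
         {"list_box": "on",  "check_box": "no",  "radio": "B"},
         {"list_box": "off", "check_box": "yes", "radio": "B"},
         {"list_box": "off", "check_box": "no",  "radio": "A"}]
print("uncovered:", uncovered_pairs(params, suite))
```

An empty result means the suite is pairwise-complete; any remaining pairs point at the test cases still to add.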
7. State Transition Testing
What?
State transition testing is a black box testing technique in which outputs are triggered by changes to the input conditions or changes to the 'state' of the system. In other words, tests are designed to execute valid and invalid state transitions.
Why?
A very large or infinite number of test scenarios plus a finite amount of time means it is impossible to test everything; a state model focuses the tests on the transitions of the system.
How?
The tester builds the model as follows:
1) States → the distinct configurations the system can be in.
2) Transitions → how the state of the system changes from one state to another.
3) Events → inputs to the system.
4) Actions → outputs for the events.
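The four model elements above can be sketched as a transition table plus a step function; tests then drive valid paths through the model and check that invalid transitions are rejected. The ATM card session and its events are a hypothetical example.

```python
# (state, event) -> next state: the model built by the tester.
TRANSITIONS = {
    ("idle", "insert_card"): "card_inserted",
    ("card_inserted", "enter_pin"): "authenticated",
    ("authenticated", "eject_card"): "idle",
}

def step(state, event):
    """Apply an event to a state; raise on an invalid transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"invalid transition: {event} in state {state}")

# Valid transition test: a full session returns the system to 'idle'.
s = "idle"
for e in ["insert_card", "enter_pin", "eject_card"]:
    s = step(s, e)
print("final state:", s)

# Invalid transition test: entering a PIN with no card must be rejected.
try:
    step("idle", "enter_pin")
except ValueError as err:
    print("rejected:", err)
```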
III. Statistical Testing
What?
Statistical testing is based on a probabilistic generation of test patterns: structural or functional criteria serve as guides for defining an input profile and a test size. The method is intended to compensate for the imperfect connection of criteria with software faults, and should not be confused with random testing, a blind approach that uses a uniform profile over the input domain.
Why?
- To find bugs.
- To estimate the reliability of the software.
How to perform statistical testing?
Step 1: Construct the statistical models based on actual usage scenarios and related frequencies.
Step 2: Use these models for test case generation and execution.
Step 3: Analyze the test results for reliability assessment and predictions, and to help with decision-making.
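Steps 1 and 2 can be sketched as follows: an operational profile assigns usage frequencies to scenarios, and test cases are drawn at random according to that profile rather than uniformly. The scenarios and frequencies are hypothetical.

```python
import random

# Step 1: a usage model — scenarios with their observed frequencies.
profile = {"view_balance": 0.6, "withdraw": 0.3, "transfer": 0.1}

def generate_test_cases(profile, n, seed=42):
    """Step 2: draw n test scenarios according to the operational profile."""
    rng = random.Random(seed)            # fixed seed for a reproducible suite
    scenarios = list(profile)
    weights = [profile[s] for s in scenarios]
    return [rng.choices(scenarios, weights)[0] for _ in range(n)]

cases = generate_test_cases(profile, 1000)
for s in profile:
    print(s, cases.count(s) / len(cases))  # should roughly match the profile
```

Because the test mix mirrors real usage, the observed failure rate can then feed a reliability estimate (step 3).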
Conclusion
To conclude, black box testing is a testing technique that ignores the internal mechanism or structure of a system and focuses on the outputs generated in response to selected inputs and execution conditions. Statistical testing is a suitable means to compensate for the tricky link between test criteria and software design faults.
References:
S. Akers. Probabilistic Techniques for Test Generation from Functional Descriptions. 1979.
J. Whittaker. Markov chain techniques for software testing and reliability analysis. Thesis, University of Tennessee, May 1992.
J. A. Whittaker. Stochastic Software Testing. Annals of Software Engineering, vol. 4, pp. 115-131, 1997.
J. A. Whittaker and M. G. Thomason. A Markov Chain Model for Statistical Software Testing. IEEE Transactions on Software Engineering, 20(10):812-824, 1994.
Zhang Youzhi. Research and application of hidden Markov model in data mining. Second IITA International Conference on Geoscience and Remote Sensing (IITA-GRS), Aug. 2010, pp. 459-462.
Test Data Analysis, Testability
Outline:
I. Definition of test data
   - Example
II. Partition analysis of input data domains and boundary testing
   1. Equivalence classes
      - Domain partitioning
      - Partition analysis: method
   2. Boundary testing
      - Principle
      - Boundary value analysis
      - Example
      - Data types
      - Partition analysis and boundary testing: summary
      - Combination of input values
I. Definition: test data
A test datum is a tuple DT(SC, DE, f, g, RA, tol) together with a measure N, such that:
- SC is the scenario to be tested.
- DE is the input datum representative of this scenario.
- f is the function under test.
- g is the test function: the function to apply to the result obtained by calling the function under test, in order to obtain a value telling us whether the result is correct or not.
- RA is the expected result, and the following condition must hold:
  N(g(f(DE)) - RA) < tol
In the most common cases, we choose N = absolute value, and g = id or g = f^-1.
Remarks:
1. The test function should normally be simpler than the function under test. It should be simple enough that it does not need to be tested itself.
2. As soon as one element of the tuple is missing, the test case is incomplete. In particular, there is no test case without an expected result.
3. The quality of the test depends on the relevance of the choice of test data.
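As a sketch, the definition above translates directly into code, here with N = absolute value and g = id; the square-root example and the tolerance are illustrative, not from the course.

```python
import math

def run_test(SC, DE, f, g, RA, tol, N=abs):
    """A test datum DT(SC, DE, f, g, RA, tol): pass iff N(g(f(DE)) - RA) < tol."""
    verdict = N(g(f(DE)) - RA) < tol
    print(f"{SC}: {'PASS' if verdict else 'FAIL'}")
    return verdict

identity = lambda x: x   # the common choice g = id

# Scenario: nominal square root; expected result known to 1e-6.
run_test("nominal square root", DE=2.0, f=math.sqrt,
         g=identity, RA=1.4142135, tol=1e-6)
```

Note how remark 2 is enforced structurally: `run_test` cannot be called without an expected result RA.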
II. Partition analysis of the input data domains and boundary testing
1. Test equivalence classes (input data domains)
Definition: test equivalence class
An equivalence class corresponds to a set of test data assumed to test the same behavior of the function under test; that is, if the function works for one of them, it works for all the others.
Equivalence classes allow us to limit the number of test cases: we avoid testing two data from the same equivalence class.
Domain partitioning: example
Consider the following program: it reads three real numbers corresponding to the lengths of the sides of a triangle. If these three numbers do not form a triangle, it prints an appropriate message. In the case of a triangle, the program determines whether it is isosceles, equilateral or scalene, and whether its largest angle is acute, right or obtuse (< 90°, = 90°, > 90°), and returns the corresponding answer.
Equivalence classes and test data:

         | Scalene  | Isosceles  | Equilateral
  Acute  | 6, 5, 3  | 6, 1, 6    | 4, 4, 4
  Obtuse | 5, 6, 10 | 7, 4, 4    | impossible
  Right  | 3, 4, 5  | √2, 2, √2  | impossible

Plus the non-triangle class, e.g. -1, 2, 8.
Partition analysis: method
Three phases:
- For each input datum, compute equivalence classes over the value domains.
- Choose a representative of each equivalence class.
- Compose by Cartesian product over the set of input data to establish the test data.
Letting Ci be the equivalence classes:
∪ Ci = E ∧ ∀i ≠ j, Ci ∩ Cj = ∅
Rules for partitioning domains:
- If the value belongs to an interval, build:
  - one class for the values below the interval,
  - one class for the values above it,
  - N valid classes.
- If the datum is a set of values, build:
  - one class with the empty set,
  - one class with too many values,
  - N valid classes.
- If the datum is an obligation or a constraint (form, direction, syntax), build:
  - one class with the constraint respected,
  - one class with the constraint violated.
2. Test aux limites
Principe : on s’intéresse aux bornes des intervalles partitionnant les domaines des variables
d’entrées :
-
-
-
Pour chaque intervalle, on garde les 2 valeurs correspondant aux 2 limites, et les 4 valeurs
correspondant aux valeurs des limites ±le plus petit delta possible
n ∈ 3 .. 15 ⇒ v1 = 3, v2 = 15, v3 = 2, v4 = 4, v5 = 14, v6 = 16
Si la variable appartient à un ensemble ordonné de valeurs, on choisit le premier, le second,
l’avant dernier et le dernier
n ∈ {-7, 2, 3, 157, 200} ⇒ v1 = -7, v2 = 2, v3 = 157, v4 = 200
Si une condition d’entrée spécifie un nombre de valeurs, définir les cas de test à partir du
nombre minimum et maximum de valeurs, et des tests pour des nombres de valeurs hors
limites invalides.
Un fichier d’entrée contient 1 - 255 records, produire un cas de test pour 0, 1, 255 et 256.
Boundary value analysis: example (see figure)
Data types:
- Input data are not only numerical values: characters, booleans, images, sound, ...
- These categories can generally lend themselves to partition analysis and to the examination of boundary conditions:
  - true / false
  - full file / empty file
  - full frame / empty frame
  - color shades
  - largest / smallest
  - ...
Partition analysis and boundary testing: summary
- Partition analysis is a method that aims to reduce the number of test cases by computing equivalence classes.
  - The choice of equivalence classes matters: a poor choice risks failing to reveal a defect.
- Choosing input conditions at the boundaries is a solid heuristic for choosing input data within the equivalence classes.
  - This heuristic is only usable when there is an order relation on the input datum considered.
- Boundary testing produces both nominal test cases (inside the interval) and robustness test cases (outside the interval).
Combination of input values
- The problem of combining the values of the input data: test data for variable Xi: DTxi = {di1, ..., din}.
- The Cartesian product DTx1 × DTx2 × ... × DTxn leads to an explosion of the number of test cases.
- Hence the choice of equivalence classes over the whole set of input data.
References:
Bernard Homès. Fundamentals of Software Testing.
Kshirasagar Naik. Software Testing and Quality Assurance: Theory and Practice.
Static techniques
Table of contents:
Introduction
1) Definition
2) Advantages of static techniques
3) Disadvantages of static techniques
4) Activities of static techniques
   4.1) Reviews
   4.2) Types of reviews
   4.3) Management reviews
   4.4) Technical reviews
   4.5) Inspections
   4.6) Walk-throughs
   4.7) Audits
5) Static analysis
   5.1) Types of static analysis
Conclusion
References
Introduction
Verification and validation designate any activity or process aiming to ensure that a piece of software conforms to its specifications and meets the customer's needs. To reach the objectives of the verification and validation process, system analysis and verification techniques should be used. These are notably the static and dynamic techniques. Static analysis techniques are system verification techniques that do not involve executing a program, whereas dynamic analysis techniques require executing the application in order to experiment with and observe the system's behavior. In this presentation, we focus on static analysis techniques.
1) Definition
Static techniques consist in examining a system without executing it. They aim to inspect/analyze a system statically (without execution) in order to discover problems or prove its correctness. They can be manual or automatic. Static techniques can be applied to many deliverables, such as software components, high-level or detailed specifications, requirements lists, contracts, development plans or test plans.
2) Advantages of static techniques
The main advantages of static techniques are:
- They do not require an executable of a working application; they can be applied to documents or to a part of the software.
- They are less costly than dynamic techniques and their return on investment is high, since the earlier errors are identified, the easier they are to correct.
3) Disadvantages of static techniques
The main disadvantages of static techniques are:
- They take a fair amount of time, especially when they must be done manually.
- The automated tools that perform them do not work with every programming language.
- Automated tools can report both correct and incorrect results (false positives).
4) Activities of static techniques
Static techniques cover two major activities, reviews and static analysis, illustrated in the figure below:
4.1) Reviews
The objectives of a review can be of four kinds:
- verification of conformity with the higher-level documents that were used to create the product,
- verification against project documents at the same level,
- verification against norms, standards and recommended best practices, to ensure the conformity of the product submitted to review and static analysis,
- verification of usage and fitness for use, for example in order to design more detailed components.
4.2) Types of reviews
A review is a process or meeting during which the documents of a software product are examined systematically by one or more people, with the main goal of detecting and removing errors early in the software development cycle. Reviews are used to verify documents such as requirements specifications, system designs, code, test plans and test scenarios. From this definition, it follows that reviewing software is important for software testing: the time and cost of testing are reduced, since enough time is spent in the initial phase. Developer productivity also improves and lead times shrink, because detecting errors very early in the software life cycle helps guarantee that the documents are clear and unambiguous.
Reviews can be grouped according to their level of formality, from the most formal to the least formal: formal reviews and informal reviews. An informal review is a type of review that does not follow any formal process for finding errors in the document: you simply go through the document and give informal comments on it.
The IEEE 1028-2008 standard identifies several types of reviews, among which:
1. Management reviews
2. Technical reviews
3. Inspections
4. Walk-throughs
5. Audits
4.3) Management reviews
The objective of this review is to track progress, establish the status of plans and schedules, or evaluate the effectiveness of the management methods used and their fit with the objectives. It identifies conformity with, and deviations from, management plans or procedures. Technical knowledge may be necessary to conduct these reviews successfully.
4.4) Technical reviews
The objective of these reviews is to evaluate a piece of software with a team of qualified people, to determine whether it is suited to its intended use, and to identify deviations from the applicable norms and specifications. These reviews provide management with evidence concerning the technical status of a project; they can also provide recommendations and evaluate alternatives.
4.5) Inspections
Inspection is considered the most formal review method, often managed by a well-trained moderator. In this method, each document is strictly checked against a checklist, which helps detect, identify and document defects.
4.6) Walk-throughs
This type of review is often led by the author so that the team members understand the project, particularly in terms of changes to the requirements, and to help gather more details. It can also be organized to educate the audience about the product.
4.7) Audits
An audit is performed by personnel outside the project. It evaluates the software against specifications, norms, guidelines and other criteria.
5) Static analysis
Here, the source code written by the developers is analyzed (generally by tools) to look for structural flaws that may lead to defects. It is an important aid for inspectors, but does not completely replace code inspection. This technique is used in compilers. Classes of defects that static analysis checks for include:
1. Data faults, for example:
   - variables used before initialization,
   - undeclared variables,
   - variables declared but never used.
2. Control faults:
   - dead code,
   - infinite loops.
3. Input/output faults:
   - a variable output twice with no intervening assignment.
4. Interface faults:
   - wrong number of parameters,
   - unused functions.
5. Memory management faults:
   - memory not freed,
   - complex pointer arithmetic.
5.1) Types of static analysis
There are different types of static analysis, each with its strengths and weaknesses:
1. Data flow analysis: evaluates the variables and checks whether they are correctly defined and initialized before being used (referenced).
2. Information flow analysis: identifies the dependencies of the output variables. It does not detect anomalies as such, but highlights information for code inspection or review.
3. Path analysis: identifies the paths through the program and lists the statements executed on each path. Again, potentially useful in the review process.
4. Interface analysis: checks the consistency of routine and procedure declarations and their use.
5. Control flow analysis: checks for loops with multiple entry or exit points, finds unreachable code, etc.
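As an illustrative sketch of the first category, Python's `ast` module can find variables that are assigned but never used, without ever executing the program; the helper name and sample program are ours, and a real data-flow analyzer would also handle scopes, attributes and control flow.

```python
import ast

def unused_variables(source: str):
    """Statically find names that are assigned but never read."""
    tree = ast.parse(source)          # parse only — the code is never run
    assigned, used = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                assigned.add(node.id)  # appears on the left of an assignment
            else:
                used.add(node.id)      # appears in a read (Load) context
    return assigned - used

program = "x = 1\ny = 2\nprint(x)\n"   # y is declared but never used
print(unused_variables(program))
```

The same walk-the-syntax-tree pattern underlies the other analyses listed above; only the property being checked changes.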
Conclusion
In short, our purpose was to discuss static techniques. We have seen that these techniques consist in examining the system without executing it. Their advantages are undeniable: they reduce testing time, since a number of faults are detected and corrected very early in the software development cycle. However, these techniques cannot, for example, test robustness or security aspects; hence the use of dynamic techniques.
References
[1] Bernard Homès. Fundamentals of Software Testing. John Wiley & Sons, 2013.
[2] Ian Sommerville et al. Software Engineering. Addison-Wesley, 2007.
DYNAMIC PROGRAM ANALYSIS
Table of contents:
Introduction
I. Program analysis
   1. Definition
   2. Purposes of program analysis
   3. Applications of program analysis
   4. Program analysis flavors
      a. Static analysis
      b. Dynamic analysis
      c. Limitations
II. Dynamic analysis
   1. Definition
   2. Dynamic analysis goals
   3. Advantages
   4. Limitations
III. Dynamic analysis techniques
   1. Program instrumentation
      a. Static instrumentation
      b. Dynamic instrumentation
   2. Program tracing
      a. What is tracing
      b. Why tracing
      c. How to trace
   3. Program profiling
      a. What is profiling
      b. Why profiling
      c. How to do profiling
IV. Dynamic analysis tools
References
Introduction
Program analysis is a collection of techniques for computing approximate information about a program. Program analysis finds several applications: in compilers, in tools that help programmers understand and modify programs, and in tools that help programmers verify that programs satisfy certain properties of interest. It is subdivided into two major parts. Static analysis has long been used for analysing the dynamic behavior of programs because it is simple and does not require running the program. Dynamic analysis, on the other hand, is the analysis of the properties of a running program: it investigates the properties of a program using information gathered at run time. The deployment of software nowadays as a collection of dynamically linked libraries is rendering static analysis imprecise. In the remainder of our work, we focus on dynamic analysis, organised as follows: first an overview of program analysis (definition, purposes, applications and types), followed by dynamic analysis (definition, purpose, advantages and limitations), then some techniques used in dynamic analysis. Finally, in the last part, we present some tools for conducting dynamic analysis.
I. Program analysis
A program is a list of simple instructions to be executed in order to perform a more complex task.
Example: a washing machine
1. Check that the cover is closed
2. Start the drum motor
3. Open the water valve
4. Close the valve as soon as there is enough water
5. Switch on the heating resistor
6. 45 minutes after starting, drain the water and it's all over.
While writing a program, a human can easily forget details that seem obvious to him (e.g. open
the water valve before switching on the heater), or poorly anticipate exceptional situations
(e.g. the water supply is cut off). So we have to study the behavior of the program to check that it does what
it is supposed to do, does not do what it should not do, and solves the expected problem in an
efficient way (optimization).
1. Definition
Program analysis is the process of automatically analyzing the behavior of computer
programs regarding a property such as correctness, robustness, safety and liveness.
Program analysis focuses on two major areas:
• Program optimization: the process of modifying a software system to make some
aspect of it work more efficiently or use fewer resources. In general, a computer program
may be optimized so that it executes more rapidly, so that it is capable of operating with
less memory storage or other resources, or so that it draws less power.
• Program correctness: ensuring that the program does what it is supposed to
do. In theoretical computer science, an algorithm is said to be correct when it behaves
as its specification requires. Functional correctness refers
to the input-output behaviour of the algorithm (i.e., for each input it produces the
expected output).
2. Purposes of Program Analysis
Since programs are written by humans, several errors can be made during this step:
details that seem obvious are easily forgotten, exceptional situations are poorly anticipated,
the wrong data structure is used, a less efficient alternative is chosen for a subtask, or the
programmer is distracted while programming. We therefore have to develop techniques to reduce these errors.
Program analysis aims to ensure that:
• The program does what it is supposed to do.
• The program does not do what it should not do.
E.g.: the heating resistor is never switched on when the tank is empty.
• The program does what it is supposed to do efficiently.
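The example above (the heating resistor is never switched on when the tank is empty) is a safety property that can be checked at run time with simple assertions. A minimal Python sketch, with an entirely hypothetical WashingMachine model:

```python
class WashingMachine:
    """Toy model of the washing-machine controller (hypothetical names)."""

    def __init__(self):
        self.water_level = 0      # litres currently in the tank
        self.heater_on = False

    def _check_safety(self):
        # The safety property from the text: the heating resistor
        # must never be on while the tank is empty.
        assert not (self.heater_on and self.water_level == 0), \
            "safety violation: heater on with empty tank"

    def fill(self, litres):
        self.water_level += litres
        self._check_safety()

    def switch_heater(self, on):
        self.heater_on = on
        self._check_safety()   # a violation is detected at the faulty step

machine = WashingMachine()
machine.fill(20)
machine.switch_heater(True)    # fine: the tank holds water
machine.switch_heater(False)
```

Calling switch_heater(True) on an empty machine raises an AssertionError at exactly the faulty step; detecting such violations automatically is what program analysis aims for.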
3. Applications of Program Analysis
Program analysis has several applications, which we subdivide into:
• Optimization : In general, a computer program may be optimized so that it executes
more rapidly, or to make it capable of operating with less memory storage or other
resources.
• Avoid redundant / unnecessary computation
• Compute in a more efficient way
• Verifying correctness : correctness of an algorithm is asserted when it is said that the
algorithm is correct with respect to a specification.
• Software Quality Assurance activities : To ensure that the software under
development or modification will meet desired quality requirements
• Finding bugs
• Determining properties :
• Performance
• Security and reliability
• Design and architecture
4. Program Analysis Flavors
There are two main flavors of program analysis:
• Dynamic analysis
• Static analysis
a. Static analysis
In the context of program correctness, static analysis can discover vulnerabilities during
the development phase of the program. These vulnerabilities are easier to correct than the ones
found during the testing phase since static analysis leads to the root of the vulnerability.
Incorrect optimizations are highly undesirable. So, in the context of program optimization, there
are two main strategies to handle computationally undecidable analysis.
b. Advantages
• It can find weaknesses in the code at the exact location.
• Source code can be easily understood by other or future developers.
• Weaknesses are found earlier in the development life cycle, reducing the cost to fix.
• It allows a quicker turnaround for fixes.
c. Limitations
• It is time consuming if conducted manually.
• Automated tools produce false positives and false negatives.
• It does not find vulnerabilities introduced in the runtime environment.
• There are not enough trained personnel to thoroughly conduct static code analysis.
Program analysis often seems restricted to static analysis, i.e. the analysis of program source code
only. Although static analysis has advantages, it is not always the most appropriate type of
analysis. As shown by the popularity of prototyping, people often have a
better understanding of what a program should do than of how it should be written. The following pages
will only discuss dynamic analysis.
II. Dynamic analysis
Static approaches typically concern (semi-)automatic analyses of source code. An
important advantage of static analysis is its completeness: a system’s source code essentially
represents a full description of the system. One of the major drawbacks is that static analyses
often do not capture the system’s behavioral aspects: in object-oriented code, for example,
occurrences of late binding and polymorphism are difficult to grasp if runtime information is
missing.
1. Definition
Dynamic program analysis is the analysis of computer software that is performed by
executing programs built from that software on a real or virtual processor. Dynamic program
analysis tools may require the loading of special libraries or even recompilation of program code.
Dynamic analysis is in contrast to static program analysis.
2. Dynamic Analysis goals
Dynamic analysis is conducted for many reasons:
• Collect runtime execution information.
• Resource usage, execution profiles.
• Program comprehension.
• Find bugs in applications, identify hotspots
• Program transformation
• Optimize or obfuscate programs.
• Insert debugging or monitoring code.
• Modify program behaviors on the fly.
3. Advantages
The advantages that we consider are:
• It is able to detect dependencies that are not possible to detect in static analysis.
• The precision with regard to the actual behavior of the software system, for example, in
the context of object-oriented software with its late binding mechanism.
• The fact that a goal-oriented strategy can be used, which entails the definition of an
execution scenario such that only the parts of interest of the software system are analyzed
4. Limitations
The drawbacks that we distinguish are:
• The inherent incompleteness of dynamic analysis, as the behavior or execution traces
under analysis capture only a small fraction of the usually infinite execution domain of
the program under study.
• The difficulty of determining which scenarios to execute in order to trigger the program
elements of interest. In practice, test suites can be used, or recorded executions involving
user interaction with the system.
• The scalability of dynamic analysis due to the large amounts of data that may be produced
by dynamic analysis, affecting performance, storage, and the cognitive load humans can
deal with.
• The observer effect, i.e., the phenomenon in which software acts differently when under
observation, might pose a problem in multithreaded or multi-process software because of
timing issues.
III. Dynamic Analysis Techniques
Dynamic analysis techniques reason over the run-time behavior of systems. In general,
dynamic analysis involves recording a program's dynamic state. This dynamic state is also
called a profile or trace. A program profile measures occurrences of events during program
execution. The measured event is the execution of a local portion of the program, such as lines of code,
basic blocks, control edges, routines, etc.
1. Program instrumentation
In its simplest form, instrumentation involves adding extra code to a program’s text. The
intent usually is to monitor some kind of program behavior—either for debugging or for
optimization purposes. For example, a programmer who wants to optimize the execution time
of a large program might first like to know the regions of code that are executed most of the
time, and then go on to optimize those frequently executed sections.
Instrumentation techniques are usually classified based on how and when the instrumentation
code is added inside a program in this compilation-execution process. Tools that add
instrumentation code before execution are called static instrumenters, whereas those modifying
a program during execution are known as dynamic instrumenters.
a. Static instrumentation
As mentioned earlier, static instrumentation techniques generally insert
instrumentation code inside a program statically, i.e. during or after compilation but definitely
prior to execution. Compiling the source code of a program to the linked binary executable
generally involves three steps (as shown in Figure 0.1):
1. Compilation. In this step, the source code written in a high-level programming language
(such as C, C++, Java, or Pascal) is translated to the assembly language representation for
a target machine.
2. Assembling. The assembly language representation of the program is converted to binary
object code in this step. For languages like C# or Java, this means translation into some
form of bytecode.
3. Linking. In this step, a set of separately compiled and assembled binary object files are
linked together to produce the target executable. For languages like C# and Java, linking
can mean merging of several bytecode files into one single bytecode that can be executed
on a VM.
Instrumentation code can be inserted during each of the above three steps. It is also possible to
instrument the source code before it is handed over to the compiler (source-to-source
instrumentation), or the linked executable produced by the compiler tool chain (binary
instrumentation/rewriting).
Figure 0.1: Instrumentation techniques in the compiler tool chain.
b. Dynamic instrumentation
Historically, static instrumentation techniques appeared first; so far, they have been mainly used
in program analysis, debugging, testing, and code-coverage tools. However, they suffer from
some major limitations, which are sketched below:
1. Static instrumentation techniques modify the software executable.
2. Static instrumentation can only cover code that is statically linked.
3. Static binary instrumentation is often difficult for binary formats that allow mixing of
code and data.
Figure 0.2: Dynamic instrumentation
These limitations have forced the development of dynamic instrumenters that perform
instrumentation at runtime (i.e., when the program is executing). Unlike static instrumenters,
dynamic instrumenters can only work on compiled executable binaries (or, bytecodes).
Therefore, dynamic instrumentation is also called Dynamic Binary Instrumentation (DBI).
Dynamic binary instrumentation (DBI) occurs at run-time. The analysis code can be
injected by a program grafted onto the client process, or by an external process. If the client
uses dynamically-linked code the analysis code must be added after the dynamic linker has
done its job.
Dynamic binary instrumentation has two distinct advantages. First, it usually does not
require the client program to be prepared in any way, which makes it very convenient for users.
Second, it naturally covers all client code; instrumenting all code statically can be difficult if
code and data are mixed or different modules are used, and is impossible if the client uses
dynamically generated code.
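The principle can be illustrated at a much higher level than machine code: in Python, analysis code can be grafted onto a running program by replacing a function object after execution has started. This is only a sketch of the idea of adding analysis code at run time (the names are invented here), not DBI itself, which operates on compiled binaries:

```python
import functools

call_counts = {}  # analysis data gathered while the program runs

def instrument(func):
    """Wrap an existing function with analysis code at run time."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # The injected analysis code: count each invocation.
        call_counts[func.__name__] = call_counts.get(func.__name__, 0) + 1
        return func(*args, **kwargs)
    return wrapper

def compute(x):          # "client" code, unmodified at build time
    return x * x

# Instrumentation happens while the program is running: the original
# definition is replaced with the wrapped version on the fly.
compute = instrument(compute)

results = [compute(i) for i in range(5)]
```

As with real DBI, the client code needed no preparation; the analysis was attached after the program had started.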
2. Program tracing
a. What is Tracing
Tracing is a process that faithfully records detailed information of program execution (lossless).
• Control flow tracing: the sequence of executed statements.
• Dependence tracing: the sequence of exercised dependences.
• Value tracing: the sequence of values that are produced by each instruction.
• Memory access tracing: the sequence of memory references during an execution.
b. Why Tracing
Debugging, Code optimizations, Security
c. How to Trace
• Tracing by printf
• Tracing by source-level instrumentation:
• Read a source file and parse it into ASTs.
• Annotate the parse trees with instrumentation.
• Translate the annotated trees to a new source file.
• Compile the new source.
• Execute the program; a trace is produced.
Figure 0.3: An example
• Tracing by binary instrumentation:
• Given a binary executable, parse it into an intermediate representation. More
advanced representations such as control flow graphs may also be generated.
• Tracing instrumentation is added to the intermediate representation.
• A lightweight compiler compiles the instrumented representation into a new
executable.
Static: takes an executable and generates an instrumented executable that can be
executed with many different inputs.
Dynamic: given the original binary and an input, starts executing the binary with the
input; during execution, an instrumented binary is generated on the fly; essentially the
instrumented binary is executed.
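The source-level instrumentation steps listed above (parse into ASTs, annotate, compile, execute) can be sketched with Python's standard ast module; the absval example and the trace format are invented for illustration:

```python
import ast

TRACE = []  # the produced control-flow trace (executed line numbers)

def _trace(lineno):
    TRACE.append(lineno)

class Tracer(ast.NodeTransformer):
    """Annotate the parse tree: insert _trace(lineno) before each statement."""
    def generic_visit(self, node):
        super().generic_visit(node)          # instrument children first
        for field in ("body", "orelse", "finalbody"):
            old = getattr(node, field, None)
            if isinstance(old, list) and old and isinstance(old[0], ast.stmt):
                new = []
                for stmt in old:
                    call = ast.Expr(ast.Call(
                        func=ast.Name("_trace", ast.Load()),
                        args=[ast.Constant(stmt.lineno)], keywords=[]))
                    new.append(call)         # trace call, then the statement
                    new.append(stmt)
                setattr(node, field, new)
        return node

source = """
def absval(x):
    if x < 0:
        x = -x
    return x
"""

tree = Tracer().visit(ast.parse(source))     # read and annotate the ASTs
ast.fix_missing_locations(tree)              # give injected nodes positions
env = {"_trace": _trace}
exec(compile(tree, "<instrumented>", "exec"), env)   # compile the new source
env["absval"](-3)     # executing the instrumented code produces a trace
```

After the run, TRACE holds the sequence of executed line numbers, i.e. a control-flow trace of the program.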
3. Program profiling
a. What is profiling
• Tracing is lossless, recording every detail of a program execution. Thus, it is
expensive, and potentially infinite.
• Profiling is lossy, meaning that it aggregates execution information onto finite entries.
• Control flow profiling - Instruction/Edge/Function: frequency
• Value profiling - Value: frequency
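Python's built-in cProfile module is a concrete example of such lossy aggregation: instead of a full trace it records, per function, a call count and accumulated time. A small sketch (the function names are invented):

```python
import cProfile
import pstats
import io

def slow_square(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def workload():
    return [slow_square(10_000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Aggregated (lossy) view: each function maps to a call count and
# total time, not to a full record of every executed instruction.
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
```

However long the run, the profile stays bounded: one entry per function, which is exactly the finite-entries property described above.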
b. Why profiling
• Debugging
• Enable time travel to understand what has happened
• Code optimizations
• Identify hot program paths;
• Data compression;
• Value speculation;
• Data locality that help cache design;
• Performance tuning
c. How to do profiling
Path profiling counts how often each path through a function is taken at runtime. Path profiles
can be used by optimizing compilers: functions can be compiled such that the "hot path", the
most frequently taken path through the function, executes particularly fast (Ammons & Larus,
1998).
• Goal: Count how often a path through a function is executed
• Interesting for various applications
• Profile-directed compiler optimizations
• Performance tuning: Which paths are worth optimizing?
• Test coverage: Which paths are not yet tested?
There are some challenges that path profiling faces:
• Runtime overhead: Limit slowdown of program
• Accuracy: Ideally, precise profiles (not heuristics, no approximations)
• Infinitely many paths: Cycles in control flow graph
Edge profiling is a naive approach to path profiling:
• Instrument each branching point
• Count how often each CFG edge is executed
• Estimate most frequent path: Always follow most frequent edge
Fails to uniquely identify most frequent path.
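A crude approximation of edge profiling can be sketched in Python with the sys.settrace hook, counting how often control flows from one source line to the next (line-to-line transitions stand in for CFG edges here; the classify example is invented):

```python
import sys
from collections import Counter

edge_counts = Counter()   # (previous line, current line) -> frequency
_prev = {}                # last executed line, per code object

def edge_profiler(frame, event, arg):
    if event == "call":
        return edge_profiler              # trace lines inside this frame
    if event == "line":
        code = frame.f_code
        prev = _prev.get(code)
        if prev is not None:
            # Record the control-flow edge taken to reach this line.
            edge_counts[(prev, frame.f_lineno)] += 1
        _prev[code] = frame.f_lineno
    return edge_profiler

def classify(xs):
    out = []
    for x in xs:
        if x >= 0:
            out.append("pos")
        else:
            out.append("neg")
    return out

sys.settrace(edge_profiler)
classify([1, -2, 3, 4, -5])
sys.settrace(None)

# The hottest edge is the loop-header -> branch-test transition.
hot_edge, hot_count = edge_counts.most_common(1)[0]
```

As the text notes, following the most frequent edge at each branching point does not uniquely identify the most frequent path: the edge counts here lose the correlation between successive branch outcomes.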
Theoretically, path profiles contain strictly more information than edge profiles. A
precise edge profile can be derived from a path profile, but the
converse does not hold. Stated another way, there are many different path profiles that induce
the same edge profile, but different edge profiles imply different path profiles. Thus, path
profiles provide the most information about control flow behavior, subsumed only by the
profiling of longer and longer paths. The ultimate form of a path profile is a trace of an entire
execution, which is extremely costly to collect and store.
Today, edge profiles are the control flow profile of choice for compiler optimization.
We identify three main reasons for this. First, edge profiles provide a clear advantage over
vertex profiles, as they provide more accurate information about the behavior of branches, which
is crucial to many optimizations. Second, edge profiles are generally thought to be easier and
cheaper to collect than path profiles.
IV. Dynamic Analysis Tools
Dynamic analysis tools have been widely used for memory analysis, invariant detection,
deadlock and race detection, and metric computation. These tools are being used by companies
for their benefits. For example, Pin is a tool which provides the underlying infrastructure for
commercial products like Intel Parallel Studio suite of performance analysis tools. A summary
of dynamic analysis tools is provided in Figure 0.4
Figure 0.4: Dynamic Analysis Tools.
1. Valgrind is an instrumentation framework for building dynamic analysis tools. It can
automatically detect many memory management and threading bugs, and profile a
program in detail. Purify and Insure++ have similar functionality as Valgrind. Whereas
Valgrind and Purify instrument at the executables, Insure++ directly instruments the
source code. Pin is a tool for dynamic binary instrumentation of programs. Pin adds code
dynamically while the executable is running. Pin provides an API to write customized
instrumentation code (in C/C++), called Pintools. Pin can be used to observe low level
events like memory references, instruction execution, and control flow as well as higher
level abstractions such as procedure invocations, shared library loading, thread creation,
and system call execution.
2. Javana runs a dynamic binary instrumentation tool underneath the virtual machine. The
virtual machine communicates with the instrumentation layer through an event handling
mechanism for building a vertical map that links low-level native instruction pointers and
memory addresses to high-level language concepts such as objects, methods, threads,
lines of code, etc. The dynamic binary instrumentation tool then intercepts all memory
accesses and instructions executed and provides the Javana end user with high-level
language information for all memory accesses and natively executed instructions.
3. Daikon and DIDUCE are two of the most popular tools for invariant detection. The former is
an offline tool while the latter is an online tool. The major difference between the two is
that while Daikon generates all the invariants and then prunes them depending on a
property; DIDUCE dynamically hypothesizes invariants at each program point and only
presents those invariants which have been found to satisfy a property. Another major
difference is that Daikon collects tracing information by modifying the program abstract
syntax tree, while DIDUCE uses BCEL to instrument the class JAR files.
References
• https://en.wikipedia.org/wiki/Program_analysis
• https://www.cis.upenn.edu/~alur/CIS673/isil-plmw.pdf
• A Survey of Dynamic Program Analysis Techniques and Tools, Anjana Gosain and Ganga Sharma, 2015, Springer.
Software Verification and Validation
Selected state-of-the-art results and real-world
applications
Contents
1. Introduction
2. Method
2-1. GP Method
2-2. ML Method
2-3. Datasets
2-4. Experiment design
3. Results
4. Summary and Conclusions
5. References
1. INTRODUCTION
In this paper we provide a broad benchmarking of recent genetic programming approaches to symbolic
regression in the context of state-of-the-art machine learning approaches. Since the beginning of the field,
the genetic programming (GP) community has considered the task of symbolic regression (SR) as a basis
for methodology research and as a primary application area. GP-based SR (GPSR) has produced a number
of notable results in real-world regression applications, for example dynamical system modeling in physics ,
biology , industrial wind turbines , fluid dynamics , robotics , climate change forecasting , and financial
trading , among others. However, the most prevalent use of GPSR is in the experimental analysis of new
methods, for which SR provides a convenient platform for benchmarking. Despite this persistent use,
several shortcomings of SR benchmarking are notable. First, the GP community lacks a unified standard
for SR benchmark datasets, as noted previously [21]. Several SR benchmarks have been proposed, critiqued,
and blacklisted, leading to inconsistencies in the experimental design of papers. We contend that the lack of
focus in the GP community on rigorous benchmarking makes it hard to know how GPSR methods fit into
the broader machine learning (ML) community.
2. METHODS
We compare four recent GPSR methods and ten well-established ML regression methods in this
benchmark. In this section we briefly present the selected methods and describe the design of the experiment.
2.1 GP methods
A number of factors impacted our choice of these methods. Two key elements were open-source
implementations and ease of use. In addition, we wished to test different research thrusts in GP literature.
The four methods encompass different innovations to standard GPSR, including the incorporation of constant
optimization, semantic search drivers, and Pareto optimization. Each method is described briefly below.
Multiple regression genetic programming (MRGP). MRGP combines Lasso regression with the tree search
afforded by GP. A weight is attached to each node in each program.
ε-Lexicase selection (EPLEX). ε-lexicase selection adapts the lexicase selection method for
regression. Rather than aggregating performance on the training set into a single fitness score, EPLEX
selects parents by filtering the population through randomized orderings of training samples and removing
individuals that are not within ε of the best performance in the pool. We use the EPLEX method
implemented in Ellyn. Ellyn is a stack-based GP system written in C++ with a Python interface for use
with scikit-learn. It uses point mutation and subtree crossover. Weights in the programs are trained each
generation via stochastic hill climbing. A Pareto archive of trade-offs between mean squared error and
complexity is kept during each run, and a small internal validation fold is used to select the final model
returned by the search process.
Age-fitness Pareto Optimization (AFP). AFP is a selection scheme based on the concept of age-layered
populations introduced by Hornby. AFP introduces a new individual each generation with an age of 0.
An individual's age is updated each generation to reflect the number of generations since its oldest node
(gene) entered the population. Parent selection is random and Pareto tournaments are used for survival on
the basis of age and fitness. We use the version of AFP implemented in Ellyn, with the same settings
described above.
Geometric Semantic Genetic Programming (GSGP). GSGP is a recent method that has shown
many promising results for SR and other tasks. The main concept behind GSGP is the use of semantic
variation operators that produce offspring whose semantics lie on the vector between the semantics of the
parent and the target semantics (i.e. target labels). Use of these variation operators has the advantage of
creating a unimodal fitness landscape. On the downside, the variation operators result in exponential growth
of programs. We use the version of GSGP implemented in C++ by Castelli, which is optimized to minimize
memory usage. It is available from SourceForge.
2.2 ML methods
We use scikit-learn [26] implementations of the following methods in this study.
Linear Regression. Linear Regression is a simple regression model that minimizes the sum of the squared
errors of a linear model of the inputs. The model is defined by y = b + w^T x, where y is the dependent
variable (target), x are the explanatory variables, b and w are the intercept and slope parameters, and
the minimized function is given by (1).
Kernel Ridge. Kernel Ridge performs Ridge regression using a linear function in the space of the respective
kernel. Least squares with l2-norm regularization is applied in order to prevent overfitting. The minimized
function is given by (2), where φ is a kernel function and λ is the regularization parameter.
Least-angle regression with Lasso. Lasso (Least absolute shrinkage and selection operator) is a popular
method of regression that applies both feature selection and regularization. Similarly to Kernel Ridge, high
values of w are penalized. The use of the l1-norm on w in the minimization function (see (3)) improves the
ability to push individual weights to zero, effectively performing feature selection.
Least-angle regression with Lasso, a.k.a. Lars [10], is an efficient algorithm for producing a family of Lasso
solutions. It is able to compute the exact values of λ for new variables entering the model.
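The minimized functions (1), (2) and (3) referenced above are not reproduced in this excerpt; in their standard textbook forms they would presumably read:

```latex
% (1) Linear regression: ordinary least squares over intercept b and weights w
\min_{b,\,w} \; \sum_{i=1}^{n} \left( y_i - b - w^{T} x_i \right)^2

% (2) Kernel ridge regression: squared error in the kernel feature space
%     \varphi, with l2-norm regularization weighted by \lambda
\min_{w} \; \sum_{i=1}^{n} \left( y_i - w^{T} \varphi(x_i) \right)^2
  + \lambda \, \lVert w \rVert_2^2

% (3) Lasso: squared error with l1-norm regularization, whose penalty
%     pushes individual weights to exactly zero (feature selection)
\min_{b,\,w} \; \sum_{i=1}^{n} \left( y_i - b - w^{T} x_i \right)^2
  + \lambda \, \lVert w \rVert_1
```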
Linear SVR. Linear Support Vector Regression extends the concept of Support Vector Classifiers
(SVC) to the task of regression, i.e. to predict real values instead of classes. Its objective is to minimize an
ε-insensitive loss function with a regularization penalty (1/2||w||2) in order to improve generalization.
SGD Regression. SGD Regression implements stochastic gradient descent and is especially well
suited for larger problems with over 10,000 instances. We nevertheless include this regression method to
compare its performance on smaller datasets.
MLP Regressor. Neural networks have been applied to regression problems for almost three
decades. We include multilayer perceptrons (MLPs) as one of the benchmarked algorithms. We decided to
benchmark a neural network with a single hidden layer with a fixed number of neurons (100) and compare
different activation functions, learning functions and solvers.
AdaBoost regression. Adaptive Boosting, also called AdaBoost, is a flexible technique for
combining a set of weak learners into a single stronger regressor. By changing the distribution (i.e. weights)
of instances in the data, previously mispredicted instances are favored in consecutive iterations. The final
prediction is obtained by a weighted sum or weighted majority vote. As a result, the final regressor has
smaller prediction errors. The method is considered sensitive to outliers.
Random Forest regression. Random Forests are a very popular ensemble method based on combining
multiple decision trees into a single stronger predictor. Each tree is trained independently with a randomly
selected subset of the instances, in a process known as bootstrap aggregating or bagging. The resulting
prediction is an average of multiple predictions. Random Forests try to reduce variance by not allowing
decision trees to grow large, making them harder to overfit.
Gradient Boosting regression. Gradient Boosting is an ensemble method that is based on regression
trees. It shares the AdaBoost concept of iteratively improving the system performance on its weakest points.
In contrast to AdaBoost, the distribution of the samples remains the same. Instead, consecutively created
trees correct the errors of the previous ones. Gradient Boosting minimizes bias (not variance like in Random
Forests). In comparison to Random Forests, Gradient Boosting is sequential (thus slower), more difficult
to train, but is reported to perform better than Random Forest.
Extreme Gradient Boosting. Extreme Gradient Boosting, also known as XGBoost, incorporates
regularization into the Gradient Boosting algorithm in order to control overfitting. Its objective function
combines the optimization of training loss with model complexity. This brings the predictor closer to the
underlying distribution of the data, while encouraging simple models, which have smaller variance. Extreme
Gradient Boosting is considered a state-of-the-art method in ML.
2.3 Datasets
We pulled the benchmark datasets from the Penn Machine Learning Benchmark (PMLB) [25]
repository, which contains a large collection of standardized datasets for classification and regression
problems.
2.4 Experiment design
In order to benchmark different regression methods, an effort was made to measure performance
of each of the methods in as similar an environment as possible. First, we decided to treat each of the GP
methods as a classical ML approach and used the scikit-learn library [26] for cross validation and
hyperparameter optimization. This required some source code modifications to allow GSGP and MRGP to
communicate with the wrapper. Second, instead of reimplementing the algorithms, we relied on the original
implementations with as few modifications as possible. Wrapping each method allowed us to keep a
common benchmarking framework based on the scikit-learn functions.
Table 1. Analyzed algorithms with their parameters settings. The parameters in quotations refer to their
names in the scikit-learn implementations.
Data preprocessing. We decided to feed benchmarked algorithms with scaled data using
StandardScaler function from scikit-learn. The reason for this is our effort to keep the format of the input
data consistent across multiple algorithms for the purpose of benchmarking.
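What StandardScaler computes per feature column is simply a shift to zero mean followed by a division by the population standard deviation; a pure-Python sketch of that transformation:

```python
from statistics import fmean

def standardize(column):
    """Rescale one feature column to zero mean and unit variance,
    mirroring what scikit-learn's StandardScaler computes."""
    mean = fmean(column)
    # Population standard deviation (ddof = 0), as StandardScaler uses.
    std = fmean([(x - mean) ** 2 for x in column]) ** 0.5
    return [(x - mean) / std for x in column]

scaled = standardize([2.0, 4.0, 6.0])
```

Feeding every algorithm columns transformed this way keeps the input format consistent across methods, which is the stated purpose of the preprocessing step.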
Initialization of the algorithms.
We initially considered starting each of the methods with the same random seed, but eventually decided to
make all data splits randomly. In our belief both approaches have disadvantages: the results will either be
biased by the choice of the random seed, or by using different splits for different methods. By taking a
median of the scores we became independent of the initial split of the data.
Wrappers for the GP methods. Some modifications had to be made to each of the GP methods. For EPLEX
and AFP, an existing Python wrapper provided by Ellyn was used. For the other methods we implemented a
class derived from the scikit-learn BaseEstimator, which implements two methods: fit(), used for training
the regressor, and predict(), used for testing the performance of the regressor. The source code of MRGP
and GSGP had to be modified so that the algorithms could communicate with the wrapper.
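The wrapper contract described above can be sketched as follows. In the actual study the class derives from scikit-learn's BaseEstimator; here a plain class with the same fit()/predict() interface is shown, and the GP back end is replaced by a trivial mean predictor purely for illustration:

```python
from statistics import fmean

class GPWrapper:
    """Minimal scikit-learn-style wrapper: in the real benchmark,
    fit() would launch the external GP implementation and predict()
    would evaluate the evolved model; both are stubbed here."""

    def __init__(self, population_size=500, generations=100):
        # Hyperparameters exposed as constructor arguments, so a tuner
        # such as GridSearchCV could search over them.
        self.population_size = population_size
        self.generations = generations

    def fit(self, X, y):
        # Stand-in for running the GP search on (X, y).
        self.model_ = fmean(y)
        return self

    def predict(self, X):
        # Stand-in for evaluating the evolved symbolic model on X.
        return [self.model_ for _ in X]

reg = GPWrapper(population_size=100, generations=10)
reg.fit([[0], [1], [2]], [1.0, 2.0, 3.0])
preds = reg.predict([[5], [6]])
```

Because only fit() and predict() are required, the same cross-validation and scoring machinery can drive the GP methods and the standard ML regressors alike.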
Parameters for the algorithms. The settings of the input parameters for the algorithms were
determined based on the available recommendations for the given method, as well as previous experience
of the authors. For GP-based methods we applied from 6 to 9 different settings (mainly: population size x
number of generations and crossover and mutation rates). For the ML algorithms the number of settings
was method dependent.
3. RESULTS
The relative performance of the algorithms was determined as the ability to make the best
predictions on the training and testing data, using the mean squared error (MSE) of the samples. The
performance on the testing dataset is of primary importance, as it shows how well the methods can
generalize to previously unseen data [7]. However, we include the training comparisons as a way to assess
the predilection for overfitting among methods.
We first analyze the results for each of the regression tasks on the training data. The relative rankings of
each method in terms of MSE are presented in Fig. 2. The best training performance was obtained with
gradient boosting, which finished in the top two for the vast majority of the benchmarked datasets.
4. CONCLUSIONS
In this paper we evaluated four recent GPSR methods in comparison to ten state-of-the-art ML
methods on a set of 94 real-world regression problems. We consider hyper-parameter optimization for each
method using nested cross validation, and compare the methods in terms of the MSE they produce on
training and testing sets, and their runtime. The analysis includes some interesting results. Two of the
GP-based methods, namely EPLEX and MRGP, produce competitive results compared to state-of-the-art ML
regression approaches. The downside of the GP-based methods is their computational complexity when run
on a single thread, which contributes to much higher runtimes. Parallelism is likely to be a key factor in
allowing GP-based approaches to become competitive with leading ML methods with respect to running
times. We also should note some shortcomings of this study that motivate further analysis. First, a guiding
motivation for the use of GPSR is often its ability to produce legible symbolic models. Our analysis did not
attempt to quantify the complexity of the models produced by any of the methods. Ultimately the relative
value of explainability versus predictive power will depend on the application domain. Second, we have
considered real-world datasets as the source of our benchmarks. Simulation studies could also be used, and
have the advantage of providing ground truth about the underlying process, as well as the ability to scale
complexity or difficulty. It should also be noted that the datasets used for this study were of relatively small
sizes (up to 1,000 instances). Future work should consider larger dataset sizes, though this will come with a larger
computational burden. We have also limited our initial analysis to looking at bulk performance of algorithms
over many datasets. Further analysis of these results should provide insight into the properties of datasets
that make them amenable to, or difficult for, GP-based regression. Such an analysis can provide suggestions
for new problem sub-types that may be of interest to the GP community. We hope this study will provide
the ML community with a data-driven sense of how state-of-the-art SR methods compare broadly to other
popular ML approaches to regression.
5. References
[1] Aurum, A., Petersson, H. and Wohlin, C., “State-of-the-Art: Software Inspections
after 25 Years”, Software Testing, Verification and Reliability, 12(3):133-154, 2002.
[2] Basili, V. R. and Selby, R. W., “Comparing the Effectiveness of Software Testing Strategies”,
IEEE Transaction on Software Engineering, 13(12):1278-1296, 1987.
[3] Software Validation and Verification – a State of the Art Report (panel discussion),
ACM, New York, NY, USA, ©1978.
```