Manuscript Number: EVOS-D-20-00113
Article Title: A novel approach for facial expression recognition based on genetic algorithm
Journal: Evolving Systems
Authors: A. Boughida, M.N. Kouahla, Y. Lafifi

Response to reviewer #1

We appreciate the reviewer’s helpful comments and wish to thank him for the time spent on our paper.

Reviewer’s comment #1
English expressions need polishing. For instance, "The stop stopping chosen is when the number of generations…" does not make sense to me.

Author’s answer
We have taken these remarks into account; a native English speaker reviewed the grammar. For instance, “The stop stopping” becomes “The stop criteria”.

Reviewer’s comment #2
P12L33 "The mutation rate is usually chosen to be 1/m [], where m is the length of the chromosome." I think that it misses a citation.

Author’s answer
Yes, we simply forgot to insert the citation for the mutation rate used in the genetic algorithm. After insertion of the citation, the text becomes: "The mutation rate is usually chosen to be 1/m [52], where m is the length of the chromosome." The added reference is:
[52] Ghodrat Moghadampour. Outperforming mutation operator with random building block operator in genetic algorithms. In International Conference on Enterprise Information Systems, pages 178–192. Springer, 2011.

Reviewer’s comment #3
In 5.1 the authors state that they performed a random train-test split with 0.75/0.25. Since the samples are below 1000, why did they not perform cross-validation instead of keeping a hold-out dataset?

Author’s answer
Most of the related works apply the cross-validation strategy directly to the CK, CK+, and JAFFE databases. That is why we redid the whole experiment with 10-fold cross-validation. Nevertheless, some work uses the hold-out strategy on the same databases, such as (Yang et al., 2010), where the authors randomly divided the data into a 66% train set and a 33% test set.
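For illustration, a random hold-out split of this kind can be sketched as follows (a minimal example with scikit-learn; the arrays are dummy placeholders for our feature vectors and labels, not our actual data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the extracted feature vectors and expression labels.
X = np.random.rand(120, 40)       # 120 samples, 40 features
y = np.repeat(np.arange(6), 20)   # 6 expression classes, 20 samples each

# Random hold-out split: 75% train / 25% test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)
print(X_train.shape[0], X_test.shape[0])  # 90 30
```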
We therefore decided to preserve some results with the hold-out strategy (the results of Table 6 become Table 9).

PS: Since the use of 10-fold cross-validation required redoing the whole experiment, some values changed: the population size becomes 200 instead of 80 (see the 4th experiment). We were also forced to set the maximum number of iterations of the genetic algorithm to 200 instead of 100, because the algorithm does not converge within 100 generations in this new experiment. The downside of 10-fold cross-validation is its high complexity compared to hold-out, because for each chromosome the learning is repeated 10 times. We notice that the results are very close to those of hold-out, with no decisive advantage.

Modification of paragraph: section 5.2, page 14.
Addition of paragraph: section 5.2, 2nd experiment, page 17.

Reviewer’s comment #4
Using a single number, we cannot conclude on the significance of the results of Table 6.

Author’s answer
To assess the significance of the results of Table 6 (renamed Table 7), we can compare the results of the GA-based method and the randomized-search-based method using not only the recognition rate but also the number of features used (a strong point of our contribution). For this, we have added Table 8 (a new table), which compares the size of the feature vector of the two methods. For the method based on Randomized Search, the size of the feature vector is directly the number of features after reduction with PCA, unlike our method, which further optimizes the feature vector.

Modification of paragraph: section 5.2, 2nd experiment, page 16.
Addition of table 8: section 5.2, 2nd experiment, page 17.

Reviewer’s comment #5
Did the authors perform stratification when splitting the dataset into train/test splits?

Author’s answer
Yes, we used stratification in k-fold cross-validation, to preserve in each fold the same proportions of examples per class as in the original dataset.

Modification of paragraph: section 5.2, page 14.
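For illustration, the stratified 10-fold cross-validation described in the answers above can be sketched as follows (the SVM settings and dummy data are placeholders, not our exact pipeline):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Dummy stand-ins for the Gabor+PCA feature vectors and expression labels.
rng = np.random.default_rng(0)
X = rng.random((140, 30))
y = np.repeat(np.arange(7), 20)   # 7 expression classes, 20 samples each

# Stratified folds keep the per-class proportions of the full dataset.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(SVC(kernel="rbf", C=1.0, gamma="scale"), X, y, cv=skf)
print(scores.mean())  # mean accuracy over the 10 folds
```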
Reviewer’s comment #6
What is the distribution of each expression in each dataset?

Author’s answer
The distribution of each expression in each dataset is as follows:
CK+: Angry (An) 45, Contempt (Co) 18, Disgust (Di) 59, Fear (Fe) 25, Happy (Ha) 69, Sadness (Sa) 28, Surprise (Su) 83.
CK: Angry (An) 101, Disgust (Di) 20, Fear (Fe) 33, Happy (Ha) 112, Sadness (Sa) 150, Surprise (Su) 70.
JAFFE: Angry (An) 30, Disgust (Di) 29, Fear (Fe) 32, Happy (Ha) 31, Neutral (Ne) 30, Sadness (Sa) 31, Surprise (Su) 30.
We have added these distributions in Table 4 (page 13, section 5.1), which describes the distribution of each expression in each of the three datasets: CK, CK+, and JAFFE.

Reviewer’s comment #7
The authors used accuracy everywhere. Assuming that we are interested in all expressions, accuracy makes sense if the distribution of the expressions is similar for each class; otherwise, a metric that takes the class imbalance into account should be employed.

Author’s answer
We can compare the recognition rates on the three databases by adding the F1-score metric, which takes the imbalance problem into account. We noticed that accuracy ≈ F1-score for the three databases (for example, for CK+, F1-score = accuracy = 94.20%). Therefore, the use of accuracy is not a problem. We also decided to keep accuracy to compare our recognition rate with that of related work.

Addition of 2 paragraphs: section 5.2, pages 14 and 15.

Reviewer’s comment #8
In 5.2.1, in the first experiment, the authors compare their proposed approach against the grid search approach using predefined values. This is not adequate and on its own does not show the superiority of their genetic algorithm. For a fair comparison they should compare against the random search approach, where for the same number of iterations (100) the hyperparameters are sampled at random from a uniform distribution.

Author’s answer
We replaced grid search with Randomized Search.
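For illustration, such a randomized search over the SVM's C and gamma can be sketched as follows (the parameter ranges, iteration count, and dummy data are illustrative, not our exact settings):

```python
import numpy as np
from scipy.stats import uniform
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Dummy stand-ins for the feature vectors and expression labels.
rng = np.random.default_rng(1)
X = rng.random((100, 20))
y = np.repeat(np.arange(5), 20)   # 5 classes, 20 samples each

# C and gamma are sampled uniformly at random; n_iter candidate pairs are tried.
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={"C": uniform(0.1, 100.0),      # C in [0.1, 100.1]
                         "gamma": uniform(1e-4, 1.0)},  # gamma in [1e-4, ~1.0]
    n_iter=20,          # kept small here for illustration
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)  # the (C, gamma) pair with the best mean accuracy
```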
It allows the C/gamma pairs to be chosen at random, and it stops at iteration 100. The chosen pair is the one that gives the best accuracy.
PS: As noted above, we changed the maximum number of iterations to 200.

Modification of paragraph: section 5.2, 2nd experiment, page 16.

Reviewer’s comment #9
I think that Table 5 is redundant.

Author’s answer
It is not very important for the comparison between our approach and Randomized Search (we replaced grid search with Randomized Search). It would only allow comparing each fold of the GA-based method with that of the randomized search. That is why we deleted it.

Deletion of Table 5, page 15 (old paper).

Reviewer’s comment #10
Why did the authors not include the Gabor filter parameters inside the chromosome of their genetic algorithm?

Author’s answer
We plan to enrich this genetic system with the Gabor parameters in a separate work, but several challenges are noted:
- the encoding of the Gabor parameters in the chromosome;
- the increase in the complexity of the genetic system, because for each chromosome we would not only select the features and train the model, but also extract the features;
- the choice of the range of values for each of these parameters.

Addition of paragraph: section 6, page 20.

------------------------------------------------------------------
Dear Reviewer,
We have done our best to answer all your remarks and comments. We would like to thank you for the time you spent reviewing our paper. We hope that, in our corrections, we have followed your advice and done what you expected us to do. Thank you for your help.
Adel BOUGHIDA
--------------------------------------------------------------------