Mastering noise and silence in learner answers processing: Simple techniques for analysis and diagnosis

Olivier Kraif, Claude Ponton, Alexia Blanchard
LIDILEM Laboratory, Stendhal University, Grenoble, France
{olivier.kraif; claude.ponton; alexia.blanchard}@u-grenoble3.fr

■ Learner answers analysis

Error description

There is a real need for high-quality feedback, but:

Some facts:
♦ In most systems, analysis consists only of testing character-string identity.
♦ NLP techniques are underused in the field of CALL because of:
  ∗ their lack of reliability (noise, erroneous analyses);
  ∗ the high cost of implementation.
♦ Experiments lack systematic follow-up.
♦ Goals are often overambitious and hardly attainable.

Some hopes:
♦ Error detection alone may be a valuable step towards didactic use.
♦ Some straightforward, basic NLP techniques are reliable enough.
♦ To cope with the lack of reliability, "computer-aided" approaches can be put forward rather than "automated" processes (correction, evaluation, feedback generation, activity generation, etc.).

[Figure: error description pipeline. Labels: Generic NLP processes; Annotation; Enriched learner production; Enriched expected answer; Specific NLP processes (triangulation); Detection/description; Diagnosis; Diagnosed production.]

Towards a low-cost strategy

An empirical approach based on the following principles:
• Identifying the applications that leave the user some leeway in interpreting results (partial analyses, unresolved ambiguities, etc.) ⇒ machine-aided correction, comprehension aids, activity generators, content-oriented tools.
• Implementing first the most basic and reliable NLP techniques, such as tokenization, POS tagging, lemmatization and morphological analysis.
• Mastering the shortcomings of Natural Language Processing from the end-user's (i.e. didactic) point of view. For instance, in the context of an activity, knowledge of the expected answer (EA) may provide additional data for analysing the given answer (GA).
• When ambiguities remain, multiple analyses may be integrated into the learning process in order to help users (teachers or learners) make the right decisions.
• Developing a modular and declarative approach designed for the reusability of resources and processes, and allowing end-users to define the relevant knowledge and parameters themselves.

■ The ExoGen system

General principle

[Figure: ExoGen architecture. Labels: Learner; Learner production (GA); Expected answer (EA); Lemmatization; POS tagging; Morphological analysis; Contextual knowledge; Activity; Didactic knowledge; Feedback generation; ExoGen.]

Examples

- Simplification of triangulation: the analysis is reduced to a comparison between EA and GA (no contextual analysis).
- Resource: online dictionary of inflected forms (http://abu.cnam.fr/). Sample entries (form, lemma, category:readings):

glace          glacer   Ver:IPre+SG+P1:IPre+SG+P3:SPre+SG+P1:SPre+SG+P3:ImPre+SG+P2
glacé          glacer   Ver:PPas+Mas+SG
glacent        glacer   Ver:IPre+PL+P3:SPre+PL+P3
glacera        glacer   Ver:IFut+SG+P3
glaceraient    glacer   Ver:CPre+PL+P3

- Analysis principle: lesser difference heuristic. The analysis is guided by the similarities between the potential tags of EA and GA.

EA: si j'avais su     "avais"    Category: Ver    Tags: IImp+SG+P1 or IImp+SG+P2
GA: si j'aurais su    "aurais"   Category: Ver    Tags: CPre+SG+P1 or CPre+SG+P2
Common tags: Ver+SG
Disambiguated difference: IImp ≠ CPre
Not disambiguated: P1 or P2

Examples of error descriptions (automatically generated)
Legend: error found [correction]

• avant de retourner [arriver] en Angleterre
  Grammatically correct form (infinitive verb), but another form was expected.
• et beaucoup d'échafaide [échafaudages]
  Misspelling, or word unknown to the dictionary.
• Je dois me dépécher [dépêcher]
  Misspelling: accent problem.
• sommes bien amusées et c'est vrai [juste] de dire que nous avons dansé assez bien
  Grammatically correct form (adjective, adverb or masculine singular noun), but another form was expected.
• C'était désespéré [désespérant] mais c'était la seule chance
  If this is the verb désespérer, case 1 [masculine singular]: a present participle is expected, not a past participle.
• Pour moi l' [cette] image crée une ambiance délassante
  Grammatically correct form as regards the category (determiner), but another form with different features was expected.
• Le Premier ministre reste toujours un britannique [Britannique]
  Correct, but an initial capital letter is required.

Evaluation of error descriptions

              All cases   Non-ambiguous   Totally disambiguated   Partially disambiguated   Not disambiguated
Correct          312           187                 104                       14                       7
Incorrect          6             1                   5                        0                       0
Precision      0.981         0.995               0.954                        1                       1

Data: Frida corpora (Granger, 2001).

[Figure: decision tree describing the difference between GA and EA. Tests: EA = GA after graphical normalisation; GA = EA except diacritics; GA unknown; GA close to EA; a form close to GA exists in the lexicon; GA and EA share the same lemma; GA and EA share the same category; GA and EA share the same tags. Outcomes: case, spacing, etc. differences; diacritic differences; orthographical difference; orthographical difference with a listing of the nearest forms; unknown form; tag differences; tag and category differences; lemma differences; lemma and tag differences; lemma, tag and category differences. Examples (GA instead of EA): "CEE" instead of "C.E.E.", "dépécher" instead of "dépêcher", "comtempler" instead of "contempler", "échafaide" instead of "échafaudage", "égales" instead of "égaux", "considère" instead of "considérer", "prennent" instead of "saisissent", "souffrons" instead of "subirons", "mieux" instead of "préférables".]

■ Perspectives

Forthcoming: integration of a morphological analyzer (Blanchard, 2007).
Aim: morphological analysis of unknown forms (paradigm confusion).
General principle: segmentation of inflected forms into [base form + inflection(s)], which are interpreted linguistically.
1. Integration into generic NLP processes, in order to reduce the number of unknown forms and thus produce an analysis.
2. Modifying the tree analysis by checking the inflectional model.

Example
GA: attitudent    Category: N    Tags: fem, plu    Model: inflection [-ent] (plu)
EA: attitudes     Category: N    Tags: fem, plu    Model: inflection [-s] (plu)
This analysis makes it possible to describe "attitudent" as an inflectional error on the plural.

• Completion of the lesser difference analysis: integration of a wordnet or a thesaurus (semantic distance between lemmas).
• Context analysis, in order to disambiguate more precisely (triangulation EA/GA/context).
• Definition of declarative rules to design a diagnosis process based on the lesser difference analysis (detection/description level). These rules should remain applicable even when some ambiguity remains (e.g. suggestions, hypotheses, more general diagnoses).
• Experimentation (work in progress): analysis of past participle agreement errors in the perfect tense ("passé composé").
• Evaluation with end-users: French as a Foreign Language teachers and learners.
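The lesser difference heuristic from the ExoGen section can be illustrated on the "si j'avais su" / "si j'aurais su" example. This is a minimal sketch under stated assumptions, not ExoGen's code. `parse_entry` and `lesser_difference` are hypothetical helpers. The whitespace-separated entry layout is inferred from the sample entries. The "avais" and "aurais" entries were written by hand in that format rather than copied from the dictionary.

The sketch compares every pair of candidate readings and keeps the pairs with the fewest differing features. Features shared by every kept pair are the common tags. Features that differ in every kept pair form the disambiguated difference. Anything else stays ambiguous.

```python
from itertools import product


def parse_entry(line):
    """Split 'form lemma Category:reading:reading...' into its parts.

    Assumption: whitespace-separated fields and '+'-joined features, as in the
    sample entries; the actual dictionary file layout may differ.
    """
    form, lemma, analysis = line.split()
    category, *readings = analysis.split(":")
    return form, lemma, category, [frozenset(r.split("+")) for r in readings]


def lesser_difference(ea_readings, ga_readings):
    """Compare the candidate readings of EA and GA, keeping only the closest pairs."""
    pairs = list(product(ea_readings, ga_readings))
    best = min(len(e ^ g) for e, g in pairs)  # fewest differing features
    closest = [(e, g) for e, g in pairs if len(e ^ g) == best]
    # Shared by EA and GA in every closest pair.
    common = frozenset.intersection(*(e & g for e, g in closest))
    # Differ between EA and GA in every closest pair: the disambiguated difference.
    ea_only = frozenset.intersection(*(e - g for e, g in closest))
    ga_only = frozenset.intersection(*(g - e for e, g in closest))
    # Everything else varies from one closest pair to another: left ambiguous.
    seen = frozenset.union(*(e | g for e, g in closest))
    unresolved = seen - common - ea_only - ga_only
    return common, ea_only, ga_only, unresolved


# Hand-written entries in the sample format (EA: "avais", GA: "aurais").
ea = parse_entry("avais avoir Ver:IImp+SG+P1:IImp+SG+P2")
ga = parse_entry("aurais avoir Ver:CPre+SG+P1:CPre+SG+P2")
common, ea_only, ga_only, unresolved = lesser_difference(ea[3], ga[3])
print("same category:", ea[2] == ga[2])
print("common tags:", sorted(common))
print("difference:", sorted(ea_only), "vs", sorted(ga_only))
print("not disambiguated:", sorted(unresolved))
```

On these two entries the sketch reproduces the poster's analysis. The category matches and SG is the only common feature, giving Ver+SG. IImp ≠ CPre is the disambiguated difference, and P1 or P2 is left unresolved.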
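The decision tree in the ExoGen section chains simple string and lexicon tests. The sketch below runs the tests named in the figure in one plausible order, but the exact branching of the original tree cannot be recovered from the poster. Several details are assumptions: the test order, the 0.8 similarity threshold, the graphical normalisation (ignoring case, spaces and dots) and the toy `LEXICON`, which holds one simplified reading per form. Closeness is measured with Python's standard `difflib`, which is a stand-in and not necessarily the measure ExoGen uses.

```python
import difflib
import unicodedata

# Hypothetical toy lexicon: form -> (lemma, category, tags), one reading per form.
LEXICON = {
    "égales": ("égal", "Adj", frozenset({"Fem", "PL"})),
    "égaux": ("égal", "Adj", frozenset({"Mas", "PL"})),
    "prennent": ("prendre", "Ver", frozenset({"IPre", "PL", "P3"})),
    "saisissent": ("saisir", "Ver", frozenset({"IPre", "PL", "P3"})),
    "souffrons": ("souffrir", "Ver", frozenset({"IPre", "PL", "P1"})),
    "subirons": ("subir", "Ver", frozenset({"IFut", "PL", "P1"})),
    "mieux": ("mieux", "Adv", frozenset()),
    "préférables": ("préférable", "Adj", frozenset({"PL"})),
    "contempler": ("contempler", "Ver", frozenset({"Inf"})),
}
CLOSE = 0.8  # assumed similarity threshold for "close" forms


def graphical_normalise(s):
    # Assumed normalisation: ignore case, spaces and dots.
    return "".join(ch for ch in s.lower() if ch not in " .")


def strip_diacritics(s):
    # Decompose accented letters (NFD), then drop the combining marks.
    return "".join(ch for ch in unicodedata.normalize("NFD", s) if not unicodedata.combining(ch))


def describe(ga, ea, lexicon=LEXICON):
    if graphical_normalise(ga) == graphical_normalise(ea):
        return "case, spacing, ... differences"
    if strip_diacritics(ga) == strip_diacritics(ea):
        return "diacritic differences"
    if ga not in lexicon:  # GA is an unknown form
        if difflib.SequenceMatcher(None, ga, ea).ratio() >= CLOSE:
            return "orthographical difference"
        nearest = difflib.get_close_matches(ga, list(lexicon), n=3, cutoff=CLOSE)
        if nearest:
            return "orthographical difference, nearest forms: " + ", ".join(nearest)
        return "unknown form"
    # Both forms known (EA assumed to be in the lexicon): compare lemma, category, tags.
    ga_lemma, ga_cat, ga_tags = lexicon[ga]
    ea_lemma, ea_cat, ea_tags = lexicon[ea]
    if ga_lemma == ea_lemma:
        return "tag differences" if ga_cat == ea_cat else "tag and category differences"
    if ga_cat != ea_cat:
        return "lemma, tag and category differences"
    return "lemma differences" if ga_tags == ea_tags else "lemma and tag differences"


for ga_form, ea_form in [("CEE", "C.E.E."), ("dépécher", "dépêcher"),
                         ("comtempler", "contempler"), ("prennent", "saisissent"),
                         ("souffrons", "subirons"), ("mieux", "préférables")]:
    print(ga_form, "->", describe(ga_form, ea_form))
```

Putting the cheap string tests first means that lexicon lookups and similarity scores are only computed when simpler explanations have been ruled out.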
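The [base form + inflection] segmentation planned for the morphological analyzer can be sketched on the "attitudent" example. This is a toy under stated assumptions, not the analyzer of Blanchard (2007). `NOUN_STEMS`, `PLURAL_SUFFIXES`, `segment` and `describe_inflection` are hypothetical names. The suffix list covers only a few plural endings. Labelling -ent as a verbal ending is an interpretation of "paradigm confusion".

```python
# Hypothetical resources: noun stems with their gender, and plural endings
# labelled with the paradigm they normally belong to.
NOUN_STEMS = {"attitude": "fem"}
PLURAL_SUFFIXES = [("ent", "verbal"), ("s", "nominal"), ("x", "nominal")]


def segment(form):
    """Split a form into [base form + plural inflection], or return None."""
    for suffix, paradigm in PLURAL_SUFFIXES:
        if not form.endswith(suffix):
            continue
        base = form[: -len(suffix)]
        # Also try restoring a final -e dropped before the suffix (attitud + ent).
        for stem in (base, base + "e"):
            if stem in NOUN_STEMS:
                return {
                    "base": stem,
                    "category": "N",
                    "tags": (NOUN_STEMS[stem], "plu"),
                    "inflection": "-" + suffix,
                    "paradigm": paradigm,
                }
    return None


def describe_inflection(ga, ea):
    """Report same base and tags but a different inflection model, else None."""
    g, e = segment(ga), segment(ea)
    if g and e and g["base"] == e["base"] and g["tags"] == e["tags"] \
            and g["inflection"] != e["inflection"]:
        return (f'inflectional error on plural: "{ga}" uses {g["inflection"]} '
                f'({g["paradigm"]} ending) instead of {e["inflection"]}')
    return None


print(segment("attitudent"))
print(describe_inflection("attitudent", "attitudes"))
```

Both forms receive the same category (N) and tags (fem, plu), as on the poster. Only the inflection model differs (-ent vs -s), so the error can be described as inflectional rather than as an unknown form.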