Mastering noise and silence in learner answers processing Simple

The ExoGen system
Towards a low cost strategy
An empirical approach based on the following principles:
Identifying the applications which allow the user to keep some leeway in interpreting results (partial analyses,
unsolved ambiguities, etc.) machine aided correction, comprehension aids, activity generators, content-
oriented tools
Implementing first the most basic and reliable NLP techniques such as tokenization, POS tagging, lemmatization,
morphological analysis.
Mastering, from the end-user (i.e. didactic) point of view, the short comings of Natural Language Processing. For
instance, in the context of an activity, the knowledge about the expected answer (EA) may yield additional data for
the given answer (GA) analysis.
When ambiguities remain, multiple analyses may be integrated into the learning process, in order to help users
(teachers or learners) to make the right decisions.
Developing a modular and declarative approach designed for resources and processes reusability, and allowing
end-users to define by themselves the relevant knowledge and parameters.
Learner answers analysis
Perspectives
Completion of lesser difference analysis: integration of a wordnet or a thesaurus (semantic distance between lemmas)
Context analysis in order to disambiguate more precisely (depending on triangulation EA/GA/Context)
Definition of declarative rules to design a diagnosis process based on the lesser difference analysis (detection/description level). These rules should be applicable even in case of
residual ambiguity (e.g. suggestions, hypothesis, more general diagnosis,...)
Experimentation (work in progress): past participle agreement errors analysis in perfect tense (“passé composé”). Evaluation with end-users: French as a Foreign Language teachers /
learners
Mastering noise and silence in learner answers processing
Simple techniques for analysis and diagnosis
Olivier Kraif, Claude Ponton, Alexia Blanchard
LIDILEM Laboratory, Stendhal University, Grenoble, France
{olivier.kraif; claude.ponton; alexia.blanchard}@u-grenoble3.fr
Contextual
knowledge
Enriched
expected
answer
Diagnosed
production
Feedback
generation
Generic NLP
processes
Lemmatization
POS tagging
Morphological
analysis
Didactic
knowledge
Activity
Learner
production
(GA)
Enriched
learner
production
Error
description Annotation
Detection/
description
Diagnosis Specific NLP
processes
(triangulation)
Expected
Answer
(EA)
Learner
ExoGen
Examples of error Description (automatically generated)
() avant de retourner [arriver] en Angleterre Forme grammaticalement correcte (verbe infinitif),
mais on attendait une autre forme
et beaucoup d’échafaide chafaudages] Orthographe erronée ou mot inconnu du dictionnaire
Je dois me dépécher [dépêcher] Orthographe erronée : problème d’accent
() sommes bien amusées et c’est vrai [juste] de dire que nous avons
dansé assez bien
Forme grammaticalement correcte (adjectif ou adverbe ou nom mascu-
lin singulier), mais on attendait une autre forme
C’était désespéré [désespérant] mais c’était la seule chance ()
S’il s’agit du verbe sespérer :
Cas 1 [Masculin singulier] : On attend un participe présent et non un
participe passé
Pour moi l’ [cette] image crée une ambiance délassante Forme grammaticalement correcte sur le plan de la catégorie
(déterminant), mais on attendait une autre forme avec d’autres traits
Le Premier ministre reste toujours un britannique [Britannique] Exact, mais il faut une majuscule à l’initiale
Legend : Error found [correction]
Examples -
Frida corpora (Granger, 2001)
General principle
Some facts:
In most systems, analysis is only made by testing
character string identity
NLP techniques in the field of CALL are underused
due to:
the lack of reliability (noise, erroneous analyses)
the high cost of implementation
Lack of systematic follow up on experiments
Overambitious and hardly attainable goals
A real need for high quality feedback but4
Some hopes:
Error detection alone may be a valuable step towards
didactic use
Some straightforward and basic NLP techniques are
reliable enough
To cope with the lack of reliability, it is possible to put
forward "Computer Aided" approaches rather than
"Automatized" processes (correction, evaluation,
feedback generation, activity generation, etc.)
Simplification of triangulation :
The analysis is reduced to a comparison between EA and GA (no contextual analysis).
Resource: online inflected forms dictionary (http://abu.cnam.fr/)
glace glacer Ver:IPre+SG+P1:IPre+SG+P3:SPre+SG+P1:SPre+SG+P3:ImPre+SG+P2
glacé glacer Ver:PPas+Mas+SG
glacent glacer Ver:IPre+PL+P3:SPre+PL+P3
glacera glacer Ver:IFut+SG+P3
glaceraient glacer Ver:CPre+PL+P3
Analysis principle : Lesser difference heuristic, the analysis is guided by similarities
between potential tags of both EA and GA
EA: si j'avais su Category : Ver Tags : IImp+SG+P1 or IImp+SG+P2
GA: si j'aurais su Category : Ver Tags : CPre+SG+P1 or CPre+SG+P2
Common tags : Ver+SG Disambiguated difference: IImp ¹ CPre Not disambiguated : P1 or P2
Evaluation of error descriptions
All cases Non ambiguous Totally
disambiguated
Partially
disambiguated
Not
disambiguated
Correct 312 187 104 14 7
Incorrect 6 1 5 0 0
Precision 0,981 0,995 0,954 1 1
Forthcoming: integration of a morphological analyzer (Blanchard, 2007)
Aim: morphological analysis of unknown forms (paradigm confusion)
General principle: segmentation of inflected forms into a [base form + inflection(s)] which are
interpreted linguistically
1. Integration into generic NLP processes in order to reduce numbers of unknown forms and
therefore to generate an analysis
2. Modifying tree analysis with checking inflectional model
Example
GA: attitudent Category: N Tags: fem,plu Model: inflection [-ent] (plu)
EA: attitudes Category: N Tags: fem,plu Model: inflection [-s] (plu)
This analysis allows description of “attitudent” as flexional error on plural
E A = G A
a fte r g r a p h ic a l
n o r m a lis a tio n
G A = u n k n o w n
fa ls e
tr u e
G A c lo s e to E A
tru e fa ls e
G A a n d E A s h a r e
th e s a m e le m m a
G A a n d E A s h a re
th e s a m e c a te g o ry
G A a n d E A s h a r e
th e s a m e c a te g o ry
tru e fa ls e
tru e fa ls e
G A = E A
e x c e p t d ia c r itic s
A fo r m c lo s e to G A
e x is ts in th e le x ic o n
tru e fa ls e
D ia c r itic
d iffe re n c e s
O rth o g ra p h ic a l
d iffe re n c e
tru e fa ls e
U n k n o w n
fo r m
O rth o g ra p h ic a l
d iffe re n c e :
lis tin g o f th e
n e a re s t fo r m s
tru e
T a g
d iffe re n c e s
fa ls e
T a g a n d
c a te g o r y
d iffe re n c e s
tru e
fa ls e
fa ls e
tru e
G A a n d E A
s h a re th e s a m e ta g s
L e m m a
d iffe re n c e s L e m m a a n d
ta g d iffe r e n c e s
L e m m a , ta g
a n d c a te g o ry
d iffe re n c e s
C a s e ,
s p a c in g ,...
d iffe re n c e s
e.g.
"dépécher"
instead of
"dépêcher"
e.g.
"comtempler"
instead of
"contempler"
e.g.
"CEE"
instead of
"C.E.E."
e.g.
"échafaide"
instead of
"échaffaudage"
e.g.
"égales"
instead of
"égaux"
e.g.
"considère"
instead of
"considérer"
e.g.
"prennent"
instead of
"saisissent"
e . g .
" s o u f f r o n s "
i n s t e a d o f
" s u b i r o n s "
e . g .
" m i e u x "
i n s t e a d o f
" p r é f é r a b l e s "
1 / 2 100%

Mastering noise and silence in learner answers processing Simple

La catégorie de ce document est-elle correcte?
Merci pour votre participation!

Faire une suggestion

Avez-vous trouvé des erreurs dans linterface ou les textes ? Ou savez-vous comment améliorer linterface utilisateur de StudyLib ? Nhésitez pas à envoyer vos suggestions. Cest très important pour nous !