eLex Vintar

Modelling specialised knowledge with conceptual frames: the TermFrame approach to a
structured visual domain representation
Špela Vintar, Amanda Saksida, Uroš Stepišnik, Katarina Vrtovec
{spela.vintar, amanda.saksida, uros.stepisnik, katarina.vrtovec}@ff.uni-lj.si
Faculty of Arts, University of Ljubljana
Extended abstract
1. Introduction
The frame-based approach in terminology (Faber 2009; 2012) has provided a valuable new
framework for representing specialised knowledge by combining linguistic information derived from
specialised corpora with conceptual structures and by highlighting the fact that the cognitive frames
underlying specialised communication are dynamic and context- and culture-dependent. The
TermFrame project reimplements, extends and adapts this methodology to the domain of
karstology and to three languages, English, Slovene and Croatian, in order to propose a new
multi-purpose model of knowledge representation which can be used by linguists, terminologists,
experts and data scientists alike. We describe the TermFrame annotation framework, our annotated
corpora and an analysis of the conceptual frames which emerged from the semantically annotated
definitions. In particular, we focus on cross-language differences and on use scenarios for adequately
representing and visualising complex domain-specific knowledge.
2. TermFrame corpora, definition extraction and annotation framework
For the purposes of our research, we built three corpora: Slovene, English and Croatian. The corpora
contain relevant, authentic and contemporary works on karstology and are comparable in terms
of the domain and text types included. They comprise scientific texts (papers, books,
articles, doctoral and master’s theses, glossaries and dictionaries) from the field of karstology,
itself an interdisciplinary domain partly overlapping with geology, hydrology, speleology and
other fields.
Definition candidates were extracted automatically using ClowdFlows (Pollak et al. 2012), then
manually validated to retain only contexts that carry valuable explanatory information about the concept,
even if they do not comply with the traditional analytic definition structure. Since our domain of
interest is karstology, only definitions of concepts relevant to karst were considered.
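The extract-then-filter pipeline described above can be illustrated with a deliberately simplified sketch. It does not reproduce the ClowdFlows workflow: the single "X is a Y" pattern, the function names and the small term list (built from the landform examples mentioned later in this abstract) are all assumptions made for illustration, and the term-list filter merely stands in for the manual validation step.

```python
import re

# Assumed, highly simplified definition pattern: "[A/An/The] TERM is/are a/an/the ..."
DEFINITION_PATTERN = re.compile(
    r"^(?:(?:An?|The)\s+)?(?P<term>[A-Za-z][\w -]*?)\s+(?:is|are)\s+(?:an?|the)\s+(?P<rest>.+)$",
    re.IGNORECASE,
)

# Hypothetical list of karst-relevant terms used as a stand-in for manual validation.
KARST_TERMS = {"sinkhole", "karren", "grike"}


def extract_candidates(sentences):
    """Return (term, sentence) pairs for sentences matching the pattern."""
    candidates = []
    for sentence in sentences:
        match = DEFINITION_PATTERN.match(sentence.strip())
        if match:
            candidates.append((match.group("term").lower(), sentence))
    return candidates


def keep_domain_definitions(candidates, domain_terms):
    """Keep only candidates whose defined term belongs to the domain."""
    return [(term, sent) for term, sent in candidates if term in domain_terms]


# Invented example sentences, not taken from the TermFrame corpora.
sentences = [
    "A sinkhole is a closed depression formed by dissolution of limestone.",
    "A thermometer is an instrument for measuring temperature.",
    "We sampled water in the cave.",
]

candidates = extract_candidates(sentences)
karst_definitions = keep_domain_definitions(candidates, KARST_TERMS)
print([term for term, _ in karst_definitions])
```

In this toy run two sentences match the pattern, and only the one defining a karst term survives the domain filter.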
The TermFrame annotation framework was developed within an interdisciplinary team consisting of
a domain expert, a terminologist and a cognitive linguist, and in collaboration with text mining
experts. Each definition is annotated on four layers:
- Definition element with the tags DEFINIENDUM, DEFINITOR, GENUS
- Semantic category, a taxonomically organised conceptual hierarchy adapted to karstology
with the following top-level nodes: A. Landform, B. Process, C. Geome, D. Entity and E.
Instrument/Method
- Relation, to mark the part of the definition where a specific property of the definiendum is
described; our framework defines 16 relations, including HAS_FORM, HAS_CAUSE, CONTAINS,
AFFECTS and HAS_SIZE
- Relation frame, which marks the phrase that introduces a specific type of relation; this layer
is intended to facilitate machine learning of knowledge patterns.
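One way to picture the four annotation layers is as labelled character spans over a definition. The sketch below is only an illustration under stated assumptions: the class names, the layer identifiers and the example sentence are hypothetical, and it does not reflect the WebAnno export format or the project's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical identifiers for the four layers described above.
LAYERS = ("element", "category", "relation", "relation_frame")


@dataclass
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    layer: str   # one of LAYERS
    label: str   # e.g. "GENUS" or "HAS_CAUSE"


@dataclass
class AnnotatedDefinition:
    text: str
    language: str
    spans: List[Span] = field(default_factory=list)

    def annotate(self, phrase: str, layer: str, label: str) -> Span:
        """Label the first occurrence of `phrase` on the given layer."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        start = self.text.index(phrase)  # raises ValueError if the phrase is absent
        span = Span(start, start + len(phrase), layer, label)
        self.spans.append(span)
        return span

    def labels(self, layer: str) -> List[Tuple[str, str]]:
        """Return (covered text, label) pairs for one layer, in annotation order."""
        return [(self.text[s.start:s.end], s.label) for s in self.spans if s.layer == layer]


# Invented example sentence, not taken from the TermFrame corpora.
example = AnnotatedDefinition(
    text="A sinkhole is a closed depression formed by dissolution of limestone.",
    language="en",
)
example.annotate("sinkhole", "element", "DEFINIENDUM")
example.annotate("is a", "element", "DEFINITOR")
example.annotate("closed depression", "element", "GENUS")
example.annotate("sinkhole", "category", "A.1 Surface landform")
example.annotate("dissolution of limestone", "relation", "HAS_CAUSE")
example.annotate("formed by", "relation_frame", "HAS_CAUSE")

print(example.labels("element"))
```

Keeping the relation and the relation frame as separate spans mirrors the distinction made in the list: the former covers the described property, the latter the phrase that signals it.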
Figure 1: Annotated sentence in the WebAnno tool
3. Conceptual frames in karstology and their representation
On the basis of the multi-layer semantic annotation we can discern the most frequent conceptual
frames for specific semantic categories, explore cross-language differences and visualise conceptual
networks directly from the annotated corpus. In the full paper we present such frames and the
results of a quantitative analysis for English and Slovene. Table 1 below illustrates how co-occurrences
between semantic categories and relations allow us to define prototypical definition
frames. A surface landform (e.g. karren, grike, sinkhole) is typically defined by specifying its FORM,
LOCATION and CAUSE, while larger karst entities (geomes) are primarily defined by enumerating
the typical features they might contain.
A.1 Surface landform 2 1
A.2 Underground landform
C. Geome
affects 2
composed_of
contains 9
defined_as
has_attribute 3 1
has_cause 19 8 2
has_form 37 7 5
has_function 1 4 3
has_location 33 9 6
has_position 1
has_size 7 3
has_result
measures
studies
occurs_in_medium 3 3 1
occurs_in_time 2 2
TOTAL 108 37 23
Table 1: Co-occurrences between semantic categories and semantic relations for a sample of our
corpus
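Counting co-occurrences between semantic categories and relations, and reading off the most frequent relations per category as a prototypical frame, can be sketched as follows. The records are toy data invented for illustration and are not the counts reported in Table 1; the function names and the choice of the three most frequent relations are likewise assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (semantic category of the definiendum,
# relations annotated in one definition). Not real corpus data.
annotated_definitions = [
    ("A.1 Surface landform", ["HAS_FORM", "HAS_LOCATION", "HAS_CAUSE"]),
    ("A.1 Surface landform", ["HAS_FORM", "HAS_LOCATION"]),
    ("A.1 Surface landform", ["HAS_FORM"]),
    ("C. Geome", ["CONTAINS", "HAS_LOCATION"]),
    ("C. Geome", ["CONTAINS"]),
]


def cooccurrences(records):
    """Build a category -> Counter(relation) co-occurrence table."""
    table = defaultdict(Counter)
    for category, relations in records:
        table[category].update(relations)
    return table


def prototypical_frame(table, category, top_n=3):
    """Return the top_n most frequent relations for a category."""
    return [relation for relation, _ in table[category].most_common(top_n)]


table = cooccurrences(annotated_definitions)
print(prototypical_frame(table, "A.1 Surface landform"))
print(prototypical_frame(table, "C. Geome", top_n=1))
```

With these toy records the surface landform frame comes out as form, location and cause, and the geome frame is dominated by CONTAINS, which is the kind of pattern the paragraph above describes.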
References
Faber, Pamela (2009): The Cognitive Shift in Terminology and Specialized Translation. MonTI.
Monografías de Traducción e Interpretación. 1: 107-134.
Faber, Pamela, Ed. (2012): A Cognitive Linguistics View of Terminology and Specialized Language.
Berlin, Boston: De Gruyter Mouton.
Faber, Pamela, León-Araúz, Pilar, Reimerink, A. (2016): EcoLexicon: new features and challenges.
GLOBALEX, 73-80.
Pollak, Senja, Vavpetič, Anže, Kranjc, Janez, Lavrač, Nada, Vintar, Špela (2012): NLP workflow for on-line
definition extraction from English and Slovene text corpora. In: Jancsary, Jeremy (ed.): Empirical
methods: proceedings of the Conference on Natural Language Processing 2012, 11th Conference on
Natural Language Processing (KONVENS), September 19-21, 2012, Vienna, Austria (Scientific series
of the ÖGAI, volume 5). Wien: ÖGAI, Österreichische Gesellschaft für Artificial Intelligence,
53-60.