Modelling specialised knowledge with conceptual frames: the TermFrame approach to a structured visual domain representation

Špela Vintar, Amanda Saksida, Uroš Stepišnik, Katarina Vrtovec
{spela.vintar, amanda.saksida, uros.stepisnik, katarina.vrtovec}@ff.uni-lj.si
Faculty of Arts, University of Ljubljana

Extended abstract

1. Introduction

The frame-based approach in terminology (Faber 2009; 2012) has provided a valuable new framework for representing specialised knowledge. It combines linguistic information derived from specialised corpora with conceptual structures, and it highlights the fact that the cognitive frames underlying specialised communication are dynamic and context- and culture-dependent. The TermFrame project reimplements, extends and adapts this methodology to the domain of karstology and to three languages (English, Slovene and Croatian) in order to propose a new multi-purpose model of knowledge representation which can be used by linguists, terminologists, domain experts and data scientists alike. We describe the TermFrame annotation framework, our annotated corpora and an analysis of the conceptual frames which emerged from the semantically annotated definitions. In particular, we focus on cross-language differences and on usage scenarios for adequately representing and visualising complex domain-specific knowledge.

2. TermFrame corpora, definition extraction and annotation framework

For the purposes of our research, we built three corpora: Slovene, English and Croatian. The corpora contain relevant, authentic and contemporary works on karstology and are comparable in terms of domain and the text types included. They comprise scientific texts (scientific papers, books, articles, doctoral and master’s theses, glossaries and dictionaries) from the field of karstology, which is itself an interdisciplinary domain partly overlapping with geology, hydrology, speleology and other fields.

Definition candidates were extracted automatically using ClowdFlows (Pollak et al. 2012) and then manually validated to retain only contexts that provide valuable explanatory information about the concept, even if they do not comply with the traditional analytic definition structure. Since our domain of interest is karstology, only definitions of concepts relevant to karst were considered.

The TermFrame annotation framework was developed within an interdisciplinary team consisting of a domain expert, a terminologist and a cognitive linguist, and in collaboration with text mining experts. Each definition is annotated on four layers:
- Definition element, with the tags DEFINIENDUM, DEFINITOR and GENUS.
- Semantic category, a taxonomically organised conceptual hierarchy adapted to karstology with the following top-level nodes: A. Landform, B. Process, C. Geome, D. Entity and E. Instrument/Method.
- Relation, which marks the part of the definition where a specific property of the definiendum is described; our framework defines 16 relations such as HAS_FORM, HAS_CAUSE, CONTAINS, AFFECTS, HAS_SIZE etc.
- Relation frame, which marks the phrase that introduces a specific type of relation; this layer is intended to facilitate machine learning of knowledge patterns.

Figure 1: Annotated sentence in the WebAnno tool

3. Conceptual frames in karstology and their representation

On the basis of the multi-layer semantic annotation we can discern the most frequent conceptual frames for specific semantic categories, explore cross-language differences and visualise conceptual networks directly from the annotated corpus. In the full paper we present such frames and the results of a quantitative analysis for English and Slovene. Table 1 below illustrates how co-occurrences between semantic categories and relations allow us to define prototypical definition frames. A surface landform (e.g.
karren, grike, sinkhole) will be defined by specifying its FORM, LOCATION and CAUSE, while larger karst entities (geomes) are primarily defined by enumerating the typical features they might contain.

A.1 Surface landform 2 1 A.2 Underground landform C. Geome
affects 2
composed_of
contains 9
defined_as
has_attribute 3 1
has_cause 19 8 2
has_form 37 7 5
has_function 1 4 3
has_location 33 9 6
has_position 1
has_size 7 3
has_result
measures
studies
occurs_in_medium 3 3 1
occurs_in_time 2 2
TOTAL 108 37 23

Table 1: Co-occurrences between semantic categories and semantic relations for a sample of our corpus

References

Faber, Pamela (2009): The Cognitive Shift in Terminology and Specialized Translation. MonTI. Monografías de Traducción e Interpretación 1: 107-134.

Faber, Pamela, Ed. (2012): A Cognitive Linguistics View of Terminology and Specialized Language. Berlin, Boston: De Gruyter Mouton.

Faber, Pamela, León-Araúz, Pilar, & Reimerink, A. (2016): EcoLexicon: new features and challenges. GLOBALEX, 73-80.

Pollak, Senja, Vavpetič, Anže, Kranjc, Janez, Lavrač, Nada, & Vintar, Špela (2012): NLP workflow for on-line definition extraction from English and Slovene text corpora. In: Jancsary, Jeremy (Ed.), Empirical methods: proceedings of the Conference on Natural Language Processing 2012, 11th Conference on Natural Language Processing (KONVENS), September 19-21, 2012, Vienna, Austria (Scientific series of the ÖGAI, volume 5). Wien: Österreichische Gesellschaft für Artificial Intelligence (ÖGAI), 53-60.
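
Illustrative sketch

As an illustration of how co-occurrence counts such as those in Table 1 might be derived from annotated definitions, the following Python sketch tallies relation tags per semantic category. The `AnnotatedDefinition` class, the `cooccurrences` function and the sample entries are hypothetical names and data invented for this illustration; they are not part of the TermFrame tooling, the relation assignments are not taken from the corpus, and the structure covers only two of the four annotation layers.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class AnnotatedDefinition:
    """One annotated definition (hypothetical, simplified structure)."""
    definiendum: str
    category: str  # semantic category of the definiendum
    relations: list[str] = field(default_factory=list)  # relation tags found


def cooccurrences(definitions):
    """Count (relation, semantic category) pairs over all definitions."""
    counts = Counter()
    for definition in definitions:
        for relation in definition.relations:
            counts[(relation, definition.category)] += 1
    return counts


# Invented toy data for illustration only, not taken from the corpus.
sample = [
    AnnotatedDefinition("sinkhole", "A.1 Surface landform",
                        ["HAS_FORM", "HAS_CAUSE", "HAS_LOCATION"]),
    AnnotatedDefinition("grike", "A.1 Surface landform",
                        ["HAS_FORM", "HAS_LOCATION"]),
    AnnotatedDefinition("example_geome", "C. Geome",
                        ["CONTAINS", "HAS_LOCATION"]),
]

table = cooccurrences(sample)

# Print one line per (relation, category) pair that occurs at least once.
for (relation, category), n in sorted(table.items()):
    print(f"{relation:<14}{category:<24}{n}")
```

Counting pairs rather than filling a fixed matrix keeps the sketch independent of the exact inventory of categories and relations; a full table can be obtained by iterating over the complete lists of both and reading missing pairs as zero.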