doi:10.1093/ijl/eci042

COLLOCATION, COLLIGATION AND ENCODING DICTIONARIES. PART I: LEXICOLOGICAL ASPECTS

Dirk Siepmann: Universität-GH Siegen, Fachbereich 3, Adolf-Reichwein-Straße, D-57068 Siegen, Germany ([email protected])

Abstract

This article attempts to synthesise recent advances in collocational theory into a coherent framework for lexicological theory and lexicographic practice. By posing a number of fundamental questions related to the definition of collocation, it critically reviews frequency-based, semantic and pragmatic approaches to collocation. It is found, among other things, that two types of collocation, namely ‘long-distance’ collocation and collocation between semantic features, have suffered almost total neglect. This leads to suggestions for a new division of the collocational spectrum and for a revised definition of ‘collocation’ based on the notions of ‘usage norm’ (Steyer 2000) and ‘holisticity’ (Siepmann 2003). It is argued that this new view of collocation considerably widens the dictionary maker’s brief, since future lexicography will have to provide a full account of both structurally simple and structurally complex units, including fixed expressions of regular syntactic-semantic composition (see Part II of this article, to be published in the March issue of this journal).

1. Introduction

According to modern science, there is no such thing as ‘independent existence’; at least since the advent of chaos theory, there has been full recognition that all forms of life and material phenomena, whether at the micro-level or at the macro-level, are interdependent. In linguistics, this realization has found its fittest expression in the idea of linguistic rather than literary ‘intertextuality’, whereby the meaning of one text and its constituent elements depends on millions of other texts using similar or identical elements. Textual meaning is thus created by the interplay of two types of repetition, viz.
(a) collocation (in the largest possible sense, including colligation¹ and phraseology) and (b) cohesion. It turns out that one instance of collocation and the entire language are mutually illuminating, since the instance is understood in terms of the whole, and the whole in terms of the instance (cf. Hunston 2001: 31); taking this a bit further, we might say that not only is each pattern necessary for comprehending the sum total of similar patterns, but each pattern is also a miniature version of that sum total, as shown by the fact that the meaning of individual patterns (e.g. German ‘sonniges Gemüt’ [‘sunny disposition’ = irrepressible high spirits] vs ‘sonnige Lage’ [sunny location]), even if shorn of any context, is evident to the native speaker.

International Journal of Lexicography, Vol. 18 No. 4 © 2005 Oxford University Press. All rights reserved. For permissions, please email: [email protected]

This relatively recent view of meaning creation (Hoey 1991, 1998, 2000, Feilke 1994, 1996) seems much more in keeping with speakers’ intuitive knowledge about language than was the case in earlier structuralist theories. The latter tended to assume that expressions such as ‘sonnige Lage’ have both a compositional, literal meaning and a non-compositional, figurative meaning (Feilke 1996: 128). In an intertextual or socially-based view of meaning creation, the compositional meaning is exposed for what it is, namely an abstraction of the linguist which has no base in the native speaker’s mental lexicon; the expression ‘sonnige Lage’ is then considered to be a ‘holistic’ sign that is irreducible to the sum of its parts.

In a related development, computational and cognitive linguists have used corpus-linguistic insights to work out models of language grounded in actual usage rather than abstract general rules (Chandler 1993, Croft and Cruse 2003, Skousen 1989).
In these models word or clause formation is by analogy with existing exemplars, and it will be seen that such models can also be applied to collocation.

This article reviews, one by one, the various defining criteria that have in the last half century been called upon to define the notion of collocation, pursuing a dual objective: (a) to show that none of these criteria apply in all cases, so that we can at best give a prototypical definition of collocation, and (b) to demonstrate that the problems associated with the definition of collocation stem from the mechanistic, old-paradigm view of language embodied in structuralist theories which try to impose theoretical abstractions on an infinitely complex reality arising from communicative interaction and the institutional practices such interaction puts in place. This will then allow us to provide a more secure and more broadly based underpinning for the treatment of colligation and collocation in lexicography. With the exception of Steyer (2000), no such model has as yet been proposed.

The subject of collocation has been approached from two main angles: on one side are the semantically-based approaches (e.g. Benson 1986, Mel’čuk 1998, González-Rey 2002, Hausmann 2003, Grossmann and Tutin 2003) which assume a particular meaning relationship between the constituents of a collocation; on the other is the frequency-oriented approach (e.g. Jones and Sinclair 1974, Sinclair 1991, Sinclair 2004, Kjellmer 1994) which looks at statistically significant co-occurrences of two or more words. This theoretical distinction is paralleled by a geographical divide: the semantic approach has its origins in continental European research into phraseology, while the frequency approach is firmly rooted in British contextualism.
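
To make the frequency-oriented approach more concrete, the following minimal sketch scores word pairs that co-occur within a short window. It is an illustration only: pointwise mutual information (PMI) is used here as one common association measure and is an assumption of this sketch, not the statistic used in the studies cited above; the function name `pmi_scores`, its parameters and the toy corpus are likewise hypothetical.

```python
import math
from collections import Counter


def pmi_scores(tokens, window=2, min_count=2):
    """Score ordered word pairs co-occurring within `window` tokens by PMI.

    PMI is an assumed, illustrative association measure; pairs seen fewer
    than `min_count` times are ignored because their scores are unreliable.
    """
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, left in enumerate(tokens):
        # Pair each token with the next `window` tokens to its right.
        for right in tokens[i + 1 : i + 1 + window]:
            pair_freq[(left, right)] += 1

    n_tokens = len(tokens)
    n_pairs = sum(pair_freq.values())
    scores = {}
    for (left, right), count in pair_freq.items():
        if count < min_count:
            continue
        p_pair = count / n_pairs
        p_left = word_freq[left] / n_tokens
        p_right = word_freq[right] / n_tokens
        scores[(left, right)] = math.log2(p_pair / (p_left * p_right))
    return scores


# Toy corpus built around two collocations mentioned in this article.
toy_corpus = (
    "she had to take a decision quickly . "
    "they take a decision every week . "
    "he will take a step back . "
    "we take a step forward ."
).split()

scores = pmi_scores(toy_corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A list of this kind yields raw candidate pairs and nothing more: it says nothing about how the pairs are structured semantically or which of them deserve a dictionary entry.
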
There has until now been surprisingly little exchange between the two groups, and when the semanticist Hausmann (2003) claims to have won the war over collocation, one wonders if that war has ever been fought.

A third, more recent approach to phrasemes and collocations (Feilke 1996, 2003) might be termed ‘pragmatic’, since it claims that the structural irregularities and non-compositionality underlying such expressions are diachronically and functionally subordinate to pragmatic regularities determining the relationship between the situational context and linguistic forms. In this view, collocation can best be explained via recourse to contextualisation theory (Fillmore 1976).

In what follows, I shall argue that there is no reason to resort to the military metaphor, let alone go to war on matters of collocation. It is much wiser to unify the three approaches. Tersely stated, I shall argue the following theses:

(1) Only the frequency-based approach can provide a heuristic for discovering the entire class of co-occurrences; in a way, it is safe from refutation, but empty – it gives us all the raw material, but tells us nothing about how this material came to be or how it is to be structured; it has also resulted in lexicographic products of doubtful value, such as Kjellmer (1994) and Sinclair (1995) (cf. Hausmann 2003: 319–320, Siepmann 1998).

(2) By contrast, the semantically-based approach is fragmentary – it cannot account for all possible cases. It would nevertheless seem absurd to abandon such an intuitively appealing approach at the first appearance of a counterexample, since it has given rise to reliable collocational dictionaries such as Langenscheidts Kontextwörterbuch Französisch-Deutsch.

(3) Likewise, as I shall explain below, a purely pragmatic approach relying on the extralinguistic context cannot explain a large number of co-occurrences operating at the level of semantic features.
(4) It follows from this that the debate between the various approaches is a more/less rather than a yes/no issue. What is needed is an extension of the semantically-based approach that will take account of strings of regular syntactic composition which form a sense unit with a relatively stable meaning. ‘Lexical bundles’ (Biber et al. 1999) such as je sais que c’est or it’s been will not be included among the class of collocations (cf. Siepmann 2003). Although such sequences may perform similar or identical functions across a range of texts, they have no meaning ‘by themselves’. In sharp contrast, there are good theoretical and practical reasons for subsuming under the notion of collocation such colligational patterns as regarde où tu vas, dans les colonnes de (+ name of newspaper or magazine) or si elle est prise à temps (referring to an illness), which have so far been regarded as free sequences of words subject only to general rules of syntax and semantics.

For greater expository convenience, the various questions raised by the discussion of the above theses will be broken down under five separate heads:

(1) How many elements make a collocation?
(2) What elements make a collocation?
(3) Are collocations arbitrary?
(4) Can we distinguish between collocations and phraseology on the one hand, and collocations and free combinations on the other?
(5) Are collocations monosemous and monoreferential? Are there synonymic collocations?

This will lead to a division of the collocational spectrum into four major categories, all of which have their role to play in the making of dictionaries, especially those aimed at the non-native speaker.

My theoretical arguments will be leavened with a large number of concrete examples encountered during the ongoing compilation of three unabridged bilingual thesauri intended mainly for non-native speakers of English, French and German (the ‘Bilexicon’ project).
All of these examples have been drawn from the following authentic sources (for a detailed account of corpus construction, see Siepmann 2005):

(a) electronic editions of wide-circulation quality newspapers and news magazines (The Times, The Guardian, The Economist, Le Monde, Le Monde Diplomatique, Süddeutsche Zeitung, Frankfurter Rundschau, Der Spiegel);
(b) a large corpus of academic texts produced from reviews, journal articles, doctoral theses and portions of books;
(c) 50-million-word corpora of fiction and fan fiction freely available on the Internet;
(d) a 100-million word corpus of the language of motoring based mainly on Internet sources.

Table 1 gives a breakdown of the sources used by corpus type, content, size, baseline year and analysis software.

2. How many elements make a collocation?

It is accepted wisdom among European researchers that collocations are binary units, and this is probably true for the majority of the class. Thus, the most common type of collocation is the combination of a noun with a verb, and there are hundreds of thousands of examples which confirm this point of view (e.g. take a step, launch an appeal).
Table 1: Corpora used in this study

Corpus (Abbreviation) | Type | Content | Word Count | Baseline Year
Corpus of Academic English (CAE) | full-text | reviews, journal articles, doctoral theses and portions of books | 30 million | 1980 (less than 5% of texts predate 1980)
Corpus of Academic French (CAF) | full-text | reviews, journal articles, doctoral theses and portions of books | 30 million | 1980 (less than 5% of texts predate 1980)
Corpus of Academic German (CAG) | full-text | reviews, journal articles, doctoral theses and portions of books | 30 million | 1980 (less than 5% of texts predate 1980)
Corpus of English Fiction (FE) | full-text and samples | reviews, journal articles and portions of books from CAE | 50 million | 1980
Corpus of French Fiction (FF) | full-text and samples | reviews, journal articles and portions of books from CAF | 50 million | 1980
Corpus of German Fiction (FG) | full-text and samples | reviews, journal articles and portions of books from CAG | 50 million | 1980
Corpus of English Motoring (CME) | full-text and samples | Internet forums and chatrooms, electronic magazines, transport sites, encyclopaedia and dictionary articles | 100 million | 1995
British Newspapers and News Magazines (NE) | full-text | Issues of The Times, The Guardian and The Economist, published in London | 100 million | 1990
French Newspapers and News Magazines | full-text | Issues of Le Monde and Le Monde diplomatique, published in Paris | 100 million | 1990
German Newspapers and News Magazines (NG) | full-text | Issues of Süddeutsche Zeitung, Frankfurter Rundschau and Der Spiegel, published respectively in Stuttgart, Frankfurt and Hamburg | 100 million | 1990

Mel’čuk (most recently 2003) argues that the constituents of such collocations tend to be linked by a standard lexical function, such as Magn (rely on [Magn] = heavily, beautiful
[Magn] = drop-dead²). Furthermore, as Hausmann (2003), Siepmann (2003, 2004) and Schafroth (2003) have argued, many three-element collocations can be shown to be reducible to a binary structure:

(1) allgemeine Gültigkeit haben → (allgemein + Gültigkeit) + haben
    hohes Ansehen genießen → (hohes + Ansehen) + genießen
    ulcère gastrique bénin → (ulcère + gastrique) + bénin
    prendre une bouffée d’air → (air + bouffée) + prendre
    joli petit cul → (cul + petit) + joli
    not wildly original → (original + wildly) + not

The same goes for combinations of multi-word idioms and other items; consider for example his plan came to fruition or their disagreement brought them to blows. Some of these three-element collocations have a higher frequency of occurrence than their constituents, which suggests that they are learned and reproduced as wholes rather than recombined each time, but this presents no serious challenge to the view of collocation as a binary phenomenon. More threatening to this view are irreducible three-element collocations such as the following:

(2) the car holds the road well (?holds the road [may be used of tyres]) → la voiture tient bien la route/tient la route (meaning either ‘holds the road well’ or ‘stays on course’) → der Wagen hat eine gute Straßenlage (*hat eine Straßenlage)
    the car has too wide a turning circle → la voiture braque mal → der Wagen hat einen zu großen Wendekreis

In two of the languages under consideration the three-element collocation cannot be broken down into what seem to be its two major constituents.
Thus, while it is perfectly possible to single out gute Straßenlage as one constituent of the German collocation, the word combination *eine Straßenlage haben is inadmissible in German.³ It is also pertinent to note that the English collocation does not appear to have a negative counterpart (a search for hold the road poorly/badly on Google yields no results), whereas the opposite is true of the French collocation, where the adverb is optional and a negative wording appears admissible (la voiture tient mal la route). Other examples of this type include:

(3) avoir un geste déplacé (FF) → (?)avoir un geste
    recevoir un accueil chaleureux → (?)recevoir un accueil
    take a harder line (against) (NE) → (?)take a line (against)
    shall I break this note into something smaller (NE)
    den Kasten sauber halten (NG) (football)

Once we have grasped this concept of the three-element collocation, it is easy to see that many binary word combinations which have traditionally been regarded as free (such as accepter des pièces) are in fact embedded in larger structures of a collocational nature, such as the following three-element collocations with a non-human object:

(4) the pay and display machine (parking meter, etc.) only takes twenty cent coins → l’horodateur (le parcmètre, etc.) n’accepte que des pièces de 20 centimes → der Parkscheinautomat (die Parkuhr usw.) nimmt nur 20 Cent-Stücke
    the battle (war, etc.) claimed many casualties → la bataille (la guerre, etc.) a fait beaucoup de victimes → die Schlacht (der Krieg, etc.) hat viele Opfer gefordert
    cette expérience a marqué ma vie → dieses Erlebnis hat mein Leben geprägt

The list of such examples could be lengthened.
With collocations such as hold the road (subject: tyre), tomber à gros flocons (subject: neige), emporter la conviction (subject: argument) or eine Kurve machen (subject: Straße), it would clearly seem difficult to identify a standard lexical function (in the sense of Mel’čuk) that can provide a systematic link between the verb and the noun; this is because the entire collocation is semantically dependent on a specific subject. The English translation of the German collocation eine Kurve machen, where the prepositional phrase in the road is a standard postmodifier of bend (Kurve = bend in the road or bend), shows how closely the two concepts⁴ are connected:

(5) die Straße macht hier eine Kurve (NG) → there’s a bend in the road here (CME)

Likewise, current theorizing on collocation does not make allowances for the relationship between collocation and verb complementation, or ‘valency’. Thus, (auto)route + filer (literally: ‘road’ + ‘rush’) may well be considered a collocation in Hausmann’s and Mel’čuk’s theories, but this disregards the fact that the collocation itself requires a particular verb pattern including a locative element (l’autoroute file vers la vallée, à gauche, etc.); in other words, it cannot be used with all the verb patterns entered by filer (cf. *l’autoroute file, *l’autoroute file à toute allure). A related case is that of the German collocations ein Kind schenken and ein Kind machen, where the former can only be used with a female subject and the latter only with a male subject. In other words, collocation and verb complementation are intimately related, since many noun-verb collocations require a specific distribution of semantic roles. Clearly, then, the two-word combination (auto)route + filer cannot possibly be viewed as a fully-fledged collocation.

Evidence is also gathering of three-word collocations one of whose constituents is delexicalised, and hence redundant.
Kenny (2003: 343) cites the German phrase die Augen weit aufreißen, where the semantic feature ‘wide open’ (= weit) is included in the meaning of the verb aufreißen. Such delexicalisation has long been observed in so-called ‘support verb constructions’ (take a decision), but it seems to be just as common in other types of collocation.

The evidence of such examples points to the conclusion that multi-word collocations cannot always be split up into two basic constituents, and that collocations consisting of three items or syntactic ‘slots’ are in fact quite common. This is particularly true of collocations involving neither a human subject nor a human object, such as expérience + marquer + vie. Strictly speaking, then, we would not be entitled to define collocations as binary units, as do Hausmann and Mel’čuk, unless we are willing to adopt a very broad prototypical definition. From the vantage point of practical lexicography, then, it is preferable, and to some extent already established practice, to record tripartite lexical units even where binary units could be justified (e.g. schmal geschnitten → die Hose ist schmal geschnitten [NG]), since dictionary users could be led astray if such information were missing.

3. What elements make a collocation?

This section starts by discussing the distinction made by European collocation scholars between semantically dependent ‘collocates’ and semantically autonomous ‘bases’, or nodes. It goes on to show that this distinction, as well as the related assumption of directionality in collocational attraction, is not applicable in a great many cases, and that a large number of word combinations, notably long-distance collocations, operate at the level of semantic features rather than lexemes. This leads to suggestions for a new typology of collocations.

3.1 The autonomous/dependent distinction

At the heart of collocational theory is the assumption that the constituents of the collocation differ in what Hausmann (1999: 122ff.)
calls their ‘semiotactic’ status: an ‘Autosemantikon’, or semantically autonomous lexeme such as decision or disaster functions as the base, which co-occurs with an arbitrarily selected, semantically dependent ‘collocate’ (‘Synsemantikon’) such as take or unmitigated. Intimately connected with this is the idea that the collocate takes on a meaning peculiar to the collocation. Diagrammatically this can be represented as in Table 2.

Table 2: Hausmann’s distinction between free word combinations and collocations

semantically autonomous + semantically autonomous = free word combination | semantically autonomous + semantically dependent = collocation
he likes money | money + withdraw
look at the sea! | decision + take
he prefers fish to meat | clouds + scudding

This definition is echoed in meaning-text theory (cf. Mel’čuk 1998), albeit in slightly different terms:

‘A collocation AB of language L is a semantic phraseme of L such that its signified “X” is constructed out of the signified of one of its two constituent lexemes – say, of A – and a signified “C” [“X” = “A + C”] such that the lexeme B expresses “C” only contingent on A.’ (Mel’čuk 1998: 30)

The dependency relation between B and C covers four types of collocations (cf. Mel’čuk 1998: 30–31):

(1) ‘C’ ≠ ‘B’, that is, B does not have the corresponding meaning in the lexicon, and
    (a) ‘C’ is empty, that is, the lexeme B is a delexical support verb selected by A [e.g. give (s.th.) a vacuum, take a decision, porter un jugement] or
    (b) ‘C’ is not empty but the lexeme B expresses ‘C’ only in combination with A [e.g. black coffee, bière bien frappée]
(2) ‘C’ = ‘B’, that is, B has the corresponding meaning in the lexicon, and
    (a) ‘B’ cannot be replaced by any synonym when it appears in conjunction with A [e.g. strong coffee rather than *powerful coffee, heavy smoker] or
    (b) ‘B’ includes the meaning ‘A’ (e.g.
    rancid butter, artesian well)

3.2 Criticism of the autonomous/dependent distinction

There are four main problems inherent in the autonomous/dependent distinction; let us expound these in greater detail.

3.2.1 Semantic autonomy vs semantic dependency. The dividing line between semantically autonomous and semantically dependent words is hazy and not clearly defined (cf. Brauße 1992). For some linguists, it runs parallel to the boundary between word classes, with items able to function as sentence constituents (nouns, verbs and adjectives) on one side, and words with a morphological or syntactic function (articles, prepositions, etc.) on the other. Other scholars assume that the boundary cuts across different parts of speech; according to them, a noun such as scholar is semantically autonomous, whilst a noun like member is semantically dependent on its linguistic environment (e.g. party member, family member). Yet others (e.g. Lutzeier 1981) go so far as to claim that there are no criteria at all allowing us to differentiate between words that have lexical content and those that do not. Indeed, words that have been intuited as semantically dependent by collocation scholars may, on inspection, turn out to be semantically autonomous (see 3.2.3 below).

3.2.2 Collocations of regular syntactic-semantic composition. As seen in Section 2, the collocational character of seemingly free combinations such as accepter des pièces (‘take coins’) only comes to light if the wider context is taken into consideration. Similar considerations hold true for other types of combinations involving items with the same or a similar semiotactic status; here are a few typical examples:

(6) I’ve got grease all over my shirt. (FE)
    regarde où tu vas! (FF) (= pass auf, wo du hintrittst; watch where you are going/stepping!, watch where you put your feet!)
    I didn’t bring the car (FE)
    look at the time!
    (FE)

From the perspective of structuralist linguistics, such sentences would be considered composite units whose meaning is the sum total of the literal meaning of its constituents; in other words, they would be viewed as falling within the scope of the open-choice principle. On inspection, however, they turn out to be semi-phrasemes (i.e. collocations). Three main reasons can be advanced for this: firstly, they are clearly not idioms, since they are immediately comprehensible to anyone who is familiar with their basic constituents; thus, the first example can be analysed as follows: [subject] + have got + [object] + [locative]. Secondly, it is evident that the ‘literal’ meaning of the first sentence could only be construed as referring to a shirt every square inch of which was entirely smeared with grease, but, of course, this is not what the sentence means to a native speaker, who will take it to mean that only part of the shirt’s surface has been stained.⁵ Thirdly, the same meaning could be expressed quite differently in another language such as German: ich habe mein Hemd mit Fett beschmiert/mein Hemd ist voller Fett/mein Hemd ist ganz fettig. What we are dealing with, then, is an instance of a collocational framework (Renouf and Sinclair 1991) or, more precisely, a type of colligation, that is, a recurrent grammatical pattern that is lexically restrained: have got + [liquid, crumbs, etc.] + on/all over [item of clothing, body, body part].

Table 3: Translational equivalences at different levels

Seemingly free combination | Collocation/idiom
1. den Rock enger machen | take in the skirt
2. on a clear motorway / sur (une) autoroute dégagée | auf freier Strecke (alongside: auf einer freien Autobahn)
3. einen Unfall nach dem anderen bauen [‘have one accident after another’] | collectionner les accidents
colligation | colligation + collocation
4. his attempt on the (NP: mountain / record) | sein Versuch, [Berg] zu bezwingen / [Rekord] zu brechen
free combination of morphemes | colligation
5. Freizeit-(N), Hobby-(N), Gelegenheits-(N) [Gelegenheitsdichter, Freizeitmaler, Hobbykoch] | N à ses heures [poète, peintre, cuisinier à ses heures]

Similar observations can be made for the second example, where the interlingual equivalents clearly show that the phrase is idiomatically constrained. The standard German translation uses two entirely different and more specific verbs (regarder → aufpassen (= ‘pay attention’), aller → hintreten (= ‘step [somewhere]’)). This kind of finding links up with Hausmann’s (1997) claim that ‘everything in language is idiomatic’ and with Hunston’s (2001) investigation into colligation, which shows that even grammatical strings of a fairly random nature may carry a particular semantic prosody. Thus, the sequence NP may not be a(n) NP is used as a signal of concession commonly followed by a contrasting clause introduced by but (Hunston 2001: 24). This is also obvious from such interlingual correspondences as those given in Table 3.

These examples show that translational equivalence can usually be achieved at the level of ‘constructions’ (in the sense of Fillmore). Probably the most frequent case is the rendition of one construction type by the same type in another language (e.g. espionner, c’est attendre; to spy is to wait; spionieren heißt warten); it is by no means uncommon, however, to find one construction type translated by another. Thus, equivalences 1–3 of Table 3 can be accounted for in terms of a shift from an English complex and schematic construction, whose rules of semantic composition are fairly general, to a German complex and substantive construction, whose rules of semantic composition are more specialized (for a listing of construction types, see Table 4). The French phrase
The French phrase Collocation, Colligation and Encoding Dictionaries 421 Table 4: The syntax-lexicon continuum (Croft/Cruse 2003: 255) Construction type Traditional name Examples Complex and schematic syntax complex, substantive verb subcategorization frame idiom morphology syntactic category word/lexicon SBJ be-TNS Verb –en by OBL SBJ consume OBJ complex and substantive complex but bound atomic and schematic atomic and substantive kick-TNS the bucket [NOUN-s], [VERB-TNS] [DEM], [ADJ] [this], [green] sur (une) autoroute de´gage´e (example 2), where the indefinite article is optional, shows how increased use may result in greater fixity and brevity, in other words, in ‘phraseologicization’ (cf. German Porsche fahren alongside einen Porsche fahren, or French sur chausse´e mouille´e alongside sur une chausse´e mouille´e). Equivalence 4 is remarkable as demonstrating that mainly schematic constructions in one language may correspond to combinations of schematic and substantive constructions in another. Even stronger support for the notion of different construction types comes from such equivalences as 5, where a complex but bound construction in German corresponds to a complex and schematic construction in French. 3.2.3 Contingent meaning. The autonomous/dependent distinction presupposes that, in the words of Mel’čuk (1998: 31), ‘the problem of the lexicographic description of lexical units is an independent problem that has to be solved . . . prior to any discussion of phraseology’. Thus, Mel’čuk seems to assume that the meaning of the adjective rancid, which occurs in the noun-verb collocation rancid butter, can only be defined with reference to butter. This assumption is, however, belied by even the briefest corpus enquiry; it is found that the adjective itself has a wide combinatorial range, which divides into two separate meaning groups, viz. 
(a) food, butter, bacon, milk, meat, cream, fat, grease, flour, wheat, oil, chocolate; smell, odour, aroma; socks, sweat; water and (b) atmosphere, sentiment, academics, affair, show, humour, prune. This shows that the adjective has at least two metonymically related meanings of its own which might be glossed respectively as ‘(of food) having a rank smell or taste as the result of decomposition or chemical change’ and ‘(of people or things) having vile, revolting, obnoxious qualities’; these two meanings would have to be recorded in the dictionary. Similar analyses have been proposed for other seemingly ‘unique’ collocations of Mel’čuk’s type 2(b), such as schütteres Haar (‘thin hair’; Steyer 2003: 107), with the same results.

Another reason why lexical entries cannot simply be presupposed as given is that some nouns simply do not have any meaning in isolation. One example cited by Feilke is German Lage (‘situation’), and the same goes for its standard English and French equivalents. The French collocation situation + faire (‘la situation faite aux protestants’) could therefore be said to consist of two semantically empty items, and yet the combination of the two yields a meaningful collocation.

3.2.4 Collocation of semantically autonomous items. Even if we assume that a sharp line can be drawn between content words and ‘delexical’ words, there remain numerous examples of collocations made up of two semantically autonomous items, some of which have interlingual relevance:

(7) an empty parking space (or: a vacant parking space) → un emplacement libre → ein freier Parkplatz (cf. ein leerer Parkplatz = an empty/deserted car park)
    a quiet drink (hypallage) → (cf. prendre un verre en toute tranquillité) → (cf. the idiom: in Ruhe einen trinken)
    (have) cold feet (in the non-figurative sense) → (avoir) les pieds gelés / glacés (cf.
    also: avoir froid aux pieds) → kalte Füße (haben)
    to stop for petrol (for coffee, for a pee) → (free combination: s’arrêter pour faire le plein) → (free combination: anhalten um zu tanken)
    to tell a joke → faire une blague → einen Witz erzählen

The first example shows that English distinguishes between ‘free’ (= free of charge) and ‘empty’ (= unoccupied) parking spaces. The second example illustrates a case of ‘frozen’ hypallage: the semantic features of the adjective quiet are incompatible with the noun drink; it is the situational context in which the drink is taken that would normally be described as ‘quiet’.⁶ The third example demonstrates that French cannot use the adjective froid attributively when reference is made to parts of the body. The fourth example illustrates equivalences between seemingly free combinations in German and fixed expressions in English. Although there is a small number of variants in evidence, we cannot assume compositionality here. The fifth example is interesting in that there are synonymic collocations where the verb would be regarded as semantically contingent on the noun: crack/make a joke.

3.3 Collocations of verbs with locative prepositional phrases

Once we have realised that there are too many exceptions to the definition of collocation as a combination of items with a distinct semiotactic status, it becomes evident that a large number of other lexical units should be classified as collocations. A clear example is afforded by combinations of a locative prepositional phrase with a verb:

(8) to hide behind the curtain (cf.
also ‘to be a curtain twitcher’) → guetter derrière le rideau → hinter dem Vorhang stehen (sich hinter dem Vorhang verschanzen)
to wipe out on the bend/to go out of control on the bend/to be unable to stay on its own side of the road → se déporter dans le virage → aus der Kurve getragen werden

There are two main reasons for including such items among the class of collocations. One is that they are both cognitively and semantically similar to noun + verb collocations of the type trim + hedge, serrer + vis or Hörer + abnehmen. In the case of the latter, the verbs (trim, serrer, abnehmen) describe an action that is typically performed with the object in question; similarly, verbs such as guetter and se déporter designate actions that typically occur in particular places: nosy neighbours make a habit of hiding behind the curtain, and speeding drivers run the risk of losing control of their vehicles on a bend. The second reason is that these word combinations tend to be interlingually unpredictable (cf. the above examples), making them prime sources of difficulty for second-language learners.

3.4 Directionality

A related problem is the assumption of directionality (Hausmann 1979) or of a hierarchical relationship between the constituents of the collocation (González-Rey 2002), whereby the selection of the collocate is contingent on the prior selection of the base. While this is more or less obvious with items such as table + lay/set or money + withdraw, we have already seen above that examples such as road + hold cast serious doubt upon the validity of the theory. Hartenstein (1996: 95) cites counterexamples of the type hère + pauvre (‘poor wretch’) where the noun cannot be viewed as semantically autonomous since it has no referent in present-day French.
In a similar vein, Scherfer (2002) notes that even such textbook examples of collocational theory as célibataire + endurci (‘confirmed bachelor’) may be viewed as bidirectional, since the adjective endurci combines with any noun carrying the semantic feature [+ figé dans son comportement]: criminel, catholique, Parisien, etc.; it is monosemous, semantically autonomous and just as clearly defined as the noun célibataire. Similar considerations hold for adjectives such as crowded or busy in combination with nouns like street, road or square. Another example of this is the French adjective sauf, as witness the concordance given in Table 5 (cf. Siepmann 2003).

Table 5: sauf/sauve
1 honneur, morale, etc.
être assuré, c’est que son univers est | sauf, et qu’en dépit de bien des zig
mique du pays. Le consensus social est | sauf, les exportations allemandes ne
euve. L’honneur des Bafana Bafana est | sauf, leurs séances d’entraînement a
int. Mine de rien, l’honneur a été | sauf, on a terminé septième sur neuf
silence. Le conservatisme ambiant est | sauf. Dominique Lecourt évoque en
ité fait sa force. Mais la morale est | sauve ; le bon sens aussi : ’’ Vivre
le conseil de guerre. La morale est | sauve : les innocents seront punis e
udrier incarne le péché. La morale est | sauve puisque l’auteur châtie son hé
(XO de préférence). La tradition est | sauve ! Pour l’amateur de Oolong . . . P
CERTES, toutes les apparences sont | sauves, et on peut mettre au crédit
sensibles. Certes, les apparences sont | sauves, et le parti sorti vainqueur
i les apparences de la démocratie sont | sauves, la guérilla est quand même

The syntagm we are dealing with here can be formalised as NP (abstract) + être + sauf. From a directional point of view, the adjectival collocate sauf would only take on its full meaning through the presence of the semantically independent noun phrase (e.g. l’honneur des Bafana Bafana) (cf. Hausmann 1979: 191–192).
In the present case, however, this argument does not hold water. The adjective sauf is as sharply defined as honneur, apparence, tradition and morale, and it is the adjective that is the invariable factor in the equation. This becomes even more apparent if we look at English or German translations of the phrase, which use the clearly delimited verbs keep up/save and wahren/retten respectively:

(9) les apparences sont sauves → appearances are kept up → der Schein ist gewahrt
la république était sauve → the republic had been saved → die Republik war gerettet

3.5 Collocation between semantic features

Taking this one step further, I would like to suggest that dependencies exist not merely between lexical units, but also between semantic features. Consider the examples in Table 6.

Table 6: Typical linguistic environments of the French verb mordre (sur)
subject (semantic field: vehicle) | verb | prepositional object (semantic field: part of the road)
un car | mord (sur) (1) | le côté
un bus | mord (sur) (1) | la ligne blanche
une voiture | mord (sur) (1) | la voie opposée
subject (semantic field: region) | verb | prepositional object (semantic field: region)
trois villages dont le territoire | mord (sur) (2) | Jérusalem
la région urbaine de Lyon | mord (sur) (2) | les départements de la Loire, de l’Ain et de l’Isère
sa bordure méridionale | mord (sur) (2) | le continent africain

As can be seen from these examples, the French lexical units (in the sense defined by Cruse (1986)) mordre sur (1) (‘veer off course onto/into’) and mordre sur (2) (‘cut into’, ‘overlap with’) impose severe lexical constraints on the choice of subject and (prepositional) object: mordre sur (1) takes a subject designating a vehicle and an object designating a part of the road, while mordre sur (2) requires both the subject and object slots to be filled by items denoting areas (mainly geographical areas or parts of the body).
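The two environments in Table 6 can be pictured as slot patterns in which each slot is constrained by a semantic field rather than by a fixed list of words. The following is a minimal sketch under stated assumptions: the class name SlotPattern, the toy lexicon SEMANTIC_FIELD and the informal field labels are hypothetical illustrations, not part of any existing dictionary format or of the analysis proper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlotPattern:
    """One sense of a verb plus the semantic fields required of its subject and object."""
    sense: str
    gloss: str
    subject_field: str
    object_field: str


# Patterns transcribed from Table 6; the field labels are informal tags (an assumption).
MORDRE_SUR = (
    SlotPattern("mordre sur (1)", "veer off course onto/into", "vehicle", "part of the road"),
    SlotPattern("mordre sur (2)", "cut into, overlap with", "region", "region"),
)

# Hypothetical toy lexicon assigning a semantic field to a few noun phrases from Table 6.
SEMANTIC_FIELD = {
    "un car": "vehicle",
    "un bus": "vehicle",
    "une voiture": "vehicle",
    "la ligne blanche": "part of the road",
    "la voie opposée": "part of the road",
    "la région urbaine de Lyon": "region",
    "le continent africain": "region",
}


def matching_senses(subject, obj):
    """Return the senses whose subject and object fields both match the given noun phrases."""
    s_field = SEMANTIC_FIELD.get(subject)
    o_field = SEMANTIC_FIELD.get(obj)
    return [
        p.sense
        for p in MORDRE_SUR
        if p.subject_field == s_field and p.object_field == o_field
    ]


vehicle_reading = matching_senses("une voiture", "la ligne blanche")
mixed_reading = matching_senses("une voiture", "le continent africain")
```

On this toy data a vehicle subject is only licensed together with a road-part object, so the mixed pairing matches no sense; the sketch deliberately ignores the more general uses of mordre sur.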
The question then arises whether the relationship between subject and object can be best captured in terms of selectional restrictions inherent in the verb or in terms of collocational restrictions operating across the entire phrase (verb + two nouns). To resolve this question, we may turn to Cruse’s (1986: 278–279) distinction between selectional and collocational restrictions. Cruse defines selectional restrictions as being logically necessary: according to him, it is logically necessary for the subject of the verb die to carry the semantic traits ‘organic’, ‘alive’ and ‘mortal’. It is different with kick the bucket, which, although identical in meaning to die, arbitrarily requires a human rather than an animal subject (*the horse kicked the bucket vs the horse died). Following Cruse, we would be entitled to consider the above example as an instance of collocational rather than selectional restriction. Firstly, there are no logical constraints on the subjects of mordre sur (1) and (2), whose meaning is simply glossed as ‘empiéter sur’ (‘overlap into’, ‘eat into’) in the Trésor de la Langue Française; indeed, mordre sur occurs with a wide range of subjects and objects in a more general sense:

(10) les luttes politiques, religieuses et morales, les activités de parti, l’agitation électorale, le fait que les associations croissent de façon excessive, tout ceci . . . mord sur le temps de détente (‘all this . . . takes up a lot of our spare time’)
je ne voudrais pas mordre sur le temps des questions (heard in a lecture) (‘I don’t want to take up any of the time reserved for questions’)
plus nous vivons dans les signes, et moins les choses mordent sur nous (‘less things will affect us’)
sans jamais leur (aux lois, D.S.)
permettre de mordre sur son esprit (‘never allowing them to affect one’s mental state’)
le nazisme a mordu sur une large tranche du prolétariat (‘many working-class people were drawn to Nazi ideology’)
une abstraction qui mord sur le réel (‘an abstraction which is close to reality’)
(all examples except the second from NF)

Secondly, there is a mutual dependency between the subject noun phrase and the object noun phrase in that (e.g.) a subject noun phrase denoting a vehicle will entail an object noun phrase designating a part of a road, and vice versa. We are thus dealing with collocation between certain semantic properties rather than between specific lexical items. Again, as with the example of autoroute + filer + locative discussed above, we have a three-slot collocation mixing collocational attraction and valency: vehicle + mordre (sur) + locative (part of a road).7 Valency theory does not make allowances for collocational constraints of such a specific nature, as it posits only three levels of semantic restrictions, the ‘highest’ of which is selectional restrictions of the type [+ human] (cf. Blank 2001: 238). Collocation thus turns out to have a paradigmatic as well as a syntagmatic dimension, with an entire semantic set (body part, region) – rather than a clearly delimited lexical set (tousled + 1. hair 2. mane) – sharing the same syntagmatic environment.9

The case for collocation between semantic features is strengthened further when we look at adjectival collocations. A fine example is provided by co-occurrences of the adverb beautifully with participial adjectives such as carved, draped, drawn, restored, etc. The verbs on which these participial adjectives are based share a common semantic feature in describing artwork or craftwork. Thus, there is a lexical dependency between a specific semantic feature and a lexeme.10 The list of such examples could be lengthened.
To take but one more case, the adjective bad and the adverb badly co-occur significantly with a semantic feature which can be glossed as ‘physical imperfection’; thus, we have:

(11) I never had a bad chest
he’s had a bad concussion
Never had a bad cough, not even a sniffle.
He had a bad heart. Hole in the left ventricle.
He stuttered badly.
(all examples from FE)

Note that a distinction could be made between two types of collocation here, viz. (a) words which share the semantic feature ‘bad’ (concussion, cough, stutter, limp) and (b) words which require the adjective to add the notion of ‘badness’ (chest, heart).

It is important to reiterate that many such collocations between semantic features and lexemes are bidirectional. With a collocation such as beautifully carved it is perfectly conceivable that speakers begin by encoding the type of craftwork involved, but it is equally likely that they are awe-struck by the sheer beauty of a painting or other work of art, and the first thing that comes to their minds is an adverbial expression of the concept of beauty. This latter hypothesis is also borne out by the high frequency of the unspecific collocation beautifully done, which does not specify the type of work involved. The notion of beauty would seem to be just as semantically or cognitively autonomous as that of craftwork, so that the collocation should be regarded as bidirectional or even as one conceptual unit.

Similar but less regular collocational dependencies have been observed by Grossmann and Tutin (forthcoming), Mel’čuk and Wanner (1996) and L’Homme (2003). These authors prefer to analyse such regularities in terms of ‘semantic classes’. In weighing the two analyses, my judgement is that the assumption of semantic features is more consistent, especially if long-distance collocations (Siepmann 2003; 2005) are taken into account.
By long-distance collocations are meant lexical dependencies which manifest themselves over considerable stretches of text. A convenient illustration is provided by the topic initiator turning to, which is commonly followed at some distance by informers such as I/we + find/see/note or it appears that:

(12) Turning to the use of semi-auxiliary is to/are to in if-clauses, we find that a fifth of the instances in the sample (and 1340 in the corpus as a whole) appear in this syntactic environment.
In this respect the speech of younger British speakers appears to be following the lead of American English. Turning to the speech of older speakers, we note some words which are suggestive of hesitation, uncertainty or turn manipulation: well, mm, er.
The corresponding Middle High German forms are fuoss, füesse; mus, müse. Modern German Fuss: Füsse, Maus: Mäuse are the regular developments of these medieval forms. Turning to Anglo-Saxon, we find that our modern English forms correspond to fot, fet; mus, mys.
Turning to requirements involving both age plus service, it appears there has been an increase in the propensity of participants to have normal retirement available at age 62 with a combination of years of service.
(all examples from CAE)

A similar phenomenon can be observed with the marker of comparison any more than. This marker, which introduces a subordinate clause, is always preceded by the negative particle not in the main clause:

(13) Not all women are ‘carers’ any more than all women are ‘victims’ or ‘contractors’. (CAE)

Such examples could be multiplied; they force us to recognise that, in order to account for at least some collocational links, it is necessary to abandon the four-word span on either side of the node which Sinclair (1991) postulates as the cut-off point for collocational significance because 95 per cent of collocational attraction occurs within this span (Jones and Sinclair 1974: 21f.).
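The effect of the span setting can be made concrete with a small sketch. The code below is an illustration only, assuming a crude word tokenizer and a simple window count; the function names tokenize and collocates are hypothetical and do not reproduce the procedure of Jones and Sinclair (1974). It shows that in example (13) the particle not lies outside a four-word span around any, and is only picked up once the span is widened.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"\w+", text.lower())


def collocates(tokens, node, span):
    """Count the tokens found within `span` positions on either side of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - span)
        hi = min(len(tokens), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Example (13), with straight quotation marks for convenience.
EXAMPLE_13 = "Not all women are 'carers' any more than all women are 'victims' or 'contractors'."
tokens = tokenize(EXAMPLE_13)

narrow = collocates(tokens, "any", span=4)  # the traditional four-word span
wide = collocates(tokens, "any", span=8)    # an arbitrarily widened span
```

Here narrow["not"] is 0 while wide["not"] is 1: the dependency between not and any more than is invisible to a four-word window even in a short sentence, and the gap grows with longer main clauses.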
Sinclair’s idiom principle should therefore perhaps be revised to accommodate ‘long-distance’ collocations entered into by multiword markers; I propose the following restatement of the idiom principle for written text:

One of the main principles of the organisation of text is that the choice of one semantic feature, word or phraseological unit affects the choice of other words or phraseological units, usually within a maximum span of several paragraphs. (based on Sinclair 1991: 173)

This reformulation of the idiom principle also takes account of cases where there is a great deal of variation in the node and the collocate(s). One typical case is the collocation of the contrast marker not so with lexical items such as surely, seem, appear, you/one might think that, it was hoped that or one hears that, all of which contain a semantic trait implying ‘uncertainty’ or ‘error’:

(14) Some might think Volkswagen, which now owns 70 per cent of the Czech company, would have thought the Skoda’s identity problematic. Not so. VW sees Skoda as one of the most recognised brand names in advertising. (NE)
After recriminations last summer, when a number of big trading houses were accused – nothing was ever proved – of forcing the FTSE 100 higher ahead of options expiry dates, it was hoped the Stock Exchange had nipped things in the bud. Not so. Yesterday afternoon, after a solid if unspectacular morning’s business, shares in some of the biggest Footsie companies – the ones heavily weighted in the premier index – motored sharply upwards. (NE)
Regulators and providers ought surely to be kept apart. Not so, according to the NRA’s board – and to Lord Crickhowell, who insists that water management and regulation are inextricably linked.
(NE)
So when some 100,000 demonstrators clogged the streets of the capital, Minsk, on April 10th to support striking industrial workers and to protest against price rises, it seemed as if discontent had come out of the blue. Not so: beneath the surface the republic had been stirring for months. (NE)

Here one might make a case for the collocation of underlying rhetorical strategies rather than strings of words or semantic features. This would be correct to the extent that the discourse preceding not so sets up an expectation which is not fulfilled in the subsequent discourse. In actual fact, however, rhetorical strategy and occurrence of semantic features are two sides of the same linguistic coin. Not surprisingly, the ‘error’ part of the above pattern may also be found in nominal form; in the following example from an academic text, you might think has been converted to the more formal noun misconception:

(15) Another misconception about meditation is that the meditator should fall into a trance. Not so. As a famous Chinese Buddhist put it: There is a class of foolish people who sit quietly and try to keep their minds blank (. . .) (CAE)

A more complex realisation of a long-distance collocational pattern is seen in the following extract:

(16) But if one considers that in college dictionaries the average number of column-lines allotted to each entry (not each definition) is a bit less than two, one will see why space is at a premium. (CAE)

In the present case the collocational relationship holds between two types of SLDM which occur in, respectively, the sub-clause and the main clause of a complex sentence: the topic shifter (if) one considers (+ wh-clause / NP) and the suggestor one will see (+ wh-clause / NP).
Again, it is not so much the lexical items themselves which enter into collocation; rather, we are dealing with a recurrent type of semantic-functional relationship, where both the first and the second part of the collocation may be replaced by other lexical items. A few more examples follow:

(17) If one considers that the various paths do not exist except as perceived by some mind, then one immediately arrives at the conclusion that the probability of a path should be chosen proportionally to its algorithmic information. (CAE)
If we consider the nature of Christian persecution as it is currently understood, we can easily see how the personal attitudes of the presiding official could have been a significant factor in any particular trial. (CAE)
If, however, one reads the early dramas of Augustus Thomas and Clyde Fitch, it will be realized how dexterously the American playwright profited by the French technician in whom the commercial manager had faith. (CAE)

French concession markers, too, are evidence of lexical dependencies operating across considerable spans of text. Thus, the concessive en admettant que tends to pre-empt the choice of an adversative marker such as pourtant, encore faut-il que or le fait demeure que:

(18) R.-L. Wagner (1968), qui note que le « terme de “mot” en est venu assez tard en français à traduire la notion d’une unité lexicale autonome », tout en admettant le bien-fondé de l’analyse qu’A. Martinet fait de la notion de « mot », refuse pourtant d’abandonner ce terme parce que la lexicologie porte sur l’étude des signes en situation. (CAF)

The uncovering of such patterns is of great value for language teaching. Just as lexico-grammars (Francis, Hunston and Manning 1996, 1998) have illustrated the close links between word complementation and meaning, so future text grammars and dictionaries may reveal the collocational nature of specific rhetorical moves. Again, such examples could easily be multiplied.
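A pattern such as [uncertainty] + not so can be approximated in a rough way by searching the sentences preceding each occurrence of the marker for cue expressions. The sketch below is offered under clearly stated assumptions: the cue list UNCERTAINTY_CUES is a hypothetical, hand-picked stand-in for the semantic trait ‘uncertainty’ or ‘error’, the sentence splitter is a crude heuristic, and the sample text is abridged from examples (14) and (15). It is not a description of how the examples in this article were retrieved.

```python
import re

# Hypothetical, hand-picked cues standing in for the trait 'uncertainty' or 'error';
# a real study would have to derive such a list from corpus evidence.
UNCERTAINTY_CUES = ("might think", "it was hoped", "surely", "seemed", "misconception")


def split_sentences(text):
    """Split on sentence-final punctuation followed by whitespace (a crude heuristic)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def not_so_contexts(text, lookback=3):
    """For each sentence opening with 'Not so', list the cues found in the preceding `lookback` sentences."""
    sentences = split_sentences(text)
    hits = []
    for i, sentence in enumerate(sentences):
        if not re.match(r"not so\b", sentence, flags=re.IGNORECASE):
            continue
        window = " ".join(sentences[max(0, i - lookback):i]).lower()
        cues = [cue for cue in UNCERTAINTY_CUES if cue in window]
        hits.append((sentence, cues))
    return hits


# Sample abridged from examples (14) and (15).
SAMPLE = (
    "Regulators and providers ought surely to be kept apart. "
    "Not so, according to the NRA's board. "
    "Another misconception about meditation is that the meditator should fall into a trance. "
    "Not so."
)
results = not_so_contexts(SAMPLE, lookback=1)
```

A lookup of this kind only finds what the cue list anticipates, which is precisely why the dependency is better stated in terms of a semantic feature than of a closed set of words.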
The examples discussed in this section all illustrate the density and conformity of lexical patterning in text, and suggest that a ‘semantic feature’ approach to collocation holds greater explanatory power than one based on the assumption of semantic classes, since it would be difficult to group such items as it is hoped, misapprehension and seem in one class. To sum up our discussion so far, we can say that the case for distinguishing semantically autonomous and semantically dependent constituents of collocations is extremely weak.

3.6 A typology of collocation

The inescapable conclusion to be drawn from this section is that collocational phenomena span the entire range of morpho-syntactic constructions. The terms ‘collocation’ and ‘construction’ turn out to be almost synonymous, a clear indication of the fact that phraseology is at the centre of language rather than at the periphery. The only category of collocation that cannot be captured by the notion of construction is collocation of semantic features. We might therefore posit four main types of collocational relationship:

(a) Colligation (t’avais qu’à + INF, ignorer tout de + N, il n’y a qu’à + INF, ce/cette N [tradition, etc.] est resté(e), NP dans l’âme, typisch + N, far be it from me to + INF, etc.); note that this definition of colligation goes further than Hoey’s (see endnote 1). Colligation concerns not only the grammatical preferences of individual words, but also those of longer syntagms. Thus, the phrase t’avais qu’à can be said to be in colligation with an infinitive clause.
(b) Collocation between lexemes or phrasemes (de même . . . de même que, briser ses chaussures, c’est-à-dire en l’occurrence, regarde où tu vas, bon ben, à la fin, etc.).
(c) Collocation between lexemes and semantic-pragmatic (contextual) features (beautifully + [result of creative activity], [uncertainty] + not so, [question] + eh bien, [expectation] + duly, [negative contextual aspect] + (not) detract from s.o.’s enjoyment, [vehicle] + mordre sur + [part of the road], help! (on such one-word collocations, cf. González-Rey 2002: 95, 101)).
(d) Collocation between semantic-pragmatic features (extended lexical units, cf. Sinclair 1996/2004, 1998/2004; long-distance collocations, cf. Siepmann 2005).

We are now in a position to reconsider the question we started out from in this section: what elements make a collocation? The answer now appears almost disarmingly simple: any colligational pattern may provide the basis for collocation. Some patterns are particularly common and therefore account for the majority of collocations (cf. Siepmann 2003):

X + Y (grand maigre, gros mal, réaction à chaud, bon ben, où là, de même que + de même)
X + Y + Z (+ n) (vilain petit canard, petit coin tranquille)
X + et + Y (sain et sauf, sel et poivre, sick and tired)
X + Prep (wedded to his profession, averse to risk, à la fin)
X + Prep + Y (grand chasseur devant l’éternel)
X + Verb + Y (to say . . . is to say . . ., la voiture a mordu sur la ligne blanche)

We have also seen that some collocations, especially long-distance collocations, are not merely, or not at all, based on colligational, that is, syntactic relations, but on semantic relations. Diagrammatically, this gives us:

semantic feature of X + (semantic feature of) Y

4. Are collocations arbitrary?

It seems likely that collocational knowledge is prototypical: to return to one of the aforementioned examples, children acquiring French as their first language come across several prototypical utterances containing the lexical unit mordre sur (1) and then intuitively proceed to build up paradigmatic classes.
These prototypical utterances are made against a specific situational background, namely motoring. It is the entire figure-ground relation (moving object/person – mordre sur – a part of the road [background: account of a car ride, a car race, etc.]) that is acquired, not just the verb. This creates numerous associations in the speaker’s mind, so that there are several pathways to accessing the prototype: seeing a car, using the word ‘car’ at the start of a sentence, thinking of a car race, etc. Once such associations have been acquired, it becomes possible for the native speaker to initiate language change by modifying existing collocations syntactically or semantically via the same processes (e.g. metaphor, metonymy) as those underlying change in individual lexical units. It is not surprising therefore that some authors (Grossmann and Tutin, forthcoming) have entertained the bold hypothesis of an underlying semantic systematicity of collocational networks, only to find it disproved by a detailed study of intensifiers accompanying nouns denoting emotions (parfait bonheur, amour fou, etc.); Grossmann and Tutin (forthcoming) conclude that the positioning and generativity of such adjectives is ‘hard to predict’. Further confirmation for this is provided by the aforementioned investigation into road transport vocabulary, where it became clear that, while collocational synonymy makes for economy of learning (e.g. la route/l’autoroute/le chemin/la rue passe/arrive/conduit/mène quelque part), there are also divergent tendencies (e.g. a little alley vs *a little boulevard, l’autoroute file vs *le chemin file; desservi par une autoroute vs *desservi par un chemin de terre). This is even clearer with collocations such as fashionably late or flou artistique, where there does not seem to be any previous semantic model on which the collocation could have been based.
Thus, although a post hoc explanation is sometimes possible, collocation remains an arbitrary phenomenon based on ‘language games’ where semantics clearly plays an unpredictable role. However, although semantic relationships can only be discerned post hoc, we should not forget that they may lighten the language learner’s task.

5. Can we distinguish between collocations and phraseology on the one hand, and collocations and free combinations on the other?

This section is concerned with the various strands of argument that have been deployed in favour of a clear distinction between collocation and phraseology on the one hand, and collocations and free combinations on the other. These arguments can be broadly classified into two variants, viz. the argument from syntax and the argument from semantics.

First, let us look at the argument from syntax. It has been repeatedly claimed by theoretical linguists that a sharp boundary can be drawn between collocations and fixed expressions by resorting to standardised tests such as passivization or pronominalisation (Gross 1996, Scherfer 2002). Thus, a fixed expression such as prendre la tangente can indeed be neither passivized nor pronominalised (or rather, it is not normally passivized or pronominalised):

*la tangente a été prise par lui
*la tangente, il l’a prise

Detailed observation of real language use, however, leaves the theoreticians without a leg to stand on. As Moon (1998), Partington (1998), Burger (1998) and Siepmann (2003) have shown, modification of ‘standard’ citation forms of phrasemes is almost the rule rather than the exception, and we find numerous instances of passivization or relativization where we might not have expected it. A few examples will suffice:

(19) jeter un pavé dans la mare → ce pavé dans la mare était lancé par quelqu’un qui . . .
découvrir le pot aux roses → le pot aux roses a été découvert
cracher dans la soupe → la soupe dans laquelle peu osent cracher
avaler des couleuvres → en compensation des couleuvres qu’elle a dû avaler
(all examples from NF)

Our linguistic competence invariably allows us to modify previous utterances, and this seems to occur quite commonly with phrasemes. The argument from syntax is spurious for another reason, namely that, just like phrasemes, collocations (in the traditional sense defined by Hausmann and Mel’čuk) may also be syntactically or otherwise restricted. One such restricted collocation is the French noun + verb combination situation [‘ensemble des circonstances dans lesquelles une personne (un pays, une collectivité) se trouve’] + faire (cf. Siepmann 2003: 244–245). In this construction faire invariably introduces a participial relative clause:

(20) la situation faite aux protestants
la situation faite aux immigrants
la situation faite aux prisonniers guinéens
(all examples from NF)

A construction of the type ‘on a fait une situation (ADJ) aux protestants’ appears to run counter to the norms of French prose. Such examples could be multiplied (e.g. la confiance qui l’habite; see Siepmann 2003); they show that grammatical preferences must not be left out of consideration when dealing with collocation, despite claims – still to be found even in recent scholarship – that collocations can be represented as quasi-mechanical associations of the type Sonne + sitzen (Steyer 2000: 110).

Table 7: Exocentric vs endocentric items
Exocentric (phrasemes) | Endocentric (collocations)
Tiens! | faire l’autopsie d’un corps
Quand le chat n’est pas là, les souris dansent. | avoir intérêt à
poivre et sel (= gris) | aigre-doux
un panier percé | un panier à salade
Turning now to the argument from semantics, we find that this argument is far more difficult to get to grips with, since it raises fundamental questions about a theory of collocation and language, some of which we dealt with in Section 2 above. There we found that the assumption of a differing semiotactic status for the constituents of a collocation, though intuitively appealing, runs into severe difficulties. Another semantically-based suggestion for drawing the dividing line between collocations and phrasemes has been put forward by González-Rey (2002: 120ff.); it is based on the endocentric/exocentric distinction which is quite well known from morphology, where it serves to distinguish different types of compounds (e.g. credit card [endocentric] vs blackhead [exocentric]). Consider the examples in Table 7. The left-hand items are said to be exocentric because none of their components can be deleted, their meaning is not derivable from their constituents, and they can only be understood in a specific situational context. Endocentric items, on the other hand, are said to be characterized by the following features:

(a) the constituents are deletable (e.g. un ton aigre, un ton doux)
(b) the meaning of the whole is compositional
(c) the expression has a referential meaning

Unfortunately for González-Rey’s theory, there is no basic difference between the kind of context-dependence posited for exocentric items and that which applies to purportedly endocentric items such as ‘quiet drink’, ‘sudden bend’, ‘le paysage défile’, ‘lu et approuvé’ or ‘pour valoir ce que de raison’ (the last two being cited by González-Rey). The meaning of such items can hardly be referred to as compositional, since there is no compatibility between their institutionalised senses. A landscape cannot ‘rush’, any more than a bend in the road can be ‘sudden’.
Hausmann (2003) cites a number of similar borderline cases, such as krummer Hund, where it must be assumed that Hund has the langue-meaning ‘person’ if it is to be considered the base of the collocation. It is also doubtful whether deletability can serve as a valid defining criterion. Counterexamples are not far to seek; thus, it is quite common to find the second part of an idiom, especially a proverb, deleted, as in ‘speak of the devil, . . .’ or ‘quand le chat n’est pas là, . . .’.

Feilke (1994, 1996, 2003) was the first to discern the root cause of such classificatory problems with full conceptual clarity. Recognizing that linguistic expressions can be ‘idiomatic’ while at the same time being syntactically and semantically well-formed, he advocates the theoretical decoupling of idiomaticity and syntactic-semantic compositionality (Feilke 2003: 60). According to him, it is the context and the participants placed in that context which, via a figure-ground relationship, bestow meaning on such collocations as the landscape rushes past or lu et approuvé. This is all the more convincing since some words (e.g. ‘Lage’ [‘situation’]) have no distinctive meaning components, so that it is impossible to attribute a summative meaning to such expressions as sonnige Lage (‘sunny location’).

6. Are collocations monosemous and monoreferential? Are there synonymic collocations?
According to González-Rey (2002: 117), collocations are monoreferential and do not allow synonymic variation:

‘L’unité ne peut se constituer comme variante, exprimée sous la forme de périphrase, d’un mot déjà établi, ni admettre d’autres variations pour le même référent, à moins d’en créer des sous-catégories.’ (González-Rey 2002: 117, my emphasis)

Although this statement is generally correct, here too it is relatively easy to find a number of counterexamples, such as to stick to/keep to the speed limit; Verbrechen begehen / verüben11; parvenir/arriver à un compromis; la pluie baisse / baisse d’intensité / diminue / se calme, etc. It is often claimed that such synonyms differ in some aspects of their meaning, especially according to style level, but this line of argument clearly does not apply to the first two examples just cited.

It is also interesting to note that one collocation may take on several meanings, a factor that has been neglected both in lexicological theory and in dictionary making. A simple example of a polysemous collocation is English ‘avoid an accident’:

(21) s.o. avoids an accident (1) → qqn évite un accident → j-m vermeidet einen Unfall
s.o. avoids an accident (2) → qqn échappe à un accident → j-m entgeht einem Unfall

To take a more complex example, the French collocation donner + exemple, normally translated by give + example and geben + Beispiel, can occur in two different types of linguistic environment (cf. Siepmann 2003). Compare the following groups of examples:

(22) Les grammaires disent encore que les adjectifs verbaux issus d’un participe présent ou passé ou d’une de leurs formes préfixées sont presque toujours placés après le nom. Mon corpus donne de nombreux exemples d’infractions à cet usage (. . .)
D’autres exemples ont été donnés à la réunion de la Société française de microbiologie à l’Institut Pasteur en décembre 1997.
Les économies régionales autarciques ont existé jusqu’au moment où se sont développés les moyens de communication. G. Kuhnholtz-Lordat en donne un remarquable exemple dans le « pays de Costière » (département du Gard).
L’Arabie Saoudite donne un exemple d’Etat islamique moderne.
R. T. T. Forman et M. Godron (1986) définissent un paysage comme un espace de plusieurs kilomètres carrés, où un assemblage particulier d’écosystèmes interactifs se répète à peu près à l’identique. La mosaïque des champs, des prés, des haies et des bois d’un bocage en donne un exemple.
De sorte que les villes ont crû, se sont transformées et fragmentées, d’une manière qui dépasse tout ce que l’on avait pu imaginer. Le meilleur exemple est donné par Mexico, la ville du monde la plus peuplée, dont il est désormais impossible de fixer les limites et de dresser le plan.
(all examples from CAF)
In the first group of sentences donner has retained one of its dictionary meanings (‘communiquer, exposer’). In functional grammar terms, the subject of donner would be labelled an ‘actor’; the construction belongs to the material process type. It is somewhat different with the second group of sentences, where donner has an equative meaning characteristic of the relational process type. Its subject is a ‘token’ that has a ‘value’ ascribed to it in the form of an object. Since the English collocation give + example and the German collocation geben + Beispiel can only be used with material processes, a literal translation of the second group of examples is out of the question. We thus have to resort to equivalents based around copular be. The first sentence of the second group, for example, could be translated as follows:
Saudi Arabia is an example of a modern Islamic state.
Saudi-Arabien stellt ein Beispiel für einen modernen islamischen Staat dar.
The above considerations also hold true for noun-adjective combinations such as heures creuses (literally ‘hollow hours’). Heures creuses is a semi-technical term which occurs in at least four different fields: power generation, rail transport, road transport and telecommunications:
(22) Les radiateurs à accumulation nécessitent la mise en oeuvre d’un asservissement aux heures creuses EDF.
la SNCF renforce les trains aux heures creuses entre Paris et Combs-la-Ville
0,075 ou 0,105 (Bouygues) aux heures creuses
(all examples from NF)
Such collocational polysemy is also apparent from the paradigmatic relations entered into by heures creuses. Thus, whereas in telephony the antonym of heures creuses is heures pleines, in road transport it is heures de pointe. Somewhat counterintuitively, collocational polysemy is particularly common in special-purpose language. Thus, some French noun-(relational) adjective combinations of the type roue intérieure can usually be disambiguated in context only, since at least one of their meanings arises from the deletion of an intermediate element: roue (à denture) intérieure (Forner 2000: 180ff.).
7. Conclusion: A redefinition of collocation for lexicographic purposes
It should have become clear that previous definitions of collocations have relied too heavily on introspection rather than corpus evidence. This has prevented linguists from realizing that what has traditionally been known as ‘collocation’ or ‘phraseology’ is only one aspect of idiomatic language use, and that the boundaries between the two are hazy and uncertain. The only way out of this dilemma is a rigorously corpus-driven approach to the study of lexis and grammar, and this is the approach that has been taken in the present study.
Our discussion suggests that even the most sophisticated structuralist definitions cannot adequately capture the phenomenon of habitual co-occurrences, and that the frequency-based approach to collocation cannot account for the collocation of semantic features. We would therefore be justified in loosening the definition of collocation to a considerable extent; collocation could be defined pragmatically with reference to the notion of ‘Gebrauchsnorm’, or ‘usage norm’ (Steyer 2000: 108), reflected in concepts such as ‘minimal recurrence’ (Kocourek 1991, Siepmann 2003) or ‘statistical significance’ (Sinclair 1991), on the one hand, and the notion of ‘inhaltliche Geschlossenheit’, or ‘holisticity’, on the other hand (Siepmann 2003), the latter referring to the facts that (a) native speakers can ascribe meaning to general-language collocations even if these are divorced from context and (b) such units are intuitively considered as self-contained ‘wholes’:
a collocation is any holistic lexical, lexico-grammatical or semantic unit normally composed of two or more words which exhibits minimal recurrence within a particular discourse community
‘Holisticity’ should here be taken to include colligation with a particular grammatical category, such as a noun phrase. Thus, the collocations the future belongs to (die Zukunft gehört, l’avenir appartient à) or l’autoroute file would be felt to be incomplete by most speakers, requiring as they do a prepositional object. This variable complement is conceived of as part of the collocation.
There is some evidence from a psycholinguistic study by Schmitt et al. (2004) to suggest that the above definition, first proposed in sketchier form in Siepmann 2003, is psychologically valid.
Schmitt and his colleagues administered an oral dictation task to a number of native and non-native English speakers, who were asked to reproduce dictation ‘bursts’ of considerable length which contained different types of recurrent clusters. It was found that not all statistically significant clusters retrievable from corpora are stored as holistic units in the mental lexicon; there was a discernible tendency for semantically transparent clusters (e.g. to make a long story short) rather than sentence fragments (e.g. in a variety of) to be reproduced intact. This finding seems all the more plausible since even participants’ failure to reproduce original sequences does not mean that these sequences are not stored in the mind, for the simple linguistic reason that many so-called ‘fixed’ expressions admit of variants (e.g. to cut a long story short; see above) that participants may prefer.
From this section there emerge two important conclusions for linguistic and lexicological theory. Firstly, collocation, as defined above, dominates language use (at least from a statistical perspective). That is, Sinclair’s open-choice principle only has a marginal role to play compared to his idiom principle, which, as seen in our discussion of long-distance collocations, needs to be considerably widened. Secondly, collocations should be considered as fully-fledged linguistic signs in their own right, so that Saussure’s word-based linguistics will have to be complemented by a collocation- (or ‘expression’-) based linguistics (cf. Feilke 1996, 2003).
This redefinition of collocation enables us to account for the ways in which language users operate with wholes (words, collocations) and at the same time with parts (words, semantic features) which they have extracted from contextual wholes, a key demand to be placed on any semantic theory (cf. Bolinger 1965: 570–571).
Both operations have been shown to be governed by collocation, thus providing further evidence for Hoey’s claim that collocation is indeed one of the central mechanisms involved in meaning creation (see introduction). It thus appears that both structurally simple (i.e. [bound] morphemes, lexemes) and structurally complex units (i.e. collocations/colligational patterns) are linguistic signs. If the dictionary is meant to be a record of such signs, the task of the lexicographer is to gather together evidence of both types of sign. So far it has been lexemes, non-compositional idioms and morphemes that have received the bulk of lexicographic attention, but the future clearly belongs to collocation and colligation in the widest possible sense. In the second part of this article, I shall discuss some of the implications such a change in perspective – not to say paradigm shift – has for the making of encoding dictionaries.
Notes
1 For those readers who are not yet familiar with the relatively recent notion of colligation (a term originally coined by Firth), here is how Hoey (1998) defines colligation:
- the grammatical company a word keeps (or avoids keeping) either within its own group or at a higher rank.
- the grammatical functions that the word’s group prefers (or avoids).
- the place in a sequence that a word prefers (or avoids).
2 Even a superficial glance at lexical functions shows that they disregard contextual relationships. Thus, the adverb drop-dead may intensify the adjective beautiful with reference to women, but not with reference to buildings.
3 On an alternative construal, the German sequence might be viewed as a colligational pattern or schematic construction (Croft and Cruse 2003): eine ADJ Straßenlage haben, but this seems problematic to the extent that very few adjectives can fill the slot.
4 I use the term ‘concept’ more or less in its standard terminological sense to refer to a ‘unit of thought constituted through abstraction on the basis of properties common to a set of objects or phenomena’.
5 Clearly, then, the notion of literal meaning turns out to be a linguistic abstraction (see also the introduction to this article).
6 A point of criticism that might be raised is that we are here dealing with an instance of regular polysemy. The meaning of ‘drink’ could be glossed as referring to an occasion where people have a drink, and the same reasoning would apply to cases such as quiet dinner/breakfast/lunch/tea. I would argue that such apparent regularities are in fact more or less accidental; as Blank (2001) and Grossmann and Tutin (forthcoming) have shown, nouns belonging to the same semantic class may share some of their collocations or colligations, but not all of them (e.g. nach der Schule gingen die Schüler nach Hause vs *nach dem Parlament gingen die Abgeordneten nach Hause).
7 Or a three-item construction in the sense of Croft and Cruse (2003).
8 Blank is not unaware of the fact that verbs may also be associated with particular circumstantial complements (‘Zirkumstanten’) which may themselves carry selectional restrictions, but he considers these two levels to be of lesser importance. As our analysis has shown, however, it is often the particular collocation that determines the verb pattern (l’autoroute file quelque part). Put another way, valency and collocation appear to shade off into each other; speakers have semantically and syntactically prepatterned collocations or ‘constructions’ (Fillmore) at their disposal.
9 Interestingly, the distinction we have just established between selectional and collocational restrictions has a parallel in theories of formal grammar, such as Head-driven Phrase Structure Grammar, where selection refers to the process whereby a head selects its complements and an adjunct selects its head.
Using the example of the German verb fackeln, whose linguistic environment invariably comprises a durational modifier (most commonly nicht lange), Sailer and Richter (2002) show that the durational modifier cannot be interpreted as a complement of the head verb, but rather as an adjunct. Therefore, they argue, the relationship between the head verb and the relational modifier is one of collocation rather than selection.
10 An alternative, cognitive-linguistic explanation might take the conceptual background as its starting point. Since paintings, carvings, etc. are often perceived as aesthetically pleasing, the adjective beautiful readily springs to mind to describe them. Collocations incorporating the adverb beautifully could then be regarded as being derived from the original collocation (beautiful carving → beautifully carved). The problem with this explanation is that such derivation is not always possible.
11 Dieter Wirth, personal communication.
References
A. Dictionaries
Ilgenfritz, P. et al. 1989. Langenscheidts Kontextwörterbuch Französisch-Deutsch. Ein neues Wörterbuch zum Schreiben, Lernen, Formulieren. Munich: Langenscheidt. (LKFD)
B. Other literature
Benson, M. 1986. Lexicographic Description of English. Amsterdam: Benjamins.
Biber, D. et al. 1999. Longman Grammar of Spoken and Written English. London: Longman.
Blank, A. 2001. Einführung in die lexikalische Semantik für Romanisten. Tübingen: Niemeyer.
Bolinger, D. 1965. ‘The atomization of meaning.’ Language 41: 555–573.
Brauße, U. 1992. ‘Funktionswörter im Wörterbuch’ in U. Brauße and D. Viehweger, Lexikontheorie und Wörterbuch: Wege der Verbindung von lexikologischer Forschung und lexikographischer Praxis. Tübingen: Niemeyer, 1–88.
Burger, H. 1998. Phraseologie: Eine Einführung am Beispiel des Deutschen. Berlin: Schmidt.
Chandler, S. 1993. ‘Are Rules and Modules Really Necessary for Explaining Language?’ Journal of Psycholinguistic Research 22: 593–606.
Croft, W. and Cruse, A. D. 2003. Cognitive Linguistics. Cambridge: Cambridge University Press.
Cruse, A. D. 1986. Lexical Semantics. Cambridge: Cambridge University Press.
Feilke, H. 1994. Common sense-Kompetenz. Überlegungen zu einer Theorie “sympathischen” und “natürlichen” Meinens und Verstehens. Frankfurt: Suhrkamp.
Feilke, H. 1996. Sprache als soziale Gestalt. Frankfurt: Suhrkamp.
Feilke, H. 2003. ‘Kontext – Zeichen – Kompetenz. Wortverbindungen unter sprachtheoretischem Aspekt’ in K. Steyer (ed.), Wortverbindungen – mehr oder weniger fest. (Jahrbuch des Instituts für deutsche Sprache.) Berlin: De Gruyter, 41–64.
Fillmore, C. J. 1976. ‘Pragmatics and the Description of Discourse’ in S. J. Schmidt (ed.), Pragmatik/Pragmatics II. Grundlegung einer expliziten Pragmatik. Munich: Fink, 83–104.
Forner, W. 2000. ‘Fachsprachliche Nominationstechniken: Informationsverwertung und Informationsbewertung’ in K. Morgenroth (ed.), Hermeneutik und Manipulation in den Fachsprachen. Tübingen: Narr, 167–190.
Francis, G., Hunston, S. and Manning, E. 1996. Collins Cobuild Grammar Patterns 1: Verbs. London: HarperCollins.
Francis, G., Hunston, S. and Manning, E. 1998. Collins Cobuild Grammar Patterns 2: Nouns and Adjectives. London: HarperCollins.
González-Rey, I. 2002. La phraséologie du français. Toulouse: Presses Universitaires du Mirail.
Gross, G. 1996. Les expressions figées en français. Noms composés et autres locutions. Paris: Ophrys.
Grossmann, F. and Tutin, A. (eds.) 2003. Les collocations: analyse et traitement. Travaux et recherches en linguistique appliquée Série E. Amsterdam: De Werelt.
Grossmann, F. and Tutin, A. (forthcoming). ‘Motivation of Lexical Associations in Collocations: the Case of Intensifiers Denoting “Joy”’ in L. Wanner (ed.), Festschrift in Honour of Igor Mel’čuk. Amsterdam: Benjamins.
Hartenstein, K. 1996. ‘Faustregeln als Lernhilfen für Lexemkollokationen (vorgeführt am Beispiel des Deutschen, Englischen, Französischen und Russischen)’ in K. Hartenstein (ed.), Aktuelle Probleme des universitären Fremdsprachenunterrichts. ZFI Arbeitsberichte 11, Zentrales Fremdspracheninstitut der Universität Hamburg, 83–134.
Hausmann, F. J. 1979. ‘Un dictionnaire des collocations est-il possible?’ Travaux de linguistique et de littérature 17: 187–195.
Hausmann, F. J. 1997. ‘Tout est idiomatique dans les langues’ in M. Martins-Baltar, La locution entre langue et usages. Fontenay/Saint-Cloud: ENS Éditions, 277–290.
Hausmann, F. J. 1999. ‘Le dictionnaire de collocations – Critères de son organisation.’ in N. Greiner et al., Texte und Kontexte in Sprachen und Kulturen. Festschrift für Jörn Albrecht. Trier: Wissenschaftlicher Verlag Trier, 121–140.
Hausmann, F. J. 2003. ‘Was sind eigentlich Kollokationen?’ in K. Steyer (ed.), Wortverbindungen – mehr oder weniger fest. (Jahrbuch des Instituts für deutsche Sprache.) Berlin: De Gruyter, 309–334.
Hoey, M. 1991. Patterns of Lexis in Text. Oxford: Oxford University Press.
Hoey, M. 1998. ‘“Introducing Applied Linguistics”: 25 Years on.’ Plenary Paper in the 31st BAAL Annual Meeting: ‘Language and Literacies’, University of Manchester, September 1998.
Hoey, M. 2000. ‘A World beyond Collocation: New Perspectives on Vocabulary Teaching’ in M. Lewis (ed.), Teaching Collocation. Further Developments in the Lexical Approach. Hove: LTP, 224–243.
Hunston, S. 2001. ‘Colligation, Lexis, Pattern and Text’ in M. Scott and G. Thompson (eds.), Patterns of Text. In honour of Michael Hoey. Amsterdam: Benjamins, 13–34.
Jones, S. and Sinclair, J. M. 1974. ‘English Lexical Collocations’ Cahiers de lexicologie 24: 15–61.
Kenny, D. 2003. ‘Die Übersetzung von usuellen und nicht usuellen Wortverbindungen vom Deutschen ins Englische. Eine korpusgestützte Untersuchung’ in K. Steyer (ed.), 335–347.
Kjellmer, G. 1994. A Dictionary of English Collocations. Oxford: Clarendon Press.
Kocourek, R. 1991. La langue française de la technique et de la science. Wiesbaden: Brandstetter.
L’Homme, M.-C. 2003. ‘Les combinaisons lexicales spécialisées (CLS). Description lexicographique et intégration aux banques de terminologie’ in F. Grossmann and A. Tutin (eds.), Les collocations: analyse et traitement, Travaux et recherches en linguistique appliquée Série E, No 1, 89–104.
Lutzeier, P. R. 1981. Wort und Feld. Wortsemantische Fragestellungen mit besonderer Berücksichtigung des Wortbegriffes. Tübingen: Niemeyer.
Mel’čuk, I. 1998. ‘Collocations and Lexical Functions’ in A. Cowie, Phraseology. Theory, Analysis and Applications. Oxford: Clarendon Press, 23–53.
Mel’čuk, I. 2003. ‘Les collocations: définition, rôle et utilité’ in F. Grossmann and A. Tutin (eds.), Les collocations: analyse et traitement, Travaux et recherches en linguistique appliquée Série E, No 1, 23–32.
Mel’čuk, I. and Wanner, L. 1996. ‘Lexical Functions and Lexical Inheritance for Emotion Lexemes in German’ in L. Wanner (ed.), Lexical Functions in Lexicography and Natural Language Processing. Amsterdam: Benjamins, 207–277.
Moon, R. 1998. Fixed expressions and idioms in English. Oxford: Clarendon.
Partington, A. 1998. Patterns and meanings. Amsterdam: Benjamins.
Renouf, A. J. and Sinclair, J. M. 1991. ‘Collocational Frameworks in English’ in K. Aijmer and B. Altenberg (eds.), English Corpus Linguistics. London: Longman, 128–143.
Sailer, M. and Richter, F. 2002. ‘Not for Love or Money: Collocations!’ in G. Jäger et al. (eds.), Proceedings of Formal Grammar 2002. Trento, 149–160.
Schafroth, E. 2003. ‘Kollokationen im GWDS’ in H. E. Wiegand (ed.), Untersuchungen zur kommerziellen Lexikographie der deutschen Gegenwartssprache I. “Duden. Das große Wörterbuch der deutschen Sprache in zehn Bänden.” Print- und CD-ROM-Version. Tübingen: Niemeyer, 397–412.
Scherfer, P. 2002. ‘Lexikalische Kollokationen’, in I. Kolboom et al. (eds.), Handbuch Französisch. Berlin: Schmidt, 230–237.
Schmitt, N. et al. 2004. ‘Are Corpus-derived Recurrent Clusters Psycholinguistically Valid?’ in N. Schmitt (ed.), Formulaic Sequences: Acquisition, Processing and Use. Amsterdam: Benjamins, 127–152.
Siepmann, D. 1998. Review of ‘John Sinclair (ed.), Collins Cobuild Collocations CD-ROM’, Fremdsprachen und Hochschule 53, 134–137.
Siepmann, D. 2003. ‘Eigenschaften und Formen lexikalischer Kollokationen: Wider ein zu enges Verständnis.’ Zeitschrift für französische Sprache und Literatur 1: 260–283.
Siepmann, D. 2004. ‘Kollokationen und Fremdsprachenlernen: Imitation und Kreation, Figur und Hintergrund.’ Praxis Fremdsprachenunterricht 2: 107–113.
Siepmann, D. 2005. Discourse Markers across Languages. A contrastive study of second-level discourse markers in native and non-native text. New York: Routledge.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Sinclair, J. M. 1995. Collins Cobuild English collocations on CD-ROM. London: HarperCollins.
Sinclair, J. M. 2004. Trust the Text. Language, Corpus and Discourse. New York/London: Routledge.
Sinclair, J. M. 1996/2004. ‘The Search for Units of Meaning’ in J. M. Sinclair, Trust the Text, Language, Corpus and Discourse. New York/London: Routledge, 24–48.
Sinclair, J. M. 1998/2004. ‘The Lexical Item’ in J. M. Sinclair, Trust the Text, Language, Corpus and Discourse. New York/London: Routledge, 131–148.
Skousen, R. 1989. Analogical Modelling of Language. Dordrecht: Kluwer.
Steyer, K. 2000. ‘Usuelle Wortverbindungen des Deutschen. Linguistisches Konzept und lexikografische Möglichkeiten.’ Deutsche Sprache 2: 101–125.
Steyer, K. 2003. ‘Kookkurrenz. Korpusmethodik, linguistisches Modell, lexikografische Perspektiven’ in K. Steyer (ed.), Wortverbindungen – mehr oder weniger fest. (Jahrbuch des Instituts für deutsche Sprache.) Berlin: De Gruyter, 87–116.