3. Metalinguistic terms as keywords

The relative regularity in the use of typeface that we have observed throughout the eight editions also holds true for the terminology of the dictionary metalanguage. Nouns are normally labelled as masculine or feminine, verbs as transitive or intransitive; the formulae introducing lexicalized expressions, definitions, sense articulation and usage levels remain the same. Since etymology is absent and pronunciation is given only in exceptional cases, the number of information fields is relatively small. In this section we shall examine a few characteristic metalinguistic terms from the point of view of their efficacy for information retrieval.

We shall use the term Keyword List to refer to the alphabetical index containing the database addresses of occurrences of the metalinguistic keywords. The items of the Keyword List are lemmas grouping variant textual forms; for example, the lemma FEMININ gives access to the text strings «f.», «fem.», «fém.», «fémin.» and «féminin». The frequency of the unedited keyword FEMININ in the sample database is 204. Of these occurrences, 201 (98.53%) indicate the gender of the lexical item under consideration; in nearly all of them the word is preceded either by the keyword SUBSTANTIF (196) or by the keyword ADJECTIF (2). An examination of the six remaining cases (6 of the 8 occurrences of the string «féminin») shows that three concern autonymic linguistic signs («GROSSE, au féminin» Acad6-7; «Au féminin» Acad8 s.v. GROS), while three others refer to a semantic property at the level of metametalanguage («On appelle en termes de Grammaire, Noms douteux, Ceux que les uns mettent au masculin, et d'autres au féminin.» Acad5-7). Once edited to retain only the labels qualifying a lexical unit, the keyword FEMININ can thus be reduced to 201 occurrences. The explicit keyword FEMININ as defined above does not, of course, give access to all the feminine forms in the nomenclature: the feminine form of an adjective is normally indicated by the form itself and not by a label (e.g. «DOUX, DOUCE. adj.»), while dual-gender nouns are marked negatively by the absence of a gender label. For example, the 150 occurrences of the keyword SUBSTANTIF not followed by either MASCULIN or FEMININ include 30 that concern the feminine; e.g.:
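The Keyword List can be pictured as a mapping from lemmas to sorted lists of database addresses. The following is a minimal sketch, assuming the text has already been split into strings that each carry an address; the address format (edition, entry, offset), the sample data and the function names `build_keyword_list` and `edit_keyword` are hypothetical. Editing a keyword then amounts to removing the addresses judged on review not to be labels of the lexical item.

```python
from collections import defaultdict

# Variant text strings grouped under the lemma FEMININ (strings as listed in the text).
VARIANTS = {
    "f.": "FEMININ",
    "fem.": "FEMININ",
    "fém.": "FEMININ",
    "fémin.": "FEMININ",
    "féminin": "FEMININ",
}


def build_keyword_list(tokens):
    """Return {lemma: sorted addresses} for every token matching a variant form.

    `tokens` is an iterable of (address, text_string) pairs; the address
    format (edition, entry, offset) is a hypothetical choice.
    """
    index = defaultdict(list)
    for address, text in tokens:
        lemma = VARIANTS.get(text)
        if lemma is not None:
            index[lemma].append(address)
    return {lemma: sorted(addresses) for lemma, addresses in index.items()}


def edit_keyword(keyword_list, lemma, rejected):
    """Return a new Keyword List in which `lemma` omits the `rejected` addresses."""
    edited = dict(keyword_list)
    edited[lemma] = [a for a in keyword_list.get(lemma, []) if a not in rejected]
    return edited


# Hypothetical sample: two gender labels and one autonymic use of «féminin».
sample = [
    (("Acad6", "ENTRY_A", 3), "f."),
    (("Acad6", "ENTRY_A", 9), "adj."),      # not a FEMININ variant
    (("Acad6", "ENTRY_B", 4), "fém."),
    (("Acad6", "GROS", 40), "féminin"),     # «GROSSE, au féminin»: autonymic
]

keyword_list = build_keyword_list(sample)
edited = edit_keyword(keyword_list, "FEMININ", {("Acad6", "GROS", 40)})
print(len(keyword_list["FEMININ"]), len(edited["FEMININ"]))  # 3 2
```

The unedited list stays available alongside the edited one, so either can be queried depending on whether speed or precision matters more.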

The list of occurrences of the keyword FEMININ can in this case be augmented to include the database addresses of the relevant unlabelled feminine items.
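A minimal sketch of this augmentation, assuming keywords are stored as (address, lemma) pairs in text order and that "followed by" means the next keyword within the same entry; the function names, entry keys and offsets are hypothetical. The automatic step only proposes candidates: deciding that an unlabelled noun is feminine remains a manual review.

```python
GENDER_KEYWORDS = {"MASCULIN", "FEMININ"}


def unlabelled_nouns(stream):
    """Yield addresses of SUBSTANTIF not followed by a gender keyword in the same entry.

    `stream` is a list of ((entry, offset), lemma) pairs in text order.
    """
    for i, (address, lemma) in enumerate(stream):
        if lemma != "SUBSTANTIF":
            continue
        following = stream[i + 1] if i + 1 < len(stream) else None
        labelled = (
            following is not None
            and following[0][0] == address[0]  # same entry
            and following[1] in GENDER_KEYWORDS
        )
        if not labelled:
            yield address


def augment(addresses, confirmed):
    """Add reviewer-confirmed addresses to a keyword's address list."""
    return sorted(set(addresses) | set(confirmed))


# Hypothetical keyword stream; entry keys and offsets are invented.
stream = [
    (("ENTRY_X", 1), "SUBSTANTIF"),
    (("ENTRY_X", 2), "MASCULIN"),       # labelled: not a candidate
    (("TIGRESSE", 1), "SUBSTANTIF"),    # unlabelled noun
    (("DOUILLETTE", 1), "SUBSTANTIF"),  # unlabelled noun
]

candidates = list(unlabelled_nouns(stream))
# Suppose review confirms both candidates as feminine.
feminin = augment([("ENTRY_Y", 2)], candidates)
print(candidates)
print(feminin)
```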

In the case of gender, the lack of an explicit label does not, as we have seen, preclude the objective retrieval of relevant items: TIGRESSE is given as feminine by virtue both of its position as the second of two co-headwords and of the indication «s.»; DOUILLETTE is given as a feminine noun in the example «C'est une douillette.» For usage level and semantic dependency, however, we have to rely on explicit labels, without which we must either interpret the text subjectively or go outside the dictionary to obtain the information we seek.

A slightly different approach can be adopted with terms like «aussi». When it occurs in roman typeface, «aussi» is used metalinguistically in nearly all cases (770 out of 786 = 97.96%) and is polysemous, serving as a copula for information on part of speech, meaning, synonymy and syntactic construction. The choice for the definition of the keyword AUSSI thus lies between the simple, "fuzzy" but efficient rule "«aussi» preceded by a roman typeface tag" (f 786), the more precise global list of edited occurrences (f 770), and the creation of several specific AUSSI keywords according to type of information (part of speech, meaning, etc.).
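The three options for defining AUSSI can be compared side by side in a short sketch. It assumes each token is an (address, typeface, string) triple, that review has produced a set of rejected addresses and a mapping from address to information type; the sub-keyword names such as AUSSI_POS and AUSSI_SENS, and all data, are hypothetical.

```python
from collections import defaultdict


def aussi_fuzzy(tokens):
    """Fuzzy rule: every «aussi» carrying a roman typeface tag."""
    return [addr for addr, face, text in tokens
            if face == "roman" and text.lower() == "aussi"]


def aussi_edited(fuzzy_hits, non_metalinguistic):
    """Edited list: fuzzy hits minus occurrences rejected on review."""
    return [a for a in fuzzy_hits if a not in non_metalinguistic]


def aussi_typed(edited_hits, info_type):
    """Split the edited list into sub-keywords by type of information."""
    groups = defaultdict(list)
    for a in edited_hits:
        groups["AUSSI_" + info_type[a]].append(a)
    return dict(groups)


# Hypothetical tokens: (address, typeface, string).
tokens = [
    (1, "roman", "aussi"),
    (2, "italic", "aussi"),  # inside an example: excluded by the typeface rule
    (3, "roman", "aussi"),
    (4, "roman", "aussi"),   # roman but judged non-metalinguistic on review
]

fuzzy = aussi_fuzzy(tokens)                         # [1, 3, 4]
edited = aussi_edited(fuzzy, {4})                   # [1, 3]
typed = aussi_typed(edited, {1: "POS", 3: "SENS"})  # {'AUSSI_POS': [1], 'AUSSI_SENS': [3]}
print(fuzzy, edited, typed)
```

Each option adds review work: the fuzzy rule needs none, the edited list needs one decision per hit, and the typed sub-keywords need a further classification of every retained occurrence.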

Familiar usage is marked as such in the text by the terms «familier», «fam.», «famil.», «familière», «familières», «familiers» or «familièrement». In the sample database, the keyword FAMILIER gives access to 319 occurrences of these variant forms. It is important to distinguish between the subjectivity of the lexicographer's decision to label an item as familiar rather than, for example, popular (the keyword POPULAIRE, covering «pop.», «popul.», «populaire» and «populairement», has a frequency of 41) and the objectivity of the retrieval of the chosen text labels.

In order to retrieve occurrences of figurative usage, one may choose either to restrict oneself to the keyword FIGURÉ (text strings «fig.», «figur.», «figuré», «figurées», «figurém.», «figurément»; f 517) or to combine it with ANALOGIE («par analogie», «par une sorte d'analogie»; f 13) and/or PROVERBIAL («prov.», «proverb.», «proverbe», «proverbiale», «proverbialem.», «proverbialement»; f 275). One finds that in 112 of its occurrences PROVERBIAL is used in combination with FAMILIER (e.g. «On dit prov. et fig. Joüer à quitte ou à double, pour dire, Hazarder tout pour se tirer d'une affaire.» Acad1-5 s.v. DOUBLE; cf. «[...] figurément et familièrement [...]» Acad6-7, «Voyez QUITTE» Acad8).
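Restricting or widening the search, and finding combinations of labels, reduce to set operations on the addresses in the Keyword List. The sketch below assumes addresses of the form (edition, entry, paragraph, offset) and treats the paragraph as the unit within which keywords "combine"; the function names and every address are hypothetical, loosely modelled on the DOUBLE example quoted above.

```python
def units(keyword_list, lemmas):
    """Text units (edition, entry, paragraph) containing any of the given keywords.

    Addresses are assumed to be (edition, entry, paragraph, offset) tuples.
    """
    return {addr[:3] for lemma in lemmas for addr in keyword_list.get(lemma, [])}


def co_occurring(keyword_list, a, b):
    """Text units in which keywords `a` and `b` both occur."""
    return units(keyword_list, [a]) & units(keyword_list, [b])


# Hypothetical addresses: «prov. et fig.» in Acad1, «figurément et familièrement»
# in Acad6; ENTRY_X, paragraph numbers and offsets are invented.
keyword_list = {
    "FIGURÉ": [("Acad1", "DOUBLE", 5, 12), ("Acad6", "DOUBLE", 5, 10)],
    "PROVERBIAL": [("Acad1", "DOUBLE", 5, 10)],
    "FAMILIER": [("Acad6", "DOUBLE", 5, 14)],
    "ANALOGIE": [("Acad1", "ENTRY_X", 2, 3)],
}

narrow = units(keyword_list, ["FIGURÉ"])
broad = units(keyword_list, ["FIGURÉ", "ANALOGIE", "PROVERBIAL"])
print(len(narrow), len(broad))                             # 2 3
print(co_occurring(keyword_list, "PROVERBIAL", "FIGURÉ"))  # {('Acad1', 'DOUBLE', 5)}
print(co_occurring(keyword_list, "FIGURÉ", "FAMILIER"))    # {('Acad6', 'DOUBLE', 5)}
```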

While part-of-speech, gender and field labels are fairly easy to identify, other information fields, such as definitions and examples of usage, can be complex to identify and may involve subjective interpretation. As we have seen in the preceding section, a prerequisite for a definition is that it be in roman typeface, and for an example that it be in italics. Other keywords are occasional and varied, and no combination of them comes near to permitting the retrieval of all and only definitions, or all and only examples. A problem that arises even before keywords are applied (or tags, in a "systematic" modern dictionary whose information fields have been tagged) is deciding what constitutes a definition or an example.

The "definition" can function in content metalanguage or sign metalanguage (métalangue de contenu vs. métalangue de signe: Rey-Debove 1971); it can treat the word at the level of the lexicon or in discourse.

Occasional explicit copulas linking the lexical item (subject) to the definition (predicate) include «signifie» (f 414), «pour dire» (f 801) and «se prend pour» (f 53); cf. the related forms «sign.» (= «signifie») 9, «pour/peut signifier» 17, «signifioit/signifiait» 5, «phrases [...] qui signifient» 1, «signification(s)» 29 and «sign.» (= «signification») 1. Another occasional definition marker is the explicit statement that the lexical item is a "species" (hyponym) of the "genus" (hyperonym) represented by the nuclear term of the definition. Thus «espece/espèce» (f 78) and «sorte» (f 86) qualify, for example, DOUBLON and LOIR as a type of currency and a type of small animal respectively:

With respect to examples, there is no absolute means of determining from the dictionary text the boundary between lexical units and examples, that is, between lexicalized and free syntagmas. In a paragraph containing several italicized sequences, lexicalized items normally precede free ones. In most cases, but not all, a lexicalized syntagma is followed by a semantic treatment, whereas a free example is given in final position. In extract 1 below, the first italicized sequence is a lexical unit followed by a definition, and the second is a series of three examples; in extract 2, the single italicized sequence is a lexical unit followed by a definition; in extract 3, the several italicized sequences have to be considered as exemplifying collocations or sentences, even though many of them are followed by definitions (of the word in discourse).
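The positional tendencies described above can be turned into a heuristic that proposes, but does not decide, a status for each italicized sequence. This is a minimal sketch, assuming a paragraph is available as a list of (typeface, text) segments; the function name, the label strings and the naive substring test for the copulas «signifie», «pour dire» and «se prend pour» are all assumptions. As extract 3 shows, sequences followed by definitions of the word in discourse would be wrongly proposed as lexical units, so the output is only a starting point for review.

```python
# Copula strings quoted in the text; matching is a naive substring test.
DEFINITION_MARKERS = ("signifie", "pour dire", "se prend pour")


def classify_italics(paragraph):
    """Propose a status for each italic segment of a paragraph.

    `paragraph` is a list of (typeface, text) segments in reading order.
    An italic segment followed by a roman segment (a semantic treatment) is a
    candidate lexical unit; one in final position is a candidate free example.
    """
    labels = []
    for i, (face, text) in enumerate(paragraph):
        if face != "italic":
            continue
        nxt = paragraph[i + 1] if i + 1 < len(paragraph) else None
        if nxt is None:
            label = "example?"
        elif nxt[0] == "roman":
            marked = any(m in nxt[1] for m in DEFINITION_MARKERS)
            label = "lexical unit? (marker)" if marked else "lexical unit?"
        else:
            label = "undetermined"
        labels.append((text, label))
    return labels


# Segments of the DOUBLE citation quoted in the text; the typeface assignments
# and the final placeholder segment are assumptions for illustration.
paragraph = [
    ("roman", "On dit prov. et fig."),
    ("italic", "Joüer à quitte ou à double,"),
    ("roman", "pour dire, Hazarder tout pour se tirer d'une affaire."),
    ("italic", "[free example]"),
]
print(classify_italics(paragraph))
```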


Although modern print dictionaries are never entirely systematic, they are relatively so; when retroconverted to electronic form, their information fields are systematically tagged to a more or less sophisticated degree. Early dictionaries are less systematic than modern ones, to varying degrees. To avoid distorting them, no attempt should be made to tag their information fields systematically. Instead, in the great majority of cases, a high degree of retrieval efficiency can be achieved by means of typeface tags and metalinguistic keywords. When applying typeface tags and defining keywords, proper consideration should be given to the efficacy of "fuzzy" searching as opposed to time-consuming post-editing.
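The trade-off between fuzzy searching and post-editing can be quantified with the AUSSI frequencies given earlier (786 fuzzy hits, 770 edited occurrences). The sketch below takes the post-edited list as the reference and assumes it is a subset of the fuzzy hits, so the fuzzy rule loses nothing in recall; the function name is hypothetical.

```python
def fuzzy_precision(fuzzy_hits, edited_hits):
    """Return the precision of a fuzzy rule and the number of hits post-editing removes.

    Assumes the edited list is a subset of the fuzzy hits, so only
    precision (not recall) is lost by skipping post-editing.
    """
    return edited_hits / fuzzy_hits, fuzzy_hits - edited_hits


precision, spurious = fuzzy_precision(786, 770)  # AUSSI frequencies from the text
print(f"precision {precision:.2%}, {spurious} spurious hits")  # precision 97.96%, 16 spurious hits
```

Whether reviewing every hit to remove 16 spurious ones is worth the effort depends on the query; for many searches the fuzzy rule is already precise enough.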


