Early dictionaries do not have a structure that is clear and recurrent enough to allow systematic tagging of information fields. The two concepts of "metalinguistic keywords" and "fuzzy searching" have the advantage of making information fields easier to query without distorting the text.
Metalinguistic Keywords, Dictionary Information Fields, Fuzzy Structures, Lemmatization
Thus the databases of Nicot's TLF and of Ménage's DEOLF (the latter in progress) are tagged only for page-column, headword, typography and language. In order to give access to information fields -- part of speech, definition, field labels, example, quotation, source, etymology, etc. -- without distorting the text, indexed lists of lemmatized metalinguistic keywords are provided, allowing the retrieval of occurrences of information field markers. For example, the lemma MASCULIN is linked to the contexts in which the lexicographer has indicated -- by «masculin», «m.», «mas.», «masc.» or «mascul.» -- the gender of a masculine noun, adjective or participle.
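The lemmatized keyword index described above can be sketched as follows. This is a minimal illustration, not the project's actual indexing software: the names `VARIANTS` and `build_index` and the three sample lines are hypothetical, and only the five surface forms of MASCULIN come from the text. Matching is case-sensitive, which is itself an assumption.

```python
import re
from collections import defaultdict

# Surface forms given in the text for the lemma MASCULIN.
# The table layout itself is a hypothetical choice.
VARIANTS = {
    "MASCULIN": ["masculin", "mascul.", "masc.", "mas.", "m."],
}

def build_index(lines):
    """Map each lemma to (line number, matched form) pairs for every variant found."""
    index = defaultdict(list)
    for lemma, forms in VARIANTS.items():
        # Try the longest forms first so the fullest abbreviation wins.
        ordered = sorted(forms, key=len, reverse=True)
        alternatives = "|".join(re.escape(form) for form in ordered)
        # The lookarounds stop "m." from matching inside a word such as "nom."
        pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
        for number, line in enumerate(lines, start=1):
            for match in pattern.finditer(line):
                index[lemma].append((number, match.group(0)))
    return dict(index)

# Invented sample lines for illustration only, not transcriptions of any edition.
sample = [
    "DOUAIRE. s. m. ...",
    "TIGE. s. f. ...",
    "Il est aussi subst. masc. ...",
]
index = build_index(sample)
```

A query on the lemma then returns every context regardless of which abbreviation the lexicographer happened to use, and the text itself is left untouched.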
The international project -- announced at the Institut de France in November 1994 (Wooldridge 1994; cf. Leroy-Turcan & Wooldridge 1995) -- to computerize the eight complete editions of the Dictionnaire de l'Académie française (1694-1935) also has to deal, though to a lesser degree, with structural approximation and variation. Again, imposing on the text a highly-structured information field metatext would result in misrepresenting it.
The aim of our paper is to give a measure, based on extensive computerized samples, of the efficiency of metalinguistic keyword indices, allied to microstructural position and typographical markers such as roman/italic/bold, upper/lower case and indenting, as a retrieval tool for the Dictionnaire de l'Académie ("Acad") and, by implication, early dictionaries in general. The sample database comprises, for each edition, the entries ÂME, DOUAIRE to DOUZIL, GAGNER, GRAS, GROS, LOIN to LOISIR, QUE, QUEUE, TIGE to TINTOUIN and VOLER. Reference may also be made to other entries to illustrate particular problems. Explicit tagging is done for edition, headword, paragraph, typeface and column-page. All of these except headwords are based on systematic, objective formal criteria. The paper sets out to show that, while tagging is preferable to typographical marking (in this case capitals and position) as a means of retrieving headwords, both typeface and metalinguistic keywords can be used to locate information fields.
An important concept in this approach to data retrieval is that of "fuzzy searching". Simply put, it means that rather than expending an enormous effort on retrieving 100% of what is sought and nothing more, one can obtain practically the same results with considerably less effort by contenting oneself with a range of 95% to 105% of the theoretical total, the small number of irrelevant occurrences being easy to discard. Fuzzy searching is a particularly appropriate tool for structural approximation (see above).
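The 95% to 105% band can be expressed as a simple ratio test. The sketch below is only an illustration of the arithmetic; the function names are hypothetical. The worked figures are those reported further on in this paper for large roman capitals: 387 sequences retrieved where 374 (co-)headwords are sought.

```python
def retrieval_ratio(retrieved, expected):
    """Retrieved occurrences as a percentage of the theoretical total."""
    return 100.0 * retrieved / expected

def within_fuzzy_band(retrieved, expected, low=95.0, high=105.0):
    """True when the retrieved count falls inside the tolerated band."""
    return low <= retrieval_ratio(retrieved, expected) <= high

# Large roman capitals: 387 sequences retrieved, 374 of them (co-)headwords.
ratio = retrieval_ratio(387, 374)
acceptable = within_fuzzy_band(387, 374)
surplus = 387 - 374  # irrelevant hits (cross-references) to discard by hand
```

Here the search returns about 103.5% of the theoretical total, so it falls inside the band, and the 13 surplus hits are few enough to discard by inspection.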
The formal properties of the items of the alphabetical macrostructure are: large capitals and initial position in a paragraph. To these are added, in some editions, bolding (Acad6-8) and increased preceding line spacing (Acad8). Although large capitals are also used for cross-references in the first edition («TIMPAN. Voy TYMPAN.») and the initial position of a paragraph may be occupied by a variety of objects, the two together are in general sufficient to determine all and only alphabetical headwords. The very occasional exceptions to this rule can be considered as accidents (e.g., QUI VIVE, in the paragraph «QUI VIVE. Voy VIVRE.» placed within the article QUI, functions, like QUICONQUE or QUIDAM, as a sub-headword of QUI).
The secondary macrostructural level, that of sub-headwords, is more problematic. In order to be able to compare the contents of the macrostructure of the first edition with those of the others, it is necessary to tag as headwords in Acad1 those types of subsidiary items that appear in the primary macrostructure in Acad2-8 (e.g. TIMIDITÉ, INTIMIDER, TIMIDEMENT and TIMORÉ in the example given above). These sub-headwords have two formal properties: small capitals and initial position in a paragraph. However, these two properties are frequently shared by what in the dictionary as a whole (Acad1-8), as in the general lexicographical tradition, have to be considered as sub-addresses functioning within the microstructure. The difficulties involved in distinguishing, in Acad1, sub-headwords from sub-addresses are many; we shall restrict ourselves here to an indication of the problem through a consideration of the first few capitalized paragraph beginnings in the article LONG in Acad1: «LONG, LONGUE. [...] SE FORLONGER. [...] LOIN. [...] AU LOIN. [...] LOIN A LOIN, DE LOIN A LOIN. [...] LOINTAIN, AINE. [...] ELOIGNER. [...]». Applying the rule that the first unit of a capitalized paragraph beginning ends at the first comma, or at a period if one comes first, we are left with LONG, SE FORLONGER, LOIN, AU LOIN, LOIN A LOIN, LOINTAIN and ELOIGNER. A comparison with Acad2-8 and with general dictionary tradition tells us that LONG, FORLONGER (SE), LOIN, LOINTAIN and ELOIGNER are (sub-)headwords, and that AU LOIN and LOIN A LOIN are sub-addresses of LOIN.
To sum up, formal, objective criteria are insufficient for automatic headword and sub-headword tagging of Acad1. A reasonable way to proceed is to tag capitalized paragraph beginnings automatically and then, in a manual, interpretative post-editing pass, to remove the headword tags from sub-addresses.
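The two-step procedure can be sketched on the LONG example. The code is an illustrative assumption, not the project's tagging software: `first_unit` and `SUB_ADDRESSES` are hypothetical names, and the manual post-editing step is stood in for by a hand-made set of the two sub-addresses identified in the text.

```python
import re

def first_unit(paragraph_start):
    """Return the text before the first comma or period, whichever comes first."""
    match = re.search(r"[,.]", paragraph_start)
    if match is None:
        return paragraph_start.strip()
    return paragraph_start[:match.start()].strip()

# Capitalized paragraph beginnings of the article LONG in Acad1, as quoted in the text.
beginnings = [
    "LONG, LONGUE.",
    "SE FORLONGER.",
    "LOIN.",
    "AU LOIN.",
    "LOIN A LOIN, DE LOIN A LOIN.",
    "LOINTAIN, AINE.",
    "ELOIGNER.",
]

# Step 1: automatic tagging of every capitalized paragraph beginning.
candidates = [first_unit(b) for b in beginnings]

# Step 2: manual, interpretative decision, represented here by a hand-made set.
SUB_ADDRESSES = {"AU LOIN", "LOIN A LOIN"}
headwords = [c for c in candidates if c not in SUB_ADDRESSES]
```

The first step is purely formal. The second cannot be derived from typography or position, which is why it is entered by hand.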
As mentioned above, lower-case roman is used for several information fields: part of speech is usually given immediately after the headword («DOUBLE. adj. de tout genre.»), occasionally elsewhere («Il est aussi subst.»); usage and semantic marks tend to be non-initial in the paragraphs of the discursive texts of the earlier editions («On dit fig. et fam. [...] une cervelle, une teste bien timbrée, mal timbrée» Acad2-5 s.v. TIMBRER), initial in the later ones («Fig. et fam., Une cervelle, une tête timbrée» Acad6-8). Italics are used systematically for examples, irregularly for collocates and synonyms: «GAGNER, se joint quelquefois avec la préposition Sur» (Acad2; Acad1 «sur») vs. «SANS DOUTE, [...] se joint quelquefois avec Que» (id.; Acad1 «que»); «DOUBLON. [...] On dit aussi, Pistole» (Acad6; Acad5 «[...] que nous appelons Pistole») vs. «Ne... que peut, dans certains cas, être considéré comme entièrement synonyme de l'adverbe Seulement» (id. s.v. QUE = Acad7-8).
Bold typeface can be used on its own to retrieve headwords and co-headwords in Acad6-8 (325 sequences in the sample database = 100% of the (co-)headwords). Large roman capitals (387) are used for headwords (Acad1-5) and co-headwords (Acad2-5) in 374 cases (96.64%) and for cross-references in 13 (Acad1, 3.36%). Small roman capitals (790) are highly polysemous: they are used regularly for co-headwords in Acad1 (5 occurrences = 0.63%), sub-headwords and sub-addresses in Acad1-8 (726 = 91.90%), and cross-references in Acad4-8 (55 = 6.96%); their status of marked typeface explains four idiosyncratic, or irregular, occurrences: a synonym («On dit aussi DUPLICATA» in Acad8 s.v. DOUBLE), a collocate («Il se joint quelquefois avec la préposition SUR» in Acad8 s.v. GAGNER) and an example element («âme rachetée par le sang de JÉSUS-CHRIST» in Acad6-7 s.v. ÂME; cf. Acad5 «[...] JÉSUS-CHRIST», Acad2-4 «[...] Jésus-Christ»).
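The breakdown of the 790 small-capital sequences can be checked with a few lines of arithmetic. The counts are those given in the paragraph above; the dictionary keys and variable names are hypothetical labels chosen for this sketch.

```python
# Counts of small roman capital sequences by function, as reported in the text.
small_caps = {
    "co-headwords": 5,
    "sub-headwords and sub-addresses": 726,
    "cross-references": 55,
    "irregular occurrences": 4,
}

total = sum(small_caps.values())

# Share of each function, as a percentage rounded to two decimals.
shares = {label: round(100 * count / total, 2) for label, count in small_caps.items()}
```

The four categories add up to the stated total of 790, and the four irregular occurrences account for the remaining 0.51%.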
According to the general logic of the dictionary, definitions (metalanguage) are printed in roman, and collocates, synonyms and antonyms (language) in italics. When, as is often the case for adjectives and adverbs, the definition is a single word rather than a periphrasis, the distinction between definition and synonym is unclear, with the result that the use of different typefaces can become muddled: «Il signifie aussi, Espais, et est opposé à delié, delicat» (Acad1-7 s.v. GROS), instead of «Il signifie aussi, Espais, et est opposé à delié, delicat» (cf. «DOUBLE [...] Il est opposé à simple» Acad1).
It becomes necessary then to turn to metalinguistic keywords, such as SIGNIFIE, SE JOINT AVEC, ON DIT AUSSI, OPPOSÉ À, etc., for help with retrieving definitions, collocates, synonyms and antonyms.
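A keyword-based lookup of this kind might be sketched as follows. The mapping of each keyword to an information field follows the order of the sentence above, but the regular expressions, the name `FIELD_MARKERS` and the function `fields_in` are assumptions made for illustration. The test contexts are quotations already cited in this paper.

```python
import re

# Hypothetical patterns, one per information field, built from the keywords
# SIGNIFIE, SE JOINT AVEC, ON DIT AUSSI and OPPOSÉ À.
FIELD_MARKERS = {
    "definition": r"\bsignifie\b",
    "collocate": r"\bse joint\b.*?\bavec\b",
    "synonym": r"\bon dit aussi\b",
    "antonym": r"\boppos[ée]e?s? à",
}

def fields_in(context):
    """List the information fields whose marker occurs in the given context."""
    return [
        field
        for field, pattern in FIELD_MARKERS.items()
        if re.search(pattern, context, re.IGNORECASE)
    ]

gros = fields_in("Il signifie aussi, Espais, et est opposé à delié, delicat")
gagner = fields_in("GAGNER, se joint quelquefois avec la préposition Sur")
doublon = fields_in("On dit aussi, Pistole")
```

Because the markers are located in the running text rather than imposed on it as tags, a context such as the GROS example can be retrieved under two fields at once without any decision about where the definition ends and the antonym begins.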