5. The metalinguistic keyword base.

I have talked so far of dictionary, text and bibliographical bases, image files and notes. All of these components are well-known and are used in many applications. There remains to be mentioned one component particular to the genre of the dictionary: the metalinguistic keyword base, a tool applied to Nicot's Thresor from the moment of its computerization more than ten years ago.

In early dictionaries the degree of variation in the presentation of microstructural informational fields is often sufficient to make any attempt at systematic tagging difficult and hazardous. Whereas it is possible, to a certain extent, to rewrite an OED or a TLF in order to computerize them, one does not have the right to do so for a dictionary of the past, unless one is to compose a new Johnson or a new Littré. The appropriate conceptual tool for the querying of informational fields in these cases is that of fuzzy searching, which has proved itself in the natural sciences, notably physics, and in technological applications. In brief, «fuzzy» (note 2) here means: rather than spending an enormous amount of effort in order to obtain 100% of what one wants and no more, one does better and obtains practically the same results, with much less effort, by making do with a range of 95% to 105% of the theoretical total, the 5% of noise being easy to discard subsequently. Applied to the metalinguistic keyword base (note 3), fuzzy searching has the great merit of offering the author of an early dictionary database a solution other than that of posing as the holder of an absolute truth through the imposition of exhaustive and systematic tagging. Dictionaries such as those of Estienne, Nicot, Ménage and the Académie contain sufficient structural fuzziness to justify an appropriate approach.
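By way of illustration only, the 95% to 105% band can be written as a simple check on the number of hits a query returns against the theoretical total of true occurrences. The function name, its parameters and the sample figures below are hypothetical; they are not taken from the database described here.

```python
def within_fuzzy_band(retrieved: int, theoretical_total: int,
                      low_pct: int = 95, high_pct: int = 105) -> bool:
    """Return True if the number of retrieved hits lies between low_pct
    and high_pct percent of the theoretical total (bounds inclusive).

    Integer arithmetic is used so that the bounds are compared exactly.
    """
    if theoretical_total <= 0:
        raise ValueError("theoretical_total must be positive")
    scaled = 100 * retrieved
    return low_pct * theoretical_total <= scaled <= high_pct * theoretical_total


# Invented figures: a query returning 103 hits for 100 true occurrences
# stays inside the band; one returning 120 hits does not.
acceptable = within_fuzzy_band(103, 100)
too_noisy = within_fuzzy_band(120, 100)
```

A result inside the band is treated as good enough, and the small surplus of noise is discarded by hand afterwards.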

The occurrences of metalinguistic keywords in the Early dictionary base are linked to items in the Metalinguistic keyword base; in turn, the latter allow one to extract all their tokens in the dictionary base. Thus the keyword apothicaires gives access to all of the occurrences of the field label concerning herbalist terms. Whereas the keyword apothicaires is precise -- there are exactly 106 occurrences in the corpus comprising the Dictionarium, the Thresor, the Grand dictionaire and the Acadé sample (note 4) --, others are imprecise in the sense that they are high-frequency polysemous words whose metalinguistic use comprises at least 85% of their occurrences, which represents a little more noise to eliminate. I shall mention two cases, among others, characteristic of both Latin lexicography and French lexicography, the latter deriving from the former: the keyword aussi, in Latin etiam, articulator of multiple properties -- meanings, usages, spellings, etc. -- and dit, which in the great majority of its occurrences appears in the expressions on dit, se dit or dicitur, articulators of signs, signifieds and usage restrictions.
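The linkage between the two bases can be sketched as an inverted index from keywords to their occurrences, followed by a pass that sets aside the noise for an imprecise keyword such as dit. This is a minimal sketch under stated assumptions, not the actual implementation: the names Occurrence, build_keyword_base and split_noise are hypothetical, and the sample contexts are invented rather than quoted from any of the dictionaries.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    dictionary: str  # source dictionary, e.g. "Thresor"
    entry: str       # headword of the entry containing the keyword
    context: str     # stretch of text in which the keyword occurs


def build_keyword_base(occurrences, keywords):
    """Map each keyword to the list of occurrences whose context contains it."""
    base = defaultdict(list)
    for occ in occurrences:
        tokens = set(re.findall(r"\w+", occ.context.lower()))
        for kw in keywords:
            if kw in tokens:
                base[kw].append(occ)
    return dict(base)


def split_noise(occurrences, pattern):
    """Separate metalinguistic uses (pattern matches) from noise (no match)."""
    kept, noise = [], []
    for occ in occurrences:
        target = kept if re.search(pattern, occ.context.lower()) else noise
        target.append(occ)
    return kept, noise


# Invented sample contexts, for illustration only.
sample = [
    Occurrence("Thresor", "A", "terme dont usent les apothicaires"),
    Occurrence("Thresor", "B", "on dit aussi en ce sens"),
    Occurrence("Thresor", "C", "il a dit la verité"),
]
keyword_base = build_keyword_base(sample, {"apothicaires", "aussi", "dit"})

# Assumed pattern for the metalinguistic use of dit: on dit, se dit, dicitur.
DIT_PATTERN = r"\b(on dit|se dit|dicitur)\b"
dit_kept, dit_noise = split_noise(keyword_base["dit"], DIT_PATTERN)
```

In this sketch a precise keyword such as apothicaires needs no second pass, whereas the hits for dit are divided into metalinguistic uses and a residue to be discarded.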

Note 2. Cf. Wooldridge, «Le flou en informatique textuelle», Texte, 13/14 (1993), 275-89.

Note 3. Cf. T.R. Wooldridge & I. Leroy-Turcan, «Metalinguistic Keywords as a Structural Retrieval Tool for Early Dictionaries».

Note 4. An occurrence of a referent label is considered to be an occurrence of a field label of the corresponding word when the word is given in the context.