Lemon-tree is an RDF vocabulary for capturing the content of lexicographical resources known as thesauri. These resources organize words and their senses according to a hierarchy that conveys their meaning. Lemon-tree specifies how thesauri can be captured in RDF, reusing existing standards: SKOS for capturing hierarchies and LEMON for lexical and lexicographic material. To that end, Lemon-tree offers guidance on how these two models fit together and adds terminology for perceived lacunae.
A thesaurus is a special kind of dictionary. Dictionaries commonly employ an alphabetical ordering of their words and phrases. Thesauri, by contrast, organize their items according to their meaning. They do this by means of a topical structure: a tree of concepts, if you will. This overarching structure offers generic meanings to users as a starting point, which branch out to increasingly specific meanings. Once users locate the meaning they are interested in, they are presented with the words or phrases that express that meaning. The topical system in a thesaurus thus allows the user to move from meaning to lexical item. For further detail on this type of lexicographical work, and its distinction from other common senses of the word thesaurus, we refer the reader to Hartmann-2006 and Kay-2016.
Lemon-tree is an RDF vocabulary that bridges SKOS and LEMON in order to capture the content of thesauri. The SKOS vocabulary already allows for sharing concepts in RDF and organizing them in hierarchies. LEMON allows for sharing lexical entries, senses, and further lexicographic material. In this document, we will mainly employ its core module: Ontolex. Terminology from both SKOS and LEMON, then, is valuable for sharing thesauri on the Web. Lemon-tree therefore aims to facilitate their combined use for that purpose. To that end, Lemon-tree offers guidance on how these two models fit together and adds some terminology for perceived lacunae (e.g., for categorizing senses and for expressing levels in the hierarchy).
The lemon-tree model employs the following namespace: https://w3id.org/lemon-tree#
The lemon-tree model can be retrieved in two ways:
Lemon-tree does not aim to supplant existing and well-known standards that may be used for sharing thesaurus content on the Web. Instead, the purpose of this model and its documentation is to provide a guideline and to offer additional terminology where appropriate. The need for such terminology has been identified in earlier publications, e.g. [Stolk-2017]. For further advice and recommendations, we refer the reader to the documentation on the existing standards that Lemon-tree adopts for thesauri: SKOS and the SKOS primer, LEMON and its Lexicography Module.
This document contains images depicting the content of existing thesauri. A legend to these images is provided below.
Examples of Lemon-tree content in this document are provided using the Turtle RDF Syntax. The following namespaces are used in these examples:
@prefix tree: <https://w3id.org/lemon-tree#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix xkos: <http://rdf-vocabulary.ddialliance.org/xkos#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
The examples in this document are based on existing thesauri. Each of these works has its own content and purpose.
This selection was made to illustrate various characteristics of thesauri in this document. The thesauri are presented here in the order in which they serve as examples in the following sections.
There are many ways to present or visualize a thesaurus. Even so, the information conveyed by each of them tends to consist of the same main components:
The figure below displays these main components for a sample of the Historical Thesaurus of the Oxford English Dictionary. The senses of four nouns are shown to be categorized under "Freedom/liberty" (those marked with a cross are no longer in use). As these four senses convey the same meaning, they are thought to be loosely synonymous.
In a thesaurus, then, a word or phrase in a specific sense is located (or categorized) within a topical system, may be part of a set of synonyms, and is typically accompanied by additional information such as its part of speech and usage features. We discuss these main components in the following sections.
The topical system of a thesaurus is its overarching structure used to organize lexical items. This structure is not unlike the taxonomies of animals and plants created by the eighteenth-century biologist Carl Linnaeus (1707-1778) and later expanded by Georges Cuvier (1769-1832) [Faria-2013]. In these tree-like structures, the most generic or abstract concepts are used as roots, which branch out to concepts increasingly specific in meaning.
Topical systems in thesauri go by different names. Some thesauri refer to their overarching system as the classification system [LSM]. Others position their topical structure as consisting of categories rather than classes, suggesting that their lexis has been categorized rather than classified [HTOED, ScT]. For topical systems used to organize lexis, this document prefers the terms category and categorization over class and classification. For further information on the distinction between the two, see Jacob-2004.
Lemon-tree captures the topical system using terminology from SKOS. This standard from W3C was designed specifically for knowledge organization systems, including topical systems. Thus, the topical system as a whole would be captured as follows for the Historical Thesaurus of the Oxford English Dictionary.
<htoed> a skos:ConceptScheme ;
    skos:prefLabel "Historical Thesaurus of the Oxford English Dictionary"@en .
Its category "Freedom/liberty" would be captured as shown below. This example includes the relation to the parent category "Lack of subjection". Note that categories are captured as SKOS Concepts.
<freedom-liberty> a skos:Concept ;
    skos:prefLabel "Freedom/liberty"@en ;
    skos:inScheme <htoed> ;
    skos:broader <lack-of-subjection> .
As we will see further on in the document, it is possible to use a specialized variant of SKOS Concept when categorizing senses. This topic will be treated in the section Categorization and lexicalization.
In a topical system, much like in any tree data structure, it is possible to distinguish multiple levels. Each level is found at a specific depth. For thesauri, however, there tend to be two forms of levels. Their topical system, after all, is meant to capture meaning and can therefore be subdivided into both levels of the tree structure and levels of meaning. The former are known in Lemon-tree as tree levels; the latter as conceptual levels. The next two sections will discuss each in more detail.
A topical system of a thesaurus consists of categories that have been placed in a hierarchy. This hierarchical structure can be described using words for data structures known as trees. Each category in the hierarchy is a node in the tree, the nodes at the very top of the tree are called roots, and relations between nodes are known as edges.
Each node is positioned at a certain depth of the tree. Roots, part of the first tree level, are at depth 0; nodes positioned directly below a root are at depth 1; nodes directly below these are at depth 2, and so on. The figure below displays such tree levels for the topical system of Roget's Thesaurus, perhaps the most well-known thesaurus in existence. Categories displayed on the same dotted line are part of the same tree level.
Tree levels can, of course, be calculated from the position of each node in the tree structure. Even so, some communities find it worthwhile to capture this information explicitly, too. Indeed, terminology to represent tree levels can already be found in XKOS, a vocabulary that extends SKOS. In XKOS, each tree level is seen as a collection of categories, positioned at a specific tree depth. Lemon-tree therefore proposes to reuse this terminology in cases where capturing tree levels of a specific thesaurus is desired. (The terms TreeLevel, treeDepth, and treeLevels that were coined in an earlier version of Lemon-tree are as a consequence deprecated.)
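The calculation of tree levels from node positions, mentioned above, can be illustrated with a minimal sketch. It is not part of Lemon-tree or XKOS: plain Python dictionaries stand in for `skos:broader` links, the parent identifiers (`volition`, `individual-volition`, `intersocial-volition`) are assumptions made for illustration, and each category is assumed to have at most one broader category.

```python
from collections import deque

# Hypothetical sample of skos:broader links (child -> parent), loosely
# modelled on the Roget's Thesaurus examples; the parent identifiers are
# assumptions made for illustration only.
BROADER = {
    "individual-volition": "volition",
    "intersocial-volition": "volition",
    "volition-in-general": "individual-volition",
    "antagonism": "individual-volition",
    "possessive-relations": "intersocial-volition",
}


def tree_depths(broader):
    """Return {category: depth}, with roots at depth 0 (Lemon-tree convention)."""
    nodes = set(broader) | set(broader.values())
    children = {}
    for child, parent in broader.items():
        children.setdefault(parent, []).append(child)
    # Roots are the categories that have no broader category.
    roots = sorted(node for node in nodes if node not in broader)
    depths = {root: 0 for root in roots}
    queue = deque(roots)
    # Breadth-first walk: each child sits one level below its parent.
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            depths[child] = depths[node] + 1
            queue.append(child)
    return depths


depths = tree_depths(BROADER)
print(depths["volition"])    # 0
print(depths["antagonism"])  # 2
```

Because the depths are derived rather than stored, they stay consistent when the hierarchy changes; recording them explicitly, as discussed above, trades that guarantee for convenience.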
TreeLevel: Deprecated. Please use the term ClassificationLevel from the XKOS vocabulary instead.
treeDepth: Deprecated. Please use the term depth from the XKOS vocabulary instead. Please be aware that XKOS measures depth slightly differently from Lemon-tree (i.e., roots are considered to be at depth 1 rather than 0).
Domain: TreeLevel ∪ skos:Concept
treeLevels: Deprecated. Please use the term levels from the XKOS vocabulary instead.
<treelevel2> a xkos:ClassificationLevel ;
    xkos:depth 3 ;
    skos:member <volition-in-general> ;
    skos:member <antagonism> ;
    skos:member <possessive-relations> .
<rogets> a skos:ConceptScheme ;
    skos:prefLabel "Roget's Thesaurus"@en ;
    xkos:levels ( <treelevel0> <treelevel1> <treelevel2> ) .
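The difference between the two depth conventions (roots at depth 0 in Lemon-tree, at depth 1 in XKOS) is easy to get wrong when levels are generated automatically. The sketch below is one way to group categories into levels and attach the XKOS depth. The input depths and the non-leaf identifiers are hypothetical, and the dictionary layout is an assumption for illustration rather than anything XKOS prescribes.

```python
# Hypothetical 0-based tree depths (Lemon-tree convention, roots at depth 0).
TREE_DEPTHS = {
    "volition": 0,
    "individual-volition": 1,
    "intersocial-volition": 1,
    "volition-in-general": 2,
    "antagonism": 2,
    "possessive-relations": 2,
}


def xkos_levels(tree_depths):
    """Group categories per level and attach the 1-based XKOS depth."""
    members = {}
    for category, depth in tree_depths.items():
        members.setdefault(depth, []).append(category)
    # XKOS counts roots as depth 1, so every 0-based depth is shifted by one.
    return [
        {"xkos_depth": depth + 1, "members": sorted(members[depth])}
        for depth in sorted(members)
    ]


levels = xkos_levels(TREE_DEPTHS)
print(levels[2])
# {'xkos_depth': 3, 'members': ['antagonism', 'possessive-relations', 'volition-in-general']}
```

The third entry corresponds to `<treelevel2>` in the Turtle example: third in the list, hence `xkos:depth 3`.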
This section will introduce the notion of conceptual levels by means of an example: Roget's Thesaurus. This thesaurus provides an outline of its topical system, which includes clear distinctions. Categories in Roget's are not simply known as categories but go by the name of class, division, or section. Indeed, the topical system starts out with six of these classes, which may branch out into more specific divisions and ultimately into sections. A sample of its contents that includes these names is shown in the figure below.
It is plain to see that each of the three types of category in Roget's acts as a level of sorts. Classes convey the highest level of abstraction; sections convey the lowest. Intuitively, categories of a higher level of abstraction branch out only to categories of a lower level of abstraction. As a consequence, we do not find sections in Roget's Thesaurus branching out into classes or divisions.
The levels mentioned do not necessarily map one-to-one onto tree levels. In the figure above, for example, both divisions and sections may be part of the 2nd tree level (at tree depth 1). Other thesauri, too, use similar notions to distinguish such levels, which we will henceforth call conceptual levels [HTOED, LSM]. In the Historical Thesaurus of the Oxford English Dictionary, the first conceptual level consists of sections, followed by categories and lastly subcategories. Here, unlike in Roget's Thesaurus, a single category can branch out to categories from both the same conceptual level and one level beyond. A case in point is "Freedom/liberty". This is one of the so-called categories and branches out to a number of other categories (including "Independence" and "Liberation") but also to subcategories (including "Civil liberty" and "Moral freedom").
Lemon-tree offers terminology to express conceptual levels, too. Although these levels are different from tree levels, the patterns in which we capture both are quite similar. This will be evident in the definitions below and the examples that follow them.
Domain: ConceptualLevel ∪ skos:Concept
<sections> a tree:ConceptualLevel ;
    skos:prefLabel "Sections"@en ;
    tree:conceptualDepth 2 ;
    skos:member <existence> ;
    skos:member <quantity> ;
    skos:member <volition-in-general> ;
    skos:member <antagonism> ;
    skos:member <possessive-relations> .
<rogets> a skos:ConceptScheme ;
    skos:prefLabel "Roget's Thesaurus"@en ;
    tree:conceptualLevels ( <classes> <divisions> <sections> ) .
<categories> a tree:ConceptualLevel ;
    skos:prefLabel "Categories"@en ;
    tree:conceptualDepth 1 ;
    skos:member <freedom-liberty> ;
    skos:member <lack-of-subjection> ;
    skos:member <permission> ;
    skos:member <authority> ;
    skos:member <communication> ;
    skos:member <society> ;
    skos:member <independence> ;
    skos:member <liberation> .
<htoed> a skos:ConceptScheme ;
    skos:prefLabel "Historical Thesaurus of the Oxford English Dictionary"@en ;
    tree:conceptualLevels ( <sections> <categories> <subcategories> ) .
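The observation that categories never branch out to a more abstract conceptual level can be turned into a simple consistency check. The sketch below is an assumption about how one might validate data, not a constraint defined by Lemon-tree. It uses the Historical Thesaurus categories named in this section; the identifiers `civil-liberty` and `moral-freedom` are hypothetical, and the `allow_same_level` switch is introduced here to contrast the Historical Thesaurus (same level permitted) with Roget's Thesaurus (strictly lower levels only).

```python
# Conceptual depth per category (0 = sections, 1 = categories,
# 2 = subcategories), following the Historical Thesaurus example.
CONCEPTUAL_DEPTH = {
    "lack-of-subjection": 1,
    "freedom-liberty": 1,
    "independence": 1,
    "liberation": 1,
    "civil-liberty": 2,   # hypothetical identifier for "Civil liberty"
    "moral-freedom": 2,   # hypothetical identifier for "Moral freedom"
}

# skos:broader links as (child, parent) pairs.
BROADER = [
    ("freedom-liberty", "lack-of-subjection"),
    ("independence", "freedom-liberty"),
    ("liberation", "freedom-liberty"),
    ("civil-liberty", "freedom-liberty"),
    ("moral-freedom", "freedom-liberty"),
]


def level_violations(broader, depth, allow_same_level=True):
    """Return the (child, parent) links whose child sits at a level that is not permitted."""
    violations = []
    for child, parent in broader:
        step = depth[child] - depth[parent]
        # A negative step means the child is more abstract than its parent.
        if step < 0 or (step == 0 and not allow_same_level):
            violations.append((child, parent))
    return violations


print(level_violations(BROADER, CONCEPTUAL_DEPTH))  # []
print(len(level_violations(BROADER, CONCEPTUAL_DEPTH, allow_same_level=False)))  # 3
```

Under the stricter Roget's-style rule, the three same-level links are reported, which illustrates why conceptual levels cannot be derived from tree depth alone.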
A thesaurus contains lexical items that have been categorized, allowing users to go from meaning to words or phrases that express that meaning. Lemon-tree captures such words and senses using LEMON Ontolex terminology. A word or phrase is captured as an Ontolex LexicalEntry and each of its senses is captured as a LexicalSense. An example is presented below.
<entry-freedom> a ontolex:LexicalEntry ;
    skos:prefLabel "freedom"@en ;
    ontolex:canonicalForm [
        a ontolex:Form ;
        ontolex:writtenRep "freedom"@en ;
    ] .
<sense-freedom-3> a ontolex:LexicalSense ;
    skos:prefLabel "freedom"@en ;
    ontolex:isSenseOf <entry-freedom> .
For further details on the notion of LexicalEntry and LexicalSense, we refer the reader to the LEMON documentation. Advice on how to best capture other aspects of lexical items, such as their part of speech and other labels, can be found there too.
Thesauri do not categorize lexical items or word-forms but lexical senses: words or phrases in a particular sense. This statement may at first glance appear counter-intuitive to users of thesauri. After all, a number of these resources simply present head-forms of a word (or phrase) as members of their categories. In the Shakespeare Thesaurus, for instance, category “01.02 sky” contains the following item:
heaven, n.
The head-form “heaven” in this example is similar in appearance to a headword, or lemma, found in typical dictionaries. This gives the impression that thesauri categorize lexical items. The following fictitious dictionary entry, however, demonstrates otherwise.
heaven, n.
    1) abode of one or more gods
    2) the sky
It is evident that the “heaven, n.” entry in the Shakespeare Thesaurus, found in the category “01.02 sky”, represents the lexical item heaven not in all of the senses listed above but only in the second. Werner Hüllen, who has thoroughly researched the topical tradition of thesauri, acknowledges that the entries in thesauri indeed represent senses rather than lexical items [Hüllen-1999]:
Strictly speaking, topical dictionaries [i.e., thesauri] have no headwords but head-forms as linguistic dummies for their meanings [i.e., senses]. Admittedly, a highly developed linguistic awareness is needed to keep this difference in mind when using a topical dictionary. Hence the humorous criticism that in order to work with such dictionaries you must be so highly educated that you do not need to consult a dictionary at all.
Confirmation that thesauri categorize lexical senses can be found in the online edition of the Historical Thesaurus of the Oxford English Dictionary. This edition takes advantage of both the topical structure of the thesaurus and the full dictionary entries of the Oxford English Dictionary. This rich set-up allows for a closer investigation of the relation between a thesaurus and entries in a dictionary. Dictionary entries in the Oxford English Dictionary have a number of senses. Each sense listed contains a reference to a thesaurus category. Conversely, the thesaurus categories in this edition list the senses they contain and provide hyperlinks not simply to dictionary entries but to specific senses within these entries. As such, it is evident that this thesaurus indeed categorizes senses of lexical entries, and not lexical entries as a whole. In the next section, we will provide more detail on categorization and how to capture it using Lemon-tree.
There is something special going on in our main example thesaurus, the Historical Thesaurus of the Oxford English Dictionary. In this thesaurus, we see that words in a particular sense directly express their concept. They lexicalize that concept. In the figure below, "freedom" and "liberty" can directly be used if one wants to express "Freedom/liberty". This used to be the case for "freeship" and "franchise", too, in the history of the English language. (As this is no longer the case, these word senses are marked with a cross in front of them.)
Such lexicalization is not present in every thesaurus, however. In fact, it is absent from thesauri more often than not. The sample below has been taken from the Scots Thesaurus and illustrates this lack of lexicalization. Here, the sense `to disperse scantily` of "blander" can hardly be said to directly express "Sowing". This is likewise the case for the sense `a basket or container` of "happer". These senses may have a relation to the concept of "Sowing" but they do not lexicalize that concept. Instead, their meaning causes them to be listed under the concept: they are senses in that concept, as it were.
Senses that lexicalize a concept are by definition senses also found in that concept. In other words, lexicalization is a special form of categorization. This distinction can be captured using Lemon-tree.
Examples of categorization and lexicalization in Lemon-tree follow the details on the relevant terminology given here. For categorizing lexical senses, Lemon-tree offers the property isSenseInConcept. For asserting that senses are lexicalizations of a concept, Ontolex offers the property isLexicalizedSenseOf.
The relation between isSenseInConcept and terminology from Ontolex has been added to the Lemon-tree model. As a result, the Ontolex property isLexicalizedSenseOf is deemed a subproperty of isSenseInConcept. Moreover, the property evokes has an additional property chain of Ontolex sense followed by isSenseInConcept. These relations are shown below.
PropertyChain: ontolex:sense o isSenseInConcept
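The effect of these two relations can be sketched as a pair of inference rules. The code below is a toy illustration rather than an OWL reasoner: triples are plain Python tuples with prefixed names as strings, `ontolex:sense` triples are assumed to be stated directly (instead of being derived from `ontolex:isSenseOf`), and `entry-happer` is a hypothetical identifier.

```python
SENSE = "ontolex:sense"
LEXICALIZED = "ontolex:isLexicalizedSenseOf"
IN_CONCEPT = "tree:isSenseInConcept"
EVOKES = "ontolex:evokes"

TRIPLES = {
    ("entry-freedom", SENSE, "sense-freedom-3"),
    ("sense-freedom-3", LEXICALIZED, "freedom-liberty"),
    ("entry-happer", SENSE, "sense-happer-basket"),  # hypothetical entry identifier
    ("sense-happer-basket", IN_CONCEPT, "sowing"),
}


def infer(triples):
    """Return the triples plus those implied by the two relations above."""
    inferred = set(triples)
    # Rule 1: isLexicalizedSenseOf is a subproperty of isSenseInConcept.
    inferred |= {(s, IN_CONCEPT, o) for s, p, o in triples if p == LEXICALIZED}
    # Rule 2: ontolex:sense followed by isSenseInConcept implies evokes.
    entry_senses = [(s, o) for s, p, o in inferred if p == SENSE]
    concepts_of = {}
    for s, p, o in inferred:
        if p == IN_CONCEPT:
            concepts_of.setdefault(s, set()).add(o)
    for entry, sense in entry_senses:
        for concept in concepts_of.get(sense, ()):
            inferred.add((entry, EVOKES, concept))
    return inferred


result = infer(TRIPLES)
print(sorted(t for t in result if t[1] == EVOKES))
# [('entry-freedom', 'ontolex:evokes', 'freedom-liberty'), ('entry-happer', 'ontolex:evokes', 'sowing')]
```

Note that the inference runs in one direction only: a sense that is merely categorized in a concept is never promoted to a lexicalization of it.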
The examples below show how categorization and lexicalization can be captured by employing the properties mentioned above: isSenseInConcept from Lemon-tree and isLexicalizedSenseOf from Ontolex. Notice that the property to express lexicalization is used in the example of the Historical Thesaurus of the Oxford English Dictionary. There, the use of this property automatically indicates that the category is not only a SKOS Concept, but a concept that is expressed or lexicalized. Such a concept is called a LexicalConcept according to LEMON Ontolex.
<sense-happer-basket> a ontolex:LexicalSense ;
    skos:prefLabel "happer"@sco ;
    tree:isSenseInConcept <sowing> .
<sowing> a skos:Concept .
<sense-freedom-3> a ontolex:LexicalSense ;
    skos:prefLabel "freedom"@en ;
    ontolex:isLexicalizedSenseOf <freedom-liberty> .
<freedom-liberty> a ontolex:LexicalConcept .
It should be noted that, whenever definitions are available for senses, it is possible to make these definitions part of the topical system. After all, the topical system allows a user to go from meaning to items that express that meaning. A sense definition is just such a meaningful item. The snippet below shows the result of this practice when applied to The Scots Thesaurus. Here, an additional concept is added to the topical system. This concept represents the sense definition of "happer" and is lexicalized by this sense.
<sense-happer-basket> a ontolex:LexicalSense ;
    skos:prefLabel "happer"@sco ;
    ontolex:isLexicalizedSenseOf <a-basket-or-container> .
<a-basket-or-container> a ontolex:LexicalConcept ;
    skos:prefLabel "a basket or container"@en ;
    skos:broader <sowing> .
<sowing> a skos:Concept .
Be aware that, when following this approach, one should ensure that multiple senses with the same definition lexicalize the same concept. Existing thesauri may not contain information for this additional level of categorization. (In fact, they often do not. Otherwise such thesauri could have already grouped their synonyms at concepts they lexicalize.) If sense definitions are to be part of the topical system in such situations, then, it may require additional effort to add the knowledge required.
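One way to make sure that senses sharing a definition end up lexicalizing the same concept is to derive the concept identifier from the definition itself. The sketch below does this with a crude text normalisation; it is an assumption for illustration, not a procedure recommended by Lemon-tree. The identifiers `sense-example-2` and `sense-blander-1` are hypothetical, and real definitions that differ in wording but not in meaning would still need editorial judgement.

```python
import re

# (sense, definition, parent category). The second sense is a hypothetical
# addition used to show two senses that share one definition.
SENSES = [
    ("sense-happer-basket", "a basket or container", "sowing"),
    ("sense-example-2", "A basket or  container", "sowing"),
    ("sense-blander-1", "to disperse scantily", "sowing"),
]


def slug(definition):
    """Normalise a definition into an identifier such as 'a-basket-or-container'."""
    return re.sub(r"[^a-z0-9]+", "-", definition.lower()).strip("-")


def definition_concepts(senses):
    """Map each sense to the concept for its definition, and each concept to its parent."""
    lexicalizes = {}  # sense -> concept it lexicalizes
    broader = {}      # definition concept -> parent category
    for sense, definition, parent in senses:
        concept = slug(definition)
        lexicalizes[sense] = concept
        broader[concept] = parent
    return lexicalizes, broader


lexicalizes, broader = definition_concepts(SENSES)
print(lexicalizes["sense-example-2"])  # a-basket-or-container
```

Both basket senses now point at one concept, so they would be recognised as lexicalizing the same meaning.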
Categories in a topical system group lexical senses into sets with a similar or related meaning. In some thesauri, though certainly not all, sets exist that indicate an even stronger semantic tie: one of synonymy. A case in point is the Historical Thesaurus of the Oxford English Dictionary, in which senses placed in the same category are deemed loosely synonymous. That is to say, grouped senses in this thesaurus have a similarity in meaning and are interchangeable in specific contexts. The introduction to the thesaurus Love, Sex, and Marriage discusses synonymy found in thesauri as follows:
Grouping terms together in a thesaurus, even in a thesaurus as detailed as this, does not imply absolute synonymy. Many scholars doubt whether absolute interchangeability is actually possible [...].
Instead of absolute synonymy, then, it is common to find a looser form of synonymy in thesauri. (In fact, it has been called “the staple” of thesauri [Murphy-2016].) This form is referred to as near-synonymy [Murphy-2016].
In Lemon-tree, near-synonymy is evident for lexical senses that lexicalize the same concept. After all, such senses directly express the same meaning. Thus, all the senses that lexicalize category "Freedom/liberty" of the Historical Thesaurus of the Oxford English Dictionary are known to be near-synonyms.
<sense-freedom-3> a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf <freedom-liberty> .
<sense-freeship-2> a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf <freedom-liberty> .
<sense-franchise-1a> a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf <freedom-liberty> .
<sense-liberty1-1b> a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf <freedom-liberty> .
It is possible to link synonyms together also via a direct relation between LexicalSenses, or to form groups of synonyms known as synsets. For information and advice on both these aspects, we refer the reader to the LEMON specification.
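The reading of near-synonymy given in this section, in which senses lexicalizing the same concept are near-synonyms, can be sketched as a grouping step. The code below is a minimal illustration using plain Python tuples in place of RDF triples; the function name and data layout are assumptions, not part of Lemon-tree or LEMON.

```python
from collections import defaultdict

LEXICALIZED = "ontolex:isLexicalizedSenseOf"
IN_CONCEPT = "tree:isSenseInConcept"

TRIPLES = [
    ("sense-freedom-3", LEXICALIZED, "freedom-liberty"),
    ("sense-freeship-2", LEXICALIZED, "freedom-liberty"),
    ("sense-franchise-1a", LEXICALIZED, "freedom-liberty"),
    ("sense-liberty1-1b", LEXICALIZED, "freedom-liberty"),
    ("sense-happer-basket", IN_CONCEPT, "sowing"),
]


def near_synonym_sets(triples):
    """Group senses by the concept they lexicalize; keep groups of two or more."""
    groups = defaultdict(set)
    for sense, prop, concept in triples:
        # Only lexicalization counts: plain categorization does not imply synonymy.
        if prop == LEXICALIZED:
            groups[concept].add(sense)
    return {concept: senses for concept, senses in groups.items() if len(senses) > 1}


synsets = near_synonym_sets(TRIPLES)
print(sorted(synsets["freedom-liberty"]))
# ['sense-franchise-1a', 'sense-freedom-3', 'sense-freeship-2', 'sense-liberty1-1b']
```

The sense of "happer" is left out of the result because it is only categorized under "Sowing" and does not lexicalize it.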