The thesaurus is based on the guidelines of the ANSI/NISO Z39.19-2005 standard (hereafter referred to as Z39).
Suitability: The Daily Grind Thesaurus (DGT) meets the needs of our anticipated users, both indexers and end-users. It facilitates the ordering process at our coffee stores by providing the controlled vocabulary used for indexing records of coffee samples and blends. Customers use the DGT descriptors to retrieve terms for the coffee properties they wish to order. Coffee-tasting terminology tends to be ambiguous, so a specific vocabulary is needed to capture the many possible nuances of coffee characteristics. Since the end-users are a fairly heterogeneous group of consumers, lay terms were chosen over technical vocabulary. Even though some of the far-flung customers who enjoy gourmet coffee may be connoisseurs well acquainted with subtle concepts, we selected consumer-oriented controlled vocabulary sources instead of the Specialty Coffee Association of America's (SCAA) colour wheel for professional tasters. The result was an indexing language accessible to a diverse group:
(1) Indexers:
(2) Users:
(3) Corporate:
Since we want the DGT to help our patrons find the words to describe the "perfect cup of coffee," our thesaurus is relatively flat; we have many entry terms, and few preferred terms:
By guiding people to dark roast as our preferred term, we facilitate the ordering process: we help people find a term that puts them closer to the coffee they want, while removing doubt about whether the other entry terms are related. In the above example, we identified these terms as synonyms (section 5.3.2) for dark roast. As one may have determined from the example of dark roast, we have developed a faceted analysis of coffee, meaning that we created a thesaurus that "takes a bottom-up approach, forming areas of knowledge after first having pieced together their parts and determining the areas of knowledge they form . . ." (section 5.3.4). Given that there are so many ways to describe coffee, a faceted analysis will help our clients settle on the term that most likely matches the description of coffee they have in mind.
Type of indexing language: The DGT consists of controlled vocabulary only, but the database can be searched by keyword. We encountered many natural language terms that were ambiguous (section 5.3.1), but we incorporated them into our controlled vocabulary:
As the "Annotated Bibliography" demonstrates, we adhered to the Z39 standard, specifically its guidance to avoid duplicating existing vocabularies (section 11.1.1). After consulting a variety of sources on coffee, including books, journal articles, web sites, associations, and trade publications, we were unable to find a thesaurus equal to the DGT that was freely accessible for the purposes of this assignment.
Post-coordinate retrieval: The DGT terms represent single concepts, allowing for post-coordinate retrieval. However, where possible, the form of each term has been selected to reduce complexity or the need for post-coordinate retrieval. For example, the thesaurus uses noun phrases rather than single adjectives to eliminate potential false drops (section 6.4.1):
Forms of terms: One indicator of the importance of an item in the lives of human beings is the number of words used to describe it. Acknowledging the need for the thesaurus to grow organically, and realizing the speed at which new words come into use (e.g. "podcasting" was named "word of the year" after less than two years in use), we have a "Suggest a Term" box on our web site that allows users to suggest a descriptor. Slang and colloquialisms describing coffee spread quickly but take time to work their way into books and articles (i.e. sources of literary warrant), so this box is one way for our users to help us develop a better tool.
Warrant: Most importantly, we were mindful of how we chose our terms. We drew on sources that exemplify user warrant (e.g. web sites), literary warrant (books and articles on coffee), and organizational warrant (e.g. associations and trade organizations). Where two terms seemed to be used equally often in our sources, we used a Google search to break the tie and selected the more popular term, which we feel is in line with the Z39 standard (section 6.6.1.1). Although the DGT was designed for the average consumer, and user warrant was important, we know that our clients use many terms for describing coffee, and some of our clients may know a lot about coffee. Thus, we relied heavily on literary warrant to collect as many terms as possible (section 5.3.5.1). Relying on literary warrant also introduces some stability to our thesaurus, since it takes time for new terms to appear in glossaries and indexes, and a definition is often provided next to terms in a book, article, or web site. Focus groups or interviews may be the next step to see whether we have missed any relevant terms, but we wanted to keep our costs down.
Nouns and noun phrases: Aware that the grammatical form should be a noun or noun phrase (section 6.4.1), we opted for premodified noun phrases:
By using noun phrases we increase the probability that a novice user knows what the term means. While the standard allows the use of single adjectives (section 6.4.2) where the meaning of the noun is obvious from the context, we chose noun phrases because they are more user-friendly. Including the noun in the term provides term-level precision, since delicate taste is more exact than delicate alone. It would have been acceptable to exclude the noun if users would only use the adjective to search for one concept, but many adjectives, such as winey, refer to more than one aspect of coffee: taste and aroma. Noun phrases also reduce complexity for the user. To limit retrieval to a particular use of winey (taste or aroma), the user would have to do post-coordinate retrieval, which requires more effort and increases the chance of false drops. Using noun phrases helps avoid potential false drops (section 7.3.d).
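The false-drop problem described above can be illustrated with a minimal sketch. It assumes two hypothetical coffee sample records (the sample names and the descriptor "delicate aroma" are invented for illustration, and the function names are not part of the DGT). The same records are indexed once with noun phrases and once with single words, and a Boolean AND search on the single words retrieves a record that does not actually have a winey taste.

```python
# Hypothetical sample records; names and descriptor assignments are illustrative only.
PHRASE_INDEX = {
    "sample A": {"winey taste", "delicate aroma"},
    "sample B": {"delicate taste", "winey aroma"},
}

# The same records indexed with single words instead of noun phrases.
WORD_INDEX = {
    name: {word for phrase in phrases for word in phrase.split()}
    for name, phrases in PHRASE_INDEX.items()
}


def search_phrase(descriptor):
    """Return records indexed with the exact noun-phrase descriptor."""
    return sorted(n for n, terms in PHRASE_INDEX.items() if descriptor in terms)


def search_words(*words):
    """Post-coordinate search: every word must be present (Boolean AND)."""
    wanted = set(words)
    return sorted(n for n, terms in WORD_INDEX.items() if wanted <= terms)


if __name__ == "__main__":
    print(search_phrase("winey taste"))     # noun-phrase descriptor
    print(search_words("winey", "taste"))   # single words combined at search time
```

In this sketch the noun-phrase search returns only sample A, while the post-coordinate search also returns sample B, whose "winey" belongs to its aroma and whose "taste" is delicate. Sample B is the false drop.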
There were cases where we used only nouns:
Count Nouns & Mass Nouns: We debated how to treat count nouns, mass nouns and abstract concepts (section 6.5.3.1). We considered "roasts" to be either a count noun or an abstract concept. By treating it as an abstract concept, which we liken to properties, we followed the standard by keeping the term singular:
Capitalization: Most of the terms in the DGT are lowercase, in conformance with the Z39 standard (section 6.7.1):
Proper names and trade names use capital letters:
Hyphens: We removed hyphens where permissible (section 6.7.2.2), but we kept them where the OED noted that the hyphenated form was correct:
Neutrality: Adhering to the Z39 standard, we checked our terms for neutrality:
Loanwords: We also included some loanwords (section 6.6.6.1) in our thesaurus:
Equivalence: Preferred term was indicated by USE and UF (Used For) (section 8.2.1). For synonyms and near-synonyms:
UF: city roast
UF: full city roast
UF: regular roast
. . . (etc.)
Lexical variants: Based on the OED, we selected the shorter term as the preferred term, with other variants as entry terms. We considered using the number of Google results to decide which variant spelling would be our preferred term, but this would have produced an inconsistent mix of preferred terms, some ending in "ey" and others in "y." The numbers in brackets indicate the results of a Google search with the term:
Hierarchical: Used mainly to indicate the generic nature of terms (section 8.3.1)
Associative: Where terms had a conceptual relationship, we used RT for "related term." In the case of acidity, we had a relationship within the same hierarchy, and a relationship outside of it:
RT: tart taste (i.e. secondary taste characteristics) (section 8.4.2)
RT: winey taste (i.e. primary taste characteristics) (section 8.4.1)
Display:
The indexer's display will provide more details than the end-user's search interface display. For example, it is intended that DGT indexers will indicate the history of the terms, but this will not be shown on the user's display.
Precision/Recall:
Scope Notes: To assist the novice consumer, we used scope notes to define terms that have other meanings outside of our thesaurus (section 6.2.2). Given the abundance of terms whose meaning is not self-evident to the average consumer, the use of scope notes ensures that the thesaurus is user-friendly:
SN: Without off-flavour (Glossary of Coffee Terminology)
We also used reciprocal scope notes when a term in our definition referred to another term in our thesaurus (section 6.2.2.1):
SN: Acidy flavour is sharp and pleasing to the taste as opposed to sour or fermented. It denotes a taste that has sharpness and life compared to a sweet, heavy and mellow flavour (Glossary of Coffee Terminology).
SN: Taste sensation created as acids in the coffee combine with salts to increase the overall saltiness (Glossary of Coffee Terminology).
For the purposes of our assignment, we put the source of each definition in parentheses. Many definitions were quoted verbatim.
Qualifiers: Where the homograph was a noun, we used qualifiers (section 6.2.1). For example:
Ambiguity: We addressed adjectives that refer to more than one concept, such as light, by incorporating the noun to eliminate ambiguity, e.g. light roast. If we had opted to use adjectives as entry terms (which we did not), one option would have been to use qualifiers such as light (roast). This would increase false drops, because a search using only the adjective retrieves all concepts qualified by that adjective (e.g. light feel, light acidity). Thus, the use of noun phrases minimizes ambiguity in our thesaurus (sections 7.5.a and 7.5.b).
Specificity and exhaustivity: As this module focuses on taste characteristics, the terminology for this domain is very specific, with descriptors that represent concepts at a highly granular level. For example, the property of acidity is described by the narrower terms low acidity, fine acidity and high acidity.
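The hierarchical and associative relationships described for acidity (its three narrower terms, plus the related terms tart taste and winey taste) can be sketched as a small data structure. This is only an illustration: the `Thesaurus` class and its method names are hypothetical and not how the DGT is actually stored. It shows one way to keep BT/NT and RT links reciprocal, so that recording a relationship in one direction also records it in the other.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical in-memory thesaurus with reciprocal BT/NT and RT links."""

    def __init__(self):
        self.broader = {}                  # term -> its broader term (BT)
        self.narrower = defaultdict(set)   # term -> its narrower terms (NT)
        self.related = defaultdict(set)    # term -> its related terms (RT)

    def add_narrower(self, broad, narrow):
        # Recording NT on the broad term also records BT on the narrow term.
        self.narrower[broad].add(narrow)
        self.broader[narrow] = broad

    def add_related(self, a, b):
        # RT is symmetric, so both directions are stored.
        self.related[a].add(b)
        self.related[b].add(a)


# Terms taken from the acidity examples in the text.
dgt = Thesaurus()
for nt in ("low acidity", "fine acidity", "high acidity"):
    dgt.add_narrower("acidity", nt)
for rt in ("tart taste", "winey taste"):
    dgt.add_related("acidity", rt)

if __name__ == "__main__":
    print(sorted(dgt.narrower["acidity"]))
    print(dgt.broader["fine acidity"])
    print(sorted(dgt.related["tart taste"]))
```

Looking up fine acidity returns acidity as its broader term, and looking up tart taste returns acidity as a related term, even though each relationship was entered only once.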
The thesaurus is not exhaustive because it excludes most "negative terms" (i.e. undesirable characteristics of coffee), since we want to encourage our users to think of the coffee that they want, rather than the coffee that they do not want. Terms representing properties that could be either positive or negative, depending on an individual's preference, were included:
© 2006 Jeremiah Saunders, Elisheba Muturi, Duncan Dixon