Behind the scenes in the movie industry is where much of the most important action takes place. Makeup artists, wardrobe experts, voice experts, choreographers, and more work together, so that in the final production, the actors can shine.

Behind the scenes of our thesaurus is the true workhorse, the indexing language. How did we create the controlled vocabulary that comprises our thesaurus? What rules did we follow? What relationships did we discover?

Join us for a tour behind the scenes. We might even share a few of our outtakes!

METHOD

We used a combination of deductive and inductive methods in the development of the corset thesaurus, as per 8.3.3 in the ANSI/NISO Guidelines for the Construction, Format, and Management of Monolingual Thesauri (herein called the Guidelines). The terms were gathered from a wide selection of historical costume books, general encyclopedias, articles, and contemporary websites. Terms were vetted, incorporated into an overall hierarchical structure, and thesaural relationships and features were added. A thesaurus expert (Susie) reviewed our work.

The corset thesaurus is not exhaustive. It reflects only that which pertains to the current movie production, and will expand as further modules are developed in harmony with the greater plans of MMM.

SUITABILITY

We have tailor-made this thesaurus for the users and conditions delineated in our contract with Modern Movie Megacorp (MMM). In the end, our thesaurus will be integrated into the MMM Thesaurus.

First of all, the corset thesaurus must reflect the fact that it will be just one small component of a much larger merchandise thesaurus. This means that careful consideration must be given to whether or not homonyms require qualifiers in the context of what other items are likely to be sold. In most cases this meant including the qualifier. Fortunately, the result is a much more adaptable module.

We also took into account the fact that our employer has plenty of money to invest in a top-notch thesaurus. As such, they can afford to have professional indexers deal with the long-term upkeep of the thesaurus, as well as index the ready-to-wear corsets and pre-packaged corset styles. While this will of course be more expensive for the company, it will be more efficient in the long run, as less direct indexing instruction need be applied. It also supports a much higher level of thesaurus complexity.

The ability to support a large and complex thesaurus is a great boon, since that is precisely what is demanded by the varied nature of our user groups. In addition to the indexers mentioned above, we also have seamstresses, who normally use the terminology native to their profession, and will require a great amount of detail in the indexing in order to understand precisely what the characteristics are of the corset they are about to make. We also have layperson web designers and telephone representatives taking calls from corset consumers, who may have little or no familiarity with corset terminology, and so will require extensive scope notes to explain these matters to the clients. We also have two classes of consumers themselves - those who have seen Dressed to Kill: The Merry Widow Project and are therefore familiar with Victorian terms, and those consumers who have not seen the movie but would like to buy a corset anyway, and who will likely know only modern corset-related terminology. In addition, consumers are unlikely to know what options are available and may request an option or search for a term that does not exist, and so these must also be anticipated and included in the thesaurus. The color 'pink', for example, is not an available (historically correct) corset color. Any request for 'pink' will redirect the searcher to the 'basic colors' hierarchy, which is already included in the MMM Thesaurus (as it would be a related term in the 'corset colors' hierarchy), where they may select from the colors that are available.

Clearly, a thesaurus is necessary here to translate each user group's terminology into all the others' terminology, such that the indexer may index a corset that means the same thing to the telephone representative talking to the consumer who may or may not have seen the movie as it does to the seamstress who will make said corset in the end!

All this means that for each concept, possible user terms must be gathered for each of these user groups and incorporated into the final thesaurus. This will result in a more expensive, more complex, larger thesaurus. Efficiency, however, is also a concern. Just because our employers have money, there is no call to make the thesaurus any larger or more complex than absolutely necessary to meet the needs of our users. To that end we threw out any and all terms that did not fulfill at least one of the above needs.

Our users will also interface with the thesaurus in electronic format only, which will make it much quicker to use. This could mean an even larger list of terms, however. Since users will be unable to browse for the next closest alternative to a requested term, we would have to provide every possible form of every term (e.g. corset and corsets). One possible solution to this problem would be to give a reminder to always use plural forms, but unfortunately, between the different types of fabric and the body parts measurement module there would also be plenty of singular forms to make this unsatisfactory. We have instead solved this problem by specifying to MMM that they must provide truncation-capable search software, along with an ever-present note to searchers saying what truncation is and how best to employ it.
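
To make the truncation idea concrete, here is a minimal sketch of how a truncated search might behave. It is only an illustration: the trailing '*' operator, the function name, and the sample term list are our own assumptions, not a description of whatever software MMM eventually provides.

```python
# Hypothetical sample of thesaurus terms, for illustration only.
TERMS = ["corsets", "corsetry", "corset components", "busks", "sateen", "steel"]

def truncation_search(query, terms):
    """Return matching terms; a trailing '*' (assumed operator) means 'starts with'."""
    q = query.strip().lower()
    if q.endswith("*"):
        stem = q[:-1]
        # Truncated search: every term that begins with the stem matches.
        return [t for t in terms if t.lower().startswith(stem)]
    # Without truncation, only an exact (case-insensitive) match counts.
    return [t for t in terms if t.lower() == q]

print(truncation_search("corset", TERMS))   # exact search finds nothing here
print(truncation_search("corset*", TERMS))  # truncated search finds the plural and more
```

With truncation, a searcher who types the singular stem still reaches the plural descriptor, so the thesaurus need not list every grammatical form as a separate entry term.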

Fortunately, the one thing all our user groups have in common is some involvement with corsets, and in that respect, our thesaurus is quite suitable indeed.

TYPE OF INDEXING LANGUAGE

Free language indexing would be just that, free, and TOB&B would be unlikely to make much money in that business! That leaves us with natural and controlled languages. Natural language is derived from the item itself, and only applies to verbal or textual documents. Corsets are not textual documents. There are no words written on them, except perhaps a 'care and cleaning' tag … which we have no reason to index whatsoever. MMM will not be monogramming the corsets, nor will they be using fabric that has written words in the pattern. The corset module, then, does not reflect natural language indexing. It may in the future, if any of these exceptions becomes relevant. Also, the larger thesaurus in which the corset module resides may itself incorporate natural language, though this depends entirely on just what MMM decides to sell. The corset module itself, however, is a controlled language.

PRE- VS. POST-COORDINATION

"What is this object? Why, it's a corset!"

Because the objects in question need to be indexed to some depth for the indexing to be of use, each will require the application of more than one term. This brings up the question of how those multiple terms will be applied - as one great long string, or individually. That is, will the indexing be pre- or post-coordinate? While this seems like a simple question, it is quite a bit more complex than it seems.

MMM only carries a few actual pre-set corsets for sale. It would be quite easy, if somewhat lengthy, to string together a pre-coordinate description for each one and be done ("Why, it's a scarlet-sateen-divorce-corset-with-steel-diagonal-boning-and-a-spoon-busk!"). However, in order for a user's search to be successful it must match a string the indexer created, but since MMM also makes custom corsets, there are an almost infinite number of items to index that must be indexable before they ever exist. This means that every possible combination of the terms in the thesaurus must be strung together in the absence of any actual request for the (not yet existing) item, resulting in a ridiculous amount of conceptual redundancy in the resulting catalogue.

Rules for pre-coordination are intended to improve relevance in retrieved items by making it clear what a combination of terms means. Fortunately, an item with the following descriptors - scarlet, sateen, divorce corset, steel, diagonal boning, spoon busk - is unlikely to be misconstrued as anything else! (Especially when laid out in a web-based order form with labels indicating what each option applies to.)

In the end, pre-coordination is more time-consuming to do, and there seems not to be any compelling reason to do so. Post-coordination is easier to do, more flexible and more efficient, especially when adding new modules such as an evening dress module. Therefore, we post-coordinate. And since the indexing is post-coordinate, retrieval must be post-coordinate as well.
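
The trade-off can be sketched in a few lines of code. Everything below is hypothetical: the facet names, the option lists (several of which are invented placeholders) and the item numbers are ours, chosen only to show why pre-built strings multiply while post-coordinate terms merely add up.

```python
from math import prod

# Hypothetical facets; some options are invented placeholders, not thesaurus terms.
FACETS = {
    "color": ["scarlet", "black", "white"],
    "fabric": ["sateen", "coutil"],
    "type": ["divorce corset", "S-curve corset"],
    "boning": ["diagonal boning", "vertical boning"],
    "busk": ["spoon busk", "straight busk"],
}

def precoordinated_string_count(facets):
    """Strings needed if every full combination had to be built in advance."""
    return prod(len(options) for options in facets.values())

def postcoordinated_term_count(facets):
    """Terms needed if descriptors are applied individually."""
    return sum(len(options) for options in facets.values())

# Post-coordinate index: each descriptor points at the set of item ids that carry it.
INDEX = {
    "scarlet": {1, 2},
    "sateen": {1, 3},
    "divorce corset": {1},
    "spoon busk": {1, 2, 3},
}

def search(index, *descriptors):
    """Combine descriptors at search time by intersecting their item sets."""
    sets = [index.get(d, set()) for d in descriptors]
    return set.intersection(*sets) if sets else set()

print(precoordinated_string_count(FACETS))  # 3 * 2 * 2 * 2 * 2 = 48
print(postcoordinated_term_count(FACETS))   # 3 + 2 + 2 + 2 + 2 = 11
print(search(INDEX, "scarlet", "sateen"))   # {1}
```

Even with this toy set of five facets, the pre-coordinate count grows by multiplication every time an option is added, which is the redundancy described above.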

FORM OF TERMS

Single vs. Multiword Descriptors

The corset thesaurus includes both single and multiword descriptors. According to section 3.1 of the Guidelines, each descriptor selected for inclusion in the thesaurus must represent only one concept, though more than one word may be necessary. The restriction to a single indexable concept is absolutely necessary to the concept of a controlled language, and as such we have followed the Guideline here. While we also endeavored to keep multiword descriptors to a minimum, several proved to be necessary. Our treatment of those terms is discussed below under the heading 'Compound Descriptors'.

Scope of the Descriptors

We limited the scope of the descriptors to those meanings within the domain of the thesaurus (keeping good company with section 3.2 of the Guidelines). The potential domain of the thesaurus as a whole (not just our module) however, is great. As a result, several orphan terms entered the thesaurus as alternative potential meanings of the homographs we had to modify with parenthesized qualifiers. This practice is also recommended in the Guidelines in section 3.2.1. For example, 'amber' gained the qualifier '(color)', as it was anticipated that 'amber (fossilized resin)' would be in the larger MMM thesaurus, as they may also market amber jewellery.
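
For anyone integrating the module into a larger system, a parenthesized qualifier is easy to separate from its term mechanically. The following is a small sketch under our own assumptions (the function name and the regular expression are ours, and it expects exactly one qualifier at the end of the descriptor):

```python
import re

# Matches "term (qualifier)" with a single trailing parenthesized qualifier.
QUALIFIED = re.compile(r"^(?P<term>.+?)\s*\((?P<qualifier>[^()]+)\)$")

def split_qualifier(descriptor):
    """Return (term, qualifier); qualifier is None when the descriptor has none."""
    text = descriptor.strip()
    match = QUALIFIED.match(text)
    if match:
        return match.group("term"), match.group("qualifier")
    return text, None

print(split_qualifier("amber (color)"))             # ('amber', 'color')
print(split_qualifier("amber (fossilized resin)"))  # ('amber', 'fossilized resin')
print(split_qualifier("steel"))                     # ('steel', None)
```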

We also included many scope notes, in particular where we felt a descriptor was a sufficiently uncommon term that one or more of our user groups would not know its meaning. After all, most indexers are unlikely to also be corseters! As section 3.2.2 of the Guidelines suggests that scope notes can be used to give advice on term usage, this appears to be an acceptable practice.

Types of Concept

Section 3.3 of the Guidelines discusses various types of concepts that descriptors may represent, several of which are included in the corset thesaurus. For example there are 'things and their physical parts' ('corsets' and 'corset components'), materials ('steel') and disciplines ('corsetry'). There are also unique entities ('Nicolette Kidman'), as discussed in section 3.3.1. Normally these would be expressed as proper nouns, with the first letter capitalized and the rest lower case. As discussed below in the 'Capitalization' section, however, in the corset thesaurus they are represented either in all capitals or all lower case.

Grammatical Forms of Descriptors

As represented in section 3.4 of the Guidelines, we have endeavoured to format as many descriptors as possible as nouns. Alternative formats, where applicable, were included as entry terms.

Singular vs. Plural

The 'count vs. mass nouns' issue seems like such a simple proposition when boiled down to the basic rule: how much or how many? Section 3.5 indicates we should decide which applies, pluralize the count nouns and leave the mass nouns singular (and then immediately proceeds to list exceptions).

Sometimes, however, the answer is both! Take the word 'fish'. I know there are no fish in our corset thesaurus, but 'bear' with me … when one is talking about different types of fish, one says 'fishes', as in 'how many fishes do you carry?' When one is talking only about one type, one says 'fish', as in 'how much fish would you like?' Suddenly our rule is of no help! Well, it appears the same holds true for 'fabric' and 'color'. And with the guidelines offering so many exceptions … and so many different user groups to consider … in the end we simply took a vote within TOB&B to go with the plural form, and included the alternate forms as entry terms.

Preferred vs. Non-preferred Descriptors

Such varied users mean that the very concept of choosing preferred terms based on user warrant, as recommended in section 3.6.1 of the Guidelines, is impossible. It is unlikely that all these groups will agree on any, never mind most, terms. Fortunately, MMM's marketing strategy solves this problem. Their goal of course is to sell as many corsets and movie tickets as possible, and part of their strategy is to inspire a Victorian craze. What better way to enhance consumers' 'Victorian experience' than using authentic Victorian terminology?

We used American spellings since MMM is an American-based company employing largely Americans and selling those movie tickets and corsets to a largely American consumer base. We did, however, include the alternate British and Canadian spellings as entry terms in accordance with section 3.6.2.

Though the thesaurus uses American spelling, we have also included several preferred terms of non-English origin. These include 'corsets spécialité' and 'corsets callisthenic', and were included to further enhance the above-mentioned 'Victorian experience'. While section 3.6.7.1 suggests we use words from other languages when they are commonly accepted, it says nothing about what terms you may include if you are intending to manipulate what is commonly accepted.

MMM intends to make available a few pre-set corset styles - those that are worn by the stars in the movie. As such, the characteristic of the star association is an important element to include in the indexing of those items, as consumers will likely want to search by the stars' names. That being said, it was necessary to include the stars' names in the thesaurus. Fortunately, only a select few of the stars in the movie were actually wearing corsets (er, I mean, the rest of the stars were men), so the number to include was limited. However, in accordance with section 3.6.8 of the Guidelines, we did see fit to include variant forms of the stars' names as entry terms. This included both their real names and the names of the roles they played in the movie.

Capitalization

Section 3.7 in the Guidelines essentially recommends that one use lower case throughout the thesaurus, except for the first letter of proper names. You may have noticed that this is simply not the case for the corset thesaurus. The problem with this suggestion is that it does not play nicely with the thesaurus management and creation software we used, MultiTes.

Or, more precisely, the display conventions in section 6.3.3 require us to distinguish between preferred and non-preferred terms. While the Guidelines suggest we do this via bolding or italics, MultiTes does not support these. Our choices, then, were to format everything later in a word processing environment, or find another way to distinguish preferred from non-preferred terms. The problems with formatting later are that, first, it was going to cost MMM even more (they do have a budget), and second, as new indexers and thesaurus creators, we needed to be able to distinguish the terms right from the beginning to understand what we were doing.

MultiTes does however support both capitalization and lowercase, and so we decided to use that to distinguish the non-preferred from the preferred terms. While this works fine from 6.3.3's perspective, it doesn't work at all from 3.7's perspective. We did however display the non-preferred proper noun with proper orthography.

In the end however, MMM is concerned with selling those corsets and movie tickets, and sadly, indicating proper orthography to its various user groups is not a major concern.

To appease the NISO Gods via section 3.7.2.2 we did however remove as many hyphens as possible. This left only two - the first was the 'S-curve corset', because we felt it would be confusing without the hyphen unless we added quotation marks, which of course would have thrown the sorting out of whack. The second was a star's name, which was a proper noun.

The other difficulty we encountered was due to the fact that MultiTes does not support italics, either. This meant that instead of distinguishing nodes or facets from descriptors using italics, we could rely only on the application of angle brackets. However, since some of our nodes required alternate forms as entry terms ('colours' for 'colors'), the preferred versions will remain lowercase to further distinguish them from descriptors.

Because MultiTes does not support diacritical marks (sections 3.7.2.4 and 3.6.7.1), we chose not to insert them into our thesaurus. As well, the typical user would not insert them in searching. Our search software supports our terms with or without diacritical marks.
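
One common way for search software to treat a term the same with or without diacritical marks is to fold the marks away before comparing. The sketch below shows that general technique using Python's standard unicodedata module; it is an assumption about how such matching could work, not a description of MMM's actual search software, and the function names are ours.

```python
import unicodedata

def fold_diacritics(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def same_term(a, b):
    """Compare two terms while ignoring diacritics and letter case."""
    return fold_diacritics(a).casefold() == fold_diacritics(b).casefold()

print(fold_diacritics("corsets spécialité"))                  # corsets specialite
print(same_term("corsets spécialité", "CORSETS SPECIALITE"))  # True
```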

Compound Descriptors

As mentioned above under the heading 'Single vs. Multiword Descriptors', Section 4.1 of the Guidelines permits multiword or compound descriptors so long as each represents a single indexable concept. We retained compound descriptors when splitting the two words apart would change their meaning, for example 'diagonal boning'. We also kept compound descriptors that represented a 'type' as opposed to a 'part', for example 'divorce corset'.

RELATIONSHIP STRUCTURES

The corset thesaurus features the syndetic structure of the three relationships found in section 5 of the Guidelines. Reciprocity is a feature of all three relationships. It may be asymmetric, as in the case of equivalence and hierarchical relationships, or symmetric, as is the case with associative relationships.

EQUIVALENCE

Where there is more than one term in the thesaurus that expresses the same concept, a preferred term, or descriptor, had to be determined. We chose our descriptors following literary warrant where possible, and kept in mind user warrant when choosing lexical variants and synonyms. This is in accordance with section 5.2.

Synonyms
The corset thesaurus features a number of types of equivalence relationships, as discussed in section 5.2.2 of the Guidelines, including some synonyms based on current or favoured terms, as well as common or slang nouns. For example:

5.2.2.e: current vs outdated terms
stays
USE CORSETS

CORSETS
UF stays

5.2.2.f: common nouns and slang or jargon
merry widows
USE CORSETS

CORSETS
UF merry widows

Section 5.2.3 discusses lexical variants. The term 'corsets' is found in many spellings in the thesaurus, as through history the spelling has varied. As the thesaurus focuses on nineteenth century corsets, we included some of the historical spellings, but for user warrant chose the most current spelling to aid in recall for searching. We offer lexical variants such as 'coursettes' through a reciprocal USE/UF relationship.

coursettes
USE CORSETS

CORSETS
UF coursettes
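
The reciprocal USE/UF pairs above lend themselves to a simple two-way lookup. The following is a minimal sketch, assuming a small in-memory table; the class and method names are hypothetical and have nothing to do with how MultiTes stores its records.

```python
from collections import defaultdict

class EquivalenceTable:
    """Keeps USE and UF references in step so neither can exist without the other."""

    def __init__(self):
        self.use = {}                      # entry term -> preferred term (USE)
        self.used_for = defaultdict(set)   # preferred term -> entry terms (UF)

    def add(self, entry_term, preferred):
        # Recording one direction always records the reciprocal as well.
        self.use[entry_term] = preferred
        self.used_for[preferred].add(entry_term)

    def resolve(self, term):
        """Follow a USE reference; a preferred term resolves to itself."""
        return self.use.get(term, term)

table = EquivalenceTable()
for entry in ("stays", "merry widows", "coursettes"):
    table.add(entry, "CORSETS")

print(table.resolve("stays"))              # CORSETS
print(sorted(table.used_for["CORSETS"]))   # ['coursettes', 'merry widows', 'stays']
```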

HIERARCHICAL

The hierarchical relationship is discussed in section 5.3 of the Guidelines. This relationship is the one that distinguishes a thesaurus from a glossary, or list of words. There are two levels in this relationship - superordinates or broader terms (BTs) and subordinates or narrower terms (NTs). Of course in a corset thesaurus Narrower Terms can have more than one meaning! The hierarchy may extend to many levels.

As discussed in sections 5.3.1 & 5.3.2, both generic relationships and whole-part relationships are present in the corset thesaurus. In fact, the term 'corsets' includes both relationships in its narrower terms, with 'corset types' and 'corset components'. However, 'corsets' are not one of the listed instances where a whole-part relationship is recommended, and so we have used thesaural licence here. Section 5.3.2 does however explicitly state that "the four types enumerated below are not intended to be exhaustive", so it is unlikely that we will be dragged away by the thesaurus police anytime soon.

As discussed in section 5.3.5, we incorporated node labels to bring together sibling terms, such as <corset types>. We enclosed these in angle brackets, and they will not be used as descriptors in indexing.

ASSOCIATIVE RELATIONSHIPS

This symmetrical relationship, covered in section 5.4, links terms that are not equivalents, nor hierarchical in nature, yet are conceptually or semantically linked in such a manner that the relationship should be noted.

For example:
5.4.2 a: Discipline and objects studied
CORSETRY
RT: CORSETS
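
The difference between asymmetric and symmetric reciprocity can be illustrated with a short sketch. The class below is hypothetical and uses only terms mentioned in this document; it simply shows that a hierarchical link is stored as BT in one direction and NT in the other, while an associative link is stored as RT in both.

```python
from collections import defaultdict

class Relationships:
    """Stores reciprocal BT/NT and RT references between descriptors."""

    def __init__(self):
        self.bt = defaultdict(set)  # term -> its broader terms
        self.nt = defaultdict(set)  # term -> its narrower terms
        self.rt = defaultdict(set)  # term -> its related terms

    def add_hierarchical(self, broader, narrower):
        # Asymmetric reciprocity: NT in one direction, BT in the other.
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_associative(self, a, b):
        # Symmetric reciprocity: the same RT reference in both directions.
        self.rt[a].add(b)
        self.rt[b].add(a)

rels = Relationships()
rels.add_hierarchical("CORSETS", "CORSET COMPONENTS")
rels.add_associative("CORSETRY", "CORSETS")

print(rels.bt["CORSET COMPONENTS"])  # {'CORSETS'}
print(rels.rt["CORSETS"])            # {'CORSETRY'}
```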

PRECISION AND RECALL

The corset thesaurus uses a number of features to aid in precision and recall.

We used a controlled vocabulary rather than natural language or free language to increase precision and recall. Our hierarchical relationships extend to as many as 5 levels, which aids precision.

We have included many scope notes to clarify terminology. Because we had such a heterogeneous user group we felt it was important to clarify any terms that could be unfamiliar to our users. This way indexing terms are more likely to be applied correctly, increasing precision.
For example:

BUSKS
SN: Piece of wood, whalebone, ivory, horn, or steel slotted into the front of the corset to hold the torso erect.

Homographs within the context of the thesaurus are differentiated by qualifiers to aid in precision and to aid in the integration of the thesaurus into the MMM thesaurus.

For example:
horn (animal)

For the purchasers of corsets, precision and recall are controlled by the organization in the order form. Drop-down menus for each component and variable will specify choices. There is an inverse relationship here. The higher the precision, or number of components specified, the lower the number of hits or recall. Conversely, purchasers who explicitly choose only a few components will have a higher recall or number of hits (corset choices), but lower precision.
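
The effect of the drop-down menus can be sketched as a simple filter. The catalogue, the field names, and some of the option values below are invented for illustration; the point is only that each additional choice can shrink the list of matching corsets and never enlarges it.

```python
# Hypothetical catalogue of pre-set corsets; values are illustrative only.
CATALOGUE = [
    {"color": "scarlet", "fabric": "sateen", "busk": "spoon busk"},
    {"color": "scarlet", "fabric": "sateen", "busk": "straight busk"},
    {"color": "black", "fabric": "sateen", "busk": "spoon busk"},
    {"color": "black", "fabric": "coutil", "busk": "straight busk"},
]

def matching_items(catalogue, **choices):
    """Return the items that agree with every drop-down choice made so far."""
    return [
        item for item in catalogue
        if all(item.get(field) == value for field, value in choices.items())
    ]

print(len(matching_items(CATALOGUE)))                                    # 4
print(len(matching_items(CATALOGUE, fabric="sateen")))                   # 3
print(len(matching_items(CATALOGUE, fabric="sateen", color="scarlet")))  # 2
print(len(matching_items(CATALOGUE, fabric="sateen", color="scarlet",
                         busk="spoon busk")))                            # 1
```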

A link to the thesaurus will be available on the order page for users to read scope notes. Access to these scope notes should help increase precision from the searcher's perspective.

SPECIFICITY AND EXHAUSTIVITY

These two areas relate to the detail and depth of vocabulary for the domain of our thesaurus. How precisely can the user describe a concept or item? How many terms are available for a specific concept?

The corset thesaurus covers a very narrow domain, and within that domain, specificity and exhaustivity at this time vary. For example, both bodice materials and colors are offered in many varieties and historical colors, as we felt these very visible details would be of primary importance to buyers. This held true for busk materials as well, as this is a special feature that we felt users would want control over. But lacings and lining are not covered in any depth, as users will have no choice for these less visible components.

Color was a thorny issue, and settling on its level of specificity and exhaustivity caused us some difficulty. Since this module is predicted to be the first of many modules for merchandise, all major colors in the color wheel should be represented. However, the basic colors are already included in MMM's Thesaurus, so only historically accurate corset colors were included in our thesaurus.

 
