Table 1: Sample thesaurus - hierarchical sequence

knitwear
> cardigans
> pullovers
outerwear
> blouses
> cardigans
> coats
> > raincoats
> dresses
> jackets
> > anoraks
> > blazers
> > dinner jackets
> > donkey jackets
> > reefer jackets
> leggings
> pullovers
> rainwear
> > raincoats
> shawls
> shirts
> skirts
> suits
> trousers
> > jeans
> > shorts
> > slacks

Table 2: Sample thesaurus - alphabetical sequence

anoraks
    BT  jackets

blazers
    BT  jackets

blouses
    UF  smocks
    BT  outerwear

breeches
    USE trousers

capes
    USE coats

cardigans
    SN  knitted jackets with front opening
    BT  knitwear
        outerwear

cloaks
    USE coats

coats
    UF  capes
        cloaks
        overcoats
    BT  outerwear
    NT  raincoats

dinner jackets
    BT  jackets

donkey jackets
    BT  jackets

dresses
    UF  frocks
    BT  outerwear

duffel jackets
    USE reefer jackets

frocks
    USE dresses

jackets
    BT  outerwear
    NT  anoraks
        blazers
        dinner jackets
        donkey jackets
        reefer jackets

jeans
    BT  trousers

jumpers
    USE pullovers

knitwear
    NT  cardigans
        pullovers

leggings
    BT  outerwear

outerwear
    NT  blouses
        cardigans
        coats
        dresses
        jackets
        leggings
        pullovers
        rainwear
        shawls
        shirts
        skirts
        suits
        trousers

overcoats
    USE coats

pullovers
    UF  jumpers
        sweaters
    BT  knitwear
        outerwear

raincoats
    BT  coats
        rainwear

rainwear
    BT  outerwear
    NT  raincoats

reefer jackets
    UF  duffel jackets
    BT  jackets

shawls
    UF  wraps (clothing)
    BT  outerwear

shirts
    BT  outerwear

shorts
    BT  trousers

skirts
    BT  outerwear

slacks
    BT  trousers

smocks
    USE blouses

suits
    BT  outerwear

sweaters
    USE pullovers

trousers
    UF  breeches
    BT  outerwear
    NT  jeans
        shorts
        slacks

wraps (clothing)
    USE shawls
|
Many thesauri were created for indexing documentary material, and so they include many terms for abstract concepts, disciplines and areas of discussion, as well as the names of the concrete objects which are of primary interest to museums. We must be careful to use these two kinds of term consistently. The most straightforward way is to concentrate first on what objects actually are: spades are Spades and should be given this term, rather than a term for the activity in which they are used, whether it is gardening or gravedigging.
You may well wish to allocate abstract and discipline terms to objects too, so that you can retrieve all the objects to do with Dentistry, Laundry, Warfare or Food preparation. These terms can also be included in the thesaurus, so long as they are not given hierarchical relationships to names of objects. Instead, they should be given RT relationships to object terms at an appropriate level.
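One way to keep these two kinds of term apart in software is to let a search expand only along BT/NT links. RT links are then offered to the searcher as suggestions, but they never pull objects into a discipline's hierarchy. The following is a minimal Python sketch under that assumption. The terms hand tools, trowels and gardening, and the function names, are hypothetical illustrations and are not taken from any published thesaurus.

```python
# Hypothetical thesaurus fragment: hierarchical (BT) links join object
# terms only; the discipline term "gardening" is attached by RT links.
BT = {
    "spades": ["hand tools"],
    "trowels": ["hand tools"],
}
RT = {
    "gardening": ["hand tools"],
    "hand tools": ["gardening"],
}

def narrower_closure(term):
    """Return the term plus every term below it, following BT links only."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for narrow, broads in BT.items():
            if narrow not in found and found.intersection(broads):
                found.add(narrow)
                changed = True
    return found

def related(term):
    """RT links are suggestions for the searcher, not automatic inclusions."""
    return sorted(RT.get(term, []))

print(sorted(narrower_closure("hand tools")))  # ['hand tools', 'spades', 'trowels']
print(sorted(narrower_closure("gardening")))   # ['gardening']
print(related("gardening"))                    # ['hand tools']
```

A search on gardening therefore retrieves only objects indexed with gardening itself. The RT link tells the searcher that hand tools may also be worth trying.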
Some thesauri, such as ROOT [published by the British Standards Institution in 1981], interfile terms of different types in their hierarchical display. Indentation in such cases does not necessarily indicate a BT/NT relationship. The relationships are shown in ROOT's alphabetical sequence, and it is unfortunate that they are not distinguished in the hierarchical one.
Because these abstract terms do not describe what the object is, they could be put into a field in the catalogue record labelled concept or subject, distinct from the field containing terms which name the object. I do not think that such a distinction will generally be helpful to users, however, and there seems to be no disadvantage in putting both types of term into a single field so that they can easily be searched as alternatives or in combination. Such a field would not be correctly called name and I therefore prefer to call it simply indexing terms or subject indexing terms.
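The benefit of a single field can be sketched in a few lines of Python. The records, accession numbers and function names below are hypothetical. The point is only that object names and concept terms held in one set of indexing terms can be searched together, either as alternatives (OR) or in combination (AND).

```python
# Hypothetical records: one "indexing terms" field per record, mixing
# object-name terms (what it is) with concept terms (what it relates to).
records = {
    "1987.12": {"spades", "gardening"},
    "1990.4": {"spades", "warfare"},
    "2001.7": {"forceps", "dentistry"},
}

def search_all(*terms):
    """Records carrying every one of the given terms (combination)."""
    return sorted(r for r, t in records.items() if set(terms) <= t)

def search_any(*terms):
    """Records carrying at least one of the given terms (alternatives)."""
    return sorted(r for r, t in records.items() if set(terms) & t)

print(search_all("spades", "warfare"))     # ['1990.4']
print(search_any("warfare", "dentistry"))  # ['1990.4', '2001.7']
```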
There has been much discussion on whether thesaurus terms should be expressed in the singular or the plural. I believe that the difficulty arises from different views of what is being done when a term is assigned to an object record. If a cataloguer thinks that (s)he is naming the object in hand, (s)he will naturally use the singular: "This is a clock". If (s)he is assigning the object to a category of similar objects, the thought will be "This belongs in the category of clocks". An enquirer will normally ask for a category, so the latter form will be more natural and logical.
The point is not a trivial one because, as discussed in section 2 above, there is a conceptual difference between naming or describing an object and grouping it with others so that it can be found. Both are essential steps, but an information retrieval thesaurus is primarily concerned with grouping.
Singular or plural terms?

The cataloguer thinks: "This is a clock".
The enquirer asks: "What clocks do you have?"

Prefer plural terms because:
· We should design the catalogue to fit the way the user thinks.
· Clocks is the name of a category, including many types, so plural is more logical.

The British Standard for thesaurus construction [which served as a basis for, and has since been superseded by, ISO 25964] recommends that plural terms should be used, except in a few well-defined cases, and my view is that this practice should be followed. Unfortunately, many records in museum collections have been given singular "object names", and the work of changing these to plurals when moving to a thesaurus structure may be so great as to require some compromise.
The British Standard recommends that when indexing parts or components, separate terms should be assigned for the component and for the object of which it forms part, so that aircraft engines would be indexed by the two terms Aircraft and Engines. This causes problems in a museum collection, however, because items indexed in this way would be retrieved in a search for Aircraft, when only whole aircraft were being sought. It therefore seems preferable to use a term such as Aircraft components. A particular engine may well be an aircraft component, but it is not an aircraft. Similarly a timer from a cooker can be indexed by the terms Timers and Cooker components, and a handle broken from a vase might be indexed as Handles and Vase fragments. There needs to be local agreement on how this approach is to be applied to a particular collection.
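A small hypothetical example in Python shows the retrieval problem. The record identifiers and function names are invented. With separate terms, a search for aircraft also returns the engine. With a components term it does not, and the engine can still be found under engines.

```python
# Two hypothetical ways of indexing a whole aircraft (A1) and an engine (E1).
split_terms = {                      # separate terms for part and whole
    "A1": {"aircraft"},
    "E1": {"aircraft", "engines"},
}
component_terms = {                  # a "components" term instead
    "A1": {"aircraft"},
    "E1": {"aircraft components", "engines"},
}

def search(index, term):
    """Records in the given index that carry the term."""
    return sorted(rec for rec, terms in index.items() if term in terms)

print(search(split_terms, "aircraft"))      # ['A1', 'E1']  engine found too
print(search(component_terms, "aircraft"))  # ['A1']        whole aircraft only
print(search(component_terms, "engines"))   # ['E1']
```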
In the thesaurus, BT/NT relationships can be used for parts and wholes in only four special cases: parts of the body, places, disciplines and hierarchical social structures.
As shown in the sample thesaurus above, a term can have several broader terms if it belongs to several broader categories; the thesaurus is then said to be polyhierarchical. Cardigans, for example, are simultaneously Knitwear and Outerwear, and should be retrieved whenever either of these categories is searched for.
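Here is a minimal Python sketch of polyhierarchical retrieval. It assumes that some of the broader-term links of the sample thesaurus are held in a dictionary; the names BT, NT and expand are illustrative only. The narrower-term links are derived as reciprocals, and a search on a term is expanded to include everything below it. As a result, cardigans are found from both knitwear and outerwear, and raincoats from both coats and rainwear.

```python
from collections import defaultdict

# Some broader-term links copied from the sample thesaurus (Tables 1 and 2).
BT = {
    "cardigans": ["knitwear", "outerwear"],
    "pullovers": ["knitwear", "outerwear"],
    "coats": ["outerwear"],
    "rainwear": ["outerwear"],
    "raincoats": ["coats", "rainwear"],
    "jackets": ["outerwear"],
    "anoraks": ["jackets"],
}

# Derive the reciprocal narrower-term (NT) links.
NT = defaultdict(list)
for narrow, broads in BT.items():
    for broad in broads:
        NT[broad].append(narrow)

def expand(term):
    """The term plus all its narrower terms, at any depth."""
    result, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(NT.get(t, []))
    return result

print(sorted(expand("knitwear")))  # ['cardigans', 'knitwear', 'pullovers']
print(sorted(expand("rainwear")))  # ['raincoats', 'rainwear']
```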
In a printed polyhierarchical thesaurus, repeating the full hierarchy under each of a term's broader terms would take more space, but this can be avoided by using references, as ROOT does. There is no such difficulty in displaying polyhierarchies in a computerised version of a thesaurus.
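The reference technique can be sketched in Python. This shows one possible layout and is not a description of how ROOT itself is printed. Each term's narrower terms are printed in full the first time the term appears, and at any later appearance the term carries a reference instead. The fragment uses the ">" indentation of Table 1. The term trench coats is hypothetical, added so that raincoats has something below it to repeat.

```python
# Hypothetical fragment in which "raincoats" has two broader terms.
NT = {
    "outerwear": ["coats", "rainwear"],
    "coats": ["raincoats"],
    "rainwear": ["raincoats"],
    "raincoats": ["trench coats"],   # invented extra level
}

def display(term, parent=None, depth=0, placed=None, lines=None):
    """Lay out a hierarchy in the '>' style of Table 1, printing each
    term's narrower terms only once and giving a reference elsewhere."""
    if placed is None:
        placed, lines = {}, []
    prefix = "> " * depth
    if term in placed and NT.get(term):
        lines.append(f"{prefix}{term} (narrower terms under {placed[term]})")
        return lines
    lines.append(prefix + term)
    placed.setdefault(term, parent)  # remember where it was first shown
    for narrower in NT.get(term, []):
        display(narrower, term, depth + 1, placed, lines)
    return lines

print("\n".join(display("outerwear")))
# outerwear
# > coats
# > > raincoats
# > > > trench coats
# > rainwear
# > > raincoats (narrower terms under coats)
```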
A thesaurus is an essential tool which must be at hand when indexing a collection of objects, whether by writing catalogue cards by hand or by entering details directly into a computer. The general principles to be followed are:
1. Consider whether a searcher will be able to retrieve the item by a combination of the terms you allocate.
2. Use as many terms as are needed to provide required access points.
3. If you allocate a specific term, do not also allocate that term's broader terms.
4. Make sure that you include terms to express what the object is, irrespective of what it might have been used for.
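Principles 1, 2 and 4 call for the cataloguer's judgement, but principle 3 can be checked mechanically. The following is a minimal Python sketch that assumes broader-term links are held in a dictionary; all names are hypothetical. It reports any allocated term that is a broader term, at any level, of another allocated term.

```python
# Broader-term links from the sample thesaurus (cardigans is polyhierarchical).
BT = {
    "anoraks": ["jackets"],
    "jackets": ["outerwear"],
    "cardigans": ["knitwear", "outerwear"],
}

def broader_closure(term):
    """All broader terms of a term, at any level above it."""
    result, stack = set(), list(BT.get(term, []))
    while stack:
        t = stack.pop()
        if t not in result:
            result.add(t)
            stack.extend(BT.get(t, []))
    return result

def redundant_terms(assigned):
    """Allocated terms that are broader terms of another allocated term,
    and so break principle 3."""
    redundant = set()
    for term in assigned:
        redundant |= broader_closure(term) & set(assigned)
    return sorted(redundant)

print(redundant_terms(["anoraks", "jackets", "outerwear"]))  # ['jackets', 'outerwear']
print(redundant_terms(["cardigans", "knitwear"]))            # ['knitwear']
print(redundant_terms(["anoraks", "shawls"]))                # []
```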
If you have a computerised thesaurus, with good software, this can give you a lot of direct help. Ideally it should provide pop-up windows displaying thesaurus terms which the cataloguer can choose from and then "paste" directly into the catalogue record without re-typing. It should be possible to browse around the thesaurus, following its chain of relationships or displaying tree structures, without having to exit the current catalogue record, and non-preferred terms should automatically be replaced by their preferred equivalents. A cataloguer should be able to "force" new terms onto the thesaurus, flagged for review later by the thesaurus editor. When editing thesaurus relationships, reciprocals should be maintained automatically, and it should not be possible to create inconsistent structures.
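Several of these behaviours can be sketched in a small Python class. Everything here is hypothetical and not modelled on any particular software package, including the class, its method names and the new term kagoules. The class keeps NT links as automatic reciprocals of BT links and refuses a BT link that would make a term its own broader term. It also replaces non-preferred terms by their preferred equivalents and flags forced terms for review.

```python
class Thesaurus:
    """A minimal, hypothetical thesaurus store (not any product's API)."""

    def __init__(self):
        self.terms = set()   # preferred terms
        self.bt = {}         # term -> broader terms
        self.nt = {}         # term -> narrower terms (always reciprocal)
        self.use = {}        # non-preferred term -> preferred term
        self.forced = set()  # new terms awaiting the thesaurus editor

    def broader_closure(self, term):
        seen, stack = set(), list(self.bt.get(term, ()))
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(self.bt.get(t, ()))
        return seen

    def add_bt(self, narrow, broad):
        """Record 'narrow BT broad' and, automatically, 'broad NT narrow'."""
        if narrow == broad or narrow in self.broader_closure(broad):
            raise ValueError(f"{narrow} BT {broad} would create a loop")
        self.terms.update((narrow, broad))
        self.bt.setdefault(narrow, set()).add(broad)
        self.nt.setdefault(broad, set()).add(narrow)

    def add_use(self, non_preferred, preferred):
        """Record 'non_preferred USE preferred'; UF is derived from it."""
        self.terms.add(preferred)
        self.use[non_preferred] = preferred

    def uf(self, preferred):
        return sorted(k for k, v in self.use.items() if v == preferred)

    def preferred(self, term):
        """Replace a non-preferred term by its preferred equivalent."""
        return self.use.get(term, term)

    def force(self, term):
        """Accept a cataloguer's new term at once, flagged for review."""
        term = self.preferred(term)
        if term not in self.terms:
            self.terms.add(term)
            self.forced.add(term)
        return term


t = Thesaurus()
t.add_bt("jackets", "outerwear")
t.add_bt("anoraks", "jackets")
t.add_use("duffel jackets", "reefer jackets")
print(sorted(t.nt["outerwear"]))   # ['jackets']
print(t.force("duffel jackets"))   # reefer jackets
print(t.uf("reefer jackets"))      # ['duffel jackets']
try:
    t.add_bt("outerwear", "anoraks")  # anoraks is already below outerwear
except ValueError as err:
    print(err)                     # outerwear BT anoraks would create a loop
print(t.force("kagoules"), sorted(t.forced))  # kagoules ['kagoules']
```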
As there are many thesauri in existence already, it is worth considering seriously whether one of these can be used before embarking on the job of creating a new one for a particular museum or collection. So long as the general principles are followed, you should be able to expand a thesaurus to give you more detail if you need it, or truncate some sections at a high level if they contain more detail than your collections justify. Provided that the relationships are universally true, it should be possible to combine sections of thesauri developed by different museums and thus avoid duplication of work.
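Truncation can be sketched as mapping each detailed term to the nearest retained term above it. This minimal Python sketch uses BT links from the sample thesaurus and a hypothetical local choice of retained terms. Because raincoats has two broader terms, it maps to two retained terms. The function assumes the hierarchy contains no loops.

```python
# Broader-term links from the sample thesaurus.
BT = {
    "anoraks": ["jackets"], "blazers": ["jackets"],
    "jackets": ["outerwear"],
    "raincoats": ["coats", "rainwear"],
    "coats": ["outerwear"], "rainwear": ["outerwear"],
}

def truncate(term, kept):
    """Map a term from a detailed thesaurus to the nearest kept terms
    above it (several, if the term is polyhierarchical)."""
    if term in kept:
        return {term}
    result = set()
    for broad in BT.get(term, []):
        result |= truncate(broad, kept)
    return result

kept = {"outerwear", "jackets", "rainwear"}   # hypothetical local choice
print(sorted(truncate("anoraks", kept)))      # ['jackets']
print(sorted(truncate("raincoats", kept)))    # ['outerwear', 'rainwear']
```

This works only because every BT link is universally true: an anorak is a jacket wherever the thesaurus is used.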
Even when using an authoritative thesaurus, some care is needed, and I have mentioned some limitations of ROOT and AAT in 7.1 and 7.4 above. It is still much easier to base your work on something like these than to build your own from scratch, unless you have a very specialised collection.
Someone has to be responsible for maintaining the thesaurus. New terms can be suggested, and temporarily "forced" into the thesaurus by cataloguers as they catalogue objects, but someone has to review these terms regularly and either accept them and build them into the thesaurus structure, or decide that they are not appropriate for use as indexing terms. In that case they should generally be retained as non-preferred terms with USE references to the preferred terms, so that people who look for them will not be frustrated. An encouraging thought is that once the initial work of setting up the thesaurus has been done, the number of new terms to be assessed each week should decrease. Besides, many systems have operated successfully in the past with printed thesauri, which are quite difficult to keep up to date.
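The review step can be sketched in Python as two actions on a queue of forced terms. The state and function names are hypothetical. A real system would also have to re-index any records that had already been given a rejected term; that step is left out here.

```python
# Hypothetical editorial state. Forced terms are usable at once but sit in
# a queue until the thesaurus editor has reviewed them.
preferred_terms = {"outerwear", "jackets", "coats", "kagoules", "overcoat"}
bt = {"jackets": {"outerwear"}, "coats": {"outerwear"}}
use = {}                              # non-preferred term -> preferred term
pending = ["kagoules", "overcoat"]    # forced by cataloguers

def accept(term, broader):
    """Build an accepted term into the hierarchy."""
    bt.setdefault(term, set()).add(broader)
    pending.remove(term)

def reject(term, preferred):
    """Keep a rejected term as a non-preferred term with a USE reference,
    so that people who look for it are redirected, not frustrated."""
    preferred_terms.discard(term)
    use[term] = preferred
    pending.remove(term)

accept("kagoules", "jackets")
reject("overcoat", "coats")   # singular form; the plural "coats" is preferred
print(pending)                # []
print(use)                    # {'overcoat': 'coats'}
```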
A thesaurus is not a panacea which will meet all subject retrieval needs. It is particularly appropriate for fields which have a hierarchical structure, such as names of objects, subjects, places, materials and disciplines, and it might also be used for styles and periods. A thesaurus proper would not normally be used for names of people and organisations; a similar tool, called an authority file, is usually used for these instead. The difference is that an authority file has preferred and non-preferred relationships but no hierarchies.
[Authority files and thesauri are two examples of a generalised data structure which can record any type of relationship between two entries, and modern computer software should allow different types of relationship to be included if needed.]
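Such a generalised structure can be sketched as a store of typed links. Each link automatically records its reciprocal, and each file declares which link types it permits. In this minimal Python sketch the class and method names are hypothetical, and the choice of preferred name form is only illustrative. An authority file permits only USE/UF links, while a thesaurus also permits BT/NT and RT.

```python
from collections import defaultdict

# Reciprocal pairs for the relationship types used in this section.
RECIPROCAL = {"BT": "NT", "NT": "BT", "USE": "UF", "UF": "USE", "RT": "RT"}

class RelationStore:
    """Hypothetical generalised store: any typed link between two entries."""

    def __init__(self, allowed):
        self.allowed = set(allowed)
        self.links = defaultdict(set)   # (entry, type) -> linked entries

    def link(self, a, rel, b):
        if rel not in self.allowed:
            raise ValueError(f"{rel} links are not used in this file")
        self.links[(a, rel)].add(b)
        self.links[(b, RECIPROCAL[rel])].add(a)   # reciprocal added for free

    def get(self, a, rel):
        return sorted(self.links.get((a, rel), ()))

# An authority file: preferred and non-preferred links only.
names = RelationStore({"USE", "UF"})
names.link("Dodgson, Charles Lutwidge", "USE", "Carroll, Lewis")
# A thesaurus: hierarchical and associative links as well.
terms = RelationStore({"BT", "NT", "USE", "UF", "RT"})
terms.link("anoraks", "BT", "jackets")

print(names.get("Carroll, Lewis", "UF"))   # ['Dodgson, Charles Lutwidge']
print(terms.get("jackets", "NT"))          # ['anoraks']
```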
A thesaurus is an essential component for reliable information retrieval, but it can usefully be complemented by two other types of subject retrieval mechanism.
While a thesaurus inherently contains a classification of terms in its hierarchical relationships, it is intended for specific retrieval, and it is often useful to have another way of grouping objects. This may relate to administrative distribution of responsibility for "collections" within a museum, or to subdivisions of these collections into groups which depend on local emphasis. It is also often necessary to be able to print a list of objects arranged by subject in a way which differs from the alphabetical order of thesaurus terms. Each subject group may be expressed as a compound phrase, and given a classification number or code to make sorting possible.
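Sorting by classification code needs a little care. If 4.10 is meant to follow 4.9, sorting codes as plain text puts it before 4.2. This minimal Python sketch, with hypothetical subject groups, codes and accession numbers, compares the numeric parts of each code. The resulting order differs from the alphabetical order of the group names.

```python
# Hypothetical subject groups (compound phrases) with classification codes.
groups = {
    "Agriculture: hand tools": "2.3",
    "Costume: outer garments": "4.1",
    "Costume: underwear": "4.2",
    "Costume: accessories": "4.10",
}
objects = [
    ("1990.4", "Costume: accessories"),
    ("1987.12", "Agriculture: hand tools"),
    ("2001.7", "Costume: outer garments"),
    ("1995.3", "Costume: underwear"),
]

def code_key(code):
    """Compare codes part by part as numbers, so '4.10' sorts after '4.2'."""
    return tuple(int(part) for part in code.split("."))

def subject_order(objs):
    """Object numbers in classification order, not alphabetical order."""
    ordered = sorted(objs, key=lambda o: code_key(groups[o[1]]))
    return [obj for obj, group in ordered]

print(subject_order(objects))  # ['1987.12', '2001.7', '1995.3', '1990.4']
```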
It is highly desirable to be able to search for specific words or phrases which occur in object descriptions. These may identify individual items by unique words such as trade names which do not occur often enough to justify inclusion in the thesaurus. A computer system may "invert" some or all fields of the record, i.e. make all the words in them available for searching through a free-text index, or it may be possible to scan records by reading them sequentially while looking for particular words. The latter process is fairly slow, but is a useful way of refining a search once an initial group has been selected by using thesaurus terms.
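Both mechanisms can be sketched in Python. The first builds an inverted index over a description field, and the second scans a smaller set of candidate records sequentially. The descriptions, the trade name Rainmaster and the function names are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical descriptions; "Rainmaster" stands in for a trade name too
# rare to deserve a thesaurus term.
descriptions = {
    "1990.4": "Waterproof coat, label reads 'Rainmaster'.",
    "2001.7": "Rubberised raincoat with detachable hood.",
    "1987.12": "Nylon anorak with drawstring hood.",
}

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

# "Inverting" the field: every word points back to the records containing it.
inverted = defaultdict(set)
for rec, text in descriptions.items():
    for word in words(text):
        inverted[word].add(rec)

def free_text(word):
    """Fast look-up through the inverted (free-text) index."""
    return sorted(inverted.get(word.lower(), ()))

def scan(candidates, word):
    """Slow sequential scan, reading each candidate record in turn."""
    return sorted(r for r in candidates if word.lower() in words(descriptions[r]))

print(free_text("Rainmaster"))   # ['1990.4']
# Refine a group first selected by thesaurus terms (say coats and raincoats):
print(scan({"1990.4", "2001.7"}, "hood"))   # ['2001.7']
```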