DAM and the Tao of Taxonomy

This digital asset management webinar features David Riecks speaking about the advantages of the proper use of taxonomy and controlled vocabularies in DAM software.


Digital Asset Management Webinar

Learn about DAM taxonomy, controlled vocabularies and how they work together.

RECORDING DATE June 5, 2013

GUESTS David Riecks (owner of ControlledVocabulary.com)

HOST David Diamond

Want to continue the discussion? Visit the YouTube page for this digital asset management webinar and leave a comment. You can reach David Riecks at http://controlledvocabulary.com and Twitter (@davidriecks).

Webinar Questions and Answers

A number of questions came in during the webinar that could not be answered due to time constraints. Those questions, along with answers from our panelists, can be found below.

Why can’t you use Unicode to address multilingual issues?

(David Riecks) Unicode is a standard that unifies the various character sets (such as those used in different languages) needed to store text containing characters beyond the basic a to z, A to Z, and 0 (zero) to 9. Unicode encoding formats (like UTF-8 or UTF-16) are used for storing the text files that represent the simple “keyword catalogs” used by Adobe Lightroom or Bridge, Photo Mechanic, Expression Media/Media Pro, Aperture and others.

While Unicode allows you to represent and store these extended characters, like â è í ö æ or others, it doesn’t provide any additional facilities that would let someone type in “perro” and have the system know that this means “dog” in Spanish. Within XML (and XMP), there are “Language” attributes used to indicate which language was used. However, storing multiple languages within the same field isn’t a good idea, as nearly every field of which I’m aware allows only one language to be referenced for that particular field.
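The distinction can be illustrated with a minimal Python sketch. The encoding step is what Unicode provides; the translation table (the hypothetical `SPANISH_TO_ENGLISH` dictionary, invented here for illustration) is something you would have to build and maintain yourself.

```python
from typing import Optional

# Unicode (here, UTF-8) only defines how characters are stored as bytes.
text = "perro"
encoded = text.encode("utf-8")      # bytes suitable for writing to a keyword file
decoded = encoded.decode("utf-8")   # round-trips to the same characters

# Extended characters are stored too, just with more bytes per character.
accented = "ni\u00f1o".encode("utf-8")   # "niño": 4 characters, 5 bytes

# Knowing that "perro" means "dog" requires a separate lookup table.
# SPANISH_TO_ENGLISH is a hypothetical, hand-built mapping for illustration.
SPANISH_TO_ENGLISH = {"perro": "dog", "gato": "cat"}

def translate_keyword(term: str) -> Optional[str]:
    """Return the English equivalent of a Spanish keyword, if one is listed."""
    return SPANISH_TO_ENGLISH.get(term.lower())
```

Nothing in the encoded bytes says which language a word belongs to, which is why the lookup has to live outside the text file itself.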

What is the difference between taxonomy and ontology?

(David Riecks) It’s not clear if you are talking about the strict definitions or are interested in the practical aspects, so I will attempt to answer both. In terms of a definition, lists of subject headings and thesauri are both typical examples of controlled vocabularies, but these may also be referred to in a general sense by other names such as ontology, or even taxonomy.

A Controlled Vocabulary becomes a taxonomy when it is organized hierarchically to show the parent/child relationships between terms using a broader term (BT) and a narrower term (NT) for each entry. The classic use of the term taxonomy is in the field of biology, where organisms are classified starting with “Kingdom” (the broadest or least specific term) and working down to “Species” (the narrowest or most specific term).

Broader Term

Kingdom

Phylum

Class

Order

Family

Genus

Species

Narrower Term
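As a rough sketch of how broader term (BT) and narrower term (NT) links can be stored, the Python below treats the biological ranks above as the terms of a tiny taxonomy. The names `BROADER_TERM`, `narrower_terms` and `path_from_top` are made up for this example and do not come from any particular DAM product.

```python
# Each term maps to its broader term (BT); the top of the hierarchy has none.
BROADER_TERM = {
    "Kingdom": None,
    "Phylum": "Kingdom",
    "Class": "Phylum",
    "Order": "Class",
    "Family": "Order",
    "Genus": "Family",
    "Species": "Genus",
}

def narrower_terms(term):
    """Return the terms whose broader term is `term` (its NT entries)."""
    return [t for t, bt in BROADER_TERM.items() if bt == term]

def path_from_top(term):
    """Follow BT links upward and return the path from the broadest term down to `term`."""
    path = []
    while term is not None:
        path.append(term)
        term = BROADER_TERM[term]
    return list(reversed(path))
```

Because every term has at most one broader term, this structure is a strict hierarchy; it cannot represent the polyhierarchical or related-term links that a thesaurus or ontology allows.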

A thesaurus is a collection of terms that contains a classified set or list of synonyms and related terms for a given word or description. Unlike a taxonomy, these may be polyhierarchical and can imply complex relationships beyond just broader and narrower terms, as well as indicating related terms and those which are preferred for use.

The term ontology has both a philosophical definition and a more information-technology-oriented meaning; the latter is more suitable in the context of Digital Asset Management. An ontology shows the relationships, properties and functions between terms or concepts. Unlike a taxonomy, an ontology lets you express a wider range of relationships between attributes or terms than a simple hierarchy can. This can be very useful when attempting to represent complex or multi-faceted relationships.

That said, most systems I’ve used only allow simple hierarchical lists of terms. Some of the more advanced DAM applications allow you to designate synonyms, or terms to be excluded when exporting the keywords. But that is it. So from a practical standpoint, it’s mostly a semantic question at this point in time.

What is the best option when dealing with metadata in XLS, PPT and DOC files and then importing them into DAM software?

(David Riecks) In most instances, you first need to understand what types of metadata your DAM supports. Many of the DAM systems that are geared for images may not support these other file formats (spreadsheets, PowerPoint and Word) at all. Others may read the information embedded in these file formats if you enable additional scripts or features within the program. Some of the newer DAM systems may allow the use of an XMP sidecar (a small text file separate from the actual asset that has the same filename, but ends in .xmp) and will import the contents of that sidecar file on ingest.
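A minimal Python sketch of the sidecar naming idea follows. It assumes the sidecar sits in the same folder as the asset and replaces the asset's extension with .xmp; naming conventions vary between applications, so treat the helper names `sidecar_path` and `find_sidecar` as hypothetical and check your own DAM's documentation.

```python
from pathlib import Path

def sidecar_path(asset):
    """Return the assumed sidecar location: same folder and base name, .xmp extension."""
    return Path(asset).with_suffix(".xmp")

def find_sidecar(asset):
    """Return the sidecar path if that file exists on disk, otherwise None."""
    candidate = sidecar_path(asset)
    return candidate if candidate.is_file() else None
```

For an asset named `budget.xls`, this sketch looks for `budget.xmp` alongside it.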

We have a system that uses scientific names and common names for animals, and these occasionally get changed. Any ideas on the best way to incorporate the old term with the new one?

(David Riecks) The scientific community does make changes (usually after much deliberation) in how flora and fauna are classified over time. I’d suggest first starting with your users and seeing which version of the term they would use when searching for such subjects. Not everyone will be up on the latest literature, and many may still continue to use the old name for some time after it’s been officially changed. In most instances, it’s probably better to include both the old and new names for some time (if not forever). At minimum, I’d recommend that you make a note in the Caption/Description about when the name change occurred, and include that old name in that field for reference. This way users could still find that subject when they do a Caption search, even if it wouldn’t show up in a search limited to the keywords or category fields.
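One way to picture the "keep both names" suggestion is the Python sketch below. The names in `NAME_HISTORY` are placeholders rather than real taxa, and the functions are hypothetical helpers, not features of any particular DAM.

```python
# Hypothetical record: the names below are placeholders, not real taxa.
# Maps the current scientific name to the names it superseded.
NAME_HISTORY = {
    "Genus novus": ["Genus vetus"],
}

def keywords_for(current_name, common_names):
    """Build a keyword list holding the current name, any old names, and common names."""
    keywords = [current_name]
    keywords += NAME_HISTORY.get(current_name, [])
    keywords += common_names
    return keywords

def matches(query, keywords):
    """Case-insensitive exact match of a search term against the keyword list."""
    q = query.casefold()
    return any(q == k.casefold() for k in keywords)
```

With both names in the keyword list, a user who still searches on the superseded name finds the same assets as one who uses the current name.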

I’m looking at keywords from an international point of view, since I’m from Finland, Europe. We have two official languages here, and where I live, up in Lapland, there are three more official languages. So we are very multilingual, and dealing with that is everyday business here! I suggest that everyone should use their own mother tongue for keywords: the language you are most familiar with. And let translation systems, once they appear, handle the multilingual side. What do you think? When would those appear?

(David Riecks) For larger ‘enterprise’ systems, this is often the best way to go. However, not all systems will allow for this approach.

I don’t recommend commingling words from different languages in the keyword field, as this leads to more problems. After all, the people using your DAM might think it odd to find images of boats when the keyword term they typed in was “Boot”, simply because the system allowed someone to enter both German and English terms.

To ‘wrangle’ your vocabulary, a taxonomy management software application is a great time saver. I use MultiTes and it saves days/weeks of organizing.

(David Riecks) When I was talking about “text wrangling” I was referring to the need to modify the information from some of the examples listed on the Examples page (http://www.controlledvocabulary.com/examples.html) of the ControlledVocabulary site or Taxonomy Warehouse. The primary limitation with most of the professional applications that I’ve used has to do with being able to import the information from these sources and then export that modified data into the specific DAM or Catalog application I’m using. Most of the systems I use (Adobe Bridge, Lightroom, Photo Mechanic, Expression Media) can only deal with fairly simple Unicode-encoded, tab-delimited text files. Hence the need for “wrangling” that data from an unknown or unspecified format into one that can be used.
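As an illustration of the target format, here is a minimal Python sketch that flattens a nested hierarchy into tab-indented, UTF-8 encoded text with one keyword per line. The sample hierarchy and function names are invented for this example, and the exact details each application expects (line endings, synonym markers and so on) differ, so this is a starting point rather than a definitive exporter.

```python
def to_tab_indented(tree, depth=0):
    """Flatten a nested dict of keywords into lines indented with one tab per level."""
    lines = []
    for term, children in tree.items():
        lines.append("\t" * depth + term)
        lines.extend(to_tab_indented(children, depth + 1))
    return lines

# Hypothetical sample hierarchy, for illustration only.
keywords = {
    "Animals": {
        "Mammals": {"dog": {}, "cat": {}},
        "Birds": {"eagle": {}},
    },
}

def write_keyword_file(tree, path):
    """Write the hierarchy as UTF-8 text, one keyword per line."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write("\n".join(to_tab_indented(tree)) + "\n")
```

Calling `write_keyword_file(keywords, "keywords.txt")` would produce a file in which "Mammals" is indented one tab under "Animals" and "dog" two tabs.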

From what I see, it looks like MultiTes can export to XML or HTML, most likely with standard ANSI/NISO relationships (USE: Use Preferred Term, UF: Used for, BT: Broader Term, NT: Narrower Term, RT: Related Term, SN: Scope Note). This could be very useful if your DAM system supports that format. My experience has been that the vast majority, however, do not. If the XML or HTML output could be finessed without much effort into a format that could be imported, it might be useful. If you are creating a hierarchically arranged set of controlled vocabulary terms, many of the DAM or Catalog applications I mentioned above have the ability to create hierarchies within their own software. For instance, you can add new top-level keywords, or add keywords beneath an existing keyword at a higher level. This feature alone may be sufficient if that is where you intend to house your “taxonomy.”

It looks like MultiTes is several hundred dollars and requires an annual maintenance fee (http://www.multites.com/purchase.htm), so that might be beyond the reach of many potential users. Some tools are emerging now that work with Simple Knowledge Organization System or SKOS (http://en.wikipedia.org/wiki/Simple_Knowledge_Organization_System#Tools) so that might be another option for others to pursue.