Taxonomy, Metadata, Tags and Controlled Vocabularies
By David Diamond • Nov 10, 2016
Let’s review some simplified definitions:
- Taxonomy – A collection of terms that are organized into some structure that provides some semantic understanding of those terms. (For example, where “Turkey” falls beneath “Food” in a hierarchy, it’s presumed to be about the cooked dish, not the live bird, country or bowling triumph.)
- Metadata – Terms, numeric values or any other information that describes a piece of content’s status, location, physical attributes, etc.
- Tags – Terms that are assigned to content to help identify that content. Tags are often applied without semantic or hierarchical consideration.
- Controlled Vocabularies – A collection of standardized or agreed upon terms that are used to identify content.
Now take another look at those definitions. If you’re thinking they all sound pretty much the same, you are correct. A tag is just a term (word or phrase). A vocabulary is made up of multiple terms. A taxonomy is tree-like structure that provides some semantic understanding of the terms it contains. And metadata is the word we use to describe all of it.
Any software-specific differences that exist between these various metadata concepts come entirely from how each is used within the configuration and use of the system.
The Software System Perspective
When we look at these metadata concepts through the eyes of a content management interface, we gain some clarity on how each fits into the bigger picture.
The taxonomy tree
Many content systems include a tree-like structure that enables terms to be organized into hierarchies. Users can drag and drop content onto the terms (or vice versa) to make assignments. This is how most systems present taxonomies to users. It’s a fairly intuitive means for viewing and working with a taxonomy, but such a hierarchical representation of terms presents drawbacks too, which are discussed later.
This simple taxonomy includes branches (nodes) for Expertise, Industries and Languages. Some content systems enable you to use taxonomy nodes as input filters for metadata fields, as shown below.
Field filters as controlled vocabularies
When branches (or nodes) of a taxonomy are assigned as input filters to specific metadata fields, they serve as controlled vocabularies. When a field is configured as such, only those terms in the assigned branch can be entered into the metadata field; terms not found in the vocabulary branch are rejected.
The Languages node of the taxonomy shown above has been assigned as an input filter for this field. Only terms found in that branch of the taxonomy are accepted into the field; other terms would be rejected. As the user types “Ger…” the word “German” appears as a suggestion.
Some systems also enable users to create lists that are used in the same way: a list of terms is created (or subscribed to as an external resource), and the terms in that list become the allowable values for any metadata field to which the list has been assigned. In these cases, the lists might be completely disconnected from any taxonomy.
In some systems, users can simply enter terms that then become system-wide tags. The system remembers which tags have been used before, and it offers them as suggestions in the future. This could be considered an “uncontrolled” vocabulary in that it doesn’t restrict what can be entered, whether the values are accurate or not.
Some systems require that tags be provided by a node in the taxonomy tree or a predefined list of terms, essentially making the concept of controlled vocabulary input and tagging input identical.
The entire array of available metadata fields is considered the system’s metadata schema. In most DAMs, metadata schemas are static, meaning that the same set of fields is available across all content.
Adaptive metadata schemas make it possible for each piece of content in a system to have its own metadata schema, if required. As of this writing, Picturepark is the only content system I know of that offers this capability; but other vendors can be expected to follow suit because of the potential this new metadata paradigm offers.
If you’re in the market for content software, ask the vendors you speak to if they support multiple or layered metadata schemas, or plan to in the future. They might offer the feature using a different name than adaptive metadata. Just make sure what is offered is truly about enabling you to define different classes of content, and not just that they do something as simple as change available metadata fields based on the file format of the underlying content. For example, a logo, product box photo or a CEO headshot might all be JPG files, but their metadata needs are dramatically different.
Location vs. attributes
Understanding that vocabularies, tags and taxonomies are all just metadata makes it easier and more difficult to explain their differences to users. In order to simplify the discussion, I explain it like this:
Taxonomy terms, like folders in a file cabinet, are locations into which digital content can be classified. Vocabularies and tags are attributes that are assigned to that content.
Admittedly, this perspective is based entirely on the user experience provided by content systems. Meaning, for example, that I can drag and drop content onto taxonomy terms, which “feels” like I’m putting them there. By contrast, when I’m adding tags inside a metadata editing window, it feels like I’m adding attributes.
In fact, both operations are exactly the same, particularly when controlled vocabulary terms come from the taxonomy.
But because these concepts are derived from the physical world, we have to consider that perspective when discussing the concepts with others. For example, the folders in a file cabinet are locations into which we put documents (taxonomy), while the “PAID” and other stamps we put on those documents are attributes (tags).
For many, it’s too tempting to not borrow real-world concepts when configuring a CMS or DAM, which is fine in many cases. Just keep in mind that any such definitions you adopt for your system are purely academic—it’s all metadata.