Slippery Synonyms

The final part of this miniseries on the use of synonyms explains how and when synonyms can adversely affect the value of a content system.

When you start to think in terms of preferred terms and their synonyms, it can become an addictive game: How many synonyms can you come up with for each term?

As with the granularity or depth of a taxonomy, you can easily go too deep with synonyms.

You’re reading an excerpt from the book Metadata for Content Management by David Diamond. Additional excerpts will be published here over time. You can purchase the complete book via Amazon.

There are two basic branches of “too deep” that apply to synonyms:

  • Terms that are homonyms (words that have multiple, completely different meanings) can confuse users
  • Terms that only loosely mean the same thing can introduce ambiguity that confuses users

If, while tagging a tree photo, you add the term “bark,” what will users think when they’re looking for dog photos and they see your tree? We can all imagine the most obvious connection between a dog and a tree (or fire hydrant), but this isn’t likely the kind of logic users consider when trying to find photos of Fido.

The same problem arises if you add “bark” as a synonym for “wood.” A user might type “dog bark” into a search field hoping to find images of vocal pooches, but the system displays photos of two-by-fours.

If the system’s default search behavior requires that every search term entered be present in the found content, this likely wouldn’t be an issue. On the other hand, the system might consider “dog bark” to be the same as “dog wood,” which might then return photos of plants too.
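The two behaviors described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular DAM works: the captions, the `SYNONYM_GROUPS` list, and the `expand` and `search` functions are all hypothetical, and matching is done on whole lowercase words.

```python
# Hypothetical catalog: asset name -> caption text.
CAPTIONS = {
    "lumber.jpg": "stack of wood two-by-fours",
    "terrier.jpg": "small dog mid bark",
    "dogwood.jpg": "dog wood shrub in bloom",
    "oak.jpg": "oak tree bark close-up",
}

# The ill-advised synonym pair from the text, stored as one group
# of interchangeable terms.
SYNONYM_GROUPS = [{"bark", "wood"}]


def expand(term, groups):
    """Return the term plus every synonym that shares a group with it."""
    for group in groups:
        if term in group:
            return set(group)
    return {term}


def search(query, groups, require_all=True):
    """Match assets whose caption contains all (or any) of the query terms,
    where each term also matches through its synonyms."""
    terms = query.lower().split()
    combine = all if require_all else any
    hits = []
    for name, caption in CAPTIONS.items():
        words = set(caption.lower().split())
        if combine(words & expand(term, groups) for term in terms):
            hits.append(name)
    return sorted(hits)


print(search("dog bark", groups=[]))              # no synonyms defined
print(search("dog bark", groups=SYNONYM_GROUPS))  # every term required
print(search("dog bark", groups=SYNONYM_GROUPS, require_all=False))
```

In this sketch, requiring every term keeps the two-by-fours out of the results, but the “bark”/“wood” synonym still lets the dogwood shrub through. Switching to any-term matching brings in the lumber and the oak as well.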

The creation of synonyms affects all assets tagged with the preferred term. For this reason, it’s important to weigh any unintended consequences of the synonyms you plan to add.

Synonyms can also introduce ambiguity. Take, for example, “cake.” When considering synonyms for this term, you might consider “pie,” assuming that baked goods are the goal for any search for “cake.” But a pie is no more a cake than a cupcake is a cake. If someone really needed a cake photo, those search results would be annoying.

On the other hand, depending on the expertise and common use cases of your user base, ambiguity might be welcome.

Consider a tomato: It’s a fruit, not a vegetable. But does this matter in your system? Perhaps while searching for “red vegetables,” it would be okay to show tomato photos because you know that your users are creating brochures, not recipes or culinary lesson plans. (If you fear that someone might put tomatoes into a fruit salad and blame your content system, disregard this advice.)

Ideally, the synonymic relationship between two terms should go both ways. For example, it’s safe to assume that all cats are felines, and that all cars are automobiles; but not all boats are ships, and not all trucks are pick-ups. A human male is either a boy or a man; but a male canine is never either.

As a rule, consider whether there might be a hierarchical relationship between any two synonyms you consider. If you find it difficult to make one term subordinate to the other, they’re probably good candidates for sibling synonyms.

On the other hand, if a parent/child relationship between the two is clear, you might be introducing problems. For example, all tablets are computers, but not all computers are tablets. This is an example where you might want “Computer” to be a parent term to “Tablet.”
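To make the difference concrete, here is a minimal sketch of a parent/child lookup, assuming a hypothetical `PARENT` table and made-up asset tags. A search for the parent term also matches its children, but a search for a child term does not match its parent or siblings, which is exactly what a two-way synonym would get wrong.

```python
# Hypothetical taxonomy: child term -> parent term.
PARENT = {
    "tablet": "computer",
    "laptop": "computer",
}

# Hypothetical assets and the single tag applied to each.
ASSET_TAGS = {
    "tablet_on_desk.jpg": {"tablet"},
    "laptop_open.jpg": {"laptop"},
    "server_rack.jpg": {"computer"},
}


def with_descendants(term):
    """Return the term plus every term below it in the taxonomy."""
    found = {term}
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for child, parent in PARENT.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def search(term):
    """Match assets tagged with the term or any of its descendants."""
    wanted = with_descendants(term)
    return sorted(name for name, tags in ASSET_TAGS.items() if tags & wanted)


print(search("computer"))  # tablets and laptops are included
print(search("tablet"))    # other computers are not
```

Had “tablet” and “computer” been declared synonyms instead, a search for “tablet” would have returned the server rack too.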

As with many topics, the rules for using synonyms in a DAM or content management system can differ from the academic rules that would otherwise apply.

Here are a few considerations to remember when planning your synonyms:

  • The goal of a synonym in a content system is to increase the value of search results, not to educate users about language.
  • Synonyms should never introduce ambiguity.
  • Synonyms that are homonyms should be avoided if their alternate meanings hold some significance to other assets in your system. For example, if your system is about microbiology, chances are, the use of “cell” as a search term won’t result in too many jailhouse photos.

Tip: If you like these guidelines, why not make them part of your content policy that governs the creation of synonyms?
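One way such a policy could be backed by a routine check is sketched below. It assumes a hypothetical mapping of asset names to tags and a made-up helper, `collisions`, that lists the assets already tagged with a candidate synonym but not with the preferred term. Those are the assets that would start appearing in results for the preferred term once the synonym is added, so they are worth a look before you commit.

```python
# Hypothetical assets and their existing tags.
ASSET_TAGS = {
    "plank.jpg": {"wood"},
    "oak.jpg": {"tree", "bark", "wood"},
    "terrier.jpg": {"dog", "bark"},
}


def collisions(preferred, candidate, asset_tags):
    """Return assets tagged with the candidate synonym but not the
    preferred term; adding the synonym would pull these into searches
    for the preferred term."""
    return sorted(
        name
        for name, tags in asset_tags.items()
        if candidate in tags and preferred not in tags
    )


# Before making "bark" a synonym of "wood", see what else uses "bark".
print(collisions("wood", "bark", ASSET_TAGS))
```

An empty result does not prove a synonym is safe, but a non-empty one is a prompt to review those assets by hand.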

When you consider the complexity of dealing with written synonyms, imagine how complex things will become when content management interfaces are voice-based and the system has to tell apart words that sound alike:

  • flower or flour
  • muscle or mussel
  • yoke or yolk

Use the nuance of language to your advantage when designing your system. Don’t be afraid to break the rules of language when it makes sense to do so. Your content system isn’t about educating users about language; it’s about enabling them to find what they need, as conveniently as possible.

Don’t forget that machines aren’t (currently) too great at deriving context from language. Sentence structure can help engines like Google derive context, but most users don’t search in sentences when using a DAM or content management system, which is fine because most such systems don’t handle sentences very well.

You’re unlikely to account for all adverse situations, which gives you yet another good reason to make it easy for users to send questions when they arise. If users are ever confused by what they see, they should let you know, and you should take those concerns seriously. ♦

This excerpt from Metadata for Content Management is published with permission from author, David Diamond, who retains the exclusive copyright. Additional excerpts will be published in the Picturepark DAM Innovation blog over time. You can purchase the book via Amazon.