Content Management White Paper: Storage and Archiving

An important content management concern is storage space: where it is located, what type it is and how much you need. In most cases, these choices will be up to you. In some cases, content systems mandate where and how they store content; more modern systems will not impose such limitations.
If your entire content system will reside in the cloud, your alternative storage options might be limited, and your concern about those options might be limited at first, too. But it is still important to know how your service provider handles storage, disaster avoidance and recovery.

Data Protection Regulations

Data protection regulations might require you to make clear to users the locations in which their data is processed and stored. In some cases, not all countries, regions or jurisdictions will be legal options. (See GDPR.)

You might decide to have multiple types of storage. For example, faster storage for current and popular content, and slower, more affordable storage for older or less frequently accessed content. In addition, you might have offline (and offsite) storage options that are used for archived content that should no longer be readily available, or just to ensure you have a more secure backup of your entire system.

Offsite backups provide organizations with a means to safeguard their content collections from disasters, such as fires, earthquakes, floods or cybercrime, that might destroy online data.

Though there is understandably extra cost involved in maintaining an always-current offsite backup, the value of doing so will depend on what a “disaster” would mean for your organization. If, for example, the complete destruction of your content system would mean the end of your organization, or it would result in costly litigation with customers, you might find the cost of offsite backups to be worthwhile.

Other considerations are recovery time objective (RTO) and recovery point objective (RPO) policies. RTO defines the acceptable amount of time that data or systems can remain offline and inaccessible after a failure. RPO defines the acceptable amount of data loss, expressed as the amount of time (or traffic) that can pass between data backups.
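
To make these two policies concrete, here is a minimal sketch in Python with entirely hypothetical targets, showing how RTO and RPO translate into simple compliance checks against a backup schedule and a recovery drill.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy targets; substitute your organization's own values.
RTO = timedelta(hours=4)      # maximum acceptable downtime after a failure
RPO = timedelta(minutes=15)   # maximum acceptable gap between backups (data-loss window)

def rpo_is_met(last_backup: datetime) -> bool:
    """True if the most recent backup is still within the RPO window."""
    return datetime.now(timezone.utc) - last_backup <= RPO

def rto_is_met(outage_duration: timedelta) -> bool:
    """True if a measured (or rehearsed) recovery finished within the RTO."""
    return outage_duration <= RTO

# A backup taken 20 minutes ago violates a 15-minute RPO.
print(rpo_is_met(datetime.now(timezone.utc) - timedelta(minutes=20)))  # False
# A six-hour outage exceeds a four-hour RTO.
print(rto_is_met(timedelta(hours=6)))                                  # False
```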

These policies can vary dramatically, depending on the organization. For example, a company like Amazon cannot remain offline for long without suffering millions in lost revenue. And imagine the outcry if the world’s previous day of Facebook activity was lost forever, even if the system itself went offline for only a minute or so.

Though it is easy to say that systems and data cannot be inaccessible for more than a minute, and that no data may be lost in the event of a system or network failure, such requirements are usually both unreasonable and unaffordable.

When a system has gone offline because of an external network glitch, it might be back up again within minutes, with little or no effort on the part of your IT teams. But if the failure is the result of your hardware (or personnel), recovery will not be so easy.

Well-trained IT teams are ready to replace hard drives and other hardware components, but even the fastest swap requires time. In some cases, a system can take longer to boot back up to operational status than it took to swap the faulty component. Then there is the testing required to ensure everything is functioning properly. And all of this, of course, assumes your team was able to identify the problem quickly.

Cloud service providers sometimes offer different RTO/RPO options to clients; but rest assured, “fast” will not be “cheap.” Worse, the warranties offered by hosting providers might not adequately cover your losses. So, even if you can calculate what downtime costs you per minute, you are not likely to recover those losses via service warranties.

“Cold storage” options are popular for organizations that have large content archives. This type of storage can cost a fraction of traditional storage. A number of cloud storage providers offer cold storage as a service. If your chosen content system does not directly support the cold storage provider you prefer, see if access can be developed via the content system’s API. There are many cloud backup services available that should meet your requirements, some of which you might have already entered into agreements with for other purposes.
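
As an illustration of what such an integration can look like, the following is a minimal sketch that configures a lifecycle rule on Amazon S3 via the boto3 library. The provider, bucket name, prefix and transition threshold are assumptions made for the example; your content system or preferred provider may expose an equivalent mechanism through its own API.

```python
import boto3

s3 = boto3.client("s3")

# Transition objects under the "archive/" prefix to a cold storage class
# 90 days after creation. Bucket name, prefix and threshold are placeholders.
s3.put_bucket_lifecycle_configuration(
    Bucket="example-content-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-content",
                "Filter": {"Prefix": "archive/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```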

Depending on the capabilities of your system, content might be moved between storage types automatically. So, when something is accessed once, it might be copied to faster storage where it remains cached for a given period of time. Or, when a user marks a piece of content as being “archived,” the system might move it to archive storage.
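
A simplified sketch of the kind of rule such a system might apply internally follows; the thresholds and tier names are purely illustrative and not those of any particular product.

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds only; real systems expose these as configuration.
CACHE_RETENTION = timedelta(days=30)   # keep recently accessed content on fast storage
ARCHIVE_AFTER = timedelta(days=365)    # move long-untouched content to archive storage

def choose_tier(last_accessed: datetime, archived_flag: bool) -> str:
    """Pick a storage tier for a piece of content based on simple rules."""
    age = datetime.now(timezone.utc) - last_accessed
    if archived_flag or age > ARCHIVE_AFTER:
        return "archive"    # offline or cold storage
    if age <= CACHE_RETENTION:
        return "fast"       # SSD-backed or cached storage
    return "standard"       # regular online storage

now = datetime.now(timezone.utc)
print(choose_tier(now - timedelta(days=2), archived_flag=False))   # fast
print(choose_tier(now - timedelta(days=90), archived_flag=True))   # archive
```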

Similar in concept to using multiple storage types is using a content distribution network (CDN). These networks cache recently used files so that they are faster to access for future users. Increasing their value, CDNs can distribute content to locations all over the world, so that Internet latency is reduced and system downtime or peak loads can be absorbed.

In virtually all cases, the use of a CDN is not apparent to users and managers of a content system. In most cases, using one is merely a matter of subscribing to the CDN of your choice and enabling support for it, either from within your content system or at the CDN control panel.
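
What a CDN caches, and for how long, is usually governed by standard HTTP caching headers set by the origin. The following is a minimal sketch assuming a Python/Flask origin server; it is not tied to any particular content system or CDN, and the directory and cache lifetime are placeholders.

```python
from flask import Flask, send_from_directory

app = Flask(__name__)

CONTENT_DIR = "/var/content"   # placeholder path to rendition files

@app.route("/content/<path:name>")
def serve_content(name: str):
    # Serve a rendition and tell the CDN (and browsers) it may cache it for one day.
    response = send_from_directory(CONTENT_DIR, name)
    response.headers["Cache-Control"] = "public, max-age=86400"
    return response
```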

What is important to keep in mind about CDN use is that it might affect the usage statistics managed by your content system. For example, the first time a person accesses a piece of content, the content is copied to the CDN, from where it is accessed thereafter for a period of time. But it is possible that the actual number of accesses via the CDN will not be available in your content system. If accurate statistics are a requirement, make sure you discuss this with the makers of your content system to see what options are available to you. As for all storage and processing of data, make sure that your chosen CDN complies with your regulatory requirements.

If your content collection is large, or will become so over time, make sure your system supports multiple types of storage, and the movement of content between storage locations. This one aspect of content management alone can improve performance for users and greatly reduce operational costs.

Managing Remote Content

Even if your entire content system is used for nothing more than management of remote links, some storage will be required. Thumbnails, cached previews and metadata all require storage space. Keep this in mind when creating storage policies and making storage decisions.
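
Even a rough back-of-the-envelope estimate helps here. The figures in the following sketch are placeholders, not benchmarks; substitute your own counts and average sizes.

```python
# Purely illustrative estimate of local storage needed for remotely linked content.
remote_items = 100_000   # number of remote links under management
thumbnail_kb = 50        # average thumbnail size
preview_kb = 300         # average cached preview size
metadata_kb = 5          # average metadata record size

total_gb = remote_items * (thumbnail_kb + preview_kb + metadata_kb) / 1_000_000
print(f"Estimated local storage: {total_gb:.1f} GB")   # ~35.5 GB
```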
