Content Management White Paper: Storage and Archiving
By David Diamond • Mar 06, 2018
An important content management concern is storage space: where, what type and how much. In most cases, these choices will be up to you. In some cases, content systems mandate where and how they store content. More modern systems will not impose such limitations.
If your entire content system will reside in the cloud, alternative storage options might be limited, and your concern about those options might be limited at first too. But it is still important to know how your service provider handles storage, disaster avoidance and disaster recovery.
You might decide to have multiple types of storage. For example, you might use faster storage for current and popular content, and slower, more affordable storage for older or less frequently accessed content. In addition, you might have offline (and offsite) storage options that are used for archived content that should no longer be readily available, or just to ensure you have a more secure backup of your entire system.
Offsite backups provide organizations with a means to safeguard their content collections from disasters, such as fire, earthquakes, floods or cybercrimes that might destroy online data.
Though there is understandably extra cost involved in maintaining an always-current offsite backup, the value of doing so will depend on what a “disaster” would mean for your organization. If, for example, the complete destruction of your content system would mean the end of your organization, or it would result in costly litigation with customers, you might find the cost of offsite backups to be worthwhile.
Other considerations are recovery time objective (RTO) and recovery point objective (RPO) policies. RTO defines the acceptable amount of time that data or systems can remain offline and inaccessible after a failure. RPO defines the acceptable amount of time (or traffic) that can exist between data backups, which in turn limits how much recent data can be lost in a failure.
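To make the two objectives concrete, here is a minimal sketch that checks a single outage against RTO and RPO targets. The function name, the timestamps and the targets are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def check_objectives(failed_at, restored_at, last_backup_at, rto, rpo):
    """Compare one outage against RTO and RPO targets (hypothetical helper)."""
    downtime = restored_at - failed_at        # how long the system was offline
    loss_window = failed_at - last_backup_at  # changes since the last backup are lost
    return {
        "downtime": downtime,
        "loss_window": loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": loss_window <= rpo,
    }


# Hypothetical outage: backup at 02:00, failure at 14:00, service restored at 15:30.
result = check_objectives(
    failed_at=datetime(2018, 3, 6, 14, 0),
    restored_at=datetime(2018, 3, 6, 15, 30),
    last_backup_at=datetime(2018, 3, 6, 2, 0),
    rto=timedelta(hours=1),   # assumed target: back online within one hour
    rpo=timedelta(hours=24),  # assumed target: lose at most one day of changes
)
print(result["rto_met"], result["rpo_met"])
```

In this made-up case the 90-minute outage misses the one-hour RTO, while the 12 hours of lost changes stay within the 24-hour RPO. The two objectives are measured separately, so one can be met while the other is missed.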
These policies can vary dramatically, depending on the organization. For example, a company like Amazon cannot remain offline for long without suffering millions in lost revenue. And imagine the outcry if the world’s previous day of Facebook activity were lost forever, even if the system itself went offline for only a minute or so.
Though it is easy to say that systems and data cannot be inaccessible for more than a minute, and that you require that no data be lost in the event of a system or network failure, for most organizations this requirement is both unreasonable and unaffordable.
When a system has gone offline because of an external network glitch, it might be back up again within minutes, with little or no effort on the part of your IT teams. But if the failure is the result of your hardware (or personnel), recovery will not be so easy.
Well-trained IT teams are ready to replace hard drives and other hardware components, but even the fastest swap will require time. In some cases, a system can take longer to reboot and return to operational status than it took to swap the faulty component. Then there is the testing required to ensure everything is functioning properly. And all of this, of course, assumes your team was able to identify the problem quickly.
Cloud service providers sometimes offer different RTO/RPO options to clients; but rest assured, “fast” will not be “cheap.” Worse, the warranties offered by hosting providers might not adequately cover your losses. So, even if you can calculate what downtime costs you per minute, you are not likely to recover those losses via service warranties.
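The gap between downtime losses and warranty payouts can be estimated with simple arithmetic. The following sketch assumes, purely for illustration, that the warranty pays a credit equal to a percentage of the monthly hosting fee. The function names and all figures are hypothetical, and real service agreements vary.

```python
def downtime_loss(cost_per_minute, minutes_down):
    """Estimated business loss for one outage."""
    return cost_per_minute * minutes_down


def service_credit(monthly_fee, credit_percent):
    """Assumed warranty model: a percentage of the monthly fee is credited."""
    return monthly_fee * credit_percent / 100


# Hypothetical figures, not taken from any real provider or contract.
loss = downtime_loss(cost_per_minute=500, minutes_down=90)
credit = service_credit(monthly_fee=2000, credit_percent=10)
unrecovered = loss - credit
print(loss, credit, unrecovered)
```

Under these assumptions, the credit covers only a small share of the loss, which is the kind of comparison worth running with your own numbers before relying on a warranty.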
“Cold storage” options are popular for organizations that have large content archives. This type of storage can cost a fraction of traditional storage. A number of cloud storage providers offer cold storage as a service. If your chosen content system does not directly support the cold storage provider you prefer, see if access can be developed via the content system’s API. There are many cloud backup services available that should meet your requirements, some of which you might have already entered into agreements with for other purposes.
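As a rough illustration of the savings cold storage can offer, this sketch compares keeping an entire archive on standard storage with moving the rarely accessed portion to cold storage. The prices and volumes are hypothetical placeholders, and the calculation ignores any retrieval or transfer fees a provider might charge.

```python
def monthly_cost(terabytes, price_per_tb):
    """Monthly storage cost for a given volume at a given price per terabyte."""
    return terabytes * price_per_tb


# Hypothetical prices per terabyte per month; substitute your provider's rates.
STANDARD_PRICE = 20
COLD_PRICE = 4

total_tb = 100
rarely_accessed_tb = 80  # assumed share of the archive that is seldom used

all_standard = monthly_cost(total_tb, STANDARD_PRICE)
tiered = (
    monthly_cost(total_tb - rarely_accessed_tb, STANDARD_PRICE)
    + monthly_cost(rarely_accessed_tb, COLD_PRICE)
)
print(all_standard, tiered, all_standard - tiered)
```

The actual savings depend heavily on how often archived content has to be retrieved, so it is worth estimating retrieval volume before committing to a cold storage plan.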
Depending on the capabilities of your system, content might be moved between storage types automatically. So, when something is accessed once, it might be copied to faster storage where it remains cached for a given period of time. Or, when a user marks a piece of content as being “archived,” the system might move it to archive storage.
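The automatic movement described above could follow a rule as simple as the one sketched here. This is not how any particular content system works. The tier names, the `ContentItem` class and the 30-day window are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ContentItem:
    """Hypothetical record of a piece of content and its usage."""
    name: str
    last_accessed: datetime
    archived: bool = False


def choose_tier(item, now, hot_window=timedelta(days=30)):
    """Pick a storage tier from the archive flag and the last access time."""
    if item.archived:
        return "archive"  # the user explicitly marked it as archived
    if now - item.last_accessed <= hot_window:
        return "fast"     # recently used, keep it on faster storage
    return "slow"         # older content goes to more affordable storage


now = datetime(2018, 3, 6)
items = [
    ContentItem("campaign-video.mp4", datetime(2018, 3, 1)),
    ContentItem("2015-brochure.pdf", datetime(2016, 1, 10)),
    ContentItem("old-logo.eps", datetime(2018, 2, 20), archived=True),
]
tiers = {item.name: choose_tier(item, now) for item in items}
print(tiers)
```

Checking the archive flag first means an explicit user decision always overrides access-based rules, which is one reasonable way to resolve the two triggers when they conflict.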
Similar in concept to using multiple storage types is using a content distribution network (CDN). These networks cache recently used files so that they are faster to access for future users. Adding to their value, CDNs can distribute content to locations all over the world, which reduces Internet latency and helps compensate for system downtime or peak loads.
In virtually all cases, the use of a CDN is not apparent to users and managers of a content system. In most cases, using one is merely a matter of subscribing to the CDN of your choice and enabling support for it, either from within your content system or at the CDN control panel.
What is important to keep in mind about CDN use is that it might affect the usage statistics managed by your content system. For example, the first time a person accesses a piece of content, the content is copied to the CDN, from where it is accessed thereafter for a period of time. But it is possible that the actual number of accesses via the CDN will not be available in your content system. If accurate statistics are a requirement, make sure you discuss this with the makers of your content system to see what options are available to you. As with all storage and processing of data, make sure that your chosen CDN complies with your regulatory requirements.
If your content collection is large, or will become so over time, make sure your system supports multiple types of storage, and the movement of content between storage locations. This one aspect of content management can improve performance for users and greatly reduce operational costs.
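To see why the statistics can diverge, consider this small sketch. It assumes you can obtain per-file access counts from both your content system and your CDN's reports, which may or may not be possible with your products. The function name and the figures are hypothetical.

```python
from collections import Counter


def total_accesses(origin_counts, cdn_counts):
    """Add CDN-served accesses to the accesses the content system itself saw."""
    total = Counter(origin_counts)
    total.update(cdn_counts)  # Counter.update adds counts rather than replacing them
    return dict(total)


# Hypothetical figures: the content system sees only the requests that reach it.
origin_counts = {"logo.png": 1, "brochure.pdf": 2}
cdn_counts = {"logo.png": 499}

combined = total_accesses(origin_counts, cdn_counts)
print(combined)
```

In this example the content system alone would report a single access for a file that was actually served 500 times, which is the kind of discrepancy to raise with your vendor.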
This excerpt from Picturepark’s Routing Digital Content through the Enterprise is part of a multi-part blog series that features sections of the complete document.
- Users and Flow
- User Groups and Roles
- Content Creation and Acquisition
- Access for Collaboration
- Storage and Archiving
- Collaborative Communication
- Real-World Metadata
- Automated Metadata
- Semantic Links
- Archiving Content
- Content Routing
- Making Content Available
- Output Channels
- Measuring Results
- Next Steps