Optimising storage infrastructure using deduplication

Monday, 08 March, 2010

With massive volumes of data being generated daily, organisations are struggling to manage information growth effectively. Symantec’s Paul Lancaster sets out the reasons why deduplication technologies will help with data storage.

The spiralling volume of documents, audio and video files, images and email attachments is swamping storage resources and making data management a difficult task for IT staff. Industry analyst IDC, in its worldwide enterprise storage systems forecast for 2009-2013, has predicted storage capacity will grow by 40–50% each year to cope with the increasing amount of data generated.

Research indicates that many organisations are tempted to use a ‘quick-fix’ approach for their storage management problems. In a recent survey, Symantec found that 42% of Australian businesses responded to data storage capacity challenges by simply buying more storage or deleting files. In fact, 15% of Australian organisations described themselves as being ‘addicted to storage hardware’ and a similar percentage conceded they would need to understand how to become more efficient with existing storage.

While this approach may be practical in times of robust economic growth, purchasing more storage is not always a financially viable option. Obtaining storage is costly, accounting for a significant portion of the IT budget across both physical and virtual environments. This spend can drain budget resources which could better be utilised elsewhere in the business. Additionally, increasing storage capacity also increases the complexity of managing the storage environment and may result in performance issues, such as slower recovery of data from archives.

The recent downturn has provided a catalyst for change by encouraging businesses to develop more refined information management strategies. IT teams are now focusing their efforts on extending the life of their existing storage environment, looking to significantly reduce storage costs while also simplifying data management.

Utilising deduplication to maximise storage resources

One way to reduce the sheer volume of data being backed up is to eliminate redundant data, using technologies such as deduplication. For the majority of businesses, approximately 70% of data is duplicate and has not been accessed in more than 90 days. Data deduplication tackles this issue by eliminating duplicate data across backup files, even when such data is unrelated.

Email archiving is an example of where deduplication can make a significant impact. Emails can be stored multiple times, including on an email server, on a personal computer, in a proprietary email system file, on file servers and in backups. With deduplication, only one copy of the email file will be captured, reducing the volume of data that is stored and replacing redundant data with a pointer to the unique data copy. This approach can dramatically reduce the volume of data that businesses need to store.
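The pointer mechanism described above can be illustrated with a minimal Python sketch. It is an assumption-laden toy, not how any particular product works: the `DedupStore` class, the location names and the use of SHA-256 as the content fingerprint are all hypothetical choices made for illustration.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store: each unique payload is kept once."""

    def __init__(self):
        self.blobs = {}     # fingerprint -> payload (the single stored copy)
        self.pointers = {}  # logical location -> fingerprint (the pointer)

    def put(self, location, payload):
        digest = hashlib.sha256(payload).hexdigest()
        # Keep the payload only if this content has not been seen before.
        self.blobs.setdefault(digest, payload)
        # Every location, duplicate or not, just records a pointer.
        self.pointers[location] = digest

    def get(self, location):
        return self.blobs[self.pointers[location]]

    def stored_bytes(self):
        return sum(len(b) for b in self.blobs.values())


# Hypothetical example: the same email held in four places, plus one other email.
email = b"Subject: Q1 report\n\nPlease find the figures attached."
store = DedupStore()
for location in ("mail-server", "laptop-pst", "file-server", "backup-tape"):
    store.put(location, email)
store.put("mail-server-other", b"Subject: Lunch?\n\nNoon today?")

# Size the data would occupy if every location held its own full copy.
logical_bytes = sum(len(store.get(loc)) for loc in store.pointers)
print(len(store.pointers), "locations,", len(store.blobs), "unique copies")
print(store.stored_bytes(), "bytes stored versus", logical_bytes, "bytes logical")
```

Five logical locations end up backed by only two stored payloads, and any location can still be read back in full through its pointer.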

Which type of deduplication should businesses use?

There are generally two types of deduplication that are used by businesses: client and target. Client deduplication removes redundant data directly from a computer or other end-point device before it is transmitted to the backup server. This approach reduces the bandwidth required to move data to a backup server, by as much as 99%, as well as reducing the overall backup time. However, client deduplication requires between 15% and 20% of the client’s processing power during a backup, slowing client performance, which can lead to productivity issues in busy organisations.
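As a rough sketch of the client-side idea, the Python below has the client fingerprint its data in chunks, ask the server which fingerprints it lacks, and send only those chunks. Everything here is hypothetical and simplified: `BackupServer`, `client_backup`, the fixed 4-byte chunk size and SHA-256 fingerprints are illustrative assumptions, and the small cost of exchanging the fingerprints themselves is ignored.

```python
import hashlib

CHUNK_SIZE = 4  # tiny illustrative value; real systems use far larger chunks


def split_chunks(data, size=CHUNK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


class BackupServer:
    """Hypothetical server that keeps one copy of each chunk."""

    def __init__(self):
        self.chunk_store = {}  # fingerprint -> chunk

    def missing(self, digests):
        # Tell the client which fingerprints are not yet stored.
        return {d for d in digests if d not in self.chunk_store}

    def upload(self, digest, chunk):
        self.chunk_store[digest] = chunk


def client_backup(data, server):
    """Back up data; return the number of payload bytes actually sent."""
    # The hashing below is the CPU cost that falls on the client.
    pieces = [(hashlib.sha256(c).hexdigest(), c) for c in split_chunks(data)]
    needed = server.missing([d for d, _ in pieces])
    sent = 0
    for digest, chunk in pieces:
        if digest in needed:
            server.upload(digest, chunk)
            needed.discard(digest)  # do not resend a chunk repeated in this backup
            sent += len(chunk)
    return sent


server = BackupServer()
first = client_backup(b"AAAABBBBCCCCAAAA", server)   # 16 bytes, one chunk repeated
second = client_backup(b"AAAABBBBCCCCDDDD", server)  # 16 bytes, one chunk is new
print(first, second)  # 12 4
```

The second backup sends only the one chunk the server has not seen, which is where the bandwidth saving comes from; the hashing on the client is the source of the processing overhead mentioned above.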

Deduplication at the target (or media server) removes redundant data after it is transmitted over the network to the backup server. This approach is often utilised by businesses that want to avoid updating the client on their network. It is also useful for businesses that want to minimise the performance burden on the client during backup.
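For contrast, a target-side arrangement can be sketched in the same way. In this hypothetical Python example the client does no hashing at all: it ships the full backup stream, and the server (the illustrative `TargetDedupServer` class, again assuming fixed 4-byte chunks and SHA-256 fingerprints) removes the redundancy as it writes to storage.

```python
import hashlib


class TargetDedupServer:
    """Hypothetical media server that deduplicates after data arrives."""

    def __init__(self, chunk_size=4):
        self.chunk_size = chunk_size  # tiny illustrative value
        self.chunk_store = {}         # fingerprint -> chunk (stored once)
        self.backups = {}             # backup name -> ordered fingerprints
        self.bytes_received = 0       # everything that crossed the network

    def receive(self, name, data):
        # The whole stream has already been transmitted by the client.
        self.bytes_received += len(data)
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunk_store.setdefault(digest, chunk)  # keep first copy only
            recipe.append(digest)
        self.backups[name] = recipe

    def restore(self, name):
        # Rebuild the original stream by following the stored fingerprints.
        return b"".join(self.chunk_store[d] for d in self.backups[name])

    def bytes_stored(self):
        return sum(len(c) for c in self.chunk_store.values())


server = TargetDedupServer()
server.receive("monday", b"AAAABBBBCCCCAAAA")
server.receive("tuesday", b"AAAABBBBCCCCDDDD")
print(server.bytes_received, server.bytes_stored())  # 32 16
```

Storage is halved in this toy case, but all 32 bytes still crossed the network: the saving is in disk capacity, not bandwidth, and the client carries no extra processing load.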

Organisations should aim to incorporate client and target deduplication in their backup and data-recovery strategy. A combined approach will enable them to simplify the storage management process by enabling deduplication at various stages across the environment. This approach gives IT teams the power to decide which type of deduplication is most appropriate for the storage task in question.

Currently, it can be difficult to take a combined approach to deduplication as most data protection solutions only support target deduplication. Client deduplication is often overlooked as businesses try to avoid adding another layer of hardware, cost and complexity to their IT infrastructure.

Symantec advocates a new approach with the launch of data management products that integrate client and target deduplication. The latest releases of Symantec’s NetBackup and Backup Exec simplify information management by integrating deduplication into various points across a business’s IT architecture. Deduplication is carried out closer to the data source on the client as well as on the media server that manages back-end storage. This approach empowers businesses to utilise a combination of deduplication methods to streamline storage management.

The benefits of a combined approach to deduplication

The use of client-side deduplication as well as target deduplication can realise dramatic storage savings for businesses. Client deduplication increases the speed and efficiency of backups in remote offices, data centres and virtual environments and reduces network traffic by up to 90%. This can be a powerful weapon when tackling data management.

By integrating client and target deduplication into its recently released NetBackup 7 and Backup Exec 2010 products, Symantec is making client-side deduplication a realistic option for organisations, without forcing them to purchase additional hardware. Businesses can now access both client and target deduplication technology by simply upgrading their backup system. This makes it easy for organisations to implement a combined deduplication policy.

Looking forward

The argument supporting a combined data deduplication policy is certainly compelling. It enables organisations to dramatically reduce storage backup costs by consolidating and re-using existing resources.

Beyond reducing storage capacity requirements, the secondary business benefits include a significant reduction in the amount of disk space required for backup, resulting in cost savings. Less bandwidth is also required to move information to archives. Organisations utilising combined data deduplication will experience fast, reliable backup and information recovery, an increase in the number of recovery points and a reduction in the disk capacity they need.

When data deduplication is deployed as part of an overall backup strategy, there is no doubt that a combined approach can bring both operational and economic benefits to organisations.

Paul Lancaster is director of system engineering, responsible for the Systems Engineering (pre-sales) team operation for Australia, New Zealand and the Pacific Islands. In this role, Lancaster oversees and manages Symantec’s pre-sales engineers operation for the availability line of business, driving the focus on data protection, performance management, storage management and high availability.
