Data De-duplication

October 28, 2009

In an effort to learn more about Data De-duplication and help our clients make sense of it all, I will be blogging my findings.

I predict Data De-duplication will become a common feature in all types of data storage systems. At this point the industry, so it appears, is rushing to be more green-conscious; however, IT management staff are also concerned with reducing storage costs, reducing bandwidth utilization and improving transfer speed. I will focus my findings on these areas and ask you and other industry leaders to add notes and comments.

As always, I want to find and recommend the right solution (product). The products I have reviewed thus far are all over the map. Some of them have their own proprietary technology, and others claim to be faster or better in some way.

I started this project trying to find an answer to a backup problem. According to some experts, corporate data storage requirements are increasing at an alarming 60% annual rate. As a consulting company, we have seen this firsthand with a number of clients. Data storage requirements have increased for a number of reasons:

  • Scanned images are very common in companies small and large. Technically, storing scanned documents is simpler and more cost-effective than traditional paper storage. With the proper solution, images can be catalogued or indexed to make them searchable and easy to find. However, this solution adds to the total backup storage requirements.
  • Everyone, for one reason or another, is reducing the use of paper, either because they want to be more IT green or simply because it is much easier to store everything on some type of data storage device such as a hard drive. Traditionally, people would write a document, print it and then mail it. With the introduction of electronic mail there is no need to print, stuff an envelope, spend money on a stamp or spend time going to the post office; now you simply write your document and email it. You may want to email this document to a group of co-workers, and again the process is simple. However, after years of doing this, the same document may be stored in multiple locations throughout your internal computer network. Likewise, other large duplicated data of all kinds (pictures, music, videos and movies) is now increasing backup storage requirements.
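
To make the "same document stored in multiple locations" problem concrete, here is a minimal Python sketch of how duplicate copies could be detected by hashing file contents. It is only an illustration, not how any particular product works: the `files` mapping, the location names and the function names `group_duplicates` and `wasted_bytes` are all hypothetical, and the contents are held in memory to keep the example short.

```python
import hashlib
from collections import defaultdict


def group_duplicates(files):
    """Group locations whose content is byte-for-byte identical.

    `files` is a hypothetical mapping of location -> bytes, standing in
    for copies scattered across mailboxes and shared folders.
    """
    groups = defaultdict(list)
    for location, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(location)
    # Keep only content that is stored in more than one place.
    return [sorted(locs) for locs in groups.values() if len(locs) > 1]


def wasted_bytes(files):
    """Bytes used by redundant copies (every copy beyond the first)."""
    seen = set()
    wasted = 0
    for content in files.values():
        digest = hashlib.sha256(content).digest()
        if digest in seen:
            wasted += len(content)
        else:
            seen.add(digest)
    return wasted


if __name__ == "__main__":
    # Assumed sample data: one attachment mailed to two people.
    sample = {
        "mail/alice/report.doc": b"Q3 report" * 100,
        "mail/bob/report.doc": b"Q3 report" * 100,
        "share/notes.txt": b"notes",
    }
    print(group_duplicates(sample))
    print(wasted_bytes(sample))
```

A backup that keeps only one copy per digest avoids storing the bytes counted by `wasted_bytes`.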

The practice of storing all corporate data and keeping it for historic or legal reasons is now more complex and harder to manage. You have database servers managing production, inventory, sales and accounting data. You also have email servers with years' worth of history. Other, equally important application servers may contain Electronic Data Interchange (EDI) transactions, client information or patient medical records. All of this data is important to secure and keep. Not all critical data is stored on servers; in some cases critical data sits on users' local computers, and in some cases it is never backed up! Users tend to store data on their Desktop without being aware that it points to the local hard drive.

The problem

Tape backup systems are no longer able to meet the demands created by this growth in data.


The answer lies in a mixed environment where backups happen automatically from any source and onto any storage media. The solution has to be reliable, it should be easy to manage, it has to be fast (especially when it comes to restoring), it has to have low bandwidth utilization and, finally, it should be economically feasible so that companies of any size can use it.

Data De-duplication technology, no matter what flavor of implementation, offers a huge number of advantages over traditional tape backups. The best advantage, from my point of view, is automation: no matter how good a tape backup system you have, it requires human intervention, and that intervention eventually fails. It is not a question of if but when, and it tends to happen at an unfortunately crucial time. File- and block-level de-duplication eliminates redundant backup copies of the same data and delivers substantial storage cost savings.
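
To illustrate the block-level idea, below is a minimal Python sketch of a de-duplicating block store. It assumes fixed-size blocks and SHA-256 digests as block identifiers; the `BlockStore` class, its method names and the 4096-byte default are hypothetical choices made for this example, and commercial products may work quite differently (for instance, with variable-size blocks).

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, chosen only for illustration


class BlockStore:
    """Toy block-level de-duplicating store (a sketch, not a product design)."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}   # digest -> block bytes, each unique block stored once
        self.recipes = {}  # backup name -> ordered list of block digests

    def backup(self, name, data):
        """Split data into blocks and store only blocks not seen before."""
        recipe = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        self.recipes[name] = recipe

    def restore(self, name):
        """Rebuild the original data from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.recipes[name])

    def stored_bytes(self):
        """Total bytes actually kept after de-duplication."""
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    store = BlockStore(block_size=4)
    store.backup("monday", b"AAAABBBBCCCC")
    store.backup("tuesday", b"AAAABBBBDDDD")  # only the last block changed
    print(store.stored_bytes())
    print(store.restore("tuesday"))
```

In this toy run the two backups share their first two blocks, so the second backup adds only one new block to the store. File-level de-duplication is the same idea applied to whole files instead of blocks.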

For mid-size to large companies, the leading commercial vendor of Data De-duplication systems is Data Domain (part of EMC). Other vendors such as Computer Associates, Symantec and Barracuda have solutions suited to small and midsize companies, or even to departments of large companies.