Data deduplication is an increasingly important aspect of storage technology

Good post by Scott Lowe (thank you) over at Wikibon

Although new products have shipped continuously over the past two decades, storage advancement has remained relatively stagnant, at least from a performance perspective.  According to PCWorld’s 50 Years of Hard Drives, the first 10,000 RPM disk was released in 1996 and the first 15,000 RPM disk in 2000.  Since then, storage companies have focused on density and capacity rather than on performance, so improving overall storage performance has required an ever-increasing number of spindles (spinning disks in an array of arrays).  As a result of this eager march toward density, the primary metric by which storage has been measured has been cost per unit of capacity: dollars per gigabyte or dollars per terabyte, for example.

Although capacity has played a central role, storage performance can’t be overlooked, particularly as organizations seek to leverage more and more centralized technology through various virtualization initiatives.  However, these initiatives have created some new challenges and opportunities:

  • New performance challenges.  As more workloads are centralized—particularly workloads that are directly user-facing, such as virtual desktops—storage performance becomes an ever-increasing factor.  While server workloads might be able to “hide” behind somewhat lesser-performing storage, once workloads are exposed directly to the user, performance becomes even more important than it already was.  VDI, in particular, places new stressors on storage.  VDI introduces occasional massive spikes in storage need as users all boot virtual desktops simultaneously—a phenomenon known as a boot storm.
  • More homogeneous workloads in some cases.  The number of workloads in the data center has burgeoned.  Between servers and desktops, there’s a lot more in the data center than there used to be.  This has created a situation in which many different workloads all look very similar; all of the VDI-based machines run the same operating system, for example.

Solving the performance challenges is hard work and can be expensive.  Companies have to build their virtual environments around very high performance standards while, at the same time, maintaining the capacity requirements needed to meet business objectives.  As mentioned earlier, this can mean having to buy a large quantity of spindles just to meet basic needs.

What’s a storage architect to do?

Flash storage would seem to be the logical successor to the current lineup of spindle-based disks.  From a business perspective, the primary issue with flash-based storage is cost.  According to an article at Pingdom, in 2011 the average per-gigabyte cost for solid state storage was $2.42, versus $0.075 per gigabyte for traditional magnetic media.  This cost differential is incredible… and not in a good way.
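To put that differential in concrete terms, here is a small back-of-the-envelope sketch using only the two per-gigabyte prices quoted above. The 10 TB capacity figure and the `raw_cost` helper are assumptions made up for illustration, and the calculation ignores RAID overhead, controllers, power, and everything else that goes into a real array price.

```python
# Per-gigabyte prices quoted above (Pingdom, 2011).
SSD_USD_PER_GB = 2.42
HDD_USD_PER_GB = 0.075


def raw_cost(capacity_gb: float, usd_per_gb: float) -> float:
    """Raw media cost only; ignores RAID, controllers, power, and so on."""
    return capacity_gb * usd_per_gb


# How many times more expensive a gigabyte of SSD is than a gigabyte of HDD.
price_ratio = SSD_USD_PER_GB / HDD_USD_PER_GB

if __name__ == "__main__":
    capacity_gb = 10_000  # hypothetical 10 TB requirement, not from the article
    print(f"SSD/HDD price ratio: {price_ratio:.1f}x")
    print(f"SSD media cost: ${raw_cost(capacity_gb, SSD_USD_PER_GB):,.0f}")
    print(f"HDD media cost: ${raw_cost(capacity_gb, HDD_USD_PER_GB):,.0f}")
```

On these 2011 figures, a gigabyte of solid state storage costs roughly 32 times as much as a gigabyte of magnetic media.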

Next, solid state disks don’t have nearly the capacity of their rotating cousins.  With spinning disks, 3 TB disks are available.  However, with solid state disks, the largest disk is measured in the hundreds of gigabytes.  This creates capacity challenges.  But all is not lost.

Some companies, such as Pure Storage, are embracing the solid state disk trend and using it as the primary building block for a new class of storage arrays that solves the performance problems while taking advantage of the centralization of similar workloads.  Knowing that businesses will balk at the price of raw SSD storage, Pure Storage isn’t offering data deduplication as a mere optional feature.  Deduplication is an integral, necessary, standard component of the product, and it allows the company to compete on price on a more equal footing with traditional vendors.  The data deduplication feature in the Pure Storage FlashArray doesn’t compromise on performance in order to achieve capacity efficiency.  It is one of the company’s main selling points and allows the company to claim that its storage solution carries a dollars-per-GB cost similar to that of traditional storage solutions.
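To illustrate why deduplication pairs so well with similar workloads, here is a minimal sketch of block-level deduplication. It is not Pure Storage’s implementation, which the post does not describe. It assumes fixed-size 4 KB blocks fingerprinted with SHA-256, and the toy "desktop" volumes, function names, and block size are all hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size in bytes, chosen for illustration


def dedupe(volumes, block_size=BLOCK_SIZE):
    """Store each unique block once; return (store, per-volume recipes)."""
    store = {}    # fingerprint -> block bytes (the only physical copy)
    recipes = []  # per volume: ordered list of fingerprints
    for data in volumes:
        recipe = []
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            store.setdefault(fingerprint, block)  # keep first copy only
            recipe.append(fingerprint)
        recipes.append(recipe)
    return store, recipes


def restore(recipe, store):
    """Rebuild a volume from its list of block fingerprints."""
    return b"".join(store[fp] for fp in recipe)


def reduction_ratio(volumes, store):
    """Logical bytes written divided by physical bytes actually stored."""
    logical = sum(len(v) for v in volumes)
    physical = sum(len(b) for b in store.values())
    return logical / physical


# Ten toy virtual desktops: the same 8-block "OS image" plus one unique block each.
os_image = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
desktops = [os_image + bytes([100 + n]) * BLOCK_SIZE for n in range(10)]

store, recipes = dedupe(desktops)
ratio = reduction_ratio(desktops, store)

# Effective media cost per logical gigabyte, using the 2011 SSD price quoted earlier.
effective_ssd_usd_per_gb = 2.42 / ratio
```

In this toy case the ten desktops write 90 blocks but only 18 unique blocks are stored, a 5:1 reduction that brings the $2.42 raw price down to about $0.48 per logical gigabyte. Real-world ratios depend entirely on the data. On raw media cost alone, matching the $0.075 figure would take a reduction of roughly 32:1.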

Read on here

1 Comment

  1. Suzy

     /  March 16, 2012

    I agree with you, data deduplication is an increasingly important aspect of storage technology. Thought I’d share!

