Designed to Fail

Very good post from Martin Glassborow (thank you) on DR, multiple 9’s availability and why your Storage array can fail.

Randy Bias has written an interesting piece here on the impact of complexity on reliability and availability; as you build more complex systems, it becomes harder and harder to engineer in multiple 9′s availability. I read the piece with a smile on my face and especially the references to storage; sitting with an array flat on it’s arse and already thinking about the DAS vs SAN argument for availability.

How many people design highly-available systems with no single points of failure until it hits the storage array? Multiple servers with fail-over capability, multiple network paths and multiple SAN connections; that’s pretty much standard but multiple arrays to support availability? It rarely happens. And to be honest, arrays don’t fall over that often, so people don’t tend to even consider it until it happens to them.

An array outage is a massive headache though; when an array goes bad, it is normally something fairly catastrophic and you are looking at a prolonged outage but often not so prolonged that anyone invokes DR. There are reasons for not invoking DR, most of them around the fact that few people have true confidence in their ability to run in DR and even fewer have confidence that they can get back out of DR, but that’s a subject for another blog.

I have sat in a number of discussions over the years where the concept of building a redundant array of storage arrays has been discussed i.e stripe at the array level as opposed to the disk level. Of course, rebuild times become interesting but it does remove the array as a single point of failure.

But then there are the XIVs, Isilons and other clustered storage products which are arguably extremely similar to this concept; data is striped across multiple nodes. I won’t get into the argument about implementations but it does feel to me that this is really the way that storage arrays need to go. Scale-out ticks many boxes but does bring challenges with regards to metadata and the like.

Read on here

Leave a comment

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s