Hadoop Storage: DAS vs. Shared

Post by George Crump (thank you)

Hadoop is a software solution developed to solve the challenge of rapidly analyzing vast, often disparate data sets, commonly known as big data. The results of these analytics, especially when produced quickly, can significantly improve an organization’s ability to solve problems, create new products, and cure diseases. One of the key tenets of Hadoop is to bring the compute to the storage instead of the storage to the compute. The fundamental belief is that the network between compute and storage is too slow, which lengthens time to results.

Read on here

Flash, Trash and data-driven infrastructures!

Post by Enrico Signoretti (thank you)

I’ve been talking about two-tier storage infrastructures for a while now. End users are adopting this approach to cope with capacity growth and performance needs. The basic idea is to leverage the characteristics of Flash memory (all-flash, hybrid, hyperconverged systems) on one side and, on the other, to build huge storage repositories where they can safely store everything else (including pure Trash) at the lowest possible cost. The latter is increasingly referred to as a data lake.

Read on here