Hadoop: Nucleus of the Next-Generation Big Data Warehouse

Post by James Kobielus (thank you)

How Hadoop complements data warehouses to create a powerful converged platform

Apache Hadoop is fundamental to the next generation of data warehousing. Companies are adopting Hadoop for strategic roles in their current warehousing architectures, such as extract/transform/load (ETL), data staging, and preprocessing of unstructured content. I also see Hadoop as a key technology in next-generation massively parallel data warehouses in the cloud, which will complement today’s warehousing technologies as well as low-latency streaming platforms.

At IBM, we expect Hadoop and data warehousing to tie the knot more completely over the next several years and converge into a new platform paradigm: the Hadoop data warehouse. Hadoop won’t render traditional warehousing architectures obsolete. Instead, it will supplement and extend the data warehouse to support a single version of the truth, data governance, and master data management for multi-structured data, meaning data that exists in at least two of the following formats: structured (such as relational or tabular), semi-structured (including XML-tagged free-text files), or unstructured (for example, ASCII and other free-text formats).

Read on here

 
