Post by James Kobielus (thank you)
How Hadoop complements data warehouses to create a powerful converged platform
Apache Hadoop is fundamental to the next generation of data warehousing. Companies are adopting Hadoop for strategic roles in their current warehousing architectures, such as extract/transform/load (ETL), data staging, and preprocessing of unstructured content. I also see Hadoop as a key technology in next-generation massively parallel data warehouses in the cloud, which will complement today’s warehousing technologies as well as low-latency streaming platforms.
At IBM, we expect Hadoop and data warehousing to tie the knot more completely over the next several years and converge into a new platform paradigm: the Hadoop data warehouse. Hadoop won’t render traditional warehousing architectures obsolete. Instead, it will supplement and extend the data warehouse to support a single version of the truth, data governance, and master data management for multi-structured data, meaning data that exists in at least two of the following formats: structured (such as relational or tabular), semi-structured (including XML-tagged free-text files), and unstructured (for example, ASCII and other free-text formats).

