Gigaom Research webinar: Data management for production Hadoop data lakes

Gigaom

In the new world of “data lakes,” where raw data is collected for subsequent discovery and analysis, lies the task of managed data ingestion. While data lakes may dispense with the ultra-formality of an Enterprise Data Warehouse (EDW), data quality is nonetheless crucial. Users may like informal access to data, but they don’t want data that’s dirty or inaccurate. A polluted data lake will also become a stagnant one, from disuse.

The ability to organize and prepare data on the fly, as it is ingested into Hadoop Distributed File System (HDFS) storage, is therefore of utmost importance. Tracking operational metadata, business metadata, data lineage, and dataset quality matters in a data lake world because it increases confidence in the Hadoop platform overall. This isn’t just Enterprise Information Management (EIM) for Hadoop; it’s agile rigor for data lakes. And it may be make-or-break for data lake…
