Cloudera Launches SaaS Platform for the Lakehouse Crowd • The Register

Former Hadoop star Cloudera has announced a fully managed software-as-a-service (SaaS) version of its data platform, which it claims is more open than its competitors in the crowded marketplace.

With Cloudera Data Platform (CDP) One, initially available only on AWS, Cloudera promises analytics and data exploration in a single platform.

Adopting the term “lakehouse”, coined by Databricks to bring together the chaotic world of data lakes with the ordered approach of data warehouses, Cloudera also claims that the new product offers a set of low-code data engineering and exploration tools to improve efficiency for experienced business users.

Cloudera merged with Hortonworks in 2018 in a $5 billion deal after both companies rode the wave of hype around big data on Hadoop.

The merger coincided with the emergence of cloud-based object storage technologies like AWS S3, Azure Blob Storage, and Google Cloud Storage, which solve many of the same problems as the Hadoop Distributed File System (HDFS).

In September 2019, the company launched its Cloudera Data Platform (CDP), designed to create an integrated approach to provisioning, managing and consuming data across enterprise on-premises, hybrid cloud and private cloud infrastructures.

While the cloud version of CDP was available on AWS, Google Cloud and Azure, CTO Ram Venkatesh told The Register it was a platform-as-a-service offering operated jointly with customers. CDP One, by contrast, is a fully managed service.

However, it enters a crowded market. Snowflake has tried to bring together structured and unstructured data in its SaaS data platform, while Databricks, which shares Cloudera’s Hadoop heritage, has brought SQL Analytics to its data lake.

One difference, Venkatesh said, is Cloudera’s openness in giving customers a choice of the tools they use to manage and analyze their data.

“The cardinal sin in previous attempts [at combining data lakes and data warehouses] was that the mapping was always tied to an engine. If it was built on top of Hive, then Spark would be a second-class citizen. If Spark came up with it, which is what [Databricks’] Delta is, it’s not great for Impala,” he said.

But Venkatesh said Cloudera avoided that approach with the introduction of Iceberg, an Apache Software Foundation project that offers an open table format designed for high performance on big data workloads while supporting query engines such as Spark, Trino, Flink, Presto, Hive, and Impala.

“The middle level, if it is independent, is not tied to a master. It was designed from the ground up to work with cloud storage, not just HDFS, on the low end, and on the high end it’s Spark, Hive, Impala, and Presto, things Cloudera might not even support.

“When you’re managing so much data, it’s just hubris to think that one engine can solve everything,” Venkatesh said.

CDP One is available now for customers who sign up and will be generally available later this year. ®

Laura Coffey
