Scale-up database wrangler MotherDuck hits $47.5 million • The Register

In analytical database systems, the story for the last decade or more has been one of scaling out. Only databases distributed across multiple nodes could handle the scale required by so-called big data. Web and mobile data drove demand for systems that scale out across many machines rather than relying on ever more powerful single instances.
Hadoop (technically a distributed file system), AWS Redshift, Snowflake, and Google’s BigQuery all followed this trend, at least for online analytical processing (OLAP).
But one of BigQuery’s chief architects is betting on a system that goes the other way. Jordan Tigani’s new company, MotherDuck, has just raised $47.5 million in seed and Series A funding, with backers including a16z, the VC firm co-founded by web pioneer Marc Andreessen.
MotherDuck has developed a serverless extension to the open source database DuckDB, which The Register covered in September.
Although DuckDB only released version 0.6.0 this week, it has already found a home at Google, Facebook, and Airbnb.
DuckDB runs embedded in a host process, so there is no DBMS server software to install, update, or maintain. For example, the DuckDB Python package can run queries directly on data held in pandas, the Python data analysis library, without importing or copying it.
The other thing that sets it apart is that DuckDB scales up rather than out.
Tigani tells The Register: “Everyone is talking about big data. Databricks and Snowflake have been trying to outdo each other in benchmark wars over a 100TB dataset. In reality, nobody uses this amount of data. Everyone focuses on huge datasets, but actual database workloads are typically gigabytes.”
While serving as chief product officer at SingleStore, the database that claims to support both analytical and transactional workloads on a single system, Tigani came across DuckDB, an open source project written by Dutch computer science researchers Hannes Mühleisen and Mark Raasveldt.
“Since MapReduce came out in 2004, scaling up has been a dirty word. But when you realize that most of the data we work with isn’t that huge, and that laptop and desktop hardware has gotten better at the same time, you find that scaling up is much easier and more robust. When we built Google BigQuery as a large distributed system, it took a tremendous amount of energy to get it up and running,” says Tigani.
Because DuckDB runs in-process, it can run on a laptop, in a web browser, on a cloud VM, or in a cloud function. It can be used in Python notebooks, R scripts, JavaScript data apps, or Java backends.
MotherDuck gives DuckDB a backend extension that lets the database work much like Google Sheets, which runs partly on the client and partly on the server. The extension connects the client database to a backend execution pipeline and a cost-based optimizer that applies the “standard tricks” used to optimize queries in the data warehousing world. It also helps the system decide what to run on the client and what to send to the cloud, Tigani says.
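MotherDuck’s actual cost model isn’t described here, so the following is only a toy sketch of the general idea of cost-based client/cloud placement. Every name, parameter, and throughput figure in it (`QueryEstimate`, `choose_placement`, the bandwidth and scan rates) is hypothetical. It estimates the time to run a query on the client versus in the cloud, counting data movement over the network plus scan time, and picks the cheaper option.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical per-query statistics a planner might estimate.
@dataclass
class QueryEstimate:
    local_bytes: int   # input data already on the client
    remote_bytes: int  # input data stored in the cloud service
    result_bytes: int  # estimated size of the query result


def choose_placement(
    est: QueryEstimate,
    network_bytes_per_s: float = 50e6,     # assumed client<->cloud link speed
    client_scan_bytes_per_s: float = 1e9,  # assumed laptop scan rate
    cloud_scan_bytes_per_s: float = 4e9,   # assumed cloud scan rate
) -> tuple[str, float]:
    """Return ("client" or "cloud", estimated seconds) for the cheaper plan."""
    total_input = est.local_bytes + est.remote_bytes

    # Run on the client: download remote inputs, then scan everything locally.
    client_cost = (
        est.remote_bytes / network_bytes_per_s
        + total_input / client_scan_bytes_per_s
    )

    # Run in the cloud: upload local inputs, scan there, download the result.
    cloud_cost = (
        est.local_bytes / network_bytes_per_s
        + total_input / cloud_scan_bytes_per_s
        + est.result_bytes / network_bytes_per_s
    )

    if client_cost <= cloud_cost:
        return "client", client_cost
    return "cloud", cloud_cost


# A 10MB local file with nothing in the cloud stays on the client.
print(choose_placement(QueryEstimate(10_000_000, 0, 1_000)))
# A 10GB cloud table reduced to a tiny aggregate is sent to the cloud.
print(choose_placement(QueryEstimate(0, 10_000_000_000, 1_000)))
```

A real optimizer would reason about individual operators within a query plan and use far richer statistics; the point here is only that moving the computation to where the bulk of the data lives is usually cheaper than moving the data.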
Additionally, it lets developers and data scientists collaborate on the same dataset without replicating data or juggling versions, although DuckDB’s own literature makes clear that it is not a replacement for large client/server installations used for centralized enterprise data warehousing.
DuckDB, which remains open source under the permissive MIT license, has attracted interest from developers looking to build it into their data analytics and machine learning systems.
Matthew Mullins, CTO of collaborative analytics tools maker Coginiti, tells The Register: “I’m super excited about DuckDB and all the things people are going to build on it, because it’s very easy to use, it’s incredibly fast, and as soon as you touch it, you think of all the places you could use it.”
“Our product enables analysts to transform data in the leading analytical data platforms, all of which are column-oriented like DuckDB. Building DuckDB into our product was a way to split off data warehouse-style computations and replicate them in the browser. It’s like pulling fire down from the clouds. For us, DuckDB allows users to manipulate large datasets with incredible speed and accuracy, while leveraging local processing power to save on platform costs. We’re just at the beginning of our journey with DuckDB,” says Mullins. ®
https://www.theregister.com/2022/11/17/475_million_says_scaleup_databases/