Our Investment in MotherDuck

Nov 16, 2022

“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction.” - Albert Einstein

For the past decade, our industry has obsessed over the promises of Big Data. We captured every bit, routed it to centralized storage systems (read: Cloud Data Warehouses), and got caught holding the proverbial data bag ($$$). Along the way, we overlooked the “biggest” data of all: the uncountable small datasets underserved by these analytics platforms.

When data teams collect data carefully and intentionally to solve problems, they typically amass smaller datasets; a lot of data that teams collected in the past is either low quality or not relevant. While we think that cloud data warehouses are well-suited for certain use cases, they’re not compatible with the needs of all teams. CDWs often sacrifice performance on smaller, curated datasets to support queries across petabytes of data (scalability! but at what COST?). Let’s explore two use cases for smaller, faster data:

Edge computing - More internet-connected devices equals more data produced at the edge. Moving computations closer to the data can significantly lower costs by taking advantage of the “free” compute already there (and avoiding roundtrips to a cloud server). However, these edge devices (e.g. phones, sensors) often have resource constraints — any analysis on the client must be done with a small footprint.
Interactive analytics - More analysts and data scientists are demanding real-time, interactive analytics. Lower latency means data teams can explore ideas and build models, reports, and analyses faster. Faster response times also usher in new possibilities for building user-facing apps.

These use cases place similar demands on a database: it must be portable, fast, cost-efficient, and analytically capable. A couple of brilliant researchers at CWI in the Netherlands were among the earliest to identify this gap. In 2019, Mark Raasveldt and Hannes Mühleisen published a paper on a new research project, titled “DuckDB: an Embeddable Analytical Database”. They set out to build an embedded OLAP database that could serve the evolving needs of users (from edge computing to interactive analytics).

How big is the need for an embedded database, you may ask? Look no further than SQLite, the most ubiquitous database on the planet with over a trillion instances (by some estimates). There’s nothing “light” about that. DuckDB promises to be the “SQLite for analytics” by being:

compact (a single file that runs in-process),
fast (parallel and vectorized query processing),
analytical (aggregate millions of rows or combine giant tables),
and cross-platform (copy once, run anywhere).
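
As a rough illustration of what “in-process” means in practice, here is a minimal sketch using Python’s built-in sqlite3 module. Note that this is the SQLite mentioned above, used as a stand-in, and not DuckDB itself; the table name, column names, and values are made up for the example.

```python
import sqlite3

# In-process: the database engine runs inside this Python process.
# There is no server to install, start, or connect to.
# ":memory:" keeps everything in RAM; a file path would persist to a single file.
con = sqlite3.connect(":memory:")

# Hypothetical table of sensor readings, purely for illustration.
con.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],
)

# A small analytical query: average value per sensor.
rows = con.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)

con.close()
```

The idea behind an embedded analytical database is to keep this same no-server workflow while making aggregations like the one above fast on much larger tables.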

DuckDB more than delivered on that promise and has steadily amassed a loyal following of users in its open-source community. In 2021, Mark and Hannes formed DuckDB Labs to further promote development around the project. Over the past couple of years, we got to know the DuckDB team and advisors more closely.

In April 2022, the DuckDB Labs team introduced us to Jordan Tigani, who had recently left his CPO role at SingleStore after an impressive career in the database world. He was formerly the founding Tech Lead for BigQuery and knew the use cases for cloud analytics intimately. At Google, he observed that many customers sought fast, easy access to data and analytics. However, for many use cases, Google and other cloud vendors prioritized scalability at the expense of user experience and developer ergonomics. Jordan shared with us a vision of using DuckDB to build a fast, lightweight data platform that could meet the needs of the long tail of users for whom CDWs were too expensive and complex. He also shared how this platform could grow with data teams as their needs evolved, including by exploiting modern hardware. We were instantly intrigued.

Within weeks, Jordan recruited some of the best database talent in the industry to join his team (from the likes of Google, Snowflake, Databricks, Firebolt, AWS, Meta, Elastic, and Teradata). In addition, he built a strong relationship with the DuckDB Labs folks who were equally excited to see Jordan succeed, amplifying their research for a broader set of use cases.

In a short seven months, the MotherDuck team has shown incredible progress toward realizing their vision. MotherDuck is building a serverless data platform, underpinned by DuckDB, that unlocks new workflows and applications for users. MotherDuck has the potential to create a truly universal data plane, bridging analytical workloads between client and server. In MotherDuck’s words, “long live easy data.”

It’s time to bring compute closer to the data, execute where it makes more sense, and address all sizes of datasets. This is why we are so excited to announce our investment in MotherDuck’s Series Seed and Series A. The MotherDuck team is building a foundational new piece of the data stack and we are thrilled to be part of their journey!

I originally wrote and published this article on the Amplify Partners blog on November 15, 2022: https://www.amplifypartners.com/blog-posts/motherduck

Natalie's notes
