Dremio, a startup offering tools to help streamline and curate data, today announced that it raised $135 million in series D funding at a post-money valuation of $1 billion. The company says it’ll use the funds, which come nine months after a $70 million round, to invest in cloud data lake technologies that could benefit businesses looking to connect, analyze, and process data while accelerating database queries. Specifically, Dremio plans to expand its engineering centers of excellence and grow its customer-facing organizations to keep pace with new customer acquisitions.
Due to its scalability, low cost, and simplicity of management, cloud data lake storage has become the destination of choice for storing high volumes of data. According to a recent Allied Market Research report, the global data warehousing market was valued at $18.61 billion in 2017 and is growing at a compound annual growth rate of 8.2% from 2018 to 2025. However, to analyze that data, it has to be moved and copied into proprietary data warehouses, a process that can become costly, complex, and inflexible.
Jacques Nadeau and Tomer Shiran, a former product manager at Microsoft who’s held engineering and research roles at IBM and HP, founded Santa Clara, California-based Dremio in 2015 to solve this challenge. CEO Billy Bosworth tells VentureBeat that Shiran saw the rise of public clouds like Amazon Web Services, Microsoft Azure, and Google Cloud Platform as an opportunity to reinvent big data technology and develop a cloud data lake engine, enabling companies with large storage volumes to rapidly analyze their data.
“Dremio customers are running millions of queries per day for high concurrency BI with tools like Tableau and Power BI, ad-hoc data processing, and mission-critical dashboards. This is made possible by fundamentally simplifying the workflow for data engineers who are already centralizing data from many sources into cloud stores like AWS S3 and Microsoft ADLS,” Bosworth said in an email interview with VentureBeat. “With Dremio, that data does not need to be further moved or copied into data warehouses for analytics; instead, the full data set is available directly in native cloud storage.”
Dremio offers a virtualization toolkit that bridges the gaps among relational databases, Hadoop, NoSQL, Elasticsearch, and other data stores, connecting to business intelligence software as if it were a primary data source and letting users query it via SQL. (SQL is a domain-specific language designed for managing data held in a relational database management system.) The startup’s eponymous platform maintains a catalog of sources, physical and virtual datasets, and datasets’ lineage, making it easier to search and find datasets and see how data are being transformed.
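To make the idea of querying several stores through one SQL interface concrete, here is a minimal sketch. It does not use Dremio or any Dremio API; it uses Python's built-in sqlite3 module as a stand-in, attaching two separate in-memory databases to a single connection and joining across them. The database names (sales, crm), the table layouts, and the sample rows are all hypothetical.

```python
import sqlite3


def build_connection():
    """Create one connection with two separately attached in-memory stores.

    SQLite stands in for the independent data sources here; the schema
    names and tables are illustrative assumptions, not Dremio objects.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 120.0), (2, 75.5), (1, 30.0)],
    )
    conn.executemany(
        "INSERT INTO crm.customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    return conn


def revenue_by_customer(conn):
    """Join rows that live in two different stores with a single SQL query."""
    query = """
        SELECT c.name, SUM(o.amount) AS total
        FROM sales.orders AS o
        JOIN crm.customers AS c ON c.id = o.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for name, total in revenue_by_customer(build_connection()):
        print(name, total)
```

The caller writes one query and never copies data between the two stores, which is the general pattern the paragraph above describes, shown here at toy scale.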
Dremio is available in an open source Community edition as well as a commercial Enterprise edition. It runs in the cloud via Kubernetes or in a Hadoop cluster, and subscription pricing scales based on the number of nodes to which Dremio is deployed.
Joining capabilities native to Dremio enable data lakes to benefit from other stores, including Oracle, SQL Server, and PostgreSQL databases. Dremio also automatically detects schemas and supports cloud data lakes in Amazon S3 and other cloud storage providers, leveraging the Apache Arrow columnar in-memory format to speed up performance by 1,000 times, the company claims.
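Apache Arrow is a columnar in-memory format, meaning values from the same field are stored together rather than record by record. The sketch below illustrates only that layout idea in plain Python; it is not Arrow's API and makes no claim about Dremio's internals or the performance figure above. The function names and sample records are hypothetical.

```python
def to_columns(rows):
    """Pivot a list of row dictionaries into one list of values per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def column_total(columns, field):
    """Aggregate a single field by scanning only that field's values."""
    return sum(columns[field])


# Hypothetical sample records in a row-oriented layout.
ROWS = [
    {"region": "east", "amount": 10.0},
    {"region": "west", "amount": 20.0},
    {"region": "east", "amount": 5.5},
]

if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(cols["region"])
    print(column_total(cols, "amount"))
```

An analytic query that touches one field, such as a sum over amount, reads a single contiguous list in the columnar layout instead of visiting every full record.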
Thanks to features like automatic failover, Dremio can select new nodes in the event of node or instance failures in a cluster. The platform’s dynamic access, moreover, delivers programmatic security controls through integration with Kerberos, LDAP, and other centralized providers.
On the AI side of the equation, Dremio taps machine learning to recommend datasets to users and adapt catalogs in response to changes in schema and execution. It also algorithmically caches and indexes metadata as needed, in real time.
Asked whether the pandemic has affected business, Bosworth said it hadn’t, pointing to Dremio’s 60% growth in headcount since March. Other than a delayed sales cycle when the startup’s customers transitioned to working from home, Dremio weathered the storm well, growing its customer base to 100 companies — a majority of which are from the Forbes Global 2000 — with over 75,000 users.
“Data analytics has always been important to our customers. This year, it has become more imperative than ever as we navigate this pandemic,” Bosworth said. “Dremio was already a distributed company, so we did not experience any loss of productivity.”
Dremio’s series D round announced today was led by Sapphire Ventures and included participation from existing Dremio investors Insight Partners, Lightspeed Ventures, Norwest Venture Partners, Redpoint Ventures, and Cisco Investments. As of today, the company has about 160 employees — a number it expects will double by the end of 2021 — and has raised $247 million in venture capital.
Dremio raises $135 million to help companies rapidly analyze data. The British Journal Editors and Wire Services / VentureBeat.