Snowflake pushes back at… whom?


Disclaimer: Matt Asay works for AWS but the views expressed herein are his and don’t reflect those of his employer.

In two recent blog posts (“Striking a balance with ‘open’ at Snowflake” and “Where open helps and where it hurts”), Snowflake spent 6,064 words arguing a very simple concept: Not all software needs to be open, whether that means open source, open standards, or open APIs. It’s not a particularly objectionable argument and reflects the reality that while virtually all software includes open source code, most software isn’t licensed as open source. Snowflake, in other words, is safely within its rights to keep its software closed.

And yet the company clearly felt the need (twice) to justify its decision, reflecting the strong gravitational pull of open source, open standards, and open APIs, even when its customers don’t appear to be clamoring for them.

Open sourcing data

Nearly a decade ago, Cloudera Co-founder Mike Olson made a bold declaration: “No dominant platform-level software infrastructure has emerged in the last 10 years in closed source, proprietary form.” Olson was mostly correct. Splunk had emerged in that time, along with perhaps a few other exceptions, but, on balance, he was right.

Fast forward to 2021 and Olson’s pronouncement has remained pretty accurate with few exceptions. Snowflake is one of them. The company that bills itself as the data cloud company has managed to build a big business with a proprietary SaaS offering in an industry awash in exceptional open source data infrastructure like Apache Hadoop, Apache Arrow, Apache Spark, and more.

This perhaps reflects a more nuanced reality: Enterprises may intuitively want “open” but they place a bigger premium on “working.” This has been clear for years as companies have introduced managed services to make it easier to consume open source software or, in the case of companies like Fauna and Snowflake, provide managed services that aren’t based on open source at all. Getting both “open source” and “operationally easy” in the same service is the holy grail, but if enterprises must choose one, they’re going to pick the solution that is easiest for them. After all, a customer can turn to Apache Spark, Dremio, or any number of tools to build data warehouses or data lakes, yet thousands of customers spent roughly half a billion dollars with Snowflake last year.

So why is Snowflake defending a position that its customers seem to like?

That’s a lot of words

Between the two posts, Snowflake spent a lot of effort (3,798 words on the Snowflake blog and 2,266 on the InfoWorld post) to say “We don’t think everything should be open.” That’s a lot of digital ink spilled to obfuscate a clear and perfectly acceptable message that pretty much every vendor on the planet agrees with. For example, in the InfoWorld blog the company touts the excellent contributions its employees have made to the open source database FoundationDB, which the company uses in its infrastructure. Great!

But then it follows that statement with an awkward add-on: “However, we don’t extrapolate from this to say there is an inherent merit to open source software.” The authors then double down on the argument that “open isn’t a panacea. We strive to avoid misguided applications of open that create costly complexity instead of low-cost ease of use.”

The company simply intends (and ultimately says) that open source is a means, not an end. That’s true! But along the way it also makes errant claims about open source, suggesting that it somehow would diminish the company’s ability to secure its software, which simply isn’t true. “At Snowflake, we believe in the value of open standards and open source, but also in the value of data governance and security,” the company’s co-founder says in the InfoWorld blog. That “but” is wholly unnecessary and implies that open standards and open source undermine data governance and security. Neither is true.

There’s also the false premise that source code must be useful to all to be useful at all. On the company blog, the authors say, “The query processor of a sophisticated data platform is typically built by dozens of PhD program graduates, evolved, refined, and optimized over years. Source code availability may not significantly increase the ability to comprehend its inner workings.”

Michael Fischer, a containers expert at AWS, picks up on this: “Open source was not about enabling users to understand and enhance the software. It’s about enabling the world to do so. Just because relatively few people are capable of understanding or patching Linux kernel code doesn’t mean its openness has had little impact. It’s a little smug and insulting to suggest that they shouldn’t share because only PhDs would understand it. In fact, science advances through sharing and publication. That’s the whole point of scientific journals and conferences. The art advances through disclosure.”

Fischer is correct, but of course, there’s no law stipulating that Snowflake must or even should open its code, file formats, or anything else. Dave McCrory, VP of Growth and Global Head of Insights and Analytics at Digital Realty, and a longtime cloud and open source observer, points out, “Not all software needs to be or should be open sourced. Open source is an appropriate license/model for a lot of software but not all.”

Whether Snowflake should is ultimately a decision for its customers, and based on revenues, it seems that Snowflake’s customers don’t care. So again, why write the posts?

Selling past the close

Most of Snowflake’s big competitors also offer proprietary data cloud/platform services. (Disclosure: I work for AWS, which is a Snowflake partner and competitor, though I am not involved with that part of our business.) It’s highly unlikely, for example, that Oracle salespeople are beating up Snowflake for offering proprietary software. Perhaps the pressure is coming from Databricks or other open source vendors?

Databricks recently launched its Delta Sharing project, an open protocol for securely exchanging large data sets in real time. This was just one of Databricks’ announcements at the Data + AI Summit, which sported the tagline, “The future is open.” Nor is Databricks alone in positioning its data cloud as an open alternative to solutions like Snowflake. Journalist Sean Kerner told me, “You should see my inbox… Every other pitch is ‘X is an open alternative to Snowflake.’ ”

Snowflake, for its part, is adamant that open is not the correct answer in file formats, source code, and more. Not always, anyway. Maybe it’s correct. But writing thousands of words arguing against open, versus simply demonstrating value to customers through its offerings, is poor marketing. As I wrote in 2020 about the Snowflake IPO:

“Developers have never been overly religious about open source. The reason for [Olson’s comment about a] ‘stunning’ trend is simply that open source made it easier for developers to get their jobs done thanks to high-quality, easily accessible, open source data infrastructure. There are, of course, other benefits, such as the communities that often accompany open source projects, coupled with a desire to have more granular control of one’s software stack. But ultimately open source has won because it enables developers to ‘get --- done.’ Which is why, for example, you’ll find developers happy to use open source software like Apache Airflow to load data into their proprietary Snowflake data platform. It’s not cognitive dissonance. It’s pragmatism.”

By rationalizing its decisions rather than simply delivering value to customers, Snowflake ends up confusing more than it clarifies. Enterprises clearly appreciate what it’s selling. No need for apologies about not being open enough.