The data science boom has contributed to renewed corporate interest in open platforms. But this open embrace can sometimes be taken too far.
Allow me to illustrate.
Australian businesses are in a war for talent. In some cases, open source adoption is being used as a drawcard to attract developers. These are tools that developers may already use freely, and there’s an attractiveness in the continuity of being able to carry familiar tools and practices into the workplace. Organisations trying to attract developers therefore tout themselves as playgrounds of open source tooling. Whether that makes sense from a practical work perspective is unclear, however, and it may invite conflict down the track if better, but not open source, tooling is found and favoured.
On the flipside, there’s evidence of some Australian organisations being selective and learning to take what makes sense from open source and open standards. They may like what they see with open source from a practices and community collaboration perspective, and want to bring those cultural elements in-house. For these companies, the buzzword is ‘innersource’ and it is being seen in more and more local code shops, such as insurer IAG.
At Snowflake, we’ve had to balance these same influences on our product development process over the years. We constantly evaluate where open standards, open formats, and open source can help or hinder our progress. In short, it is a matter of degree.
We frequently see strong opinions for and against open, and plenty of ‘table pounding’ demanding it. Some companies would have everyone believe that open is what really matters. But we always come back to what matters most to us, which is what our customers want.
What matters to customers is security, performance, costs, simplicity, and innovation. Our argument has always been, and remains, that using open should be at the service of these goals, and that going all-in on open should not become a goal unto itself.
This is being lost in the current push around open platforms. A level of pragmatism needs to be brought back into the conversation.
Where open works
Open is often understood to encompass two broad characteristics: open standards and open source. In the appropriate context, these characteristics can enhance value to users of technology systems.
In the data world, open standards for file formats are often portrayed as a solution to data portability and interoperability challenges. They focus on the ability to get data out of a system and to avoid costly lock-in. But open formats often aren’t an optimal way to represent data during processing.
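The trade-off is easy to see in miniature. The toy sketch below (not any particular engine’s format) compares the same table serialised as JSON, an open, universally readable interchange format, against a packed binary layout of the kind engines use internally; the binary form is far smaller and skips text parsing entirely, which is why systems typically convert open formats into internal representations before processing:

```python
import json
import struct

# A small table of (id, value) rows.
rows = [(i, i * 0.5) for i in range(1000)]

# Open, portable text representation: any system can read JSON.
as_json = json.dumps(rows).encode("utf-8")

# A packed binary layout of the same data: a toy stand-in for the
# compact, engine-specific representations used during processing.
# "<id" = little-endian 4-byte int plus 8-byte double, 12 bytes per row.
as_binary = b"".join(struct.pack("<id", i, v) for i, v in rows)

print(len(as_json), len(as_binary))
# The JSON encoding is several times larger than the 12,000-byte
# binary layout, and decoding it requires full text parsing.
```

Neither representation is “wrong”; they serve different purposes. The open format earns its keep at the system boundary, where portability matters most.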
At Snowflake, we don’t think of open as a blanket, non-negotiable attribute of our platform. Instead, we are very intentional and careful in choosing where and how to embrace it. We have fully embraced standard file formats, standard protocols, standard languages, and standard APIs, in each case weighing where doing so maximises the value we provide to our customers.
With proper abstractions, and by promoting open where it matters most, in file access and formats, open standards and protocols allow us to move faster (we do not need to reinvent them), allow our customers to re-use their knowledge, and enable fast innovation by abstracting the “what” from the “how.”
On the open source front, Snowflake also delivers a small number of components that get deployed as software into our customers’ systems, such as connectivity drivers (the JDBC and Python connectors) and our Kafka connector. For all of these we provide the source code.
However, we maintain control of our core. The query processor of a sophisticated data platform is typically built by dozens of PhD-trained engineers and evolved, refined, and optimised over years; making the source code available may not significantly increase anyone’s ability to comprehend its inner workings. We’ve also seen many unintended or undesired consequences when source code is opened, including fragmentation of platforms, incompatible forks, less agility to implement changes, and competitive dysfunction.
The better model is the deployment of that core as a cloud-based managed service. Encapsulation of the inner workings of the service allows for its fast evolution and speedy delivery of innovation and improvements to customers.