An Overview of Open Source
This series of posts will discuss all facets of what it takes to build an open source business. Part 1 focuses on a broad overview of OSS with a little history.
Introducing the Series: Open Source Based Businesses
Why write this series?
I get questions all the time from friends and folks on the internet about the fundamentals of open source businesses. It could be a question about a specific company and how they’ve gone to market (or should go), whether an investment is worth taking on or not, or whether a competitor needs to be taken seriously.
Rather than answer these one at a time, it struck me as a good idea to write down the ideas, the conclusions, the lessons, and what open source has been able to accomplish.
What will this series cover?
This series will cover everything about open source, focusing on the following structure:
The Market (or rather, the basics of product development)
Building your Open Source Solution
Validation & the Community Choice
Getting Revenue
Expanding the Market and Longer Term Strategy Choices
All of this content will look at real-world examples, diving into how open source businesses have succeeded and failed.
This is just the first post in this series and there’s a lot more to come, so please subscribe to the Product of Data Newsletter!
If you’re interested in hiring me to consult for your open source company, please send me a message on LinkedIn.
What is open source?
Open source is all about developing a project, product, or code out in the open. Most commonly, the source code is freely available so that others can contribute to it, modify it, fork it, or at a minimum view it. The project is developed in a decentralized way so that people from across the world can contribute. The code is made available under a license that governs how others may use, modify, and redistribute it, and what happens to their contributions.
There are plenty of resources on the web, from Wikipedia to opensource.com, to learn more about open source. This series will focus on monetization and building a business around open source. Let’s review a quick history and then dive into the three core monetization models.
A blitz history of open source
I won’t pretend to know the entire history of open source nor do I care to cover it. However, reviewing the broad strokes of history will be helpful. This section defines the three generations of open source based companies.
Generation 1 “Services Generation”: Linux, Operating Systems, and the Free Software Movement
In the 1980s and early 1990s, folks like Richard Stallman and Linus Torvalds started building free and open operating systems, releasing them under licenses that let other people use them (for instance, the GPL). This was a time when proprietary software was the standard, and building something and giving it away for free was heretical (see the tone of an original manifesto).
In this era, the term open source was coined, and it spawned a revolution. Commercialization was quick to follow. While the movement (certainly at this time) focused on keeping software free and open, companies began selling support and services around it to compete with Microsoft’s dominance in computing.
Folks started talking about running Linux on commodity hardware rather than buying proprietary hardware. The most notable company here was Red Hat, which built a version of Linux around which it sold support and services.
I call this Open Source Business Gen 1: “We (a company) are smarter than you (another company) and you can pay us to build software for you and / or teach you how to use the software. The software remains free, but it’s complicated to run and operate and that’s why you need us.”
Generation 2 “Better Way Generation”: Google File System, MapReduce, and the Apache Ecosystem
In the early 2000s, with the massive increase in computing and data, companies like Google developed amazing tools to help them solve internal infrastructure problems. This generation began with two closed source projects, the Google File System (GFS) and MapReduce, for storing and processing massive amounts of data.
Other companies, institutions and people got jealous.
Working from the designs in the GFS paper, communities created competing projects in the open. Most notably, folks built Apache Hadoop and Hadoop MapReduce.
The Hadoop project strove to make a competing project available to any company (although much of its early development was based at Yahoo).
Hadoop made the following promise:
Hadoop abstracts the hardware (allowing you to use commodity hardware with Linux),
Hadoop leverages this abstraction to give you a file system (HDFS), and
Hadoop makes compute available on top of that storage, along with a framework for manipulating the data stored on HDFS (MapReduce).
A true platform, Hadoop spawned a massive number of projects built on top of the core primitives of Hadoop and MapReduce. This ecosystem attempted to cover a huge number of use cases, as you can see in a reference diagram from around that time.
Copying proprietary designs (or open sourcing parts of internal infrastructure at big companies) became a pattern. Projects like Apache Spark, Apache Kafka, and Apache Mesos became open source projects modeled on existing projects by a community (or research group) or spun out of companies. The companies behind these projects typically sold a blend of consulting, managed services, and hosted products.
Make no mistake, other companies in this generation approached the problem slightly differently. Elastic, MongoDB, Redis, and CockroachDB (there's an interesting Wired article from the time sharing the creator’s perspective) have been quite successful outside of the Apache ecosystem, and weren’t necessarily spun out of a larger company or existing project. Still, the business approach was roughly the same.
I call this Open Source Business Gen 2: “We, as a developer community, can compete with large companies with dedicated resources. After establishing a community, we sell services around that project and begin giving you either a distribution of our software (along with support) or even a managed version of the product.” The key difference in this generation is that rather than selling (pure) services around the open source project or community, there’s a focus on building and selling a supported distribution or a managed version of the software itself.
I call this the “Better Way” generation because it’s a better way than selling services 😉.
Generation 3 “Product Generation”: Delta Lake, ksql, Streamlit
In the past 10 years, open source has become mainstream in the developer tooling space. Many projects are started with the express goal of building a community around them and then monetizing that community. Streamlit is a great example.
In addition, generation 2 companies looking for growth also built new projects with the intention of breaking into new markets and dominating them. Databricks (former employer, known for Apache Spark) and Confluent (of Apache Kafka fame) are two great examples of this trend.
In any marketplace, competition will increase, and companies need to break into new markets. An option you have is to create and roll out a new project like Delta Lake and release it (or some of it) to the community. You use this to grow your market, expand into new markets, or to differentiate against the competition.
ksql makes for an interesting example. While Apache Kafka and its community always touted its capability as the inside-out database, the dream was complex to implement and adoption was therefore limited. However, ksql enables the users of Kafka to drive more usage on the platform (with a new set of users). Delta Lake did a similar thing for Databricks.
Of note for both of these projects, however, is that neither of them joined the “upstream” project - they became their own. These projects facilitated a strategic judo of improving adoption of the underlying platform more easily than you could with a proprietary product.
Streamlit, from the outside, approached it in a similar way. Starting with an awesome demo, they built a strong community product and focused on building an exclusively managed service for that project. Their outcome wasn’t too bad 😉.
I call this Open Source Business Gen 3: “We will give you this software, but it’s going to be complicated to run and let’s be real, you’re focused on other things. We’ll run it for you and you just pay us, with the ‘peace of mind’ that if you need to move off of us, you can go back to open source.” The key difference in this generation is that rather than dealing with services at all, it’s just building a project and community, and wrapping it up in a managed offering.
Conclusion
In this post we took a whirlwind tour of open source. We looked at its foundations and at the various generations of open source, from services companies to product companies.
Over time, we’ll dive a lot deeper into each company, their approach, where it works well and where it doesn’t. This is just the first post in this series and there’s a lot more to come, so please send your feedback.