Our first year open-sourcing machine learning

Bigger Picture

On a wet and windy day in October 2014, Team Seldon was sitting on Brighton Beach, discussing the bigger picture of what we could achieve. We had come a long way over the previous three years; every month we were serving content recommendations to over a hundred million people. However, we believed that if we continued to ship a black box solution we would increasingly face obstacles to user adoption by enterprises. Machine learning technology was becoming increasingly commoditized, and new applications were, and continue to be, both widely developed and adopted.

Through the mist that had settled over Brighton Beach the bigger picture of what we could achieve became clear — democratising machine learning. It was time to do the most disruptive thing we could imagine and open-source the platform and algorithms that we had spent many years, and a couple of million pounds, building. There was a risk that our competitors would steal our IP, but there was an even greater risk that we would lose our momentum by looking to protect our position rather than continuing to innovate. So we took the decision to pivot in one of the most exciting ways a technology company can, and go open-source.

Going open source

Our primary motivation, which remains stronger than ever today, is to give data scientists the best tools for the job. That way, instead of reinventing the wheel, they can focus on using machine learning to solve the problems that are unique to their organization and help decision makers make better decisions.

Choosing an open source license

Picking an open-source license was the biggest decision after going open-source. The two main contenders were GPL and Apache 2.0.

The most significant difference is that GPL has “copyleft” built into the license, which means developers who distribute modified versions must release them under the same license. The reason most companies that use GPL software don’t release their modifications in practice is that the license contains an “application service provider loophole”, which means you can use GPL code to build a SaaS product without distributing it. Affero GPL seeks to close this loophole. But we believed that forcing copyleft would reduce adoption. Seldon didn’t choose GPL because we didn’t want to bind companies with a license that forces their modifications to become open-source under any circumstances.

So we opted for Apache 2.0 and haven’t looked back. It’s a much more permissive license that gives businesses the flexibility to use Seldon in their projects, with no obligation to make any modifications, or products that use Seldon, open-source. You are free to create a commercial service on top of Seldon, as are we. And while one reasonable option would have been to ship a restricted version of Seldon under this license, we didn’t hold back. To date, our entire codebase has been released under Apache 2.0. In future, we are planning to release some enterprise tools, along with algorithms and models that serve specific industry use cases, under a separate commercial license.

Countdown to release

Four months later in February 2015, we were ready to release our open-source machine learning platform.

In the run-up to launch we were kept very busy finishing code refactoring and documentation. In most SaaS products the complexity is abstracted away behind an API, the “as a Service” part. As we discovered, setup and deployment present a new set of challenges for open-source projects. To help developers and data scientists get started quickly, we distilled and updated a tremendous amount of internal documentation from our wiki into a new community site hosted on GitHub Pages. Our online docs have consistently had much better engagement metrics than our core website.

A few days after our first announcements and press coverage, hundreds of well-known global tech, media, e-commerce and financial companies signed up to access our beta releases. I’ve been building start-ups since 2003 and have never seen early signs of traction like it. These first signals substantiated our belief that enterprises were hungry for an end-to-end platform-agnostic open-source machine learning solution.

It took a few months for the awareness and interest generated by our open-source launch to filter through into tangible commercial opportunities. We found in the period leading up to our first open source release that companies wanted to wait and see what was coming in the open-source version, even companies who were exploring our SaaS offering.

About six months after release, Seldon had gained enough momentum that it was clearly growing its user base, and we were making regular releases of our code and supporting documentation. For the first time, we found ourselves having to be more selective about the opportunities in which we invested pre-sales effort. Over the last few months, we have focused more on larger customers, unfortunately at the expense of working with some fantastic SMB companies. Wanting to serve all businesses, from the Fortune 100 to pre-revenue start-ups, drives our current focus on streamlining our commercial proposition (I will talk about this in more detail later in this series of blog posts).

Impact on customer acquisition

By removing commercial barriers, our goal was to eliminate the cost of onboarding new users. Our code is open source, but there are still setup and maintenance costs when building a solution for a production environment. We believed a large enough percentage of a larger pool of open source users would want to deploy solutions faster by leveraging a combination of Seldon’s technology and team.

Many companies prefer working with software based on open standards delivered by a simple API service. While some companies that would previously have paid for SaaS now choose free open-source alternatives, open source is a significant driver of SaaS customers.

We are planning to release a fully self-serve SaaS offering later this year. Currently, most of our new customers are paying for a solution where Seldon maintains the service and infrastructure and provides integration support. Regardless of the deployment choice, we have found that developers love having a clear migration path between self-hosted open-source and fully managed SaaS.

Interestingly, we never lost any existing SaaS customers to open source. Disrupting ourselves in this way was one of the biggest risks, and although it appears on the surface that all of the value is now on GitHub, it’s clear that this isn’t the case. Open-source and SaaS business models complement each other extremely well.

Building an open-source community

Creating a broad network of contributors with different ideas and experience can help to shape the platform in new and exciting ways.

Seldon is an enterprise product. Many of the developers using it will be subject to employment terms covering any code that they produce. While many of the most successful technology companies are now open-minded about allowing their developers to contribute to open-source projects, it certainly isn’t the norm.

We have thousands of open-source users and have seen three key phases of community engagement so far:

1. Documentation: Our first pull requests were fixes and amendments to our documentation on GitHub Pages.

2. Support: We encourage our community to make support requests via our public user group and real-time Gitter chat room, and to create issue tickets on GitHub. Seldon’s team makes it a priority to monitor and respond to these channels, but we are now starting to see members of the community supporting each other with technical questions.

3. Pull Requests: We’re now seeing pull requests to update and fix the code. For example, Seldon has dependencies on many open-source libraries, and it’s often important to test and upgrade these to ensure that the latest features and patches for any security vulnerabilities are in place. Our users are working across a variety of development and production environments that would be burdensome for us to test individually.

Right decision

Going open source gave the team a renewed sense of purpose. We wanted to help thousands of developers by releasing technologies previously reserved for companies such as Google and Amazon into the wild. We are huge supporters of open source, and we have found each new release that we tag on GitHub so much more rewarding than releasing updates to our API.

As we have been hiring over the last few months, we have seen first-hand that getting paid to work on an active and growing open-source project is an attractive opportunity for talented engineers. So much quality code never sees the light of day. Working with an open-source organisation helps developers to build a public reputation for their code.

So, is open source the right strategy for your company? Despite the exciting first year we’ve had since open sourcing, we plan to release some enterprise products and domain-specific features under a separate commercial license. Open source makes sense if your service is a platform that developers build upon, like Seldon.

If your app or service seeks to solve a domain-specific problem or use case directly for consumers or businesses, the case isn’t as strong. In any event, if you have developer APIs as a supplementary rather than core part of your proposition, it’s harder to draw a parallel to our experience and motivations. In these cases, instead of open-sourcing your entire service, consider open-sourcing the components of your overall product that your developer community use, such as client libraries for the programming languages that developers use to integrate your API. You should also consider whether the technology in your space is itself heading toward commoditization, as has proved to be the case with AI.

There is a tension between focusing on customers and building the community. Paid projects are the ultimate source of customer development, as they give us short feedback loops that help us understand how we should prioritise our roadmap for the benefit of both the community and our customers. However, there must be a balance, and it’s important that we continue to carve out enough time to nurture our community. As we scale out the team in the coming months, a dedicated community manager and developer evangelist will be a pivotal role. If you are interested in taking on the challenge, I would love to hear from you.

Barclays Accelerator

Since late January, Seldon has been part of the Barclays Accelerator, working closely with the Techstars team, mentors and associates to refine our core proposition and explore use cases in the financial services sector. We have been blown away by the quality and commitment of the Techstars network, both at home in our day-to-day operations and in engaging with the global community. Barclays have learned a lot from this pioneering initiative in their first two cohorts, and we have had the opportunity to discuss a huge variety of use cases with executives and technical leaders at a pace that would be unheard of outside of this kind of engagement.

We have seen first-hand that machine learning has the potential to impact all parts of a bank, not just the sharp end of revenue generation in the front office where quant developers have been working their magic for years. Barclays Accelerator finishes with a huge Demo Day on 18th April, where we will pitch to an audience of around a thousand people at the O2, on the same stage that Coldplay performed on just a few weeks ago. But it doesn’t finish there. The Techstars network membership is for life, and we anticipate that it will continue to play an active part in how Seldon develops in the long term. I hope that we can also give back to the broader community. Some great things are coming out of these past ten weeks or so, and I can’t wait to share further developments with you over the next few weeks and months.

This is the first of a four-part blog post series. In my next post, I will dive into the product milestones since we went open-source with some narrative on why we chose to focus on these features. Later in the series, I will discuss the rapidly evolving open-source machine learning landscape and share some of Seldon’s plans for year two and beyond.

To find out more about Seldon, please check out our technical documentation and enterprise services. Get started in minutes with our virtual machines and AMIs. Star and fork us on GitHub. The best ways to keep in touch are to sign up to our newsletter, join the seldon-users group, and follow us on Twitter and Facebook. Don’t be shy: send us an email. Did I mention we’re hiring? 🙂

Thanks to Techstars associates Libby Kinsey, Ha Duong and Andy Tomlinson for all of your great input into these posts.
