Open Sourcing a Predictive API

On 6–7th August, Sydney will host the 2nd international conference for Predictive APIs and Apps. PAPIs will bring together machine learning practitioners from industry, government and academia to present new developments, identify new needs and trends, and discuss the challenges of building real-world predictive applications.

I look forward to discussing the lessons learned from the development of Seldon, which began as a closed, black-box predictive API and has since transitioned to an open-source model. Here’s the abstract from my speaking proposal.

Large organisations and start-ups are increasingly investing in building in-house predictive data science and trying to figure out how machine intelligence can be deployed to solve real-world business problems. Data scientists are demanding more control and flexibility than existing predictive APIs delivered over HTTP by third parties can offer.

By closely examining my experience of building the first commercial predictive API to switch to an open-source model after operating at scale, I will explore an important paradigm shift that will affect everyone in the industry. Over the last decade, most of the industry-changing innovation has occurred lower down in the data science stack. Why has it taken so long for innovation to move higher up the stack?

The standard method for building predictive APIs was to connect the separate open-source components that make up a modern predictive data science infrastructure, create algorithms that can build predictive models from live behavioural data, and then deliver the service to projects or consumers via an API. Communities of contributing developers helped to accelerate the innovation of new open-source technologies, such as Spark, moving them from early brittle releases to prime time. Data scientists have been building “black boxes” to protect IP; however, this leads to many companies reinventing the wheel.
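To make that pattern concrete, here is a minimal, hypothetical sketch in Python: a scikit-learn model trained offline and exposed as a JSON prediction endpoint with Flask. The /predict route, feature layout and model choice are illustrative assumptions, not Seldon’s actual API.

```python
# Hypothetical sketch: train a model, then serve predictions over HTTP.
# The /predict route, feature format and model are illustrative assumptions.
import numpy as np
from flask import Flask, request, jsonify
from sklearn.linear_model import LogisticRegression

# Train a toy model offline (in practice, on live behavioural data).
X_train = np.array([[0.1, 1.2], [0.8, 0.3], [0.5, 0.9], [0.9, 0.1]])
y_train = np.array([0, 1, 0, 1])
model = LogisticRegression().fit(X_train, y_train)

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body like {"features": [0.4, 0.7]}
    features = np.array(request.get_json()["features"]).reshape(1, -1)
    proba = model.predict_proba(features)[0].tolist()
    return jsonify({"classes": model.classes_.tolist(), "probabilities": proba})

if __name__ == "__main__":
    app.run(port=5000)
```

A client simply POSTs a feature vector and receives class probabilities back, which is essentially the contract a closed, third-party predictive API offers over HTTP.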

We are now entering a new phase of commoditisation by using open-source across the entire data science stack, all the way from hardware to OS to algorithms. This openness brings with it an exciting phase of accelerated innovation and opportunities for collaboration, even between organisations that had previously considered themselves to be competitors.

This fundamental shift in industry economics requires a rethinking of existing business models and comes at a time when predictive APIs are specialising to serve vertical business models, because each industry requires domain-specific knowledge.

Open platforms are an opportunity for standardisation of the back-end predictive data science stack, with pluggable architectures that enable developers of front-end algorithms to focus on solving the last mile of domain-specific problems. Open innovation and tools are particularly important because there is a shortage of skilled data scientists with specialist industry knowledge (e.g. finance or genomics) and accelerating demand.
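As an illustration of what such a pluggable architecture might look like, here is a hedged Python sketch: a minimal algorithm interface that a platform could standardise on, so that a domain-specific model plugs into a shared serving back end. The interface and class names are my own assumptions for illustration, not an actual Seldon specification.

```python
# Hypothetical sketch of a pluggable algorithm interface for an open
# predictive platform. Names and methods are illustrative assumptions.
from abc import ABC, abstractmethod
from typing import Sequence

class PredictiveModel(ABC):
    """Contract the platform's serving back end expects every plug-in to honour."""

    @abstractmethod
    def fit(self, features: Sequence[Sequence[float]], labels: Sequence[int]) -> "PredictiveModel":
        ...

    @abstractmethod
    def predict(self, features: Sequence[Sequence[float]]) -> Sequence[int]:
        ...

class ChurnModel(PredictiveModel):
    """A domain-specific plug-in: predicts churn from a single usage feature."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold

    def fit(self, features, labels):
        # Naive rule learning: average feature value of customers who churned.
        churned = [f[0] for f, y in zip(features, labels) if y == 1]
        self.threshold = sum(churned) / len(churned) if churned else self.threshold
        return self

    def predict(self, features):
        return [1 if f[0] >= self.threshold else 0 for f in features]

# The back end only ever talks to the PredictiveModel interface, so new
# domain-specific algorithms can be swapped in without back-end changes.
model = ChurnModel().fit([[0.2], [0.9], [0.8]], [0, 1, 1])
print(model.predict([[0.85], [0.1]]))  # -> [1, 0]
```

Standardising on a small interface like this is what lets front-end algorithm developers concentrate on the domain-specific last mile while the shared back end handles data plumbing and serving.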

An open data science stack reduces time to market, enabling data scientists to focus on solving the problems specific to their business and to deploy cutting-edge machine intelligence within their organisation. Enabling faster adoption of both open and closed machine intelligence advancements leads to exciting new possibilities. For example, neural networks from computer vision and speech recognition that can interpret images and transcribe audio can be combined with predictive models and decision-making expert systems to make machines more human.

If you have any comments on my talk abstract, let’s kick off the discussion in advance. And if you enjoyed this post, please hit the recommend button to say thanks. I’d love to hear from anyone who will be attending PAPIs 2015 on 6–7th August or KDD 2015 on 10–13th August in Sydney. Get a 20% discount on PAPIs 2015 registration by using the promo code SELDON.

To find out more about Seldon, join our newsletter, read our technical docs, star, watch and fork us on GitHub, follow @seldon_io on Twitter, and like our Facebook page. And feel free to drop me a line directly.
