Machine Learning


There is no one-size-fits-all predictive algorithm for all digital services, nor a silver bullet for all users within a service across every context (time, location, device, etc.). Drawing on years of experience building recommendation engines and predictive models, we continually develop better ways to test and optimise algorithm performance in real time to boost KPIs such as CTR, engagement and conversions.

Common Features

  • Ability to apply multiple algorithms.
  • Combine the results and scores in different ways, e.g. first successful, weighted average, reorder by popularity.
  • Diversify the results based on previous recommendations to increase randomness.
  • Exclude previously recommended content based on implicit/explicit feedback and on content position in the recommended list.
  • Change algorithm configuration in real time with no redeployment.
  • Create A/B tests with no redeployment.
  • Dynamic optimisation with Multi-Armed Bandits.
  • Combine with domain-specific business logic (e.g. media, e-commerce, finance).
  • Microservices API provides a pluggable architecture to integrate your in-house algorithms using a common interface.
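As an illustration of the "first successful" and "weighted average" combination strategies listed above, here is a minimal Python sketch. The function names, the dictionary-of-scores representation and the weights are assumptions made for this example; they are not the service's actual API.

```python
from typing import Dict, List

Scores = Dict[str, float]  # item id -> score from one algorithm


def first_successful(results: List[Scores], min_items: int = 1) -> Scores:
    """Return the first algorithm output that contains at least min_items items."""
    for scores in results:
        if len(scores) >= min_items:
            return scores
    return {}


def weighted_average(results: List[Scores], weights: List[float]) -> Scores:
    """Blend scores; an item missing from one algorithm contributes 0 for it."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    combined: Scores = {}
    for scores, weight in zip(results, weights):
        for item, score in scores.items():
            combined[item] = combined.get(item, 0.0) + weight * score / total
    return combined


def top_n(scores: Scores, n: int) -> List[str]:
    """Highest score first; ties are broken by item id for stable output."""
    return sorted(scores, key=lambda item: (-scores[item], item))[:n]


# Hypothetical outputs of two algorithms for the same user.
cluster_scores = {"a": 0.9, "b": 0.4}
tag_scores = {"b": 0.8, "c": 0.6}
blended = weighted_average([cluster_scores, tag_scores], [0.75, 0.25])
ranking = top_n(blended, 3)
```

Because the weights are plain data, they could be changed at runtime or assigned per A/B test variant without touching the combining code.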

User Clusters

Improve the relevance of recommendations in high-churn media services.

  • Cluster users based on historical activity.
    • configurable taxonomy (category, price range, brand, visit referrer, etc.)
    • unsupervised (fuzzy k-means)
  • Apache Spark to handle large historical data sizes.
  • Load user clusters into front end servers periodically and count content hits for users in same cluster.
  • Decay counts to provide activity dynamics as new content is published.
  • Recommend by combining counts for content based on cluster membership of user.
  • Real-time stream processing for adding short-term dynamics to recommendations.
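The counting, decay and combination steps above could be sketched as follows. This is a simplified illustration, not the production implementation: the class name, the exponential half-life decay and the fuzzy membership weights are all assumptions chosen for the example.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


class ClusterCounts:
    """Per-cluster content hit counts with exponential time decay (illustrative)."""

    def __init__(self, half_life_s: float) -> None:
        self.rate = math.log(2) / half_life_s
        # cluster id -> item id -> (decayed count, time of last update)
        self.counts: Dict[int, Dict[str, Tuple[float, float]]] = defaultdict(dict)

    def _decayed(self, count: float, last: float, now: float) -> float:
        return count * math.exp(-self.rate * (now - last))

    def hit(self, cluster: int, item: str, now: float) -> None:
        """Record one content hit by a user belonging to this cluster."""
        count, last = self.counts[cluster].get(item, (0.0, now))
        self.counts[cluster][item] = (self._decayed(count, last, now) + 1.0, now)

    def recommend(self, membership: Dict[int, float], now: float, n: int) -> List[str]:
        """Combine decayed counts, weighted by the user's cluster memberships."""
        scores: Dict[str, float] = defaultdict(float)
        for cluster, weight in membership.items():
            for item, (count, last) in self.counts.get(cluster, {}).items():
                scores[item] += weight * self._decayed(count, last, now)
        return sorted(scores, key=lambda item: (-scores[item], item))[:n]


cc = ClusterCounts(half_life_s=3600)
cc.hit(0, "old-article", now=0)
cc.hit(0, "old-article", now=0)
cc.hit(0, "new-article", now=7200)
cc.hit(1, "sport-article", now=7200)
recs = cc.recommend({0: 0.2, 1: 0.8}, now=7200, n=3)
```

After two half-lives the older article's two hits count for 0.5, so the single recent hit outranks it; this is the "activity dynamics" effect the decay is meant to provide.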


Tag Affinity

Combine metadata with trending articles

  • The model associates each user with weighted tags drawn from content.
  • At runtime, find trending articles whose tags are associated with the user.
  • Combine results for recommendations.
  • Can be used to find niche user clusters.
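A rough sketch of the runtime step: score each trending article by how strongly its tags match the user's tag weights. The multiplicative combination of affinity and trend score is an assumption for illustration, as are all names and numbers used.

```python
from typing import Dict, List, Set


def tag_affinity_scores(
    user_tags: Dict[str, float],
    trending: Dict[str, float],
    article_tags: Dict[str, Set[str]],
) -> Dict[str, float]:
    """Score trending articles by summed user tag weight times trend score."""
    scores: Dict[str, float] = {}
    for article, trend in trending.items():
        tags = article_tags.get(article, set())
        affinity = sum(user_tags.get(tag, 0.0) for tag in tags)
        if affinity > 0:  # skip articles sharing no tags with the user
            scores[article] = affinity * trend
    return scores


def rank(scores: Dict[str, float]) -> List[str]:
    return sorted(scores, key=lambda article: (-scores[article], article))


# Hypothetical model output and trending data.
user_tags = {"football": 0.7, "politics": 0.3}
trending = {"a1": 10.0, "a2": 50.0, "a3": 5.0}
article_tags = {
    "a1": {"football"},
    "a2": {"cooking"},
    "a3": {"politics", "football"},
}
scores = tag_affinity_scores(user_tags, trending, article_tags)
ranking = rank(scores)
```

Here the most popular article, "a2", is dropped because it shares no tags with the user, which is the point of combining metadata with trending data.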

Item Activity Correlation

Built for static or slowly changing historical inventory

  • Similar to Amazon’s “people who bought this also bought…”
  • Use historical user activity to find items that share similar user activity.
  • Apache Spark scalable offline implementation.
  • Upload for each item: top-N similar items.
  • For each user: item recommendations based on their historical activity.
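The idea can be illustrated in a few lines of single-machine Python. The production job runs on Apache Spark; this sketch only shows the logic. The similarity measure (cosine similarity over binary user-item activity) is an assumption, since the measure actually used is not stated here.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Similar = Dict[str, List[Tuple[str, float]]]


def top_n_similar(user_items: Dict[str, Set[str]], n: int) -> Similar:
    """For each item, the n items most often touched by the same users."""
    item_users: Dict[str, int] = defaultdict(int)
    co_counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for items in user_items.values():
        for item in items:
            item_users[item] += 1
        for a, b in combinations(sorted(items), 2):
            co_counts[(a, b)] += 1
    sims: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for (a, b), count in co_counts.items():
        score = count / math.sqrt(item_users[a] * item_users[b])
        sims[a].append((b, score))
        sims[b].append((a, score))
    return {i: sorted(v, key=lambda p: (-p[1], p[0]))[:n] for i, v in sims.items()}


def recommend(history: Set[str], similar: Similar, n: int) -> List[str]:
    """Sum similarities over the user's history, skipping items already seen."""
    scores: Dict[str, float] = defaultdict(float)
    for item in history:
        for other, score in similar.get(item, []):
            if other not in history:
                scores[other] += score
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


user_items = {
    "u1": {"tv", "hdmi"},
    "u2": {"tv", "hdmi", "mount"},
    "u3": {"tv", "mount"},
    "u4": {"kettle", "toaster"},
}
similar = top_n_similar(user_items, n=2)
recs = recommend({"hdmi"}, similar, n=2)
```

Only the per-item top-N lists need to be uploaded to the serving layer, which keeps the online lookup cheap.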


Topic Models

Built for sites needing long-tail recommendations

  • Assume activity is associated with a set of topics.
  • Each user's individual tastes are covered by a subset of topics.
  • Describe users by the set of keywords for the items they have interacted with.
  • Built with Apache Spark and Vowpal Wabbit implementation of Latent Dirichlet Allocation.
  • Online serving layer scores user association with items in real time.
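To give a feel for the online scoring step, the sketch below assumes that per-item topic distributions have already been produced offline. It then builds a user profile by averaging the topic vectors of items in the user's history. That averaging is a deliberate simplification for illustration, not LDA inference over the user's keywords, and all names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def user_topic_profile(
    history: Sequence[str], item_topics: Dict[str, List[float]]
) -> List[float]:
    """Average the topic vectors of the items the user interacted with."""
    vectors = [item_topics[item] for item in history if item in item_topics]
    if not vectors:
        return []
    k = len(vectors[0])
    return [sum(v[t] for v in vectors) / len(vectors) for t in range(k)]


def score_items(
    profile: List[float], item_topics: Dict[str, List[float]], exclude: Set[str]
) -> Dict[str, float]:
    """Dot product of the user profile with each candidate item's topics."""
    return {
        item: sum(p * q for p, q in zip(profile, topics))
        for item, topics in item_topics.items()
        if item not in exclude
    }


# Hypothetical two-topic model output: [music, cooking].
item_topics = {
    "jazz_album": [0.9, 0.1],
    "blues_album": [0.8, 0.2],
    "cookbook": [0.1, 0.9],
}
history = ["jazz_album"]
profile = user_topic_profile(history, item_topics)
scores = score_items(profile, item_topics, exclude=set(history))
```

Because scoring depends on topics rather than on shared users, rarely viewed long-tail items can still be matched to a user.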


Latent Factor Models

Best for e-commerce and other lower-churn sites

  • Based on the matrix factorization approach popularised by the Netflix Prize.
  • Use Matrix Factorization to reduce the activity matrix to two low-dimensional matrices: user factors and item factors.
  • Load factors into API servers and score users and items in real time.
  • Fold-in new users and items until next batch update of model.
  • Utilize Apache Spark MLlib and streaming modules.
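Fold-in means estimating factors for a new user while keeping the trained item factors fixed. One way to do this, shown here purely as a sketch, is a few passes of regularised gradient descent over the user's observed activity. The learning rate, regularisation strength and factor values are assumptions; the production system may solve this differently.

```python
from typing import Dict, List


def dot(u: List[float], q: List[float]) -> float:
    return sum(a * b for a, b in zip(u, q))


def fold_in_user(
    activity: Dict[str, float],
    item_factors: Dict[str, List[float]],
    reg: float = 0.1,
    lr: float = 0.05,
    epochs: int = 200,
) -> List[float]:
    """Fit a user vector to observed activity; item factors stay fixed."""
    k = len(next(iter(item_factors.values())))
    user = [0.0] * k
    for _ in range(epochs):
        for item, value in activity.items():
            q = item_factors[item]
            err = value - dot(user, q)
            # Gradient step on squared error with L2 regularisation.
            user = [u + lr * (err * qf - reg * u) for u, qf in zip(user, q)]
    return user


# Hypothetical item factors from the last batch model build.
item_factors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.0, 1.0],
}
new_user = fold_in_user({"a": 1.0}, item_factors)
score_b = dot(new_user, item_factors["b"])
score_c = dot(new_user, item_factors["c"])
```

The new user can be scored against every item immediately, and the folded-in vector is simply replaced when the next batch model is loaded.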


Association Rules

Suggest the next best item given the current set of items

  • A form of basket analysis that is useful in e-commerce to provide recommendations for which items could be added to a basket given a current set of items.
  • There are two Spark jobs that need to be run consecutively:
    • Basket Analysis: break up the actions event stream and process the add-to-basket and remove-from-basket events and create a set of session baskets.
    • Association Rules: process the baskets, find frequent itemsets using the Spark MLlib FP-Growth algorithm and create association rules.
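The two stages can be sketched in plain Python as follows. Note the substitution: the second stage here counts item pairs directly instead of running FP-Growth, so it only finds rules of the form "one item implies another". The event tuple layout and threshold values are assumptions for the example; the event names follow the description above.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[str, str, str]  # (session id, event type, item id), in time order
Rule = Tuple[str, str, float]  # (antecedent, consequent, confidence)


def build_baskets(events: Iterable[Event]) -> List[Set[str]]:
    """Stage 1: replay add/remove events into one final basket per session."""
    baskets: Dict[str, Set[str]] = defaultdict(set)
    for session, kind, item in events:
        if kind == "add-to-basket":
            baskets[session].add(item)
        elif kind == "remove-from-basket":
            baskets[session].discard(item)
    return [basket for basket in baskets.values() if basket]


def pair_rules(
    baskets: List[Set[str]], min_support: float, min_confidence: float
) -> List[Rule]:
    """Stage 2 (simplified): rules x -> y from frequent item pairs."""
    item_count: Dict[str, int] = defaultdict(int)
    pair_count: Dict[Tuple[str, str], int] = defaultdict(int)
    for basket in baskets:
        for item in basket:
            item_count[item] += 1
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] += 1
    rules: List[Rule] = []
    for (a, b), count in pair_count.items():
        if count / len(baskets) < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, confidence))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))


events = [
    ("s1", "add-to-basket", "phone"), ("s1", "add-to-basket", "case"),
    ("s2", "add-to-basket", "phone"), ("s2", "add-to-basket", "case"),
    ("s2", "add-to-basket", "charger"), ("s2", "remove-from-basket", "charger"),
    ("s3", "add-to-basket", "phone"),
    ("s4", "add-to-basket", "case"), ("s4", "remove-from-basket", "case"),
]
baskets = build_baskets(events)
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.8)
```

In this toy data every basket containing a case also contains a phone, so "case implies phone" passes the confidence threshold while the reverse rule does not.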

Content Similarity

Built for services with rich metadata and high sparsity

  • Requirement: a fast content-based technique to match user history to similar content based on the text/tags of content.
  • Utilize random vectors technique. Each word/tag is assigned a random high-dimensional vector.
  • Open-source Semantic-Vectors implementation.
  • Periodically process recent content into vectors and update servers.
  • Servers load vectors into memory.
  • Recommend based on recent user activity, finding similar content in real time.
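The random vectors idea can be shown with a small self-contained sketch. It does not use the Semantic-Vectors library: it assigns each tag a dense random vector of +1/-1 values, which is an assumption made to keep the example short. The dimension and tag names are likewise illustrative.

```python
import math
import random
from typing import Iterable, List

DIM = 512  # illustrative dimensionality


def tag_vector(tag: str) -> List[int]:
    """A reproducible random +1/-1 vector for a tag (seeded by the tag itself)."""
    rng = random.Random(tag)
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def content_vector(tags: Iterable[str]) -> List[int]:
    """A content item is represented by the sum of its tag vectors."""
    vec = [0] * DIM
    for tag in tags:
        for i, x in enumerate(tag_vector(tag)):
            vec[i] += x
    return vec


def cosine(a: List[int], b: List[int]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / norm if norm else 0.0


# The user's recent activity versus two candidate articles.
recent = content_vector(["football", "league"])
related = content_vector(["football", "transfer"])
unrelated = content_vector(["recipe", "pasta"])
sim_related = cosine(recent, related)
sim_unrelated = cosine(recent, unrelated)
```

Random high-dimensional vectors are nearly orthogonal to one another, so content sharing a tag scores clearly higher than content sharing none, and no training step is needed before new content can be vectorised.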

