When Substitution Models Go Wrong

One of the most critical challenges for retailers, particularly online, is determining which product pairs are substitutable. Machine learning approaches can identify these pairs, but they need to feed into assortment optimisation engines to generate useful outcomes.

The in-store shopping experience has long been the domain of brick-and-mortar retailers, but the rise of online shopping has revolutionised the sector and led online retailers to emulate many of the same tactics. Estimated to reach a whopping 6.54 trillion US dollars by the end of 2022, the global retail e-commerce industry has grown in leaps and bounds in the last few years. With multiple players competing for buyers’ attention, product recommendation is one of the most useful features for attracting customers and ensuring a constant flow of repeat business.

The cost of not having options

When you accumulate consumer purchase history, you can readily apply data analytics. This helps you identify relationships between different items, optimise item placement, track supply-chain movement and, lastly, forecast inventory needs. This is known as Retail Analytics.

For instance, when a customer orders a certain brand of bread and that product is out of stock on the day, it makes sense to substitute that item with a similar bread from another brand. Likewise, if a shopper goes to buy a t-shirt that is no longer available, offering substitutes gives them a viable alternative.

At scale, these can be high-revenue items, so offering suitable substitutes can significantly improve profitability. Alternative product recommendations help customers find the right products faster, which means a better experience for the buyer and protects revenue the retailer would otherwise have lost without a substitution engine. These substitutions can be achieved in two ways:

  1. Content-based recommendation
  2. Recommendation based on customer behaviour

In the first method, we replace an item with another that has similar attributes. Here, we can learn item embeddings and compare the similarity between items. The second method follows the shopping preferences of customers, in the style of item-to-item collaborative filtering.
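
Below is a minimal sketch of the content-based approach, assuming product attributes have already been encoded as embedding vectors; the item names and vectors are purely illustrative.

```python
import numpy as np

# Illustrative item embeddings: in practice these would be learned from
# product attribute data (brand, category, description text, etc.).
item_embeddings = {
    "wholemeal_bread_brand_a": np.array([0.9, 0.1, 0.0, 0.3]),
    "wholemeal_bread_brand_b": np.array([0.85, 0.15, 0.05, 0.25]),
    "white_bread_brand_c":     np.array([0.7, 0.4, 0.0, 0.2]),
    "ginger_beer":             np.array([0.0, 0.1, 0.95, 0.6]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def content_based_substitutes(item: str, top_k: int = 2) -> list[tuple[str, float]]:
    """Rank candidate substitutes for an out-of-stock item by attribute similarity."""
    query = item_embeddings[item]
    scores = [
        (other, cosine_similarity(query, emb))
        for other, emb in item_embeddings.items()
        if other != item
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]

print(content_based_substitutes("wholemeal_bread_brand_a"))
# The closest match (brand B wholemeal) is offered first; ginger beer ranks last.
```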

Nevertheless, this behaviour-based approach suffers from a cold-start problem. Newer algorithms help here, but a graph built purely from purchase behaviour will not cover most of the products. Data mining and machine learning on product attribute data can generate a best-fit substitute for almost every product.
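
As a rough illustration of how the two approaches can be combined for cold-start items, the sketch below falls back to the attribute-similarity helper from the previous sketch whenever an item has no purchase history; the behavioural lookup shown is an assumed placeholder, not a real data source.

```python
# Cold-start fallback: prefer behavioural (co-purchase) substitutes when they
# exist, otherwise fall back to attribute similarity.
# `content_based_substitutes` is the illustrative helper defined above;
# `behavioural_substitutes` is an assumed lookup built from purchase history.

behavioural_substitutes: dict[str, list[tuple[str, float]]] = {
    "wholemeal_bread_brand_a": [("wholemeal_bread_brand_b", 0.92)],
    # Newly listed items have no purchase history yet, so no entry here.
}

def recommend_substitute(item: str) -> list[tuple[str, float]]:
    """Prefer behaviour-based substitutes; use content similarity for cold-start items."""
    if item in behavioural_substitutes:
        return behavioural_substitutes[item]
    return content_based_substitutes(item)

print(recommend_substitute("white_bread_brand_c"))  # cold-start item -> content-based fallback
```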

Amazon is an excellent example: it uses item-to-item collaborative filtering recommendations across most pages of its website and in its email campaigns. A report by McKinsey suggests that 35% of Amazon purchases are driven by recommendation systems.
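
For reference, the sketch below shows the core of item-to-item collaborative filtering on a toy basket matrix: items are compared by how often the same customers purchase them. The basket data is invented for illustration, and a production substitution engine would combine this signal with further business logic, but the similarity computation itself looks like this.

```python
import numpy as np

# Rows are baskets, columns are items; a 1 means the item was in that basket.
items = ["bread_a", "bread_b", "butter", "ginger_beer"]
baskets = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
])

def item_item_similarity(purchases: np.ndarray) -> np.ndarray:
    """Cosine similarity between item purchase vectors (columns of the basket matrix)."""
    norms = np.linalg.norm(purchases, axis=0, keepdims=True)
    normalised = purchases / np.clip(norms, 1e-12, None)
    return normalised.T @ normalised

sim = item_item_similarity(baskets)
query = items.index("bread_a")
ranked = sorted(
    ((items[j], sim[query, j]) for j in range(len(items)) if j != query),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)  # items bought by the same customers as bread_a rank highest
```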

With so much choice and so many alternative vendors, consumers want a frictionless buying experience and to purchase at their convenience. But when fast-moving goods combine with inaccurate demand forecasting, substitution offerings need to be provided for the items that are not available. If you are unable to satisfy orders, you run the risk of the customer exiting without completing the purchase and fulfilling their entire order elsewhere.

When substitution models go wrong

Substitution models can go wrong for various reasons when they are not managed properly, and these problems stem from not having a suitable machine learning framework. Models are trained and served into production but not monitored and optimised, so when errors start to happen, Data Scientists don’t have the tools they need to understand why. This slows down the feedback process and makes it hard to even know where to start.

Poor substitute products are often called out by customers on social media, as this Daily Mail article explains. Here are some examples of where companies have made errors:

  • Hair dye instead of nappies
  • Batteries being sent in a different size
  • Daffodils being sent in place of Cadbury’s Creme Eggs
  • 3kg of courgettes instead of 3 loose courgettes
  • Bottle of ginger beer sent instead of a panini

Creating a machine learning framework to deploy and scale substitution models

Prior to deploying models into production, Data Scientists will train and test models until they perform to an acceptable level of accuracy. However, the minute models are made live they start to decay and perform at a lower level, which is why errors start occurring. It’s important to recognise and remember that machine learning isn’t a tick-box activity of training and deployment, but rather an ongoing feedback loop to unlock and maintain business potential. As the retailer’s machine learning operation scales (i.e. the number of models and applications it integrates with in production increases), it becomes more challenging to manage, maintain and organise this environment to ensure ongoing accuracy and performance.

Over time, models will drift. Concept drift is a specific type of model drift, and can be understood as a change in the relationship between the input and the target output. The properties of the target variable may evolve over time, and because the model has been trained on static training data, this evolution can negatively impact its accuracy. For example, the growing popularity of vegan food may affect the accuracy of meat and dairy substitutes: a vegan may receive a meat product as a substitute for their meat-replacement item. Or, changing fashion trends and buying behaviour might make substitution items less relevant. To ensure ongoing accuracy, best practice is to implement drift detection and alerts that notify owners when models start to perform at a lower level, so the issue can be dealt with efficiently and the feedback loop accelerates. Models can then be put into the retraining phase and redeployed.
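
As a minimal sketch of drift detection, the example below compares the distribution of a logged feature at training time with a live window using a two-sample Kolmogorov-Smirnov test. The data and the 0.05 threshold are synthetic and purely illustrative; open-source libraries such as Seldon’s Alibi Detect provide richer detectors built on the same principle.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=0)

# Distribution of a feature (e.g. share of plant-based items per basket)
# at training time vs. in live traffic after preferences have shifted.
training_window = rng.normal(loc=0.10, scale=0.05, size=5_000)
live_window = rng.normal(loc=0.25, scale=0.07, size=5_000)

statistic, p_value = ks_2samp(training_window, live_window)

if p_value < 0.05:
    # In practice this would raise an alert to the model owner and trigger
    # the retraining / redeployment part of the feedback loop.
    print(f"Drift detected (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected")
```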

Another potential issue to monitor is anomalies: values that deviate significantly from the observations the model was trained on. In short, the algorithm has no reliable answer to the query because it falls outside the ranges of the model’s training data. Again, this can be managed with alerts: anomalous requests are flagged and a team member can handle the substitution manually, which, although a manual process, ensures errors are not made. The Data Scientist can then retrain the model on data that accounts for the anomaly, so that future cases run like clockwork.
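
A simple version of this check can be as basic as comparing incoming requests against the feature ranges seen in training, as in the illustrative sketch below; the feature names and bounds are assumptions for the example.

```python
from dataclasses import dataclass

@dataclass
class FeatureRange:
    low: float
    high: float

# Per-feature ranges recorded when the substitution model was trained.
training_ranges = {
    "pack_size_kg": FeatureRange(low=0.1, high=2.0),
    "unit_price_gbp": FeatureRange(low=0.5, high=40.0),
}

def anomalous_features(request: dict[str, float]) -> list[str]:
    """Return the names of features that fall outside the training ranges."""
    return [
        name
        for name, value in request.items()
        if name in training_ranges
        and not (training_ranges[name].low <= value <= training_ranges[name].high)
    ]

# A 3 kg courgette order is outside anything seen in training, so it is
# routed to a human instead of the model choosing a substitute.
flagged = anomalous_features({"pack_size_kg": 3.0, "unit_price_gbp": 2.5})
if flagged:
    print(f"Route to manual substitution, anomalous features: {flagged}")
```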

Finally, explainability is most prevalent in highly regulated industries (such as financial services), where regulation imposes compliance requirements and time constraints. However, for a more meaningful feedback loop, explainability within machine learning enables Data Scientists in the retail industry to understand why models perform the way they do, and in this case, why certain substitute products were chosen. This level of qualitative insight reduces the retraining period and improves the feedback process, because the Data Scientist understands the performance in more depth and knows how and where to address an issue early on, rather than just knowing that the model needs retraining.
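
For a content-based substitution model, even a simple attribution can answer “why was this substitute chosen?”. The sketch below decomposes a dot-product similarity into per-attribute contributions; the attribute names and values are illustrative, and model-agnostic explainers (such as Seldon’s open-source Alibi library) generalise the idea to more complex models.

```python
import numpy as np

# Interpretable attribute vectors for the requested item and its substitute.
attributes = ["is_bread", "is_wholemeal", "brand_a", "brand_b"]
requested = np.array([1.0, 1.0, 1.0, 0.0])   # wholemeal bread, brand A
substitute = np.array([1.0, 1.0, 0.0, 1.0])  # wholemeal bread, brand B

# Per-attribute contribution to the dot-product similarity score.
contributions = requested * substitute
for name, value in sorted(zip(attributes, contributions), key=lambda p: -p[1]):
    print(f"{name}: {value:.2f}")
# The match was driven by 'is_bread' and 'is_wholemeal', not by brand,
# which explains why a different brand was offered as the substitute.
```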

In a competitive environment where consumers have a wealth of choice at their fingertips, it’s more important than ever that retailers can offer a better experience and relevant alternatives to the products of choice. In a one-off scenario, a poor substitute may at worst lose you a customer; at scale, getting substitutions right can maintain or increase your average order values and ensure long-lasting relationships with your customers.

Ensure optimal substitution model accuracy with Seldon

Seldon moves machine learning from proof of concept to production, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon empowers retailers to deploy, monitor and explain substitution models to ensure accuracy, satisfy customers and accelerate business outcomes.

Take your machine learning to the next level of maturity with Seldon by speaking with a member of our team today.
