What is Reinforcement Learning?

Reinforcement learning is a method of training machine learning models through trial, error, and feedback. The model is given a goal and a list of known actions. Its input is a measurement of its environment and current state, and its output is an action that moves it between states. A positive or negative reward signal follows each action. The aim is for the model to establish the optimum sequence of actions to achieve its given goal.

Reinforcement learning is one of three main types of machine learning approach alongside supervised and unsupervised machine learning. It’s used to train models to perform specific tasks or achieve defined goals in a given environment. The model will interact with its surroundings or environment, performing actions to move between different states. Actions are then either positively or negatively reinforced through reward or penalty signals. Successful actions are reinforced, and unsuccessful actions are penalised. A model will go through many different iterations to find the best possible sequence of actions to achieve a given goal.  

Reinforcement learning is already used in many different sectors and scenarios. Its strength is in dynamically training a system to perform specific actions in an environment. It is used in the development of self-driving car technology, and it powers game-playing AI such as chess bots. The technique can also be used to optimise processes and strategic decisions, for example when a model has the goal of making efficiency savings in a system.

This guide explores the topic of reinforcement learning in machine learning, including how it works, examples of how it’s used, and its unique strengths as an area of machine learning. 

How does reinforcement learning work?

Reinforcement learning in machine learning works through a feedback loop, in which a model performs an action and receives a reward signal. The model learns to choose the path to a goal that earns the most rewards and the fewest penalties. It is a trial and error approach to machine learning: the model cycles through many iterations until it reaches the best outcome it can find. The feedback loop allows the model to adjust its next action, reinforcing successful actions and penalising unsuccessful ones.
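As a rough illustration of this feedback loop, the Python sketch below lets a model choose repeatedly between three actions with fixed, made-up rewards. The function name run_feedback_loop, the reward values, the exploration rate epsilon, and the optimistic starting estimate are all assumptions of this example, not part of any particular library.

```python
import random

def run_feedback_loop(rewards, steps=200, epsilon=0.1, initial_estimate=2.0, seed=0):
    """Estimate the value of each action from repeated trial and feedback."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    # Assumption: estimates start above any real reward, so the model is
    # drawn to try every action at least once before settling.
    estimates = [initial_estimate] * n_actions
    counts = [0] * n_actions
    for _ in range(steps):
        if rng.random() < epsilon:
            # Trial and error: occasionally try a random action.
            action = rng.randrange(n_actions)
        else:
            # Otherwise repeat the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = rewards[action]  # feedback signal (fixed per action here)
        counts[action] += 1
        # Running average: move the estimate towards the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical rewards: action 0 is penalised, action 2 is rewarded most.
estimates, counts = run_feedback_loop(rewards=[-1.0, 0.5, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

After enough iterations the estimates reflect the feedback each action received, and the model favours the action with the highest reward.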

There are two types of feedback signal used in reinforcement learning. Positive reinforcement rewards the model for an action that has a positive outcome, so the action is more likely to be repeated or learned from in future. Negative feedback penalises the model when it takes the wrong action or decision, which makes that action less likely and steers the model towards the right course of action.

In reinforcement learning the model has an overall goal it needs to achieve, which could be to maximise efficiency in a system. An important consideration is selecting the domain of the reinforcement learning model. This is the type of input the model will use to understand its environment and current state. For example, when training the software behind a self-driving vehicle, the input could be live video, GPS data, and information from motion sensors.

The model’s environment is the set of constraints it faces in a given area, such as the rules of a game or spatial data. The model moves from state to state within this environment. A state is a measurement of where the model is at a given moment, for example the current position of the pieces on a chess board. The model performs an action, reaches a new state, and receives a reward, which is an evaluation of its last action. Decisions are generally based on achieving the highest possible reward.
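To make the terms environment, state, action, and reward concrete, here is a minimal Python sketch of a toy environment. The Corridor class, its reward values (+10 at the goal, -1 per other step), and the step method are hypothetical names and numbers chosen for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Corridor:
    """Hypothetical environment: positions 0..length-1, goal at the right end."""
    length: int = 5
    state: int = 0  # where the model currently is

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        # The environment's rules keep the model inside the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        # Feedback: +10 for reaching the goal, a -1 penalty for any other step.
        reward = 10 if done else -1
        return self.state, reward, done

env = Corridor()
total_reward = 0
done = False
while not done:
    state, reward, done = env.step(+1)  # a fixed strategy: always move right
    total_reward += reward
```

Here the model takes four steps, paying a penalty of 1 on each of the first three and earning 10 on the last, for a total reward of 7.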

The model is therefore incentivised to complete a given goal in the best possible way. Other machine learning approaches learn patterns and predict trends from training data, which is labelled in the case of supervised learning. Reinforcement learning, on the other hand, is used for more strategic tasks, such as choosing the best next move in chess or optimising a supply chain.

Examples of reinforcement learning

Reinforcement learning is already used across a range of real-world settings. Its strengths lie in learning and improving strategic processes, whether the aim is to achieve a specific objective or to optimise a series of actions. Examples include playing games like chess and training systems to play as characters in video games. It’s also used in robotics to train systems to move from A to B effectively.

Reinforcement learning currently has the most success in closed systems or environments. Within a game or simulation, the defined parameters and constraints of the environment make trial and error learning more straightforward. Models driven by reinforcement learning are already meeting or surpassing human ability in games like chess or Pong. As the approach improves, it should cope better with more complex actions and goals.

Reinforcement learning is also being leveraged to improve and optimise strategic tasks. For example, a model could be created to optimise the logistics of a supply chain. Given a clear goal to improve efficiency of logistics, the model can test a huge array of known actions to optimise the process. Another example may be in buying and selling stocks. A model can be trained with the goal of maximising profit, automating the buying and selling process. 

Common examples of reinforcement learning in machine learning include:

  • Developing a system to play games such as chess or Pong.
  • Training of driverless car systems. 
  • Optimising chatbots and virtual assistants. 
  • Optimising strategic systems in businesses and organisations through a reward system.
  • As a technique to train deep neural networks.  
  • Making efficiency savings within systems and improving processes. 

The reinforcement learning process

A model built with reinforcement learning techniques can often be categorised as a black box machine learning model. This means a human observer can see the input and output of the model, but may not understand the functions within it. However, the steps for setting up a reinforcement learning model are clear-cut. The first step is to consider the type of domain the model will operate in. This is important for defining the type of input the model will have, whether that’s visual data or GPS data, for example. The input data allows the model to understand its environment and current state.

All possible actions that the model can take need to be defined in advance. The model will choose from this list of possible actions in a given state. Examples could be accelerating, braking, and steering in driverless car technology, or mapping a controller’s input in a video game. These are the actions at the model’s disposal to move between states and complete its goal. Actions performed by the model are judged by how far they move it towards a specific goal. The model aims to learn the series of actions that completes a task or objective in the optimum way.
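A fixed list of actions can be written down explicitly. The Python sketch below uses the driving example; the Action names, the choose_action helper, and the value estimates are hypothetical and only illustrate picking the best-rated action in a single state.

```python
from enum import Enum

class Action(Enum):
    """Hypothetical action list for a simplified driving model."""
    ACCELERATE = "accelerate"
    BRAKE = "brake"
    STEER_LEFT = "steer_left"
    STEER_RIGHT = "steer_right"

def choose_action(action_values):
    """Return the action with the highest estimated value in the current state."""
    return max(action_values, key=action_values.get)

# Made-up value estimates for one state, e.g. an obstacle directly ahead.
values_in_state = {
    Action.ACCELERATE: -5.0,
    Action.BRAKE: 2.0,
    Action.STEER_LEFT: 0.5,
    Action.STEER_RIGHT: 0.5,
}
chosen = choose_action(values_in_state)
```

In practice the value estimates are what the model learns through feedback; here they are hard-coded to keep the example short.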

The process of reinforcement learning includes: 

  • Choosing an appropriate domain of input. This includes the type of input data the model will process. 
  • Setting the type of feedback the model will receive and its weighting.  
  • Setting the overall goal of the model, whether that’s to optimise a process or perform a task. 
  • Performing the feedback loop in which the model performs an action in a given state to achieve a goal, before receiving positive or negative feedback on the action.  
  • Repeating the loop as the model moves from state to state until it reaches its end goal.

The process will be repeated many times to optimise the actions or steps. This will often be done in a simulated environment to increase the efficiency of tests. 
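The steps above can be sketched end to end with tabular Q-learning, one common reinforcement learning algorithm. It is used here as an example and is not the only way to implement the process. The toy corridor environment, the reward values, and the settings alpha, gamma, and epsilon are assumptions chosen for illustration.

```python
import random

N_STATES = 5        # input domain: positions 0..4, with position 4 as the goal
ACTIONS = [-1, +1]  # known actions: step left or step right

def step(state, action):
    """Simulated environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    # Feedback weighting: +10 for reaching the goal, -1 for every other step.
    reward = 10.0 if done else -1.0
    return next_state, reward, done

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Repeat the feedback loop over many episodes and return the value table."""
    rng = random.Random(seed)
    # q[state][i] estimates the long-term reward of taking ACTIONS[i] in state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore a random action
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Target: immediate reward plus discounted value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Best learned action for each non-goal state.
policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q_table[s][i])]
          for s in range(N_STATES - 1)]
```

After training, the learned policy moves right from every position, which is the shortest route to the goal in this toy environment.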

The strengths of reinforcement learning

The strength of reinforcement learning is in its ability to optimise a sequence of actions to complete a task in the best possible way. Models teach themselves through trial and error, which can produce behaviour that is more complex and flexible than a human programmer could specify by hand. There are many real-world applications for reinforcement learning models, focusing on automating complex tasks or optimising workflows.

Models can be used to experiment with the best course of action to achieve a goal. Models are often initially trained in simulated environments, which greatly improves the speed and scale of training. Models can be trained in parallel in simulated environments, rapidly cycling through new iterations which improve with each evolution. The potential for trial and error evolution in a simulated environment is only capped by computing power, which is improving all the time. 
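As a small sketch of running simulations side by side, the Python example below starts several independent simulated runs with different random seeds and keeps the best result. The simulate_run function and its reward rule are hypothetical. Threads are used only to keep the example self-contained; real training more often uses separate processes or machines.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_run(seed, steps=50):
    """One hypothetical simulated run; returns (seed, total_reward)."""
    rng = random.Random(seed)  # each run has its own random stream
    position, total = 0, 0.0
    for _ in range(steps):
        position += rng.choice([-1, 1])  # a random trial action
        # Made-up feedback: reward time spent to the right of the start.
        total += 1.0 if position > 0 else -1.0
    return seed, total

seeds = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in the same order as the seeds.
    results = list(pool.map(simulate_run, seeds))

best_seed, best_total = max(results, key=lambda r: r[1])
```

Because each run is independent, adding more workers or machines increases the number of trials that can be completed in the same amount of time.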

Reinforcement learning models get better and better at performing their tasks through these different cycles. As machine learning is scalable with the right resources, this means a model can cycle through a huge range of iterations, experimenting with the best possible actions for different states. Models therefore have the potential to perform actions much better than human counterparts.  

Other machine learning approaches, such as supervised or unsupervised machine learning, have different strengths in application, for example recognising and categorising objects and images, or clustering data around similar features. The strength of reinforcement learning, on the other hand, is in problem solving: finding the best sequence of known actions to achieve a goal. This makes it a powerful tool for automating complex tasks such as driving a car, and also for solving strategic problems like optimising supply chains.

The main strengths of reinforcement learning include: 

  • Used to select workflows and actions to achieve a specific goal. 
  • Models can be trained in parallel, with each iteration improving and optimising the process. The result is a volume and scale of training scenarios that surpass any that an individual human could experience.  
  • Very powerful in domains with clear rules and boundaries, like games or simulations.
  • Models don’t need to understand what an environment is. They learn by applying a range of actions within a given state.
  • Models learn and improve in simulated environments, increasing the speed and scope of training. 

Reinforcement learning vs supervised learning

Supervised learning relies on training data in which the input and output are clearly labelled. The model learns trends and patterns from the training data and applies them to new and unseen data. Supervised machine learning models are usually deployed to perform either classification or regression tasks. Classification is the categorisation of objects or data against learned features, for example in facial recognition software. Models are trained on labelled input and output data to recognise and classify similar objects or data.

Regression is the prediction of continuous outcomes. This means forecasting trends in data, such as predicting stock price changes in finance. Having learned the relationship between input and output data, the model can predict the outcome for new data. This can be leveraged to forecast future trends, or to fill in gaps in historic data.
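For contrast with the reinforcement learning loop, here is a minimal sketch of supervised regression in Python: a straight line is fitted to labelled input and output pairs with ordinary least squares, then used to predict an unseen input. The fit_line helper and the data values are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Labelled training data (made up): each input is paired with a known output.
inputs = [1.0, 2.0, 3.0, 4.0]
outputs = [2.0, 4.0, 6.0, 8.0]

slope, intercept = fit_line(inputs, outputs)
prediction = slope * 5.0 + intercept  # predicted output for the unseen input 5.0
```

The model here learns directly from known answers, whereas a reinforcement learning model has no labelled outputs and learns only from the feedback its actions receive.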

Reinforcement learning, on the other hand, doesn’t use labelled input and output data. Instead, the model learns by performing an action and receiving feedback. The approaches therefore differ in training technique as well as in final application.

Machine learning deployment for every organisation

Seldon moves machine learning from POC to production to scale, reducing time-to-value so models can get to work up to 85% quicker. In this rapidly changing environment, Seldon can give you the edge you need to supercharge your performance.

With Seldon Deploy, your business can efficiently manage and monitor machine learning, minimise risk, and understand how machine learning models impact decisions and business processes. This means you know your team has done its due diligence in creating a more equitable system while boosting performance.

Deploy machine learning in your organisation effectively and efficiently. Talk to our team about machine learning solutions today.
