The ever-growing presence of technology in our daily lives has sparked discussion about its environmental impact. Being able to measure, control, and ultimately reduce the carbon footprint of your product or service is not only a responsible stance to take; it can also bring tangible benefits to the business.
Quantifying ML model emissions
Unsurprisingly, carbon emissions are also a hot topic in the machine learning community. Machine learning models require lots of memory and compute resources, all of which consume power that contributes to a global digital carbon footprint.
The ML community recognized the challenge and started looking into quantifying the emissions associated with ML models. A good example is the 2019 paper by a group from the University of Massachusetts Amherst that examined deep learning NLP models and compared the emissions of their training pipelines with everyday benchmarks, such as air travel and an average human life.
Since then, researchers worldwide have made growing efforts to better understand machine learning's contribution to global climate change. These include thorough blog posts, talks, papers, and even dedicated tools to track the CO2 emissions of research.
Large Language Models
These concerns are only intensifying now that Large Language Models (LLMs), such as OpenAI's ChatGPT, are rising to prominence. LLMs are built on neural networks with millions, billions, or even trillions of parameters, and they take considerable time and energy to train before they can deliver their impressive results.
For instance, the emissions involved in training GPT-3 are comparable to those of three round trips between San Francisco and New York on a passenger plane.
However, there is a growing discrepancy in this research effort. The vast majority of research on ML carbon emissions focuses on model training. At the same time, multiple sources estimate that when the ML deployment pipeline is considered as a whole, inference consumes the majority of compute resources, accounting for anywhere from 70% to 90% [1, 2, 3, 4].
And yet research quantifying the ecological implications of ML inference is surprisingly scarce, perhaps because it is much harder to do.
In 2021, Facebook (now Meta) reported the breakdown of data center capacity for ML-related tasks to be approximately 10:20:70 for Experimentation, Training and Inference phases respectively [source].
We believe that this discrepancy represents an exciting opportunity to make a real difference: a research opportunity to expand our understanding of the impact an ML model may have on the environment.
An opportunity to reduce environmental impact
More importantly, it is also an opportunity to decrease that impact by focusing our efforts on the right things. If inference accounts for 90% of a model's energy consumption and the remaining 10% goes to training, then reducing inference energy consumption by a quarter more than offsets the negative impact of doubling the training time.
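This trade-off can be sketched with a back-of-the-envelope calculation. The 10/90 training/inference split is an assumption taken from the estimates above; the scenario doubles training energy while cutting inference energy by a quarter:

```python
# Illustrative calculation, assuming a model's lifetime energy is
# split 10% training / 90% inference (normalized so baseline = 1.0).
TRAIN_SHARE = 0.10
INFERENCE_SHARE = 0.90

baseline = TRAIN_SHARE + INFERENCE_SHARE

# Scenario: training energy doubles, inference energy drops by a quarter.
scenario = 2 * TRAIN_SHARE + 0.75 * INFERENCE_SHARE

print(f"baseline: {baseline:.3f}")  # 1.000
print(f"scenario: {scenario:.3f}")  # 0.875
```

Under these assumed shares, the scenario still comes out 12.5% below the baseline: the extra 10 units spent on training are outweighed by the 22.5 units saved on inference.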
This is why, with the new release of Seldon Core, we focused so much on making our inference pipeline as efficient as possible. With features such as multi-model serving, overcommit, and autoscaling, users can utilize resources in the most efficient way possible, with a positive impact on both their infrastructure costs and the environment.
There is a lot more to be done in this area. Technical reports from businesses about their energy consumption for training and inference would be extremely valuable. Should such data become available, it could help facilitate focused research leading to the development of tools and techniques to measure and control the carbon footprint of inference.
We are excited to see further progress in reducing the energy consumption of ML inference, as the greenest energy is the energy not used.
Andrei is currently pursuing a PhD at the University of Cambridge. His research interests sit somewhere between machine learning and software systems, leaning towards the latter. He also has a keen interest in Bayesian optimization and actively participates in several open source projects. Before jumping into the world of academia, he spent more than a decade as a software engineer, developing everything from small web apps to data center network software. In collaboration with Seldon, Andrei explores ways to turn the latest research ideas into practical, production-ready MLOps infrastructure.