We are delighted to announce new research that will give users of Alibi Detect control over which changes to their data distribution should – and should not – be identified as drift. The research introduces a new class of ‘context-aware’ drift detectors and has been integrated into Alibi Detect as part of this week’s 0.9.0 release.
A problem that prevents the application of drift detectors to many important use cases is that the data sets on which models are trained often contain a much richer variety of contexts than any given window of recent deployment data. Drift detectors notice this discrepancy between the training and deployment data and raise an alarm, even when the narrowing of context is expected and known to be well handled by the model. Practitioners often lower the sensitivity of their detectors to reduce the number of unwanted alarms. However, this then hampers the detectors’ responsiveness to the potentially costly or dangerous drifts of interest.
Context-aware drift detectors allow practitioners to augment their data samples with associated context variables. The detector then only detects differences between the training and deployment data that cannot be attributed to differences between the two sets of associated contexts.
As a concrete example consider a computer vision model operating on images of animals. The distribution underlying the images changes depending on the time of day, throughout which lighting conditions change. At night a window of recent deployment data would contain only dark images. The underlying distribution is therefore different to that of the training data, which also contains daytime images. A conventional drift detector would pick up on this and raise alarms all through the night.
In this situation the time of day forms important context which the practitioner is happy to let vary between windows of data. By passing timestamps as context variables to a context-aware drift detector, the practitioner can instead focus exclusively on detecting distribution changes that cannot be attributed to a narrowing of the time-of-day context. For example, suppose only wolves and owls were observed at nighttime during training. Nighttime deployment windows containing only images of owls and wolves would be deemed perfectly acceptable by the detector, whereas windows also containing images of dogs would cause drift to be detected.
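The intuition behind this example can be sketched numerically. The snippet below is a deliberately simplified toy, not the detector's actual test statistic: it reduces each image to a single invented "brightness" feature and compares means within the matching context stratum, whereas the real detector performs a proper conditional two-sample test. All names and numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "brightness" feature: daytime images are bright, nighttime images dark.
day_ref = rng.normal(0.8, 0.05, 500)    # reference data captured in daytime
night_ref = rng.normal(0.2, 0.05, 500)  # reference data captured at night
x_ref = np.concatenate([day_ref, night_ref])
c_ref = np.array([0] * 500 + [1] * 500)  # context: 0 = day, 1 = night

# Deployment window: nighttime only, drawn from the same night distribution.
x = rng.normal(0.2, 0.05, 200)
c = np.ones(200, dtype=int)

def mean_gap(a, b):
    return abs(a.mean() - b.mean())

# Naive comparison against the whole reference set: large gap, so a
# conventional detector would flag drift every night.
naive_gap = mean_gap(x, x_ref)

# Context-aware comparison: only the matching (night) reference stratum is
# relevant, and within it the deployment data looks perfectly ordinary.
context_gap = mean_gap(x, x_ref[c_ref == 1])

print(f"naive gap: {naive_gap:.2f}, context-conditioned gap: {context_gap:.2f}")
```

The naive gap is large purely because the deployment window narrowed to night, while the context-conditioned gap stays near zero, which is exactly the distinction the detector formalises.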
In this example the context variable was informed by domain-specific knowledge and was separate from the data itself. Other examples might include weather conditions or user demographic factors such as age. However, we can also specify the context to be a transformation of the data, such as the model’s predictions. The model’s predictions are then allowed to change, as long as the change in predictions is accompanied by the expected change in the data. This would, for example, permit a change to model predictions caused by deployment data focusing on a subset of the training data distribution. It would not, however, permit a change to model predictions caused by the emergence of a new subpopulation not observed at training time.
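The predictions-as-context idea can be sketched in the same toy fashion. Again this is a minimal illustration under invented assumptions, not the library's implementation: the "model" is just a sign threshold on a one-dimensional feature, and the comparison is a within-stratum mean gap rather than the detector's actual conditional test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Training data: two subpopulations with distinct feature values.
x_ref = np.concatenate([rng.normal(-2, 0.3, 400), rng.normal(2, 0.3, 400)])

# A stand-in "model": predicts class 1 whenever the feature is positive.
def predict(x):
    return (x > 0).astype(int)

c_ref = predict(x_ref)  # context = model predictions on the reference data

def context_gap(x, c, label):
    # Compare deployment vs. reference within the matching prediction stratum.
    return abs(x[c == label].mean() - x_ref[c_ref == label].mean())

# Case 1: deployment narrows to the positive subpopulation. Predictions shift
# to all 1s, but the inputs match what class-1 inputs looked like in training.
x1 = rng.normal(2, 0.3, 200)
gap1 = context_gap(x1, predict(x1), 1)

# Case 2: a new subpopulation (feature around 0.7) is also predicted as
# class 1, but its inputs do not resemble the reference class-1 inputs.
x2 = rng.normal(0.7, 0.3, 200)
gap2 = context_gap(x2, predict(x2), 1)

print(f"narrowed subset gap: {gap1:.2f}, new subpopulation gap: {gap2:.2f}")
```

Conditioned on the prediction, the narrowed deployment window looks unremarkable, while the new subpopulation produces inputs that are inconsistent with their predicted class and so should be flagged as drift.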
There exist many other useful ways in which context variables might be defined, such as those demonstrated in our examples on text (news) and time-series (ECGs). We are excited to see how users of Alibi Detect capitalise on the flexibility offered by this new detector. For usage details see our documentation, and for full technical details refer to Cobb and Van Looveren (2022).
As an applied machine learning researcher at Seldon, Oliver’s research focuses on addressing various technical challenges that arise from users’ desire to deploy their models in a robust and responsible manner. Stemming from a broader interest in maths and statistics, Oliver has five years of experience in machine learning research during which he has published at top conferences including ICML, ICLR and AISTATS.