Exploring The Power of Causal Inference in Machine Learning

Samuel Kehinde Ayo
6 min read · Feb 13, 2023

Sam-Ayo giving a talk on Artificial Intelligence at Ehingbeti summit, Lagos, Nigeria

If you’re a data die-hard like me, who tires of the norm easily, you must have started asking yourself: “What happens beyond relationship modeling? Machine learning is good at predicting future outcomes based on correlations in past data, but how do we go one step further?”

What am I even saying? You landed here because you’re more of a die-hard than I thought.

To begin this exploration, let’s define some terms.

What is machine learning?

Machine learning is a subfield of artificial intelligence (AI), which is broadly defined as the capability of a machine to imitate intelligent human behavior.

It enables a program to learn and uncover patterns in historical data, so that it can make predictions without being explicitly programmed to do so.

In a dataset for machine learning experimentation, we have X, the feature set, and y, the target set.

The machine learning algorithm trains on the relationship that exists between X and y, so that it can make predictions for unseen values of X.
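To make the X and y idea concrete, here is a minimal sketch in plain Python. The data is simulated, and the helper `fit_line` and the numbers 3.0 and 2.0 are made up for illustration; a real project would more likely reach for a library.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope

rng = random.Random(0)

# Hypothetical historical data: y depends on X plus some noise.
X_train = [rng.uniform(0, 10) for _ in range(200)]
y_train = [3.0 + 2.0 * x + rng.gauss(0, 1) for x in X_train]

# "Training" means learning the relationship between X and y.
intercept, slope = fit_line(X_train, y_train)

# Prediction means applying that learned relationship to unseen X.
X_new = [2.5, 7.0]
y_pred = [intercept + slope * x for x in X_new]
print(f"intercept={intercept:.2f}, slope={slope:.2f}, predictions={y_pred}")
```

Notice that nothing here says why y moves with X. The model only stores the pattern it saw.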

Machine learning is good at predicting future outcomes based on correlations in historical data, but it sucks because it predicts outcomes based only on correlation.

What this means is that it projects the correlations between variables in the past into the future, and in the future those variables may not be (strongly) correlated, or they may be confounded by something else.

Basically, machine learning learns from correlations in the data.

It projects the correlations, biases, and patterns of the past to give you a picture of what the future will look like, but that picture is shaped by information coming from the past.

This has a downside: it does not robustly account for uncertainties outside the training dataset that may confound the impact of the input variables on the output to be predicted.
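A small simulation shows how this can bite. It is only a sketch on made-up data: a hidden factor `z` drives both `x` and `y` in the past, while `x` itself has no effect on `y`. When `x` stops following `z` in the future, the correlation the model relied on is gone.

```python
import random

def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(42)
n = 5000

# Past: the hidden factor z drives both x and y. x does NOT cause y.
z_past = [rng.gauss(0, 1) for _ in range(n)]
x_past = [z + rng.gauss(0, 0.5) for z in z_past]
y_past = [z + rng.gauss(0, 0.5) for z in z_past]

# Future: x is now set independently of z (say, after a policy change).
z_future = [rng.gauss(0, 1) for _ in range(n)]
x_future = [rng.gauss(0, 1) for _ in range(n)]
y_future = [z + rng.gauss(0, 0.5) for z in z_future]

corr_past = correlation(x_past, y_past)
corr_future = correlation(x_future, y_future)
print(f"past correlation: {corr_past:.2f}, future correlation: {corr_future:.2f}")
```

A model trained on the past data would happily use `x` to predict `y`, and it would be wrong the moment the hidden link through `z` breaks.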

So, it is limited.

It is restrictive in how it can serve decision makers, and of course it does not let decision makers know the quality of their decisions. From machine learning predictions, the decision maker only gets an idea of what the future looks like; the burden of deciding exactly what to do remains.

Machine learning does not help the data scientist communicate to stakeholders a clear analysis of the pathways that lead to an outcome.

What really is causal inference?

Causal inference is the process of determining the causal relationships between variables in a system. In other words, it is the study of how changes in one variable (the cause, or treatment) affect another variable (the effect, or outcome). This is an important aspect of understanding the underlying mechanisms of a system and making predictions based on this understanding.

Causal inference in machine learning takes a different approach from traditional machine learning. It involves using experimental and observational data to determine the causal relationships between variables. This information is then used to build models that not only make predictions but also provide insights into the causal relationships between variables.

Causal queries can be categorized into hypothetical causation (predictive) and counterfactual causation (explanatory).
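Here is a toy sketch of the two kinds of query, reading "hypothetical" as the forward-looking "what happens if we do this?" and "counterfactual" as the backward-looking "what would have happened otherwise?". The structural equation and every number in it are assumptions made up for illustration.

```python
# Hypothetical structural model: outcome = 2 * treatment + background.
EFFECT = 2.0

def outcome(treatment, background):
    """Outcome produced by a treatment level and a unit's background factors."""
    return EFFECT * treatment + background

# Hypothetical (forward-looking) query: what happens if we set treatment to 1?
# We answer on average over background factors we have not observed.
backgrounds = [-1.0, 0.0, 1.0]
avg_if_treated = sum(outcome(1, b) for b in backgrounds) / len(backgrounds)

# Counterfactual (backward-looking) query for one unit we did observe.
observed_treatment, observed_outcome = 1, 3.0
# Step 1: recover this unit's background from what actually happened.
background = observed_outcome - EFFECT * observed_treatment
# Steps 2 and 3: replay the same unit with treatment set to 0 instead.
would_have_been = outcome(0, background)

print(avg_if_treated, would_have_been)  # prints: 2.0 1.0
```

The first answer describes a population going forward. The second explains one specific case by holding everything about that unit fixed except the treatment.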

The gold standard for inferring causal effects is the randomized controlled experiment (RCE), aka the A/B test.

In RCEs, a population of individuals is randomly split into two groups:

  • treatment and
  • control.

By administering the treatment to one group and nothing to the other, we can measure the outcome for both groups. Because random assignment makes the two groups comparable on average, we can infer whether the treatment was effective from the difference in outcome between them.
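The logic of a randomized experiment fits in a few lines. This is a simulated sketch, not real trial data: `TRUE_EFFECT` and the baseline distribution are invented so that we can check whether the comparison recovers the effect we planted.

```python
import random

rng = random.Random(7)
TRUE_EFFECT = 2.0  # hypothetical effect of the treatment on the outcome

treated, control = [], []
for _ in range(10000):
    baseline = rng.gauss(10, 3)          # individuals differ from one another
    gets_treatment = rng.random() < 0.5  # coin flip: assignment ignores the baseline
    outcome = baseline + (TRUE_EFFECT if gets_treatment else 0.0)
    (treated if gets_treatment else control).append(outcome)

# Difference in average outcome between the two groups.
estimated_effect = sum(treated) / len(treated) - sum(control) / len(control)
print(f"estimated effect: {estimated_effect:.2f}")
```

The coin flip is what does the work: since assignment cannot depend on anything about the individual, the difference in group averages can be attributed to the treatment.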

But hold on, let’s consider something my mentor told me while I was studying causality:

We can’t always run such experiments, because you can’t A/B test your life.

Not even with applications, especially if they’re mission-critical.

In this case, we rely on observational data. One of the most popular approaches is Judea Pearl’s graphical framework for causal inference (which assumes the Causal Markov Condition), but that’s beyond the scope of this article.
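Without going into that framework, a rough sketch can at least show why observational data needs care. In the simulated data below (all names and numbers are assumptions), a confounder `z` makes treatment more likely and also raises the outcome. Comparing treated and untreated units directly overstates the effect; comparing them within each level of `z` and then averaging gets close to the planted value. This only works here because `z` is observed and is the only confounder.

```python
import random

rng = random.Random(1)
TRUE_EFFECT = 1.0  # hypothetical causal effect of treatment on outcome

rows = []
for _ in range(20000):
    z = rng.random() < 0.5                           # binary confounder
    t = rng.random() < (0.8 if z else 0.2)           # z makes treatment more likely
    y = TRUE_EFFECT * t + 3.0 * z + rng.gauss(0, 1)  # z also raises the outcome
    rows.append((z, t, y))

def mean_outcome(data, treated):
    """Average outcome among rows with the given treatment status."""
    values = [y for _, t, y in data if t == treated]
    return sum(values) / len(values)

# Naive comparison: mixes the treatment effect with the effect of z.
naive = mean_outcome(rows, True) - mean_outcome(rows, False)

# Adjusted comparison: compare within each level of z, then weight by P(z).
adjusted = 0.0
for level in (True, False):
    stratum = [r for r in rows if r[0] == level]
    diff = mean_outcome(stratum, True) - mean_outcome(stratum, False)
    adjusted += diff * len(stratum) / len(rows)

print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

In practice the hard part is knowing which variables to adjust for, which is exactly the kind of question a causal graph is meant to answer.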

The Exposition begins

In a retail company, sales of a product dropped across stores in 3 cities. Stakeholders are curious and the marketing team is on edge. The marketing manager picks up the log reports and realizes that an ad was recently run for this product.

Did running the ads cause the sales to drop, or was it merely a coincidence?

Most machine learning-based solutions focus on predicting outcomes, not understanding causality. ML can use causal inference to measure the effects of multiple variables. Interestingly, causal inference shifts the data narrative “from how to why”: it allows the data scientist to go beyond correlational studies and ask about the causes behind data patterns.

“It’s a big thing to integrate [causality] into AI. Current approaches to machine learning assume that the trained AI system will be applied on the same kind of data as the training data. In real life it is often not the case.” …Yoshua Bengio

Machine learning has been making rapid progress in recent years and has become an integral part of many industries. The applications of machine learning are numerous and diverse, ranging from computer vision and natural language processing to autonomous systems and recommender systems. Despite its wide range of applications, machine learning has limitations, especially when it comes to causal inference.

“Lots of people in ML/DL [deep learning] know that causal inference is an important way to improve generalization.” …Yann LeCun

In traditional machine learning, the focus is on predictive modeling. This involves using data to train a model to make predictions about future outcomes based on past data patterns. While predictive modeling is useful for predictive analysis, it is limited, because traditional machine learning models do not consider the causal relationships between variables.

This in turn means that machine learning models often aren’t robust to changes in the input data, and can’t always generalize well.
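To see the generalization problem in miniature, consider this simulated sketch. The target is driven by a genuine `cause`, while a `shortcut` feature merely mirrors the target during training and turns into noise afterwards. The setup, names, and numbers are all made up for illustration.

```python
import random

def fit_line(xs, ys):
    """Least-squares line for one feature: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def mse(model, xs, ys):
    """Mean squared error of a fitted (intercept, slope) model."""
    intercept, slope = model
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(3)

def make_data(n, shortcut_tracks_target):
    """Return (cause, shortcut, target) lists for a hypothetical system."""
    cause, shortcut, target = [], [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        y = 2.0 * c + rng.gauss(0, 1)  # the target is driven by the cause
        if shortcut_tracks_target:
            s = y + rng.gauss(0, 0.3)  # training era: shortcut mirrors the target
        else:
            s = rng.gauss(0, 2)        # after the shift: shortcut is unrelated noise
        cause.append(c)
        shortcut.append(s)
        target.append(y)
    return cause, shortcut, target

c_tr, s_tr, y_tr = make_data(5000, shortcut_tracks_target=True)
c_te, s_te, y_te = make_data(5000, shortcut_tracks_target=False)

causal_model = fit_line(c_tr, y_tr)
shortcut_model = fit_line(s_tr, y_tr)

causal_train, causal_test = mse(causal_model, c_tr, y_tr), mse(causal_model, c_te, y_te)
shortcut_train, shortcut_test = mse(shortcut_model, s_tr, y_tr), mse(shortcut_model, s_te, y_te)

print(f"causal feature:   train MSE {causal_train:.2f}, test MSE {causal_test:.2f}")
print(f"shortcut feature: train MSE {shortcut_train:.2f}, test MSE {shortcut_test:.2f}")
```

On the training data the shortcut looks like the better predictor, which is exactly why a purely correlational learner would pick it. After the shift it falls apart, while the model built on the cause holds steady.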

The Super Powers AKA Powers that Be

1. Generality

Causal inference explicitly tackles this problem through causal relationships and graphs, which let a model reason about what might have happened (given historical patterns) when it is faced with a lack of information.

2. Clear Understanding of correlational relationships

Another benefit of causal inference in machine learning is that it helps to avoid the problem of spurious correlations. In traditional machine learning, correlations between variables are often used to make predictions. However, these correlations may not be indicative of a causal relationship. By using causal inference, practitioners can determine the actual causal relationships between variables and avoid making predictions based on spurious correlations. Rather than relying on fixed correlations between data sets, causal models allow the ML systems to understand both the causal variables and their effects on the environment. This in turn allows the system to identify objects regardless of subtle changes.

Y = a + bx is a simple linear regression equation. From modeling this relationship with ML, we can deduce that a unit change in x is associated with a change of b in Y.

This clearly shows association, but when stakeholders’ questions move from how to why, causality becomes the solution to run to.
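One way to write down the gap between the two readings, using Pearl's do-notation (the notation is borrowed here for illustration and is not developed further in this article):

```latex
% What the fitted regression Y = a + bx tells us (association):
\mathbb{E}[Y \mid X = x + 1] - \mathbb{E}[Y \mid X = x] = b

% What a "why" question asks for (the effect of setting x by intervention):
\mathbb{E}[Y \mid \mathrm{do}(X = x + 1)] - \mathbb{E}[Y \mid \mathrm{do}(X = x)]
```

The two quantities coincide only when nothing else is driving both x and Y. When a confounder is present, b can be large even if intervening on x would change nothing.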

3. Attention is much more than we need ;)

Another super power of causal inference in machine learning is that it provides a better understanding of the underlying mechanisms of a system. This enables data practitioners to make more accurate predictions and to intervene in the system to achieve a desired outcome.

In healthcare, causal inference can be used to determine the causal relationship between various factors such as lifestyle, genetics, and environment, and the risk of certain diseases. This information can then be used to develop interventions that reduce the risk of these diseases.

According to the researchers behind a study titled “Towards Causal Representation Learning,” machine learning often disregards information that humans use heavily.

In conclusion, causal inference in machine learning is a powerful tool that enables data practitioners to make more accurate predictions and to understand the underlying mechanisms of a system. As machine learning continues to advance, causal inference will play an increasingly important role in a wide range of applications.

Further reading

Causal inference - Carnegie Mellon University

Overview of causal inference in machine learning - Ericsson

Written by Samuel Kehinde Ayo

Data Scientist | AI engineer | Senior Software engineer