
# Correlation and Causal Relation

Our universe is filled with "cause-and-effect" relationships. If we do not study, we will not pass our next math test. If we do not pass our math tests, our grades will suffer. If our grades drop low enough, we might not pass math this year. But why didn't we study in the first place? Was it because we were preoccupied with our sick dog? What caused the dog to get sick? As we can see, cause-and-effect relationships can become extremely complex. The good news is that we can make these relationships easier to understand by using our math skills. In order to do this, we will need to understand the concept of "correlation and causal relation." Let's get started:

## The definition of correlation and causation

The first thing we need to understand is that correlation and causation are actually two very different concepts. Although they are somewhat related, confusing the two can lead to very serious errors. Let's take a closer look at the definitions of these two terms:
• Correlation: When two things are "correlated," it means that there is a connection -- but nothing more. For example, we might say that the price of car tires goes up at the same rate as car engine oil. But that's as much as we can say without further information. We clearly can't say that the price of tires is causing the price of engine oil to increase. The most likely conclusion is the existence of a third, unknown variable (such as inflation).
• Causation: When two things have a "causal relationship," it means that one directly affects the other. For example, we might see that when we kick a ball, it flies a certain distance. One variable (kicking power) directly affects the other (the ball's travel distance). This is something that we can clearly see and prove with math and science. In most cases, we must run an experiment to determine whether this causal relationship is valid.
Understanding the difference between correlation and causation can help us make more logical deductions about the world around us. It can also lead to the rejection of flawed theories. For example, a common theory dating back to the ancient Greeks was "spontaneous generation." This theory posited that living animals could "spontaneously" arise from non-living matter, such as sand or water. Many early scientists tried to prove this with a number of "experiments." One such thinker was Jan Baptist van Helmont, a Belgian chemist who created a "recipe" for creating mice in the 1600s. This recipe contained instructions to throw a dirty cloth and wheat in a corner. Mice may have "appeared" as a result of this recipe, but the recipe did not cause the mice to spontaneously generate. This may seem obvious to us, but van Helmont did not understand the difference between correlation and causation.

In other words, correlation does not equal causation. Things may be related, but this doesn't necessarily mean that one thing causes another. This is one of the most useful concepts not only in mathematics but also in science and logic. In terms of logic, concepts like dependent events, the complement of an event, and independent events are all highly related to the concepts of causation and correlation.
While causation is a relatively simple concept, correlation can be multifaceted and complex. There are many types of correlation, including:
• Positive correlation: We say that two variables have a positive correlation if y tends to increase as a function of x. In other words, we see an upward trend when we graph our data.
• Negative correlation: We say that we have a negative correlation if y tends to decrease as a function of x. In other words, we see a downward trend when we graph our data points.
• No correlation: There is no correlation when there is no clear pattern or relationship between x and y. We also say that there's no correlation when the values stay the same. In other words, we see a "flat" pattern on our graph (or no discernible pattern whatsoever). We can give this a correlation value of zero.
We can also break these into even more detailed categories:
• Perfect positive correlation: If we see a "perfect positive correlation," it means that our data points form a perfect line with an upward trend. We can give this correlation a value of 1.
• High positive correlation: High positive correlation means that we see a clear upward trend, but there is not a straight line.
• Low positive correlation: Low positive correlation has a vague upward trend, but the data is scattered.
• Low negative correlation: Low negative correlation also has a vague downward trend with scattered data.
• High negative correlation: A high negative correlation has a clear downward trend -- but not a straight line.
• Perfect negative correlation: Perfect negative correlation has a value of -1. Our data points form a perfect line in a downward trend.

But how do we assign correlation values to data sets? Why does a perfect positive correlation have a value of 1? We use the Pearson Product Moment Correlation, which is also known simply as "Pearson's coefficient." It looks like this:

${r}_{xy}=\frac{n{\sum }_{i=1}^{n}{x}_{i}{y}_{i}-{\sum }_{i=1}^{n}{x}_{i}{\sum }_{i=1}^{n}{y}_{i}}{\sqrt{n{\sum }_{i=1}^{n}{{x}_{i}}^{2}-{\left({\sum }_{i=1}^{n}{x}_{i}\right)}^{2}}\sqrt{n{\sum }_{i=1}^{n}{{y}_{i}}^{2}-{\left({\sum }_{i=1}^{n}{y}_{i}\right)}^{2}}}$
• $n$ is the number of data points.
• ${x}_{i}$ and ${y}_{i}$ are individual data points in datasets $x$ and $y$ , respectively.
• ${\sum }_{i=1}^{n}{x}_{i}{y}_{i}$ is the sum of the products of corresponding $x$ and $y$ values.
• ${\sum }_{i=1}^{n}{x}_{i}$ and ${\sum }_{i=1}^{n}{y}_{i}$ are the sums of all $x$ values and $y$ values, respectively.
• $\sqrt{n{\sum }_{i=1}^{n}{{x}_{i}}^{2}-{\left({\sum }_{i=1}^{n}{x}_{i}\right)}^{2}}$ is the square root of $n$ times the sum of the squared $x$ values, minus the square of the sum of the $x$ values.
• $\sqrt{n{\sum }_{i=1}^{n}{{y}_{i}}^{2}-{\left({\sum }_{i=1}^{n}{y}_{i}\right)}^{2}}$ is the square root of $n$ times the sum of the squared $y$ values, minus the square of the sum of the $y$ values.
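The formula above can be translated directly into a short program. This is a minimal sketch in Python; the function name `pearson_r` is our own choice, and the sample data is made up for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, computed term by term
    from the summation formula above."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # sum of products
    sum_x2 = sum(xi ** 2 for xi in x)               # sum of squared x values
    sum_y2 = sum(yi ** 2 for yi in y)               # sum of squared y values

    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Points on the line y = 2x + 1 form a perfect positive correlation:
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))   # 1.0
# Reversing the trend gives a perfect negative correlation:
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))   # -1.0
```

As expected, data points that fall exactly on an upward-sloping line score 1, and points on a downward-sloping line score -1.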

We should also note that correlation calculations only work for data points that seem to form straight lines on a scatter plot. For example, we might see a set of data that increases and then declines over time -- forming a curve with an apex. In this case, normal correlation calculations are not helpful to us.

## Examples of correlation and causation

Here''s an example of correlation:

Students who report anxiety before a test tend to achieve lower test scores. In other words, there is a negative correlation between student anxiety and test scores. But we cannot say for sure whether this is a causal relationship because the low test scores could be caused by another factor -- such as lack of preparedness.

Here''s an example of causation:

There is a positive correlation between the number of hours a student spends studying and their final test score. We have good reason to believe this is a causal relationship, since it makes sense for a student to achieve a higher score if they spend more time studying -- though, as with any causal claim, a controlled experiment would be needed to confirm it.

## Flashcards covering Correlation and Causal Relation

Statistics Flashcards

## Pair your student with a tutor who understands causation and correlation

If your student needs help covering correlation, causation, or any other concept, tutoring is a solid choice. During these 1-on-1 learning sessions, students can ask their tutors numerous questions and clear up confusions that may have arisen during class time. Tutors can tailor their teaching methods to your student's unique characteristics, including their learning style, their ability level, and even their hobbies. Reach out to Varsity Tutors today, and we'll pair your student with a suitable tutor.
