Correlation Coefficient Calculator
Welcome to Omni's correlation coefficient calculator! Here you can learn all there is about this important statistical concept. Apart from discussing the general definition of correlation and the intuition behind it, we will also cover in detail the formulas for the four most popular correlation coefficients:
- Pearson correlation;
- Spearman correlation;
- Kendall tau correlation (including the variants); and
- Matthews correlation (MCC, a.k.a. Pearson phi).
As a bonus, we will also explain how Pearson correlation is linked to simple linear regression. We will start, however, by explaining what the correlation coefficient is all about. Let's go!
What is the correlation coefficient?
Correlation coefficients are measures of the strength and direction of relation between two random variables. The type of relationship that is being measured varies depending on the coefficient. In general, however, they all describe the co-changeability between the variables in question – how increasing (or decreasing) the value of one variable affects the value of the other variable – does it tend to increase or decrease?
Importantly, correlation coefficients are all normalized, i.e., they assume values between -1 and +1. Values of ±1 indicate the strongest possible relationship between variables, and a value of 0 means there's no relationship of the type that the given coefficient measures.
And that's it when it comes to the general definition of correlation! If you wonder how to calculate correlation, the best answer is to... use Omni's correlation coefficient calculator 😊! It allows you to easily compute all of the different coefficients in no time. In the next section, we explain how to use this tool in the most effective way.
If you wonder how to calculate correlation by hand, you will find all the necessary formulas and definitions for several correlation coefficients in the following sections.
How to use this correlation calculator with steps?
To use our correlation coefficient calculator:
- Choose which of four correlation coefficients you want to compute:
  - Pearson correlation;
  - Spearman correlation;
  - Kendall rank correlation; or
  - Matthews correlation.
- Input your data into the rows. When at least three points (both an x and a y coordinate) are in place, the calculator will give you your result.
- Be aware that this is a correlation calculator with steps! If you turn on the option "Show calculation details?", our tool will show you the intermediate stages of calculations. This is very useful when you need to verify the correctness of your calculations.
- Our correlation coefficient calculator will also, whenever possible, display the interpretation of the result. It describes the strength of correlation on a scale based on the absolute value of the correlation, with the following thresholds:
  - 0.8 ≤ |corr| ≤ 1.0 very strong;
  - 0.6 ≤ |corr| < 0.8 strong;
  - 0.4 ≤ |corr| < 0.6 moderate;
  - 0.2 ≤ |corr| < 0.4 weak; and
  - 0.0 ≤ |corr| < 0.2 very weak.
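The strength scale used in this section can be sketched as a small lookup function. This is only an illustration of the thresholds; the function name `describe_strength` is made up for this example and is not part of the calculator.

```python
def describe_strength(corr: float) -> str:
    """Map a correlation value to the verbal scale from this section.

    Hypothetical helper: only the thresholds come from the text.
    """
    if not -1.0 <= corr <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(corr)  # the scale ignores the sign (direction)
    if magnitude >= 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    if magnitude >= 0.2:
        return "weak"
    return "very weak"
```

Note that the sign is dropped before the lookup, so -0.85 and +0.85 receive the same label; the direction of the relationship has to be read from the sign separately.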
Pearson correlation coefficient formula
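The Pearson correlation between two variables $X$ and $Y$ is defined as the covariance between these variables divided by the product of their respective standard deviations:
$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
This translates into the following explicit formula:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $\bar{x}$ and $\bar{y}$ stand for the average of the sample $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, respectively.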
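As a rough illustration, the explicit Pearson formula (sum of products of deviations from the mean, divided by the product of the square roots of the summed squared deviations) might be coded as follows. The function name `pearson` and the sample data are assumptions made for this sketch.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long samples (illustrative sketch)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the sample means.
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Denominator: product of the square roots of the summed squared deviations.
    denominator = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
                   * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    if denominator == 0:
        raise ValueError("correlation is undefined when a sample is constant")
    return numerator / denominator

# Made-up data: the deviations give numerator 6 and denominator sqrt(10 * 6).
r = pearson([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

For this made-up data the result equals 6/√60, i.e., roughly 0.77.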
Remember that the Pearson correlation detects only a linear relationship – a low value of Pearson correlation doesn't mean that there is no relationship at all! The two variables may be strongly related, yet their relationship may not be linear but of some other type.
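A quick way to see this is a sketch with made-up data in which y is fully determined by x (here y = x²), yet the Pearson correlation comes out as zero because the relationship is not linear. The helper `pearson` is a hypothetical name used only in this example.

```python
import math

def pearson(x, y):
    """Pearson correlation via the explicit deviations formula (sketch)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
                   * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    return numerator / denominator

# y depends on x exactly, but symmetrically around x = 0.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]
r = pearson(x, y)  # the positive and negative products of deviations cancel
```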
In simple least squares regression, the square of the Pearson correlation $r$ between $x$ and $y$ is equal to the coefficient of determination, R², which expresses the fraction of the variance in $y$ that is explained by $x$:
$$R^2 = r^2$$
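The link between regression and correlation can be checked numerically. The sketch below fits a least squares line with an intercept, computes R² as one minus the ratio of the residual to the total sum of squares, and compares it with the squared Pearson correlation. The function name and the data are assumptions for illustration only.

```python
def r_squared_two_ways(x, y):
    """Return (R² from the fitted line, squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)  # total sum of squares

    # Simple least squares line: y ≈ intercept + slope * x.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares of the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

    r2_from_fit = 1 - ss_res / syy
    r2_from_corr = sxy ** 2 / (sxx * syy)
    return r2_from_fit, r2_from_corr

# Made-up data; both values come out as 0.6 up to rounding.
from_fit, from_corr = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```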
If you want to discover more about the Pearson correlation, visit our dedicated Pearson correlation calculator website.
Spearman correlation coefficient
The Spearman coefficient is closely related to the Pearson coefficient. Namely, the Spearman rank correlation between $X$ and $Y$ is defined as the Pearson correlation between the rank variables $R(X)$ and $R(Y)$. That is, the formula for Spearman's rank correlation reads:
$$r_s = \frac{\mathrm{cov}(R(X), R(Y))}{\sigma_{R(X)} \, \sigma_{R(Y)}}$$
To obtain the rank variables, you just need to order the observations (in each sample separately) from lowest to highest. The smallest observation then gets rank 1, the second-smallest rank 2, and so on – the highest observation will have rank n. You only need to be careful when the same value appears in the data set more than once (we say there are ties). If this happens, assign to all these identical observations the rank equal to the arithmetic mean of the ranks you would assign to these observations if they all had different values.
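The ranking rule, including the averaging of ranks for ties, could be sketched like this, followed by the Spearman correlation computed as the Pearson correlation of the ranks. The names `average_ranks`, `pearson`, and `spearman` and the sample data are assumptions for this example.

```python
import math

def average_ranks(values):
    """Rank from lowest (1) to highest (n); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value is identical (a tie).
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
                   * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    return numerator / denominator

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

tied = average_ranks([10, 20, 20, 30])  # the two 20s share rank (2 + 3) / 2
x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]            # increasing, but far from linear
```

In this made-up example, `tied` is `[1.0, 2.5, 2.5, 4.0]`, and `spearman(x, y)` equals 1 up to rounding even though `pearson(x, y)` is visibly below 1, because the exponential relationship is increasing but not linear.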
The Spearman correlation is sensitive to any monotonic relationship between the variables, so it is more general than the Pearson correlation: it can capture, e.g., exponential relationships, or quadratic ones as long as the data stays on one side of the parabola's vertex.
There is also a simpler and more explicit formula for Spearman correlation, but it holds only if there are no ties in either of our samples. More details await you in the Spearman's rank correlation calculator.
Kendall rank correlation (tau)
We most often denote Kendall's rank correlation by the Greek letter τ (tau), and that's why it's often referred to as Kendall tau.
Consider two samples, x and y, each of size n: x1, ..., xn and y1, ..., yn. Clearly, there are n(n-1)/2 possible pairs of observations (xi, yi) and (xj, yj) with i < j.
We have to go through all these pairs one by one and count the number of concordant and discordant pairs. Namely, for two pairs (xi, yi) and (xj, yj) we have the following rules:
- If xi < xj and yi < yj then this pair is concordant.
- If xi > xj and yi > yj then this pair is concordant.
- If xi < xj and yi > yj then this pair is discordant.
- If xi > xj and yi < yj then this pair is discordant.
The Kendall rank correlation coefficient formula reads:
$$\tau = \frac{C - D}{n(n-1)/2}$$
where:
- $C$ – Number of concordant pairs; and
- $D$ – Number of discordant pairs.
That is, $\tau$ is the difference between the number of concordant and discordant pairs divided by the total number of all pairs.
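The counting procedure can be sketched directly: go through every pair of observations, classify it as concordant or discordant, and divide the difference of the counts by the number of pairs, n(n-1)/2. This sketch assumes there are no ties; the function names and data are made up for illustration.

```python
from itertools import combinations

def count_pairs(x, y):
    """Count concordant and discordant pairs of observations."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:      # x and y differ in the same direction
            concordant += 1
        elif product < 0:    # x and y differ in opposite directions
            discordant += 1
    return concordant, discordant

def kendall_tau(x, y):
    """Kendall tau for samples without ties (illustrative sketch)."""
    n = len(x)
    concordant, discordant = count_pairs(x, y)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Made-up data: only the pair (2, 3) vs (3, 2) is discordant.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

Here there are 5 concordant pairs and 1 discordant pair out of 6, so `tau` is 4/6, roughly 0.67.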
Easy, don't you think? However, it is so only if there are no ties. That is, there are no repeating values in either sample x or sample y. If there are ties, there are two additional variants of Kendall tau. (Fortunately, our correlation coefficient calculator can calculate them all!) To define them, we need to distinguish different kinds of ties:
- If xi = xj and yi ≠ yj then we have a tie in x.
- If xi ≠ xj and yi = yj then we have a tie in y.
- If xi = xj and yi = yj then we have a double tie.
The Kendall rank tau-b correlation coefficient formula reads:
$$\tau_b = \frac{C - D}{\sqrt{(C + D + T_x)(C + D + T_y)}}$$
where $C$ and $D$ denote the numbers of concordant and discordant pairs, and:
- $T_x$ – Number of ties in x; and
- $T_y$ – Number of ties in y.
Use tau-b if the two variables have the same number of possible values (before ranking). In other words, if you can summarize the data in a square contingency table. An example of such a situation is when both variables use a 5-point Likert scale: strongly disagree, disagree, neither agree nor disagree, agree, or strongly agree.
If your data is assembled in a rectangular non-square contingency table, or, in other words, if the two variables have a different number of possible values, then use tau-c (sometimes called Stuart-Kendall tau-c):
$$\tau_c = \frac{2\,(C - D)}{n^2 \, \frac{m - 1}{m}}$$
where:
- $m$ – $\min(r, c)$;
- $r$ – The number of rows in the contingency table; and
- $c$ – The number of columns in the contingency table.
But where is tau-a, you may think? Fortunately, tau-a is defined in the same simple way as before (when we had no ties):
$$\tau_a = \frac{C - D}{n(n-1)/2}$$
The Kendall tau correlation coefficient is sensitive to monotonic relationships between the variables.
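The three variants can be compared side by side in a short sketch. It classifies each pair as concordant, discordant, tied in x only, tied in y only, or doubly tied, and then applies the tau-a, tau-b, and tau-c definitions. One assumption to flag: the numbers of rows and columns of the contingency table are taken here as the numbers of distinct values observed in each sample, which may undercount if some possible category never occurs in the data. All names and data are made up.

```python
import math
from itertools import combinations

def pair_counts(x, y):
    """Return (concordant, discordant, ties in x only, ties in y only)."""
    c = d = tx = ty = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        if xi == xj and yi == yj:
            continue                      # double tie: not counted anywhere
        if xi == xj:
            tx += 1                       # tie in x
        elif yi == yj:
            ty += 1                       # tie in y
        elif (xi - xj) * (yi - yj) > 0:
            c += 1
        else:
            d += 1
    return c, d, tx, ty

def tau_a(x, y):
    c, d, _, _ = pair_counts(x, y)
    n = len(x)
    return (c - d) / (n * (n - 1) / 2)

def tau_b(x, y):
    c, d, tx, ty = pair_counts(x, y)
    return (c - d) / math.sqrt((c + d + tx) * (c + d + ty))

def tau_c(x, y):
    c, d, _, _ = pair_counts(x, y)
    n = len(x)
    # Assumption: table dimensions = distinct observed values per variable.
    m = min(len(set(x)), len(set(y)))
    if m < 2:
        raise ValueError("tau-c needs at least two distinct values per variable")
    return 2 * (c - d) / (n ** 2 * (m - 1) / m)

# Made-up ordinal data with ties: 6 concordant pairs, 0 discordant,
# 2 ties in x only, and 2 ties in y only.
x = [1, 1, 2, 2, 3]
y = [1, 2, 2, 3, 3]
```

For this data, tau-a is 0.6, tau-b is 0.75, and tau-c is 0.72, which shows how differently the variants treat the same tied pairs.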
Matthews correlation (Pearson phi)
The Matthews correlation (abbreviated as MCC, also known as Pearson phi) measures the quality of binary classifications. Most often, we can encounter it in machine learning and biology/medicine-related data.
To write down the formula for the Matthews correlation coefficient we need to assemble our data in a 2x2 contingency table, which in this context is also called the confusion matrix:
| | Predicted positive | Predicted negative |
|---|---|---|
| Observed positive | TP | FN |
| Observed negative | FP | TN |
where we use the following quite standard abbreviations:
- TP – True positive;
- FP – False positive;
- TN – True negative; and
- FN – False negative.
Matthews correlation is given by the following formula:
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
The interpretation of this coefficient is a bit different now:
- +1 means we have a perfect prediction;
- 0 means we don't have any valid information; and
- -1 means we have a complete inconsistency between prediction and the actual outcome.
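A minimal sketch of the Matthews correlation computed from the four cells of the confusion matrix is shown below. The function name `mcc` and the counts are assumptions for illustration. When any row or column of the table sums to zero, the denominator vanishes; this sketch raises an error in that case, although some tools choose to report 0 instead.

```python
import math

def mcc(tp, fn, fp, tn):
    """Matthews correlation from confusion-matrix counts (illustrative sketch)."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        raise ValueError("MCC is undefined when a row or column of the table is empty")
    return numerator / denominator

perfect = mcc(tp=5, fn=0, fp=0, tn=5)       # every prediction is right
inverted = mcc(tp=0, fn=5, fp=5, tn=0)      # every prediction is wrong
realistic = mcc(tp=90, fn=10, fp=5, tn=95)  # made-up counts
```

The first two calls reproduce the extreme cases from the interpretation list (+1 and -1), while the made-up "realistic" table gives a value of roughly 0.85.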
If you're interested, don't hesitate to visit our Matthews correlation coefficient calculator.
FAQ
What does a positive correlation mean?
If the value of correlation is positive, then the two variables under consideration tend to change in the same direction: when the first one increases, the other tends to increase, and when the first one decreases, then the other one tends to decrease as well.
What does a negative correlation mean?
If the value of correlation is negative, then the two variables under consideration tend to change in opposite directions: when the first one increases, the other tends to decrease, and when the first one decreases, then the other one tends to increase.
How to read a correlation matrix?
A correlation matrix is a table that shows the values of a correlation coefficient between all possible pairs of several variables. It always has ones on the main diagonal (this is the correlation of a variable with itself) and is symmetric (because the correlation between X and Y is the same as between Y and X). For these reasons, the redundant cells sometimes get trimmed. If there is some color-coding, make sure to check what it means: it may either illustrate the strength and direction of correlation or its statistical significance.
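To make the two structural properties concrete, here is a small sketch that builds a Pearson correlation matrix for three made-up variables as a nested dictionary. The variable names, the data, and the helper functions are all assumptions for this example.

```python
import math

def pearson(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = (math.sqrt(sum((a - mean_x) ** 2 for a in x))
                   * math.sqrt(sum((b - mean_y) ** 2 for b in y)))
    return numerator / denominator

def correlation_matrix(columns):
    """Pearson correlation for every pair of named columns (sketch)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Made-up data set with three variables.
data = {
    "height": [150, 160, 170, 180, 190],
    "weight": [52, 58, 69, 77, 88],
    "errors": [9, 7, 6, 4, 1],
}
matrix = correlation_matrix(data)
```

In the result, `matrix["height"]["height"]` is 1 up to rounding (the diagonal), and `matrix["height"]["weight"]` equals `matrix["weight"]["height"]` (the symmetry), which is why the cells above or below the diagonal can be trimmed without losing information.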