Cosine Similarity Calculator
The cosine similarity calculator will teach you all there is to know about the cosine similarity measure, which is widely used in machine learning and other fields of data science.
Read on to discover:
- What the cosine similarity is;
- What the formula for the cosine similarity is;
- Whether the cosine similarity can be negative; and
- How to calculate the cosine similarity in Python.
How to use the cosine similarity calculator
Here's how to use this cosine similarity calculator:
- Enter your vectors $a$ and $b$ into the calculator, one element at a time.
- More fields will appear as you need them.
- Empty fields are treated as zeroes.
- The vectors will automatically be extended to matching lengths.
- The cosine similarity and derived values, such as the angle between the vectors, $\theta$, and the cosine distance, $D_C$, are displayed below the vector inputs.
- The calculations for finding the cosine similarity are shown below the results so that you can follow how your specific result was obtained.
What is the cosine similarity?
The cosine similarity measure indicates how similar two vectors are using the cosine of the angle between them. It gives no information on the comparative magnitudes of the vectors.
Cosine similarity is widely used in data analysis and data science, particularly in the field of natural language processing.
Remember what the cosine is? No? Then head on over to our cosine calculator.
The cosine similarity formula
It helps to know what the cosine similarity is conceptually, but how do we calculate it? Let's explore the formula.
The cosine similarity between two $n$-dimensional vectors $a$ and $b$, which is denoted as $S_C(a, b)$, is defined as the cosine of the angle between the two vectors, $\theta$:
$$S_C(a, b) = \cos(\theta)$$
However, we don't always know the angle. Then what? Well, a more complex yet more helpful formula can be derived from the dot product. Let's investigate!
The dot product of the two vectors is denoted as $a \cdot b$, and is defined as:
$$a \cdot b = \|a\| \, \|b\| \cos(\theta)$$
where $\|a\|$ is the magnitude of the vector $a$ (and similarly for $b$). We can rearrange this equation to become:
$$\cos(\theta) = \frac{a \cdot b}{\|a\| \, \|b\|}$$
And so we have a handy formula for the cosine similarity that doesn't rely on the angle directly:
$$S_C(a, b) = \frac{a \cdot b}{\|a\| \, \|b\|}$$
And what's more, we can rewrite the formula with sums:
$$S_C(a, b) = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$$
Lovely! A formula for the cosine similarity that relies only on the known elements of the vectors.
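As a rough check that the angle-based definition and the element-based formula agree, the sketch below builds two 2D unit vectors a known angle apart and compares the two results. The function name `cosine_similarity` and the 60-degree angle are illustrative assumptions, and the function assumes both vectors have the same length and neither is a zero vector.

```python
import math

def cosine_similarity(a, b):
    """Sum form: sum(a_i * b_i) / (sqrt(sum(a_i^2)) * sqrt(sum(b_i^2)))."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

# Two unit vectors separated by a known angle (60 degrees, chosen for illustration).
theta = math.radians(60)
a = [1.0, 0.0]
b = [math.cos(theta), math.sin(theta)]

from_angle = math.cos(theta)              # definition: cosine of the angle
from_elements = cosine_similarity(a, b)   # formula: uses only the vector elements

# The two values agree up to floating-point rounding.
print(math.isclose(from_angle, from_elements))
```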
Want a refresher on the dot product and vector magnitudes? Check out our dot product calculator and the vector magnitude calculator.
The cosine similarity, $S_C(a, b)$, falls within the range $[-1, 1]$, which are, of course, the limits of the cosine function.
- When the two vectors point in the same direction, $\theta = 0^\circ$ and so $S_C(a, b) = 1$.
- When the two vectors are orthogonal, $\theta = 90^\circ$ and $S_C(a, b) = 0$.
- When the two vectors point in opposite directions, $\theta = 180^\circ$ and so the cosine similarity is -1.
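The three cases can be checked numerically. This is a minimal sketch using only the standard library; the helper name `cosine_similarity` and the example vectors are assumptions picked so that the magnitudes are whole numbers and the results come out exact.

```python
import math

def cosine_similarity(a, b):
    """Dot product divided by the product of the magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

base = [3, 4]  # magnitude 5

same_direction = cosine_similarity(base, [6, 8])    # base scaled by 2
orthogonal = cosine_similarity(base, [-4, 3])       # base rotated by 90 degrees
opposite = cosine_similarity(base, [-3, -4])        # base reversed

print(same_direction, orthogonal, opposite)  # 1.0 0.0 -1.0
```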
Don't care about the cosine similarity and only want the angle between two vectors? Perhaps you'd like to visit our angle between two vectors calculator.
Note that we say "similar" and not "identical": $S_C(a, b)$ only measures the angle and is not influenced by the comparative magnitudes. $S_C(a, b) = 1$ only means that the two vectors point in the same direction, not that the two vectors are equal. See if you can prove this mathematically!
The cosine similarity is not defined when either vector is a zero vector (a vector with all elements equal to zero, and thus zero magnitude), because the formula would then divide by zero.
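Both points can be illustrated with a short sketch: scaling a vector leaves the similarity unchanged, and a zero vector has no defined similarity. Raising a `ValueError` for the zero vector is an assumption made here for illustration; other implementations may return a placeholder value instead.

```python
import math

def cosine_similarity(a, b):
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    if mag_a == 0 or mag_b == 0:
        # Assumed behaviour: refuse to divide by a zero magnitude.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (mag_a * mag_b)

a = [1, 2]
scaled = [10, 20]  # same direction as a, ten times the magnitude

s = cosine_similarity(a, scaled)
print(math.isclose(s, 1.0))  # the vectors are "similar" even though a != scaled

try:
    cosine_similarity(a, [0, 0])
except ValueError as err:
    print(err)
```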
How do I calculate the cosine similarity?
To calculate the cosine similarity between two vectors, follow these steps:
- If you know the angle between the vectors, the cosine similarity is the cosine of that angle.
- If you don't know the angle, calculate the dot product of the two vectors.
- Calculate both vectors' magnitudes.
- Divide the dot product by the product of the magnitudes.
- The result is the cosine similarity.
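The steps for the case where the angle is unknown can be sketched one at a time in plain Python. The vectors below are hypothetical values chosen so that the magnitudes are whole numbers; the variable names are likewise illustrative.

```python
import math

# Hypothetical example vectors (not taken from the calculator).
a = [1, 2, 2]
b = [2, 3, 6]

# Calculate the dot product of the two vectors.
dot_product = sum(x * y for x, y in zip(a, b))    # 1*2 + 2*3 + 2*6 = 20

# Calculate both vectors' magnitudes.
magnitude_a = math.sqrt(sum(x * x for x in a))    # sqrt(1 + 4 + 4) = 3
magnitude_b = math.sqrt(sum(y * y for y in b))    # sqrt(4 + 9 + 36) = 7

# Divide the dot product by the product of the magnitudes.
similarity = dot_product / (magnitude_a * magnitude_b)  # 20 / 21

print(round(similarity, 4))  # 0.9524
```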
An example of the cosine similarity
Let's look at an example of two 2D vectors and their cosine similarity. Let's use:
- and
- .
Already, we can visualize that the two vectors point in the same general direction, i.e., up. We can guess that $\theta < 90^\circ$ and therefore that $S_C(a, b) > 0$, but let's calculate it properly using the formula we learned above.
-
The dot product is:
-
The vectors' magnitudes are:
and
-
The cosine similarity is, therefore:
Our guesses were right!
How to calculate the cosine similarity with Python
Python is one of the most widely used languages for data science, so you might need to calculate the cosine similarity in Python. If you're implementing it yourself, you can use NumPy's dot function for the dot product and the norm function from the numpy.linalg submodule for the vector magnitude. Here's how it might be done:
from numpy import dot
from numpy.linalg import norm

def calc_cosine_similarity(a, b):
    # Dot product divided by the product of the two magnitudes
    return dot(a, b) / (norm(a) * norm(b))
Then you can call the function as:
a = [1, 1, 1]
b = [3, 4, 5]
calc_cosine_similarity(a, b)
# delivers 0.9797958971132713
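The angle between the vectors can be recovered from the cosine similarity with the inverse cosine. The following is a sketch using only the standard library rather than NumPy; the function names are hypothetical, and the clamp to the range [-1, 1] is a precaution against floating-point rounding, not something the calculator is known to do.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return dot / (mag_a * mag_b)

def angle_between_degrees(a, b):
    s = cosine_similarity(a, b)
    # Rounding can push s slightly outside [-1, 1], which acos rejects.
    s = max(-1.0, min(1.0, s))
    return math.degrees(math.acos(s))

angle = angle_between_degrees([1, 1, 1], [3, 4, 5])
print(round(angle, 2))  # 11.54
```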
What is the cosine distance?
The cosine distance is used to measure the dissimilarity between two vectors. It's simply the complement of the cosine similarity, i.e.,
$$D_C(a, b) = 1 - S_C(a, b)$$
However, the cosine distance is not a true distance metric, because it does not have the triangle inequality property, i.e., the inequality:
$$D_C(a, c) \le D_C(a, b) + D_C(b, c)$$
does not hold for all possible values of $a$, $b$, and $c$.
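One concrete counterexample makes the failure visible. In the sketch below, the function name `cosine_distance` and the three vectors are illustrative choices: $a$ and $c$ are orthogonal, and $b$ sits halfway between them at 45 degrees to each.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    mag_a = math.sqrt(sum(x * x for x in a))
    mag_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (mag_a * mag_b)

a, b, c = [1, 0], [1, 1], [0, 1]

direct = cosine_distance(a, c)                           # 1 - cos(90 deg) = 1
detour = cosine_distance(a, b) + cosine_distance(b, c)   # 2 * (1 - cos(45 deg)), about 0.586

# The direct "distance" exceeds the detour through b, so the triangle inequality fails.
print(direct > detour)  # True
```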
Visit our triangle inequality theorem calculator for more information on this theorem.
FAQ
Can cosine similarity be negative?
Yes, cosine similarity can be negative because the cosine of some angles can be negative. A negative cosine similarity means that the two vectors are more dissimilar than similar and that the angle between them is greater than 90°.
What does a cosine similarity of -1 mean?
A cosine similarity of -1 means that the two vectors point in opposite directions. This does not mean that their magnitudes are equal, but simply that their angle is 180°.