
The cosine similarity calculator explains the cosine similarity measure, which is widely used in machine learning and other fields of data science, and computes it for your own vectors.

Read on to discover:

  • What the cosine similarity is;
  • What the formula for the cosine similarity is;
  • Whether the cosine similarity can be negative; and
  • How to calculate the cosine similarity in Python.

How to use the cosine similarity calculator

Here's how to use this cosine similarity calculator:

  1. Enter your vectors $\vec{a}$ and $\vec{b}$ into the calculator, one element at a time.

    • More fields will appear as you need them.

    • Empty fields are treated as zeroes.

    • The vectors will automatically be extended to matching lengths.

  2. The cosine similarity $\rm S_C$ and derived values, such as the angle between the vectors, $\theta$, and the cosine distance, $\rm D_C$, are displayed below the vector inputs.

  3. The calculations for finding the cosine similarity are shown below the results so that you may understand your specific result.
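
The zero-padding described in step 1 can be sketched in Python. This is only an illustration of the idea, not the calculator's actual implementation; the helper name `pad_to_same_length` is hypothetical.

```python
def pad_to_same_length(a, b):
    """Extend the shorter list with zeroes so both lists have equal length.

    Hypothetical helper, for illustration only.
    """
    n = max(len(a), len(b))
    a_padded = list(a) + [0] * (n - len(a))
    b_padded = list(b) + [0] * (n - len(b))
    return a_padded, b_padded


a, b = pad_to_same_length([1, 2, 3], [4, 5])
print(a, b)  # [1, 2, 3] [4, 5, 0]
```

Padding with zeroes leaves both the dot product and the magnitudes unchanged, so it does not alter the resulting cosine similarity.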

What is the cosine similarity?

The cosine similarity measure indicates how similar two vectors are using the cosine of the angle between them. It gives no information on the comparative magnitudes of the vectors.

Cosine similarity is widely used in data analysis and data science, particularly in the field of natural language processing.

🔎 Remember what the cosine is? No? Then head on over to our cosine calculator.

The cosine similarity formula

It helps to know what the cosine similarity is conceptually, but how do we calculate it? Let's explore the formula.

The cosine similarity between two $N$-dimensional vectors $\vec{a}$ and $\vec{b}$, which is denoted as ${\rm S_C}(\vec{a}, \vec{b})$, is defined as the cosine of the angle between the two vectors, $\theta$:

$${\rm S_C}(\vec{a}, \vec{b}) = \cos\theta$$

However, we don't always know the angle $\theta$. Then what? Well, a more complex yet more helpful formula can be derived from the dot product. Let's investigate!

The dot product of the two vectors is denoted as $\vec{a}\cdot\vec{b}$, and is defined as:

$$\vec{a} \cdot \vec{b} = \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert\ \cos\theta,$$

where $\Vert\vec{a}\Vert$ is the magnitude of the vector $\vec{a}$ (and similarly for $\Vert\vec{b}\Vert$). We can rearrange this equation to give:

$$\cos\theta = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert }$$

And so we have a handy formula for the cosine similarity that doesn't rely on the angle directly:

$${\rm S_C} = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert }$$

And what's more, we can rewrite the formula with sums:

$${\rm S_C} = \frac{ \sum_{i=1}^N a_i b_i }{ \sqrt{\sum_{i=1}^N a_i^2} \sqrt{\sum_{i=1}^N b_i^2} }$$

Lovely! We now have a formula for the cosine similarity that relies only on the known elements of the vectors.
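
The summation form translates almost term by term into plain Python. The following is a minimal sketch that assumes both vectors are given as equal-length lists of numbers; the function name `cosine_similarity` is chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity from the summation formula (equal-length inputs)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    numerator = sum(x * y for x, y in zip(a, b))  # sum of a_i * b_i
    norm_a = math.sqrt(sum(x * x for x in a))      # sqrt of the sum of a_i^2
    norm_b = math.sqrt(sum(y * y for y in b))      # sqrt of the sum of b_i^2
    # Raises ZeroDivisionError if either vector consists only of zeroes.
    return numerator / (norm_a * norm_b)


print(cosine_similarity([1, 2, 2], [2, 4, 4]))  # 1.0
```

The second vector is twice the first, so the two point in the same direction and the result is 1.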

🔎 Want a refresher on the dot product and vector magnitudes? Check out our dot product calculator and the vector magnitude calculator.

The cosine similarity, $\rm S_C$, falls within the range $[-1, 1]$, which are the limits of the cosine function.

  • When the two vectors are in the same direction, $\theta = 0^\circ$ and so ${\rm S_C} = 1$.

  • When the two vectors are orthogonal, $\theta = 90^\circ$ and so ${\rm S_C} = 0$.

  • When the two vectors are in opposite directions, $\theta = 180^\circ$ and so ${\rm S_C} = -1$.
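
The three special cases above can be checked numerically. This is a small sketch with 2D vectors picked for illustration; the helper `cosine_similarity` is defined inline and is not part of any library.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


same = cosine_similarity([2, 0], [5, 0])        # same direction
orthogonal = cosine_similarity([2, 0], [0, 3])  # at right angles
opposite = cosine_similarity([2, 0], [-4, 0])   # opposite directions
print(same, orthogonal, opposite)  # 1.0 0.0 -1.0
```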

🔎 Don't care about the cosine similarity and only want the angle between two vectors? Perhaps you'd like to visit our angle between two vectors calculator.

Note that we say "similar" and not "identical": $\rm S_C$ only measures the angle and is not influenced by the comparative magnitudes. ${\rm S_C} = 1$ only means that the two vectors point in the same direction, not that the two vectors are equal. See if you can prove this mathematically!
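
One possible way to show this, sketched under the assumption that $\vec{b}$ is a positive multiple of a non-zero vector $\vec{a}$ (the scale factor $k$ is a symbol introduced here for illustration):

```latex
\[
\begin{aligned}
\vec{b} &= k\,\vec{a}, \qquad k > 0 \\
\vec{a}\cdot\vec{b} &= k\,(\vec{a}\cdot\vec{a}) = k\,\Vert\vec{a}\Vert^2 \\
\Vert\vec{b}\Vert &= k\,\Vert\vec{a}\Vert \\
{\rm S_C} &= \frac{\vec{a}\cdot\vec{b}}{\Vert\vec{a}\Vert\ \Vert\vec{b}\Vert}
           = \frac{k\,\Vert\vec{a}\Vert^2}{\Vert\vec{a}\Vert \cdot k\,\Vert\vec{a}\Vert} = 1
\end{aligned}
\]
```

The factor $k$ cancels, so the result is 1 for every positive $k$, even though $\vec{b} \neq \vec{a}$ whenever $k \neq 1$.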

The cosine similarity is not defined when either vector is a zero vector, i.e., a vector whose elements are all zero and whose magnitude is therefore zero.
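
In code, the zero-vector case shows up as a division by zero, so an implementation has to decide how to handle it. The sketch below returns `None` in that case; this is just one possible convention, and the function name `safe_cosine_similarity` is hypothetical.

```python
import math


def safe_cosine_similarity(a, b):
    """Return the cosine similarity, or None when it is undefined."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return None  # a zero vector has no direction, so no angle exists
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


print(safe_cosine_similarity([0, 0], [1, 2]))  # None
print(safe_cosine_similarity([3, 4], [3, 4]))  # 1.0
```

Raising an exception instead of returning `None` would be an equally reasonable choice, depending on how the caller wants to treat undefined results.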

How do I calculate the cosine similarity?

To calculate the cosine similarity between two vectors, follow these steps:

  1. If you know the angle between the vectors, the cosine similarity is the cosine of that angle.

  2. If you don't know the angle, calculate the dot product of the two vectors.

  3. Calculate both vectors' magnitudes.

  4. Divide the dot product by the product of the magnitudes.

  5. The result is the cosine similarity.

An example of the cosine similarity

Let's look at an example of two 2D vectors and their cosine similarity. Let's use:

  • $\vec{a} = [1, 5]$ and
  • $\vec{b} = [-1, 3]$.

Already, we can visualize that the two vectors point in the same general direction, i.e., up. We can guess that $\theta < 90^\circ$ and therefore that ${\rm S_C} > 0$, but let's calculate it properly using the formula we learned above.

  1. The dot product is:

    $$\vec{a}\cdot\vec{b} = 1\cdot (-1) + 5 \cdot 3 = 14$$

  2. The vectors' magnitudes are:

    $$\Vert\vec{a}\Vert = \sqrt{1^2+5^2} \approx 5.099$$

    and

    $$\Vert\vec{b}\Vert = \sqrt{(-1)^2+3^2} \approx 3.162$$

  3. The cosine similarity is, therefore:

    $${\rm S_C} = \frac{ \vec{a}\cdot\vec{b} }{ \Vert\vec{a}\Vert\ \Vert\vec{b}\Vert } = \frac{14}{5.099 \cdot 3.162} \approx 0.868$$

Our guesses were right!
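
The same numbers can be reproduced with a few lines of plain Python. This is a quick check of the worked example and uses only the standard library.

```python
import math

a = [1, 5]
b = [-1, 3]

dot = sum(x * y for x, y in zip(a, b))     # 1*(-1) + 5*3
norm_a = math.sqrt(sum(x * x for x in a))  # sqrt(26)
norm_b = math.sqrt(sum(y * y for y in b))  # sqrt(10)
similarity = dot / (norm_a * norm_b)

print(dot)                   # 14
print(round(norm_a, 3))      # 5.099
print(round(norm_b, 3))      # 3.162
print(round(similarity, 3))  # 0.868
```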

How to calculate the cosine similarity with Python

Python is one of the most widely used languages for data science, so you might need to calculate the cosine similarity in Python. If you're implementing it yourself, you can use NumPy's dot function for the dot product and the norm function from the numpy.linalg submodule for the vector magnitude. Here's how it might be done:

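from numpy import dot
from numpy.linalg import norm

def calc_cosine_similarity(a, b):
    # Dot product divided by the product of the two magnitudes.
    # The result is not meaningful if either vector is all zeroes.
    return dot(a, b) / (norm(a) * norm(b))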

Then you can call the function as:

a = [1, 1, 1]
b = [3, 4, 5]
calc_cosine_similarity(a, b)
# returns 0.9797958971132713

What is the cosine distance?

The cosine distance is used to measure the dissimilarity between two vectors. It's simply the complement of the cosine similarity, i.e.,

$${\rm D_C}(\vec{a},\vec{b}) = 1 - {\rm S_C}(\vec{a},\vec{b})$$

However, the cosine distance is not a true distance metric, because it does not have the triangle inequality property, i.e., the inequality:

$${\rm D_C}(\vec{a},\vec{c}) \le {\rm D_C}(\vec{a},\vec{b}) + {\rm D_C}(\vec{b},\vec{c}),$$

does not hold for all possible values of $\vec{a}$, $\vec{b}$, and $\vec{c}$.
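
A concrete counterexample makes the failure of the triangle inequality visible. The three vectors below were picked for illustration, and `cosine_distance` is a helper defined only for this sketch.

```python
import math


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


a, b, c = [1, 0], [1, 1], [0, 1]

direct = cosine_distance(a, c)                          # 90 degrees apart: 1
detour = cosine_distance(a, b) + cosine_distance(b, c)  # two 45-degree steps: about 0.586
print(direct > detour)  # True
```

Going from a to c directly "costs" more than going via b, which is exactly what the triangle inequality forbids.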

🔎 Visit our triangle inequality theorem calculator for more information on this theorem.

FAQ

Can cosine similarity be negative?

Yes, cosine similarity can be negative because the cosine of some angles can be negative. A negative cosine similarity means that the two vectors are more dissimilar than similar and that the angle between them is greater than 90°.
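
As a quick illustration, here is a pair of vectors (chosen for this example) whose cosine similarity is negative, together with the corresponding angle recovered via the inverse cosine.

```python
import math

a = [1, 0]
b = [-1, 1]

dot = sum(x * y for x, y in zip(a, b))
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))
similarity = dot / (norm_a * norm_b)

# The angle follows from the inverse cosine of the similarity.
angle_deg = math.degrees(math.acos(similarity))

print(round(similarity, 4))  # -0.7071
print(round(angle_deg, 1))   # 135.0
```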

What does a cosine similarity of -1 mean?

A cosine similarity of -1 means that the two vectors point in opposite directions. This does not mean that their magnitudes are equal, but simply that their angle is 180°.

Rijk de Wet
Copyright by Omni Calculator sp. z o.o.