Understand Cosine Similarity | 2 Minute Tutorial

Video Statistics and Information

Captions
Cosine similarity is a similarity measure that is widely used in machine learning, from natural language processing, where it is used to compare texts, all the way to image processing, where it is used to compare images or human faces.

Cosine similarity is defined as the cosine of the angle between two vectors. That means this measure is scale-invariant, or in other words, unaffected by the lengths of the vectors being compared. To compute it, we calculate the dot product of the two vectors and divide it by the product of the vectors' magnitudes.

Let's take two vectors, A and B. The dot product is simply the sum of the element-wise products of the vectors' coordinates. Next, we can use the Pythagorean theorem to calculate the magnitudes of those vectors. Let's plug our dot product and magnitudes into the formula, and what we get is the cosine similarity.

This similarity measure ranges from negative 1 to positive 1. The objects we compare are identical (the vectors point in the same direction) if the value is 1, perfectly dissimilar (the vectors point in opposite directions) if the value is negative 1, and if the value is 0, the vectors are orthogonal, or independent.

Let's take a look at a very simple example. We have three text strings, "a white dog", "a blue cat", and "a white cat", and we want to calculate the cosine similarity between the first text string and the other two. We need a way to vectorize these strings, and one of the simplest ways is to use the bag-of-words method. First, we clean our data by removing any punctuation and stop words. Next, we create a vocabulary of all unique words in our data set, and every word will represent a dimension. The value along each dimension is the frequency of that word in a text string.

Now it's time to calculate the cosine similarities, so we simply plug these vectors into the formula. Our similarity scores show that the third text string is the most similar to the first one.

Of course, this is a very primitive example, and there are many ways to vectorize text or images. One way would be to use a neural network, but this is a topic for next time. If you found this video useful, hit the like and subscribe buttons, and I'll see you in the next one.
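
The three-string example from the captions can be sketched in Python as follows. This is a minimal sketch, not the video's own code: the function names (`tokenize`, `bag_of_words`, `cosine_similarity`) and the tiny stop-word list are assumptions made for illustration, and only the standard library is used.

```python
import math
import string
from collections import Counter

# Assumed minimal stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the"}


def tokenize(text):
    """Lowercase, strip punctuation, and drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def bag_of_words(texts):
    """Return (vocabulary, vectors); each vector holds word counts per text."""
    token_lists = [tokenize(text) for text in texts]
    # One dimension per unique word, sorted for a stable ordering.
    vocabulary = sorted({word for tokens in token_lists for word in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


texts = ["a white dog", "a blue cat", "a white cat"]
vocabulary, vectors = bag_of_words(texts)

sim_blue_cat = cosine_similarity(vectors[0], vectors[1])
sim_white_cat = cosine_similarity(vectors[0], vectors[2])

print(vocabulary)   # the four dimensions: blue, cat, dog, white
print(sim_blue_cat, sim_white_cat)
```

Under these assumptions, "a white dog" and "a blue cat" share no words, so their similarity is 0, while "a white dog" and "a white cat" share one of two words each and score approximately 0.5 (floating-point rounding may leave the value a hair under 0.5). That matches the captions' conclusion that the third string is the closest to the first.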
Info
Channel: Daniel Krei
Views: 6,683
Id: zcUGLp5vwaQ
Length: 2min 8sec (128 seconds)
Published: Sun Jul 16 2023