# Cosine Similarity

**Source:** [Wikipedia](https://en.wikipedia.org/wiki/Cosine_similarity)

## Definition

Cosine similarity is the cosine of the angle between two nonzero vectors.

Equivalently, it is the dot product of the vectors divided by the product of their lengths, which gives the closed-form expression below.

cosine similarity $= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$

The more commonly stated formula uses summations over the vector components, but it conveys the same information.
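
For reference, the summation form can be written out by components. The symbols $A_i$ and $B_i$ (the $i$-th components of $A$ and $B$) and $n$ (their shared dimension) are notation assumed here, not taken from the note itself:

cosine similarity $= \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^2} \, \sqrt{\sum_{i=1}^{n} B_i^2}}$

The numerator is the dot product $A \cdot B$ and each square root in the denominator is the length of one vector, so this matches the closed form term by term.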
     14 
     15 ## Meaning + Example Usage
     16 
     17 The cosine similarity describes how similar vectors are without consideration for their magnitude. 
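
A minimal sketch of this magnitude independence, assuming plain Python lists as vectors. The vectors `v`, `scaled`, `orthogonal`, and `opposite` are made-up illustrations, and the results should be read as approximate because of floating-point rounding:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

v = [1.0, 2.0, 3.0]
scaled = [10.0 * x for x in v]   # same direction, ten times the length
orthogonal = [-2.0, 1.0, 0.0]    # dot product with v is zero
opposite = [-x for x in v]       # same length, reversed direction

same_direction = cosine_similarity(v, scaled)
right_angle = cosine_similarity(v, orthogonal)
reversed_direction = cosine_similarity(v, opposite)

print(same_direction, right_angle, reversed_direction)
```

Scaling `v` changes its length but not the result, which stays close to 1.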

This can be applied in an information retrieval context to compare two documents. Each unique term across the documents is an axis of the vector space, and the number of times the term occurs in a given document is the corresponding component of that document's vector.
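
The document comparison described above might be sketched as follows. The helper names `term_vector` and `cosine_similarity_counts` and the three sample sentences are hypothetical, and splitting on whitespace is a simplifying assumption; a real retrieval system would typically tokenize and weight terms more carefully:

```python
import math
from collections import Counter

def term_vector(text):
    """Count term occurrences; whitespace splitting is a simplifying assumption."""
    return Counter(text.lower().split())

def cosine_similarity_counts(counts_a, counts_b):
    """Cosine similarity of two term-count mappings (absent terms count as 0)."""
    # Only terms present in both documents contribute to the dot product.
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an empty document")
    return dot / (norm_a * norm_b)

doc_a = "the cat sat on the mat"
doc_b = "the cat lay on the rug"
doc_c = "stock prices fell sharply today"

sim_ab = cosine_similarity_counts(term_vector(doc_a), term_vector(doc_b))
sim_ac = cosine_similarity_counts(term_vector(doc_a), term_vector(doc_c))
print(sim_ab, sim_ac)
```

Here `doc_a` and `doc_b` share several terms and score well above zero, while `doc_a` and `doc_c` share none and score 0.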

## Implementation

```python3
import math

def magnitude(v):
    """Euclidean length of vector v."""
    return math.sqrt(sum(x ** 2 for x in v))

def dp(A, B):
    """Dot product of two equal-length vectors."""
    if len(A) != len(B):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(A, B))

def cosine_similarity(A, B):
    """Dot product of A and B divided by the product of their lengths."""
    a_l = magnitude(A)
    b_l = magnitude(B)
    # The angle is undefined when either vector has zero length.
    if a_l == 0 or b_l == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dp(A, B) / (a_l * b_l)

if __name__ == "__main__":
    A = [0, 4873, 823]
    B = [0, 487, 48988]
    print(cosine_similarity(A, B))
```