# Naive Bayes

ML SS
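
**Definition:** Naive Bayes is an algorithm that estimates the probability of an item, typically a piece of text, belonging to each of a set of classes. It is "naive" because it treats every token as independent of the others given the class.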

It is often used for spam classification. The steps are:

1. For each class, find the fraction of that class's messages that contain each token (word/phrase).
2. For a given message, multiply together the fractions for every token it contains.
3. Multiply that product by the prior probability of the class, i.e. the known probability that any given item belongs to it.
4. Repeat for every class, then assign the message to the class with the highest result.
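
The steps above can be sketched in Python. This is only a minimal illustration under some assumptions: the names `train` and `classify` and the toy messages are made up for the example, each token is counted once per message, and no pseudo-count is applied, so a token never seen in a class would drive that class's score to 0.

```python
from collections import Counter


def train(messages):
    """messages: list of (label, tokens) pairs.

    Returns the number of messages per class and, per class, how many
    of its messages contain each token.
    """
    class_counts = Counter()
    token_counts = {}
    for label, tokens in messages:
        class_counts[label] += 1
        # set() so a token counts once per message, however often it repeats
        token_counts.setdefault(label, Counter()).update(set(tokens))
    return class_counts, token_counts


def classify(tokens, class_counts, token_counts):
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        product = 1.0
        for token in set(tokens):
            # Steps 1-2: fraction of this class's messages containing the token
            product *= token_counts[label][token] / n
        # Step 3: weight by the prior probability of the class
        scores[label] = product * (n / total)
    # Step 4: the class with the highest score wins
    return max(scores, key=scores.get), scores


# Hypothetical toy data, purely for illustration
training = [
    ("spam", ["win", "money", "now"]),
    ("spam", ["win", "prize"]),
    ("ham", ["meeting", "now"]),
    ("ham", ["lunch", "money"]),
    ("ham", ["win", "game"]),
]
class_counts, token_counts = train(training)
label, scores = classify(["win", "now"], class_counts, token_counts)
print(label, scores)
```

With this data the spam score is (2/2)(1/2)(2/5) = 0.2 and the ham score is (1/3)(1/3)(3/5) ≈ 0.067, so the message is labelled spam. The scores are not normalized; they are only meaningful relative to one another.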

We usually add a pseudo-count to each token count for each class. This ensures that a token never seen in a class does not force that class's result to 0%; it only makes the result lower.
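
A small sketch of the pseudo-count idea follows. The function name `smoothed_fraction` is hypothetical, and adding twice the pseudo-count to the denominator (one for "token present", one for "token absent") is an assumption about how the fraction is kept below 1; the note itself only says the pseudo-count is added to the token count.

```python
def smoothed_fraction(token_count, class_size, pseudo_count=1):
    """Fraction of a class's messages containing a token, with a pseudo-count.

    token_count: messages in the class that contain the token
    class_size:  total messages in the class
    """
    # Assumption: the denominator grows by 2 * pseudo_count so the
    # result stays strictly between 0 and 1 when pseudo_count > 0.
    return (token_count + pseudo_count) / (class_size + 2 * pseudo_count)


# A token that never appears in a class of 3 messages
unsmoothed = smoothed_fraction(0, 3, pseudo_count=0)
smoothed = smoothed_fraction(0, 3)
print(unsmoothed, smoothed)
```

Without the pseudo-count the fraction is 0/3 = 0, which zeroes the whole product for that class. With a pseudo-count of 1 it becomes 1/5, so the class is penalized but not ruled out.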