# Stemming

**Source:** https://www.nltk.org/howto/stem.html

**Definition:** Stemming is an NLP technique that reduces words with the same underlying meaning to a common form (the stem) by stripping affixes.

## Examples With Porter Stemmer

caresses -> caress
caressing -> caress
stemming -> stem
stem -> stem
flies -> fli
flying -> fli

NOTE: The Porter stemmer is one of the most widely used stemmers for English. A stem does not have to be a dictionary word (e.g. 'fli').
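
The stems listed above can be reproduced with NLTK's `PorterStemmer`, the implementation documented at the source link. This is a minimal sketch that assumes the `nltk` package is installed; the word list is just the one from these notes.

```python
from nltk.stem.porter import PorterStemmer

# NLTK's Porter implementation; stemming needs no corpus download.
stemmer = PorterStemmer()

words = ["caresses", "caressing", "stemming", "stem", "flies", "flying"]

# Map each word to its stem.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Words that share a stem (e.g. 'caresses' and 'caressing') end up with identical values in `stems`, which is what makes them comparable downstream.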

## Usage

In practice, stemming is useful for keyword matching in English. Because inflected variants of a word collapse to the same stem, a search for 'flying' also matches 'flies', so keyword search tracks the meaning of a query more closely than exact string matching does.

Stemming is also typically faster than lemmatization, since it applies suffix-stripping rules instead of dictionary lookups.
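
To make the keyword-matching idea concrete, here is a small dependency-free sketch. `crude_stem`, `stem_tokens`, and `matches` are hypothetical helpers written for these notes, and `crude_stem` is a toy suffix stripper, not the Porter algorithm; a real stemmer such as NLTK's `PorterStemmer().stem` could be passed in its place.

```python
import re
from typing import Callable, Set


def crude_stem(word: str) -> str:
    """Toy suffix stripper (NOT Porter), used only to illustrate matching on stems."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("es", ""), ("s", "")):
        # Require at least two characters before the suffix so short words stay intact.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + replacement
    return word


def stem_tokens(text: str, stem: Callable[[str], str]) -> Set[str]:
    """Lowercase the text, split it into alphabetic tokens, and stem each token."""
    return {stem(token) for token in re.findall(r"[a-z]+", text.lower())}


def matches(query: str, document: str, stem: Callable[[str], str] = crude_stem) -> bool:
    """Return True if every stemmed query term appears among the document's stems."""
    return stem_tokens(query, stem) <= stem_tokens(document, stem)


docs = ["Birds were flying south", "A bird flies overhead", "The train left early"]

# The query 'flies' matches both documents about flying, although only one contains it verbatim.
hits = [doc for doc in docs if matches("flies", doc)]
print(hits)
```

An exact string match on 'flies' would find only the second document; comparing stems finds the first one as well.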

## Limitations

Languages differ in how they inflect and combine words, so a stemmer's rules generally apply to a single language only, and for some languages there may be no simple rule-based approach at all.

Additionally, English terms like 'news' are reduced to 'new' under the original Porter rules, which loses meaning. Lemmatization uses a dictionary-based approach and does not necessarily share this limitation, but it tends to be slower.
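
One way to spot this kind of lost meaning is to group a vocabulary by stem and look for unrelated words that land in the same group. The sketch below assumes `nltk` is installed and uses `PorterStemmer` with `mode=PorterStemmer.ORIGINAL_ALGORITHM`, which is meant to follow Porter's published rules without NLTK's own extensions; the word list is illustrative only.

```python
from collections import defaultdict

from nltk.stem.porter import PorterStemmer

# Assumption: ORIGINAL_ALGORITHM mode applies the published Porter rules as-is.
stemmer = PorterStemmer(mode=PorterStemmer.ORIGINAL_ALGORITHM)

words = ["news", "new", "flies", "stem"]

# Group words by their stem.
groups = defaultdict(list)
for word in words:
    groups[stemmer.stem(word)].append(word)

# Any stem shared by more than one word is a conflation worth inspecting.
collisions = {stem: group for stem, group in groups.items() if len(group) > 1}
print(collisions)
```

Under these assumptions, 'news' and 'new' should fall into the same group, which shows how a keyword search on stems could confuse the two.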