# Generalization Error

ML CH1

**Definition:** Generalization error, or out-of-sample error, is the error rate of a model on data that is not in the training set.

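Written out for a classification task with 0-1 loss (an assumption; the note does not fix a task or loss), the definition and its test-set estimate look like the following. The symbols are illustrative: $h$ is the trained model, $\mathcal{D}$ the distribution the data comes from, and $(x_i, y_i)$ the $m$ test examples, none of which were used to train $h$:

$$
E_{\text{gen}}(h) = \Pr_{(x,y)\sim\mathcal{D}}\big[h(x) \neq y\big]
\;\approx\;
\hat{E}_{\text{test}}(h) = \frac{1}{m}\sum_{i=1}^{m} \mathbb{1}\big[h(x_i) \neq y_i\big]
$$

For example, a model that gets 12 of 200 test samples wrong has $\hat{E}_{\text{test}} = 12/200 = 0.06$, i.e. a test accuracy of $1 - 0.06 = 0.94$.
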
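When evaluating a model, it is important to split the available samples into a training set and a test set, with the test set being a fixed fraction of the total. You train the model on the training set only and then measure how often it is wrong on the test set. This test error rate is an estimate of the generalization error; the test accuracy is its complement (error rate = 1 - accuracy).
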
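A minimal sketch of that procedure in plain Python, assuming a toy binary classification problem. The helper names (`split_train_test`, `error_rate`, `fit_threshold`) and the made-up data are hypothetical and only for illustration; in practice a library routine such as scikit-learn's `train_test_split` does the splitting. The point is that the model never sees the test samples, and the number reported at the end is the error rate, not the accuracy.

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out test_fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = samples[:]  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def error_rate(model, data):
    """Fraction of (x, y) pairs the model predicts wrongly; equals 1 - accuracy."""
    wrong = sum(1 for x, y in data if model(x) != y)
    return wrong / len(data)


def fit_threshold(train):
    """Hypothetical toy learner: predict 1 when x is above the midpoint of the two class means."""
    ones = [x for x, y in train if y == 1]
    zeros = [x for x, y in train if y == 0]
    threshold = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: 1 if x > threshold else 0


# Made-up data: label is 1 when x >= 50.
data = [(x, int(x >= 50)) for x in range(100)]
train, test = split_train_test(data, test_fraction=0.2)  # 80/20 split
model = fit_threshold(train)  # the model only ever sees the training set
test_error = error_rate(model, test)
print(f"test error (estimated generalization error): {test_error:.2f}")
print(f"test accuracy: {1 - test_error:.2f}")
```

Fixing the shuffle `seed` makes the split reproducible, so repeated runs report the same test error.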
It is common practice to use 80% of the data for training and 20% for testing.

Sometimes a third set, called the holdout set, is used as another layer of verification; it is also referred to as the validation set, dev set, or development set, and is usually carved out of the training data. It matters when you compare models trained with different hyperparameters (for example, different learning rates). If you pick whichever model scores best on the test set, you have effectively tuned the model to that particular test set, so its test error becomes an overly optimistic estimate of the generalization error. With a dev set, you instead train every candidate on the training data, evaluate them all on the dev set, select the best one, and only then evaluate that single model on the test set to estimate the generalization error.
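The train / dev / test workflow can be sketched as below. This is an illustration under stated assumptions, not a definitive recipe: the note's example hyperparameter is the learning rate, but here the number of neighbours `k` in a hand-written 1-D k-nearest-neighbour classifier stands in for it because it keeps the sketch short. The split keeps the note's 20% test set and carves a 20% dev set out of the remaining 80% (60/20/20); the noisy data and all helper names are hypothetical.

```python
import random


def split_three_ways(samples, dev_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into train / dev (validation) / test sets."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_dev = int(len(shuffled) * dev_fraction)
    test = shuffled[:n_test]
    dev = shuffled[n_test:n_test + n_dev]
    train = shuffled[n_test + n_dev:]
    return train, dev, test


def fit_knn(train, k):
    """Hypothetical 1-D k-nearest-neighbour classifier; k is the hyperparameter."""
    def predict(x):
        nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
        votes = sum(y for _, y in nearest)
        return 1 if votes * 2 > k else 0  # majority vote (odd k avoids ties)
    return predict


def error_rate(model, data):
    """Fraction of (x, y) pairs the model predicts wrongly."""
    return sum(1 for x, y in data if model(x) != y) / len(data)


rng = random.Random(1)
# Made-up noisy data: label is 1 when x >= 50, flipped about 10% of the time.
data = [(x, int(x >= 50) ^ (rng.random() < 0.1)) for x in range(200)]
train, dev, test = split_three_ways(data)

candidate_ks = [1, 3, 5, 9, 15]
# 1. Train one model per hyperparameter value, on the training set only.
# 2. Compare them on the dev set and keep the best one.
dev_errors = {k: error_rate(fit_knn(train, k), dev) for k in candidate_ks}
best_k = min(dev_errors, key=dev_errors.get)
# 3. Use the test set exactly once, for the chosen model only.
test_error = error_rate(fit_knn(train, best_k), test)
print(f"dev errors: {dev_errors}")
print(f"best k = {best_k}, estimated generalization error = {test_error:.2f}")
```

Because the test set plays no part in choosing `k`, the final test error is not biased by the selection step, which is exactly the failure mode the dev set exists to avoid.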