# Support Vector Machines (SVMs)

ML D3

**Definition:** Support vector machines are models that separate classes with a decision boundary (a line in two dimensions, a hyperplane in general), placed so that it leaves as much space as possible between the classes. The boundary runs up the middle of a "street" whose edges are determined only by the instances lying on the edge of the street, not by instances far away from it. Those edge instances are the support vectors.

### Classification

Think of trying to make a street as wide as possible when the buildings on either side can't be moved. If a building on the edge moves inward, the edge of the street has to move in as well, and the center line shifts accordingly because width is lost on one side. No matter how many buildings are added far from the street, they do not affect its optimal width. This describes how hard margin classification works. The problem with it is that if samples of different classes intermingle in any way (the data is not linearly separable), no valid street exists and the algorithm won't work.

As such, there is also soft margin classification for SVMs, which tries to limit margin violations while balancing this against making the street as wide as possible. In scikit-learn, reducing the hyperparameter C widens the street and allows more margin violations. This decreases the likelihood of overfitting, but reducing it too much will cause underfitting.

Support vector machines work well on small datasets, but they do not scale well: training time grows quickly as the number of samples increases. They are also sensitive to feature scales, so features should be scaled before training.

When a dataset is not linearly separable, we can use the same strategy as polynomial regression: add polynomial features of a chosen degree, so that a linear boundary in the expanded feature space can separate the classes.

A trick related to SVMs is the kernel trick, of which the polynomial kernel is one example. It gives the same result as adding polynomial features without actually creating them, which avoids a combinatorial explosion of features. The kernel computes the dot product that two samples would have in the higher-dimensional space directly from the original vectors, so the mapping itself never has to be computed.
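The effect of C on a soft margin classifier can be sketched in scikit-learn roughly as follows. The dataset (`make_moons`), the three C values, and the name `n_support_by_C` are illustrative assumptions, not part of the original notes. The scaler is included because of the sensitivity to feature scales mentioned above.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy data with overlapping classes (an illustrative choice, not from the notes).
X, y = make_moons(n_samples=200, noise=0.3, random_state=42)

n_support_by_C = {}
for C in (0.01, 1.0, 100.0):
    # Scale first, then fit a soft margin linear SVM with the given C.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=C))
    model.fit(X, y)
    # Support vectors are the instances on the edge of the street
    # or violating the margin.
    n_support_by_C[C] = int(model.named_steps["svc"].n_support_.sum())
    print(f"C={C}: {n_support_by_C[C]} support vectors, "
          f"training accuracy {model.score(X, y):.2f}")
```

On data like this, a smaller C should give a wider street and therefore more support vectors, while a larger C narrows the street and tolerates fewer violations.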
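The kernel trick described above can be checked by hand for a small case. The sketch below assumes a degree-2 polynomial kernel on 2D points, and the helper names `phi`, `dot`, and `poly_kernel` are hypothetical. It compares the dot product of explicitly mapped vectors with the kernel value computed from the original vectors.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(x):
    """Explicit degree-2 feature map for a 2D point (the expensive route)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(a, b):
    """Degree-2 polynomial kernel (a . b)^2, computed in the original space."""
    return dot(a, b) ** 2

a, b = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(a), phi(b))    # map to 3D first, then take the dot product
via_kernel = poly_kernel(a, b)    # never leaves 2D
print(explicit, via_kernel)
```

Both routes give the same value up to floating point rounding (121 for these inputs), but the kernel never builds the mapped vectors. In scikit-learn the polynomial kernel is selected with `SVC(kernel="poly", degree=..., coef0=...)`.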

### Regression

When using SVMs for regression, the objective is reversed: instead of keeping samples off the street, we try to fit as many samples as possible on the street while limiting margin violations (samples that fall outside it). The width of the street is controlled by the hyperparameter epsilon.
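A minimal sketch of how epsilon sets the street width, assuming scikit-learn's `SVR` with a linear kernel and synthetic noisy linear data. The data, the two epsilon values, and the name `n_support_by_epsilon` are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic 1D regression data: y = 2x + 1 plus Gaussian noise (assumed for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 4, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.5, size=100)

n_support_by_epsilon = {}
for epsilon in (0.1, 1.5):
    model = SVR(kernel="linear", C=1.0, epsilon=epsilon)
    model.fit(X, y)
    # In SVR the support vectors are the samples on the edge of the street or outside it.
    n_support_by_epsilon[epsilon] = len(model.support_)
    print(f"epsilon={epsilon}: {n_support_by_epsilon[epsilon]} support vectors")
```

A larger epsilon widens the street, so more samples fit inside it and fewer end up as support vectors.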