SVMs Don't Learn Patterns: They Find the Best Boundary Between Them
Support Vector Machines approach classification differently from everything I've studied so far. Decision Trees ask questions recursively. Logistic Regression estimates probabilities. SVMs do something geometrically clean: find the line (or plane, or hyperplane) that separates classes with the maximum possible margin.
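The hyperplane is a decision boundary described by:
$$w \cdot x + b = 0$$
where $w$ is the weight vector (normal to the hyperplane) and $b$ is the bias.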
In 2D this is a line. In 3D it's a plane. In higher dimensions it's a hyperplane. Everything on one side gets one class label, everything on the other side gets the other.
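To make the "one side vs. the other" rule concrete, here is a minimal sketch assuming the usual form w·x + b = 0 for the hyperplane. The function names and the example line x1 + x2 = 3 are made up for illustration.

```python
def decision_value(w, x, b):
    """Signed score w·x + b. It is zero exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Return +1 on one side of the hyperplane and -1 on the other.

    Points exactly on the boundary are sent to +1 here; that tie-break
    is an arbitrary choice for this sketch.
    """
    return 1 if decision_value(w, x, b) >= 0 else -1


# Hypothetical 2D hyperplane: the line x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0

print(predict(w, [3.0, 2.0], b))  # score is 2.0, so +1
print(predict(w, [0.5, 1.0], b))  # score is -1.5, so -1
```

The sign of the score is all the classifier uses; the magnitude only says how far from the boundary the point sits (in units scaled by the length of w).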
There are infinitely many hyperplanes that could separate two linearly separable classes. SVM picks the one that maximizes the margin, the gap between the hyperplane and the nearest data points from each class.
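A small sketch of what "margin" means numerically: the distance from a hyperplane w·x + b = 0 to its nearest data point is min |w·x + b| / ||w||. The toy points and the two candidate lines below are invented for illustration; both lines separate the two groups, but one leaves a much wider gap.

```python
import math


def geometric_margin(w, b, points):
    """Smallest distance from any point to the hyperplane w·x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x in points
    )


# Toy data (made up): first two points are one class, last two the other
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]

# Two candidate separating lines with the same orientation
margin_a = geometric_margin([1.0, 1.0], -5.0, points)  # line x1 + x2 = 5
margin_b = geometric_margin([1.0, 1.0], -3.0, points)  # line x1 + x2 = 3

print(round(margin_a, 4), round(margin_b, 4))
print("wider margin:", "a" if margin_a > margin_b else "b")
```

Line a sits midway between the two groups, so its nearest point on each side is equally far away; line b hugs one class and pays for it with a smaller margin.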
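The data points that sit exactly on the margin boundary are the support vectors. These are the only points that determine where the hyperplane ends up: remove any other point and the hyperplane doesn't move. (With a soft margin, points that fall inside the margin or on the wrong side also count as support vectors.)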
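One way to spot support vectors, sketched under assumptions: take a hyperplane that has already been fitted and scaled so the margin boundaries are at w·x + b = ±1 (the common convention), then look for points with y(w·x + b) = 1. The data and the hyperplane below are hand-picked for illustration, not produced by a solver.

```python
def functional_margin(w, b, x, y):
    """y * (w·x + b): equals 1 on the margin boundary, larger further out."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


# Toy data (made up): (point, label) with labels in {-1, +1}
data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((3.0, 3.0), 1), ((5.0, 5.0), 1)]

# Assumed already-fitted hyperplane for this data: 0.5*x1 + 0.5*x2 - 2 = 0
w, b = [0.5, 0.5], -2.0

# Points whose functional margin is (numerically) 1 sit on the margin boundary
support_vectors = [
    x for x, y in data if abs(functional_margin(w, b, x, y) - 1.0) < 1e-9
]
print(support_vectors)
```

The points (0, 0) and (5, 5) have functional margins of 2 and 3, so they sit well clear of the boundary and could be deleted without changing which hyperplane is best.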
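The objective balances margin maximization against penalizing misclassifications. Hinge loss handles the penalty. With labels $y_i \in \{-1, +1\}$, the loss for a point is $\max(0,\ 1 - y_i(w \cdot x_i + b))$: if the point is correctly classified and outside the margin, loss = 0. If it's inside the margin (or misclassified), loss $= 1 - y_i(w \cdot x_i + b) > 0$, growing linearly the further the point sits on the wrong side of its margin boundary.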
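A minimal sketch of hinge loss in its standard form, max(0, 1 - y(w·x + b)), with labels in {-1, +1}. The hyperplane and the three test points are made up to hit the three cases: outside the margin, inside the margin, and misclassified.

```python
def hinge_loss(w, b, x, y):
    """Hinge loss for one point: max(0, 1 - y * (w·x + b))."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


# Hypothetical hyperplane: x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0

# All three points carry the label +1
outside = hinge_loss(w, b, [3.0, 2.0], 1)   # score 2.0: correct, beyond the margin
inside = hinge_loss(w, b, [2.0, 1.5], 1)    # score 0.5: correct, but inside the margin
wrong = hinge_loss(w, b, [1.0, 1.0], 1)     # score -1.0: misclassified

print(outside, inside, wrong)  # 0.0 0.5 2.0
```

Note that the point inside the margin is classified correctly and still gets penalized, which is what pushes the optimizer toward wider margins rather than just correct labels.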
What clicked
This is counterintuitive: the model's decision boundary is defined by a handful of edge cases (the support vectors), not the bulk of the data. The regularization parameter $C$ controls the tradeoff: high $C$ = less tolerance for misclassification; low $C$ = wider margin allowed but more violations permitted.
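To see the tradeoff in numbers, here is a sketch that scores two hand-picked candidate hyperplanes with the standard soft-margin objective, 0.5·||w||² + C·(sum of hinge losses). This is an assumption about the exact form, and it only compares two candidates rather than running an optimizer. The 1D data, including the awkward positive point at x = 0, is invented.

```python
def objective(w, b, data, C):
    """Soft-margin objective: 0.5 * ||w||^2 + C * sum of hinge losses."""
    reg = 0.5 * sum(wi * wi for wi in w)
    total_hinge = 0.0
    for x, y in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        total_hinge += max(0.0, 1.0 - y * score)
    return reg + C * total_hinge


# Toy 1D data (made up); the positive point at x = 0 sits close to the negatives
data = [([-4.0], -1), ([-2.0], -1), ([0.0], 1), ([2.0], 1), ([4.0], 1)]

# Candidate hyperplanes as (w, b); the margin half-width is 1 / ||w||
wide = ([0.5], 0.0)    # half-width 2, but the point at x = 0 violates the margin
narrow = ([1.0], 1.0)  # half-width 1, no violations


def preferred(C):
    """Which of the two candidates has the lower objective for this C."""
    return "wide" if objective(*wide, data, C) < objective(*narrow, data, C) else "narrow"


print(preferred(0.1))   # low C: the single violation is cheap
print(preferred(10.0))  # high C: the violation dominates the objective
```

With C = 0.1 the wide candidate wins despite its violation; with C = 10 the narrow, violation-free candidate wins. For these two candidates the crossover is at C = 0.375.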
Still shaky on
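The kernel trick: SVMs can handle non-linear boundaries by implicitly mapping data into a higher-dimensional space, where a flat hyperplane corresponds to a non-linear boundary back in the original space. The kernel function computes dot products in that higher-dimensional space without ever building the mapped vectors. I understand the concept but haven't worked through the math yet.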
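A first step toward the math, as a sketch for one specific kernel only: the degree-2 polynomial kernel K(x, z) = (x·z)² on 2D inputs. Its explicit feature map is phi(x) = (x1², √2·x1·x2, x2²), and the check below confirms that the kernel computed in 2D matches the dot product in the 3D mapped space. The function names and sample points are made up.

```python
import math


def phi(x):
    """Explicit degree-2 feature map for a 2D input (maps into 3D)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """(x·z)^2, computed entirely in the original 2D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x, z = (1.0, 2.0), (3.0, 0.5)

explicit = sum(a * b for a, b in zip(phi(x), phi(z)))  # dot product in 3D
implicit = poly_kernel(x, z)                           # same value, no mapping

print(implicit, math.isclose(explicit, implicit))
```

Here the saving is trivial (3 dimensions vs. 2), but the same identity holds for kernels whose feature space is huge or infinite-dimensional, which is where skipping the explicit mapping matters.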
What's next
KNN takes the laziest possible approach to classification: it doesn't learn anything during training at all.