Linear Regression Is Just Finding the Best Straight Line: Here's What That Actually Means

linear-regression regression supervised-learning least-squares math

Linear regression is the first algorithm most people learn and often the most underestimated. At its core, it does one thing: fits a straight line through data points to model the relationship between an input and a continuous output. But what "fits a straight line" actually means mathematically is where it gets interesting.

The equation is $y = mx + b$ : slope times input plus intercept. In ML notation you'll see $y = \theta_1 x + \theta_0$ , same thing. The model learns the values of $m$ (slope) and $b$ (intercept) from data. Once you have them, prediction is just plugging in $x$ .

The real question is: which line? There are infinite lines you could draw through a scatter plot. Linear regression picks the one that minimizes the error between predicted and actual values. Each data point has a residual: the gap between what the model predicted ( $\hat{y}$ ) and what actually happened ( $y$ ).

$\text{Residual} = y - \hat{y}$

To find the best line, you minimize the Sum of Squared Residuals:

$SSR = \sum_i (y_i - \hat{y}_i)^2$

Squaring serves two purposes: negatives and positives don't cancel each other out, and large errors get penalized harder than small ones (a residual of 10 becomes 100; a residual of 1 stays 1). This is called the Least Squares Method.

What clicked

The key assumption baked into linear regression is that there's actually a linear relationship between input and output. If the real relationship is curved, no straight line will fit well: you'd need polynomial features or a different model entirely.

Still shaky on

Why not just minimize the absolute values of residuals instead of squaring? Squared errors have a smooth derivative everywhere, which makes optimization with calculus (or gradient descent later) much cleaner. Absolute value has a kink at zero and is harder to differentiate through.

What's next

What happens when the output isn't a continuous number but a category: like "will this customer churn: yes or no?" That's Logistic Regression, and despite the name, it's actually a classification model.