Entry 15 of 24
ML Fundamentals Series

The Perceptron Isn't a Neuron: It's a Weighted Vote With a Threshold

Before there were multilayer networks, there was a single unit trying to prove a point: that a machine could make a decision from weighted evidence. The McCulloch-Pitts neuron is the ancestor of every neuron in every network since. It does exactly two things: a function $g$ takes the inputs and performs an aggregation, and a function $f$ makes a decision based on that aggregated value. Inputs come in two flavors: excitatory (positive values that push toward firing) and inhibitory (negative values that push against it). The aggregation itself is nothing exotic:

$$y_{sum} = \sum_{i=1}^{n} w_i x_i$$
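To make the two-function split concrete, here is a minimal Python sketch. The names `aggregate` and `decide`, the example weights, and the threshold value of 2 are all assumptions chosen for illustration; the text above only specifies that one function sums weighted inputs and another decides from that sum.

```python
def aggregate(weights, inputs):
    """g: weighted sum of the inputs, y_sum = sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, inputs))


def decide(y_sum, threshold):
    """f: fire (1) when the aggregated value reaches the threshold, else 0.

    The threshold is a hypothetical parameter picked for this example.
    """
    return 1 if y_sum >= threshold else 0


# Two excitatory inputs (+1) and one inhibitory input (-1): illustrative values.
WEIGHTS = (1, 1, -1)
THRESHOLD = 2

# Both excitatory inputs on, inhibitory input off: the sum is 2, so the unit fires.
fires = decide(aggregate(WEIGHTS, (1, 1, 0)), THRESHOLD)

# Same evidence, but the inhibitory input is on: the sum drops to 1, no firing.
blocked = decide(aggregate(WEIGHTS, (1, 1, 1)), THRESHOLD)
```

In this sketch the inhibitory input simply subtracts from the vote, which is enough to flip the decision when the evidence sits exactly at the threshold.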

The Perceptron builds directly on this. It is the simplest form of neural network: it makes a decision by combining weighted inputs and running the result through an activation function. The one addition that matters is the bias term:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

Weight and bias do different jobs, and it's worth being precise about the difference. The weight controls how much each input influences the output: bigger weight, more influence. The bias controls when the perceptron activates at all: it shifts the decision boundary up, down, left, or right, independent of the inputs. With zero bias, the decision boundary is forced through the origin no matter how good the weights are. Bias is what lets the boundary sit wherever the data actually needs it.
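A small Python sketch can show the origin constraint in action. It assumes a step activation that outputs 1 when $z \geq 0$, and it uses a logical AND over two binary inputs as a hypothetical target; the weights `(1, 1)` and the bias `-1.5` are hand-picked for illustration, not learned.

```python
def perceptron(weights, bias, inputs):
    """Compute z = sum_i w_i * x_i + b, then apply a step activation at z >= 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0


AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]

# Zero bias: z is exactly 0 at the origin, so (0, 0) always lands on the
# "fire" side, whatever the weights are.
no_bias = [perceptron((1, 1), 0.0, x) for x in AND_INPUTS]

# Same weights, but a bias of -1.5 shifts the boundary so that only (1, 1)
# ends up with z >= 0.
with_bias = [perceptron((1, 1), -1.5, x) for x in AND_INPUTS]
```

With these values `no_bias` comes out as all ones, while `with_bias` matches `AND_TARGETS`. The weights are identical in both runs; only the bias moved the boundary.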

A perceptron is trained with the Perceptron Rule, a clean three-step loop. First, initialize the weights, the bias, and a learning rate $\eta$. Second, run the training process: compute $z = w \cdot x + b$, threshold it into a prediction ($y_{pred} = 1$ if $z \geq 0$, else $0$), compute the error ($\text{Error} = y - y_{pred}$), then nudge the weights and bias in the direction that would have reduced that error:

$$w' = w + \eta \cdot \text{Error} \cdot x \qquad b' = b + \eta \cdot \text{Error}$$

Third, repeat this for multiple epochs until the errors stop moving the weights, that is, until a full pass over the data produces no mistakes. There's no calculus here, no gradients, just: were you wrong, and if so, which direction should the weight have leaned. That simplicity is also the perceptron's ceiling: it can only draw a straight line (a hyperplane, in higher dimensions) through the data, so classes that no single line can separate, such as XOR, stay out of reach. Everything after this in the neural network story is about breaking past that limit.
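The three-step training loop described above can be sketched in plain Python. This is an illustrative version under stated assumptions, not a reference implementation: the function names, the zero initialization, the learning rate of 0.5, the cap of 100 epochs, and the "stop after a mistake-free epoch" rule are all choices made for this example.

```python
def predict(weights, bias, x):
    """Threshold z = w . x + b into a 0/1 prediction (1 when z >= 0)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


def train_perceptron(samples, labels, eta=0.5, max_epochs=100):
    """Perceptron Rule: returns (weights, bias, converged)."""
    # Step 1: initialize weights and bias (zeros are an assumption here).
    weights = [0.0] * len(samples[0])
    bias = 0.0

    # Step 3: repeat over epochs until a full pass makes no mistakes.
    for _ in range(max_epochs):
        mistakes = 0
        # Step 2: predict, compute the error, nudge weights and bias.
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + eta * error * xi for w, xi in zip(weights, x)]
                bias += eta * error
        if mistakes == 0:
            return weights, bias, True
    return weights, bias, False


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# AND is linearly separable, so the loop settles on a working boundary.
and_w, and_b, and_converged = train_perceptron(INPUTS, [0, 0, 0, 1])
and_predictions = [predict(and_w, and_b, x) for x in INPUTS]

# XOR is not linearly separable, so no epoch is ever mistake-free.
xor_w, xor_b, xor_converged = train_perceptron(INPUTS, [0, 1, 1, 0])
```

On the AND data the loop stops early with `and_converged` set to `True`. On XOR it exhausts every epoch and reports `False`, which is the straight-line ceiling showing up in practice.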