The perceptron algorithm
Introduction
The perceptron algorithm is a simple supervised learning algorithm for binary classifiers. A binary classifier is a function that decides whether an input, represented by a vector of numbers, belongs to a specific class or not. The perceptron learns such a classifier from a set of training data, and it is a special case of the more general family of linear classifiers.
This classifier is a threshold function: it maps its input vector \(\textbf{x}\) to a single binary output \(f(\textbf{x})\):
\[f(\textbf{x}) = \begin{cases} 1 & \text{if } \textbf{w} \cdot \textbf{x} + b > 0, \\ 0 & \text{otherwise} \\ \end{cases}\]

In practice, the perceptron computes the dot product between the input vector and a vector of weights, then adds a bias term; in other words, it computes a linear combination of the inputs plus a bias. If the result is positive the perceptron outputs 1, otherwise it outputs 0. The weights and the bias are the parameters of the perceptron, and they are learned by the perceptron algorithm. The core of the algorithm is to find weights and a bias that make the perceptron output the correct value for every training example. They are found by starting from a random (or zero) weight vector and then, whenever the perceptron outputs the wrong value, adjusting the weights in the direction of the correct output. This is the perceptron learning rule, which can be formalized as follows:
\[\textbf{w} \leftarrow \textbf{w} + \eta \left( y - \hat{y} \right) \textbf{x}\]

where \(\eta\) is the learning rate, \(y\) is the correct output, \(\hat{y}\) is the perceptron's output, and \(\textbf{x}\) is the input vector. The bias is updated in the same way, \(b \leftarrow b + \eta \left( y - \hat{y} \right)\).
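To make the update rule concrete, here is a minimal NumPy sketch of a single prediction and a single update on one made-up training example; the names `x`, `y`, `w`, `b`, and `eta`, and the values, are purely illustrative and are not part of the implementation shown later:

```python
import numpy as np

# One made-up training example (values chosen only for illustration)
x = np.array([1.0, 2.0])   # input vector
y = 1                      # correct label
w = np.zeros(2)            # weights, initialized to zero
b = 0.0                    # bias
eta = 0.1                  # learning rate

# Prediction: threshold on w . x + b
y_hat = 1 if np.dot(w, x) + b > 0 else 0   # here 0, so the prediction is wrong

# Perceptron learning rule: move the parameters toward the correct output
w += eta * (y - y_hat) * x                 # w becomes [0.1, 0.2]
b += eta * (y - y_hat)                     # b becomes 0.1
```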
The activation function maps the raw output of the perceptron to a binary value. In the perceptron algorithm the activation function is the Heaviside step function:

\[\sigma(a) = \begin{cases} 1 & \text{if } a > 0, \\ 0 & \text{otherwise} \\ \end{cases}\]

In the learning rule above, the update factor is simply the difference between the correct output and the perceptron's output. In the implementation below this is computed by a small helper (also called \(\sigma\) in the code), which returns 1 when the perceptron under-shoots, -1 when it over-shoots, and 0 when the prediction is already correct:

\[\sigma(y, \hat{y}) = y - \hat{y}\]

Python implementation
Note that this implementation computes the dot product between the input vector and the weights explicitly, and then adds the bias term. This is done for educational purposes; in practice the dot product would be computed with `np.dot` and the bias term added with `np.add` (or a plain `+`).
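For reference, a vectorized version of the prediction step might look like the sketch below; the function name `predict_vectorized` is illustrative and is not part of the class shown later:

```python
import numpy as np

def predict_vectorized(weights, bias, x):
    """Threshold on the affine combination w . x + b, using NumPy."""
    a = np.add(np.dot(weights, x), bias)   # w . x + b
    return 1 if a > 0 else 0
```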
The decision-boundary plot is generated by the `fit_plot` function, a modified version of the `fit` function that redraws the decision boundary as training occurs. It only handles 2-dimensional data and is meant for visualization, not for practical use.
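The boundary drawn by `fit_plot` comes from solving \(\textbf{w} \cdot \textbf{x} + b = 0\) for the second coordinate; in two dimensions this gives the line used in the code:

\[w_1 x_1 + w_2 x_2 + b = 0 \quad\Rightarrow\quad x_2 = -\frac{w_1}{w_2}\, x_1 - \frac{b}{w_2}\]

so the slope is \(-w_1 / w_2\) and the intercept is \(-b / w_2\), matching the `m` and `b` computed inside the training loop of `fit_plot` (with `weights[0]` and `weights[1]` playing the roles of \(w_1\) and \(w_2\)).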
To generate and plot some artificial data for training the perceptron, we can use the following code:
```python
import numpy as np
import matplotlib.pyplot as plt

def make_data(n, dims=1):
    # Two Gaussian clusters of n points each; the second one is shifted by 3.5
    a = np.random.normal(size=(n, dims))
    b = np.random.normal(size=(n, dims))
    b += 3.5
    X = np.vstack((a, b))
    y = np.empty((n*2,))
    y[:n] = 0
    y[n:] = 1
    return X, y

# Plot the data
X, y = make_data(100, 2)
plt.scatter(X[:100, 0], X[:100, 1])
plt.scatter(X[100:, 0], X[100:, 1])
```
Here’s the full code for the perceptron algorithm:
```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from functools import partial
from matplotlib import rc

rc('animation', html='html5')


class Perceptron:

    def __init__(self, learning_rate=0.1, max_iterations=1000):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations

    def dot(self, a, b):
        """
        Dot product
        """
        y = 0
        for i, j in zip(a, b):
            y += i * j
        return y

    def sigma(self, y, y_hat):
        """
        Update factor for the learning rule; in practice it behaves as:
            if y == 1 and y_hat == 0:
                return 1
            if y == 0 and y_hat == 1:
                return -1
            return 0
        """
        return y - y_hat

    def fit(self, X, y):
        # Initialize weights randomly and the bias to zero
        self.weights = np.random.uniform(size=X.shape[1])
        self.bias = 0
        # Train the perceptron for a maximum number of iterations
        for _ in range(self.max_iterations):
            # Loop through each training example
            for i in range(X.shape[0]):
                # Compute the predicted class for the current example
                predicted = self.predict(X[i])
                # Update the weights and bias
                self.weights += self.learning_rate * self.sigma(y[i], predicted) * X[i]
                self.bias += self.learning_rate * self.sigma(y[i], predicted)

    def fit_plot(self, X, y):
        """ Plots the decision boundary as the training occurs
            Only works with 2-dimensional data
        """
        if X.shape[1] > 2:
            raise Exception("Only works with 2-D data")
        # Plot the data
        fig, ax = plt.subplots()
        ax.scatter(X[:, 0], X[:, 1], c=y)
        # Initialize weights and bias
        self.weights = np.random.uniform(size=X.shape[1])
        self.bias = 0
        # Collect data for the animation
        y_vals = []
        x_vals = np.array([np.min(X[:, 0]), np.max(X[:, 0])])
        # Training loop
        for _ in range(self.max_iterations):
            # Loop through each training example
            for i in range(X.shape[0]):
                # Compute the predicted class for the current example
                predicted = self.predict(X[i])
                # Update the weights and bias
                self.weights += self.learning_rate * self.sigma(y[i], predicted) * X[i]
                self.bias += self.learning_rate * self.sigma(y[i], predicted)
                # Define the decision boundary as a line in the form y = mx + b
                m = -self.weights[0] / self.weights[1]
                b = -self.bias / self.weights[1]
                # Data to animate the decision boundary plot
                y_vals.append(m * x_vals + b)
        # Plot the decision boundary
        self.line, = ax.plot(x_vals, y_vals[0], '--', alpha=0.8)
        ani = animation.FuncAnimation(fig, self.animate, fargs=[y_vals],
                                      frames=len(y_vals), interval=20, blit=True)
        return ani

    def animate(self, i, y):
        self.line.set_ydata(y[i])  # update the boundary for frame i
        return self.line,

    def predict(self, X):
        # Compute the dot product between the weights and input features
        dot_product = self.dot(self.weights, X) + self.bias
        # Threshold function: return 1 if the dot product is positive, otherwise return 0
        return 1 if dot_product > 0 else 0
```
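As a usage sketch (assuming `make_data` and the class above are defined in the same session), training on the synthetic clusters and producing the animated decision boundary could look like the following; the iteration count here is deliberately small to keep the animation short:

```python
X, y = make_data(100, 2)                 # 200 points in two Gaussian clusters
pcp = Perceptron(learning_rate=0.1, max_iterations=10)
ani = pcp.fit_plot(X, y)                 # returns a matplotlib animation
ani                                      # in a notebook, renders as HTML5 video
```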
Learning logical functions
The perceptron algorithm can be used to learn logical functions such as AND and OR. The XOR function, however, is not linearly separable and cannot be learned by a single perceptron. The following code trains a perceptron to learn the AND function:
```python
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([0, 0, 0, 1])

pcp = Perceptron(learning_rate=0.01, max_iterations=100)
pcp.fit(X, y)

print("Weights: ", pcp.weights)
print("Bias: ", pcp.bias)

for i in range(X.shape[0]):
    print("Predicted: ", pcp.predict(X[i]), "Actual: ", y[i])
```
This gives the following output:

```
Predicted: 0 Actual: 0
Predicted: 0 Actual: 0
Predicted: 0 Actual: 0
Predicted: 1 Actual: 1
```
The perceptron correctly learns the AND function.
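This is expected, since AND is linearly separable. For example (a hand-picked solution, not necessarily the one the algorithm finds), \(\textbf{w} = (1, 1)\) and \(b = -1.5\) satisfy

\[\textbf{w} \cdot \textbf{x} + b = x_1 + x_2 - 1.5 > 0 \iff (x_1, x_2) = (1, 1),\]

which is exactly the AND function.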
To learn the OR function, we can use the same code, but with a different set of training data:
```python
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([0, 1, 1, 1])

pcp = Perceptron(learning_rate=0.01, max_iterations=100)
pcp.fit(X, y)

for i in range(X.shape[0]):
    print("Predicted: ", pcp.predict(X[i]), "Actual: ", y[i])
```
Trying to learn the XOR function
As noted above, the XOR function is not linearly separable and cannot be learned by a single perceptron. The following code attempts to train a perceptron on the XOR function anyway:
```python
X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([0, 1, 1, 0])

pcp = Perceptron(learning_rate=0.01, max_iterations=1000)
pcp.fit(X, y)

for i in range(X.shape[0]):
    print("Predicted: ", pcp.predict(X[i]), "Actual: ", y[i])
```
This gives the following output:

```
Predicted: 1 Actual: 0
Predicted: 1 Actual: 1
Predicted: 0 Actual: 1
Predicted: 1 Actual: 0
```
The perceptron does not learn the XOR function, even after 1000 iterations; because the classes are not linearly separable, no number of additional iterations would help.