Introduction

The perceptron algorithm is a simple algorithm for supervised learning of binary classifiers. A binary classifier is a function that decides whether an input, represented by a vector of numbers, belongs to a specific class or not. The perceptron learns such a classifier from a set of training data, and is a special case of the more general family of linear classifiers.

This classifier is a threshold function: a function that maps its input \(\textbf{x}\) (the vector) to a single binary output value \(f(\textbf{x})\):

\[f(\textbf{x}) = \begin{cases} 1 & \text{if } \textbf{w} \cdot \textbf{x} + b > 0, \\ 0 & \text{otherwise} \\ \end{cases}\]

In practice, the perceptron computes the dot product between the input vector and a vector of weights, then adds a bias term. If the result is positive, the perceptron outputs 1; otherwise it outputs 0. This is equivalent to taking a linear combination of the input features, weighted by the weight vector, plus the bias. The weights and the bias are the parameters of the perceptron, and they are learned by the perceptron algorithm: the goal is to find weights and a bias that make the perceptron output the correct value for every training example. They are found by starting from a random (or zero) weight vector and adjusting it whenever the perceptron outputs the wrong value, moving the weights in the direction of the correct output. This is the perceptron learning rule, which can be formalized as follows:

\[\textbf{w} \leftarrow \textbf{w} + \eta \left( y - \hat{y} \right) \textbf{x}\]

where \(\eta\) is the learning rate, \(y\) is the correct output, \(\hat{y}\) is the perceptron’s output, and \(\textbf{x}\) is the input vector. The bias is updated in the same way: \(b \leftarrow b + \eta \left( y - \hat{y} \right)\).
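
To make the rule concrete, here is a single update step with made-up numbers (the values are purely illustrative):

import numpy as np

eta = 0.1                    # learning rate
w = np.array([0.0, 0.0])     # current weights
b = 0.0                      # current bias
x = np.array([1.0, 2.0])     # input vector
y = 1                        # correct output

y_hat = 1 if np.dot(w, x) + b > 0 else 0   # perceptron output, here 0
w = w + eta * (y - y_hat) * x              # w becomes [0.1, 0.2]
b = b + eta * (y - y_hat)                  # b becomes 0.1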

The activation function maps the weighted sum \(\textbf{w} \cdot \textbf{x} + b\) to a binary value. In the perceptron algorithm, the activation function is the Heaviside step function, which is defined as follows:

\[\sigma(a) = \begin{cases} 1 & \text{if } a > 0, \\ 0 & \text{otherwise} \\ \end{cases}\]

In the implementation, however, what the training loop actually needs is not the activation itself but the error term from the learning rule: the correct output minus the perceptron’s output, which takes the value 1, -1, or 0. This is what the sigma helper in the code below computes:

\[\sigma(y, \hat{y}) = y - \hat{y}\]

Python implementation

Note that this implementation explicitly computes the dot product between the input vector and the weights in a Python loop, and then adds the bias term. This is done for educational purposes; in practice the dot product is computed with np.dot, and the bias term is simply added on top.
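
For example, the explicit loop and the vectorized version give the same result (a quick illustrative check with made-up numbers):

import numpy as np

w = np.array([0.2, -0.5])
x = np.array([1.0, 3.0])
b = 0.1

manual = sum(i * j for i, j in zip(w, x)) + b   # explicit dot product plus bias
vectorized = np.dot(w, x) + b                   # the NumPy equivalent
print(np.isclose(manual, vectorized))           # True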

Animation of the decision boundary learned by the perceptron algorithm

This plot is generated by the fit_plot function, a modified version of the fit function that plots the decision boundary as training progresses. It only handles 2-dimensional data and is not meant to be used in practice.
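
For reference, producing the animation might look like this (a minimal sketch, assuming the X and y generated below and a notebook environment that renders the returned animation):

pcp = Perceptron(learning_rate=0.01, max_iterations=100)
ani = pcp.fit_plot(X, y)   # returns a matplotlib FuncAnimation
ani                        # in a notebook, displays as an HTML5 video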

To generate some artificial data and train the perceptron, we can use the following code:

def make_data(n, dims=1):
    # Two Gaussian clouds: class 0 centered at the origin, class 1 shifted by 3.5
    a = np.random.normal(size=(n, dims))
    b = np.random.normal(size=(n, dims))
    b += 3.5

    # Stack the two clouds; the first n points are labelled 0, the rest 1
    X = np.vstack((a, b))
    y = np.empty((n*2,))
    y[:n] = 0
    y[n:] = 1

    return X, y

# Plot the data
X, y = make_data(100, 2)
plt.scatter(X[:100, 0], X[:100, 1])
plt.scatter(X[100:, 0], X[100:, 1])
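
With the data in place, training and evaluating the perceptron could look like this (a small sketch using the Perceptron class defined below):

pcp = Perceptron(learning_rate=0.01, max_iterations=100)
pcp.fit(X, y)

predictions = np.array([pcp.predict(x) for x in X])
print("Accuracy:", (predictions == y).mean())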

Here’s the full code for the perceptron algorithm:


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from functools import partial
from matplotlib import rc

rc('animation', html='html5')

class Perceptron:
    def __init__(self, learning_rate=0.1, max_iterations=1000):
        self.learning_rate = learning_rate
        self.max_iterations = max_iterations

    def dot(self, a, b):
        """
        Dot product
        """
        y = 0
        for i,j in zip(a,b):
            y += i*j
        return y

    def sigma(self, y, y_hat):
        """
        Activation function, in practice the function has this output:
        if y == 1 and y_hat == 0:
            return 1
        if y == 0 and y_hat == 1:
            return -1
        return 0        
        """
        return y - y_hat

    
    def fit(self, X, y):
        # Initialize the weights randomly and the bias to zero
        self.weights = np.random.uniform(size=X.shape[1])
        self.bias = 0
        
        # Train the perceptron for a maximum number of iterations
        for _ in range(self.max_iterations):
            # Loop through each training example
            for i in range(X.shape[0]):
                # Compute the predicted class for the current example
                predicted = self.predict(X[i])
                
                # Update the weights and bias
                self.weights += self.learning_rate * self.sigma(y[i], predicted) * X[i]
                self.bias += self.learning_rate * self.sigma(y[i], predicted)
    
    def fit_plot(self, X, y):
        """ Plots the decision boundary as the training occurs 
        Only works with 2-dimensional data        
        """
        if X.shape[1] != 2:
            raise Exception("Only works with 2-D data")
        
        # plot data
        fig, ax = plt.subplots()
        ax.scatter(X[:, 0], X[:, 1], c=y)
        
        # Initialize weights and bias
        self.weights = np.random.uniform(size=X.shape[1])
        self.bias = 0
        
        # collect data for animation
        y_vals = []
        x_vals = np.array([np.min(X[:, 0]), np.max(X[:, 0])])
        
        # Training loop
        for _ in range(self.max_iterations):
            
            # Loop through each training example
            for i in range(X.shape[0]):
                
                # Compute the predicted class for the current example
                predicted = self.predict(X[i])
                
                # Update the weights and bias
                self.weights += self.learning_rate * self.sigma(y[i], predicted) * X[i]
                self.bias += self.learning_rate * self.sigma(y[i], predicted)   
                
                # Define the decision boundary as a line in the form y = mx + b
                m = -self.weights[0] / self.weights[1]
                b = -self.bias / self.weights[1]

                # data to animate decision boundary plot
                y_vals.append(m * x_vals + b)

                
        # plot decision boundary
        self.line, = ax.plot(x_vals, y_vals[0], '--', alpha=0.8)

        ani = animation.FuncAnimation(fig, self.animate, fargs=[y_vals], frames=len(y_vals), interval=20, blit=True)
        return ani

    def animate(self, i, y):
        self.line.set_ydata(y[i])  # update the data.
        return self.line,

    def predict(self, X):
        # Compute the dot product between the weights and input features
        dot_product = self.dot(self.weights, X) + self.bias
        
        # Threshold function: return 1 if the dot product is positive, otherwise return 0
        return 1 if dot_product > 0 else 0

Learning logical functions

The perceptron algorithm can be used to learn logical functions, such as the AND, OR and XOR functions. The XOR function is not linearly separable and cannot be learned by a simple perceptron algorithm. The following code trains a perceptron to learn the AND function:

X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([0, 0, 0, 1])

pcp = Perceptron(learning_rate=0.01, max_iterations=100)
pcp.fit(X, y)

print("Weights: ", pcp.weights)
print("Bias: ", pcp.bias)

for i in range(X.shape[0]):
    print("Predicted: ", pcp.predict(X[i]), "Actual: ", y[i])

The prediction loop gives the following output (the printed weights and bias vary from run to run because of the random initialization):

Predicted:  0 Actual:  0
Predicted:  0 Actual:  0
Predicted:  0 Actual:  0
Predicted:  1 Actual:  1

The perceptron correctly learns the AND function.
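
A quick programmatic check of the same thing (the learned weights differ between runs, but the predictions should not):

assert all(pcp.predict(X[i]) == y[i] for i in range(X.shape[0]))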

To learn the OR function, we can use the same code, but with a different set of training data:

X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([0, 1, 1, 1])

pcp = Perceptron(learning_rate=0.01, max_iterations=100)
pcp.fit(X, y)

for i in range(X.shape[0]):
    print("Predicted: ", pcp.predict(X[i]), "Actual: ", y[i])

Trying to learn the XOR function

As noted above, the XOR function is not linearly separable, so a simple perceptron cannot learn it. The following code attempts to train a perceptron on the XOR function anyway:

X = np.array([
    [0, 0],
    [0, 1],
    [1, 0],
    [1, 1]
])
y = np.array([0, 1, 1, 0])

pcp = Perceptron(learning_rate=0.01, max_iterations=1000)
pcp.fit(X, y)

for i in range(X.shape[0]):
    print("Predicted: ", pcp.predict(X[i]), "Actual: ", y[i])

Gives the following output:

Predicted:  1 Actual:  0
Predicted:  1 Actual:  1
Predicted:  0 Actual:  1
Predicted:  1 Actual:  0

The perceptron does not learn the XOR function, even after 1000 iterations.
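
A quick way to see why this must fail: a single perceptron would need weights and a bias that satisfy all four of the following conditions at once,

\[\begin{aligned} (0, 0) &: & b &\leq 0, \\ (0, 1) &: & w_2 + b &> 0, \\ (1, 0) &: & w_1 + b &> 0, \\ (1, 1) &: & w_1 + w_2 + b &\leq 0. \end{aligned}\]

Adding the second and third conditions gives \(w_1 + w_2 + 2b > 0\), while adding the first and fourth gives \(w_1 + w_2 + 2b \leq 0\), a contradiction. No choice of weights and bias satisfies all four, so the perceptron keeps making updates without ever converging.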