Training in Essential AI (Part 3): Delving into Artificial Neural Networks
Artificial Neural Networks (ANNs) have a rich history dating back to the first half of the 20th century. The foundation for ANNs was laid in 1943, when Warren McCulloch and Walter Pitts introduced the concept of an artificial neuron.
Fast-forward to 1957, when Frank Rosenblatt presented the Perceptron, one of the earliest types of ANNs designed to learn from data. However, the initial enthusiasm for the Perceptron waned after Marvin Minsky and Seymour Papert proved mathematically in 1969 that a single-layer Perceptron cannot solve non-linearly separable problems, such as the XOR problem. This limitation led to a decline in research and funding. The XOR problem was eventually overcome in the 1980s by multilayer networks trained with backpropagation, paving the way for further developments in neural network research.
The modern era of deep learning began in the mid-2000s, with key contributions from Geoffrey Hinton, Yann LeCun, and Yoshua Bengio. Their work revitalized neural networks, enabling them to learn complex patterns through multiple layers, leading to breakthroughs in image recognition, natural language processing, and other domains.
Logistic regression can be viewed as the simplest kind of neural network: a single neuron with a sigmoid (logistic) activation function. In essence, the simplest ANN, the Perceptron, computes a linear combination of its inputs followed by a non-linear activation; if that activation is chosen to be the logistic sigmoid, the model essentially implements logistic regression. Logistic regression thus serves as a building block and a foundational model for understanding and extending to more complex multilayer ANNs.
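To make this correspondence concrete, here is a minimal sketch in Python with numpy (the inputs, weights, and bias values are made up for illustration): a single neuron computes a weighted sum of its inputs plus a bias and passes it through the sigmoid, which is exactly a logistic regression prediction.

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) activation: squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Made-up inputs, weights, and bias for a single artificial neuron
x = np.array([0.5, -1.2, 3.0])   # inputs Xj
w = np.array([0.8, 0.1, -0.4])   # weights Wj
b = 0.2                          # bias term b

z = np.dot(w, x) + b             # linear combination of the inputs
prediction = sigmoid(z)          # predicted probability, as in logistic regression
print(prediction)
```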
An ANN consists of a collection of neurons connected in layers; when the neurons in the input layer are fed with data, the signal propagates through the network to produce an output. Each artificial neuron takes a number of inputs Xj, multiplies them by weights Wj, adds a bias term b, and produces a single value Z (the weighted sum of its inputs plus the bias). The network is trained using Gradient Descent, an optimization algorithm that minimizes the error between the predicted class probability and the true class label.
The value Z is mapped onto a bounded range by applying an activation function, such as the Sigmoid function, which squashes it into the interval (0, 1). The optimization problem in ANNs is to minimize an objective function over the set of network parameters so that the desired output is obtained. Learning takes place layer by layer: the weights W at each layer are learned such that, when plugged into the ANN, the output resembles the true class label.
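The sketch below is a toy example of this training loop for a single sigmoid neuron (the dataset, learning rate, and iteration count are made up): Gradient Descent repeatedly nudges the weights and bias against the gradient of the binary cross-entropy loss, pulling the predicted probabilities toward the true labels.

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) activation: maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Toy, linearly separable dataset: 4 samples with 2 features each (made-up values)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])   # label is 1 only when both features are 1

w = np.zeros(2)   # weights, initialised to zero
b = 0.0           # bias
lr = 0.5          # learning rate (arbitrary choice)

for _ in range(2000):
    z = X @ w + b                 # linear combination Z for every sample
    a = sigmoid(z)                # predicted class probabilities
    # Gradients of the binary cross-entropy loss with respect to w and b
    dw = X.T @ (a - y) / len(y)
    db = (a - y).mean()
    # Gradient Descent step: move the parameters against the gradient
    w -= lr * dw
    b -= lr * db

print(np.round(sigmoid(X @ w + b), 2))   # probabilities move toward the labels
```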
For instance, a simple ANN, as constructed for a lesson using Python and numpy, comprises three layers: an input layer, a hidden layer with three neurons, and an output layer with one neuron. The network is trained on a non-linearly separable dataset to obtain optimized weights; a minimal sketch of such a network follows below. In general, the number of neurons in the output layer depends on the number of classes in the dataset.
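The lesson's own code is not reproduced here; the following is an independent, minimal sketch of the same idea, assuming a 2-input / 3-hidden-neuron / 1-output architecture with sigmoid activations, trained by gradient descent on the XOR points (the class name TinyNet and all hyperparameters are illustrative).

```python
import numpy as np

def sigmoid(z):
    # Logistic activation, used in every layer of this sketch
    return 1.0 / (1.0 + np.exp(-z))

class TinyNet:
    """Minimal fully connected network with sigmoid activations,
    trained by plain gradient descent (illustrative sketch only)."""

    def __init__(self, layer_sizes, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix and one bias vector per pair of consecutive layers
        self.W = [rng.normal(size=(m, n))
                  for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
        self.b = [np.zeros((1, n)) for n in layer_sizes[1:]]

    def forward(self, X):
        # Keep every layer's activations for use in backpropagation
        activations = [X]
        for W, b in zip(self.W, self.b):
            activations.append(sigmoid(activations[-1] @ W + b))
        return activations

    def train(self, X, y, lr=1.0, epochs=10000):
        for _ in range(epochs):
            A = self.forward(X)
            # Output-layer error term for a sigmoid output with cross-entropy loss
            delta = A[-1] - y
            # Walk backwards through the layers (backpropagation)
            for i in reversed(range(len(self.W))):
                dW = A[i].T @ delta / len(X)
                db = delta.mean(axis=0, keepdims=True)
                if i > 0:
                    # Propagate the error through the sigmoid derivative
                    delta = (delta @ self.W[i].T) * A[i] * (1 - A[i])
                # Gradient descent update for this layer's parameters
                self.W[i] -= lr * dW
                self.b[i] -= lr * db

    def predict(self, X):
        return self.forward(X)[-1]

# XOR: the classic non-linearly separable dataset with two classes (0/1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

net = TinyNet([2, 3, 1])   # input layer, hidden layer of 3 neurons, output neuron
net.train(X, y)
print(np.round(net.predict(X).ravel(), 2))   # should move toward [0, 1, 1, 0]
```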
A larger network with multiple hidden layers (two hidden layers with 5 neurons each) is created for non-linear classification. This network can classify a set of data points belonging to two classes (0/1). The code for the neural network construction can be found at this GitHub repository.
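With the illustrative TinyNet class and XOR data from the previous sketch (not the repository code), such an architecture would be configured along these lines:

```python
# Two hidden layers with 5 neurons each, for two-class (0/1) data
deeper_net = TinyNet([2, 5, 5, 1])
deeper_net.train(X, y, lr=1.0, epochs=20000)
print(np.round(deeper_net.predict(X).ravel(), 2))   # predicted probabilities for class 1
```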
In conclusion, the history of ANNs demonstrates the evolution from simple logistic regression models to complex deep learning networks. The lineage establishes logistic regression as both a predecessor and a simple building block within the broader framework of artificial neural networks.
Data and cloud computing technologies have been instrumental in the advancement of Artificial Neural Networks (ANNs) by providing the computational resources needed to train and deploy complex models. The cloud's scalable infrastructure enables researchers to train large-scale deep learning networks that would be computationally prohibitive on local machines.
Artificial Intelligence (AI) algorithms, including ANNs, continue to evolve with advances in technology and ever-larger datasets. The interplay between AI, technology, and data and cloud computing is driving significant improvements in image recognition, natural language processing, and other domains, revolutionizing various industries.