
Criteria for Learning Without Forgetting in Artificial Neural Networks

October 6, 2019

Winning the Best Paper Award at the IEEE International Conference on Cognitive Computing, Dr. Ibrahim Elfadel, Professor of Electrical Engineering and Computer Science, and his group use novel algorithms to better predict information saturation in artificial neural networks.

Artificial intelligence (AI) systems have achieved state-of-the-art performance in many machine learning tasks, but they have yet to outperform the human brain, not least because they keep forgetting previously learned information. In a paper that won the Best Paper Award at the IEEE International Conference on Cognitive Computing in Milan, Italy, in July 2019, Dr. Ibrahim Elfadel, Professor of Electrical Engineering and Computer Science and Principal Investigator with the KU Center for Cyber-Physical Systems, Dr. Rupesh Raj Karn, Postdoctoral Fellow, and Dr. Prabhakar Kudva, IBM Research Staff Member, address this important problem of forgetful systems.

Artificial neural networks (ANNs) are computing systems inspired by the biological neural networks found in animal brains. Such systems learn to perform tasks by considering examples, generally without being programmed with task-specific rules. In image recognition, for example, an ANN may learn to identify images that contain cats by being trained on example images that have been manually labelled as “cat” or “not cat”, and then using the trained network to identify cats in other images. It does this without any prior knowledge of what constitutes a cat (tails, whiskers, fur and so on), generating its own identifying characteristics from the examples it processes.
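For readers who want to see the mechanics, the toy sketch below trains a minimal “cat”/“not cat” classifier from labelled examples using plain gradient descent. The data, sizes, and learning rate are illustrative stand-ins, not anything from the paper:

```python
import numpy as np

# Minimal sketch of learning "cat" vs "not cat" from labelled examples.
# Features and labels here are random stand-ins for real image data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))       # 200 example images, 64 features each
y = rng.integers(0, 2, size=200)     # manual labels: 1 = "cat", 0 = "not cat"

w = np.zeros(64)                     # connection weights, tuned from examples
b = 0.0
lr = 0.1
for _ in range(500):                 # gradient-descent training loop
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of "cat"
    grad_w = X.T @ (p - y) / len(y)         # gradient of cross-entropy loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

# The trained network now scores unseen images with no hand-coded cat rules.
new_image = rng.normal(size=64)
print("P(cat) =", 1.0 / (1.0 + np.exp(-(new_image @ w + b))))
```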

However, when a system is built up or given new capabilities, each added task requires it to preserve inference accuracy on the original data while incrementally training on additional data. For example, a robot may be delivered to a customer’s house with a set of default object-recognition capabilities, but new site-specific object models may be needed for the robot to navigate in the presence of objects not included in the original training set.

“Traditional machine learning models have typically assumed that all the training data is available prior to the model building phase,” explained Dr. Elfadel. “Very often, this is not the case.”

ANNs, especially those with multiple interconnected layers of neurons, known as Deep Neural Networks (DNNs), have a higher capacity for progressive learning than traditional machine learning models, mainly because of the potentially large number of parameters that can be tuned to incrementally build more accurate models. “However, even progressive learning on such DNNs cannot go on forever,” said Dr. Elfadel.

Programming AI to Not Forget with Task Progressive Learning

Catastrophic forgetting is the tendency of an ANN to completely and abruptly forget previously learned information upon learning new information. This makes continual learning difficult. While an ANN is modelled on the human brain, there is a fundamental difference between the two: humans leverage prior experiences to acquire new knowledge, whereas an AI system almost always has to start from scratch. Endowing ANNs with this ability is a cognitive computing conundrum.

“You can think of it this way: in an ANN (and possibly also in the human brain) memory data is encoded diffusively in the weights of the connections between the neurons,” explained Dr. Elfadel. “It is not easy to pin down the memory cells undergoing catastrophic forgetting. Nor is it easy to pin down what is being forgotten among already stored data.”

One possible solution is task progressive learning, which transfers knowledge across a series of tasks, incorporating prior knowledge at each layer by reusing old computations when learning new ones. Progressive networks are designed to retain a pool of pre-trained models throughout training and to learn lateral connections between them that extract useful features for new tasks, much like the human brain does.
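A minimal sketch of this idea, loosely following the progressive-networks recipe of frozen columns plus learned lateral connections, might look as follows; the layer sizes and names are illustrative assumptions, not the authors’ architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

# Column 1: weights trained on an earlier task, now frozen.
W1 = rng.normal(size=(16, 8))

# Column 2: fresh weights for the new task, plus a lateral connection
# that reuses column 1's (frozen) features instead of relearning them.
W2 = rng.normal(size=(16, 8))
U12 = rng.normal(size=(16, 16))   # lateral: column-1 features -> column 2

def forward(x):
    h1 = relu(W1 @ x)              # old computation, never updated
    h2 = relu(W2 @ x + U12 @ h1)   # new task reuses old features laterally
    return h1, h2

# Only W2 and U12 would receive gradients when training the new task,
# so the knowledge stored in W1 cannot be catastrophically overwritten.
h1, h2 = forward(rng.normal(size=8))
print(h2.shape)
```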

“Task progressive learning without catastrophic forgetting using artificial neural networks has demonstrated viability and promise,” said Dr. Elfadel. “Due to the large number of ANN hyper-parameters, a model already trained over a group of tasks can further learn a new task without forgetting the previous ones.”

Several algorithms have been proposed for progressive learning, including synaptic weight consolidation, ensemble, rehearsal, and sparse coding.
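As an example of the first family, a synaptic-weight-consolidation penalty (in the style of elastic weight consolidation) keeps weights that mattered for earlier tasks close to the values they held after those tasks were learned. The sketch below is a generic illustration with made-up numbers, not the paper’s formulation:

```python
import numpy as np

def consolidation_penalty(theta, theta_old, importance, lam=1.0):
    """EWC-style synaptic consolidation: penalize moving weights that
    were important for previously learned tasks.

    theta      -- current weight vector
    theta_old  -- weights after training the old tasks
    importance -- per-weight importance (e.g. diagonal Fisher information)
    lam        -- how strongly old tasks are protected
    """
    return 0.5 * lam * np.sum(importance * (theta - theta_old) ** 2)

# Total loss on the new task = new-task loss + consolidation penalty.
theta = np.array([0.9, -0.2, 1.5])
theta_old = np.array([1.0, 0.0, 1.5])
importance = np.array([5.0, 0.1, 3.0])  # first weight matters most to old tasks
print(consolidation_penalty(theta, theta_old, importance))
```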

Detecting Information Saturation in ANNs

“One major problem with such methods, however, is that they fail to detect the congestion in the ANN shared parameter space to indicate the saturation of the existing network and its inability to add new tasks using progressive learning,” explained Dr. Elfadel. “The detection of such saturation is especially needed to avoid the catastrophic forgetting of old trained tasks and the concurrent loss in their generalization quality.”

An ANN is based on a collection of connected units called artificial neurons, with each connection able to transmit a signal to other neurons. An artificial neuron that receives a signal then processes it and signals other neurons connected to it. These connections have a “weight” that adjusts as learning proceeds. The weight increases or decreases the strength of the signal at a connection.
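In code, a single artificial neuron reduces to a weighted sum of its incoming signals passed through an activation function; the numbers below are illustrative:

```python
import numpy as np

# A single artificial neuron: a weighted sum of incoming signals
# passed through an activation function.
def neuron(inputs, weights, bias):
    z = np.dot(weights, inputs) + bias  # each weight scales one connection
    return 1.0 / (1.0 + np.exp(-z))     # sigmoid activation -> output signal

signal = neuron(inputs=np.array([0.5, -1.0, 2.0]),
                weights=np.array([0.8, 0.1, -0.4]),  # tuned during learning
                bias=0.2)
print(signal)
```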

“In all these methods, the set of ANN weights needed to accommodate the new tasks grows with their number. Intuitively, the ANN should become congested as soon as the set size is too high relative to the total number of weights. This intuitive congestion measure is clearly correlated with the onset of catastrophic forgetting.”

“In progressive learning, a tuning criterion is typically applied over the trajectory of ‘important’ ANN parameters in weight space so that their contours do not deviate much with respect to those defined by older tasks,” explained Dr. Elfadel. “The basic idea is that in an ANN, congestion should be detected when all of the parameters become ‘important’ for the training of older tasks. Once this is achieved, a barrier to progressive learning is created, and any further learning of new tasks would result in the catastrophic forgetting of older tasks.”

The paper proposes a methodology for ANN congestion detection based on computing the Hessian of the ANN loss function at the optimal weights for a group of previously learned tasks. In mathematics, the Hessian describes the local curvature of a function of many variables.
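The sketch below shows one generic way to obtain such a Hessian numerically, using central finite differences on a toy quadratic loss whose curvature is known in advance. The paper’s computation targets the loss of a trained ANN, but the object being measured is the same:

```python
import numpy as np

def hessian(loss, theta, eps=1e-5):
    """Finite-difference Hessian of a scalar loss at weights theta."""
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t = theta.copy()
            t[i] += eps; t[j] += eps; fpp = loss(t)
            t[j] -= 2 * eps;          fpm = loss(t)
            t[i] -= 2 * eps;          fmm = loss(t)
            t[j] += 2 * eps;          fmp = loss(t)
            # central-difference estimate of d2(loss)/dtheta_i dtheta_j
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * eps ** 2)
    return H

# Toy quadratic loss with known curvature: the Hessian should recover A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
loss = lambda th: 0.5 * th @ A @ th
print(hessian(loss, np.zeros(2)))   # ~A, the local curvature
```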

Predicting Saturation with Heuristic Algorithms

“Since the Hessian calculation is compute-intensive, we provide Hessian approximation heuristics that are computationally efficient,” said Dr. Elfadel. “The algorithms are implemented and analyzed in the context of two cloud network security datasets, with results showing that the proposed metrics give an accurate assessment of the ANN progressive learning capacity for these datasets. Furthermore, the results show that progressive learning capacity is very much data-dependent, with the network security datasets exhibiting higher congestion thresholds for progressive learning than the more traditional image datasets used in DNNs.”
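The paper’s specific heuristics are its own; one common inexpensive surrogate of this kind is the empirical-Fisher diagonal, which approximates the Hessian’s diagonal by averaging squared per-example gradients, as in this hypothetical sketch:

```python
import numpy as np

def fisher_diagonal(grad_fn, data, theta):
    """Cheap Hessian surrogate: average the squared per-example gradients
    (empirical Fisher diagonal) instead of forming the full n x n Hessian.

    grad_fn(x, theta) -- gradient of the per-example loss at theta
    """
    diag = np.zeros_like(theta)
    for x in data:
        g = grad_fn(x, theta)
        diag += g * g              # outer-product diagonal, O(n) per example
    return diag / len(data)

# Toy example: squared-error loss (x . theta)^2 on scalar targets of zero.
rng = np.random.default_rng(2)
data = rng.normal(size=(100, 3))
theta = np.array([0.5, -1.0, 0.3])
grad_fn = lambda x, th: 2 * (x @ th) * x   # gradient of (x . theta)^2
print(fisher_diagonal(grad_fn, data, theta))
```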

The paper describes how a snapshot of the parameters is taken after each progressive training phase, with the rank of the loss-function Hessian measured at these parameters. Closeness to full rank is an indicator of congestion risk in the ANN and of possible catastrophic forgetting.
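A minimal version of that check might look like the following, where the rank of a Hessian snapshot is compared with the total number of trainable weights; the tolerance and toy matrix are illustrative assumptions:

```python
import numpy as np

def congestion_risk(H, n_params, rel_tol=1e-6):
    """Flag congestion when the loss Hessian is close to full rank.

    H        -- Hessian snapshot taken after a progressive training phase
    n_params -- total number of trainable weights (full rank = n_params)
    """
    rank = np.linalg.matrix_rank(H, tol=rel_tol * np.linalg.norm(H, 2))
    saturation = rank / n_params    # 1.0 -> every direction is "important"
    return rank, saturation

# Toy Hessian: 3 parameters, but only 2 directions carry curvature,
# so there is still room to learn a new task without forgetting.
H = np.diag([4.0, 2.5, 0.0])
rank, saturation = congestion_risk(H, n_params=3)
print(rank, saturation)             # 2, ~0.67 -> not yet saturated
```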

“Our work provides a way to measure congestion and can be applied to several incremental learning paradigms, such as multi-task continual learning, transfer learning, and progressive learning, to measure the risk of catastrophic forgetting with the learning of newer tasks,” said Dr. Elfadel. “We are very pleased with this award and grateful to the KU Institute for Artificial Intelligence and Intelligent Systems and its Center for Cyber-Physical Systems for facilitating our participation in this IEEE conference. We hope that our techniques will become part of the automated machine learning design toolbox of data scientists and cognitive system designers.”

Jade Sterling
News and Features Writer
6 October 2019