On Intelligence: Computational and Otherwise

I am crazy about brains. I want to understand how the brain works, not just from a philosophical perspective, not just in a general way, but in a detailed nuts and bolts engineering way. My desire is not only to understand what intelligence is and how the brain works, but how to build machines that work the same way. I want to build truly intelligent machines. --Jeff Hawkins in On Intelligence.

The Beginning

On these pages I hope to share some of my investigations, thoughts, and experiments into the nature of intelligence. I am not a neuroscientist by training, so many of my comments need to be taken with a grain of salt. I have a background in computer engineering, and most of my comments will deal with the problem strictly from a computational perspective.

Ever since my initial foray into the world of computers, I have been fascinated by artificial intelligence. Alan Turing, the father of modern computer science, spent some time looking into the same problem. The so-called "Turing test" attempts to define a set of rules for determining if a machine is truly intelligent.

My earliest exposure to something that I thought was truly intelligent was a chess-playing computer. A game of chess, played by two strong players, often exhibits qualities that we associate with intelligence: creativity, long-range planning, problem solving, and memory. It was quite a surprise to me that a computer could be programmed to play a good game of chess. This initial experience with a chess-playing computer eventually led me to create my own chess-playing program. Named Rajah, the program participated in the Dutch Computer Chess Championships and in the AEGON man-machine tournament. Eventually the program was renamed RajahX and became the subject of my undergraduate thesis. I managed to do some further work while I was a master's student, but as I completed my master's degree and entered the doctoral program, I gave up on computer chess completely.

Although chess programs often exhibit behaviour that appears intelligent, they have no understanding of the game. They are merely calculators that evaluate a vast number of current and future positions, and select the most promising move to play on the board. This process is quite unlike the thought process that goes through a chess master's mind during the search for the best move. A master evaluates only a handful of moves, often guided by intuition and past experience. Just as planes don't fly according to the same principles as birds, computers play chess using a process that is quite different from the one humans use. Computer chess was supposed to be the poster child of artificial intelligence, but the solution that was developed was a simpler computational approach that taught us nothing about the nature of intelligence.

In recent times, especially after reading Jeff Hawkins' On Intelligence, I have found myself revisiting the question of intelligence and whether it is possible to replicate it in a machine. It should be possible unless we believe that there is some magic in there that cannot be explained by science. The more important question is whether we know enough about the brain to actually start building intelligent machines.

Entered July 25, 2006

The Two Types of Geniuses

I have long held the belief that the geniuses amongst us perform their best work before they reach the age of 30. Some of the most influential work in engineering and the sciences was performed as part of an individual's doctoral thesis or soon thereafter, both of which usually take place before the age of 30.

However, according to the current issue of Wired (July 2006), a new theory of genius is being put forward by David Galenson, an economist at the University of Chicago. The new theory divides geniuses into two categories: conceptual innovators and experimental innovators. Conceptual innovators are closer to the traditional view of geniuses. These are individuals who produce their most influential work when they are young, and their work usually makes bold, dramatic leaps forward in their respective disciplines. In contrast, experimental innovators often spend their lifetime tinkering and experimenting within their disciplines, and produce their most influential work much later in their careers.

The article provides some concrete examples of individuals in both categories. Conceptual innovators include people such as

And experimental innovators include such luminaries as

So, what kind of genius are you?

Entered July 26, 2006

The Brain as a Memory System

It is generally known that the brain is a sophisticated memory system. What is less obvious is which features this memory system offers, and how these features arise out of a vast number of interconnected neurons (approximately 30 billion of them). Once again Jeff Hawkins' On Intelligence provides some answers. He suggests that there are four main features offered by the brain memory system:

  1. Stores a sequence of patterns.
  2. Recalls patterns auto-associatively.
  3. Stores patterns in an invariant form.
  4. Stores patterns in a hierarchy.

Without repeating Jeff Hawkins' wonderful exposition of these features in On Intelligence, I will try to provide my own rationale for these and other features that are likely to be by-products of the four main ones.

Storage and Retrieval of Pattern Sequences. The brain appears to be built to store and retrieve sequences. Often, a memory of a particular passage of time cannot be recalled without "playing through" in your mind the key events that came before it. Usually, even after the passage is recalled, the brain continues onwards and recalls the sequence of events that happened after the passage. One of the best examples of sequential storage and retrieval is the memory of music. To remember a particular section of music, it is almost impossible to simply jump into the middle without having played through the song from the beginning. An even more startling observation suggesting that the brain is built to process sequences is the presence of saccades, or rapid eye movements, in human vision. Even though we perceive the world in front of our eyes as being stable, the image that is received by the eye and projected into the brain is constantly changing. The eye makes rapid movements to gather differing views of the world in front of us, and what we end up seeing as the stable image in the mind's eye is a processed version of these views. In fact, human vision depends on these sequences of differing images in order to function properly.

Auto-Associative Recall. This refers to the brain's ability to recall a pattern given a fragment of the pattern or an erroneous version of the pattern. The process is automatic, and we often forget that the brain is actually completing a fragment of the pattern we just experienced. Here are a few examples. Given a fragment of a song, we can often remember the remaining parts of the song. We can recognize familiar individuals even if they are partially obstructed or hidden. When a person familiar to us crosses our field of view, we might associatively recall fond and not-so-fond memories of that person. When listening to someone talk, the sounds that arrive at the ear are often garbled. It is through auto-associative recall that these sounds are corrected and you can fully comprehend the person.
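
To make the idea of recalling a whole pattern from a damaged copy more concrete, here is a minimal sketch of a Hopfield network, one classic computational model of auto-associative memory. It is only an illustration: I am not claiming the brain (or Hawkins' model) works this way, and the function names, the tiny 8-unit patterns, and the sweep limit are all my own assumptions.

```python
def train(patterns):
    """Build Hebbian weights from a list of equal-length +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j]
    return weights


def recall(weights, probe, max_sweeps=10):
    """Update one unit at a time until the state stops changing."""
    state = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            total = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if total >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two hypothetical stored "memories" (chosen to be dissimilar).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train(stored)

# Corrupt the first memory in one position, then let the network repair it.
noisy = list(stored[0])
noisy[0] = -1
restored = recall(weights, noisy)
print(restored == stored[0])
```

The erroneous input settles back onto the stored pattern it most resembles, which is the behaviour described above: the cue does not have to be exact for the full memory to come back.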

Storage in Invariant Form. Patterns are stored in a way that allows them to be recalled even if there has been a transformation applied to the pattern. For example, larger, smaller, faster, or slower versions of the pattern should all map to the memory of a single representative (invariant) pattern. Many of our memories are stored in invariant form. Our recognition of familiar individuals is often irrespective of whether they are at a distance or up close, whether they are in shadow or bright sunlight, and whether they are facing us or away from us. Our recognition of familiar tunes is similar. We can recognize a tune even when it is played in a different key, at a different tempo, or with different instruments.
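
The tune example can be sketched in a few lines. One simple (assumed) way to get a key-independent representation is to store the steps between successive notes rather than the notes themselves, and to get a tempo-independent representation by storing note lengths relative to the first note. The encoding with MIDI note numbers and the function names below are my own choices for illustration, not something taken from On Intelligence.

```python
def intervals(pitches):
    """Steps between successive notes, in semitones (independent of key)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def rhythm_ratios(durations):
    """Each note's length relative to the first note (independent of tempo)."""
    return [d / durations[0] for d in durations]


# Opening of "Twinkle, Twinkle, Little Star" as MIDI note numbers.
in_c = [60, 60, 67, 67, 69, 69, 67]   # played in C
in_g = [p + 7 for p in in_c]          # the same tune, played in G

slow = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 1.0]       # note lengths in seconds
fast = [0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.5]

print(in_c == in_g)                                # False: the raw notes differ
print(intervals(in_c) == intervals(in_g))          # True: same invariant form
print(rhythm_ratios(slow) == rhythm_ratios(fast))  # True: same rhythm
```

Two performances that share no raw note values still map to the same stored form, which is the sense in which a single invariant memory can serve many different versions of a pattern.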

Hierarchical Storage. The brain processes patterns presented to it in a hierarchical fashion. Neurons much lower in the hierarchy process simple patterns. As you go further up the hierarchy more complex patterns are built out of the simpler patterns that exist lower in the hierarchy. This hierarchical processing of patterns has been demonstrated in the human visual cortex (the portion of the neocortex responsible for vision). Neurons at the bottom of the hierarchy are tuned to operate on a very small portion of the image received from the eye. These neurons often detect simple features such as lines oriented in a particular direction. In the next step up the hierarchy, neurons detect slightly more complex imagery such as contours. As you move further up the hierarchy, neurons detect shapes and objects. At the highest levels of the visual hierarchy, neurons process motion and are capable of recognizing complex objects.

Common Substructure Extraction. This particular property is likely a by-product of features 3 and 4, or features 3 and 4 are by-products of this property. I am not sure which of these two views is correct. It is highly inefficient (in both storage space and training time) to store a unique invariant form for every pattern that we wish to remember. What is more likely to happen is that the brain automatically finds common substructures in the patterns we wish to store and creates a memory of each substructure. Particular versions of the memory are then stored as differences from the common substructure. For example, rather than having a unique mechanism for recognizing every individual we know, the brain more likely creates a mechanism for recognizing the ideal (or average) human face (a pair of eyes, nose, mouth, and ears), and then the recognition of a particular individual can be formulated in terms of the ways that individual's face differs from the ideal face. Hierarchical processing of patterns also implies that substructures common to the higher layers have to exist at the lower layers. This notion of finding common substructure is similar to principles found in image recognition (eigenfaces) and image compression (fractal compression).
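
The "ideal face plus differences" idea can be sketched as follows. This is only the mean-and-difference step that methods such as eigenfaces start from, not the full technique, and the three-number "faces" (eye spacing, nose length, mouth width, in millimetres) are made-up values used purely for illustration.

```python
def average(vectors):
    """The common substructure: the element-wise mean of all the vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def encode(vector, prototype):
    """Store a particular pattern as its differences from the prototype."""
    return [v - p for v, p in zip(vector, prototype)]


def decode(delta, prototype):
    """Rebuild the original pattern from the prototype and the differences."""
    return [d + p for d, p in zip(delta, prototype)]


def recognize(probe, prototype, stored_deltas):
    """Name of the stored pattern whose differences are closest to the probe's."""
    probe_delta = encode(probe, prototype)

    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(probe_delta, stored_deltas[name]))

    return min(stored_deltas, key=distance)


# Hypothetical measurements: [eye spacing, nose length, mouth width].
faces = {
    "alice": [62, 48, 50],
    "bob": [66, 52, 54],
    "carol": [64, 50, 46],
}
ideal_face = average(list(faces.values()))
deltas = {name: encode(face, ideal_face) for name, face in faces.items()}

print(ideal_face)
print(deltas["alice"])
print(recognize([63, 49, 50], ideal_face, deltas))
```

Only one copy of the shared structure is kept; each individual costs just a small set of differences, and a slightly different view of a known face still lands closest to the right person.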

Compression. It is remarkable that human beings can store every life experience they have had as firing patterns on approximately 30 billion neurons. There have been a number of studies that try to quantify the memory capacity of the human brain. Some estimates place it at around 10^9 bits (probably correct to within an order of magnitude). If we are to store every life experience in this limited space, there has to be some form of compression employed by the brain. Notions such as invariant forms, hierarchical storage, and common substructure extraction are similar to many of the concepts developed for data compression. It is highly likely that the compressed storage of patterns is one of the features offered by the brain memory system.

Entered August 19, 2006

Hawkins' Hypothesis: Prediction = Intelligence

Jeff Hawkins' hypothesis in On Intelligence is that what we observe as human intelligence is nothing more than an elaborate form of prediction. He argues that the neocortex builds a sophisticated model of the world and uses this model to perform predictions. The more I think about this hypothesis, the more certain I am of its validity.

The first clue that the hypothesis is likely correct is due to the physical limitations of neurons. A single neuron can send out a signal every 5 ms. If the brain were purely reactive and not predictive, every task that we are capable of would have to be computed by a group of neurons. However, the chain of computations cannot be deep because each additional element on the chain adds 5 ms to the overall computation time. A single neuron does not have a lot of computational power and it is hard to envision anything useful being performed without having to traverse a long chain of neurons. For example, consider a reaction test such as this one. This test requires the user to press the mouse button when a particular stimulus is visible. The time between the stimulus and button press is the user's reaction time. These tests force the brain to operate in a purely reactive mode as the stimulus appears after a random amount of time has passed. On the test my average reaction time was 0.25 seconds. Discounting the time taken by my eye to transfer the image of the stimulus to the brain and the time taken to transfer motor commands from the brain to my hand, a chain of 50 neurons (an upper bound) might have been traversed in order to complete this trivial task. However, we routinely perform complex tasks that require the brain to operate in the millisecond regime. For example, the simple act of running requires muscle control in the millisecond regime. The solution to this paradox is that most tasks are handled by the brain through prediction rather than reaction. In the case of muscle control during running, there are several sets of predicted muscle commands on the way down while the current set is being executed.
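
The arithmetic behind the 50-neuron bound is simple enough to write down. The sketch below uses the 5 ms firing interval and the 0.25 second reaction time quoted above; the 10 ms budget for a fast motor task is an assumed figure, chosen only to show how short the chain becomes in the millisecond regime.

```python
SPIKE_INTERVAL_MS = 5    # one signal per neuron every 5 ms (figure from the text)
REACTION_TIME_MS = 250   # my measured reaction time of 0.25 seconds


def max_chain_length(time_budget_ms, step_ms=SPIKE_INTERVAL_MS):
    """Longest serial chain of neurons that can fire within the time budget."""
    return time_budget_ms // step_ms


# Upper bound for the reaction test, ignoring eye-to-brain and brain-to-hand delays.
print(max_chain_length(REACTION_TIME_MS))   # 50

# Assumed 10 ms budget for a fast muscle adjustment while running.
print(max_chain_length(10))                 # 2
```

A chain of two neurons cannot compute much of anything, which is why a purely reactive account struggles to explain fast, skilled movement.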

Prediction is involved in almost every task that isn't completely novel to us. For example, consider the task of catching a baseball. If you have had enough practice catching one, you instinctively move to the expected landing area of the ball that is in flight. Your brain has a stored model of how balls behave in flight, and makes a prediction of where the ball is expected to land. In order to appreciate the complexity of the model required to make this prediction, remember that the flight of the ball is governed by Newton's laws of motion and you are essentially solving something equivalent in order to compute the landing area. Even something quite trivial, such as climbing a familiar set of stairs, involves a lot of prediction. The entire climbing process is predictive: the number of steps, the expected height difference between steps, and the width of each step are all being computed ahead of time. If any of these parameters were changed, you would likely fall or stumble when the prediction fails to match reality.
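
For comparison, here is what the explicit calculation looks like when the flight of the ball is worked out from Newton's laws. It is a minimal sketch that assumes no air resistance or spin, and the launch speed, angle, and release height are hypothetical numbers. The brain presumably does nothing like this symbolically; the point is only to show how much a learned model has to stand in for.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def landing_point(speed, angle_deg, height=0.0):
    """Horizontal distance (m) and flight time (s) of a ball, ignoring drag.

    speed is in m/s, angle_deg is the launch angle above the horizontal,
    and height is the release height above the ground in metres.
    """
    angle = math.radians(angle_deg)
    vx = speed * math.cos(angle)
    vy = speed * math.sin(angle)
    # Positive root of: height + vy * t - 0.5 * G * t**2 = 0
    flight_time = (vy + math.sqrt(vy * vy + 2 * G * height)) / G
    return vx * flight_time, flight_time


# Hypothetical throw: 20 m/s at 45 degrees, released 1.8 m above the ground.
distance, flight_time = landing_point(20.0, 45.0, height=1.8)
print(f"lands {distance:.1f} m away after {flight_time:.2f} s")
```

A fielder has only a fraction of that flight time to decide where to run, so the landing area has to be predicted from the first part of the trajectory rather than observed.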

Here are some additional clues that suggest a strong relationship between prediction and intelligence.

Entered August 20, 2006

Prediction and the Features of the Brain Memory System

In the preceding entry, I suggested that prediction is the mechanism through which the brain carries out a lot of its activities. In an earlier entry, I also suggested that the brain is a memory system which offers four main features:

  1. Storage and retrieval of pattern sequences.
  2. Auto-associative recall.
  3. Storage in an invariant form.
  4. Hierarchical storage.

One can view prediction as being a direct result of the first three features. That is, prediction is the auto-associative recall of pattern sequences stored in an invariant form. For example, given some initial information about an event or activity (partial pattern), we retrieve (auto-associative recall) the remaining pieces (sequence memory) from a memory of a similar event or activity in the past (invariant memory).
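
A toy version of this view of prediction can be sketched with melodies. Sequences are stored in an invariant form (steps between notes), a partial cue is matched against the stored sequences, and the rest of the matching sequence is replayed as the prediction. The class and method names, the exact-match rule, and the two example tunes are assumptions made for the sake of a short example.

```python
def intervals(pitches):
    """Invariant form of a melody: steps between successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


class SequenceMemory:
    """Stores named note sequences and predicts how a heard fragment continues."""

    def __init__(self):
        self.sequences = {}

    def store(self, name, pitches):
        self.sequences[name] = intervals(pitches)

    def predict(self, heard):
        """Return (name, predicted next notes), or None if nothing matches."""
        cue = intervals(heard)
        for name, stored in self.sequences.items():
            for start in range(len(stored) - len(cue) + 1):
                if stored[start:start + len(cue)] == cue:
                    # Replay the rest of the sequence from the last heard note.
                    pitch = heard[-1]
                    continuation = []
                    for step in stored[start + len(cue):]:
                        pitch += step
                        continuation.append(pitch)
                    return name, continuation
        return None


memory = SequenceMemory()
memory.store("twinkle", [60, 60, 67, 67, 69, 69, 67])       # learned in C
memory.store("ode to joy", [64, 64, 65, 67, 67, 65, 64, 62])

# Three notes heard in a different key (G) are enough to predict the rest.
print(memory.predict([67, 67, 74]))
```

The cue was never stored in that key, yet the memory both identifies the tune and predicts the notes that should follow, which is the combination of partial pattern, sequence memory, and invariant storage described above.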

Entered September 3, 2006

Prediction and Compression

Earlier, I established a link between the features of the brain memory system and prediction. In this entry, I describe the well-known link between prediction and compression, thereby adding more support to my earlier claim that compression might be another feature offered by the brain.

The problem of data compression can be divided into two subproblems: the problem of prediction and the problem of encoding. Given a stream of symbols to be compressed, the prediction subproblem determines the probability with which a particular symbol appears next in the input stream and the encoding subproblem is given the task of creating a code to represent the next symbol. Symbols that appear with high probability are given smaller codes and symbols that appear with low probability are given larger codes. The encoding subproblem is considered to be largely solved by methods such as Huffman coding and arithmetic coding. Both of these methods have been shown to generate codes that are close to optimal. The prediction subproblem, on the other hand, is far from being solved. In general, prediction seeks to build a model of the process that was used to generate the data being compressed, and then uses this model to determine the probability with which certain symbols follow other symbols. For example, prediction by partial matching is a compression algorithm that uses the context defined by the preceding N symbols to predict the probability of the next symbol. However, this is only an approximation as it sets a limit on the size of the context used in the prediction of the next symbol. Prediction is still an open problem in the research community and yet the brain does this very thing on a regular basis.
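
The split between prediction and encoding can be illustrated with a small adaptive context model. The sketch below predicts each symbol from the preceding N symbols and adds up the ideal code length, -log2(p), that an entropy coder such as an arithmetic coder would approach. It is not an implementation of prediction by partial matching: it uses a single fixed context length and simple add-one smoothing in place of PPM's escape mechanism, and the function name and test string are my own.

```python
import math
from collections import defaultdict


def ideal_code_length(text, order):
    """Total bits an ideal coder would need under an adaptive order-N model."""
    alphabet_size = len(set(text))
    counts = defaultdict(lambda: defaultdict(int))
    bits = 0.0
    for i, symbol in enumerate(text):
        context = text[max(0, i - order):i]   # the preceding N symbols
        seen = counts[context]
        total = sum(seen.values())
        # Add-one smoothing keeps unseen symbols from getting probability zero.
        probability = (seen[symbol] + 1) / (total + alphabet_size)
        bits += -math.log2(probability)
        seen[symbol] += 1                     # learn only after predicting
    return bits


text = "ab" * 10
uniform_bits = len(text) * math.log2(len(set(text)))   # no prediction at all

print(f"no model:      {uniform_bits:.1f} bits")
print(f"order-0 model: {ideal_code_length(text, 0):.1f} bits")
print(f"order-1 model: {ideal_code_length(text, 1):.1f} bits")
```

With one symbol of context the model quickly learns that "b" follows "a" and the cost per symbol drops well below one bit. The encoder never changed; all of the savings came from better prediction.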

Entered September 4, 2006


Updated September 4, 2006