Monday, May 10, 2010

The Art of Solving Sudokus - Part II

ANN - Artifical Neural Network
An artificial neural network (ANN) is the analog of the biological neural networks as we encounter in the “real world”. Such a biological neural network consists of neurons and synapses which connect neurons to each other. Each neuron can fire signals to the other neurons to which it is connected. Depending on the width of the synapse between the firing neuron and the neuron which is fired at, the fired signal can be weakened or strengthened. A neuron will only fire a signal when it is triggered to do so by his connecting neurons, and when the accumulated signal of those connecting neurons is strong enough.

An artificial neural network is defined in a similar way: it consists of nodes, as the analogue of neurons, and edges between these nodes, as the analogue of synapses. To each edge a weight is attached, corresponding to the width of the synapse. Starting from an initial state of the ANN we can iteratively determine values for each node by taking the weighed average of the values of all nodes connected to this node. This weighed average for such a node corresponds to the accumulated signal of all connecting nodes. The average is weighed using the values of the connecting nodes and the weights of the corresponding edges. The weighed average determines whether the node is about to fire or not. If this weighed average is more than a certain threshold value the value of the node is set to 1, that is the firing mode, if the weighed average is less than the threshold value it is set to 0, indicating the non-firing mode. The corresponding function to determine the firing mode, is called the discrete transfer function. The iteration step is called the local updating rule.

Example: Assuming a network of $N$ neurons $s_i$, having values in $\{0,1\}$, where the interaction between the neurons is represented by the weights $w_{ij}$, then the network can be updated by applying the rules:\[v_i=\sum_{j=1,\neq i}^{N} w_{ij}\cdot s_j \mbox{, and } s_i=sgn(v_i)\]
In real life biological neural networks behave “rational”, at least that is how we would like to consider ourselves, suggesting some stable behavior of such networks. This stable behavior can be expressed by an equilibrium state of the network. Common approach to determine such stable states is to consider them as states of the network where some so-called energy function is minimal. For each ANN we can also define an energy function which gives a value for each possible state of the ANN. This energy function is based on the weights of the synapses and the values of the neurons which are connected by these synapses.

Example: For the network of the previous example one could define the energy function by:\[E=\frac{1}{2}\sum_{i=1}^{N} \sum_{j=1,\neq i}^{N} w_{ij}\cdot s_i \cdot s_j\]
If certain conditions are satisfied, when applying the local update rule iteratively, the ANN will converge to an equilibrium state, in which the energy function will obtain a global minimum.

Since mankind can solve any of its problems (at least that is how we see ourselves), and the underlying scheme for it is provided by its biological neural networks, it makes sense to try to see whether ANN’s can also be used to solve certain types of optimization problems. Since solutions for optimization problems are the minimal values of some target function, say energy function, the underlying idea is to "program" the ANN in such a way that its energy function corresponds to the optimization problem we want to solve.

Using the local updating rule we can try to determine iteratively the optimal solution. Important restrictions are that not all optimization problems can be programmed in such a way, and that (as with all heuristics) the local updating rule will not always reach the global optimum, but can end up in a local optimum.

There are different types of ANN’s, depending on the choices we make in our modeling. These types are:
  1. Discrete ANN’s
  2. Fuzzy ANN’s
  3. Quantum ANN’s
The way that ANN's were defined changed during time, based on the experiences with them and due to the quality of the proposed method. The first ANN's were based on binary values for the nodes, using a discete step function as threshold function, and evolved to quantum like ANN's where discrete distributions are assigned as values to the nodes, and a temperature based Boltzmann function is used as threshold function.
These different types will be described in more details below.

Discrete ANN's
In a discrete ANN the values of the neurons are restricted to 0 and 1 (sometimes the values -1 and 1 are used). In this discrete case the local updating rule will be based on the discrete transfer function, and will only "jump" from one discrete state to another discrete state. Sometimes those jumps from one discrete state to another state will be "too big", implying that the local update rule will not be able to leave this state anymore. This results in a state that corresponds to only a local minimum instead of the global minimum we were looking for.

Example: In the discrete case the value 0 corresponds to the distribution (1, 0). The first entry in this vector is the probability that the value 0 occurs, the second entry the probability that the value 1 occurs. The value 1 corresponds to the distribution (0, 1).

Fuzzy ANN's
A first adjustment is to choose a different form for the transfer function. We can choose a "smooth" function instead of the discrete transfer function. This smooth function still depends on the input value, the weighed average of all connected neurons, but can now have all possible values between 0 and 1, instead of only 0 or 1. This type of ANN is called a fuzzy (or continuous) ANN.
It makes sense to consider such fuzzy ANN’s: since the values of the neurons will have values between 0 and 1, each value can be interpreted as the probability that the neuron will fire a signal. The parameter that is used for "smoothing" the transfer function is called the temperature.

Example: In the fuzzy case each neuron has a distribution (p, 1-p), where p is a value between 0 and 1.

A disadvantage of assigning values between 0 and 1 to the neurons, is that the local updating rule will not force the final state to be a discrete state, leaving the ANN in a fuzzy state. We can take care of this by adding an extra term to the energy function, which prefers discrete states, but in doing so we have to carefully choose the appropriate weight factor for this extra term, which is sometime a harder job than the original optimization problem itself.

Quantum ANN's
A way to overcome this problem and which results automatically in a feasible optimum state, consists of a "quantum" approach. Instead of assigning just a single value we assign a discrete distribution to a neuron. That is we assign to a neuron a number of possible values and for each value a probability that this values occurs.

Example: More generally we could assign K possible values to a neuron, with appropriate probabilities. This is called a K-state Potts spin. The natural representation for such a K-state Potts spin is a vector of K entries, where each entry has the value 0 or 1 (discrete variant) or between 0 and 1 (fuzzy variant). The vector should be a distribution, that is the sum of all entries should be 1.

The local updating rule can be reformulated in such a way that the resulting value is again a K-state Potts spin: when we apply the rule to a certain neuron we take the sum of all K-state Potts spins of all neurons which are connected to the specified neuron. We normalize the resulting K-vector in such a way that the sum of all entries becomes 1. There is an 'obvious' choice for this normalization.