The task in planning goal-directed movements is to find a series of motor commands such that the final sensory information matches the desired value. Here, this problem is treated as an optimization task: the function to be minimized is the square error between the anticipated and the desired goal.
In our experiments, the goal state was not the complete sensory information, as in (7.1), but only the value $g_i$ in a predefined sector $i$. Thus, the cost function is $E^2 = (o_i - g_i)^2$, where $o_i$ is the predicted output in sector $i$. First, we assume that the appropriate number of chain links is known. The free parameters are the velocities $v_L$ and $v_R$ for each time step (link in the chain).
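As an illustration, the cost function might be set up as in the following sketch; `predict_chain` is a hypothetical stand-in for the trained forward model that anticipates the sensory consequences of a velocity series, and is not part of the original implementation:

```python
import numpy as np

def goal_cost(velocities, start_image, goal_value, sector, predict_chain):
    """Square error between the anticipated and the desired sector value.

    `velocities` is a flat array [vL_1, vR_1, ..., vL_n, vR_n], one
    (vL, vR) pair per chain link; `predict_chain` stands in for the
    trained network chain that predicts one output value per sector.
    """
    v = velocities.reshape(-1, 2)          # one (vL, vR) pair per link
    o = predict_chain(start_image, v)      # anticipated sensory values
    return (o[sector] - goal_value) ** 2   # E^2 = (o_i - g_i)^2
```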
Two different optimization methods were applied: simulated annealing and Powell's method from the `Numerical Recipes' (Press et al., 1993). The first is better suited to finding a global minimum, whereas the second may get caught in a local minimum.
Simulated annealing is a stochastic method for minimum search that occasionally allows jumps to higher values of the cost function. The probability of these jumps is given by the Boltzmann distribution. The temperature parameter of this distribution is slowly reduced during the simulation, according to an annealing scheme. In the present study, a variant (Carter Jr., 1994) of `Fast Simulated Annealing' (Szu and Hartley, 1987) was used. This variant first increases the temperature until a large jump to a higher value of the cost function occurs, and then decreases the temperature. The default parameters from Carter Jr. (1994) were used, except for the learning rate, which was set to 0.1, and the number of random steps at each temperature value, which was set to 20 times the number of free parameters. Random numbers were generated using the algorithm `ran1' from the Numerical Recipes (Press et al., 1993).
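A minimal sketch of this kind of annealing is given below. It uses the Cauchy-distributed jumps and the fast cooling schedule $T_k = T_0/(1+k)$ of Szu and Hartley (1987); for brevity, Carter's adaptive up-then-down temperature scheme is replaced by a plain decreasing schedule, and all parameter values are illustrative:

```python
import numpy as np

def fast_annealing(cost, x0, t0=1.0, n_temps=100, rng=None):
    """Fast-simulated-annealing sketch: Cauchy jumps, Boltzmann acceptance."""
    if rng is None:
        rng = np.random.default_rng()
    x, e = x0.copy(), cost(x0)
    steps_per_temp = 20 * x0.size                # 20 x number of free parameters
    for k in range(n_temps):
        t = t0 / (1.0 + k)                       # fast cooling schedule
        for _ in range(steps_per_temp):
            x_new = x + t * rng.standard_cauchy(x.shape)  # Cauchy visiting step
            e_new = cost(x_new)
            # accept downhill moves always, uphill moves with Boltzmann probability
            if e_new < e or rng.random() < np.exp(-(e_new - e) / t):
                x, e = x_new, e_new
    return x, e
```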
Powell's method is based on conjugate directions, but does not require the evaluation of a gradient. Here, the parameters were taken from the Numerical Recipes (Press et al., 1993). The fractional tolerance of the cost function was set to $10^{-4}$. Both optimization methods were initialized by setting all velocities to zero.
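For illustration, a comparable call through SciPy's implementation of Powell's method (a stand-in for the Numerical Recipes routine used here) might look as follows; the dummy `cost_fn` only marks where the cost function of the earlier sketch would enter:

```python
import numpy as np
from scipy.optimize import minimize

def cost_fn(velocities):
    """Placeholder for the (penalized) square goal error."""
    return float(np.sum(velocities ** 2))        # dummy cost, illustrative only

n_links = 3                                      # assumed current chain length
x0 = np.zeros(2 * n_links)                       # initialize all velocities to zero
result = minimize(cost_fn, x0, method="Powell",
                  options={"ftol": 1e-4})        # fractional tolerance 10^-4
print(result.x, result.fun)
```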
Treating the goal-directed movement as an optimization problem allows us to add penalty terms to the square error to restrict the range of possible solutions. Velocities beyond the range ±60 mm/sec used for training were excluded by punishing them with an additional term in the cost function (+10 000 pixels squared). This term was necessary because otherwise, for goals out of reach within one interval, the optimization could yield large velocities for which no examples were available in the training set (for these velocities, the extrapolation of sensory predictions by the network may be incorrect). To avoid collisions, a penalty term (+100 pixels squared) was added for velocity series that lead to robot positions too close to an obstacle.
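A sketch of such a penalized cost function is given below; `too_close` is a hypothetical predicate that checks the anticipated robot positions against the obstacles:

```python
import numpy as np

V_MAX = 60.0            # mm/sec, range covered by the training set
V_PENALTY = 10_000.0    # pixels squared, for out-of-range velocities
C_PENALTY = 100.0       # pixels squared, for positions too close to an obstacle

def penalized_cost(velocities, base_cost, too_close):
    """Square goal error plus penalty terms restricting the solutions."""
    e = base_cost(velocities)
    if np.any(np.abs(velocities) > V_MAX):       # outside the trained range
        e += V_PENALTY
    if too_close(velocities):                    # anticipated near-collision
        e += C_PENALTY
    return e
```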
So far, we have assumed that the number of chain links is given; however, the number of time steps required to reach a goal is not known beforehand. Therefore, we start with one link and increase the number of links during the optimization process. For each number of links, we solve the optimization problem and test whether the resulting state matches the desired state (within 0.5 pixels, the resolution limit). If this criterion is not met, the number of links is increased by one and the optimization is restarted from zero velocities. This is repeated until the criterion is met.
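The resulting search loop might be sketched as follows; `optimize` stands for either of the two optimization methods, and the cap `max_links` is an illustrative safeguard not mentioned in the original procedure:

```python
import numpy as np

TOLERANCE = 0.5   # pixels, the resolution limit

def plan_movement(optimize, cost, max_links=20):
    """Grow the chain link by link until the anticipated state matches the goal."""
    for n_links in range(1, max_links + 1):
        x0 = np.zeros(2 * n_links)               # restart from zero velocities
        v, err = optimize(cost, x0)              # err is the square error
        if np.sqrt(err) < TOLERANCE:             # |o_i - g_i| below resolution
            return v.reshape(-1, 2)              # one (vL, vR) pair per link
    return None                                  # no plan found within max_links
```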
To test the goal-directed movements quantitatively, a random series of goals was chosen. A trial consisted of choosing a goal and executing the resulting movement. The goal sector was chosen at random among the ten sectors, and its value was drawn from the interval [50, 65]. Given the shape of the robot and the arrangement of the obstacles, it was physically possible to attain these values in each sector. At the beginning of each trial, an image was taken and used as the starting point of the anticipation. At the end of a trial, another image was taken for comparison with the desired goal. In each subsequent trial, the robot started from where the previous movement sequence had ended. The robot completed two blocks of 50 trials. At the beginning of each block, the robot was placed in the middle of the circle. This was done to increase the variety of movements, because toward the end of a block the robot tended to spend most of its time near the obstacles. The results are presented in section 7.3.4.