
8 Exploration, Generalization, Models and Animats

Teletransportation is the heuristic we used to validate our hypothesis concerning generalization in XCS. From this perspective, teletransportation should be considered a theoretical tool used in our experiments to support our hypothesis. Unfortunately, teletransportation cannot be applied to general problems, such as physical autonomous agents, because it would require the presence of a trainer that, every M_es steps, picks up the agent and takes it to another area of the environment. We can, however, develop a technique from the teletransportation idea, feasible for general problems, through which a wider exploration of the environment can be guaranteed.
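As a concrete illustration, the exploration loop with teletransportation can be sketched in a few lines of Python. The sketch below is ours, not the original implementation: env, xcs, and the method names (random_blank_cell, step, explore_action, reinforce) are assumed interfaces introduced here for illustration only.

def teletransportation_exploration(env, xcs, M_ES, total_steps):
    state = env.random_blank_cell()          # start from a random blank cell
    steps_in_area = 0
    for _ in range(total_steps):
        action = xcs.explore_action(state)   # random or biased exploration
        state, reward, at_food = env.step(state, action)
        xcs.reinforce(reward, state)
        steps_in_area += 1
        if at_food or steps_in_area >= M_ES:
            # food reached, or M_ES steps elapsed without food: the "trainer"
            # relocates the animat to another randomly chosen blank cell
            state = env.random_blank_cell()
            steps_in_area = 0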

8.1 Related Work

As we pointed out previously, XCS usually learns a global policy, but it may tend to evolve local policies in environments where the agent cannot visit all areas with the same frequency. This problem is not novel in the area of reinforcement learning. Many reinforcement learning algorithms, in order to converge to the optimum, require that the environment be visited uniformly. For example, when neural networks are employed, all areas of the environment have to be explored with the same frequency; otherwise the neural network may overfit locally.

Solutions to this kind of problem for reinforcement learning algorithms have already been proposed. Sutton (1990) introduced the Dyna architecture, which integrates the learning algorithm with a model of the environment built up from experience. The model is then employed to simulate exploration in other areas of the environment or for planning. Another solution was proposed by Lin (1993), who introduced the idea of experience replay: trajectories to goal states experienced in the past are memorized and subsequently replayed in order to avoid local overfitting.

8.2 Dyna Architecture for XCS

Teletransportation may be implemented for real problems by integrating XCS with a model of the environment built during exploration. The model can then be employed, as in Sutton (1990), to simulate exploration in other areas of the environment while the agent explores one specific environmental niche. The model may also be used for planning. The simplest way to develop a model of the environment in a discrete state/action space, such as grid-worlds, is to memorize past experience as quadruples of the form (s, a, s', r), where: s is the current sensory input; a is the action the agent selected when it perceived s; s' is the sensory input returned after the agent, perceiving s, performed a; finally, r is the immediate reward the agent received for performing a in s. This type of model, similar to Riolo's (1991) work on latent learning, is easily integrated into XCS. The overall system, which we call Dyna-XCS, works as follows.
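A tabular model of this kind can be sketched in a few lines of Python. The class below is only an illustration of the quadruple-based model described above; the class and method names (EnvironmentModel, update, known_states, simulate) are ours and do not come from the original system.

class EnvironmentModel:
    """Tabular model storing experienced quadruples (s, a, s', r)."""

    def __init__(self):
        self.transitions = {}                  # maps (s, a) -> (s', r)

    def update(self, s, a, s_next, r):
        # record one experienced transition
        self.transitions[(s, a)] = (s_next, r)

    def known_states(self):
        # states appearing as the first element of some quadruple;
        # used to pick a random starting point for simulated exploration
        return list({s for (s, _) in self.transitions})

    def simulate(self, s, a):
        # stored outcome of taking action a in state s,
        # or None if this (s, a) pair has never been experienced
        return self.transitions.get((s, a))

Because grid-worlds are deterministic, a single stored outcome per (s, a) pair suffices; a stochastic environment would instead require storing counts or outcome distributions.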

When in exploration, the animat is placed randomly in a blank cell of the environment and then moves under the control of XCS using one of the usual exploration strategies, i.e., random or biased. If the animat reaches a food cell within M_es steps, the exploration ends. Otherwise, if the animat does not find food "in time" (within M_es steps), the system stops exploring the environment and starts using the model of the environment to simulate an exploration experiment. Accordingly, the current sensor configuration is memorized, and a new exploration starts in the model. Exploration within the model is very similar to the exploration the agent performs in the environment. First, the initial position is chosen randomly among the states that appear as the first element of the experienced quadruples. Then exploration continues on the model until S_es steps have been performed or the animat has reached a food cell in the model. At this point, the animat ends the simulated exploration in the model and resumes exploration in the environment at the position where it had stopped.
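The following Python sketch outlines one such exploration experiment under stated assumptions: env and xcs are hypothetical interfaces (random_blank_cell, step, explore_action, and reinforce are our names, not the original code), model is the quadruple store sketched above, and a positive reward is assumed to mark a food cell.

import random

def simulated_exploration(xcs, model, S_ES):
    # Explore on the model: start from a random state among those already
    # experienced and stop after S_ES steps or when a food cell is reached.
    state = random.choice(model.known_states())
    for _ in range(S_ES):
        action = xcs.explore_action(state)
        outcome = model.simulate(state, action)
        if outcome is None:
            break                          # (s, a) never experienced: stop simulating
        next_state, reward = outcome
        xcs.reinforce(reward, next_state)
        if reward > 0:                     # assumption: positive reward only at food cells
            break
        state = next_state

def dyna_xcs_exploration(env, xcs, model, M_ES, S_ES):
    state = env.random_blank_cell()        # random blank cell of the environment
    steps = 0
    while True:
        action = xcs.explore_action(state)             # random or biased strategy
        next_state, reward, at_food = env.step(state, action)
        model.update(state, action, next_state, reward)
        xcs.reinforce(reward, next_state)
        if at_food:
            return                                      # food reached: the experiment ends
        state = next_state
        steps += 1
        if steps % M_ES == 0:
            # food not found within M_ES steps: simulate an exploration
            # experiment on the model, then resume from the current position
            simulated_exploration(xcs, model, S_ES)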
