
6.3 Discussion

We proposed a hypothesis to characterize the situations in which XCS may not converge to an optimal policy. The hypothesis concerns the concept of environmental niche and suggests that XCS can fail to converge to the global optimum if the environmental niches are not explored frequently. Accordingly, the system should not explore one area of the environment for a long time; instead, it should change environmental niche frequently. Otherwise, XCS may start to learn locally, evolving classifiers that are correct with respect to a specific area but inaccurate in others.

Notice that our hypothesis concerns neither the environment nor XCS alone, but the interaction between them: an environment in which the animat is likely to visit all the possible areas will be solved easily by XCS with the usual random exploration strategy.

We want to point out that, although the approach we followed to study the behavior of XCS concerns a specific kind of environment, i.e., grid-worlds, the conclusions we draw appear to be general and can therefore be extended to other environments.

7 Verification of the Hypothesis

According to the hypothesis presented in the previous section, XCS can fail to converge to the optimum in those environments where the system is not likely to explore all the environmental niches frequently. If our hypothesis is correct, the phenomena we have discussed should not appear when XCS employs an exploration strategy guaranteeing frequent exploration of all the environmental niches.

In this section we validate our hypothesis empirically. We introduce teletransportation, a strategy we use as a theoretical tool to verify our argument. Because it can be applied on top of any exploration strategy previously employed with XCS, we refer to it as a meta-exploration strategy rather than an exploration strategy.

Teletransportation works as follows: when in exploration, the animat is placed randomly in a blank cell of the environment; it then moves, following one of the exploration strategies proposed in the literature, random or biased. If the animat reaches a food cell within a maximum number M_es of steps, the exploration phase ends; otherwise, if the animat has not found food after M_es steps, it is moved, i.e., teletransported, to another blank cell and the exploration phase is restarted. For small values of M_es, teletransportation guarantees that the animat visits all the possible niches with the same frequency; for large values of M_es, the strategy becomes equivalent to the underlying exploration strategy employed without teletransportation, e.g., random or biased exploration.
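As an illustration, the following Python sketch shows one way the teletransportation loop just described could be organized. The environment interface (random_blank_cell, step, is_food), the eight-action set, and the max_restarts safeguard are hypothetical placeholders rather than part of the original XCS implementation; the sketch only mirrors the step cap M_es and the restart-on-timeout behavior described above.

    import random

    ACTIONS = list(range(8))  # the eight grid-world moves (assumed encoding)

    def random_exploration(env, pos):
        """Plain random exploration: choose one of the eight moves uniformly."""
        return random.choice(ACTIONS)

    def teletransportation_explore(env, base_policy, m_es=20, max_restarts=100):
        """One exploration phase under teletransportation (illustrative sketch).

        Hypothetical environment interface:
          env.random_blank_cell() -> a random empty position
          env.step(pos, action)   -> position reached by taking `action` in `pos`
          env.is_food(pos)        -> True if `pos` is a food cell
        Returns the (position, action) experiences collected during the phase.
        """
        experiences = []
        for _ in range(max_restarts):
            pos = env.random_blank_cell()       # (re)start from a random blank cell
            for _ in range(m_es):
                action = base_policy(env, pos)  # underlying strategy, e.g. random or biased
                experiences.append((pos, action))
                pos = env.step(pos, action)
                if env.is_food(pos):            # food found within M_es steps: phase ends
                    return experiences
            # M_es steps without food: teletransport to another blank cell and restart
        return experiences

Calling teletransportation_explore(env, random_exploration, m_es=20) would reproduce the setting used in the experiments below; passing a biased exploration function instead shows why teletransportation is a meta-strategy wrapped around an existing one.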

We apply XCS with teletransportation (XCST) to the environments previously discussed (Maze5, Maze6, and Woods14) using the same parameter settings employed in the original experiments. Figure 10 compares the performance of XCST and XCS with biased exploration in Maze5, when a population of 1600 classifiers is employed and the M_es parameter is set to 20 steps. The results show that in Maze5 XCST converges to the optimum. As Figure 11 shows, XCST's performance remains stable near the optimum even when the population contains only 800 classifiers. We obtain similar results when XCST is applied to Maze6 (see Figure 12). The comparison of the performance of XCST and XCS shows that XCST converges to an optimal solution, while XCS with biased exploration, with the same parameter settings, cannot reach the optimum.

Figure 13 compares a typical performance of XCS with biased exploration with a typical performance of XCST when both systems are applied to Woods14. The immediate impression is that XCST's performance is not very stable and is only near optimal. However, to fully understand Figure 13, we have to analyze how XCST learns. When in exploration, XCST continuously moves in the environment in order to visit all the niches frequently. Accordingly, the animat does not learn the optimal policy in the usual way, by "trajectories", i.e., starting in a position and exploring until a goal state is reached.

XCST's policy instead emerges from a set of experiences, each of a limited number of steps, that the animat collects while learning in the environment. The system immediately learns an optimal policy for the positions near the food cells, and then extends this policy to the other areas of the environment during subsequent explorations. We can think of the artificial animal, the animat, as a natural animal that first secures a good path to food and then extends its knowledge to other areas of the environment. In Maze6, the policy is extended very rapidly because the positions of the environment are close to the food position. In Woods14, the analysis of single runs shows that XCST almost immediately learns an optimal policy for the first eight positions; the policy then also converges for the subsequent eight positions. In the end, the performance is only near optimal because for the last two positions of Woods14, the most difficult ones, the optimal policy is not completely determined.

The experiments with XCST in Woods14 highlight a limitation of teletransportation as an exploration strategy: because the environment is explored uniformly, the positions for which an optimal solution is difficult to evolve, and which therefore require more experience, converge only slowly toward optimal performance.
