6.2 Are Overgeneral Classifiers Inaccurate?

The generalization mechanism of XCS is sound, so it is not clear why it may fail in certain environments. Lanzi (1997) observes that generalization in XCS is achieved through evolution; therefore, there may be cases in which the generalization mechanism is too slow to delete overly general classifiers, which then have enough time to proliferate in the population.

We believe that Wilson's generalization hypothesis is correct; accordingly, we argue that XCS fails to learn a given task when some of the conditions the hypothesis relies on do not hold. First, we observe that:

For overly general classifiers to be "deleted", i.e., to reproduce less and eventually be removed from the population, the system must observe them to be inaccurate. However, this happens only if overly general classifiers are applied in distinct environmental niches.

We argue that in XCS an overly general classifier does not necessarily become inaccurate. Because of the way classifier parameters are updated, a classifier becomes inaccurate only when it is applied to situations with different payoff levels, and this happens only when it is applied in different situations, i.e., different environmental niches. In some applications, because of the structure of the environment and the exploration policy, the animat does not visit all niches with the same frequency; instead, it stays in one area of the environment for a while and then moves to another. In such situations, Wilson's generalization hypothesis may fail, because overly general classifiers that should be inaccurate may be evaluated as accurate.
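The claim that a classifier becomes inaccurate only when its payoff levels differ follows from the form of the parameter updates. The block below is a sketch under stated assumptions: it uses the usual Widrow-Hoff style XCS updates for a classifier j in the action set receiving payoff P, with learning rate β, error threshold ε₀ and accuracy parameters α and ν. The power-law accuracy function is one common form and may differ from the one used in these experiments, and the error is assumed here to be updated from the prediction held before the prediction update.

```latex
\begin{aligned}
\varepsilon_j &\leftarrow \varepsilon_j + \beta\,\bigl(|P - p_j| - \varepsilon_j\bigr) \\
p_j &\leftarrow p_j + \beta\,(P - p_j) \\
\kappa_j &=
  \begin{cases}
    1 & \text{if } \varepsilon_j < \varepsilon_0 \\
    \alpha\,(\varepsilon_j / \varepsilon_0)^{-\nu} & \text{otherwise}
  \end{cases}
\end{aligned}
```

Suppose classifier j only ever receives one payoff level P_1. Then p_j tends to P_1 and ε_j tends to 0, so κ_j = 1 however general its condition is. Now suppose its updates alternate strictly between two niches with payoffs P_1 ≠ P_2. The prediction then oscillates around (P_1 + P_2)/2, and the error converges to |P_1 - P_2|/(2 - β). This is far above ε₀ whenever the two payoff levels differ by much more than ε₀.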

Consider, for example, an overly general classifier that matches two niches belonging to two different areas of the environment. As long as the system stays in the area containing the first niche, the classifier's parameters are updated according to the payoff level of the first niche. Until the animat visits the second niche, the classifier appears accurate even though it is globally overly general.(n4) The overly general classifier is thus selected for reproduction, and the system allocates resources, i.e., copies, to it. When the animat moves to the area of the environment containing the second niche, the classifier starts to become inaccurate because the payoff level it predicts is no longer correct. At this point, one of two things may happen. If the classifier did not reproduce sufficiently in the first niche, the (macro) classifier is deleted because it has become inaccurate: the animat thus "forgets" what it learned in the previous area. If, instead, the overly general classifier reproduced sufficiently while in the first niche, the (macro) classifier survives long enough to adjust its parameters and become accurate with respect to the current niche. The overly general classifier then continues to reproduce and mutate in the new niche, and can produce even more overly general offspring. This behavior can be summarized as follows:
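The scenario above can be reproduced with a single classifier in isolation. The sketch below is a minimal illustration and not the XCS implementation used in the experiments. It applies one classifier's prediction and error updates (error from the old prediction, then prediction, both with learning rate β) and ignores reproduction, deletion and numerosity. The parameter values (`BETA`, `EPS0`, `ALPHA`, `NU`) and the payoff levels 1000 and 500 for the two niches are hypothetical. The prediction starts at the first payoff received, so that the effect of the switch between niches is not masked by initial convergence.

```python
from dataclasses import dataclass

# Hypothetical parameter values, chosen only for illustration.
BETA = 0.2   # learning rate
EPS0 = 10.0  # error threshold below which a classifier counts as accurate
ALPHA = 0.1  # accuracy fall-off coefficient
NU = 5.0     # accuracy fall-off exponent


@dataclass
class Classifier:
    prediction: float
    error: float = 0.0

    def update(self, payoff: float) -> None:
        # Assumption: the error uses the prediction held *before* this update.
        self.error += BETA * (abs(payoff - self.prediction) - self.error)
        self.prediction += BETA * (payoff - self.prediction)

    def accuracy(self) -> float:
        # Power-law accuracy: 1 below the threshold, falling off sharply above it.
        if self.error < EPS0:
            return 1.0
        return ALPHA * (self.error / EPS0) ** -NU


def simulate(payoffs: list[float]) -> list[float]:
    """Accuracy of one overgeneral classifier after each payoff it receives."""
    cl = Classifier(prediction=payoffs[0])
    history = []
    for payoff in payoffs:
        cl.update(payoff)
        history.append(cl.accuracy())
    return history


# Niche A (payoff 1000) visited for 100 steps, then niche B (payoff 500).
sequential = simulate([1000.0] * 100 + [500.0] * 100)
# The same two niches visited alternately.
interleaved = simulate([1000.0, 500.0] * 100)

print("inaccurate steps, sequential visits:", sum(a < 1.0 for a in sequential))
print("final accuracy, interleaved visits:", interleaved[-1])
```

In the sequential case the classifier is flagged as inaccurate for only 25 of the 200 steps, all immediately after the switch, and looks perfectly accurate otherwise. When the same two payoff levels are interleaved, its error settles near 500/(2 - β) ≈ 278 and its accuracy becomes negligible.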

XCS usually learns a global policy. However, if some areas of the environment are not, or cannot be, visited frequently, it tends to learn a local policy that can produce overly general classifiers, which by definition cause performance errors.

Note that the phenomenon we discuss does not concern the general problem of incomplete information about the environment caused by partial exploration. The environments we use are small enough that, after the first two hundred problems, the system has tried almost all the possible environmental niches. Instead, our statement concerns the ability of XCS to evolve a stable solution. Thus our hypothesis states that:

XCS fails to learn an optimal policy in environments where the system is unlikely to explore all the environmental niches frequently.

This hypothesis concerns the ability of the agent to explore the whole environment uniformly; therefore, it depends on both the environment structure and the exploration strategy employed. Since the exploration strategies previously used with XCS in animat problems select actions randomly, our hypothesis is directly related to the average random walk to food. The shorter the average random walk, the more likely the animat is to visit all positions in the environment frequently; the longer it is, the more likely the animat is to visit certain areas of the environment more frequently than others.

Our hypothesis can therefore explain why, in certain environments, XCS with biased exploration performs better than XCS with random exploration. With biased exploration, the animat performs a random action only with a certain probability and otherwise takes the best action. Accordingly, the animat is unlikely to spend much time in one area of the environment: following the best policy it has learned, it moves on to another area. When the environmental niches are more separated, as in Maze6 and Woods14, the animat is unable to visit all the niches as frequently as would be necessary to evolve an optimal policy.
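The link between the exploration strategy and the average random walk to food can be illustrated on a toy environment. The sketch below uses a hypothetical one-dimensional corridor of cells 0 to n, with food in cell n and a wall at cell 0. It is not one of the environments used here, such as Maze6 or Woods14. Each problem starts in a random non-food cell. Under pure random exploration the animat moves left or right with equal probability. Under biased exploration it takes a random action with probability `p_random` and otherwise the best action, assumed here to be a step toward the food. For the pure random walk, the expected number of steps from cell i is n(n+1) - i(i+1). Averaged over uniform starts this gives (n+1)(2n+1)/3, which grows quadratically with the corridor length.

```python
import random


def steps_to_food(n: int, start: int, p_random: float, rng: random.Random) -> int:
    """Steps an animat needs to reach the food in cell n of the corridor 0..n."""
    pos, steps = start, 0
    while pos < n:
        if rng.random() < p_random:
            move = rng.choice((-1, 1))  # exploratory (random) action
        else:
            move = 1                    # best action: step toward the food
        pos = max(0, pos + move)        # moving left at cell 0 hits the wall
        steps += 1
    return steps


def average_walk(n: int, p_random: float, problems: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of the average number of steps to food."""
    rng = random.Random(seed)
    total = 0
    for _ in range(problems):
        start = rng.randrange(n)  # any cell except the food cell
        total += steps_to_food(n, start, p_random, rng)
    return total / problems


def expected_random_walk(n: int) -> float:
    """Closed form for pure random exploration, averaged over start cells 0..n-1."""
    return (n + 1) * (2 * n + 1) / 3


if __name__ == "__main__":
    n = 10
    print("random exploration:", average_walk(n, p_random=1.0))
    print("closed form       :", expected_random_walk(n))
    print("biased (p=0.2)    :", average_walk(n, p_random=0.2))
```

With n = 10 the closed form gives 77 steps per problem for pure random exploration. The biased policy needs only a handful, because it drifts toward the food instead of wandering around its starting area. The gap widens as the corridor grows, since the random-walk average grows quadratically in n while the biased one grows roughly linearly.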
