5 XCS in Woods14

Cliff and Ross (1994) presented experimental results for ZCS (Wilson, 1994), the system from which XCS was derived. They show that the failure in learning an optimal policy depends on the length of the sequence of actions required to reach food: the longer the sequence is, the more difficult the environment.

Our experiments in Maze5 and Maze6 might seem to confirm the results presented for ZCS. XCS in fact performs better in Maze5, which requires an average of 4.6 steps to reach food, than in Maze6, where the animat takes an average of 5.05 steps to reach food. However, the minor difference between the average number of steps in the two environments seems too small to justify the significant difference in system performance.

We now extend the results presented in the previous section by analyzing the performance of XCS in an environment requiring a long sequence of actions to reach the goal state. For this purpose, we apply three different versions of XCS in the Woods14 environment. Woods14 (Figure 6) is a simple environment, which consists of a linear path of 18 blank cells to a food cell, and has an expected optimal path to food of nine steps.

Initially, we applied XCS with biased exploration and XCS without generalization to Woods14 with a population of 2000 classifiers. General parameters are set as in the previous experiment except for the discount factor γ, which is set to 0.9. The performance of XCS with biased exploration in Woods14 is shown in Figure 7. The performance of XCS when the generalization mechanism does not act is shown in Figure 8. Curves are averaged over ten runs.(n2)
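To give a feel for what a discount factor of 0.9 implies over a long chain of actions, the following minimal sketch computes the discounted payoff at each distance from food. It is an illustration only, not part of the experiments: the food reward of 1000 (`REWARD`) and the helper name `discounted_payoff` are assumptions, and the corridor is treated as 18 cells lying 1 to 18 steps from the food.

```python
# Illustrative sketch only. REWARD and the function name are assumptions;
# only GAMMA = 0.9 and the 18-cell corridor come from the text.
GAMMA = 0.9       # discount factor used in the Woods14 experiments
REWARD = 1000.0   # assumed reward received on reaching food

def discounted_payoff(steps_to_food: int,
                      gamma: float = GAMMA,
                      reward: float = REWARD) -> float:
    """Payoff of an optimal action taken `steps_to_food` steps from food."""
    if steps_to_food < 1:
        raise ValueError("steps_to_food must be at least 1")
    # The last action before food earns the full reward; each earlier
    # action is discounted once more by gamma.
    return reward * gamma ** (steps_to_food - 1)

# Payoff for every cell of an 18-cell corridor, nearest cell first.
payoffs = [discounted_payoff(d) for d in range(1, 19)]

# Difference in payoff between neighbouring cells along the corridor.
gaps = [near - far for near, far in zip(payoffs, payoffs[1:])]
```

Under these assumptions the payoff falls from 1000 next to the food to about 167 at the far end of the corridor, and the difference between neighbouring cells shrinks from 100 to under 20, so distant positions are distinguished by much smaller payoff differences.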

These results show that, even if biased exploration is introduced, XCS does not converge to an optimum in the Woods14 environment. However, when # symbols are not used, XCS easily reaches the optimum. The former result may indicate that the problems encountered with XCS depend on the length of the expected optimal path to food. The latter result, shown in Figure 8, suggests instead that XCS can solve problems which involve long sequences of actions. This result is extremely important; it shows that XCS is a better model of a classifier system than ZCS, because it is able to build long chains of actions, a task in which ZCS fails (Cliff and Ross, 1994).

In the second experiment, we apply XCSS to Woods14 with 2000 classifiers.(n3) Figure 9 reports the performance of XCSS in Woods14; the curve, averaged over ten runs, shows that XCSS can evolve an optimal solution for Woods14.

Although these results are interesting, they do not explain the causes which underlie the observed behavior. We need to study the generalization mechanism of XCS and Wilson's generalization hypothesis in order to understand XCS's behavior. This is the subject of the next section where we discuss the generalization capabilities of XCS and formulate a hypothesis to explain our results.

6 Generalization with XCS in Animat Problems

6.1 The Generalization Mechanism of XCS

The experimental results discussed in the previous two sections demonstrate that some grid worlds are more difficult for XCS to navigate than others. For example, in the Woods2 environment (see Wilson (1997a)) XCS easily produces optimal solutions; in others, such as Maze5, Maze6 and Woods14, XCS may require special exploration policies and/or special operators.

Here we analyze the generalization mechanism of XCS in order to understand which factors may influence the performance of the system. We start by reconsidering Wilson's generalization hypothesis, which explains the fundamental principles of generalization in XCS as follows:

"Consider two classifiers C1 and C2 having the same action, where C2's condition is a generalization of C1's. That is, C2's condition can be generated by C1's by changing one or more of C1's specified (1 or 0) alleles to don't cares (#). Suppose C1 and C2 have the same epsilon, and are thus equally accurate.

Every time C1 and C2 occur in the same action set, their fitness values will be updated by the same amount. However, because C2 is a generalization of C1 it will tend to occur in more match sets than C1, and thus probably (depending on the action-selection regime) in more action sets. Because the GA occurs in action sets, C2 will have more reproductive opportunities and thus its number of exemplars will tend to grow with respect to C1's [...]. Consequently, when C1 and C2 next meet in the same action set, a larger fraction of the constant fitness update would be "steered" toward exemplars of C2, resulting via the GA in yet more exemplars of C2 relative to C1. Eventually, it was hypothesized, C2 would displace C1 from the population." (Wilson, 1995)
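The relation between C1 and C2 in the quoted hypothesis can be made concrete with a small sketch. It assumes conditions are strings over 0, 1 and #, matched position by position against binary inputs; the function names and the example conditions are hypothetical and chosen only for illustration.

```python
from itertools import product

def matches(condition: str, state: str) -> bool:
    """A condition matches a state if every specified bit agrees with it."""
    return len(condition) == len(state) and all(
        c == "#" or c == s for c, s in zip(condition, state)
    )

def is_generalization(c2: str, c1: str) -> bool:
    """True if c2 is obtained from c1 by turning one or more alleles into '#'."""
    if len(c1) != len(c2) or c1 == c2:
        return False
    # Every position of c2 is either '#' or identical to c1 at that position.
    return all(b == "#" or b == a for a, b in zip(c1, c2))

def count_matched_states(condition: str) -> int:
    """Number of binary input strings the condition matches."""
    states = ("".join(bits) for bits in product("01", repeat=len(condition)))
    return sum(matches(condition, s) for s in states)

c1 = "0110"   # hypothetical fully specified condition
c2 = "0##0"   # hypothetical generalization of c1

print(is_generalization(c2, c1))
print(count_matched_states(c1), count_matched_states(c2))
```

Here c2 matches four input strings where c1 matches one, which is the sense in which the more general classifier tends to appear in more match sets and so receives more reproductive opportunities.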

Wilson's hypothesis explains how XCS develops a tendency to evolve maximally general classifiers. But what happens when an overly general classifier appears in the population?

Overgeneral classifiers are those that, because of some don't care symbols, match different niches with different rewards and thus become inaccurate. Since XCS bases fitness on classifier accuracy, overly general classifiers tend to reproduce less and are eventually deleted.
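The effect of matching niches with different rewards can be sketched numerically. The example below is a simplified illustration, not the full XCS update: it assumes Widrow-Hoff style updates of a prediction and an error estimate with a learning rate of 0.2, and two hypothetical niche payoffs of 1000 and 900. The function name `estimate` and all parameter values are assumptions.

```python
def estimate(payoffs, beta=0.2, passes=200):
    """Return (prediction, error) after repeatedly seeing the given payoffs.

    Simplified sketch: the error estimate tracks the absolute difference
    between the received payoff and the current prediction, and both
    estimates move a fraction `beta` toward their targets.
    """
    prediction, error = 0.0, 0.0
    for _ in range(passes):
        for payoff in payoffs:
            error += beta * (abs(payoff - prediction) - error)
            prediction += beta * (payoff - prediction)
    return prediction, error

# A classifier that only matches one niche always receives the same payoff.
specific_prediction, specific_error = estimate([1000.0])

# An overgeneral classifier alternates between two niches with different payoffs.
general_prediction, general_error = estimate([1000.0, 900.0])

print(specific_prediction, specific_error)
print(general_prediction, general_error)
```

In this sketch the classifier confined to one niche ends with an error close to zero, while the overgeneral one settles on a prediction between the two payoffs and keeps a persistent error, which is what lowers its accuracy and hence its fitness.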

In Section 6.2, we will analyze the generalization mechanism in detail, to show why it may sometimes work incorrectly.
