
2 Description of XCS

We now give an overview of XCS, following its most recent version (Wilson, 1997a). We refer the interested reader to Wilson (1995) for the original description of XCS, or to Kovacs's report (Kovacs, 1996) for a more detailed discussion aimed at implementors.

Classifiers in XCS have three main parameters: (1) the prediction p, which estimates the payoff that the system expects if the classifier is used; (2) the prediction error ε, which estimates the error of the prediction p; and (3) the fitness F, which evaluates the accuracy of the payoff prediction given by p and is therefore a function of the prediction error ε.
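
To make these definitions concrete, here is a minimal Python sketch of a classifier record (our own illustration, not code from the paper); the numerosity and experience fields are included because they are used by the mechanisms described later in this section:

    from dataclasses import dataclass

    @dataclass
    class Classifier:
        condition: str        # e.g. "01#1"; '#' is the don't-care symbol
        action: int
        p: float              # prediction: payoff expected if the classifier is used
        epsilon: float        # prediction error: estimated error of p
        F: float              # fitness: accuracy of the payoff prediction
        numerosity: int = 1   # used by macroclassifiers (see below)
        experience: int = 0   # number of parameter updates received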

At each time step, the system input is used to build the match set [M], containing the classifiers in the population whose condition part matches the sensory configuration. If the match set is empty, a new classifier that matches the input is created through covering. For each possible action aᵢ in the match set, the system prediction P(aᵢ) is computed as the fitness-weighted average of the predictions of the classifiers in [M] that advocate action aᵢ. P(aᵢ) is an estimate of the expected payoff if action aᵢ is performed. Action selection can be deterministic (the action with the highest system prediction is chosen) or probabilistic (the action is chosen with a certain probability among the actions with a non-null prediction).
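
A sketch of these steps, building on the Classifier record above (the function names and the uniform probabilistic rule are our own assumptions; covering is omitted):

    import random

    def matches(condition, inp):
        # '#' matches any input symbol at that position
        return all(c == '#' or c == b for c, b in zip(condition, inp))

    def build_match_set(population, inp):
        # Covering (creating a matching classifier when [M] is empty) is omitted
        return [cl for cl in population if matches(cl.condition, inp)]

    def system_prediction(match_set, action):
        # Fitness-weighted average of the predictions advocating `action`
        advocates = [cl for cl in match_set if cl.action == action]
        total_F = sum(cl.F for cl in advocates)
        if total_F == 0.0:
            return None
        return sum(cl.F * cl.p for cl in advocates) / total_F

    def select_action(match_set, actions, deterministic=True):
        preds = {a: system_prediction(match_set, a) for a in actions}
        valid = {a: P for a, P in preds.items() if P is not None}
        if deterministic:
            return max(valid, key=valid.get)
        return random.choice(list(valid))  # one simple probabilistic choice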

The classifiers in [M] that advocate the selected action form the current action set [A]. The selected action is then performed in the environment, and a scalar reward r is returned to the system together with a new input configuration.

The reward r is used to update the parameters of the classifiers in the action set of the previous time step, [A]₋₁. Classifier parameters are updated as follows. First, the Q-learning-like payoff P is computed as the sum of the reward received at the previous time step and the maximum system prediction, discounted by a factor γ (0 ≤ γ < 1). P is used to update the prediction p by the Widrow-Hoff delta rule (Widrow and Hoff, 1960) with learning rate β (0 ≤ β ≤ 1): p_j ← p_j + β(P − p_j). Likewise, the prediction error is adjusted with the formula ε_j ← ε_j + β(|P − p_j| − ε_j). The fitness update is slightly more complex. Initially, the prediction error is used to evaluate the classification accuracy κ of each classifier as κ = exp((ln α)(ε − ε₀)/ε₀) if ε > ε₀, and κ = 1 otherwise. Subsequently, the relative accuracy κ′ of the classifier is computed from κ as κ′ = κ / Σ κ, where the sum is over the classifiers in [A]₋₁. Finally, the fitness parameter is adjusted by the rule F ← F + β(κ′ − F).
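
The updates can be summarized in a few lines of Python (a sketch with illustrative parameter values; it follows the update order given in the text, prediction first, then error):

    import math

    BETA, GAMMA, ALPHA, EPSILON_0 = 0.2, 0.71, 0.1, 0.01  # illustrative values

    def update_action_set(prev_action_set, r, max_prediction):
        # Q-learning-like payoff: previous reward plus discounted max prediction
        P = r + GAMMA * max_prediction
        for cl in prev_action_set:
            cl.experience += 1
            cl.p += BETA * (P - cl.p)                      # Widrow-Hoff delta rule
            cl.epsilon += BETA * (abs(P - cl.p) - cl.epsilon)
        # Accuracy kappa, then relative accuracy, then fitness
        kappas = [math.exp(math.log(ALPHA) * (cl.epsilon - EPSILON_0) / EPSILON_0)
                  if cl.epsilon > EPSILON_0 else 1.0
                  for cl in prev_action_set]
        total = sum(kappas)
        for cl, k in zip(prev_action_set, kappas):
            cl.F += BETA * (k / total - cl.F)              # F <- F + beta(kappa' - F)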

The genetic algorithm in XCS is applied to the action set. It selects two classifiers with probability proportional to their fitness and copies them; crossover is applied to the copies with probability χ, and each allele is mutated with probability μ.
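
A sketch of one GA invocation, reusing the pieces above (one-point crossover and condition-only mutation are our own simplifying choices):

    CHI, MU = 0.8, 0.04  # illustrative crossover and per-allele mutation rates

    def roulette(action_set):
        # Fitness-proportionate selection of one classifier
        total = sum(cl.F for cl in action_set)
        pick, acc = random.uniform(0, total), 0.0
        for cl in action_set:
            acc += cl.F
            if acc >= pick:
                return cl
        return action_set[-1]

    def run_ga(action_set):
        p1, p2 = roulette(action_set), roulette(action_set)
        c1, c2 = list(p1.condition), list(p2.condition)
        if len(c1) > 1 and random.random() < CHI:   # one-point crossover on the copies
            x = random.randrange(1, len(c1))
            c1[x:], c2[x:] = c2[x:], c1[x:]
        offspring = []
        for cond, parent in ((c1, p1), (c2, p2)):
            # Mutate each allele with probability MU
            cond = [random.choice([s for s in '01#' if s != a])
                    if random.random() < MU else a for a in cond]
            offspring.append(Classifier(''.join(cond), parent.action,
                                        parent.p, parent.epsilon, parent.F))
        return offspring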

Macroclassifiers. Introduced by Wilson (1995), an important innovation of XCS is the notion of macroclassifiers: classifiers that represent a set of classifiers with the same condition and the same action by means of a new parameter called numerosity. Whenever a new classifier has to be inserted in the population, it is compared to the existing ones to check whether a classifier with the same condition-action pair is already present. If such a classifier exists, the new classifier is not inserted; instead, the numerosity parameter of the existing (macro)classifier is incremented. Otherwise, the new classifier is inserted in the population.
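
The insertion rule is straightforward (a sketch, reusing the Classifier record above):

    def insert_classifier(population, new_cl):
        # Merge with an existing classifier with the same condition-action pair
        for cl in population:
            if cl.condition == new_cl.condition and cl.action == new_cl.action:
                cl.numerosity += 1
                return
        population.append(new_cl)  # no match: insert as a new (macro)classifier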

Macroclassifiers are essentially a programming technique that speeds up learning by reducing the number of classifiers XCS has to process. Wilson shows that the use of macroclassifiers substantially reduces the population size for normal mutation rates, especially if the environment admits significant generalizations. In addition, he shows that the number of macroclassifiers is a useful statistic for measuring the level of generalization achieved by the system.

Subsumption Deletion and Specify. Since XCS was introduced, two genetic operators have been proposed as extensions to the original system: subsumption deletion (Wilson, 1997a) and specify (Lanzi, 1997b).

Subsumption deletion was introduced to improve the generalization capability of XCS. It acts when classifiers created by the genetic algorithm are inserted in the population. Offspring classifiers are replaced with clones of their parents if: (1) they are specializations of their parents, i.e., they are subsumed by them; (2) their parents are accurate; and (3) the parameters of their parents have been updated a sufficient number of times. If all these conditions are satisfied, the offspring are discarded and copies of their parents are inserted in the population; otherwise, the offspring themselves are inserted.
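
The two checks involved can be sketched as follows (the experience threshold THETA_SUB and the accuracy criterion ε < ε₀ are our assumptions about "sufficiently updated" and "accurate"):

    THETA_SUB = 20  # assumed experience threshold for "sufficiently updated"

    def could_subsume(parent):
        # A parent may subsume only if it is accurate and sufficiently experienced
        return parent.epsilon < EPSILON_0 and parent.experience >= THETA_SUB

    def subsumes(general, specific):
        # `general` subsumes `specific` if they advocate the same action and
        # the general condition matches everywhere the specific one does
        return general.action == specific.action and \
               all(g == '#' or g == s
                   for g, s in zip(general.condition, specific.condition))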

The idea of subsumption deletion is that, since the goal of XCS is to evolve an accurate, maximally general representation, it is useless to specialize classifiers that are already accurate. Accordingly, with subsumption deletion, accurate classifiers can produce only more general offspring.

Specify was introduced to assist the generalization mechanism of XCS in eliminating overly general classifiers. Specify acts when a significant number of overly general classifiers are in the action set. This condition is detected by comparing the average prediction error of the classifiers in the action set, ε_[A], with the average prediction error of the classifiers in the population, ε_[P]. If ε_[A] is at least twice ε_[P], and the classifiers in [A] have been updated on average at least N_sp times, then a classifier is randomly selected from [A] with probability proportional to its prediction error. The selected classifier is used to generate one offspring classifier in which each # symbol is replaced, with probability P_sp, by the corresponding digit of the system input. The resulting classifier is then inserted in the population, and another classifier is deleted if necessary.
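
A sketch of the operator, built on the pieces above (the final deletion step is omitted):

    N_SP, P_SP = 20, 0.5  # illustrative threshold and specialization probability

    def maybe_specify(action_set, population, avg_eps_pop, inp):
        avg_eps_A = sum(cl.epsilon for cl in action_set) / len(action_set)
        avg_exp_A = sum(cl.experience for cl in action_set) / len(action_set)
        if avg_eps_A < 2 * avg_eps_pop or avg_exp_A < N_SP:
            return
        # Select one classifier with probability proportional to its error
        total = sum(cl.epsilon for cl in action_set)
        pick, acc = random.uniform(0, total), 0.0
        for cl in action_set:
            acc += cl.epsilon
            if acc >= pick:
                break
        # Replace each '#', with probability P_SP, by the matched input digit
        cond = [b if (c == '#' and random.random() < P_SP) else c
                for c, b in zip(cl.condition, inp)]
        insert_classifier(population, Classifier(''.join(cond), cl.action,
                                                 cl.p, cl.epsilon, cl.F))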
