Detection Strategies for Face Recognition Using Learning and Evolution (PhD dissertation, 1998)

Thesis Outline

The goal of this thesis is to advance a novel learning and evolutionary methodology for developing detection strategies for face recognition tasks and to assess its feasibility for forensic analysis using the FERET face database. The novel strategies are characteristic of the emerging fields of Behavior-Based AI and Active and Selective Vision (ASV), which are discussed in Chap. 2. Both behavior-based AI and ASV require adaptation, and toward that end learning and evolution are discussed in Chap. 3. As subject detection in video sequences and face detection are both prerequisites for eye detection, we address those topics in Chap. 4. The eye detection process can then be approached in two ways: (i) optimal selection of features and classification schemes (DT, decision trees) evolved using Genetic Algorithms (GAs) and applied exhaustively across the whole face, or (ii) optimal behaviors encoded as Finite State Automata (FSA), also evolved using GAs, for navigation and detection across facial landscapes, where classification is applied selectively to salient areas only. The corresponding methodologies and relevant experiments for these two approaches to eye detection are described in Chaps. 5 and 6, respectively. The final chapter, Chap. 7, discusses the merits of this thesis and points out promising directions for future research.


CHAPTER

2

Animats

Machine learning is the subfield of AI concerned with intelligent systems that learn. It is the computational study of algorithms that improve performance based on experience. Recent years have seen an explosion of work on this topic, which has produced a wide variety of automated learning algorithms. The fundamental characteristics of intelligent behavior are the abilities to pursue goals and to plan for future actions. To exhibit these characteristics, an intelligent system, human or machine, must be able to classify objects and to display appropriate behaviors for achieving given goals, such as those related to perceiving and acting in some landscaped environment. The intelligent system also needs to generalize, handle noisy inputs, use prior knowledge, deal with complex environments, and explore (navigate) unknown spaces.

Classical research in AI holds that intelligent tasks can be implemented by a reasoning process operating on symbolic and explicit internal representations. This approach has proven successful for knowledge-based tasks such as expert-level reasoning. As it cannot be extended easily to develop autonomous agents, a novel methodology has been suggested, that of behavior-based AI. In order for an agent ('robot') to act autonomously over a wide range of tasks and environments, it must be capable of exhibiting a wide range of different behaviors (Manning, 1979; McFarland, 1987). The behavior-based AI approach is based on the conception and construction of simulated animats capable of surviving in more or less unpredictable and threatening environments. The animats (Roitblat, 1987) prove themselves capable of actively searching for useful information and of choosing behaviors that permit them to benefit from interactions with the environment. They are able to improve their behaviors through adaptation using learning or evolutionary processes. From this perspective, the animat approach relies heavily upon recent work on the cognitive behavior of animals (Roitblat and Meyer, 1995) and on computational models inspired by nature, such as genetic algorithms (GAs).

2.1 Behavior-Based AI and Artificial Life

Maes (1992) has proposed autonomous agents (animats) as sets of modules, each having its own specific but limited competence. A competence module consists of a node (state) and the links (transitions) that attach it to other modules. The competence modules are linked in a network through three types of links: successor links, predecessor links, and conflictor links. Modules use these links to activate and inhibit each other, and as the activation (support) energy accumulates, the best actions to take given the current situation (state and input) and goals emerge. Once the activation level corresponding to a given activity exceeds some threshold, it becomes active and is executed. Specifically, behavior-based AI has advanced the idea that for successful operation (and survival) an intelligent and autonomous processor should (i) consist of multiple competencies ('routines'), (ii) be "open" or "situated" in its environment, and (iii) monitor the domain of application and figure out, in a competitive fashion, what to do next while dealing with many conflicting goals simultaneously. The need to be open or "situated" and to explore and monitor the environment requires an animat to continuously sense and perceive its surroundings. As such processes are resource intensive, the need for selectivity becomes crucial. In analogy to biological systems, one then considers the possibility of active and selective vision, as discussed in Sect. 2.2.
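
To make the activation mechanism concrete, here is a minimal Python sketch of such a competence network. It is an illustration of the idea, not Maes' implementation: only successor links are shown (predecessor and conflictor links would pass energy backward and inhibit, respectively), and all names and constants are invented for the example.

```python
# Minimal sketch of a Maes-style competence network (illustrative only).
class CompetenceModule:
    def __init__(self, name, precondition, action):
        self.name = name
        self.precondition = precondition  # callable: state -> bool
        self.action = action              # callable: state -> new state
        self.activation = 0.0
        self.successors = []              # modules this one feeds energy to

def spread_and_select(modules, state, threshold=1.0, boost=0.5, decay=0.9):
    """One activation-spreading step; returns the module to execute, if any."""
    for m in modules:
        m.activation *= decay                 # energy leaks away over time
        if m.precondition(state):
            m.activation += boost             # relevant modules gain energy
            for s in m.successors:
                s.activation += boost / 2     # successor links pass energy on
    best = max(modules, key=lambda m: m.activation)
    if best.activation >= threshold and best.precondition(state):
        best.activation = 0.0                 # reset after firing
        return best
    return None
```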

An agent can be thought of as sensing its environment and acting upon it through effectors. Marvin Minsky (1985) describes the brain's operation in terms of an ensemble of agencies, 'the organization of agents', each responsible for a simple functionality. The agencies communicate among themselves to reach a 'decision'; the emerging effect of this communication is essentially the operational mind. An early example of such an ensemble of agencies is the subsumption architecture for reactive behavior (Brooks, 1985). Mobile robots, like human beings, need to build and maintain models of the environment. These models, known as visual maps, enable the animat to perform its main tasks, namely navigation and manipulation. As Rodney Brooks argues, a useful visual map would in fact be a collection of local maps and their relationships. When the observer moves, new information becomes available, and the view of the world can be updated appropriately. Brooks has advocated an approach for designing autonomous robots that displays the characteristics of both reactive and behavior-based AI (Brooks, 1986). The idea is that the overall agent design should be decomposed not into general functional components such as perception, learning, and planning, but rather into specific behaviors such as obstacle avoidance, wall following, foraging for food ('information'), and exploration. Each behavioral module accesses the sensor inputs independently to extract just the information it needs and sends its own signal to the effectors. Behaviors are arranged into a prioritized hierarchy in which higher-level behaviors can access the internal state of lower-level behaviors and can modify their outputs. The main aim of behavior-based AI is to eliminate the reliance on a centralized, complete representation of the world state. Internal state is needed only to keep track of those aspects of the world state that are inaccessible to the sensors and are required for action selection in each behavior.
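
A rough sketch of the subsumption idea follows, assuming behaviors are simple functions ordered by priority; the behaviors and sensor fields below are invented for illustration and are not Brooks' actual modules.

```python
# Hedged sketch of subsumption-style arbitration: each behavior reads the
# sensors independently; the highest-priority behavior that wants control
# suppresses the outputs of those below it.

def subsumption_step(behaviors, sensors):
    """behaviors: list ordered from highest to lowest priority.
    Each behavior maps sensor readings to a command or None."""
    for behavior in behaviors:
        command = behavior(sensors)
        if command is not None:       # higher level subsumes lower levels
            return command
    return "idle"

# Example layering: avoidance overrides wall following, which overrides exploring.
avoid = lambda s: "turn_away" if s["obstacle"] else None
follow = lambda s: "track_wall" if s["wall"] else None
explore = lambda s: "wander"

print(subsumption_step([avoid, follow, explore],
                       {"obstacle": False, "wall": True}))  # -> "track_wall"
```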

Fukuda (1994) discusses the concept of a 'society of robots'. A robotic system that constitutes a society is not one in which individual robots carry out separate tasks, but one in which many robots carry out tasks in coordination. A robotic system configured in this way is a distributed autonomous robotic system, which must support both cooperation and competition among robots. Fukuda has also proposed a dynamically reconfigurable robotic system, called the Cellular Robotic System (CEBOT). The CEBOT system is highly distributed and is composed of fundamental elements (cells) characterized by different functionalities and able to operate as autonomous units. The collective behavior characteristic of such an architecture is similar to the 'Society of Mind' discussed above, and it can be traced back to the blackboard systems employed by early AI. Blackboard systems are domain-specific problem-solving systems that employ an incremental and opportunistic problem-solving style. The blackboard architecture (Erman et al., 1980) has three components: a global database called the blackboard, independent knowledge sources that generate solution elements on the blackboard, and a scheduler to control knowledge-source activity. As autonomy considerations become important, one expands on the blackboard architecture and moves to an architecture in which the database is distributed among the agents and across the connections linking them.
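
The control loop of a blackboard system can be sketched as follows; this is a toy rendering of the three components named above (shared database, knowledge sources, scheduler), not the HEARSAY-II code of Erman et al.

```python
# Illustrative blackboard loop. Knowledge sources are (trigger, act) pairs:
# trigger decides applicability, act posts solution elements to the shared
# global database; a scheduler ranks the applicable sources each cycle.

def run_blackboard(blackboard, knowledge_sources, priority, max_steps=100):
    for _ in range(max_steps):
        if blackboard.get("solved"):
            break
        ready = [(trig, act) for trig, act in knowledge_sources if trig(blackboard)]
        if not ready:
            break
        _, act = max(ready, key=lambda ks: priority(ks, blackboard))
        act(blackboard)                 # extend the shared global database
    return blackboard

# Toy usage: one source hypothesizes, another confirms.
sources = [
    (lambda bb: "hypothesis" not in bb, lambda bb: bb.update(hypothesis="eye")),
    (lambda bb: "hypothesis" in bb,     lambda bb: bb.update(solved=True)),
]
print(run_blackboard({}, sources, priority=lambda ks, bb: 1.0))
```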

Artificial Life (A-Life), as defined by Chris Langton (1989), one of its founders, is a field of study devoted to understanding life by attempting to abstract the fundamental dynamic principles underlying biological phenomena, recreate these dynamics in other physical environments, and make them accessible to new kinds of experimental manipulation and testing. Artificial life is based on a synthetic approach operating in two phases: (i) abstracting the logical principles of living organisms, and (ii) implementing these logical principles through synthesis on other media, such as computers. A-Life systems also consist of a large collection of simple, basic units whose interesting properties emerge as a result of both competition and cooperation. One example is Von Neumann's model (Von Neumann, 1966), where the basic units are grid cells and the observed phenomena involve composite objects consisting of several cells. Another example is Craig Reynolds' work (Reynolds, 1987) on flocking behavior, in which he investigated how flocks of birds, called 'boids', fly without central direction. The computerized world he simulated was populated with a collection of boids flying in accordance with three rules: collision avoidance, velocity matching, and flock centering. Each boid is a basic unit that senses only its nearby flock-mates and acts according to the above rules, leading to the emergence of flocking behaviors. Reynolds' flocks-of-birds model demonstrates the basic architecture of A-Life systems: a large number of elemental units interacting with a small number of nearby neighbors without a central controller.
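
The three rules translate almost directly into code. The sketch below is a conventional modern rendering with invented weights and neighborhood radius, not Reynolds' original program; each boid is updated only from its nearby flock-mates.

```python
# Sketch of Reynolds-style boid steering under the three rules named above.
import numpy as np

def boid_step(positions, velocities, i, radius=2.0,
              w_sep=1.5, w_align=1.0, w_coh=1.0, max_speed=1.0):
    """Update boid i in place, using only its nearby flock-mates."""
    deltas = positions - positions[i]
    dists = np.linalg.norm(deltas, axis=1)
    near = (dists < radius) & (dists > 0)            # local neighborhood only
    if near.any():
        separation = -deltas[near].sum(axis=0)                     # collision avoidance
        alignment = velocities[near].mean(axis=0) - velocities[i]  # velocity matching
        cohesion = positions[near].mean(axis=0) - positions[i]     # flock centering
        velocities[i] += w_sep * separation + w_align * alignment + w_coh * cohesion
        speed = np.linalg.norm(velocities[i])
        if speed > max_speed:                        # keep speeds bounded
            velocities[i] *= max_speed / speed
    positions[i] += velocities[i]

# One simulation step over a random flock; no central controller is involved.
rng = np.random.default_rng(0)
pos = rng.uniform(0, 10, (30, 2))
vel = rng.uniform(-1, 1, (30, 2))
for i in range(len(pos)):
    boid_step(pos, vel, i)
```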

Another major area of A-Life research is modeling ecosystem behavior and the evolutionary dynamics of populations. Holland's Echo system (Holland, 1993) models ecologies in the same sense that the GA models population genetics. It simulates a small set of primitive agent-agent and agent-environment interactions. The goal of Echo is to study how simple interactions among simple agents lead to high-level emergent phenomena such as cooperation and competition in a society of agents. Echo consists of a population of agents distributed over a set of sites on a lattice; each site is also supplied with different types of renewable resources for different types of agents to take in. Agents can interact by mating, which produces an offspring whose genome combines the parents' genomes, and by trading or fighting, which results in the exchange of internal resources between agents. Each agent has a particular set of rules, encoded by its genome, that determines its interactions with other agents and the types of resources it requires. After evolution using GAs, the Echo system has demonstrated complex behavior, ecological dependencies among different species, and sensitivity to differing levels of renewable resources. More recently, Demetri Terzopoulos et al. (1995) have used the animat approach to develop artificial life exhibiting highly evolved and complex behaviors. This work demonstrates a virtual marine world inhabited by realistic artificial fish. It models the autonomous agent, an artificial fish, situated in a simulated environment and interacting with others of its kind. Each agent has (i) a deformable body actuated by internal muscles and locomoting according to biomechanic and hydrodynamic principles; (ii) visual sensors that can image the environment; and (iii) a controller that models the motor, perception, behavior, and learning systems. The algorithms not only emulate the appearance, movement, and behavior of individual agents but also give rise to complex group behaviors. The individual and emergent collective behaviors include caudal and pectoral locomotion, collision avoidance, foraging, preying, schooling, and mating.

2.2 Active and Selective Vision

The flow of visual input consists of huge amounts of time-varying information. It is crucial for both biological vision and automated systems to perceive and comprehend such a constantly changing environment within a relatively short processing time. To cope with this computational challenge, one should locate and analyze only the information relevant to the current task by quickly focusing on selected areas of the scene as needed. One might expect that the vast amount of input data reaching the sensors must be processed in parallel by the human visual system (HVS) in order to obtain reasonable performance, but due to architectural constraints this is hardly feasible. Attention mechanisms thus strike a balance between computationally expensive parallel techniques and time-intensive serial techniques to simplify computation and reduce the amount of processing. Besides complexity reasons (Culhane et al., 1992), efficient attention schemes are also needed to form the basis for behavioral coordination (Allport, 1989).

Active and selective vision leads directly to issues of attention. Sensory, perceptual, and cognitive systems are space-time limited, while the information potentially available to each of them is practically infinite. Much of attentional selectivity, explained by filter theory in terms of system limitations on both storage and processing capabilities, is concerned with early selection of spotlights and late but selective processing for control and recognition. Selective processing of regions of restricted location (or motion) is thus necessary for achieving near real-time and enhanced performance with limited resources. As an example, restricted but enhanced processing becomes possible and implements the equivalent of foveal perception. Furthermore, spatial attention appears to cause suppression of responses to unattended stimuli in area V4, whereas increased but still spatially localized effort (when the task is made more difficult) causes enhancement of responses and sharpened selectivity for attended stimuli (Spitzer, 1988).

Various computational models of visual attention have been proposed to filter out some of the input and thus not only reduce the computational complexity of the underlying processes but possibly also provide a basis for forming invariant and canonical object representations. Several biologically motivated models of attentional mechanisms and visual integration have appeared in the literature. In most of these models, every point in the visual field competes for control over attention based on its local conspicuity and the history of the system. A high-level feature integration mechanism is then implemented to select the next center of gaze or fixation point. Koch and Ullman (1985) have proposed using a number of elementary maps encoding conspicuous orientation, color, and direction of movement, which are then merged into a single representation called the saliency map. The most active location of this map is computed by a winner-take-all (WTA) mechanism and selected as the next focus of interest. The computational mechanisms needed to implement the perceptual behaviors underlying behavior-based AI are collectively known as a Visual Routine Processor (VRP).
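
In code, the merge-and-select step of such a model might look as follows; the feature maps, weights, and inhibition mask are placeholders rather than the published Koch-Ullman model.

```python
# Hedged sketch of the saliency-map idea: merge feature conspicuity maps into
# a single map and let a winner-take-all step pick the next fixation point.
import numpy as np

def next_fixation(feature_maps, weights, inhibited=None):
    """feature_maps: 2-D arrays (e.g. orientation, color, motion conspicuity).
    Returns the (row, col) of the most salient, not-yet-visited location."""
    saliency = sum(w * f for w, f in zip(weights, feature_maps))
    if inhibited is not None:
        saliency = np.where(inhibited, -np.inf, saliency)  # inhibition of return
    return np.unravel_index(np.argmax(saliency), saliency.shape)  # WTA selection
```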

2.3 Visual Routines

As part of behavior-based AI, Maes (1992) has proposed autonomous agents (animats) as sets of reactive modules. A similar behavior-based approach for pattern classification and navigation tasks is suggested by the concept of visual routines (Ullman, 1984), recently referred to as a visual routine processor (VRP) by Horswill (1995). The VRP assumes the existence of a set of visual routines that can be applied to base image representations (maps), subject to specific functionalities, and driven by the task at hand. Moghaddam and Pentland (1995) have suggested that reactive behavior be implemented in terms of "perceptual intelligence", so the sensory input is directly coupled to a (probabilistic) decision-making unit for the purpose of control and action. An autonomous agent is then essentially a Finite State Automaton (FSA) or Hidden Markov Model (HMM) whose feature inputs are chosen by sensors connected to the environment and/or derived from it, and whose actions operate on that same environment. The automaton decides its actions based on the inputs and/or features it 'forages', while the behavior of the controller is learned. It is up to evolution and learning to collectively define and hardwire such purposeful automata.
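
A minimal sketch of such an FSA controller is given below. The states, observations, and actions are invented for illustration; the transition table is exactly the structure that learning and evolution would have to define.

```python
# Minimal FSA controller: states and a transition table map sensed features
# to actions. The table itself is what learning or evolution would tune.

class FSAController:
    def __init__(self, transitions, start="search"):
        # transitions: {(state, observation): (next_state, action)}
        self.transitions = transitions
        self.state = start

    def step(self, observation):
        self.state, action = self.transitions[(self.state, observation)]
        return action

# Example: a two-state forager that homes in once a salient feature is sensed.
controller = FSAController({
    ("search", "nothing"): ("search", "move_forward"),
    ("search", "feature"): ("track",  "turn_toward"),
    ("track",  "feature"): ("track",  "approach"),
    ("track",  "nothing"): ("search", "scan"),
})
print(controller.step("feature"))  # -> "turn_toward"
```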


A difficult and still open question regarding the concept of visual routines introduced earlier is the extent to which their design could be automated. Brooks (1985), for example, has shown that much of his initial success was due to carefully but manually chosen behaviors and cleverly designed interactions among the modules. In order to scale up to more complex behavior and greater robustness, Brooks (1985) and others are looking to machine learning techniques and evolutionary algorithms to tune and properly adapt such behavioral routines. The important question for the VRP mentioned earlier is how to automatically craft such visual routines and how to integrate their outputs. Early attempts, developed using manual design, involved simulations lacking low-level (base) representations and operating on bitmaps only. Ramachandran (1985), an advocate of the utilitarian theory of perception, has suggested as an alternative that one could craft such visual routines by evolving a "bag of perceptual tricks" whose survival depends on functionality and fitness. This approach, which can be traced directly to the earlier "Neural Darwinism" theory of neuronal group selection as a basis for higher brain function (Edelman, 1987), suggests natural selection as the major force behind the automatic design of visual routines and their integration. Another possibility for evolving visual routines would employ Evolutionary Computation (EC).


CHAPTER

3

Learning and Evolution

An animal's behavior is adaptive as long as it allows the animal to survive in a changing, unpredictable, and more or less threatening environment. Similarly, the behavior of an agent (robot) is considered adaptive as long as the agent can continue to perform the functions for which it was built. Learning and evolution are the two main sources enabling animats ('artificial animals') to survive in complex and dynamic environments. An animat thus improves the adaptive character of its behavior as it experiences new situations in its environment. Several learning strategies based on human learning models have been proposed, including supervised, unsupervised, and reinforcement learning, using techniques as different as symbolic AI, connectionist neural networks, and statistical pattern recognition. With the help of evolution, the behavior of individuals in a population can improve from generation to generation as well. The animat uses its sensors to capture information about its environment, so it can act on and explore its surroundings in order to adapt to the outside world. The animat's adaptation improves based on its experience.

Vision systems that have successfully supported nontrivial tasks have usually taken advantage of constraints derived from the task to be performed and from the environment, where the system is expected to react in a speedy fashion, for increased reliability and lower complexity of the perceptual processes. As mentioned earlier, visual processing is encapsulated as visual routines (Ullman, 1984) consisting of different base representations, perceptual operations, and control. Firby et al. (1996) propose an animat architecture based on such reactive behavior for building general-purpose vision systems. The reactive skills are invoked during the visual and action processes, and they can be terminated on demand by control units.

" * ,

Learning, a fundamental aspect of intelligence, enables the animat to improve its performance based on experience. Learning results from the interaction between the agent and the world and from the agent's observation of its own decision-making processes. It involves making changes to the agent's internal structures so as to improve its performance in future situations. A learning agent has several conceptual components (Fig. 3.1) (Russell, 1996).


[Diagram: inside the learning agent, the critic compares percepts against a fixed performance standard and sends feedback to the learning element; the learning element sends changes to, and draws knowledge from, the performance element, and sets learning goals for the problem generator; the performance element receives percepts from the world and sends actions back to it.]

Figure 3.1 A general model of learning agents

A learning agent comprises a learning element, which is responsible for making improvements, and a performance element, which is responsible for selecting external actions. The design of the learning element depends very much on the design of the performance element. The critic encapsulates a fixed standard of performance, which it uses to generate feedback for the learning element regarding the success or failure of its modifications to the performance element. The performance standard is necessary because the percepts themselves cannot suggest the desired direction of improvement. The problem generator is the component responsible for deliberately generating new experiences. New rules and procedures can be added to the performance element (changes). The knowledge accumulated in the performance element can also be used by the learning element to make better sense of the observations (knowledge). The learning element is also responsible for improving the efficiency of the performance element. The design of the learning element is further shaped by the learning setup: (i) which components of the performance element are to be improved, (ii) how those components are represented in the agent program, and (iii) what prior information is available to interpret the agent's experience. The animat is thus responsible for improving both its learning and its performance elements, and the problem generator comes into play as a result of exploration.
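
As a compact and entirely hypothetical rendering of how these components interact, the stub below wires the critic, learning element, problem generator, and performance element of Fig. 3.1 into one decision step; the dissertation prescribes no such interface, and every name here is invented.

```python
# Stub rendering of Fig. 3.1's information flow (illustrative only).
class LearningAgent:
    def __init__(self, standard=0.8):
        self.standard = standard      # critic's fixed performance standard
        self.rules = {"default": "act"}

    def critic(self, percept):
        # Feedback: how far current performance falls short of the standard.
        return self.standard - percept.get("score", 0.0)

    def learning_element(self, feedback):
        if feedback > 0:              # make changes to the performance element
            self.rules["default"] = "explore"

    def problem_generator(self):
        return {"try": "new_region"}  # deliberately propose new experiences

    def performance_element(self, percept, goal):
        return (self.rules["default"], goal["try"])

    def step(self, percept):
        self.learning_element(self.critic(percept))
        return self.performance_element(percept, self.problem_generator())

print(LearningAgent().step({"score": 0.5}))  # -> ('explore', 'new_region')
```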

The basic problem studied in machine learning has been that of inducing a representation of a function, a systematic mapping between inputs and outputs ('learning from examples'), usually referred to as generalization. Such mappings can be learned using methods such as attribute-based representations or connectionist neural networks. Decision trees are the most commonly used attribute-based learning method and are introduced next.

" / ¡£ ¢ ¢

The basic aim of any concept-learning symbolic system is to construct rules for classifying objects, given a training set of objects whose class labels are known. The objects are described by a fixed collection of attributes, each with its own set of discrete values, and each object belongs to one of two classes. The rules derived in our case form a decision tree (DT). The decision trees are derived using C4.5, the most commonly used algorithm for the induction of decision trees (Quinlan, 1986). The C4.5 algorithm uses an information-theoretical measure, the entropy, for building the decision tree. The entropy is a measure of uncertainty ('ambiguity') and characterizes the intrinsic ability of a set of features to discriminate between classes of different objects. The entropy E for a feature f is given by:

$$
E(f) = -\sum_{k=1}^{n}\sum_{i=1}^{m_j}\left[\,x_{i,k}^{+}\,\log_{2}\!\left(\frac{x_{i,k}^{+}}{x_{i,k}^{+}+x_{i,k}^{-}}\right) + x_{i,k}^{-}\,\log_{2}\!\left(\frac{x_{i,k}^{-}}{x_{i,k}^{+}+x_{i,k}^{-}}\right)\right] \qquad (3.1)
$$

where $n$ is the number of classes and $m_j$ is the number of distinct values that feature $f$ can take on; $x_{i,k}^{+}$ is the number of positive examples in class $k$ for which feature $f$ takes on its $i$th value, and $x_{i,k}^{-}$ is, similarly, the number of negative examples in class $k$ for which feature $f$ takes on its $i$th value.
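
Eq. (3.1) transcribes directly into code. The sketch below assumes the counts are supplied as two arrays indexed by feature value $i$ and class $k$, with the usual convention $0 \log 0 = 0$:

```python
# Direct transcription of Eq. (3.1) from the counts defined above.
import numpy as np

def feature_entropy(x_pos, x_neg):
    """Entropy E(f) of a feature from its positive/negative example counts."""
    total = x_pos + x_neg
    with np.errstate(divide="ignore", invalid="ignore"):
        term_pos = np.where(x_pos > 0, x_pos * np.log2(x_pos / total), 0.0)
        term_neg = np.where(x_neg > 0, x_neg * np.log2(x_neg / total), 0.0)
    return -(term_pos + term_neg).sum()

# Toy counts for a feature with two values (rows) and two classes (columns).
print(feature_entropy(np.array([[3., 0.], [1., 4.]]),
                      np.array([[1., 2.], [5., 0.]])))
```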

In an iterative fashion, C4.5 determines the feature that is most discriminatory and then dichotomizes (splits) the data into classes categorized by this feature. The most significant feature of each of the subsets is then used to partition them further, and the process is repeated recursively until each subset contains only one kind of labeled data. The resulting structure is called a decision tree, where nodes stand for feature discrimination tests and their exit branches stand for the subclasses of labeled examples satisfying those tests. An unknown example is classified by starting at the root of the tree, performing the sequential tests and following the corresponding branches until a leaf (terminal node) is reached, indicating that some class has been decided on. Decision trees are disjunctive, since each branch leaving a decision node corresponds to a separate disjunctive case. After a decision tree is constructed, a tree-pruning mechanism is invoked. Pruning is used to reduce the effect of noise in the learning data. It discards some of the unimportant subtrees and retains those covering the largest number of examples. The pruned tree thus provides a more general description of the learned concept.
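
The recursive loop just described can be sketched as a toy ID3/C4.5-style splitter. This is an illustration, not Quinlan's implementation; for simplicity it ranks features by the conventional normalized split entropy rather than the unnormalized counts of Eq. (3.1), and it omits pruning.

```python
# Skeletal recursive tree induction. Examples are (attribute-dict, label) pairs.
import math
from collections import Counter

def split_entropy(examples, feature):
    """Weighted entropy of the class labels after splitting on `feature`."""
    groups = Counter(ex[feature] for ex, _ in examples)
    total, ent = len(examples), 0.0
    for value, size in groups.items():
        counts = Counter(l for ex, l in examples if ex[feature] == value)
        ent += (size / total) * -sum(
            (c / size) * math.log2(c / size) for c in counts.values())
    return ent

def build_tree(examples, features):
    labels = Counter(l for _, l in examples)
    if len(labels) == 1 or not features:
        return labels.most_common(1)[0][0]           # leaf: majority class
    best = min(features, key=lambda f: split_entropy(examples, f))
    branches = {}
    for value in {ex[best] for ex, _ in examples}:   # one branch per value
        subset = [(ex, l) for ex, l in examples if ex[best] == value]
        branches[value] = build_tree(subset, [f for f in features if f != best])
    return (best, branches)                          # decision node

toy = [({"shade": "dark", "size": "small"}, "eye"),
       ({"shade": "dark", "size": "large"}, "eye"),
       ({"shade": "light", "size": "small"}, "non-eye")]
print(build_tree(toy, ["shade", "size"]))  # -> ('shade', {...})
```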

3.3 Evolution and Genetic Algorithms

The process of natural selection leads to evolution as a result of adaptive strategies being continuously tested for their fitness, as is the case for closed-loop control. Reasoning by analogy, one then attempts to emulate computationally the 'survival of the fittest' concept for complex and difficult problems such as those encountered in detection, navigation, and homing. Evolutionary Computation (EC) in general, and Genetic Algorithms (GAs) in particular, mimic what nature has done all along, using similar principles. A GA is further specified by a particular strategy for choosing the offspring and/or the next generation. Simulated breeding is one such strategy, in which offspring are selected according to their fitness. Note also that simulated breeding is conceptually similar to stochastic search in general, and to simulated annealing in particular, when the size of the offspring population is limited to a single individual.

Genetic algorithms (Goldberg, 1989), as examples of evolutionary computation, are stochastic, non-deterministic search techniques that sift through a population of solutions using the principles of evolution and natural genetics. In recent years, genetic algorithms have become a popular optimization tool for many areas of research, including system control (optimization), search, and machine learning. GAs are adaptive search techniques initially introduced by Holland (1975). They typically maintain a constant-sized population of individuals ('chromosomes') which represent samples from the space to be searched. Each individual is evaluated on the basis of its overall fitness with respect to some prespecified functionality and across some application domain. New individuals (samples from the search space) are produced by selecting high-performing individuals to produce 'offspring' which retain many of the features of their 'parents'. The result is an evolving population with improved fitness with respect to the given functionality ('goal').

New individuals (offspring) for the next generation are formed using two main genetic operators, crossover and mutation. Crossover operates by randomly selecting a point in the gene structures of two selected parents and exchanging the remaining segments to create new offspring. Mutation operates by randomly changing one or more components of a selected individual; it acts as a perturbation operator that inserts new information ('genetic material') into the population. Mutation prevents the stagnation ('premature convergence') that might otherwise occur during the search process. The main issues in applying GAs are choosing an appropriate base representation and an adequate evaluation function.
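
A compact GA loop using exactly these two operators might look as follows; the population size, mutation rate, and truncation-style selection are illustrative choices, not the settings used later in this thesis.

```python
# Compact GA with one-point crossover and point mutation over bit strings.
import random

def evolve(fitness, length=20, pop_size=50, generations=100, p_mut=0.01):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]           # fitness-based selection
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, length)       # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]  # mutation
            children.append(child)
        pop = children
    return max(pop, key=fitness)

print(sum(evolve(fitness=sum)))  # toy 'one-max' problem: should be near 20
```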

Further advances in pattern analysis and classification require the integration of various learning processes in a modular fashion. Adaptive systems that employ several strategies can potentially offer significant advantages over single-strategy systems. Since the types of input and acquired knowledge are more flexible, such hybrid systems can be applied to a wider range of problems. The integration of genetic algorithms and decision tree learning advocated in this chapter is also part of a broader issue being actively explored, namely, that evolution and learning can work synergistically (Hinton and Nowlan, 1987). The ability to learn can be shown to ease the burden on evolution: evolution (genotype learning) only has to get close to the goal, and (phenotype) learning can then fine-tune the behavior (Mühlenbein and Kinderman, 1989). Although Darwinian theory, unlike Lamarck's, does not allow for the inheritance of acquired characteristics, learning (as acquired behavior) can still influence the course of evolution. The Baldwin effect indeed suggests that local search, when employed, changes the fitness of strings and thus the course of evolution.

3.4 Interaction Between Learning and Evolution

Recently, researchers have applied evolutionary computation techniques to the study of the interactions between learning and evolution. Lamarck believed in the direct inheritance of characteristics acquired by individuals during their lifetime. Darwin proposed instead that natural selection coupled with diversity could largely explain evolution. This debate continued until, in 1896, J. M. Baldwin and C. Lloyd Morgan independently put forward a theory of a "new factor in evolution" that has subsequently become known as the Baldwin effect (Baldwin, 1896; Lloyd Morgan, 1896). Individuals that are able to acquire an evolutionarily beneficial trait during their lifetime, through an adaptive process such as learning, are better fitted and thus more likely to survive and to pass more of their offspring into the next generation. As a consequence, their traits ('genes') are more likely to be preserved and to become dominant in future generations.

The Baldwin effect works in two steps. First, phenotypic plasticity allows an individual to adapt to a partially successful mutation, which might otherwise be useless to the individual. If the mutation increases inclusive fitness, it will tend to proliferate in the population. However, phenotypic plasticity is typically costly for an individual. The second step is therefore that a behavior, once learned, may eventually become instinctive. This second step looks the same as Lamarckian evolution, but there is no direct alteration of the genotype based on the experience of the phenotype. The Baldwin effect came to the attention of computer scientists with the work of Hinton and Nowlan (1987). It also arises in evolutionary computation when a genetic algorithm is used to evolve a population of individuals that simultaneously employ a local search algorithm. Local search is the computational analog of phenotypic plasticity in biological evolution. Computationally, this issue arises in the design of systems in which the genome serves as the starting point and context for a local search algorithm (typically hill climbing) designed to find nearby points with even better fitness. When such individuals are found, one can pass back both the new fitness and the new genome, in a Lamarckian manner, or just the new fitness. In the latter case, the effect of discovering and returning a higher fitness is more offspring potential for the starting-point genome.
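
The computational distinction can be made explicit in a few lines. In the sketch below (hypothetical names, hill climbing over single bit flips), the Lamarckian variant writes the learned genome back, while the Baldwinian variant passes back only the learned fitness.

```python
# Sketch contrasting Baldwinian and Lamarckian use of local search in a GA.
import random

def hill_climb(genome, fitness, tries=10):
    """Phenotypic plasticity: greedily try single bit flips during a lifetime."""
    best, best_f = genome[:], fitness(genome)
    for _ in range(tries):
        g = best[:]
        g[random.randrange(len(g))] ^= 1
        if fitness(g) > best_f:
            best, best_f = g, fitness(g)
    return best, best_f

def evaluate(genome, fitness, lamarckian=False):
    learned, learned_f = hill_climb(genome, fitness)
    if lamarckian:
        return learned, learned_f   # write the acquired trait back to the genome
    return genome, learned_f        # Baldwinian: keep genome, credit the fitness
```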
