Activity 4
Determine why these two numbers (70% and 50%) do not add up to 100%.
Feedback 4
You should have been able to recognise that the two numbers do not add up to 100% because we are sampling two different populations, i.e., 100 who have the flu and 100 who do not have the flu.
Similarly, if we look at the population of people with the flu, 60% of them have a runny nose, whereas in a sample of 100 people who do not have the flu we find that only 20% have a runny nose. We do the same to determine the probability of the symptom ‘headache’ being true given that the hypothesis, flu, is true, and the probability of ‘headache’ being true given that the hypothesis, flu, is not true. Notice that this data is not the same as that shown in the previous table and that the numbers do not necessarily add up to 100%.
Collecting this data is fairly simple. We merely sample 100 patients who have been diagnosed with flu, and we take 100 patients who have been diagnosed as not having the flu, and determine the probabilities of the symptoms for each of these populations of patients. We can now repeatedly use Bayes equations to calculate the probability of flu, given a range of symptoms.
Another way of writing the Bayes theorem is using the two equations below.
P(H:E) = P(E:H) P(H) / [P(E:H) P(H) + P(E:not H) P(not H)]

P(H:not E) = (1 − P(E:H)) P(H) / [(1 − P(E:H)) P(H) + (1 − P(E:not H)) P(not H)]
The first of these equations is used to calculate the probability of a hypothesis given that an event or symptom is true; the second is used if the event or symptom is not true.
Let’s imagine that a patient has a high temperature, and the prior probability of the flu is 0.3. We can use the equations to calculate the probability that this patient has the flu, now that we have discovered that they have a high temperature. As the symptom is true, i.e., the event is true, we use the first equation, and calculate the probability of the hypothesis. The first equation says we calculate the probability of the event given the hypothesis, and multiply that by the prior probability of the hypothesis.
We divide all of this by the part of the equation beneath the line, which is the probability of the event given the hypothesis multiplied by the prior probability of
the hypothesis, plus the probability of the event given not the hypothesis multiplied by the prior probability of not the hypothesis.
The prior probability of not the hypothesis, in other words P(not H), is simply one minus P(H), the prior probability of the hypothesis.
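The two update rules translate directly into a short calculation. Below is a minimal Python sketch of the formulas above; the function names and argument order are chosen here purely for illustration and do not come from the text.

```python
def update_given_e(p_e_given_h, p_e_given_not_h, p_h):
    """P(H:E): revise belief in H after observing that event E is true."""
    numerator = p_e_given_h * p_h
    return numerator / (numerator + p_e_given_not_h * (1 - p_h))


def update_given_not_e(p_e_given_h, p_e_given_not_h, p_h):
    """P(H:not E): revise belief in H after observing that event E is false."""
    numerator = (1 - p_e_given_h) * p_h
    return numerator / (numerator + (1 - p_e_given_not_h) * (1 - p_h))
```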
Using the equations and the data taken from the table above we can now calculate the probability of flu for a range of symptoms.
Assuming ‘High Temperature’ is true and ‘Runny nose’ is false, we first calculate the probability of flu given that high temperature is true, using the first equation. Initially we ignore the fact that runny nose is false; we take account of this later by using the second equation.
Thus we use the first equation
P(H:E) = P(E:H) P(H) / [P(E:H) P(H) + P(E:not H) P(not H)]
to calculate P(flu:high temperature).
Given that the prior probability of having flu is 0.3, then the equation can be completed as follows.
P(H:E) = (0.7 × 0.3) / (0.7 × 0.3 + 0.5 × 0.7)
Therefore,
P(flu:high temperature) = 0.375
Now that the probability of having flu given a high temperature has been set at 0.375, the probability of having flu given both symptoms can be derived. As the second symptom is false, we use the second equation
P(H:not E) = (1 − P(E:H)) P(H) / [(1 − P(E:H)) P(H) + (1 − P(E:not H)) P(not H)]
to calculate P(flu:high temperature, not runny nose).
In this case P(H) is not 0.3 but the calculated value of 0.375, as we are taking the fact that the patient has a high temperature into consideration.
Applying the probability factors from the different events produces the following.
P(H:not E) = ((1 − 0.6) × 0.375) / ((1 − 0.6) × 0.375 + (1 − 0.2) × (1 − 0.375))
This means that the probability of the hypothesis being correct with the patient having a temperature but not a runny nose can be stated as:
P(flu:high temperature, not runny nose) = 0.23
We can repeat the process above many times in order to calculate the probability of flu given any specific combination of symptoms a patient may have.
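Using the two functions sketched earlier, this repetition amounts to chaining the updates one symptom at a time. The short sketch below reproduces the flu example with the probabilities given above.

```python
p_flu = 0.3                                   # prior probability of flu

# Symptom 1: high temperature is TRUE
# P(high temperature : flu) = 0.7, P(high temperature : not flu) = 0.5
p_flu = update_given_e(0.7, 0.5, p_flu)       # -> 0.375

# Symptom 2: runny nose is FALSE
# P(runny nose : flu) = 0.6, P(runny nose : not flu) = 0.2
p_flu = update_given_not_e(0.6, 0.2, p_flu)   # -> approximately 0.23

print(round(p_flu, 2))                        # 0.23
```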
Clearly, at the same time as calculating the probability of flu we would, in parallel, also calculate the probability that the patient has a common cold, and likewise for all other potential hypotheses. By determining which symptom has the most effect on all of these calculations we will know which symptom is most important and thus which question the doctor should ask next. Furthermore, before asking any more questions we can determine whether the probabilities could still change significantly. If so, then we have not yet reached a firm conclusion. However, if the probabilities will not change significantly irrespective of the answers the patient may give, then we can be satisfied that we have reached a definitive diagnosis and no longer need to ask any more questions.
Bayesian Networks
A Bayesian network (also known as Bayes net, causal probabilistic network, Bayesian belief network, or simply belief network) is a compact model representation for reasoning under uncertainty.
A problem domain—diagnosis of mechanical failures, for instance—consists of a number of entities or events. These entities or events are, in a Bayesian network, represented as random variables. One random variable can, for instance, represent the event that a piece of mechanical hardware in a production facility has failed. The random variables representing different events are connected by directed edges to describe relations between events. An edge between two random variables X and Y represents a possible dependence relation between the events or entities represented by X and Y. An edge could, for instance, describe a dependence relation between disease and a symptom—diseases cause symptoms. Thus, edges can be used to represent cause–effect relations. The dependence relations between entities of the problem domain are organised as a graphical structure. This graphical structure describes the possible dependence relations between the entities of the problem domain, e.g. a Bayesian network model for diagnosing lung cancer, tuberculosis, and bronchitis would describe the cause–effect relations between the possible causes of these diseases.
The uncertainty of the problem domain is represented through conditional probabilities.
Conditional probability distributions specify our belief about the strengths of the cause–effect relations, e.g. lung cancer does not always produce a positive (bad)
chest X-ray, or a mechanical failure does not always cause an alarm to sound. Thus, a Bayesian network consists of a qualitative part, which describes the dependence relations of the problem domain, and a quantitative part, which describes our belief about the strengths of the relations.
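To make the two parts concrete, the sketch below hard-codes a toy two-node network (Flu causes Fever) in plain Python: the edge is the qualitative part, the conditional probability table the quantitative part, and the posterior P(Flu : Fever) is obtained by simple enumeration. The numbers are invented for illustration; a real model would be built with a dedicated Bayesian-network tool.

```python
# Toy network: Flu -> Fever.  All numbers are illustrative only.
p_flu = 0.10                               # prior P(Flu = true)
p_fever_given_flu = {True: 0.80,           # P(Fever = true | Flu = true)
                     False: 0.15}          # P(Fever = true | Flu = false)

def posterior_flu_given_fever():
    """P(Flu = true | Fever = true), computed by enumerating both states of Flu."""
    joint_true = p_flu * p_fever_given_flu[True]            # Flu and Fever
    joint_false = (1 - p_flu) * p_fever_given_flu[False]    # no Flu, but Fever
    return joint_true / (joint_true + joint_false)

print(round(posterior_flu_given_fever(), 3))   # about 0.372
```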
Bayesian networks have been applied for reasoning and decision making under uncertainty in a large number of different settings. The next activity requires you to identify some applications.
Activity 5
This activity will help you appreciate the usefulness of the Bayesian approach to dealing with uncertainty in KBSs.
Search the Internet for examples of the use of Bayesian networks.
Feedback 5
You might have discovered examples of the application of Bayesian Networks to any of the following:
Medicine—diagnosis of muscle and nerve diseases, antibiotic treatment, diabetes advisory system, triage (AskRed.com).
Software—software debugging, printer troubleshooting, safety and risk evaluation of complex systems, help facilities in Microsoft Office products.
Information processing—information filtering, display of information for time-critical decisions, fault analysis in aircraft control.
Industry—diagnosis and repair of on-board unmanned underwater vehicles, control of centrifugal pumps, process control in wastewater purification.
Economy—credit application evaluation, portfolio risk and return analysis.
Military—NATO Airborne Early Warning & Control Program, situation assessment.
Agriculture—blood typing and parentage verification of cattle, replacement of milk cattle, mildew management in winter wheat.
The Strengths of Probabilistic Reasoning
Bayes theorem is mathematically sound, so it provides a good basis for the investigation of uncertainty.
The results of using this method have strong justification, adding value and credibility to the output from expert systems.
Probabilistic reasoning has higher validity than confidence factors (which are only expressions of opinion) because its results are based on mathematically proven reasoning and statistical data.
The Limitations of Probabilistic Reasoning
Probabilistic reasoning needs statistical data to be collected from previous results, and will only work where this data is available. Furthermore, this data may not be accurate, invalidating the results for the hypothesis being tested.
Often one might have to rely on human estimates of one or more of these probability factors. But if you have to do that, it might be better to let experienced experts estimate the relevant probabilities from the start. This is the point of view of the advocates of confidence factors.
Summary
This section introduced the principle of uncertainty, and showed how Bayes theorem could be used to determine the extent of uncertainty, firstly in a written example, and then using formulae.
Self-Assessment Question
The following information is available concerning why a motor vehicle will not start. The hypothesis is that the battery is flat (i.e., not working) and so the engine will not start.
|                               | Probability of engine turning | Probability of noisy alternator | Probability of lights working |
| When battery flat is true     | 0.1                           | 0.5                             | 0.3                           |
| When battery flat is not true | 0.7                           | 0.4                             | 0.6                           |
Assume that the prior probability of a flat battery is 0.7. Complete the Bayesian equation assuming that car lights are working but that the engine is not turning.
Answer to Self-Assessment Question
First, calculate the probability of having a flat battery given that the car lights are working.
P(H:E) = P(E:H) P(H) / [P(E:H) P(H) + P(E:not H) P(not H)]

P(H:E) = (0.3 × 0.7) / ((0.3 × 0.7) + (0.6 × (1 − 0.7)))

P(H:E) = 0.21 / (0.21 + 0.18)
P(flat battery:lights working) = 0.54
Thus, given the fact that the car lights are working, the probability of the problem being a flat battery has fallen from 0.7 to 0.54. The next step is to calculate the probability of the battery being flat given that the car lights work but the engine does not turn.
P(H:not E) = (1 − P(E:H)) P(H) / [(1 − P(E:H)) P(H) + (1 − P(E:not H)) P(not H)]

P(H:not E) = ((1 − 0.1) × 0.54) / ((1 − 0.1) × 0.54 + (1 − 0.7) × (1 − 0.54))

P(H:not E) = 0.486 / (0.486 + 0.138)
P(flat battery:lights working and engine not turning) = 0.78
So the probability of having a flat battery when the engine isn’t turning but the lights are working is 0.78.
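The two Python functions sketched earlier in this section reproduce this answer, using the values from the table and the prior of 0.7.

```python
p_flat = 0.7                                   # prior probability of a flat battery

# Evidence 1: lights working is TRUE (0.3 if flat, 0.6 if not flat)
p_flat = update_given_e(0.3, 0.6, p_flat)      # -> approximately 0.54

# Evidence 2: engine turning is FALSE (0.1 if flat, 0.7 if not flat)
p_flat = update_given_not_e(0.1, 0.7, p_flat)  # -> approximately 0.78
```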
SECTION 4: FUZZY LOGIC
Introduction
This section provides an introduction to fuzzy logic and its use within KBSs as an approach to storing knowledge where uncertainty is a factor.
Objectives
By the end of the section you will be able to:
evaluate the usefulness of fuzzy logic in dealing with uncertainty.
Fuzzy Logic
Fuzzy logic is a method of dealing with uncertainty in expert systems. The technique uses the same principles as the mathematical theory of fuzzy sets (Jamshidi, 1997). It attempts to simulate the process of human reasoning by allowing the computer to behave in a manner that appears less precise and logical than the activities normally ascribed to a computer.
The reasoning behind fuzzy logic is that many decisions are not true or false, black or white, etc. Decisions actually involve uncertainty and terms such as ‘maybe’, indicating that actions may or may not occur. The decision-making process may not, therefore, be particularly structured, but may involve many partial decisions taken without complete information.
Many people confuse uncertain reasoning with fuzzy reasoning. Probabilistic reasoning as in Bayes theorem is concerned with the uncertain reasoning about well-defined events such as symptoms or illnesses. On the other hand, fuzzy logic is concerned with the reasoning about ‘fuzzy’ events or concepts.
Fuzzy Logic Statements
Fuzzy logic allows a degree of impreciseness to be used for both inputs to, and outputs from, a KBS. For example, the following statements are valid in fuzzy logic terms, but not in probability theory.
Input terms allowed:
The temperature is ‘high’
The vibration is ‘low’
The load is ‘medium’.
Outputs can be in terms of:
The bearing damage is ‘moderate’
The unbalance is ‘very high’.
Activity 6
This activity draws on examples of your own thinking patterns to help you understand fuzzy reasoning.
Consider the temperature of the room you are in at the moment. Without looking at a thermometer, how would you characterise the temperature?
Consider someone you know quite well. How would you characterise their height, given that you have never measured it?
Feedback 6
The chances are you might use terms such as cool, warm or freezing to describe the room, or short or tall to describe your friend’s height.
When is a person tall, at 170 cm, 180 cm or 190 cm? If we define the threshold of tallness at 180 cm, then the implication is that a person of 179.9 cm is not tall. When humans reason with terms such as ‘tall’ they do not normally have a fixed threshold in mind, but a smooth fuzzy definition. Humans can reason very effectively with such fuzzy definitions and in order to capture human fuzzy reasoning we need fuzzy logic.
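To make the contrast concrete, the sketch below compares a crisp 180 cm threshold with a fuzzy definition of ‘tall’; the linear membership function from 170 cm to 190 cm is an assumption made purely for illustration.

```python
def is_tall_crisp(height_cm):
    """Crisp rule: tall means at least 180 cm, so 179.9 cm is 'not tall'."""
    return height_cm >= 180

def tall_membership(height_cm, low=170, high=190):
    """Fuzzy rule: membership rises linearly from 0 at 170 cm to 1 at 190 cm (assumed break points)."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)

print(is_tall_crisp(179.9))       # False
print(tall_membership(179.9))     # just under 0.5
```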
An example of a fuzzy rule that involves a fuzzy condition and a fuzzy conclusion is:
IF holiday is long THEN spending money is high
Fuzzy reasoning involves three steps:
1. Fuzzification of the terms in the conditions of rules (i.e., inputs).
2. Inference from fuzzy rules.
3. Defuzzification of the fuzzy terms in the conclusions of rules (i.e., outputs).
Fuzzification
Using the technique of fuzzification, the concept ‘long’ is related to the underlying objective term that it is attempting to describe; i.e., the actual time in weeks. As an example, the term ‘long’ can be represented in this graph (see Figure 7.1).
FIGURE 7.1. Fuzzy concept ‘long’ related to length of holiday in weeks. [Graph: membership value on the vertical axis, from 0 to 1; length of holiday in weeks on the horizontal axis, from 2 to 4.]
The graph shows the degree of membership with which a holiday belongs to the category (set) ‘long’. Full membership of the class ‘long’ is represented by a value of 1, while no membership is represented by a value of 0. At 2 weeks and below
a holiday does not belong to the class ‘long’. At 4 weeks and above a holiday fully belongs to the class ‘long’. Between 2 weeks and 4 weeks the membership increases linearly between 0 and 1. The degree of belonging to the set ‘long’ is called the confidence factor or the membership value. The shape of the membership function curve can be non-linear.
The purpose of the fuzzification process is to allow a fuzzy condition in a rule to be interpreted. For example, the condition ‘holiday = long’ in a rule can be true for all values of ‘length of holiday’, however, the confidence factor or membership value (MV) of this condition can be derived from the above graph. A 3-week-long holiday is ‘long’ with a confidence factor of 0.5. It is the gradual change of the MV of the condition ‘long’ with the length of holiday that gives fuzzy logic its strength.
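A membership function such as the one in Figure 7.1 is straightforward to write down in code. The sketch below is a minimal Python version of the ramp described above (0 below 2 weeks, 1 above 4 weeks, linear in between); evaluating it at 3 weeks gives the MV of 0.5 used in the text. The function name is chosen here for illustration.

```python
def long_holiday_mv(weeks):
    """Membership value of 'holiday is long' for a holiday of the given length in weeks."""
    if weeks <= 2:
        return 0.0
    if weeks >= 4:
        return 1.0
    return (weeks - 2) / 2          # linear between 2 and 4 weeks

print(long_holiday_mv(3))           # 0.5
```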
Normally, fuzzy concepts have a number of values to describe the various ranges of values of the objective term that they describe. For example, the fuzzy concept ‘hotness’ may have the values ‘very hot’, ‘hot’ and ‘warm’. The membership functions of these values are shown in Figure 7.2.
FIGURE 7.2. Membership functions for ‘warm’, ‘hot’ and ‘very hot’. [Graph: membership value on the vertical axis, from 0 to 1; temperature on the horizontal axis, from 20° to 50°.]
Fuzzy Inference
Inference from a set of fuzzy rules involves fuzzification of the conditions of the rules, then propagating the confidence factors (membership values) of the conditions to the conclusions (outcomes) of the rules.
Consider the following rule:
IF (location is expensive) AND (holiday is long) THEN spending money is high
Inference from this rule involves (using fuzzification) looking up the MV of the condition ‘location is expensive’ given the price of food, etc., and the MV of ‘holiday is long’ given the length of the holiday. Following Zadeh’s suggestion, we take the minimum MV of all the conditions and assign it to the outcome ‘spending money is high’. In our example, if ‘location is expensive’ had an MV of 0.9 and ‘holiday is long’ had an MV of 0.7, we would conclude that ‘spending money is high’ with an MV of 0.7.
An enhancement of this method involves having a weight for each rule between 0 and 1 that multiplies the MV assigned to the outcome of the rule. By default each rule weight is set to 1.0.
In a fuzzy rule base a number of rules with the outcome ‘spending money is high’ will be fired. The inference engine will assign the outcome ‘spending money is high’ the maximum MV from all the fired rules. Thus from the rule above we deduced that the spending money is high with an MV of 0.7. However, given the rule below we may deduce that the spending money is high with an MV of 0.9.
IF (holiday is exotic) THEN the spending money is high
Thus, taking the conclusions of these two rules together we would deduce that the spending money is high with an MV of 0.9.
In summary, fuzzy inference involves:
Fuzzification of the conditions of each rule and assigning the outcome of each rule the minimum MV of its conditions multiplied by the rule weight.
Assigning each outcome the maximum MV from its fired rules.
Fuzzy inference will result in confidence factors (MVs) assigned to each outcome in the rule base.
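The sketch below is a minimal Python illustration of this inference scheme, using the spending-money example from the text; the rule list, weights and function name are written out by hand for illustration rather than taken from any particular fuzzy-logic library.

```python
# Each rule is (list of condition MVs after fuzzification, rule weight).
# Rule 1: IF location is expensive (MV 0.9) AND holiday is long (MV 0.7)
#         THEN spending money is high
# Rule 2: IF holiday is exotic (MV 0.9) THEN spending money is high
rules_for_high_spending = [
    ([0.9, 0.7], 1.0),
    ([0.9], 1.0),
]

def infer_outcome_mv(rules):
    """Min over each rule's conditions (times its weight), then max over all fired rules."""
    return max(min(conditions) * weight for conditions, weight in rules)

print(infer_outcome_mv(rules_for_high_spending))   # 0.9
```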
Defuzzification
If the conclusion of the fuzzy rule set involves fuzzy concepts, then these concepts must be translated back into objective terms before they can be used in practice.
