CHAPTER 13. ANALYTICAL SYSTEM ADMINISTRATION

Figure 13.7: The daily average of maximal CPU percentage shows no visible rhythm. If we remove the initial anomalous point, then there is no variation, either in the average or in its standard deviation (error bars), which justifies the claim that there is no periodicity.

13.6 Deterministic and stochastic behavior

In this section we turn to a more abstract view of a computer system: we think of it as a generalized dynamical system, i.e. a mathematical model which develops in time, according to certain rules.

Abstraction is one of the most valuable assets of the human mind: it enables us to build simple models of complex phenomena, eliminating details which are only of peripheral or dubious importance. But abstraction is a double-edged sword: on the one hand, abstracting a problem can show us how that problem is really the same as a lot of other problems which we know more about; conversely, unless done with a certain clarity, it can merely plant a veil over our senses, obscuring rather than assisting the truth. Our aim in this section is to think of computers as abstract dynamical systems, such as those which are routinely analyzed in physics and statistical analysis. Although this will not be to every working system administrator’s taste, it is an important viewpoint in the pursuit of system administration as a scientific discipline.

13.6.1 Scales and fluctuations

Complex systems are characterized by behavior at many levels or scales. In order to extract information from a complex system it is necessary to focus on the appropriate scale for that information. In physics, three scales are usually distinguished in many-component systems: the microscopic, mesoscopic and macroscopic scales. We can borrow this terminology for convenience.

Microscopic behavior details exact mechanisms at the level of atomic operations.

Mesoscopic behavior looks at small clusters of microscopic processes and examines them in isolation.

Macroscopic processes concern the long-term average behavior of the whole system.

These three scales can also be discerned in operating systems and they must usually be considered separately. At the microscopic level we have individual system calls and other atomic transactions (on the order of microseconds to milliseconds). At the mesoscopic level we have clusters and patterns of system calls and other process behavior, including algorithms and procedures, possibly arising from single processes or groups of processes. Finally, there is the macroscopic level at which one views all the activities of all the users over scales at which they typically work and consume resources (minutes, hours, days, weeks). There is clearly a measure of arbitrariness in drawing these distinctions. The point is that there are typically three scales which can usefully be distinguished in a relatively stable dynamical system.

13.6.2 Principle of superposition

In any dynamical system where several microscopic processes can coexist, there are two possible scenarios:

Every process is completely independent of every other. System resources change linearly (additively) in response to new processes.

The addition of each new process affects the behavior of the others in a non-additive (non-linear) fashion.

The first of these is called the principle of superposition. It is a generic property of linear systems (actually this is a defining tautology). In the second case, the system is said to be non-linear because the result of adding lots of processes is not merely the sum of those processes: the processes interact and complicate matters. Owing to the complexity of interactions between subsystems in a network, it is likely that there is at least some degree of non-linearity in the measurements we are looking for. That means that a change in one part of the system will have communicable, knock-on effects on another part of the system, with possible feedback, and so on.

This is one of the things which needs to be examined, since it has a bearing on the shape of the distribution one can expect to find. Empirically one often finds that the probability of a deviation x from the expected behavior is [130]

P(x) = \frac{1}{2\sigma} \exp\left( -\frac{|x|}{\sigma} \right)


for large jumps. This is much broader than a Gaussian measure for a random sample

P(x) = \frac{1}{(2\pi)^{1/2}\,\sigma} \exp\left( -\frac{x^2}{2\sigma^2} \right),

which one might normally expect of random behavior [34].
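The practical difference between the two measures lies in their tails: large deviations are far more probable under the exponential law than under the Gaussian. A minimal numerical sketch, comparing the two closed forms above with the same scale parameter σ (the helper names are illustrative):

```python
import math

def exponential_tail(k):
    # P(|x| > k*sigma) for the double-exponential law
    # P(x) = (1/(2*sigma)) * exp(-|x|/sigma); the integral gives exp(-k).
    return math.exp(-k)

def gaussian_tail(k):
    # P(|x| > k*sigma) for the Gaussian law with standard deviation sigma.
    return math.erfc(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"{k} sigma: exponential {exponential_tail(k):.5f}, "
          f"Gaussian {gaussian_tail(k):.5f}")
```

At three standard deviations the exponential tail is already more than an order of magnitude heavier, which is why such deviations are described as large jumps.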

13.6.3 The idea of convergence

In order to converge to a stable equilibrium one needs to provide counter-measures to change that are switched off when the system has reached its desired state. In order for this to happen, a policy of checking-before-doing is required. This is actually a difficult issue which becomes increasingly difficult with the complexity of the task involved. Fortunately most system configuration issues are solved by simple means (file permissions, missing files etc.) and thus, in practice, it can be a simple matter to test whether the system is in its desired state before modifying it.
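For the simple cases mentioned (file permissions and the like), checking-before-doing amounts to comparing the current state with policy and acting only on a mismatch. A minimal sketch (the function name and the default mode are illustrative, not from the text):

```python
import os
import stat

def converge_mode(path, desired_mode=0o644):
    """Bring a file's permission bits into the desired state,
    touching the file only if it has drifted: check before doing."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    if current == desired_mode:
        return False   # already converged: no action taken
    os.chmod(path, desired_mode)
    return True        # a corrective action was taken
```

Repeated application is idempotent: once the system has reached the desired state, the counter-measure switches itself off.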

In mathematics a random perturbation in time is represented by Gaussian noise, or a function whose expectation value, averaged over a representative time interval, is zero

 

\langle f \rangle = \frac{1}{T} \int_0^T dt\, f(t) = 0.

The simplest model of random change is the driven harmonic oscillator.

\frac{d^2 s}{dt^2} + \gamma \frac{ds}{dt} + \omega_0^2\, s = f(t),

where s is the state of the system and γ is the rate at which it converges to a steady state. In order to make oscillations converge, they are damped by a frictional or counter force γ (in the present case the immune system is the frictional force which will damp down unwanted changes). In order to have any chance of stopping the oscillations the counter force must be able to change direction in time with the oscillations so that it is always opposing the changes at the same rate as the changes themselves. Formally this is ensured by having the frictional force proportional to the rate of change of the system as in the differential representation above. The solutions to this kind of motion are damped oscillations of the form

s(t) \sim e^{-\gamma t} \sin(\omega t + \phi),

for some frequency ω and damping rate γ. In the theory of harmonic motion, three cases are distinguished: under-damped, damped and over-damped motion. In under-damped motion (γ ≪ ω) there is never sufficient counter force to make the oscillations converge to any degree. In damped motion (γ ∼ ω) the oscillations converge quite quickly. Finally, with over-damped motion (γ ≫ ω) the counter force is so strong as to never allow any change at all.
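The three regimes can be seen numerically by integrating the unforced equation (f(t) = 0) and comparing how far the state remains from equilibrium after a fixed time. The step size, time span and parameter values below are arbitrary choices for illustration:

```python
def residual(gamma, omega0=1.0, s0=1.0, dt=0.001, t_end=30.0):
    """Integrate s'' + gamma*s' + omega0^2*s = 0 with a semi-implicit
    Euler step, starting from s = s0 at rest; return |s| at t_end."""
    s, v = s0, 0.0
    for _ in range(int(t_end / dt)):
        v += (-gamma * v - omega0 ** 2 * s) * dt
        s += v * dt
    return abs(s)

# gamma << omega0: under-damped, oscillations linger;
# gamma ~ omega0:  damped, converges on the oscillation time scale;
# gamma >> omega0: over-damped, creeps back so slowly it barely changes.
for gamma in (0.05, 1.0, 20.0):
    print(f"gamma = {gamma:5}: |s(t_end)| = {residual(gamma):.2e}")
```

The damped case reaches equilibrium fastest; both the under-damped and over-damped cases still retain a visible residual.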


 

 

 

 

 

Under-damped: inefficient; the system can never quite keep errors in check.

Damped: the system converges on a time scale of the order of the rate of fluctuation.

Over-damped: too draconian; processes are killed frequently while still in use.

 

 

 

 

 

Clearly an over-damped solution to system management is unacceptable. This would mean that the system could not change at all. If one does not want any changes then it is easy to place the machine in a museum and switch it off. Also an under-damped solution will not be able to keep up with the changes to the system made by users or attackers.

The slew rate is the rate at which a device can dissipate changes in order to keep them in check. If immune response ran continuously then the rate at which it completed its tasks would be the approximate slew rate. In the body it takes two or three days to develop an immune response, approximately the length of time it takes to become infected, so that minor episodes last about a week. In a computer system there are many mechanisms which work at different time scales and need to be treated with greater or lesser haste. What is of central importance here is the underlying assumption that an immune response will be timely. The time scales for perturbation and response must match. Convergence is not a useful concept in itself, unless it is a dynamical one. Systems must be allowed to change, but they must not be allowed to become damaged. Presently there are few objective criteria for making this judgment so it falls to humans to define such criteria, often arbitrarily.

In addition to random changes, there is also the possibility of systematic error. Systematic change would lead to a constant unidirectional drift (clock drift, disk space usage etc). These changes must be cropped sufficiently frequently (producing a sawtooth pattern) to prevent serious problems from occurring. A serious problem would be defined as a problem which prevented the system from functioning effectively. In the case of disk usage, there is a clear limit beyond which the system cannot add more files, thus corrective systems need to be invoked more frequently when this limit is approached, but also in advance of this limit with less frequency to slow the drift to a minimum. In the case of clock drift, the effects are more subtle.
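One simple policy matching this description is to shrink the checking interval in proportion to the remaining headroom, so that corrective cropping runs more often as the hard limit is approached. The formula and constants here are a hypothetical sketch, not taken from the text:

```python
def next_check_interval(usage_fraction, base_interval=3600.0, limit=0.95):
    """Return the delay (in seconds) before the next corrective check.
    The interval shrinks linearly with the headroom left before the
    hard limit, but is clamped to at least one check per minute."""
    headroom = max(limit - usage_fraction, 0.0)
    return max(60.0, base_interval * headroom / limit)

# A disk drifting toward full is checked ever more frequently.
for usage in (0.50, 0.80, 0.90, 0.95):
    print(f"usage {usage:.0%}: next check in {next_check_interval(usage):.0f} s")
```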

13.6.4 Parameterizing a dynamical system

If we wish to describe the behavior of a computer system from an analytical viewpoint, we need to be able to write down a number of variables which capture its behavior. Ideally, this characterization would be numerical, since quantitative descriptions are more reliable than qualitative ones, though this might not always be feasible. In order to characterize a system properly, we need a theoretical understanding of the system or subsystem which we intend to describe. Dynamical systems fall into two categories, depending on how we choose our problem to analyze. These are called open systems and closed systems.

Open system: This is a subsystem of some greater whole. An open system can be thought of as a black box which takes in input and generates output, i.e. it communicates with its environment. The names source and sink are traditionally used for the input and output routes. What happens in the black box depends on the state of the environment around it. The system is open because input changes the state of the system’s internal variables and output changes the state of the environment. Every piece of computer software is an open system. Even an isolated total computer system is an open system as long as any user is using it. If we wish to describe what happens inside the black box, then the source and the sink must be modeled by two variables which represent the essential behavior of the environment. Since one cannot normally predict the exact behavior of what goes on outside of a black box (it might itself depend on many complicated variables), any study of an open system tends to be incomplete. The source and sink are essentially unknown quantities. Normally one would choose to analyze such a system by choosing some special input and consider a number of special cases. An open system is internally deterministic, meaning that it follows strict rules and algorithms, but its behavior is not necessarily determined, since the environment is an unknown.

Closed system: This is a system which is complete, in the sense of being isolated from its environment. A closed system receives no input and normally produces no output. Computer systems can only be approximately closed for short periods of time. The essential point is that a closed system neither affects, nor is affected by, its environment. In thermodynamics, a closed system always tends to a steady state. Over short periods, under controlled conditions, this might be a useful concept in analyzing computer subsystems, but only as an idealization. In order to speak of a closed system, we have to know the behavior of all the variables which characterize the system. A closed system is said to be completely determined.[1]

An important difference between an open system and a closed system is that an open system is not always in a steady state. New input changes the system. The internal variables in the open system are altered by external perturbations from the source, and the sum state of all the internal variables (which can be called the system's macrostate) reflects the history of changes which have occurred from outside. For example, suppose we are analyzing a word processor. This is clearly an open system: it receives input, and its output is simply a window on its data to the user. The buffer containing the text reflects the history of all that was input by the user, and the output causes the user to think and change the input again. If we were to characterize the behavior of a word processor, we would describe it by its internal variables: the text buffer, any special control modes or switches etc.

[1] This does not mean that it is exactly calculable. Non-linear, chaotic systems are deterministic but inevitably inexact over any length of time.


Normally we are interested in components of the operating system which have more to do with the overall functioning of the machine, but the principle is the same. The difficulty with such a characterization is that there is no unique way of keeping track of a system’s history over time, quantitatively. That is not to say that no such measures exist. Let us consider one simple cumulative quantifier of the system’s history, which was introduced by Burgess in ref. [42], namely its entropy or disorder. Entropy has certain qualitative, intuitive features which are easily understood. Disorder in a system measures the extent to which it is occupied by files and processes which prevent useful work. If there is a high level of disorder, then – depending on the context – one might either feel satisfied that the system is being used to the full, or one might be worried that its capacity is nearing saturation.

There are many definitions of entropy in statistical studies. Let us choose Shannon’s traditional informational entropy as an example [277]. In order for the informational entropy to work usefully as a measure, we need to be selective in the type of data which are collected.

In ref. [42], the concept of an informational entropy was used to gauge the stability of a system over time. In any feedback system there is the possibility of instability: either wild oscillation or exponential growth. Stability can only be achieved if the state of the system is checked often enough to adequately detect the resolution of the changes taking place. If the checking rate is too slow, or the response to a given problem is not strong enough to contain it, then control is lost.

In order to define an entropy we must change from dealing with a continuous measurement, to a classification of ranges. Instead of measuring a value exactly, we count the amount of time a value lies within a certain range and say that all of those values represent a single state. Entropy is closely associated with the amount of granularity or roughness in our perception of information, since it depends on how we group the values into classes or states. Indeed all statistical quantifiers are related to some procedure for coarse-graining information, or eliminating detail. In order to define an entropy one needs, essentially, to distinguish between signal and noise. This is done by blurring the criteria for the system to be in a certain state. As Shannon put it, we introduce redundancy into the states so that a range of input values (rather than a unique value) triggers a particular state. If we consider every single jitter of the system to be an important quantity, to be distinguished by a separate state, then nothing is defined as noise and chaos must be embraced as the natural law. However, if one decides that certain changes in the system are too insignificant to distinguish between, such that they can be lumped together and categorized as a single state, then one immediately has a distinction between useful signal and error margins for useless noise. In physics this distinction is thought of in terms of order and disorder.

Let us represent a single quantifier of system resources as a function of time f (t). This function could be the amount of CPU usage, or the changing capacity of system disks, or some other variable. We wish to analyze the behavior of system resources by computing the amount of entropy in the signal f (t). This can be done by coarse-graining the range of f (t) into N cells:

F^-_i < f(t) < F^+_i,


where i = 1, ..., N, with F_i^+ = F_{i+1}^-, and the constants F_i^± are the boundaries of the ranges. The probability that the signal lies in cell i during the time interval from zero to T is the fraction of time the function spends in that cell:

 

p_i(T) = \frac{1}{T} \int_0^T dt\; \theta\big(f(t) - F^-_i\big)\, \theta\big(F^+_i - f(t)\big),

where θ(t) is the step function, defined by

\theta(t - t') = \begin{cases} 1, & t - t' > 0 \\ 1/2, & t - t' = 0 \\ 0, & t - t' < 0. \end{cases}

Now, let the statistical degradation of the system be given by the Shannon entropy [277]

E(T) = -\sum_{i=1}^{N} p_i(T) \log p_i(T),

where p_i(T) is the probability of seeing event i on average. The index i runs over an alphabet of all possible events, from 1 to N, where N is the number of independent cells into which we have chosen to coarse-grain the range of the function f(t). The entropy, as defined, is never negative, since each p_i is a number between 0 and 1.
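The coarse-graining procedure and the entropy sum can be sketched directly. The equal-width cells over an assumed range [lo, hi], and the two synthetic signals, are illustrative choices; the fraction of samples falling in each cell stands in for the time integral p_i(T):

```python
import math

def shannon_entropy(signal, n_cells, lo, hi):
    """Coarse-grain sampled values of f(t) into n_cells equal ranges
    on [lo, hi] and return E = -sum_i p_i * log(p_i)."""
    counts = [0] * n_cells
    width = (hi - lo) / n_cells
    for x in signal:
        i = min(int((x - lo) / width), n_cells - 1)  # clamp the top edge
        counts[i] += 1
    total = len(signal)
    return -sum((c / total) * math.log(c / total)
                for c in counts if c > 0)

# A quiescent signal stays in one cell: entropy near zero.
quiet = [0.45 for _ in range(1000)]
# A signal roaming all cells equally: entropy near its maximum, log N.
busy = [(i % 10) / 10 + 0.05 for i in range(1000)]

print(shannon_entropy(quiet, 10, 0.0, 1.0))   # near 0
print(shannon_entropy(busy, 10, 0.0, 1.0))    # near log(10) ≈ 2.30
```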

Entropy is lowest if the signal spends most of its time in the same cell F_i^±. This means that the system is in a relatively quiescent state, and it is therefore easy to predict the probability that it will remain in that state, based on past behavior. Other conclusions can be drawn from the entropy of a given quantifier. For example, if the quantifier is disk usage, then a state of low entropy, or stable disk usage, implies little activity, which in turn implies low power consumption. This might also be useful knowledge for a network; it is easy to forget that computer systems are reliant on physical constraints. If entropy is high, it means that the system is being used very fully: files are appearing and disappearing rapidly. This makes it difficult to predict what will happen in the future, and the high activity means that the system is consuming a lot of power. The entropy and entropy gradient of sample disk behavior are plotted in figure 13.8.

Another way of thinking about the entropy is that it measures the amount of noise or random activity in the system. If all possibilities occur equally often on average, then the entropy is maximal, i.e. there is no pattern to the data. In that case all of the p_i are equal to 1/N and the maximum entropy is log N. If every message is of the same type, then the entropy is minimal: all the p_i are zero except for one, p_x = 1, and the entropy is zero. This tells us that, if f(t) lies predominantly in one cell, then the entropy will lie in the lower end of the range 0 < E < log N. When the distribution of messages is random, it will be in the higher part of the range.

Entropy can be a useful quantity to plot, in order to gauge the cumulative behavior of a system, within a fixed number of states. It is one of many possibilities



Figure 13.8: Disk usage as a function of time over the course of a week, beginning with Saturday. The lower solid line shows actual disk usage. The middle line shows the calculated entropy of the activity and the top line shows the entropy gradient. Since only relative magnitudes are of interest, the vertical scale has been suppressed. The relatively large spike at the start of the upper line is due mainly to initial transient effects. These even out as the number of measurements increases. From ref. [42].

for explaining the behavior of an open system over time, experimentally. Like all cumulative, approximate quantifiers, it has limited value, however, so it needs to be backed up by a description of system behavior.

13.6.5 Stochastic (random) variables

A stochastic or random variable is a variable whose value depends on the outcome of some underlying random process. The range of values of the variable is not at issue, but which particular value the variable has at a given moment is random. We say that a stochastic variable X will have a certain value x with a probability P (x). Examples are:

Choices made by large numbers of users.

Measurements collected over long periods of time.

Cause and effect are not clearly related.

Certain measurements can often appear random, because we do not know all of the underlying mechanisms. We say that there are hidden variables. If we sample data from independent sources for long enough, they will fall into a stable type of distribution, by virtue of the central limit theorem (see for instance ref. [136]).
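This convergence is easy to see numerically: sums of many flat (uniform) hidden variables fall into an approximately Gaussian distribution, even though each underlying variable is completely unbiased. The sample sizes below are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(0)  # deterministic for illustration

# Each observation is the sum of 48 hidden uniform variables on [0, 1):
# mean 48 * 0.5 = 24, standard deviation sqrt(48 / 12) = 2.
sums = [sum(random.random() for _ in range(48)) for _ in range(10000)]

mean = statistics.fmean(sums)
std = statistics.pstdev(sums)
within = sum(abs(x - mean) <= std for x in sums) / len(sums)

print(f"mean {mean:.2f} (theory 24), std {std:.2f} (theory 2)")
print(f"fraction within one std: {within:.3f} (Gaussian: 0.683)")
```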


13.6.6 Probability distributions and measurement

Whenever we repeat a measurement and obtain different results, a distribution of different answers is formed. The spread of results needs to be interpreted. There are two possible explanations for a range of values:

The quantity being measured does not have a fixed value.

The measurement procedure is imperfect and incurs a range of values due to error or uncertainty.

Often both of these are the case. In order to give any meaning to a measurement, we have to repeat the measurement a number of times and show that we obtain approximately the same answer each time. In any complex system, in which there are many things going on which are beyond our control (read: just about anywhere in the real world), we will never obtain exactly the same answer twice. Instead we will get a variety of different answers which we can plot as a graph: on the x-axis, we plot the actual measured value and on the y-axis we plot the number of times we obtained that measurement divided by a normalizing factor, such as the total number of measurements. By drawing a curve through the points, we obtain an idealized picture which shows the probability of measuring the different values. The normalization factor is usually chosen so that the area under the curve is unity.
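The construction described above can be written out as a short routine: counts per bin are divided by (number of measurements × bin width), so that the area under the curve is unity. The names and bin choices are illustrative:

```python
def normalized_histogram(measurements, n_bins, lo, hi):
    """Estimate a probability density from repeated measurements:
    counts per bin, normalized so the histogram's area is one."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for m in measurements:
        i = min(int((m - lo) / width), n_bins - 1)  # clamp the top edge
        counts[i] += 1
    return [c / (len(measurements) * width) for c in counts]

data = [i / 100 for i in range(100)]          # synthetic measurements
density = normalized_histogram(data, 5, 0.0, 1.0)
print("area =", sum(density) * 0.2)           # area ≈ 1
```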

There are two extremes of distribution: complete certainty (figure 13.9) and complete uncertainty (figure 13.10).

Figure 13.9: The delta distribution represents complete certainty. The distribution has a value of 1 at the measured value.

Figure 13.10: The flat distribution is a horizontal line indicating that all measured values, within the shown interval, occur with equal probability.

Figure 13.11: Most distributions peak at some value, indicating that there is an expected value (expectation value) which is more probable than all the others.

If a measurement always gives precisely the same answer, then we say that there is no error; the curve is then just a sharp spike at the particular measured value. This is never the case with real measurements. If we obtain a different answer each time we measure a quantity, then there is a spread of results. Normally that spread of results will be concentrated around some more or less stable value (figure 13.11). This indicates that the probability of measuring that value is biased, or tends to lead to a particular range of values. The smaller the range of values, the closer we approach figure 13.9. But the converse might also happen: in a completely random system, there might be no fixed value

of the quantity we are measuring. In that case, the measured value is completely uncertain, as in figure 13.10. To summarize, a flat distribution is unbiased, or completely random. A non-flat distribution is biased, or has an expectation value, or probable outcome. In the limit of complete certainty, the distribution becomes a spike, called the delta distribution.

We are interested in determining the shape of the distribution of values on repeated measurement for the following reason. If the variation of the values is symmetrical about some preferred value, i.e. if the distribution peaks close to its mean value, then we can likely infer that the value of the peak or of the mean is the true value of the measurement and that the variation we measured was due to random external influences. If, on the other hand, we find that the distribution is very asymmetrical, some other explanation is required and we are most likely observing some actual physical phenomenon which requires explanation.