
Chapter 13

Analytical system administration

System administration has always involved a high degree of experimentation. Inadequate documentation, combined with a steep learning curve, has made that a necessity. As the curve continues to steepen and the scope of the problem only increases, the belief has gradually deepened that system administration is not merely a mechanic’s job, but a scientific discipline.

A research community has grown up, led by a mixture of academics and working administrators, encouraged by organizations such as USENIX and SAGE, mainly in the US though increasingly in Europe and Australasia. The work has often been dominated by the development of software tools, since tools for the trade have been most desperately required. Now that many good tools exist, at least for Unix-based networks, the focus is changing towards more careful analyses of system administration [41, 42, 108, 44], with case studies and simple experiments.

This chapter provides a brief introduction to a larger field of theoretical system administration [52].

13.1 Science vs technology

Most of the research which is presently undertaken in system administration is of an applied nature. In most cases, it involves the construction of a tool which solves a specific local problem: a one-off solution to a general problem, i.e. a demonstration of possibility. A minority of authors has attempted to collate the lessons learned from these pursuits and distill their essence into a general technology of more permanent value. This is partly the nature of technological research. Science, on the other hand, deals in abstraction. The aim of science is to regard the full horror of reality and condense it into a few themes which capture its essence, without undue complication. We say that scientific knowledge has increased if we are able to perform this extraction of the foundations in some study, and if that knowledge empowers us with an increased understanding of the problem.

In science, knowledge advances by undertaking a series of studies, in order to either verify or falsify a hypothesis. Sometimes these studies are theoretical, sometimes they are empirical, and frequently they are a mixture of the two. The aim of a study is to contribute to a larger discussion, which will eventually lead to progress in the field. A single piece of work is rarely, if ever, an end in itself. Once a piece of work is published, it needs to be verified or shown to be false by others as well. Reproducibility is an important criterion for any result; without it, a result is worthless.

How we measure progress in a field is often a contentious issue, but it can involve several themes. In order to test an idea it is often necessary to develop a suitable ‘technology’ for the investigation. That technology might be mathematical, computational or mechanical. It does not relate directly to the study itself, but it makes it possible for the study to take place. In system administration, software tools form this technology. For example, the author’s management system cfengine [41] is a tool which was created in order to implement and refine a conceptual scheme, namely the immunity model of system maintenance [44]. There is a distinction between the tool which makes the idea possible, and the idea itself.

Having produced the tool, it is still necessary to test whether or not the original idea was a good one, better or worse than other ideas, or simply unworkable in practice. Scientific progress is made with the assistance of the tool only if the results of previous work can be improved upon, or if an increased understanding of the problem can be achieved, perhaps leading to greater predictive power or a more efficient solution to the original problem.

All problems are pieces of a larger puzzle. A complete scientific study begins with a motivation, followed by an appraisal of the problems, the construction of a theoretical model for understanding or solving the problems, and finally an evaluation or verification of the approach used and the results obtained. Recently, much discussion has been directed towards finding suitable methods for evaluating technological innovations in computer science, and towards encouraging researchers to use them. Nowadays many computing systems are of comparable complexity to phenomena found in the natural world, and our understanding of them is not always complete, in spite of the fact that they were designed to fulfill a specific task. In short, technology might not be completely predictable, hence the need for experimental verification.

13.2 Studying complex systems

There are many issues to be studied in system administration. Some issues are of a technical nature, while others are of a human nature. System administration confronts the human–machine interaction as few other branches of computer science do. Here are some examples:

• Reliability studies (e.g. failure rate of hardware/software, evaluation of policies and strategies); a rough failure-rate calculation is sketched after this list

• Determining and evaluating methods for ensuring system integrity (e.g. automation, cooperation between humans, formalization of policy etc.)

• Observations which reveal aspects of system behavior that are difficult to predict (e.g. strange phenomena, periodic cycles)

• Issues of strategy and planning.
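As a rough illustration of the first item above, a failure rate or mean time between failures (MTBF) can be estimated from logged failure times. The sketch below uses invented timestamps and the simplest possible estimator (observed time divided by the number of failures); it is not a recipe taken from any particular monitoring system.

# Minimal sketch: estimate a crude failure rate and MTBF for one host
# from a list of failure timestamps (invented data, not from any real tool).
from datetime import datetime

failures = [
    datetime(2003, 1, 14, 3, 12),
    datetime(2003, 3, 2, 17, 45),
    datetime(2003, 6, 21, 9, 30),
    datetime(2003, 11, 5, 22, 5),
]
window_start = datetime(2003, 1, 1)
window_end = datetime(2004, 1, 1)

observed_hours = (window_end - window_start).total_seconds() / 3600.0
n = len(failures)

# Simplest estimators, assuming independent failures and a representative window.
failure_rate = n / observed_hours
mtbf_hours = observed_hours / n if n else float("inf")

print(f"{n} failures in {observed_hours:.0f} h")
print(f"failure rate ~ {failure_rate:.2e} per hour, MTBF ~ {mtbf_hours:.0f} h")

Such numbers are only as good as the assumption that the observation window is representative, but they can at least be compared across machines, or before and after a change of policy.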

Science proceeds as a dialogue between theory and experiment. We need theory to interpret results of observations and we need observations to back up theory. Any conclusions must be a consistent mixture of the two.

To date, very little theory has been applied to the problems of system administration. Most studies have been empirical, or anecdotal. Very few of the studies cited in the references of this book attempt to quantify their findings. In a subject as complex as system administration, it is easy to fall back on qualitative claims. This is dangerous, however, since one is more easily fooled by qualitative descriptions than by hard numbers. At the same time, one must not believe that it is sensible to demand hard-nosed falsification of claims (à la Karl Popper) in such a complex environment. Any numbers which we can measure must be considered valuable, provided they actually have a sensible interpretation.

Computers are complex systems. Complexity in a system means that there is a large number of variables to be considered, probably too many to deal with in detail. Many issues are hidden from direct view and have to be discovered with some ingenuity.

A liberal attitude is usually the most constructive in making the best of a difficult lot. Any study will be worthwhile if it has something to tell us, however little. However, it is preferable if studies are authoritative, i.e. if they are able to tell us something of deeper value than mere hearsay. Still, we have to judge studies for what they are worth, and no more. Authors should try to avoid the marketing language which is prevalent in the commercial world, and also pointless tool-building without regard for any well thought-out model. The following questions are useful:

• What am I trying to study?

• Has it been done before? Can it be improved?

• What are the criteria for improvement?

• Can I formulate my study as a hypothesis which can be verified or falsified to some degree?

• If not, how can I clearly state the aims of my work? What are the available methods for gauging success/failure?

• How general is my study? What is the scope of its validity?

• How can my study be generalized?

• How can I ensure objectivity?

Then afterwards check:

• Is my result unambiguously true or merely a matter of interpretation?


• Are there alternative viewpoints which lead to the same conclusion?

• Is the result worth reporting to others?

Case studies are often used in fields of research where metrics are few and far between. Case studies, or anecdotal evidence, are a poor man’s approach to the truth, but in system administration we suffer from a general poverty of available avenues for investigation. Case studies, made as objectively as possible, are often the best one can do.

13.3 The purpose of observation

In technology the act of observation has two objective goals: i) to gather information about a problem in order to motivate the design and construction of a technology which solves it, and ii) to determine whether or not the resulting technology fulfills its design goals. If the latter is not fulfilled in a technological context, the system may be described as faulty, whereas in natural science there is no right or wrong. In between these two empirical bookmarks lies a theoretical model which hopefully connects the two.

The problem with technological disciplines is that what constitutes an evaluation of success or failure is often far from clear. This is because both goals and assisting technologies can be dominated by vested interests and dogged by the difficulty of constructing objective experiments with clear metrics. System administration is an example where these problems are particularly acute.

System administration is a mixture of technology and sociology. The users of computer systems are constantly changing the conditions for observations. If the conditions under which observations are made are not constant, then the data lose their meaning: the message we are trying to extract from the data is supplemented by several other messages which are difficult to separate from one another. Let us call the message we are trying to extract the ‘signal’, and the other messages, which we are not interested in, the ‘noise’. Complex systems are often characterized by very noisy environments.

In most disciplines one would attempt to reduce or eliminate the noise in order to isolate the signal. However, in system administration, it would be no good to eliminate the users from an experiment, since it is they who cause most of the problems that one is trying to solve. In principle this kind of noise in data could be eliminated by statistical sampling over very long periods of time, but in the case of real computer systems this might not be possible since seasonal variations in patterns of use often lead to several qualitatively different types of behavior which should not be mixed. The collection of reliable data might therefore take many years, even if one can agree on what constitutes a reasonable experiment. This is often impractical, given the pace of technological change in the field.
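To make the distinction concrete, the following sketch separates a crude weekly ‘signal’ from residual ‘noise’ in an hourly measurement series; the synthetic data stand in for something like hourly counts of logged-in users, and the hour-of-week average is just one assumed way of estimating the periodic part, not a method prescribed here.

# Minimal sketch: split an hourly series into a periodic weekly "signal"
# (the average pattern over the week) and a residual "noise" component.
import math
import random

HOURS_PER_WEEK = 24 * 7
random.seed(1)

# Eight weeks of synthetic hourly samples: a daily cycle plus random noise.
samples = [
    10 + 5 * math.sin(2 * math.pi * (h % 24) / 24) + random.gauss(0, 2)
    for h in range(8 * HOURS_PER_WEEK)
]

# "Signal": the average value for each hour of the week across all weeks.
totals = [0.0] * HOURS_PER_WEEK
counts = [0] * HOURS_PER_WEEK
for h, value in enumerate(samples):
    slot = h % HOURS_PER_WEEK
    totals[slot] += value
    counts[slot] += 1
weekly_pattern = [t / c for t, c in zip(totals, counts)]

# "Noise": whatever remains after subtracting the periodic signal.
residuals = [v - weekly_pattern[h % HOURS_PER_WEEK] for h, v in enumerate(samples)]
spread = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(f"residual standard deviation ~ {spread:.2f}")

Averaging over many weeks is precisely the long-term statistical sampling mentioned above, and the same caveat applies: weeks from qualitatively different seasons of use should not be pooled into a single pattern.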

13.4 Evaluation methods and problems

The simplest and potentially most objective way to test a model of system administration is to combine heuristic experience with repeatable simulations.

Experienced system administrators have a feel for the pulse of their system and can evaluate its performance in a way that only humans can. Their knowledge can be used to define repeatable benchmarks or criteria for different aspects of the problem. But even this approach is not without its difficulties. Many of the administrators’ impressions would be very difficult to gauge numerically. For example, a common theme is research designed to relieve administrators of tedious work, leaving them free to work on more important tasks. Can such a claim be verified? Here are some of the difficulties:

• Measure the time spent working on the system.

• Record the actions taken by the automatic system, which a human administrator would have been required to do by hand, and compare.

• The administrator has so much to do that he/she can work full time no matter how much one automates ‘tedious tasks’.

• There is no unique way to solve a problem. Some administrators fix problems by hand, while others will write a script for each new problem. The time/approach taken depends on the person.

In this case the issue was too broad to quantify. Choosing the appropriate question to ask is often the most difficult aspect of an experimental study. If we restrict the scope of the question to a very specific point, we can end up with an artificial study; if the question is too broad in its scope, we risk not being able to test it convincingly.
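Restricting the scope in this way does at least make parts of the question countable. For instance, the actions taken by an automatic system (the second point above) can be tallied from its logs. The sketch below assumes a hypothetical one-line-per-action log format and invented per-action costs in minutes; neither comes from any real tool.

# Minimal sketch: tally corrective actions from a hypothetical automation log
# and estimate the manual effort they would have replaced.
# Assumed log format: "<ISO timestamp> <action>", one action per line.
from collections import Counter

log_lines = [
    "2003-05-01T02:00 restarted_httpd",
    "2003-05-01T02:00 tidied_tmp",
    "2003-05-02T02:00 tidied_tmp",
    "2003-05-02T02:00 fixed_permissions",
]

# Assumed average minutes a human would have spent on each kind of action.
manual_minutes = {"restarted_httpd": 10, "tidied_tmp": 5, "fixed_permissions": 15}

actions = Counter(line.split()[1] for line in log_lines)
saved = sum(manual_minutes.get(a, 0) * n for a, n in actions.items())

for action, n in actions.items():
    print(f"{action}: {n} times")
print(f"estimated manual effort replaced: {saved} minutes")

Such counts say nothing about whether the administrator’s total workload actually fell (the third point above), but they are reproducible and can be compared between sites or over time.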

To further clarify this point it is useful to refer to an analogy. Imagine two researchers who create vehicles for the future, one based on renewable solar power and another based on coal. The two vehicles have identical functionality; the solar powered vehicle seems cleaner than the coal powered one, but in fact the level of pollution required to make the solar cells equals the harmful output of the coal vehicle throughout its lifetime. The laws of thermodynamics tell us that there is potential for improving this situation for the electric car but probably not for the coal powered one. The solar vehicle is lighter and more efficient, but it cannot do anything that the coal powered car cannot. All in all, one suspects that the solar powered system is a better solution, since one does not have to refuel it frequently and it is based on a technology which is universally useful, whereas the coal system is quite restricted. So what are the numbers which we should measure to distinguish one from the other, to verify the hypothesis that the solar powered vehicle is better? Is one solution really better than the other? Regardless of whether either solution is optimal, is one of them going in a sustainable direction for future development? It might seem clear that the electric vehicle is a sounder technology since it is both sustainable in its power source and in its potential for future development, whereas the coal vehicle is something of a dead end. The solution can be ideologically correct, but this is a matter of opinion. Anyone can claim to prefer the coal powered vehicle, whether others would deem that belief to be rational or not. One can attempt to evaluate their basic principles on the basis of anecdotal evidence. One can produce numbers for many small contributing factors (such as the weight of the