
- Contents
- Preface to second edition
- 1 Introduction
- 1.2 Applying technology in an environment
- 1.3 The human role in systems
- 1.4 Ethical issues
- 1.7 Common practice and good practice
- 1.8 Bugs and emergent phenomena
- 1.10 Knowledge is a jigsaw puzzle
- 1.11 To the student
- 1.12 Some road-maps
- 2 System components
- 2.2 Handling hardware
- 2.3 Operating systems
- 2.4 Filesystems
- 2.5 Processes and job control
- 2.6 Networks
- 2.7 IPv4 networks
- 2.8 Address space in IPv4
- 2.9 IPv6 networks
- 3 Networked communities
- 3.1 Communities and enterprises
- 3.2 Policy blueprints
- 3.4 User behavior: socio-anthropology
- 3.5 Clients, servers and delegation
- 3.6 Host identities and name services
- 3.8 Local network orientation and analysis
- 4 Host management
- 4.1 Global view, local action
- 4.2 Physical considerations of server room
- 4.3 Computer startup and shutdown
- 4.5 Installing a Unix disk
- 4.6 Installation of the operating system
- 4.7 Software installation
- 4.8 Kernel customization
- 5 User management
- 5.1 Issues
- 5.2 User registration
- 5.3 Account policy
- 5.4 Login environment
- 5.5 User support services
- 5.6 Controlling user resources
- 5.7 Online user services
- 5.9 Ethical conduct of administrators and users
- 5.10 Computer usage policy
- 6 Models of network and system administration
- 6.5 Creating infrastructure
- 6.7 Competition, immunity and convergence
- 6.8 Policy and configuration automation
- 7.2 Methods: controlling causes and symptoms
- 7.4 Declarative languages
- 7.6 Common assumptions: clock synchronization
- 7.7 Human–computer job scheduling
- 7.9 Preventative host maintenance
- 7.10 SNMP tools
- 7.11 Cfengine
- 8 Diagnostics, fault and change management
- 8.1 Fault tolerance and propagation
- 8.2 Networks and small worlds
- 8.3 Causality and dependency
- 8.4 Defining the system
- 8.5 Faults
- 8.6 Cause trees
- 8.7 Probabilistic fault trees
- 8.9 Game-theoretical strategy selection
- 8.10 Monitoring
- 8.12 Principles of quality assurance
- 9 Application-level services
- 9.1 Application-level services
- 9.2 Proxies and agents
- 9.3 Installing a new service
- 9.4 Summoning daemons
- 9.5 Setting up the DNS nameservice
- 9.7 E-mail configuration
- 9.8 OpenLDAP directory service
- 9.10 Samba
- 9.11 The printer service
- 9.12 Java web and enterprise services
- 10 Network-level services
- 10.1 The Internet
- 10.2 A recap of networking concepts
- 10.3 Getting traffic to its destination
- 10.4 Alternative network transport technologies
- 10.5 Alternative network connection technologies
- 10.6 IP routing and forwarding
- 10.7 Multi-Protocol Label Switching (MPLS)
- 10.8 Quality of Service
- 10.9 Competition or cooperation for service?
- 10.10 Service Level Agreements
- 11 Principles of security
- 11.1 Four independent issues
- 11.2 Physical security
- 11.3 Trust relationships
- 11.7 Preventing and minimizing failure modes
- 12 Security implementation
- 12.2 The recovery plan
- 12.3 Data integrity and protection
- 12.5 Analyzing network security
- 12.6 VPNs: secure shell and FreeS/WAN
- 12.7 Role-based security and capabilities
- 12.8 WWW security
- 12.9 IPSec – secure IP
- 12.10 Ordered access control and policy conflicts
- 12.11 IP filtering for firewalls
- 12.12 Firewalls
- 12.13 Intrusion detection and forensics
- 13 Analytical system administration
- 13.1 Science vs technology
- 13.2 Studying complex systems
- 13.3 The purpose of observation
- 13.5 Evaluating a hierarchical system
- 13.6 Deterministic and stochastic behavior
- 13.7 Observational errors
- 13.8 Strategic analyses
- 13.9 Summary
- 14 Summary and outlook
- 14.3 Pervasive computing
- B.1 Make
- B.2 Perl
- Bibliography
- Index

CHAPTER 13. ANALYTICAL SYSTEM ADMINISTRATION

[Figure: a signal f(t) plotted against time, alongside its Fourier transform plotted against frequency.]
Figure 13.16: Fourier analysis is like a prism, showing us the separate frequencies of which a signal is composed. The sharp peaks in this figure illustrate how we can identify periodic behavior which might otherwise be difficult to identify. The two peaks show that the input source conceals two periodic signals.
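The effect described in the caption is easy to reproduce. The sketch below (a minimal illustration, assuming Python with numpy; the sampling rate, the two frequencies and the noise level are arbitrary choices, not values from the figure) hides two periodic signals in noise and then recovers their frequencies from the peaks of the spectrum:

```python
import numpy as np

# Two periodic signals (5 Hz and 12 Hz, chosen arbitrarily) buried in noise,
# sampled at 100 Hz for 10 seconds.
rate = 100
t = np.arange(0, 10, 1.0 / rate)
rng = np.random.default_rng(0)
signal = (np.sin(2 * np.pi * 5 * t)
          + 0.5 * np.sin(2 * np.pi * 12 * t)
          + 0.2 * rng.standard_normal(t.size))

# The Fourier transform acts as the 'prism': sharp peaks in the magnitude
# spectrum mark the hidden periodic components.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(t.size, d=1.0 / rate)

# The two largest peaks sit at the two hidden frequencies.
peaks = sorted(freqs[np.argsort(spectrum)[-2:]])
print(peaks)
```

In the time domain the 12 Hz component is invisible to the eye; in the frequency domain both components stand out as the two dominant peaks.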
13.8 Strategic analyses
The use of formal mathematics to analyze system administration has so far been absent from the discussion. There are two reasons why such analyses are of interest: i) a formal description of a subject often reveals assumptions and limitations that were invisible before a systematic model existed, and ii) optimal solutions to problems can be explored, avoiding unnecessary prejudice.
The languages of Game Theory [47] and Dynamical Systems [46] will enable us to formulate and model assertions about the behavior of systems under certain administrative strategies. At some level, the development of a computer system is a problem in economics: it is a mixed game of opposition and cooperation between users and the system. The aims of the game are several: to win resources, to produce work, to gain control of the system, and so on. A proper understanding of the issues should lead to better software and better strategies from human administrators. For instance, is greed a good strategy for a user? How could one optimally counter such a strategy? In some cases it might even be possible to solve system administration games, determining the maximum possible ‘win’ available in the conflict between users and administrators. These topics are somewhat beyond the scope of this book.
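The flavor of such a game can be sketched in a few lines. The payoff numbers below are invented purely for illustration (they are not measurements, and the strategy names are hypothetical), but they show how one can ask whether greed is a good strategy for a user once the administrator's countermeasures are in play:

```python
# Hypothetical payoffs (utility to user, utility to admin) for each
# strategy pair; all numbers are illustrative assumptions.
payoffs = {
    ("greedy", "lenient"): (3, -2),  # user grabs resources unchecked
    ("greedy", "quota"):   (-1, 1),  # greed is penalized by quotas
    ("fair",   "lenient"): (2, 1),   # cooperation, no enforcement needed
    ("fair",   "quota"):   (1, 2),   # cooperation under a protective quota
}
users = ["greedy", "fair"]
admins = ["lenient", "quota"]

def nash_equilibria(payoffs):
    """Pure-strategy profiles where neither side gains by deviating alone."""
    eq = []
    for u in users:
        for a in admins:
            uu, ua = payoffs[(u, a)]
            best_u = all(payoffs[(u2, a)][0] <= uu for u2 in users)
            best_a = all(payoffs[(u, a2)][1] <= ua for a2 in admins)
            if best_u and best_a:
                eq.append((u, a))
    return eq

print(nash_equilibria(payoffs))
```

With these particular numbers the only equilibrium is (fair, quota): once quotas are enforced, greed no longer pays, and fairness becomes the user's rational choice. Changing the payoffs changes the answer, which is precisely why the choice of model matters.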
13.9 Summary
Finding a rigorous experimental and theoretical basis for system administration is not an easy task. It involves many entwined issues, both technological and sociological. A systematic discussion of theoretical ideas may be found in ref. [52]. The sociological factors in system administration cannot be ignored, since the goal of system administration is, amongst other things, user satisfaction. In this respect one is forced to pay attention to heuristic evidence, as rigorous statistical analysis of a specific effect is not always practical or adequately separable from whatever else is going on in the system. The study of computers is a study of complexity.
Exercises
Self-test objectives
1. What is meant by a scientific approach to system administration?
2. What does complexity really mean?
3. Explain the role of observation in making judgments about systems.
4. How can one formulate criteria for the evaluation of system policies?
5. How is reliability defined?
6. What principles contribute to increased reliability?
7. Describe heuristically how you would expect key variables, such as numbers of processes and network transactions, to vary over time. Comment on what this means for the detection of anomalies in these variables.
8. What is a stochastic system? Explain why human–computer systems are stochastic.
9. What is meant by convergence in the context of system administration?
10. What is meant by regulation?
11. Explain how errors of measurement can occur in a computer.
12. Explain how errors of measurement should be dealt with.
Problems
1. Consider the following data, which represent a measurement of CPU usage for a process over time:

   2.1, 2.0, 2.1, 2.2, 2.2, 1.9, 2.2, 2.2, 2.1, 2.2, 2.2

   Now answer the following:
(a) To the eye, what appears to be the correct value for the measurement?
(b) Is there a correct value for the measurement?
(c) What is the mean value?
(d) What is the standard deviation?
(e) If you were to quote these data as one value, how would you quote the result of the measurement?
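The arithmetic in parts (c) and (d) can be cross-checked with Python's standard statistics module (a sketch for checking answers, not a substitute for working through the definitions by hand):

```python
import statistics

# The eleven CPU-usage measurements from the problem above.
data = [2.1, 2.0, 2.1, 2.2, 2.2, 1.9, 2.2, 2.2, 2.1, 2.2, 2.2]

mean = statistics.mean(data)
stdev = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)

print(f"{mean:.3f} +/- {stdev:.3f}")
```

A common convention is to quote the result as mean ± standard deviation; whether the sample or population form (statistics.pstdev) is appropriate is part of thinking through question (e).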
2. What is meant by random errors? Explain why computers are not immune to random errors.
3. Explain what is meant by Mean Time Before Failure (MTBF). How is this quantity measured? Can sufficient measurements be made to make its value credible?
4. If a piece of software has an MTBF of two hours and an average downtime of 15 seconds, does it matter that it is unstable?
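As a starting point for this problem, the two figures can be translated into a steady-state availability using the standard formula A = MTBF / (MTBF + MTTR); the sketch below just performs that arithmetic:

```python
# Steady-state availability from mean time between failures (MTBF)
# and mean time to repair/restart (MTTR):  A = MTBF / (MTBF + MTTR)
mtbf = 2 * 3600   # two hours between failures, in seconds
mttr = 15         # fifteen seconds of downtime per failure

availability = mtbf / (mtbf + mttr)
downtime_per_day = 24 * 3600 * (1 - availability)

print(f"availability = {availability:.4%}")
print(f"expected downtime = {downtime_per_day:.0f} s per day")
```

This works out to roughly three minutes of accumulated downtime per day; whether that matters depends on what the service is for, which is the real point of the question.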
5. Explain why one would expect measurements of local SMTP traffic to show a strong daily rhythm, while measurements of incoming traffic would not necessarily have such a pronounced daily rhythm.
6. Discuss whether one would expect to see a daily rhythm in WWW traffic. If such a rhythm were found, what would it tell us about the source of the traffic?
7. Describe a procedure for determining causality in a computer network. Explain any assumptions and limitations which are relevant to this.
8. Explain why problems with quite different causes often lead to the same symptoms.