10.7. MULTI-PROTOCOL LABEL SWITCHING (MPLS)    407
of IS-IS is that it was developed to politicize adherence to the OSI routing model, while being somewhat removed from the real needs and wishes of users. IS-IS was early in having support for IPv6.
BGP
The Border Gateway Protocol (BGP) is an Exterior Routing Protocol, designed to route top-level traffic between Autonomous Systems (sometimes called Routing Domains). BGP is neither a Distance Vector nor a Link State protocol in the normal sense. Instead it may be called a Path Vector Protocol, since it stores not only hop metrics but entire pathways through Autonomous System maps. In a sense, it automatically performs source routing. This is to account for policy decisions: who says that just anyone should be able to send traffic over just any Autonomous System? BGP tries to find the best route, only after finding an ‘authorized route’.
BGP’s support for Classless InterDomain Routing (CIDR) made it possible to rescue IPv4 from an early demise during the 1990s. Because top-level routers need to know paths to all networks, the table of network numbers must be stored on each inter-domain router. Storing and parsing this table places great demands on these backbone routers.
BGP works over TCP, which makes it predictable, but this has also led to routing problems associated with traffic congestion.
Principle 53 (Routing policy). At the level of Autonomous Systems, policy (access controls) plays the major role in determining routes; efficiency is of secondary importance. Lower down within each AS, routes are calculated based on availability and distance metrics.
In a real sense, BGP is not a routing protocol at all, but a directory service, telling top-level routers in which general direction they must send packets in order to get closer to their final destination; i.e. it is a database of hints. A BGP route cannot be guaranteed to be true. The assumptions on which it is built are that the underlying transport routing will be correctly performed by something like OSPF or IS-IS, and that no policies will change as packets are following their suggested routes. BGP tells a packet: I cannot send you to your destination, but if you go to this Autonomous System, they should be able to help you.
Note, however, that Autonomous Systems are literally autonomous: they can decide not to cooperate with their neighbors, at their own option. BGP is literally peer-to-peer cooperation. The consistency of global routing mechanisms depends entirely on trusting neighbors to play their part and keep responsible policy practices. A simple misconfiguration of BGP could lead to widespread routing confusion.
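The path-vector idea can be caricatured in a few lines of code. This is only an illustrative sketch, not the real BGP decision process (which weighs many more attributes); the AS numbers and the policy predicate are invented:

```python
# Minimal path-vector sketch (illustration only, not real BGP).
# Each AS advertises routes as (prefix, AS-path); a receiving AS
# applies policy *before* comparing path lengths, and rejects any
# path already containing its own AS number (loop prevention).

def best_route(my_as, advertisements, allowed):
    """Pick a route from [(prefix, as_path), ...] advertisements.

    'allowed' is a policy predicate over AS paths: authorization
    comes first, shortest authorized path second.
    """
    candidates = []
    for prefix, as_path in advertisements:
        if my_as in as_path:          # loop detected: discard
            continue
        if not allowed(as_path):      # policy filter: discard
            continue
        candidates.append((len(as_path), as_path, prefix))
    if not candidates:
        return None
    _, path, prefix = min(candidates)
    return prefix, [my_as] + path     # prepend ourselves when re-advertising

# Hypothetical AS numbers: we are AS 64500 and refuse to transit AS 64666.
ads = [
    ("192.0.2.0/24", [64501, 64666, 64510]),         # short but unauthorized
    ("192.0.2.0/24", [64502, 64503, 64504, 64510]),  # longer but allowed
]
policy = lambda path: 64666 not in path
print(best_route(64500, ads, policy))
# chooses the longer, policy-compliant path
```

Note how the shorter path loses: an 'authorized route' is found first, and only then is efficiency considered, exactly as the principle above states.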
10.7 Multi-Protocol Label Switching (MPLS)
The argument over IP or ATM has condensed down to an effort to combine the best of both worlds. Multi-Protocol Label Switching (MPLS) is a hybrid layer 2–3
technology that uses IP to guide switched technologies of all kinds. It is about the separation of control and forwarding mechanisms.
Layer 2 switches provide high-speed connectivity, while the IP routers at the edge – interconnected by a mesh of layer 2 virtual circuits – provide the intelligence to forward IP datagrams. The difficulty with this approach lies in the complexity of mapping between two distinct architectures that require the definition and maintenance of separate topologies, address spaces, routing protocols, signaling protocols and resource allocation schemes. The emergence of the multilayer switching solutions and MPLS is part of the evolution of the Internet to decrease complexity by combining layer 2 switching and layer 3 routing into a fully integrated solution.
Another goal of MPLS is to integrate Quality of Service (QoS) functionality into IP. ATM has QoS functionality, but IP has no true support for this. Today the best one can do is to simulate long-term average Quality of Service (see section 10.8).
The forwarding component of virtually all multilayer switching solutions and MPLS is based on a label-swapping forwarding algorithm. This is the same algorithm used to forward data in ATM and Frame Relay switches; it is based on signaling and label distribution. A label is a short, fixed-length value carried in the packet’s header to identify a Forwarding Equivalence Class (FEC). A label is analogous to a Virtual Circuit Identifier, but an FEC also distinguishes between differentiated services, analogous to IP service port numbers.
An FEC is a set of packets that is forwarded over the same path through a network even if their ultimate destinations are different. For example, in conventional longest-match IP routing, the set of unicast packets whose destination addresses map to a given IP address prefix is an example of an FEC.
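The division of labor between edge classification and core label swapping can be sketched as follows. The FEC table, labels and port names here are invented for illustration; real MPLS distributes labels via signaling protocols such as LDP:

```python
# Sketch of label-swap forwarding (invented tables, for illustration).
# An ingress router classifies a packet into an FEC once; core
# switches then forward on the short fixed-length label alone,
# swapping it at each hop, without re-examining the IP header.

from ipaddress import ip_address, ip_network

# Hypothetical FEC table: (destination prefix, service class) -> label.
# The service class makes an FEC finer-grained than a plain route.
FEC_TABLE = {
    (ip_network("192.0.2.0/24"), "voice"): 101,
    (ip_network("192.0.2.0/24"), "data"):  102,
}

# Hypothetical per-switch label map: in_label -> (out_label, out_port)
SWITCH_A = {101: (201, "port1"), 102: (202, "port2")}

def ingress_classify(dst, service):
    """Longest-prefix match plus service class selects the FEC label."""
    matches = [(net.prefixlen, label)
               for (net, svc), label in FEC_TABLE.items()
               if svc == service and ip_address(dst) in net]
    return max(matches)[1] if matches else None

def core_forward(label, label_map):
    """Core switch: pure label swap, no IP lookup."""
    return label_map[label]

label = ingress_classify("192.0.2.7", "voice")
print(core_forward(label, SWITCH_A))
```

The expensive longest-match lookup happens once, at the edge; every subsequent hop is a constant-time table lookup on the label, which is what makes the label-swapping forwarding algorithm fast.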
10.8 Quality of Service
The commercialization of the network has seen the arrival of Internet Service Providers (ISP), Application Service Providers (ASP), Web Hotels, outsourcing and hosting companies. The desire to sell these services to other organizations and companies has placed deliverability at center stage. Customers will not pay for a service that they cannot be certain will be delivered; there are many ways in which one might choose to deal with this challenge:
• A cheaper price for living with uncertainty – but this might not be acceptable.
• A planned over-capacity to guarantee a level of service – but this might be considered wasteful.
• Precision technology that can deliver and regulate exactly what is needed – but this requires investment in infrastructure.
Sorting out the details of a solution to these issues is the job of a Service Level Agreement (SLA) between the service provider and the service client (see section 10.10). Clearly, technology is at the heart of this; one cannot promise what cannot be delivered.
There are many levels at which one can discuss ‘service’. In the most general case, one can simply talk about quality assurance of an entire business process, but the expression Quality of Service in networking terms generally refers to delivery rate assurance and value for money. Traditionally one has referred to networked services as the application-level services of the previous chapter, because the network was not for sale – it simply existed as a non-commercial service to academia and the military. Today, the connectivity itself is being sold and it must be included in the list of services that companies want to buy or charge for. Service providers are thus interested in being able to sell connectivity with bandwidth guarantees. Different kinds of applications require different levels of lower level service guarantees. For instance, voice and video traffic are time critical and data intensive, whereas Web traffic and E-mail are not. All quality of service guarantees rely on the basic transport guarantee; thus Quality of Service must be defined bottom up in terms of the OSI-layers.
Today, some are discussing QoS (Quality of Service), QoD (Quality of Devices), QoE (Quality of Experience), QoB (Quality of Business), and any number of variations of the issue of service provision. Each of these is trying to capture the essence of a usable measure that can be sold like ‘kilos of sugar’ to customers, and be used to gauge a provider’s own performance. So how does one define Quality of Service more rigorously? It has been suggested that it must be a function of ‘Quality of Devices’.
QoS = f (QoD).    (10.1)
This is clearly sensible, from the laws of causality; the higher OSI levels can make demands for service but if the lower levels are not willing, they will not be successful. What kind of function should this be? Is it linear, non-linear, stochastic, chaotic? Some principles for Quality of Service for management are discussed in ref. [255]. This discussion is based on ref. [51].
The basic Internet protocol family (TCP/IP) has no provision for securing quality of service guarantees at the protocol level; a limited form of quality of service can be simulated at the router level by prioritizing packet delivery. To do this, a router must open up packets and look at the higher OSI layers 4–7. This incurs an additional cost that adds to the delivery time:
Total = Latency + Transport Round-Time + Query-Response Processing
Other transport agents, like Frame Relay, ATM and MPLS, on the other hand, are designed with Quality of Service in mind. These, however, are rarely deployed end to end for the normal user, so some compromise is required.
When one speaks of Quality of Service in networking, one really means Quality of Service Rate, i.e. timing guarantees, but this poses an important question: over what time scale can quality assurance be provided? Data rates are never constant; at some level, one begins to see the graininess of traffic and variations in the amount of data delivered per second. Only the average rate, over several seconds, minutes or hours, can be regulated and thus predicted. The issue of quality of service is therefore a question of the time scale over which guarantees are required.
Principle 54 (Rate guarantees). The maximum rate at which any service can be provided is limited by the weakest link in the chain of communication (see principle 49). The variability of the rate is limited by random errors at the shortest time scale Δt of any component in the chain. In a steady state service, the average rate over a time T ≫ Δt is

⟨R⟩ = (1/T) ∫₀ᵀ R(t) dt,    (10.2)

where dt = Δt in practice. If one assumes that the average transmission rate is constant, up to a variable amount of random noise, one may write

R(t) = ⟨R⟩ + δR(t),    (10.3)

where δR(t) is a random source with time granularity Δt; then the variability over time T is

ΔR(T) = √[ (1/T) ∫₀ᵀ (R(t) − ⟨R⟩)² dt ].    (10.4)

If the system is in a steady state (not a state of growth or decay), then this tends towards a small constant value (approximately zero) as T ≫ Δt.

Thus, the longer the time scale over which a rate guarantee is defined, the less control is needed through technology, and the smaller the overhead to manage it; i.e. Δt/T → 0 is the goal of devices (small packet size or granularity).
Example 9. A service guarantee of 10MB per day is entirely believable for UUCP (a modem, dial-up protocol), but a rate of 10MB per minute is not. A rate of 350 kilobits per second is reasonable for a Digital Subscriber Line (DSL) service provider, but a rate of 3 kilobits per hundredth of a second is not.
This agrees with intuition. Indeed, ATM uses small fixed cell sizes; MPLS allows variable packet sizes, with a fine granularity.
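A small simulation, using purely synthetic Gaussian traffic, illustrates the point of Principle 54 numerically: the spread of measured average rates shrinks as the averaging window T grows relative to the granularity Δt:

```python
# Numerical illustration of the averaging principle: the measured
# mean rate over a window T becomes steadier as T grows relative to
# the granularity dt. Synthetic traffic only; the numbers are invented.

import random

random.seed(1)
dt = 0.01                                   # granularity, seconds
true_rate = 350.0                           # kbps, nominal
samples = [true_rate + random.gauss(0, 100) for _ in range(100000)]

def window_averages(samples, n):
    """Mean rate in consecutive windows of n samples (T = n * dt)."""
    return [sum(samples[i:i + n]) / n
            for i in range(0, len(samples) - n + 1, n)]

def spread(values):
    """Maximum deviation of the window means from the overall mean."""
    m = sum(values) / len(values)
    return max(abs(v - m) for v in values)

for n in (10, 100, 10000):                  # T = 0.1 s, 1 s, 100 s
    s = spread(window_averages(samples, n))
    print(f"T = {n * dt:7.1f} s  spread = {s:.2f} kbps")
# The spread shrinks as T/dt grows: short-window rates fluctuate
# wildly, while long-window averages are predictable.
```

This is exactly why a per-day guarantee is "entirely believable" while the same volume promised per minute is not.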
It is important to note the difference between determinism (i.e. explicit control over information) and predictability. Quality of Service demands predictability over a given time scale, but this does not imply deterministic behavior along the way.
Principle 55 (Predictability vs determinism). Predictability always has limits or tolerances. Predictability does not imply precise determinism, but determinism provides predictability. Predictability requires only a deterministic average behavior.
Example 10. The Ethernet is a non-deterministic protocol but, within a reasonable margin for error, its average data rate on a time scale of minutes is fairly constant and can be calculated. Thus the Ethernet can be predictable without requiring deterministic control.
Diffserv is a way of defining routing policy for differentiated services (see RFC 2475), i.e. of setting priorities on routed packets. Since the router is performing
packet forwarding regulation, the average packet size along the journey is the limit of granularity for Quality Assurance. RSVP is a QoS signaling service, proposed by the Internet Engineering Task Force (IETF), that is used by hosts to signal resource requests. These must be negotiated between the hosts and the routers along the path between the end points.
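Diffserv marking is visible to applications through the socket API. The sketch below, assuming a Linux-like platform where the `IP_TOS` socket option is available, marks outgoing UDP datagrams with the Expedited Forwarding code point; whether any router honors the marking is a matter of policy, not protocol:

```python
# Diffserv marking sketch: an application requests a per-hop
# behavior by setting the DSCP bits in the IP header's TOS /
# Traffic Class field. Routers are free to ignore the marking.
# (IP_TOS is available on Linux; other platforms may differ.)

import socket

EF = 46          # Expedited Forwarding DSCP code point
tos = EF << 2    # DSCP occupies the upper 6 bits of the TOS byte

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)
# Datagrams sent on s now carry DSCP 46; a Diffserv-aware router
# along the path may queue them with priority.
s.close()
print(hex(tos))
```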
10.8.1 Uncertainty
The term service guarantee seems to imply determinism of service mechanisms, but this need not be the case. All we require is predictability over an appropriate time scale.
It is important to understand that service is about changes occurring in time, and thus time is an essential element of any service-level agreement. If we focus on shorter and shorter intervals of time, it becomes impossible to guarantee what will happen. It is only over longer intervals that we can say, on average, what has been the level of service and what is likely to be the level of service in the future. We must therefore specify the time scale on which we shall measure service levels.
Example 11. A Service Level Agreement for UUCP network connectivity could agree to transfer up to 10 MB of data per day. This is an easy goal by modern standards, and it hardly seems worth including any margin for error. On the other hand, a Digital Subscriber Line (DSL) network provider might offer a guaranteed rate of 350 kbps (kilobits per second). This is a common level of service at the time of writing. But what are the margins for error now? If each customer has a private network telephone line, we might think that there is no uncertainty here, but this would be wrong. There might be noise on the line, reducing the error-free transmission rate. When the signal reaches the Service Provider's switching center, customers are suddenly expected to share common resources, and this sharing must maintain the guarantees. It now becomes realistic to assess the margin for error in the figure of 350 kbps, since resources are tighter.
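The sharing problem in this example can be made concrete with invented numbers: a hypothetical aggregation link serving many customers, each holding a 350 kbps guarantee:

```python
# Back-of-envelope sketch of the shared-resource problem in the DSL
# example. All figures are invented for illustration.

guaranteed_kbps = 350
customers = 1000
uplink_kbps = 100_000          # hypothetical 100 Mbps aggregation link

# How much capacity was promised relative to what exists?
oversubscription = customers * guaranteed_kbps / uplink_kbps

# What does each customer get if everyone transmits at once?
worst_case_kbps = uplink_kbps / customers

print(f"oversubscription ratio: {oversubscription:.1f}:1")
print(f"rate if everyone transmits at once: {worst_case_kbps:.0f} kbps")
# 3.5:1 oversubscribed: the 350 kbps 'guarantee' holds only while
# fewer than about 29% of customers are simultaneously active.
```

Oversubscription is the normal business model, and it is precisely what makes the margin for error in the guaranteed figure worth quantifying.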
Example 12. A University Professor can agree to grade 10 examination papers per day. It is not clear that the level of interruptions and other duties will not make this goal unreasonable. The level of uncertainty is much higher than in a mechanistic network switch. We might estimate it to be 10 ± 3 exam papers per day. In this case, the Professor should include this margin for error in the contract of service.
Uncertainty is an important concern in discussing Quality of Service. Uncertainty is calculated using the ‘theory of errors’. This assumes that errors or uncertainties occur at random, according to a Gaussian profile, about some true value. The Gaussian assumption basically ensures that errors are small, or do not grow to an arbitrarily large size compared with the rate of change of the average. Whether or not a phenomenon really has a Gaussian profile, error-handling techniques can be used to estimate uncertainties, provided there is a suitable separation of time scales.
Example 13. Consider the rate of arrival of data R, in bytes, from the viewpoint of a network switch or router. The measurables are typically the packet size P and the number of packets per second r. These are independent quantities, with independent
uncertainties: packet sizes are distributed according to network protocol and traffic types, whereas packet rates are dictated by router/switch performance and queue lengths. The total rate is expressed as:

λ = r P.    (10.5)

Using the method of combining independent uncertainties, we write:

λ = ⟨λ⟩ + Δλ,   r = ⟨r⟩ + Δr,   and   P = ⟨P⟩ + ΔP,

so that

Δλ = √[ (∂λ/∂P)² (ΔP)² + (∂λ/∂r)² (Δr)² ].    (10.6)
Now, ATM packets have a fixed size of 53 bytes, thus ΔP_ATM = 0, but Ethernet or Frame Relay packets have varying sizes. An average uncertainty needs to be measured over time. Let us suppose that it might be 1 kB, or something of that order of magnitude.
For a Service Provider, the uncertainty in r also requires measurement; r represents the aggregated traffic from multiple customers. A Service Provider could hope that the aggregation of traffic load from several customers would even out, allowing the capacity of a channel to be used evenly at all times. Alas, traffic in the same geographical region tends to peak at the same times, not different times, so channels must be idle most of the time and inundated for brief periods. To find ⟨r⟩ and Δr, we aggregate the separate sources into the total packet-rate:

r(t) = Σᵢ rᵢ(t).    (10.7)

The aggregated uncertainty in r is the Pythagorean sum:

Δr = √[ Σᵢ (Δrᵢ)² ].    (10.8)

The estimated uncertainty is

Δλ = √[ ⟨r⟩² (ΔP)² + ⟨P⟩² (Δr)² ].    (10.9)

Since ⟨r⟩ and Δr are likely to be of similar orders of magnitude for many customers, whereas ΔP < ⟨P⟩, this indicates that the uncertainty is dominated by demand uncertainty, i.e.

Δλ ≈ ⟨P⟩ Δr.    (10.10)
This uncertainty can now be used in queuing estimates.
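As a worked instance of this derivation, the following sketch (with invented traffic figures) combines the independent uncertainties in packet rate and packet size into an uncertainty in the data rate:

```python
# Worked instance of equations (10.5)-(10.9): combining independent
# uncertainties in packet rate r and packet size P into the
# uncertainty in the data rate lambda = r * P. Figures are invented.

from math import sqrt

# Aggregated packet rate from several hypothetical sources, (10.7)
rates     = [400.0, 300.0, 300.0]        # packets/s per source
rate_uncs = [40.0, 30.0, 30.0]           # packets/s uncertainty per source

r  = sum(rates)                           # (10.7): total packet rate
dr = sqrt(sum(u * u for u in rate_uncs))  # (10.8): Pythagorean sum

P, dP = 1000.0, 0.0                       # bytes; dP = 0 mimics fixed ATM-like cells

lam  = r * P                                      # (10.5)
dlam = sqrt(r**2 * dP**2 + P**2 * dr**2)          # (10.6)/(10.9)

print(f"rate = {lam:.0f} +/- {dlam:.0f} bytes/s")
# With dP = 0, the whole uncertainty comes from demand (dr),
# as in (10.10): dlam ~ P * dr.
```

Setting dP to a nonzero value (say 1 kB for variable-size frames) and comparing the two terms under the square root shows directly which source of uncertainty dominates.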
Now that we are able to quantify uncertainty, we can create a sensible service-level agreement based on the following kind of assertion:

‘We, the provider, promise to provide a service of S ± ΔS, measured over time intervals T , at the price . . .’