Virtual private network

A virtual private network (VPN) is a computer network in which some of the links between nodes are carried by open connections or virtual circuits in some larger network (e.g., the Internet) instead of by physical wires. The data link layer protocols of the virtual network are said to be tunneled through the larger network when this is the case. One common application is secure communications through the public Internet, but a VPN need not have explicit security features, such as authentication or content encryption. VPNs, for example, can be used to separate the traffic of different user communities over an underlying network with strong security features.

A VPN may have best-effort performance or may have a defined service level agreement (SLA) between the VPN customer and the VPN service provider. Generally, a VPN has a topology more complex than point-to-point.

Internetwork

An internetwork is the connection of multiple computer networks via a common routing technology using routers. The Internet is an aggregation of many connected internetworks spanning the Earth.

Organizational scope

Networks are typically managed by the organizations that own them. From the owner's point of view, networks are seen as intranets or extranets. A special case is the Internet, which has no single owner but a distinct status when seen by an organizational entity: that of permitting virtually unlimited global connectivity for a great multitude of purposes.

Intranets and extranets

Intranets and extranets are parts or extensions of a computer network, usually a LAN.

An intranet is a set of networks, using the Internet Protocol and IP-based tools such as web browsers and file transfer applications, that is under the control of a single administrative entity. That administrative entity closes the intranet to all but specific, authorized users. Most commonly, an intranet is the internal network of an organization. A large intranet will typically have at least one web server to provide users with organizational information.

An extranet is a network that is limited in scope to a single organization or entity but also has limited connections to the networks of one or more other, usually (but not necessarily) trusted, organizations or entities. For example, a company's customers may be given access to some part of its intranet, while at the same time the customers may not be considered trusted from a security standpoint. Technically, an extranet may also be categorized as a CAN, MAN, WAN, or other type of network, although an extranet cannot consist of a single LAN; it must have at least one connection with an external network.

Internet

The Internet is a global system of interconnected governmental, academic, corporate, public, and private computer networks. It is based on the networking technologies of the Internet Protocol Suite. It is the successor of the Advanced Research Projects Agency Network (ARPANET) developed by DARPA of the United States Department of Defense. The Internet is also the communications backbone underlying the World Wide Web (WWW).

Participants in the Internet use a diverse array of several hundred documented, and often standardized, protocols compatible with the Internet Protocol Suite, together with an addressing system (IP addresses) administered by the Internet Assigned Numbers Authority and address registries. Service providers and large enterprises exchange information about the reachability of their address spaces through the Border Gateway Protocol (BGP), forming a redundant worldwide mesh of transmission paths.

Network topology

Common layouts

A network topology is the layout of the interconnections of the nodes of a computer network. Common layouts are:

A bus network: all nodes are connected to a common medium along which they communicate. This was the layout used in the original Ethernet, called 10BASE5 and 10BASE2.

A star network: all nodes are connected to a special central node. This is the typical layout found in a wireless LAN, where each wireless client connects to the central wireless access point.

A ring network: each node is connected to its left and right neighbor node, such that all nodes are connected and that each node can reach each other node by traversing nodes left- or rightwards. The Fiber Distributed Data Interface (FDDI) made use of such a topology.

A mesh network: each node is connected to an arbitrary number of neighbors in such a way that there is at least one traversal from any node to any other.

A fully connected network: each node is connected to every other node in the network.

Note that the physical layout of the nodes in a network may not necessarily reflect the network topology. As an example, with FDDI, the network topology is a ring (actually two counter-rotating rings), but the physical topology is a star, because all neighboring connections are routed via a central physical location.
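
The layouts listed above can be treated as graphs. As a minimal sketch (the node counts and names below are illustrative assumptions, not taken from the text), the following builds adjacency maps for a ring, a star, and a fully connected network and checks that every node can reach every other node:

```python
# Illustrative sketch: common topologies as adjacency maps, plus a
# breadth-first search to confirm that every node can reach every other.
from collections import deque

def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def star(n):
    adj = {i: {0} for i in range(1, n)}   # each leaf connects to the hub
    adj[0] = set(range(1, n))             # the hub connects to every leaf
    return adj

def fully_connected(n):
    return {i: set(range(n)) - {i} for i in range(n)}

def all_reachable(adj):
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adj)

for name, topology in [("ring", ring(5)), ("star", star(5)),
                       ("fully connected", fully_connected(5))]:
    print(name, all_reachable(topology))  # all three print True
```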

Overlay network

An overlay network is a virtual computer network that is built on top of another network. Nodes in the overlay are connected by virtual or logical links, each of which corresponds to a path, perhaps through many physical links, in the underlying network. The topology of the overlay network may (and often does) differ from that of the underlying one.

For example, many peer-to-peer networks are overlay networks because they are organized as nodes of a virtual system of links run on top of the Internet. The Internet was initially built as an overlay on the telephone network.

The most striking example of an overlay network, however, is the Internet itself: at the IP layer, each node can reach any other by a direct connection to the desired IP address, thereby creating a fully connected network; the underlying network, however, is composed of a mesh-like interconnect of subnetworks of varying topologies (and, in fact, technologies). Address resolution and routing are the means that allow the mapping of the fully connected IP overlay network onto the underlying ones.

Overlay networks have been around since the invention of networking when computer systems were connected over telephone lines using modems, before any data network existed.

Another example of an overlay network is a distributed hash table, which maps keys to nodes in the network. In this case, the underlying network is an IP network, and the overlay network is a table (actually a map) indexed by keys.
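
As a hedged illustration of that idea (the node names and the simple modulo-hash scheme below are assumptions; real DHTs typically use consistent hashing so that nodes can join and leave gracefully), a key can be mapped to a responsible overlay node by hashing it into the node space:

```python
# Minimal sketch: map a key to the overlay node responsible for it.
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]   # hypothetical overlay nodes

def node_for_key(key: str) -> str:
    # Hash the key and reduce it onto the set of participating nodes.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for_key("alice.txt"))   # the node responsible for this key
print(node_for_key("backup.tar"))  # keys spread across nodes by the hash
```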

Overlay networks have also been proposed as a way to improve Internet routing, such as through quality of service guarantees to achieve higher-quality streaming media. Previous proposals such as IntServ, DiffServ, and IP Multicast have not seen wide acceptance largely because they require modification of all routers in the network.[citation needed] On the other hand, an overlay network can be incrementally deployed on end-hosts running the overlay protocol software, without cooperation from Internet service providers. The overlay has no control over how packets are routed in the underlying network between two overlay nodes, but it can control, for example, the sequence of overlay nodes a message traverses before reaching its destination.

For example, Akamai Technologies manages an overlay network that provides reliable, efficient content delivery (a kind of multicast). Academic research includes end system multicast and overcast for multicast; RON (resilient overlay network) for resilient routing; and OverQoS for quality of service guarantees, among others.

Basic hardware components

Apart from the physical communications media themselves as described above, networks comprise additional basic hardware building blocks interconnecting their terminals, such as network interface cards (NICs), hubs, bridges, switches, and routers.

Network interface cards

A network card, network adapter, or NIC (network interface card) is a piece of computer hardware designed to allow computers to physically access a networking medium. It provides a low-level addressing system through the use of MAC addresses.

Each Ethernet network interface has a unique MAC address which is usually stored in a small memory device on the card, allowing any device to connect to the network without creating an address conflict. Ethernet MAC addresses are composed of six octets. Uniqueness is maintained by the IEEE, which manages the Ethernet address space by assigning 3-octet prefixes to equipment manufacturers. The list of prefixes is publicly available. Each manufacturer is then obliged to both use only their assigned prefix(es) and to uniquely set the 3-octet suffix of every Ethernet interface they produce.
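
As a small illustrative sketch (the sample address and the colon-separated notation are assumptions, not from the text), the IEEE-assigned 3-octet prefix and the manufacturer-assigned 3-octet suffix can be separated like this:

```python
# Split a MAC address into the IEEE-assigned manufacturer prefix (OUI)
# and the 3-octet suffix chosen uniquely by that manufacturer.
def split_mac(mac: str):
    octets = mac.lower().split(":")
    if len(octets) != 6 or not all(len(o) == 2 for o in octets):
        raise ValueError("expected six two-digit hex octets")
    oui = ":".join(octets[:3])      # assigned by the IEEE to one manufacturer
    device = ":".join(octets[3:])   # set uniquely by that manufacturer
    return oui, device

print(split_mac("00:1a:2b:3c:4d:5e"))  # ('00:1a:2b', '3c:4d:5e')
```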

Repeaters and hubs

A repeater is an electronic device that receives a signal, cleans it of unnecessary noise, regenerates it, and retransmits it at a higher power level, or to the other side of an obstruction, so that the signal can cover longer distances without degradation. In most twisted pair Ethernet configurations, repeaters are required for cable that runs longer than 100 meters. A repeater with multiple ports is known as a hub. Repeaters work on the Physical Layer of the OSI model. Repeaters require a small amount of time to regenerate the signal. This can cause a propagation delay which can affect network communication when there are several repeaters in a row. Many network architectures limit the number of repeaters that can be used in a row (e.g. Ethernet's 5-4-3 rule).

Today, repeaters and hubs have been made mostly obsolete by switches (see below).

Bridges

A network bridge connects multiple network segments at the data link layer (layer 2) of the OSI model. Bridges broadcast to all ports except the port on which the broadcast was received. However, bridges do not promiscuously copy traffic to all ports, as hubs do, but learn which MAC addresses are reachable through specific ports. Once the bridge associates a port and an address, it will send traffic for that address to that port only.

Bridges learn the association of ports and addresses by examining the source addresses of the frames they see on their various ports. Once a frame arrives through a port, its source address is stored and the bridge assumes that this MAC address is associated with that port. The first time a previously unknown destination address is seen, the bridge forwards the frame to all ports other than the one on which it arrived.
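
A minimal sketch of this learning-and-forwarding behaviour (the port numbers and MAC addresses are invented, and real bridges also age out stale entries, which is omitted here):

```python
# Learning bridge: record which port each source MAC was seen on,
# forward known destinations to that single port, flood unknown ones.
class LearningBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port it was learned on

    def receive(self, in_port, src_mac, dst_mac):
        self.table[src_mac] = in_port            # learn the source address
        if dst_mac in self.table:
            return {self.table[dst_mac]}         # forward to one known port
        return self.ports - {in_port}            # flood everything else

bridge = LearningBridge(ports=[1, 2, 3, 4])
print(bridge.receive(1, "aa:aa", "bb:bb"))  # unknown dst: flood to {2, 3, 4}
print(bridge.receive(2, "bb:bb", "aa:aa"))  # aa:aa was learned on port 1 -> {1}
```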

Bridges come in three basic types:

1. Local bridges: Directly connect LANs

2. Remote bridges: Can be used to create a wide area network (WAN) link between LANs. Remote bridges, where the connecting link is slower than the end networks, largely have been replaced with routers.

3. Wireless bridges: Can be used to join LANs or connect remote stations to LANs.

Switches

A network switch is a device that forwards and filters OSI layer 2 datagrams (chunks of data communication) between ports (connected cables) based on the MAC addresses in the packets.[15] A switch is distinct from a hub in that it only forwards frames to the ports involved in the communication rather than to all connected ports. A switch breaks up the collision domain but represents itself as a single broadcast domain. Switches make forwarding decisions for frames on the basis of MAC addresses. A switch normally has numerous ports, facilitating a star topology for devices and the cascading of additional switches.[16] Some switches are capable of routing based on Layer 3 addressing or additional logical levels; these are called multi-layer switches. The term switch is used loosely in marketing to encompass devices including routers and bridges, as well as devices that may distribute traffic on load or by application content (e.g., a Web URL identifier).

Routers

A router is an internetworking device that forwards packets between networks by processing the information found in the datagram or packet (Internet Protocol information from Layer 3 of the OSI model). In many situations, this information is processed in conjunction with the routing table (also known as the forwarding table). Routers use routing tables to determine the interface on which to forward each packet; this can include the "null" interface, also known as the "black hole" interface, because data can go into it but no further processing is done on that data.
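
A hedged sketch of such a lookup using longest-prefix matching (the prefixes, interface names, and the black-hole entry are invented for illustration):

```python
# Routing-table lookup: the most specific matching prefix wins.
import ipaddress

ROUTING_TABLE = [
    (ipaddress.ip_network("10.0.0.0/8"),   "eth0"),
    (ipaddress.ip_network("10.1.0.0/16"),  "eth1"),
    (ipaddress.ip_network("192.0.2.0/24"), "null"),   # black-hole route
    (ipaddress.ip_network("0.0.0.0/0"),    "eth2"),   # default route
]

def lookup(destination: str) -> str:
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in ROUTING_TABLE if addr in net]
    # Longest-prefix match: pick the matching network with the longest prefix.
    net, iface = max(matches, key=lambda item: item[0].prefixlen)
    return iface

print(lookup("10.1.2.3"))   # eth1  (the /16 is more specific than the /8)
print(lookup("192.0.2.9"))  # null  (traffic is silently discarded)
print(lookup("8.8.8.8"))    # eth2  (falls through to the default route)
```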

Firewalls

A firewall is an important aspect of a network with respect to security. It typically rejects access requests from unsafe sources while allowing actions from recognized ones. The vital role firewalls play in network security grows in parallel with the constant increase in 'cyber' attacks for the purpose of stealing/corrupting data, planting viruses, etc.

Network performance

Network performance refers to the service quality of a telecommunications product as seen by the customer. It should not be seen merely as an attempt to get "more through" the network.

The following list gives examples of Network Performance measures for a circuit-switched network and one type of packet-switched network, viz. ATM:

1. Circuit-switched networks: In circuit-switched networks, network performance is synonymous with the grade of service. The number of rejected calls is a measure of how well the network is performing under heavy traffic loads; a worked example of such a measure follows this list. Other types of performance measure can include noise, echo and so on.

2. ATM: In an Asynchronous Transfer Mode (ATM) network, performance can be measured by line rate, quality of service (QoS), data throughput, connect time, stability, technology, modulation technique and modem enhancements.
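
As a worked example for item 1 (hedged: the Erlang B formula is a standard grade-of-service measure, but it is not named in the text, and the traffic figures below are assumptions), the probability that a call is rejected can be computed from the offered traffic and the number of circuits:

```python
# Erlang B blocking probability, computed with the usual iterative recurrence.
def erlang_b(erlangs: float, circuits: int) -> float:
    blocking = 1.0
    for k in range(1, circuits + 1):
        blocking = (erlangs * blocking) / (k + erlangs * blocking)
    return blocking

# 40 erlangs of offered traffic carried on 50 circuits:
print(f"{erlang_b(40.0, 50):.4f}")  # ~0.0187: roughly 2% of offered calls are blocked
```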

There are many different ways to measure the performance of a network, as each network is different in nature and design. Performance can also be modelled instead of measured; one example of this is using state transition diagrams to model queuing performance in a circuit-switched network. These diagrams allow the network planner to analyze how the network will perform in each state, ensuring that the network will be optimally designed.

8-second rule

A June 2001 Zona Research report entitled "The Need for Speed II" found that the average web user would wait about eight seconds for a page to download, but that the average download time across a backbone connection for most web sites was almost ten seconds.

The 8-second rule is an old (by Internet standards) way of measuring the adequate response time of a webserver through different bandwidth connections. It specified that if the load-time of a web page exceeds eight seconds, users are unlikely to wait, or "stick around", for its completion. In order to increase the "stickiness" of a website, faster ways to deliver the content to the user needed to be devised. These included stripping away unnecessary HTML code and using fewer images.

It is generally believed that this rule no longer applies, since a much higher percentage of Internet users now have broadband available, making almost every website load up much faster, in some cases in less than a second. However, the rule has remained as a rough unit to measure the performance of a webserver.

Network security

In the field of networking, the area of network security[20] consists of the provisions and policies adopted by the network administrator to prevent and monitor unauthorized access, misuse, modification, or denial of the computer network and network-accessible resources. Network security is the authorization of access to data in a network, which is controlled by the network administrator. Users are assigned an ID and password that allow them access to information and programs within their authority. Network security covers a variety of computer networks, both public and private, that are used in everyday jobs: conducting transactions and communications among businesses, government agencies, and individuals. Networks can be private, such as within a company, or open to public access. Network security is involved in organizations, enterprises, and other types of institutions. It does as its title explains: it secures the network, and it protects and oversees the operations being done on it. The most common and simple way of protecting a network resource is by assigning it a unique name and a corresponding password.

Network security concepts

Network security starts with authenticating the user, commonly with a username and a password. Since this requires just one detail authenticating the user name —i.e. the password, which is something the user 'knows'— this is sometimes termed one-factor authentication. With two-factor authentication, something the user 'has' is also used (e.g. a security token or 'dongle', an ATM card, or a mobile phone); and with three-factor authentication, something the user 'is' is also used (e.g. a fingerprint or retinal scan).

Once authenticated, a firewall enforces access policies such as what services are allowed to be accessed by the network users. Though effective to prevent unauthorized access, this component may fail to check potentially harmful content such as computer worms or Trojans being transmitted over the network. Anti-virus software or an intrusion prevention system (IPS) help detect and inhibit the action of such malware. An anomaly-based intrusion detection system may also monitor the network and traffic for unexpected (i.e. suspicious) content or behavior and other anomalies to protect resources, e.g. from denial of service attacks or an employee accessing files at strange times. Individual events occurring on the network may be logged for audit purposes and for later high-level analysis.

Communication between two hosts using a network may be encrypted to maintain privacy.

Honeypots, essentially decoy network-accessible resources, may be deployed in a network as surveillance and early-warning tools, as the honeypots are not normally accessed for legitimate purposes. Techniques used by the attackers that attempt to compromise these decoy resources are studied during and after an attack to keep an eye on new exploitation techniques. Such analysis may be used to further tighten security of the actual network being protected by the honeypot.

Network resilience

In computer networking: “Resilience is the ability to provide and maintain an acceptable level of service in the face of faults and challenges to normal operation.”

Views of networks

Users and network administrators typically have different views of their networks. Users can share printers and some servers within a workgroup, which usually means they are in the same geographic location and on the same LAN, whereas a network administrator is responsible for keeping that network up and running. A community of interest has less of a connection to a local area, and should be thought of as a set of arbitrarily located users who share a set of servers, and who possibly also communicate via peer-to-peer technologies.

Network administrators can see networks from both physical and logical perspectives. The physical perspective involves geographic locations, physical cabling, and the network elements (e.g., routers, bridges and application layer gateways) that interconnect the physical media. Logical networks, called subnets in the TCP/IP architecture, map onto one or more physical media. For example, a common practice on a campus of buildings is to make a set of LAN cables in each building appear to be a common subnet, using virtual LAN (VLAN) technology.
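
A small sketch of the subnet idea (the addresses and prefix are invented): two hosts belong to the same logical subnet when their addresses fall within the same prefix, regardless of where they are physically attached:

```python
# Check logical subnet membership independently of physical location.
import ipaddress

subnet = ipaddress.ip_network("192.168.10.0/24")
host_in_building_a = ipaddress.ip_address("192.168.10.25")
host_in_building_b = ipaddress.ip_address("192.168.10.200")
host_elsewhere     = ipaddress.ip_address("192.168.20.5")

print(host_in_building_a in subnet)  # True  - same logical subnet (e.g. one VLAN)
print(host_in_building_b in subnet)  # True  - possibly a different physical cable run
print(host_elsewhere in subnet)      # False - a different subnet, reached via a router
```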

Both users and administrators will be aware, to varying extents, of the trust and scope characteristics of a network. Again using TCP/IP architectural terminology, an intranet is a community of interest under private administration usually by an enterprise, and is only accessible by authorized users (e.g. employees).[22] Intranets do not have to be connected to the Internet, but generally have a limited connection. An extranet is an extension of an intranet that allows secure communications to users outside of the intranet (e.g. business partners, customers).

Unofficially, the Internet is the set of users, enterprises, and content providers that are interconnected by Internet Service Providers (ISP). From an engineering viewpoint, the Internet is the set of subnets, and aggregates of subnets, which share the registered IP address space and exchange information about the reachability of those IP addresses using the Border Gateway Protocol. Typically, the human-readable names of servers are translated to IP addresses, transparently to users, via the directory function of the Domain Name System (DNS).

Over the Internet, there can be business-to-business (B2B), business-to-consumer (B2C) and consumer-to-consumer (C2C) communications. Especially when money or sensitive information is exchanged, the communications are apt to be secured by some form of communications security mechanism. Intranets and extranets can be securely superimposed onto the Internet, without any access by general Internet users and administrators, using secure Virtual Private Network (VPN) technology.

Network topology

Network topology is the layout pattern of the interconnections of the various elements (links, nodes, etc.) of a computer or biological network. Network topologies may be physical or logical. Physical topology refers to the physical design of a network, including the devices, their locations, and the cable installation. Logical topology refers to how data is actually transferred in a network, as opposed to its physical design. In general, the physical topology describes the network's hardware layout, whereas the logical topology describes its behaviour.

Topology can be understood as the shape or structure of a network. This shape does not necessarily correspond to the actual physical layout of the devices on the network. For example, the computers on a home network may be arranged in a circle, but this does not necessarily mean that they form a ring topology.

Any particular network topology is determined only by the graphical mapping of the configuration of physical and/or logical connections between nodes. The study of network topology uses graph theory. Distances between nodes, physical interconnections, transmission rates, and/or signal types may differ in two networks and yet their topologies may be identical.

A local area network (LAN) is one example of a network that exhibits both a physical topology and a logical topology. Any given node in the LAN has one or more links to one or more nodes in the network and the mapping of these links and nodes in a graph results in a geometric shape that may be used to describe the physical topology of the network. Likewise, the mapping of the data flow between the nodes in the network determines the logical topology of the network. The physical and logical topologies may or may not be identical in any particular network.

There are two basic categories of network topologies:

1. Physical topologies

2. Logical topologies

The shape of the cabling layout used to link devices is called the physical topology of the network. This refers to the layout of cabling, the locations of nodes, and the interconnections between the nodes and the cabling. The physical topology of a network is determined by the capabilities of the network access devices and media, the level of control or fault tolerance desired, and the cost associated with cabling or telecommunications circuits.

The logical topology, in contrast, is the way that the signals act on the network media, or the way that the data passes through the network from one device to the next without regard to the physical interconnection of the devices. A network's logical topology is not necessarily the same as its physical topology. For example, the original twisted-pair Ethernet using repeater hubs was a logical bus topology with a physical star layout. Token Ring is a logical ring topology, but is wired as a physical star from the media access unit.

The logical classification of network topologies generally follows the same categories as the physical classification, but it describes the path that data takes between nodes rather than the actual physical connections between nodes. Logical topologies are generally determined by network protocols rather than by the physical layout of cables, wires, and network devices or by the flow of electrical signals. In many cases, however, the paths that the electrical signals take between nodes closely match the logical flow of data; hence the convention of using the terms logical topology and signal topology interchangeably.

Logical topologies are often closely associated with media access control methods and protocols. Logical topologies can be dynamically reconfigured by special types of equipment such as routers and switches.

The study of network topology recognizes seven basic topologies:

1. Ring

2. Mesh

3. Star

4. Fully Connected

5. Line

6. Tree

7. Bus

Reliability engineering

Reliability engineering is an engineering field that deals with the study, evaluation, and life-cycle management of reliability: the ability of a system or component to perform its required functions under stated conditions for a specified period of time. It is often measured as a probability of failure or a measure of availability. However, maintainability is also an important part of reliability engineering.

Reliability engineering for complex systems requires a different, more elaborate systems approach than reliability for non-complex systems or items. Reliability engineering is closely related to system safety engineering in the sense that they both use common sorts of methods for their analysis and might require input from each other. Reliability analysis has important links with function analysis, requirements specification, systems design, hardware design, software design, manufacturing, testing, maintenance, transport, storage, spare parts, operations research, human factors, technical documentation, training, and more.

Most industries do not have specialized reliability engineers and the engineering task often becomes part of the tasks of a design engineer, logistics engineer, systems engineer or quality engineer. Reliability engineers should have broad skills and knowledge.

Overview

[Figure: a reliability block diagram]

Reliability may be defined in several ways:

The idea that something is fit for a purpose with respect to time;

The capacity of a device or system to perform as designed;

The resistance to failure of a device or system;

The ability of a device or system to perform a required function under stated conditions for a specified period of time;

The probability that a functional unit will perform its required function for a specified interval under stated conditions;

The ability of something to "fail well" (fail without catastrophic consequences).

Reliability engineering is a special discipline within Systems engineering. Reliability engineers rely heavily on statistics, probability theory, and reliability theory to set requirements, measure or predict reliability, and advise on improvements to reliability performance. Many engineering techniques are used in reliability engineering, such as Reliability Hazard analysis, Failure mode and effects analysis (FMEA), Fault tree analysis, Reliability Prediction, Weibull analysis, thermal management, reliability testing and accelerated life testing. Because of the large number of reliability techniques, their expense, and the varying degrees of reliability required for different situations, most projects develop a reliability program plan to specify the reliability tasks that will be performed for that specific system.

The function of reliability engineering is to develop the reliability requirements for the product, establish an adequate life-cycle reliability program, show that corrective measures (risk mitigations) produce reliability improvements, and perform appropriate analyses and tasks to ensure the product will meet its requirements and the unreliability risk is controlled to an acceptable level. It needs to provide a robust set of (statistical) evidence and justification material to verify if the requirements have been met and to check preliminary reliability risk assessments. The goal is to first identify the reliability hazards, assess the risk associated with them and to control the risk to an acceptable level. What is acceptable is determined by the managing authority / customers. These tasks are normally managed by a reliability engineer or manager, who may hold an accredited engineering degree and has additional reliability-specific education and training.

Reliability engineering is closely associated with maintainability engineering and logistics engineering, e.g. Integrated Logistics Support (ILS). Many problems from other fields, such as security engineering and safety engineering can also be approached using common reliability engineering techniques. This article provides an overview of some of the most common reliability engineering tasks. Please see the references for a more comprehensive treatment.

Many types of engineering employ reliability engineers and use the tools and methodology of reliability engineering. For example:

System engineers design and analyze complex systems having a specified reliability

Mechanical engineers may have to design a machine or system with a specified reliability

Automotive engineers have reliability requirements for the automobiles (and components) which they design

Electronics engineers must design and test their products for reliability requirements.

In software engineering and systems engineering, reliability engineering is the sub-discipline of ensuring that a system (or a device in general) will perform its intended function(s) when operated in a specified manner for a specified length of time. Reliability engineering is performed throughout the entire life cycle of a system, including development, test, production, and operation.

Reliability theory

Main articles: reliability theory, failure rate.

Reliability theory is the foundation of reliability engineering. For engineering purposes, reliability is defined as:

the probability that a device will perform its intended function during a specified period of time under stated conditions.

Mathematically, this may be expressed as

R(t) = \Pr\{T > t\} = \int_t^{\infty} f(x)\,dx,

where f(x) is the failure probability density function and t is the length of the period of time (which is assumed to start from time zero).
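
As a hedged worked example (the constant-failure-rate, i.e. exponential, model is an assumption introduced here for illustration, not a statement from the text), the integral has a simple closed form:

```latex
% Assume a constant failure rate \lambda, i.e. f(x) = \lambda e^{-\lambda x}, x \ge 0.
R(t) = \int_t^{\infty} \lambda e^{-\lambda x}\, dx
     = \bigl[ -e^{-\lambda x} \bigr]_t^{\infty}
     = e^{-\lambda t}
% Example: \lambda = 10^{-4}\ \text{failures/hour},\ t = 1000\ \text{hours}
%          \Rightarrow R(1000) = e^{-0.1} \approx 0.905.
```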

Reliability engineering is concerned with four key elements of this definition:

First, reliability is a probability. This means that failure is regarded as a random phenomenon: it is a recurring event, and we do not express any information on individual failures, the causes of failures, or relationships between failures, except that the likelihood for failures to occur varies over time according to the given probability function. Reliability engineering is concerned with meeting the specified probability of success, at a specified statistical confidence level.

Second, reliability is predicated on "intended function": generally, this is taken to mean operation without failure. However, even if no individual part of the system fails but the system as a whole does not do what was intended, the failure is still charged against the system reliability. The system requirements specification is the criterion against which reliability is measured.

Third, reliability applies to a specified period of time. In practical terms, this means that a system has a specified chance that it will operate without failure before time t. Reliability engineering ensures that components and materials will meet the requirements during the specified time. Units other than time may sometimes be used: the automotive industry might specify reliability in terms of miles, and the military might specify the reliability of a gun for a certain number of rounds fired. A piece of mechanical equipment may have a reliability rating in terms of cycles of use.

Fourth, reliability is restricted to operation under stated (or explicitly defined) conditions. This constraint is necessary because it is impossible to design a system for unlimited conditions. A Mars rover will have different specified conditions than the family car. The operating environment must be addressed during design and testing. Also, that same rover may be required to operate in varying conditions requiring additional scrutiny.

Reliability engineering vs safety engineering

Reliability engineering differs from safety engineering with respect to the kind of hazards that are considered. Reliability engineering is, in the end, only concerned with cost. It relates to hazards that could transform into a particular level of loss of revenue for the company or the customer. These can be costs due to loss of production caused by system unavailability, unexpectedly high or low demand for spares, repair costs, man-hours, (multiple) re-designs, interruptions to normal production (e.g. due to long repair times or unexpected demand for non-stocked spares), and many other indirect costs. Safety engineering, on the other hand, is more specific and regulated. The related reliability requirements are sometimes extremely high. It deals with unwanted dangerous events (for life and the environment) in the same sense as reliability engineering, but it normally does not directly look at cost and is not concerned with repair actions after failure. Another difference is the level of impact of failures on society and the involvement of governments. Safety engineering is often strictly controlled by governments (e.g. in the nuclear, aerospace, defense, rail and oil industries). Furthermore, safety engineering and reliability engineering often have conflicting requirements. For example, in train control systems it is common practice to use many fail-safe devices and to lower trip settings as needed; this unfortunately lowers reliability. Reliability can be increased here by using redundant systems, but this in turn lowers the safety level. The only way to increase both reliability and safety at the system level is by using fault-tolerant systems. In this case both the "operational" or "mission" reliability and the safety of a system can be increased. This is common practice in aerospace systems that need continuous availability and do not have a fail-safe mode (e.g. flight computers and related steering systems). However, the "basic" reliability of the system will in this case still be lower. Basic reliability refers to failures that might not result in system failure, but do result in maintenance actions, logistic costs, use of spares, etc.

Reliability program plan

Many tasks, methods, and tools can be used to achieve reliability. Every system requires a different level of reliability. A commercial airliner must operate under a wide range of conditions. The consequences of failure are grave, but there is a correspondingly higher budget. A pencil sharpener may be more reliable than an airliner, but has a much different set of operational conditions, insignificant consequences of failure, and a much lower budget.

A reliability program plan (RPP) is used to document exactly what "best practices" (tasks, methods, tools, analyses, and tests) are required for a particular (sub)system, as well as to clarify customer requirements for reliability assessment. For large-scale, complex systems, the reliability program plan is a distinct document. For simple systems, it may be combined with the systems engineering management plan or an integrated logistics support management plan. A reliability program plan is essential for a successful reliability, availability, and maintainability (RAM) program; it is developed early during system development and refined over the system's life-cycle. It specifies not only what the reliability engineer does, but also the tasks performed by other stakeholders. A reliability program plan is approved by top program management, which is responsible for identifying the resources for its implementation.

Technically, the main objective of a reliability program plan is often to evaluate and improve the availability of a system, not only its reliability. Reliability needs to be evaluated and improved in relation to both availability and the cost of ownership (due to the cost of spares, maintenance man-hours, transport, etc.). Often a trade-off is needed between the two, and there might be a maximum ratio between availability and cost of ownership. Whether availability or cost of ownership is more important depends on the use of the system (e.g. a system that is a critical link in a production system - for example a big oil platform - is normally allowed to have a very high cost of ownership if that translates into even a minor increase in availability, as the unavailability of the platform directly results in a massive loss of revenue). Testability of a system should also be addressed in the plan, as this is the link between reliability and maintainability. The maintenance strategy (maintenance concept) can influence the reliability of a system (e.g. by preventive maintenance), although it can never bring it above the inherent reliability. Maintainability influences the availability of a system - in theory this can be almost unlimited if one were able to repair a failure in a very short time.
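
As a hedged illustration of that last point (the MTBF and repair times below are assumed values), the steady-state availability A = MTBF / (MTBF + MTTR) shows how maintainability drives availability:

```python
# Steady-state availability from MTBF and MTTR: shorter repairs raise
# availability even when the inherent reliability (MTBF) stays the same.
def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

mtbf = 2_000                              # assumed mean time between failures
for mttr in (24, 8, 1):                   # assumed mean times to repair
    print(f"MTTR {mttr:>2} h -> availability {availability(mtbf, mttr):.4%}")
# MTTR 24 h -> availability 98.8142%
# MTTR  8 h -> availability 99.6016%
# MTTR  1 h -> availability 99.9500%
```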

A proper reliability plan should normally address RAMT analysis in its total context. RAMT stands in this case for Reliability, Availability, Maintainability (and maintenance), and Testability in the context of the user's needs and the technical requirements (as translated from those needs).

Reliability requirements

For any system, one of the first tasks of reliability engineering is to adequately specify the reliability and maintainability requirements, as defined by the stakeholders in terms of their overall availability needs. Reliability requirements address the system itself, test and assessment requirements, and the associated tasks and documentation. Reliability requirements are included in the appropriate system/subsystem requirements specifications, test plans, and contract statements. Maintainability requirements address issues of cost as well as time to repair. Evaluation of the effectiveness of corrective measures is part of a FRACAS process that is usually part of a good RPP.

Reliability prediction

Reliability prediction combines the creation of a proper reliability model with the estimation (and justification) of the input parameters for this model (such as failure rates for a particular failure mode or event, and the mean time to repair the system for a particular failure), in order to provide a system-level (or part-level) estimate for the output reliability parameters (system availability or the frequency of a particular functional failure).

Some recognized authors on reliability - e.g. Patrick O'Connor, R. Barnard and others - have argued that too much emphasis is often given to the prediction of reliability parameters, and that more effort should be devoted to the prevention of failure. The reason for this is that prediction of reliability based on historical data can be very misleading, because a comparison is only valid for exactly the same designs and products under exactly the same loads and in the same context. Even a minor change in any of these details could have major effects on reliability. Furthermore, the most unreliable and important items (the most interesting candidates for a reliability investigation) are usually the ones subjected to the most modifications and changes. Also, performing a proper quantitative reliability prediction for systems by testing is extremely difficult and expensive. At part level, results can often be obtained with higher confidence, as many samples can be tested within the available financial budget; unfortunately, these tests may lack validity at system level due to the assumptions that had to be made for part-level testing. Testing for reliability should be done to create failures, learn from them and improve the system or part. The general conclusion is that an accurate and absolute prediction of reliability - by field-data comparison or testing - is in most cases not possible. An exception might be failures due to wear-out problems such as fatigue failures. MIL-STD-785 writes in its introduction that reliability prediction should be used with great caution, if not used only for comparison in trade-off studies.

Furthermore, based on the latest insights from Reliability Centered Maintenance (RCM), most (complex) system failures do not occur due to wear-out issues (a figure of around 4% has been reported; refer to the RCM literature). The failures are often the result of combinations of multiple types of events or failures. The results of these studies have shown that the majority of failures follow a constant failure rate model, for which prediction of the parameter values is often problematic and very time consuming (for a high-level reliability estimate at part level). Testing these constant failure rates at system level, by for example MIL-HDBK-781 types of testing, is not practical and can be extremely misleading.

Despite all these concerns, there will always be a need for reliability prediction. The resulting numbers can be used as a key performance indicator (KPI) or to estimate the need for spares, man-power, availability of systems, etc.

Reliability predictions:

Help assess the effect of product reliability on the maintenance activity and on the quantity of spare units required for acceptable field performance of any particular system. For example, predictions of the frequency of unit level maintenance actions can be obtained. Reliability prediction can be used to size spare populations.

Provide necessary input to system-level reliability models. System-level reliability models can subsequently be used to predict, for example, frequency of system outages in steady-state, frequency of system outages during early life, expected downtime per year, and system availability.

Provide necessary input to unit and system-level Life Cycle Cost Analyses. Life cycle cost studies determine the cost of a product over its entire life. Therefore, how often a unit will have to be replaced needs to be known. Inputs to this process include unit and system failure rates. This includes how often units and systems fail during the first year of operation as well as in later years.

Assist in deciding which product to purchase from a list of competing products. As a result, it is essential that reliability predictions be based on a common procedure.

Can be used to set factory test standards for products requiring a reliability test. Reliability predictions help determine how often the system should fail.

Are needed as input to the analysis of complex systems such as switching systems and digital cross-connect systems. It is necessary to know how often different parts of the system are going to fail even for redundant components.

Can be used in design trade-off studies. For example, a supplier could look at a design with many simple devices and compare it to a design with fewer devices that are newer but more complex. The unit with fewer devices is usually more reliable.

Can be used to set achievable in-service performance standards against which to judge actual performance and stimulate action.

The telecommunications industry has devoted much time over the years to concentrate on developing reliability models for electronic equipment. One such tool is the Automated Reliability Prediction Procedure (ARPP), which is an Excel-spreadsheet software tool that automates the reliability prediction procedures in SR-332, Reliability Prediction Procedure for Electronic Equipment. FD-ARPP-01 provides suppliers and manufacturers with a tool for making Reliability Prediction Procedure (RPP) calculations. It also provides a means for understanding RPP calculations through the capability of interactive examples provided by the user.

The RPP views electronic systems as hierarchical assemblies. Systems are constructed from units that, in turn, are constructed from devices. The methods presented predict reliability at these three hierarchical levels:

Device: A basic component (or part)

Unit: Any assembly of devices. This may include, but is not limited to, circuit packs, modules, plug-in units, racks, power supplies, and ancillary equipment. Unless otherwise dictated by maintenance considerations, a unit will usually be the lowest level of replaceable assemblies/devices. The RPP is aimed primarily at reliability prediction of units.

Serial System: Any assembly of units for which the failure of any single unit will cause a failure of the system or overall mission.

System reliability parameters

Requirements are specified using reliability parameters. The most common reliability parameter is the mean time to failure (MTTF), which can also be specified as the failure rate (this is expressed as a frequency or Conditional Probability Density Function (PDF)) or the number of failures during a given period. These parameters are very useful for systems that are operated frequently, such as most vehicles, machinery, and electronic equipment. Reliability increases as the MTTF increases. The MTTF is usually specified in hours, but can also be used with other units of measurement such as miles or cycles.
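
A brief hedged sketch with assumed figures, showing how an MTTF requirement converts to a constant failure rate and how the same idea carries over to other units such as miles:

```python
# Convert between MTTF and a constant failure rate (lambda = 1 / MTTF).
mttf_hours = 50_000                      # assumed requirement: MTTF of 50,000 h
failure_rate = 1 / mttf_hours
print(f"{failure_rate:.1e} failures per hour")                 # 2.0e-05

failures_per_million_hours = failure_rate * 1e6
print(f"{failures_per_million_hours:.0f} failures / 10^6 h")   # 20

# The same idea with a different unit of "time", e.g. automotive mileage:
mttf_miles = 150_000                     # assumed mean distance to failure
print(f"{1 / mttf_miles:.1e} failures per mile")               # 6.7e-06
```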

In other cases, reliability is specified as the probability of mission success. For example, the reliability of a scheduled aircraft flight can be specified as a dimensionless probability or a percentage, as in system safety engineering.

A special case of mission success is the single-shot device or system. These are devices or systems that remain relatively dormant and only operate once. Examples include automobile airbags, thermal batteries and missiles. Single-shot reliability is specified as a probability of one-time success, or is subsumed into a related parameter. Single-shot missile reliability may be specified as a requirement for the probability of hit.

For such systems, the probability of failure on demand (PFD) is the reliability measure - which is actually an unavailability number. This PFD is derived from the failure rate (a frequency of occurrence) and the mission time for non-repairable systems. For repairable systems, it is obtained from the failure rate, the mean time to repair (MTTR), and the test interval. This measure may not be unique for a given system, as it depends on the kind of demand. In addition to system-level requirements, reliability requirements may be specified for critical subsystems. In most cases, reliability parameters are specified with appropriate statistical confidence intervals.
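
A hedged sketch with invented numbers (the constant-failure-rate assumption and the common low-demand approximation PFD_avg ~ lambda * tau / 2 are introduced here for illustration, not taken from the text):

```python
# Probability of failure on demand (PFD) for a dormant function.
import math

failure_rate = 2e-6          # assumed dangerous failure rate, per hour

# Non-repairable case: probability the device has already failed when the
# single demand arrives at the end of the mission time.
mission_time = 10_000        # hours
pfd_non_repairable = 1 - math.exp(-failure_rate * mission_time)
print(f"{pfd_non_repairable:.4f}")   # 0.0198

# Periodically proof-tested (repairable) case: average unavailability over
# the test interval, PFD_avg ~ lambda * tau / 2.
test_interval = 4380         # hours, i.e. proof-tested twice a year
pfd_avg = failure_rate * test_interval / 2
print(f"{pfd_avg:.5f}")      # 0.00438
```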

Reliability modelling

Reliability modelling is the process of predicting or understanding the reliability of a component or system prior to its implementation. Two types of analysis that are often used to model a system's reliability behavior are fault tree analysis and reliability block diagrams. At component level, the same analyses can be used together with others. The input for the models can come from many sources, e.g. testing, earlier operational experience, field data, or data handbooks from the same or similar industries. In all cases, the data must be used with great caution, as predictions are only valid when the same product is used in the same context. Often predictions are only made to compare alternatives.

For part level predictions, two separate fields of investigation are common:

The physics of failure approach uses an understanding of physical failure mechanisms involved, such as mechanical crack propagation or chemical corrosion degradation or failure;

The parts stress modelling approach is an empirical method for prediction based on counting the number and type of components of the system, and the stress they undergo during operation.
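
A hedged sketch of this kind of empirical, parts-based prediction (the part list, base failure rates, and stress factors below are invented, and real handbook procedures define many more adjustment factors): under a series assumption the adjusted part failure rates simply add, and the MTBF follows as the reciprocal.

```python
# Simplified parts-based failure-rate prediction under a series assumption.
parts = [
    # (part, quantity, base failure rate per 10^6 h, stress/environment factor)
    ("capacitor",        40, 0.05, 1.0),
    ("resistor",        120, 0.01, 0.8),
    ("connector",         6, 0.30, 1.5),
    ("microcontroller",   1, 2.50, 1.2),
]

system_rate = sum(qty * base * stress for _, qty, base, stress in parts)
mtbf_hours = 1e6 / system_rate
print(f"system failure rate: {system_rate:.2f} per 10^6 h")  # 8.66
print(f"predicted MTBF: {mtbf_hours:,.0f} h")                # about 115,473 h
```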

Software reliability is a more challenging area that must be considered when software is a significant component of system functionality.

For systems with a clearly defined failure time (which is sometimes not given for systems with a drifting parameter), the empirical distribution function of these failure times can be determined. This is done in general in an experiment with increased (or accelerated) stress. These experiments can be divided into two main categories:

Early failure rate studies determine the distribution with a decreasing failure rate over the first part of the bathtub curve. (The bathtub curve only holds for hardware failures, not software.) Here in general only moderate stress is necessary. The stress is applied for a limited period of time in what is called a censored test. Therefore, only the part of the distribution with early failures can be determined.

In so-called zero defect experiments, only limited information about the failure distribution is acquired. Here the stress, stress time, or the sample size is so low that not a single failure occurs. Due to the insufficient sample size, only an upper limit of the early failure rate can be determined. At any rate, it looks good for the customer if there are no failures.

In a study of the intrinsic failure distribution, which is often a material property, higher (material) stresses are necessary to get failure in a reasonable period of time. Several degrees of stress have to be applied to determine an acceleration model. The empirical failure distribution is often parametrized with a Weibull or a log-normal model.
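
A hedged sketch of such a parametrization (the failure times here are synthetic, drawn from an assumed Weibull model, and the fit uses a generic statistics library rather than any specific reliability tool):

```python
# Fit a two-parameter Weibull model to (synthetic) failure times.
from scipy.stats import weibull_min

true_shape, true_scale = 2.0, 1000.0            # assumed "real" Weibull parameters
failure_times = weibull_min.rvs(true_shape, scale=true_scale, size=50,
                                random_state=42)

shape, loc, scale = weibull_min.fit(failure_times, floc=0)  # location fixed at zero
print(f"estimated shape (beta): {shape:.2f}")   # should come out close to 2.0
print(f"estimated scale (eta):  {scale:.0f}")   # should come out close to 1000
```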

It is general practice to model the early (hardware) failure rate with an exponential distribution. This less complex model for the failure distribution has only one parameter: the constant failure rate. In such cases, the chi-squared distribution can be used to find the goodness of fit for the estimated failure rate. Compared to a model with a decreasing failure rate, this is quite pessimistic. (Important remark: this is not the case if fewer hours or load cycles are tested than the service life in a wear-out type of test; in that case the opposite is true, and assuming a failure rate that is more constant than it is in reality can be dangerous.) Sensitivity analysis should be conducted in this case.

Reliability test requirements

Reliability test requirements can follow from any analysis for which the first estimate of failure probability, failure mode or effect needs to be justified. Evidence can be generated with some level of confidence by testing. With software-based systems, the probability is a mix of software- and hardware-based failures. Testing reliability requirements is problematic for several reasons. A single test is in most cases insufficient to generate enough statistical data. Multiple tests or long-duration tests are usually very expensive. Some tests are simply impractical, and environmental conditions can be hard to predict over a system's life-cycle. Reliability engineering is used to design a realistic and affordable test program that provides enough evidence that the system meets its reliability requirements. Statistical confidence levels are used to address some of these concerns. A certain parameter is expressed along with a corresponding confidence level: for example, an MTBF of 1000 hours at a 90% confidence level. From this specification, the reliability engineer can, for example, design a test with explicit criteria for the number of hours and number of failures until the requirement is met or failed. Other types of tests are also possible.
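
As a hedged sketch of such a test design (the chi-squared planning relation for a time-terminated test is a standard approach, but it is an assumption here, not taken from this text), the total test time needed to demonstrate an MTBF of 1000 hours at 90% confidence can be computed for different numbers of allowed failures:

```python
# Total unit-hours of testing required to demonstrate a target MTBF at a
# given confidence level, allowing up to `allowed_failures` failures
# (time-terminated test: T = MTBF * chi2(confidence, 2c + 2) / 2).
from scipy.stats import chi2

def required_test_hours(mtbf_target, confidence, allowed_failures):
    return mtbf_target * chi2.ppf(confidence, 2 * allowed_failures + 2) / 2

for c in (0, 1, 2):
    hours = required_test_hours(1000, 0.90, c)
    print(f"allow {c} failure(s): test for about {hours:,.0f} unit-hours")
# allow 0 failure(s): ~2,300 unit-hours
# allow 1 failure(s): ~3,900 unit-hours
# allow 2 failure(s): ~5,300 unit-hours
```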

The combination of reliability parameter value and confidence level greatly affects the development cost and the risk to both the customer and producer. Care is needed to select the best combination of requirements - e.g. cost-effectiveness. Reliability testing may be performed at various levels, such as component, subsystem, and system. Also, many factors must be addressed during testing and operation, such as extreme temperature and humidity, shock, vibration, or other environmental factors (like loss of signal, cooling or power; or other catastrophes such as fire, floods, excessive heat, physical or security violations or other myriad forms of damage or degradation). Reliability engineering must assess the root cause of failures and devise corrective actions. Reliability engineering determines an effective test strategy so that all parts are exercised in relevant environments in order to assure the best possible reliability under understood conditions. For systems that must last many years, reliability engineering may be used to design accelerated life tests.

Requirements for reliability tasks

Reliability engineering must also address requirements for various reliability tasks and documentation during system development, test, production, and operation. These requirements are generally specified in the contract statement of work and depend on how much leeway the customer wishes to provide to the contractor. Reliability tasks include various analyses, planning, and failure reporting. Task selection depends on the criticality of the system as well as cost. A critical system may require a formal failure reporting and review process throughout development, whereas a non-critical system may rely on final test reports. The most common reliability program tasks are documented in reliability program standards, such as MIL-STD-785 and IEEE 1332. Failure reporting analysis and corrective action systems are a common approach for product/process reliability monitoring.

Design for reliability

Design for Reliability (DFR) is an emerging discipline that refers to the process of designing reliability into a product or system. This process encompasses several tools and practices and describes the order of their deployment that an organization needs to have in place to drive reliability and improve maintainability in products, towards an objective of improved availability, lower sustainment costs, and maximum product utilization or lifetime. Typically, the first step in the DFR process is to establish the system's availability requirements. Reliability must be "designed in" to the system. During system design, the top-level reliability requirements are then allocated to subsystems by design engineers, maintainers, and reliability engineers working together.

Reliability design begins with the development of a (system) model. Reliability models use block diagrams and fault trees to provide a graphical means of evaluating the relationships between different parts of the system. These models incorporate predictions based on parts-count failure rates taken from historical data. While the (input data) predictions are often not accurate in an absolute sense, they are valuable to assess relative differences in design alternatives.

[Figure: a fault tree diagram]

One of the most important design techniques is redundancy. This means that if one part of the system fails, there is an alternate success path, such as a backup system. The reason why this is often the ultimate design choice is that providing high-confidence reliability evidence for new parts or items is often not possible, or is extremely expensive. By creating redundancy, together with a high level of failure monitoring and the avoidance of common-cause failures, even a system with relatively poor single-channel (part) reliability can be made highly reliable (in terms of mission reliability) at system level. No reliability testing is required for this.

An automobile brake light might use two light bulbs. If one bulb fails, the brake light still operates using the other bulb. Redundancy significantly increases system reliability and is often the only viable means of doing so. However, redundancy is difficult and expensive, and is therefore limited to critical parts of the system. Another design technique, physics of failure, relies on understanding the physical processes of stress, strength and failure at a very detailed level; the material or component can then be re-designed to reduce the probability of failure. Another common design technique is component derating: selecting components whose tolerance significantly exceeds the expected stress, such as using a heavier-gauge wire than the normal specification requires for the expected electrical current. Another effective way to deal with unreliability is to perform analyses that predict degradation, so that unscheduled down events or failures can be prevented. RCM (Reliability Centered Maintenance) programs can be used for this.
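
A short hedged sketch of the arithmetic behind redundancy (the per-bulb and per-unit reliabilities are assumed values): redundant blocks combine as a parallel structure, while blocks that must all work combine in series, which is also how a reliability block diagram is evaluated.

```python
# Series and parallel reliability relations used in block-diagram evaluation.
def parallel(reliabilities):
    # Redundant blocks: the path fails only if every block fails.
    unreliability = 1.0
    for r in reliabilities:
        unreliability *= (1.0 - r)
    return 1.0 - unreliability

def series(reliabilities):
    # Series blocks: every block must survive.
    total = 1.0
    for r in reliabilities:
        total *= r
    return total

bulb = 0.95                                          # assumed single-bulb reliability
print(f"one bulb:  {bulb:.4f}")                      # 0.9500
print(f"two bulbs: {parallel([bulb, bulb]):.4f}")    # 0.9975
print(f"series of three 0.99 units: {series([0.99] * 3):.4f}")  # 0.9703
```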

Many tasks, techniques and analyses are specific to particular industries and applications. Commonly these include:

Built-in test (BIT) (Testability analysis)

Failure mode and effects analysis (FMEA)

Reliability simulation modeling

Reliability Hazard analysis

Thermal analysis

Reliability Block Diagram analysis

Fault tree analysis

Root cause analysis

Sneak circuit analysis

Accelerated Testing

Reliability Growth analysis

Weibull analysis

Electromagnetic analysis

Statistical interference

Avoid Single Point of Failure

Functional Analysis (Functional FMEA)

Predictive and Preventive maintenance: Reliability Centered Maintenance (RCM) analysis

Testability analysis

Failure diagnostics analysis

Human error analysis

Operational Hazard analysis /

Manual screening
