Message-Passing Platforms

The logical machine view of a message-passing platform consists of p processing nodes, each with its own exclusive address space. Each of these processing nodes can be either a single processor or a shared-address-space multiprocessor. Instances of such a view come naturally from clustered workstations and non-shared-address-space multicomputers. On such platforms, interactions between processes running on different nodes must be accomplished using messages, hence the name message passing. This exchange of messages is used to transfer data and work, and to synchronize actions among the processes. In its most general form, the message-passing paradigm supports the execution of a different program on each of the p nodes.

Since interactions are accomplished by sending and receiving messages, the basic operations in this programming paradigm are send and receive. In addition, since the send and receive operations must specify target addresses, there must be a mechanism to assign a unique identification or ID to each of the multiple processes executing a parallel program. This ID is typically made available to the program using a function such as who_am_i, which returns to a calling process its ID. There is one other function that is typically needed to complete the basic set of message-passing operations – num_procs, which specifies the number of processes participating in the ensemble. With these four basic operations, it is possible to write any message-passing program.
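
The four operations named above correspond directly to the primitives of real message-passing libraries. The following minimal sketch uses the MPI standard through its Python binding mpi4py (an illustrative choice, not something prescribed by the text): comm.send and comm.recv play the roles of send and receive, Get_rank plays the role of who_am_i, and Get_size plays the role of num_procs.

    # Minimal two-process exchange; assumes mpi4py is installed.
    # Run with, e.g.: mpiexec -n 2 python exchange.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    my_id = comm.Get_rank()      # who_am_i: unique ID of this process
    num_procs = comm.Get_size()  # num_procs: size of the ensemble

    if my_id == 0:
        comm.send("work item", dest=1, tag=0)   # send to process 1
        reply = comm.recv(source=1, tag=1)      # receive its reply
        print(f"process 0 of {num_procs} received: {reply}")
    elif my_id == 1:
        data = comm.recv(source=0, tag=0)       # receive from process 0
        comm.send("done: " + data, dest=0, tag=1)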

1.4. Answer the questions about the text:

1. Name the two components of parallel computing and describe their essence.

2. What are the differences between SIMD and MIMD architectures?

3. Give the classification of shared-address-space platforms. What is the basis of this classification?

4. How does a message-passing platform function?

1.5. Using information from the text, mark the following as True or False:

1. In SIMD architecture every processing element has its own control unit.

2. MIMD computers require more memory because they have one global control unit.

3. NUMA and UMA architectures are defined not in terms of cache access times but in terms of memory access times.

4. The name message passing comes from the fact that interactions between processes running on different nodes are carried out using messages.

5. There is no difference if you emulate a message-passing architecture on a shared-address-space computer or vice versa.

1.6. Get ready to speak about the types of hardware platforms.

Lesson 2

2.1 What do you know about a distributed system?

2.2 Read the text, paying attention to all the details.

Introduction to Distributed System Design

A distributed system is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate to perform a single task or a small set of related tasks.

Why build a distributed system? There are lots of advantages including the ability to connect remote users with remote resources in an open and scalable way. When we say open, we mean each component is continually open to interaction with other components. When we say scalable, we mean the system can easily be altered to accommodate changes in the number of users, resources and computing entities.

Thus, given the combined capabilities of its distributed components, a distributed system can be much larger and more powerful than a combination of stand-alone systems. But this is not easy to achieve: for a distributed system to be useful, it must be reliable. To be truly reliable, a distributed system must have the following characteristics:

a) Fault-Tolerant: It can recover from component failures without performing incorrect actions.

b) Highly Available: It can restore operations, permitting it to resume providing services even when some components have failed.

c) Recoverable: Failed components can restart themselves and rejoin the system, after the cause of failure has been repaired.

d) Consistent: The system can coordinate actions by multiple components often in the presence of concurrency and failure. This underlies the ability of a distributed system to act like a non-distributed system.

e) Scalable: It can operate correctly even as some aspect of the system is scaled to a larger size. For example, we might increase the size of the network on which the system is running. In a scalable system, this should not have a significant effect.

f) Predictable Performance: The ability to provide desired responsiveness in a timely manner.

g) Secure: The system authenticates access to data and services.

These are high standards, which are challenging to achieve. Probably the most difficult challenge is that a distributed system must be able to continue operating correctly even when components fail.

Failure is the defining difference between distributed and local programming, so you have to design distributed systems with the expectation of failure. When you design distributed systems, you have to say, "Failure happens all the time." So when you design, you design for failure. It is your number one concern.

Distributed systems design is obviously a challenging endeavor. How do we do it when we are not allowed to assume anything, and there are so many complexities? We can define some fundamental design principles which every distributed system designer and software engineer should know. Some of these may seem obvious, but it will be helpful as we proceed to have a good starting list.

a) You have to design distributed systems with the expectation of failure. Avoid making assumptions that any component in the system is in a particular state.

b) Explicitly define failure scenarios and identify how likely each one might occur. Make sure your code is thoroughly covered for the most likely ones.

c) Both clients and servers must be able to deal with unresponsive senders/receivers.

d) Think carefully about how much data you send over the network. Minimize traffic as much as possible.

e) Latency is the time between initiating a request for data and the beginning of the actual data transfer. Minimizing latency sometimes comes down to a question of whether you should make many little calls/data transfers or one big call/data transfer.

f) Don't assume that data sent across a network (or even sent from disk to disk in a rack) is the same data when it arrives. If you must be sure, do checksums or validity checks on data to verify that the data has not changed (a short sketch of such a check follows this list).

g) Caches and replication strategies are methods for dealing with state across components. We try to minimize stateful components in distributed systems, but it's challenging. State is something held in one place on behalf of a process that is in another place, something that cannot be reconstructed by any other component. If it can be reconstructed it's a cache. Caches can be helpful in mitigating the risks of maintaining state across components. But cached data can become stale, so there may need to be a policy for validating a cached data item before using it.
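
The last point can be made concrete with a tiny cache that stamps each entry with the time it was stored and treats entries older than a fixed validity window as stale. This is only a sketch; the function names and the 30-second window are illustrative assumptions, not part of the original text.

    import time

    # Illustrative cache with a simple time-to-live (TTL) validation policy:
    # an entry older than TTL_SECONDS is treated as stale and re-fetched.
    TTL_SECONDS = 30            # assumed validity window; tune per application
    _cache = {}                 # key -> (value, time the value was stored)

    def fetch_from_origin(key):
        # Hypothetical call to the component that actually owns the state.
        return "value-for-" + key

    def get(key):
        entry = _cache.get(key)
        if entry is not None:
            value, stored_at = entry
            if time.time() - stored_at < TTL_SECONDS:
                return value                    # still considered valid
        value = fetch_from_origin(key)          # missing or stale: refresh
        _cache[key] = (value, time.time())
        return value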
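
Point f) can be illustrated in the same spirit with a standard checksum (here Python's hashlib, again an assumption rather than something prescribed by the text): the sender attaches a digest of the payload, and the receiver recomputes and compares it before trusting the data.

    import hashlib

    def attach_checksum(payload):
        # Sender side: compute a SHA-256 digest of the outgoing payload.
        return payload, hashlib.sha256(payload).hexdigest()

    def verify(payload, digest):
        # Receiver side: recompute the digest and compare before using the data.
        return hashlib.sha256(payload).hexdigest() == digest

    data, digest = attach_checksum(b"rack-to-rack transfer")
    print(verify(data, digest))          # True: data arrived unchanged
    print(verify(b"corrupted", digest))  # False: data was altered in transit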

2.3 Make up ten questions to clarify any points of the text that remain unclear.

2.4 Give an oral annotation of the text, then summarize it.