while (true) {                               // read objects until get EOF
   messageParts msg = null;
   try {
      msg = (messageParts) from.readObject();  // receive messageParts object
      buffer.deposit(msg);                     // deposit messageParts object into Buffer
   }
   catch (EOFException e) { break; }
}
If buffer becomes full, method deposit() will block. This will prevent any more messageParts objects from being received until a messageParts object is withdrawn using method receive().
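The message buffer used by TCPMailbox is not reproduced in this excerpt. The following is a minimal sketch of a monitor-style bounded buffer with the blocking behavior just described; the class name, constructor, and the decision to propagate InterruptedException are assumptions made for illustration, not the book's actual code.

// Sketch of a blocking bounded buffer (hypothetical class, not the book's).
// deposit() blocks while the buffer is full; withdraw() blocks while it is empty.
public class BoundedMessageBuffer {
   private final Object[] slots;
   private int in = 0, out = 0, fullSlots = 0;

   public BoundedMessageBuffer(int capacity) { slots = new Object[capacity]; }

   // Called by a connectionHandler thread when a message arrives.
   public synchronized void deposit(Object msg) throws InterruptedException {
      while (fullSlots == slots.length)
         wait();                        // buffer full: block the depositing thread
      slots[in] = msg;
      in = (in + 1) % slots.length;
      ++fullSlots;
      notifyAll();                      // wake threads blocked in withdraw()
   }

   // Called by receive() when the destination thread asks for a message.
   public synchronized Object withdraw() throws InterruptedException {
      while (fullSlots == 0)
         wait();                        // buffer empty: block the withdrawing thread
      Object msg = slots[out];
      slots[out] = null;
      out = (out + 1) % slots.length;
      --fullSlots;
      notifyAll();                      // wake threads blocked in deposit()
      return msg;
   }
}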
There are two basic ways to use TCPSender and TCPMailbox objects. A single TCPMailbox object R can be used to receive messages from multiple senders. Each sender constructs a TCPSender object S with R’s host and port address, and executes the following operations each time it sends a message to R:
S.connect(); S.send(message); S.close();
TCPMailbox object R will receive all the messages that the sender sends to it. This connect–send–close scheme can cause a problem if many messages are sent to the same TCPMailbox object. Each execution of connect–send–close requires a new connection to be opened and closed. Operation connect() relies on the operating system to choose a local port for each new connection. After the message is sent and the connection is closed, the socket enters a TIME_WAIT state on the sender’s machine. This means that a new socket connection with the same port numbers (local and remote) and the same host addresses (local and remote) will be unavailable on the sender’s machine for a specified period of time, which is usually 4 minutes.
Without this 4-minute wait, an “old” message that was sent over socket A but that failed to arrive before socket A was closed, could eventually arrive and be mistaken as a “new” message sent on a different socket B, where B was constructed after A with the same port number and host address as A. The TIME WAIT state gives time for old messages to wash out of the network.
Of course, the operating system can choose a different local port number for each connect() operation, but the number of these “ephemeral” ports is limited and is different for different operating systems. (Each system has its own default number of ephemeral ports, which can usually be increased [Gleason 2001].) Thus, it is possible to exhaust the supply of ephemeral ports if thousands of messages are sent within 4 minutes.
The preferred way to use a TCPSender object S is to issue an S.close() operation only after all the messages have been sent. Using this scheme, an S.connect() operation appears at the beginning of the program and an S.close() operation appears at the end (refer back to Listing 6.3). All the messages sent over S will
use a single connection, so the number of ephemeral ports is less of an issue. If multiple TCPSender objects connect to the same TCPMailbox, the TCPMailbox will handle the connections concurrently.
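As a rough illustration of this pattern (not code from the text), a sender that transmits a batch of messages might be structured as follows. The TCPSender(host, port) constructor is the one used in Listing 6.6; the exception behavior of connect() and close() is assumed to match that of send().

// Sketch: connect once, send many messages over the same connection, and
// close only after the last send. Method exception signatures are assumed.
void sendBatch(String host, int port, messageParts[] messages)
      throws TCPChannelException {
   TCPSender S = new TCPSender(host, port);
   S.connect();                          // one connection for the whole batch
   try {
      for (messageParts m : messages)
         S.send(m);                      // every send reuses the same local port
   } finally {
      try { S.close(); }                 // close after all messages have been sent
      catch (Exception e) { e.printStackTrace(); }
   }
}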
6.2.2 Classes TCPSynchronousSender and TCPSynchronousMailbox
As we mentioned, TCPSender and TCPMailbox implement asynchronous channels. Synchronous channels can be created by forcing method send() to wait for an acknowledgment that the sent message has been received by the destination thread. The receive() method sends an acknowledgment when the message is withdrawn from the messageBuffer, indicating that the destination thread has received the message.
Classes TCPSynchronousSender and TCPSynchronousMailbox incorporate these changes. Methods connect(), send(), and receive() are shown in Listing 6.5, along with a portion of the connectionHandler thread. Method connect() in TCPSynchronousSender connects to the associated TCPMailbox and obtains an output stream for sending a message and an input stream for receiving an acknowledgment. Method send() sends the message and then waits for an acknowledgment. The message is received by a connectionHandler thread in the TCPMailbox. The connectionHandler deposits the message, along with the ObjectOutputStream to be used for sending the acknowledgment, into the message buffer. Method receive() in TCPSynchronousMailbox withdraws a message and its associated ObjectOutputStream from buffer and uses the
ObjectOutputStream to send a nullObject as an acknowledgment.
public void connect() throws TCPChannelException { // from class TCPSynchronousSender
   try {
      socket = new Socket(destinationHostname, destinationPort);
      to = new ObjectOutputStream(socket.getOutputStream());
      from = new ObjectInputStream(socket.getInputStream());
   } catch (Exception e) {
      e.printStackTrace();
      throw new TCPChannelException(e.getMessage());
   }
}

public void send(messageParts message) throws TCPChannelException { // from class TCPSynchronousSender
   try {
      synchronized(lock) {
         to.writeObject(message);
         to.flush();
         nullObject msg = (nullObject) from.readObject();
      }
   }
   catch (NullPointerException e) {
      e.printStackTrace();
      throw new TCPChannelException
         ("null stream - call connect() before sending messages");
   }
   catch (Exception e) {
      e.printStackTrace();
      throw new TCPChannelException(e.getMessage());
   }
}
class connectionHandler extends Thread { // from class TCPSynchronousMailbox
   ...
   // for receiving message objects from the sender
   from = new ObjectInputStream(socket.getInputStream());
   // for sending acknowledgements to the sender
   to = new ObjectOutputStream(socket.getOutputStream());
   ...
   public void run() {
      ...
      msg = (messageParts) from.readObject(); // read message from sender
      // stream to is used to send acknowledgment back to sender
      msg.to = to;
      buffer.deposit(msg); // msg will be withdrawn in receive() by the receiving
      ...                  // thread, and msg.to will be used to send an acknowledgment
   }
}
public messageParts receive() throws TCPChannelException { // from class TCPSynchronousMailbox
   try {
      synchronized(lock) {
         messageParts msg = buffer.withdraw();
         // stream to is still connected to the sending thread
         ObjectOutputStream to = msg.to;
         // receiver is not given access to the acknowledgment stream
         msg.to = null;
         to.writeObject(new nullObject()); // send null object as ack.
         return msg;
      } // end synchronized(lock)
   } catch (Exception e) {
      e.printStackTrace();
      throw new TCPChannelException(e.getMessage());
   }
}
Listing 6.5 Methods connect(), send(), and receive() in classes TCPSynchronousSender and TCPSynchronousMailbox.
6.2.3 Class TCPSelectableSynchronousMailbox
Our final TCP-based class is TCPSelectableSynchronousMailbox. As its name implies, this class enables synchronous TCP mailbox objects to be used in selective wait statements. A TCPSelectableSynchronousMailbox object is used just like a selectableEntry or selectablePort object in Chapter 5. Listing 6.6 shows bounded buffer program Buffer, which uses a selectiveWait object and TCPSelectableSynchronousMailbox objects deposit and withdraw. Distributed Producer and Consumer processes can use TCPSynchronousSender objects in the usual way to send messages to Buffer. Recall that a messageParts object contains an optional return address, which the Buffer uses to send a withdrawn item back to the Consumer.
Notice that Buffer selects deposit and withdraw alternatives in an infinite loop. One way to terminate this loop is to add a delay alternative to the selective wait, which would give Buffer a chance to time out and terminate after a period of inactivity. In general, detecting the point at which a distributed computation has terminated is not trivial, since no process has complete knowledge of the global state of the computation, and neither global time nor common memory exists in a distributed system. Alternatives to global time are described in the next section. Distributed algorithms for termination detection are presented in [Brzezinski et al. 1993]. These algorithms could be incorporated into the channel classes presented in this chapter and in Chapter 5, along with a “terminate” alternative for selective waits that would be chosen when termination is detected.
public final class Buffer {
   public static void main (String args[]) {
      final int depositPort = 2020;
      final int withdrawPort = 2021;
      final int withdrawReplyPort = 2022;
      int fullSlots = 0;
      int capacity = 2;
      Object[] buffer = new Object[capacity];
      int in = 0, out = 0;
      try {
         TCPSelectableSynchronousMailbox deposit =
            new TCPSelectableSynchronousMailbox(depositPort);
         TCPSelectableSynchronousMailbox withdraw =
            new TCPSelectableSynchronousMailbox(withdrawPort);
         String consumerHost = InetAddress.getLocalHost().getHostName();
         TCPSender withdrawReply = new TCPSender(consumerHost, withdrawReplyPort);
         selectiveWait select = new selectiveWait();
         select.add(deposit);                 // alternative 1
         select.add(withdraw);                // alternative 2
         while (true) {
            withdraw.guard(fullSlots > 0);
            deposit.guard(fullSlots < capacity);
            switch (select.choose()) {
               case 1: Object o = deposit.receive();   // item from Producer
                       buffer[in] = o;
                       in = (in + 1) % capacity; ++fullSlots; break;
               case 2: messageParts withdrawRequest = withdraw.receive();
                       messageParts m = (messageParts) buffer[out];
                       try { // send an item back to the Consumer
                          withdrawReply.send(m);
                       } catch (TCPChannelException e) { e.printStackTrace(); }
                       out = (out + 1) % capacity; --fullSlots; break;
            }
         }
      }
      catch (InterruptedException e) { e.printStackTrace(); System.exit(1); }
      catch (TCPChannelException e) { e.printStackTrace(); System.exit(1); }
      catch (UnknownHostException e) { e.printStackTrace(); }
   }
}
Listing 6.6 Bounded buffer program Buffer, which uses a selectiveWait object and TCPSelectableSynchronousMailbox objects deposit and withdraw.
6.3 TIMESTAMPS AND EVENT ORDERING
In a distributed environment, it is difficult to determine the execution order of events. This problem occurs in many contexts. For example, distributed processes that need access to a shared resource must send each other requests to obtain exclusive access to the resource. Processes can access the shared resource in the order of their requests, but the request order is not easy to determine. This is the distributed mutual exclusion problem.
Event ordering is also a critical problem during testing and debugging. When synchronization events occur during an execution, trace messages can be sent to a special controller process so that the events can be recorded for replay. But care must be taken to ensure that the event order observed by the controller process is consistent with the event order that actually occurred. Similarly, reachability testing depends on accurate event ordering to identify concurrent events and generate race variants. Since event ordering is a prerequisite for solving many
[Figure 6.8: space-time diagrams of Thread1, Thread2, Thread3, and the Controller, with trace messages for events a, b, c, and d; time increases downward.]
Figure 6.8 Observability Problems for Diagram D1.
TABLE 6.1 Effectiveness of Timestamping Mechanisms

| Observability Problems | Arrival Order | Local Real-Time Clocks | Global Real-Time Clock | Totally Ordered Logical Clocks | Partially Ordered Logical Clocks |
|------------------------|---------------|------------------------|------------------------|--------------------------------|----------------------------------|
| Incorrect orderings    | ×             | ×                      |                        |                                |                                  |
| Arbitrary orderings    | ×             | ×                      | ×                      | ×                              |                                  |

Source: Fidge [1996]. An × indicates that a problem exists.
• Arbitrary orderings. The controller observes event d occur before event b. Since d and b are concurrent events, they can occur in either order, and the order the controller will observe is nondeterministic. However, the controller cannot distinguish nondeterministic orderings from orderings that are enforced by the program. During debugging, a programmer may see that d precedes b and mistakenly conclude that d must precede b. Similarly, the programmer may feel the need to create a test case where the order of events d and b is reversed, even though this change in ordering is not significant.
To overcome these observability problems, extra information, in the form of a timestamp, must be attached to the trace messages. The controller can use the timestamps to order events accurately. Table 6.1 shows five timestamp mechanisms (including arrival order) and their ability to overcome observability problems. An × indicates that a timestamp mechanism has a particular observability problem.
Four of these timestamping mechanisms involve the use of clock values. The first two clock schemes use real-time clocks.
6.3.2 Local Real-Time Clocks
This simple scheme uses the real-time clock available on each processor as the source of the timestamp. Since the real-time clocks on different processors are
[Figure 6.9: two timestampings of diagram D1 using unsynchronized local real-time clocks. Left diagram: a = 0:02, d = 0:02, b = 0:01, c = 0:03. Right diagram: a = 0:01, d = 0:01, b = 0:02, c = 0:03.]
Figure 6.9 Timestamps using unsynchronized local real-time clocks.
[Figure 6.10: diagram D1 timestamped with a synchronized global real-time clock; in both diagrams shown, a = 0:02, d = 0:03, b = 0:04, c = 0:06.]
Figure 6.10 Timestamps using a real-time global clock.
not synchronized, incorrect orderings may be seen, and concurrent events are arbitrarily ordered. Figure 6.9 shows two ways in which the events in diagram D1 of Fig. 6.7 can be timestamped. On the left, the clock of Thread1’s processor is ahead of the clock of Thread2’s processor, so event b appears erroneously to occur before event a. The ordering of events d and b is arbitrary and depends on the relative speeds of the threads and the amount by which the processors’ real-time clocks differ.
6.3.3 Global Real-Time Clocks
If the local real-time clocks are synchronized, there is a global reference for real time. This avoids incorrect orderings, but as shown in Fig. 6.10, arbitrary orderings are still imposed on concurrent events d and b. Also, sufficiently accurate clock synchronization is difficult and sometimes impossible to achieve.
6.3.4 Causality
The remaining two schemes use logical clocks instead of real-time clocks. Logical clock schemes rely on the semantics of program operations to determine whether one event occurred before another event. For example, if events A and B are local events in the same thread and A is executed before B, then A’s logical timestamp will indicate that A happened before B. Similarly, if S is an asynchronous send event in one thread and R is the corresponding receive event in another thread,
S’s logical timestamp will indicate that S occurred before R. (More accurately, S occurred before the completion of R, since the send operation S might have occurred long after the receive operation R started waiting for a message to arrive.) It is the execution of program operations that governs the passage of logical time, not the ticking of a real-time clock.
If we consider a send operation and its corresponding receive operation to have a cause-and-effect relationship, ordering the send before the receive places the cause before the effect. It is important that any event ordering is consistent with the cause-and-effect relationships between the events. Thus, if event C can potentially cause or influence another event E, we will say that C occurs before E in causal order.
The causality or happened-before relation “→” for an execution of a message-passing program is defined as follows [Lamport 1978]:

(C1) If events e and f are events in the same thread and e occurs before f, then e → f.

(C2) If there is a message e → f (i.e., e is a nonblocking send and f is the corresponding receive), then e → f.

(C3) If there is a message e ↔ f or f ↔ e (i.e., one of e or f is a blocking send and the other is the corresponding blocking receive), then for any event g such that e → g, we have f → g, and for any event h such that h → f, we have h → e.

(C4) If e → f and f → g, then e → g. (Thus, “→” is transitive.)

It is easy to examine a space-time diagram visually and determine the causal relations. For two events e and f in a space-time diagram, e → f if and only if there is no message e ↔ f or f ↔ e and there exists a path from e to f that follows the vertical lines and arrows in the diagram. (A double-headed arrow allows a path to cross in either direction.)

For events e and f of an execution, if neither e → f nor f → e, then e and f are said to be concurrent, denoted as e||f. This also means that if there is a message e ↔ f or f ↔ e, then e and f are concurrent events. Since e||f and f||e are equivalent, the “||” relation is symmetric. However, the “||” relation is not transitive. In diagram D1, a → b, b → c, d → c, a → c, a||d, and d||b, but a and b are not concurrent events. In diagram D2, a → c, b → c, d → c, b||a, and a||d, but b and d are not concurrent events.
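As an illustration (not from the text), the visual path rule above can be expressed as a small reachability check over the events of a space-time diagram. The SpaceTimeDiagram class, its method names, and the use of string labels for events are hypothetical; only the rule itself (follow program-order segments and message arrows, allow a double-headed arrow to be crossed in either direction, and never order the two ends of a synchronous message) comes from the definitions above.

import java.util.*;

// Hypothetical helper: decide e -> f and e||f by searching for a path in the diagram.
class SpaceTimeDiagram {
   private final Map<String, Set<String>> edges = new HashMap<>();
   private final Set<String> syncPairs = new HashSet<>();   // records e <-> f pairs

   private void addEdge(String from, String to) {
      edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
   }

   // e occurs immediately before f in the same thread (a vertical line segment)
   void programOrder(String e, String f) { addEdge(e, f); }

   // asynchronous message e -> f: nonblocking send e, corresponding receive f
   void asyncMessage(String e, String f) { addEdge(e, f); }

   // synchronous message e <-> f: the arrow can be crossed in either direction
   void syncMessage(String e, String f) {
      addEdge(e, f); addEdge(f, e);
      syncPairs.add(e + "|" + f); syncPairs.add(f + "|" + e);
   }

   // true if e happened before f under rules C1-C4
   boolean happenedBefore(String e, String f) {
      if (syncPairs.contains(e + "|" + f)) return false;   // the two ends of a
                                                           // synchronous message are concurrent
      Deque<String> stack = new ArrayDeque<>();            // depth-first search for a path e -> ... -> f
      Set<String> visited = new HashSet<>();
      stack.push(e);
      while (!stack.isEmpty()) {
         String x = stack.pop();
         if (!visited.add(x)) continue;
         for (String y : edges.getOrDefault(x, Collections.<String>emptySet())) {
            if (y.equals(f)) return true;                  // found a path from e to f
            stack.push(y);
         }
      }
      return false;
   }

   // e and f are concurrent if neither happened before the other
   boolean concurrent(String e, String f) {
      return !happenedBefore(e, f) && !happenedBefore(f, e);
   }
}

Fed the events and messages of diagram D1, such a check would report a → b and d||b, matching the relations listed above.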
Since two concurrent events are not ordered (i.e., they can happen in either order), the causality relation only partially orders the events in an execution trace, but a partial order is still useful: Given a program, an input, and a partially ordered execution trace of synchronization events that is based on the causality relationship, there is only one possible result. Thus, a partially ordered trace of synchronization events that is based on the causality relation is sufficient for tracing and replaying an execution. SYN-sequence definitions for distributed programs are given in Section 6.5.
We will consider two or more executions that have the same input and the same partially ordered synchronization sequence to be equivalent executions. Furthermore, the relative order of events in an execution will be determined using a partial order, not a total order, of the execution’s events. By doing so, if we say that event a happens before event b during an execution E with a given input, a happens before b in all other executions that have the same input and the same partially ordered synchronization sequence as E.
6.3.5 Integer Timestamps
Our objective is to determine the causality relationships of the events in an execution. We will do this by using logical clocks. Logical clocks serve as a substitute for real-time clocks with respect to the causality relation. Each event receives a logical timestamp, and these timestamps are used to order the events. Consider a message-passing program that uses asynchronous communication and contains threads Thread1, Thread2, . . ., and Threadn. Threadi, 1 ≤ i ≤ n, contains a logical clock Ci, which is simply an integer variable initialized to 0. During execution, logical time flows as follows:
(IT1) Threadi increments Ci by one immediately before each event it executes.

(IT2) When Threadi sends a message, it also sends the value of Ci as the timestamp for the send event.

(IT3) When Threadi receives a message with ts as its timestamp, if ts ≥ Ci, then Threadi sets Ci to ts + 1 and assigns ts + 1 as the timestamp for the receive event. Hence, Ci = max(Ci, ts + 1).
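The following is a minimal sketch of rules IT1 to IT3 as a small Java class; the class and method names are illustrative and are not taken from the text.

// Sketch of an integer (logical) clock implementing IT1-IT3.
class IntegerClock {
   private int C = 0;                   // the logical clock Ci, initially 0

   // IT1: increment Ci immediately before an event; the result is the
   // timestamp assigned to that event (internal events and send events).
   synchronized int tick() {
      return ++C;
   }

   // IT2: a send event is timestamped by tick(), and the returned value is
   // attached to the outgoing message as its timestamp ts.
   synchronized int timestampSend() {
      return tick();
   }

   // IT3: on receiving a message carrying timestamp ts, advance the clock so
   // that the receive event is timestamped after both the local past and the
   // send event. Combined with IT1 this gives Ci = max(Ci, ts + 1).
   synchronized int timestampReceive(int ts) {
      int local = tick();               // IT1: increment before the receive event
      if (ts + 1 > local)
         C = ts + 1;                    // IT3: jump past the sender's timestamp
      return C;                         // timestamp assigned to the receive event
   }
}

The methods are synchronized only as a precaution in case a thread's clock is also touched by a helper thread (for example, a connectionHandler); a strictly thread-local clock would not need the locking.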
Denote the integer timestamp recorded for event e as IT(e), and let s and t be two events of an execution. If s → t, then IT(s) will definitely be less than IT(t). That is, the integer timestamps will never indicate that an event occurred before any other events that might have caused it. However, the converse is not true. The fact that IT(s) is less than IT(t) does not imply that s → t. If s and t are concurrent, their timestamps will be consistent with one of their two possible orderings. Thus, we cannot determine whether or not s → t by using the integer timestamps recorded for s and t.
Diagram D3 in Fig. 6.11 represents an execution of three threads that use asynchronous communication. The messages in this diagram are a → o, c → r, q → x, w → p, and z → d. Diagram D3 also shows the integer timestamp for each event. Notice that the integer timestamp for event v is less than the integer timestamp for event b, but v → b does not hold. (There is no path from v to b in diagram D3.)
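Using the IntegerClock sketch above (again, a hypothetical class rather than the book's), the same effect is easy to reproduce with two threads that never exchange messages:

IntegerClock c1 = new IntegerClock();   // Thread1's logical clock
IntegerClock c2 = new IntegerClock();   // Thread2's logical clock

int itE1 = c1.tick();                   // Thread1 executes one event: IT(e1) = 1
c2.tick();                              // Thread2 executes two events ...
int itF2 = c2.tick();                   // ... so IT(f2) = 2

// itE1 < itF2, yet e1 and f2 are concurrent: no message connects the two
// threads, so neither event happened before the other.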
Although integer timestamps cannot tell us the causality relationships that hold between the events, we can use integer timestamps to produce one or more total orders that preserve the causal order. For example, in Section 6.4.1 we show how to use integer timestamps to order the requests made when distributed processes want to enter their critical sections. The strategy for producing a total
