MESSAGE-BASED SOLUTIONS TO DISTRIBUTED PROGRAMMING PROBLEMS
}
}
public void waitForReplies() { // wait for all the other processes to reply
    while (true) {
        messageParts m = (messageParts) receiveReplies.receive();
        // ID of replying thread is available but not needed
        int receivedID = ((Integer)m.obj).intValue();
        replyCount++;
        if (replyCount == numberOfProcesses-1) break; // all replies have been received
    }
}
}
class requestMessage implements Serializable {
    public int ID;     // process ID
    public int number; // sequence number
    public requestMessage(int number, int ID) { this.ID = ID; this.number = number; }
}
Listing 6.16 (continued)
object owned by one of the distributed processes. All messages sent through the TCPSender object are addressed to the associated TCPMailbox. This allows us to address messages once, when we construct the TCPSender objects, instead of specifying an address each time we send a message.
Each distributedProcess uses TCPMailbox objects named receiveRequests and receiveReplies to receive request and reply messages from the other processes. Connections between all the TCPSender and TCPMailbox objects are made by calling the connect() method of each TCPSender object at the start of the run() method for each distributedProcess.
We assume that all three distributedMutualExclusion programs run on the same computer. The port numbers used for TCPMailbox objects receiveRequests and receiveReplies are as follows: distributedProcess0 uses 2020 and 2021 for its two TCPMailbox objects, distributedProcess1 uses 2022 and 2023 for its two TCPMailbox objects, and distributedProcess2 uses 2024 and 2025 for its two TCPMailbox objects. So, for example, when distributedProcess0 sends requests to the other two processes, it sends them to ports 2022 and 2024. Requests from the other two processes to distributedProcess0 are addressed to port 2020, while replies from the other two processes to distributedProcess0 are addressed to 2021. Replies from distributedProcess0 to distributedProcess1 and distributedProcess2 are addressed to ports 2023 and 2025, respectively. Thus, each port number P is associated with one TCPMailbox object, which is used by one of the distributedProcesses to receive messages. The other two distributedProcesses use TCPSender objects associated with P to send messages to the TCPMailboxes associated with P.
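The port assignment above follows a simple pattern, which the following sketch makes explicit. The class PortPlan and its method names are hypothetical and do not appear in the listing; the only facts taken from the text are the six port numbers and which mailbox each one serves.

```java
// Hypothetical helper: class and method names are illustrative only.
final class PortPlan {
    static final int BASE_PORT = 2020; // first port used in the example above

    // Port of the receiveRequests mailbox owned by process `id`.
    static int requestPort(int id) { return BASE_PORT + 2 * id; }

    // Port of the receiveReplies mailbox owned by process `id`.
    static int replyPort(int id) { return BASE_PORT + 2 * id + 1; }

    public static void main(String[] args) {
        for (int id = 0; id < 3; id++) {
            System.out.println("distributedProcess" + id
                + ": requests on " + requestPort(id)
                + ", replies on " + replyPort(id));
        }
    }
}
```

With such a helper, a process would construct its TCPSender objects by looking up the request and reply ports of each of the other two processes.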
When a distributedProcess wants to enter its critical section in method run(), it computes a sequence number and sets flag requestingOrExecuting to true. It then sends a request to each of the other processes and waits for each of the processes to reply.
Each distributedProcess has a Helper thread that handles requests received from the other processes [Hartley 1998]. If the Helper for distributedProcess i receives a requestMessage from distributedProcess j, the Helper replies immediately if the sequence number in j’s request is less than the sequence number stored at i or if distributedProcess i is not trying to enter its critical section. The Helper defers the reply if distributedProcess i is in its critical section, or if distributedProcess i wants to enter its critical section and the requestMessage from distributedProcess j has a higher sequence number. If the sequence numbers are the same, the tie is broken by comparing process identifiers. (Each request message contains a sequence number and the identifier of the sending process.)
When a distributedProcess sends its request, it computes a sequence number by adding one to the highest sequence number it has received in requests from other processes. Class Coordinator is a monitor that synchronizes a distributedProcess thread and its Helper thread. A sample execution for this program is given in Section 6.5.4, where we show how to trace and replay its executions.
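The Helper's decision and the choice of sequence number can be sketched as below. This is not the Coordinator monitor from the listing: the class DeferralRule and its members are hypothetical names, and the tie-breaking direction (the lower process ID wins) is an assumption, since the text says only that identifiers are compared.

```java
// Sketch of the decision rule only; class and member names are hypothetical.
final class DeferralRule {
    private final int myId;
    private int myNumber = 0;        // sequence number of our own pending request
    private int highestSeen = 0;     // highest sequence number received in any request
    private boolean requestingOrExecuting = false;

    DeferralRule(int myId) { this.myId = myId; }

    // Called when this process wants to enter its critical section.
    synchronized int chooseNumberAndRequest() {
        requestingOrExecuting = true;
        myNumber = highestSeen + 1;
        return myNumber;
    }

    // Called by the helper for each incoming request; true means "defer the reply".
    synchronized boolean shouldDefer(int number, int id) {
        highestSeen = Math.max(highestSeen, number);
        // Assumption: on equal sequence numbers the lower ID has priority.
        boolean theirsIsLater = number > myNumber || (number == myNumber && id > myId);
        return requestingOrExecuting && theirsIsLater;
    }

    // Called on leaving the critical section; deferred replies would be sent here.
    synchronized void release() { requestingOrExecuting = false; }

    public static void main(String[] args) {
        DeferralRule p1 = new DeferralRule(1);
        System.out.println("idle, defer? " + p1.shouldDefer(5, 2));
        System.out.println("own number: " + p1.chooseNumberAndRequest());
        System.out.println("later request, defer? " + p1.shouldDefer(7, 0));
    }
}
```

The methods are synchronized because the process thread and its Helper thread would both touch this state, which is the role the Coordinator monitor plays in the program.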
6.4.2 Distributed Readers and Writers
Here we show how to solve the distributed readers and writers problem by using message passing. The strategy we implement is R = W.2, which allows concurrent reading and gives readers and writers equal priority. Mutual exclusion is provided using the permission-based distributed mutual exclusion algorithm described earlier. When a process wants to perform its read or write operation, it sends a request to each of the other processes and waits for replies. A request consists of the same pair of values (sequence number and ID) used in the mutual exclusion algorithm, along with a flag that indicates the type of operation (read or write) being requested. When process i receives a request from process j, process i sends j an immediate reply if:
• Process i is not executing or requesting to execute its read or write operation.
• Process i is executing or requesting to execute a “compatible” operation. Two read operations are compatible, but two write operations, or a read and a write operation, are not compatible.
• Process i is also requesting to execute a noncompatible operation, but process j’s request has priority over process i’s request.
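The three conditions can be written as a single predicate, as in the following sketch. The class ReplyRule, its enums, and its method names are hypothetical; priority is assumed to mean a lower sequence number, with ties going to the lower process ID, as in the mutual exclusion sketch's assumption.

```java
// Sketch of the immediate-reply test; all names here are hypothetical.
final class ReplyRule {
    enum Op { READ, WRITE }
    enum State { IDLE, REQUESTING, EXECUTING }

    // Two operations are compatible only when both are reads.
    static boolean compatible(Op a, Op b) { return a == Op.READ && b == Op.READ; }

    // True when request (number, id) has priority over (otherNumber, otherId).
    // Assumption: lower sequence number wins, ties go to the lower ID.
    static boolean hasPriority(int number, int id, int otherNumber, int otherId) {
        return number < otherNumber || (number == otherNumber && id < otherId);
    }

    // Decides whether process i replies immediately to process j's request.
    static boolean replyImmediately(State iState, Op iOp, int iNumber, int iId,
                                    Op jOp, int jNumber, int jId) {
        if (iState == State.IDLE) return true;       // first condition
        if (compatible(iOp, jOp)) return true;       // second condition
        return iState == State.REQUESTING            // third condition
            && hasPriority(jNumber, jId, iNumber, iId);
    }

    public static void main(String[] args) {
        System.out.println(replyImmediately(State.EXECUTING, Op.READ, 3, 0, Op.READ, 4, 1));
        System.out.println(replyImmediately(State.EXECUTING, Op.READ, 3, 0, Op.WRITE, 2, 1));
    }
}
```

When none of the conditions holds, the reply is deferred until process i finishes its own operation.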
Program distributedReadersAndWriters is almost identical to program distributedMutualExclusion in Listing 6.16, so here we will only show the differences. First, each distributedProcess is either a reader or a writer. The user indicates the type of process and the process ID when the program is started:
MESSAGE PASSING IN DISTRIBUTED PROGRAMS
6.4.3 Alternating Bit Protocol
The Alternating Bit Protocol (ABP) is designed to ensure the reliable transfer of data over an unreliable communication medium [Bartlett et al. 1969]. The name of the protocol refers to the method used: messages are sent tagged with the bits 1 and 0 alternately, and these bits are also sent as acknowledgments.
Listing 6.18 shows classes ABPSender and ABPReceiver and two client threads. Thread client1 is a source of messages for an ABPSender thread called sender. The sender receives messages from client1 and sends the messages to an ABPReceiver thread called receiver. The receiver thread passes each message it receives to thread client2, which displays the message. We assume that messages sent between an ABPSender and an ABPReceiver will not be corrupted, duplicated, or reordered, but they may be lost. The ABP will handle the detection and retransmission of lost messages.
An ABPSender S works as follows. After accepting a message from its client, S sends the message and sets a timer. To detect when the medium has lost a message, S also appends a 1-bit sequence number (initially 1) to each message it sends out. There are then three possibilities:
• S receives an acknowledgment from ABPReceiver R with the same sequence number. If this happens, the sequence number is incremented (modulo 2), and S is ready to accept the next message from its client.
• S receives an acknowledgment with the wrong sequence number. In this case S resends the message (with the original sequence number), sets a timer, and waits for another acknowledgment from R.
• S gets a timeout from the timer while waiting for an acknowledgment. In this case, S resends the message (with the original sequence number), sets a timer, and waits for an acknowledgment from R.
An ABPReceiver R receives a message and checks that the message has the expected sequence number (initially, 1). There are two possibilities:
• R receives a message with a sequence number that matches the sequence number that R expects. If this happens, R delivers the message to its client and sends an acknowledgment to S. The acknowledgment contains the same sequence number that R received. R then increments the expected sequence number (modulo 2) and waits for the next message.
• R receives a message but the sequence number does not match the sequence number that R expects. In this case, R sends S an acknowledgment that contains the sequence number that R received (i.e., the unexpected number) and then waits for S to resend the message.
Note that in both cases, the acknowledgment sent by R contains the sequence number that R received.
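The sender and receiver rules can be exercised in a small single-threaded simulation, sketched below. It is not the ABPSender and ABPReceiver code of Listing 6.18: the class AbpSimulation, the seeded Random that stands in for the lossy medium, and the treatment of every loss as an immediate timeout are all assumptions made for illustration. Because the simulation is synchronous, a stale acknowledgment with the wrong sequence number never arises, so only the first and third sender cases are exercised.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical single-threaded model of ABP over a medium that loses transmissions.
final class AbpSimulation {
    // Receiver rules: deliver only messages carrying the expected bit.
    static final class Receiver {
        int expected = 1;                         // initially 1, as in the text
        final List<String> delivered = new ArrayList<>();

        int onMessage(String payload, int seq) {
            if (seq == expected) {
                delivered.add(payload);
                expected = (expected + 1) % 2;
            }
            return seq;                           // the ack echoes the number received
        }
    }

    // Transfers all messages; each transmission is lost with probability lossRate (< 1).
    static List<String> transfer(List<String> messages, double lossRate, long seed) {
        Random medium = new Random(seed);
        Receiver receiver = new Receiver();
        int seq = 1;
        for (String payload : messages) {
            boolean acknowledged = false;
            while (!acknowledged) {
                if (medium.nextDouble() < lossRate) continue; // message lost: time out, resend
                int ack = receiver.onMessage(payload, seq);
                if (medium.nextDouble() < lossRate) continue; // ack lost: time out, resend
                acknowledged = (ack == seq);
            }
            seq = (seq + 1) % 2;                  // alternate the bit for the next message
        }
        return receiver.delivered;
    }

    public static void main(String[] args) {
        System.out.println(transfer(Arrays.asList("a", "b", "c"), 0.3, 42L));
    }
}
```

When an acknowledgment is lost, the retransmitted message carries a bit the receiver no longer expects, so it is acknowledged again but not delivered a second time; this is why the delivered list matches the input regardless of the loss pattern.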
Communication between the sender and its client, and the receiver and its client, is through shared link channels. Class link was presented in Chapter 5.
messageMailbox = new TCPUnreliableMailbox(receiverPort);
ackSender = new TCPSender(senderHost,senderPort);
while (true) {
    // receive message from sender
    messageParts msg = (messageParts) messageMailbox.receive();
    abpPacket packet = (abpPacket) msg.obj;
    if (packet.sequenceNumber == expectedSequenceNumber) {
        deliver.send(packet.obj); // deliver message to client 2
        msg = new messageParts(new Integer(packet.sequenceNumber));
        ackSender.send(msg); // send ack to sender
        expectedSequenceNumber = (expectedSequenceNumber+1)%2;
    }
    else {
        msg = new messageParts(new Integer(packet.sequenceNumber));
        ackSender.send(msg); // resend ack to sender
    }
}
}
catch (UnknownHostException e) {e.printStackTrace();}
catch (TCPChannelException e) {e.printStackTrace();}
}
}
class abpPacket implements Serializable {
    public abpPacket(Object obj,int sequenceNumber) {
        this.obj = obj; this.sequenceNumber = sequenceNumber;
    }
    public Object obj;         // message
    public int sequenceNumber; // sequence number: 0 or 1
}
Listing 6.18 (continued)
The sender and receiver threads communicate using TCPSender and TCPMailbox objects, and a selectable asynchronous mailbox called TCPSelectableMailbox. The delayAlternative in the sender’s selective wait statement allows the sender to time out while it is waiting to receive an acknowledgment from the receiver.
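The effect of waiting for an acknowledgment with a timeout can be illustrated with the standard library alone. The sketch below does not use TCPSelectableMailbox or delayAlternative; it substitutes java.util.concurrent.BlockingQueue, whose timed poll returns null when nothing arrives in time. The class AckWait and its method are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Stand-in for a timed wait on an acknowledgment mailbox; names are hypothetical.
final class AckWait {
    // Waits up to timeoutMillis for an acknowledgment; returns null on timeout.
    static Integer awaitAck(BlockingQueue<Integer> acks, long timeoutMillis) {
        try {
            return acks.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // preserve the interrupt status
            return null;                          // treat interruption like a timeout
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> acks = new LinkedBlockingQueue<>();
        System.out.println(awaitAck(acks, 50) == null ? "timeout: resend" : "ack received");
        acks.offer(1);                            // an acknowledgment arrives
        System.out.println(awaitAck(acks, 50) == null ? "timeout: resend" : "ack received");
    }
}
```

A sender built this way would resend its message whenever awaitAck returns null, which corresponds to the timeout case of the protocol.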
To simulate the operation of ABP over an unreliable medium, we have modified classes TCPMailbox and TCPSelectableMailbox so that they randomly discard messages and acknowledgments sent between the sender and receiver. These unreliable mailbox classes are called TCPUnreliableMailbox and TCPUnreliableSelectableMailbox. Figure 6.19 shows one possible flow of messages through the ABP program. In this scenario, no message or acknowledgment is lost. In Figure 6.20, the ABPSender’s message is lost. In this case, the ABPSender receives a timeout from its timer and resends the message. In Figure 6.21, the ABPSender’s message is not lost, but the ABPSender still receives a timeout before it receives an acknowledgment. In this case, the ABPReceiver delivers and acknowledges the first message but only acknowledges the second. The ABPSender receives two acknowledgments, but it ignores the second one.
Figure 6.19 ABP scenario in which no messages or acknowledgments are lost. (Sequence diagram among Client 1, ABPSender, ABPReceiver, and Client 2: accept msg, send msg, receive msg, deliver, send ack, receive ack.)
Figure 6.20 ABP scenario with a lost message. (Sequence diagram among Client 1, ABPSender, ABPReceiver, and Client 2: accept msg, send msg, lost, timeout, resend msg, receive msg, deliver, send ack, receive ack.)
Figure 6.21 ABP scenario with a spurious timeout. (Sequence diagram among Client 1, ABPSender, ABPReceiver, and Client 2: the ABPSender times out and resends a message that was not lost; the ABPReceiver receives the duplicate, does not deliver it, and acknowledges it again.)
6.5 TESTING AND DEBUGGING DISTRIBUTED PROGRAMS
A tracing and replay technique for shared memory channels was presented in Chapter 5. That same technique can be applied here with certain modifications that are necessary in a distributed environment. We begin by defining SYN-sequences for distributed Java programs that use classes TCPSender and TCPMailbox for message passing. Afterward, we present solutions to the tracing, replay, and feasibility problems. We did not implement reachability testing for the TCPSender and TCPMailbox classes. The overhead of running in a distributed environment, including the cost of opening and closing connections and detecting termination, would slow down reachability testing significantly. Still, it could be done since tracing, replay, and timestamping have already been implemented. On the other hand, it is easy to transform a program that uses the TCP classes into one that uses the channel classes in Chapter 5. This allows distributed algorithms to be tested in a friendlier environment.
Let DP be a distributed program. We assume that DP consists of multiple Java programs and that there are one or more Java programs running on each node in the system. Each Java program contains one or more threads. These threads use TCPSender and TCPMailbox objects to communicate with threads in other programs and possibly on other nodes. Threads in the same Java program can communicate and synchronize using shared variables and the channel, semaphore, or monitor objects presented in previous chapters. Shared variable communication and synchronization can be traced, tested, and replayed using the techniques described previously. Here, we focus on message passing between programs on different nodes.
6.5.1 Object-Based Sequences
The SYN-sequence definitions in this chapter are similar to the definitions presented in Chapter 5 for shared channel objects. We begin by defining an object-based SYN-sequence, which means that there is one SYN-sequence for each synchronization object in the program. The synchronization objects in a distributed Java program DP are its TCPMailbox objects. The threads in DP execute synchronization events of the following four types:
• Connection. A connection is created between a TCPSender object and its associated TCPMailbox object by calling connect() on the TCPSender. (The host address and port number of a TCPMailbox are associated with a TCPSender object when the TCPSender object is constructed.)
• Arrival. A message M is sent by a thread that calls operation send(M) on a TCPSender object. The arrival of message M at the corresponding TCPMailbox occurs some time after M is sent. When message M arrives, it is queued in the message buffer of the TCPMailbox object.
• Receive. A message is received by a thread that calls operation receive() on a TCPMailbox object. A receive() operation withdraws and returns a message from the message buffer of the TCPMailbox object.
