
TICKET-BASED SOLUTIONS TO THE n-THREAD CRITICAL SECTION PROBLEM
Many machines provide special hardware instructions that allow us to test and modify a value, or add two values, atomically. These instructions can be used directly or they can be used to implement atomic functions. The Win32 API provides a family of atomic functions called the Interlocked functions. Function InterlockedExchangeAdd can be used to implement the ticket algorithm. The prototype for this function is
long InterlockedExchangeAdd(long* target, long increment);
A call such as
oldValueOfX = InterlockedExchangeAdd(&x, increment);
atomically adds the value of increment to x and returns the old value of x (i.e., its value before increment was added). InterlockedExchangeAdd is used in the ticket algorithm as follows:
number[i] = InterlockedExchangeAdd(&next, 1).
This is equivalent to the following critical section:
number[i] = next;      // these statements are executed
next = next + 1;       // as a critical section
On Intel processors (starting with the Intel 486 processor), function InterlockedExchangeAdd can be implemented using the atomic XADD (Exchange and Add) instruction. The XADD instruction also serves as a memory barrier (see Section 2.5.6) to ensure that threads see consistent values for the shared variables. J2SE 5.0 provides package java.util.concurrent.atomic. This package supports operations such as getAndIncrement() and getAndSet(), which are implemented using the machine-level atomic instructions (such as XADD) that are available on a given processor (see Exercise 2.13).
In the ticket algorithm, each thread Ti, 0 ≤ i ≤ n − 1, executes the following code:
volatile long next = 1;     // next ticket number to be issued to a thread
volatile long permit = 1;   // ticket number permitted to enter critical section

while (true) {
   number[i] = InterlockedExchangeAdd(&next, 1);   (1)
   while (number[i] != permit) {;}                 (2)
   critical section                                (3)
   ++permit;                                       (4)
   noncritical section                             (5)
}

THE CRITICAL SECTION PROBLEM
This algorithm is very simple, but it requires special machine instructions to implement InterlockedExchangeAdd or a similar atomic function. Another shortcoming of this algorithm is that the values of permit and next grow without bounds.
2.2.2 Bakery Algorithm
As in the ticket algorithm, the bakery algorithm [Lamport 1974] allows threads to enter their critical sections in ascending order of their ticket numbers. But unlike the ticket algorithm, it does not require special hardware instructions. The tickets in the bakery algorithm are a bit more complicated. Each thread Ti, 0 ≤ i ≤ n − 1, gets a ticket with a pair of values (number [i ],i ) on it. The value number [i] is the ticket number, and i is the ID of the thread. Since each ticket contains a pair of values, a special comparison is used to order the tickets. If two threads have the same ticket numbers, the IDs are used to break the tie. That is, for two tickets (a,b) and (c,d), define
Ticket (a,b) < Ticket (c,d) if a < c or (a == c and b < d).
First, we show a simpler but incorrect version of the bakery algorithm to explain the basic ideas. Each thread Ti, 0 ≤ i ≤ n − 1, executes the code below. Initially, all elements of array number have the value 0.
while (true) {
   number[i] = max(number) + 1;              (1)
   for (int j = 0; j < n; j++)               (2)
      while (j != i && number[j] != 0 &&     (3)
             (number[j],j) < (number[i],i)) {;}
   critical section                          (4)
   number[i] = 0;                            (5)
   noncritical section                       (6)
}
In statement (1), the call to max(number ) returns the maximum value in array number. This maximum value is incremented by 1, and the result is used as the ticket number. Since different threads may execute (1) at the same time, thread Ti may obtain the same ticket number as another thread. As we mentioned above, thread IDs are used to break ties when the ticket numbers are the same.
In the for-loop, thread Ti compares its ticket with the ticket of each of the other threads. If some other thread Tj intends to enter its critical section and it has a ticket with a value that is less than Ti’s ticket, Ti waits until Tj exits its critical section. (If Tj tries to enter its critical section again, its new ticket will have a value higher than Ti’s ticket and thus Tj will have to wait.) When Ti completes the loops in (2) and (3), no other thread is in its critical section. Also, thread Ti
is the only thread that can enter its critical section since any other threads that are in their entry-sections will have higher ticket numbers.
This algorithm does not satisfy the mutual exclusion requirement, as illustrated by the following sequence:
T0      T1      Comments

(1)             T0 evaluates max(number) + 1, which is 1, but a
                context switch occurs before assigning 1 to number[0]
        (1)     T1 sets number[1] to max(number) + 1, which is 1
        (2)     T1 starts its for-loop
        (3)     T1 exits its while- and for-loops
        (4)     T1 enters its critical section
(1)             T0 assigns 1 (not 2) to number[0]
(2)             T0 starts its for-loop
(3)             T0 exits its while- and for-loops since
                number[0] == number[1] == 1, and
                (number[0],0) < (number[1],1)
(4)             T0 enters its critical section while T1 is in its
                critical section
To fix this problem, once thread Ti starts executing statement (1), number[i] should not be accessed by other threads during their execution of statement (3) until Ti finishes executing statement (1).
The complete bakery algorithm is given below. It uses the following global array:
volatile bool choosing[n];
Initially, all elements of choosing have the value false. If choosing[i] is true, thread Ti is in the process of choosing its ticket number at statement (2):
while (true) {
   choosing[i] = true;                       (1)
   number[i] = max(number) + 1;              (2)
   choosing[i] = false;                      (3)
   for (int j = 0; j < n; j++) {             (4)
      while (choosing[j]) {;}                (5)
      while (j != i && number[j] != 0 &&     (6)
             (number[j],j) < (number[i],i)) {;}
   }
   critical section                          (7)
   number[i] = 0;                            (8)
   noncritical section                       (9)
}
A formal proof of the bakery algorithm is given in [Lamport 1974]. Here we consider three important cases:
1. Assume that one thread, say Ti, intends to enter its critical section and no other thread is in its critical section or entry-section. Then number[i] is 1 and number[j], where j ≠ i, is 0. Thus, Ti enters its critical section immediately.
2. Assume that one thread, say Ti, intends to enter its critical section and Tk, k ≠ i, is in its critical section. Then at statement (6), number[k] != 0 and number[k] < number[i]. Thus, Ti is delayed at (6) until Tk executes statement (8).
3. Two or more threads intend to enter their critical sections and no other thread is in its critical section. Assume that Tk and Tm, where k < m, intend to enter. Consider the possible relationships between number[k] and number[m]:
   • number[k] < number[m]. Tk enters its critical section since (number[k],k) < (number[m],m).
   • number[k] == number[m]. Tk enters its critical section since (number[k],k) < (number[m],m).
   • number[k] > number[m]. Tm enters its critical section since (number[m],m) < (number[k],k).
Thus, array choosing solves the problem that stems from the nonatomic arithmetic expression in statement (2).
The bakery algorithm satisfies the mutual exclusion, progress, and bounded waiting requirements. However, the values in number grow without bound. Lamport [1974] showed how a practical upper bound can be placed on these values. He also showed that the bakery algorithm can be made to work even when read and write operations are not atomic (i.e., when the read and write operations on a variable may overlap).
2.3 HARDWARE SOLUTIONS TO THE n-THREAD CRITICAL SECTION PROBLEM
In this section we show how to use Win32 function InterlockedExchange to solve the n-process critical section problem. Here is the function prototype for the Win32 function InterlockedExchange:
long InterlockedExchange(long* target, long newValue);
The InterlockedExchange function atomically exchanges a pair of 32-bit values and behaves like the following atomic function:

long InterlockedExchange(long* target, long newValue) {
   // executed atomically
   long temp = *target;
   *target = newValue;
   return temp;
}
Like InterlockedExchangeAdd(), this function also generates a memory barrier instruction.
2.3.1 Partial Solution
This solution uses InterlockedExchange to guarantee mutual exclusion and progress but not bounded waiting. Shared variable lock is initialized to 0:
volatile long lock = 0;
Each thread executes
while (true) {
   while (InterlockedExchange(const_cast<long*>(&lock), 1) == 1) {;}   (1)
   critical section                                                    (2)
   lock = 0;                                                           (3)
   noncritical section                                                 (4)
}
If the value of lock is 0 when InterlockedExchange is called, lock is set to 1 and InterlockedExchange returns 0. This allows the calling thread to drop out of its while-loop and enter its critical section. While this thread is in its critical section, calls to InterlockedExchange will return the value 1, keeping the calling threads delayed in their while-loops.
When a thread exits its critical section, it sets lock to 0. This allows one of the delayed threads to get a 0 back in exchange for a 1. This lucky thread drops out of its while-loop and enters its critical section. Theoretically, an unlucky thread could be delayed forever from entering its critical section. However, if critical sections are small and contention among threads for the critical sections is low, unbounded waiting is not a real problem.
2.3.2 Complete Solution
This solution to the n-process critical section satisfies all three correctness requirements. The global array waiting is used to indicate that a thread is waiting to enter its critical section.
volatile bool waiting[n];
Initially, all elements of waiting are false. If waiting[i] is true, thread Ti, 0 ≤ i ≤ n − 1, is waiting to enter its critical section. Each thread Ti executes the following code:

volatile bool waiting[n];   // initialized to false
volatile long lock, key;    // initialized to 0

while (true) {
   waiting[i] = true;                                             (1)
   key = 1;                                                       (2)
   while (waiting[i] && key) {                                    (3)
      key = InterlockedExchange(const_cast<long*>(&lock), 1);     (4)
   }                                                              (5)
   waiting[i] = false;                                            (6)
   critical section                                               (7)
   j = (i+1) % n;                                                 (8)
   while ((j != i) && !waiting[j]) { j = (j+1) % n; }             (9)
   if (j == i)                                                    (10)
      lock = 0;                                                   (11)
   else                                                           (12)
      waiting[j] = false;                                         (13)
   noncritical section                                            (14)
}
In statement (1), thread Ti sets waiting[i] to true to indicate that it is waiting to enter its critical section. Thread Ti then stays in the while-loop until either InterlockedExchange returns 0 or waiting[i] is set to false by another thread when that thread exits its critical section.
When thread Ti exits its critical section, it uses the while-loop in statement (9) to search for a waiting thread. Thread Ti starts its search by examining waiting[(i+1) % n]. If the while-loop terminates with j == i, no threads are waiting; otherwise, thread Tj is a waiting thread and waiting[j] is set to false to let Tj exit the while-loop at statement (3) and enter its critical section.
2.3.3 Note on Busy-Waiting
All of the solutions to the critical section problem that we have seen so far use busy-waiting —a waiting thread executes a loop that maintains its hold on the CPU. Busy-waiting wastes CPU cycles. To reduce the amount of busy-waiting, some type of sleep instruction can be used. In Win32, execution of
Sleep(time);
releases the CPU and blocks the executing thread for time in milliseconds. In Java, a thread sleeps by executing Thread.sleep(time) for time in milliseconds, while Pthreads uses sleep(time) for time in seconds. Executing a sleep statement
results in a context switch that allows another thread to execute. (We use the term blocked to describe a thread that is waiting for some event to occur, such as the expiration of a timer or the completion of an I/O statement. Such a thread is not running and will not run until the target event occurs and the operating system schedules the thread to run.) The amount of busy waiting in the while-loop in statement (3) of the complete solution can be reduced as follows:
while (waiting[i] && key) {
   Sleep(100);   // release the CPU
   key = InterlockedExchange(const_cast<long*>(&lock), 1);
}
If contention among threads for the critical section is low and critical sections are small, there may be little chance that a thread will execute the while-loop for more than a few iterations. In such cases it may be more efficient to use busy-waiting and avoid the time-consuming context switch caused by executing a sleep statement.
A slightly different version of this code can be used to solve a potential performance problem on multiprocessor systems with private caches and cache-coherence protocols that allow shared writable data to exist in multiple caches. For example, consider the case where two processors are busy-waiting on the value of lock in the complete solution above. When a waiting processor modifies lock in statement (4), it causes the modified lock to be invalidated in the other processor’s cache. As a result, the value of lock can bounce repeatedly from one cache to the other [Dubois et al. 1988].
Instead of looping on the call to InterlockedExchange in (3), which causes the ping-pong effect, we can use a simpler loop that waits for lock to become 0:
while (waiting[i] && key) {                                        (3)
   while (lock) {;}   // wait for lock to be released              (4)
   // try to grab lock
   key = InterlockedExchange(const_cast<long*>(&lock), 1);         (5)
}
When a processor wants to acquire the lock, it spins locally in its cache without modifying and invalidating the lock variable. When the lock is eventually released, the function InterlockedExchange is still used, but only to attempt to change the value of lock from 0 to 1. If another thread happens to “steal” the lock between statements (4) and (5), looping will continue at statement (3).
In general, the performance of busy-waiting algorithms depends on the number of processors, the size of the critical section, and the architecture of the system. Mellor-Crummey and Scott [1991] described many ways to fine-tune the performance of busy-waiting algorithms (see Exercise 2.13).
2.4 DEADLOCK, LIVELOCK, AND STARVATION
We have identified three correctness requirements for solutions to the critical section problem: mutual exclusion, progress, and bounded waiting. Different concurrent programming problems have different correctness requirements. However, one general requirement that is often not explicitly stated is the absence of deadlock, livelock, and starvation. In this section we explain this requirement and give examples of programs that violate it. A more formal definition of this requirement is presented in Section 7.3.3.
2.4.1 Deadlock
A deadlock requires one or more threads to be blocked forever. As we mentioned above, a thread is blocked if it is not running and it is waiting for some event to occur. Sleep statements block a thread temporarily, but eventually the thread is allowed to run again. In later chapters we will see other types of statements that can permanently block the threads that execute them. For example, a thread that executes a receive statement to receive a message from another thread will block until the message arrives, but it is possible that a message will never arrive.
Let CP be a concurrent program containing two or more threads. Assume that there is an execution of CP that exercises an execution sequence S, and at the end of S, there exists a thread T that satisfies these conditions:
• T is blocked due to the execution of a synchronization statement (e.g., waiting to receive a message).
• T will remain blocked forever, regardless of what the other threads will do.
Thread T is said to be deadlocked at the end of S, and CP is said to have a deadlock. A global deadlock refers to a deadlock in which all nonterminated threads are deadlocked.
As an example, assume that CP contains threads T1 and T2 and the following execution sequence is possible:
• T1 blocks waiting to receive a message from T2.
• T2 blocks waiting to receive a message from T1.
Both T1 and T2 will remain blocked forever since neither thread is able to send the message for which the other thread is waiting.
2.4.2 Livelock
We assume that some statements in CP are labeled as progress statements, indicating that threads are expected eventually to execute these statements. Statements that are likely to be labeled as progress statements include the last statement of a thread, the first statement of a critical section, or the statement immediately
following a loop. If a thread executes a progress statement, it is considered to be making progress.
Assume that there is an execution of CP that exercises an execution sequence S, and at the end of S there exists a thread T that satisfies the following conditions, regardless of what the other threads will do:
• T will not terminate or deadlock.
• T will never make progress.
Thread T is said to be livelocked at the end of S, and CP is said to have a livelock. Livelock is the busy-waiting analog of deadlock. A livelocked thread is running, not blocked, but it will never make progress.
Incorrect solution 2 in Section 2.1.2 has an execution sequence that results in a violation of the progress requirement for solutions to the critical section problem. (This progress requirement should not be confused with the general requirement to make progress. The latter is used to indicate the absence of livelock and can be required for solutions to any problem.) Below is a prefix of this execution sequence:
• T0 executes (1), (2), (3), and (4). Now turn is 1.
• T1 executes (1), (2), and (3) and then terminates in its noncritical section. Now turn is 0.
• T0 executes (1), (2), (3), and (4), making turn 1, and executes its while-loop at (1).
At the end of this sequence, T0 is stuck in a busy-waiting loop at (1) waiting for turn to become 0. T0 will never enter its critical section (i.e., never make progress). Thus, T0 is livelocked.
2.4.3 Starvation
Assume that CP contains an infinite execution sequence S satisfying the following three properties:
1. S ends with an infinite repetition of a fair cycle of statements. (A cycle of statements in CP is said to be fair if each nonterminated thread in CP is either always blocked in the cycle or is executed at least once. Nonfair cycles are not considered here, since such cycles cannot repeat forever when fair scheduling is used.)
2. There exists a nonterminated thread T that does not make progress in the cycle.
3. Thread T is neither deadlocked nor livelocked in the cycle. In other words, when CP reaches the cycle of statements that ends S, CP may instead execute a different sequence S′ such that T makes progress or terminates in S′.
Thread T is said to be starved in the cycle that ends S, and CP is said to have a starvation. When CP reaches the cycle of statements at the end of S, whether thread T will starve depends on how the threads in CP are scheduled. Note that fair scheduling of the threads in CP does not guarantee a starvation-free execution of CP.
Incorrect solution 3 in Section 2.1.3 has an execution sequence that results in a violation of the bounded waiting requirement. Following is that execution sequence:
(a) T0 executes (1), (2), and (6). Now T0 is in its critical section and intendToEnter[0] is true.
(b) T1 executes (1)–(4). Now intendToEnter[1] is false and T1 is waiting for intendToEnter[0] to be false.
(c) T0 executes (7), (8), (1), (2), and (6). Now T0 is in its critical section and intendToEnter[0] is true.
(d) T1 resumes execution at (4) and is still waiting for intendToEnter[0] to be false.
(e) T0 executes (7), (8), (1), (2), and (6).
(f) T1 resumes execution at (4).
. . . Infinite repetition of steps (e) and (f) . . .
In this sequence, T1 never makes progress in the cycle of statements involving (e) and (f). However, when execution reaches step (e), the following sequence may be executed instead:
(e) T0 executes (7).
(f) T1 resumes execution at (4) and executes (5), (2), (6), and (7).
Thus, T1 is not deadlocked or livelocked, but it is starved in the cycle involving (e) and (f).
If contention for a critical section is low, as it is in many cases, starvation is unlikely, and solutions to the critical section problem that theoretically allow starvation may actually be acceptable. This is the case for the partial hardware solution in Section 2.3.1.
2.5 TRACING AND REPLAY FOR SHARED VARIABLES
In this section we show how to trace and replay executions of concurrent programs that read and write shared variables. We present a C++ class that will allow us to demonstrate program tracing and replay and we show how to trace and replay a C++ implementation of Peterson’s algorithm. Even though real programs use high-level synchronization constructs to create critical sections, not algorithms like Peterson’s, the tracing and replay techniques presented in this