

502 Appendix A
To implement a binary search for switch blocks, the compiler must internally represent the switch block as a tree. The idea is that instead of comparing the provided value against each of the possible cases at runtime, the compiler generates code that first checks whether the value falls within the first or the second group of cases. The code then jumps to another section that checks the value against the values accepted within that smaller subgroup. This process continues until the correct item is found or until the conditional block is exited (if no case block matches the value being searched for).
Let’s take a look at a common switch block implemented in C and observe how it is transformed into a tree by the compiler.
switch (Value)
{
case 120:
    Code...
    break;
case 140:
    Code...
    break;
case 501:
    Code...
    break;
case 1001:
    Code...
    break;
case 1100:
    Code...
    break;
case 1400:
    Code...
    break;
case 2000:
    Code...
    break;
case 3400:
    Code...
    break;
case 4100:
    Code...
    break;
};
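To make the tree concrete, here is a hand-written C sketch of the kind of comparison tree a compiler might build for these nine case values. The function name and the particular pivot values are my own assumptions; real compilers choose their own split points.

```c
#include <assert.h>

/* A hand-written sketch of a binary-search dispatch over the nine
 * case values above. Returns the matched case value, or -1 when no
 * case matches (the "default" path). The pivots (1100, 501, 2000)
 * are illustrative assumptions, not the compiler's actual choices.
 */
int switch_as_tree(int value)
{
    if (value < 1100) {                 /* first group: 120..1001 */
        if (value < 501) {
            if (value == 120) return 120;
            if (value == 140) return 140;
        } else {
            if (value == 501) return 501;
            if (value == 1001) return 1001;
        }
    } else {                            /* second group: 1100..4100 */
        if (value < 2000) {
            if (value == 1100) return 1100;
            if (value == 1400) return 1400;
        } else {
            if (value == 2000) return 2000;
            if (value == 3400) return 3400;
            if (value == 4100) return 4100;
        }
    }
    return -1;                          /* no matching case */
}
```

Instead of up to nine comparisons, any lookup performs at most two range checks plus two or three equality tests, which is the runtime saving the tree representation buys.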



Deciphering Code Structures 505
mov     ecx, DWORD PTR [array]
xor     eax, eax
LoopStart:
mov     DWORD PTR [ecx+eax*4], eax
add     eax, 1
cmp     eax, 1000
jl      LoopStart
It appears that even though the condition in the source code was located before the loop, the compiler saw fit to relocate it. The reason that this happens is that testing the counter after the loop provides a (relatively minor) performance improvement. As I’ve explained, converting this loop to a posttested one means that the compiler can eliminate the unconditional JMP instruction at the end of the loop.
There is one potential risk with this implementation. What happens if the counter starts out at an out-of-bounds value? That could cause problems, because the loop body uses the counter to access an array. The programmer expected the counter to be tested before the loop body runs, not after! The reason this is not a problem in this particular case is that the counter is explicitly initialized to zero before the loop starts, so the compiler knows it is zero and that there is nothing to check. If the counter came from an unknown source (as a parameter passed from some other, unknown function, for instance), the compiler would place the check where it belongs: at the beginning of the sequence.
Let’s try this out by changing the above C loop to take the value of counter c from an external source, and recompile this sequence. The following is the output from the Microsoft compiler in this case:
mov     eax, DWORD PTR [c]
mov     ecx, DWORD PTR [array]
cmp     eax, 1000
jge     EndOfLoop
LoopStart:
mov     DWORD PTR [ecx+eax*4], eax
add     eax, 1
cmp     eax, 1000
jl      LoopStart
EndOfLoop:
It seems that even in this case the compiler is intent on avoiding the two jumps. Instead of moving the comparison to the beginning of the loop and adding an unconditional jump at the end, the compiler leaves everything as it is and simply adds another condition at the beginning of the loop. This initial check (which only gets executed once) will make sure that the loop is not entered if the counter has an illegal value. The rest of the loop remains the same.
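A C source that would plausibly compile to the sequence above is an ordinary pretested loop; the function name is my own, and the names array and c follow the listing:

```c
#include <assert.h>

/* Plausible source for the listing above: c arrives from an
 * unknown source, so the compiler must emit the initial
 * cmp/jge bounds check before falling into the (internally
 * posttested) loop body.
 */
void fill_from(int *array, int c)
{
    while (c < 1000) {
        array[c] = c;
        c++;
    }
}
```

When c starts at 1000 or more, the initial check (the JGE in the listing) skips the body entirely, exactly as the source semantics require.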

For the purpose of this particular discussion a for loop is equivalent to a pretested loop such as the ones discussed earlier.
Posttested Loops
So what kind of an effect do posttested loops implemented in the high-level realm actually have on the resulting assembly language code if the compiler produces posttested sequences anyway? Unsurprisingly—very little.
When a program contains a do...while() loop, the compiler generates a sequence very similar to the one in the previous section. The only difference is that with do...while() loops the compiler never has to worry about whether the loop's conditional statement will be satisfied on the first run: the condition sits at the end of the loop, so the body always executes at least once before it is tested. Unlike the previous case, where changing the starting value of the counter to an unknown value made the compiler add another check before the beginning of the loop, with do...while() no such check is necessary. This means that with posttested loops the logic is always placed after the loop's body, the same way it is arranged in the source code.
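As a minimal illustration (the function is my own example, not one of the book's listings), a do...while() body always runs at least once, and the whole loop compiles down to a single conditional jump at the bottom:

```c
#include <assert.h>

/* Counts decimal digits. Because the loop is posttested, the body
 * runs at least once, which conveniently makes count_digits(0)
 * return 1. A compiler needs only one conditional jump at the
 * bottom of the loop; no entry check is required.
 */
int count_digits(unsigned n)
{
    int digits = 0;
    do {
        digits++;
        n /= 10;
    } while (n != 0);
    return digits;
}
```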
Loop Break Conditions
A loop break condition occurs when code inside the loop’s body terminates the loop (in C and C++ this is done using the break keyword). The break keyword simply interrupts the loop and jumps to the code that follows. The following assembly code is the same loop you’ve looked at before with a conditional break statement added to it:
mov     eax, DWORD PTR [c]
mov     ecx, DWORD PTR [array]
LoopStart:
cmp     DWORD PTR [ecx+eax*4], 0
jne     AfterLoop
mov     DWORD PTR [ecx+eax*4], eax
add     eax, 1
cmp     eax, 1000
jl      LoopStart
AfterLoop:
This code is slightly different from the previous examples because, even though the counter originates in an unknown source, the condition is only checked at the end of the loop. This is indicative of a posttested loop. Also, a new check has been added that examines the current array item before it is initialized and jumps to AfterLoop if it is nonzero. This is your break statement: simply an elegant name for the good old goto command that was so popular in "lesser" programming languages.
From this you can easily deduce that the original source was somewhat similar to the following:
do
{
    if (array[c])
        break;
    array[c] = c;
    c++;
} while (c < 1000);
Loop Skip-Cycle Statements
A loop skip-cycle statement is implemented in C and C++ using the continue keyword. The statement skips the current iteration of the loop and jumps straight to the loop’s conditional statement, which decides whether to perform another iteration or just exit the loop. Depending on the specific type of the loop, the counter (if one is used) is usually not incremented because the code that increments it is skipped along with the rest of the loop’s body. This is one place where for loops differ from while loops. In for loops, the code that increments the counter is considered part of the loop’s logical statement, which is why continue doesn’t skip the counter increment in such loops. Let’s take a look at a compiler-generated assembly language snippet for a loop that has a skip-cycle statement in it:
mov     eax, DWORD PTR [c]
mov     ecx, DWORD PTR [array]
LoopStart:
cmp     DWORD PTR [ecx+eax*4], 0
jne     NextCycle
mov     DWORD PTR [ecx+eax*4], eax
add     eax, 1
NextCycle:
cmp     eax, 1000
jl      SHORT LoopStart
This code sample is the same loop you've been looking at, except that the condition now triggers a continue statement instead of a break. Notice how the conditional jump goes to NextCycle and skips the incrementing of the counter. The program then checks the counter's value and jumps back to the beginning of the loop if the counter is lower than 1,000.
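From the jump to NextCycle you can reconstruct a plausible source, analogous to the break example earlier (this is my reconstruction, not the book's listing):

```c
#include <assert.h>

/* Plausible source for the listing above: when the current item is
 * already nonzero, continue skips both the store and the increment,
 * and control goes straight to the c < 1000 test. Because the
 * increment is inside the body, continue really does skip it here
 * (unlike the third clause of a for loop). Beware: with this exact
 * structure a nonzero item would leave c unchanged and the loop
 * would spin forever; the point here is only the control flow that
 * continue produces, so callers must pass a zeroed array.
 */
void fill_skip(int *array, int c)
{
    do {
        if (array[c])
            continue;       /* jumps to the c < 1000 test */
        array[c] = c;
        c++;
    } while (c < 1000);
}
```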


Deciphering Code Structures 509
In the unrolled version, the loop body has been duplicated three times, so that each iteration actually performs the work of three iterations instead of one. The counter-incrementing code has been adjusted to add 3 instead of 1 in each iteration. This is more efficient because the loop's overhead is greatly reduced: instead of executing the CMP and JL instructions 0x3e7 (999) times, they are executed only 0x14d (333) times.
A more aggressive type of loop unrolling is to simply eliminate the loop altogether and actually duplicate its body as many times as needed. Depending on the number of iterations (and assuming that number is known in advance), this may or may not be a practical approach.
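A C rendering of the three-way unrolling just described (a sketch of my own; it assumes the iteration count is a multiple of 3, as 999 is):

```c
#include <assert.h>

/* Three-way unrolled version of:
 *     for (c = 0; c < 999; c++) array[c] = c;
 * Each pass does the work of three iterations, so the comparison
 * and the jump execute 333 times instead of 999. This works
 * cleanly only because 999 divides by 3; a general unroller must
 * also emit cleanup code for any remainder iterations.
 */
void fill_unrolled(int *array)
{
    int c;
    for (c = 0; c < 999; c += 3) {
        array[c]     = c;
        array[c + 1] = c + 1;
        array[c + 2] = c + 2;
    }
}
```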
Branchless Logic
Some optimizing compilers have special techniques for generating branchless logic. The main goal behind all of these techniques is to eliminate, or at least reduce, the number of conditional jumps required to implement a given logical statement. The reasons for wanting to reduce the number of jumps in the code to the absolute minimum are explained in the section titled "Hardware Execution Environments in Modern Processors" in Chapter 2. Briefly, the use of a processor pipeline dictates that when the processor encounters a conditional jump, it must predict whether the jump will take place, and based on that guess decide which instructions to feed into the pipeline: the ones that immediately follow the branch or the ones at the jump's target address. If it guesses wrong, the entire pipeline is emptied and must be refilled. The amount of time wasted in these situations depends heavily on the processor's internal design, primarily on its pipeline length, but in most pipelined CPUs refilling the pipeline is a highly expensive operation.
Some compilers implement special optimizations that use sophisticated arithmetic and conditional instructions to eliminate or reduce the number of jumps required in order to implement logic. These optimizations are usually applied to code that conditionally performs one or more arithmetic or assignment operations on operands. The idea is to convert the two or more conditional execution paths into a single sequence of arithmetic operations that result in the same data, but without the need for conditional jumps.
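As a taste of the idea (this example is mine, not from the text), a conditional assignment can be rewritten as pure bit manipulation, assuming two's-complement integers:

```c
#include <assert.h>

/* Branchless select: returns a when cond is nonzero, b otherwise.
 * -(cond != 0) is all ones or all zeros, so the mask picks one
 * operand without any conditional jump. Assumes two's-complement
 * ints, which holds on every mainstream C implementation.
 */
int select_branchless(int cond, int a, int b)
{
    int mask = -(cond != 0);
    return (a & mask) | (b & ~mask);
}
```

Both execution paths of the original `cond ? a : b` collapse into one straight-line arithmetic sequence, which is exactly the transformation described above.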
There are two major types of branchless logic code emitted by popular compilers. One is based on converting logic into a purely arithmetic sequence that provides the same end result as the original high-level language logic. This technique is very limited and can only be applied to relatively simple sequences. For slightly more involved logical statements, compilers sometimes employ special conditional instructions (when available on the target CPU). The two primary approaches for implementing branchless logic are discussed in the following sections.

Pure Arithmetic Implementations
Certain logical statements can be converted directly into a series of arithmetic operations, involving no conditional execution whatsoever. These are elegant mathematical tricks that allow compilers to translate branched logic in the source code into a simple sequence of arithmetic operations. Consider the following code:
mov     eax, [ebp - 10]
and     eax, 0x00001000
neg     eax
sbb     eax, eax
neg     eax
ret
The preceding compiler-generated code snippet is quite common in IA-32 programs, and many reversers have a hard time deciphering its meaning. Considering the popularity of these sequences, you should go over this sample and make sure you understand how it works.
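Before walking through the flags, it may help to see what the sequence computes overall. C has no carry flag, so this model of mine uses an equivalent portable trick rather than the literal NEG/SBB mechanics:

```c
#include <assert.h>

/* What the AND/NEG/SBB/NEG sequence boils down to: 1 if bit 12
 * (mask 0x00001000) of the input is set, 0 otherwise, with no
 * conditional jump. The portable trick: (m | -m) has its sign bit
 * set exactly when m != 0. The >> 31 assumes a 32-bit unsigned
 * int, as on IA-32.
 */
unsigned bit_test_branchless(unsigned x)
{
    unsigned m = x & 0x00001000u;
    return (m | (0u - m)) >> 31;
}
```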
The code starts out with a simple logical AND of a local variable with 0x00001000, storing the result in EAX (the AND instruction always sends the result to its first, left-hand operand). You then proceed to a NEG instruction, which is slightly less common. NEG is a simple negation instruction that reverses the sign of its operand (this operation is also known as computing the two's complement). Mathematically, NEG performs the simple
Result = -(Operand);
operation. The interesting part of this sequence is the SBB instruction. SBB is a subtraction with borrow instruction. This means that SBB takes the second (right-hand) operand and adds the value of CF to it and then subtracts the result from the first operand. Here’s a pseudocode for SBB:
Operand1 = Operand1 - (Operand2 + CF);
Notice that in the preceding sample SBB was used with the same register for both operands. This means that SBB essentially subtracts EAX from itself, which is of course a mathematically meaningless operation if you disregard CF. Because CF is added to the second operand, the result depends solely on the value of CF: if CF == 1, EAX becomes -1; if CF == 0, EAX becomes zero. It should be obvious that the value of EAX after the first NEG was irrelevant; it is immediately lost in the following SBB because EAX is subtracted from itself. This raises the question of why the compiler even bothered with the NEG instruction.
The Intel documentation states that beyond reversing the operand’s sign, NEG will also set the value of CF based on the value of the operand. If the operand is zero when NEG is executed, CF will be set to zero. If the operand is