Eilam E.Reversing.Secrets of reverse engineering.2005
.pdf
Deciphering Code Structures 481
Table A.1 (continued)
LEFT OPERAND | RIGHT OPERAND | RELATION BETWEEN OPERANDS | FLAGS AFFECTED | COMMENTS
X < 0 | Y < 0 | X > Y | OF = 0 SF = 0 ZF = 0 | This is the same as the preceding case, with both X and Y containing negative integers.
X > 0 | Y > 0 | X < Y | OF = 0 SF = 1 ZF = 0 | An SF = 1 represents a negative result, which (with OF being unset) indicates that Y is larger than X.
X < 0 | Y >= 0 | X < Y | OF = 0 SF = 1 ZF = 0 | This is the same as the preceding case, except that X is negative and Y is positive. Again, the combination of SF = 1 with OF = 0 represents that Y is greater than X.
X < 0 | Y > 0 | X < Y | OF = 1 SF = 0 ZF = 0 | This is another similar case where X is negative and Y is positive, except that here an overflow is generated, and the result is positive.
X > 0 | Y < 0 | X > Y | OF = 1 SF = 1 ZF = 0 | When X is positive and Y is a negative integer low enough to generate a positive overflow, both OF and SF are set.
482 Appendix A
In looking at Table A.1, the ground rules for identifying the results of signed integer comparisons become clear. Here’s a quick summary of the basic rules:
■ Anytime ZF is set you know that the subtraction resulted in a zero, which means that the operands are equal.
■ When all three flags are zero, you know that the first operand is greater than the second, because you have a positive result and no overflow.
■ When there is a negative result and no overflow (SF=1 and OF=0), you know that the second operand is larger than the first.
■ When there is an overflow and a positive result, the second operand must be larger than the first, because you essentially have a negative result that is too small to be represented by the destination operand (hence the overflow).
■ When you have an overflow and a negative result, the first operand must be larger than the second, because you essentially have a positive result that is too large to be represented by the destination operand (hence the overflow).
While it is not generally necessary to memorize the comparison outcome tables (Tables A.1 and A.2), it still makes sense to go over them and make sure that you understand how each flag is used in the operand comparison process. This helps while reversing code in which flags are used in unconventional ways. Knowing how flags are set during comparison and subtraction makes it much easier to understand logical sequences and quickly decipher their meaning.
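The summary rules above can be checked with a small model of the flag computation. The following C code is a minimal sketch for illustration only: it assumes 32-bit operands, and the type SignedFlags and the functions cmp_signed_flags and relation_from_flags are hypothetical names that do not come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the three flags discussed in Table A.1. */
typedef struct {
    bool zf; /* zero flag: the 32-bit result was zero */
    bool sf; /* sign flag: top bit of the 32-bit result */
    bool of; /* overflow flag: the signed result did not fit in 32 bits */
} SignedFlags;

/* Models how CMP x, y (or SUB x, y) sets ZF, SF and OF for 32-bit operands. */
static SignedFlags cmp_signed_flags(int32_t x, int32_t y)
{
    /* Subtract as unsigned so that wraparound is well defined in C. */
    uint32_t result = (uint32_t)x - (uint32_t)y;
    /* The exact difference always fits in 64 bits. */
    int64_t exact = (int64_t)x - (int64_t)y;

    SignedFlags f;
    f.zf = (result == 0);
    f.sf = (result >> 31) != 0;
    f.of = (exact < INT32_MIN) || (exact > INT32_MAX);
    return f;
}

/* Applies the summary rules: returns 0 if x == y, 1 if x > y, -1 if x < y. */
static int relation_from_flags(SignedFlags f)
{
    if (f.zf)
        return 0;               /* zero result: operands are equal */
    if (f.sf == f.of)
        return 1;               /* SF=0,OF=0 or SF=1,OF=1: first operand is greater */
    return -1;                  /* SF=1,OF=0 or SF=0,OF=1: second operand is greater */
}
```

For example, relation_from_flags(cmp_signed_flags(INT32_MIN, 1)) takes the overflow path: the wrapped result is positive (SF = 0) while OF = 1, which the rules read as the second operand being larger.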
Unsigned Comparisons
Table A.2 demonstrates the behavior of the CMP instruction when comparing unsigned operands. Remember that just like Table A.1, the following table also applies to the SUB instruction.
Table A.2 Unsigned Subtraction Outcome Table for CMP and SUB Instructions (X represents the left operand, while Y represents the right operand)
RELATION BETWEEN OPERANDS | FLAGS AFFECTED | COMMENTS
X = Y | CF = 0 ZF = 1 | The two operands are equal, so the result is zero.
X < Y | CF = 1 ZF = 0 | Y is larger than X so the result is lower than 0, which generates an overflow (CF=1).
X > Y | CF = 0 ZF = 0 | X is larger than Y, so the result is above zero, and no overflow is generated (CF=0).
In looking at Table A.2, the ground rules for identifying the results of unsigned integer comparisons become clear, and it’s obvious that unsigned operands are easier to deal with. Here’s a quick summary of the basic rules:
■ Anytime ZF is set you know that the subtraction resulted in a zero, which means that the operands are equal.
■ When both flags are zero, you know that the first operand is greater than the second, because you have a positive result and no overflow.
■ When you have an overflow (CF=1) you know that the second operand is greater than the first, because the result is too low to be represented by the destination operand.
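The unsigned rules from Table A.2 can be modeled the same way. This is a hedged sketch that assumes 32-bit operands; UnsignedFlags, cmp_unsigned_flags, and unsigned_relation_from_flags are hypothetical names introduced only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the two flags listed in Table A.2. */
typedef struct {
    bool zf; /* zero flag: the 32-bit result was zero */
    bool cf; /* carry flag: the subtraction needed a borrow */
} UnsignedFlags;

/* Models how CMP x, y (or SUB x, y) sets ZF and CF for unsigned operands. */
static UnsignedFlags cmp_unsigned_flags(uint32_t x, uint32_t y)
{
    /* Subtract in 64 bits; a borrow out of bit 31 shows up in bit 32. */
    uint64_t wide = (uint64_t)x - (uint64_t)y;

    UnsignedFlags f;
    f.zf = ((uint32_t)wide == 0);
    f.cf = ((wide >> 32) & 1u) != 0;
    return f;
}

/* Applies the summary rules: returns 0 if x == y, -1 if x < y, 1 if x > y. */
static int unsigned_relation_from_flags(UnsignedFlags f)
{
    if (f.zf)
        return 0;   /* zero result: operands are equal */
    if (f.cf)
        return -1;  /* CF set: second operand is greater */
    return 1;       /* both flags clear: first operand is greater */
}
```

Note that 0x80000000 compared with 1 is "greater" here, whereas the same bit pattern interpreted as a signed integer would be the lowest negative value.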
The Conditional Codes
Conditional codes are suffixes added to certain conditional instructions in order to define the conditions governing their execution.
It is important for reversers to understand these mnemonics because virtually every conditional code sequence will include one or more of them. Sometimes their meaning will be very intuitive—take a look at the following code:
    cmp     eax, 7
    je      SomePlace
In this example, it is obvious that JE (which is jump if equal) will cause a jump to SomePlace if EAX equals 7. This is one of the more obvious cases where understanding the specifics of instructions such as CMP and of the conditional codes is really unnecessary. Unfortunately for us reversers, there are quite a few cases where the conditional codes are used in unintuitive ways. Understanding how the conditional codes use the flags is important for properly understanding program logic. The following sections list each condition code and explain which flags it uses and why.
The conditional codes in the following sections are presented as standalone codes, even though they are normally used as suffixes to conditional instructions. Conditional codes are never used alone.
Signed Conditional Codes
Table A.3 presents the IA-32 conditional codes defined for signed operands. Note that in all signed conditional codes overflows are detected using the overflow flag (OF). This is because the arithmetic instructions use OF for indicating signed overflows.
Table A.3 Signed Conditional Codes Table for CMP and SUB Instructions
MNEMONICS | FLAGS | SATISFIED WHEN | COMMENTS
If Greater (G), If Not Less or Equal (NLE) | ZF = 0 AND ((OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1)) | X > Y | Use ZF to confirm that the operands are unequal. Also use SF to check for either a positive result without an overflow, indicating that the first operand is greater, or a negative result with an overflow. The latter would indicate that the second operand was a low enough negative integer to produce a result too large to be represented by the destination (hence the overflow).
If Greater or Equal (GE), If Not Less (NL) | (OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1) | X >= Y | This code is similar to the preceding code with the exception that it doesn’t check ZF for zero, so it would also be satisfied by equal operands.
If Less (L), If Not Greater or Equal (NGE) | (OF = 1 AND SF = 0) OR (OF = 0 AND SF = 1) | X < Y | Check for OF = 1 AND SF = 0 indicating that X was lower than Y and the result was too low to be represented by the destination operand (you got an overflow and a positive result). The other case is OF = 0 AND SF = 1. This is a similar case, except that no overflow is generated, and the result is negative.
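The flag expressions in the FLAGS column of Table A.3 translate directly into boolean predicates. The following C sketch is an illustration under stated assumptions: the Flags type and the cond_g, cond_ge, and cond_l functions are hypothetical names, and each function simply restates the corresponding table row.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the flags consulted by the signed condition codes. */
typedef struct {
    bool zf; /* zero flag */
    bool sf; /* sign flag */
    bool of; /* overflow flag */
} Flags;

/* G / NLE: ZF = 0 AND ((OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1)) */
static bool cond_g(Flags f)
{
    return !f.zf && ((!f.of && !f.sf) || (f.of && f.sf));
}

/* GE / NL: (OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1) */
static bool cond_ge(Flags f)
{
    return (!f.of && !f.sf) || (f.of && f.sf);
}

/* L / NGE: (OF = 1 AND SF = 0) OR (OF = 0 AND SF = 1) */
static bool cond_l(Flags f)
{
    return (f.of && !f.sf) || (!f.of && f.sf);
}
```

Written this way it is easy to see that GE reduces to "SF equals OF" and L to "SF differs from OF", so the two codes are exact opposites of each other.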
generate for nearly every function. The particulars of these sequences depend on the specific compiler used and on other issues such as calling convention. Calling conventions are discussed in the section on calling conventions in Appendix C.
On IA-32 processors functions are nearly always called using the CALL instruction, which pushes the current instruction pointer (the return address) onto the stack and jumps to the function address. This makes it easy to distinguish function calls from other unconditional jumps.
Internal Functions
Internal functions are called from the same binary executable that contains their implementation. When compilers generate an internal function call sequence they usually just embed the function’s address into the code, which makes it very easy to detect. The following is a common internal function call.
    call    CodeSectionAddress
Imported Functions
An imported function call takes place when a module is making a call into a function implemented in another binary executable. This is important because during the compilation process the compiler has no idea where the imported function can be found and is therefore unable to embed the function’s address into the code (as is usually done with internal functions).
Imported function calls are implemented using the Import Directory and Import Address Table (see Chapter 3). The import directory is used at run time to resolve the function’s name to a matching function in the target executable, and the IAT stores the actual address of the target function. The caller then loads the function’s pointer from the IAT and calls it. The following is an example of a typical imported function call:
    call    DWORD PTR [IAT_Pointer]
Notice the DWORD PTR that precedes the pointer—it is important because it tells the CPU to jump not to the address of IAT_Pointer but to the address that is pointed to by IAT_Pointer. Also keep in mind that the pointer will usually not be named (depending on the disassembler) and will simply contain an address pointing into the IAT.
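The same indirection can be expressed in C as a call through a table of function pointers. This is only a loose analogy, not how a real loader works: fake_iat, fake_loader, call_import, and the two target functions are hypothetical names invented for this sketch.

```c
#include <stddef.h>

/* Hypothetical signature shared by the "imported" functions in this sketch. */
typedef int (*ImportedFn)(int);

/* Stand-ins for functions implemented in another module. */
static int double_it(int v) { return v * 2; }
static int negate_it(int v) { return -v; }

/* Stand-in for the IAT: the slots are empty until the "loader" fills them. */
static ImportedFn fake_iat[2];

/* Plays the role of the loader resolving imports at run time. */
static void fake_loader(void)
{
    fake_iat[0] = double_it;
    fake_iat[1] = negate_it;
}

/* The caller never embeds the target address; it reads it from the table
   and calls through it, similar in spirit to: call DWORD PTR [IAT_Pointer] */
static int call_import(size_t slot, int arg)
{
    return fake_iat[slot](arg);
}
```

After fake_loader() has run, call_import(0, 21) reaches double_it without the calling code ever containing that function's address, which is the property that makes imported calls recognizable in a disassembly.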
Detecting imported calls is easy because except for these types of calls, functions are rarely called indirectly through a hard-coded function pointer. I would, however, recommend that you determine the location of the IAT early on in reversing sessions and use it to confirm that a function is indeed imported. Locating the IAT is quite easy and can be done with a variety of different tools that dump the module’s PE header and provide the address of the IAT. Tools for dumping PE headers are discussed in Chapter 4.
Some disassemblers and debuggers will automatically indicate an imported function call (by internally checking the IAT address), thus saving you the trouble.
Single-Branch Conditionals
The most basic form of logic in most programs consists of a condition and an ensuing conditional branch. In high-level languages, this is written as an if statement with a condition and a block of conditional code that gets executed if the condition is satisfied. Here’s a quick sample:
    if (SomeVariable == 0)
        CallAFunction();
From a low-level perspective, implementing this statement requires a logical check to determine whether SomeVariable contains 0 or not, followed by code that skips the conditional block by performing a conditional jump if SomeVariable is nonzero. Figure A.1 depicts how this code snippet would typically map into assembly language.
The assembly language code in Figure A.1 uses TEST to perform a simple zero check for EAX. TEST works by performing a bitwise AND operation on EAX and setting flags to reflect the result (the actual result is discarded). This is an effective way to test whether EAX is zero or nonzero because TEST sets the zero flag (ZF) according to the result of the bitwise AND operation. Note that the condition is reversed: In the source code, the program was checking whether SomeVariable equals zero, but the compiler reversed the condition so that the conditional instruction (in this case a jump) checks whether SomeVariable is nonzero. This stems from the fact that the compiler-generated binary code is organized in memory in the same order as it is organized in the source code. Therefore if SomeVariable is nonzero, the compiler must skip the conditional code section and go straight to the code section that follows.
The bottom line is that in single-branch conditionals you must always reverse the meaning of the conditional jump in order to obtain the true high-level logical intention.
Assembly Language Code | High-Level Code
mov eax, [SomeVariable] | if (SomeVariable == 0)
test eax, eax | CallAFunction();
jnz AfterCondition | ...
call CallAFunction |
AfterCondition: |
... |
Figure A.1 High-level/low-level view of a single branch conditional sequence.
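The reversed condition can also be made visible in C by writing the low-level control flow with goto. This is a sketch for illustration, not compiler output: call_count is a hypothetical counter added so the effect can be observed, and the function names single_branch_high_level and single_branch_low_level are invented here.

```c
/* Hypothetical counter so that calls to CallAFunction can be observed. */
static int call_count = 0;

static void CallAFunction(void)
{
    call_count++;
}

/* The statement as the programmer wrote it. */
static void single_branch_high_level(int SomeVariable)
{
    if (SomeVariable == 0)
        CallAFunction();
}

/* The same logic laid out the way the assembly code is organized. */
static void single_branch_low_level(int SomeVariable)
{
    if (SomeVariable != 0)      /* test eax, eax / jnz AfterCondition */
        goto AfterCondition;    /* reversed condition skips the block */
    CallAFunction();            /* call CallAFunction */
AfterCondition:
    return;                     /* code following the conditional */
}
```

Both functions behave identically; only the second one shows why the jump tests for nonzero even though the source tests for zero.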
Two-Way Conditionals
Another fundamental functionality of high-level languages is to allow the use of two-way conditionals, typically implemented in high-level languages using the if-else keyword pair. A two-way conditional is different from a single-branch conditional in the sense that if the condition is not satisfied, the program executes an alternative code block and only then proceeds to the code that follows the ‘if-else’ statement. These constructs are called two-way conditionals because the flow of the program is split into one of two different possible paths: the one in the ‘if’ block, or the one in the ‘else’ block.
Let’s take a quick look at how compilers implement two-way conditionals. First of all, in two-way conditionals the conditional branch points to the ‘else’ block and not to the code that follows the conditional statement. Second, the condition itself is almost always reversed (so that the jump to the ‘else’ block only takes place when the condition is not satisfied), and the primary conditional block is placed right after the conditional jump (so that the conditional code gets executed if the condition is satisfied). The conditional block always ends with an unconditional jump that essentially skips the ‘else’ block—this is a good indicator for identifying two-way conditionals. The ‘else’ block is placed at the end of the conditional block, right after that unconditional jump. Figure A.2 shows what an average if-else statement looks like in assembly language.
Assembly Language Code | High-Level Code
cmp [Variable1], 7 | if (SomeVariable == 7)
jne ElseBlock (Reversed) | SomeFunction();
call SomeFunction |
jmp AfterConditionalBlock | else
ElseBlock: | SomeOtherFunction();
call SomeOtherFunction |
AfterConditionalBlock: |
... |
Figure A.2 High-level/low-level view of a two-way conditional.
Notice the unconditional JMP right after the function call. That is where the first conditional block skips the else block and jumps to the code that follows. The basic pattern to look for when trying to detect a simple ‘if-else’ statement in a disassembled program is a conditional jump where the code that follows it ends with an unconditional jump.
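The layout of a two-way conditional can be mirrored in C with goto statements and labels. This is a minimal sketch rather than actual compiler output: last_called is a hypothetical variable that records which block ran, and two_way_low_level is a name invented for this illustration.

```c
/* Hypothetical marker recording which block executed: 1 = if, 2 = else. */
static int last_called = 0;

static void SomeFunction(void)      { last_called = 1; }
static void SomeOtherFunction(void) { last_called = 2; }

/* if (SomeVariable == 7) SomeFunction(); else SomeOtherFunction();
   written in the order the compiler typically emits it. */
static void two_way_low_level(int SomeVariable)
{
    if (SomeVariable != 7)          /* cmp ..., 7 / jne ElseBlock (reversed) */
        goto ElseBlock;
    SomeFunction();                 /* primary conditional block */
    goto AfterConditionalBlock;     /* unconditional jmp skips the else block */
ElseBlock:
    SomeOtherFunction();            /* alternative block */
AfterConditionalBlock:
    return;                         /* code following the if-else statement */
}
```

The unconditional goto at the end of the first block is the counterpart of the JMP that gives the construct away in a disassembly.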
Most high-level languages also support a slightly more complex version of a two-way conditional where a separate conditional statement is used for each of the two code blocks. This is usually implemented by combining the ‘if’ and else-if keywords where each statement is used with a separate conditional statement. This way, if the first condition is not satisfied, the program jumps to the second condition, evaluates that one, and simply skips the entire conditional block if neither condition is satisfied. If one of the conditions is satisfied, the corresponding conditional block is executed, and execution just flows into the next program statement. Figure A.3 provides a high-level/low-level view of this type of control flow construct.
Multiple-Alternative Conditionals
Sometimes programmers create long statements with multiple conditions, where each condition leads to the execution of a different code block. One way to implement this in high-level languages is by using a “switch” block (discussed later), but it is also possible to do this using conventional ‘if’ statements. Programmers sometimes must use ‘if’ statements because ‘switch’ blocks don’t support complex conditions, only comparisons against hard-coded constants. In contrast, a sequence of ‘else-if’ statements allows any kind of complex condition on each of the blocks.
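The difference in flexibility is easy to see in C, where each case label of a switch must be an integer constant. The sketch below is illustrative only; classify_with_switch and classify_with_else_if, along with the conditions they test, are hypothetical examples.

```c
#include <string.h>

/* A switch block can only compare one value against hard-coded constants. */
static const char *classify_with_switch(int code)
{
    switch (code) {
    case 0:
        return "zero";
    case 1:
        return "one";
    default:
        return "other";
    }
}

/* An else-if sequence accepts arbitrary conditions on each block:
   ranges, several variables, function calls, and so on. */
static const char *classify_with_else_if(int value, const char *name)
{
    if (value < 0)
        return "negative";
    else if (value >= 100 && value < 200)
        return "in range";
    else if (name != NULL && strcmp(name, "admin") == 0)
        return "admin";
    else
        return "other";
}
```

None of the three conditions in classify_with_else_if could be written as a case label, which is why such logic ends up as a chain of conditional jumps rather than a switch construct.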
