
Eilam E. Reversing: Secrets of Reverse Engineering. 2005


Deciphering Code Structures 481

Table A.1 (continued)

| Left operand | Right operand | Relation between operands | Flags affected | Comments |
|---|---|---|---|---|
| X < 0 | Y < 0 | X > Y | OF = 0 SF = 0 ZF = 0 | This is the same as the preceding case, with both X and Y containing negative integers. |
| X > 0 | Y > 0 | X < Y | OF = 0 SF = 1 ZF = 0 | An SF = 1 represents a negative result, which (with OF being unset) indicates that Y is larger than X. |
| X < 0 | Y >= 0 | X < Y | OF = 0 SF = 1 ZF = 0 | This is the same as the preceding case, except that X is negative and Y is positive. Again, the combination of SF = 1 with OF = 0 represents that Y is greater than X. |
| X < 0 | Y > 0 | X < Y | OF = 1 SF = 0 ZF = 0 | This is another similar case where X is negative and Y is positive, except that here an overflow is generated, and the result is positive. |
| X > 0 | Y < 0 | X > Y | OF = 1 SF = 1 ZF = 0 | When X is positive and Y is a negative integer low enough to generate a positive overflow, both OF and SF are set. |

482 Appendix A

In looking at Table A.1, the ground rules for identifying the results of signed integer comparisons become clear. Here’s a quick summary of the basic rules:

■ Anytime ZF is set you know that the subtraction resulted in a zero, which means that the operands are equal.

■ When all three flags are zero, you know that the first operand is greater than the second, because you have a positive result and no overflow.

■ When there is a negative result and no overflow (SF=1 and OF=0), you know that the second operand is larger than the first.

■ When there is an overflow and a positive result, the second operand must be larger than the first, because you essentially have a negative result that is too small to be represented by the destination operand (hence the overflow).

■ When you have an overflow and a negative result, the first operand must be larger than the second, because you essentially have a positive result that is too large to be represented by the destination operand (hence the overflow).
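The rules above can be checked against a small C model of a 32-bit CMP. This is only a sketch: the `Flags` struct and the `cmp_flags32` and `signed_relation` helpers are hypothetical names introduced here, and the flag formulas are the usual two's-complement definitions rather than something taken from Table A.1.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of the flags that CMP/SUB produce for 32-bit operands. */
typedef struct {
    bool zf, sf, of;
} Flags;

/* Compute X - Y the way a 32-bit CMP would and derive ZF, SF and OF. */
Flags cmp_flags32(uint32_t x, uint32_t y)
{
    uint32_t r = x - y;                       /* wraps modulo 2^32 */
    Flags f;
    f.zf = (r == 0);                          /* result is zero */
    f.sf = (r >> 31) != 0;                    /* sign bit of the result */
    /* Signed overflow: X and Y differ in sign, and the result's sign
       differs from X's sign. */
    f.of = (((x ^ y) & (x ^ r)) >> 31) != 0;
    return f;
}

/* Apply the summary rules: 0 if equal, 1 if X > Y, -1 if X < Y (signed). */
int signed_relation(Flags f)
{
    if (f.zf)
        return 0;
    /* SF == OF covers "positive result, no overflow" and
       "negative result with overflow"; both mean X > Y. */
    return (f.sf == f.of) ? 1 : -1;
}
```

Comparing the bit pattern 0x80000000 (the lowest negative 32-bit integer) with 1 in this model yields OF = 1 and SF = 0, the "overflow and a positive result" rule, so the second operand is reported as the larger one.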

While it is not generally necessary to memorize the comparison outcome tables (Tables A.1 and A.2), it still makes sense to go over them and make sure that you properly understand how each flag is used in the operand comparison process. This helps while reversing code that uses flags in unconventional ways. Knowing how flags are set during comparison and subtraction is very helpful for properly understanding logical sequences and quickly deciphering their meaning.

Unsigned Comparisons

Table A.2 demonstrates the behavior of the CMP instruction when comparing unsigned operands. Remember that just like Table A.1, the following table also applies to the SUB instruction.

Table A.2 Unsigned Subtraction Outcome Table for CMP and SUB Instructions (X represents the left operand, while Y represents the right operand)

| Relation between operands | Flags affected | Comments |
|---|---|---|
| X = Y | CF = 0 ZF = 1 | The two operands are equal, so the result is zero. |
| X < Y | CF = 1 ZF = 0 | Y is larger than X so the result is lower than 0, which generates an overflow (CF=1). |
| X > Y | CF = 0 ZF = 0 | X is larger than Y, so the result is above zero, and no overflow is generated (CF=0). |

 

 

 


In looking at Table A.2, the ground rules for identifying the results of unsigned integer comparisons become clear, and it’s obvious that unsigned operands are easier to deal with. Here’s a quick summary of the basic rules:

■ Anytime ZF is set you know that the subtraction resulted in a zero, which means that the operands are equal.

■ When both flags are zero, you know that the first operand is greater than the second, because you have a positive result and no overflow.

■ When you have an overflow you know that the second operand is greater than the first, because the result is too low to be represented by the destination operand.
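The unsigned rules are simple enough to model in a few lines of C. The sketch below is illustrative only: `UFlags`, `cmp_unsigned_flags`, and `unsigned_relation` are hypothetical names, and CF is modeled as the borrow out of the subtraction.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model: the two flags that matter for an unsigned CMP X, Y. */
typedef struct {
    bool cf, zf;
} UFlags;

UFlags cmp_unsigned_flags(uint32_t x, uint32_t y)
{
    UFlags f;
    f.zf = (uint32_t)(x - y) == 0;  /* result is zero */
    f.cf = x < y;                   /* a borrow is needed exactly when X is below Y */
    return f;
}

/* Apply the summary rules: 0 if equal, 1 if X > Y, -1 if X < Y (unsigned). */
int unsigned_relation(UFlags f)
{
    if (f.zf)
        return 0;
    return f.cf ? -1 : 1;
}
```

Note that 0xFFFFFFFF compared with 1 is "greater" under these rules, even though the same bit pattern read as a signed integer is -1.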

The Conditional Codes

Conditional codes are suffixes added to certain conditional instructions in order to define the conditions governing their execution.

It is important for reversers to understand these mnemonics because virtually every conditional code sequence will include one or more of them. Sometimes their meaning will be very intuitive—take a look at the following code:

    cmp     eax, 7
    je      SomePlace

In this example, it is obvious that JE (which is jump if equal) will cause a jump to SomePlace if EAX equals 7. This is one of the more obvious cases where understanding the specifics of instructions such as CMP and of the conditional codes is really unnecessary. Unfortunately for us reversers, there are quite a few cases where the conditional codes are used in unintuitive ways. Understanding how the conditional codes use the flags is important for properly understanding program logic. The following sections list each condition code and explain which flags it uses and why.

The conditional codes listed in the following sections are listed as standalone codes, even though they are normally used as instruction suffixes to conditional instructions. Conditional codes are never used alone.

Signed Conditional Codes

Table A.3 presents the IA-32 conditional codes defined for signed operands. Note that in all signed conditional codes overflows are detected using the overflow flag (OF). This is because the arithmetic instructions use OF for indicating signed overflows.

Table A.3 Signed Conditional Codes Table for CMP and SUB Instructions

 

 

| Mnemonics | Flags | Satisfied when | Comments |
|---|---|---|---|
| If Greater (G), If Not Less or Equal (NLE) | ZF = 0 AND ((OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1)) | X > Y | Use ZF to confirm that the operands are unequal. Also use SF to check for either a positive result without an overflow, indicating that the first operand is greater, or a negative result with an overflow. The latter would indicate that the second operand was a low enough negative integer to produce a result too large to be represented by the destination (hence the overflow). |
| If Greater or Equal (GE), If Not Less (NL) | (OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1) | X >= Y | This code is similar to the preceding code with the exception that it doesn’t check ZF for zero, so it would also be satisfied by equal operands. |
| If Less (L), If Not Greater or Equal (NGE) | (OF = 1 AND SF = 0) OR (OF = 0 AND SF = 1) | X < Y | Check for OF = 1 AND SF = 0 indicating that X was lower than Y and the result was too low to be represented by the destination operand (you got an overflow and a positive result). The other case is OF = 0 AND SF = 1. This is a similar case, except that no overflow is generated, and the result is negative. |
| If Less or Equal (LE), If Not Greater (NG) | ZF = 1 OR ((OF = 1 AND SF = 0) OR (OF = 0 AND SF = 1)) | X <= Y | This code is the same as the preceding code with the exception that it also checks ZF and so would also be satisfied if the operands are equal. |
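The flag expressions in Table A.3 can be written as C predicates. This is a sketch with hypothetical names (`SignedFlags`, `cond_g`, and so on). It relies on the observation that (OF = 0 AND SF = 0) OR (OF = 1 AND SF = 1) is the same as SF = OF, and that the two "less" cases are the same as SF != OF.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the flags tested by the signed conditional codes. */
typedef struct {
    bool zf, sf, of;
} SignedFlags;

/* G / NLE: operands unequal and SF matches OF. */
bool cond_g(SignedFlags f)  { return !f.zf && (f.sf == f.of); }

/* GE / NL: SF matches OF; ZF is not consulted. */
bool cond_ge(SignedFlags f) { return f.sf == f.of; }

/* L / NGE: SF differs from OF. */
bool cond_l(SignedFlags f)  { return f.sf != f.of; }

/* LE / NG: operands equal, or SF differs from OF. */
bool cond_le(SignedFlags f) { return f.zf || (f.sf != f.of); }
```

Each pair of complementary codes (G and LE, GE and L) is an exact logical negation of the other, which is why a compiler can freely swap one for the other when it reverses a condition.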

Unsigned Conditional Codes

Table A.4 presents the IA-32 conditional codes defined for unsigned operands. Note that in all unsigned conditional codes, overflows are detected using the carry flag (CF). This is because the arithmetic instructions use CF for indicating unsigned overflows.

Table A.4 Unsigned Conditional Codes

 

 

| Mnemonics | Flags | Satisfied when | Comments |
|---|---|---|---|
| If Above (A), If Not Below or Equal (NBE) | CF = 0 AND ZF = 0 | X > Y | Use CF to confirm that the second operand is not larger than the first (because then CF would be set), and ZF to confirm that the operands are unequal. |
| If Above or Equal (AE), If Not Below (NB), If Not Carry (NC) | CF = 0 | X >= Y | This code is similar to the above with the exception that it only checks CF, so it would also be satisfied by equal operands. |
| If Below (B), If Not Above or Equal (NAE), If Carry (C) | CF = 1 | X < Y | When CF is set we know that the second operand is greater than the first because an overflow could only mean that the result was negative. |
| If Below or Equal (BE), If Not Above (NA) | CF = 1 OR ZF = 1 | X <= Y | This code is the same as the above with the exception that it also checks ZF, and so would also be satisfied if the operands are equal. |
| If Equal (E), If Zero (Z) | ZF = 1 | X = Y | ZF is set so we know that the result was zero, meaning that the operands are equal. |
| If Not Equal (NE), If Not Zero (NZ) | ZF = 0 | X != Y | ZF is unset so we know that the result was nonzero, which implies that the operands are unequal. |
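As with the signed codes, each row of Table A.4 maps directly to a one-line predicate. The names below (`UnsignedFlags`, `cond_a`, and so on) are hypothetical and only mirror the flag expressions in the table.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the flags tested by the unsigned conditional codes. */
typedef struct {
    bool cf, zf;
} UnsignedFlags;

bool cond_a(UnsignedFlags f)  { return !f.cf && !f.zf; } /* A  / NBE      */
bool cond_ae(UnsignedFlags f) { return !f.cf; }          /* AE / NB / NC  */
bool cond_b(UnsignedFlags f)  { return f.cf; }           /* B  / NAE / C  */
bool cond_be(UnsignedFlags f) { return f.cf || f.zf; }   /* BE / NA       */
bool cond_e(UnsignedFlags f)  { return f.zf; }           /* E  / Z        */
bool cond_ne(UnsignedFlags f) { return !f.zf; }          /* NE / NZ       */
```

The aliases in the comments name the same test, which is why a disassembler may show, for example, JB, JNAE, or JC for an identical opcode.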

 

 

 

 

Control Flow & Program Layout

The vast majority of logic in the average computer program is implemented through branches. These are the most common programming constructs, regardless of the high-level language. A program tests one or more logical conditions, and branches to a different part of the program based on the result of the logical test. Identifying branches and figuring out their meaning and purpose is one of the most basic code-level reversing tasks.

The following sections introduce the most popular control flow constructs and program layout elements. I start with a discussion of procedures and how they are represented in assembly language and proceed to a discussion of the most common control flow constructs and to a comparison of their low-level representations with their high-level representations. The constructs discussed are single branch conditionals, two-way conditionals, n-way conditionals, and loops, among others.

Deciphering Functions

The most basic building block in a program is the procedure, or function. From a reversing standpoint, functions are very easy to detect because of function prologues and epilogues. These are standard initialization and cleanup sequences that compilers generate for nearly every function. The particulars of these sequences depend on the specific compiler used and on other issues such as calling convention. Calling conventions are discussed in Appendix C.

On IA-32 processors, functions are nearly always called using the CALL instruction, which pushes the return address (the address of the instruction that follows the CALL) onto the stack and jumps to the function address. This makes it easy to distinguish function calls from other unconditional jumps.

Internal Functions

Internal functions are called from the same binary executable that contains their implementation. When compilers generate an internal function call sequence they usually just embed the function’s address into the code, which makes it very easy to detect. The following is a common internal function call.

    call    CodeSectionAddress

Imported Functions

An imported function call takes place when a module is making a call into a function implemented in another binary executable. This is important because during the compilation process the compiler has no idea where the imported function can be found and is therefore unable to embed the function’s address into the code (as is usually done with internal functions).

Imported function calls are implemented using the Import Directory and Import Address Table (see Chapter 3). The import directory is used at runtime to resolve the function’s name to a matching function in the target executable, and the IAT stores the actual address of the target function. The caller then loads the function’s pointer from the IAT and calls it. The following is an example of a typical imported function call:

    call    DWORD PTR [IAT_Pointer]

Notice the DWORD PTR and the brackets around the pointer. They are important because they tell the CPU to jump not to the address of IAT_Pointer but to the address that is stored at IAT_Pointer. Also keep in mind that the pointer will usually not be named (depending on the disassembler) and will simply contain an address pointing into the IAT.
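The difference between the two call forms can be imitated in C with a function pointer. The sketch below is a loose analogy, not how a real loader works: `imported_add`, `iat_slot`, and `resolve_imports` are hypothetical stand-ins for an imported function, a single IAT entry, and the import resolution step.

```c
#include <stddef.h>

/* Hypothetical stand-in for a function implemented in another module. */
int imported_add(int a, int b) { return a + b; }

typedef int (*binary_fn)(int, int);

/* Hypothetical one-entry "IAT": a slot that is filled in at load time. */
binary_fn iat_slot = NULL;

/* Stand-in for the loader resolving the import and storing its address. */
void resolve_imports(void) { iat_slot = imported_add; }

/* Direct call: the target is known when the code is generated,
   comparable to call CodeSectionAddress. */
int call_direct(int a, int b) { return imported_add(a, b); }

/* Indirect call: the target is read from the slot first,
   comparable to call DWORD PTR [IAT_Pointer]. */
int call_through_slot(int a, int b) { return iat_slot(a, b); }
```

In this model `call_through_slot` only works after `resolve_imports` has run, which mirrors the fact that IAT entries are meaningful only once the imports have been resolved.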

Detecting imported calls is easy because, except for these types of calls, functions are rarely called indirectly through a hard-coded function pointer. I would, however, recommend that you determine the location of the IAT early on in reversing sessions and use it to confirm that a function is indeed imported. Locating the IAT is quite easy and can be done with a variety of different tools that dump the module’s PE header and provide the address of the IAT. Tools for dumping PE headers are discussed in Chapter 4.

Some disassemblers and debuggers will automatically indicate an imported function call (by internally checking the IAT address), thus saving you the trouble.

Single-Branch Conditionals

The most basic form of logic in most programs consists of a condition and an ensuing conditional branch. In high-level languages, this is written as an if statement with a condition and a block of conditional code that gets executed if the condition is satisfied. Here’s a quick sample:

if (SomeVariable == 0)
    CallAFunction();

From a low-level perspective, implementing this statement requires a logical check to determine whether SomeVariable contains 0 or not, followed by code that skips the conditional block by performing a conditional jump if SomeVariable is nonzero. Figure A.1 depicts how this code snippet would typically map into assembly language.

The assembly language code in Figure A.1 uses TEST to perform a simple zero check for EAX. TEST works by performing a bitwise AND operation on its two operands (here EAX with itself) and setting flags to reflect the result (the actual result is discarded). This is an effective way to test whether EAX is zero or nonzero because TEST sets the zero flag (ZF) according to the result of the bitwise AND operation, and EAX AND EAX is zero only when EAX is zero. Note that the condition is reversed: In the source code, the program was checking whether SomeVariable equals zero, but the compiler reversed the condition so that the conditional instruction (in this case a jump) checks whether SomeVariable is nonzero. This stems from the fact that the compiler-generated binary code is organized in memory in the same order as it is organized in the source code. Therefore if SomeVariable is nonzero, the compiler must skip the conditional code section and go straight to the code section that follows.
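The zero check performed by TEST can be expressed in C as a minimal sketch. The helper names `zf_after_test` and `is_zero_via_test` are hypothetical, and only ZF is modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* ZF after a "test a, b": set when the bitwise AND of the operands is zero. */
bool zf_after_test(uint32_t a, uint32_t b)
{
    return (a & b) == 0;
}

/* "test eax, eax" ANDs the register with itself, so ZF is set only for zero. */
bool is_zero_via_test(uint32_t eax)
{
    return zf_after_test(eax, eax);
}
```

When the two operands differ, TEST checks whether they share any set bits, which is how it is typically used for bit-flag checks.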

The bottom line is that in single-branch conditionals you must always reverse the meaning of the conditional jump in order to obtain the true high-level logical intention.


 

Assembly Language Code              High-Level Code

    mov     eax, [SomeVariable]     if (SomeVariable == 0)
    test    eax, eax                    CallAFunction();
    jnz     AfterCondition          ...
    call    CallAFunction
AfterCondition:
    ...

 

 

Figure A.1 High-level/low-level view of a single branch conditional sequence.
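The reversed condition is easier to see when the assembly layout is written back in C using goto. This is only an illustration of the shape of the code: `calls_made` is a hypothetical counter added so the two versions can be compared, and `low_level_shape` is a hand-written imitation, not compiler output.

```c
/* Hypothetical counter so the effect of the call can be observed. */
int calls_made = 0;

void CallAFunction(void) { calls_made++; }

/* The statement as the programmer wrote it. */
void high_level(int SomeVariable)
{
    if (SomeVariable == 0)
        CallAFunction();
}

/* The same logic laid out the way the assembly code is organized. */
void low_level_shape(int SomeVariable)
{
    if (SomeVariable != 0)      /* test eax, eax / jnz AfterCondition */
        goto AfterCondition;
    CallAFunction();            /* call CallAFunction */
AfterCondition:
    return;
}
```

Both functions call `CallAFunction` only when the argument is zero. The second one simply tests the opposite condition in order to skip the call.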

Two-Way Conditionals

High-level languages also support two-way conditionals, typically written with the if-else keyword pair. A two-way conditional differs from a single-branch conditional in that if the condition is not satisfied, the program executes an alternative code block and only then proceeds to the code that follows the ‘if-else’ statement. These constructs are called two-way conditionals because the flow of the program is split into one of two different possible paths: the one in the ‘if’ block, or the one in the ‘else’ block.

Let’s take a quick look at how compilers implement two-way conditionals. First of all, in two-way conditionals the conditional branch points to the ‘else’ block and not to the code that follows the conditional statement. Second, the condition itself is almost always reversed (so that the jump to the ‘else’ block only takes place when the condition is not satisfied), and the primary conditional block is placed right after the conditional jump (so that the conditional code gets executed if the condition is satisfied). The conditional block always ends with an unconditional jump that essentially skips the ‘else’ block—this is a good indicator for identifying two-way conditionals. The ‘else’ block is placed at the end of the conditional block, right after that unconditional jump. Figure A.2 shows what an average if-else statement looks like in assembly language.


 

 

 

 

Assembly Language Code                      High-Level Code

    cmp     [SomeVariable], 7               if (SomeVariable == 7)
    jne     ElseBlock       (reversed)          SomeFunction();
    call    SomeFunction                    else
    jmp     AfterConditionalBlock               SomeOtherFunction();
ElseBlock:
    call    SomeOtherFunction
AfterConditionalBlock:
    ...

 

 

 

Figure A.2 High-level/low-level view of a two-way conditional.

Notice the unconditional JMP right after the function call. That is where the first conditional block skips the else block and jumps to the code that follows. The basic pattern to look for when trying to detect a simple ‘if-else’ statement in a disassembled program is a condition where the code that follows it ends with an unconditional jump.
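The layout in Figure A.2 can be imitated in C with goto statements. This is a sketch of the pattern only: `last_called` is a hypothetical variable added so that the path taken can be observed.

```c
/* Hypothetical marker recording which branch ran. */
int last_called = 0;

void SomeFunction(void)      { last_called = 1; }
void SomeOtherFunction(void) { last_called = 2; }

/* An if-else laid out the way the assembly code is organized. */
void low_level_if_else(int SomeVariable)
{
    if (SomeVariable != 7)          /* cmp [SomeVariable], 7 / jne ElseBlock */
        goto ElseBlock;
    SomeFunction();                 /* the 'if' block */
    goto AfterConditionalBlock;     /* jmp: skip the 'else' block */
ElseBlock:
    SomeOtherFunction();            /* the 'else' block */
AfterConditionalBlock:
    return;
}
```

The unconditional goto at the end of the first block plays the role of the JMP described above. Without it, execution would fall through into the else block.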

Most high-level languages also support a slightly more complex version of a two-way conditional where a separate condition is used for each of the two code blocks. This is usually implemented by combining the ‘if’ and ‘else-if’ keywords, each with its own condition. This way, if the first condition is not satisfied, the program jumps to the second condition, evaluates that one, and simply skips the entire conditional block if neither condition is satisfied. If one of the conditions is satisfied, the corresponding conditional block is executed, and execution just flows into the next program statement. Figure A.3 provides a high-level/low-level view of this type of control flow construct.
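The description above can be sketched in the same goto style. The function name, the label names, and the constants 7 and 9 are hypothetical choices for illustration and are not taken from Figure A.3.

```c
/* Returns 1 or 2 depending on which block ran, or 0 if neither did. */
int low_level_else_if(int SomeVariable)
{
    int result = 0;

    if (SomeVariable != 7)          /* first condition, reversed */
        goto SecondCondition;
    result = 1;                     /* first conditional block */
    goto AfterIfBlock;              /* skip the second condition */
SecondCondition:
    if (SomeVariable != 9)          /* second condition, reversed */
        goto AfterIfBlock;
    result = 2;                     /* second conditional block */
AfterIfBlock:
    return result;
}
```

The high-level equivalent would be an `if (SomeVariable == 7)` block followed by an `else if (SomeVariable == 9)` block. Each failed test jumps forward to the next test, and the last failed test jumps past the whole construct.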

Multiple-Alternative Conditionals

Sometimes programmers create long statements with multiple conditions, where each condition leads to the execution of a different code block. One way to implement this in high-level languages is by using a ‘switch’ block (discussed later), but it is also possible to do this using conventional ‘if’ statements. Programmers sometimes must use ‘if’ statements because ‘switch’ blocks don’t support complex conditions, only comparisons against hardcoded constants. In contrast, a sequence of ‘else-if’ statements allows for any kind of complex condition on each of the blocks, which makes it more flexible.
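As a brief illustration, the following sketch uses conditions that a C ‘switch’ block cannot express directly: a string comparison and range checks. The function `describe` and its thresholds are hypothetical.

```c
#include <string.h>

/* Each branch uses a condition that is not a comparison against a constant. */
const char *describe(int temperature, const char *unit)
{
    if (strcmp(unit, "C") != 0)         /* string comparison */
        return "unknown unit";
    else if (temperature < 0)           /* open-ended range */
        return "freezing";
    else if (temperature < 25)          /* range bounded by the test above */
        return "mild";
    else
        return "hot";
}
```

A `switch` on `temperature` would need one case label per value, and it could not test `unit` at all, because case labels in C must be integer constant expressions.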