
Eilam E.Reversing.Secrets of reverse engineering.2005


Auditing Program Binaries 251

but there are other, far more subtle mistakes that can create potential buffer overflow bugs.

One technique that aims to prevent these problems automatically is compiler-generated stack checking. The idea is quite simple: For any function that accesses local variables by reference, push an extra cookie or canary to the stack between the last local variable and the function’s return address. This cookie is then validated before the function returns to the caller. If the cookie has been modified, program execution immediately stops. This ensures that the return address hasn’t been overwritten with some other address and prevents the execution of any kind of malicious code.

One thing that’s immediately clear about this approach is that the cookie must be a random number. If it’s not, an attacker could simply add the cookie’s value as part of the overflowing payload and bypass the stack protection. The solution is to use a pseudorandom number as a cookie. If you’re wondering just how random pseudorandom numbers can be, take a look at [Knuth2] (Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Second Edition, Addison Wesley), but suffice it to say that they’re random enough for this purpose. With a pseudorandom number, the attacker has no way of knowing in advance what the cookie is going to be, and so it becomes impossible to fool the cookie verification code (though it’s still possible to work around this whole mechanism in other ways, as explained later in this chapter).

The following code is the same launch function from before, except that stack checking has been added (using the /GS option in the Microsoft C/C++ compiler).

Chapter7!launch:
00401060  sub     esp,0x68
00401063  mov     eax,[Chapter7!__security_cookie (0040a428)]
00401068  mov     [esp+0x64],eax
0040106c  mov     eax,[esp+0x6c]
00401070  lea     edx,[esp]
00401073  sub     edx,eax
00401075  mov     cl,[eax]
00401077  mov     [edx+eax],cl
0040107a  inc     eax
0040107b  test    cl,cl
0040107d  jnz     Chapter7!launch+0x15 (00401075)
0040107f  push    edi
00401080  lea     edi,[esp+0x4]
00401084  dec     edi
00401085  mov     al,[edi+0x1]
00401088  inc     edi
00401089  test    al,al
0040108b  jnz     Chapter7!launch+0x25 (00401085)
0040108d  mov     eax,[Chapter7!`string' (00408128)]
00401092  mov     cl,[Chapter7!`string'+0x4 (0040812c)]
00401098  lea     edx,[esp+0x4]
0040109c  mov     [edi],eax
0040109e  push    edx
0040109f  mov     [edi+0x4],cl
004010a2  call    Chapter7!system (00401110)
004010a7  mov     ecx,[esp+0x6c]
004010ab  add     esp,0x4
004010ae  pop     edi
004010af  call    Chapter7!__security_check_cookie (004011d7)
004010b4  add     esp,0x68
004010b7  ret

 

The __security_check_cookie function is called before launch returns in order to verify that the cookie has not been corrupted. Here is what __security_check_cookie does.

__security_check_cookie:
004011d7  cmp     ecx,[Chapter7!__security_cookie (0040a428)]
004011dd  jnz     Chapter7!__security_check_cookie+0x9 (004011e0)
004011df  ret
004011e0  jmp     Chapter7!report_failure (004011a6)
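The store-then-verify pattern described above can be modeled in portable C. The following is only a sketch under stated assumptions: the struct frame layout, the run_frame name, and the cookie constant are all hypothetical stand-ins, and the "overflow" is confined to a struct so that the simulation itself stays within defined behavior. A real compiler places the cookie on the actual stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a protected frame: locals first, then the cookie,
   then the return address, mirroring the order described in the text. */
struct frame {
    char     buffer[16];      /* local buffer that an unchecked copy fills */
    uint32_t cookie;          /* copy of the global cookie, stored on entry */
    uint32_t return_address;  /* the value an attacker wants to replace */
};

/* Stand-in for __security_cookie; a real cookie is randomized at load time. */
static uint32_t security_cookie = 0x5A17C0DEu;

/* Copies len bytes over the frame with no regard for the size of buffer.
   Returns 1 if the cookie survived, 0 if the overwrite was detected. */
static int run_frame(const unsigned char *input, size_t len)
{
    struct frame f;

    assert(len <= sizeof f);              /* keep the simulation in bounds */
    f.cookie = security_cookie;           /* prologue: store the cookie */
    f.return_address = 0x00401234u;

    memcpy(&f, input, len);               /* unchecked copy, starts at buffer */

    return f.cookie == security_cookie;   /* epilogue: verify the cookie */
}
```

With a 24-byte payload of 'A' characters, run_frame(payload, 8) returns 1 because only the buffer is touched, while run_frame(payload, 24) returns 0 because the bytes that reach the return address must first pass over the cookie.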

This idea was originally presented in [Cowan] (Crispin Cowan, Calton Pu, David Maier, Heather Hinton, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang, Automatic Detection and Prevention of Buffer-Overflow Attacks, The 7th USENIX Security Symposium, San Antonio, TX, January 1998) and has since been implemented in several compilers. The latest versions of the Microsoft C/C++ compilers support stack checking, and the Microsoft operating systems (starting with Windows Server 2003 and Windows XP Service Pack 2) take advantage of this feature.

In Windows, the cookie is stored in a global variable within the protected module (usually in __security_cookie). This variable is initialized by __security_init_cookie when the module is loaded, and is randomized based on the current process and thread IDs, along with the current time or the value of the hardware performance counter (see Listing 7.1). In case you’re wondering, here is the source code for __security_init_cookie. This code is embedded into any program built using the Microsoft compiler that has stack checking enabled.

void __cdecl __security_init_cookie(void)
{
    DWORD_PTR cookie;
    FT systime;
    LARGE_INTEGER perfctr;

    /*
     * Do nothing if the global cookie has already been initialized.
     */
    if (__security_cookie && __security_cookie != DEFAULT_SECURITY_COOKIE)
        return;

    /*
     * Initialize the global cookie with an unpredictable value which is
     * different for each module in a process. Combine a number of sources
     * of randomness.
     */
    GetSystemTimeAsFileTime(&systime.ft_struct);
#if !defined (_WIN64)
    cookie = systime.ft_struct.dwLowDateTime;
    cookie ^= systime.ft_struct.dwHighDateTime;
#else /* !defined (_WIN64) */
    cookie = systime.ft_scalar;
#endif /* !defined (_WIN64) */

    cookie ^= GetCurrentProcessId();
    cookie ^= GetCurrentThreadId();
    cookie ^= GetTickCount();

    QueryPerformanceCounter(&perfctr);
#if !defined (_WIN64)
    cookie ^= perfctr.LowPart;
    cookie ^= perfctr.HighPart;
#else /* !defined (_WIN64) */
    cookie ^= perfctr.QuadPart;
#endif /* !defined (_WIN64) */

    /*
     * Make sure the global cookie is never initialized to zero, since in
     * that case an overrun which sets the local cookie and return address
     * to the same value would go undetected.
     */
    __security_cookie = cookie ? cookie : DEFAULT_SECURITY_COOKIE;
}

Listing 7.1 The __security_init_cookie function that initializes the stack-checking cookie in code generated by the Microsoft C/C++ compiler.

Unsurprisingly, stack checking is not impossible to defeat [Bulba, Koziol]. Exactly how that’s done is beyond the scope of this book, but suffice it to say that in some functions the attacker still has a window of opportunity for writing into a local memory address (which almost guarantees that he or she will be able to take over the program in question) before the function reaches the cookie verification code. There are several different tricks that will work in different cases. One option is to try to overwrite the area in the stack where parameters were passed to the function. This trick works for functions that use stack parameters for returning values to their callers, which is typically implemented by having the caller pass a memory address as a parameter and by having the callee write back into that memory address.

The idea is that when a function has a buffer overflow bug, the memory address used for returning values to the caller (assuming that the function does that) can be overwritten using a specially crafted buffer, which would get the function to overwrite a memory address chosen by the attacker (because the function takes that address and writes to it). By being able to write data to an arbitrary address in memory attackers can sometimes gain control of the process before the stack-checking code finds out that a buffer overflow had occurred. In order to do that, attackers must locate a function that passes values back to the caller using parameters and that has an overflow bug. Then in order to exploit such a vulnerability, they must figure out an address to write to in memory that would allow them to run their own code before the process is terminated by the stack-checking code. This address is usually some kind of global address that controls which code is executed when stack checking fails.
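The ordering problem described above (the write through the caller-supplied address happens before the cookie is verified) can be sketched with a simulated memory. Everything here is an assumption made for illustration: memory is an array of 32-bit words, an "address" is an index into it, and the frame layout, slot names, and constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MEM_WORDS = 32, FRAME = 8, HANDLER_SLOT = 0, RESULT_SLOT = 1 };

/* Simulated 32-bit memory; an "address" is just an index into this array. */
static uint32_t mem[MEM_WORDS];
static const uint32_t COOKIE = 0x0C00C1E5u;

/* Hypothetical frame layout, in words, starting at FRAME:
   +0..+3 local buffer, +4 cookie, +5 return address, +6 out-pointer parameter. */
static void setup(void)
{
    memset(mem, 0, sizeof mem);
    mem[HANDLER_SLOT] = 0x1111u;   /* stands in for a global handler pointer */
    mem[FRAME + 6] = RESULT_SLOT;  /* caller wants the result at RESULT_SLOT */
}

/* Copies `words` words into the local buffer without a bounds check, writes
   the first buffer word through the out-pointer, and only then verifies the
   cookie. Returns 1 if the cookie is intact, 0 if the overflow was detected. */
static int vulnerable(const uint32_t *input, size_t words)
{
    assert(words <= 7);            /* keep the simulation inside the frame */
    mem[FRAME + 4] = COOKIE;
    memcpy(&mem[FRAME], input, words * sizeof mem[0]);

    uint32_t out = mem[FRAME + 6]; /* may now be attacker-controlled */
    assert(out < MEM_WORDS);
    mem[out] = mem[FRAME];         /* value returned through the parameter */

    return mem[FRAME + 4] == COOKIE;  /* the check runs after the write */
}
```

A two-word input leaves the out-pointer alone and the function returns 1. A seven-word input whose last word is HANDLER_SLOT makes the function return 0, so the overflow is detected, but by then mem[HANDLER_SLOT] already holds the attacker's first word.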

As you can see, exploiting programs that have stack-checking mechanisms embedded into them is not as easy as exploiting simple buffer overflow bugs. This means that even though it doesn’t completely eliminate the problem, stack checking does somewhat reduce the total number of possible exploits in a program.

Nonexecutable Memory

This discussion wouldn’t be complete without mentioning one other weapon that helps fight buffer overflows: nonexecutable memory. Certain processors provide support for defining memory pages as nonexecutable, which means that they can only be used for storing data, and that the processor will not run code stored in them. The operating system can then mark stack and data pages as nonexecutable, which prevents an attacker from running code on them using a buffer overflow.

At the time of writing, many new processors already support this functionality (including recent versions of Intel and AMD processors, and the IA-64 Intel processors), and so do many operating systems (including Windows XP Service Pack 2 and above, Solaris 2.6 and above, and several patches implemented for the Linux kernel).

Needless to say, nonexecutable memory doesn’t exactly invalidate the whole concept of buffer overflow attacks. It is quite possible for attackers to overcome the hurdles imposed by nonexecutable memory systems, as long as a vulnerable piece of code is found [Designer, Wojtczuk]. The most popular strategy (often called return-to-libc) is to modify the function’s return address to point to a well-known function (such as a runtime library function or a system API) that helps attackers gain control over the process. This completely avoids the problem of having a nonexecutable stack, but requires a slightly more involved exploit.

Heap Overflows

Another type of overflow that can be used for taking control of a program or of the entire system is the malloc exploit or heap overflow [anonymous], [Kaempf], [jp]. The general idea is the same as a stack overflow: programs receive data of an unexpected length and copy it into a buffer that’s too small to contain it. This causes the program to overwrite whatever it is that follows the heap block in memory. Typically, heaps are arranged as linked lists, and the pointers to the next and previous heap blocks are placed either right before or right after the actual block data. This means that writing past the end of a heap block would corrupt that linked list in some way. Usually, this causes the program to crash as soon as the heap manager traverses the linked list (in order to free a block for example), but when done carefully a heap overflow can be used to take over a system.

The idea is that attackers can take advantage of the heap’s linked-list structure in order to overwrite some memory address in the process’s address space. Implementing such attacks can be quite complicated, but the basic idea is fairly straightforward. Because each block in the linked list has “next” and “prev” members, it is possible to overwrite these members in a way that would allow the attacker to write an arbitrary value into an arbitrary address in memory.

Think of what takes place when an element is removed from a doubly linked list. The system must correct the links in the two adjacent items on the list (both the previous item and the next item), so that they correctly link to one another, and not to the item you’re currently deleting. This means that when the item is removed, the code will write the address of the next item into the previous item’s header (it will take both addresses from the header of the item currently being deleted), and the address of the prev item into the next item’s header (again, the addresses will be taken from the item currently being deleted). It’s not easy, but by carefully overwriting the values of these next and prev members in one item on the list, attackers can in some cases manage to overwrite strategic memory addresses in the process address space. Of course, the overwrite doesn’t take place immediately: it only happens when the overwritten item is freed.
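The unlink step described above can be sketched in C. This is a simplified model, not the code of any real heap manager: the heap is a small array of 32-bit words, "addresses" are indices into it, and the header layout (next at word 0, prev at word 1) and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { HEAP_WORDS = 16, NEXT = 0, PREV = 1 };
enum { BLK_A = 0, BLK_B = 4, BLK_C = 8, TARGET = 12 };

/* Simulated heap; each block starts with a two-word header (next, prev). */
static uint32_t heap[HEAP_WORDS];

/* Builds the list A <-> B <-> C with intact headers. */
static void build_list(void)
{
    memset(heap, 0, sizeof heap);
    heap[BLK_A + NEXT] = BLK_B;
    heap[BLK_B + PREV] = BLK_A;
    heap[BLK_B + NEXT] = BLK_C;
    heap[BLK_C + PREV] = BLK_B;
}

/* Removes blk from the list, trusting the next/prev values in its header. */
static void unlink_block(uint32_t blk)
{
    uint32_t next = heap[blk + NEXT];
    uint32_t prev = heap[blk + PREV];

    assert(next + PREV < HEAP_WORDS && prev + NEXT < HEAP_WORDS);
    heap[prev + NEXT] = next;  /* previous item now skips the removed one */
    heap[next + PREV] = prev;  /* next item now points back past it */
}
```

With intact headers, unlink_block(BLK_B) simply joins A and C. If an overflow from the preceding block has set B's prev to TARGET and its next to 14, the same call stores 14 at heap[TARGET], a value and a location both chosen by whoever corrupted the header. The second store (heap[15] = TARGET) is a side effect the attacker has to tolerate, which is one reason such exploits are delicate.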


It should be noted that heap overflows are usually less common than stack overflows because the sizes of heap blocks are almost always dynamically calculated to be large enough to fit the incoming data. Unlike stack buffers, whose size must be predefined, heap buffers have a dynamic size (that’s the whole point of a heap). Because of this, programmers rarely hard-code the size of a heap block when they have variably sized incoming data that they wish to fit into that block. Heap blocks typically become a problem when the programmer miscalculates the number of bytes needed to hold a particular user-supplied buffer in memory.

String Filters

Traditionally, a significant portion of overflow attacks have been string-related. The most common example has been the use of the various runtime library string-manipulation routines for copying or processing strings in some way, while letting the routine determine how much data should be written. This is the common strcpy case demonstrated earlier, where an outsider is allowed to provide a string that is copied into a fixed-size internal buffer through strcpy. Because strcpy only stops copying when it encounters a NULL terminator, the caller can supply a string that would be too long for the target buffer, thus causing an overflow.

What happens if the attacker’s string is internally converted into Unicode (as most strings are in Win32) before it reaches the vulnerable function? In such cases the attacker must feed the vulnerable program a sequence of ASCII characters that would become a workable shellcode once converted into Unicode! This effectively means that between each attacker-provided opcode byte, the Unicode conversion process will add a zero byte. You may be surprised to learn that it’s actually possible to write shellcodes that work after they’re converted to Unicode. The process of developing working shellcodes in this hostile environment is discussed in [Obscou]. What can I say, being an attacker isn’t easy.
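To make the zero-byte interleaving concrete, here is a minimal sketch of what widening does to a byte sequence. It assumes plain 7-bit ASCII input and little-endian 16-bit output; widen_ascii is a hypothetical helper written for this illustration, not a Win32 API, and real conversions depend on the active code page.

```c
#include <assert.h>
#include <stddef.h>

/* Widens each 7-bit ASCII byte to a 16-bit little-endian code unit.
   Returns the number of bytes written to out. */
static size_t widen_ascii(const unsigned char *in, size_t n,
                          unsigned char *out, size_t cap)
{
    assert(cap >= 2 * n);
    for (size_t i = 0; i < n; i++) {
        assert(in[i] < 0x80);     /* only plain ASCII is modeled here */
        out[2 * i]     = in[i];   /* low byte: the byte the attacker sent */
        out[2 * i + 1] = 0x00;    /* high byte: zero added by the conversion */
    }
    return 2 * n;
}
```

Feeding the two bytes 41 42 through widen_ascii produces 41 00 42 00, so any payload has to remain meaningful with a zero byte after every byte the attacker controls.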

Integer Overflows

Integer overflows (see [Blexim], [Koziol]) are a special type of overflow bug where incorrect treatment of integers can lead to a numerical overflow which eventually results in a buffer overflow. The common case in which this happens is when an application receives the length of some data block from the outside world. Except for really extreme cases of recklessness, programmers typically perform some sort of bounds checking on such an integer. Unfortunately, safely checking an integer value is not as trivial as it seems, and there are numerous pitfalls that could allow bad input values to pass as legal values. Here is the most trivial example:

 

 

 


push    esi
push    100                          ; /size = 100 (256.)
call    Chapter7.malloc              ; \malloc
mov     esi,eax
add     esp,4
test    esi,esi
je      short Chapter7.0040104E
mov     eax,dword ptr [esp+C]
cmp     eax,100
jg      short Chapter7.0040104E
push    eax                          ; /maxlen
mov     eax,dword ptr [esp+C]        ; |
push    eax                          ; |src
push    esi                          ; |dest
call    Chapter7.strncpy             ; \strncpy
add     esp,0C
Chapter7.0040104E:
mov     eax,esi
pop     esi
retn

 

 

 

This function allocates a fixed-size buffer (256 bytes long) and copies a user-supplied string into that buffer. The length of the source buffer is also user-supplied (through [esp+C]). This is not a typical overflow vulnerability and is slightly less obvious because the user-supplied length is checked to make sure that it doesn’t exceed the allocated buffer size (that’s the cmp eax,100). The caveat in this particular sample is the data type of the buffer-length parameter.

There are two conditional code groups in IA-32 assembly language, signed and unsigned, each operating on different CPU flags. The conditional code used in a conditional jump usually exposes the exact data type used in the comparison in the original source code. In this particular case, the use of JG (jump if greater) indicates that the compiler was treating the buffer length parameter as a signed integer. If the parameter was defined as an unsigned integer or simply cast to an unsigned integer during the comparison, the compiler would have generated JA (jump if above) instead of JG for the comparison. You’ll find more information on flags and conditional codes in Appendix A.

Signed buffer-length comparisons are dangerous because with the right input value it is possible to bypass the buffer length check. The idea is quite simple. Conceptually, buffer lengths are always unsigned values because there is no such thing as a negative buffer length: a buffer length variable can only be 0 or some positive integer. When buffer lengths are stored as signed integers, comparisons can produce unexpected results because the condition SignedBufferLen <= MAXIMUM_LEN is satisfied not only when 0 <= SignedBufferLen <= MAXIMUM_LEN, but also when SignedBufferLen < 0. Of course, functions that take buffer lengths as input (such as strncpy) interpret them as unsigned, so any negative value is treated as a very large number.
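The difference between the signed and unsigned checks can be shown in a few lines of C. This is a sketch with hypothetical function names; MAXIMUM_LEN is set to 0x100 to match the 256-byte buffer in the sample.

```c
#include <assert.h>
#include <stddef.h>

#define MAXIMUM_LEN 0x100

/* Signed check, the kind that compiles to a JG: negative lengths pass. */
static int passes_signed_check(int len)
{
    return len <= MAXIMUM_LEN;
}

/* Unsigned check, the kind that compiles to a JA: a negative input has
   already become a huge value and is rejected. */
static int passes_unsigned_check(unsigned int len)
{
    return len <= MAXIMUM_LEN;
}

/* Self-checking demonstration of the three facts the text relies on. */
static void demo_length_checks(void)
{
    int hostile = -1;

    assert(passes_signed_check(hostile));                   /* check bypassed */
    assert(!passes_unsigned_check((unsigned int)hostile));  /* check holds */
    assert((size_t)hostile > MAXIMUM_LEN);  /* what a size_t parameter sees */
}
```

The last assertion is the important one: once the signed check has let -1 through, a routine that takes its length as size_t receives the largest representable value instead.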


Arithmetic Operations on User-Supplied Integers

Integer overflows come in many flavors. Consider, for example, another case where the buffer length is received from the attacker and is then somehow modified. This is quite common, especially if the program needs to store the user-supplied buffer along with some header or other fixed-size supplement. Suppose the program takes the user-supplied length and adds a certain constant to it; this will typically be a header length of some sort. This can create significant risks because an attacker could take advantage of integer overflows to create a buffer overflow. Here is an example of code that does this sort of thing:

allocate_object:
00401021  push    esi
00401022  push    edi
00401023  mov     edi,[esp+0x10]
00401027  lea     esi,[edi+0x18]
0040102a  push    esi
0040102b  call    Chapter7!malloc (004010d8)
00401030  pop     ecx
00401031  xor     ecx,ecx
00401033  cmp     eax,ecx
00401035  jnz     Chapter7!allocate_object+0x1a (0040103b)
00401037  xor     eax,eax
00401039  jmp     Chapter7!allocate_object+0x42 (00401063)
0040103b  mov     [eax+0x4],ecx
0040103e  mov     [eax+0x8],ecx
00401041  mov     [eax+0xc],ecx
00401044  mov     [eax+0x10],ecx
00401047  mov     [eax+0x14],ecx
0040104a  mov     ecx,edi
0040104c  mov     edx,ecx
0040104e  mov     [eax],esi
00401050  mov     esi,[esp+0xc]
00401054  shr     ecx,0x2
00401057  lea     edi,[eax+0x18]
0040105a  rep     movsd
0040105c  mov     ecx,edx
0040105e  and     ecx,0x3
00401061  rep     movsb
00401063  pop     edi
00401064  pop     esi
00401065  ret

 

The preceding contrived, yet somewhat realistic, function takes a buffer pointer and a buffer length as parameters and allocates a buffer of the length passed to it via [esp+0x10] plus 0x18 (24 bytes). It then initializes what appears to be some kind of header at the beginning of the block and copies the user-supplied buffer from [esp+0xc] to offset +0x18 in the newly allocated block (that’s the lea edi,[eax+0x18]). The return value is a pointer to the newly allocated block. Clearly, the idea is that an object is being allocated with a 24-byte header. The header is zero-initialized, except for the first member at offset +0, which is set to the total size of the allocated block. The user-supplied buffer is then placed after the header in the newly allocated block.

At first glance, this code appears to be perfectly safe because the function only writes as many bytes to the allocated buffer as it managed to allocate. The problem is that, as usual, we’re dealing with values coming in from the outside world; there’s no way of knowing what we’re going to get. In this particular case, the problem is caused by the arithmetic operation performed on the buffer length parameter.

The lea esi,[edi+0x18] at address 00401027 seems innocent, but what happens if EDI contains a very high value that’s close to 0xffffffff? In such a case, the addition would overflow and the result would be a low positive number, possibly lower than the length of the buffer itself! Suppose, for example, that you feed the function with 0xfffffff8 as the buffer length. 0xfffffff8 + 0x18 = 0x100000010, but that number doesn’t fit in 32 bits. The processor truncates the result, and you end up with 0x00000010.

Keeping in mind that the buffer length copied by the function is the original supplied length (before the header length was added to it), you can now see how this function would definitely crash. The malloc call will allocate a buffer 0x10 bytes long, but the function will try to copy 0xfffffff8 bytes into the newly allocated buffer, thus crashing the program.
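The wraparound arithmetic is easy to reproduce in C with fixed-width unsigned types, which wrap modulo 2^32 just as the LEA does. The function names below are hypothetical; HEADER_LEN is the 0x18 constant from the listing.

```c
#include <assert.h>
#include <stdint.h>

#define HEADER_LEN 0x18u

/* Mirrors lea esi,[edi+0x18]: a 32-bit addition that silently wraps. */
static uint32_t total_size(uint32_t user_len)
{
    return (uint32_t)(user_len + HEADER_LEN);
}

/* Nonzero when the allocation is smaller than the number of bytes the
   function will go on to copy, which only happens if the sum wrapped. */
static int copy_exceeds_allocation(uint32_t user_len)
{
    return total_size(user_len) < user_len;
}

/* Self-checking demonstration using the value from the text. */
static void demo_wraparound(void)
{
    assert(total_size(0xfffffff8u) == 0x00000010u);
    assert(copy_exceeds_allocation(0xfffffff8u));
    assert(!copy_exceeds_allocation(0x100u));
}
```

Comparing the sum against the original length is also a common way to detect the wrap after the fact, since an unsigned sum that is smaller than one of its operands can only be the result of an overflow.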

The solution to this problem is to take a limited-size input and make sure that the target variable can contain the largest possible result. For example, assuming that 16 bits are enough to represent the user buffer length, simply changing the preceding program to use an unsigned short for the user buffer length would solve the problem. Here is what the corrected version of this function looks like:

allocate_object:
00401024  push    esi
00401025  movzx   esi,word ptr [esp+0xc]
0040102a  push    edi
0040102b  lea     edi,[esi+0x18]
0040102e  push    edi
0040102f  call    Chapter7!malloc (004010dc)
00401034  pop     ecx
00401035  xor     ecx,ecx
00401037  cmp     eax,ecx
00401039  jnz     Chapter7!allocate_object+0x1b (0040103f)
0040103b  xor     eax,eax
0040103d  jmp     Chapter7!allocate_object+0x43 (00401067)
0040103f  mov     [eax+0x4],ecx
00401042  mov     [eax+0x8],ecx
00401045  mov     [eax+0xc],ecx
00401048  mov     [eax+0x10],ecx
0040104b  mov     [eax+0x14],ecx
0040104e  mov     ecx,esi
00401050  mov     esi,[esp+0xc]
00401054  mov     edx,ecx
00401056  mov     [eax],edi
00401058  shr     ecx,0x2
0040105b  lea     edi,[eax+0x18]
0040105e  rep     movsd
00401060  mov     ecx,edx
00401062  and     ecx,0x3
00401065  rep     movsb
00401067  pop     edi
00401068  pop     esi
00401069  ret

 

This function is effectively identical to the original version presented earlier, except for movzx esi,word ptr [esp+0xc] at 00401025. The idea is that instead of directly loading the buffer length from the stack and adding 0x18 to it, we now treat it as an unsigned short, which eliminates the possibility of causing an overflow because the arithmetic is performed using 32-bit registers. The use of the MOVZX instruction is crucial here and is discussed in the next section.
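For orientation, the corrected function might look roughly like the following in C. This is a hypothetical reconstruction from the disassembly, not the book's source: the struct object_header type, its field names, and the parameter names are all assumptions, and only the 0x18-byte header size, the total size stored at offset +0, and the unsigned short length come from the text.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 24-byte (0x18) header; only the first member's role is known. */
struct object_header {
    uint32_t total_size;   /* offset +0: header plus payload, in bytes */
    uint32_t reserved[5];  /* offsets +4..+0x14: zero-initialized */
};

/* Allocates a header followed by a copy of len bytes from data.
   Returns NULL if the allocation fails. */
void *allocate_object(const void *data, unsigned short len)
{
    struct object_header *obj;
    uint32_t total;

    assert(sizeof(struct object_header) == 0x18);

    /* len is at most 0xFFFF, so the 32-bit sum is at most 0x10017
       and cannot wrap around. */
    total = (uint32_t)len + (uint32_t)sizeof(struct object_header);

    obj = malloc(total);
    if (obj == NULL)
        return NULL;

    obj->total_size = total;
    memset(obj->reserved, 0, sizeof obj->reserved);
    if (len != 0)
        memcpy((unsigned char *)obj + sizeof *obj, data, len);

    return obj;
}
```

Because the length parameter is unsigned, the widening to 32 bits is a zero extension, which is the C-level counterpart of the MOVZX in the listing.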

Type Conversion Errors

Sometimes software developers don’t fully understand the semantics of the programming language they are using. These semantics can be critical because they define (among other things) how data is going to be handled at a low level. Type conversion errors take place when developers mishandle incoming data types and perform incorrect conversions on them. For example, consider the following variant on my famous allocate_object function:

allocate_object:
00401021  push    esi
00401022  movsx   esi,word ptr [esp+0xc]
00401027  push    edi
00401028  lea     edi,[esi+0x18]
0040102b  push    edi
0040102c  call    Chapter7!malloc (004010d9)
00401031  pop     ecx
00401032  xor     ecx,ecx
00401034  cmp     eax,ecx
00401036  jnz     Chapter7!allocate_object+0x1b (0040103c)
00401038  xor     eax,eax
0040103a  jmp     Chapter7!allocate_object+0x43 (00401064)
0040103c  mov     [eax+0x4],ecx
0040103f  mov     [eax+0x8],ecx