Eilam E.Reversing.Secrets of reverse engineering.2005
.pdfAuditing Program Binaries 251
but there are other, far more subtle mistakes that can create potential buffer overflow bugs.
One technique that aims to prevent these problems automatically is compiler-generated stack checking. The idea is quite simple: For any function that accesses local variables by reference, push an extra cookie or canary to the stack between the last local variable and the function’s return address. This cookie is then validated before the function returns to the caller. If the cookie has been modified, program execution immediately stops. This ensures that the return address hasn’t been overwritten with some other address, and keeps a corrupted return address from being used to run malicious code.
One thing that’s immediately clear about this approach is that the cookie must be a random number. If it’s not, an attacker could simply add the cookie’s value as part of the overflowing payload and bypass the stack protection. The solution is to use a pseudorandom number as a cookie. If you’re wondering just how random pseudorandom numbers can be, take a look at [Knuth2] (Donald E. Knuth, The Art of Computer Programming, Volume 2: Seminumerical Algorithms, Second Edition, Addison Wesley), but suffice it to say that they’re random enough for this purpose. With a pseudorandom number, the attacker has no way of knowing in advance what the cookie is going to be, and so it becomes impossible to fool the cookie verification code (though it’s still possible to work around this whole mechanism in other ways, as explained later in this chapter).
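The effect of a predictable versus an unpredictable cookie can be sketched in portable C. This is only a model under stated assumptions: the struct frame layout, the function names, and the constants are hypothetical stand-ins for a real stack frame, and the "overflow" is confined to the struct so the simulation itself stays well defined.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a protected frame, in increasing address order:
 * local buffer, then cookie, then return address. */
struct frame {
    char     buffer[16];
    uint32_t cookie;
    uint32_t return_address;
};

/* Simulates a function that copies input without a bounds check and then
 * runs the epilogue cookie check. Returns 1 if corruption is detected. */
int overflow_detected(uint32_t real_cookie,
                      const unsigned char *payload, size_t len)
{
    struct frame f;
    memset(&f, 0, sizeof f);
    f.cookie = real_cookie;
    f.return_address = 0x00401000u;

    if (len > sizeof f)           /* keep the simulation itself in bounds */
        len = sizeof f;
    memcpy(&f, payload, len);     /* simulated overflow of f.buffer */

    return f.cookie != real_cookie;
}

/* Builds an attacker payload: filler, a guessed cookie, a new return address. */
size_t build_payload(unsigned char *out, uint32_t guessed_cookie)
{
    struct frame fake;
    memset(&fake, 'A', sizeof fake);
    fake.cookie = guessed_cookie;
    fake.return_address = 0x41414141u;
    memcpy(out, &fake, sizeof fake);
    return sizeof fake;
}
```

If the payload is built with the same value that is later passed as real_cookie, overflow_detected returns 0 and the overwritten return address would go unnoticed. With any other guess it returns 1.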
The following code is the same launch function from before, except that stack checking has been added (using the /GS option in the Microsoft C/C++ compiler).
Chapter7!launch:
00401060 sub     esp,0x68
00401063 mov     eax,[Chapter7!__security_cookie (0040a428)]
00401068 mov     [esp+0x64],eax
0040106c mov     eax,[esp+0x6c]
00401070 lea     edx,[esp]
00401073 sub     edx,eax
00401075 mov     cl,[eax]
00401077 mov     [edx+eax],cl
0040107a inc     eax
0040107b test    cl,cl
0040107d jnz     Chapter7!launch+0x15 (00401075)
0040107f push    edi
00401080 lea     edi,[esp+0x4]
00401084 dec     edi
00401085 mov     al,[edi+0x1]
00401088 inc     edi
00401089 test    al,al
0040108b jnz     Chapter7!launch+0x25 (00401085)
0040108d mov     eax,[Chapter7!'string' (00408128)]
00401092 mov     cl,[Chapter7!'string'+0x4 (0040812c)]
00401098 lea     edx,[esp+0x4]
0040109c mov     [edi],eax
0040109e push    edx
0040109f mov     [edi+0x4],cl
004010a2 call    Chapter7!system (00401110)
004010a7 mov     ecx,[esp+0x6c]
004010ab add     esp,0x4
004010ae pop     edi
004010af call    Chapter7!__security_check_cookie (004011d7)
004010b4 add     esp,0x68
004010b7 ret
The __security_check_cookie function is called before launch returns in order to verify that the cookie has not been corrupted. Here is what __security_check_cookie does.
__security_check_cookie:
004011d7 cmp     ecx,[Chapter7!__security_cookie (0040a428)]
004011dd jnz     Chapter7!__security_check_cookie+0x9 (004011e0)
004011df ret
004011e0 jmp     Chapter7!report_failure (004011a6)
This idea was originally presented in [Cowan] (Crispin Cowan, Calton Pu, David Maier, Heather Hinton, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang, Automatic Detection and Prevention of Buffer-Overflow Attacks, The 7th USENIX Security Symposium, San Antonio, TX, January 1998) and has since been implemented in several compilers. The latest versions of the Microsoft C/C++ compilers support stack checking, and the Microsoft operating systems (starting with Windows Server 2003 and Windows XP Service Pack 2) take advantage of this feature.
In Windows, the cookie is stored in a global variable within the protected module (usually in __security_cookie). This variable is initialized by __security_init_cookie when the module is loaded, and is randomized based on the current process and thread IDs, along with the current time or the value of the hardware performance counter (see Listing 7.1). In case you’re wondering, here is the source code for __security_init_cookie. This code is embedded into any program built using the Microsoft compiler that has stack checking enabled.
void __cdecl __security_init_cookie(void)
{
    DWORD_PTR cookie;
    FT systime;
    LARGE_INTEGER perfctr;

    /*
     * Do nothing if the global cookie has already been initialized.
     */
    if (__security_cookie && __security_cookie != DEFAULT_SECURITY_COOKIE)
        return;

    /*
     * Initialize the global cookie with an unpredictable value which is
     * different for each module in a process. Combine a number of sources
     * of randomness.
     */
    GetSystemTimeAsFileTime(&systime.ft_struct);
#if !defined (_WIN64)
    cookie = systime.ft_struct.dwLowDateTime;
    cookie ^= systime.ft_struct.dwHighDateTime;
#else /* !defined (_WIN64) */
    cookie = systime.ft_scalar;
#endif /* !defined (_WIN64) */

    cookie ^= GetCurrentProcessId();
    cookie ^= GetCurrentThreadId();
    cookie ^= GetTickCount();

    QueryPerformanceCounter(&perfctr);
#if !defined (_WIN64)
    cookie ^= perfctr.LowPart;
    cookie ^= perfctr.HighPart;
#else /* !defined (_WIN64) */
    cookie ^= perfctr.QuadPart;
#endif /* !defined (_WIN64) */

    /*
     * Make sure the global cookie is never initialized to zero, since in
     * that case an overrun which sets the local cookie and return address
     * to the same value would go undetected.
     */
    __security_cookie = cookie ? cookie : DEFAULT_SECURITY_COOKIE;
}
Listing 7.1 The __security_init_cookie function that initializes the stack-checking cookie in code generated by the Microsoft C/C++ compiler.
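The mixing strategy in Listing 7.1 (XOR several values together, then refuse a zero result) can be isolated in a few lines of portable C. This is a minimal sketch, not the Microsoft code: mix_cookie and FALLBACK_COOKIE are hypothetical names, the fallback constant is an arbitrary nonzero placeholder, and the caller is assumed to supply the entropy values from whatever sources its platform offers.

```c
#include <stddef.h>
#include <stdint.h>

/* Arbitrary nonzero placeholder used only when the mix comes out as zero. */
#define FALLBACK_COOKIE 0xA5A5A5A5u

/* XORs all sources together and guarantees a nonzero result. */
uint32_t mix_cookie(const uint32_t *sources, size_t count)
{
    uint32_t cookie = 0;
    size_t i;

    for (i = 0; i < count; i++)
        cookie ^= sources[i];

    /* Two identical sources cancel out under XOR, so zero is possible. */
    return cookie ? cookie : FALLBACK_COOKIE;
}
```

XOR is a convenient combiner here because every bit of every source can influence the result, but it also means that equal values cancel each other, which is why the final zero test is needed.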
Unsurprisingly, stack checking is not impossible to defeat [Bulba, Koziol]. Exactly how that’s done is beyond the scope of this book, but suffice it to say that in some functions the attacker still has a window of opportunity for writing into a local memory address (which almost guarantees that he or she will be able to take over the program in question) before the function reaches the cookie verification code. There are several different tricks that will work in different cases. One option is to try and overwrite the area in the stack where parameters were passed to the function. This trick works for functions that use stack parameters for returning values to their callers, and is typically implemented by having the caller pass a memory address as a parameter and by having the callee write back into that memory address.
The idea is that when a function has a buffer overflow bug, the memory address used for returning values to the caller (assuming that the function does that) can be overwritten using a specially crafted buffer, which would get the function to overwrite a memory address chosen by the attacker (because the function takes that address and writes to it). By being able to write data to an arbitrary address in memory, attackers can sometimes gain control of the process before the stack-checking code finds out that a buffer overflow has occurred. In order to do that, attackers must locate a function that passes values back to the caller using parameters and that has an overflow bug. Then, in order to exploit such a vulnerability, they must figure out an address to write to in memory that would allow them to run their own code before the process is terminated by the stack-checking code. This address is usually some kind of global address that controls which code is executed when stack checking fails.
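The ordering problem described above (the write through the output parameter happens before the cookie is checked) can be modeled in C. Everything here is a hypothetical illustration: struct frame imitates the stack layout, status_word stands in for the strategic memory location, and the overflow is confined to the struct so the program remains well defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame layout, in increasing address order. The output
 * pointer is a stack parameter, so it sits above the return address. */
struct frame {
    char      buffer[8];
    uintptr_t cookie;
    uintptr_t return_address;
    int      *result_out;
};

int status_word = 0;   /* stands in for a strategic memory location */
int legit_result = 0;  /* where the caller expected the result to go */

/* Returns 1 if the epilogue check detects corruption. Note that the
 * write through result_out has already happened by then. */
int vulnerable_call(const unsigned char *input, size_t len,
                    uintptr_t cookie, int *result_out)
{
    struct frame f;
    memset(&f, 0, sizeof f);
    f.cookie = cookie;
    f.result_out = result_out;

    if (len > sizeof f)            /* keep the simulation in bounds */
        len = sizeof f;
    memcpy(&f, input, len);        /* simulated overflow of f.buffer */

    *f.result_out = 1234;          /* write-back BEFORE the cookie check */
    return f.cookie != cookie;     /* epilogue cookie check */
}

/* Builds a payload that redirects the output pointer to target. */
size_t build_payload(unsigned char *out, int *target)
{
    struct frame fake;
    memset(&fake, 'A', sizeof fake);
    fake.result_out = target;
    memcpy(out, &fake, sizeof fake);
    return sizeof fake;
}
```

Calling vulnerable_call with a payload built for &status_word reports the corruption, yet status_word already holds 1234 and legit_result is untouched. In a real exploit that early write is the attacker's window of opportunity.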
As you can see, exploiting programs that have stack-checking mechanisms embedded into them is not as easy as exploiting simple buffer overflow bugs. This means that even though it doesn’t completely eliminate the problem, stack checking does somewhat reduce the total number of possible exploits in a program.
Nonexecutable Memory
This discussion wouldn’t be complete without mentioning one other weapon that helps fight buffer overflows: nonexecutable memory. Certain processors provide support for defining memory pages as nonexecutable, which means that they can only be used for storing data, and that the processor will not run code stored in them. The operating system can then mark stack and data pages as nonexecutable, which prevents an attacker from running code on them using a buffer overflow.
At the time of writing, many new processors already support this functionality (including recent versions of Intel and AMD processors, and the IA-64 Intel processors), and so do many operating systems (including Windows XP Service Pack 2 and above, Solaris 2.6 and above, and several patches implemented for the Linux kernel).
Needless to say, nonexecutable memory doesn’t exactly invalidate the whole concept of buffer overflow attacks. It is quite possible for attackers to overcome the hurdles imposed by nonexecutable memory systems, as long as a vulnerable piece of code is found [Designer, Wojtczuk]. The most popular strategy (often called return-to-libc) is to modify the function’s return address to point to a well-known function (such as a runtime library function or a system API) that helps attackers gain control over the process. This completely avoids the problem of having a nonexecutable stack, but requires a slightly more involved exploit.
Heap Overflows
Another type of overflow that can be used for taking control of a program or of the entire system is the malloc exploit or heap overflow [anonymous], [Kaempf], [jp]. The general idea is the same as a stack overflow: programs receive data of an unexpected length and copy it into a buffer that’s too small to contain it. This causes the program to overwrite whatever it is that follows the heap block in memory. Typically, heaps are arranged as linked lists, and the pointers to the next and previous heap blocks are placed either right before or right after the actual block data. This means that writing past the end of a heap block would corrupt that linked list in some way. Usually, this causes the program to crash as soon as the heap manager traverses the linked list (in order to free a block for example), but when done carefully a heap overflow can be used to take over a system.
The idea is that attackers can take advantage of the heap’s linked-list structure in order to overwrite some memory address in the process’s address space. Implementing such attacks can be quite complicated, but the basic idea is fairly straightforward. Because each block in the linked list has “next” and “prev” members, it is possible to overwrite these members in a way that would allow the attacker to write an arbitrary value into an arbitrary address in memory.
Think of what takes place when an element is removed from a doubly linked list. The system must correct the links in the two adjacent items on the list (both the previous item and the next item), so that they correctly link to one another, and not to the item you’re currently deleting. This means that when the item is removed, the code will write the address of the next item into the previous item’s header, and the address of the prev item into the next item’s header. In both cases the addresses are taken from the header of the item currently being deleted. It’s not easy, but by carefully overwriting the values of these next and prev members in one item on the list, attackers can in some cases manage to overwrite strategic memory addresses in the process address space. Of course, the overwrite doesn’t take place immediately; it only happens when the overwritten item is freed.
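The unlink step just described can be sketched in C to show how corrupted links turn into a write to a location of the attacker's choosing. This is a simplified model, not any real heap manager: struct block, unlink_block, and the target and value objects are hypothetical, and both corrupted pointers refer to valid objects so the demonstration is safe to run.

```c
#include <stddef.h>

/* Simplified list header; real heap headers carry more fields. */
struct block {
    struct block *next;
    struct block *prev;
};

/* Classic removal from a doubly linked list, with no sanity checks. */
void unlink_block(struct block *b)
{
    b->prev->next = b->next;   /* store 1: writes b->next into *b->prev */
    b->next->prev = b->prev;   /* store 2: writes b->prev into *b->next */
}

/* Returns 1 if the corrupted unlink wrote where the "attacker" wanted. */
int demo_unlink_write(void)
{
    struct block a, b, c;
    struct block target = { NULL, NULL };  /* memory the attacker wants to modify */
    struct block value  = { NULL, NULL };  /* attacker-chosen value (must be writable) */

    /* A healthy list: a <-> b <-> c */
    a.prev = NULL; a.next = &b;
    b.prev = &a;   b.next = &c;
    c.prev = &b;   c.next = NULL;

    /* Simulated overflow from the preceding block corrupts b's links. */
    b.prev = &target;
    b.next = &value;

    unlink_block(&b);          /* happens later, when b is freed */

    /* target now holds the chosen value; the real neighbors were never fixed. */
    return target.next == &value
        && value.prev == &target
        && a.next == &b
        && c.prev == &b;
}
```

The second store explains a practical constraint of this technique: the value being written is itself treated as an address and written through, so it has to point at writable memory.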
It should be noted that heap overflows are usually less common than stack overflows because the sizes of heap blocks are almost always dynamically calculated to be large enough to fit the incoming data. Unlike stack buffers, whose size must be predefined, heap buffers have a dynamic size (that’s the whole point of a heap). Because of this, programmers rarely hard-code the size of a heap block when they have variably sized incoming data that they wish to fit into that block. Heap blocks typically become a problem when the programmer miscalculates the number of bytes needed to hold a particular user-supplied buffer in memory.
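A typical size miscalculation is forgetting the terminator when sizing a block for string data. The following sketch shows the calculation done correctly; joined_size and join are hypothetical helper names, and the example assumes ordinary NUL-terminated C strings whose combined length does not approach SIZE_MAX.

```c
#include <stdlib.h>
#include <string.h>

/* Bytes needed to hold a followed by b, including the terminating NUL.
 * Leaving out the +1 yields a block that is one byte too small. */
size_t joined_size(const char *a, const char *b)
{
    return strlen(a) + strlen(b) + 1;
}

/* Returns a newly allocated concatenation of a and b, or NULL on failure.
 * The caller owns the result and must free() it. */
char *join(const char *a, const char *b)
{
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(joined_size(a, b));

    if (out == NULL)
        return NULL;
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);   /* copies b's terminator as well */
    return out;
}
```

Keeping the size calculation in one function, and using the same lengths for both the allocation and the copy, makes it harder for the two to drift apart.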
String Filters
Traditionally, a significant portion of overflow attacks have been string-related. The most common example has been the use of the various runtime library string-manipulation routines for copying or processing strings in some way, while letting the routine determine how much data should be written. This is the common strcpy case demonstrated earlier, where an outsider is allowed to provide a string that is copied into a fixed-sized internal buffer through strcpy. Because strcpy only stops copying when it encounters a null terminator, the caller can supply a string that would be too long for the target buffer, thus causing an overflow.
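One way to avoid letting the copy routine decide how much to write is to measure the source first and refuse anything that doesn't fit. The sketch below assumes a hypothetical helper named copy_checked; it is one possible approach, not a routine from the runtime library.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if the whole string, terminator included,
 * fits in dst_size bytes. Returns 0 on success, -1 if src is too long. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (len >= dst_size)       /* no room for len bytes plus the NUL */
        return -1;
    memcpy(dst, src, len + 1);
    return 0;
}
```

With an 8-byte destination, a 5-character string is copied and an 8-character string is rejected, because the terminator would need a ninth byte.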
What happens if the attacker’s string is internally converted into Unicode (as most strings are in Win32) before it reaches the vulnerable function? In such cases the attacker must feed the vulnerable program a sequence of ASCII characters that would become a workable shellcode once converted into Unicode! This effectively means that between each attacker-provided opcode byte, the Unicode conversion process will add a zero byte. You may be surprised to learn that it’s actually possible to write shellcodes that work after they’re converted to Unicode. The process of developing working shellcodes in this hostile environment is discussed in [Obscou]. What can I say, being an attacker isn’t easy.
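The zero bytes mentioned above are easy to see in a small model of the conversion. This sketch assumes 7-bit ASCII input widened to 16-bit little-endian code units; widen_ascii is a hypothetical function, and a real conversion routine would also handle code pages and characters above 0x7F.

```c
#include <stddef.h>

/* Widens each input byte to a 16-bit little-endian code unit.
 * out must have room for 2 * len bytes. Returns the bytes written. */
size_t widen_ascii(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t i;

    for (i = 0; i < len; i++) {
        out[2 * i]     = in[i];   /* low byte: the original character */
        out[2 * i + 1] = 0x00;    /* high byte: always zero for ASCII */
    }
    return 2 * len;
}
```

Feeding it the two bytes 0x41 0x42 produces 0x41 0x00 0x42 0x00, so any shellcode that passes through such a conversion has to remain valid with a zero byte after every byte the attacker supplied.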
Integer Overflows
Integer overflows (see [Blexim], [Koziol]) are a special type of overflow bug where incorrect treatment of integers can lead to a numerical overflow which eventually results in a buffer overflow. The common case in which this happens is when an application receives the length of some data block from the outside world. Except for really extreme cases of recklessness, programmers typically perform some sort of bounds checking on such an integer. Unfortunately, safely checking an integer value is not as trivial as it seems, and there are numerous pitfalls that could allow bad input values to pass as legal values. Here is the most trivial example:
push    esi
push    100                            ; /size = 100 (256.)
call    Chapter7.malloc                ; \malloc
mov     esi,eax
add     esp,4
test    esi,esi
je      short Chapter7.0040104E
mov     eax,dword ptr [esp+C]
cmp     eax,100
jg      short Chapter7.0040104E
push    eax                            ; /maxlen
mov     eax,dword ptr [esp+C]          ; |
push    eax                            ; |src
push    esi                            ; |dest
call    Chapter7.strncpy               ; \strncpy
add     esp,0C
Chapter7.0040104E:
mov     eax,esi
pop     esi
retn
This function allocates a fixed-size buffer (256 bytes long) and copies a user-supplied string into that buffer. The length of the source buffer is also user-supplied (through [esp+C]). This is not a typical overflow vulnerability and is slightly less obvious because the user-supplied length is checked to make sure that it doesn’t exceed the allocated buffer size (that’s the cmp eax,100, where 100 is hexadecimal for 256). The caveat in this particular sample is the data type of the buffer-length parameter.
There are two conditional code groups in IA-32 assembly language, signed and unsigned, each operating on different CPU flags. The conditional code used in a conditional jump usually exposes the exact data type used in the comparison in the original source code. In this particular case, the use of JG (jump if greater) indicates that the compiler was treating the buffer length parameter as a signed integer. If the parameter was defined as an unsigned integer or simply cast to an unsigned integer during the comparison, the compiler would have generated JA (jump if above) instead of JG for the comparison. You’ll find more information on flags and conditional codes in Appendix A.
Signed buffer-length comparisons are dangerous because with the right input value it is possible to bypass the buffer length check. The idea is quite simple. Conceptually, buffer lengths are always unsigned values because there is no such thing as a negative buffer length: a buffer length variable can only be 0 or some positive integer. When buffer lengths are stored as signed integers, comparisons can produce unexpected results because the condition SignedBufferLen <= MAXIMUM_LEN would not only be satisfied when 0 <= SignedBufferLen <= MAXIMUM_LEN, but also when SignedBufferLen < 0. Functions that take buffer lengths as input (strncpy in this sample) interpret the length as an unsigned value, so a negative length that slips past the check is treated as a very large number.
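The difference between the two comparisons can be reproduced directly in C. In this sketch the function names are hypothetical and MAXIMUM_LEN is set to 256 to match the sample; the signed version corresponds to a JG-style check and the unsigned version to a JA-style check.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXIMUM_LEN 256

/* Signed comparison: negative lengths satisfy the check. */
int signed_check_passes(int len)
{
    return len <= MAXIMUM_LEN;
}

/* Unsigned comparison: a negative input becomes a huge value and fails. */
int unsigned_check_passes(unsigned int len)
{
    return len <= MAXIMUM_LEN;
}

/* What a copy routine taking a size_t length would see. */
size_t as_copy_length(int len)
{
    return (size_t)len;
}
```

A length of -1 passes the signed check, fails the unsigned one, and converts to SIZE_MAX when handed to a routine that expects a size_t.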
Arithmetic Operations on User-Supplied Integers
Integer overflows come in many flavors. Consider, for example, another case where the buffer length is received from the attacker and is then somehow modified. This is quite common, especially if the program needs to store the user-supplied buffer along with some header or other fixed-sized supplement. Suppose the program takes the user-supplied length and adds a certain constant to it; this will typically be a header length of some sort. This can create significant risks because an attacker could take advantage of integer overflows to create a buffer overflow. Here is an example of code that does this sort of thing:
allocate_object:
00401021 push    esi
00401022 push    edi
00401023 mov     edi,[esp+0x10]
00401027 lea     esi,[edi+0x18]
0040102a push    esi
0040102b call    Chapter7!malloc (004010d8)
00401030 pop     ecx
00401031 xor     ecx,ecx
00401033 cmp     eax,ecx
00401035 jnz     Chapter7!allocate_object+0x1a (0040103b)
00401037 xor     eax,eax
00401039 jmp     Chapter7!allocate_object+0x42 (00401063)
0040103b mov     [eax+0x4],ecx
0040103e mov     [eax+0x8],ecx
00401041 mov     [eax+0xc],ecx
00401044 mov     [eax+0x10],ecx
00401047 mov     [eax+0x14],ecx
0040104a mov     ecx,edi
0040104c mov     edx,ecx
0040104e mov     [eax],esi
00401050 mov     esi,[esp+0xc]
00401054 shr     ecx,0x2
00401057 lea     edi,[eax+0x18]
0040105a rep     movsd
0040105c mov     ecx,edx
0040105e and     ecx,0x3
00401061 rep     movsb
00401063 pop     edi
00401064 pop     esi
00401065 ret
The preceding contrived, yet somewhat realistic, function takes a buffer pointer and a buffer length as parameters and allocates a buffer of the length passed to it via [esp+0x10] plus 0x18 (24 bytes). It then initializes what appears to be some kind of a header at the beginning and copies the user-supplied buffer from [esp+0xc] to offset +0x18 in the newly allocated block (that’s the lea edi,[eax+0x18]). The return value is the pointer to the newly allocated block. Clearly, the idea is that an object is being allocated with a 24-byte-long header. The header is zero initialized, except for the first member at offset +0, which is set to the total size of the buffer allocated. The user-supplied buffer is then placed after the header in the newly allocated block.
At first glance, this code appears to be perfectly safe because the function only writes as many bytes to the allocated buffer as it managed to allocate. The problem is that, as usual, we’re dealing with values coming in from the outside world; there’s no way of knowing what we’re going to get. In this particular case, the problem is caused by the arithmetic operation performed on the buffer length parameter.
The lea esi,[edi+0x18] at address 00401027 seems innocent, but what happens if EDI contains a very high value that’s close to 0xffffffff? In such a case, the addition would overflow and the result would be a low positive number, possibly lower than the length of the buffer itself! Suppose, for example, that you feed the function with 0xfffffff8 as the buffer length. 0xfffffff8 + 0x18 = 0x100000010, but that number doesn’t fit in 32 bits. The processor truncates the result, and you end up with 0x00000010.
Keeping in mind that the buffer length copied by the function is the original supplied length (before the header length was added to it), you can now see how this function would definitely crash. The malloc call will allocate a buffer that is 0x10 bytes long, but the function will try to copy 0xfffffff8 bytes to the newly allocated buffer, thus crashing the program.
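The wraparound in this example can be checked with 32-bit unsigned arithmetic in C, which wraps modulo 2^32 just as the register does. The sketch also shows an explicit range check before the addition; that check is a general-purpose remedy offered here for illustration, and the function names and HEADER_SIZE macro are hypothetical.

```c
#include <stdint.h>

#define HEADER_SIZE 0x18u

/* Mirrors the lea esi,[edi+0x18]: the sum silently wraps at 32 bits. */
uint32_t alloc_size_unchecked(uint32_t user_len)
{
    return user_len + HEADER_SIZE;
}

/* Stores the sum in *total and returns 1 only if it cannot wrap;
 * returns 0 (leaving *total untouched) otherwise. */
int alloc_size_checked(uint32_t user_len, uint32_t *total)
{
    if (user_len > UINT32_MAX - HEADER_SIZE)
        return 0;
    *total = user_len + HEADER_SIZE;
    return 1;
}
```

With 0xfffffff8 as the length, alloc_size_unchecked returns 0x10, the same undersized allocation request described above, while alloc_size_checked rejects the input.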
The solution to this problem is to take a limited-sized input and make sure that the target variable can contain the largest possible result. For example, assuming that 16 bits are enough to represent the user buffer length, simply changing the preceding program to use an unsigned short for the user buffer length would solve the problem. Here is what the corrected version of this function looks like:
allocate_object:
00401024 push    esi
00401025 movzx   esi,word ptr [esp+0xc]
0040102a push    edi
0040102b lea     edi,[esi+0x18]
0040102e push    edi
0040102f call    Chapter7!malloc (004010dc)
00401034 pop     ecx
00401035 xor     ecx,ecx
00401037 cmp     eax,ecx
00401039 jnz     Chapter7!allocate_object+0x1b (0040103f)
0040103b xor     eax,eax
0040103d jmp     Chapter7!allocate_object+0x43 (00401067)
0040103f mov     [eax+0x4],ecx
00401042 mov     [eax+0x8],ecx
00401045 mov     [eax+0xc],ecx
00401048 mov     [eax+0x10],ecx
0040104b mov     [eax+0x14],ecx
0040104e mov     ecx,esi
00401050 mov     esi,[esp+0xc]
00401054 mov     edx,ecx
00401056 mov     [eax],edi
00401058 shr     ecx,0x2
0040105b lea     edi,[eax+0x18]
0040105e rep     movsd
00401060 mov     ecx,edx
00401062 and     ecx,0x3
00401065 rep     movsb
00401067 pop     edi
00401068 pop     esi
00401069 ret
This function is effectively identical to the original version presented earlier, except for movzx esi,word ptr [esp+0xc] at 00401025. The idea is that instead of directly loading the buffer length from the stack and adding 0x18 to it, we now treat it as an unsigned short, which eliminates the possibility of causing an overflow because the arithmetic is performed using 32-bit registers. The use of the MOVZX instruction is crucial here and is discussed in the next section.
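Why the kind of widening matters can be seen by comparing zero extension with sign extension on the same 16-bit pattern. This is a small sketch with hypothetical function names; the casts are meant to correspond to what MOVZX and MOVSX do, respectively, when a 16-bit value is loaded into a 32-bit register.

```c
#include <stdint.h>

/* Zero extension, as MOVZX does: the upper 16 bits become 0. */
uint32_t widen_zero(uint16_t len)
{
    return (uint32_t)len;
}

/* Sign extension, as MOVSX does: the upper 16 bits copy the sign bit. */
uint32_t widen_sign(int16_t len)
{
    return (uint32_t)(int32_t)len;
}

/* The header-size addition performed on the widened length. */
uint32_t total_size(uint32_t widened_len)
{
    return widened_len + 0x18u;
}
```

The largest zero-extended length, 0xffff, gives a total of 0x10017, which fits comfortably in 32 bits. The same bit pattern read as a signed short is -1, which sign-extends to 0xffffffff and makes the total wrap around to 0x17.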
Type Conversion Errors
Sometimes software developers don’t fully understand the semantics of the programming language they are using. These semantics can be critical because they define (among other things) how data is going to be handled at a low level. Type conversion errors take place when developers mishandle incoming data types and perform incorrect conversions on them. For example, consider the following variant on my famous allocate_object function:
allocate_object:
00401021 push    esi
00401022 movsx   esi,word ptr [esp+0xc]
00401027 push    edi
00401028 lea     edi,[esi+0x18]
0040102b push    edi
0040102c call    Chapter7!malloc (004010d9)
00401031 pop     ecx
00401032 xor     ecx,ecx
00401034 cmp     eax,ecx
00401036 jnz     Chapter7!allocate_object+0x1b (0040103c)
00401038 xor     eax,eax
0040103a jmp     Chapter7!allocate_object+0x43 (00401064)
0040103c mov     [eax+0x4],ecx
0040103f mov     [eax+0x8],ecx