Eilam E.Reversing.Secrets of reverse engineering.2005

Deciphering File Formats 241

[Figure 6.3 appears here. Left panel, "Cryptex File Entry Cluster Layout": a Next Cluster Index field followed by Entry #0 through Entry #25, with Entry #2 shown as empty. Right panel, "Individual Cryptex File Entry Structure": File’s First Cluster Index, File Size in Clusters, File MD5 Hash, and File Name String, with rows labeled Offset +00 through Offset +1C.]

Figure 6.3 The format of a Cryptex file entry.

A Cryptex file list table supports holes, which are unused entries. The file size or first cluster index members are typically used to indicate whether an entry is currently in use. You can safely assume that when adding a new file entry Cryptex will just scan this list for an unused entry and place the file in it. File names have a maximum length of 128 bytes. This doesn’t sound like much, but keep in mind that Cryptex strips away all path information from the file name before adding it to the list, so these 128 bytes are used exclusively for the file name. Each file entry contains an MD5 hash that is calculated from the entire plaintext of the file. This hash is recalculated during the decryption process and is checked against the one stored in the file list. It looks as if Cryptex still writes the decrypted file to disk during the extraction process even if the MD5 hash does not match; in such cases, Cryptex displays an error message.
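The entry layout and the scan for an unused entry can be sketched in C. This is only an illustration under stated assumptions: the field order follows Figure 6.3, the field widths (two 32-bit integers, a 16-byte hash, a 128-byte name) are inferred from the text and the figure's offset labels, and the names cryptex_file_entry and find_free_entry are hypothetical, not taken from Cryptex itself.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_NAME_LEN      128  /* maximum file name length, from the text */
#define ENTRIES_PER_CLUSTER 26   /* Entry #0 through Entry #25 in Figure 6.3 */

/* Assumed layout of one file entry; field order follows Figure 6.3. */
struct cryptex_file_entry {
    uint32_t first_cluster;         /* file's first cluster index */
    uint32_t size_in_clusters;      /* file size in clusters */
    uint8_t  md5[16];               /* MD5 hash of the plaintext */
    char     name[ENTRY_NAME_LEN];  /* file name, path stripped */
};

/* Return the index of the first unused entry, or -1 if the list is full.
   Treating a zero size as "unused" is an assumption of this sketch. */
static int find_free_entry(const struct cryptex_file_entry *list, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (list[i].size_in_clusters == 0)
            return (int)i;
    }
    return -1;
}
```

With this layout the name field begins at offset 0x18 and each entry occupies 152 bytes, which is one way to sanity-check the guess against a hex dump of a real archive.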

Files are stored in cluster sequences that are linked using the “next cluster” member at offset +0 inside each cluster. The last cluster in each file chain records the exact number of bytes that are actually in use within that cluster. This allows Cryptex to accurately reconstruct the file size during the extraction process (because the file entry only contains the file size in clusters).

Digging Deeper

You might have noticed that even though you’ve just performed a remarkably thorough code analysis of Cryptex, there are still some details regarding its file format that have eluded you. This makes sense when you think about it; you have not nearly covered all the code in Cryptex, and some of the fields must only be accessed in one or two places. To fully understand the entire file format, you might actually have to reverse every single line of code in the program. Cryptex is a tiny program, so this might actually be feasible, but in most cases it won’t be.

So, what do you do with those missing details that you didn’t catch during your intensive reversing session? One primitive, yet effective, approach is to simply let the program update the file and observe changes using a binary file-comparison program (Hex Workshop has this feature). One specific problem you might have with Cryptex is that files are encrypted. It is likely that a single-byte difference in the plaintext would completely alter the ciphertext that is written into the file. One solution is to write a program that decrypts Cryptex archives so that you can more accurately study their layout. This way you would easily be able to compare two different versions of the same Cryptex archive and determine precisely what the changes are and what they expose about those unknown fields. This approach of observing the changes made to a file by the program that owns it is quite useful in data reverse engineering and, when combined with clever code-level analysis, can usually produce extremely accurate results.
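The comparison step itself is simple enough to sketch in C. The function below is a minimal illustration, not a replacement for a real hex editor: it assumes both versions of the (already decrypted) archive have been read into memory, and the name diff_buffers is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Print every offset at which two equally sized buffers differ and
   return the number of differing bytes. Handling a length mismatch
   between the two files is left to the caller. */
static size_t diff_buffers(const unsigned char *before,
                           const unsigned char *after,
                           size_t len, FILE *out)
{
    size_t diffs = 0;
    for (size_t i = 0; i < len; i++) {
        if (before[i] != after[i]) {
            fprintf(out, "%08zx: %02x -> %02x\n",
                    i, (unsigned)before[i], (unsigned)after[i]);
            diffs++;
        }
    }
    return diffs;
}
```

Running it on snapshots taken before and after a single, well-defined operation (adding one small file, for instance) keeps the list of changed offsets short enough to reason about.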

Conclusion

In this chapter, you have learned how to use reversing techniques to dig into undocumented program data such as proprietary file formats or network protocols to reach a point at which you can write code that deciphers such data or even code that generates compatible data. Deciphering a file format is not as different from conventional code-level reversing as you might expect. As demonstrated in this chapter, code-level reversing can, in many cases, provide almost all the answers regarding a program’s data format and how it is structured.

Granted, Cryptex maintains a relatively simple file format. In many real-world reversing scenarios you might run into file formats that employ a far more complex structure. Still, the basic approach is the same: By combining code-level reversing techniques with the process of observing the data modifications performed by the owning program while specific test cases are fed to it, you can get a pretty good grip on most file formats and other types of proprietary data.

CHAPTER 7

Auditing Program Binaries

A software program is only as strong as its weakest link. This is true both from a security standpoint and, to a lesser extent, from a reliability and robustness standpoint. You could expend considerable energy on development practices that focus on secure code and yet end up with a vulnerable program just because of some third-party component your program uses. The same holds true for robustness and reliability. Many industry professionals fail to realize that a poorly written third-party software library can invalidate an entire development team’s efforts to produce a high-quality product.

In this chapter, I will demonstrate how reversing can be used for the auditing of a program when source code is unavailable. The general idea is to reverse several code fragments from a program and try to evaluate the code for security vulnerabilities and generally safe programming practices.

The first part of this chapter deals with all kinds of security bugs and demonstrates what they look like in assembly language—from the reversing standpoint. In the second part, I demonstrate a real-world security bug from a live product and attempt to determine the exact error that caused it.

Defining the Problem

Before I attempt to define what constitutes secure code, I must try and define what the word “security” means in the context of this book. I think security can be defined as having control of the flow of information on a system. This control means that your files stay inside your computer and out of the hands of nosy intruders, while malicious code stays outside of your computer. Needless to say, there are many other aspects to computer security such as the encryption of information that does flow in and out of the computer and the different levels of access rights granted to different users, but these are not as relevant to our current discussion.

So how does reversing relate to maintaining control of the flow of information on a system? The idea is that whenever you install any kind of software product, you are essentially entrusting your computer and all of the data on it to that program. There are two levels at which this is true. First of all, by installing a software product you are trusting that it is benign and that it doesn’t contain any malicious components that would intentionally steal or corrupt your data. Believe it or not, that’s the simpler part of this story.

The place where things truly get fuzzy is when we start talking about how programs put your system in jeopardy without ever intending to. A simple bug in any kind of software product could theoretically expose your system to malicious code that could steal or corrupt your data. Take an image file such as a JPEG as an example. There are certain types of bugs that could, in some cases, allow a person to take over your system using a specially crafted image file. All it would take is a tiny, otherwise harmless bug in your image viewing program, and that program might inadvertently allow code embedded into the image file to run. What could that code do? Well, just about anything. It would most likely download some sort of backdoor program onto your system, and pave the way for a full-blown hostile takeover (backdoors and other types of malicious programs are discussed in Chapter 8).

The purpose of this chapter is to try and define what makes secure code, and to then demonstrate how we can scan binary executables for these types of security bugs. Unfortunately, attempting to define what makes secure code can sometimes be futile. This fact should be painfully clear to software developers who constantly release patches that address vulnerabilities found in their programs. It can be a never-ending journey, a game of cat and mouse between hackers looking for vulnerabilities and programmers trying to fix them. Few programs start out as being “totally secure,” and in fact, few programs ever reach that state.

In this chapter, I will make an attempt to cover the most typical bugs that turn an otherwise-harmless program into a security risk, and will describe how such bugs can be located while a program is being reversed. This is by no means intended to be a complete guide to every possible security hole you could find in software (and I doubt such a guide could ever be written), but simply to give an idea of the types of problems typically encountered.


Vulnerabilities

A vulnerability is essentially a bug or flaw in a program that compromises the security of the program and usually of the entire computer on which it is running. Basically, a vulnerability is a flaw in the program that might allow malicious intruders to take advantage of it. In most cases, vulnerabilities start with code that takes information from the outside world. This can be any type of user input such as the command-line parameters that programs receive, a file loaded into the program, or a packet of data sent over the network.

The basic idea is simple—feed the program unexpected input (meaning input that the programmer didn’t think it was ever going to be fed) and get it to stray from its normal execution path. A crude way to exploit a vulnerability is to simply get the program to crash. This is typically the easiest objective because in many cases simply feeding the program exceptionally large random blocks of data does the trick.

But crashing a program is just the beginning. The art of finding and exploiting vulnerabilities gets truly interesting when attackers aim to take control of the program and get it to run their own code. This requires an entirely different level of sophistication, because in order to take control of a program attackers must feed it very specific data.

In many cases, vulnerabilities put entire networks at risk because penetrating the outer shell of a network frequently means that you’ve crossed the last line of defense.

The following sections describe the most common vulnerabilities found in the average program and demonstrate how such vulnerabilities can be utilized by attackers. You’ll also find examples of how these vulnerabilities can be found when analyzing assembly language code.

Stack Overflows

Stack overflows (also known as stack-smashing attacks after the well-known Phrack paper, [Aleph1]) have been around for years and are by far the most popular type of program vulnerability. Basically, stack overflow exploits take advantage of the fact that programs (and particularly those written in C-based languages) frequently neglect to perform bounds checking on incoming data.

A simple stack overflow vulnerability can be created when a program receives data from the outside world, either as user input directly or through a network connection, and naively copies that data onto the stack without checking its length. The problem is that stack variables always have a fixed size, because the offsets generated by the compiler for accessing those variables are predetermined and hard-coded into the machine code. This means that a program can’t dynamically allocate stack space based on the amount of information it is passed; it must preallocate enough room in the stack for the largest chunk of data it expects to receive. Of course, properly written code verifies that the received data fits into the stack buffer before copying it, but you’d be surprised how frequently programmers neglect to perform this verification.

What happens when a buffer of an unknown size is copied over into a limited-size stack buffer? If the buffer is too long to fit into the memory space allocated for it, the copy operation will cause anything residing after the buffer in the stack to be overwritten with whatever is sent as input. This will frequently overwrite variables that reside after the buffer in the stack, but more importantly, if the copied buffer is long enough, it might overwrite the current function’s return address.

For example, consider a function that defines the following local variables:

int counter;

char string[8];

float number;

What if the function would like to fill string with user-supplied data? It would copy the user-supplied data into string, but if the function doesn’t confirm that the user data fits into eight bytes (including the null terminator) and simply copies as many characters as it finds, it would overwrite number, and possibly whatever resides after it in memory.

Figure 7.1 shows the function’s stack area before and after a stack overwrite. The string variable can only contain eight characters, but far more have been written to it. Note that this figure ignores the (very likely) possibility that the compiler would store some of these variables in registers and not on the stack. The most likely candidate is counter, but this would not affect the stack overflow condition.

The important thing to notice about this is the value of CopiedBuffer + 0x10, because CopiedBuffer + 0x10 now replaces the function’s return address. This means that when the function tries to return to the caller (typically by invoking the RET instruction), the CPU will try to jump to whatever address was stored in CopiedBuffer + 0x10. It is easy to see how this could allow an attacker to take control over a system. All that would need to be done is for the attacker to carefully prepare a buffer that contains a pointer to the attacker’s code at the correct offset, so that this address would overwrite the function’s return address.
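The effect can be reproduced safely by copying into a byte image of the frame rather than into a real stack frame. The sketch below is only a simulation under assumptions: frame_image is a hypothetical struct that mirrors the slots of Figure 7.1 from string upward, and it assumes a 4-byte float, as on 32-bit x86.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical image of the frame in Figure 7.1, starting at string. */
struct frame_image {
    char     string[8];
    float    number;          /* assumed to be 4 bytes wide */
    uint32_t saved_ebp;
    uint32_t return_address;  /* lands at offset 0x10 from string */
};

/* Simulate an unchecked copy of len input bytes to the start of string
   and return the "return address" slot afterward. The copy is clamped
   to the image so the simulation itself never writes out of bounds. */
static uint32_t return_address_after_copy(const unsigned char *input,
                                          size_t len,
                                          uint32_t original_return)
{
    unsigned char image[sizeof(struct frame_image)] = {0};
    uint32_t ret = original_return;

    memcpy(image + offsetof(struct frame_image, return_address),
           &ret, sizeof ret);

    if (len > sizeof image)
        len = sizeof image;
    memcpy(image + offsetof(struct frame_image, string), input, len);

    memcpy(&ret, image + offsetof(struct frame_image, return_address),
           sizeof ret);
    return ret;
}
```

An input of eight bytes or fewer leaves the slot untouched, while any input that reaches bytes 0x10 through 0x13 decides where the simulated function would "return".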

A typical buffer overflow includes a short code sequence as the payload (the shellcode [Koziol]) and a pointer to the beginning of that code as the return address. This brings us to one of the most difficult parts of effectively overflowing the stack: how do you determine the current stack address in the target program in order to point the return address to the right place? The details of how this is done are really beyond the scope of this book, but the general strategy is to perform some educated guesses.

[Figure 7.1 appears here. Each row is one 32-bit stack slot; both panels also mark the current values of ESP and EBP.]

Before Reading string      After Reading string
counter                    counter
string[0]..[3]             CopiedBuffer
string[4]..[7]             CopiedBuffer + 0x04
number                     CopiedBuffer + 0x08
Saved EBP                  CopiedBuffer + 0x0C
Return Address             CopiedBuffer + 0x10
Parameter 1                CopiedBuffer + 0x14
Parameter 2                CopiedBuffer + 0x18

Figure 7.1 A function’s stack, before and after a stack overwrite.

For instance, you know that each time you run a program the stack is allocated in the same place, so you can try to guess how much stack space the program has used so far and jump to the right place. Alternatively, you could pad your shellcode with NOPs and jump to the memory area where you think the buffer has been copied. The NOPs give you significant latitude because you don’t have to jump to an exact location: you can jump to any address that contains your NOPs and execution will just flow into your code.

A Simple Stack Vulnerability

The most trivial overflow bugs happen when an application stores a temporary buffer in the stack and receives variable-length input from the outside world into that buffer. The classic case is a function that receives a null-terminated string as input and copies that string into a local variable. Here is an example that was disassembled using WinDbg.

Chapter7!launch:
00401060  mov     eax,[esp+0x4]
00401064  sub     esp,0x64
00401067  push    eax
00401068  lea     ecx,[esp+0x4]
0040106c  push    ecx
0040106d  call    Chapter7!strcpy (00401180)
00401072  lea     edx,[esp+0x8]
00401076  push    0x408128
0040107b  push    edx
0040107c  call    Chapter7!strcat (00401190)
00401081  lea     eax,[esp+0x10]
00401085  push    eax
00401086  call    Chapter7!system (004010e7)
0040108b  add     esp,0x78
0040108e  ret

Before dealing with the specifics of the overflow bug in this code, let’s try to figure out the basics of this function. The function was defined with the cdecl calling convention, so the parameters are unwound by the caller. This means that the RET instruction can’t be used for determining how many parameters the function takes. Let’s try to figure out the stack layout in this function. The function starts by reading a parameter from [esp+0x4] and then subtracts 100 bytes (0x64) from ESP to make room for local variables. If you go to the end of the function, you’ll see the code that moves ESP back to where it was when the function was entered. This is the add esp, 0x78, but why is it adding 120 bytes instead of 100? If you look at the function, you’ll see three function calls: strcpy, strcat, and system. If you look inside those functions, you’ll see that they are all cdecl functions (as are all C runtime library functions), and, as already mentioned, in cdecl functions the caller is responsible for unwinding the parameters from the stack. In this function, instead of adding an add esp, NumberOfBytes after each call, the compiler has chosen to optimize the unwinding process by simply unwinding the parameters from all three function calls at once. The five pushed parameters (two for strcpy, two for strcat, and one for system) take 4 bytes each, which accounts for the extra 20 bytes.

This approach makes for a slightly less “reverser-friendly” function because every time the stack is accessed through ESP, you have to try to figure out where ESP is pointing to for each instruction. Of course, this problem only exists when you’re studying a static disassembly—in a live debugger, you can always just look at the value of ESP at any given moment.

From the program’s perspective, the unwinding of the stack at the end of the function has another disadvantage: The function ends up using a bit more stack space. This is because the parameters from each of the function calls made during the function’s lifetime stay in the stack for the remainder of the function. On the other hand, stack space is generally not a problem in user-mode threads in Windows (as opposed to kernel-mode threads, which have a very limited stack space).

So, what does each of the ESP references in this function access? If you look closely, you’ll see that other than the first access at [esp+0x4], the remaining three stack accesses all go to the same place. The first access reads the parameter at [esp+0x4], and the function then pushes it onto the stack (where it stays until launch returns). Each time the local variable area is addressed after that, the offset from ESP has to be higher, because every push in between has lowered ESP by another 4 bytes.


Now that you understand the dynamics of the stack in this function, it becomes easy to see that only two unique stack addresses are being referenced in this function. The parameter is accessed in the first line (and it looks like the function only takes one parameter), and the beginning of the local variable area in the other three accesses.
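One way to double-check this kind of reasoning on a static listing is to track ESP as a displacement from its value at function entry. The sketch below replays the pushes and adjustments of the launch listing by hand; esp_trace and trace_launch are hypothetical names, and the called functions are assumed to return with ESP unchanged, as cdecl requires.

```c
/* All values are displacements from ESP at function entry (entry = 0). */
struct esp_trace {
    int param;       /* mov eax,[esp+0x4]  at 00401060 */
    int strcpy_dst;  /* lea ecx,[esp+0x4]  at 00401068 */
    int strcat_dst;  /* lea edx,[esp+0x8]  at 00401072 */
    int system_arg;  /* lea eax,[esp+0x10] at 00401081 */
    int final_esp;   /* after add esp,0x78 */
};

static struct esp_trace trace_launch(void)
{
    struct esp_trace t;
    int esp = 0;

    t.param = esp + 0x4;
    esp -= 0x64;                /* sub esp,0x64: 100 bytes of locals */
    esp -= 4;                   /* push eax */
    t.strcpy_dst = esp + 0x4;
    esp -= 4;                   /* push ecx; strcpy leaves ESP as is */
    t.strcat_dst = esp + 0x8;
    esp -= 4;                   /* push 0x408128 */
    esp -= 4;                   /* push edx; strcat leaves ESP as is */
    t.system_arg = esp + 0x10;
    esp -= 4;                   /* push eax; system leaves ESP as is */
    esp += 0x78;                /* add esp,0x78: locals + 5 parameters */
    t.final_esp = esp;
    return t;
}
```

All three LEA operands resolve to a displacement of -100, the start of the local variable area, and ESP is back at its entry value when RET executes.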

The function starts by copying a string whose pointer was passed as the first parameter to a local variable (whose size we know is 100 bytes). This is exactly where the potential stack overflow lies. strcpy has no idea how big a buffer has been reserved for the copied string and will keep on copying until it encounters the null terminator in the source string or until the program crashes. If a string of 100 characters or more is fed to this function (the null terminator needs a byte of its own), strcpy will overwrite whatever follows the local string variable in the stack. In this particular function, this would be the function’s return address. Overwriting the return address is a reliable way of taking control of the program’s execution.

The classic exploit for this kind of overflow bug is to feed this function with a string that essentially contains code and to carefully place the pointer to that code in the position where strcpy is going to be overwriting the return address. One thing that makes this process slightly more complicated than it initially seems is that the entire buffer being fed to the function can’t contain any zero bytes (except for one at the end), because that would cause strcpy to stop copying.
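For contrast with the vulnerable pattern, here is a minimal sketch of a length-checked version of the copy-and-append step. It is not the source code of launch, which the listing does not show; build_command and its parameters are hypothetical, and the suffix argument stands in for whatever string sits at 0x408128.

```c
#include <stddef.h>
#include <string.h>

/* Build "<name><suffix>" in dst only if the result, including the null
   terminator, fits in dst_size bytes. Returns 0 on success and -1 if
   the input is too long, in which case dst is left untouched. */
static int build_command(char *dst, size_t dst_size,
                         const char *name, const char *suffix)
{
    size_t name_len = strlen(name);
    size_t suffix_len = strlen(suffix);

    /* Written so that neither comparison can wrap around. */
    if (name_len >= dst_size || suffix_len >= dst_size - name_len)
        return -1;

    memcpy(dst, name, name_len);
    memcpy(dst + name_len, suffix, suffix_len + 1);  /* includes terminator */
    return 0;
}
```

In a disassembly, a check like this tends to show up as a length computation and a compare-and-branch ahead of the copy, and it is the absence of such a branch before the copy that is worth noticing.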

There are several simple patterns to look for when searching for a stack overflow vulnerability in a program. The first thing is probably to look at a function’s stack size. Functions that take large buffers such as strings or other data and put them on the stack are easily identified because they tend to have huge local variable regions in their stack frames. This can be identified by looking for a SUB ESP instruction at the very beginning of the function. Functions that store large buffers on the stack will usually subtract a fairly large number from ESP.

Of course, in itself a large stack size doesn’t represent a problem. Once you’ve located a function that has a conspicuously large stack space, the next step is to look for places where a pointer to the beginning of that space is used. This would typically be a LEA instruction that uses an operand such as [EBP – 0x200], or [ESP – 0x200], with that constant being near or equal to the specific size of the stack space allocated. The trick at this point is to make sure the code that’s accessing this block is properly aware of its size. It’s not easy, but it’s not impossible either.

Intrinsic Implementations

The C runtime library string-manipulation routines have historically been the reason for quite a few vulnerabilities. Most programmers nowadays know better than to leave such doors wide open, but it’s still worthwhile to learn to identify calls to these functions while reversing. The problem is that some compilers treat these functions as intrinsic, meaning that the compiler automatically inserts their implementation into the calling function (like an inline function) instead of calling the runtime library implementation. Here is the same vulnerable launch function from before, except that both string-manipulation calls have been compiled into the function.

Chapter7!launch:
00401060  mov     eax,[esp+0x4]
00401064  lea     edx,[esp-0x64]
00401068  sub     esp,0x64
0040106b  sub     edx,eax
0040106d  lea     ecx,[ecx]
00401070  mov     cl,[eax]
00401072  mov     [edx+eax],cl
00401075  inc     eax
00401076  test    cl,cl
00401078  jnz     Chapter7!launch+0x10 (00401070)
0040107a  push    edi
0040107b  lea     edi,[esp+0x4]
0040107f  dec     edi
00401080  mov     al,[edi+0x1]
00401083  inc     edi
00401084  test    al,al
00401086  jnz     Chapter7!launch+0x20 (00401080)
00401088  mov     eax,[Chapter7!`string' (00408128)]
0040108d  mov     cl,[Chapter7!`string'+0x4 (0040812c)]
00401093  lea     edx,[esp+0x4]
00401097  mov     [edi],eax
00401099  push    edx
0040109a  mov     [edi+0x4],cl
0040109d  call    Chapter7!system (00401102)
004010a2  add     esp,0x4
004010a5  pop     edi
004010a6  add     esp,0x64
004010a9  ret

It is safe to say that regardless of whether the string-manipulation functions are intrinsic, any case where a function loops on the address of a stack variable, such as the one obtained by the lea edx,[esp-0x64] in the preceding function, is worthy of further investigation.
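To make the two inlined loops easier to recognize, here is a rough C rendering of what they do. This is a reading of the listing, not recovered source code: the function names are hypothetical, and the five-byte suffix reflects the one dword store and one byte store at 00401097 and 0040109a.

```c
#include <string.h>

/* Loop at 00401070..00401078: copy bytes, terminator included, with no
   knowledge of how large the destination is. */
static void inlined_strcpy(char *dst, const char *src)
{
    char c;
    do {
        c = *src++;       /* mov cl,[eax] / inc eax */
        *dst++ = c;       /* mov [edx+eax],cl */
    } while (c != '\0');  /* test cl,cl / jnz */
}

/* Loop at 00401080..00401086 finds the terminator; the two MOVs that
   follow store a fixed five bytes (four characters plus terminator). */
static void inlined_strcat5(char *dst, const char suffix[5])
{
    while (*dst != '\0')  /* mov al,[edi+0x1] / inc edi / test al,al */
        dst++;
    memcpy(dst, suffix, 5);
}
```

Neither loop compares anything against the 100-byte size of the local buffer, which is exactly what to look for when the familiar strcpy and strcat symbols are missing from the disassembly.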

Stack Checking

There are many possible ways of dealing with buffer overflow bugs. The first and most obvious way is of course to try to avoid them in the first place, but that doesn’t always prove to be as simple as it seems. Sure, it would take a really careless developer to put something like our poor launch in a production system,