
Eilam E.Reversing.Secrets of reverse engineering.2005


Deciphering File Formats 211

that stores the currently typed password. This is the variable at 00405038 against which the header data was compared in Listing 6.3. In OllyDbg, a hardware breakpoint on memory writes can be set by opening the address (00405038) in the Dump window, right-clicking the address, and selecting Breakpoint → Hardware, on write → Dword. Keep in mind that you must restart the program before you do this because at the point where the bad password message is being printed this variable has already been initialized.

Restart the program, place a hardware breakpoint on 00405038, and let the program run (with the same set of command-line parameters). The debugger breaks somewhere inside RSAENH.DLL, the Microsoft Enhanced Cryptographic Provider. Why is the Microsoft Enhanced Cryptographic Provider writing into a global variable from Cryptex.exe? Probably because Cryptex.exe supplied the address of that global variable. Let’s look at the stack and try to trace back and find the call made from Cryptex to the encryption engine. In tracing back through the stack in the Stack window, you can see that we are currently running inside the CryptGetHashParam API, which was called from a function inside Cryptex. Listing 6.4 shows the code for this function.

00402280 MOV ECX,DS:[405048]
00402286 SUB ESP,8
00402289 LEA EAX,SS:[ESP]
0040228C PUSH EAX
0040228D PUSH 0
0040228F PUSH 0
00402291 PUSH 8003
00402296 PUSH ECX
00402297 CALL DS:[<&ADVAPI32.CryptCreateHash>]
0040229D TEST EAX,EAX
0040229F JE SHORT cryptex.004022C2
004022A1 MOV EDX,SS:[ESP+C]
004022A5 MOV EAX,SS:[ESP]
004022A8 PUSH 0
004022AA PUSH 14
004022AC PUSH EDX
004022AD PUSH EAX
004022AE CALL DS:[<&ADVAPI32.CryptHashData>]
004022B4 TEST EAX,EAX
004022B6 MOV ECX,SS:[ESP]
004022B9 JNZ SHORT cryptex.004022C8
004022BB PUSH ECX
004022BC CALL DS:[<&ADVAPI32.CryptDestroyHash>]
004022C2 XOR EAX,EAX
004022C4 ADD ESP,8
004022C7 RETN
004022C8 MOV EAX,SS:[ESP+10]
004022CC PUSH ESI
004022CD PUSH 0
004022CF LEA EDX,SS:[ESP+C]
004022D3 PUSH EDX
004022D4 PUSH EAX
004022D5 PUSH 2
004022D7 PUSH ECX
004022D8 MOV DWORD PTR SS:[ESP+1C],10
004022E0 CALL DS:[<&ADVAPI32.CryptGetHashParam>]
004022E6 MOV EDX,SS:[ESP+4]
004022EA PUSH EDX
004022EB MOV ESI,EAX
004022ED CALL DS:[<&ADVAPI32.CryptDestroyHash>]
004022F3 MOV EAX,ESI
004022F5 POP ESI
004022F6 ADD ESP,8
004022F9 RETN

Listing 6.4 Function in Cryptex that calls into the cryptographic service provider. The 16-byte password-identifier value is written from within this function.

Deciphering the code in Listing 6.4 is not going to be easy unless you do some reading and figure out what all of these hash APIs are about. For this purpose, you can go to http://msdn.microsoft.com and look up the functions CryptCreateHash, CryptHashData, and so on. A hash is defined in MSDN as “A fixed-sized result obtained by applying a mathematical function (the hashing algorithm) to an arbitrary amount of data.” The CryptCreateHash function “initiates the hashing of a stream of data,” the CryptHashData function “adds data to a specified hash object,” while the CryptGetHashParam function “retrieves data that governs the operations of a hash object.” With this (very basic) understanding, let’s analyze the function in Listing 6.4 and try to determine what it does.

The code starts out by creating a hash object in the CryptCreateHash call. Notice the second parameter in this call; this is how the hashing algorithm is selected. In this case, the algorithm parameter is hard-coded to 0x8003. Finding out what 0x8003 stands for is probably easiest if you look for a popular hashing algorithm identifier such as CALG_MD2 and find it in the Crypto header file, WinCrypt.H. It turns out that these identifiers are made out of several identifiers, one specifying the algorithm class (ALG_CLASS_HASH), another specifying the algorithm type (ALG_TYPE_ANY), and finally one that specifies the exact algorithm (ALG_SID_MD2). If you calculate what 0x8003 stands for (0x8000 for ALG_CLASS_HASH, 0 for ALG_TYPE_ANY, and a sub-identifier of 3), you can see that the actual algorithm is ALG_SID_MD5.
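The decomposition described above can be sketched in a few lines of Python. This is only an illustration of the bit layout used by the identifiers in WinCrypt.H (class in bits 13 to 15, type in bits 9 to 12, sub-identifier in bits 0 to 8); the function name decode_hash_alg_id and the small lookup table are hypothetical helpers, not part of any Microsoft API.

```python
# Bit layout of a CryptoAPI ALG_ID, following the masks in WinCrypt.H:
#   bits 13-15: algorithm class, bits 9-12: algorithm type, bits 0-8: sub-ID
ALG_CLASS_HASH = 4 << 13   # 0x8000
ALG_TYPE_ANY = 0

# Hash sub-identifiers from WinCrypt.H (hypothetical lookup table for display).
HASH_SIDS = {1: "ALG_SID_MD2", 2: "ALG_SID_MD4", 3: "ALG_SID_MD5", 4: "ALG_SID_SHA"}


def decode_hash_alg_id(alg_id):
    """Split an ALG_ID into (class, type, sub-ID name) for hash algorithms."""
    alg_class = alg_id & (7 << 13)
    alg_type = alg_id & (15 << 9)
    alg_sid = alg_id & 511
    if alg_class != ALG_CLASS_HASH:
        raise ValueError("not a hash algorithm identifier")
    return alg_class, alg_type, HASH_SIDS.get(alg_sid, "unknown")


# The value pushed at 00402291 in Listing 6.4.
print(decode_hash_alg_id(0x8003))
```

Running the same helper on other hash identifiers you encounter in a disassembly is a quick way to avoid searching the header by hand.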


MD5 (MD stands for message-digest) is a highly popular cryptographic hashing algorithm that produces a long (128-bit) hash or checksum from a variable-length message. This hash can later be used to uniquely identify the specific message. Two basic properties of MD5 and other cryptographic hashes are that it is extremely unlikely that there would ever be two different messages that produce the same hash and that it is virtually impossible to create a message that will generate a predetermined hash value.

With this information, let’s proceed to determine the nature of the data that Cryptex is hashing. This can be easily gathered by inspecting the call to CryptHashData. According to the MSDN, the second parameter passed to CryptHashData is the data being hashed. In Listing 6.4, Cryptex is passing EDX, which was earlier loaded from [ESP+C]. The third parameter is the buffer length, which is set to 0x14 (20 bytes). A quick look at the buffer pointed to by [ESP+C] shows the following.

0012F5E8 77 03 BE 9F EC CA 20 05 D0 D6 DF FB A2 CF 55 4B

0012F5F8 81 41 C0 FE

Nothing obvious here—this isn’t text or anything, just more unrecognized data. The next thing Cryptex does is call CryptGetHashParam on the hash object, with the value 2 in the second parameter. A quick search through WinCrypt.H shows that the value 2 stands for HP_HASHVAL. This means that Cryptex is asking for the actual hash value (that’s the MD5 result for those 20 bytes from 0012F5E8). The third parameter passed to CryptGetHashParam tells the function where to write the hash value. Guess what? It’s being written into 00405038, the global variable that was used earlier for checking whether the password matches.
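As a rough cross-check of the sizes involved, the same operation can be reproduced outside the debugger. The sketch below uses Python's hashlib as a stand-in for the CryptoAPI calls, on the assumption that the provider's MD5 is the standard algorithm; the variable names are illustrative.

```python
import hashlib

# The 20 bytes dumped from 0012F5E8, i.e. the buffer handed to CryptHashData.
buffer_20 = bytes.fromhex(
    "77 03 BE 9F EC CA 20 05 D0 D6 DF FB A2 CF 55 4B 81 41 C0 FE"
)

# Requesting HP_HASHVAL from an MD5 hash object yields the 16-byte digest,
# which matches the length of 0x10 stored at 004022D8 in Listing 6.4.
digest = hashlib.md5(buffer_20).digest()
print(len(buffer_20), len(digest), digest.hex())
```

Comparing the printed digest with the 16 bytes that appear at 00405038 after the call is one way to confirm that the provider is computing a plain MD5 over this buffer.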

To summarize, Cryptex is apparently hashing unknown, nontextual data using the MD5 hashing algorithm, and is writing the result into a global variable. The contents of this global variable are later compared against a value stored in the Cryptex archive file. If the two aren’t identical, Cryptex reports an incorrect password. The data being hashed in the function from Listing 6.4 is clearly related somehow to the password that was typed; we just don’t understand the connection yet. The unknown data that was hashed in this function was passed as a parameter from the calling function.

Hashing the Password

At this point you’re probably a bit at a loss regarding the origin of the buffer that was just hashed in Listing 6.4. In such cases, it is usually best to simply trace back in the program until you find the origin of that buffer. In this case, the hashed buffer came from the calling function, at 00402300. This function is shown in Listing 6.5.


00402300 SUB ESP,24
00402303 MOV EAX,DS:[405020]
00402308 PUSH EDI
00402309 MOV EDI,SS:[ESP+2C]
0040230D MOV SS:[ESP+24],EAX
00402311 LEA EAX,SS:[ESP+4]
00402315 PUSH EAX
00402316 PUSH 0
00402318 PUSH 0
0040231A PUSH 8004
0040231F PUSH EDI
00402320 CALL DS:[<&ADVAPI32.CryptCreateHash>]
00402326 TEST EAX,EAX
00402328 JE cryptex.004023CA
0040232E MOV EDX,SS:[ESP+30]
00402332 MOV EAX,EDX
00402334 PUSH ESI
00402335 LEA ESI,DS:[EAX+1]
00402338 MOV CL,DS:[EAX]
0040233A ADD EAX,1
0040233D TEST CL,CL
0040233F JNZ SHORT cryptex.00402338
00402341 MOV ECX,SS:[ESP+8]
00402345 PUSH 0
00402347 SUB EAX,ESI
00402349 PUSH EAX
0040234A PUSH EDX
0040234B PUSH ECX
0040234C CALL DS:[<&ADVAPI32.CryptHashData>]
00402352 TEST EAX,EAX
00402354 POP ESI
00402355 JE SHORT cryptex.004023BF
00402357 XOR EAX,EAX
00402359 MOV SS:[ESP+11],EAX
0040235D MOV SS:[ESP+15],EAX
00402361 MOV SS:[ESP+19],EAX
00402365 MOV SS:[ESP+1D],EAX
00402369 MOV SS:[ESP+21],AX
0040236E LEA ECX,SS:[ESP+C]
00402372 LEA EDX,SS:[ESP+10]
00402376 MOV SS:[ESP+23],AL
0040237A MOV BYTE PTR SS:[ESP+10],0
0040237F MOV DWORD PTR SS:[ESP+C],14
00402387 PUSH EAX
00402388 MOV EAX,SS:[ESP+8]
0040238C PUSH ECX
0040238D PUSH EDX
0040238E PUSH 2
00402390 PUSH EAX
00402391 CALL DS:[<&ADVAPI32.CryptGetHashParam>]
00402397 TEST EAX,EAX
00402399 JNZ SHORT cryptex.004023A9
0040239B PUSH cryptex.00403504 ; format = “Unable to obtain MD5 hash value for file.”
004023A0 CALL DS:[<&MSVCR71.printf>]
004023A6 ADD ESP,4
004023A9 LEA ECX,SS:[ESP+10]
004023AD PUSH cryptex.00405038
004023B2 PUSH ECX
004023B3 CALL cryptex.00402280
004023B8 ADD ESP,8
004023BB TEST EAX,EAX
004023BD JNZ SHORT cryptex.004023DA
004023BF MOV EDX,SS:[ESP+4]
004023C3 PUSH EDX
004023C4 CALL DS:[<&ADVAPI32.CryptDestroyHash>]
004023CA XOR EAX,EAX
004023CC POP EDI
004023CD MOV ECX,SS:[ESP+20]
004023D1 CALL cryptex.004027C9
004023D6 ADD ESP,24
004023D9 RETN
004023DA MOV ECX,SS:[ESP+4]
004023DE LEA EAX,SS:[ESP+8]
004023E2 PUSH EAX
004023E3 PUSH 0
004023E5 PUSH ECX
004023E6 PUSH 6603
004023EB PUSH EDI
004023EC MOV DWORD PTR SS:[ESP+1C],0
004023F4 CALL DS:[<&ADVAPI32.CryptDeriveKey>]
004023FA MOV EDX,SS:[ESP+4]
004023FE PUSH EDX
004023FF CALL DS:[<&ADVAPI32.CryptDestroyHash>]
00402405 MOV ECX,SS:[ESP+24]
00402409 MOV EAX,SS:[ESP+8]
0040240D POP EDI
0040240E CALL cryptex.004027C9
00402413 ADD ESP,24
00402416 RETN

Listing 6.5 The Cryptex key-generation function.

The function in Listing 6.5 is quite similar to the one in Listing 6.4. It starts out by creating a hash object and hashing some data. One difference is the initialization parameters for the hash object. The function in Listing 6.4 used the value 0x8003 as its algorithm ID, while this function uses 0x8004, which identifies the CALG_SHA algorithm. SHA is another hashing algorithm that has similar properties to MD5, with the difference that an SHA hash is 160 bits long, as opposed to MD5 hashes, which are 128 bits long. You might notice that 160 bits are exactly 20 bytes, which is the length of the data being hashed in Listing 6.4. Coincidence? You’ll soon find out.

The next sequence calls CryptHashData again, but not before some processing is performed on a data block. If you place a breakpoint on this function and restart the program, you can easily see which data is being processed: it is the password text, which in this case equals 6666666665. Let’s take a look at this processing sequence.

00402335 LEA ESI,DS:[EAX+1]

00402338 MOV CL,DS:[EAX]

0040233A ADD EAX,1

0040233D TEST CL,CL

0040233F JNZ SHORT cryptex.00402338

This loop is really quite simple. It reads each character from the string and checks whether it’s zero. If it’s not, it loops on to the next character. Because EAX is incremented after every read, including the read of the terminator, EAX points one byte past the string’s terminating NULL character when the loop completes, and ESI points to the second character in the string. The following instruction produces the final result.

00402347 SUB EAX,ESI

Here the pointer to the second character is subtracted from the pointer to the byte just past the NULL terminator. Both pointers are one byte ahead of the string start and the terminator, respectively, so the offsets cancel and the result is the length of the string, not including the NULL terminator. This sequence is essentially equivalent to the strlen C runtime library function. You might wonder why the program would implement its own strlen function instead of just calling the runtime library. The answer is that it probably is calling the runtime library, but the compiler is replacing the call with an intrinsic implementation. Some compilers support intrinsic implementations of popular functions, which basically means that the compiler replaces the function call with an actual implementation of the function that is placed inside the calling function. This improves performance because it avoids the overhead of performing a function call.
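To see why the subtraction yields the string length, it can help to step through the loop by hand. The following Python sketch mirrors the instructions one for one; the function name inline_strlen and the use of a bytes object in place of process memory are assumptions made purely for illustration.

```python
def inline_strlen(memory, start):
    """Mimic the loop at 00402335-00402347 on a bytes buffer.

    `memory` and `start` are hypothetical stand-ins for process memory and
    the string pointer that the original code holds in EAX.
    """
    eax = start
    esi = eax + 1            # LEA ESI,DS:[EAX+1]
    while True:
        cl = memory[eax]     # MOV CL,DS:[EAX]
        eax += 1             # ADD EAX,1 (runs even when CL is the terminator)
        if cl == 0:          # TEST CL,CL / JNZ SHORT 00402338
            break
    return eax - esi         # SUB EAX,ESI


password = b"6666666665\x00"
print(inline_strlen(password, 0))
```

Because EAX is advanced once more after the terminator is read, both pointers sit one byte ahead of where a plain "end minus start" calculation would place them, and the difference is the same.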

After measuring the length of the string, the function proceeds to hash the password string using CryptHashData and to extract the resulting hash using CryptGetHashParam. The resulting hash value is then passed on to 00402280, which is the function we investigated in Listing 6.4. This is curious because, as we know, the function in Listing 6.4 is going to hash that data again, this time using the MD5 algorithm. What is the point of rehashing the output of one hashing algorithm with another hashing algorithm? That is not clear at the moment.

After the MD5 function returns (and assuming it returns a nonzero value), the function proceeds to call an interesting API called CryptDeriveKey. According to Microsoft’s documentation, CryptDeriveKey “generates cryptographic session keys derived from a base data value.” The base data value is taken from a hash object, which, in this case, is a 160-bit SHA hash calculated from the plaintext password. As part of the generation of the key object, the caller must also specify which encryption algorithm will be used (this is specified in the second parameter passed to CryptDeriveKey). As you can see in Listing 6.5, Cryptex is passing 0x6603. We return to WinCrypt.H and discover that 0x6603 stands for CALG_3DES. This makes sense and confirms that Cryptex works as advertised: it encrypts data using the 3DES algorithm.
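If you would rather verify the identifier than search for it, you can rebuild it from its parts. The short Python sketch below composes the value from the class, type, and sub-identifier constants as they are defined in WinCrypt.H; it is meant only as a sanity check on the constant seen in the listing.

```python
# Component values as defined in WinCrypt.H.
ALG_CLASS_DATA_ENCRYPT = 3 << 13   # 0x6000
ALG_TYPE_BLOCK = 3 << 9            # 0x0600
ALG_SID_3DES = 3

# CALG_3DES is the bitwise OR of the three components.
CALG_3DES = ALG_CLASS_DATA_ENCRYPT | ALG_TYPE_BLOCK | ALG_SID_3DES
print(hex(CALG_3DES))
```

Note that the class here is data encryption and the type is block cipher, unlike the hash identifiers seen earlier, whose class is ALG_CLASS_HASH and whose type is ALG_TYPE_ANY.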

When we think about it a little bit, it becomes clear why Cryptex calculated that extra MD5 hash. Essentially, Cryptex is using the generated SHA hash as a key for encrypting and decrypting the data (3DES is a symmetric algorithm, which means that encryption and decryption are both performed using the same key). Additionally, Cryptex needs some kind of an easy way to detect whether the supplied password was correct or incorrect. For this, Cryptex calculates an additional hash (using the MD5 algorithm) from the SHA hash and stores the result in the file header. When an archive is opened, the supplied password is hashed twice (once using SHA and once using MD5), and the MD5 result is compared against the one stored in the archive header. If they match, the password is correct.
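The verification scheme just described can be modeled in a few lines. This is a minimal sketch using Python's hashlib in place of the CryptoAPI, assuming CALG_SHA corresponds to SHA-1 and that the password is hashed as raw bytes without its terminator; all function names are hypothetical, and the sketch deliberately does not attempt to reproduce what CryptDeriveKey does internally to turn the hash into a 3DES key.

```python
import hashlib


def sha_of_password(password: bytes) -> bytes:
    """160-bit SHA-1 of the plaintext password: the base data for the key."""
    return hashlib.sha1(password).digest()


def password_identifier(password: bytes) -> bytes:
    """128-bit MD5 of the SHA-1 digest: the value kept in the archive header."""
    return hashlib.md5(sha_of_password(password)).digest()


def password_matches(candidate: bytes, stored_identifier: bytes) -> bool:
    """Rehash the candidate twice and compare against the stored identifier."""
    return password_identifier(candidate) == stored_identifier


# Hypothetical header value for an archive created with this password.
stored = password_identifier(b"6666666665")
print(password_matches(b"6666666665", stored))
print(password_matches(b"wrong", stored))
```

The point of the model is that the header only ever holds the second hash, so checking a password never requires the SHA value itself to be stored anywhere.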

You may wonder why Cryptex isn’t just storing the SHA result directly in the file header. Why go through the extra effort of calculating an additional hash value? The reason is that the SHA hash is directly used as the encryption key; storing it in the file header would make it incredibly easy to decrypt Cryptex archives. This might be a bit confusing considering that it is impossible to extract the original plaintext password from the SHA hash value, but the password is simply not needed: the hash value is all that would be needed in order to decrypt the data. Instead, Cryptex calculates an additional hash from the SHA value and stores that as the unique password identification. Figure 6.1 demonstrates this sequence.

Finally, if you’re wondering why Cryptex calculates the MD5 password-verification hash from the SHA hash value rather than directly from the plaintext password, it’s probably because of the (admittedly remote) possibility that someone would be able to convert the MD5 hash value to an equivalent SHA hash value and effectively obtain the decryption key. This is virtually guaranteed to be mathematically impossible, but why risk it? It is certainly going to be impossible to obtain the original data (which is the SHA-generated decryption key) from the MD5 hash value stored in the header. Being overly paranoid is the advisable frame of mind when developing security-related technologies.


Figure 6.1 Cryptex’s key-generation and password-verification process. [Diagram: original plaintext password → SHA hash (160 bits) → MD5 hash (128 bits) → Cryptex header; raw data → 3DES encrypter → encrypted data.]

The Directory Layout

Now that you have a basic understanding of how Cryptex manages its passwords and encryption keys, you can move on to study the Cryptex directory layout. In a real-world program, this step would be somewhat less relevant for those interested in a security-level analysis of Cryptex, but it would be very important for anyone interested in reading or creating Cryptex-compatible archives. Since we’re doing this as an exercise in data reverse engineering, the directory layout is exactly the kind of complex data structure you’re looking to get your hands on.

Analyzing the Directory Processing Code

In order to decipher the directory layout you’ll need to find the location in the Cryptex code that reads the encrypted directory layout data, decrypts it, and proceeds to decipher it. This can be accomplished by simply placing a breakpoint on the ReadFile API and tracing forward in the program to see what it does with the data. Let’s restart the program in OllyDbg (don’t forget to correct the password in the command-line argument), place a breakpoint on ReadFile, and let the program run.


The first hit comes from an internal system call made by ADVAPI32.DLL. Releasing the debugger brings it back to ReadFile again, except that again, it was called internally from system code. You will very quickly realize that there are way too many calls to ReadFile for this approach to work; this API is used by the system heavily.

There are many alternative approaches you could take at this point, depending on the particular application. One option would be to try and restrict the ReadFile breakpoint to calls made on the archive file. You could do this by first placing a breakpoint on the API call that opens or creates the archive (this is probably going to be a call to the CreateFile API), obtain the archive handle from that call, and place a selective breakpoint on ReadFile that only breaks when the specific handle to the Cryptex archive is specified (such breakpoints are supported by most debuggers). This would really reduce the number of calls—you’d only see the relevant calls where Cryptex reads from the archive, and not hundreds of irrelevant system calls.

On the other hand, since Cryptex is really a fairly simple program, you could just let it run until it reaches the key-generation function from Listing 6.5. At this point you could just step through the rest of the code until you reach interesting code areas that decipher the directory data structures. Keep in mind that in most real programs you’d have to come up with a better idea for where to place your breakpoint, because simply stepping through the program is going to be an unreasonably tedious task.

You can start by placing a breakpoint at the end of the key-generation function, on address 00402416. Once you reach that address, you can step back into the calling function and step through several irrelevant code sequences, including a call into a function that apparently performs the actual opening of the archive and ends up calling into 004011C0, which is the function analyzed in Listing 6.3. The next function call goes into 004019F0, and (based on a quick look at it) appears to be what we’re looking for. Listing 6.6 lists the OllyDbg-generated disassembly for this function.

004019F0 SUB ESP,8
004019F3 PUSH EBX
004019F4 PUSH EBP
004019F5 PUSH ESI
004019F6 MOV ESI,SS:[ESP+18]
004019FA XOR EBX,EBX
004019FC PUSH EBX ; Origin => FILE_BEGIN
004019FD PUSH EBX ; pOffsetHi => NULL
004019FE PUSH EBX ; OffsetLo => 0
004019FF PUSH ESI ; hFile
00401A00 CALL DS:[<&KERNEL32.SetFilePointer>]
00401A06 PUSH EBX ; pOverlapped => NULL
00401A07 LEA EAX,SS:[ESP+14]
00401A0B PUSH EAX ; pBytesRead
00401A0C PUSH 28 ; BytesToRead = 28 (40.)
00401A0E PUSH cryptex.00406058 ; Buffer = cryptex.00406058
00401A13 PUSH ESI ; hFile
00401A14 CALL DS:[<&KERNEL32.ReadFile>]
00401A1A MOV ECX,SS:[ESP+1C]
00401A1E MOV EDX,DS:[406064]
00401A24 PUSH ECX
00401A25 PUSH EDX
00401A26 PUSH ESI
00401A27 CALL cryptex.00401030
00401A2C MOV EBP,DS:[<&MSVCR71.printf>]
00401A32 MOV ESI,DS:[406064]
00401A38 PUSH cryptex.00403234 ; format = “ File Size File Name”
00401A3D MOV DWORD PTR SS:[ESP+1C],cryptex.00405050
00401A45 CALL EBP ; printf
00401A47 ADD ESP,10
00401A4A TEST ESI,ESI
00401A4C JE SHORT cryptex.00401ACD
00401A4E PUSH EDI
00401A4F MOV EDI,SS:[ESP+24]
00401A53 JMP SHORT cryptex.00401A60
00401A55 LEA ESP,SS:[ESP]
00401A5C LEA ESP,SS:[ESP]
00401A60 MOV ESI,SS:[ESP+10]
00401A64 ADD ESI,8
00401A67 MOV DWORD PTR SS:[ESP+14],1A
00401A6F NOP
00401A70 MOV EAX,DS:[ESI]
00401A72 TEST EAX,EAX
00401A74 JE SHORT cryptex.00401A9A
00401A76 MOV EDX,EAX
00401A78 SHL EDX,0A
00401A7B SUB EDX,EAX
00401A7D ADD EDX,EDX
00401A7F LEA ECX,DS:[ESI+14]
00401A82 ADD EDX,EDX
00401A84 PUSH ECX
00401A85 SHR EDX,0A
00401A88 PUSH EDX
00401A89 PUSH cryptex.00403250 ; ASCII “ %10dK %s”
00401A8E CALL EBP
00401A90 MOV EAX,DS:[ESI]
00401A92 ADD DS:[EDI],EAX
00401A94 ADD ESP,0C
00401A97 ADD EBX,1

Listing 6.6 Disassembly of function that lists all files within a Cryptex archive. (continued)