
Eilam E.Reversing.Secrets of reverse engineering.2005
.pdf
Deciphering File Formats 211
that stores the currently typed password. This is the variable at 00405038 against which the header data was compared in Listing 6.3. In OllyDbg, a hardware breakpoint on memory writes can be set by opening the address (00405038) in the Dump window, right-clicking the address, and selecting Breakpoint > Hardware, on write > Dword. Keep in mind that you must restart the program before you do this, because by the time the bad-password message is printed this variable has already been initialized.
Restart the program, place a hardware breakpoint on 00405038, and let the program run (with the same set of command-line parameters). The debugger breaks somewhere inside RSAENH.DLL, the Microsoft Enhanced Cryptographic Provider. Why is the Microsoft Enhanced Cryptographic Provider writing into a global variable from Cryptex.exe? Probably because Cryptex.EXE had supplied the address of that global variable. Let’s look at the stack and try to trace back and find the call made from Cryptex to the encryption engine. In tracing back through the stack in the Stack Window, you can see that we are currently running inside the CryptGetHashParam API, which was called from a function inside Cryptex. Listing 6.4 shows the code for this function.
00402280 MOV ECX,DS:[405048]
00402286 SUB ESP,8
00402289 LEA EAX,SS:[ESP]
0040228C PUSH EAX
0040228D PUSH 0
0040228F PUSH 0
00402291 PUSH 8003
00402296 PUSH ECX
00402297 CALL DS:[<&ADVAPI32.CryptCreateHash>]
0040229D TEST EAX,EAX
0040229F JE SHORT cryptex.004022C2
004022A1 MOV EDX,SS:[ESP+C]
004022A5 MOV EAX,SS:[ESP]
004022A8 PUSH 0
004022AA PUSH 14
004022AC PUSH EDX
004022AD PUSH EAX
004022AE CALL DS:[<&ADVAPI32.CryptHashData>]
004022B4 TEST EAX,EAX
004022B6 MOV ECX,SS:[ESP]
004022B9 JNZ SHORT cryptex.004022C8
004022BB PUSH ECX
004022BC CALL DS:[<&ADVAPI32.CryptDestroyHash>]
004022C2 XOR EAX,EAX
004022C4 ADD ESP,8
004022C7 RETN
004022C8 MOV EAX,SS:[ESP+10]
004022CC PUSH ESI
004022CD PUSH 0
004022CF LEA EDX,SS:[ESP+C]
004022D3 PUSH EDX
004022D4 PUSH EAX
004022D5 PUSH 2
004022D7 PUSH ECX
004022D8 MOV DWORD PTR SS:[ESP+1C],10
004022E0 CALL DS:[<&ADVAPI32.CryptGetHashParam>]
004022E6 MOV EDX,SS:[ESP+4]
004022EA PUSH EDX
004022EB MOV ESI,EAX
004022ED CALL DS:[<&ADVAPI32.CryptDestroyHash>]
004022F3 MOV EAX,ESI
004022F5 POP ESI
004022F6 ADD ESP,8
004022F9 RETN
Listing 6.4 Function in Cryptex that calls into the cryptographic service provider; the 16-byte password-identifier value is written from within this function.
Deciphering the code in Listing 6.4 is not going to be easy unless you do some reading and figure out what all of these hash APIs are about. For this purpose, you can go to http://msdn.microsoft.com and look up the functions CryptCreateHash, CryptHashData, and so on. A hash is defined in MSDN as “A fixed-sized result obtained by applying a mathematical function (the hashing algorithm) to an arbitrary amount of data.” The CryptCreateHash function “initiates the hashing of a stream of data,” the CryptHashData function “adds data to a specified hash object,” while the CryptGetHashParam function “retrieves data that governs the operations of a hash object.” With this (very basic) understanding, let’s analyze the function in Listing 6.4 and try to determine what it does.
The code starts out by creating a hash object in the CryptCreateHash call. Notice the second parameter in this call; this is how the hashing algorithm is selected. In this case, the algorithm parameter is hard-coded to 0x8003. Finding out what 0x8003 stands for is probably easiest if you look for a popular hashing algorithm identifier such as CALG_MD2 and find it in the Crypto header file, WinCrypt.H. It turns out that these identifiers are composed of several fields: one specifying the algorithm class (ALG_CLASS_HASH), another specifying the algorithm type (ALG_TYPE_ANY), and finally one that specifies the exact algorithm (ALG_SID_MD2). If you work out what 0x8003 stands for, you can see that the class is ALG_CLASS_HASH (0x8000), the type is ALG_TYPE_ANY (0), and the algorithm is ALG_SID_MD5 (3), which together form CALG_MD5.
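The decoding of 0x8003 can be checked with a few lines of Python. This is only a sketch: the masks mirror the GET_ALG_CLASS, GET_ALG_TYPE, and GET_ALG_SID macros from WinCrypt.H, while the function name split_alg_id and the HASH_SIDS table are names made up for this illustration.

```python
# Bit layout of an ALG_ID, mirroring the GET_ALG_* macros in WinCrypt.H.
ALG_CLASS_MASK = 7 << 13    # bits 13-15: algorithm class
ALG_TYPE_MASK = 15 << 9     # bits 9-12: algorithm type
ALG_SID_MASK = 511          # bits 0-8: algorithm sub-identifier

ALG_CLASS_HASH = 4 << 13    # 0x8000
ALG_TYPE_ANY = 0

# Hash sub-identifiers as listed in WinCrypt.H (illustrative subset).
HASH_SIDS = {1: "ALG_SID_MD2", 2: "ALG_SID_MD4", 3: "ALG_SID_MD5", 4: "ALG_SID_SHA"}


def split_alg_id(alg_id):
    """Split an ALG_ID into its (class, type, sid) fields."""
    return (alg_id & ALG_CLASS_MASK, alg_id & ALG_TYPE_MASK, alg_id & ALG_SID_MASK)


alg_class, alg_type, alg_sid = split_alg_id(0x8003)
print(alg_class == ALG_CLASS_HASH, alg_type == ALG_TYPE_ANY, HASH_SIDS[alg_sid])
```

The same helper can be pointed at any other identifier that shows up in a CryptCreateHash call.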

MD5 (MD stands for message-digest) is a highly popular cryptographic hashing algorithm that produces a 128-bit hash or checksum from a variable-length message. This hash can later be used to identify the specific message. Cryptographic hashes such as MD5 are designed around two basic properties: it should be extremely unlikely that two different messages produce the same hash, and it should be virtually impossible to create a message that generates a predetermined hash value.
With this information, let’s proceed to determine the nature of the data that Cryptex is hashing. This can be easily gathered by inspecting the call to CryptHashData. According to the MSDN, the second parameter passed to CryptHashData is the data being hashed. In Listing 6.4, Cryptex is passing EDX, which was earlier loaded from [ESP+C]. The third parameter is the buffer length, which is set to 0x14 (20 bytes). A quick look at the buffer pointed to by [ESP+C] shows the following.
0012F5E8 77 03 BE 9F EC CA 20 05 D0 D6 DF FB A2 CF 55 4B
0012F5F8 81 41 C0 FE
Nothing obvious here—this isn’t text or anything, just more unrecognized data. The next thing Cryptex does is call CryptGetHashParam on the hash object, with the value 2 in the second parameter. A quick search through WinCrypt.H shows that the value 2 stands for HP_HASHVAL. This means that Cryptex is asking for the actual hash value (that’s the MD5 result for those 20 bytes from 0012F5E8). The third parameter passed to CryptGetHashParam tells the function where to write the hash value. Guess what? It’s being written into 00405038, the global variable that was used earlier for checking whether the password matches.
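To make the sizes concrete, the following sketch feeds the 20 bytes dumped at 0012F5E8 to Python's hashlib MD5 implementation. It assumes that the provider's MD5 is the standard algorithm, so the digest should correspond to what CryptGetHashParam writes into 00405038; the variable names are illustrative.

```python
import hashlib

# The 20 bytes (0x14) shown in the dump at 0012F5E8.
buffer = bytes.fromhex(
    "77 03 BE 9F EC CA 20 05 D0 D6 DF FB A2 CF 55 4B 81 41 C0 FE"
)

# HP_HASHVAL asks for the finished digest; for MD5 that is 16 bytes (0x10),
# which matches the length stored by MOV DWORD PTR SS:[ESP+1C],10.
digest = hashlib.md5(buffer).digest()

print(len(buffer), len(digest))
print(digest.hex())
```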
To summarize, Cryptex is apparently hashing unknown, nontextual data using the MD5 hashing algorithm, and is writing the result into a global variable. The contents of this global variable are later compared against a value stored in the Cryptex archive file. If they aren’t identical, Cryptex reports an incorrect password. The data being hashed in the function from Listing 6.4 is clearly related somehow to the password that was typed; we just don’t understand the connection yet. The unknown data that was hashed in this function was passed as a parameter from the calling function.
Hashing the Password
At this point you’re probably a bit at a loss regarding the origin of the buffer that was just hashed in Listing 6.4. In such cases, it is usually best to simply trace back in the program until you find the origin of that buffer. In this case, the hashed buffer came from the calling function, at 00402300. This function is shown in Listing 6.5.

00402300 SUB ESP,24
00402303 MOV EAX,DS:[405020]
00402308 PUSH EDI
00402309 MOV EDI,SS:[ESP+2C]
0040230D MOV SS:[ESP+24],EAX
00402311 LEA EAX,SS:[ESP+4]
00402315 PUSH EAX
00402316 PUSH 0
00402318 PUSH 0
0040231A PUSH 8004
0040231F PUSH EDI
00402320 CALL DS:[<&ADVAPI32.CryptCreateHash>]
00402326 TEST EAX,EAX
00402328 JE cryptex.004023CA
0040232E MOV EDX,SS:[ESP+30]
00402332 MOV EAX,EDX
00402334 PUSH ESI
00402335 LEA ESI,DS:[EAX+1]
00402338 MOV CL,DS:[EAX]
0040233A ADD EAX,1
0040233D TEST CL,CL
0040233F JNZ SHORT cryptex.00402338
00402341 MOV ECX,SS:[ESP+8]
00402345 PUSH 0
00402347 SUB EAX,ESI
00402349 PUSH EAX
0040234A PUSH EDX
0040234B PUSH ECX
0040234C CALL DS:[<&ADVAPI32.CryptHashData>]
00402352 TEST EAX,EAX
00402354 POP ESI
00402355 JE SHORT cryptex.004023BF
00402357 XOR EAX,EAX
00402359 MOV SS:[ESP+11],EAX
0040235D MOV SS:[ESP+15],EAX
00402361 MOV SS:[ESP+19],EAX
00402365 MOV SS:[ESP+1D],EAX
00402369 MOV SS:[ESP+21],AX
0040236E LEA ECX,SS:[ESP+C]
00402372 LEA EDX,SS:[ESP+10]
00402376 MOV SS:[ESP+23],AL
0040237A MOV BYTE PTR SS:[ESP+10],0
0040237F MOV DWORD PTR SS:[ESP+C],14
00402387 PUSH EAX
00402388 MOV EAX,SS:[ESP+8]
0040238C PUSH ECX
0040238D PUSH EDX
0040238E PUSH 2
00402390 PUSH EAX
00402391 CALL DS:[<&ADVAPI32.CryptGetHashParam>]
00402397 TEST EAX,EAX
00402399 JNZ SHORT cryptex.004023A9
0040239B PUSH cryptex.00403504 ; format = “Unable to obtain MD5 hash value for file.”
004023A0 CALL DS:[<&MSVCR71.printf>]
004023A6 ADD ESP,4
004023A9 LEA ECX,SS:[ESP+10]
004023AD PUSH cryptex.00405038
004023B2 PUSH ECX
004023B3 CALL cryptex.00402280
004023B8 ADD ESP,8
004023BB TEST EAX,EAX
004023BD JNZ SHORT cryptex.004023DA
004023BF MOV EDX,SS:[ESP+4]
004023C3 PUSH EDX
004023C4 CALL DS:[<&ADVAPI32.CryptDestroyHash>]
004023CA XOR EAX,EAX
004023CC POP EDI
004023CD MOV ECX,SS:[ESP+20]
004023D1 CALL cryptex.004027C9
004023D6 ADD ESP,24
004023D9 RETN
004023DA MOV ECX,SS:[ESP+4]
004023DE LEA EAX,SS:[ESP+8]
004023E2 PUSH EAX
004023E3 PUSH 0
004023E5 PUSH ECX
004023E6 PUSH 6603
004023EB PUSH EDI
004023EC MOV DWORD PTR SS:[ESP+1C],0
004023F4 CALL DS:[<&ADVAPI32.CryptDeriveKey>]
004023FA MOV EDX,SS:[ESP+4]
004023FE PUSH EDX
004023FF CALL DS:[<&ADVAPI32.CryptDestroyHash>]
00402405 MOV ECX,SS:[ESP+24]
00402409 MOV EAX,SS:[ESP+8]
0040240D POP EDI
0040240E CALL cryptex.004027C9
00402413 ADD ESP,24
00402416 RETN
Listing 6.5 The Cryptex key-generation function.
The function in Listing 6.5 is quite similar to the one in Listing 6.4. It starts out by creating a hash object and hashing some data. One difference is the initialization parameters for the hash object. The function in Listing 6.4 used the value 0x8003 as its algorithm ID, while this function uses 0x8004, which identifies the CALG_SHA algorithm. SHA is another hashing algorithm that has similar properties to MD5, with the difference that an SHA hash is 160 bits long, as opposed to MD5 hashes, which are 128 bits long. You might notice that 160 bits are exactly 20 bytes, which is the length of the data being hashed in Listing 6.4. Coincidence? You’ll soon find out. . . .
The next sequence calls CryptHashData again, but not before some processing is performed on a data block. If you place a breakpoint on this function and restart the program, you can easily see what data is being processed: it is the password text, which in this case is 6666666665. Let’s take a look at this processing sequence.
00402335 LEA ESI,DS:[EAX+1]
00402338 MOV CL,DS:[EAX]
0040233A ADD EAX,1
0040233D TEST CL,CL
0040233F JNZ SHORT cryptex.00402338
This loop is really quite simple. It reads each character from the string and checks whether it’s zero. If it’s not, it loops on to the next character. Because EAX is incremented after every read, including the read of the terminator, EAX points one byte past the string’s terminating NULL character when the loop completes, while ESI still points to the second character in the string. The following instruction produces the final result.
00402347 SUB EAX,ESI
Here the pointer to the second character is subtracted from the pointer just past the NULL terminator. The result is effectively the length of the string, not including the NULL terminator (both pointers are one byte ahead, of the first character and of the terminator respectively, so the two offsets cancel out). This sequence is essentially equivalent to the strlen C runtime library function. You might wonder why the program would implement its own strlen function instead of just calling the runtime library. The answer is that it probably is calling the runtime library, but the compiler is replacing the call with an intrinsic implementation. Some compilers support intrinsic implementations of popular functions, which basically means that the compiler replaces the function call with an actual implementation of the function that is placed inside the calling function. This improves performance because it avoids the overhead of performing a function call.
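The pointer arithmetic is easy to verify by replaying the loop in a high-level language. The sketch below models memory as a Python bytes object and registers as integer offsets; the function name inline_strlen is invented for this illustration.

```python
def inline_strlen(memory, start):
    """Replay the inlined strlen sequence from Listing 6.5."""
    eax = start              # MOV EAX,EDX (EDX holds the string pointer)
    esi = eax + 1            # LEA ESI,DS:[EAX+1]
    while True:
        cl = memory[eax]     # MOV CL,DS:[EAX]
        eax += 1             # ADD EAX,1
        if cl == 0:          # TEST CL,CL / JNZ SHORT cryptex.00402338
            break
    return eax - esi         # SUB EAX,ESI


password = b"6666666665\x00"
print(inline_strlen(password, 0))  # prints 10
```

When the loop exits, eax is 11 (one past the terminator at offset 10) and esi is 1, so the subtraction yields 10, the same value strlen would return.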
After measuring the length of the string, the function proceeds to hash the password string using CryptHashData and to extract the resulting hash using CryptGetHashParam. The resulting hash value is then passed on to 00402280, which is the function we investigated in Listing 6.4. This is curious because, as we know, the function in Listing 6.4 is going to hash that data again, this time using the MD5 algorithm. What is the point of rehashing the output of one hashing algorithm with another hashing algorithm? That is not clear at the moment.
After the MD5 function returns (and assuming it returns a nonzero value), the function proceeds to call an interesting API called CryptDeriveKey. According to Microsoft’s documentation, CryptDeriveKey “generates cryptographic session keys derived from a base data value.” The base data value is taken from a hash object, which, in this case, is a 160-bit SHA hash calculated from the plaintext password. As a part of the generation of the key object, the caller must also specify which encryption algorithm will be used (this is specified in the second parameter passed to CryptDeriveKey). As you can see in Listing 6.5, Cryptex is passing 0x6603. We return to WinCrypt.H and discover that 0x6603 stands for CALG_3DES. This makes sense and proves that Cryptex works as advertised: It encrypts data using the 3DES algorithm.
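Going in the other direction, 0x6603 can be rebuilt from its parts in the same way WinCrypt.H defines CALG_3DES. The constant values below are the ones from the header; the snippet is just a quick arithmetic check, not part of Cryptex.

```python
# Field values as defined in WinCrypt.H.
ALG_CLASS_DATA_ENCRYPT = 3 << 13   # 0x6000
ALG_TYPE_BLOCK = 3 << 9            # 0x0600
ALG_SID_3DES = 3

# CALG_3DES is the bitwise OR of class, type, and sub-identifier.
CALG_3DES = ALG_CLASS_DATA_ENCRYPT | ALG_TYPE_BLOCK | ALG_SID_3DES

print(hex(CALG_3DES))  # prints 0x6603
```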
When we think about it a little bit, it becomes clear why Cryptex calculated that extra MD5 hash. Essentially, Cryptex is using the generated SHA hash as a key for encrypting and decrypting the data (3DES is a symmetric algorithm, which means that encryption and decryption are both performed using the same key). Additionally, Cryptex needs some kind of an easy way to detect whether the supplied password was correct or incorrect. For this, Cryptex calculates an additional hash (using the MD5 algorithm) from the SHA hash and stores the result in the file header. When an archive is opened, the supplied password is hashed twice (once using SHA and once using MD5), and the MD5 result is compared against the one stored in the archive header. If they match, the password is correct.
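The two-stage scheme can be sketched with Python's hashlib. This is a simplified model under stated assumptions: CALG_SHA is taken to be SHA-1, the function names are invented for the example, and the internal steps by which CryptDeriveKey turns the SHA hash into a 3DES key are not reproduced here.

```python
import hashlib
import hmac


def key_material(password):
    """Stage 1: the 160-bit SHA hash that is handed to CryptDeriveKey."""
    return hashlib.sha1(password).digest()


def password_verifier(password):
    """Stage 2: the 128-bit MD5 of the SHA hash, stored in the archive header."""
    return hashlib.md5(key_material(password)).digest()


def password_matches(password, stored_verifier):
    """Recompute the verifier for a typed password and compare it."""
    return hmac.compare_digest(password_verifier(password), stored_verifier)


# Hypothetical header value created when the archive was written.
stored = password_verifier(b"6666666665")
print(password_matches(b"6666666665", stored))  # prints True
print(password_matches(b"wrong", stored))       # prints False
```

Only the output of password_verifier ever needs to be written to disk; the output of key_material stays in memory.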
You may wonder why Cryptex isn’t just storing the SHA result directly in the file header. Why go through the extra effort of calculating an additional hash value? The reason is that the SHA hash is directly used as the encryption key; storing it in the file header would make it incredibly easy to decrypt Cryptex archives. This might be a bit confusing considering that it is impossible to extract the original plaintext password from the SHA hash value, but the password is simply not needed. The hash value is all that would be needed in order to decrypt the data. Instead, Cryptex calculates an additional hash from the SHA value and stores that as the unique password identification. Figure 6.1 demonstrates this sequence.
Finally, if you’re wondering why Cryptex calculates the MD5 password-verification hash from the SHA hash value rather than directly from the plaintext password, it’s probably because of the (admittedly remote) possibility that someone would be able to convert the MD5 hash value to an equivalent SHA hash value and effectively obtain the decryption key. This is virtually guaranteed to be mathematically impossible, but why risk it? It is certainly going to be impossible to obtain the original data (which is the SHA-generated decryption key) from the MD5 hash value stored in the header. Being overly paranoid is the advisable frame of mind when developing security-related technologies.

Figure 6.1 Cryptex’s key-generation and password-verification process. (Diagram: the original plaintext password is hashed into a 160-bit SHA hash; that hash is hashed again into a 128-bit MD5 hash, which is stored in the Cryptex header, and it also keys the 3DES encrypter that turns raw data into encrypted data.)
The Directory Layout
Now that you have a basic understanding of how Cryptex manages its passwords and encryption keys, you can move on to study the Cryptex directory layout. In a real-world program, this step would be somewhat less relevant for those interested in a security-level analysis for Cryptex, but it would be very important for anyone interested in reading or creating Cryptex-compatible archives. Since we’re doing this as an exercise in data reverse engineering, the directory layout is exactly the kind of complex data structure you’re looking to get your hands on.
Analyzing the Directory Processing Code
In order to decipher the directory layout you’ll need to find the location in the Cryptex code that reads the encrypted directory layout data, decrypts it, and proceeds to decipher it. This can be accomplished by simply placing a breakpoint on the ReadFile API and tracing forward in the program to see what it does with the data. Let’s restart the program in OllyDbg (don’t forget to correct the password in the command-line argument), place a breakpoint on ReadFile, and let the program run.

The first hit comes from an internal system call made by ADVAPI32.DLL. Releasing the debugger brings it back to ReadFile, but once again the call was made internally from system code. You will very quickly realize that there are way too many calls to ReadFile for this approach to work; this API is used heavily by the system.
There are many alternative approaches you could take at this point, depending on the particular application. One option would be to try and restrict the ReadFile breakpoint to calls made on the archive file. You could do this by first placing a breakpoint on the API call that opens or creates the archive (this is probably going to be a call to the CreateFile API), obtain the archive handle from that call, and place a selective breakpoint on ReadFile that only breaks when the specific handle to the Cryptex archive is specified (such breakpoints are supported by most debuggers). This would really reduce the number of calls—you’d only see the relevant calls where Cryptex reads from the archive, and not hundreds of irrelevant system calls.
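The logic behind such a selective breakpoint can be modeled in a few lines. The sketch below is not debugger script; it simulates the idea on a made-up call trace, and the trace format, the handle values, and the file name archive.bin are all hypothetical.

```python
def selective_hits(trace, archive_path):
    """Return indexes of ReadFile calls made on the archive's handle.

    Each trace entry is (api_name, argument, result): for CreateFile the
    argument is the path and the result is the returned handle; for
    ReadFile the argument is the handle being read from.
    """
    archive_handle = None
    hits = []
    for index, (api, argument, result) in enumerate(trace):
        if api == "CreateFile" and argument == archive_path:
            archive_handle = result      # remember the handle of interest
        elif api == "ReadFile" and argument == archive_handle:
            hits.append(index)           # the breakpoint condition is met
    return hits


# Hypothetical trace: two system reads and one read from the archive.
trace = [
    ("ReadFile", 0x40, None),
    ("CreateFile", "archive.bin", 0x7C),
    ("ReadFile", 0x40, None),
    ("ReadFile", 0x7C, None),
]
print(selective_hits(trace, "archive.bin"))  # prints [3]
```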
On the other hand, since Cryptex is really a fairly simple program, you could just let it run until it reached the key-generation function from Listing 6.5. At this point you could just step through the rest of the code until you reach interesting code areas that decipher the directory data structures. Keep in mind that in most real programs you’d have to come up with a better idea for where to place your breakpoint, because simply stepping through the program is going to be an unreasonably tedious task.
You can start by placing a breakpoint at the end of the key-generation function, on address 00402416. Once you reach that address, you can step back into the calling function and step through several irrelevant code sequences, including a call into a function that apparently performs the actual opening of the archive and ends up calling into 004011C0, which is the function analyzed in Listing 6.3. The next function call goes into 004019F0, and (based on a quick look at it) appears to be what we’re looking for. Listing 6.6 lists the OllyDbg-generated disassembly for this function.
004019F0 SUB ESP,8
004019F3 PUSH EBX
004019F4 PUSH EBP
004019F5 PUSH ESI
004019F6 MOV ESI,SS:[ESP+18]
004019FA XOR EBX,EBX
004019FC PUSH EBX ; Origin => FILE_BEGIN
004019FD PUSH EBX ; pOffsetHi => NULL
004019FE PUSH EBX ; OffsetLo => 0
004019FF PUSH ESI ; hFile
00401A00 CALL DS:[<&KERNEL32.SetFilePointer>]
00401A06 PUSH EBX ; pOverlapped => NULL
00401A07 LEA EAX,SS:[ESP+14]
00401A0B PUSH EAX ; pBytesRead
00401A0C PUSH 28 ; BytesToRead = 28 (40.)
00401A0E PUSH cryptex.00406058 ; Buffer = cryptex.00406058
00401A13 PUSH ESI ; hFile
00401A14 CALL DS:[<&KERNEL32.ReadFile>]
00401A1A MOV ECX,SS:[ESP+1C]
00401A1E MOV EDX,DS:[406064]
00401A24 PUSH ECX
00401A25 PUSH EDX
00401A26 PUSH ESI
00401A27 CALL cryptex.00401030
00401A2C MOV EBP,DS:[<&MSVCR71.printf>]
00401A32 MOV ESI,DS:[406064]
00401A38 PUSH cryptex.00403234 ; format = “ File Size File Name”
00401A3D MOV DWORD PTR SS:[ESP+1C],cryptex.00405050
00401A45 CALL EBP ; printf
00401A47 ADD ESP,10
00401A4A TEST ESI,ESI
00401A4C JE SHORT cryptex.00401ACD
00401A4E PUSH EDI
00401A4F MOV EDI,SS:[ESP+24]
00401A53 JMP SHORT cryptex.00401A60
00401A55 LEA ESP,SS:[ESP]
00401A5C LEA ESP,SS:[ESP]
00401A60 MOV ESI,SS:[ESP+10]
00401A64 ADD ESI,8
00401A67 MOV DWORD PTR SS:[ESP+14],1A
00401A6F NOP
00401A70 MOV EAX,DS:[ESI]
00401A72 TEST EAX,EAX
00401A74 JE SHORT cryptex.00401A9A
00401A76 MOV EDX,EAX
00401A78 SHL EDX,0A
00401A7B SUB EDX,EAX
00401A7D ADD EDX,EDX
00401A7F LEA ECX,DS:[ESI+14]
00401A82 ADD EDX,EDX
00401A84 PUSH ECX
00401A85 SHR EDX,0A
00401A88 PUSH EDX
00401A89 PUSH cryptex.00403250 ; ASCII “ %10dK %s”
00401A8E CALL EBP
00401A90 MOV EAX,DS:[ESI]
00401A92 ADD DS:[EDI],EAX
00401A94 ADD ESP,0C
00401A97 ADD EBX,1
Listing 6.6 Disassembly of function that lists all files within a Cryptex archive. (continued)