Eilam E.Reversing.Secrets of reverse engineering.2005
Deciphering File Formats 231
00401D59 ADD ESP,14
00401D5C TEST EDI,EDI
00401D5E JE cryptex.00401E39
00401D64 MOV ESI,DS:[<&KERNEL32.GetConsoleScreenBufferInfo>]
00401D6A LEA EBX,DS:[EBX]
00401D70 MOV EDX,DS:[40504C]
00401D76 LEA ECX,SS:[ESP+2C]
00401D7A PUSH ECX
00401D7B PUSH EDX
00401D7C CALL ESI
00401D7E FLD DWORD PTR SS:[ESP+10]
00401D82 SUB ESP,8
00401D85 FSTP QWORD PTR SS:[ESP]
00401D88 PUSH cryptex.00403320 ; ASCII “%2.2f percent completed.”
00401D8D CALL EBP
00401D8F ADD ESP,0C
00401D92 CMP EDI,1
00401D95 MOV EAX,0FFC
00401D9A JA SHORT cryptex.00401DA1
00401D9C MOV EAX,DS:[405050]
00401DA1 PUSH 0
00401DA3 PUSH EAX
00401DA4 MOV EAX,SS:[ESP+24]
00401DA8 PUSH cryptex.00405054
00401DAD PUSH EAX
00401DAE CALL DS:[<&ADVAPI32.CryptHashData>]
00401DB4 TEST EAX,EAX
00401DB6 JE cryptex.00401EEE
00401DBC CMP EDI,1
00401DBF MOV EAX,0FFC
00401DC4 JA SHORT cryptex.00401DCB
00401DC6 MOV EAX,DS:[405050]
00401DCB MOV EDX,SS:[ESP+14]
00401DCF PUSH 0 ; /pOverlapped = NULL
00401DD1 LEA ECX,SS:[ESP+2C] ; |
00401DD5 PUSH ECX ; |pBytesWritten
00401DD6 PUSH EAX ; |nBytesToWrite
00401DD7 PUSH cryptex.00405054 ; |Buffer = cryptex.00405054
00401DDC PUSH EDX ; |hFile
00401DDD CALL DS:[<&KERNEL32.WriteFile>]
00401DE3 SUB EDI,1
00401DE6 JE SHORT cryptex.00401E00
00401DE8 MOV EAX,SS:[ESP+8C]
00401DEF MOV ECX,DS:[405050]
00401DF5 PUSH EAX
00401DF6 PUSH ECX
00401DF7 PUSH EBX
Listing 6.8 (continued)
232 Chapter 6
00401DF8 CALL cryptex.00401030
00401DFD ADD ESP,0C
00401E00 MOV EAX,DS:[40504C]
00401E05 LEA EDX,SS:[ESP+44]
00401E09 PUSH EDX
00401E0A PUSH EAX
00401E0B CALL ESI
00401E0D MOV ECX,SS:[ESP+30]
00401E11 MOV EDX,DS:[40504C]
00401E17 PUSH ECX ; /CursorPos
00401E18 PUSH EDX ; |hConsole => 00000007
00401E19 CALL DS:[<&KERNEL32.SetConsoleCursorPosition>]
00401E1F TEST EDI,EDI
00401E21 MOVSS XMM0,SS:[ESP+10]
00401E27 ADDSS XMM0,SS:[ESP+20]
00401E2D MOVSS SS:[ESP+10],XMM0
00401E33 JNZ cryptex.00401D70
00401E39 FLD QWORD PTR DS:[403B98]
00401E3F SUB ESP,8
00401E42 FSTP QWORD PTR SS:[ESP]
00401E45 PUSH cryptex.00403368 ; ASCII “%2.2f percent completed.”
00401E4A CALL EBP
00401E4C PUSH cryptex.00403384
00401E51 CALL EBP
00401E53 XOR EAX,EAX
00401E55 MOV SS:[ESP+6D],EAX
00401E59 MOV SS:[ESP+71],EAX
00401E5D MOV SS:[ESP+75],EAX
00401E61 MOV SS:[ESP+79],AX
00401E66 ADD ESP,10
00401E69 LEA ECX,SS:[ESP+24]
00401E6D LEA EDX,SS:[ESP+5C]
00401E71 MOV SS:[ESP+6B],AL
00401E75 MOV BYTE PTR SS:[ESP+5C],0
00401E7A MOV DWORD PTR SS:[ESP+24],10
00401E82 PUSH EAX
00401E83 MOV EAX,SS:[ESP+20]
00401E87 PUSH ECX
00401E88 PUSH EDX
00401E89 PUSH 2
00401E8B PUSH EAX
00401E8C CALL DS:[<&ADVAPI32.CryptGetHashParam>]
00401E92 TEST EAX,EAX
00401E94 JNZ SHORT cryptex.00401EA0
00401E96 PUSH cryptex.00403388 ; ASCII “Unable to obtain MD5 hash value for file.”
Listing 6.8 (continued)
00401E9B CALL EBP
00401E9D ADD ESP,4
00401EA0 MOV ECX,4
00401EA5 LEA EDI,SS:[ESP+6C]
00401EA9 LEA ESI,SS:[ESP+5C]
00401EAD XOR EDX,EDX
00401EAF REPE CMPS DWORD PTR ES:[EDI],DWORD PTR DS:[ESI]
00401EB1 JE SHORT cryptex.00401EC2
00401EB3 MOV EAX,SS:[ESP+18]
00401EB7 PUSH EAX
00401EB8 PUSH cryptex.004033B4 ; ASCII “ERROR: File “%s” is corrupted!”
00401EBD CALL EBP
00401EBF ADD ESP,8
00401EC2 MOV ECX,SS:[ESP+1C]
00401EC6 PUSH ECX
00401EC7 CALL DS:[<&ADVAPI32.CryptDestroyHash>]
00401ECD MOV EDX,SS:[ESP+14]
00401ED1 MOV ESI,DS:[<&KERNEL32.CloseHandle>]
00401ED7 PUSH EDX ; /hObject
00401ED8 CALL ESI ; \CloseHandle
00401EDA PUSH EBX ; /hObject
00401EDB CALL ESI ; \CloseHandle
00401EDD MOV ECX,SS:[ESP+7C]
00401EE1 POP ESI
00401EE2 POP EBP
00401EE3 POP EDI
00401EE4 POP EBX
00401EE5 CALL cryptex.004027C9
00401EEA ADD ESP,70
00401EED RETN
Listing 6.8 (continued)
Let’s begin with a quick summary of the most important operations performed by the function in Listing 6.8. The function starts by opening the archive file. This is done by calling a function at 00401670, which opens the archive and proceeds to call into the header and password verification function at 004011C0, which you analyzed in Listing 6.3. After 00401670 returns, the function proceeds to create a hash object of the same kind you saw earlier when the password hash was calculated. This time the algorithm ID is 0x8003, which is CALG_MD5 (the hash algorithm class combined with ALG_SID_MD5). The purpose of this hash object is still unclear.
The code then proceeds to read the Cryptex header into the same global variable at 00406058 that you encountered earlier, and to search the file list for the relevant file entry.
Scanning the File List
The scanning of the file list is performed by calling a function at 004017B0, which goes through a familiar route of scanning the file list and comparing each name with the name of the file being extracted. Once the correct item is found the function retrieves several fields from the file entry. The following is the code that is executed in the file searching routine once a file entry is found.
00401881 MOV ECX,SS:[ESP+10]
00401885 LEA EAX,DS:[ESI+ESI*4]
00401888 ADD EAX,EAX
0040188A ADD EAX,EAX
0040188C SUB EAX,ESI
0040188E MOV EDX,DS:[ECX+EAX*8+8]
00401892 LEA EAX,DS:[ECX+EAX*8]
00401895 MOV ECX,SS:[ESP+24]
00401899 MOV DS:[ECX],EDX
0040189B MOV ECX,SS:[ESP+28]
0040189F TEST ECX,ECX
004018A1 JE SHORT cryptex.004018BC
004018A3 LEA EDX,DS:[EAX+C]
004018A6 MOV ESI,DS:[EDX]
004018A8 MOV DS:[ECX],ESI
004018AA MOV ESI,DS:[EDX+4]
004018AD MOV DS:[ECX+4],ESI
004018B0 MOV ESI,DS:[EDX+8]
004018B3 MOV DS:[ECX+8],ESI
004018B6 MOV EDX,DS:[EDX+C]
004018B9 MOV DS:[ECX+C],EDX
004018BC MOV EAX,DS:[EAX+4]
First of all, let’s inspect what is obviously an optimized arithmetic sequence of some sort in the beginning of this sequence. It can be slightly confusing because of the use of the LEA instruction, but LEA doesn’t have to deal with addresses. The LEA at 00401885 is essentially multiplying ESI by 5 and storing the result in EAX. If you go back to the beginning of this function, it is easy to see that ESI is essentially employed as a counter; it is initialized to zero and then incremented by one with each item that is traversed. However, once all file entries in the current cluster are scanned (remember there are 0x1A entries), ESI is set to zero again. This implies that ESI is used as the index into the current file entry in the current cluster.
Let’s return to the arithmetic sequence and try to figure out what it is doing. You’ve already established that the first LEA is multiplying ESI by 5. This is followed by two ADDs that each double EAX, effectively multiplying it by 4. The bottom line is that ESI is multiplied by 20, and its original value is then subtracted from the result. This is equivalent to multiplying ESI by 19. Lovely, isn’t it? The next line at 0040188E actually uses the outcome of this computation (which is now in EAX) as an index, but not before it multiplies it by 8. This line essentially takes ESI, which was an index to the current file entry, and multiplies it by 19 * 8 = 152. Sounds familiar, doesn’t it? You’re right: 152 is the file entry length. By computing [ECX+EAX*8+8], Cryptex is obtaining the value at offset +8 in the current file entry.
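The arithmetic is easy to check mechanically. The following Python sketch simply mirrors the instruction sequence one step at a time; the function name entry_offset is invented for illustration and does not come from Cryptex.

```python
def entry_offset(esi: int) -> int:
    """Mirror the sequence at 00401885-0040188E for a file entry index."""
    eax = esi + esi * 4   # LEA EAX,[ESI+ESI*4]  -> ESI * 5
    eax = eax + eax       # ADD EAX,EAX          -> ESI * 10
    eax = eax + eax       # ADD EAX,EAX          -> ESI * 20
    eax = eax - esi       # SUB EAX,ESI          -> ESI * 19
    return eax * 8        # the *8 scale in [ECX+EAX*8+8] -> ESI * 152


# Every index within a cluster (0x1A entries) lands on a multiple of 152.
for index in range(0x1A):
    assert entry_offset(index) == index * 152
```

The compiler presumably chose this form because shifts, adds, and LEA are cheaper than a general multiply by 152.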
We already know that offset +8 contains the file size in clusters, and this value is being sent back to the caller using a parameter that was passed in to receive this value. Cryptex needs the file size in order to extract the file. After loading the file size, Cryptex checks for what is apparently another output parameter that is supposed to receive additional output data from this function, this time at [ESP+28]. If it is nonzero, Cryptex copies the value from offset +C at the file entry into the pointer that was passed and proceeds to copy offset +10 into offset +4 in the pointer that was passed, and so on, until a total of four DWORDs, or 16 bytes are copied. As a reminder, those 16 bytes are the ones that looked like junk when you dumped the file list earlier. Before returning to the caller, the function loads offset +4 at the current file entry and sets that into EAX—it is returning it to the caller.
To summarize, this sequence scans the file list looking for a specific file name, and once that entry is found it returns three individual items to the caller: the file size in clusters, an unknown, seemingly random 16-byte sequence, and another unknown DWORD from offset +4 in the file entry. Let’s proceed to see how this data is used by the file extraction routine.
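As a rough illustration, the three values can be pulled out of a decrypted file-list buffer as follows. This is only a sketch: the function name, the dword_at_4 label, and the assumption that the buffer starts exactly at entry 0 are mine, and only the offsets +4, +8, and +C and the 152-byte entry length come from the disassembly.

```python
import struct

ENTRY_SIZE = 152  # file entry length established from the index arithmetic


def read_entry_fields(entries: bytes, index: int):
    """Return (dword_at_4, size_in_clusters, bytes16) for one file entry.

    `entries` is assumed to hold consecutive decrypted file entries,
    beginning with entry 0 (an assumption made for this sketch).
    """
    base = index * ENTRY_SIZE
    # +4: DWORD returned in EAX; +8: file size in clusters;
    # +C: the 16 bytes copied to the optional output buffer.
    dword_at_4, size_in_clusters, bytes16 = struct.unpack_from(
        "<II16s", entries, base + 4
    )
    return dword_at_4, size_in_clusters, bytes16
```

The little-endian format string reflects the fact that Cryptex runs on IA-32; the rest of each entry is left untouched because its layout is not discussed here.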
Decrypting the File
After returning from 004017B0, Cryptex proceeds to scan the supplied file name for backslashes and loops until the last backslash is encountered. The actual scanning is performed using the C runtime library function strchr, which simply returns the address of the first instance of the character, if one is found. The address that points to the last backslash is stored in [ESP+20]; this is essentially the “clean” version of the file name without any path information. One instruction that draws attention in this otherwise trivial sequence is the one at 00401C9E.
00401C9E MOV EDI,EDI
You might recall that we’ve already seen a similar instruction in the previous chapter. In that case, it served as infrastructure that allows people to trap system APIs in Windows. That use is not relevant here, so why would the compiler insert an instruction that does nothing into the middle of a function? The answer is simple. The address at which this instruction begins is unaligned, which means that it doesn’t start on a 32-bit boundary. Executing unaligned instructions (or accessing unaligned memory addresses in general) takes longer for 32-bit processors. By placing this instruction before the loop starts, the compiler ensured that the loop won’t begin on an unaligned instruction. Also, notice that again the compiler could have used NOPs, but instead used this instruction, which does nothing yet exactly fills the 2-byte gap that was present.
After obtaining a backslash-free version of the file name, the function goes on to create the new file that will contain the extracted data. After creating the file, the function checks that 004017B0 actually found a file by testing EBP, which is where that function’s return value was stored. If it is zero, Cryptex displays a file not found error message and quits. If EBP is nonzero, Cryptex calls the familiar 00401030, which reads and decrypts a cluster, passing EBP (the return value from 004017B0) as the second parameter, which is treated as the cluster number to read and decrypt.
So, you now know that 004017B0 returns a cluster index, but you’re not sure what this cluster index is. It doesn’t take much guesswork to figure out that this is the cluster index of the file you’re trying to extract, or at least the first cluster for the file you’re trying to extract (most files are probably going to occupy more than one cluster). If you go back to our discussion of the file lookup function, you see that its return value came from offset +4 in the file entry (see instruction at 004018BC). The bottom line is that you now know that offset +4 in the file entry contains the index of the first data cluster.
If you look in the debugger, you will see that the third parameter is a pointer into which the data was decrypted, and that after the function returns this buffer contains the lovely asterisks! It is important to note that the asterisks are preceded by a 4-byte value: 0000046E. A quick conversion reveals that this number equals 1134, which is the exact file size of the original asterisks.txt file you encrypted earlier.
The Floating-Point Sequence
If you go back to the extraction sequence from Listing 6.8, you will find that after reading the first cluster you run into a code sequence that contains some highly unusual instructions. Even though these instructions are not particularly important to the extraction process (in fact, they are probably the least important part of the sequence), you should still take a close look at them just to make sure that you can properly decipher this type of code. Here is the sequence I am referring to:
00401D28 FILD DWORD PTR SS:[ESP+2C]
00401D2C JGE SHORT cryptex.00401D34
00401D2E FADD DWORD PTR DS:[403BA0]
00401D34 FDIVR QWORD PTR DS:[403B98]
00401D3A MOV EAX,SS:[ESP+24]
00401D3E XORPS XMM0,XMM0
00401D41 MOV EBP,DS:[<&MSVCR71.printf>]
00401D47 PUSH EAX
00401D48 PUSH cryptex.00403308 ; ASCII “Extracting “%.35s” - “
00401D4D MOVSS SS:[ESP+24],XMM0
00401D53 FSTP DWORD PTR SS:[ESP+34]
00401D57 CALL EBP
This sequence looks unusual because it contains quite a few instructions that you haven’t encountered before. What are those instructions? A quick trip to the Intel IA-32 Instruction Set Reference document [Intel2], [Intel3] reveals that most of these instructions are floating-point arithmetic instructions. The sequence starts with an FILD instruction that simply loads a regular 32-bit integer from [ESP+2C] (which is where the file’s total cluster count is stored), converts it into an 80-bit double extended-precision floating-point number, and pushes it onto a special floating-point stack. The floating-point stack is a set of floating-point registers that store the values that are currently in use by the processor. It can be seen as a simple group of registers whose allocation is managed by the CPU.
The next floating-point instruction is an FADD, which is only executed if [ESP+2C] is a negative number. This FADD adds a constant floating-point number stored at 00403BA0 to the value currently stored at the top of the floating-point stack. Notice that unlike the FILD instruction, which loads an integer into the floating-point stack, this FADD uses a floating-point number in memory, so simply dumping the value at 00403BA0 as a 32-bit integer shows its value as 4F800000. That raw value is not meaningful by itself; you must view this number as a 32-bit floating-point number, which is what FADD expects as an operand. When you instruct OllyDbg to treat this data as a 32-bit floating-point number, you come up with 4.294967e+09.
This number might seem like pure nonsense, but it’s not. A trained eye immediately recognizes that it is conspicuously similar to the value of 2^32: 4,294,967,296. It is in fact not similar, but identical to 2^32. The idea here is quite simple. Apparently FILD always treats integers as signed, but the original program declared an unsigned integer that was to be converted into floating-point form. To force the CPU to treat these values as unsigned, the compiler generated code that adds 2^32 to the variable if it has its most significant bit set. This converts the negative number in the floating-point stack to the correct positive value that it should have been assigned in the first place.
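Both claims can be verified outside the debugger. The Python sketch below reinterprets the raw DWORD 4F800000 as an IEEE 754 single-precision value and then emulates the FILD, JGE, and FADD steps; the helper name unsigned_to_float is hypothetical, and Python floats are doubles rather than the FPU’s 80-bit format, which makes no difference for 32-bit integers.

```python
import struct

# Reinterpret the DWORD found at 00403BA0 as a 32-bit floating-point number.
constant = struct.unpack("<f", struct.pack("<I", 0x4F800000))[0]
assert constant == 2.0 ** 32


def unsigned_to_float(dword: int) -> float:
    """Emulate FILD (a signed load) followed by the conditional FADD."""
    # FILD interprets the 32 bits as a signed integer.
    signed = dword - (1 << 32) if dword & 0x80000000 else dword
    value = float(signed)
    if signed < 0:          # JGE skips the FADD for non-negative values
        value += constant   # FADD DWORD PTR DS:[403BA0]
    return value
```

For example, the bit pattern FFFFFFFF is loaded as -1 and corrected to 4294967295.0, while small values such as 5 pass through unchanged.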
After correcting the loaded number, Cryptex uses the FDIVR instruction to divide a constant from 00403B98 by the number from the top of the floating-point stack. This time the number is a 64-bit floating-point number (according to the Intel documentation), so you can ask OllyDbg to dump data starting at 00403B98 as 64-bit floating point. Olly displays 100.0000000000000, which means that Cryptex is dividing 100.0 by the total number of clusters.
The next instruction loads the file name address from [ESP+24] to EAX and proceeds to another unusual instruction called XORPS, which takes an unusual operand called XMM0. This is part of a completely separate family of instruction set extensions called SSE (XORPS belongs to the original SSE set, which SSE2 later extended) that is supported by most currently available implementations of IA-32 processors. The SSE instruction sets contain Single Instruction Multiple Data (SIMD) instructions that can operate on several groups of operands at the same time. This can create significant performance boosts for computationally intensive programs such as multimedia and content creation applications. XMM0 is the first of 8 special 128-bit registers, named XMM0 through XMM7. These registers can only be accessed using SSE instructions, and their contents are usually made up of several smaller operands. In this particular case, the XORPS instruction XORs the entire contents of its first operand with its second operand. Because both operands are XMM0, XORPS is XORing a value with itself, which sets XMM0 to zero.
The FSTP instruction that comes next stores the value from the top of the floating-point stack into [ESP+34]. As you can see from the DWORD PTR that precedes the address, the instruction treats the memory address as a 32-bit location, and will convert the value to a 32-bit floating-point representation. As a reminder, the value currently stored at the top of the floating-point stack is the result of the earlier division operation.
The Decryption Loop
At this point, we enter into what is clearly a loop that continuously reads and decrypts additional clusters using 00401030, hashes that data using CryptHashData, and writes the block to the file that was opened earlier using the WriteFile API.
At this point, you can also easily see what all of this floating-point business was about. With each cluster that is decrypted Cryptex is printing an accurate floating-point number that shows the percentage of the file that has been written so far. By dividing 100.0 by the total number of clusters earlier, Cryptex simply determined a step size by which it will increment the current completed percentage after each written cluster.
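The bookkeeping amounts to a running total and a fixed increment. Here is a minimal sketch, assuming the percentage is printed before each cluster is processed, as the loop in Listing 6.8 suggests. The name progress_steps is invented, and Python uses double precision where Cryptex keeps the running total in single precision.

```python
def progress_steps(cluster_count: int):
    """Yield the percentage displayed before each cluster is written."""
    step = 100.0 / cluster_count   # FDIVR: 100.0 divided by the cluster count
    percent = 0.0                  # XORPS XMM0,XMM0 zeroes the running total
    for _ in range(cluster_count):
        yield percent              # printed with "%2.2f percent completed."
        percent += step            # ADDSS adds the step after each cluster
```

For a four-cluster file this yields 0.0, 25.0, 50.0, and 75.0; the final 100.0 is printed separately from the constant at 00403B98 once the loop exits.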
One thing that is interesting is how Cryptex knows which cluster to read next. Because Cryptex supports deleting files from archives, files are not guaranteed to be stored sequentially within the archive. Because of this, Cryptex always reads the next cluster index from 00405050 and passes that to 00401030 when reading the next cluster. 00405050 is the beginning of the currently active cluster buffer. This indicates that, just like in the file list, the first DWORD in a cluster contains the next cluster index in the current chain. One interesting aspect of this design is revealed in the following lines.
00401DBC CMP EDI,1
00401DBF MOV EAX,0FFC
00401DC4 JA SHORT cryptex.00401DCB
00401DC6 MOV EAX,DS:[405050]
00401DCB ...
At any given moment during this loop EDI contains the number of clusters left to go. When there is more than one cluster to go (EDI > 1), the number of bytes to be written (stored in EAX) is hard-coded to 0xFFC (4092 bytes), which is probably just the maximum number of data bytes in a cluster. When Cryptex writes the last cluster in the file, it takes the number of bytes to write from the first DWORD in the cluster, the very same spot where the next cluster index is usually stored. Get it? Because Cryptex knows that this is the last cluster, the location where the next cluster index is stored is unused, so Cryptex uses that location to store the actual number of bytes that were stored in the last cluster. This is how Cryptex works around the problem of not directly storing the actual file size but merely storing the number of clusters it uses.
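Put together, the chain-following logic looks roughly like the Python sketch below. It is an approximation rather than a transcription of Cryptex: read_cluster is a hypothetical callback standing in for 00401030 and is assumed to return one already-decrypted cluster, with the link DWORD at offset 0 and the data starting at offset 4.

```python
CLUSTER_PAYLOAD = 0xFFC  # 4092 data bytes following the 4-byte link field


def extract(read_cluster, first_cluster: int, cluster_count: int) -> bytes:
    """Follow a cluster chain and return the reassembled file contents."""
    out = bytearray()
    index = first_cluster
    remaining = cluster_count          # plays the role of EDI
    while remaining:
        cluster = read_cluster(index)
        link = int.from_bytes(cluster[:4], "little")
        if remaining > 1:
            # More clusters follow: write a full payload, link is the next index.
            out += cluster[4:4 + CLUSTER_PAYLOAD]
            index = link
        else:
            # Last cluster: the link field holds the byte count instead.
            out += cluster[4:4 + link]
        remaining -= 1
    return bytes(out)
```

A single-cluster file such as asterisks.txt takes only the else branch, which is why its first DWORD, 0000046E, is the file size rather than a cluster index.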
Verifying the Hash Value
After the final cluster is decrypted and written into the extracted file, Cryptex calls CryptGetHashParam to recover the MD5 hash value that was calculated over the entire decrypted data. This is compared against the 16-byte sequence that was returned from 004017B0 (recall that these 16 bytes were retrieved from the file’s entry in the file table). If there’s a mismatch, Cryptex prints an error message saying the file is corrupted. Clearly the MD5 hash is used here as a conventional checksum; for every file that is encrypted an MD5 hash is calculated and stored, and on extraction Cryptex uses it to verify that the data hasn’t been corrupted or tampered with inside the archive.
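The same check is straightforward to reproduce with a standard MD5 implementation. The sketch below assumes that the bytes fed to CryptHashData are exactly the bytes written to the output file, which is what the shared buffer and length logic in Listing 6.8 suggest; verify_extracted is a name made up for this example.

```python
import hashlib


def verify_extracted(chunks, stored_hash: bytes) -> bool:
    """Hash the written chunks with MD5 and compare to the entry's 16 bytes."""
    md5 = hashlib.md5()
    for chunk in chunks:       # one update per cluster, like CryptHashData
        md5.update(chunk)
    return md5.digest() == stored_hash
```

Feeding the data in per-cluster pieces gives the same digest as hashing the whole file at once, so a viewer or dumper could run this check on the fully extracted file.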
The Big Picture
At this point, we have developed a fairly solid understanding of the .crx file format. This section provides a brief overview of all the information gathered in this reversing session. You have deciphered the meaning of most of the .crx fields, at least the ones that matter if you were to write a program that views or dumps an archive. Figure 6.2 illustrates what you know about the Cryptex header.
The Cryptex header comprises a standard 8-byte signature that contains the string CrYpTeX9. The header contains a 16-byte MD5 checksum that is used for confirming the user-supplied password. Cryptex archives are encrypted using a Crypto-API implementation of the triple-DES algorithm. The triple-DES key is generated by hashing the user-supplied password using the SHA algorithm and treating the resulting 160-bit hash as the key. The same 160-bit key is hashed again using the MD5 algorithm and the resulting 16-byte hash is the one that ends up in the Cryptex header. It looks as if the only reason for its existence is so that Cryptex can verify that the typed password matches the one that was used when the archive was created.
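Taken at face value, the password check described above could be approximated as follows. This sketch rests on several assumptions that the analysis does not confirm: that the 160-bit SHA hash is SHA-1, that the MD5 input is exactly the raw 20-byte digest, and that the password is hashed as the bytes supplied by the caller. It does not model how the Crypto API turns the hash into an actual triple-DES key, and both function names are hypothetical.

```python
import hashlib


def header_password_hash(password: bytes) -> bytes:
    """Return the 16-byte value described as stored in the header."""
    sha_digest = hashlib.sha1(password).digest()  # 160-bit hash of the password
    return hashlib.md5(sha_digest).digest()       # hashed again with MD5


def password_matches(password: bytes, stored: bytes) -> bool:
    """Compare a candidate password against the header's 16-byte hash."""
    return header_password_hash(password) == stored
```

If the assumptions hold, a tool could reject a wrong password before attempting any decryption, which appears to be the purpose of the stored hash.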
You have learned that Cryptex archives are divided into fixed-sized clusters. Some clusters contain file list information while others contain actual file data. Information inside Cryptex archives is always managed on a cluster level; there are apparently no bigger or smaller chunks that are supported in the file format. All clusters are encrypted using the triple-DES algorithm with the key derived from the SHA hash; this applies to both file list clusters and actual file data clusters. The actual size of a single cluster is 4,104 bytes, yet the actual content is only 4,092 bytes. The first 4 bytes in a cluster generally contain the index of the next cluster (yet there are several exceptions), so that explains the 4,096 bytes. We have not been able to determine the reason for those extra 8 bytes that make up a cluster.
The next interesting element in the Cryptex archive is the file list data structure. A file list is made up of one or more clusters, and each cluster contains 26 file entries. Figure 6.3 illustrates what is known about a single file entry.
Cryptex File Header Structure

Offset +00    Signature1
Offset +04    Signature2
Offset +08    Unknown
Offset +0C    First File-List Cluster
Offset +10    Unknown
Offset +14    Unknown
Offset +18    Password Hash
Offset +1C    Password Hash (continued)
Offset +20    Password Hash (continued)
Offset +24    Password Hash (continued)

Figure 6.2 The Cryptex header.
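For a program that views or dumps an archive, the header layout in Figure 6.2 can be expressed as a small parser. The following is a sketch under stated assumptions: the fields are little-endian DWORDs, the 16-byte password hash occupies offsets +18 through +27, and the signature bytes appear in the file as the literal string CrYpTeX9. The class and field names are invented for illustration.

```python
import struct
from typing import NamedTuple

# 8-byte signature, four DWORDs, 16-byte hash: 40 bytes in total.
HEADER = struct.Struct("<8sIIII16s")


class CryptexHeader(NamedTuple):
    signature: bytes               # +00, Signature1 and Signature2
    unknown_08: int                # +08
    first_file_list_cluster: int   # +0C
    unknown_10: int                # +10
    unknown_14: int                # +14
    password_hash: bytes           # +18, 16 bytes


def parse_header(data: bytes) -> CryptexHeader:
    """Unpack the first 40 bytes of an archive and check the signature."""
    header = CryptexHeader(*HEADER.unpack_from(data, 0))
    if header.signature != b"CrYpTeX9":
        raise ValueError("not a Cryptex archive")
    return header
```

The unknown fields are carried along unchanged so that nothing is lost if their meaning is worked out later.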