
Eilam E.Reversing.Secrets of reverse engineering.2005


 

 

Deciphering File Formats 231

 

 

 

 

00401D59  ADD ESP,14
00401D5C  TEST EDI,EDI
00401D5E  JE cryptex.00401E39
00401D64  MOV ESI,DS:[<&KERNEL32.GetConsoleScreenBufferInfo>]
00401D6A  LEA EBX,DS:[EBX]
00401D70  MOV EDX,DS:[40504C]
00401D76  LEA ECX,SS:[ESP+2C]
00401D7A  PUSH ECX
00401D7B  PUSH EDX
00401D7C  CALL ESI
00401D7E  FLD DWORD PTR SS:[ESP+10]
00401D82  SUB ESP,8
00401D85  FSTP QWORD PTR SS:[ESP]
00401D88  PUSH cryptex.00403320            ; ASCII “%2.2f percent completed.”
00401D8D  CALL EBP
00401D8F  ADD ESP,0C
00401D92  CMP EDI,1
00401D95  MOV EAX,0FFC
00401D9A  JA SHORT cryptex.00401DA1
00401D9C  MOV EAX,DS:[405050]
00401DA1  PUSH 0
00401DA3  PUSH EAX
00401DA4  MOV EAX,SS:[ESP+24]
00401DA8  PUSH cryptex.00405054
00401DAD  PUSH EAX
00401DAE  CALL DS:[<&ADVAPI32.CryptHashData>]
00401DB4  TEST EAX,EAX
00401DB6  JE cryptex.00401EEE
00401DBC  CMP EDI,1
00401DBF  MOV EAX,0FFC
00401DC4  JA SHORT cryptex.00401DCB
00401DC6  MOV EAX,DS:[405050]
00401DCB  MOV EDX,SS:[ESP+14]
00401DCF  PUSH 0                           ; /pOverlapped = NULL
00401DD1  LEA ECX,SS:[ESP+2C]              ; |
00401DD5  PUSH ECX                         ; |pBytesWritten
00401DD6  PUSH EAX                         ; |nBytesToWrite
00401DD7  PUSH cryptex.00405054            ; |Buffer = cryptex.00405054
00401DDC  PUSH EDX                         ; |hFile
00401DDD  CALL DS:[<&KERNEL32.WriteFile>]
00401DE3  SUB EDI,1
00401DE6  JE SHORT cryptex.00401E00
00401DE8  MOV EAX,SS:[ESP+8C]
00401DEF  MOV ECX,DS:[405050]
00401DF5  PUSH EAX
00401DF6  PUSH ECX
00401DF7  PUSH EBX

Listing 6.8 (continued)

232 Chapter 6

00401DF8  CALL cryptex.00401030
00401DFD  ADD ESP,0C
00401E00  MOV EAX,DS:[40504C]
00401E05  LEA EDX,SS:[ESP+44]
00401E09  PUSH EDX
00401E0A  PUSH EAX
00401E0B  CALL ESI
00401E0D  MOV ECX,SS:[ESP+30]
00401E11  MOV EDX,DS:[40504C]
00401E17  PUSH ECX                         ; /CursorPos
00401E18  PUSH EDX                         ; |hConsole => 00000007
00401E19  CALL DS:[<&KERNEL32.SetConsoleCursorPosition>]
00401E1F  TEST EDI,EDI
00401E21  MOVSS XMM0,SS:[ESP+10]
00401E27  ADDSS XMM0,SS:[ESP+20]
00401E2D  MOVSS SS:[ESP+10],XMM0
00401E33  JNZ cryptex.00401D70
00401E39  FLD QWORD PTR DS:[403B98]
00401E3F  SUB ESP,8
00401E42  FSTP QWORD PTR SS:[ESP]
00401E45  PUSH cryptex.00403368            ; ASCII “%2.2f percent completed.”
00401E4A  CALL EBP
00401E4C  PUSH cryptex.00403384
00401E51  CALL EBP
00401E53  XOR EAX,EAX
00401E55  MOV SS:[ESP+6D],EAX
00401E59  MOV SS:[ESP+71],EAX
00401E5D  MOV SS:[ESP+75],EAX
00401E61  MOV SS:[ESP+79],AX
00401E66  ADD ESP,10
00401E69  LEA ECX,SS:[ESP+24]
00401E6D  LEA EDX,SS:[ESP+5C]
00401E71  MOV SS:[ESP+6B],AL
00401E75  MOV BYTE PTR SS:[ESP+5C],0
00401E7A  MOV DWORD PTR SS:[ESP+24],10
00401E82  PUSH EAX
00401E83  MOV EAX,SS:[ESP+20]
00401E87  PUSH ECX
00401E88  PUSH EDX
00401E89  PUSH 2
00401E8B  PUSH EAX
00401E8C  CALL DS:[<&ADVAPI32.CryptGetHashParam>]
00401E92  TEST EAX,EAX
00401E94  JNZ SHORT cryptex.00401EA0
00401E96  PUSH cryptex.00403388            ; ASCII “Unable to obtain MD5 hash value for file.”


00401E9B  CALL EBP
00401E9D  ADD ESP,4
00401EA0  MOV ECX,4
00401EA5  LEA EDI,SS:[ESP+6C]
00401EA9  LEA ESI,SS:[ESP+5C]
00401EAD  XOR EDX,EDX
00401EAF  REPE CMPS DWORD PTR ES:[EDI],DWORD PTR DS:[ESI]
00401EB1  JE SHORT cryptex.00401EC2
00401EB3  MOV EAX,SS:[ESP+18]
00401EB7  PUSH EAX
00401EB8  PUSH cryptex.004033B4            ; ASCII “ERROR: File “%s” is corrupted!”
00401EBD  CALL EBP
00401EBF  ADD ESP,8
00401EC2  MOV ECX,SS:[ESP+1C]
00401EC6  PUSH ECX
00401EC7  CALL DS:[<&ADVAPI32.CryptDestroyHash>]
00401ECD  MOV EDX,SS:[ESP+14]
00401ED1  MOV ESI,DS:[<&KERNEL32.CloseHandle>]
00401ED7  PUSH EDX                         ; /hObject
00401ED8  CALL ESI                         ; \CloseHandle
00401EDA  PUSH EBX                         ; /hObject
00401EDB  CALL ESI                         ; \CloseHandle
00401EDD  MOV ECX,SS:[ESP+7C]
00401EE1  POP ESI
00401EE2  POP EBP
00401EE3  POP EDI
00401EE4  POP EBX
00401EE5  CALL cryptex.004027C9
00401EEA  ADD ESP,70
00401EED  RETN

Listing 6.8 (continued)

Let’s begin with a quick summary of the most important operations performed by the function in Listing 6.8. The function starts by opening the archive file. This is done by calling a function at 00401670, which opens the archive and proceeds to call into the header and password verification function at 004011C0, which you analyzed in Listing 6.3. After 00401670 returns, the function proceeds to create a hash object of the same type you saw earlier that was used for calculating the password hash. This time the algorithm type is 0x8003, which is CALG_MD5 (the hash algorithm class combined with ALG_SID_MD5). The purpose of this hash object is still unclear.
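The algorithm identifier 0x8003 can be taken apart with a few lines of Python. This is only a sketch: the three constants are copied by hand from the Windows SDK header wincrypt.h and are not read from Cryptex itself.

```python
# Constants reproduced from the Windows SDK header wincrypt.h.
ALG_CLASS_HASH = 4 << 13   # 0x8000, the "hash algorithm" class bits
ALG_TYPE_ANY = 0           # no specific algorithm type
ALG_SID_MD5 = 3            # sub-identifier for MD5

# An ALG_ID is the bitwise OR of class, type, and sub-identifier.
CALG_MD5 = ALG_CLASS_HASH | ALG_TYPE_ANY | ALG_SID_MD5

print(hex(CALG_MD5))
```

Seen this way, the low bits of the value pushed before the hash-creation call identify MD5, and the 0x8000 bit marks it as a hash algorithm.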

The code then proceeds to read the Cryptex header into the same global variable at 00406058 that you encountered earlier, and to search the file list for the relevant file entry.


Scanning the File List

The scanning of the file list is performed by calling a function at 004017B0, which goes through a familiar route of scanning the file list and comparing each name with the name of the file being extracted. Once the correct item is found the function retrieves several fields from the file entry. The following is the code that is executed in the file searching routine once a file entry is found.

00401881  MOV ECX,SS:[ESP+10]
00401885  LEA EAX,DS:[ESI+ESI*4]
00401888  ADD EAX,EAX
0040188A  ADD EAX,EAX
0040188C  SUB EAX,ESI
0040188E  MOV EDX,DS:[ECX+EAX*8+8]
00401892  LEA EAX,DS:[ECX+EAX*8]
00401895  MOV ECX,SS:[ESP+24]
00401899  MOV DS:[ECX],EDX
0040189B  MOV ECX,SS:[ESP+28]
0040189F  TEST ECX,ECX
004018A1  JE SHORT cryptex.004018BC
004018A3  LEA EDX,DS:[EAX+C]
004018A6  MOV ESI,DS:[EDX]
004018A8  MOV DS:[ECX],ESI
004018AA  MOV ESI,DS:[EDX+4]
004018AD  MOV DS:[ECX+4],ESI
004018B0  MOV ESI,DS:[EDX+8]
004018B3  MOV DS:[ECX+8],ESI
004018B6  MOV EDX,DS:[EDX+C]
004018B9  MOV DS:[ECX+C],EDX
004018BC  MOV EAX,DS:[EAX+4]

First of all, let’s inspect what is obviously an optimized arithmetic sequence of some sort in the beginning of this sequence. It can be slightly confusing because of the use of the LEA instruction, but LEA doesn’t have to deal with addresses. The LEA at 00401885 is essentially multiplying ESI by 5 and storing the result in EAX. If you go back to the beginning of this function, it is easy to see that ESI is essentially employed as a counter; it is initialized to zero and then incremented by one with each item that is traversed. However, once all file entries in the current cluster are scanned (remember there are 0x1A entries), ESI is set to zero again. This implies that ESI is used as the index into the current file entry in the current cluster.

Let’s return to the arithmetic sequence and try to figure out what it is doing. You’ve already established that the first LEA is multiplying ESI by 5. This is followed by two ADDs, each of which adds EAX to itself and so doubles it; together they multiply the intermediate value by 4. The bottom line is that ESI is being multiplied by 20, and its original value is then subtracted from the result. This is equivalent to multiplying ESI by 19. Lovely, isn’t it? The next line at 0040188E actually uses the outcome of this computation (which is now in EAX) as an index, but not before it multiplies it by 8. This line essentially takes ESI, which was an index to the current file entry, and multiplies it by 19 * 8 = 152. Sounds familiar, doesn’t it? You’re right: 152 is the file entry length. By computing [ECX+EAX*8+8], Cryptex is obtaining the value of offset +8 at the current file entry.
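If you want to convince yourself that the sequence really multiplies by 152, you can replay it step by step. The following sketch mirrors the instructions from 00401885 through 0040188E in Python; the function name entry_offset is made up for illustration.

```python
def entry_offset(index):
    """Replay the compiler's multiply-by-152 sequence for one entry index."""
    eax = index + index * 4   # LEA EAX,[ESI+ESI*4]   -> index * 5
    eax += eax                # ADD EAX,EAX           -> index * 10
    eax += eax                # ADD EAX,EAX           -> index * 20
    eax -= index              # SUB EAX,ESI           -> index * 19
    return eax * 8            # the *8 scale in [ECX+EAX*8+8] -> index * 152

# A cluster holds 0x1A entries, so check every valid index.
for i in range(0x1A):
    assert entry_offset(i) == i * 152
```

The byte offset of a field inside entry number i is then entry_offset(i) plus the field offset, which is exactly what the addressing mode at 0040188E computes.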

We already know that offset +8 contains the file size in clusters, and this value is being sent back to the caller using a parameter that was passed in to receive this value. Cryptex needs the file size in order to extract the file. After loading the file size, Cryptex checks for what is apparently another output parameter that is supposed to receive additional output data from this function, this time at [ESP+28]. If it is nonzero, Cryptex copies the value from offset +C at the file entry into the pointer that was passed and proceeds to copy offset +10 into offset +4 in the pointer that was passed, and so on, until a total of four DWORDs, or 16 bytes are copied. As a reminder, those 16 bytes are the ones that looked like junk when you dumped the file list earlier. Before returning to the caller, the function loads offset +4 at the current file entry and sets that into EAX—it is returning it to the caller.

To summarize, this sequence scans the file list looking for a specific file name, and once that entry is found it returns three individual items to the caller: the file size in clusters, an unknown, seemingly random 16-byte sequence, and another unknown DWORD from offset +4 in the file entry. Let’s proceed to see how this data is used by the file extraction routine.
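The three values can be pulled out of a decrypted file entry with a few lines of Python. This is a minimal sketch, assuming you already hold one 152-byte entry as a bytes object; the function name and the dictionary keys are invented here, and the remaining fields of the entry (including the file name) are deliberately left alone because their offsets are not discussed in this section.

```python
import struct

ENTRY_SIZE = 152  # 19 * 8, as derived from the index arithmetic

def read_entry_fields(entry):
    """Return the three values that 004017B0 hands back to its caller."""
    if len(entry) != ENTRY_SIZE:
        raise ValueError("expected a 152-byte file entry")
    # x86 stores DWORDs little-endian, hence the "<" in the format string.
    dword_04, size_in_clusters = struct.unpack_from("<II", entry, 4)
    return {
        "dword_at_04": dword_04,              # returned in EAX
        "size_in_clusters": size_in_clusters, # offset +8
        "bytes_at_0c": entry[0x0C:0x1C],      # the 16 "junk" bytes
    }
```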

Decrypting the File

After returning from 004017B0, Cryptex proceeds to scan the supplied file name for backslashes and loops until the last backslash is encountered. The actual scanning is performed using the C runtime library function strchr, which simply returns the address of the first instance of the character, if one is found. The address that points to the last backslash is stored in [ESP+20]; this is essentially the “clean” version of the file name without any path information. One instruction that draws attention in this otherwise trivial sequence is the one at 00401C9E.
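The backslash loop is easy to picture in a higher-level language. The sketch below uses str.find in the role of strchr; it is an illustration of the idea rather than a translation of Cryptex’s code, and it assumes the “clean” name begins at the character after the last backslash.

```python
def strip_path(path):
    """Return the part of path that follows the last backslash, if any."""
    name_start = 0
    pos = path.find("\\")              # first backslash, like strchr
    while pos != -1:
        name_start = pos + 1           # remember where the name would begin
        pos = path.find("\\", name_start)  # look for another backslash
    return path[name_start:]
```

Calling strip_path on a name that has no backslash simply returns the name unchanged, because the loop body never runs.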

00401C9E MOV EDI,EDI

You might recall that we’ve already seen a similar instruction in the previous chapter. In that case, it was used as an infrastructure to allow people to trap system APIs in Windows. That is not relevant here, so why would the compiler insert an instruction that does nothing into the middle of a function? The answer is simple. The address in which this instruction begins is unaligned, which means that it doesn’t start on a 32-bit boundary. Executing unaligned instructions (or accessing unaligned memory addresses in general) takes longer for 32-bit processors. By placing this instruction before the loop starts, the compiler ensured that the loop won’t begin on an unaligned instruction. Also, notice that again the compiler could have used NOPs, but instead used this instruction, which does nothing yet accurately fills the 2-byte gap that was present.
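The alignment claim is simple address arithmetic. This sketch assumes the loop begins immediately after the 2-byte filler; the variable names are chosen here for illustration.

```python
FILLER_ADDRESS = 0x00401C9E   # address of MOV EDI,EDI
FILLER_LENGTH = 2             # the 2-byte gap the instruction fills

loop_start = FILLER_ADDRESS + FILLER_LENGTH

# A 32-bit boundary means the address is a multiple of 4 bytes.
print(hex(FILLER_ADDRESS), FILLER_ADDRESS % 4)
print(hex(loop_start), loop_start % 4)
```

The filler itself sits 2 bytes past a 4-byte boundary, while the address that follows it is a multiple of 4.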

After obtaining a backslash-free version of the file name, the function goes on to create the new file that will contain the extracted data. After creating the file, the function checks that 004017B0 actually found a file by testing EBP, which is where the function’s return value was stored. If it is zero, Cryptex displays a file not found error message and quits. If EBP is nonzero, Cryptex calls the familiar 00401030, which reads and decrypts a cluster, while using EBP (the return value from 004017B0) as the second parameter, which is treated as the cluster number to read and decrypt.

So, you now know that 004017B0 returns a cluster index, but you’re not sure what this cluster index is. It doesn’t take much guesswork to figure out that this is the cluster index of the file you’re trying to extract, or at least the first cluster for the file you’re trying to extract (most files are probably going to occupy more than one cluster). If you go back to our discussion of the file lookup function, you see that its return value came from offset +4 in the file entry (see instruction at 004018BC). The bottom line is that you now know that offset +4 in the file entry contains the index of the first data cluster.

If you look in the debugger, you will see that the third parameter is a pointer into which the data was decrypted, and that after the function returns this buffer contains the lovely asterisks! It is important to note that the asterisks are preceded by a 4-byte value: 0000046E. A quick conversion reveals that this number equals 1134, which is the exact file size of the original asterisks.txt file you encrypted earlier.

The Floating-Point Sequence

If you go back to the extraction sequence from Listing 6.8, you will find that after reading the first cluster you run into a code sequence that contains some highly unusual instructions. Even though these instructions are not particularly important to the extraction process (in fact, they are probably the least important part of the sequence), you should still take a close look at them just to make sure that you can properly decipher this type of code. Here is the sequence I am referring to:

00401D28  FILD DWORD PTR SS:[ESP+2C]
00401D2C  JGE SHORT cryptex.00401D34
00401D2E  FADD DWORD PTR DS:[403BA0]
00401D34  FDIVR QWORD PTR DS:[403B98]
00401D3A  MOV EAX,SS:[ESP+24]
00401D3E  XORPS XMM0,XMM0
00401D41  MOV EBP,DS:[<&MSVCR71.printf>]
00401D47  PUSH EAX
00401D48  PUSH cryptex.00403308            ; ASCII “Extracting “%.35s” - “
00401D4D  MOVSS SS:[ESP+24],XMM0
00401D53  FSTP DWORD PTR SS:[ESP+34]
00401D57  CALL EBP

This sequence looks unusual because it contains quite a few instructions that you haven’t encountered before. What are those instructions? A quick trip to the Intel IA-32 Instruction Set Reference document [Intel2], [Intel3] reveals that most of these instructions are floating-point arithmetic instructions. The sequence starts with an FILD instruction that simply loads a regular 32-bit integer from [ESP+2C] (which is where the file’s total cluster count is stored), converts it into an 80-bit double extended-precision floating-point number, and stores it in a special floating-point stack. The floating-point stack is a set of floating-point registers that store the values that are currently in use by the processor. It can be seen as a simple group of registers where the CPU manages their allocation.

The next floating-point instruction is an FADD, which is only executed if [ESP+2C] is a negative number. This FADD adds a constant floating-point number stored at 00403BA0 to the value currently stored at the top of the floating-point stack. Notice that unlike the FILD instruction, which loads an integer into the floating-point stack, this FADD uses a floating-point number in memory, so simply dumping the value at 00403BA0 as a 32-bit number shows its value as 4F800000. This raw value is irrelevant, since you must view this number as a 32-bit floating-point number, which is what FADD expects as an operand. When you instruct OllyDbg to treat this data as a 32-bit floating-point number, you come up with 4.294967e+09.

This number might seem like pure nonsense, but it’s not. A trained eye immediately recognizes that it is conspicuously similar to the value of 2^32: 4,294,967,296. It is in fact not similar, but identical to 2^32. The idea here is quite simple. Apparently FILD always treats the integers as signed, but the original program declared an unsigned integer that was to be converted into a floating-point form. To force the CPU to treat these values as unsigned, the compiler generated code that adds 2^32 to the variable if it has its most significant bit set. This would convert the signed negative number in the floating-point stack to the correct positive value that it should have been assigned in the first place.
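Both halves of this trick can be reproduced outside the debugger. The sketch below first reinterprets the raw DWORD 4F800000 as a single-precision float, and then emulates the signed load followed by the conditional correction. The helper name unsigned_to_float is invented for illustration, and Python’s double-precision floats stand in for the 80-bit values the FPU uses.

```python
import struct

# Reinterpret the raw DWORD 0x4F800000 as an IEEE 754 single-precision float.
(correction,) = struct.unpack("<f", struct.pack("<I", 0x4F800000))

def unsigned_to_float(raw):
    """Emulate FILD (a signed load) followed by the conditional FADD."""
    # FILD sees the 32-bit pattern as a signed integer.
    signed = raw - (1 << 32) if raw & 0x80000000 else raw
    value = float(signed)
    if signed < 0:              # the JGE skips the FADD for non-negative values
        value += correction     # add 2^32 to undo the sign misreading
    return value

print(correction)
print(unsigned_to_float(0xFFFFFFFF))
```

For any value with the top bit clear the function returns the number unchanged, which matches the JGE that jumps over the FADD.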

After correcting the loaded number, Cryptex uses the FDIVR instruction to divide a constant from 00403B98 by the number from the top of the floating-point stack. This time the number is a 64-bit floating-point number (according to the Intel documentation), so you can ask OllyDbg to dump data starting at 00403B98 as 64-bit floating point. Olly displays 100.0000000000000, which means that Cryptex is dividing 100.0 by the total number of clusters.


The next instruction loads the file name address from [ESP+24] to EAX and proceeds to another unusual instruction called XORPS, which takes an unusual operand called XMM0. This is part of a completely separate instruction set extension called SSE (later expanded by SSE2) that is supported by most currently available implementations of IA-32 processors. The SSE instruction sets contain Single Instruction Multiple Data (SIMD) instructions that can operate on several groups of operands at the same time. This can create significant performance boosts for computationally intensive programs such as multimedia and content creation applications. XMM0 is the first of 8 special 128-bit registers, named XMM0 through XMM7. These registers can only be accessed using SSE instructions, and their contents are usually made up of several smaller operands. In this particular case, the XORPS instruction XORs the entire contents of its first operand register with its second. Because both operands are XMM0, XORPS is XORing a value with itself, which sets XMM0 to zero.

The FSTP instruction that comes next stores the value from the top of the floating-point stack into [ESP+34]. As you can see from the DWORD PTR that precedes the address, the instruction treats the memory address as a 32-bit location, and will convert the value to a 32-bit floating-point representation. As a reminder, the value currently stored at the top of the floating-point stack is the result of the earlier division operation.

The Decryption Loop

At this point, we enter into what is clearly a loop that continuously reads and decrypts additional clusters using 00401030, hashes that data using CryptHashData, and writes the block to the file that was opened earlier using the WriteFile API.

At this point, you can also easily see what all of this floating-point business was about. With each cluster that is decrypted Cryptex is printing an accurate floating-point number that shows the percentage of the file that has been written so far. By dividing 100.0 by the total number of clusters earlier, Cryptex simply determined a step size by which it will increment the current completed percentage after each written cluster.
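The progress logic can be modeled in a few lines. This is a sketch that assumes at least one cluster; the helper names are hypothetical, and to_float32 imitates the rounding that happens when the values are kept as 32-bit floats (FSTP DWORD PTR and the MOVSS/ADDSS pair).

```python
import struct

def to_float32(x):
    """Round a Python float to single precision, as a 32-bit store would."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def progress_messages(total_clusters):
    """Return the progress line printed before each cluster is processed."""
    step = to_float32(100.0 / total_clusters)   # FDIVR, then FSTP DWORD PTR
    done = 0.0                                  # XORPS XMM0,XMM0 then MOVSS
    messages = []
    for _ in range(total_clusters):
        messages.append("%2.2f percent completed." % done)
        done = to_float32(done + step)          # ADDSS at the end of the loop
    return messages

for line in progress_messages(4):
    print(line)
```

Note that the loop never prints the final value itself. In the listing, the 100.0 constant at 00403B98 is loaded directly for the last message once the loop has finished.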

One thing that is interesting is how Cryptex knows which cluster to read next. Because Cryptex supports deleting files from archives, files are not guaranteed to be stored sequentially within the archive. Because of this, Cryptex always reads the next cluster index from 00405050 and passes that to 00401030 when reading the next cluster. 00405050 is the beginning of the currently active cluster buffer. This indicates that, just like in the file list, the first DWORD in a cluster contains the next cluster index in the current chain. One interesting aspect of this design is revealed in the following lines.


00401DBC  CMP EDI,1
00401DBF  MOV EAX,0FFC
00401DC4  JA SHORT cryptex.00401DCB
00401DC6  MOV EAX,DS:[405050]
00401DCB  ...

At any given moment during this loop EDI contains the number of clusters left to go. When there is more than one cluster to go (EDI > 1), the number of bytes to be read (stored in EAX) is hard-coded to 0xFFC (4092 bytes), which is probably just the maximum number of bytes in a cluster. When Cryptex writes the last cluster in the file, it takes the number of bytes to write from the first DWORD in the cluster—the very same spot where the next cluster index is usually stored. Get it? Because Cryptex knows that this is the last cluster, the location where the next cluster index is stored is unused, so Cryptex uses that location to store the actual number of bytes that were stored in the last cluster. This is how Cryptex works around the problem of not directly storing the actual file size but merely storing the number of clusters it uses.
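Putting the chain and the last-cluster rule together, the extraction loop can be sketched as follows. The read_cluster parameter is a hypothetical stand-in for the function at 00401030: it is assumed to take a cluster index and return the decrypted 4-byte leading DWORD followed by the 4,092 payload bytes. Hashing and progress output are left out to keep the sketch short.

```python
import struct

PAYLOAD_SIZE = 0xFFC  # 4092 bytes of file data in every full cluster

def extract_file(read_cluster, first_cluster, cluster_count):
    """Reassemble a file by following its chain of decrypted clusters."""
    out = bytearray()
    index = first_cluster
    # 'remaining' plays the role of EDI: the number of clusters left to go.
    for remaining in range(cluster_count, 0, -1):
        cluster = read_cluster(index)
        (first_dword,) = struct.unpack_from("<I", cluster, 0)
        if remaining > 1:
            out += cluster[4:4 + PAYLOAD_SIZE]  # full cluster
            index = first_dword                 # DWORD is the next cluster index
        else:
            out += cluster[4:4 + first_dword]   # DWORD is the final byte count
    return bytes(out)
```

With a single-cluster file such as asterisks.txt, the loop runs once and the leading DWORD (0000046E) is used directly as the number of bytes to keep.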

Verifying the Hash Value

After the final cluster is decrypted and written into the extracted file, Cryptex calls CryptGetHashParam to recover the MD5 hash value that was calculated out of the entire decrypted data. This is compared against the 16-byte sequence that was returned from 004017B0 (recall that these 16 bytes were retrieved from the file’s entry in the file table). If there’s a mismatch, Cryptex prints an error message saying the file is corrupted. Clearly the MD5 hash is used here as a conventional checksum; for every file that is encrypted an MD5 hash is calculated, and Cryptex verifies that the data hasn’t been tampered with inside the archive.
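The same check is straightforward to express with Python’s hashlib. This is a sketch of the comparison only, assuming you already have the extracted bytes and the 16 bytes taken from offset +C of the file entry; the function name is made up here.

```python
import hashlib

def is_intact(extracted_data, stored_md5):
    """Compare the MD5 of the extracted bytes with the stored 16-byte value."""
    return hashlib.md5(extracted_data).digest() == stored_md5

# Example with a well-known MD5 test vector.
reference = bytes.fromhex("900150983cd24fb0d6963f7d28e17f72")
print(is_intact(b"abc", reference))
```

In the listing the comparison is done four DWORDs at a time by REPE CMPS; comparing the two 16-byte strings for equality has the same effect.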

The Big Picture

At this point, we have developed a fairly solid understanding of the .crx file format. This section provides a brief overview of all the information gathered in this reversing session. You have deciphered the meaning of most of the .crx fields, at least the ones that matter if you were to write a program that views or dumps an archive. Figure 6.2 illustrates what you know about the Cryptex header.

The Cryptex header comprises a standard 8-byte signature that contains the string CrYpTeX9. The header contains a 16-byte MD5 checksum that is used for confirming the user-supplied password. Cryptex archives are encrypted using a Crypto-API implementation of the triple-DES algorithm. The triple-DES key is generated by hashing the user-supplied password using the SHA algorithm and treating the resulting 160-bit hash as the key. The same 160-bit key is hashed again using the MD5 algorithm, and the resulting 16-byte hash is the one that ends up in the Cryptex header. It looks as if the only reason for its existence is so that Cryptex can verify that the typed password matches the one that was used when the archive was created.
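The hashing chain behind the header checksum can be sketched with hashlib. Several things here are assumptions rather than findings: that the 160-bit SHA algorithm is SHA-1, that the password is hashed as raw bytes in whatever encoding the user typed it, and that the MD5 is computed directly over the SHA output. The sketch also does not reproduce how the Crypto API turns the hash into an actual triple-DES key.

```python
import hashlib

def password_check_value(password_bytes):
    """Return the 16-byte value expected in the header for a given password."""
    sha_hash = hashlib.sha1(password_bytes).digest()   # 160-bit password hash
    return hashlib.md5(sha_hash).digest()              # 16-byte header checksum

def password_matches(password_bytes, header_hash):
    """Mimic the header verification: recompute and compare."""
    return password_check_value(password_bytes) == header_hash
```

A viewer built this way could reject a wrong password before attempting to decrypt a single cluster, which appears to be the purpose of the field.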

You have learned that Cryptex archives are divided into fixed-sized clusters. Some clusters contain file list information while others contain actual file data. Information inside Cryptex archives is always managed on a cluster level; there are apparently no bigger or smaller chunks that are supported in the file format. All clusters are encrypted using the triple-DES algorithm with the key derived from the SHA hash; this applies to both file list clusters and actual file data clusters. The actual size of a single cluster is 4,104 bytes, yet the actual content is only 4,092 bytes. The first 4 bytes in a cluster generally contain the index of the next cluster (yet there are several exceptions), so that explains the 4,096 bytes. We have not been able to determine the reason for those extra 8 bytes that make up a cluster.

The next interesting element in the Cryptex archive is the file list data structure. A file list is made up of one or more clusters, and each cluster contains 26 file entries. Figure 6.3 illustrates what is known about a single file entry.

Cryptex File Header Structure

Offset +00           Signature1 ()
Offset +04           Signature2 ()
Offset +08           Unknown
Offset +0C           First File-List Cluster
Offset +10           Unknown
Offset +14           Unknown
Offset +18 to +24    Password Hash (four DWORDs)

Figure 6.2 The Cryptex header.
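If you were to write the archive viewer mentioned earlier, the header in Figure 6.2 maps naturally onto a struct format string. The following is a minimal sketch, assuming the header sits unencrypted at the very start of the file and that all DWORDs are little-endian; the names given to the unknown fields are placeholders.

```python
import struct

HEADER_FORMAT = "<8sIIII16s"                   # layout taken from Figure 6.2
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 0x28 bytes

def parse_header(raw):
    """Decode the header fields that have been identified so far."""
    (signature, unknown_08, first_file_list_cluster,
     unknown_10, unknown_14, password_hash) = struct.unpack_from(
        HEADER_FORMAT, raw, 0)
    if signature != b"CrYpTeX9":
        raise ValueError("missing CrYpTeX9 signature")
    return {
        "first_file_list_cluster": first_file_list_cluster,  # offset +0C
        "password_hash": password_hash,                      # offsets +18..+27
        "unknown": (unknown_08, unknown_10, unknown_14),      # not yet understood
    }
```

The two signature DWORDs are read together as one 8-byte string here, which is how the text describes them.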