Eilam E. Reversing: Secrets of Reverse Engineering. 2005
Deciphering File Formats 201
Using Cryptex
Before actually starting to reverse Cryptex, let’s play with it a little bit so you can learn how it works. In general, it is important to develop a good understanding of a program and its user interface before attempting to reverse it. In a commercial product, you would be reading the user manual at this point.
Cryptex is a console-mode application, which means that it doesn’t have any GUI; it is operated using command-line options, and it provides feedback through a console window. In order to properly launch Cryptex, you’ll need to open a Command Prompt window and run Cryptex.exe within it. The best way to start is by simply running Cryptex.exe without any command-line options. Cryptex displays a welcome screen that also includes its “user’s manual”, a quick reference for the supported commands and how they can be used. Listing 6.1 shows the Cryptex welcome and help screen.
Cryptex 1.0 - Written by Eldad Eilam
Usage: Cryptex <Command> <Archive-Name> <Password> [FileName]
Supported Commands:
‘a’, ‘e’: Encrypts a file. Archive will be created if it doesn’t
          already exist.
‘x’, ‘o’: Decrypts a file. File will be decrypted into the current
          directory.
‘l’     : Lists all files in the specified archive.
‘d’, ‘r’: Deletes the specified file from the archive.
Password is an unlimited-length string that can contain any combination of letters, numbers, and symbols. For maximum security it is recommended that the password be made as long as possible and that it be made up of a random sequence of many different characters, digits, and symbols. Passwords are case-sensitive. An archive’s password is established while it is created. It cannot be changed afterwards and must be specified whenever that particular archive is accessed.
Examples:
Encrypting a file: “Cryptex a MyArchive s8Uj~ c:\mydox\myfile.doc”
Encrypting multiple files: “Cryptex a MyArchive s8Uj~ c:\mydox\*.doc”
Decrypting a file: “Cryptex x MyArchive s8Uj~ file.doc”
Listing the contents of an archive: “Cryptex l MyArchive s8Uj~”
Deleting a file from an archive: “Cryptex d MyArchive s8Uj~ myfile.doc”
Listing 6.1 Cryptex.exe’s welcome screen.
202 Chapter 6
Cryptex is quite straightforward to use, with only four supported commands. Files are encrypted using a user-supplied password, and the program supports deleting files from the archive and extracting files from it. It is also possible to add multiple files with one command using wildcards such as *.doc.
There are several reasons that could justify deciphering the file format of a program such as Cryptex. First of all, it is the only way to evaluate the level of security offered by the product. Let’s say that an organization wants to use such a product for archiving and transmitting critical information. Should they rely on the author’s guarantees regarding the product’s security level? Perhaps the author has installed some kind of a back door that would allow him or her to easily decrypt any file created by the program? Perhaps the program is poorly written and employs some kind of a home-made, trivial encryption algorithm. Perhaps (and this is more common than you would think) the program incorrectly uses a strong, industry-standard encryption algorithm in a way that compromises the security of the encrypted files.
File formats are also frequently reversed for compatibility and interoperability purposes. For instance, consider the (very likely) possibility that Cryptex became popular to the point where other software vendors would be interested in adding Cryptex-compatibility to their programs. Unless the .crx Cryptex file format was published, the only way to accomplish this would be by reversing the file format. Finally, it is important to keep in mind that the data reverse-engineering journey we’re about to embark on is not specifically tied to file formats; the process could be easily applied to networking protocols.
Reversing Cryptex
How does one begin to reverse a file format? In most cases, the answer is to create simple, tiny files that contain known, easy-to-spot values. In the case of Cryptex, this boils down to creating one or more small archives that contain a single file with easily recognizable contents.
This approach is very helpful, but it is not always going to be feasible. For example, with some file formats you might only have access to code that reads from the file, but not to the code that generates files using that format. This would greatly increase the complexity of the reversing process, because it would limit our options. In such cases, you would usually need to spend significant amounts of time studying the code that reads your file format. In most cases, a thorough analysis of such code would provide most of the answers.
Luckily, in this particular case Cryptex lets you create as many archives as you please, so you can freely experiment. The best idea at this point would be to take a simple text file containing something like a long sequence of a single character such as “*****************************” and to encode it
into an archive file. Additionally, I would recommend trying out some long and repetitive password, to try and see if, God forbid, the password is somehow stored in the file. It also makes sense to quickly scan the file for the original name of the encrypted file, to see if Cryptex encrypts the actual file table, or just the actual file contents. Let’s start out by creating a tiny file called asterisks.txt, and fill it with a long sequence of asterisks (I created a file about 1K long). Then proceed to creating a Cryptex archive that contains the asterisks.txt file. Let’s use the string 6666666666 as the password.
Cryptex a Test1 6666666666 asterisks.txt
Cryptex provides the following feedback.
Cryptex 1.0 - Written by Eldad Eilam
Archive “Test1.crx” does not exist. Creating a new archive.
Adding file “asterisks.txt” to archive “Test1”.
Encrypting “asterisks.txt” - 100.00 percent completed.
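The experiment above is easy to script. The following is a minimal sketch, not part of Cryptex: the helper names `make_known_plaintext` and `find_markers` are hypothetical, and the file names are the ones used in this walkthrough. The first helper writes a file made of a single repeated character; the second reports where any known string (the password, the file name, a run of asterisks) shows up in an archive, if it shows up at all.

```python
from pathlib import Path


def make_known_plaintext(path, size=1024, fill=b"*"):
    """Write `size` copies of one easy-to-spot byte to `path`."""
    Path(path).write_bytes(fill * size)


def find_markers(data, markers):
    """Return {marker: [offsets]} for each marker that occurs in `data`."""
    hits = {}
    for marker in markers:
        offsets = []
        pos = data.find(marker)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(marker, pos + 1)
        if offsets:
            hits[marker] = offsets
    return hits


# Typical use (assumes Test1.crx has been produced by the command above):
#   make_known_plaintext("asterisks.txt")
#   data = Path("Test1.crx").read_bytes()
#   print(find_markers(data, [b"6666666666", b"asterisks.txt", b"*" * 16]))
```

An empty result suggests that none of the known strings are stored in the clear. It does not prove that they are absent in some transformed form.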
Interestingly, if you check the file size for Test1.crx, it is far larger than expected, at 8,248 bytes! It looks as if Cryptex archives have quite a bit of overhead—you’ll soon see why that is. Before actually starting to look inside the file, let’s ask Cryptex to show its contents, just to see how Cryptex views it. You can do this using the L command in Cryptex, which lists the files contained in the given archive. Note that Cryptex requires the archive’s password on every command, including the list command.
Cryptex l Test1 6666666666
Cryptex produces the following output.
Cryptex 1.0 - Written by Eldad Eilam
Listing all files in archive “Test1”.
File Size    File Name
3K           asterisks.txt
Total files listed: 1 Total size: 3K
There aren’t a whole lot of surprises in this output, but there’s one somewhat interesting point: the asterisks.txt file was originally 1K and is shown here as being 3K long. Why has the file expanded by 2K? Let’s worry about that later. For now, let’s try one more thing: it is going to be interesting to see how Cryptex responds when an incorrect password is supplied and whether it always requires a password, even for a mere file listing. Run Cryptex with the following command line:
Cryptex l Test1 6666666665
Unsurprisingly, Cryptex provides the following response:
Cryptex 1.0 - Written by Eldad Eilam
Listing all files in archive “Test1”.
ERROR: Invalid password. Unable to process file.
So, Cryptex actually confirms the password before providing the list of files. This might seem like a futile exercise, considering that the documentation explicitly said that the password is always required. However, the exact text of the invalid-password message is useful because you can later look for the code that displays it in the program and try to determine how it establishes whether or not the password is correct.
For now, let’s start looking inside the Cryptex archive files. For this purpose any hex dump tool would do just fine—there are quite a few free products online, but if you’re willing to invest a little money in it, Hex Workshop is one of the more powerful data-reversing tools. Here are the first 64 bytes of the Test1.crx file just produced.
00000000  4372 5970 5465 5839 0100 0000 0100 0000  CrYpTeX9........
00000010  0000 0000 0200 0000 5F60 43BC 26F0 F7CA  ........_`C.&...
00000020  6816 0D2B 99E7 FA61 BEB1 DA78 C0F6 4D89  h..+...a...x..M.
00000030  7CC7 82E8 01F5 3CB9 549D 2EC9 868F 1FFD  |.....<.T.......
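If no hex editor is at hand, a dump in the same layout (offset, two-byte hex groups, printable ASCII) can be produced with a few lines of code. This is a minimal sketch; the function name `hexdump` is hypothetical and the layout simply imitates the dump shown here.

```python
def hexdump(data, width=16):
    """Format bytes as offset, two-byte hex groups, and printable ASCII."""
    # Width of the hex column: e.g. 8 groups of 4 digits plus 7 spaces.
    hex_width = (width // 2) * 5 - 1
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        groups = " ".join(
            chunk[i:i + 2].hex().upper() for i in range(0, len(chunk), 2)
        )
        # Non-printable bytes are shown as dots, as hex editors do.
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{offset:08X}  {groups:<{hex_width}}  {text}")
    return "\n".join(lines)


# Typical use (assumes Test1.crx exists in the current directory):
#   with open("Test1.crx", "rb") as f:
#       print(hexdump(f.read(64)))
```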
Like most file formats, .crx files start out with a signature, CrYpTeX9 in this case, followed by what looks like several data fields, and continuing into an apparently random byte sequence starting at address 0x18. If you look at the rest of the file, it all contains similarly unreadable junk. This indicates that the entire contents of the file have been encrypted, including the file table. As expected, none of the key strings such as the password, the asterisks.txt file name, or the actual asterisks can be found within this file. As further evidence that the file has been encrypted, we can use the Character Distribution feature in Hex Workshop to get an overview of the data within the file. Interestingly, we discover that the file contains seemingly random data, with an almost equal character distribution of about 0.4 percent for each of the 256 characters. It looks like the encryption algorithm applied by Cryptex has completely eliminated any obvious resemblance between the encrypted data and the password, file name, or file contents.
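The character-distribution check does not require Hex Workshop. As a rough sketch (the function names `byte_distribution` and `shannon_entropy` are hypothetical), the share of each byte value can be computed directly; for uniformly random data every value should take about 1/256 of the file, which is the roughly 0.4 percent figure mentioned above, and the entropy should approach 8 bits per byte.

```python
import math
from collections import Counter


def byte_distribution(data):
    """Return 256 fractions: the share of `data` taken by each byte value."""
    if not data:
        raise ValueError("cannot compute a distribution for empty data")
    counts = Counter(data)
    total = len(data)
    return [counts.get(value, 0) / total for value in range(256)]


def shannon_entropy(data):
    """Entropy in bits per byte; 8.0 is the maximum (uniform distribution)."""
    return -sum(p * math.log2(p) for p in byte_distribution(data) if p > 0)


# Typical use (assumes Test1.crx exists in the current directory):
#   data = open("Test1.crx", "rb").read()
#   print(max(byte_distribution(data)), shannon_entropy(data))
```

A plaintext file such as asterisks.txt sits at the other extreme: a single byte value takes the whole distribution and the entropy is zero.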
At this point, it becomes clear that you’re going to have to dig into the program in order to truly decipher the .crx file format. This is exactly where conventional code reversing and data reversing come together: you must look inside the program in order to see how it manages its data. Granted, this program is an extreme example because the data is encrypted, but even with programs that don’t intentionally hide the contents of their file formats, it is often very difficult to decipher a file format by merely observing the data.
The first step you must take in order to get an overview of Cryptex and how it works is to obtain a list of its imported functions. This can be done using any executable dumping tool such as those discussed in Chapter 4; I often choose Microsoft’s DUMPBIN, which is a command-line tool. The import list is important because it will provide us with an overview of how Cryptex does some of the things that it does. For example, how does it read and write to the archive files? Does it use a section object, does it call into some kind of runtime library file I/O functions, or does it directly call into the Win32 file I/O APIs?
Establishing which system (and other) services the program utilizes is critical because in order to track Cryptex’s I/O accesses (which is what you’re going to have to do in order to find the logic that generates and deciphers .crx files) you’re going to have to place breakpoints on these function calls. Listing 6.2 provides (abridged) DUMPBIN output that lists imports from Cryptex.exe.
KERNEL32.dll
  138 GetCurrentDirectoryA
   D3 FindNextFileA
  1B1 GetStdHandle
  15C GetFileSizeEx
  12F GetConsoleScreenBufferInfo
  2E5 SetConsoleCursorPosition
   2E CloseHandle
   4D CreateFileA
  303 SetEndOfFile
  394 WriteFile
  2A9 ReadFile
  169 GetLastError
   C9 FindFirstFileA
  30E SetFilePointer
  13B GetCurrentProcessId
  13E GetCurrentThreadId
  1C0 GetSystemTimeAsFileTime
  1D5 GetTickCount
  297 QueryPerformanceCounter
  177 GetModuleHandleA
   AF ExitProcess
ADVAPI32.dll
   8C CryptDestroyKey
   A0 CryptReleaseContext
   8A CryptDeriveKey
   88 CryptCreateHash
   9D CryptHashData
   99 CryptGetHashParam
   8B CryptDestroyHash
   8F CryptEncrypt
   89 CryptDecrypt
   85 CryptAcquireContextA
MSVCR71.dll
   CA _c_exit
   FA _exit
   4B _XcptFilter
   CD _cexit
   7C __p___initenv
   C2 _amsg_exit
   6E __getmainargs
  13F _initterm
   9F __setusermatherr
   BB _adjust_fdiv
   82 __p__commode
   87 __p__fmode
   9C __set_app_type
   6B __dllonexit
  1B8 _onexit
   DB _controlfp
   F1 _except_handler3
   9B __security_error_handler
  300 sprintf
  305 strchr
  2EC printf
  297 exit
  30F strncpy
  1FE _stricmp
Listing 6.2 A list of all functions called from Cryptex.EXE, produced using DUMPBIN.
Let’s go through each of the modules in Listing 6.2 and examine what it’s revealing about how Cryptex works. Keep in mind that not all of these entries are directly called by Cryptex. Most programs statically link with other libraries (such as runtime libraries), which make their own calls into the operating system or into other DLLs.
The entries in KERNEL32.dll are highly informative because they’re telling us that Cryptex apparently uses direct calls into Win32 File I/O APIs such as CreateFile, ReadFile, WriteFile, and so on. The following section in Listing 6.2 is also informative and lists functions called from the ADVAPI32.dll module. A quick glance at the function names reveals a very important detail about Cryptex: It uses the Windows Crypto API (this is easy to spot with function names such as CryptEncrypt and CryptDecrypt).
The Windows Crypto API is a generic cryptographic library that provides support for installable cryptographic service providers (CSPs) and can be used for encrypting and decrypting data using a variety of cryptographic algorithms. Microsoft provides several CSPs that are built into Windows and support a wide range of symmetric and asymmetric cryptographic algorithms such as DES, RSA, and AES. The fact that Cryptex uses the Crypto API can be seen as good news, because it means that it is going to be quite trivial to determine which encryption algorithms the program employs and how it produces the encryption keys. This would have been more difficult if Cryptex used its own built-in implementation of the encryption algorithm, because you would have to reverse it to determine exactly which algorithm it is and whether it is properly implemented.
The next entry in Listing 6.2 is MSVCR71.DLL, which is the Visual C++ runtime library DLL. In this list, you can see the list of runtime library functions called by Cryptex. This doesn’t really tell you much, except for the presence of the printf function, which is used for printing messages to the console window. The printf function is what you’d look at if you wanted to catch moments where Cryptex is printing certain messages to the console window.
The Password Verification Process
One basic step that is relatively simple and is likely to reveal much about how Cryptex goes about its business is to find out how it knows whether or not the user has typed the correct password. This will also be a good indicator of whether or not Cryptex is secure (depending on whether the password or some version of it is actually stored in the archive).
Catching the “Bad Password” Message
The easiest way to go about checking Cryptex’s password verification process is to create an archive (Test1.crx from earlier in this chapter would do just fine), and to start Cryptex in a debugger, feeding it with an incorrect password. You would then try to catch the place in the code where Cryptex notifies the user that a bad password has been supplied. This is easy to accomplish because you know from Listing 6.2 that Cryptex uses the printf runtime library function. It is very likely that you’ll be able to catch a printf call that contains the “bad password” message, and trace back from that call to see how Cryptex made the decision to print that message.
Start by loading the program in any debugger, preferably a user-mode one such as WinDbg or OllyDbg (I personally picked OllyDbg), and placing a breakpoint on the printf function from MSVCR71.DLL. Notice that unlike the previous reversing session where you relied exclusively on dead listing,
this time you have a real program to work with, so you can easily perform this reversing session from within a debugger.
Before actually launching the program you must also set the launch parameters so that Cryptex knows which archive you’re trying to open. Keep in mind that you must type an incorrect password, so that Cryptex generates its incorrect password message. As for which command to have Cryptex perform, it would probably be best to just have Cryptex list the files in the archive, so that nothing is actually written into the archive (though Cryptex is unlikely to change anything when supplied with a bad password anyway). I personally used Cryptex l test1 6666666665, and placed a breakpoint on printf from the MSVCR71.DLL (using the Executable Modules window in OllyDbg and then listing its exports in the Names window).
Upon starting the program, three calls to printf were caught. The first contained the Cryptex 1.0 . . . message, the second contained the Listing all files . . . message, and the third contained what you were looking for: the ERROR: Invalid password . . . string. From here, all you must do is jump back to the caller and hopefully locate the logic that decides whether to accept or reject the password that was passed in. Once you hit that third printf, you can use Ctrl+F9 in Olly to go to the RET instruction that will take you directly into the function that made the call to printf. This function is given in Listing 6.3.
004011C0  PUSH ECX
004011C1  PUSH ESI
004011C2  MOV ESI,SS:[ESP+C]
004011C6  PUSH 0                                  ; Origin = FILE_BEGIN
004011C8  PUSH 0                                  ; pOffsetHi = NULL
004011CA  PUSH 0                                  ; OffsetLo = 0
004011CC  PUSH ESI                                ; hFile
004011CD  CALL DS:[<&KERNEL32.SetFilePointer>]
004011D3  PUSH 0                                  ; pOverlapped = NULL
004011D5  LEA EAX,SS:[ESP+8]
004011D9  PUSH EAX                                ; pBytesRead
004011DA  PUSH 28                                 ; BytesToRead = 28 (40.)
004011DC  PUSH cryptex.00406058                   ; Buffer = cryptex.00406058
004011E1  PUSH ESI                                ; hFile
004011E2  CALL DS:[<&KERNEL32.ReadFile>]          ; ReadFile
004011E8  TEST EAX,EAX
004011EA  JNZ SHORT cryptex.004011EF
004011EC  POP ESI
004011ED  POP ECX
004011EE  RETN
004011EF  CMP DWORD PTR DS:[406058],70597243
004011F9  JNZ SHORT cryptex.0040123C
004011FB  CMP DWORD PTR DS:[40605C],39586554
00401205  JNZ SHORT cryptex.0040123C
00401207  PUSH EDI
00401208  MOV ECX,4
0040120D  MOV EDI,cryptex.00405038
00401212  MOV ESI,cryptex.00406070
00401217  XOR EDX,EDX
00401219  REPE CMPS DWORD PTR ES:[EDI],DWORD PTR DS:[ESI]
0040121B  POP EDI
0040121C  JE SHORT cryptex.00401234
0040121E  PUSH cryptex.00403170                   ; format = “ERROR: Invalid password. Unable to process file.”
00401223  CALL DS:[<&MSVCR71.printf>]             ; printf
00401229  ADD ESP,4
0040122C  PUSH 1                                  ; status = 1
0040122E  CALL DS:[<&MSVCR71.exit>]               ; exit
00401234  MOV EAX,1
00401239  POP ESI
0040123A  POP ECX
0040123B  RETN
0040123C  PUSH cryptex.0040313C                   ; format = “ERROR: Invalid Cryptex9 signature in file header!”
00401241  CALL DS:[<&MSVCR71.printf>]             ; printf
00401247  ADD ESP,4
0040124A  PUSH 1                                  ; status = 1
0040124C  CALL DS:[<&MSVCR71.exit>]               ; exit
Listing 6.3 Cryptex’s header-verification function that reads the Cryptex archive header and checks the supplied password.
It looks as if the function in Listing 6.3 performs some kind of header verification on the archive. It starts out by moving the file pointer to zero (using the SetFilePointer API), and proceeds to read the first 0x28 bytes from the archive file using the ReadFile API. The header data is read into a data structure that is stored at 00406058. It is quite easy to see that this address is essentially a global variable of some sort (as opposed to a heap or stack address), because it is very close to the code address itself. A quick look at the Executable Modules window shows us that the program’s executable, Cryptex.exe was loaded into 00400000. This indicates that 00406058 is somewhere within the Cryptex.exe module, probably in the data section (you could verify this by checking the module’s data section RVA using an executable dumping tool, but it is quite obvious).
The function proceeds to compare the first two DWORDs in the header with the hard-coded values 70597243 and 39586554. If the first two DWORDs don’t match these constants, the function jumps to 0040123C and displays the message ERROR: Invalid Cryptex9 signature in file header!. A
quick check shows that 70597243 is the hexadecimal value for the characters CrYp, and 39586554 for the characters TeX9. Cryptex is simply verifying the header and printing an error message if there is a mismatch.
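The "quick check" is worth spelling out, because the characters appear reversed when the constant is read as a number. x86 stores DWORDs in little-endian order, so the least significant byte of 70597243 (0x43, the letter C) is the first byte in the file. A small sketch of the conversion follows; the names `SIG_LOW`, `SIG_HIGH`, and `dword_as_file_bytes` are made up for this illustration.

```python
import struct

SIG_LOW = 0x70597243   # constant compared against the DWORD at 00406058
SIG_HIGH = 0x39586554  # constant compared against the DWORD at 0040605C


def dword_as_file_bytes(value):
    """Return the four bytes a little-endian machine stores for `value`."""
    return struct.pack("<I", value)


signature = dword_as_file_bytes(SIG_LOW) + dword_as_file_bytes(SIG_HIGH)
print(signature)  # b'CrYpTeX9'
```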
The following code sequence is the one you’re after (because it decides whether the function returns 1 or prints out the bad password message). This sequence compares two 16-byte sequences in memory and prints the error message if there is a mismatch. The first sequence starts at 00405038 and is another global variable whose contents are unknown at this point. The second data sequence starts at 00406070, which is a part of the header global variable you looked at before, that starts at 00406058. This is apparent because earlier ReadFile was reading 0x28 bytes into this address—00406070 is only 0x18 bytes past the beginning, so there are still 0x10 (or 16 in decimal) bytes left in this buffer.
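The offset arithmetic can be checked against the hex dump of Test1.crx shown earlier. The sketch below slices the first 0x28 bytes of that dump the way the code appears to treat them. Only the signature and the trailing 16-byte block are established at this point; the four DWORDs in between are simply labeled unknown, and the names `split_header`, `CHECK_OFFSET`, and `CHECK_SIZE` are hypothetical.

```python
import struct

HEADER_SIZE = 0x28                        # bytes read by ReadFile
CHECK_OFFSET = 0x00406070 - 0x00406058    # 0x18
CHECK_SIZE = HEADER_SIZE - CHECK_OFFSET   # 0x10, i.e. 16 bytes

# First 0x28 bytes of Test1.crx, copied from the hex dump.
HEADER = bytes.fromhex(
    "4372597054655839" "0100000001000000"
    "0000000002000000" "5F6043BC26F0F7CA"
    "68160D2B99E7FA61"
)


def split_header(header):
    """Split a header into (signature, unknown DWORDs, 16-byte check block)."""
    if len(header) < HEADER_SIZE:
        raise ValueError("header is shorter than 0x28 bytes")
    signature = header[:8]
    # Meaning not yet known: four little-endian DWORDs at offsets 0x08-0x17.
    unknown = struct.unpack_from("<4I", header, 8)
    check = header[CHECK_OFFSET:HEADER_SIZE]
    return signature, unknown, check


signature, unknown, check = split_header(HEADER)
print(signature, unknown, check.hex().upper())
```

The 16-byte block begins with 5F 60 43 BC, which is exactly where the readable fields end and the apparently random data begins in the dump.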
The actual comparison is performed using the REPE CMPS instruction, which repeatedly compares a pair of DWORDs, one pointed at by EDI and the other by ESI, and increments both index registers after each iteration. The number of iterations depends on the value of ECX, and in this case is set to 4, which means that the instruction will compare four DWORDs (16 bytes) and will jump to 00401234 if the buffers are identical. If the buffers are not identical execution will flow into 0040121E, which is where we wound up.
The obvious question at this point is what are those buffers that Cryptex is comparing? Are they the actual passwords? A quick look in OllyDbg reveals the contents of both buffers. The following is the contents of the global variable at 00405038, against which the archive’s header buffer is compared:
00405038 1F 79 A0 18 0B 91 0D AC A2 0B 09 7B 8D B4 CF 0E
The buffer that originated in the archive’s header contains the following:
00406070 5F 60 43 BC 26 F0 F7 CA 68 16 0D 2B 99 E7 FA 61
The two are obviously different, and are also clearly not the plaintext passwords. It looks like Cryptex is storing some kind of altered version of the password inside the file and is comparing that with what must be an altered version of the currently typed password (which must have been altered with the exact same algorithm in order for this to work). The interesting questions are how are passwords transformed, and is that transformation secure—would it be somehow possible to reconstruct the password using only that altered version? If so, you could extract the password from the archive header.
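To make the comparison concrete, here is a rough emulation of what REPE CMPS does with these two buffers. It is a sketch, not a full model of the instruction: it assumes the direction flag is clear (so the pointers move forward) and tracks only the zero flag and the remaining count. The function name `repe_cmpsd` is hypothetical.

```python
import struct

# Contents of the two buffers as observed in the debugger.
AT_00405038 = bytes.fromhex("1F79A0180B910DACA20B097B8DB4CF0E")
AT_00406070 = bytes.fromhex("5F6043BC26F0F7CA68160D2B99E7FA61")


def repe_cmpsd(edi_buf, esi_buf, ecx):
    """Compare up to `ecx` DWORD pairs, stopping at the first mismatch.

    Returns (zero_flag, remaining_ecx). The zero flag is True when the
    last pair compared was equal, which is what JE tests afterwards.
    """
    # XOR EDX,EDX just before the instruction leaves ZF set, and REPE
    # does not touch the flags when ECX is already zero.
    zero_flag = True
    offset = 0
    while ecx:
        (a,) = struct.unpack_from("<I", edi_buf, offset)
        (b,) = struct.unpack_from("<I", esi_buf, offset)
        offset += 4
        ecx -= 1
        zero_flag = a == b
        if not zero_flag:
            break
    return zero_flag, ecx


print(repe_cmpsd(AT_00405038, AT_00406070, 4))  # (False, 3)
```

With the wrong password the very first DWORD pair differs, so the instruction stops with three iterations left and the JE at 0040121C is not taken.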
The Password Transformation Algorithm
The easiest way to locate the algorithm that transforms the plaintext password into this 16-byte sequence is to place a memory breakpoint on the global variable
