 
        
        Assembly Language Step by Step 1992
would make the individual .ASM files smaller and much easier to understand.
The other reason is to avoid assembling completed portions of the program every time any part of the program is assembled. One thing you'll be doing is writing assembly language procedures, small detours from the main run of steps and tests that can be taken from anywhere within the assembly language program. Once you write and perfect a procedure, you can tuck it away in an .ASM file with other completed procedures, assemble it, and then simply link the resulting .OBJ file into the "working" .ASM file. The alternative is to waste time by reassembling perfected source code over and over again every time you assemble the main portion of the program.
Notice that in the upper-right corner of Figure 3.5 is a row of .OBJ files. These .OBJ files were assembled earlier from correct .ASM files, yielding binary disk files containing ready-to-go machine instructions. When the linker links the .OBJ file produced from your in-progress .ASM file, it adds in the previously assembled .OBJ files, which are called modules. The single .EXE file that the linker writes to disk contains the machine instructions from all of the .OBJ files handed to the linker when the linker is invoked.
Once the in-progress .ASM file is completed and made correct, its .OBJ module can be put up on the rack with the others, and added to the next in-progress .ASM source code file. Little by little you construct your application program out of the modules you build one at a time.
A very important bonus is that some of the procedures in an .OBJ module may be used in a future assembly language program that hasn't even been begun yet. Creating such libraries of toolkit procedures can be an extraordinarily effective way to save time by reusing code over and over again, without even passing it through the assembler again!
Something to keep in mind is that the linker must be used even when you have only one .OBJ file. Connecting multiple modules is only one of many essential things the linker does. To produce an .EXE file, you must invoke the linker, even if your program is a little thing contained in only one .ASM and hence one .OBJ file.
Invoking the linker is again done from the DOS command line. Each assembler typically has its own linker. MASM's linker is called LINK, and TASM's is called TLINK. Like the assembler, the linker understands a suite of commands and directives that I can't describe exhaustively here. Read your assembler manuals carefully.
For single-module programs, however, there's nothing complex to be done. Linking our hypothetical FOO.OBJ object file into an .EXE file using TLINK is done by entering
C:\ASM>TLINK FOO
at the DOS prompt.
If you're using MASM, using LINK is done much the same way. Again, as with MASM, you need to place a semicolon at the end of the command to avoid a series of questions about various linker defaults:
C:\ASM>LINK FOO;
Linking multiple files involves naming each file on the command line. With TLINK, you simply name each .OBJ file on the command line after the word TLINK, with a space between each filename. You do not have to include the .OBJ extension—TLINK assumes that all modules to be linked end in .OBJ:
C:\ASM>TLINK FOO BAR BAS
Under MASM, you do the same thing, except that you place a plus sign (+) between each of the .OBJ filenames:
C:\ASM>LINK FOO+BAR+BAS
In both cases, the name of the .EXE file produced will be the name of the first .OBJ file named, with the .EXE extension replacing the .OBJ extension.
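The two command-line conventions just described can be summarized in a short Python sketch. This is only an illustration of how the command lines are formed; the function names are hypothetical, and nothing here actually invokes TLINK or LINK.

```python
def tlink_command(modules):
    """Build a TLINK command line: module names separated by spaces."""
    return "TLINK " + " ".join(modules)


def link_command(modules):
    """Build a LINK command line: module names joined by plus signs."""
    return "LINK " + "+".join(modules)


def exe_name(modules):
    """The .EXE file takes its name from the first module listed."""
    return modules[0] + ".EXE"


modules = ["FOO", "BAR", "BAS"]
print(tlink_command(modules))
print(link_command(modules))
print(exe_name(modules))
```

The .OBJ extension is left off each name, as in the command lines shown above.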
Linker Errors
As with the assembler, the linker may discover problems as it weaves multiple .OBJ files together into a single .EXE file. Linker errors are subtler than assembler errors and are usually harder to find. Fortunately, they are rarer and not as easy to make.
As with assembler errors, when you are presented with a linker error you have to return to the editor and figure out what the problem is. Once you've identified the problem (or think you have) and changed something in the source code file to fix the problem, you must reassemble and relink the program to see if the linker error went away. Until it does, you have to loop back to the editor, try something else, and assemble/link once more.
If possible, avoid doing this by trial and error. Read your assembler and linker manuals. Understand what you're doing. The more you understand about what's going on within the assembler and the linker, the easier it will be to determine who or what is giving the linker fits.
Testing the .EXE File
If you receive no linker errors, the linker will create and fill a single .EXE file with the machine instructions present in all of the .OBJ files named on the linker command line. The .EXE file is your executable program. You can run it by simply naming it on the DOS command line and pressing Enter:
C:\ASM>FOO
When you invoke your program in this way, one of two things will happen: the program will work as you intended it to, or you'll be confronted with the effects of one or more program bugs. A bug is anything in a program that doesn't work the way you want it to. This makes a bug somewhat more subjective than an error. One person might think red characters displayed on a blue background is a bug, while another might consider it a clever New Age feature and be quite pleased. Settling bug vs. feature conflicts like this is up to you. Consensus is called for here, with fistfights only as a last resort.
There are bugs and there are bugs. When working in assembly language, it's quite common for a bug to completely "blow the machine away," which is less violent than some think. A system crash is what you call it when the machine sits there mutely, and will not respond to the keyboard. You may have to press Ctrl+Alt+Delete to reboot the system, or (worse) have to press the reset button, or even power down and then power up again. Be ready for this; it will happen to you, sooner and oftener than you will care for.
Figure 3.5 announces the exit of the assembly language development process as happening when your program works perfectly. A very serious question is this: How do you know when it works perfectly? Simple programs assembled while learning the language may be easy enough to test in a minute or two. But any program that accomplishes anything useful will take hours of testing at minimum. A serious and ambitious application could take weeks, or months, to test thoroughly. A program that takes various kinds of input values and produces various kinds of output should be tested with as many different combinations of input values as possible, and you should examine every possible output every time.
Even so, finding every last bug is considered by some to be an impossible ideal. Perhaps—but you should strive to come as close as possible, in as efficient a fashion as you can manage. I'll have a lot more to say about bugs and debugging throughout the rest of this book.
Errors Versus Bugs
In the interest of keeping the Babel-effect at bay, I think it's important to carefully draw the distinction between errors and bugs. An error is something wrong with your source code file that either the assembler or the linker kicks out as unacceptable. An error prevents the assembly or link process from going to completion, and will thus prevent a final .EXE file from being produced.
A bug, by contrast, is a problem discovered during execution of a program under DOS. Bugs are not detected by either the assembler or the linker. A bug can be benign, such as a misspelled word in a screen message or a line positioned on the wrong screen row; or a bug can make your DOS session run off into the bushes and not come back.
Both errors and bugs require that you go back to the text editor and change something in your source code file. The difference here is that most errors are reported with a line number telling you where to go in your source code file to fix the problem. Bugs, on the other hand, are left as an exercise for the student. You have to hunt them down, and neither the assembler nor the linker will give you much in the line of clues.
Debuggers and Debugging
The final, and almost certainly the most painful, part of the assembly language development process is debugging. Debugging is simply the systematic process by which bugs are located and corrected. A debugger is a utility program designed specifically to help you locate and identify bugs.
Debugger programs are among the most mysterious and difficult to understand of all programs. Debuggers are part X-ray machine and part magnifying glass. A debugger loads into memory with your program and remains in memory, side by side with your program. The debugger then puts tendrils down into both DOS and into your program, and enables some truly peculiar things to be done.
One of the problems with debugging computer programs is that they operate so quickly. Thousands of machine instructions can be executed in a single second, and if one of those instructions isn't quite right, it's long gone before you can identify which one it is by staring at the screen. A debugger allows you to execute the machine instructions in a program one at a time, allowing you to pause indefinitely between each one to examine the effects of the last instruction on the screen. The debugger also lets you look at the contents of any location in memory, and the values stored in any register, during that pause between instructions.
As mentioned previously, both MASM and TASM are packaged with their own advanced debuggers. MASM's CodeView and TASM's Turbo Debugger are brutally powerful (and hellishly complicated) creatures that require manuals considerably thicker than this book. For this reason, I won't try to explain how to use either CodeView or Turbo Debugger.
Very fortunately, every copy of DOS is shipped with a more limited but perfectly good debugger called DEBUG. DEBUG can do nearly anything that a beginner would want from a debugger, and in this book we'll do our debugging with DEBUG.
3.5 DEBUG and How to Use It
The assembler and the linker are rather single-minded programs. As translators, they do only one thing: translate. This involves reading data from one file and writing a translation of that data into another file.
That's all a translator needs to do. The job isn't necessarily an easy thing for the translator to do, but it's easy to describe and understand. Debuggers, by contrast, are like the electrician's little bag of tools—they do lots of different things in a great many different ways, and take plenty of explanation and considerable practice to master.
In this chapter I'll introduce you to DEBUG, a program that will allow you to single step your assembly language programs and examine their innards, and the machine's, between each and every machine instruction. This section is only an introduction. DEBUG is learned best by doing, and you'll be both using and learning DEBUG's numerous powers all through the rest of this book. With an overview of what DEBUG does here, you'll be more capable of integrating its features into your general understanding of the assembly language development process as we examine it through the rest of the book.
DEBUG's Bag of Tricks
It's well worth taking a page or so simply to describe what sorts of things DEBUG can do before actually showing you how they're done. It's actually quite a list:
• Display or change memory and files. Your programs will both exist in and affect memory, and DEBUG can show you any part of memory—which implies that it can show you any part of any program or binary file as well. DEBUG displays memory as a series of hexadecimal values, with a corresponding display of any printable ASCII characters to the right of the values. We'll show you some examples a little later on. In addition to seeing the contents of memory, you can change those contents as well. And, if the contents of memory represent a file, you can write the changed file back out to disk.
• Display or change the contents of all CPU registers. CPU registers allow you to work very quickly, and you should use them as much as you can. You need to see what's going on in the registers while you use them, and with one command, DEBUG can display the contents of all machine registers and flags at one time. If you want to change the contents of a register while stepping through a program's machine instructions, you can do that as well.
• Fill a region of memory with a single value. If you have an area of memory that you want "blanked out," DEBUG will allow you to fill that area of memory with any character or binary value.
• Search memory for sequences of binary values. You can search any area of memory for a specific sequence of characters or binary value, including names stored in memory or sequences of machine instructions. You can then examine or change something that you know exists somewhere in memory but not where.
• Assemble new machine instructions into memory. DEBUG contains a simple assembler that does much of what MASM and TASM can do, one machine instruction at a time. If you want to replace a machine instruction somewhere within your program, you can type MOV AX,BX rather than having to look up and type 8BH 0C3H.
• "Un-assemble" binary machine instructions into their mnemonics and operands. The flipside of the last feature is also possible: DEBUG can take the two hexadecimal values 8BH and 0C3H and tell you that they represent the assembly language mnemonic MOV AX,BX. This feature is utterly essential when you need to trace a program in operation and understand what is happening when the next two bytes in memory are read into the CPU and executed. If you don't know what machine instruction those two bytes represent, you'll be totally lost.
• Single step a program under test. Finally, DEBUG's most valuable skill is to run a program one machine instruction at a time, pausing between each instruction. During this pause you can look at or change memory, look at or change registers, search for things in memory, "patch" the program by replacing existing machine instructions with new ones, and so on. This is what you'll do most of the time with DEBUG.
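To make the "un-assemble" idea from the list above concrete, here is a minimal Python sketch that decodes the two bytes 8BH and 0C3H back into MOV AX,BX. It handles only the register-to-register form of opcode 8BH and is not how DEBUG itself is implemented; the function and table names are hypothetical.

```python
# 16-bit register names, indexed by their 3-bit register code.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def unassemble_mov_reg_reg(opcode, modrm):
    """Decode the register-to-register form of opcode 8BH (MOV reg16,reg16)."""
    if opcode != 0x8B:
        raise ValueError("this sketch only handles opcode 8BH")
    mode = (modrm >> 6) & 0b11
    if mode != 0b11:
        raise ValueError("this sketch only handles register operands")
    dest = REG16[(modrm >> 3) & 0b111]  # middle three bits: destination
    src = REG16[modrm & 0b111]          # low three bits: source
    return f"MOV {dest},{src}"


print(unassemble_mov_reg_reg(0x8B, 0xC3))
```

The second byte packs three fields into eight bits: the top two bits say that both operands are registers, and the remaining two groups of three bits name the destination and source registers.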
Taking DEBUG for a Spin
DEBUG can be a pretty forbidding character, terse to the point of being mute. You'll be spending a lot of time standing on DEBUG's shoulders and looking around, however, so you'd best get used to him now.
The easiest way to start is to use DEBUG to load a file into memory and examine it. On the listings disk associated with this book is a file called SAM.TXT. It's an ordinary DOS text file. (Its contents were used to demonstrate the line structuring of text files with CR and LF in Figure 3.1.) If you don't have the listings disk, you can simply load your text editor and enter the following lines:
Sam
was
a
man.
Make sure you press Enter after the period at the end of "man." Then save the file to disk as SAM.TXT.
Let's lay SAM out on DEBUG's dissection table and take a look at his innards. DEBUG will load itself and the file of your choice into memory at the same time, with only one command. Type DEBUG followed by the name of the file you want to load, as in the following example:
C:\ASM>DEBUG SAM.TXT
Make sure you use the full filename. Some programs like MASM and TASM will allow you to use only the first part of the filename and assume a file extension like .ASM, but DEBUG requires the full filename.
Like old Cal Coolidge, DEBUG doesn't say much, and never more than he has to. Unless DEBUG can't find SAM.TXT, all it will respond with is a single dash character (-) as its prompt, indicating that all is well and that DEBUG is awaiting a command.
Looking at a Hex Dump
Looking at SAM.TXT's interior is easy. Just type a D at the dash prompt. (Think dump.) DEBUG will obediently display a hex dump of the first 128 bytes of memory containing the contents of SAM.TXT read from disk. The hexadecimal numbers will probably look bewilderingly mysterious, but to their right you'll see the comforting words "Sam was a man" in a separate area of the screen. To help a little, I've taken the hex dump of SAM.TXT as you'll see it on your screen and annotated it in Figure 3.6.
This is a hex dump. It has three parts: the leftmost part on the screen is the address of the start of each line of the dump. Each line contains 16 bytes. An address has two parts, and you'll notice that the left part of the address does not change while the right part is 16 greater at the start of each succeeding line. The 86-family CPU's two-part addresses are a source of considerable confusion and aggravation, and I'll take them up in detail in Chapter 5. For now, ignore the unchanging part of the address and consider the part that changes to be a count of the bytes on display, starting with 100H.
The part to the right of the address is the hexadecimal representation of the 128 bytes of memory being displayed. The part to the right of the hexadecimal values is those same 128 bytes of memory displayed as ASCII characters. Now, not all binary values have corresponding printable ASCII characters. Any invisible or unprintable characters are shown as period (.) characters.
This can be confusing. The last displayable character in SAM.TXT is a period, and is actually the very first character on the second line of the hex dump. The ASCII side shows four identical periods in a row. To find out what's a period and what's simply a nondisplayable character, you must look back to the hexadecimal side and recognize the ASCII code for a period, which is 2EH.
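The three-part layout can be imitated with a few lines of Python. This is a rough sketch, not DEBUG's exact screen format: it prints only the changing part of the address, and it assumes SAM.TXT holds one word per line with a CR/LF pair after each and an EOF marker at the end, which is what the 20-byte length implies. The names hex_dump and SAM are hypothetical.

```python
def hex_dump(data, start=0x100, width=16):
    """Return dump lines: address, hex values, then printable ASCII."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Anything outside the printable ASCII range is shown as a period.
        text = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{start + i:04X}  {hex_part:<{width * 3 - 1}}  {text}")
    return lines


# Assumed contents of SAM.TXT: four lines, CR/LF after each, then EOF (1AH).
SAM = b"Sam\r\nwas\r\na\r\nman.\r\n\x1a"

for line in hex_dump(SAM):
    print(line)
```

On the second line of the output the text column is four periods, but only the first hex value is 2E. The other three are 0D, 0A, and 1A, which have no printable form.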
Here is a good place to point out that an ASCII table of characters and their codes is an utterly essential thing to have. Borland's Sidekick product includes a very good table, and it's always waiting in memory only a keystroke away. If you don't have Sidekick, I'd advise you to take a photocopy of the ASCII table provided in Appendix B and keep it close at hand.
Memory "Garbage"
Take a long, close look at the hexadecimal equivalents of the characters in SAM.TXT. Notice that SAM.TXT is a very short file (20 bytes), but that 128 bytes are displayed. Look for the EOF (end of file) marker on the second line.
Character 1AH is always considered the last byte of any text file. All the other bytes after the EOF marker are called "garbage," and that's pretty much what they are: random bytes that existed in memory before SAM.TXT rode in from disk. DEBUG works only from memory, and displays hex dumps of memory in 128-byte chunks by default. (You can direct DEBUG to display more bytes at a time by using some additional commands, which I won't go into here.) Only the first 20 bytes of SAM.TXT are significant information, but DEBUG obligingly shows you what's in memory well beyond the end of SAM's data.
The bytes are probably not entirely random, but instead may be part of the code or data left over from the last program to be loaded and executed in that area of memory. Because the garbage bytes fall after the EOF marker, you can safely ignore them, but should know just what they are and why they appear in your hex dump. You might occasionally see recognizable data strings from other programs in memory garbage and wonder how they got into your current program.
They didn't get into your current program. They were just there, and now show through beyond the end of the file you last loaded under DEBUG. Knowing where legitimate information ends and where garbage begins is always important, and not usually as clearcut as it is here.
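Separating legitimate text from garbage amounts to finding the first 1AH byte. Here is a small Python sketch of that idea; the leftover bytes are made up for illustration, the file contents are assumed to be one word per line, and the function name is hypothetical. It applies only to text files, since a binary file may contain the value 1AH anywhere.

```python
EOF_MARKER = 0x1A


def split_at_eof(memory):
    """Return (bytes up to and including the EOF marker, trailing garbage)."""
    end = memory.find(bytes([EOF_MARKER]))
    if end == -1:
        # No marker found: treat everything as significant.
        return memory, b""
    return memory[:end + 1], memory[end + 1:]


# Assumed SAM.TXT contents followed by invented leftover bytes.
memory = b"Sam\r\nwas\r\na\r\nman.\r\n\x1a" + b"LEFTOVER BYTES"

significant, garbage = split_at_eof(memory)
print(len(significant))
print(garbage)
```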
Changing Memory with DEBUG
DEBUG can easily change bytes in memory, whether they are part of a file loaded from disk or not. The DEBUG command to change bytes is the E command. (Think enter new data.) You can use the E command to change some of the data in SAM.TXT. Part of this process is shown in Figure 3.6. Notice the following command line:
-e 010e
To taciturn Mr. Debug, this means, "Begin accepting entered bytes at address 010EH." I show the lowercase e's used in the command to put across the point that DEBUG is not case sensitive, even for letters used as hexadecimal digits. In other words, there is nothing sacred about using uppercase A through F for hex digits. They can be lowercase or uppercase as you choose, and you don't even have to be consistent about it.
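Python's own hexadecimal parsing treats case the same way, which makes for a quick check of the arithmetic. In this sketch the function name is hypothetical; it also shows how far address 010EH lies from the starting address of 100H.

```python
def parse_debug_address(text):
    """Parse a hexadecimal address typed in either case."""
    return int(text, 16)


print(parse_debug_address("010e") == parse_debug_address("010E"))
print(parse_debug_address("010e") - 0x100)
```

The second value printed is the distance in bytes from the start of the loaded file to the byte being changed.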
What DEBUG does in response to the E command shown in Figure 3.6 is display the following prompt:
38E3:010E 61.
The cursor waits after the period for your input. What DEBUG has done is show you what value is already at address 010EH, so that you can decide whether you want to change it. If not, just press Enter, and the dash prompt will return.
Otherwise, enter a hexadecimal value to take the place of value 61H. In Figure 3.6 I entered 6FH. Once you enter a replacement value, you have the choice of completing your change by pressing Enter and returning to the dash prompt, or changing the byte at the next address. If you want to change the next byte, press the spacebar instead of pressing Enter. DEBUG will display the byte at the next highest address and wait for your replacement value, just as it did the first time.
This is shown in Figure 3.6. In fact, Figure 3.6 shows four successive replacements of bytes starting at address 010EH. Notice the lonely hex byte 0A followed by a period. What happened there is that I pressed Enter without typing a replacement byte, ending the E command and returning to the dash prompt.
You'll also note that the next command typed at the dash prompt was "q", for Quit. Typing Q at the dash prompt will return you immediately to DOS.
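The effect of a single E replacement can be modeled in Python with a bytearray standing in for memory. This sketch covers only the first replacement mentioned in the text (61H becoming 6FH at address 010EH); it assumes the one-word-per-line contents of SAM.TXT, and the names enter_byte and LOAD_OFFSET are hypothetical.

```python
LOAD_OFFSET = 0x100  # the address at which the dump of the file begins


def enter_byte(image, address, new_value):
    """Replace one byte in the memory image and return the old value."""
    index = address - LOAD_OFFSET
    old = image[index]
    image[index] = new_value
    return old


# Assumed SAM.TXT contents, held in a mutable copy of memory.
image = bytearray(b"Sam\r\nwas\r\na\r\nman.\r\n\x1a")

old = enter_byte(image, 0x010E, 0x6F)
print(f"{old:02X}")
print(bytes(image[13:17]))
```

The old value corresponds to what DEBUG displays before the period in its prompt. After the change, the bytes that once spelled "man." spell "mon." instead.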
The Dangers of Modifying Files
Keep in mind that what I've just demonstrated was not changing a file, but simply changing the contents of a file loaded into memory. A file loaded into memory through DEBUG as we did with SAM.TXT is called a memory image of that file. Only the memory image of the file was changed. SAM.TXT remains on disk, unchanged and
