
Assembly Language Step by Step 1992
between a variable and a simple constant? A variable is located at one and only one particular place in memory. A simple constant, on the other hand, is a value "dropped into" the program anywhere it is used, and exists at no particular place in memory. Simple constants are used mostly in expressions calculated by the compiler during compilation.
It's the same thing here. The DW and DB directives define and set aside areas of memory for storage of data. A DW exists somewhere at some address, and only exists in one place. The EQU directive, by contrast, defines a symbol mostly for the assembler's use. It sets aside no memory and has no particular address. Consider this line from SHOWCHAR:
    LinesDown  EQU 4          ; Number of lines down to start ASCII table
The value defined as LinesDown exists at no single place in the SHOWCHAR program. It allocates no storage. It's actually a notation in the assembler's symbol table, telling the assembler to substitute the value 4 for the symbol LinesDown, anywhere it encounters the symbol LinesDown. The same is true of the equates for ScrnWidth and LineLen. When the assembler encounters equates in a program, it performs a simple textual substitution of the values assigned to the symbol defined in the equate. The symbol is dumped, and the value is dropped in. Then assembly continues, using the substituted values rather than the symbols. In a very real sense, the assembler is pausing to alter the source code when it processes an equate, then picks up its assembly task again. This is exactly what happens when the assembler processes a macro, by the way.
An example may help. Imagine that the assembler is assembling SHOWCHAR.ASM, when it encounters the following line:
    ADD    DI,ScrnWidth*LinesDown*2   ; Start table display down a ways
It looks up ScrnWidth and LinesDown in its symbol table, and discovers that they are equates. It then calls time out from assembling, and processes the two equates by substituting their values into the line of source code for their text symbols. The line of source code changes to the following:
    ADD    DI,80*4*2                  ; Start table display down a ways
Assembly-Time Calculations
But in assembling the line shown above, the assembler has to pull another trick out of its hat. It has to be able to deal with the expression 80*4*2. We've not seen this before in our discussions, but the assembler happily parses the expression and performs the math exactly as you would imagine: it cooks 80*4*2 down to the single integer value 640. It then performs another substitution on the line in question, which finally cooks down to this:
    ADD    DI,640                     ; Start table display down a ways
At last, the line becomes an utterly ordinary line of assembly-language code, which is turned to object code in a trice.
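The two steps just described, substitution followed by arithmetic, can be modeled in a few lines. What follows is a sketch in Python rather than assembly, offered only as an illustration of the idea. The names `equates`, `substitute`, `evaluate`, and `fold_operand` are hypothetical, and a real assembler's expression parser is considerably more involved than this.

```python
import ast
import operator
import re

# Hypothetical symbol table holding the equates discussed for SHOWCHAR.ASM.
equates = {"ScrnWidth": 80, "LinesDown": 4, "LineLen": 64}

# Only +, - and * are supported in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

def substitute(expr, table):
    """Step 1: replace each equate symbol in expr with its value, as text."""
    def lookup(match):
        name = match.group(0)
        if name not in table:
            raise KeyError(name + " is not an equate")
        return str(table[name])
    return re.sub(r"[A-Za-z_]\w*", lookup, expr)

def evaluate(expr):
    """Step 2: cook a constant integer expression down to one value."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError("unsupported expression")
    return walk(ast.parse(expr, mode="eval").body)

def fold_operand(expr, table=equates):
    """Substitute the equates, then do the assembly-time math."""
    return evaluate(substitute(expr, table))
```

Calling `substitute("ScrnWidth*LinesDown*2", equates)` gives the text `80*4*2`, and `fold_operand` on the same operand gives 640, the value that ends up in the assembled instruction.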
So the assembler can in fact do a little math on its own, quite apart from the arithmetic instructions supported by the CPU. This is called assembly-time math, and it has some very important limitations:
• Assembly-time calculations can only be done on values that are fixed and unambiguous at assembly-time. This most pointedly excludes the contents of variables. Equates are fine. DBs, DWs, and DDs are not. Variables are empty containers at assembly time; just buckets into which values will be thrown at runtime. You can't perform a calculation with an empty bucket!
• Assembly-time calculations are performed once, at assembly-time, and cannot be recalculated at runtime for a different set of values. This should be obvious, but it's easy enough to misconstrue the nature of assembly-time math while you're a beginner.
Let me point out an important consequence of the use of assembly-time math in SHOWCHAR. In SHOWCHAR, the ASCII table is displayed four lines down from the top of the screen, at the left margin. Now, what do we need to do to allow the ASCII table to be moved around the screen at runtime?
Oh, not much, just rewrite the whole thing.
I'm not trying to be funny. That's the price you pay for the convenience of assembly-time calculation. We baked the screen position of the ASCII table into the program at the source code level, and if we wanted to parameterize the position of the ASCII table we'd have to take a whole different approach, and do what we did with RULER.ASM: use the IMUL instruction to perform the multiplication that calculates the offset into the screen buffer, at runtime.
We can change the LinesDown equate in SHOWCHAR.ASM to have a value of 6 or 10—but we then have to reassemble and relink SHOWCHAR for the change to take effect. The calculation is done only once, at assembly time. Thereafter, as long as we use the resulting .EXE file, the ASCII table will be the number of lines down the screen that we defined in the LinesDown equate.
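To make the contrast concrete, here is a small Python model of the calculation a runtime-positioned table would have to repeat every time the position changed. It is only a sketch: `screen_offset` is a hypothetical name, and the 80-column width is taken from the ScrnWidth value used above.

```python
SCRN_WIDTH = 80   # characters per screen line, matching the ScrnWidth equate

def screen_offset(row, col=0):
    """Byte offset of (row, col) in the text buffer, computed at runtime.

    Each character cell takes two bytes (character plus attribute),
    hence the final multiplication by 2.
    """
    return (row * SCRN_WIDTH + col) * 2
```

With a row of 4 this gives the same 640 the assembler computed once and baked in; rows 6 and 10 give 960 and 1600. The difference is that a function like this can be handed a new row while the program runs, while the equate cannot.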
Assembly-time calculations may not seem as useful now, in the light of these restrictions. However, they serve a purpose that may not be immediately obvious: they make it a little easier for us to read the sense in our own source code. We could have just skipped the equates and the assembly-time math, done the math in our heads and written the line of code like this:
    ADD    DI,640                     ; Start table display down a ways
How obvious is it to you that adding 640 to DI starts the display of the table down the screen by four lines? Using equates and assembly-time math builds the screen-positioning algorithm into the source code, right there where it's used.
Equates and assembly-time math cost you nothing in terms of runtime speed or memory usage. They do slow down the assembly process a little, but the person who uses your programs never knows that, and it's the user that you want to wow with your assembly-language brilliance. And anything that makes your own source code easier to read and modify is well worth the minuscule extra time it takes to assemble.
Nested Instruction Loops
Once all the registers are set up correctly according to the assumptions made by STOSW, the real work of SHOWCHAR is performed by two instruction loops, one inside the other. The inner loop displays a line consisting of 64 characters. The outer loop breaks up the display into four such lines. The inner loop is by far the more interesting of the two. Here it is:
    DoChar: STOSW              ; Note that there's no REP prefix!
            JCXZ   AllDone     ; When the full set is printed, quit
            INC    AL          ; Bump the character value in AL up by 1
            DEC    BL          ; Decrement the line counter by one
            LOOPNZ DoChar      ; Go back & do another char until BL goes to 0
The work here (putting a character/attribute pair into the video buffer) is again done by STOSW. Once again, STOSW is working solo, without REP. Without REP to pull the loop inside the CPU, you have to set the loop up yourself.
Keep in mind what happens each time STOSW fires: the character/attribute pair in AX is copied to ES:DI, and DI is incremented by 2. At the other end of the loop, the LOOPNZ instruction decrements CX by 1 and closes the loop.
During register setup, we loaded CX with the number of characters we wanted to display—in this case, 256. Each time STOSW fires, it places another character on the screen, and there is one less character left to display. CX acts as the master counter, keeping track of when we finally display the last remaining character. When CX goes to 0, we've displayed the full ASCII character set and the job is done.
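What a single STOSW does to the buffer can be modeled in Python as shown below. This is a sketch, not the CPU's actual mechanism: the function name `stosw` and the `video` bytearray are stand-ins invented for the illustration, the attribute value 07H in the usage note is an assumption, and the Direction flag is assumed to be clear so that DI moves upward.

```python
def stosw(buffer, di, ax):
    """Model one STOSW: store AX at offset di, then advance di by 2.

    The 86-family CPUs store a word low byte first, so AL (the character)
    lands at the lower offset and AH (the attribute) right after it.
    """
    buffer[di] = ax & 0xFF             # AL: character code
    buffer[di + 1] = (ax >> 8) & 0xFF  # AH: attribute byte
    return di + 2                      # DI after the store (DF assumed clear)

# Hypothetical 80 x 25 text buffer standing in for the video refresh buffer.
video = bytearray(80 * 25 * 2)
```

Calling `stosw(video, 640, 0x0741)` puts the character code 41H at offset 640, the attribute 07H at offset 641, and hands back 642 as the new DI. Notice that nothing here touches CX; without REP, counting is left to the instructions around STOSW.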
Jumping When CX Goes to 0
Hence the instruction JCXZ. This is a special branching instruction created specifically to help with loops like this. Back in Chapter 9, I explained how it's possible to branch using one of the many variations of the JMP instruction, based on the state of one of the machine flags. Earlier in this chapter, I explained the LOOP instruction, which is a special-purpose sort of a JMP instruction, one combined with an implied DEC CX instruction. JCXZ is yet another variety of JMP instruction, but one that doesn't watch any of the flags or decrement any registers. Instead, JCXZ watches the CX register. When it sees that CX contains 0, it jumps to the specified label. If CX does not contain a 0 value, execution falls through to the next instruction in line.
In the case of the inner loop shown above, JCXZ branches to the "close up shop" code when it sees that CX has finally gone to 0. This is how the SHOWCHAR program terminates.
Most of the other JMP instructions have "partners" that branch when the governing flag is not true. That is, JC (Jump on Carry) branches when the Carry flag equals 1. Its partner, JNC (Jump on Not Carry), jumps if the Carry flag is not 1.
However, JCXZ is a loner. There is no JCXNZ instruction, so don't go looking for one in the instruction reference!
Closing the Inner Loop
Assuming that CX has not yet been decremented down to 0 by the LOOPNZ instruction (a condition watched for by JCXZ), the loop continues, and AL is incremented by 1. This is how the next ASCII character in line is selected. The value in AX is sent to the location at ES:DI by STOSW, and the character code proper is stored in AL. If you increment the value in AL, you change the displayed character to the next one in line. For example, if AL contains the value for the character A (65), incrementing AL changes the A character to a B (66) character. On the next pass through the loop, STOSW will fire a B at the screen instead of an A.
Why not just increment AX? The AH half of AX contains the attribute byte, and we do not want to change that. By explicitly incrementing AL instead of AX, we ensure that AH will never be altered.
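The difference only shows up at one point, but it matters. The Python sketch below models both increments on a 16-bit AX value; `inc_al` and `inc_ax` are hypothetical helper names, and the attribute value 07H in the example is an assumption chosen for illustration.

```python
def inc_al(ax):
    """Model INC AL: bump the low byte, wrap at 8 bits, leave AH alone."""
    ah = (ax >> 8) & 0xFF
    al = (ax + 1) & 0xFF
    return (ah << 8) | al

def inc_ax(ax):
    """Model INC AX: bump the whole 16-bit register."""
    return (ax + 1) & 0xFFFF
```

Starting from 07FFH (attribute 07H, character FFH), `inc_al` yields 0700H: the character wraps around to 0 and the attribute is untouched. `inc_ax` yields 0800H: the carry out of AL has spilled into AH and quietly changed the attribute.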
After the character code in AL is incremented, BL is decremented. Now, BL is not directly related to the string instructions. Nothing in any of the assumptions made by the string instructions involves BL. We're using BL for something else entirely here; BL is acting as a counter that governs the length of the lines of characters shown on the screen. BL was loaded earlier with the value represented by LineLen; here, 64. On each pass through the loop, the DEC BL instruction decrements the value of BL by 1. Then the LOOPNZ instruction gets its moment in the sun.
LOOPNZ is a little bit different from our friend LOOP that we examined earlier. It's just different enough to get you into trouble if you don't truly understand how it works. Both LOOP and LOOPNZ decrement the CX register by 1: LOOP watches the state of the CX register, and closes the loop until CX goes to 0; LOOPNZ watches both the state of the CX register and the state of the Zero flag (ZF). (LOOP ignores ZF.) LOOPNZ will only close the loop if CX <> 0 and ZF = 0. In other words, LOOPNZ closes the loop only if CX still has something left in it, and if the ZF is not set.
So what exactly is LOOPNZ watching for here? Remember that immediately prior to the LOOPNZ instruction, we're decrementing BL by 1 through a DEC BL instruction. The DEC instruction always affects ZF. If DEC's operand goes to 0 as a result of the DEC instruction, ZF goes to 1 (is set). Otherwise, ZF stays at 0 (remains cleared). So in effect, LOOPNZ is watching the state of the BL register. Until BL is decremented to 0 (setting ZF) LOOPNZ closes the loop. After BL goes to 0, the inner loop is finished and execution falls through LOOPNZ to the next instruction.
What about CX? Well, LOOPNZ is watching CX, but so is JCXZ. JCXZ is actually the switch that governs when the whole loop, both inner and outer portions, has done its work and must stop. So while LOOPNZ does watch CX, somebody else is doing that task, and that somebody else will take action on CX before LOOPNZ can. LOOPNZ's job is thus to decrement CX, but to watch BL. It governs the inner of the two loops.
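The rules for DEC and LOOPNZ are compact enough to write down as a pair of Python functions. This is a model for study only; `dec8` and `loopnz` are hypothetical names, and each returns the new register value along with the piece of state the text above cares about.

```python
def dec8(value):
    """Model DEC on an 8-bit register such as BL.

    Returns (new_value, zf), where zf is 1 if the result is 0.
    """
    result = (value - 1) & 0xFF
    return result, int(result == 0)

def loopnz(cx, zf):
    """Model LOOPNZ.

    Decrements CX without touching the flags, then reports whether the
    branch is taken: only if CX is not 0 AND ZF is 0.
    Returns (new_cx, branch_taken).
    """
    cx = (cx - 1) & 0xFFFF
    return cx, (cx != 0 and zf == 0)
```

With BL at 1, `dec8(1)` returns `(0, 1)`: BL has gone to 0 and ZF is set. Feeding that ZF to `loopnz(193, 1)` returns `(192, False)`: CX has still been decremented, but the loop is not closed and execution falls through.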
Closing the Outer Loop
But does that mean JCXZ closes the outer loop? No. JCXZ tells us when both loops are finished. Closing the outer loop is done a little differently from closing the inner loop.
Take another look at the two nested loops:
    DoLine: MOV    BL,LineLen                 ; Each line will consist of 64 characters
    DoChar: STOSW                             ; Note that there's no REP prefix!
            JCXZ   AllDone                    ; When the full set is printed, quit
            INC    AL                         ; Bump the character value in AL up by 1
            DEC    BL                         ; Decrement the line counter by 1
            LOOPNZ DoChar                     ; Go back & do another char until BL goes to 0
            ADD    DI,(ScrnWidth-LineLen)*2   ; Move DI to start of next line
            JMP    DoLine                     ; Start display of the next line
The inner loop is considered complete when we've displayed one full line of the ASCII table to the screen. BL governs the length of a line, and when BL goes to 0 (which the LOOPNZ instruction detects) a line is finished. LOOPNZ then falls through to the ADD instruction that modifies DI.
We modify DI to jump from the end of a completed line to the start of the next line at the left margin. This means we have to "wrap" by some number of characters from the end of the ASCII table line to the end of the visible screen. The number of bytes this requires is noted by the assembly-time expression (ScrnWidth-LineLen)*2. This expression is basically the difference between the length of one ASCII table line and the width of the visible screen. Remember that each character position is actually represented by both a character and an attribute byte in the video refresh buffer, thus the *2 portion of the expression. The result of the expression is the number of bytes we must move into the video refresh buffer to come to the start of the next line at the left screen margin.
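The arithmetic behind that wrap can be worked through in a few lines of Python. This is only a sketch of the calculation, using the values given for ScrnWidth (80) and LineLen (64); the name `next_line_start` is hypothetical.

```python
SCRN_WIDTH = 80      # characters across the visible screen
LINE_LEN = 64        # characters in one line of the ASCII table
BYTES_PER_CELL = 2   # character byte plus attribute byte

# The assembly-time expression (ScrnWidth-LineLen)*2
WRAP = (SCRN_WIDTH - LINE_LEN) * BYTES_PER_CELL

def next_line_start(line_start):
    """Where DI ends up after one full table line plus the wrap."""
    end_of_line = line_start + LINE_LEN * BYTES_PER_CELL  # DI after 64 STOSWs
    return end_of_line + WRAP                             # DI after the ADD
```

The wrap works out to 32 bytes. A line that starts at offset 640 ends with DI at 768, and adding 32 brings DI to 800, which is exactly one full screen line (160 bytes) past where the previous line began.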
But after that "wrap" is accomplished by modifying DI, the outer loop's work is done, and we close the loop. This time, we do it unconditionally, by way of a simple JMP instruction. The target of the JMP instruction is the DoLine label. No ifs, no arguments. At the top of the outer loop (represented by the DoLine label), we load the length of a line back into the now-empty BL register, and drop back into the inner loop. The inner loop starts firing characters at the screen again, and will continue to do so until JCXZ detects that CX has gone to 0.
At that point, both the inner and outer loops are finished, and the full ASCII table has been displayed. SHOWCHAR's work is done, and it terminates.
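If you'd like to trace the two loops without a debugger, the Python model below follows the listing instruction by instruction and records every store STOSW makes. Treat it as a sketch under stated assumptions: AL is assumed to start at 0, DI is assumed to start at 640 (the result of the earlier ADD), CX starts at 256 as described, and the function name `run_showchar_loops` is hypothetical.

```python
SCRN_WIDTH, LINE_LEN, LINES_DOWN = 80, 64, 4

def run_showchar_loops(cx=256, al=0):
    """Trace the nested loops; returns a list of (offset, character) stores."""
    di = SCRN_WIDTH * LINES_DOWN * 2     # assumed starting DI: 640
    stores = []
    while True:                          # DoLine:
        bl = LINE_LEN                    #   MOV BL,LineLen
        while True:                      # DoChar:
            stores.append((di, al))      #   STOSW (character half only)
            di += 2
            if cx == 0:                  #   JCXZ AllDone
                return stores
            al = (al + 1) & 0xFF         #   INC AL
            bl = (bl - 1) & 0xFF         #   DEC BL sets ZF when BL hits 0
            zf = (bl == 0)
            cx = (cx - 1) & 0xFFFF       #   LOOPNZ decrements CX...
            if cx == 0 or zf:            #   ...and falls through on either
                break
        di += (SCRN_WIDTH - LINE_LEN) * 2    # ADD DI,(ScrnWidth-LineLen)*2
                                             # JMP DoLine
```

In this model the first 256 stores carry character codes 0 through 255, and each group of 64 starts 160 bytes after the previous one (offsets 640, 800, 960, and 1120). Traced this way, the listing as printed also fires STOSW one more time before JCXZ sees CX at 0, storing character 0 at offset 1280; whether the real program behaves identically depends on the register setup assumed above.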
SHOWCHAR.ASM Recap
Let's look back at what we've just been through. SHOWCHAR.ASM contains two nested loops. The inner loop shoots characters at the screen via STOSW. The outer loop shoots lines of characters at the screen, by repeating the inner loop some number of times (here, 4).
The inner loop is governed by the value in the BL register, which is initially set up to take the length of a line of characters (here, 64). The outer loop is not explicitly governed by the number of lines to be displayed. That is, you don't load the number 4 into a register and decrement it. Instead, the outer loop continues until the value in CX goes to 0, indicating that the whole job is done.
Both the inner and outer loops modify the registers that STOSW works with. The inner loop modifies AL after each character is fired at the screen. This makes it possible to display a different character each time STOSW fires. The outer loop modifies DI (the destination index register) each time a line of characters is complete. This allows us to break the destination string up into four separate, non-continuous lines.
The Other String Instructions
STOSW is only one of the several string instructions in the 86-family instruction set. I would have liked to cover the others here, but space won't allow, in this edition, at least. In particular, the MOVSW instruction (Move String by Word) is useful, because it allows you to copy entire regions of memory from one place to another, screamingly fast, and with only a single instruction:
REP MOVSW
You probably understand enough about string instruction etiquette now to pick up MOVSW yourself from an assembly-language reference. All of the same register conventions apply, only with MOVSW you're working with both the source and destination strings at the same time.
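As a head start on that research, here is a Python model of what REP MOVSW does. It is a simplification made for illustration: one flat `bytearray` stands in for memory, so the DS and ES segment registers are ignored, the Direction flag is assumed clear, and the function name `rep_movsw` is hypothetical.

```python
def rep_movsw(memory, si, di, cx):
    """Model REP MOVSW: copy cx words from offset si to offset di.

    Each repetition moves one word, advances SI and DI by 2,
    and decrements CX, until CX reaches 0.
    Returns the final (si, di, cx).
    """
    while cx != 0:
        memory[di:di + 2] = memory[si:si + 2]   # one MOVSW
        si += 2
        di += 2
        cx -= 1
    return si, di, cx
```

Copying three words from offset 0 to offset 6 leaves SI at 6, DI at 12, and CX at 0. The counting that SHOWCHAR had to do by hand with LOOPNZ is handled here by the REP prefix.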
I felt it important to discuss not only the string instructions, but their supporting cast of characters: LOOP, LOOPNZ, and JCXZ. Individual instructions are important, but not nearly as important as the full context within which they work. Now that you've seen how STOSW is used in non-REP loops, you should be able to apply the same knowledge to the other string instructions as well.
Further Research: Building Your Assembly-Language Video Toolkit
Video is important—it's the fundamental way your programs communicate with their users. Fast video is essential, and BIOS-based video fails in that regard. The Clear and Ruler macros are good examples of just how fast video routines can be made with solid knowledge of assembly language.
You have the fundamentals of a really good and extremely fast toolkit of video routines for your assembly-language programs. To get some serious practice in assembly-language design and implementation, it's up to you to fill that toolkit out.
Here's a list of some of the new routines you should design and perfect for your video toolkit:
• WriteFast  A routine to move a string of characters from your data segment to the visible display buffer. You can do this easily using instructions we've discussed so far. A suggestion: use the LOOP instruction for an easy time of it, or research the MOVSW instruction for a trickier, but much faster, routine.
• WritelnFast  Like WriteFast, but moves the hardware cursor to the beginning of the following line after the write. If the write is to the bottom line on the screen, scroll the screen using INT 10H BIOS calls, or for more speed, MOVSW.
• WriteDown  A routine to move a string of characters from the data segment to the visible display buffer, only vertically. This is useful for displaying boxes for menus and other screen forms, using the PC's line-drawing characters. SHOWCHAR.ASM gives you a hint as to how to approach this one.
• DrawBox  Using WriteFast and WriteDown, create a routine that draws a box on the screen. Allow the programmer to specify whether it is made of single-line or double-line line-drawing characters.
• GetString  A delimited field-entry routine. Delineate a field, by highlighting the background or framing a portion of a line with vertical bar characters, and allow the user to move the cursor and enter characters within the bounds of the field. When the user presses Enter, return the entered characters to a buffer somewhere in the data segment. This is ambitious and might require seventy or eighty instructions, but it's likely to be a lot of fun.
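Before writing WriteFast and WriteDown in assembly, it can help to see how little separates them. The Python model below is one possible way to think about the pair, not a prescribed design: every name in it is hypothetical, the attribute value 07H is an assumption, and no bounds checking is done. The only difference between the two routines is how far the buffer offset advances after each character.

```python
SCRN_WIDTH = 80          # characters per screen line, as in SHOWCHAR.ASM
BYTES_PER_CELL = 2       # character byte plus attribute byte

def write_cells(buffer, row, col, text, attribute, step):
    """Place text in buffer starting at (row, col), moving step bytes per character."""
    offset = (row * SCRN_WIDTH + col) * BYTES_PER_CELL
    for ch in text:
        buffer[offset] = ord(ch) & 0xFF   # character code
        buffer[offset + 1] = attribute    # attribute byte
        offset += step
    return offset

def write_fast(buffer, row, col, text, attribute=0x07):
    """Horizontal write: advance one cell (2 bytes) per character."""
    return write_cells(buffer, row, col, text, attribute, BYTES_PER_CELL)

def write_down(buffer, row, col, text, attribute=0x07):
    """Vertical write: advance one full screen line (160 bytes) per character."""
    return write_cells(buffer, row, col, text, attribute,
                       SCRN_WIDTH * BYTES_PER_CELL)
```

In assembly terms, the horizontal step is what STOSW gives you for free, while the vertical step calls for an explicit ADD to DI after each store, much as SHOWCHAR's outer loop does at the end of each line.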
Getting your video tools in order will allow you to move on to other, more involved subjects like file I/O and interface to the serial and parallel ports. "Real" assembly-language programs require all these things, and you should strive to create them as small, easily read and understood toolkit-style procedures and macros. Create them so that they call one another rather than duplicating function; assembly language is difficult enough without creating routines that do the same old things over and over again.
