
Assembly Language Step by Step 1992
It's one of those little tradeoffs you'll be faced with time and again as you hone your skills in assembly.
11.6 Detecting Which CPU Your Code Is Running On
If you think about the notion of using the instructions present in the 286 or 386 but not the 8086/8088, the question very quickly arises: what happens when you try to execute a 386 instruction on an 8088 or a 286?
Simple answer: nothing good. On the 8088 and 8086, the response of the CPU to an undefined opcode is truly undefined, in that the instruction fetching mechanism simply hands the bogus opcode to the CPU's microcode circuitry, and then whatever happens, happens. Sometimes nothing, sometimes something defined but unexpected (like finding that a register is "magically" incremented or zeroed out), and sometimes the CPU will just hang. The worst of it is that you can never count on "undocumented instructions" to work consistently from one build of the CPU chip to another, so the best advice is, don't do it!
Things are both better and worse on the 286, 386, and 486. On these more advanced processors, the instruction fetching mechanism actually checks each fetched instruction against a valid opcode matrix to determine if the instruction is defined or not. If the CPU fetches an undefined opcode, it will generate an interrupt 6, which can be used to signal the error.
That's the good news: a consistent reaction to a bogus opcode. The bad news is that there's no standard for handling an interrupt 6. The CPU only knows how to generate the interrupt; once execution jets off to the address stored in the interrupt vector table for interrupt 6, things are out of the CPU's hands. It's the responsibility of the BIOS or of DOS to install handlers for "system" interrupts like interrupt 6.
Some do. Some (especially 8088-based, XT-class machines) don't. And of those that do, the action taken on an interrupt 6 is anything but consistent. Some machines ignore the interrupt and simply return control without taking action. On machines like that, nothing at all happens when an undefined opcode is fetched. Other machines may halt execution with a cryptic error message on the screen. Still other machines may just go nuts.
Again, don't do it. You'll generate bugs aplenty just arranging the multitude of perfectly legal opcodes in your programs. Don't complicate matters by forcing the target machine to digest and react to instructions it doesn't have.
Practicing Safe Execution
So, if you're going to use instructions that are defined on some PCs and not on others, you are going to have to build some machinery into your programs to detect what sort of CPU the programs are running on, and abort execution if an older machine is detected that can't run the software.
This sounds harder than it actually is. There are some well-defined differences among the various CPUs that do not involve defined or undefined instructions, and a very clever program can look for those differences and safely determine what sort of CPU is executing it. If your program requires a 386 and determines that it isn't running on one, it can abort back to DOS before anything untoward happens.
I've written such a routine, and it's given at the end of this section, in the listing WHATAMI.ASM. Read it through, and then we'll go through it, step by step. It's certainly the subtlest piece of code I've presented in this book, and if you can understand how it works, you're well on your way to being a competent journeyman assembly programmer.
The CPU Identifier from a Height
WHATAMI.ASM is a fully executable program that is a shell around the CPU detector procedure, CPUID. Nothing unusual has to be done to assemble and link it; create the executable file the same way you've been doing for all the listings in this book. The only caution, again, is to be sure the version of the assembler you're using is new enough to understand 386 instructions!
CPUID returns a value in AL that indicates what CPU is currently executing the program. A 0 value indicates the presence of either an 8086 or an 8088. (Because those two chips execute the same identical instruction set, there's very little point in looking further to see which one it actually is.) A 1 value indicates the presence of an 80286, a 2 value indicates any of the 80386 family, including the 386SX and 386DX, and a 3 value indicates the presence of a 486SX or 486DX.
Looking for an 8086 or 8088
We start at the bottom of the CPU totem pole, and assume that we have an 8088 or 8086. The first test is to eliminate the possibility of these CPUs. The test turns on a quirk of the 8086/8088 CPUs: the top four bits of the Flags register are forever stuck in the set state; that is, as 1 bits. Even if you try to force the Flags register to 16 0 bits, the top 4 bits will not change, and if you read back the Flags register after trying to set all 16 bits to 0, you'll find that the top 4 bits always come back as 1s.
That's how it is on the 8086 and 8088. However, things are different on the newer CPUs. More of the bits in the Flags register are meaningful, and therefore the top 4 bits are not left stuck at 1. So if you pop 16 0 bits from the stack into the Flags register and then read the flags back, finding anything other than 1s in all of the top 4 bits eliminates the 8086 and 8088 from the running.
There's no instruction that allows you to directly store a value into the Flags register. The only instruction that can affect the entire Flags register at once is POPF, which pops the word from the top of the stack into the Flags register.
In the test, we clear DX to 0, push DX onto the stack, and then pop the two 0 bytes from the top of the stack into the Flags register with POPF. That writes our test value to the Flags register; to read it back we simply use PUSHF to push the Flags register back onto the stack, and then pop the top of the stack into AX for inspection.
The test consists of a comparison of the value in AX (which used to be in the Flags register) against the literal value 0F000H:
CMP AX,0F000H
JE Done
If the value in AX is in fact 0F000H, we're done—because we found that the top four bits of the flags register are stuck at 1, indicating an 8086 or 8088.
At the Done label, we copy the value in DX to AX and return to the caller. The value in DX is going to indicate which CPU we've found. Each time we eliminate the next higher CPU, we increment DX to the next higher code. DX was originally forced to 0, so we pass 0 back to the caller—indicating an 8086 or 8088.
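Pulled together, the 8086/8088 test might look something like the fragment below. Treat it as a sketch reconstructed from the description above, not as the exact text of the listing. The procedure name Is8086 and the label Done86 are made up for illustration. Two details are assumptions added here rather than anything stated above: the AND that masks off the low 12 bits before the compare (bit 1 of the Flags register always reads back as 1, so without the mask an 8088 would hand back 0F002H rather than 0F000H), and the outer PUSHF/POPF pair that preserves the caller's flags, since loading 0 into the Flags register also clears the interrupt flag.

```asm
; Sketch only: returns AX = 0 on an 8086/8088, AX = 1 on anything newer.
; Uses nothing but 8086 instructions, so it is safe on every CPU.
Is8086  PROC
        pushf                ; Save the caller's flags (assumption, see above)
        xor   dx,dx          ; DX = 0: both the test value and the 8086/8088 code
        push  dx             ; Put 16 zero bits on the stack...
        popf                 ; ...and try to load them into the Flags register
        pushf                ; Read the flags back by pushing them...
        pop   ax             ; ...and popping the copy into AX
        and   ax,0F000H      ; Keep only bits 12-15 (assumption, see above)
        cmp   ax,0F000H      ; Are the top four bits still all 1s?
        je    Done86         ; Yes: they're stuck, so this is an 8086 or 8088
        inc   dx             ; No: we have at least a 286
Done86: popf                 ; Restore the caller's flags
        mov   ax,dx          ; Hand the code back in AX
        ret
Is8086  ENDP
```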
Looking for a 286
Eliminating the 286 is similarly easy. The same general method is used, because the 286 has a quirk that is inside out from the 8088's: the top 4 bits of the Flags register are always forced to 0 bits on the 286, when the 286 is running in real mode. If the 286 is running in protected mode, bits 12 through 14 of the Flags register are meaningful and can change, but in real mode (which is the only mode our code will be using in this book) bits 12 through 15 will always be 0. (Bit 15 is always 0 on the 286, regardless of real or protected mode.) So what we do is try to set the Flags register to 0F000H, and if any of the top 4 bits come back as 1, we can eliminate the 286 and know that we have at least a 386.
Before we do anything else, we increment DX to 1, since we now know that we have at least a 286 on the line. And because we know we have at least a 286 on the line, we can use an instruction introduced with the 286: PUSH <immed>, which can push an immediate value (like 0F000H) onto the stack. This makes it unnecessary to first load 0F000H into a register (as we did for the 8088 test) and then push the register's value onto the stack. On the 286 and newer processors, you can push a literal value directly.
Using the same general method we used in testing for the 8088, we push 0F000H onto the stack, pop it into the Flags register, push the Flags register back onto the stack, and pop the value from the Flags register off the stack into AX. Then we check whether any of the top 4 bits we tried to set have survived in AX. If those bits come back as 0, we know we have a 286, so we exit to Done with 1 in DX.
Remember that the 286 forces the top 4 bits of the Flags register to 0. If something other than 0 comes back in those top 4 bits, we know we have at least a 386 and possibly a 486. So we increment DX again, to 2 (2 is the code for the 386).
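The 286 test could be sketched along the same lines. Again, this is a reconstruction from the description and not necessarily the listing's exact code; Is286 and Done286 are hypothetical names, and saving and restoring the caller's flags is an added assumption. The fragment must only be called once the 8086/8088 has been ruled out, because PUSH with an immediate operand does not exist on those chips.

```asm
; Sketch only: call this ONLY after the 8086/8088 has been eliminated.
; Returns AX = 1 on a 286, AX = 2 on a 386 or newer.
        .286                 ; Let the assembler accept PUSH <immed>
Is286   PROC
        pushf                ; Save the caller's flags (assumption)
        mov   dx,1           ; We know we have at least a 286
        push  0F000H         ; Push the literal value directly (286 and up)
        popf                 ; Try to set the top four bits of Flags
        pushf                ; Read the flags back...
        pop   ax             ; ...into AX
        and   ax,0F000H      ; Isolate bits 12-15
        jz    Done286        ; All four came back 0: a 286 in real mode
        inc   dx             ; Something stuck: at least a 386
Done286: popf                ; Restore the caller's flags
        mov   ax,dx          ; Hand the code back in AX
        ret
Is286   ENDP
```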
Is It a 386 or a 486?
We now know we have either a 386 or a 486. Telling them apart isn't quite as easy as telling them from their less-powerful brothers, but once again, it's a matter of flags. The 486 has a flag that the 386 doesn't have, and by testing for this flag we can see whether we have a 486 on the line. If we don't, we know we have a 386 by elimination.
You'll notice in reading the 486 test that we're suddenly working with the extended registers introduced with the 386: EAX, ESP, and so on. Keep in mind that these are all 32 bits in size.
The 486 extended Flags register (EFLAGS) has a flag called the alignment check (AC) flag. The AC flag is used to detect alignment faults; that is, attempts to access memory from an address that is not evenly divisible by 4. Why 4? The 386 and 486 access memory 32 bits (4 bytes) at a time, every time they access memory at all. Because of the way that the CPU sends memory address information out on the memory address pins, memory accesses happen most quickly when the requested addresses are aligned on a double word boundary, that is, when the requested address is divisible by 4.
There are some truly arcane reasons why it is vitally important that memory accesses be aligned on a double word boundary sometimes, but you're unlikely to encounter them in normal work. (Mostly they come up when more than one processor must share the same address space, which is mighty unlikely on a PC-compatible machine!) The AC flag was added to the 486 to allow enforcement of double word alignment. If you set up some of the 486's special control registers just so, a nonaligned memory access can generate an error interrupt, and the AC flag is part of this enforcement machinery.
For our purposes, however, it's a handy feature that doesn't exist on the 386, so if we can spot the AC flag in EFLAGS, we know we have a 486.
The first step in the test is to save the value in ESP into a register, because we're going to have to ensure that the stack pointer is double word aligned— which might change ESP's value. Having the old value in a register will allow us to put things back the way they were before we started testing.
Aligning an Address to a Double Word Boundary
With ESP's original value safely tucked away, we force the stack pointer to be double word aligned. This is as simple as rounding the value of the stack pointer down to the next lowest memory address on a double word boundary. The way we do this may puzzle you:
AND ESP,NOT 3
What does NOT 3 mean? NOT is an assembler operator that inverts all the bits of its operand, which in this case is 3. Because ESP is a 32-bit register, the NOT 3 operand is also 32 bits in size. The full 32-bit expression of 3 (including all the leading zeroes) is 00000003H. If you invert the bits in every digit in that number (including those leading zeroes) you get FFFFFFFCH.
That's a big, ugly hex number, and all those F's really aren't the point. The whole point lies in the lowest two bits of the last hex digit C, which are 0. When you AND two values bit by bit, any 0 bits in either operand will force those bits in the result to 0.
What we want to do is force the two lowest-order bits in ESP to 0. If we do that, regardless of what address was previously in ESP, the altered address will be aligned on a double word boundary. If this isn't immediately clear, think of it this way: every double word aligned address is four bytes greater than the one before it. Not one byte, or two bytes, or three bytes, but four bytes greater. You're essentially counting bytes by fours (0,4,8,12,16, etc.), which means that the bits that carry the "in between" values (5, 6, or 7, for example) must be zeroed out, or you're not really counting by fours at all.
AND ESP,NOT 3 simply forces the low two bits of ESP to 0, ensuring that ESP is aligned on a double word boundary. We need to do that because if ESP is not double word aligned and we attempt to use it, we may inadvertently set AC to 1, which would muddy the waters of the test we're trying to do.
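A short fragment with a couple of worked values may make the masking concrete. The choice of EDI as the register that holds the saved stack pointer is an assumption made for this sketch; any 32-bit register not otherwise in use would do.

```asm
; Fragment only. Assumes a 386 or newer, and that .386 has been given
; inside a 16-bit code segment.
        mov   edi,esp        ; Save the original ESP (EDI is an assumed choice)
        and   esp,NOT 3      ; NOT 3 = 0FFFFFFFCH: forces bits 0 and 1 to 0
                             ;   0000FFFEH AND 0FFFFFFFCH = 0000FFFCH  (rounded down)
                             ;   0000FFFCH AND 0FFFFFFFCH = 0000FFFCH  (already aligned)
        ; ...code that needs a double word aligned stack goes here...
        mov   esp,edi        ; Put the stack pointer back the way it was
```

Because the stack grows downward, rounding ESP down never disturbs anything already on the stack; it only skips over as many as three unused bytes.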
The Last Test
And the test is this: we push EFLAGS onto the stack, and then save a copy of EFLAGS in both EAX and EBX by popping the flags value off the stack into EAX and then copying EAX into EBX. (Note that this use of the ESP could generate a flip in the AC flag if ESP were not double word aligned.) Then, we take the copy of EFLAGS in EAX, and try to flip the state of the AC flag. It's done this way:
XOR EAX,00040000H
Remember: XORing a bit against 0 leaves the bit in its current state. XORing a bit against 1 reverses the state of that bit. Only one bit in 00040000H is 1; all the others are 0. That single 1 bit is at the same ordinal position as the AC bit in the EFLAGS register. XORing 00040000H against EFLAGS will toggle the state of the AC bit.
We can't, of course, XOR against EFLAGS directly. So we do basically what we did in the previous two tests: manipulate a value in a register, and then push the value to the stack and pop it back into EFLAGS. We XOR a copy of EFLAGS against 00040000H to toggle the bit at position 18, then push the altered value from EAX onto the stack and pop it back into EFLAGS. Having forced an altered value into EFLAGS, we then immediately copy it back for a look. Remember: the 386 doesn't define the AC bit, and on the 386 the bit at position 18 is stuck at 0. We attempted to flip the bit at position 18 in EFLAGS. On the 486 it will flip; on the 386 it's stuck. If we read back EFLAGS and find that bit 18 has changed, we have a 486. If the bit at position 18 has not changed, we have a 386.
If we find a 486, we increment DX to 3 and return. Otherwise, we simply return, leaving DX at 2, which is the code for the 386.
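Here is one way the whole 386-versus-486 test might be put together. It is a sketch built from the steps described above, not a copy of the listing. The names Is486 and Restore486 are hypothetical, as is the use of EDI to hold the saved stack pointer, and restoring the original EFLAGS before returning is an added assumption. It must only be called once a 386 or better is known to be present.

```asm
; Sketch only: call this ONLY after the 8086/8088 and 286 are eliminated.
; Returns AX = 2 on a 386, AX = 3 on a 486.
; Place .386 AFTER the SEGMENT statement so the segment stays 16-bit.
        .386
Is486   PROC
        mov   dx,2           ; Assume a 386 until proven otherwise
        mov   edi,esp        ; Save the original stack pointer (assumed register)
        and   esp,NOT 3      ; Align the stack on a double word boundary
        pushfd               ; Push EFLAGS...
        pop   eax            ; ...and pop the copy into EAX
        mov   ebx,eax        ; Keep an unaltered copy in EBX
        xor   eax,00040000H  ; Toggle bit 18, the AC flag
        push  eax            ; Push the altered value...
        popfd                ; ...and try to load it into EFLAGS
        pushfd               ; Read EFLAGS back...
        pop   eax            ; ...into EAX
        xor   eax,ebx        ; 1 bits now mark whatever changed
        and   eax,00040000H  ; Did bit 18 change?
        jz    Restore486     ; No: it's stuck, so this is a 386
        inc   dx             ; Yes: the AC flag exists, so this is a 486
Restore486:
        push  ebx            ; Put the original EFLAGS back (assumption)
        popfd
        mov   esp,edi        ; Restore the original stack pointer
        mov   ax,dx          ; Hand the code back in AX
        ret
Is486   ENDP
```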
Many thanks to Robert Hummel of PC Magazine for explaining this algorithm to me.
The WHATAMI.ASM Utility
That's all there is to CPUID. WHATAMI.ASM does nothing more than call CPUID to determine the running CPU, and then use CPUID's return code to index into a table of messages. The selected message indicates to the user (through DOS function 40H) which CPU is in the machine.
Only a little review on selecting items in a table: each text message in MsgTbl is exactly 16 bytes long. The first message is for the 8088, and to select it you need an offset into the table of 0. The second message, for the 286, is at an offset of 16 into the table—1 x 16, and the 286 ID code is 1. The third message, for the 386, is at an offset of 32 into the table—2 x 16, and the 386 ID code is 2.
Getting the idea?
We're basically multiplying the CPU ID code by 16 to create an offset to the correct message in the table. Multiplying by 16 (a power of 2) is easy: you just shift left by four bits. That done, you add the offset to the starting address of the table, and pass the resulting address to DOS so that DOS can display the message with its function 40H.
As I've said before, it's a very good rule of thumb: always make items in a table come out to a length that's an even power of 2. Lengths of 2, 4, 8, 16, 32, or 64 bytes make up a good assortment. Even if you have to pad the ends of data items with 0s or space characters, you will save a lot of fooling around if you can generate an offset by simple power-of-2 multiplies using SHL.
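The lookup might be sketched as follows. The message text in MsgTbl is invented for this example; the only thing that matters is that every entry is padded out to exactly 16 bytes, which the DUP operator handles without any counting of spaces by hand. The fragment assumes that DS already points at the segment holding MsgTbl and that CPUID returns its code in AL as described earlier. Note that the shift count goes through CL: shifting by an immediate count greater than 1 is not available on the 8086/8088, and this code has to run on every CPU.

```asm
; In the data segment: four messages, each exactly 16 bytes long.
; (The wording of the messages is hypothetical.)
MsgTbl  DB    "8086 or 8088", 2 DUP (' '), 0DH, 0AH   ; 12 + 2 + 2 = 16
        DB    "80286",        9 DUP (' '), 0DH, 0AH   ;  5 + 9 + 2 = 16
        DB    "80386",        9 DUP (' '), 0DH, 0AH   ;  5 + 9 + 2 = 16
        DB    "80486",        9 DUP (' '), 0DH, 0AH   ;  5 + 9 + 2 = 16

; In the code segment:
        call  CPUID          ; AL = CPU ID code, 0 through 3
        xor   ah,ah          ; Zero-extend the code into AX
        mov   cl,4           ; Shift count of 4...
        shl   ax,cl          ; ...multiplies the code by 16
        mov   dx,OFFSET MsgTbl
        add   dx,ax          ; DS:DX now points at the selected message
        mov   bx,1           ; Handle 1 is standard output
        mov   cx,16          ; Every message is 16 bytes long
        mov   ah,40H         ; DOS function 40H: write to file or device
        int   21H
```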
Passing a Value Back to ERRORLEVEL
You can use WHATAMI.EXE from a batch file, and it will pass the CPU ID code back to DOS for use in the batch ERRORLEVEL feature, allowing your batch files to test the value returned by WHATAMI and take action accordingly. All you have to do to pass a value back in ERRORLEVEL is to leave the value in AL when your program calls DOS service 4CH to return control to DOS. DOS takes care of the rest.
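A minimal sketch of the exit sequence follows, assuming CPUID returns its code in AL. The PUSH/POP pair is an added precaution rather than something stated above: DOS calls made in between, such as function 40H, return their own results in AX and would otherwise wipe out the code.

```asm
        call  CPUID          ; AL = CPU ID code, 0 through 3
        push  ax             ; Save it: DOS calls in between may change AX
        ; ...display the selected message here...
        pop   ax             ; AL = CPU ID code again
        mov   ah,4CH         ; DOS service 4CH: terminate with return code
        int   21H            ; The value in AL becomes ERRORLEVEL
```

In the batch file, keep in mind that IF ERRORLEVEL n is true when the returned value is n or greater, so test for the highest code first. A line like IF ERRORLEVEL 2 GOTO Has386 (the label name is hypothetical) catches both the 386 and the 486.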




You never really learn assembly language.
You can improve your skills over time, by reading good books on the subject, by reading good code that others have written, and most of all, by writing lots and lots of code yourself. But at no point will you be able to stand up and say, I know it.
You shouldn't feel bad about this. In fact, I take some encouragement from occasionally hearing that Michael Abrash, author of Zen of Assembly Language, has learned something new about assembly language. Michael has been writing high-performance assembly code for almost ten years, and has evolved into one of the five or six best assembly programmers in the Western Hemisphere.
If Michael is still learning, is there hope for the rest of us?