Chapter 3
Data Types and Data Storage
In this chapter we review the various encodings and formats used for representing character and numeric data in digital systems. The character formats are used for encoding the letters, symbols, and control codes of the various alphabets. The numeric formats allow representing binary numbers as signed and unsigned integers in several forms, binary floating-point numbers, and decimal floating-point numbers, usually called binary-coded decimals or BCD.
3.0 Electronic-Digital Machines
The mechanization of arithmetic is often traced back to the abacus, slide rule, mechanical calculators, and punch card machines. The work of John von Neumann at Princeton’s Institute for Advanced Study marks the first highlight in the design and construction of a digital-electronic calculating machine. In von Neumann’s design, data and instructions are stored in a common memory area. An alternative approach, known as Harvard architecture, was discarded at first but has recently been re-validated and is in use in several microcontroller families.
The calculating power of the first computer was approximately 2000 operations per second, while previous electro-mechanical devices were capable of performing only 3 or 4 operations per second. Today’s digital machines can execute more than 1 billion instructions per second. Technological advances and miniaturization techniques have reduced the cost and size of computing machinery.
3.1 Character Representations
Over the years, data representation issues have often been determined by the various conventions used by the different hardware manufacturers. Machines have had different word lengths and different character sets and have used various schemes for storing characters and data. Fortunately, in microprocessor and microcontroller design, the encoding of character data has not been subject to major disagreements.
Historically, the methods used to represent characters have varied widely, but the basic approach has always been to choose a fixed number of bits and then map the various bit combinations to the various characters. Clearly, the number of bits of the storage format limits the total number of distinct characters that can be represented. In this manner, the 6-bit codes used on a number of earlier computing machines allow representing 64 characters. This range allows including the uppercase letters, the decimal digits, some special characters, but not the lowercase letters.
Computer manufacturers that used the 6-bit format often argued that their customers had no need for lower-case letters. Nowadays 7- and 8-bit codes that allow representing the lower-case letters have been adopted almost universally.
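The relation between code width and character count can be checked with a few lines of Python. This is only an illustrative sketch: the function name `distinct_codes` and the tally of 62 letters and digits are assumptions introduced here, not part of any standard.

```python
def distinct_codes(bits: int) -> int:
    """Return the number of distinct bit patterns in a fixed-width code."""
    return 2 ** bits


# Characters needed to cover both letter cases plus the decimal digits.
# Punctuation and control codes are deliberately not counted here.
NEEDED = 26 + 26 + 10

for width in (6, 7, 8):
    total = distinct_codes(width)
    # "spare" is what remains for punctuation and control codes.
    print(f"{width}-bit code: {total} patterns, {total - NEEDED} spare")
```

With six bits only two patterns would remain after the letters and digits, which is why the 6-bit codes gave up the lowercase letters to make room for special characters.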
Most of the world (except IBM) has standardized character representations by using the ISO (International Organization for Standardization) code. ISO exists in several national variants; the one used in the United States is called ASCII, which stands for American Standard Code for Information Interchange. All microcomputers and microcontrollers use ASCII as the code for character representation.
3.1.1 ASCII
ASCII is a character encoding based on the English alphabet. ASCII was first published as a standard in 1963, was substantially revised in 1967, and was last updated in 1986. Thirty-three of its codes (0 through 31, plus 127), referred to as non-printing codes, are mostly obsolete control characters. The remaining 95 printable characters (starting with the space character) include the common characters found in a standard keyboard, the decimal digits, and the upper- and lower-case characters of the English alphabet. Table 3.1 lists the ASCII characters in decimal, hexadecimal, and binary.
Table 3.1
ASCII Character Representation

DECIMAL  HEX  BINARY    VALUE
000      000  00000000  NUL  (Null character)
001      001  00000001  SOH  (Start of Header)
002      002  00000010  STX  (Start of Text)
003      003  00000011  ETX  (End of Text)
004      004  00000100  EOT  (End of Transmission)
005      005  00000101  ENQ  (Enquiry)
006      006  00000110  ACK  (Acknowledgment)
007      007  00000111  BEL  (Bell)
008      008  00001000  BS   (Backspace)
009      009  00001001  HT   (Horizontal Tab)
010      00A  00001010  LF   (Line Feed)
011      00B  00001011  VT   (Vertical Tab)
012      00C  00001100  FF   (Form Feed)
013      00D  00001101  CR   (Carriage Return)
014      00E  00001110  SO   (Shift Out)
015      00F  00001111  SI   (Shift In)
016      010  00010000  DLE  (Data Link Escape)
017      011  00010001  DC1  (XON)(Device Control 1)
018      012  00010010  DC2  (Device Control 2)
019      013  00010011  DC3  (XOFF)(Device Control 3)
020      014  00010100  DC4  (Device Control 4)
021      015  00010101  NAK  (Negative Acknowledge)
022      016  00010110  SYN  (Synchronous Idle)
023      017  00010111  ETB  (End of Trans. Block)
024      018  00011000  CAN  (Cancel)
025      019  00011001  EM   (End of Medium)
026      01A  00011010  SUB  (Substitute)
027      01B  00011011  ESC  (Escape)
028      01C  00011100  FS   (File Separator)
029      01D  00011101  GS   (Group Separator)
030      01E  00011110  RS   (Record Separator)
031      01F  00011111  US   (Unit Separator)
032      020  00100000  SP   (Space)
033      021  00100001  !    (exclamation mark)
034      022  00100010  "    (double quote)
035      023  00100011  #    (number sign)
036      024  00100100  $    (dollar sign)
037      025  00100101  %    (percent)
038      026  00100110  &    (ampersand)
039      027  00100111  '    (single quote)
040      028  00101000  (    (left/opening parenthesis)
041      029  00101001  )    (right/closing parenthesis)
042      02A  00101010  *    (asterisk)
043      02B  00101011  +    (plus)
044      02C  00101100  ,    (comma)
045      02D  00101101  -    (minus or dash)
046      02E  00101110  .    (dot)
047      02F  00101111  /    (forward slash)
048      030  00110000  0    (decimal digits ...)
049      031  00110001  1
050      032  00110010  2
051      033  00110011  3
052      034  00110100  4
053      035  00110101  5
054      036  00110110  6
055      037  00110111  7
056      038  00111000  8
057      039  00111001  9
058      03A  00111010  :    (colon)
059      03B  00111011  ;    (semi-colon)
060      03C  00111100  <    (less than)
061      03D  00111101  =    (equal sign)
062      03E  00111110  >    (greater than)
063      03F  00111111  ?    (question mark)
064      040  01000000  @    (AT symbol)
065      041  01000001  A
066      042  01000010  B
067      043  01000011  C
...
090      05A  01011010  Z
091      05B  01011011  [    (left/opening bracket)
092      05C  01011100  \    (back slash)
093      05D  01011101  ]    (right/closing bracket)
094      05E  01011110  ^    (circumflex)
095      05F  01011111  _    (underscore)
096      060  01100000  `    (accent)
097      061  01100001  a
098      062  01100010  b
099      063  01100011  c
...
122      07A  01111010  z
123      07B  01111011  {    (left/opening brace)
124      07C  01111100  |    (vertical bar)
125      07D  01111101  }    (right/closing brace)
126      07E  01111110  ~    (tilde)
127      07F  01111111  DEL  (delete)
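The decimal, hexadecimal, and binary columns of Table 3.1 are three notations for the same code value. The short Python sketch below regenerates a row for any ASCII character and separates the non-printing codes from the printable ones. The helper names `table_row` and `is_control` are illustrative assumptions, not part of any library.

```python
def table_row(ch: str) -> str:
    """Format one character as DECIMAL, HEX, BINARY in the style of Table 3.1."""
    code = ord(ch)
    if code > 127:
        raise ValueError("not an ASCII character")
    # 3 decimal digits, 3 hex digits, 8 binary digits, as in the table.
    return f"{code:03d} {code:03X} {code:08b}"


def is_control(code: int) -> bool:
    """True for the non-printing codes: 0 through 31, and 127 (DEL)."""
    return code < 32 or code == 127


print(table_row("A"))
print(table_row("z"))

controls = sum(1 for code in range(128) if is_control(code))
print(controls, 128 - controls)  # non-printing count, printable count
```

Note that the uppercase and lowercase forms of a letter differ only in bit 5 (value 32), which is visible when the binary columns for A and a are compared.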
3.1.2 EBCDIC and IBM
In spite of ASCII’s general acceptance, IBM continues to use EBCDIC (Extended Binary Coded Decimal Interchange Code) for character encoding. IBM mainframes and midrange systems such as the AS/400 use a wholly incompatible character set primarily designed for punched cards.
EBCDIC uses the full eight bits available to it, so there is no place left to implement parity checking. On the other hand, EBCDIC has a wider range of control characters than ASCII.
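The remark about parity is easier to follow with a concrete case. A 7-bit ASCII code stored in an 8-bit byte leaves the high-order bit free, and that bit can carry a parity check. The sketch below assumes even parity and uses the hypothetical helper names `add_even_parity` and `parity_ok`; it is one possible illustration, not a description of any particular hardware.

```python
def add_even_parity(code7: int) -> int:
    """Place an even-parity bit in bit 7 of a 7-bit character code."""
    if not 0 <= code7 <= 0x7F:
        raise ValueError("code must fit in seven bits")
    ones = bin(code7).count("1")
    parity = ones & 1  # 1 only when the count of 1 bits is odd
    return code7 | (parity << 7)


def parity_ok(byte: int) -> bool:
    """True when the 8-bit value holds an even number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2 == 0


sent = add_even_parity(ord("C"))   # 'C' is 1000011, three 1 bits
print(f"{sent:08b}", parity_ok(sent))

damaged = sent ^ 0b00000100        # flip one bit to simulate an error
print(f"{damaged:08b}", parity_ok(damaged))
```

A code that already uses all eight bits for character data, as EBCDIC does, has no spare bit in the byte for this check.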
EBCDIC character encoding is based on Binary Coded Decimal (BCD), which we discuss later in this chapter. There are four main blocks in the EBCDIC code page:
1. The range 0000 0000 to 0011 1111 is reserved for control characters.
2. The range 0100 0000 to 0111 1111 is for punctuation.
3. The range 1000 0000 to 1011 1111 is for lowercase characters.
4. The range 1100 0000 to 1111 1111 is for uppercase characters and numbers.
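Because each of the four blocks listed above spans one quarter of the code space, the two high-order bits of an EBCDIC byte identify its block. The following Python sketch illustrates this. It assumes the `cp037` codec from Python's standard library as a representative EBCDIC code page (IBM defines several), and the names `BLOCKS` and `ebcdic_block` are hypothetical.

```python
# Block names indexed by the two high-order bits of the byte.
BLOCKS = (
    "control characters",               # 00xx xxxx
    "punctuation",                      # 01xx xxxx
    "lowercase characters",             # 10xx xxxx
    "uppercase characters and numbers", # 11xx xxxx
)


def ebcdic_block(byte: int) -> str:
    """Return the name of the EBCDIC block that contains the byte."""
    return BLOCKS[(byte >> 6) & 0b11]


for ch in (" ", ".", "a", "A", "0"):
    byte = ch.encode("cp037")[0]  # assumed code page, see lead-in
    print(repr(ch), f"{byte:08b}", ebcdic_block(byte))
```

Note that the space character falls in the punctuation block and the decimal digits share the last block with the uppercase letters.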
Actually, microprocessor and microcontroller design need not address how character data is encoded. Usually a set of instructions allows manipulating 8-bit quantities, but the processor need not be concerned with what the encodings represent. On the other hand, some mainframe processors do have instructions that manipulate character codes. For example, the EDIT instruction on the IBM 370 implements the kind of picture conversion that appears in COBOL programs.
3.1.3 Unicode
One of the limitations of the ASCII code is that seven bits, or even the eight bits of a byte, are not enough to represent the character sets of languages such as Japanese or Chinese. This has led to the development of encodings that can represent large character sets. Unicode has been proposed as a universal character encoding standard that can be used for representation of text for computer processing.
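The size problem can be seen by examining a few Unicode code points. The Python sketch below reports how many bits each code point requires and how many bytes it occupies in UTF-8. UTF-8 is used here only as one common way of storing Unicode text and is an assumption of this example, as is the helper name `bits_needed`.

```python
def bits_needed(ch: str) -> int:
    """Return the number of bits required to hold the character's code point."""
    return max(1, ord(ch).bit_length())


samples = (
    "A",       # Latin capital letter A, within the ASCII range
    "\u00e9",  # Latin small letter e with acute accent
    "\u65e5",  # CJK ideograph meaning "sun" or "day"
)

for ch in samples:
    code_point = ord(ch)
    stored = ch.encode("utf-8")
    print(f"U+{code_point:04X}", bits_needed(ch), "bits,", len(stored), "bytes in UTF-8")
```

The ideograph requires 15 bits, so it cannot be represented in any single 8-bit code.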
