Microcontroller Programming. Thi Micro Chip PIC (Julio Sanchez, 2007).pdf

Chapter 3

Data Types and Data Storage

In this chapter we review the various encodings and formats used for representing character and numeric data in digital systems. The character formats are used for encoding the letters, symbols, and control codes of the various alphabets. The numeric formats allow representing binary numbers as signed and unsigned integers in several forms, binary floating-point numbers, and decimal floating-point numbers, usually called binary-coded decimals or BCD.

3.0 Electronic-Digital Machines

The mechanization of arithmetic is often traced back to the abacus, slide rule, mechanical calculators, and punch card machines. The work of John von Neumann at the Institute for Advanced Study in Princeton marks the first highlight in the design and construction of a digital-electronic calculating machine. In von Neumann’s design, data and instructions are stored in a common memory area. An alternative approach, known as Harvard architecture, was discarded at first but has recently been re-validated and is in use in several microcontroller families.

The calculating power of the first computer was approximately 2000 operations per second, while previous electro-mechanical devices were capable of performing only 3 or 4 operations per second. Today’s digital machines can execute more than 1 billion instructions per second. Technological advances and miniaturization techniques have reduced the cost and size of computing machinery.

3.1 Character Representations

Over the years, data representation issues have often been determined by the various conventions used by the different hardware manufacturers. Machines have had different word lengths and different character sets and have used various schemes for storing characters and data. Fortunately, in microprocessor and microcontroller design, the encoding of character data has not been subject to major disagreements.

Historically, the methods used to represent characters have varied widely, but the basic approach has always been to choose a fixed number of bits and then map the various bit combinations to the various characters. Clearly, the number of bits of the storage format limits the total number of distinct characters that can be represented. In this manner, the 6-bit codes used on a number of earlier computing machines allow representing 64 characters. This range accommodates the uppercase letters, the decimal digits, and some special characters, but not the lowercase letters.
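The limit mentioned above is simply 2 raised to the number of bits in the code. The following is a minimal C sketch of that relation; the function name `code_capacity` is an illustrative assumption and does not come from the text.

```c
#include <stdint.h>

/* Number of distinct characters that an n-bit code can represent: 2^n.
   Valid for widths 0 through 31. The name is illustrative only. */
uint32_t code_capacity(unsigned bits)
{
    return (uint32_t)1 << bits;
}
```

With this sketch, `code_capacity(6)` yields 64, while `code_capacity(7)` and `code_capacity(8)` yield 128 and 256, the sizes of the 7- and 8-bit codes discussed next.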

Computer manufacturers that used the 6-bit format often argued that their customers had no need for lower-case letters. Nowadays 7- and 8-bit codes that allow representing the lower-case letters have been adopted almost universally.

Most of the world (except IBM) has standardized character representations by using the ISO (International Organization for Standardization) code. ISO exists in several national variants; the one used in the United States is called ASCII, which stands for American Standard Code for Information Interchange. All microcomputers and microcontrollers use ASCII as the code for character representation.

3.1.1 ASCII

ASCII is a character encoding based on the English alphabet. ASCII was first published as a standard in 1963, underwent a major revision in 1967, and was last updated in 1986. Thirty-three codes (0 through 31, plus 127), referred to as non-printing codes, are mostly obsolete control characters. The remaining 95 printable characters (codes 32 through 126, starting with the space character) include the common characters found on a standard keyboard, the decimal digits, and the upper- and lower-case characters of the English alphabet. Table 3.1 lists the ASCII characters in decimal, hexadecimal, and binary.

Table 3.1

ASCII Character Representation

DECIMAL  HEX  BINARY    VALUE
000      000  00000000  NUL  (Null character)
001      001  00000001  SOH  (Start of Header)
002      002  00000010  STX  (Start of Text)
003      003  00000011  ETX  (End of Text)
004      004  00000100  EOT  (End of Transmission)
005      005  00000101  ENQ  (Enquiry)
006      006  00000110  ACK  (Acknowledgment)
007      007  00000111  BEL  (Bell)
008      008  00001000  BS   (Backspace)
009      009  00001001  HT   (Horizontal Tab)
010      00A  00001010  LF   (Line Feed)
011      00B  00001011  VT   (Vertical Tab)
012      00C  00001100  FF   (Form Feed)
013      00D  00001101  CR   (Carriage Return)
014      00E  00001110  SO   (Shift Out)
015      00F  00001111  SI   (Shift In)
016      010  00010000  DLE  (Data Link Escape)
017      011  00010001  DC1  (XON)(Device Control 1)
018      012  00010010  DC2  (Device Control 2)
019      013  00010011  DC3  (XOFF)(Device Control 3)
020      014  00010100  DC4  (Device Control 4)
021      015  00010101  NAK  (Negative Acknowledge)
022      016  00010110  SYN  (Synchronous Idle)
023      017  00010111  ETB  (End of Trans. Block)
024      018  00011000  CAN  (Cancel)
025      019  00011001  EM   (End of Medium)
026      01A  00011010  SUB  (Substitute)
027      01B  00011011  ESC  (Escape)
028      01C  00011100  FS   (File Separator)
029      01D  00011101  GS   (Group Separator)
030      01E  00011110  RS   (Record Separator)
031      01F  00011111  US   (Unit Separator)
032      020  00100000  SP   (Space)
033      021  00100001  !    (exclamation mark)
034      022  00100010  "    (double quote)
035      023  00100011  #    (number sign)
036      024  00100100  $    (dollar sign)
037      025  00100101  %    (percent)
038      026  00100110  &    (ampersand)
039      027  00100111  '    (single quote)
040      028  00101000  (    (left/opening parenthesis)
041      029  00101001  )    (right/closing parenthesis)
042      02A  00101010  *    (asterisk)
043      02B  00101011  +    (plus)
044      02C  00101100  ,    (comma)
045      02D  00101101  -    (minus or dash)
046      02E  00101110  .    (dot)
047      02F  00101111  /    (forward slash)
048      030  00110000  0    (decimal digits ...)
049      031  00110001  1
050      032  00110010  2
051      033  00110011  3
052      034  00110100  4
053      035  00110101  5
054      036  00110110  6
055      037  00110111  7
056      038  00111000  8
057      039  00111001  9
058      03A  00111010  :    (colon)
059      03B  00111011  ;    (semi-colon)
060      03C  00111100  <    (less than)
061      03D  00111101  =    (equal sign)
062      03E  00111110  >    (greater than)
063      03F  00111111  ?    (question mark)
064      040  01000000  @    (AT symbol)
065      041  01000001  A
066      042  01000010  B
067      043  01000011  C
. . .
090      05A  01011010  Z
091      05B  01011011  [    (left/opening bracket)
092      05C  01011100  \    (back slash)
093      05D  01011101  ]    (right/closing bracket)
094      05E  01011110  ^    (circumflex)
095      05F  01011111  _    (underscore)
096      060  01100000  `    (accent)
097      061  01100001  a
098      062  01100010  b
099      063  01100011  c
. . .
122      07A  01111010  z
123      07B  01111011  {    (left/opening brace)
124      07C  01111100  |    (vertical bar)
125      07D  01111101  }    (right/closing brace)
126      07E  01111110  ~    (tilde)
127      07F  01111111  DEL  (delete)
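The layout of the table makes several common operations simple bit or range tests: the control codes occupy 0 through 31 plus 127, the digits start at hexadecimal 30, and each lowercase letter differs from its uppercase counterpart only in bit 5 (hexadecimal 20). The following C sketch illustrates these properties under the assumption that the input is a 7-bit ASCII code; all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Codes 0 through 31 and code 127 (DEL) are the non-printing codes. */
bool ascii_is_control(uint8_t c)
{
    return c < 0x20 || c == 0x7F;
}

/* Codes 32 (space) through 126 (tilde) are the printable characters. */
bool ascii_is_printable(uint8_t c)
{
    return c >= 0x20 && c <= 0x7E;
}

/* The digits '0'..'9' occupy 0x30..0x39, so subtracting 0x30 gives
   the numeric value. Returns -1 when c is not a decimal digit. */
int ascii_digit_value(uint8_t c)
{
    return (c >= 0x30 && c <= 0x39) ? (c - 0x30) : -1;
}

/* 'a'..'z' (0x61..0x7A) map to 'A'..'Z' (0x41..0x5A) by clearing
   bit 5. Any other code is returned unchanged. */
uint8_t ascii_to_upper(uint8_t c)
{
    return (c >= 0x61 && c <= 0x7A) ? (uint8_t)(c & ~0x20) : c;
}
```

For instance, `ascii_digit_value(0x37)` gives 7 and `ascii_to_upper(0x7A)` gives hexadecimal 5A, the code for Z.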

3.1.2 EBCDIC and IBM

In spite of ASCII’s general acceptance, IBM continues to use EBCDIC (Extended Binary Coded Decimal Interchange Code) for character encoding. IBM mainframes and midrange systems such as the AS/400 use a wholly incompatible character set primarily designed for punched cards.

EBCDIC uses the full eight bits available to it, so there is no place left to implement parity checking. On the other hand, EBCDIC has a wider range of control characters than ASCII.
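To see why the spare bit matters, consider how a 7-bit ASCII code can carry a parity bit in bit 7 of the byte, something an 8-bit EBCDIC code has no room for. The C sketch below assumes even parity and uses hypothetical function names; it is one possible arrangement, not a scheme prescribed by the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the 7-bit ASCII code with bit 7 set or cleared so that the
   total number of 1 bits in the byte is even (even parity). */
uint8_t ascii_add_even_parity(uint8_t code)
{
    uint8_t data = code & 0x7F;   /* keep only the seven data bits */
    uint8_t ones = 0;
    for (uint8_t bits = data; bits != 0; bits >>= 1) {
        ones += bits & 1u;        /* count the 1 bits */
    }
    return (ones & 1u) ? (uint8_t)(data | 0x80) : data;
}

/* A received byte passes the check when its count of 1 bits is even. */
bool ascii_parity_ok(uint8_t byte)
{
    uint8_t ones = 0;
    for (; byte != 0; byte >>= 1) {
        ones += byte & 1u;
    }
    return (ones & 1u) == 0;
}
```

The letter A (hexadecimal 41) already has an even number of 1 bits and is sent unchanged, while C (hexadecimal 43) has three and is sent as hexadecimal C3.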

EBCDIC character encoding is based on Binary Coded Decimal (BCD), which we discuss later in this chapter. There are four main blocks in the EBCDIC code page:

1. The range 0000 0000 to 0011 1111 is reserved for control characters.

2. The range 0100 0000 to 0111 1111 is for punctuation.

3. The range 1000 0000 to 1011 1111 is for lowercase characters.

4. The range 1100 0000 to 1111 1111 is for uppercase characters and numbers.
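Since each of the four ranges spans 64 codes, the two high-order bits of an EBCDIC code select its block. A minimal C sketch of that observation follows; the type and function names are hypothetical, and the function reports only the range a code falls in, not whether the code is actually assigned to a character.

```c
#include <stdint.h>

/* The four main EBCDIC blocks, declared in range order so that the
   enumerator values 0..3 match the two high-order bits of the code. */
typedef enum {
    EBCDIC_CONTROL,      /* 0000 0000 to 0011 1111 */
    EBCDIC_PUNCTUATION,  /* 0100 0000 to 0111 1111 */
    EBCDIC_LOWERCASE,    /* 1000 0000 to 1011 1111 */
    EBCDIC_UPPER_DIGITS  /* 1100 0000 to 1111 1111 */
} ebcdic_block;

/* Classify an 8-bit code by the block its range belongs to. */
ebcdic_block ebcdic_block_of(uint8_t code)
{
    return (ebcdic_block)(code >> 6);
}
```

For example, `ebcdic_block_of(0x81)` returns `EBCDIC_LOWERCASE` and `ebcdic_block_of(0xF0)` returns `EBCDIC_UPPER_DIGITS`.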

Actually, microprocessor and microcontroller design need not address how character data is encoded. Usually a set of instructions allows manipulating 8-bit quantities, but the processor need not be concerned with what the encodings represent. On the other hand, some mainframe processors do have instructions that manipulate character codes. For example, the EDIT instruction on the IBM 370 implements the kind of picture conversion that appears in COBOL programs.

3.1.3 Unicode

One of the limitations of the ASCII code is that its seven bits, or even the eight bits of a full byte, are not enough for representing the character sets of languages such as Japanese or Chinese, which use thousands of characters. This has led to the development of encodings that allow representing large character sets. Unicode has been proposed as a universal character encoding standard that can be used for representation of text for computer processing.