

describe the Microsoft Windows interface, as this is by far the most popular, although other windowing environments, such as X-windows, are quite similar.

1.2 The Microsoft Windows User Interface

Display 1.1

Display 1.1 shows how SAS version 8 appears running under Windows. When SAS is started, there are five main windows open, namely the Editor, Log, Output, Results, and Explorer windows. In Display 1.1, the Editor, Log, and Explorer windows are visible. The Results window is hidden behind the Explorer window and the Output window is hidden behind the Editor and Log windows.

At the top, below the SAS title bar, is the menu bar. On the line below that is the tool bar with the command bar at its left end. The tool bar consists of buttons that perform frequently used commands. The command bar allows one to type in less frequently used commands. At the bottom, the status line comprises a message area with the current directory and editor cursor position at the right. Double-clicking on the current directory allows it to be changed.

Briefly, the purpose of the main windows is as follows.

©2002 CRC Press LLC

1. Editor: The Editor window is for typing in, editing, and running programs. When a SAS program is run, two types of output are generated: the log and the procedure output, and these are displayed in the Log and Output windows.

2. Log: The Log window shows the SAS statements that have been submitted together with information about the execution of the program, including warning and error messages.

3. Output: The Output window shows the printed results of any procedures. It is here that the results of any statistical analyses are shown.

4. Results: The Results window is effectively a graphical index to the Output window, useful for navigating around large amounts of procedure output. Right-clicking on a procedure, or section of output, allows that portion of the output to be viewed, printed, deleted, or saved to file.

5. Explorer: The Explorer window allows the contents of SAS data sets and libraries to be examined interactively, by double-clicking on them.

When graphical procedures are run, an additional window is opened to display the resulting graphs.

Managing the windows (e.g., moving between windows, resizing them, and rearranging them) can be done with the normal Windows controls, including the Window menu. There is also a row of buttons and tabs at the bottom of the screen that can be used to select a window. If a window has been closed, it can be reopened using the View menu.

To simplify the process of learning to use the SAS interface, we concentrate on the Editor, Log, and Output windows and the most important and useful menu options, and recommend closing the Explorer and Results windows because these are not essential.

1.2.1 The Editor Window

In version 8 of SAS, a new editor was introduced, referred to as the enhanced editor. The older version, known as the program editor, has been retained but is not recommended. Here we describe the enhanced editor and may refer to it simply as “the editor.” If SAS starts up using the program editor rather than the enhanced editor, then from the Tools menu select Options; Preferences then the Edit tab and select the Use Enhanced Editor option*.

* At the time of writing, the enhanced editor was not yet available under X-windows.


The editor is essentially a built-in text editor specifically tailored to the SAS language and with additional facilities for running SAS programs.

Some aspects of the Editor window will be familiar as standard features of Windows applications. The File menu allows programs to be read from a file, saved to a file, or printed. The File menu also contains the command to exit from SAS. The Edit menu contains the usual options for cutting, copying, and pasting text and those for finding and replacing text.

The program currently in the Editor window can be run by choosing the Submit option from the Run menu. The Run menu is specific to the Editor window and will not be available if another window is the active window. Submitting a program may remove it from the Editor window. If so, it can be retrieved by choosing Recall Last Submit from the Run menu.

It is possible to run part of the program in the Editor window by selecting the text and then choosing Submit from the Run menu. With this method, the submitted text is not cleared from the Editor window. When running parts of programs in this way, make sure that a full step has been submitted. The easiest way to do this is to include a run statement as the last statement of the selection.
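As a minimal sketch of a selection that forms a full step, the two lines below could be highlighted and submitted on their own. The data set sashelp.class is an assumption made for illustration (it is a sample data set shipped with SAS, not one used in this chapter); any existing data set would do.

```sas
/* A complete proc step: it begins with a proc statement ...          */
proc print data=sashelp.class;
/* ... and the run statement ends it, so SAS executes it immediately. */
run;
```

If only the first line were selected and submitted, SAS would list it in the log and then wait for the rest of the step.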

The Options submenu within Tools allows the editor to be configured. When the Enhanced Editor window is the active window (View, Enhanced Editor will ensure that it is), Tools; Options; Enhanced Editor Options will open a window similar to that in Display 1.2. The display shows the recommended setup, in particular, that the options for collapsible code sections and automatic indentation are selected, and that Clear text on submit is not.

1.2.2 The Log and Output Windows

The contents of the Log and Output windows cannot be edited; thus, several options of the File and Edit menus are disabled when these windows are active.

The Clear all option in the Edit menu will empty either of these windows. This is useful for obtaining a “clean” printout if a program has been run several times as errors were being corrected.

1.2.3 Other Menus

The View menu is useful for reopening a window that has been closed. The Solutions menu allows access to built-in SAS applications but these are beyond the scope of this text.


Display 1.2

The Help menu tends to become more useful as experience in SAS is gained, although there may be access to some tutorial materials if they have been licensed from SAS. Version 8 of SAS comes with a complete set of documentation on a CD-ROM in a format that can be browsed and searched with an HTML (Web) browser. If this has been installed, it can be accessed through Help; Books and Training; SAS Online Doc.

Context-sensitive help can be invoked with the F1 key. Within the editor, when the cursor is positioned over the name of a SAS procedure, the F1 key brings up the help for that procedure.

1.3 The SAS Language

Learning to use the SAS language is largely a question of learning the statements that are needed to do the analysis required and of knowing how to structure them into steps. There are a few general principles that are useful to know.


Most SAS statements begin with a keyword that identifies the type of statement. (The most important exception is the assignment statement that begins with a variable name.) The enhanced editor recognises keywords as they are typed and changes their colour to blue. If a word remains red, this indicates a problem. The word may have been mistyped or is invalid for some other reason.

1.3.1 All SAS Statements Must End with a Semicolon

The most common mistake for new users is to omit the semicolon and the effect is to combine two statements into one. Sometimes, the result will be a valid statement, albeit one that has unintended results. If the result is not a valid statement, there will be an error message in the SAS log when the program is submitted. However, it may not be obvious that a semicolon has been omitted before the program is run, as the combined statement will usually begin with a valid keyword.

Statements can extend over more than one line and there may be more than one statement per line. However, keeping to one statement per line, as far as possible, helps to avoid errors and to identify those that do occur.
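The effect of a missing semicolon can be sketched as follows. The data set name demo and the two data lines are hypothetical, chosen only for illustration. In the second step, the semicolon after the data statement has been left out, so SAS reads the first two lines as one data statement; this is an instance of the "valid statement with unintended results" case.

```sas
/* Intended program: one statement per line, each ending in a semicolon. */
data demo;
  input age pctfat;
  datalines;
23 28
39 31
;
run;

/* Same program with the first semicolon omitted. SAS now sees the single
   statement   data demo input age pctfat;   which names four data sets
   (demo, input, age, pctfat) rather than declaring two variables, and
   no input statement remains to read the data lines. */
data demo
  input age pctfat;
  datalines;
23 28
39 31
;
run;
```

Because the combined statement still begins with the valid keyword data, nothing looks wrong in the editor; the notes in the log are what reveal the problem.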

SAS statements fall into four broad categories according to where in a program they can be used. These are

1. Data step statements

2. Proc step statements

3. Statements that can be used in both data and proc steps

4. Global statements that apply to all following steps

Because the functions of the data and proc steps are so different, it is perhaps not surprising that many statements are only applicable to one type of step.

1.3.2 Program Steps

Data and proc steps begin with a data or proc statement, respectively, and end at the next data or proc statement, or the next run statement. When a data step has the data included within it, the step ends after the data. Understanding where steps begin and end is important because SAS programs are executed in whole steps. If an incomplete step is submitted, it will not be executed. The statements that were submitted will be listed in the log, but SAS will appear to have stopped at that point without explanation. In fact, SAS will simply be waiting for the step to be completed before running it. For this reason it is good practice to explicitly mark the end of each step by inserting a run statement and especially important to include one as the last statement in the program.
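The rules for where steps end can be illustrated with a short sketch. The data set small and its variables x and y are hypothetical names used only for this example; the program deliberately ends each of its three steps in a different way.

```sas
/* Step 1: a data step with in-line data. The step ends after the data. */
data small;
  input x y;
  datalines;
1 2
3 4
;

/* Step 2: a proc step with no run statement of its own.
   It ends at the next proc statement. */
proc print data=small;

/* Step 3: the last step in the program. Without the final run statement,
   SAS would wait for further statements instead of executing this step. */
proc means data=small;
run;
```

In practice, adding a run statement after every step, as recommended above, makes the boundaries explicit and avoids relying on the following statement to close a step.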

The enhanced editor offers several visual indicators of the beginning and end of steps. The data, proc, and run keywords are colour-coded in navy blue, rather than the standard blue used for other keywords. If the enhanced editor options for collapsible code sections have been selected as shown in Display 1.2, each data and proc step will be separated by lines in the text and indicated by brackets in the margin. This gives the appearance of enclosing each data and proc step in its own box.

Data step statements must be within the relevant data step, that is, after the data statement and before the end of the step. Likewise, proc step statements must be within the proc step.

Global statements can be placed anywhere. If they are placed within a step, they will apply to that step and all subsequent steps until reset. A simple example of a global statement is the title statement, which defines a title for procedure output and graphs. The title is then used until changed or reset.
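A brief sketch of how a title statement carries over from step to step is given below. The data set small and the title text are hypothetical and serve only as an illustration.

```sas
data small;
  input x y;
  datalines;
1 2
3 4
;
run;

/* A global statement placed between steps: it applies to all following steps. */
title 'Listing of the small data set';

proc print data=small;
run;

/* The same title is still in effect for this step. */
proc means data=small;
run;

/* A title statement with no text cancels the title for subsequent steps. */
title;

proc print data=small;
run;
```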

1.3.3 Variable Names and Data Set Names

In writing a SAS program, names must be given to variables and data sets. These can contain letters, numbers, and underscore characters, and can be up to 32 characters in length but cannot begin with a number. (Prior to version 7 of SAS, the maximum length was eight characters.) Variable names can be in upper or lower case, or a mixture, but changes in case are ignored. Thus Height, height, and HEIGHT would all refer to the same variable.
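The naming rules can be tried out with a small sketch such as the following, in which the data set name and all variable names are hypothetical.

```sas
data names_demo;
  Height = 170;                 /* mixed case is allowed                   */
  weight_kg = 65;               /* letters, digits, and underscores        */
  _ratio = HEIGHT / weight_kg;  /* HEIGHT is the same variable as Height   */
  /* A name such as 2ndvisit would be rejected: names cannot begin with a number. */
run;

proc print data=names_demo;
run;
```

The resulting data set contains three variables, not four, because Height and HEIGHT are the same name as far as SAS is concerned.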

1.3.4 Variable Lists

When a list of variable names is needed in a SAS program, an abbreviated form can often be used. A variable list of the form sex--weight (two hyphens) refers to the variables sex and weight and all the variables positioned between them in the data set. A second form of variable list can be used where a set of variables have names of the form score1, score2, … score10. That is, there are ten variables with the root score in common and ending in the digits 1 to 10. In this case, they can be referred to by the variable list score1-score10 (a single hyphen) and do not need to be contiguous in the data set.
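Both forms of variable list are sketched below. The data set scores, its variables, and the two lines of data are invented for illustration only.

```sas
data scores;
  /* The numbered list score1-score10 also works in an input statement. */
  input sex $ age weight score1-score10;
  datalines;
F 23 60 1 2 3 4 5 6 7 8 9 10
M 31 80 10 9 8 7 6 5 4 3 2 1
;
run;

/* Double hyphen: sex, weight, and every variable positioned between
   them in the data set (here, age). */
proc print data=scores;
  var sex--weight;
run;

/* Single hyphen: the ten variables score1 to score10, selected by name. */
proc means data=scores;
  var score1-score10;
run;
```

The first form depends on the order in which the variables were created, whereas the second depends only on their names.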

Before looking at the SAS language in more detail, the short example shown in Display 1.3 can be used to illustrate some of the preceding material. The data are adapted from Table 17 of A Handbook of Small Data Sets (SDS) and show the age and percentage body fat for 14 women. Display 1.4 shows how the example appears in the Editor window. The Results and Explorer windows have been closed and the Editor window maximized. The program consists of three steps: a data step followed by two proc steps. Submitting this program results in the log and procedure output shown in Displays 1.5 and 1.6, respectively.

From the log one can see that the program has been split into steps and each step run separately. Notes on how the step ran follow the statements that comprise the step. Although notes are for information only, it is important to check them. For example, it is worth checking that the notes for a data step report the expected number of observations and variables. The log may also contain warning messages, which should always be checked, as well as error messages.

The reason the log refers to the SAS data set as WORK.BODYFAT rather than simply bodyfat is explained later.

data bodyfat;
 Input age pctfat;
datalines;
23 28
39 31
41 26
49 25
50 31
53 35
53 42
54 29
56 33
57 30
58 33
58 34
60 41
61 34
;
proc print data=bodyfat;
run;
proc corr data=bodyfat;
run;

Display 1.3


Display 1.4

1    data bodyfat;
2    Input age pctfat;
3    datalines;

NOTE: The data set WORK.BODYFAT has 14 observations and 2 variables.
NOTE: DATA statement used:
      real time           0.59 seconds

18   ;
19   proc print data=bodyfat;
20   run;

NOTE: There were 14 observations read from the data set WORK.BODYFAT.
NOTE: PROCEDURE PRINT used:
      real time           0.38 seconds

21   proc corr data=bodyfat;
22   run;

NOTE: There were 14 observations read from the data set WORK.BODYFAT.
NOTE: PROCEDURE CORR used:
      real time           0.33 seconds

Display 1.5

                      The SAS System                      1

                   Obs    age    pctfat

                     1     23      28
                     2     39      31
                     3     41      26
                     4     49      25
                     5     50      31
                     6     53      35
                     7     53      42
                     8     54      29
                     9     56      33
                    10     57      30
                    11     58      33
                    12     58      34
                    13     60      41
                    14     61      34

                      The SAS System                      2

                    The CORR Procedure

                 2 Variables:  age  pctfat

                     Simple Statistics

Variable    N       Mean    Std Dev        Sum    Minimum    Maximum
age        14   50.85714   10.33930  712.00000   23.00000   61.00000
pctfat     14   32.28571    4.92136  452.00000   25.00000   42.00000

          Pearson Correlation Coefficients, N = 14
                 Prob > |r| under H0: Rho=0

                         age        pctfat

          age        1.00000       0.50125
                                    0.0679

          pctfat     0.50125       1.00000
                      0.0679

Display 1.6


1.4 The Data Step

Before data can be analysed in SAS, they need to be read into a SAS data set. Creating a SAS data set for subsequent analysis is the primary function of the data step. The data can be “raw” data or come from a previously created SAS data set. A data step is also used to manipulate or reorganise the data. This can range from relatively simple operations (e.g., transforming variables) to more complex restructuring of the data. In many practical situations, organising and preprocessing the data takes up a large portion of the overall time and effort. The power and flexibility of SAS for such data manipulation is one of its great strengths.

We begin by describing how to create SAS data sets from raw data and store them on disk before turning to data manipulation. Each of the subsequent chapters includes the data step used to prepare the data for analysis and several of them illustrate features not described in this chapter.

1.4.1 Creating SAS Data Sets from Raw Data*

Display 1.7 shows some hypothetical data on members of a slimming club, giving the membership number, team, starting weight, and current weight. Assuming these are in the file wgtclub1.dat, the following data step could be used to create a SAS data set.

data wghtclub;
 infile 'n:\handbook2\datasets\wgtclub1.dat';
 input idno team $ startweight weightnow;
run;

 

1023  red     189  165
1049  yellow  145  124
1219  red     210  192
1246  yellow  194  177
1078  red     127  118
1221  yellow  220  .
1095  blue    135  127
1157  green   155  141

Display 1.7

*A “raw” data file can also be referred to as a text file, or ASCII file. Such files only include the printable characters plus tabs, spaces, and end-of-line characters. The files produced by database programs, spreadsheets, and word processors are not normally “raw” data, although such programs usually have the ability to “export” their data to such a file.
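For readers who do not have the file wgtclub1.dat to hand, the following variant is a sketch that reads the same eight records from in-line data with a datalines statement instead of an infile statement. This substitution is made here only so that the example is self-contained; it is not the form used in the text. The dollar sign after team tells SAS that team is a character variable, and the single period in the sixth record is read as a missing value for weightnow.

```sas
data wghtclub;
  /* Same input statement as in the text; only the data source differs. */
  input idno team $ startweight weightnow;
  datalines;
1023 red 189 165
1049 yellow 145 124
1219 red 210 192
1246 yellow 194 177
1078 red 127 118
1221 yellow 220 .
1095 blue 135 127
1157 green 155 141
;
run;

/* List the data set to check that it has been read as expected. */
proc print data=wghtclub;
run;
```

As with the body fat example, it is worth checking the log to confirm that the data set has the expected number of observations and variables (here, 8 and 4).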
