Handbook_of_statistical_analysis_using_SAS
.pdfThe riskdiff option requests differences in risks (or binomial proportions) and their confidence limits.
The weight statement specifies a variable that contains weights for each observation. It is most commonly used to specify cell counts, as in this example. The default weight is 1, so the weight statement is not required when the data set consists of observations on individuals.
The results are shown in Display 3.8. First, the 2 × 2 table of data is printed, augmented with total frequency, row, and column percentages. A number of statistics calculated from the frequencies in the table are then printed, beginning with the well-known chi-square statistic used to test for the independence of the two variables forming the table. Here, the P-value associated with the chi-square statistic suggests that sex and height are not independent. The likelihood ratio chi-square is an alternative statistic for testing independence (as described in Everitt [1992]). Here, the usual chi-square statistic and the likelihood ratio statistic take very similar values. Next the continuity adjusted chi-square statistic is printed. This involves what is usually known as Yates’s correction, again described in Everitt (1992). The correction is usually suggested as a simple way of dealing with what was once perceived as the problem of unacceptably small frequencies in contingency tables. Currently, as seen later, there are much better ways of dealing with the problem and really no need to ever use Yates’s correction. The Mantel-Haenszel statistic tests the hypothesis of a linear association between the row and column variables. Both should be ordinal numbers. The next three statistics printed in Display 3.8 — namely, the Phi coefficient, Contingency coefficient, and Cramer’s V — are all essentially attempts to quantify the degree of the relationship between the two variables forming the contingency table. They are all described in detail in Everitt (1992).
Following these statistics in Display 3.8 is information on Fisher’s exact test. This is more relevant to the data in Display 3.2 and thus its discussion comes later. Next come the results of estimating a confidence interval for the difference in proportions in the contingency table. Thus, for example, the estimated difference in the proportion of female and male sandflies caught in the 3-ft light traps is 0.0921 (0.6726–0.5805). The standard error of this difference is calculated as:
0.6726-----------------(---1----–-----0.6726----------------)- |
+ |
0.5805-----------------(---1----–-----0.5805----------------)- |
223 |
|
298 |
that is, the value of 0.0425 given in Display 3.8. The confidence interval for the difference in proportions is therefore:
0.0921 ± 1.96 × 0.0425 = (0.0088, 0.1754)
©2002 CRC Press LLC
as given in Display 3.8. The proportion of female sandflies caught in the 3-ft traps is larger than the corresponding proportion for males.
|
The FREQ Procedure |
|
|||
|
Table of sex by height |
|
|||
Sex |
|
|
height |
|
|
Frequency |
|
|
|
|
|
|
|
|
|||
Percent |
|
|
|
|
|
Row Pct |
|
|
|
|
|
Col Pct |
|
|
3 |
|
Total |
|
|
35 |
|||
|
|
|
|
|
|
f |
|
|
150 |
73 |
223 |
|
|
|
28.79 |
14.01 |
42.80 |
|
|
|
67.26 |
32.74 |
|
|
|
|
46.44 |
36.87 |
|
|
|
|
|
|
|
m |
|
|
173 |
125 |
298 |
|
|
|
33.21 |
23.99 |
57.20 |
|
|
|
58.05 |
41.95 |
|
|
|
|
53.56 |
63.13 |
|
|
|
|
|
|
|
Total |
323 |
198 |
521 |
||
|
62.00 |
38.00 |
100.00 |
Statistics for Table of sex by height |
|
||||
Statistic |
DF |
Value |
Prob |
||
|
|
|
|
|
|
Chi-Square |
1 |
4.5930 |
|
0.0321 |
|
Likelihood Ratio Chi-Square |
1 |
4.6231 |
|
0.0315 |
|
Continuity Adj. Chi-Square |
1 |
4.2104 |
|
0.0402 |
|
Mantel-Haenszel Chi-Square |
1 |
4.5842 |
|
0.0323 |
|
Phi Coefficient |
|
0.0939 |
|
|
|
Contingency Coefficient |
|
0.0935 |
|
|
|
Cramer's V |
|
0.0939 |
|
||
|
Fisher's Exact Test |
|
|
|
|
|
|
|
|
|
|
|
Cell (1,1) Frequency (F) |
150 |
|
|
|
|
Left-sided Pr <= F |
|
0.9875 |
|
|
|
Right-sided Pr >= F |
|
0.0199 |
|
|
|
Table Probability (P) |
|
0.0073 |
|
|
|
Two-sided Pr <= P |
|
0.0360 |
|
|
©2002 CRC Press LLC
|
|
Column 1 Risk Estimates |
|
|
||
|
|
|
(Asymptotic) 95% |
(Exact) 95% |
||
|
Risk |
ASE |
Confidence Limits |
Confidence Limits |
||
|
|
|
|
|
|
|
Row 1 |
0.6726 |
0.0314 |
0.6111 |
0.7342 |
0.6068 |
0.7338 |
Row 2 |
0.5805 |
0.0286 |
0.5245 |
0.6366 |
0.5223 |
0.6372 |
Total |
0.6200 |
0.0213 |
0.5783 |
0.6616 |
0.5767 |
0.6618 |
Difference |
0.0921 |
0.0425 |
0.0088 |
0.1754 |
|
|
Difference is (Row 1 – Row 2)
|
|
|
Column 2 Risk Estimates |
|
|
|||
|
|
|
|
(Asymptotic) 95% |
(Exact) 95% |
|||
|
|
Risk |
ASE |
Confidence Limits |
Confidence Limits |
|||
|
|
|
|
|
|
|
|
|
Row 1 |
0 |
.3274 |
0.0314 |
0.2658 |
0 |
.3889 |
0.2662 |
0.3932 |
Row 2 |
0 |
.4195 |
0.0286 |
0.3634 |
0 |
.4755 |
0.3628 |
0.4777 |
Total |
0 |
.3800 |
0.0213 |
0.3384 |
0 |
.4217 |
0.3382 |
0.4233 |
Difference |
-0 |
.0921 |
0.0425 |
-0.1754 |
-0 |
.0088 |
|
|
Difference is (Row 1 - Row 2)
Sample Size = 521
Display 3.8
3.3.3 Acacia Ants
The acacia ants data are also in the form of a contingency table and are read in as four observations representing cell counts.
data ants;
input species $ invaded $ n; cards;
A no 2
A yes 13
©2002 CRC Press LLC
B no 10 B no 3
;
proc freq data=ants;
tables species*invaded / chisq expected; weight n;
run;
In this example, the expected option in the tables statement is used to print expected values under the independence hypothesis for each cell.
The results are shown in Display 3.9. Here, because of the small frequencies in the table, Fisher’s exact test might be the preferred option, although all the tests of independence have very small associated P-values and thus, very clearly, species and invasion are not independent. A higher proportion of ants invaded species A than species B.
|
The FREQ Procedure |
|
||||
Table of species by invaded |
|
|||||
species |
invaded |
|
|
|
|
|
Frequency |
|
|
|
|
|
|
|
|
|
|
|
|
|
Expected |
|
|
|
|
|
|
Percent |
|
|
|
|
|
|
Row Pct |
|
|
|
|
|
|
Col Pct |
no |
|
yes |
|
|
Total |
|
|
|
||||
|
|
|
|
|
|
|
A |
|
2 |
|
13 |
|
15 |
|
8.0357 |
6.9643 |
|
|
||
|
7 |
.14 |
46 |
.43 |
|
53.57 |
|
13 |
.33 |
86 |
.67 |
|
|
|
13 |
.33 |
100 |
.00 |
|
|
|
|
|
|
|
|
|
B |
|
13 |
|
0 |
|
13 |
|
6.9643 |
6.0357 |
|
|
||
|
46 |
.43 |
0 |
.00 |
|
46.43 |
|
100 |
.00 |
0 |
.00 |
|
|
|
86 |
.67 |
0 |
.00 |
|
|
|
|
|
|
|
|
|
Total |
|
15 |
|
13 |
|
28 |
|
53 |
.57 |
46 |
.43 |
|
100.00 |
©2002 CRC Press LLC
Statistics for Table of species by invaded |
|
|||||
Statistic |
DF |
|
Value |
Prob |
||
|
|
|
|
|
|
|
Chi-Square |
1 |
21 |
.0311 |
|
<.0001 |
|
Likelihood Ratio Chi-Square |
1 |
26 |
.8930 |
|
<.0001 |
|
Continuity Adj. Chi-Square |
1 |
17 |
.6910 |
|
<.0001 |
|
Mantel-Haenszel Chi-Square |
1 |
20 |
.2800 |
|
<.0001 |
|
Phi Coefficient |
|
-0.8667 |
|
|
||
Contingency Coefficient |
|
0 |
.6549 |
|
|
|
Cramer's V |
|
-0.8667 |
|
|
||
|
Fisher's Exact Test |
|
|
|
||
|
|
|
|
|
|
|
|
Cell (1,1) Frequency (F) |
|
2 |
|
|
|
|
Left-sided Pr <= F |
|
2.804E-06 |
|
||
|
Right-sided Pr >= F |
|
|
1.0000 |
|
|
|
Table Probability (P) |
|
|
1.0000 |
|
|
|
Two-sided Pr <= P |
|
2.831E-06 |
|
Sample Size = 28
Display 3.9
3.3.4 Piston Rings
Moving on to the piston rings data, they are read in and analysed as follows:
data pistons;
input machine site $ n; cards;
1 North 17
1 Centre 17
©2002 CRC Press LLC
1 South 12
2 North 11
2 Centre 9
2 South 13
3 North 11
3 Centre 8
3 South 19
4 North 14
4 Centre 7
4 South 28
;
proc freq data=pistons order=data;
tables machine*site / chisq deviation cellchi2 norow nocol nopercent;
weight n; run;
The order=data option in the proc statement specifies that the rows and columns of the tables follow the order in which they occur in the data. The default is number order for numeric variables and alphabetical
order for character variables.
The deviation option in the tables statement requests the printing of residuals in the cells, and the cellchi2 option requests that each cell’s
contribution to the overall chi-square be printed. To make it easier to
view the results the, row, column, and overall percentages are suppressed with the norow, nocol, and nopercent options, respectively.
Here, the chi-square test for independence given in Display 3.10 shows only relatively weak evidence of a departure from independence. (The relevant P-value is 0.069). However, the simple residuals (the differences between an observed frequency and that expected under independence) suggest that failures are fewer than might be expected in the South leg of Machine 1 and more than expected in the South leg of Machine 4. (Other types of residuals may be more useful — see Exercise 3.2.)
©2002 CRC Press LLC
The FREQ Procedure |
|
|
|
|
|
|
|
||
|
|
Table of machine by site |
|
|
|||||
|
|
Machine |
|
|
|
Site |
|
|
|
|
|
Frequency |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Deviation |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
||
|
|
Cell Chi-Square |
|
North |
Centre |
|
South |
Total |
|
|
|
|
|
|
|
|
|
|
|
1 |
|
7 |
|
17 |
|
12 |
46 |
||
|
|
|
2 |
.3133 |
5 |
.6386 |
|
-7.952 |
|
|
|
|
0 |
.3644 |
2 |
.7983 |
|
3.1692 |
|
|
|
|
|
|
|
|
|
|
|
2 |
|
11 |
|
9 |
|
13 |
33 |
||
|
|
|
0 |
.4639 |
0 |
.8494 |
|
-1.313 |
|
|
|
|
0 |
.0204 |
0 |
.0885 |
|
0.1205 |
|
|
|
|
|
|
|
|
|
|
|
3 |
|
11 |
|
8 |
|
19 |
38 |
||
|
|
|
-1.133 |
-1.386 |
|
2.5181 |
|
||
|
|
|
0.1057 |
0.2045 |
|
0.3847 |
|
||
|
|
|
|
|
|
|
|
|
|
4 |
|
4 |
|
7 |
|
28 |
49 |
||
|
|
|
-1.645 |
-5.102 |
|
6.747 |
|
||
|
|
|
0.1729 |
2.1512 |
|
2.1419 |
|
||
|
|
|
|
|
|
|
|
|
|
|
|
Total |
|
53 |
|
41 |
|
72 |
166 |
Statistics for Table of machine by site |
|
|||
Statistic |
DF |
|
Value |
Prob |
|
|
|
|
|
Chi-Square |
6 |
11 |
.7223 |
0.0685 |
Likelihood Ratio Chi-Square |
6 |
12 |
.0587 |
0.0607 |
Mantel-Haenszel Chi-Square |
1 |
5 |
.4757 |
0.0193 |
Phi Coefficient |
|
0 |
.2657 |
|
Contingency Coefficient |
|
0 |
.2568 |
|
Cramer's V |
|
|
0.1879 |
|
Sample Size = 166
Display 3.10
3.3.5 Oral Contraceptives
The oral contraceptives data involve matched observations. Consequently, they cannot be analysed with the usual chi-square statistic. Instead, they require application of McNemar’s test, as described in Everitt (1992). The data can be read in and analysed with the following SAS commands:
©2002 CRC Press LLC
data the_pill;
input caseuse $ contruse $ n; cards;
Y Y 10
Y N 57
N Y 13
N N 95
;
proc freq data=the_pill order=data; tables caseuse*contruse / agree; weight n;
run;
The agree option on the tables statement requests measures of agreement, including the one of most interest here, the McNemar test. The results appear in Display 3.11. The test of no association between using oral contraceptives and suffering from blood clots is rejected. The proportion of matched pairs in which the case has used oral contraceptives and the control has not is considerably higher than pairs where the reverse is the case.
|
|
|
The FREQ Procedure |
|
||||
|
|
Table of caseuse by contruse |
|
|||||
|
|
caseuse |
|
contruse |
|
|||
|
|
Frequency |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Percent |
|
|
|
|
|
|
|
|
Row Pct |
|
|
|
|
|
|
|
|
Col Pct |
|
Y |
|
N |
|
Total |
|
|
|
|
|
|
|
|
|
|
|
Y |
|
|
10 |
57 |
|
67 |
|
|
|
|
5 |
.71 |
32.57 |
|
38.29 |
|
|
|
|
14 |
.93 |
85.07 |
|
|
|
|
|
|
43 |
.48 |
37.50 |
|
|
|
|
|
|
|
|
|
|
|
|
|
N |
|
|
13 |
95 |
|
108 |
|
|
|
|
7 |
.43 |
54.29 |
|
61.71 |
|
|
|
|
12 |
.04 |
87.96 |
|
|
|
|
|
|
56 |
.52 |
62.50 |
|
|
|
|
|
|
|
|
|
|
|
|
|
Total |
|
|
23 |
152 |
|
175 |
|
|
|
13 |
.14 |
86.86 |
|
100.00 |
|
|
|
|
|
|
|
|
|
|
©2002 CRC Press LLC
Statistics for Table of caseuse by contruse
McNemar's Test
Statistic (S) |
27.6571 |
DF |
1 |
Pr > S |
<.0001 |
Simple Kappa Coefficient
Kappa |
0 |
.0330 |
|
ASE |
|
0 |
.0612 |
95% |
Lower Conf Limit |
-0 |
.0870 |
95% |
Upper Conf Limit |
0 |
.1530 |
Sample Size = 175
Display 3.11
3.3.6 Oral Cancers
The data on the regional distribution of oral cancers is read in using the following data step:
data lesions;
length region $8.;
input site $ 1-16 n1 n2 n3; region='Keral';
n=n1;
output;
region='Gujarat';
n=n2;
output;
region='Anhara';
n=n3;
output; drop n1-n3; cards;
Buccal Mucosa 8 1 8
Labial Mucosa 0 1 0
©2002 CRC Press LLC
Commissure |
0 |
1 |
0 |
Gingiva |
0 |
1 |
0 |
Hard palate |
0 |
1 |
0 |
Soft palate |
0 |
1 |
0 |
Tongue |
0 |
1 |
0 |
Floor of mouth |
1 |
0 |
1 |
Alveolar ridge |
1 |
0 |
1 |
; |
|
|
|
This data step reads in the values for three cell counts from a single line
of instream data and then creates three separate observations in the output data set. This is achieved using three output statements in the data step. The effect of each output statement is to create an observation in the data set with the data values that are current at that point. First, the input
statement reads a line of data that contains the three cell counts. It uses column input to read the first 16 columns into the site variable, and then
the list input to read the three cell counts into variables n1 to n3. When the first output statement is executed, the region variable has been assigned the value ‘Keral’ and the variable n has been set equal to the first of the three cell counts read in. At the second output statement, the value of region is ‘Gujarat’, and n equals n2, the second cell count, and so on for the third output statement. When restructuring data like this, it is wise to
check the results, either by viewing the resultant data set interactively or using proc print. The SAS log also gives the number of variables and
observations in any data set created and thus can be used to provide a check.
The drop statement excludes the variables mentioned from the lesions data set.
proc freq data=lesions order=data; tables site*region /exact; weight n;
run;
For 2 × 2 tables, Fisher’s exact test is calculated and printed by default.
For larger tables, exact tests must be explicitly requested with the exact option on the tables statement. Here, because of the very sparse nature
of the data, it is likely that the exact approach will differ from the usual chi-square procedure. The results given in Display 3.12 confirm this. The chi-square test has an associated P-value of 0.14, indicating that the hypothesis of independence site and region is acceptable. The exact test has an associated P-value of 0.01, indicating that the site of lesion and
©2002 CRC Press LLC