values of drug and biofeed. Because six lines of data will be read, one per iteration of the data step, _n_ will increment from 1 to 6, corresponding to the line of data read with the input statement.

The key elements in splitting the one line of data into separate observations are the array, the do loop, and the output statement. The array statement defines an array by specifying the name of the array (nall here), the number of variables to be included (in braces), and the list of variables to be included (n1 to n12 in this case).

In SAS, an array is a shorthand way of referring to a group of variables. In effect, it provides aliases for them so that each variable can be referred to using the name of the array and its position within the array in braces. For example, in this data step, n12 could be referred to as nall{12} or, when the variable i has the value 12, as nall{i}. However, the array only lasts for the duration of the data step in which it is defined.

The main purpose of an iterative do loop, like the one used here, is to repeat the statements between the do and the end a fixed number of times, with an index variable changing at each repetition. When used to process each of the variables in an array, the do loop should start with the index variable equal to 1 and end when it equals the number of variables in the array.

Within the do loop, in this example, the index variable i is first used to set the appropriate values for diet. Then a variable for the blood pressure reading (bp) is assigned one of the 12 values input. A character variable (cell) is formed by concatenating the values of the drug, biofeed, and diet variables. The double bar operator (||) concatenates character values.

The output statement writes an observation to the output data set with the current value of all variables. An output statement is not normally necessary because, without it, an observation is automatically written out at the end of the data step. Putting an output statement within the do loop results in 12 observations being written to the data set for each line of data read.

Finally, the drop statement excludes the index variable i and n1 to n12 from the output data set because they are no longer needed.

As with any relatively complex data manipulation, it is wise to check that the results are as they should be, for example, by using proc print.
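The array, do loop, and output pattern described above can be tried on a toy example. The following is only a minimal sketch, not the data step used for the hypertension data: the data set name demo, the variables grp, half, and y, and the values in the datalines are all hypothetical and chosen purely for illustration.

```sas
data demo;
   length grp $ 1 half $ 1;     /* fix character lengths so the concatenation has no padding */
   input grp $ n1-n4;           /* each input line holds four readings (hypothetical values) */
   array nall {4} n1-n4;        /* nall{i} is an alias for n1, ..., n4 */
   do i = 1 to 4;
      if i > 2 then half = 'Y'; /* positions 3 and 4 belong to the second half of the line */
      else half = 'N';
      y = nall{i};              /* pick out the i-th reading */
      cell = grp || half;       /* concatenate two character values */
      output;                   /* write one observation per array element */
   end;
   drop i n1-n4;                /* index and wide variables are no longer needed */
   datalines;
A 10 11 12 13
B 20 21 22 23
;
run;

proc print data=demo;
run;
```

With two input lines and four array elements, the explicit output statement inside the loop should leave eight observations in demo, which the proc print step makes easy to check.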

To begin the analysis, it is helpful to look at some summary statistics for each of the cells in the design.

proc tabulate data=hyper;
   class drug diet biofeed;
   var bp;
   table drug*diet*biofeed, bp*(mean std n);
run;

The tabulate procedure is useful for displaying descriptive statistics in a concise tabular form. The variables used in the table must first be declared in either a class statement or a var statement. Class variables are those used to divide the observations into groups. Those declared in the var statement (analysis variables) are those for which descriptive statistics are to be calculated. The first part of the table statement up to the comma specifies how the rows of the table are to be formed, and the remaining part specifies the columns. In this example, the rows comprise a hierarchical grouping of biofeed within diet within drug. The columns comprise the blood pressure mean and standard deviation and cell count for each of the groups. The resulting table is shown in Display 5.2. The differences between the standard deviations seen in this display may have implications for the analysis of variance of these data because one of the assumptions made is that observations in each cell come from populations with the same variance.


                                  bp
                        Mean      Std       N
drug  diet  biofeed
X     N     A         188.00    10.86    6.00
            P         168.00     8.60    6.00
      Y     A         173.00     9.80    6.00
            P         169.00    14.82    6.00
Y     N     A         200.00    10.08    6.00
            P         204.00    12.68    6.00
      Y     A         187.00    14.01    6.00
            P         172.00    10.94    6.00
Z     N     A         209.00    14.35    6.00
            P         189.00    12.62    6.00
      Y     A         182.00    17.11    6.00
            P         173.00    11.66    6.00

Display 5.2
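To see how the comma in the table statement separates rows from columns, the same statistics can be laid out differently. The sketch below is an illustrative variation rather than part of the original analysis; it assumes the hyper data set created earlier and moves biofeed from the row dimension to the column dimension.

```sas
proc tabulate data=hyper;
   class drug diet biofeed;
   var bp;
   /* rows: diet within drug; columns: mean, std and n of bp within each biofeed level */
   table drug*diet, biofeed*bp*(mean std n);
run;
```

This layout places the two biofeedback conditions side by side for each drug and diet combination, which some readers may find easier to scan than the fully nested rows.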

There are various ways in which the homogeneity of variance assumption can be tested. Here, the hovtest option of the anova procedure is used to apply Levene’s test (Levene [1960]). The cell variable calculated above, which has 12 levels corresponding to the 12 cells of the design, is used:

proc anova data=hyper;
   class cell;
   model bp=cell;
   means cell / hovtest;
run;

The results are shown in Display 5.3. Concentrating on the results of Levene’s test given in this display, we see that there is no formal evidence of heterogeneity of variance, despite the rather different observed standard deviations noted in Display 5.2.

The ANOVA Procedure

Class Level Information

Class   Levels   Values
cell        12   XAN XAY XPN XPY YAN YAY YPN YPY ZAN ZAY ZPN ZPY

Number of observations   72

The ANOVA Procedure

Dependent Variable: bp

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model              11   13194.00000     1199.45455      7.66   <.0001
Error              60    9400.00000      156.66667
Corrected Total    71   22594.00000

R-Square   Coeff Var   Root MSE    bp Mean
0.583960    6.784095   12.51666   184.5000

Source   DF      Anova SS   Mean Square   F Value   Pr > F
cell     11   13194.00000    1199.45455      7.66   <.0001

The ANOVA Procedure

Levene's Test for Homogeneity of bp Variance
ANOVA of Squared Deviations from Group Means

                 Sum of       Mean
Source   DF     Squares     Square   F Value   Pr > F
cell     11      180715    16428.6      1.01   0.4452
Error    60      971799    16196.6

The ANOVA Procedure

Level of        --------------bp-------------
cell      N           Mean        Std Dev
XAN       6     188.000000     10.8627805
XAY       6     173.000000      9.7979590
XPN       6     168.000000      8.6023253
XPY       6     169.000000     14.8189068
YAN       6     200.000000     10.0796825
YAY       6     187.000000     14.0142784
YPN       6     204.000000     12.6806940
YPY       6     172.000000     10.9361785
ZAN       6     209.000000     14.3527001
ZAY       6     182.000000     17.1113997
ZPN       6     189.000000     12.6174482
ZPY       6     173.000000     11.6619038

Display 5.3
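The heading "ANOVA of Squared Deviations from Group Means" in Display 5.3 reflects the default form of Levene's test. As a hedged sketch that goes beyond the analysis in the text, other variants can be requested through the hovtest option; the example assumes the same hyper data set and one-way model in cell.

```sas
proc anova data=hyper;
   class cell;
   model bp=cell;
   means cell / hovtest=levene(type=abs);  /* Levene's test on absolute deviations */
   means cell / hovtest=bf;                /* Brown-Forsythe test: deviations from group medians */
run;
quit;
```

If these variants agree with the default test, that adds some reassurance that the conclusion about homogeneity of variance does not depend on the particular form of the test chosen.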

To apply the model specified in Eq. (5.1) to the hypertension data, proc anova can now be used as follows:

proc anova data=hyper;
   class diet drug biofeed;
   model bp=diet|drug|biofeed;
   means diet*drug*biofeed;
   ods output means=outmeans;
run;

The anova procedure is specifically for balanced designs, that is, those with the same number of observations in each cell. (Unbalanced designs should be analysed using proc glm, as illustrated in a subsequent chapter.) The class statement specifies the classification variables, or factors. These may be numeric or character variables. The model statement specifies the dependent variable on the left-hand side of the equation and the effects (i.e., factors and their interactions) on the right-hand side of the equation. Main effects are specified by including the variable name and interactions by joining the variable names with an asterisk. Joining variable names with a bar is a shorthand way of specifying an interaction and all the lower-order interactions and main effects implied by it. Thus, the model statement above is equivalent to:

model bp=diet drug diet*drug biofeed diet*biofeed drug*biofeed diet*drug*biofeed;

The order of the effects is determined by the expansion of the bar operator from left to right.
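As a purely syntactic illustration of the bar notation (not a model recommended for these data), the expansion can be capped with the @ operator. The sketch below assumes the same hyper data set; @2 limits the expansion to main effects and two-factor interactions, so the model is equivalent to bp=diet drug diet*drug biofeed diet*biofeed drug*biofeed.

```sas
proc anova data=hyper;
   class diet drug biofeed;
   /* @2 stops the bar expansion at two-factor terms, omitting diet*drug*biofeed */
   model bp=diet|drug|biofeed@2;
run;
quit;
```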

The means statement generates a table of cell means and the ods output statement specifies that this is to be saved in a SAS data set called outmeans.
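The name used on the left of means=outmeans is the name of an ODS output object. One way to discover such names, sketched here on the assumption that the same hyper data set is available, is to switch on ods trace before running the procedure; the names of the output objects are then written to the log.

```sas
ods trace on;                 /* list each output object in the log as it is created */
proc anova data=hyper;
   class diet drug biofeed;
   model bp=diet|drug|biofeed;
   means diet*drug*biofeed;
run;
quit;
ods trace off;                /* stop tracing */
```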

The results are shown in Display 5.4. Here, it is the analysis of variance table that is of most interest. The diet, biofeed, and drug main effects are all significant beyond the 5% level. None of the first-order interactions are significant, but the three-way, second-order interaction of diet, drug, and biofeedback is significant. Just what does such an effect imply, and what are its implications for interpreting the analysis of variance results?

First, a significant second-order interaction implies that the first-order interaction between two of the variables differs in form or magnitude in the different levels of the remaining variable. Second, the presence of a significant second-order interaction means that there is little point in drawing conclusions about either the non-significant first-order interactions or the significant main effects. The effect of drug, for example, is not consistent for all combinations of diet and biofeedback. It would therefore be potentially misleading to conclude, on the basis of the significant main effect, anything about the specific effects of these three drugs on blood pressure.

The ANOVA Procedure

Class Level Information

Class     Levels   Values
diet           2   N Y
drug           3   X Y Z
biofeed        2   A P

Number of observations   72

The ANOVA Procedure

Dependent Variable: bp

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model              11   13194.00000     1199.45455      7.66   <.0001
Error              60    9400.00000      156.66667
Corrected Total    71   22594.00000

R-Square   Coeff Var   Root MSE    bp Mean
0.583960    6.784095   12.51666   184.5000

Source               DF      Anova SS   Mean Square   F Value   Pr > F
diet                  1   5202.000000   5202.000000     33.20   <.0001
drug                  2   3675.000000   1837.500000     11.73   <.0001
diet*drug             2    903.000000    451.500000      2.88   0.0638
biofeed               1   2048.000000   2048.000000     13.07   0.0006
diet*biofeed          1     32.000000     32.000000      0.20   0.6529
drug*biofeed          2    259.000000    129.500000      0.83   0.4425
diet*drug*biofeed     2   1075.000000    537.500000      3.43   0.0388

The ANOVA Procedure

Level of   Level of   Level of        --------------bp-------------
diet       drug       biofeed     N          Mean        Std Dev
N          X          A           6    188.000000     10.8627805
N          X          P           6    168.000000      8.6023253
N          Y          A           6    200.000000     10.0796825
N          Y          P           6    204.000000     12.6806940
N          Z          A           6    209.000000     14.3527001
N          Z          P           6    189.000000     12.6174482
Y          X          A           6    173.000000      9.7979590
Y          X          P           6    169.000000     14.8189068
Y          Y          A           6    187.000000     14.0142784
Y          Y          P           6    172.000000     10.9361785
Y          Z          A           6    182.000000     17.1113997
Y          Z          P           6    173.000000     11.6619038

Display 5.4
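Each F value in Display 5.4 is the effect mean square divided by the error mean square, referred to an F distribution with the effect and error degrees of freedom. The short data step below is a sketch for checking two rows of the table by hand; the variable names are illustrative, and the numbers are copied from the display.

```sas
data _null_;
   mse    = 9400 / 60;                  /* error mean square, 156.66667 in the display */
   f_3way = 537.5 / mse;                /* diet*drug*biofeed: mean square / error mean square */
   p_3way = 1 - probf(f_3way, 2, 60);   /* upper tail of F on 2 and 60 degrees of freedom */
   f_diet = 5202 / mse;                 /* diet main effect */
   p_diet = 1 - probf(f_diet, 1, 60);   /* upper tail of F on 1 and 60 degrees of freedom */
   put f_3way= 6.2 p_3way= 6.4 f_diet= 6.2 p_diet= 6.4;
run;
```

The values written to the log should agree with the corresponding F Value and Pr > F entries in the display, up to rounding.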

Understanding the meaning of the significant second-order interaction is facilitated by plotting some simple graphs. Here, the interaction plot of diet and biofeedback separately for each drug will help.

The cell means in the outmeans data set are used to produce interaction diagrams as follows:

proc print data=outmeans;
run;

proc sort data=outmeans;
   by drug;
run;

symbol1 i=join v=none l=2;
symbol2 i=join v=none l=1;

proc gplot data=outmeans;
   plot mean_bp*biofeed=diet;
   by drug;
run;

First the outmeans data set is printed. The result is shown in Display 5.5. As well as checking the results, this also shows the name of the variable containing the means.

To produce separate plots for each drug, we use the by statement within proc gplot, but the data set must first be sorted by drug. Plot statements of the form plot y*x=z were introduced in Chapter 1 along with the symbol statement to change the plotting symbols used. We know that diet has two values, so we use two symbol statements to control the way in which the means for each value of diet are plotted. The i (interpolation) option specifies that the means are to be joined by lines. The v (value) option suppresses the plotting symbols because these are not needed and the l (linetype) option specifies different types of line for each diet. The resulting plots are shown in Displays 5.6 through 5.8.

For drug X, the diet × biofeedback interaction plot indicates that diet has a negligible effect when biofeedback is given, but substantially reduces blood pressure when biofeedback is absent. For drug Y, the situation is essentially the reverse of that for drug X. For drug Z, the blood pressure difference when the diet is given and when it is not is approximately equal for both levels of biofeedback.

Obs   Effect              diet   drug   biofeed   N      Mean_bp        SD_bp
  1   diet_drug_biofeed   N      X      A         6   188.000000   10.8627805
  2   diet_drug_biofeed   N      X      P         6   168.000000    8.6023253
  3   diet_drug_biofeed   N      Y      A         6   200.000000   10.0796825
  4   diet_drug_biofeed   N      Y      P         6   204.000000   12.6806940
  5   diet_drug_biofeed   N      Z      A         6   209.000000   14.3527001
  6   diet_drug_biofeed   N      Z      P         6   189.000000   12.6174482
  7   diet_drug_biofeed   Y      X      A         6   173.000000    9.7979590
  8   diet_drug_biofeed   Y      X      P         6   169.000000   14.8189068
  9   diet_drug_biofeed   Y      Y      A         6   187.000000   14.0142784
 10   diet_drug_biofeed   Y      Y      P         6   172.000000   10.9361785
 11   diet_drug_biofeed   Y      Z      A         6   182.000000   17.1113997
 12   diet_drug_biofeed   Y      Z      P         6   173.000000   11.6619038

Display 5.5

Display 5.6

Display 5.7

Display 5.8
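The interaction plots described above could also be drawn with proc sgplot. This is an alternative sketch, not the method used in the text, and it assumes a SAS release that includes the statistical graphics procedures as well as the outmeans data set with the variable names shown in Display 5.5.

```sas
proc sort data=outmeans;
   by drug;                                   /* by-group processing needs sorted data */
run;

proc sgplot data=outmeans;
   by drug;                                   /* one plot per drug */
   series x=biofeed y=mean_bp / group=diet;   /* one joined line per level of diet */
run;
```

Here the group option plays the role of the =diet part of the gplot plot statement, and no symbol statements are needed because line attributes are assigned automatically.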

In some cases, a significant high-order interaction may make it difficult to interpret the results from a factorial analysis of variance. In such cases, a transformation of the data may help. For example, we can analyze the log-transformed observations as follows:

data hyper;
   set hyper;
   logbp=log(bp);
run;

proc anova data=hyper;
   class diet drug biofeed;
   model logbp=diet|drug|biofeed;
run;

The data step computes the natural log of bp and stores it in a new variable logbp. The anova results for the transformed variable are given in Display 5.9.

The ANOVA Procedure

Class Level Information

Class     Levels   Values
diet           2   N Y
drug           3   X Y Z
biofeed        2   A P

Number of observations   72

The ANOVA Procedure

Dependent Variable: logbp

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model              11    0.37953489     0.03450317      7.46   <.0001
Error              60    0.27754605     0.00462577
Corrected Total    71    0.65708094
