
Tests for Normality

Test                  --Statistic---     -----P-value------
Shapiro-Wilk          W       0.887867   Pr < W      <0.0001
Kolmogorov-Smirnov    D       0.196662   Pr > D      <0.0100
Cramer-von Mises      W-Sq    0.394005   Pr > W-Sq   <0.0050
Anderson-Darling      A-Sq    2.399601   Pr > A-Sq   <0.0050

Quantiles (Definition 5)

Quantile       Estimate
100% Max            138
99%                 138
95%                 122
90%                 101
75% Q3               75
50% Median           39
25% Q1               14
10%                   8
5%                    6
1%                    5
0% Min                5

Extreme Observations

----Lowest----     ----Highest---
Value     Obs      Value     Obs
    5      39        107      38
    5       3        122      19
    6      41        122      59
    6      37        133      35
    8      46        138      26

Fitted Distribution for Hardness

Parameters for Normal Distribution

Parameter    Symbol    Estimate
Mean         Mu        47.18033
Std Dev      Sigma     38.09397

Goodness-of-Fit Tests for Normal Distribution

Test                  ---Statistic----       -----P-value-----
Kolmogorov-Smirnov    D       0.19666241     Pr > D      <0.010
Cramer-von Mises      W-Sq    0.39400529     Pr > W-Sq   <0.005
Anderson-Darling      A-Sq    2.39960138     Pr > A-Sq   <0.005

Quantiles for Normal Distribution

             --------Quantile-------
Percent      Observed      Estimated
   1.0        5.00000      -41.43949
   5.0        6.00000      -15.47867
  10.0        8.00000       -1.63905
  25.0       14.00000       21.48634
  50.0       39.00000       47.18033
  75.0       75.00000       72.87432
  90.0      101.00000       95.99971
  95.0      122.00000      109.83933
  99.0      138.00000      135.80015

Display 2.3

The quantiles provide information about the tails of the distribution and also include the five-number summary for each variable: the minimum, lower quartile, median, upper quartile, and maximum. Box plots constructed from these summaries are often very useful for comparing distributions and identifying outliers. Examples are given in subsequent chapters.

The listing of extreme values can be useful for identifying outliers, especially when used with an id statement. The following section, entitled “Fitted Distribution for Hardness,” gives details of the distribution fitted to the histogram. Because a normal distribution is fitted in this instance, it largely duplicates the output generated by the normal option on the proc statement.
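As an illustration of the id statement mentioned above, the following is a minimal sketch rather than the instructions used to produce Display 2.3. It assumes that the water data set contains an identifying variable holding the town name, here hypothetically called town; substitute whatever identifier the data set actually has.

```sas
/* Sketch: label extreme observations with an identifier.        */
/* 'town' is an assumed variable name, not confirmed by the text. */
proc univariate data=water;
  var mortal hardness;
  id town;
run;
```

With an id statement in place, the "Extreme Observations" table lists the identifier value alongside each observation number, so unusual towns can be read off by name.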

The numerical information in Display 2.2 and the plots in Displays 2.4 and 2.5 all indicate that mortality is symmetrically, approximately normally, distributed. The formal tests of normality all result in non-significant values of the test statistic. The results in Display 2.3 and the plots in Displays 2.6 and 2.7, however, strongly suggest that calcium concentration (hardness) has a skewed distribution, with each of the tests for normality having a very small associated P-value.
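The "Estimated" column of the quantile table in Display 2.3 can be checked by hand, which makes the poor fit easy to see. The sketch below assumes that the estimates are simply the quantiles of a normal distribution with the fitted mean and standard deviation, and uses standard normal quantiles z_p rounded to five decimal places:

```latex
\begin{align*}
x_p &= \hat{\mu} + \hat{\sigma}\, z_p,
      \qquad \hat{\mu} = 47.18033,\quad \hat{\sigma} = 38.09397 \\
x_{0.01} &= 47.18033 + 38.09397 \times (-2.32635) \approx -41.44 \\
x_{0.75} &= 47.18033 + 38.09397 \times 0.67449 \approx 72.87
\end{align*}
```

Both values agree with the display. The fitted normal distribution places its 1% point at about -41, whereas hardness cannot be negative and the observed 1% quantile is 5; this mismatch in the lower tail is consistent with the small P-values of the goodness-of-fit tests.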

Display 2.4

Display 2.5

Display 2.6

Display 2.7

The first step in examining the relationship between mortality and water hardness is to look at the scatterplot of the two variables. This can be produced using proc gplot with the following instructions:

proc gplot;
  plot mortal*hardness;
run;

The resulting graph is shown in Display 2.8. The plot shows a clear negative association between the two variables, with high levels of calcium concentration tending to occur with low mortality values and vice versa. The correlation between the two variables is easily found using proc corr, with the following instructions:

proc corr data=water pearson spearman;
  var mortal hardness;
run;

Display 2.8

The pearson and spearman options in the proc corr statement request that both types of correlation coefficient be calculated. The default, if neither option is used, is the Pearson coefficient.

The results from these instructions are shown in Display 2.9. The correlation is estimated to be -0.655 using the Pearson coefficient and -0.632 using Spearman’s coefficient. In both cases, the test that the population correlation is zero has an associated P-value of less than 0.0001. There is clearly strong evidence for a non-zero correlation between the two variables.
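The size of the Pearson P-value can be checked by hand. The sketch below assumes the usual t test of zero correlation, in which the statistic has a t distribution with n - 2 degrees of freedom under the null hypothesis; the values of r and n are taken from Display 2.9:

```latex
\[
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{-0.65485 \times \sqrt{59}}{\sqrt{1 - 0.65485^{2}}}
  \approx \frac{-5.030}{0.7558}
  \approx -6.66
\]
```

A value of this magnitude on 59 degrees of freedom lies far in the tail of the t distribution, in line with the "<.0001" printed beneath the coefficient.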

The CORR Procedure

2 Variables: Mortal Hardness

Simple Statistics

Variable     N        Mean     Std Dev      Median     Minimum     Maximum
Mortal      61        1524   187.66875        1555        1096        1987
Hardness    61    47.18033    38.09397    39.00000     5.00000   138.00000

Pearson Correlation Coefficients, N = 61
Prob > |r| under H0: Rho=0

              Mortal    Hardness
Mortal       1.00000    -0.65485
                          <.0001
Hardness    -0.65485     1.00000
              <.0001

Spearman Correlation Coefficients, N = 61
Prob > |r| under H0: Rho=0

              Mortal    Hardness
Mortal       1.00000    -0.63166
                          <.0001
Hardness    -0.63166     1.00000
              <.0001

Display 2.9

One of the questions of interest about these data is whether or not there is a geographical factor in the relationship between mortality and water hardness, in particular whether this relationship differs between the towns in the North and those in the South. To examine this question, a useful first step is to replot the scatter diagram in Display 2.8 with northern and southern towns identified with different symbols. The necessary instructions are

symbol1 value=dot;
symbol2 value=circle;
proc gplot;
  plot mortal*hardness = location;
run;

A plot statement of the general form plot y * x = z produces a scatterplot of y against x with a different symbol for each value of z. In this case, location has only two values, and the first two plotting symbols used by SAS are 'x' and '+'. The symbol statements change the plotting symbols to give more impact to the scattergram.

The resulting plot is shown in Display 2.10. There appears to be no obvious difference in the form of the relationship between mortality and hardness for the two groups of towns.

Display 2.10

Separate correlations for northern and southern towns can be produced using proc corr with a by statement as follows:

proc sort;
  by location;

proc corr data=water pearson spearman;
  var mortal hardness;
  by location;
run;

The by statement has the effect of producing separate analyses for each subgroup of the data defined by the specified variable, location in this case. However, the data set must first be sorted by that variable.

The results from this series of instructions are shown in Display 2.11. The main items of interest in this display are the correlation coefficients and the results of the tests that the population correlations are zero. The Pearson correlation for towns in the North is -0.369, and for those in the South it is -0.602. Both values are significant beyond the 5% level. The Pearson and Spearman coefficients take very similar values for this example.

----------------------------- location=north -----------------------------

The CORR Procedure

2 Variables: Mortal Hardness

Simple Statistics

Variable     N        Mean     Std Dev      Median     Minimum     Maximum
Mortal      35        1634   136.93691        1637        1378        1987
Hardness    35    30.40000    26.13449    17.00000     6.00000    94.00000

Pearson Correlation Coefficients, N = 35
Prob > |r| under H0: Rho=0

              Mortal    Hardness
Mortal       1.00000    -0.36860
                          0.0293
Hardness    -0.36860     1.00000
              0.0293

Spearman Correlation Coefficients, N = 35
Prob > |r| under H0: Rho=0

              Mortal    Hardness
Mortal       1.00000    -0.40421
                          0.0160
Hardness    -0.40421     1.00000
              0.0160

----------------------------- location=south -----------------------------

The CORR Procedure

2 Variables: Mortal Hardness

Simple Statistics

Variable     N        Mean     Std Dev      Median     Minimum     Maximum
Mortal      26        1377   140.26918        1364        1096        1627
Hardness    26    69.76923    40.36068    75.50000     5.00000   138.00000

Pearson Correlation Coefficients, N = 26
Prob > |r| under H0: Rho=0

              Mortal    Hardness
Mortal       1.00000    -0.60215
                          0.0011
Hardness    -0.60215     1.00000
              0.0011

Spearman Correlation Coefficients, N = 26
Prob > |r| under H0: Rho=0

              Mortal    Hardness
Mortal       1.00000    -0.59572
                          0.0013
Hardness    -0.59572     1.00000
              0.0013

Display 2.11

Examination of scatterplots often centres on assessing density patterns such as clusters, gaps, or outliers. However, humans are not particularly good at visually examining point density and some type of density estimate added to the scatterplot is frequently very helpful. Here, plotting a bivariate density estimate for mortality and hardness is useful for gaining more insight into the structure of the data. (Details on how to calculate bivariate densities are given in Silverman [1986].) The following code produces and plots the bivariate density estimate of the two variables:

proc kde data=water out=bivest;
  var mortal hardness;

proc g3d data=bivest;
  plot hardness*mortal=density;
run;

The KDE procedure (proc kde) produces estimates of a univariate or bivariate probability density function using kernel density estimation (see Silverman [1986]). If a single variable is specified in the var statement, a univariate density is estimated; if two are specified, a bivariate density is estimated. The out=bivest option directs the density estimates to a SAS data set. These can then be plotted with the three-dimensional plotting procedure proc g3d. The resulting plot is shown in Display 2.12. The two clear modes in the diagram correspond, at least approximately, to northern and southern towns.
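Readers working with a later SAS release may find that proc kde no longer accepts a var statement. The following is a hedged sketch of an equivalent request, assuming a release in which the procedure takes a bivar statement with the out= option attached to it. The output variable names value1, value2, and density are assumptions about that release's output data set and should be confirmed with proc contents before plotting.

```sas
/* Sketch for later releases: BIVAR replaces VAR, and OUT= moves  */
/* onto the BIVAR statement.                                      */
proc kde data=water;
  bivar mortal hardness / out=bivest;
run;

/* Assumed output variables: value1 holds the grid coordinates    */
/* for mortal, value2 those for hardness, density the estimate.   */
proc g3d data=bivest;
  plot value2*value1=density;
run;
```

The plot request keeps the same orientation as the original instructions, with hardness on one horizontal axis, mortality on the other, and the density estimate as the height of the surface.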

Display 2.12
