Добавил:

Upload Опубликованный материал нарушает ваши авторские права? Сообщите нам.

Вуз:

Национальный университет биоресурсов и природопользования

Предмет:

[НЕСОРТИРОВАННОЕ]

Файл:

Handbook_of_statistical_analysis_using_SAS

.pdf

Скачиваний:

Добавлен:

01.05.2015

Размер:

4.92 Mб

Скачать

☆

<<< < Предыдущая 1 2 3 45 / 365 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 > Следующая >>>

between them with the message “Press Forward to see next graph” in the status line. The Page Down and Page Up keys are used for Forward and Backward, respectively.

1.10Some Tips for Preventing and Correcting Errors

When writing programs:

1.One statement per line, where possible.

2.End each step with a run statement.

3.Indent each statement within a step (i.e., each statement between the data or proc statement and the run statement) by a couple of spaces. This is automated in the enhanced editor.

4.Give the full path name for raw data ﬁles on the inﬁlestatement.

5.Begin any programs that produce graphics with goptions reset=all; and then set the required options.

Before submitting a program:

1. Check that each statement ends with a semicolon.

2.Check that all opening and closing quotes match. Use the enhanced editor colour coding to double-check.

3.Check any statement that does not begin with a keyword (blue, or navy blue) or a variable name (black).

4.Large blocks of purple may indicate a missing quotation mark.

5.Large areas of green may indicate a missing */ from a comment.

“Collapse” the program to check its overall structure. Hold down the Ctrl and Alt keys and press the numeric keypad minus key. Only the data, proc statements, and global statements should be visible. To expand the program, press the numeric keypad plus key while holding down Ctrl and Alt.

After running a program:

1.Examine the SAS log for warning and error messages.

2.Check for the message: "SAS went to a new line when INPUT statement reached past the end of a line" when using list input.

3.Verify that the number of observations and variables read in is correct.

4.Print out small data sets to ensure that they have been read correctly.

If there is an error message for a statement that appears to be correct, check whether the semicolon was omitted from the previous statement.

The message that a variable is “uninitialized” or “not found” usually means it has been misspelled.

To correct a missing quote, submit: '; run;or "; run; and then correct the program and resubmit it.

Chapter 2

Data Description and

Simple Inference:

Mortality and Water

Hardness in the U.K.

2.1 Description of Data

The data to be considered in this chapter were collected in an investigation of environmental causes of diseases, and involve the annual mortality rates per 100,000 for males, averaged over the years from 1958 to 1964, and the calcium concentration (in parts per million) in the drinking water supply for 61 large towns in England and Wales. (The higher the calcium concentration, the harder the water.) The data appear in Table 7 of SDS and have been rearranged for use here as shown in Display 2.1. (Towns at least as far north as Derby are identiﬁed in the table by an asterisk.)

The main questions of interest about these data are as follows:

How are mortality and water hardness related?

Is there a geographical factor in the relationship?

2.2 Methods of Analysis

Initial examination of the data involves graphical techniques such as histograms and normal probability plots to assess the distributional properties of the two variables, to make general patterns in the data more visible, and to detect possible outliers. Scatterplots are used to explore the relationship between mortality and calcium concentration.

Following this initial graphical exploration, some type of correlation coefﬁcient can be computed for mortality and calcium concentration. Pearson’s correlation coefﬁcient is generally used but others, for example, Spearman’s rank correlation, may be more appropriate if the data are not considered to have a bivariate normal distribution. The relationship between the two variables is examined separately for northern and southern towns.

Finally, it is of interest to compare the mean mortality and mean calcium concentration in the north and south of the country by using either a t-test or its nonparametric alternative, the Wilcoxon rank-sum test.

2.3 Analysis Using SAS

Assuming the data is stored in an ASCII ﬁle, water.dat, as listed in Display 2.1 (i.e., including the '*'to identify the location of the town and the name of the town), then they can be read in using the following instructions:

Town	Mortality	Hardness

Bath	1247	105
* Birkenhead	1668	17
Birmingham	1466	5
* Blackburn	1800	14
* Blackpool	1609	18
* Bolton	1558	10
* Bootle	1807	15
Bournemouth	1299	78
* Bradford	1637	10
Brighton	1359	84
Bristol	1392	73
* Burnley	1755	12
Cardiff	1519	21
Coventry	1307	78
Croydon	1254	96
* Darlington	1491	20
* Derby	1555	39
* Doncaster	1428	39

	East Ham	1318	122
	Exeter	1260	21
* Gateshead		1723	44
* Grimsby		1379	94
*	Halifax	1742	8
* Huddersﬁeld		1574	9
*	Hull	1569	91
	Ipswich	1096	138
* Leeds		1591	16
	Leicester	1402	37
*	Liverpool	1772	15
* Manchester		1828	8
* Middlesbrough		1704	26
* Newcastle		1702	44
	Newport	1581	14
	Northampton	1309	59
	Norwich	1259	133
* Nottingham		1427	27
* Oldham		1724	6
	Oxford	1175	107
	Plymouth	1486	5
	Portsmouth	1456	90
*	Preston	1696	6
	Reading	1236	101
* Rochdale		1711	13
* Rotherham		1444	14
* St Helens		1591	49
*	Salford	1987	8
* Shefﬁeld		1495	14
	Southampton	1369	68
	Southend	1257	50
* Southport		1587	75
* South Shields		1713	71
*	Stockport	1557	13
* Stoke		1640	57
* Sunderland		1709	71
	Swansea	1625	13
*	Wallasey	1625	20
	Walsall	1527	60
	West Bromwich	1627	53
	West Ham	1486	122
	Wolverhampton	1485	81
* York		1378	71

Display 2.1

data water;

infile 'water.dat';

input flag $ 1 Town $ 2-18 Mortal 19-22 Hardness 25-27; if flag = '*' then location = 'north';

else location = 'south';

run;

The input statement uses SAS's column input where the exact columns containing the data for each variable are speciﬁed. Column input is simpler than list input in this case for three reasons:

There is no space between the asterisk and the town name.

Some town names are longer than eight characters — the default length for character variables.

Some town names contain spaces, which would make list input complicated.

The univariate procedure can be used to examine the distributions of numeric variables. The following simple instructions lead to the results shown in Displays 2.2 and 2.3:

proc univariate data=water normal; var mortal hardness;

histogram mortal hardness /normal; probplot mortal hardness;

run;

The normal option on the proc statement results in a test for the normality of the variables (see below). The var statement speciﬁes which variable(s) are to be included. If the var statement is omitted, the default is all the numeric variables in the data set. The histogram statement produces histograms for both variables and the /normal option requests a normal distribution curve. Curves for various other distributions, including nonparametric kernel density estimates (see Silverman [1986]) can be produced by varying this option. Probability plots are requested with the probplot statement. Normal probability plots are the default. The resulting histograms and plots are shown in Displays 2.4 to 2.7.

Displays 2.2 and 2.3 provide signiﬁcant information about the distributions of the two variables, mortality and hardness. Much of this is selfexplanatory, for example, Mean, Std Deviation, Variance, and N. The meaning of some of the other statistics printed in these displays are as follows:

Uncorrected SS: Uncorrected sum of squares; simply the sum of squares of the observations

Corrected SS:	Corrected sum of squares; simply the sum
	of squares of deviations of the observa-
	tions from the sample mean
Coeff Variation:	Coefﬁcient of variation; the standard devi-
	ation divided by the mean and multiplied
	by 100
Std Error Mean:	Standard deviation divided by the square
	root of the number of observations
Range:	Difference between largest and smallest
	observation in the sample
Interquartile Range:	Difference between the 25% and 75%
	quantiles (see values of quantiles given
	later in display to conﬁrm)
Student’s t:	Student’s t-test value for testing that the
	population mean is zero
Pr>\|t\|:	Probability of a greater absolute value for
	the t-statistic
Sign Test:	Nonparametric test statistic for testing
	whether the population median is zero
Pr>\|M\|:	Approximation to the probability of a
	greater absolute value for the Sign test
	under the hypothesis that the population
	median is zero
Signed Rank:	Nonparametric test statistic for testing
	whether the population mean is zero
Pr>=\|S\|:	Approximation to the probability of a
	greater absolute value for the Sign Rank
	statistic under the hypothesis that the pop-
	ulation mean is zero
Shapiro-Wilk W:	Shapiro-Wilk statistic for assessing the nor-
	mality of the data and the corresponding
	P-value (Shapiro and Wilk [1965])

Kolmogorov-Smirnov D: Kolmogorov-Smirnov statistic for assessing the normality of the data and the corresponding P-value (Fisher and Van Belle [1993])

Cramer-von Mises W-sq: Cramer-von Mises statistic for assessing the normality of the data and the associated P-value (Everitt [1998])

Anderson-Darling A-sq: Anderson-Darling statistic for assessing the normality of the data and the associated P-value (Everitt [1998])

The UNIVARIATE Procedure

Variable: Mortal

Moments

N	61	Sum Weights		61
Mean	1524.14754	Sum Observations		92973
Std Deviation	187.668754	Variance	35219.5612
Skewness	-0.0844436	Kurtosis	-0	.4879484
Uncorrected SS	143817743	Corrected SS	2113173.67
Coeff Variation	12.3130307	Std Error Mean	24	.0285217

	Basic Statistical Measures
Location		Variability
Mean	1524.148	Std Deviation	187.66875
Median	1555.000	Variance	35220
Mode	1486.000	Range	891.00000
		Interquartile Range	289.00000

NOTE: The mode displayed is the smallest of 3 modes with a count of 2.

Tests for Location: Mu0=0


Test	-Statistic-			-----P-value------
Student's t	t	63.43077		Pr > \|t\|		<.0001
Sign	M		30.5	Pr >= \|M\|		<.0001
Signed Rank	S		945.5	Pr >= \|S\|		<.0001
	Tests for Normality
Test		--Statistic---			-----P-value------
Shapiro-Wilk		W	0.985543		Pr < W		0.6884
Kolmogorov-Smirnov		D	0.073488		Pr > D		>0.1500
Cramer-von Mises		W-Sq	0.048688		Pr > W-Sq		>0.2500
Anderson-Darling		A-Sq	0.337398		Pr > A-Sq		>0.2500

Quantiles (Definition 5)

Quantile		Estimate
100% Max			1987
99%			1987
95%			1800
90%			1742
75% Q3			1668
50% Median			1555
25% Q1			1379
10%			1259
5%			1247
1%			1096
0% Min			1096
Extreme Observations
----Lowest----		----Highest---
Value	Obs	Value	Obs
1096	26	1772	29
1175	38	1800	4
1236	42	1807	7
1247	1	1828	30
1254	15	1987	46

Fitted Distribution for Mortal

Parameters for Normal Distribution

Parameter	Symbol	Estimate
Mean	Mu	1524.148
Std Dev	Sigma	187.6688

Goodness-of-Fit Tests for Normal Distribution
Test	---Statistic----		-----P-value-----
Kolmogorov-Smirnov	D	0.07348799	Pr > D	>0.150
Cramer-von Mises	W-Sq	0.04868837	Pr > W-Sq	>0.250
Anderson-Darling	A-Sq	0.33739780	Pr > A-Sq	>0.250

Quantiles for Normal Distribution
		------Quantile------
Percent		Observed	Estimated
1	.0	1096.00	1087.56
5	.0	1247.00	1215.46
10	.0	1259.00	1283.64
25	.0	1379.00	1397.57
50	.0	1555.00	1524.15
75	.0	1668.00	1650.73
90	.0	1742.00	1764.65
95	.0	1800.00	1832.84
99	.0	1987.00	1960.73

Display 2.2

The UNIVARIATE Procedure
	Variable: Hardness
	Moments
N	61	Sum Weights		61
Mean	47.1803279	Sum Observations		2878
Std Deviation	38.0939664	Variance	1451.15027
Skewness	0.69223461	Kurtosis	-0	.6657553
Uncorrected SS	222854	Corrected SS	87069.0164
Coeff Variation	80.7412074	Std Error Mean	4	.8774326

	Basic Statistical Measures
Location		Variability
Mean	47.18033	Std Deviation	38	.09397
Median	39.00000	Variance		1451
Mode	14.00000	Range	133	.00000
		Interquartile Range	61	.00000

	Tests for Location: Mu0=0
Test		-Statistic-		-----P-value------
Student's t	t	9.673189		Pr > \|t\|	<.0001
Sign	M	30	.5	Pr >= \|M\|	<.0001
Signed Rank	S	945	.5	Pr >= \|S\|	<.0001

<<< < Предыдущая 1 2 3 45 / 365 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 > Следующая >>>

Соседние файлы в предмете [НЕСОРТИРОВАННОЕ]

#
01.05.201522.82 Mб115gistologia.pdf
#
22.08.20193.23 Mб13Gnuch.-Kovt.-Skoroch puc..doc
#
01.05.2015325.63 Кб5GOST_20850-84_ДКК.doc.столярка.doc
#
01.05.202552.58 Кб0GTM_4_modul_pererobka.docx
#
01.03.2025142.34 Кб0G__2.doc
#
01.05.20154.92 Mб18Handbook_of_statistical_analysis_using_SAS.pdf
#
10.08.201983.97 Кб16HARDWARE.doc
#
01.05.201533.9 Кб6History.docx
#
10.03.201612.98 Mб20hmelnickii_g_o_homenko_v_s_veterinarna_farmakologiya.pdf
#
10.03.20164.78 Mб10Hroshi_ta_kredyt_vyd4.pdf
#
01.05.202577.31 Кб0Inform_Tech.doc