The CORRESP Procedure
Column Coordinates

                    Dim1      Dim2
Prem Died         0.3504   -0.0450
Prem Alive        0.0595   -0.0010
FT Died           0.2017    0.1800
FT Alive         -0.0123   -0.0005

Summary Statistics for the Column Points

                 Quality      Mass   Inertia
Prem Died         0.9991    0.0152    0.6796
Prem Alive        0.9604    0.0749    0.0990
FT Died           0.9996    0.0066    0.1722
FT Alive          0.9959    0.9034    0.0492

Partial Contributions to Inertia for the Column Points

                    Dim1      Dim2
Prem Died         0.7359    0.1258
Prem Alive        0.1047    0.0003
FT Died           0.1055    0.8730
FT Alive          0.0539    0.0008

Indices of the Coordinates that Contribute Most to Inertia for the Column Points

                    Dim1      Dim2      Best
Prem Died              1         0         1
Prem Alive             0         0         1
FT Died                2         2         2
FT Alive               0         0         1

Squared Cosines for the Column Points

                    Dim1      Dim2
Prem Died         0.9829    0.0162
Prem Alive        0.9601    0.0003
FT Died           0.5563    0.4433
FT Alive          0.9945    0.0014

Display 16.6
The chi-squared statistic for these data is 19.1090, which with nine degrees of freedom has an associated P-value of 0.024. Thus, it appears that “type” of mother is related to what happens to the newborn baby. The correspondence analysis of the data shows that the first two eigenvalues account for 99.5% of the inertia. Clearly, a two-dimensional solution provides an extremely good representation of the relationship between the two variables. The two-dimensional solution plotted in Display 16.7 suggests that young mothers who smoke tend to produce more full-term babies who then die in the first year, and older mothers who smoke have rather more than expected premature babies who die in the first year. It does appear that smoking is a risk factor for death in the first year of the baby’s life and that age is associated with length of gestation, with older mothers delivering more premature babies.
Display 16.7
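The P-value quoted above for the chi-squared statistic can be checked in a short data step. This is a minimal sketch rather than part of the original analysis: the variable names chisq, df, and p are arbitrary, and it relies on the PROBCHI function returning the lower-tail probability of the chi-squared distribution, so that the P-value is its complement.

```sas
data _null_;
   chisq = 19.1090;                 /* chi-squared statistic quoted in the text */
   df    = 9;                       /* degrees of freedom quoted in the text    */
   p     = 1 - probchi(chisq, df);  /* upper-tail probability                   */
   put 'P-value = ' p 6.4;          /* written to the SAS log                   */
run;
```

The value written to the log should agree, after rounding, with the P-value given in the text.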
16.3.3 Are the Germans Really Arrogant?
The data on perceived characteristics of European nationals can be read in as follows.
data europeans;
   infile 'n:\handbook2\datasets\europeans.dat' expandtabs;
   input country $ c1-c13;
   label c1='stylish'      c2='arrogant'     c3='sexy'
         c4='devious'      c5='easy-going'   c6='greedy'
         c7='cowardly'     c8='boring'       c9='efficient'
         c10='lazy'        c11='hard working'
         c12='clever'      c13='courageous';
run;
In this case, we assume that the name of the country is included in the data file so that it can be read in with the cell counts and used to label the rows of the table. The correspondence analysis and plot are produced in the same way and the results are shown in Displays 16.8 and 16.9.
proc corresp data=europeans out=coor;
   var c1-c13;
   id country;
run;
%plotit(data=coor,datatype=corresp,color=black,colors=black);
The CORRESP Procedure

Inertia and Chi-Square Decomposition

Singular  Principal      Chi-            Cumulative
   Value    Inertia    Square   Percent     Percent     10   20   30   40   50
                                                     ----+----+----+----+----+---
 0.49161    0.24168   255.697     49.73       49.73  *************************
 0.38474    0.14803   156.612     30.46       80.19  ***************
 0.20156    0.04063    42.985      8.36       88.55  ****
 0.19476    0.03793    40.133      7.81       96.36  ****
 0.11217    0.01258    13.313      2.59       98.95  *
 0.07154    0.00512     5.415      1.05      100.00  *

   Total    0.48597   514.153    100.00

Degrees of Freedom = 72

Row Coordinates

                    Dim1      Dim2
France            0.6925   -0.3593
Spain             0.2468    0.3503
Italy             0.6651    0.0062
U.K.             -0.3036    0.1967
Ireland          -0.1963    0.6202
Holland          -0.5703   -0.0921
Germany          -0.5079   -0.5530

Summary Statistics for the Row Points

                 Quality      Mass   Inertia
France            0.9514    0.1531    0.2016
Spain             0.4492    0.1172    0.0986
Italy             0.9023    0.1389    0.1402
U.K.              0.7108    0.1786    0.0677
Ireland           0.7704    0.1361    0.1538
Holland           0.5840    0.1002    0.1178
Germany           0.9255    0.1758    0.2204

Partial Contributions to Inertia for the Row Points

                    Dim1      Dim2
France            0.3039    0.1335
Spain             0.0295    0.0971
Italy             0.2543    0.0000
U.K.              0.0681    0.0467
Ireland           0.0217    0.3537
Holland           0.1348    0.0057
Germany           0.1876    0.3632

Indices of the Coordinates that Contribute Most to Inertia for the Row Points

                    Dim1      Dim2      Best
France                 1         1         1
Spain                  0         0         2
Italy                  1         0         1
U.K.                   0         0         1
Ireland                0         2         2
Holland                1         0         1
Germany                2         2         2

Squared Cosines for the Row Points

                    Dim1      Dim2
France            0.7496    0.2018
Spain             0.1490    0.3002
Italy             0.9023    0.0001
U.K.              0.5007    0.2101
Ireland           0.0701    0.7002
Holland           0.5692    0.0148
Germany           0.4235    0.5021

Column Coordinates

                    Dim1      Dim2
stylish           0.8638   -0.3057
arrogant         -0.0121   -0.5129
sexy              0.9479   -0.1836
devious           0.2703    0.0236
easy-going        0.0420    0.5290
greedy            0.1369   -0.1059
cowardly          0.5869    0.2627
boring           -0.2263    0.0184
efficient        -0.6503   -0.4192
lazy              0.2974    0.5603
hard working     -0.4989   -0.0320
clever           -0.5307   -0.2961
courageous       -0.4976    0.6278

Summary Statistics for the Column Points

                 Quality      Mass   Inertia
stylish           0.9090    0.0879    0.1671
arrogant          0.6594    0.1210    0.0994
sexy              0.9645    0.0529    0.1053
devious           0.3267    0.0699    0.0324
easy-going        0.9225    0.1248    0.0784
greedy            0.2524    0.0473    0.0115
cowardly          0.6009    0.0350    0.0495
boring            0.4431    0.0633    0.0152
efficient         0.9442    0.1040    0.1356
lazy              0.6219    0.0671    0.0894
hard working      0.9125    0.1361    0.0767
clever            0.9647    0.0227    0.0179
courageous        0.7382    0.0681    0.1217

Partial Contributions to Inertia for the Column Points

                    Dim1      Dim2
stylish           0.2714    0.0555
arrogant          0.0001    0.2150
sexy              0.1968    0.0121
devious           0.0211    0.0003
easy-going        0.0009    0.2358
greedy            0.0037    0.0036
cowardly          0.0499    0.0163
boring            0.0134    0.0001
efficient         0.1819    0.1234
lazy              0.0246    0.1423
hard working      0.1401    0.0009
clever            0.0264    0.0134
courageous        0.0697    0.1812

Indices of the Coordinates that Contribute Most to Inertia for the Column Points

                    Dim1      Dim2      Best
stylish                1         0         1
arrogant               0         2         2
sexy                   1         0         1
devious                0         0         1
easy-going             0         2         2
greedy                 0         0         1
cowardly               0         0         1
boring                 0         0         1
efficient              1         1         1
lazy                   0         2         2
hard working           1         0         1
clever                 0         0         1
courageous             2         2         2

Squared Cosines for the Column Points

                    Dim1      Dim2
stylish           0.8078    0.1012
arrogant          0.0004    0.6591
sexy              0.9296    0.0349
devious           0.3242    0.0025
easy-going        0.0058    0.9167
greedy            0.1579    0.0946
cowardly          0.5006    0.1003
boring            0.4402    0.0029
efficient         0.6670    0.2772
lazy              0.1367    0.4851
hard working      0.9088    0.0038
clever            0.7357    0.2290
courageous        0.2848    0.4534

Display 16.8
[Plot of the two-dimensional correspondence analysis solution for the European stereotypes data, showing the row points (France, Spain, Italy, U.K., Ireland, Holland, Germany) and the column points (the 13 characteristics). Horizontal axis: Dimension 1 (49.73%). Both axes run from -1.0 to 1.0.]
Display 16.9
Here, a two-dimensional representation accounts for approximately 80% of the inertia. The two-dimensional solution plotted in Display 16.9 is left to the reader for detailed interpretation, noting only that it largely fits the author’s own prejudices about perceived national stereotypes.
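The figure of approximately 80% can be recovered from the principal inertias reported in Display 16.8. The following data step is a minimal sketch of that arithmetic, not part of the original analysis; the variable names lambda1, lambda2, total, and pct are arbitrary.

```sas
data _null_;
   lambda1 = 0.24168;   /* principal inertia, first dimension (Display 16.8)  */
   lambda2 = 0.14803;   /* principal inertia, second dimension (Display 16.8) */
   total   = 0.48597;   /* total inertia (Display 16.8)                       */
   pct = 100 * (lambda1 + lambda2) / total;   /* share of inertia retained    */
   put 'Percent of inertia in two dimensions = ' pct 6.2;
run;
```

The result should match the cumulative percent shown for the second dimension in the Inertia and Chi-Square Decomposition table.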
Exercises
16.1 Construct a scatterplot matrix of the first four correspondence analysis coordinates of the European stereotypes data.
16.2 Calculate the chi-squared distances for both the row and column profiles of the smoking and motherhood data, and then compare them with the corresponding Euclidean distances in Display 16.7.
Appendix A
SAS Macro to Produce Scatterplot Matrices
This macro is based on one supplied with the SAS system but has been adapted and simplified. It uses proc iml and therefore requires that SAS/IML be licensed.
The macro has two arguments: the first is the name of the data set that contains the data to be plotted; the second is a list of numeric variables to be plotted. Both arguments are required.
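As an illustration of the calling convention only, a call might look like the following sketch once the macro definition has been submitted. It assumes that the europeans data set created in Section 16.3.3 is available in the current session; the choice of the variables c1-c4 is arbitrary and purely for illustration.

```sas
/* Hypothetical call: first argument is the data set,       */
/* second argument is the list of numeric variables to plot */
%scattmat(europeans, c1-c4);
```

Because the arguments are positional, the data set name must come first and the variable list second.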
%macro scattmat(data,vars);
/* expand variable list and separate with commas */
data _null_;
   set &data (keep=&vars);
   length varlist $500. name $32.;
   array xxx {*} _numeric_;
   do i=1 to dim(xxx);
      call vname(xxx{i},name);
      varlist=compress(varlist||name);
      if i<dim(xxx) then varlist=compress(varlist||',');
   end;
   call symput('varlist',varlist);
   stop;
run;
proc iml;
/*-- Load graphics --*/
call gstart;
/*-- Module : individual scatter plot --*/
start gscatter(t1, t2);
   /* pairwise elimination of missing values */
   t3=t1;
   t4=t2;
   t5=t1+t2;
   dim=nrow(t1);
   j=0;
   do i=1 to dim;
      if t5[i]=. then ;
      else do;
         j=j+1;
         t3[j]=t1[i];
         t4[j]=t2[i];
      end;
   end;
   t1=t3[1:j];
   t2=t4[1:j];
   /* ---------------- */
   window=(min(t1)||min(t2))//
          (max(t1)||max(t2));
   call gwindow(window);
   call gpoint(t1,t2);
finish gscatter;
/*-- Module : do scatter plot matrix --*/
start gscatmat(data, vname);
   call gopen('scatter');
   nv=ncol(vname);
   if (nv=1) then nv=nrow(vname);
   cellwid=int(90/nv);
   dist=0.1*cellwid;