
Encyclopedia of Sociology, Vol. 5
UTOPIAN ANALYSIS AND DESIGN
tinue to exist). All able-bodied persons in Utopia become part of its work force—slaves, male nonslaves, and even women! This is seen as an enormous augmentation of the work force. Within each household, however, male dominance prevails. Households are under the authority of the oldest free male. Women are specifically designated as ‘‘subordinate’’ to their husbands, as children are to their parents and younger people generally are to their elders. In Utopia, the applicability of equality is severely restricted.
In discussing utopias it is important to distinguish between analytic and design models. Analytic models purport to be summaries of existing empirical reality; design models are summaries or sketches of future, past, or alternative societies, social structures, or worlds.
Characteristically, utopian literature contains a critique of existing society along with a model of a different one. Frequently the design model incorporates a more or less indirect critique of an existing state of affairs. Plato’s Republic (1941), the work that seems to have been the prototype of More’s Utopia, was greatly influenced by the social conditions observed and experienced by Plato. He saw the Athens in which he lived as a very corrupt democracy and felt that in such a system politicians inevitably pandered to mobs. If the mob insisted upon venal demands, politicians found it necessary to agree with them or lose their own positions. Reform, he felt, was not possible in a corrupt society. In the Republic Socrates, voicing Plato’s sentiments, concludes that ‘‘the multitude can never be philosophical. Accordingly, it is bound to disapprove of all who pursue wisdom; and so also, of course, are those individuals who associate with the mob and set their hearts on pleasing it’’ (1941, p. 201).
Interestingly, it has been suggested that Plato’s hostility to democracy was, at least to some extent, shaped by his economic and social background. Members of his family were large landholders who, along with others in a similar position, saw the rise of commerce as a threat to their economic positions. Democratic government undermined their political preeminence, as did militant foreign policies. They had a great deal to lose through war because they were subject to heavy war taxes. Moreover, some had had their lands ravaged by Spartans during the Peloponnesian War; others had retreated behind the walls of Athens. These conservative elements were not above attempting to subvert the democratic system (Klosko 1986, p. 10).
In any event, Plato’s utopia is clearly elitist in nature. For a variety of reasons most utopian schemes seem to be controlled by elites of some sort. As one writer explains it:
They begin with the proposition that things are bad; things must become better, perhaps perfect here on earth; things will not improve by themselves; a plan must be developed and carried out; this implies the existence of an enlightened individual, or a few, who will think and act in a way that many by themselves cannot think and act. (Brinton 1965, p. 50)
For Plato, the elites were what he called philosophers. In a sense these were the theoreticians or model makers. The problem he saw was converting their models—their ideal worlds—into reality. Plato was very realistic about this matter of convertibility. He has Socrates ask, ‘‘Is it not in the nature of things that action should come less close to truth than thought?’’ (1941, p. 178). He is, however, concerned about trying to come as close as possible to having the real world correspond to the ideal one. The solution? To have philosophers become rulers or to have rulers become philosophers. In either case enormous, if not complete, power is to be held by a caste of elites.
In effect, social inequality is found even in the work of the triumvirate usually referred to as the ‘‘utopian socialists’’: Claude Henri de Rouvroy de Saint Simon (1760–1825), Charles Fourier (1772–1837), and Robert Owen (1771–1858).
In his early work Saint Simon’s elites were scientists, but later he tended to subordinate them or at least to keep them on a par with industrial chiefs. He evaded the problem of social equality by saying that each member of society would be paid in accordance with his or her ‘‘investment.’’ This referred to the contribution each made to the productive process. Since different people had different talents, these contributions would differ. Some people’s contributions would be more important than others’, and accordingly those people would be paid more. But although the rewards of different people would differ, there would not be wide discrepancies between the rewards of the lowest- and highest-paid workers (Manuel and Manuel 1979, pp. 590–614).
Unlike Saint Simon, who never wrote a detailed description of a utopian society, Charles Fourier wrote thousands of pages of detailed descriptions of his ‘‘Phalanx,’’ including architectural specifications, work schedules and countless other details. The Phalanx was to be organized essentially as a shareholding corporation. Members were free to buy as many shares as they wished or could afford. Fourier stressed the fact that in his utopia there would be three social classes: the rich, the poor, and the middle. The condition of the poor would be enormously better than their condition in existing society, but the rich or upper class would be entitled to more lavish living quarters, more sumptuous food, and, in general, a more luxurious life-style than the others. During the last fifteen years or so of his life, most of Fourier’s efforts were devoted to the search for a wealthy person to subsidize a trial of his Phalanx (Beecher 1986).
Robert Owen insisted on what he regarded as complete equality. Conceding that people were born with differing abilities, he contended that these abilities were provided by God and should not be the basis for differential rewards. Nevertheless, as a self-made man who became extremely successful and managed the most important cotton-spinning factory in Britain, he never seemed to lose the self-assurance that he knew best how to manage a community and that all members would understand the wisdom of his decisions. He has been characterized as a benevolent autocrat who acted somewhat like a military commander who has little direct contact with his troops (Cole 1969; Manuel and Manuel 1979, pp. 676–693).
In the United States, the most widely read utopian novel based on the assumption of absolute economic equality is undoubtedly Edward Bellamy’s Looking Backward (1887). Bellamy (1850–1898), influenced by the development of the large economic trusts in the United States, postulated that by the year 2000 only one enormous trust would remain: the United States government. He went to great pains to make it clear that his utopia was devoid of Marxist or other European influences. The principle of income or reward on which it was based was neither ‘‘From each according to his investment or product’’ nor the classic ‘‘From each according to his ability, to each according to his need,’’ although it was much closer to the latter than to the former.
In Bellamy’s vision of the United States in the year 2000, each person received an equal share of the total national product. In effect, every inhabitant received a credit card showing his or her share of the product. The share could be spent in any manner. If too many individuals decided to buy a particular product, the price of that product would be raised. The point, however, is that people were entitled to a share of the national product not on the basis of their individual productivity but simply because they existed as human beings. In some telling passages Bellamy’s characters observe that members of families do not deny food or other needs to other family members because they have been unproductive. In effect, the entire country (and, presumably, ultimately the entire world) would resemble our more primitive notion of one family.
Bellamy’s work received widespread attention throughout the world. In England, William Morris (1834–1896) objected strenuously to the centralized control and bureaucratic form of organization in Looking Backward. Morris wrote his own utopian novel, News from Nowhere (1966). Unlike Bellamy’s utopia, which came into being through a process of evolution, a violent revolution has occurred in Nowhere. London has become a series of relatively small villages separated by flowers and wooded areas. There is no centralized government (indeed, no government at all) as we normally understand it. With the end of private property and domestic arrangements in which women are essentially the property of men, the underlying reasons for criminal behavior have been eliminated. Random acts of violence are regarded as transitory diseases and are dealt with by nurses and doctors rather than by jailers.
It has been argued that Morris was essentially an anarchist theorist, although Morris himself vigorously objected to such characterization of his work. It has been suggested that anarchism has two major forms: collectivist and individualist. Morris is seen as essentially a collectivist anarchist, although not an anarchosyndicalist, the form that stresses trade-union activity. He ridiculed conventional forms of individualism. Anarchism itself is defined as a social theory that advocates a community-centered life with great amounts of personal liberty. It opposes coercion of its population (Sargent 1990, pp. 61–64).
Other commentators see News from Nowhere as an effort by Morris to present his arguments against anarchism (Holzman 1990, p. 99). It seems clear that his work does not fit neatly into any prefabricated ideological cubbyhole. Morris cherished aesthetic over intellectual values (he was an architect, artist, poet, designer, and craftsman). When one of his characters in News from Nowhere is asked how labor is rewarded, the reply is quite predictable: it is not rewarded. Work has become a pleasure—not a hardship. Each person does what he or she can do best; the quandary of extrinsic motivation has substantially disappeared.
Motivation, however, is the central concern in B. F. Skinner’s Walden Two (1948). Burrhus Frederic Skinner (1904–1990) was a professional psychologist whose utopia was a product of his interest in behavioral engineering. His ideal community has been described as one of means rather than of ends—one in which technique has been elevated to utopian status (Kumar 1987, p. 349).
This is not completely accurate. It does capture the essence of how Skinner himself saw his utopia, but it omits direct consideration of the implicit values held by its designer.
Skinner himself was unquestionably a well-motivated, humanistic scientist, but he neglected his customary penetrating analysis when approaching the area of values held by the boss scientist. At one point in Walden Two, however, he does seem to have some insight into this difficulty. Frazier, the founder of the community, voices the unspoken criticism of one of the other characters by pointing to his own insensitivity to the effect he has on others, except when the effect is calculated; his lack of the personal warmth responsible in part for the success of the community; the ulterior and devious nature of his own motives. He then cries out, ‘‘But God damn it Burris . . . can’t you see? I’m—not—a—product—of—Walden—Two!’’ (Skinner 1948, p. 233).
Economic and basic social equality exist in this community, but effective control is exercised through the built-in reinforcement techniques of its designer. When Frazier is challenged on this by one of the characters who observes that Frazier, looking at the world from the middle of the twentieth century, assumes he knows the best course for humanity forever, Frazier essentially agrees. His defense is that the techniques of behavioral engineering currently exist (and presumably will continue to be used), but they are in the wrong hands: those of charlatans, salespeople, ward heelers, bullies, cheats, educators, priests, and others. Ultimately, Skinner’s designer insists, human beings are never free; their behavior is determined by prior conditioning in the society in which they were raised. The belief in their own freedom is what allows human beings unwittingly to become conditioned by reinforcers in their existing environments.
Thus, in effect, Walden Two achieves its effects by changing the psychological characteristics of its inhabitants through environmental modification. Its final form is presumably an experimental question. The queries are simple enough and are stated explicitly at one point: What is the best behavior for the individual as far as the group is concerned? How can an individual be induced to behave in that way? The answer presumably can change over time, on the basis of experimental experience. The entire edifice would seem to depend upon the continuing moral superiority of the reinforcement designers over the charlatans they replace.
Quite a different sort of utopia has been proposed by the philosopher Robert Nozick, who outlines what he calls the framework for a utopia. In a word (or two), this framework is equivalent to what Nozick calls the minimal state (Nozick 1974, pp. 297–334). This is a state ‘‘limited to the narrow functions of protection against force, theft, fraud, enforcement of contracts, and so on . . . any more extensive state will violate persons’ rights not to be forced to do certain things and is unjustified . . .’’ (Nozick 1974, p. ix).
Nozick is not concerned with modifying behavior or specifying social structures beyond this minimum state. He begins with the assumption that individual persons have certain rights that may never be violated by any other person or the state. These include the right not to be killed or attacked if you are not doing any harm; not to be coerced or imprisoned; not to be limited in the use of your property if that use does not violate the rights of others.
In arguing for a minimal state, Nozick, on the one hand, is arguing against anarchism (in which there is no state at all). On the other hand, he argues against all forms of the welfare state (in which some people with excessive wealth may be required to surrender some of their property to help others who are less fortunate) (Paul 1981).
As Nozick sees it, rights define a moral boundary around individual persons. The sanctity of this boundary takes priority over all other possible goals. Thus, it becomes readily understandable why he feels that nonvoluntary redistribution of income is morally indefensible:
It is an extraordinary but apparent consequence of this view that for a government to tax each of its able-bodied citizens five dollars a year to support cripples and orphans would violate the rights of the able-bodied and would be morally impermissible, whereas to refrain from taxation even if it meant allowing the cripples and orphans to starve to death would be the morally required governmental policy. (Scheffler 1981, p. 151)
Here again we see the clash of values that lie at the heart of utopian schemes and their critics. A serious and widely discussed effort to resolve these clashes was made late in the twentieth century by another social philosopher, John Rawls. A Theory of Justice (Rawls 1971) was not a utopian novel but a meticulously argued tome that has been compared with John Locke’s Second Treatise of Civil Government and John Stuart Mill’s On Liberty. The central question confronting his work has been expressed thus: ‘‘Is it possible to satisfy the legitimate ‘leftist’, ‘socialist’ critics of Western capitalism within a broadly liberal, capitalist and democratic framework?’’ (Goldman 1980, p. 431).
Unfortunately, Rawls has found himself increasingly caught between attacks from both the left and the right. The left feels he has not gone far enough in constraining property rights; the right feels he places too great an emphasis upon the value of equality, especially at the expense of the right to property (Goldman 1980, pp. 431–432).
A central point argued by Rawls is that there is no injustice if greater benefits are earned by a few, provided the situation of people not so fortunate is thereby improved (Rawls 1971, pp. 14–15).
As one commentator expressed it, for Rawls equality comes first. Goods are to be distributed equally unless it can be shown that an unequal distribution is to the advantage of the least advantaged. This would be a ‘‘just’’ distribution (Schaar 1980). One might add, parenthetically, that this justice would depend substantially upon the nature of the existing social and economic arrangements under which this inequality occurs. Would a different set of arrangements allow greater equality? For example, is capital available only through private sources? Would public sources serve similar ends with less inequality?
The central issue for utopian analysts from Plato through twentieth-century philosophers is how one constructs a ‘‘just’’ society. But there is no single definition of ‘‘just’’; it all depends on what you consider to be important. Are you concerned exclusively with yourself? your immediate family? others in your community? in your country? in the world?
And so it is that utopian analysis and design ultimately begin with an implicit, if not explicit, value orientation. One school of thought begins with an overwhelming belief that elites of one sort or another must be favored in the new society. Elite status may be gained through existing wealth, birth, talent, skill, intelligence, or physical strength. Another school begins with what is, broadly speaking, the concept of equality. Here the implicit notion is not unlike Western ideas of the family: to each equally, irrespective of either productivity or need. Between these two polar positions lie a range of intermediate proposals that may provide greater amounts of compensation based upon some definition of need or elite status. In turn, compensation may or may not be linked directly to political or other forms of power.
Issues relating to the nation-state (its form, its powers, and even its very existence), ethnicity, and inequality became acute in the final decade of the twentieth century. Ethnic groups throughout the world grew militant in their demands for their own national entities. Many saw this as a path to a solution for their own problems of inequality. With the apparent easing, if not the elimination, of Cold War tensions between the Soviet Union and the United States, widespread controversies began relative to the shape of a ‘‘new world order.’’ This posed unprecedented challenges to utopian thought.
To deal with these challenges, social scientists, as well as imaginative novelists and others, were confronted with the task of integrating value configurations, social structures, and psychological sets on levels that may well make all previous efforts at utopian analysis and design resemble the stumbling steps of a child just learning to walk.
(SEE ALSO: Equity Theory; Social Philosophy)
REFERENCES
Beecher, Jonathan 1986 Charles Fourier: The Visionary and His World. Berkeley: University of California Press.
Brinton, Crane 1965 ‘‘Utopia and Democracy.’’ In Frank E. Manuel, ed., Utopias and Utopian Thought. Boston: Beacon Press.
Cole, Margaret 1969 Robert Owen of New Lanark 1771–1858. New York: Augustus M. Kelley.
Gil, Efraim 1996 ‘‘The Individual within the Collective: A New Perspective.’’ Journal of Rural Cooperation 24:5–15.
Goldman, Alan H. 1980 ‘‘Responses to Rawls from the Political Right.’’ In H. Gene Blocker and Elizabeth H. Smith, eds., John Rawls’ Theory of Social Justice. Athens: Ohio University Press.
Hacohen, Malachi-Haim 1996 ‘‘Karl Popper in Exile: The Viennese Progressive Imagination and the Making of The Open Society.’’ Philosophy of the Social Sciences 26:452–492.
Hodgson, Geoffrey M. 1995 ‘‘The Political Economy of Utopia.’’ Review of Social Economy 53:195–213.
Holzman, Michael 1990 ‘‘The Encouragement and Warning of History: William Morris’s A Dream of John Ball.’’ In Florence S. Boos and Carole G. Silver, eds., Socialism and the Literary Artistry of William Morris. Columbia: University of Missouri Press.
Klosko, George 1986 The Development of Plato’s Political Theory. New York: Methuen.
Kumar, Krishan 1987 Utopia and Anti-Utopia in Modern Times. New York: Basil Blackwell.
Lowy, Michael 1997 ‘‘The Romantic Utopia of Walter Benjamin’’ (L’Utopie romantique de Walter Benjamin) Raison Presente 121:19–27.
Maler, Henri 1998 ‘‘An Apocryphal Testament: Socialism, Utopian and Scientific.’’ Science and Society 62:48–61.
Manuel, Frank E., and Fritzie P. Manuel 1979 Utopian Thought in the Western World. Cambridge, Mass.: Harvard University Press.
Martensson, Bertil 1991 ‘‘The Paradoxes of Utopia: A Study in Utopian Rationalism.’’ Philosophy of the Social Sciences 21:476–514.
More, Sir Thomas (1516) 1965 Utopia. Paul Turner, trans. London: Penguin.
Morris, William 1966 News from Nowhere. In The Collected Works of William Morris, vol. 16, pp. 3–211. New York: Russell and Russell.
Nozick, Robert 1974 Anarchy, State and Utopia. New York: Basic Books.
Oyzerman, Teodor Il’ich 1998 ‘‘Marxism and Utopianism. Marxism’s Overcoming of Utopianism as an Unfinished Historical Process’’ (Marksizm i utopizm. Preodolenie marksizmom utopizma kak nezavershennyi istoricheskiy protsess) Svobodnaya Mysl’ 2:76–83.
Paul, Jeffrey (ed.) 1981 Reading Nozick. Totowa, N.J.: Rowman and Littlefield.
Plato 1941 The Republic of Plato, Francis MacDonald Cornford, trans. and ed. New York and London: Oxford University Press.
Prat, Jean-Louis 1995 ‘‘Utopian Utilitarianism’’ (L’Utilitarisme utopique) Revue du MAUSS 6:53–60.
Rawls, John 1971 A Theory of Justice. Cambridge, Mass.: Harvard University Press.
Sargent, Lyman Tower 1990 ‘‘William Morris and the Anarchist Tradition.’’ In F. S. Boos and C. G. Silver, eds., Socialism and the Literary Artistry of William Morris. Columbia: University of Missouri Press.
Schaar, John H. 1980 ‘‘Equality of Opportunity and the Just Society.’’ In H. G. Blocker and E. H. Smith, eds., John Rawls’ Theory of Social Justice. Athens: Ohio University Press.
Scheffler, Samuel 1981 ‘‘Natural Rights, Equality and the Minimal State.’’ In Jeffrey Paul, ed., Reading Nozick. Totowa, N.J.: Rowman and Littlefield.
Skinner, B. F. 1948 Walden Two. New York: Macmillan.
ROBERT BOGUSLAW

V
VALIDITY
In the simplest sense, a measure is said to be valid to the degree that it measures what it is hypothesized to measure (Nunnally 1967, p. 75). More precisely, validity has been defined as the degree to which a score derived from a measurement procedure reflects a point on the underlying construct it is hypothesized to reflect (Bohrnstedt 1983). In the most recent Standards for Educational and Psychological Testing (American Psychological Association 1985), it is stated that validity ‘‘refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from . . . scores.’’ The emphasis is clear: Validity refers to the degree to which evidence supports the inferences drawn from a score rather than the scores or the instruments that produce the scores. Inferences drawn for a given measure with one population may be valid but may not be valid for other populations. As will be shown below, evidence for inferences about validity can be accumulated in a variety of ways. In spite of this variety, validity is a unitary concept. The varied types of inferential evidence relate to the validity of a particular measure under investigation.
Several important points related to validity should be noted:
1. Validity is a matter of degree rather than an all-or-none matter (Nunnally 1967, p. 75; Messick 1989).
2. Since the constructs of interest in sociology (normlessness, religiosity, economic conservatism, etc.) generally are not amenable to direct observation, validity can be ascertained only indirectly.
3. Validation is a dynamic process; the evidence for or against the validity of the inferences that can be drawn from a measure may change with accumulating evidence. Validity in this sense is always a continuing and evolving matter rather than something that is fixed once and for all (Messick 1989).
4. Validity is the sine qua non of measurement; without it, measurement is meaningless.
In spite of the clear importance of validity in making defensible inferences about the reasonableness of theoretical formulations, the construct more often than not is given little more than lip service in sociological research. Measures are assumed to be valid because they ‘‘look valid,’’ not because they have been evaluated in ways that yield statistical estimates of validity. In this article, the different meanings of validity are introduced and methods for estimating the various types of validity are discussed.
TYPES OF VALIDITY
The Standards produced jointly by the American Psychological Association, the American Educational Research Association, and the National Council on Measurement in Education distinguish among three types of evidence related to validity: (1) criterion-related, (2) content, and (3) construct evidence (American Psychological Association 1985).
Criterion-Related Evidence for Validity. Criterion-related evidence for validity is assessed by the correlation between a measure and a criterion variable of interest. The criterion varies with the purpose of the researcher and/or the client for the research. Thus, in a study to determine the effect of early childhood education, a criterion of interest might be how well children perform on a standardized reading test at the end of the third grade. In a study for an industrial client, it might be the number of years it takes to reach a certain job level. The question that is always asked when one is accumulating evidence for criterion-related validity is: How accurately can the criterion be predicted from the scores on a measure? (American Psychological Association 1985).
Since the criterion variable may be one that exists in the present or one that a researcher may want to predict in the future, evidence for criterion-related validity is classified into two major types: predictive and concurrent.
Evidence for predictive validity is assessed by examining the future standing on a criterion variable as predicted from the present standing on a measure of interest. For example, if one constructs a measure of work orientation, evidence of its predictive validity for job performance might be ascertained by administering that measure to a group of new hires and correlating it with a criterion of success (supervisors’ ratings, regular advances within the organization, etc.) at a later point in time. The evidence for the validity of a measure is not limited to a single criterion. There are as many validities as there are criterion variables to be predicted from that measure. The preceding example makes this clear. In addition, the example shows that the evidence for the validity of a measure varies depending on the time at which the criterion is assessed. Generally, the closer in time the measure and the criterion are assessed, the higher the validity, but this is not always true.
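The point that a measure has as many validities as there are criteria can be made concrete with a small computation. The following Python sketch is illustrative only: the scores are invented, and the names `pearson_r`, `validity_coefficients`, `supervisor_rating_6mo`, and `promotions_3yr` are hypothetical labels, not part of any study cited here. It assumes that each validity coefficient is estimated as a simple Pearson correlation between the measure and a criterion collected later.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [v - mean_x for v in x]
    dev_y = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)


def validity_coefficients(measure, criteria):
    """Return one validity coefficient per criterion variable."""
    return {name: pearson_r(measure, scores) for name, scores in criteria.items()}


# Invented work-orientation scores for eight new hires (hypothetical data).
work_orientation = [12, 15, 9, 20, 17, 11, 14, 18]

# Two hypothetical criteria assessed at different later points in time.
criteria = {
    "supervisor_rating_6mo": [3.1, 3.8, 2.5, 4.6, 4.0, 3.0, 3.4, 4.2],
    "promotions_3yr": [0, 1, 0, 2, 1, 1, 0, 2],
}

coefficients = validity_coefficients(work_orientation, criteria)
for name, r in coefficients.items():
    print(f"{name}: r = {r:.2f}")
```

The same measure yields a different coefficient for each criterion and each time lag, which is why no single number can be called "the" predictive validity of the measure.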
Evidence for concurrent validity is assessed by correlating a measure and a criterion of interest at the same point in time. One indicator of the concurrent validity of a measure of religious belief, for example, is its correlation with concurrent attendance at religious services. Just as is the case for predictive validity, there are as many concurrent validities as there are criteria to be explained; there is no single concurrent validity for a measure.
Concurrent validation also can be evaluated by correlating a measure of X with extant measures of X, for instance, correlating one measure of self-esteem with a second one. It is assumed that the two measures reflect the same underlying construct. Two measures may both be labeled self-esteem, but if one contains items that deal with one’s social competence and the other contains items that deal with how one feels and evaluates oneself, it will not be surprising to find no more than a modest correlation between the two.
Evidence for validity based on concurrent studies may not square with evidence for validity based on predictive studies. For example, a measure of an attitude toward a political issue may correlate highly in August with the political party one believes one will vote for in November but may correlate rather poorly with the actual vote in November.
Many of the constructs of interest to sociologists do not have criteria against which the validity of a measure can be ascertained easily. When they do, the criteria may be so poorly measured that the validity coefficients are badly attenuated by measurement error. For these reasons, sociological researchers have rarely computed criterion-related validities.
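How badly measurement error can attenuate a validity coefficient can be illustrated with the standard result from classical test theory, which is not stated in the text above and is brought in here as an assumption: if errors of measurement are uncorrelated, the observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. The sketch below uses invented reliabilities, and the function names are hypothetical.

```python
import math


def _check_reliability(value):
    """Reliabilities must lie in the interval (0, 1]."""
    if not 0 < value <= 1:
        raise ValueError("reliability must be in (0, 1]")


def attenuated_correlation(true_r, reliability_x, reliability_y):
    """Expected observed correlation under classical test theory."""
    _check_reliability(reliability_x)
    _check_reliability(reliability_y)
    return true_r * math.sqrt(reliability_x * reliability_y)


def disattenuated_correlation(observed_r, reliability_x, reliability_y):
    """Spearman's correction for attenuation (the inverse of the above)."""
    _check_reliability(reliability_x)
    _check_reliability(reliability_y)
    return observed_r / math.sqrt(reliability_x * reliability_y)


# Invented values: a true correlation of .60, a measure with reliability .80,
# and a poorly measured criterion with reliability .50.
observed = attenuated_correlation(0.60, 0.80, 0.50)
print(f"observed validity coefficient: {observed:.2f}")  # prints 0.38

# Applying the correction recovers the assumed true correlation.
recovered = disattenuated_correlation(observed, 0.80, 0.50)
print(f"corrected coefficient: {recovered:.2f}")  # prints 0.60
```

Under these assumed values, a substantial true relationship of .60 would appear as an observed coefficient of only about .38, which suggests why poorly measured criteria discourage the computation of criterion-related validities.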
Content Validity. One can imagine a domain of meaning that a construct is intended to measure. Content validity provides evidence for the degree to which one has representatively sampled from that domain of meaning (Bohrnstedt 1983). One also can think of a domain as having various facets (Guttman 1959), and just as one can use stratification to obtain a sample of persons, one can use stratification principles to improve the evidence for content validity.
While content validity has received close attention in the construction of achievement and proficiency measures in psychology and educational psychology, it usually has been ignored by sociologists. Many sociological researchers have instead been satisfied to construct a few items on an ad hoc, one-shot basis in the apparent belief that they are measuring what they intended to measure. In fact, the construction of good measures is a tedious, arduous, and time-consuming task.
Because domains of interest cannot be enumerated in the same way that a population of persons or objects can, the task of assuring the content validity of one’s measures is less rigorous than one would hope. While an educational psychologist can sample four-, five-, or six-letter words in constructing a spelling test, no such clear criteria exist for a sociologist who engages in social measurement. However, some guidelines can be provided. First, the researcher should search the literature carefully to determine how various authors have used the concept that is to be measured. There are several excellent handbooks that summarize social measures in use, including Robinson and Shaver’s Measures of Social Psychological Attitudes (1973); Robinson et al.’s Measures of Political Attitudes (1968); Robinson et al.’s Measures of Occupational Attitudes and Occupational Characteristics (1969); Shaw and Wright’s Scales for the Measurement of Attitudes (1967); and Miller’s Handbook of Research Design and Social Measurement (1977). These volumes not only contain lists of measures but provide existing data on the reliability and validity of those measures. However, since these books are out of date as soon as they go to press, researchers developing their own methods must do additional literature searches. Second, sociological researchers should rely on their own observations and insights and ask whether they yield additional facets to the construct under consideration.
Using these two approaches, one develops sets of items, one to capture each of the various facets or strata within the domain of meaning. There is no simple criterion by which one can judge whether a domain of meaning has been sampled properly. However, a few precautions can be taken to help ensure the representation of the various facets within the domain.
First, the domain can be stratified into its major facets. One first notes the most central meanings of the construct, making certain that the stratification is exhaustive, that is, that all major meaning facets are represented. If a facet appears to involve a complex of meanings, it should be subdivided further into substrata. The more one refines the strata and substrata the easier it is to construct the items later and the more complete the coverage of meanings associated with the construct will be. Second, one should write several items or locate several extant indicators to reflect the meanings associated with each stratum and substratum. Third, after the items have been written, they should be tried out on very small samples composed of persons of the type the items will eventually be used with, using cognitive interviewing techniques, in which subjects are asked to ‘‘think aloud’’ as they respond to the items. This technique for the improvement of items, while quite new in survey research, is very useful for improving the validity of items (Sudman et al. 1995). For example, Levine et al. (1997) have shown how cognitive interviewing helped in the improvement of school staffing resources, as did Levine (1996) in describing the development of background questionnaires for use with the large-scale cognitive assessments. Fourth, after the items have been refined through the use of cognitive laboratory techniques, the newly developed items should be field-tested on a sample similar to that with which one intends to examine the main research questions. The field-test sample should be large enough to examine whether the items are operating as planned vis-à-vis the constructs they are putatively measuring, using multivariate tools such as confirmatory factor analysis (Joreskog 1969) and item response theory methods (Hambleton and Swaminathan 1985).
Finally, after the items are developed, the main study should employ a sampling design that takes into account the characteristics of the population about which generalizations are to be made (ethnicity, gender, region of country, etc.). The study also should be large enough to generate stable parameter estimates when one is using multivariate techniques such as multiple regression (Bohrnstedt and Knoke 1988) and structural equation techniques (Bollen 1989).
It can be argued that what the Standards call content validity is not a separate method for assessing validity. Instead, it is a set of procedures for sampling content domains that, if followed, can help provide evidence for construct validity (see the discussion of construct validity below). Messick (1989), in a similar stance, states that so-called content validity does not meet the definition of validity given above, since it does not deal directly with scores or their interpretation. This position can be better understood in the context of construct validity.
Construct Validity. The 1974 Standards state: ‘‘A construct is. . . a theoretical idea developed to explain and to organize some aspects of existing knowledge. . . It is a dimension understood or inferred from its network of interrelationships’’ (American Psychological Association 1985). The Standards further indicate that in developing evidence for construct validity,
the investigator begins by formulating hypotheses about the characteristics of those who have high scores on the [measure] in contrast to those who have low scores. Taken together, such hypotheses form at least a tentative theory about the nature of the construct the [measure] is believed to be measuring.
Such hypotheses or theoretical formulations lead to certain predictions about how people . . . will behave . . . in certain defined situations. If the investigator’s theory . . . is correct, most predictions should be confirmed. (p. 30)
The notion of a construct implies hypotheses of two types. First, it implies that items from one stratum within the domain of meaning correlate together because they all reflect the same underlying construct or ‘‘true’’ score. Second, whereas items from one domain may correlate with items from another domain, the implication is that they do so only because the constructs themselves are correlated. Furthermore, it is assumed that there are hypotheses about how measures of different domains correlate with one another. To repeat, construct validation involves two types of evidence. The first is evidence for theoretical validity (Lord and Novick 1968): an assessment of the relationship between items and an underlying, latent unobserved construct. The second involves evidence that the underlying latent variables correlate as hypothesized. If either or both sets of these hypotheses fail, evidence for construct validation is absent. If one can show evidence for theoretical validity but evidence about the interrelations among those constructs is missing, that suggests that one is not measuring the intended construct or that the theory is wrong or inadequate. The more unconfirmed hypotheses one has involving the constructs, the more one is likely to assume the former rather than the latter.
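The two types of hypotheses described above can be written compactly under a simple linear measurement model. The notation below is an assumption introduced for illustration and does not appear in the Standards or in the sources cited: all variables are standardized, each item loads on exactly one construct, and the error terms are uncorrelated with the constructs and with each other.

```latex
% Assumed notation: x_i and x_j are items measuring construct \xi_1;
% y_k is an item measuring a second construct \xi_2.
% \lambda and \gamma are loadings; \varepsilon and \delta are errors.
\begin{align}
x_i &= \lambda_i \xi_1 + \varepsilon_i, \qquad y_k = \gamma_k \xi_2 + \delta_k \\
\operatorname{corr}(x_i, x_j) &= \lambda_i \lambda_j \\
\operatorname{corr}(x_i, y_k) &= \lambda_i \gamma_k \phi_{12},
  \qquad \phi_{12} = \operatorname{corr}(\xi_1, \xi_2)
\end{align}
```

Under these assumptions, items within a stratum correlate only through their shared construct (second line), and items from different domains correlate only to the extent that the constructs themselves are correlated (third line): if the correlation between the two constructs is zero, so is every cross-domain item correlation.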
The discussion above makes clear the close relationship between construct validation and theory validation. To be able to show construct validity assumes that the researcher has a clearly stated set of interrelated hypotheses between important theoretical constructs, which in turn can be measured by sets of indicators. Too often in sociology, one or both of these components are missing.
Campbell (1953, 1956) uses a multitrait–multimethod matrix, a useful tool for assessing the construct validity of a set of measures collected using differing methods. Thus, for example, one might collect data using multiple indicators of three constructs, say, prejudice, alienation, and anomie, using three different data collection methods: a face-to-face interview, a telephone interview, and a questionnaire. To the degree that different methods yield the same or a very similar result, the construct demonstrates what Campbell (1954) calls convergent validity. Campbell argues that in addition, the constructs must not correlate too highly with each other; that is, to use Campbell and Fiske’s (1959) term, they must also exhibit discriminant validity. Measures that meet both criteria provide evidence for construct validity.
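The logic of the multitrait–multimethod matrix can be illustrated with a small simulation. This is a minimal sketch, not a procedure taken from Campbell and Fiske: the data are simulated, the effect sizes are arbitrary, and the function names (`pearson`, `simulate_mtmm`, `summarize`) are hypothetical. Each observed score is assumed to be the sum of a latent trait, a method effect shared by all traits measured with the same method, and random error.

```python
import random
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_mtmm(n=500, seed=0):
    """Simulate three traits, each measured by three methods (assumed model)."""
    rng = random.Random(seed)
    traits = ["prejudice", "alienation", "anomie"]
    methods = ["face_to_face", "telephone", "questionnaire"]
    data = {(t, m): [] for t in traits for m in methods}
    for _ in range(n):
        latent = {t: rng.gauss(0, 1) for t in traits}          # independent traits
        method_effect = {m: rng.gauss(0, 0.3) for m in methods}  # shared within method
        for t in traits:
            for m in methods:
                score = latent[t] + method_effect[m] + rng.gauss(0, 0.5)
                data[(t, m)].append(score)
    return data

def summarize(data):
    """Mean same-trait/different-method and mean different-trait correlations."""
    convergent, discriminant = [], []
    for (t1, m1), (t2, m2) in combinations(data, 2):
        r = pearson(data[(t1, m1)], data[(t2, m2)])
        if t1 == t2:
            convergent.append(r)      # same trait, different methods
        else:
            discriminant.append(r)    # different traits, any methods
    return (sum(convergent) / len(convergent),
            sum(discriminant) / len(discriminant))

data = simulate_mtmm()
mean_convergent, mean_discriminant = summarize(data)
print(f"mean convergent r:   {mean_convergent:.2f}")
print(f"mean discriminant r: {mean_discriminant:.2f}")
```

In this simulated setup the same-trait correlations across methods should be high and the different-trait correlations low, which is the pattern one looks for as evidence of convergent and discriminant validity. With real data the traits are usually correlated, so the discriminant correlations would be judged against what the theory predicts, not against zero.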
VALIDITY GENERALIZATION
An important issue for work in educational and industrial settings is the degree to which the criterion-related evidence for validity obtained in one setting generalizes to other settings (American Psychological Association 1985). The point is that evidence for the validity of an instrument in one setting in no way guarantees its validity in any other setting. By contrast, the more evidence there is of consistency of findings across settings that are maximally different, the stronger the evidence for validity generalization is.
Evidence for validity generalization generally is garnered in one of two ways. The usual way is simply to do a nonquantitative review of the relevant literature; then, on the basis of that review, a conclusion about the generalizability of the measure across a variety of settings is made. More recently, however, meta-analytic techniques (Hedges and Olkin 1985) have been employed to provide quantitative evidence for validity generalization.
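One common meta-analytic approach to pooling validity coefficients can be sketched as follows. This is an illustration under stated assumptions, not the specific procedure of any study cited here: it uses the Fisher z transformation with weights of n − 3 (a fixed-effect model), the function name `pooled_validity` is hypothetical, and the study values are invented for the example.

```python
import math

def pooled_validity(studies):
    """Pool validity coefficients across settings.

    studies: list of (r, n) pairs, where r is the predictor-criterion
    correlation in one setting and n is that study's sample size.
    Returns (pooled_r, q), where q is the homogeneity statistic.
    """
    z = [math.atanh(r) for r, _ in studies]   # Fisher z transform of each r
    w = [n - 3 for _, n in studies]           # inverse of the variance of z
    z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sum(w)
    q = sum(wi * (zi - z_bar) ** 2 for wi, zi in zip(w, z))
    return math.tanh(z_bar), q

# Hypothetical validity coefficients from four settings (illustrative only).
studies = [(0.30, 120), (0.42, 85), (0.25, 200), (0.38, 60)]
pooled_r, q = pooled_validity(studies)
print(f"pooled r = {pooled_r:.3f}, Q = {q:.2f} on {len(studies) - 1} df")
```

Under the hypothesis that all settings share one underlying validity coefficient, Q is approximately chi-square distributed with k − 1 degrees of freedom, where k is the number of studies. A small Q is consistent with validity generalization; a large Q suggests that validity varies across settings.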
Variables that may affect validity generalization include the particular criterion measure used, the sample to which the instrument is administered, the time period during which the instrument was used, and the setting in which the assessment is done.
Differential prediction. In using a measure in different demographic groups that differ in experience or that have received different treatments (e.g., different instructional programs), the possibility exists that the relationship between the criterion measure and the predictor will vary across groups. To the degree that this is true, a measure is said to display differential prediction.
Closely related is the notion of predictive bias. While there is some dispute about the best definition, the most commonly accepted definition states that predictive bias exists if different regression equations are needed for different groups and if predictions result in decisions for those groups that are different from the decisions that would be made based on a pooled groups regression analysis (American Psychological Association 1985). Perhaps the best example to differentiate the two concepts is drawn from examining the relationship between education and income. It has been shown that that relationship is stronger for whites than it is for blacks; that is, education differentially predicts income. If education were then used as a basis for selection into jobs at a given income level, education would be said to have a predictive bias against blacks because they would have to have a greater number of years of education to be selected for a given job level compared to whites.
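The comparison of group-specific and pooled regression equations can be sketched with a toy example. The numbers below are invented, noise-free values chosen so the arithmetic is easy to follow; they are not estimates from any study, and the names `ols`, `predict`, `group_a`, and `group_b` are hypothetical.

```python
def ols(x, y):
    """Simple least-squares fit; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def predict(fit, x):
    """Predicted criterion value for predictor value x."""
    intercept, slope = fit
    return intercept + slope * x

# Years of education (predictor) and income in arbitrary units (criterion).
education = [8, 10, 12, 14, 16]
income_a = [10 + 3.0 * e for e in education]   # group_a: steeper relationship
income_b = [10 + 1.5 * e for e in education]   # group_b: flatter relationship

fit_a = ols(education, income_a)
fit_b = ols(education, income_b)
fit_pooled = ols(education * 2, income_a + income_b)  # both groups combined

for label, fit in [("group_a", fit_a), ("group_b", fit_b), ("pooled", fit_pooled)]:
    print(f"{label}: intercept={fit[0]:.2f}, slope={fit[1]:.2f}, "
          f"predicted income at 12 years={predict(fit, 12):.1f}")
```

The differing slopes correspond to differential prediction. Whether this also amounts to predictive bias depends, under the definition given above, on whether decisions based on the pooled equation differ from those based on the group-specific equations; in this toy case the pooled equation predicts a value between the two group-specific predictions for the same number of years of education.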
Differential prediction should not be confused with differential validity, a term used in the context of job placement and classification. Differential validity refers to the ability of a measure or, more commonly, a battery of measures to differentially predict success or failure in one job compared to another. Thus, the armed services use the battery of subtests in the Armed Services Vocational Aptitude Battery (U.S. Government Printing Office 1989; McLaughlin et al. 1984) in making the initial assignment of enlistees to military occupational specialties.
MORE RECENT FORMULATIONS OF VALIDITY
More recent definitions of validity have been even broader than that used in the 1985 Standards.
Messick (1989) defines validity as an evaluative judgment about the degree to which ‘‘empirical and theoretical rationales support the adequacy and appropriateness of inferences and actions based on . . . scores or other modes of assessment’’ (p. 13). For Messick, validity is more than a statement of the existing empirical evidence linking a score to a latent construct; it is also a statement about the evidence for the appropriateness of using and interpreting the scores. While most measurement specialists separate the use of scores from their interpretation, Messick (1989) argues that the value implications and social consequences of testing are inextricably bound to the issue of validity:
[A] social consequence of testing, such as adverse impact against females in the use of a quantitative test, either stems from a source of test invalidity or a valid property of the construct assessed, or both. In the former case, this adverse consequence bears on the meaning of the test scores and, in the latter case, on the meaning of the construct. In both cases, therefore, construct validity binds social consequences to the evidential basis of test interpretation and use. (p. 21)
Whether the interpretation and social consequences of the uses of measures become widely adopted (i.e., are adopted in the next edition of the Standards) remains to be seen. Messick’s (1989) definition does reinforce the idea that although there are many facets to and methods for garnering evidence for inferences about validity, it remains a unitary concept; evidence bears on inferences about a single measure or instrument.
REFERENCES
American Psychological Association 1985 Standards for Educational and Psychological Testing. Washington, D.C.: American Psychological Association.
Bohrnstedt, G. W. 1983 ‘‘Measurement.’’ In P. H. Rossi, J. D. Wright, and A. B. Anderson, eds., Handbook of Survey Research. New York: Academic Press.
Bohrnstedt, G. W., and D. Knoke 1988 Statistics for Social Data Analysis. Itasca, Ill.: F.E. Peacock.
Bohrnstedt, G. W. 1992 ‘‘Reliability.’’ In E. F. Borgatta, ed., Encyclopedia of Sociology, 1st ed. New York: Macmillan.
Bollen, K. A. 1989 Structural Equations with Latent Variables. New York: Wiley.