Dawn P. Flanagan - Essentials of Cross-Battery Assessment-Wiley (2013).pdf

Chapter Five

Cross-Battery Assessment of Individuals from Culturally and Linguistically Diverse Backgrounds

Introduction

Eager to investigate the potential of what could prove to be a tool that might revolutionize the whole of psychology and its scientific contributions for the good of society, Henry Herbert Goddard quickly set his sights on demonstrating the utility of his English translation of the scale recently developed by Alfred Binet in France. Goddard's primary purpose was not to validate his instrument so much as it was to address a problem he perceived in the level of intelligence of recent immigrants to the United States. That his test was in fact a valid measure of intelligence was simply a given and neither he nor the other believers in the newfound IQ test (e.g., Carl Brigham, Lewis Terman, and others) questioned its validity in any way. And what made the early IQ test truly seductive was its ability to generate numbers that could be used to sort and rank individuals on what was clearly a critical dimension perceived necessary for the survival and well-being of the civilized world: intelligence. So it was that Goddard set out, not to investigate whether IQ tests had any merit or utility, but to attack a perceived decrease in the intelligence of the American population brought on most likely, in his view, by the great wave of immigration sweeping over the nation. And where better to find immigrants for this purpose than Ellis Island?

Goddard (1913) eventually found himself in New York harbor searching through the lines of newly arrived immigrants to find suitable individuals for intelligence testing. Of course, these lines were not composed of higher-status, well-educated, first-class passengers but rather those with far more limited education and means who often crossed the ocean in steerage class. The vast majority of these people had just spent many days at sea in cramped quarters and now stood in long lines waiting anxiously to be processed into the country. His two specially trained female assistants pored over those waiting in line looking specifically for individuals with the telltale appearance of the “feeble-minded” to whom Goddard could administer his test and verify that the current level of immigrant intelligence was indeed falling below normal and in the range of “moron,” a term he himself had coined. Goddard (1913) described the scene in this way:

We picked out one young man whom we suspected was defective, and, through the interpreter, proceeded to give him the test. The boy tested 8 by the Binet scale. The interpreter said, “I could not have done that when I came to this country,” and seemed to think the test unfair. We convinced him that the boy was defective. (p. 105)

Despite the keen insight of the interpreter, Goddard could not or simply chose not to appreciate the potentially significant impact that unfamiliarity with the culture to which he had adapted his test (i.e., that of the United States) might have had on immigrant test performance and his attempts to measure their intelligence. In fact, he remained so convinced of the validity of his IQ test that he kept at it until he gathered sufficient evidence to suggest that, on average, 80% of all Jewish, Hungarian, Italian, and Russian immigrants were, to use his term, morons, or “mentally defective” (Goddard, 1917). In opposition to the logical alternative explanation of such findings hinted at by his interpreter, Goddard offered a mixed rationale that attributed poor performance to deficiency in both intelligence and moral character. As he stated: “We cannot escape the general conclusion that these immigrants were of surprisingly low intelligence” (p. 251), and “It should be noted that the immigration of recent years is of a decidedly different character from the early immigration” (p. 266) when presumably, people were both smarter and morally superior.

Goddard was not alone, however, in failing to appreciate the significance of cultural and linguistic influences on test performance. Only a handful of years after he had begun his work with immigrants, a similar issue arose when the United States entered World War I. By 1918, Goddard had joined the war effort along with several leading psychologists of the day, including Robert Yerkes, Lewis Terman, David Wechsler, Edward Thorndike, Carl Brigham, Arthur Otis, and Edwin Boring. This formidable team was commissioned by the Department of the Army to create a practical method for selecting men for officer candidates (i.e., those with the proper levels of intelligence, moral character, and leadership capabilities).

Under Yerkes's leadership and direction, the group initially developed the Army Mental Test based on a collection of both existing and newly developed tasks, including multiple-choice questions pioneered by Otis. In piloting the test, the group quickly ran into the same problem Goddard had encountered: many examinees simply did not speak English well, or at all. Even more could not read English or comprehend the test instructions and thus were effectively foreclosed from responding to the items. The group clearly recognized that limited language proficiency and reading ability would impede test administration (not so much actual performance!). Thus, they created an alternative version of the test, resulting in two forms: the Army Alpha (administered to those who could read American newspapers) and the Army Beta (administered to those who could not).

The modifications to the original form of the test were rather minor and not at all compelling. For example, the multiple-choice items were simply excluded from the examination, but the rest of the test was still administered via verbal instructions in English. Yerkes also trained and used assistants to demonstrate what was presumably expected of the examinee using a blackboard located at the front of the room. These modifications did little to stem the problem of recruits not knowing how to proceed, and ultimately Yerkes was forced to send orderlies about the room to find individuals doing nothing and get them working on the test, doing something, anything, in an attempt to respond to the tasks. Yerkes and his staff did not question the presumed innateness of intelligence; therefore, they believed that these minor accommodations were sufficient to generate valid results for the subsequent planned statistical analyses. However, the data continued to suggest otherwise. For example, Yerkes noted that the average raw score on the Army Beta for those recruits who could not read but whose native language was English came in at 101.6, a score that was classified as “Very Superior” and assigned “Grade A.” Conversely, the average raw score for those recruits who not only could not read but also for whom English was not the native language was found to be only 77.8, which was classified as “Average” or “Grade C.” Who might be considered solid officer material and who was likely to end up as an enlisted man is rather obvious in these findings. Nevertheless, in his final report to the Army, Yerkes (1921) did appear to recognize the issue of experiential differences when he wrote, “There are indications to the effect that individuals handicapped by language difficulty and illiteracy are penalized to an appreciable degree in Beta as compared with men not so handicapped” (p. 395). Unfortunately, his use of the term handicapped set the stage for notions regarding the supposed negative effects of bilingualism that drove understanding in many fields for another 50 years. Worse still was that although Yerkes was putting the issue up for debate and examination, Brigham, one of his lieutenants most actively involved in analyzing the data, soon provided an alternative explanation for the differences in performance that had nothing whatsoever to do with language or literacy but only the preferred genetic explanation. To rescue Yerkes from his ambivalence, Brigham turned to a sample of the nonnative English–speaking individuals who had first been evaluated with the Army Beta and were then further evaluated with Terman's Stanford-Binet. The reasons for the follow-up evaluation were never made clear, but it is likely that it was done with the intention of demonstrating that the nonnative English–speaking group was simply mentally inferior to the native English speakers. The results from this investigation with the Stanford-Binet are presented in Figure 5.1.

Figure 5.1 Mean Mental Age on Stanford-Binet in a Nonnative English–Speaking Sample From Yerkes's Data (1921)

Of particular note is the fact that the data were analyzed by groupings generated on the basis of the number of years of residence in the United States. It seems reasonable that this was perhaps related to Yerkes's concerns about the impact of language proficiency; otherwise, it would be extremely curious to arrange the analyses using a variable of no particular significance. Whatever the case, the results seem rather straightforward: The longer a recruit had lived in the United States, the higher his mental age on the Stanford-Binet. Obviously, the causal relationship between these variables was not due to breathing the air or drinking the water in the country; rather it was related directly to the amount of time spent in the United States, which offered increased opportunities for learning about the culture and for developing better proficiency in English, much as Yerkes seems to have suspected. But Brigham (1923) took a decidedly different slant on the results, which he reported in his own book, A Study of American Intelligence, and rendered this analysis:

Instead of considering that our curve indicates a growth of intelligence with increasing length of residence, we are forced to take the reverse of the picture and accept the hypothesis that the curve indicates a gradual deterioration in the class of immigrants examined in the army, who came to this country in each succeeding 5 year period since 1902....The average intelligence of succeeding waves of immigration has become progressively lower. (pp. 110–111, 155)

The degree to which Brigham had to twist and convolute his hypothesis to fit the data is dramatic but not surprising. As noted previously, early psychologists tended to reject outright any differences in mental performance or intelligence that could be ascribed to extrinsic differences. The power of the genetic argument and the purpose for which it was being applied (i.e., institutionalization, involuntary sterilization, immigration restriction) meant that environmental or circumstantial influences could not exist or at best had to be of minimal importance. And where such factors actually might be permitted to stand, they invariably had little to do with reasons for relatively poorer performance of diverse individuals. For example, Brigham (1923) allowed that whereas the Army Alpha may be affected by education, “examination Beta involves no English, and the tests cannot be considered as educational measures in any sense” (p. 100). In addition, perhaps spurred by the war effort, there was a strong “patriotic” theme underlying these efforts. Brigham's thoughts along these lines are readily apparent in some of his other assertions, for example:

If the tests used included some mysterious type of situation that was “typically American,” we are indeed fortunate, for this is America, and the purpose of our inquiry is that of obtaining a measure of the character of our immigration. Inability to respond to a “typically American” situation is obviously an undesirable trait. (p. 96)

At the very outset of the development of mental testing and of the IQ test that lay at the heart of the entire endeavor was the issue of experiential and background differences and their influence on the validity of test results, and psychologists did seem to take some notice. For example, even at the most rudimentary level, Goddard, Yerkes, and others certainly acknowledged language issues as a concern because they had either employed an interpreter to administer the test or created an alternative test with language-reduced instructions. In both cases, however, it is likely these steps were taken primarily because it was recognized that administration required some elementary degree of comprehension of the task at hand, not because of any intentional concession regarding fairness. For example, Goddard never discussed any problems inherent in the translation of the original items from French into English and their retranslation on the fly into a third language (e.g., Polish, Italian, Russian, etc.) by an interpreter. So when the interpreter pointed out a problem related to the examinee's likely lack of familiarity with the content of the test and its implications for fairness, Goddard dismissed it out of hand in service to his a priori convictions. Likewise, the development of the Army Beta by Yerkes and his team was motivated primarily by concerns about the lack of variability in their data stemming from the fact that many recruits did not in fact comprehend or read English and thus did not even attempt to respond to many of the tests. This resulted in an overwhelming number of “0” responses and a concomitant lack of variability, which limited statistical analyses. Yerkes's use of the Beta version was thus an effort to generate data that could be subjected to proper statistical analysis and did not reflect any real concern with fairness in testing.

Don't Forget

Fair and equitable interpretation of test results is predicated on an understanding of the assumptions that underlie testing and the degree to which these assumptions are violated in the case of testing an individual whose background experiences and development are different from those of the individuals on whom the test was normed.

The question originally posed to Goddard by his interpreter a full century ago remains a legitimate and relatively simple one and boils down to: Do the results from testing indicate a difference or disorder? Early psychologists may be forgiven for being victims of their own convictions and the prevailing ideology of their time in failing to recognize the significant role that experiential factors actually play in the measurement of mental abilities, including intelligence. It is more difficult to understand, however, why after so many decades of research and development in the field of psychometrics, very little attention has been paid to these factors or their significance for the interpretation of data derived from the use of such tests with diverse populations in the present day. Lack of recognition of such factors, and the attitudes and beliefs of a century ago that permitted them to remain largely ignored, have not rendered them any less important in the testing arena. If anything, they have become increasingly a concern, given the rapid and dramatic changes in the ethnic and linguistic composition of the U.S. population, especially over the past two decades and in light of future projections (e.g., U.S. Census Bureau, 2009). Couple this change in demographics with the reawakening interest in educational reform and accountability, and the current salience that testing, and anything that bears on the validity of test results, has even today can quickly be appreciated. For example, mention the terms black-white achievement gap, bilingual education, or adequate yearly progress, and the ensuing debate quickly turns to questions regarding testing, its value, and, ultimately, its validity, all issues that psychologists have wrestled with since the invention of tests.

As modern-day practitioners turn their attention to the evaluation of an ever-expanding array of cognitive disorders (e.g., specific learning disability, intellectual disability, executive function deficits, etc.) among individuals from diverse cultural and linguistic backgrounds, it must be recognized that current methods, tools, and procedures bring with them a legacy that continues to minimize, misunderstand, or ignore altogether potential threats to validity, including cultural, linguistic, educational, and economic variables. This is not to say that the tests available for use by practitioners today are biased on the basis of this legacy; rather, tests are and remain undeniable artifacts of the people and the culture from which they are created and as such, this issue must be a central consideration in establishing a test's validity. As Sattler (1992) noted:

Probably no test can be created that will entirely eliminate the influence of learning and cultural experiences. The test content and materials, the language in which the questions are phrased, the test directions, the categories for classifying the responses, the scoring criteria, and the validity criteria are all culture bound....[I]n fact, all human experience is affected by the culture, from prenatal development on. (p. 579)

Sattler's observation outlines a fundamental premise with which practitioners must approach evaluation of individuals from diverse cultural and linguistic backgrounds: that tests will always reflect specific values, utilize culture-specific content to one extent or another, and expect possession of age- or grade-appropriate development in their content, design, and structure. As noted previously, this statement should not be construed as suggesting that tests are biased but rather that it is important to evaluate the degree to which the various assumptions embedded in the foundation of any test are violated as a result of cultural and linguistic difference. When