
1cavalli_sforza_luigi_luca_genes_peoples_and_languages
.pdfin detail. I am not a linguist, but I find Greenberg's arguments most convincing. Furthermore, Greenberg has been through this before. Many years ago, he proposed a now widely accepted classification of . African languages into only four families: Afroasiatic, including all Semitic languages and most languages in Ethiopia and North Africa; Nilo-Saharan, comprising languages spoken along the upper Nile and the southern Sahara; Niger-Kordofanian, which includes most central, southern, and West African languages, especially the Bantu ones; and Khoisan languages, spoken by Khoi-Khoi and San populations in southern Africa. Greenberg's classification endured a barrage of criticism when it was first proposed, and it is now widely accepted. A similar change of attitude toward the Amerindian classification will probably come with time.
An examination of some of the objections that Greenberg's colleagues have raised against his classifications can help us understand both the objective difficulties that affiict studies of linguistic evolution, and also the subjective ones, which are typified by Greenberg's attaclrers. Languages change very rapidly and it is terribly hard to establish clear connections between distant languages. Significant phonolOgical and ·semantic changes to all languages occur over time. The magnitude of these changes complicates the reconstruction and evaluation of linguistiC commonalities. Grammar also evolves, although usually sufficiently slowly to aid tlle recognition of more ancient linguistic connections. Under the pressure of phonetiC and semantic change, a language rapidly becomes incomprehensible. Modem languages derived from Latin would not be understandable to a Roman of two thousand years ago-a thousand-year separation is often enough to render a language incomprehensible to its first speakers. After a separation of five to ten thousand years, the rate of recognizably similar words can drop to ten percent or less. FOltunately, certain words and certain parts ofspeecll exhibit a slower rate of change and give us a better chance of discerning more distant linguistic relationships.
As for the problems caused by misguided methods, some antiGreenberg linguists believe it is impossible to posit a quantitative relationship between any two languages.. By disallOwing reliable
137
measurements, and by limiting the relationship between two languages only to "related or not related," the American linguists opposing Greenberg have ruled out the possibility of hierarchical classifications, an essential prerequisite to taxonomy.
Interestingly, tins. position completflly contradicts the view of linguists who use sophisticated methods to measure linguistic similarity, based on the fraction ofwords from a standard list that have a detectable common origin. This approach was developed by an American linguist, Morris Swadesh. He suggested that the probability that a word will lose its Original meaning stays constant over time. When the fraction of related words retained after a known peliod of time has been calculated (by observing the change from Latin to the modern Romance languages, for example), we can construct a "calibration" curve, allOwing us to read the time elapsed since two living languages shared a common ancestor. This method, which has been called "glottochronology," uses a "linguistic clock" very similar to the "molecular clock" we discussed previously for genetiCS. In biology, we have the advantage of using many proteins or DNA sequences to obtain many independent estimates of the date of separation between two species. Unfortunately, there is no such variety and wealth of data in linguistics to strengthen our conclusions. Glottochronology is a less rigorous method than those used in biology, and is especially difficult to apply to distant comparisons when the fraction of cognate words more than ten thousand years ago is very small. The word lists cannot grow, becanse only a limited number of words change slowly; moreover, every. word has its own rate of change, a fact neglected by glottochronology, which assumes a constant rate of change.
Other linguists insist that the resemblance between similar words in different languages be examined in light of "classical sound correspondences," which are very strict rules of phonolOgical change. If these rules are not followed exactly, they say, two words cannot be considered "cognates," that is, one cannot determine if they share a common origin. Greenberg has answered them with an impressive list of exceptions to these rules within the IndoEuropean language family and others. He concluded that it would
138
be impossible to establish the Indo-European language family if these rules were applied too rigorously. Fortunately, the IndoEuropean family was proposed and accepted before the theory of sound correspondences adopted its most rigid fonnat.
Finally, some linguists believe that the parent language that gave rise to a family or, in general, a cluster of languages must be reconstructed in order to demonstrate a phylogenetic relationship among families. Here too, biology provides an analogy-when the "consensus" DNA sequence is calculated from the sequence of two modern species. The consensus sequence is the best guess of the ancestral sequence that would require the fewest changes to generate the diversity observed in a particular sample. But the search for consensus is less rigorous in linguistics because linguistic variation is much wider than biolOgical variation, since only four nucleotides make up DNA. In "biology, some proteins are so important to an organism that little or no change could be tolerated. TIms, many sequences of proteins change extremely slowly, and it is possible to prove their relationship without reconstructing ancestral sequences of millions or even billions of years ago. Knowledge of a protolanguage may be helpful in comparative analyses, but imposing this exercise on all linguistic claSSifications is a serious limitation when so few proto-languages have been generated. Furthermore, reconstructions have a low probability of being entirely reliable. Greenberg's method avoids this impasse. It may be more subjective than is deSirable, but it can go much further than other methods.
TI,e classification of families by Merritt Ruhlen (a student of Greenberg's) appears to me to be satisfactory for comparing genetic and linguistic evolutions, as we will do in the next section. Defining a family does not appear to be an entirely objective task, but the distinctions between families, subfamilies, and supetfamilies are mostly a matter of convenience and are unnecessary for certain purposes. What matters is the possibility of establishing a simple, logical, and hierarchical relationship. Unfortunately, most modern classifications stop at the level of families, of which there are as many as seventeen in Ruhlen's unifYing system. There are some superfamilies, but, as already noted, modern linguistic methods
139
have not yet generated a complete tree growing from a single source.
It is of interest to consider some of the proposed superfamilies, even ifthey are rather Controversial. According to Ruhlen, Austric is a superfamily consolidating four families: Miao-Yao (spoken in pockets of southern China, and northern Vietnam, Laos, and Thailand); Austroasiatic, comprising Munda langnages spoken in northern India and Mon-Khmer (spoken mostly throughout Southeast Asia); Daic (spoken widely in southern China and much of Southeast Asia), and Austronesian. There are some 1,000 Austronesian languages, spoken by about one hundred eighty million people, including Taiwanese aborigines and Malayo-PolynesianS. The latter group ranges from Taiwan to Polynesia, parts of Melanesia, the Philippines, Indonesia, Malaysia, and as far west as Madagascar. The oldest Austronesian langnages are spoken by Taiwanese abOrigines. Whether or not this superfamily is accepted, it ties together a vel)' wide geographic region, including Southeast Asia, both insular and non-insular areas, and a large number ofislands in the two oceans separated by Southeast Asia.
Superfamilies extending over Europe and Asia are of particular interest. 1Wo linguistic groupings in this region are current today and are closely related: Nostratic and Eurasiatic. Rejected at first by most linguists, they are slowly being recoguized. The Nostratic superfamily, as originally described by various Russian scientists, includes the Indo-European, Uralic (spoken across the uraIic mountains), Altaic (widely spoken in Central Asia), Afroasiatic, which comprises many North African and also Semitic languages, Dravidian (currently spoken almost only in southern India), and South Caucasian families. One Russian linguist, Vitaly Shevoroshkin, showed that the Nostratic superfamily has strong similarities to the Amerindian group, as defined by Greenberg. The Eurasiatic superfamily proposed by Greenberg is similar to Nostratic, but it is different in the extension given to some families like Altaic and comprises smaller families like Eskimo and Chukchi, in addition to Japanese. Thus, Eurasiatic extends further east than Nostratic, but not as far southwest, as it does not include Afroasiatic and Ilravidian, which, according to Greenberg, have an older origin.
140
One could continue building on this tree by adding an earlier branch, leading to the Nostratic/Amerindian group on ·one side, and to a new, older superfamily on the other, Dene-Caucasian. Sapir initiated this new grouping, but Sergei Starostin proposed it officially only a few years ago. The Dene-Caucasian superfamily includes essentially three families: North Caucasian, Na-Dene, and SinoTibetan. The last family of languages is spoken ·by almost a billion individuals (in China; India, Nepal, Burma, and also Southeast Asia, plus some isolates hl Europe and West Asia), and is therefore the most populous one.
Thus, one could draw this approximate diagram:
Eurasiatic Afroasiatic Dravidian
V
Nosh-atic Amerindian
v~-
Eurasian
This hierarchy of superfamilies includes almost all of Europe, northern Africa, most ofAsia, and all of America. It lacks only three African families, Khoisan, Niger-Kordofanian and Nilo-Saharan, as well as Australian (170 languages), and Indo-Pacific, a group of700 languages, spoken mostly in New Guinea but also in neighboring islands and in the Andaman Islands near Malaysia.
There is, however, a small group of languages called isolates, which most linguists are unable to clasSify in any of the betterestablished families. The best known is Basque. Still spoken by approximately 12,000 French and perhaps a million and a half Spanish, this language is likely a relic of a pre-Neolithic period and is pOSSibly related to the language spoken by Cro-Magnons, the first modem humans in Europe. But it has certainly changed enough
141
that modem Basques and Cro-Magnons would not be able to communicate if they had the chance to meet. In fact they probably wouldn't even recognize their languages as related. Several linguists suggest a relationship between Basque and modem languages of the northern Caucasus. It is thus possible that one or more pre-Indo-European languages were spoken in Paleolithic Europe. Other linguists see even more encompassing resemblances among Basque, Caucasian, Sino-Tibetan, and Na-Dene languages. The last are spoken in the northwestern region of North America. Another claim is that Burushaski, an isolate spoken in a high valley in the Himalayas, is related to Basque and Caucasian. And, according to other linguiSts, Sumerian, Etruscan, and other linguistic "fossils" belong to the same ancient family, Dene-Caucasian. If the Nostratic/Amerindian group were added to the Dene-Caucasian group to form a hypothetical Eurasian superfamily, it may have extended across all Europe and Asia (except the southeast), and spread to the Americas. This giant superfamily later differentiated into several branches, and local twigs of this tree flourished and extended to regions far and wide.
These are interesting and hopeful hypotheses that need to be explored further. If we want to stay on absolutely firm ground, the situation is worse than simply lacking a reliable tree to link all modem languages: it is not even certain that all languages share a common origin. Most linguists consider both problems insoluble. It's a bit like trying to determine if all life on the planet had a Single origin. (Many biolOgists believe in a Single origin, since there is only one form of the twenty aminoacids found in proteins.) Greenberg noticed that there is at least one word all the linguistic families share: the root ttk. It means finger, or the number one (a semantic shift, which reqUires no explanation). In other languages we find other semantic changes of this root, which also appear acceptable, "hand" and "arm," for example, or "pOint, indicate." In French, doigt (pronounced "dwa") and in Italian, dito (meaning "finger") come from the Latin root digit.
Extending this example, American linguist John D. Bengtson has proposed, with Ruhien, about thirty other roots that are nearly as
142
universal. But it will take a long time for other linguists to examine and.accept these newest results. As might be expected, there are vel)' few roots that are common to most languages. Most of them designate parts of the body, or are personal pronouns or small numbers (one, two, and three). It is not surprising that words conserved since the beginning of linguistic diversification are among the first words we learn: eyes, nose, mouth, and so on. But there are others that were certainly very important in Paleolithic life and have been preserved in many languages; "lice" is one example.
Comparison of linguistic Families with the Genetic Tree
Even without a comprehensive linguistic tree, we can still compare our genetic tree to existing linguistic trees. There are some impressive similarities.
In figure 12, the language families have been drawn next to the populations that speak the respective languages. We see that a family corresponds to one or more branches in the genetic tree. Sometimes a language family is represented in the joint genetic-linguistic tree by only one branch, because the populations speaking these languages were grouped together in the genetic analysis. In effect they show great Similarity, either genetic or ethnographic, and live near each other. An example is the Bantu subfamily, belonging to Niger-Kordofanian, which is genetically homogeneous and distinct from other African groups. Although the word Bantu deSignates a language group, it is also useful as a biolOgical categol)'. Other genetic groupings also have been corroborated by linguistic information; for example, southern Indians speak Dravidian languages and the Na-Dene speak Native American ones. Shared language families often point to a common genetic and ethnic background.
The genetic tree shown in figure 12 is composed of 38 populations, some ofwhich are broadly grouped (e.g., Europeans, Melanesians). There are only 16 language families (we had no genetiC data on Caucasian populations when we drew this tree). Therefore some
143

Genetic'Ii'ee |
|
Linguistic Families |
|
|
|
Mbnti Pygmy --- Original language unknown |
|
|
|||
::~rican ~ Niger-KordofanJan |
|
|
|
||
Nilotic |
|
NilQ.·Saharan |
|
|
|
San (Bushmen) -- |
IChoisan |
|
|
|
|
Ethiopian~ |
|
|
|
|
|
Berber, N..A~ Afro-Asiatic==============11 |
|
||||
S. W. Asian |
|
|
|
:: |
|
Iranian |
|
|
|
I: |
|
Europe~ |
Indo-Euoopean |
|
:: |
|
|
Sordinia~ |
=====il |
|
|||
In.dian |
|
|
|
:: |
|
S. E. Indian |
. _ |
Dravidian ======== |
1',1) |
t=J--1I |
Z |
~~~d =-=---t-: Uralic-Yukaghir |
--i |
~ .-:: |
i |
||
•.g. i·=il |
~ |
||||
I |
|
|
'}g. :: |
~ |
|
~~:;O~ Sino-Tibetan |
5i:: |
~ |
|||
Korean |
|
|
|
:: |
~ |
Japanese |
|
AJtaic |
== ===:oJ: |
l |
|
Ainu |
. |
|
|
|
|
N. Turkie |
|
|
|
|
|
Es'khno |
|
Esld.mo-Aleut |
|
|
|
Chukchi |
|
Chukchi-Kamehatkall |
|
I |
|
s . Amerlnd3 |
|
|
: |
|
|
C.Amerind |
|
Amerind - - -- -- - -- -- - __ I |
|
||
N.Amerind |
|
|
|
|
|
N. W. Amerilld |
-- |
Na-Dene |
|
|
|
S. Chinese ---- |
|
Sino-Tibetan |
|
|
|
Mon-Khmer--- |
Aust~.atie |
|
|
|
|
Thai |
|
Daie |
|
|
|
Indonesian |
|
|
Austrie |
|
|
Malaysian ~ |
Austronesian |
|
|
|
|
Philippine |
|
|
|
|
|
Polynesian |
|
|
|
|
|
Micronesian |
|
|
|
|
|
Melanesian -----,.....,. I d P '6 |
|
|
|
||
New Guinean ~ |
11 0- aCl Ie |
|
|
|
|
Australian ---- |
|
Austnilian |
|
|
|
Figure 12. The comparison of genetic and linguistic trees (Cavalli-Sforza et al.
1988. pp. 6002-6).
144
populations in the genetic tree must and indeed usually do belong together to the same linguistic family. We can immediately note that populations that are adjacent in the genetic tree usually speak languages of the same family. Because of this we can use the genetic tree to help us date the approximate origin of a linguistic family. Most language families appear to have developed dUling a brief period, between 6,000 and 25,000 years ago.
There are, however, exceptions to the tendency of populations speaking related languages to be close in the ge~etic tree. Ethiopians,
for example, are palt of the African genetic brancll, but they generally speak languages from the Afroasiatic family. whicll are widespread in North Africa and the Middle East, where the people are generally Caucasoid. Ethiopians are in effect a bit more African than Caucasoid genetically and more· Caucasoid than most Africans linguistically. The Saami (Lapps) illush-ate another exception to this mle: they are genetically Caucasoid but speak a Uralic language whose other representatives live mostly in northeastern European Russia and northwestern Siberia near ti,e Urals. TI,e Uralic people of Asia are generally MongolOid, and ti,e Lapps are a mix of Caucasoid (probably from Scandinavia) and MongolOid (of Siberian origin), with a prevalence of the former. Even without looking at their genes, we can see this in the color oftheir skin and hair, and in the color and shape oftheir eyes, whicll vary in type from MongolOid to Caucasoid.
There is a simple explanation for these disagreements between genetic and linguistic classifications. As we have explained, tI,ese two populations are the products of relatively recent genetic admixture: of Europeans and Sibelians for the Lapps, and of Africans and Arabs for the Ethiopians. In the genetic tree, the populations are placed with ti,e groups that conhibuted t1;e greater proportion of genes. Extended admixture may also put them in a more isolated and somewhat intermediate position in the tree. The genetic effects of population admixture are mucll Simpler and more predictable than linguistic cllange. Genes of a mixed population occur in proportions corresponding to those of its ancestral parental populations. But a genetically mixed population tends to preserve only one of the two original languages. Sometimes, the language of a mixed
145
population will not change at all; more often, however, we find a few words or, sometimes, sounds borrowed from the other language. Grammar is more resistant to change than vocabulruy. As for the origin of the mixtures that generated Ethiopians and Lapps, we know of intimate contacts between Arabs and'Mricans in Ethiopia during the first millennium B.C. An Arab-Ethiopian kingdom established a capital first in Arabia (in the Saba region) and later in northern Ethiopia (at Aksum).. But earlier contacts may have occurred that took place too early for history to record. We also know that the Lapps have been in their present territory for at least 2,000 years. In both these cases, the shortage of written records limits our ability to know just how far back contact originated. In each case, the degree of genetic mixing established can vary and depends on the amount and mode ofcontact between two populations.
It is easy to calculate that the genome of a peoplewho received a constant genetic flow of 5 percent per generation from its neighbors would keep only 70 percent of its original genome after three centuries. This is the average value of admixture for Mrican Americans, who have retained 70 percent of their original genome and received 30 percent from mostly white settlers. If this flux continues with the same speed, Mrican Americans will have little more than 10 percent of their original genes after 1,000 years of habitation in America. In the cases of Lapps and EthiopianS, their parent populations may have been in reciprocal contact for a long time (perhaps several thousand years), and .degrees of admixture are greater than those observed in African Americans, who were in contact with whites for a much shorter pedod, and under conditions of strong social inferiority.
We can find still other interesting exceptions to exact ·correspondence of the genetic and linguistic trees. Genetically, Tibetans belong to the group of northern Mongols, but they speak a SinoTIbetan language like the Chinese. The Chinese represented in our tree, however, originate from southern China and are genetically more similar to southern Mongols. History comes to our rescue in this case as well. According to Chinese histOrians, TIbetans are related to the northern Chinese. Starting from the third century B.C.
146