
MATHEMATICS IN THE AGE OF JANE AUSTEN: ESSENTIAL SKILLS OF 1800

Source: Mathematics Teacher, Nov2000, Vol. 93 Issue 8, p670, 10p, 1 diagram, 8bw

Author(s): Gray, S. I. B.

THE ACADEMY AWARD-WINNING MOVIE Sense and Sensibility presented a wonderful vision of life in early nineteenth-century England. In the absence of television, radio, movies, and videos, families sought entertainment in a manner far different from today's. The Dashwood girls--Elinor, Marianne, and Margaret--filled their days with visiting, reading, practicing the pianoforte, needleworking, and letter writing, not to mention gossiping and matchmaking. Long days were highlighted by a wonderfully relaxed midday family meal, during which conversation was paramount. Above all, Jane Austen portrays a concern for the thoughts and feelings of one's immediate acquaintances and pride in one's village.

An upper-middle-class family of the landed gentry--the Dashwoods--would have been interested in a proper education for their children. The textbooks of that era are a clue as to what was considered essential mathematics. Gender and class differences abound. Books were scarce. Paper was not mechanically produced until 1801, and power-driven printing machines did not appear until 1812. Only a few books were available for instructors and students.

The mathematics books for both "young ladies" and "young men" open with a discussion of the four fundamental operations--addition, subtraction, multiplication, and division. All books carefully discuss these concepts and the related vocabulary. Students were asked to learn the terms addend, minuend, subtrahend, difference, multiplicand, multiplier, product, divisor, dividend, and quotient. Each section included basic algorithms for calculating. In general, both the algorithms and vocabulary have endured and are similar to those found in contemporary American textbooks. Although explanations were brief, the basic material is the same.

After the opening sections on the basics, the differences are great. Commonly, the four operations are followed with a chapter on the "rule of three direct," or "golden rule," which is now called ratio and proportion and is solved by cross multiplying. After learning four operations and one method for finding an unknown quantity, students were thought to be prepared for adult mathematics. Books ended with tables--especially for weights, measures, and money--all of which were notoriously awkward in the English system.

We next examine the unique features of three books that were widely circulated between 1800 and 1810, or during the height of the Napoleonic era. An effort has been made to preserve the original grammar, spelling, capitalization, and punctuation.

MATHEMATICS TEXTBOOKS FOR THE YOUNG LADIES

Like the accomplished and distinguished writer Jane Austen, William Butler (1806) furnishes the reader with a sample of domestic detail in his book intended for "young ladies." See figure 1. Butler asserts that he is an experienced teacher of young ladies. His book presents mathematics in an unusual format. We find 619 problems, which are organized in alphabetical and numerical order. The topics range from astronomy, anchovies, and cork to parchment, the plague, and the steam engine. The literary content of the problems is worthy of an Austen. The problems display an integrated approach to teaching mathematics and cut across literature, history, science, and geography. They include quotations from Virgil, Milton, Pope, Shakespeare, and the Bible. Moreover, beginning with addition, the author apparently tried to increase the degree of difficulty of the mathematics as he progressed through the arrangement. Reading a problem or two helps one appreciate the scope and sequence of the curriculum.

Addition

No. 36 Pay a baker's bill of two pounds, a grocer's of three pounds, a milliner's of five pounds, a linen-draper's of sixteen pounds, and a cheesemonger's of seven pounds, and find the amount of the whole.

No. 38 Virgil, the celebrated Latin poet, was born near Mantua, in Italy, seventy years before the nativity of our Saviour; how many years have elapsed since that event to the present year 1805?

Subtraction

No. 92 Magna Charta. Runnymede... is reverenced by every son of liberty, as the spot where the liberties of England received a solemn confirmation... [and is] considered the bulwark of English LIBERTY. The celebrated charter in question was wrenched from John in 1215; How long has that happy event preceded 1805?

Ans. 590 years.

Multiplication

No. 133 Coaches. Coaches, as well as almost all other kinds of carriages which have since been made in imitation of them, were invented by the French, and the use of them is of modern date. Under Francis I who was a contemporary with our Henry VIII there were only two coaches; that of the queen, and that of Diana, natural daughter of Henry II. The kings of France, before they used these machines, traveled on horseback; the princesses were carried on litters, and ladies rode behind their squires. Till about the middle of the 17th century there were but few coaches in Paris; but prior to the late revolution in that capital, they were estimated at 15,000, exclusive of hackney-coaches (horse drawn taxis), and those let out for hire.

The introduction of coaches into England is ascribed by Mr. Anderson, in his History of Commerce, to Fitz Allen, earl of Arundel, in the year 1580; and about the year 1605, they were in general use among the nobility and gentry of London.

In the beginning of the year 1619, the earl of Northumberland, who had been imprisoned since the Gunpowder-Plot, obtained his liberation. Hearing that Buckingham was drawn about with six horses in his coach (being the first that was so) the earl put on eight to his, and in that manner passed from the Tower through the city.

Hackney-coaches, which, according to Maitland, obtained this appellation from the village of Hackney, first began to ply the streets of London, or rather wait at inns, in 1625, and were only twenty in number. So rapid, however, has been their increase since that period, that London and Westminster now contains 1100.

Suppose each coach to earn 16 shillings a day on an average, which is deemed a very moderate computation, the sum of £880 sterling is expected daily in the metropolis, in coach-hire, exclusive of what is spent in glass coaches, or unnumbered ones. What is the weekly, monthly, and yearly expenditure in the use of these vehicles?

Ans. £6,160 per week; £24,640 per month, and £321,200 per year; reckoning 13 months 1 day to the year.

A common form for correcting the calendar was to add a thirteenth month from time to time. This method was used in 1806. Butler's answer may be calculated using 364 + 1 = 365 days.
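Readers who want to check Butler's figures can do so in a few lines of Python (a sketch of our own; the variable name is not Butler's):

# Butler's coach-hire totals: 1100 coaches at 16 shillings a day,
# 20 shillings to the pound, 28-day months, and a 13-month-plus-one-day year.
pounds_per_day = 1100 * 16 / 20            # 880.0
print(pounds_per_day * 7)                  # weekly:  6160.0
print(pounds_per_day * 28)                 # monthly: 24640.0
print(pounds_per_day * (13 * 28 + 1))      # yearly:  321200.0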

Multiplication of money, or compound multiplication

No. 219 Potatoes. Potatoes are the most common esculent (edible) root now in use among us, though little more than a century ago they were confined to the gardens of the curious, and presented as a rarity. They form the principal food of the common people in some parts of Ireland:

Leeks to the Welsh, to Dutchmen butter's dear; Of Irish swains, potatoes are the cheer.

--Gay

Potatoes were originally brought to us from Santa Fe, New Mexico, North America, and as has been asserted, by Sir Francis Drake, in the year 1586. Others mention the introduction of them into our country about 1623; whilst others affirm that they were first cultivated in Ireland, about Younghall, in the county of Cork, by Sir Walter Raleigh, in 1610, and that they were not introduced into England till the year 1650. Peru, in South America, is the natural soil of potatoes, particularly the fertile province of Quito, whence they were transplanted to other parts of America. It is the root only of the potato plant that is eatable.

There are two varieties in general use; one with a white, and the other with a red root. And besides these, there is a new kind, first brought from America, which that "patriot of every clime," the late Mr. Howard, cultivated in 1765 at Cardington, near Bedford. They were also propagated in the adjacent counties. Many of these potatoes weigh four or five pounds each; and hogs and cattle are found to prefer them to the common sort. They are moreover deemed more nutritive than others; being more solid and sweet, and containing more farina or flour. As an esculent plant, they appear also worthy of cultivation; being it is said, when well boiled, equal, when roasted, preferable to the common sort.

Immense quantities of potatoes are raised in Lancashire for exportation. Mr. Pennant says, that 30 or 40,000 bushels are annually exported to the Mediterranean Sea from the environs of Warrington, at the medium of 1 shilling 2 pence per bushel. A single acre of land sometimes produces 450 bushels. What are 179 bushels of potatoes worth at one shilling, 2 pence per bushel?

Ans. £10 8 shillings 10 pence.

Division

No. 243. Velocity of light.

Let there be light, said God, and forthwith light Ethereal, first of things, quintessence pure, Sprung from the deep .... --Milton

Mathematicians have demonstrated, that light moves with such amazing rapidity, as to pass from the sun to our planet in about the space of eight minutes. Now, admitting the distance, as usually computed to be 95,000,000 of English miles, at what rate per minute does it travel?

Ans. 11,875,000 miles.

At 438 pages, this book is longer than most textbooks of the Austen era. Butler includes elaborate notes, footnotes, and references for further study. See figure 2. He closes with twenty-seven pages of arithmetic tables, including the essential monetary conversions, abbreviated here for the contemporary reader. The one obvious omission in this highly successful book is illustrations. Not one drawing is included.

4 farthings = 1 pence [4 qrs. = 1 d.]

12 pence = 1 shilling [12 d. = 1 s.]

20 shillings = 1 pound sterling [20 s. = 1£]

21 shillings make 1 guinea
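A small Python sketch of these conversions (the helper name is ours) shows how a bill such as that of problem No. 219 can be checked: 179 bushels at 1 shilling 2 pence come to 179 x 14 = 2,506 pence.

def to_pounds_shillings_pence(pence):
    # 12 pence = 1 shilling, 20 shillings = 1 pound, so 240 pence = 1 pound
    pounds, rest = divmod(pence, 240)
    shillings, pence = divmod(rest, 12)
    return pounds, shillings, pence

print(to_pounds_shillings_pence(179 * 14))   # (10, 8, 10): 10 pounds 8s. 10d.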

MATHEMATICS TEXTBOOKS FOR YOUNG MEN

A young man's mathematics instruction was designed to give "secret satisfaction to the possessor, and contribute to render him an agreeable and useful Member of Society." Unlike the problems in Butler, those in Wallis's The Self-Instructor, or, Young Man's Best Companion (1811) are stated without elaboration:

What is the value of 21 gallons of brandy at 7 shillings 9 pence per gallon?

What is the value of 108 lbs. of indigo lahore at 7 shilling 8 pence per pound?

Reduce 246 Venetian ducats de Banco into sterling money.

Admit an army of 32,400 men were formed into a square battalion. Find the rank and file.

For the golden rule or rule of three, that is, ratio and proportion, the student is advised that the "chief difficulty is the placing of the numbers."

If 12 gallons of brandy cost £4, 10 shillings, what will 134 gallons cost?

The author suggests that £4, 10 shillings be changed to "the lowest mentioned."
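Following that advice, a short Python sketch (our restatement, not Wallis's wording) reduces the money to shillings, cross-multiplies, and divides; the 134 gallons come to 1,005 shillings, or 50 pounds 5 shillings.

# Rule of three direct: 12 gallons : 90 shillings :: 134 gallons : x
cost_in_shillings = 4 * 20 + 10              # "the lowest mentioned": 90 shillings
x = 134 * cost_in_shillings // 12            # 1005 shillings
print(divmod(x, 20))                         # (50, 5): 50 pounds 5 shillings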

The Self Instructor, or, Young Man's Best Companion contains chapters for "Joiners, Painters, Glaziers, Sawyers, and Bricklayers," as well as a chapter on "Planometry." See figures 3 and 4. The book discusses such unusual regular figures as the undecagon, which has eleven sides, and the quindecagon, with fifteen sides. It includes globes, cones, and pyramids, including the frusta, as well as algorithms for taking square and cube roots. It teaches a method for finding the length and mast of a ship. The young man is given information on the financial arrangements for bookkeeping, wills, and legal matters. The sections on longitude and latitude, which were important measures for a seafaring country, include the following: "New Mexico, including California, is bounded by 'unknown lands' on the north, Louisiana on the East, Old Mexico, and Pacific on West. The chief town is Santa Fe 36 degrees north latitude and 104 degrees west longitude." The book closes with the statement that algebra was first known in Europe in 1494 and that printing--of all types--had been carried on in Westminster Abbey from that time until now.

Decimals are included, and the book describes unusual methods of handling "vulgar [common] fractions" (Wallis 1811, p. 96).

To reduce fractions

To reduce a fraction, a prime common divisor, not necessarily the greatest common divisor, was written in the position of an exponent.

56[sup 2] | 28[sup 2] | 14[sup 7] | 2
84        | 42        | 21        | 3

that is, 56/84 = 28/42 = 14/21 = 2/3, with the successive divisors 2, 2, and 7 written in the position of an exponent.

To add fractions

A product of the denominators, not the LCD, was used.

3/4 + 2/7 + 5/6 = 126/168 + 48/168 + 140/168 = 314/168

To divide fractions

The method was not to invert and multiply but to find the product of the numbers joined by each of the first pair of arrows.

(15/16) / (2/3): the arrows join 15 with 3 and 16 with 2, giving (15 x 3)/(16 x 2) = 45/32 = 1 13/32.
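In modern terms the arrow method is simply a cross-product; a brief Python sketch (the function name is ours) reproduces the result above.

from fractions import Fraction

def divide_by_arrows(a, b, c, d):
    # (a/b) divided by (c/d): the arrows join a with d and b with c
    return Fraction(a * d, b * c)

print(divide_by_arrows(15, 16, 2, 3))   # 45/32, that is, 1 13/32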

THE OPIE COLLECTION

Those two books represent the mathematics typically learned by middle- and upper-middle-class teenagers. The reader might ask about younger readers and more advanced mathematics students. For the answer, we turn to the Opie Collection, a special collection in the Bodleian Library, Oxford University. When it was presented to the university, the Opie, as it is known, was dedicated by Prince Charles, and it will soon be available for North American viewers through the UMI microfiche collection. See www.umi.com/hp/Support/Research/Files/220.html.

The 20 000 items in the collection include fairy tales, nursery rhymes, games, comic books, and coloring books, as well as game boxes and other educational items. Early American children's books, especially those reprinted in London, are part of the Opie. They include Tommy Thumb's Song Book (1794), which is thought to be the earliest known surviving edition of what may have been the first English nursery rhyme book. Mother Goose had already been printed in Boston. Then, and now, rhymes with counting were considered to be a child's first, and possibly best, introduction to arithmetic.

For the youngest students

At 3 3/4 by 2 1/4 inches, the size is the first thing that one notices about A Compendium of Simple Arithmetic; in which the First Rules of That pleasing Science are made familiar to the Capacities Of Youth, a book for elementary-school-age children. These books were "little books for little people." Indeed, the Opie has books so small that they can scarcely be held between the thumb and index finger. They typically begin with writing and spelling the counting numbers. Wallis's Compendium (1800), the title page of which is shown in figure 5, then goes to great length to explain the advantages of the "cypher," or place value, and the "decadary" system.

Wallis writes--for young children--that "neither a Euclid nor an Archimedes with all their wonderful mechanical powers" was able to extricate their number system from a "labyrinth of confusion." As in other titles of this decade, addend, minuend, and subtrahend are explained, but products are composed of "factors." The checking of subtraction and division is called the "PROOF" in bold letters. Definitions appear, for example, "Simple division is the finding how often one simple number is contained in another." The calculation is written as

Divisor  Dividend  Quotient
    3  )    12   (    4

or, for a longer problem, as follows:

833 ) 3104679 ( 3727 88/833
      6056
      2257
      5919
        88
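A quick check in Python confirms both layouts: the quotient of the longer problem is 3727 with 88 left over, written 3727 88/833 in the old notation.

print(divmod(12, 3))          # (4, 0)
print(divmod(3104679, 833))   # (3727, 88)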

Another definition is, "Reduction is the conversion of numbers from one name to another, but still retaining the same value." Although it was written for young children, this tiny book, like all titles in this article, contains tables for wine measure, as well as ale and beer measure. See figure 6.

For more advanced students

Three particular qualities of mathematics of this era should be noted:

1. For British students, advanced mathematics was synonymous with geometry, and most students studied an edition of the first six books of Euclid's Elements. The most popular edition of that time was the one by Robert Simson, of the University of Glasgow. The obvious advantage for the student using later editions by Simson was that each Euclidean proposition was followed by a proof. Moreover, each book of Euclid was accompanied by sample examination questions.

Another edition of Euclid was written by John Playfair, of the University of Edinburgh. It contains his Axiom 12, now known as Playfair's axiom, which states, "Two straight lines that intersect one another cannot both be parallel to the same straight line." This statement, and its implied deviation from earlier editions of Euclid, evolved into the largest controversy of nineteenth-century British mathematics.

In his preface (1795, pp. iv-v), Playfair remarks that Dr. Simson had been "the most successful" modern editor and had left "very little room for the ingenuity of future editors to amend or improve the text of Euclid or its many translations." Playfair wrote that Simson's objective was "to restore the writings of Euclid to their original perfection, and to give them to modern Europe as nearly as possible in the state wherein they made their first appearance in ancient Greece." Playfair praised Simson by stating that he knew languages, was profoundly skilled in geometry, and was an "indefatigable" researcher. To "restore" Euclid was a perfect mission for Simson. Playfair, however, believed that despite Simson's endeavors to remove corruptions, something was "remaining to be done." Playfair wrote that "alterations might be made that would accommodate Euclid to a better state of the mathematical sciences," and thus the Elements would be "improved and extended," more than at any "former period."

2. Until the American Revolution, one book--a single copy--was typically shipped across the Atlantic and then carefully used by an instructor to lead advanced mathematics students through a course of study. The Revolution brought about a change. In 1803, for example, an edition of Simson was printed in Philadelphia. Copies of Simson's later editions are still available in several older libraries.

3. Although mathematics journals existed, scant exchange occurred between German mathematicians and French or English mathematicians. However, the Opie collection does contain a fine translation from the University of Paris of Selected Amusements in Philosophy and Mathematics proper for agreeable exercising of the Minds of Youth (Despiau 1801). The introductory material is similar to that in the English books previously described in this article, but it ends with a discussion of topics that are now associated with probability. It includes factorials, permutations, combinations, Pascal's triangle, and various types of "gaming" odds--all topics that were highly developed in France. Actuarial tables on expected length of life include corrections for the large number of deaths that occurred in the first year of life. See figures 7, 8, and 9.

FOR THE UNIVERSITY STUDENT

The British Library has a copy of the "most important parts" of the arithmetic and algebra examinations required of candidates for an "ordinary" bachelor of arts degree from Cambridge in the early nineteenth century. A Cambridge or Oxford degree did not--and still does not--have "breadth" requirements. Unlike in American universities, one who "reads maths" studies no other subjects. The undergraduate degree is given at the end of three years. Mathematics majors must successfully write examinations that include only mathematics questions.

The arithmetic problems in the early 1800s required computational skills, conversion of measures and money, extraction of square and cube roots, and applications to business, especially interest and discount. Most of the algebra is commonly taught in high school today. However, some problems are unusual, whereas others are surprisingly familiar. Consider, for example, the following:

5. What will be the price of carpeting a room of 13 feet 4 inches long, and 12 feet 6 inches broad, at 4 shillings 6 pence a square yard?

Ans. £4. 3s. 4 d., or 4 pounds sterling, 3 shillings 4 pence.

12. Extract the square root of x[sup 4] + 8x[sup 3] - 64x + 64.

Ans. x[sup 2] + 4x - 8.

13c. Solve the equation

1/(x + a) + 1/(x + 2a) + 1/(x + 3a) = 3/x.

Ans. x = (-11 +/- Square root of 13)a/6.
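Clearing fractions in question 13c reduces it to 3x[sup 2] + 11ax + 9a[sup 2] = 0, which the quadratic formula solves. A numerical spot check in Python (with a = 1 chosen here as a sample value) confirms the stated roots:

from math import sqrt, isclose

a = 1.0
for x in ((-11 + sqrt(13)) / 6 * a, (-11 - sqrt(13)) / 6 * a):
    print(isclose(1/(x + a) + 1/(x + 2*a) + 1/(x + 3*a), 3/x))   # True, True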

15. Expand

1/Square root of (a - x)

to 4 terms by the binomial theorem.

Ans. 1/a[sup 1/2] + x/(2a[sup 3/2]) + 3x[sup 2]/(8a[sup 5/2]) + 5x[sup 3]/(16a[sup 7/2]) + &c.

The answer in Arithmetic and Algebra (Wallis 1835 p. 327) is incorrect. The answer should be

1/a[sup 1/2] + x/(2a[sup 3/2]) + 3x[sup 2]/(8a[sup 5/2]) + 5x[sup 3]/(16a[sup 7/2]) + &c.

See Anton (1992, p. 730).
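A numerical comparison in Python (the sample values a = 4 and x = 0.1 are chosen here only for illustration) shows the four-term series tracking 1/Square root of (a - x) closely:

from math import sqrt

a, x = 4.0, 0.1
exact = 1 / sqrt(a - x)
series = a**-0.5 + x/(2*a**1.5) + 3*x**2/(8*a**2.5) + 5*x**3/(16*a**3.5)
print(exact, series)   # both about 0.50637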

16. Insert 6 arithmetic means between 1/2 and 2/3.

Ans. 1/2, 11/21, 23/42, 4/7, 25/42, 13/21, 9/14, 2/3.

Find the sum of the series.

Ans. 4 2/3.

17. Define a logarithm; and shew that log N[sup p] = p log N. Having given log[sub 10] 2 = .30103 and log[sub 10] 3 = .4771213, find log[sub 10] 36 and log[sub 10] .018.

Ans. 1.5563026 and 2.2552726.

The answer 2.2552726 represents the centuries old notation and "tables" answer of characteristic + mantissa, or (-2) + (.2552726), and is equivalent to the contemporary calculator answer of (-1.7447275).
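A two-line Python check makes the equivalence concrete:

from math import log10

print(-2 + 0.2552726)    # about -1.7447274, characteristic plus mantissa
print(log10(0.018))      # about -1.7447275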

CONCLUSION

These publications furnish a record of the skills thought to be essential at the turn of another century. These mathematical records illustrate the continued need to develop good materials and tests. The era that gave us the legendary names of Trafalgar, Waterloo, Nelson, and George III was preparing its young for the increasingly complex global society.

In Britain today, parents are just as concerned as Americans with the education of their children. Specific topics debated in Parliament and discussed in the media are uncannily similar to those in the United States. Testing for teacher competence in mathematics and English, meeting standards, reducing class size, overcoming the shortage of qualified teachers, finding after-school care, and censuring underachieving schools are discussed at least as much--or more--in the United Kingdom than they are in the United States. The BBC and the government broadcast professional commercials in which a famous person, for example, Paul McCartney, reminisces about a favorite teacher. The government rates schools, and the ratings appear in newspapers. Being scrutinized and meeting standards are accepted as part of the system.

THE EVOLUTIONARY CHARACTER OF MATHEMATICS

Source: Mathematics Teacher, Nov2000, Vol. 93 Issue 8, p692, 3p

Author(s): Davitt, Richard M.

In her article "The Changing Concept of Change: The Derivative from Fermat to Weierstrass," Grabiner (1983) notes the following:

Historically speaking, there were four steps in the development of today's concept of the derivative, which I list here in chronological order. The derivative was first used; it was then discovered; it was then explored and developed; and it was finally defined. That is, examples of what we now recognize as derivatives first were used on an ad hoc basis in solving particular problems; then the general concept lying behind these uses was identified (as part of the invention of calculus); then many properties of the derivative were explained and developed in applications to mathematics and to physics; and finally, a rigorous definition was given and the concept of derivative was embedded in a rigorous theory.

As Grabiner observes, the historical order of the development of the derivative is exactly the reverse of the usual order of textbook exposition, which tends to be formally deductive rather than intuitive and inductive. Grabiner's article contains a number of other well-articulated historical and pedagogical messages, and I strongly encourage every mathematics instructor to read it in its entirety. However, this article emphasizes only her use-discover-explore/develop-define (UDED) paradigm to describe the derivative's evolution. This model is extremely useful for constructing accounts of the evolution of numerous mathematical concepts and theories in addition to the derivative. In various courses that I teach, I often ask my students to use UDED to compile their own accounts of the evolution of mathematical entities. Occasionally, I have also required students to report their findings to the class, but the final, structured account is usually intended for the individual student's benefit alone.

Such assignments have many advantages. By encouraging my students to refer to such reputable histories of mathematics as those cited in the bibliography in constructing their accounts, I introduce them to the history of mathematics in a manner that is not overwhelming. This same exercise helps students understand that because most historical accounts are somewhat subjective, students need to justify their historical claims by citing reliable sources. For example, by using the UDED paradigm, students can learn to appreciate the basis that an author uses to assert that Isaac Newton and G. W. Leibniz invented calculus, that Girolamo Cardano was the first to solve the general cubic equation, that Carl F. Gauss, Janos Bolyai, and Nikolai Lobachevsky invented non-Euclidean (hyperbolic) geometry, and the like. Furthermore, as Grabiner observes, students learn that creating mathematics is often incremental, inductive, and exciting and that our modern versions of mathematical theories are polished diamonds that started off as rough pieces of carbon.

When I heard a colleague in the physics department describe the scientific method as "the development of knowledge from observation of specifics to conjecture to experiment to theory," it dawned on me that the UDED paradigm is essentially nothing more than using the scientific, or experimental, method to describe how mathematical theories and concepts evolve. Fuzzy foreshadowings, false starts, and dead ends have occurred in developing scientific models before such modern theories as those of the atom, light, heat, electricity, evolution, and the cosmos have crystallized and have been accepted as legitimate scientific theories. Students need to see this connection of shared modi operandi in the evolution of both mathematics and the natural sciences.

The accounts that teachers and students write using UDED can be detailed, brief, or anywhere in between. At times, the "big picture" is precisely what students should absorb; at other times, a mini-term paper might be appropriate. In assigning the UDED account as a student project, the instructor can easily set the parameters for the UDED project.

One of my favorite abridged applications of the UDED model is using it to construct a brief chronicle of the acceptance of the principle of mathematical induction as a valid method of proof in mathematics. In the sixth century B.C.E., the Pythagoreans certainly used the ideas underlying this principle when, proceeding geometrically, they conjectured and accepted as "true" such number-theoretic patterns as theorem S, which states that the sum of the first n odd integers is equal to the nth square number (Burton 1999, pp. 91-93). Francesco Maurolico gave the first formal inductive proof in the history of mathematics when he proved theorem S by induction; his proof (discovery) can be found in his work Arithmeticorum Libri Duo, published in 1575, the year of his death (Burton 1999, p. 426). In the next century, Blaise Pascal explored and developed the technique of mathematical induction in connection with his work on the arithmetic triangle and its applications (Burton 1999, pp. 418-28). Although John Wallis and Augustus De Morgan helped name this procedure induction, only in the latter part of the nineteenth century did Richard Dedekind--and then Gottlob Frege and Giuseppe Peano--define it mathematically. When formulating their sets of categorical properties for the natural numbers, each included the principle of mathematical induction or one of its logical equivalents as an axiom (Katz 1998, pp. 735-37).

USING "UDED" TO DESCRIBE THE EVOLUTION OF COMPLEX NUMBERS

The UDED model can also be used to describe the evolution of the complex numbers, a more commonplace high school mathematical topic than induction. Girolamo Cardano and other sixteenth-century Italian algebraists reluctantly began to use complex numbers when they saw that negative values appearing under the radical sign in the Cardano-Tartaglia formulas for solving specific cubic equations sometimes corresponded to recognizable real roots and when Cardano attempted to solve the problem of dividing 10 into two parts such that the product is 40. In Ars Magna, his famous algebra text of 1545, Cardano showed by "completing the square" that the two parts must be 5 + Square root of -15 and 5 - Square root of-15. Although he checked that these answers formally satisfied the conditions of the problem, he still regarded them as being "fictitious" and useless; he was only halfheartedly using complex numbers.
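Modern complex arithmetic confirms Cardano's check in a few lines of Python:

import cmath

p1 = 5 + cmath.sqrt(-15)
p2 = 5 - cmath.sqrt(-15)
print(p1 + p2)   # (10+0j)
print(p1 * p2)   # approximately (40+0j)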

A generation later, Raphael Bombelli discovered the complex numbers in analyzing the "irreducible case" of the cubic equation when all three roots are real and nonzero and yet negative values always appear under the radical when a Cardano-Tartaglia type formula is used. When he published his treatise Algebra in 1572, he became the first mathematician bold enough to accept the existence of "imaginary," or complex, numbers and to present an algebra for working with such numbers. He assumed that they behaved like other numbers in calculation and proceeded to manipulate them formally, with Square root of -a x Square root of -a = -a for a > 0 being his key observation.

During the next three centuries, many mathematicians explored and developed various aspects of the complex, that is, imaginary, numbers. For example, in conjunction with their formative work in analytic geometry, calculus, and algebra, such mathematicians as Rene Descartes, Isaac Newton, G. W. Leibniz, Leonhard Euler, Jean d'Alembert, Carl F. Gauss, and Bernhard Riemann all employed complex numbers in describing their theories of equations, formulating the general logarithmic and exponential functions, and devising analytic tools for modeling and solving real-world problems. Caspar Wessel, Jean Argand, and Carl F. Gauss contributed a crucial development to accepting and understanding the nature of complex numbers when they began to represent them geometrically in the real plane, much as we do today.

Finally, William Rowan Hamilton established the theory of complex numbers on a firm mathematical footing when he defined them in terms of ordered pairs of real numbers in almost the same way that modern textbooks define them. This definition and his rules for performing arithmetical calculations with his ordered pairs can be found in his 1837 paper "The Theory of Conjugate Functions, or Algebraic Couples; with a Preliminary and Elementary Essay on Algebra as the Science of Pure Time." Additional details concerning this UDED account of the evolution of the complex numbers can be found in Burton (1999) and Katz (1998).

"UDED" AND THE EVOLUTION OF BRANCHES OF MATHEMATICS

The UDED paradigm can also be used to construct brief accounts of the evolution of such entire branches of mathematics as Euclidean geometry. Most ancient peoples used formulas to calculate the areas of simple rectilinear figures and to approximate the circumference and areas of circles. For example, the early Egyptians, Babylonians, and Chinese used algorithms to compute the volumes of rectangular blocks, cylinders, and pyramids. Furthermore, the latter two civilizations discovered the general Pythagorean theorem and used it in geometrical and astronomical applications. These civilizations had no real notion of an axiomatic system on which they could base "proofs" of their geometric formulas and theorems. As most students do today, they accepted their geometrical results on the basis of diagrams and intuition and often did not even distinguish between exact and approximate answers.

From the sixth century B.C.E. to the beginning of the third century B.C.E., Thales, Pythagoras, Eudoxus, Plato, Aristotle, and other Greek mathematicians and philosophers shaped mathematics into a deductive, axiomatic science and discovered Euclidean geometry. Around 300 B.C.E., Euclid compiled their accumulated discoveries in geometry and number theory and presented them axiomatically in his famous book, the Elements.

Over the next two millennia, Euclidean geometry was explored and developed by mathematicians from virtually every society that learned of the Elements. Such additional mathematical advances occurred as Archimedes' replacement of the Euclidean theorem "The areas of circles are to one another as the squares on their diameters" with a proof of the precise Babylonian formula "The area of any circle is equal to the area of a right triangle in which one of the legs is equal to the radius and the other to the circumference" (equivalent to the modern formula area = pi r[sup 2]). However, the principal explorations and developments did involve repeated attempts to prove that Euclid's fifth, or parallel, postulate followed as a theorem from his other four more self-evident postulates and his common notions. The celebrated attempts of Proclus, ibn al-Haytham, John Wallis, Girolamo Saccheri, Adrien-Marie Legendre, Johann Lambert, and untold others were doomed to failure because--as we now know from the work of Janos Bolyai, Carl F. Gauss, and Nikolai Lobachevsky in the early nineteenth century--Euclid was indeed on sound logical ground when he made his parallel postulate an axiom for his geometry. It is logically independent of his other four.

Finally, at the very end of the nineteenth century, David Hilbert completely and logically defined Euclidean geometry in his classic monograph Foundations of Geometry (1899). Hilbert began his treatment of Euclidean geometry by postulating three undefined terms (point, line, and plane) connected by three undefined relations--incidence (on), order (betweenness), and congruence. He then offered a set of twenty-one axioms on which a logically consistent and complete treatment of Euclidean geometry could be based. In axiomatic studies of Euclidean geometry today, authors often distill Hilbert's collection of twenty-one axioms down to a set of fifteen logically independent axioms by combining related ones and deleting those that are implied by the others.

The principal pedagogical message here is that anyone purporting to offer high school geometry students a complete, deductive study of Euclidean geometry will fail. NCTM's curricular standards and recommendations indicate that a school geometry course should emphasize discovery, applications, and a representative sample of truly accessible proofs of such theorems as the Pythagorean theorem. Additional details concerning this UDED account of the evolution of Euclidean geometry can be found in Burton (1999) and Katz (1998).

CONCLUSION

Topics in addition to those already noted to which the UDED paradigm can be applied without unduly forcing the issue include the evolution of the concept and theory of a function, limit, infinite series, the integral, the number zero, negative numbers, real numbers, the theory of equations, and numerical procedures. It can be applied to describing the evolution of such entire branches of mathematics as non-Euclidean geometry, analytical geometry, and algebra (both manipulative and structural); such subareas of modern algebra as group theory; and trigonometry.

I encourage classroom teachers of mathematics to use Grabiner's generic paradigm both as a tool for their own acquisition of authentic historical accounts of the evolution of mathematical topics and as a pedagogical stratagem for their students to do the same.

GENERALIZED FIBONACCI SEQUENCES

Source: Mathematics Teacher, Oct2000, Vol. 93 Issue 7, p604, 3p

Author(s): Bradley, Sean

Everyone loves the Fibonacci sequence. It is easy to describe, yet it gives rise to a vast amount of substantial mathematics. Physical applications and connections with various branches of mathematics abound. What could be better, unless someone told us that the Fibonacci sequence is but one member of an infinite family of sequences that we could be discussing? The generalization that follows has great potential for student and teacher exploration, as well as discovery, wonder, and amusement.

The Fibonacci sequence is defined by the recurrence relation F[sub 0] = 0, F[sub 1] = 1, and F[sub n + 1] = F[sub n] + F[sub n - 1], for all integral n is greater than or equal to 1. The Fibonacci numbers can be generalized in various ways. Horadam (1965) furnishes one example. He defines a collection of sequences that depend on the real numbers a and b, as well as arbitrary integers k and q, as follows: we let w[sub 0] = a, w[sub 1] = b, and w[sub n + 1] = k x w[sub n] - q x w[sub n - 1]. The Fibonacci sequence has a = 0, b = 1, k = 1, and q = -1. For example, 8 = 1 x 5 - (-1) x 3.

A subset of these sequences is interesting enough to deserve wider recognition among teachers and students of mathematics. We consider Horadam's sequences with w[sub 0] = 0, w[sub 1] = 1, and w[sub n + 1] = k x w[sub n] + w[sub n - 1] for all n is greater than or equal to 1. That is, instead of adding two consecutive terms to find the next term, as in the Fibonacci sequence, we first multiply the current last term in the list by k, then add the result to the next-to-last term. When k = 1, the result is just the ordinary Fibonacci sequence. Table 1 gives the first few terms of several generalized Fibonacci sequences.
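The recurrence is easy to experiment with; a minimal Python sketch (the function name is ours) generates the rows of table 1.

def generalized_fibonacci(k, terms):
    # w_0 = 0, w_1 = 1, w_(n+1) = k * w_n + w_(n-1)
    w = [0, 1]
    while len(w) < terms:
        w.append(k * w[-1] + w[-2])
    return w

print(generalized_fibonacci(1, 10))   # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
print(generalized_fibonacci(2, 10))   # 0, 1, 2, 5, 12, 29, 70, 169, 408, 985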

We begin by investigating a few properties of this infinite collection of sequences, bringing along only the quadratic formula.

FIBONACCI-LIKE PROPERTIES

For many, the attractions of the Fibonacci sequence are the many elegant identities that it satisfies and the curious properties that it possesses. This article offers eight to illustrate. For more, see Hoggatt (1969) or a variety of other sources. The first five properties follow:

(F1) The GCD of F[sub n] and F[sub n + 1] is 1 for all integral n is greater than or equal to 0.

(F2) F[sub n] divides F[sub n x m] for all positive integers m, for all integral n > 0.

(F3) F[sub n][sup 2] - F[sub n - 1] x F[sub n + 1] = (-1)[sup n + 1] for all integral n is greater than or equal to 1.

(F4) F[sub n][sup 2] + F[sub n + 1][sup 2] = F[sub 2n + 1] for all integral n is greater than or equal to 0.

(F5) F[sub 1] + F[sub 3] + F[sub 5] + ... + F[sub 2n - 1] = F[sub 2n], and F[sub 2] + F[sub 4] + F[sub 6] + ... + F[sub 2n] = F[sub 2n + 1] - 1.

The first four statements are still true if any of the generalized Fibonacci sequences w[sub n] replaces the Fibonacci sequence F[sub n]. Property (F5) needs only the minor modification

(F5a) w[sub 1] + w[sub 3] + ... + w[sub 2n - 1] = w[sub 2n]/k, and w[sub 2] + w[sub 4] + ... + w[sub 2n] = (w[sub 2n + 1] - 1)/k.

These extensions of Fibonacci properties convince us that these sequences are special and are worthy of further investigation. We can prove them by modifying the standard proofs of statements (F1) to (F5) for the Fibonacci sequence. See, for example, Hoggatt (1969). As an example, we prove property (F3) for the generalized sequences. That is, we prove

(F3a) w[sub n][sup 2] - w[sub n - 1] x w[sub n + 1] = (-1)[sup n + 1],

for all integral n is greater than or equal to 1, where w[sub n + 1] = k x w[sub n] + w[sub n - 1], using mathematical induction.

We first note that when n = 1, the identity holds, since w[sub 1][sup 2] - w[sub 0] x w[sub 2] = 1 - 0 = (-1)[sup 2]. We assume that statement (F3a) is true for a particular value of n. Adding the quantity k x w[sub n] x w[sub n + 1] to each side of statement (F3a) and simplifying give

w[sub n][sup 2] + k x w[sub n] x w[sub n + 1] - w[sub n - 1] x w[sub n + 1] = (-1)[sup n + 1] + k x w[sub n] x w[sub n + 1]

w[sub n] x (k x w[sub n + 1] + w[sub n]) - w[sub n - 1] x w[sub n + 1] = (-1)[sup n + 1] + k x w[sub n] x w[sub n + 1]

w[sub n] x w[sub n + 2] = (-1)[sup n + 1] + w[sub n + 1] x (k x w[sub n] + w[sub n - 1]) = (-1)[sup n + 1] + w[sub n + 1][sup 2]

w[sub n + 1][sup 2] - w[sub n] x w[sub n + 2] = (-1)[sup n + 2].

Thus, statement (F3a) holds for n + 1 as well. The proof is complete.

NEARLY GOLDEN RATIOS

The golden ratio is the number

phi = (1 + Square root of 5)/2 = 1.6180339887....

It arises in a variety of geometric contexts as a length or a ratio of lengths. See, for example, section 4 of Hoggatt (1969). We focus on a surprising property of the number phi, namely,

(F6) 1/phi = phi - 1,

and two connections between phi and the Fibonacci sequence,

(F7) F[sub n + 1]/F[sub n] arrow right phi as n arrow right Infinity,

and

(F8) phi[sup n] = F[sub n] x phi + F[sub n - 1].

Property (F6) states that phi and its reciprocal differ only by 1, an integer, even though each is an irrational number with nonrepeating, nonterminating decimal expansion. An impressive way to demonstrate this property to students is to ask them to enter phi as

(1 + Square root of 5)/2

in their calculators, then use the reciprocal key. The decimal part does not change. Properties (F7) and (F8) hint at a complex intertwining of the golden ratio and the Fibonacci sequence. In particular, property (F7) tells us that the ratios of successive Fibonacci numbers,

1/1, 2/1, 3/2, 5/3, 8/5, 13/8, 21/13, 34/21,...,

approach the golden ratio phi. Property (F8) relates that every positive integral power of phi is a multiple of phi plus a constant, and these constants come from the Fibonacci sequence. For example,

phi[sup 2] = 1 x phi + 1, phi[sup 3] = 2 x phi + 1, phi[sup 4] = 3 x phi + 2,....

Let us see how properties (F6) to (F8) relate to the generalized Fibonacci sequences. We begin by considering how to verify property (F7). Since the terms of the Fibonacci sequence are defined by F[sub n + 1] = F[sub n] + F[sub n-1], the ratios of successive terms in the Fibonacci sequence are given by the equation

(1) F[sub n + 1]/F[sub n] = (F[sub n] + F[sub n - 1])/F[sub n] = 1 + F[sub n - 1]/F[sub n],

for integral n is greater than or equal to 1. As n gets larger and larger, the ratio on the left-hand side of equation (1) approaches a limit, which we call r. A calculus class can derive a proof. That is,

(2) F[sub n + 1]/F[sub n] arrow right r

as n arrow right Infinity. Then the ratio on the far right-hand side of equation (1) must approach 1/r, since it is a ratio of Fibonacci numbers in the reverse order of those on the left-hand side of equation (1). In other words,

(3) F[sub n - 1]/F[sub n] arrow right 1/r

as n arrow right Infinity. Putting equations (2) and (3) together with equation (1), we get

(4) r = 1 + 1/r

as n arrow right Infinity. It follows that r[sup 2] = r + 1 , or r[sup 2] - r - 1 = 0. By using the quadratic formula to solve this last equation, we arrive at the positive value of

r = (1 + Square root of 5)/2 = phi,

the golden ratio. Equation (4) actually proves property (F6), since now we know that phi = r.

What happens if we repeat this process with any of the generalized Fibonacci sequences? We first consider some examples. Using the entries in table 1 when k = 2, we see that the successive ratios of this generalized Fibonacci sequence are


2/1, 5/2, 12/5, 29/12, 70/29, 169/70, 408/169,...,

with decimal approximations

2, 2.5, 2.4, 2.4166..., 2.41379..., 2.41428..., 2.41420..., ....

When k = 3, the ratios are

3/1, 10/3, 33/10, 109/33, 360/109, 1189/360, 3927/1189,...,

with decimal approximations

3, 3.333..., 3.3, 3.3030..., 3.30275..., 3.30277..., 3.30277..., ....

Each sequence of ratios seems to be approaching a definite number. What numbers are they?

We can proceed in a manner analogous to the way that we derived equation (4). In the case in which k = 2, we have

(5) w[sub n + 1]/w[sub n] = (2 x w[sub n] + w[sub n - 1])/w[sub n] = 2 + w[sub n - 1]/w[sub n].

If the ratios on the left-hand side are converging to r, then the ratios on the far right-hand side of this equation are approaching 1/r. As n arrow right Infinity, equation (5) becomes

r = 2 + 1/r.

Solving for r, we find that the positive value for r is

r = (2 + Square root of (2[sup 2] + 4))/2 = 1 + Square root of 2 = 2.4142135624....

When k = 3, equation (5) becomes

r = 3 + 1/r,

with positive solution

r = (3 + Square root of (3[sup 2] + 4))/2 = (3 + Square root of 13)/2 = 3.3027756377....

In general, for the sequence w[sub n] defined by w[sub n + 1] = k x w[sub n] + w[sub n - 1], we find that the ratios of successive terms in a generalized Fibonacci sequence approach r[sub k], which is the solution of

(F7a) r[sub k] = k + 1/r[sub k].

These numbers, which we call nearly golden ratios, are given by the formula

r[sub k] = (k + Square root of (k[sup 2] + 4))/2.

Just like the golden ratio, phi, each of the numbers r[sub k] differs from its reciprocal by an integer because (F7a) can be rewritten as

1/r[sub k] = r[sub k] - k.

Letting k run through all nonnegative integers, we have a complete list of positive real numbers whose reciprocals have the same decimal part as the numbers themselves, resulting in an interesting exercise for students. The proof begins with equation (F7a). Table 2 gives some examples.
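A short Python loop reproduces table 2 and makes the shared decimal part visible:

from math import sqrt

for k in range(1, 6):
    r_k = (k + sqrt(k * k + 4)) / 2
    print(k, r_k, 1 / r_k)   # e.g. k = 2 gives 2.41421356..., 0.41421356...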

What about property (F8)? Its more general form is

(F8a) r[sub k][sup n] = w[sub n] x r[sub k] + w[sub n - 1].

I leave it to the reader to verify this identity by induction.
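Short of a full induction, the following Python lines spot-check (F8a) numerically for k = 2, using the terms from table 1:

from math import sqrt, isclose

k = 2
r = (k + sqrt(k * k + 4)) / 2          # the nearly golden ratio 1 + sqrt(2)
w = [0, 1, 2, 5, 12, 29]               # w_0 through w_5 for k = 2
print(all(isclose(r**n, w[n] * r + w[n - 1]) for n in range(1, 6)))   # True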

CONCLUSION

This brief introduction is not nearly the whole story. We do not know the whole story about the Fibonacci sequence itself, so how could we give a complete account of these Fibonacci-like sequences? Properties (F1) through (F8) are merely a sampler of some of the well-known Fibonacci properties, identities, and connections that the newer Fibonacci-like sequences satisfy. I hope that you and your students are motivated to set off on the trail of more.

TABLE 1 Generalized Fibonacci Sequences

k    The First Few Terms of w[sub n]

2    0, 1, 2, 5, 12, 29, 70, 169, 408, 985, 2378, 5741, ...
3    0, 1, 3, 10, 33, 109, 360, 1189, 3927, 12 970, ...
4    0, 1, 4, 17, 72, 305, 1292, 5473, 23 184, 98 209, ...
5    0, 1, 5, 26, 135, 701, 3640, 18 901, 98 145, ...

TABLE 2 The First Five Nearly Golden Ratios and Their Reciprocals

k    r[sub k]               1/r[sub k]

1    1.6180339887 ...       0.6180339887 ...
2    2.4142135623 ...       0.4142135623 ...
3    3.3027756377 ...       0.3027756377 ...
4    4.2360679775 ...       0.2360679775 ...
5    5.1925824035 ...       0.1925824035 ...

GROUP SYMMETRIES CONNECT ART AND HISTORY WITH MATHEMATICS

Source: Mathematics Teacher, May2000, Vol. 93 Issue 5, p364, 7p, 6 diagrams, 29c

Author(s): Natsoulas, Anthula

THROUGHOUT HISTORY, different cultures have produced designs to be used as ornamentation, as part of ceremonies, and as religious symbols. Many of these designs are mathematical in nature, and their bases are often the transformations of reflection and rotation in the plane. The images form groupings that appear to have an underlying unity. Thus, history and art merge to create a medium through which students can study the concrete operations of reflection and rotation in the plane, as well as the more abstract concept of symmetry groups. The resulting patterns give students a sense of the potential for creativity inherent in mathematics. Exploring group symmetries within the context of such designs furnishes enriching experiences, connects art and history to mathematics, enhances the understanding of transformations in the plane, and shows the common underlying structure of algebra and geometry. Students should have the opportunity to see connections within mathematics and between mathematics and the various arenas of human activity and should develop an understanding of the types of reasoning that form the basis of mathematical thought.

All cultures participate in the six mathematical activities of counting, locating, measuring, designing, playing, and explaining (Bishop 1988); but designing results in some of the richest and most diverse outcomes. The beautiful designs created by different cultures mirror the uniqueness of their histories. Peoples of the Eastern and Western worlds have used mathematical ideas to create patterns in woven fabrics; ornamentation for religious objects and places of worship; and adornment of the walls, floors, and ceilings of the homes of nobles. A significant amount of mathematics, including the principles of symmetrical relationships, is implicit in such designs.

This article focuses on two types of symmetries--rotation and reflection, their underlying structure as a mathematical group, and their presence in the designs of diverse cultures. Patterns created by applying these symmetry operations offer students a visual image of closure, identity, inverse, and associativity, which form the axiomatic basis of algebra. Through patterns, this article intuitively develops the concept of symmetry groups and gives formal definitions of rotation and reflection symmetry and symmetry groups.

The design examples in this article focus primarily on those of Cyprus and Ethiopia, two nations whose mathematical art is not well known. The mosaics of Cyprus, typical of those found throughout the Roman world, date to between the fourth and eighth centuries C.E. and contain many intricate geometric patterns. It is believed that at one time designs for mosaics were collected in pattern books.

The form of Christianity introduced in Ethiopia in the first half of the fourth century and the art forms that developed from it became an integral part of the lives of its people. The Ethiopians developed elaborately designed crosses that they used both as jewelry and in religious processions. In the town of Lalibela, an important center of medieval Ethiopia, several rock-hewn churches built during the thirteenth century include geometric patterns.

REFLECTION AND ROTATION SYMMETRY

A symmetry is defined to be a motion of an object such that the appearance of the object is unchanged. A reflection symmetry is determined by a line, called the line of reflection, through which the original object is reflected. For each point of the original object, its distance to the line is the same as the distance of its corresponding image point. A rotation symmetry is determined by a rotation of the object around a fixed point called the rotocenter. The amount of rotation can be expressed as a fraction of a full turn or by the degrees of rotation in a counterclockwise direction.

Figure 1 includes a range of designs from Ethiopia and Cyprus that display different kinds of symmetry. The teacher can ask students to group those items that appear to have the same kinds of symmetry. A set of objects that have the same kinds of symmetry belongs to the same symmetry group. Thus, in figure 1, items (a) and (d) both belong to the same symmetry group, since rotations of 180 degrees or reflections around a vertical or horizontal line through the center return the design to its original appearance. Similarly, the interior part of the cross in item (e) and the circular portion of item (f) belong to the same symmetry group, since both exhibit 90 degree rotation symmetry. In like manner, the two designs enclosed within the circles in item (c) belong together, since both exhibit 60 degree rotation symmetry. The reader should explore the various reflection and rotation symmetries of item (b).

The Ethiopian cross, excluding its base, in figure 2 has both reflection and rotation symmetry. It contains four lines of reflection. If the figure is rotated through a one-quarter turn, a one-half turn, or a three-quarter turn--or equivalently, 90 degrees, 180 degrees, and 270 degrees, respectively--the appearance of the object remains unchanged. These symmetries are shown in figure 3 with a second Ethiopian cross, again excluding the base shown at the top of the figure. Since the figures are hand carved, the curved lines may not all line up precisely, but the artist clearly had such symmetries in mind when creating the figure.

The mosaic design shown in figure 4 (p. 366) is from Kourion in Cyprus; it contains the same symmetries as the Ethiopian cross. Students can test the rotation symmetries with a piece of tracing paper on which a coordinate axis is drawn or with two overhead transparencies. They can place the origin at the rotocenter on top of a copy of the design, trace an outline of one of the arms, and rotate the paper or transparency to show the symmetry.

ROTATION AND REFLECTION SYMMETRIES IN THE SQUARE

The square demonstrates the same rotation and reflection symmetries as the Ethiopian cross and the Cypriot mosaic design. See figure 5. A description of the symmetry motions can be simplified. For the square, instead of thinking of the one-quarter turn, one-half turn, and three-quarter turn as different motions, the one-quarter turn can be considered as the unit motion. Thus, the one-half turn is the one-quarter turn applied twice, and the three-quarter turn is the one-quarter turn applied three times. Students can verify this result by manipulating tracing paper or transparencies as previously described. In general, the smallest rotational symmetry of an object is represented by r and successive rotations by r[sup 2], r[sup 3], r[sup 4], and so on. For the square, r is the one-quarter turn and r[sup 2], r[sup 3], and r[sup 4] represent turns of two-quarters, or one-half; three-quarters; and four-quarters, respectively.

Although more than one line of reflection often exists, specifying only one line is sufficient. Reflection with respect to this line can be represented by m. The remaining reflections can then be created by combining rotation and reflection motions. In general, a sequence of r's and m's indicates that these symmetry motions are applied sequentially to an object, with the order in which they are applied being read from right to left. In this article, the symbol "diamond" is used to indicate the sequential application of motions. For the square, we can define the line of reflection to be the vertical one, as shown in figure 6. The sequence r diamond m indicates a reflection through this line followed by a rotation of one-quarter turn counterclockwise, which is equivalent to a reflection through the original diagonal AC. This sequence is shown in figure 7.

GROUP SYMMETRIES OF THE SQUARE

A symmetry group is a special case of a mathematical group, but great diversity exists among the members of any one symmetry group. In spite of the differences, the implicit mathematical characteristics that determine group membership allow even the untrained eye to recognize the unity. Figure 8 shows examples from Ethiopia and Cyprus that are members of one symmetry group; all the designs contain exactly four rotation symmetries and exactly four reflection symmetries. For item (c), consider the inner cross. For items (e), (f), and (g), consider only the outlines and not color or internal design variations. Members of a symmetry group that contains only the four rotation symmetries of the square are shown in figure 9 (p. 368). For the mosaic design, consider only the pattern outline and not color variations.

Identity and inverse operations

A complete discussion of symmetry groups includes two additional operations that can be applied to a figure. The identity symmetry motion, denoted by "1," leaves the original figure unchanged. An inverse symmetry motion returns the object to the original figure. In the square, for the basic rotation unit r of one-quarter turn counterclockwise, the inverse rotation is denoted by r[sup -1] and is a three-quarter turn counterclockwise. Thus, r[sup -1] diamond r = 1; that is, applying a counterclockwise three-quarter turn after applying a counterclockwise one-quarter turn leaves the original figure unchanged.

THE SYMMETRIES OF THE EQUILATERAL TRIANGLE

The equilateral triangle contains symmetries analogous to the reflection and rotation symmetries of the square. Figure 10 shows examples of designs that contain only threefold rotation symmetry. The rotocenter is the point of intersection of the angle bisectors of the triangle; the unit of rotation, r, is a one-third turn, or 120 degrees. A rotation of two-thirds of a turn, or 240 degrees, is represented as r[sup 2]. Figure 11 is a sketch of the triangle showing these rotation symmetries. The equilateral triangle also contains three lines of reflection, as shown in figure 12. Students can convince themselves that these lines are lines of reflection by drawing the lines on an equilateral triangle, cutting out the triangle, and folding along the lines.

Many designs that contain threefold rotation symmetry also contain the reflection symmetries of the equilateral triangle. To illustrate both rotation and reflection symmetries combined, a figure that has reflection symmetry through its center is placed in each third of the triangle, as in figure 13. The mosaic design shown in figure 14 is from Kourion in Cyprus; it illustrates the rotation symmetry of the equilateral triangle. The original is not well preserved, but the intended threefold symmetry of the pattern is evident. All figures that contain both the rotation and reflection symmetries of the equilateral triangle belong to a single symmetry group; see figure 15 for examples.

As with the square, indicating one line of reflection and one rotation is sufficient for the equilateral triangle. This line of reflection, m, can be the perpendicular bisector drawn from vertex A in the original triangle ABC, as shown in figure 12. The rotation unit of 120 degrees is represented by r. If the lines of reflection m, m[sub 1], and m[sub 2] remain fixed and do not change position as the triangle is rotated, reflection in line m[sub 1] can be expressed as "m diamond r," that is, a one-third turn followed by a reflection in m. Similarly, reflection in line m[sub 2] can be expressed as "m diamond r[sup 2]," that is, two one-third turns followed by a reflection in m. Thus, all symmetries of the equilateral triangle can be expressed as a set of six motions in terms of r and m: {1, r, r[sup 2], m, mr, mr[sup 2]}. The symbol "diamond" can be omitted when the meaning of the sequence of motions is clear. Students should convince themselves that this set of six motions expresses all symmetries contained within the equilateral triangle. Again, a model with tracing paper can make the experience more concrete.

AN EXTENSION EXPLORATION

Students in advanced classes can explore consecutive applications of the symmetry motions in more depth and in abstract form. These applications can be related to a mathematical group, that is, a collection of elements together with an operation on those elements that satisfies the following characteristics: (1) the set of elements is closed with respect to the defined operation; (2) an identity element exists; (3) for each element in the set, an inverse element exists; and (4) the operation is associative. Taking the symmetry motions of the equilateral triangle as the set of elements and the operation diamond as the application of the motions read from right to left, table 1 (p. 370) shows the outcomes of applying diamond to the set {1, r, r[sup 2], m, mr, mr[sup 2]} with itself. The convention for reading the order of operations is row by column.

The outcomes in table 1 can be simplified to the symmetries shown in table 2 (p. 370). Students can verify that the set of six symmetries of the equilateral triangle, together with the operation diamond, satisfies the properties of a mathematical group. The outcomes from combining the six symmetries can be written in terms of the original set of symmetries. A study of the table verifies the properties of closure, identity, and inverse. Associativity can be explored by considering a number of examples, such as (m diamond r[sup 2]) diamond r = m diamond (r[sup 2] diamond r) = m. Students can conclude that the property of associativity appears to hold, even though it has not been proved.
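
Students who program can also let the computer build table 2 and test the group properties. The sketch below is in Python and represents each symmetry as a permutation of the vertex labels 0, 1, 2; the labeling and the function names are illustrative choices, not part of the original activity.

def compose(f, g):
    """Apply g first, then f -- the diamond operation, read right to left."""
    return tuple(f[g[i]] for i in range(3))

one = (0, 1, 2)                       # identity
r   = (1, 2, 0)                       # one-third turn (cyclic relabeling of the vertices)
m   = (0, 2, 1)                       # reflection fixing vertex 0 (vertex A)

symmetries = {"1": one, "r": r, "r2": compose(r, r),
              "m": m, "mr": compose(m, r), "mr2": compose(m, compose(r, r))}
names = {perm: name for name, perm in symmetries.items()}

# Print the multiplication table, row diamond column; a KeyError here would mean the
# set is not closed, so printing the whole table also checks closure.
for row_name, row in symmetries.items():
    print(row_name, [names[compose(row, col)] for col in symmetries.values()])

# One associativity spot check: (m diamond r2) diamond r equals m diamond (r2 diamond r).
r2 = symmetries["r2"]
assert compose(compose(m, r2), r) == compose(m, compose(r2, r))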

DEVELOPING INTEGRATED UNITS WITH ART, HISTORY, AND OTHER ACTIVITIES

Symmetry groups can also help students see mathematics as a human activity that overcomes the sterility that is sometimes associated with it. The mathematical developments shown in this article offer an opportunity to develop an interdisciplinary unit among the mathematics, social studies, and art teachers. For the mosaics of Cyprus, the historical link could be studying the Roman world during classical and early medieval times. A link with art could include studying how mosaic designs are created on paper and transferred to tiles or onto pavement. Students could create their own mosaic designs on graph paper and then render them onto unit squares of wood or cardboard with small colored tiles set into mastic, using grout to fill in any remaining spaces. For the crosses of Ethiopia, the historical link could be studying the adaptation of Christianity by an African culture.

Students can also create their own designs to illustrate the different types of group symmetry. Working collaboratively from a defined set of symmetries, each group of students can create a design that illustrates the given set. Graph paper, straightedges, and compasses are all that are needed, although computer software can serve as a modern tool. The differences among the resulting designs illustrate both the common underlying mathematical concepts and the potential for diversity in their interpretation.

CONCLUSION

A mathematical group is often difficult for students to understand. Symmetry groups furnish a visual image for this abstract concept and a cultural environment in which it can be embedded. The designs that are members of any one symmetry group are both the same and different. The similarities exist because of the universality of the underlying mathematical principles; the differences exist because of the differences in the cultures that produce them. The mosaic designs of Cyprus and the religious art of Ethiopia are radically different with respect to the media that were used to create them and the uses to which they were put. But the significant mathematics that is at the base of their creation is the same and should not be taken lightly. As Stevens (1996, 168) quotes Hermann Weyl,

[o]ne can hardly overestimate the depth of geometric imagination and inventiveness reflected in these patterns. Their construction is far from being mathematically trivial. The art of ornament contains in implicit form the oldest piece of higher mathematics known to us.

The visual images that lead to an informal definition of the concept of a symmetry group can lay the foundation for more formal definitions and higher levels of abstraction. For all students, the examples shown can provide a concrete visual image and intuitive notion of the mathematical unity that underlies a mathematical group.

TABLE 1 Application of Consecutive Motions of Symmetries of the Equilateral Triangle

diamond    | 1           r           r[sup 2]     m            mr           mr[sup 2]
-----------+--------------------------------------------------------------------------
1          | 1           r           r[sup 2]     m            mr           mr[sup 2]
r          | r           r[sup 2]    r[sup 3]     rm           rmr          rmr[sup 2]
r[sup 2]   | r[sup 2]    r[sup 3]    r[sup 4]     r[sup 2]m    r[sup 2]mr   r[sup 2]mr[sup 2]
m          | m           mr          mr[sup 2]    mm           mmr          mmr[sup 2]
mr         | mr          mr[sup 2]   mr[sup 3]    mrm          mrmr         mrmr[sup 2]
mr[sup 2]  | mr[sup 2]   mr[sup 3]   mr[sup 4]    mr[sup 2]m   mr[sup 2]mr  mr[sup 2]mr[sup 2]

TABLE 2 Application of Consecutive Motions of Symmetries of the Equilateral Triangle Simplified

diamond    | 1           r           r[sup 2]     m            mr           mr[sup 2]
-----------+--------------------------------------------------------------------------
1          | 1           r           r[sup 2]     m            mr           mr[sup 2]
r          | r           r[sup 2]    1            mr[sup 2]    m            mr
r[sup 2]   | r[sup 2]    1           r            mr           mr[sup 2]    m
m          | m           mr          mr[sup 2]    1            r            r[sup 2]
mr         | mr          mr[sup 2]   m            r[sup 2]     1            r
mr[sup 2]  | mr[sup 2]   m           mr           r            r[sup 2]     1

Fig. 1 Symmetrical designs from Cyprus and Ethiopia

ASTRONOMICAL MATH

Source: Mathematics Teacher, Dec99, Vol. 92 Issue 9, p786, 7p, 1 chart, 12 diagrams

Author(s): Ryden, Robert

High school mathematics teachers are always looking for applications that are real and yet accessible to high school students. Astronomy has been little used in that respect, even though high school students can understand many of the problems of classical astronomy. Examples of such problems include the following: How did classical astronomers calculate the diameters and masses of Earth, the Moon, the Sun, and the planets? How did they calculate the distances to the Sun and Moon? How did they calculate the distances to the planets and their orbital periods? Many students are surprised to learn that most of these questions were first answered, often quite accurately, using mathematics that they can understand.

The NCTM's Standards stress the importance of connections among various branches of mathematics and between mathematics and other disciplines; the astronomy problems that follow combine algebra, geometry, trigonometry, data analysis, and a bit of physics. My geometry and algebra students have seen most of these problems and could understand them. They have also been able to experience making distance measurements themselves by using the method of parallax, which is explained in this article.

THE SIZE OF EARTH

By the third century B.C.E., many scientists were convinced that Earth was spherical. One clue was that during an eclipse of the Moon, the edge of Earth's shadow always appeared to be an arc of a circle. Because of the belief that Earth was spherical, much discussion occurred about how to measure its circumference.

Eratosthenes, who was director of the great library at Alexandria, Egypt, found the first successful method. He had learned that at noon on the day of the summer solstice, in Syene, in southern Egypt, the bottom of a well was illuminated by the Sun; therefore, the Sun was directly overhead there. In Alexandria, in northern Egypt, the Sun was not directly overhead on that day, so any vertical pole cast a shadow. By measuring a pole's shadow and using the ratio of the shadow's length to the pole's height, as shown in figure 1, Eratosthenes was able to calculate Theta, the Sun's angle away from the vertical. Figure 2 shows how he used that information: Reasoning that the Sun's rays striking Alexandria were essentially parallel to those striking Syene, he realized that his angle Theta was the same as the difference in latitude between the two cities. Knowing the distance, D, between them, he was able to calculate the full circumference of Earth. His measure for Theta was 7 degrees 12 minutes, which is one-fiftieth of a complete circle.

Because caravans could cover the distance between the cities in fifty days, traveling at the rate of one hundred stadia a day, he assumed that the distance between the cities was five thousand stadia and that the circumference of Earth was therefore 50 x 5000, or 250 000, stadia. The actual length of a stadium in modern units is not known, but it is believed to have been about one-tenth of a mile, which makes Eratosthenes' value for the circumference agree remarkably well with the value accepted today.
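
A short computational check of Eratosthenes' arithmetic can make the proportion concrete for students. The following is a sketch in Python; the 0.1-mile stadium is the rough conversion mentioned above, not a precise figure.

theta_degrees = 7 + 12 / 60            # the Sun's angle from the vertical at Alexandria
fraction_of_circle = theta_degrees / 360
distance_stadia = 50 * 100             # fifty days of travel at one hundred stadia per day

circumference_stadia = distance_stadia / fraction_of_circle
print(circumference_stadia)            # about 250,000 stadia
print(circumference_stadia * 0.1)      # about 25,000 miles at 0.1 mile per stadium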

FIRST ATTEMPT TO MEASURE THE DISTANCE TO THE SUN AND MOON

Also in the third century B.C.E., Aristarchus of Samos measured the ratio of the Sun's distance from Earth to the Moon's distance from Earth by using a method illustrated in figure 3 (Abell 1964). He reasoned that at the first and third quarters of the Moon, the angles EM[sub 1]S and EM[sub 3]S must be right angles. All he needed was angle M[sub 1] EM[sub 3], and either a scale drawing or trigonometry would give him the distance ratio that he wanted. He assumed that the Moon's orbit is circular, that its orbital velocity is uniform, that the Sun is sufficiently near that angle M[sub 1]EM[sub 3] is measurably different from 180 degrees, and that he could observe the instants of first and third quarter sufficiently accurately. All his assumptions were incorrect, but his method makes sense in principle. He determined, inaccurately, that first quarter to third quarter took about one day longer than third quarter to first quarter. With this information and the length of the month, he determined that M[sub 1]ES was about 87 degrees and that the distance from Earth to the Sun was therefore about twenty times larger than the distance from Earth to the Moon.
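
Aristarchus's construction reduces to a single right triangle, so students can reproduce his estimate with one trigonometric step. A Python sketch using the 87-degree value quoted above:

import math

angle_MES = math.radians(87)                   # Aristarchus's (inaccurate) angle M1-E-S
ratio_sun_to_moon = 1 / math.cos(angle_MES)    # ES / EM1 in the right triangle E-M1-S
print(round(ratio_sun_to_moon))                # about 19, roughly twenty times the Moon's distance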

THE PLANETS

In the sixteenth century, Copernicus, who had proposed the heliocentric theory of the Solar System, calculated the orbital periods of the planets and their distances from the Sun. He was able to give distances only in terms of the distance from Earth to the Sun. This Earth-to-Sun distance is called the astronomical unit (AU). For example, he found that the distance from Mars to the Sun was 1.5 AU--he could not give this distance in miles or other terrestrial units because he did not know the size of the AU in those units. The following paragraphs give Copernicus's methods for periods and distances, but the problem of the size of the AU was not solved until long after his time.

The orbital period of a planet, the time required for it to complete an orbit relative to the "fixed" stars, is called the sidereal period. We could determine the sidereal period easily if we could observe from a fixed point far outside the Solar System. Since we must instead observe from a moving platform, Earth, we must infer the sidereal period from the synodic period, which is the interval of time between one alignment of Sun, Earth, and a planet and the next equivalent alignment. Figures 4 and 5 (Abell 1964) illustrate how Copernicus determined sidereal periods from synodic periods. The procedure for inferior planets, that is, those closer to the Sun than Earth, differs slightly from that for superior planets, that is, those that are farther away.

Figure 4 shows Earth with Venus, an example of an inferior planet. At position 1, Earth (E[sub 1]), Venus (V[sub 1]), and the Sun are collinear. This orientation is easy to observe from Earth. After one sidereal period, Venus has made one orbit and returned to position V[sub 2] = V[sub 1]; but in that time Earth has moved to E[sub 2], so we cannot directly observe that Venus has completed an orbit. Venus catches up with Earth at position 3. One synodic period has elapsed since position 1 because the two planets are again collinear. From E[sub 1] to E[sub 3], Earth has made N orbits, and N (Earth) years have therefore elapsed, which is the synodic period of Venus. In general, N will not be an integer. In the same amount of time, Venus has made N + 1 orbits. The sidereal period, S, of Venus is the time for one orbit; that is,

S = time/number of orbits

= synodic period/number of orbits between alignments

= N Earth years/(N + 1) orbits

= N/(N + 1) Earth years per orbit.

Figure 5 shows Earth with Mars, an example of a superior planet. Both planets begin at position 1, where they are collinear with the Sun. Earth completes an orbit and returns to position E[sub 2] = E[sub 1], then catches Mars at position 3, where the planets and the Sun are again collinear; and one synodic period has elapsed. From position 1 to position 3, Earth has made N orbits; therefore, N (Earth) years have elapsed. This time, N will probably be greater than 1. In the same amount of time, Mars has made only N - 1 orbits. As with Venus, the sidereal period, S, is the time for one orbit; that is,

S = synodic period/number of orbits between alignments

= N Earth years/(N - 1) orbits

= N/(N - 1) Earth years per orbit.

For example, Jupiter's synodic period is 1.094 Earth years; S = 1.094/(1.094 - 1) = 11.6 years.
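
The two relations above can be packaged into short functions so that students can check published synodic periods against sidereal ones. A Python sketch; the Venus synodic period of about 1.60 Earth years is supplied here for illustration.

def sidereal_inferior(N):
    """Inferior planet: it completes N + 1 orbits while Earth completes N."""
    return N / (N + 1)

def sidereal_superior(N):
    """Superior planet: it completes N - 1 orbits while Earth completes N."""
    return N / (N - 1)

print(round(sidereal_superior(1.094), 1))   # Jupiter: about 11.6 Earth years
print(round(sidereal_inferior(1.60), 3))    # Venus: about 0.615 Earth years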

Copernicus found orbital radii of inferior planets by using the idea illustrated in figure 6 (Abell 1964). When the planet is at greatest elongation, which is the maximum angular separation in the sky of a planet and the Sun, then angle EPS must be a right angle because the line of sight, EP, is tangent to the planet's orbit. If angle PES is measured, PS can be found by scale drawing or by trigonometry. As previously mentioned, PS will be expressed in terms of ES, the astronomical unit.
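
Since angle EPS is a right angle at greatest elongation, PS = ES sin(PES). A Python sketch; the elongation angles for Venus and Mercury are approximate values supplied for illustration, not taken from the article.

import math

def orbital_radius_au(greatest_elongation_deg):
    # PS = ES * sin(angle PES), with ES = 1 AU
    return math.sin(math.radians(greatest_elongation_deg))

print(round(orbital_radius_au(46), 2))   # Venus, elongation about 46 degrees: about 0.72 AU
print(round(orbital_radius_au(23), 2))   # Mercury, elongation about 23 degrees: about 0.39 AU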

The orbital radius of a superior planet is a little more complicated to determine. Figure 7 (Abell 1964) illustrates Copernicus's reasoning. Position 1 is called opposition because when the planet is viewed from Earth, the planet is exactly opposite the Sun in the sky. Position 2, where the planet and the Sun are 90 degrees apart in the sky, that is, angle P[sub 2]E[sub 2]S = 90 Degrees, is called quadrature. Copernicus timed the interval between opposition and quadrature; because he knew the sidereal periods of Earth and the planet, he could determine the angles P[sub 1]SP[sub 2] and E[sub 1]SE[sub 2] as fractions of complete orbits. Angle P[sub 2]SE[sub 2] followed by subtraction; and then PS could be determined, again in terms of ES, the astronomical unit. For example, the time from opposition to quadrature for Mars is 104 days. Therefore,

E[sub 1]SE[sub 2] = (104 days/365 days) x 360 Degrees approximately equal to 103 Degrees.

Since the sidereal period of Mars is 687 days,

P[sub 1]SP[sub 2] = (104 days/687 days) x 360 Degrees approximately equal to 55 Degrees.

By subtraction, angle P[sub 2]SE[sub 2] approximately equal to 48 Degrees; and by trigonometry, PS approximately equal to 1.5 ES approximately equal to 1.5 AU.
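
The Mars computation above can be traced step by step in a few lines of code. A Python sketch using the article's figures:

import math

days_opposition_to_quadrature = 104
earth_sweep = days_opposition_to_quadrature / 365 * 360   # angle E1-S-E2, about 103 degrees
mars_sweep  = days_opposition_to_quadrature / 687 * 360   # angle P1-S-P2, about 55 degrees
angle_PSE   = earth_sweep - mars_sweep                     # about 48 degrees

# At quadrature the angle at Earth is 90 degrees, so PS = ES / cos(angle P2-S-E2).
print(round(1 / math.cos(math.radians(angle_PSE)), 2))     # about 1.5 AU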

Table 1 shows the values that Copernicus obtained for the planets known at that time and compares them with modern values.

Copernicus still assumed, as other astronomers had before him, that planetary orbits were circles or combinations of circles. Johannes Kepler, a student of Tycho Brahe, discovered otherwise. At the end of the sixteenth century, Brahe made detailed star and planet observations covering a period of about twenty years. After Brahe's death Kepler spent years analyzing Brahe's data, concentrating on the data for Mars, and in 1609 he published his findings--that the planets move around the Sun in ellipses. That discovery, in spite of the fact that the eccentricity of Mars's orbit is only about one-tenth, is a tribute to his powers of analysis, as well as to the accuracy and thoroughness of Brahe's observations.

To determine that orbits were ellipses, Kepler had to calculate the distance from Mars to the Sun at many different places in its orbit. Figure 8 shows his method (Abell 1964). From any position E[sub 1] of Earth, the angle SE[sub 1]M is measured. The sidereal period of Mars is 687 days, after which Mars has returned to M and Earth, having made almost two complete revolutions, is at E[sub 2]. From E[sub 2], angle SE[sub 2]M is measured. At 687 days Earth is (2)(365.25) - 687 = 43.5 days short of two full revolutions, from which information angle E[sub 1]SE[sub 2] can be calculated. SE[sub 1] and SE[sub 2] are known (1 AU--but a problem arises with this assumption, as described in the following paragraph). From this information can be found E[sub 1]E[sub 2], which allows the solution of triangle E[sub 1]E[sub 2]M, which leads to triangle SE[sub 1]M or SE[sub 2]M and the distance SM. Kepler found SM at many points along the orbit of Mars by choosing from Brahe's records the elongations of Mars--angles SE[sub 1]M or SE[sub 2]M--on each of many pairs of dates separated from each other by intervals of 687 days.
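
One way to see that two sightings taken 687 days apart pin down the Sun-Mars distance is to work the geometry numerically. The sketch below (Python) assumes circular orbits and a made-up Mars position; it generates the two lines of sight from that position and recovers SM by intersecting them, so it illustrates the triangulation rather than Kepler's actual data reduction.

import math

def earth_position(day):
    angle = 2 * math.pi * day / 365.25
    return (math.cos(angle), math.sin(angle))      # Earth's orbit taken as a unit circle (1 AU)

mars = (-1.2, 0.9)                                 # assumed Mars position; |SM| = 1.5 AU

def sight_direction(earth):
    dx, dy = mars[0] - earth[0], mars[1] - earth[1]
    length = math.hypot(dx, dy)
    return (dx / length, dy / length)              # unit vector along the line of sight

e1, e2 = earth_position(0), earth_position(687)    # 687 days apart, so Mars is back at M
d1, d2 = sight_direction(e1), sight_direction(e2)

# Intersect the rays e1 + t*d1 and e2 + u*d2 by solving for t.
denom = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
t = ((e2[0] - e1[0]) * (-d2[1]) - (e2[1] - e1[1]) * (-d2[0])) / denom
recovered_mars = (e1[0] + t * d1[0], e1[1] + t * d1[1])
print(round(math.hypot(*recovered_mars), 2))       # 1.5, the Sun-Mars distance in AU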

A question that I have been unable to answer was how Kepler dealt with the fact that SE is not really constant because Earth's orbit is also an ellipse. I assume that he must have found a way around the problem, but without more information I can only speculate on how he did it.

Kepler published three findings, which have become known as Kepler's laws of planetary motion. They are as follows:

1. The planets move around the Sun in ellipses, with the Sun at one focus.

2. A line connecting a planet with the Sun will sweep out equal areas in equal times. This phenomenon occurs because a planet moves faster when it is closer to the Sun. In figure 9, the time interval from E[sub 3] to E[sub 4] equals the time interval from E[sub 1] to E[sub 2], and area SE[sub 1]E[sub 2] equals area SE[sub 3]E[sub 4].

3. The squares of the planets' periods of revolution are proportional to the cubes of their distances from the Sun. So P[sup 2] = Ka[sup 3], where a is the length of the semimajor axis of the elliptical orbit. When P is measured in years and a in astronomical units, K = 1.

Table 2 illustrates Kepler's third law for the planets known in his time. Incidentally, these data can be used for a wonderful problem in data analysis. During a unit on nonlinear data analysis, I gave my advanced-algebra students the data in the first three columns, and they were able to determine that P = f(a) is a power function with exponent 3/2.
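
For teachers who prefer a quick computational check to a full regression, the exponent can be estimated from the logarithms of the tabulated values. A Python sketch using the Table 2 data:

import math

semimajor_axis_au  = [0.387, 0.723, 1, 1.524, 5.203, 9.539]
sidereal_period_yr = [0.241, 0.615, 1, 1.881, 11.86, 29.46]

# Slope of log P against log a, estimated from the two extreme planets (Mercury and Saturn).
slope = (
    (math.log(sidereal_period_yr[-1]) - math.log(sidereal_period_yr[0]))
    / (math.log(semimajor_axis_au[-1]) - math.log(semimajor_axis_au[0]))
)
print(round(slope, 2))    # 1.5, the exponent in P = a**(3/2)

# a**3 and P**2 are nearly equal planet by planet: Kepler's third law with K = 1.
for a, P in zip(semimajor_axis_au, sidereal_period_yr):
    print(round(a ** 3, 3), round(P ** 2, 3))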

MASS OF THE SUN AND OTHER OBJECTS

As Duncan (1981) says, "Kepler's laws summed up neatly how the planets of the solar system behaved without indicating why they did so." Newton, who comes into the story at this point, built on Kepler's work to develop his law of universal gravitation, which has allowed us to weigh the Sun, Moon, and planets. Start with the formula

F = mv[sup 2]/r,

which is for the centripetal, or inward, force F needed to cause a mass m to move with velocity v around a circle of radius r. A planet of mass m[sub p] revolving around the Sun has velocity

v = 2 Pi r/P,

where P is its period of revolution. Substituting for v in the force formula gives

F = m[sub p] v[sup 2]/r

= m[sub p] 4 Pi[sup 2] r[sup 2]/(r P[sup 2])

= m[sub p] 4 Pi[sup 2] r/P[sup 2].

Kepler's third law says that P[sup 2] = kr[sup 3], where k is a proportionality constant. Substituting further then gives

F = m[sub p] 4 Pi[sup 2] r/(k r[sup 3])

= (4 Pi[sup 2]/k) x (m[sub p]/r[sup 2]),

that is,

F proportional to m[sub p]/r[sup 2].

The Sun exerts that force, F, on the planet. At a given distance r, the force is proportional to the mass of the planet.

By Newton's third law of motion, the planet exerts the same force on the Sun. Since the Sun's force on the planet depends on the mass of the planet, it seems reasonable to suppose that the planet's force on the Sun depends on the mass of the Sun, which means that the mutual force depends on both masses. The mutual force cannot depend on the sum of the masses, since doubling a mass doubles the force but doubling one term of a sum does not double the sum; that is, a + 2b is not twice a + b. Newton assumed that the force depended on the product of the masses, an assumption that agrees with the result that doubling either factor in a product doubles the product, that is, (2a) x b = a x (2b) = 2(ab). So for the Sun and a planet, the mutual force of attraction, F, is

F proportional to m[sub s] m[sub p]/r[sup 2]

or

F = G m[sub s] m[sub p]/r[sup 2],

where G is a constant that needs to be determined by experiment and r is the distance between the centers of the two objects. Newton spent many years investigating this phenomenon and it took the invention of a little thing called calculus to prove that r is the distance between the centers of the objects.

The preceding result is Newton's law of universal gravitation. By universal, Newton meant that it applies equally to all objects, both terrestrial and celestial. To test his law, Newton compared the falling of an object at the surface of Earth (the famous apple?) to the falling of the Moon. Figure 10 (Feynman 1995) shows what is meant by a "falling" Moon. In one second, the Moon travels from A to B in its orbit. If Earth did not attract the Moon, it would travel along the tangent instead. Thus the distance s is the distance it has "fallen." In right triangle ABC,

s/x = x/(2r - s) approximately equal to x/2r,

since s is much smaller than x, which in turn is much smaller than r; or

s approximately equal to x[sup 2]/2r.

The quantity x is the distance that the Moon travels in one second. Since the moon's average distance from the center of Earth is about 385 000 km,

x = (1 second/1 month) x 2 Pi r approximately equal to 4.24 x 10[sup -7] x 2 Pi x 3.85 x 10[sup 8] m approximately equal to 1026 m.

Substituting this result into the previous formula gives

s approximately equal to x[sup 2]/2r

approximately equal to (1026 m)[sup 2]/(2 x 3.85 x 10[sup 8] m)

approximately equal to 0.0014 m.

So 0.0014 m is the distance that the Moon falls in one second. At the surface of Earth, which is 6400 km from its center, an object falls about 5 m in one second. If Earth's gravitational pull varies inversely as the square of the distance from its center, Newton reasoned, then the distance that the Moon falls in one second should be (6400/385000)[sup 2], or approximately 0.00028 times the distance that an object on Earth falls in one second. Our figures agree with Newton's reasoning, since (0.00028)(5) approximately equal to 0.0014.
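
Students can redo Newton's check with a few lines of code. A Python sketch using the article's round figures and a sidereal month of about 27.3 days:

import math

r_moon  = 3.85e8                 # Moon's distance from Earth's center, in meters
r_earth = 6.4e6                  # Earth's radius, in meters
month_s = 27.3 * 24 * 3600       # one sidereal month, in seconds (approximate)

x = 2 * math.pi * r_moon / month_s     # distance the Moon travels in one second
s = x ** 2 / (2 * r_moon)              # distance the Moon "falls" in one second
print(round(x), round(s, 4))           # about 1026 m and about 0.0014 m

# Inverse-square prediction, scaling the 5 m fall per second at Earth's surface.
print(round((r_earth / r_moon) ** 2 * 5, 4))    # also about 0.0014 m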

DETERMINING THE ASTRONOMICAL UNIT

The traditional way to determine the distance to inaccessible objects is by triangulation. Triangulation to very distant objects is usually done using the concept of parallax. Parallax describes the phenomenon that occurs when you hold your finger in front of your face and alternately close your left eye and your right eye. Your finger appears to shift its position with respect to the background. Figure 11 illustrates how this idea can be used to measure the distance to object O. First stand at a position A so that object O is aligned with some other object, much farther away than O. Then move to the side to a new position B. Object O and the more distant object are no longer aligned but rather subtend some angle q at your eye. This angle can be measured. If the more distant object is sufficiently far away, the lines to it from A and B are nearly parallel and angle p approximately equal to angle q. Angle p is called the parallax angle. As long as angle p is small, the baseline AB can be taken as an arc of a circle with center at O and radius x. Since AB and the parallax angle can be measured, distance x can be calculated using the arc-length formula from geometry, giving

AB = (p/360)(2 Pi x) arrow right x = 180 AB/(Pi p).

Since distances to astronomical objects are so enormous, angle p is always very small, sometimes only a fraction of a second of arc; and so approximating a segment with an arc does not make any measurable difference. The smallness of angle p also explains why measuring the base angles at A and B, as would be done in solving a triangle that was less "long and skinny," is impractical.

Students cannot collect their own data for most of the problems in this article, but they can get hands-on experience using parallax to measure distances, as my geometry classes have done. In addition to a tape measure, the only equipment needed is a device that measures small angles with some accuracy. Working in groups of four, my students made their own parallax-measuring devices using a piece of Styrofoam about 60 cm by 15 cm. See figure 12. About 50 cm from one end, they placed a row of pins spaced so that adjacent pins would subtend angles of 0.5 degree when viewed from that end. Of course, they had to use the previously mentioned arc-length formula to calculate how far apart to place the pins. After making their measuring devices, they practiced measuring small angles and distances in the classroom, where I could be certain that they knew what they needed to do and what quantities they needed to measure, that is, angle q and baseline AB. I then sent the groups outside after giving each group a description of some specific object on campus, such as a water tower or telephone pole; a specific place to stand to measure its distance; and a specific object to use as the distant background object. I also gave each group a photograph with these objects marked, to help them orient themselves.
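
Both the distance formula and the pin spacing for the measuring device follow from the same arc-length relation. A Python sketch; the 2 m baseline and the 1.5 degree shift are hypothetical classroom numbers, not measurements from the article.

import math

def parallax_distance(baseline, parallax_angle_deg):
    # From AB = (p/360)(2*pi*x):  x = 180*AB / (pi*p)
    return 180 * baseline / (math.pi * parallax_angle_deg)

print(round(parallax_distance(2.0, 1.5), 1))     # about 76.4 m to the object

# Pin spacing so that adjacent pins subtend 0.5 degree when viewed from 50 cm away.
print(round(0.5 / 360 * 2 * math.pi * 50, 2))    # about 0.44 cm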

To find the length of the astronomical unit, astronomers could in theory measure the parallax of the Sun from two points on Earth's surface. Obtaining this measurement is nearly impossible in practice, though, because of the Sun's angular size, brightness, and distance. However astronomers can triangulate the distance to some planet, say, Mars, to obtain its distance from Earth in miles or kilometers. Its distance is already known in AU, and from this result, the size of an AU can be calculated. This result then gives the scale of the entire Solar System.

The story told here of necessity is incomplete, but I hope that it is tantalizing enough with all its connections to history and science to encourage interested teachers and students to explore the topic further.

The author thanks Craig Merow for his assistance in preparing this manuscript for publication.

TABLE 1 Orbital Radii of Planets, in AU

Planet     Copernicus's Value    Modern Value
Mercury    0.36                  0.387
Venus      0.72                  0.723
Earth      1.00                  1.00
Mars       1.5                   1.52
Jupiter    5                     5.20
Saturn     9                     9.54

TABLE 2 Illustration of Kepler's Third Law

Planet     Semimajor Axis a (AU)    Sidereal Period P (yrs.)    a[sup 3]    P[sup 2]
Mercury    0.387                    0.241                       0.058       0.058
Venus      0.723                    0.615                       0.378       0.378
Earth      1                        1                           1           1
Mars       1.524                    1.881                       3.54        3.54
Jupiter    5.203                    11.86                       141         141
Saturn     9.539                    29.46                       868         868

CONSISTENT HISTORIES AND QUANTUM MEASUREMENTS

Source: Physics Today, Aug99, Vol. 52 Issue 8, p26, 6p, 2 diagrams, 1bw

Author(s): Griffiths, Robert B.; Omnes, Roland

The traditional Copenhagen orthodoxy saddles quantum theory with embarrassments like Schrodinger's cat and the claim that properties don't exist until you measure them. The consistent-histories approach seeks a sensible remedy.

Students of quantum theory always find it a very difficult subject. To begin with, it involves unfamiliar mathematics: partial differential equations, functional analysis, and probability theory. But the main difficulty, both for students and their teachers, is relating the mathematical structure of the theory to physical reality. What is it in the laboratory that corresponds to a wavefunction, or to an angular momentum operator? Or, to use the picturesque term introduced by John Bell,(n1) what are the "beables" (pronounced BE-uh-bulls) of quantum theory--that is to say, the physical referents of the mathematical terms?

In most textbooks, the mathematical structures of quantum theory are connected to physical reality through the concept of measurement. Quantum theory allows us to predict the results of measurements--for example, the probability that this counter rather than that one will detect a scattered particle. That the concept of measurement played an important role in the early development of quantum theory is evident from Niels Bohr's account of his discussions with Albert Einstein at the 1927 and 1930 Solvay conferences.(n2) And it soon became part of the official "Copenhagen" interpretation of the theory.

But what may well have been necessary for the understanding of quantum theory at the outset has not turned out to provide a satisfactory permanent foundation for the subject. Later generations of physicists who have tried to make a measurement concept a fundamental axiom for the theory have discovered that this raises more problems than it solves. The basic difficulty is that any real apparatus in the laboratory is composed of particles that are presumably subject to the same quantum laws as the phenomenon being measured. So, what is special about the measuring process? Is not the entire universe quantum mechanical?

When quantum theory is applied to astrophysics and cosmology, the whole idea of using measurements to interpret its predictions seems ludicrous. Thus, many physicists nowadays regard what has come to be called "the measurement problem" as one of the most intractable difficulties standing in the way of understanding quantum mechanics.

Two measurement problems

There are actually two measurement problems that conventional textbook quantum theory cannot deal with. The first is the appearance, as a result of the measurement process, of macroscopic quantum superposition states such as Erwin Schrodinger's hapless cat. The second problem is to show that the results of a measurement are suitably correlated with the properties the measured system had before the measurement took place--in other words, that the measurement has actually measured something.

The macroscopic-superposition problem is so difficult that it has provoked serious proposals to modify quantum theory, despite the fact that all experiments carried out to date have confirmed the theory's validity. Such proposals have either added new, "hidden" variables to supplement the usual Hilbert space of quantum wavefunctions, or they have modified the Schrodinger equation so as to make macroscopic superposition states disappear. (For a discussion of two such proposals, see the two-part article by Sheldon Goldstein in PHYSICS TODAY, March 1998, page 42, and April 1998, page 38.) But even such radical changes do not resolve the second measurement problem.

Both problems can, however, be resolved without adding hidden variables to the Hilbert space and without modifying the Schrodinger equation. In a series of papers starting in 1984, an approach to quantum interpretation known as consistent histories, or decoherent histories, has been introduced by us and by Murray Gell-Mann and James Hartle.(n3) The central idea is that the rules that govern how quantum beables relate to each other, and how they can be combined to form sensible descriptions of the world, are rather different from what one finds in classical physics.

In the consistent-histories approach, the concept of measurement is not the basis for interpreting quantum theory. Instead, measurements can be analyzed, together with other quantum phenomena, in terms of physical processes. And there is no need to invoke mysterious long-range influences and similar ghostly effects that are sometimes claimed to be present in the quantum world.(n4)

Quantum histories

The two measurement problems, and the consistent-histories approach to solving them, can be understood by referring to the simple gedanken experiment shown in figure 1. A photon (or neutron, or some other particle; it makes no difference) enters a beam splitter in the a channel and emerges in the c and d channels in the coherent superposition:

(1) |a> arrow right |s> = (|c> + |d>)/square root of 2.

Here |a>, |c>, and |d> are wavepackets in the input and output channels, and |s> is what results from |a> by unitary time evolution (that is, by solving the appropriate Schrodinger equation) as the photon passes through the beam splitter.

The photon will later be detected by one of two detectors, C and D. To describe this process in quantum terms, we assume that |C> is the initial quantum state of C, and that the process of its detecting a photon in a wavepacket |c> is described by

(2) |c>|C> arrow right |C[sup *]>,

where |C[sup *]> is the triggered state of the detector after it has detected the photon. Once again, the arrow indicates the unitary time evolution produced by solving Schrodinger's equation. It is helpful to think of |C> and |C[sup *]> as physically quite distinct: Imagine that a macroscopically large pointer, initially horizontal in |C>, is moved to a vertical position in the state |C[sup *]> when the photon has been detected.

By putting together the processes (1), (2), and the counterpart of (2) that describes the detection of a photon in the d channel by detector D, one finds that the unitary time development of the entire system shown in figure 1 is of the form

(3) |a>|C>|D> arrow right |S> = (|C[sup *]>|D> + |C>|D[sup *]>)/square root of 2.

Ascribing some physical significance to the peculiar macroscopic-quantum-superposition state |S> in (3) poses the first measurement problem in our gedanken experiment. The difficulty is that |S> consists of a linear superposition of two wavefunctions representing situations that are visibly, macroscopically, quite distinct: The pointer on C is vertical and that on D is horizontal for |C[sup *]>|D>, whereas for |C>|D[sup *]> the D pointer is vertical and the C pointer is horizontal. In Schrodinger's famously paradoxical example, the two distinct situations were a live and a dead cat. A great deal of effort has gone into trying to interpret |S> as meaning that either one detector or the other has been triggered, but the results have not been very satisfactory.(n5)

The first measurement problem is an almost inevitable consequence of supposing that, in quantum theory, a solution of Schrodinger's equation represents a deterministic time evolution of a physical system, in the same way as does a solution of Hamilton's equations in classical mechanics. That was undoubtedly Schrodinger's point of view when he introduced his equation. The probabilistic interpretation now universally accepted among quantum physicists was introduced shortly thereafter by Max Born. Since then, chance and determinism have maintained a somewhat uncomfortable coexistence within quantum theory, with many scientists continuing to share Einstein's view that resorting to probabilities is a sign that something is incomplete.

A stochastic theory

By contrast, the consistent-histories viewpoint is that quantum mechanics is fundamentally a stochastic or probabilistic theory, as far as time development is concerned, and that it is not necessary to introduce some deterministic underpinning of this randomness by means of hidden variables. The basic task of quantum theory is to use the time-dependent Schrodinger equation, not to generate deterministic orbits, but instead to assign probabilities to quantum histories--sequences of quantum events at a succession of times--in much the same way that classical stochastic theories assign probabilities to sequences of coin tosses or to Brownian motion. This perspective does not exclude deterministic histories, but those are thought of as arising in special cases in which the probability of a particular sequence of events is equal to 1.

For the gedanken experiment in figure 1, the consistent-histories solution to the first measurement problem consists of noting that a perfectly good description of what is happening is provided by assuming that the initial state is followed at a later time by one of two mutually exclusive possibilities: |C[sup *]>|D> or |C>|D[sup *]>. They are related to each other in much the same way as heads and tails in a coin toss. That is to say, the system is described by one (and, in a particular experimental run, only one) of the two quantum histories:

(4) |a>|C>|D> arrow right |C[sup *]>|D> or |a>|C>|D> arrow right |C>|D[sup *]>,

where the arrow no longer denotes unitary time development. Quantum theory assigns to each history a probability of 1/2. (Of course, to check this prediction, one would have to repeat the experiment using several photons in succession, each time resetting the detectors.)
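
The Born-rule probabilities of 1/2 quoted here can be checked with a two-component vector for the photon (the detector states are left out for brevity). A sketch in Python with NumPy:

import numpy as np

c = np.array([1.0, 0.0])                 # wavepacket in the c channel
d = np.array([0.0, 1.0])                 # wavepacket in the d channel
s = (c + d) / np.sqrt(2)                 # state after the beam splitter, equation (1)

print(abs(np.dot(c, s)) ** 2)            # probability the photon is in c (C fires): 0.5
print(abs(np.dot(d, s)) ** 2)            # probability the photon is in d (D fires): 0.5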

The troublesome macroscopic quantum superposition state |S> of (3) appears nowhere in (4). Indeed, as we discuss below, the rules of consistent-histories quantum theory mean that |S> cannot occur in the same quantum description as the final detector states employed in (4). Therefore, the first measurement problem has been solved (or, at least it has disappeared) if one uses the stochastic histories in (4) in place of the deterministic history in (3).

The fundamental beables of consistent histories quantum theory--that is, the items to which the theory can ascribe physical reality, or at least a reliable logical meaning--are consistent quantum histories: sequences of successive quantum events that satisfy a consistency condition about which more is said below. A quantum event can be any wavefunction--that is to say, any nonzero element of the quantum Hilbert space. The two histories in (4), as well as the single history in (3), are examples of consistent quantum histories. They are thus acceptable quantum descriptions of what goes on in the system shown in figure 1.

At this point, the reader may be skeptical of the claim that the first measurement problem has been solved. We have simply replaced (3), with its troublesome macroscopic quantum superposition state, by the more benign pair of histories in (4). But as long as (3) is an acceptable history--as is certainly the case from the consistent-histories perspective--how can we claim that (4) is the correct quantum description rather than (3)? Or is it possible that both (3) and (4) apply simultaneously to the same system? Before attempting an answer, let us take a slight detour to introduce the concept of quantum incompatibility, which plays a central role in the consistent-histories approach to quantum theory.

Quantum incompatibility

The simplest quantum system is the spin degree of freedom of a spin-1/2 particle, described by a two-dimensional Hilbert space. Every nonzero (spinor) wavefunction in this space corresponds to a component of spin angular momentum in a particular direction taking the value 1/2 in units of h-bar. Thus the quantum beables of this system, in the consistent-histories approach as well as in standard quantum mechanics, are of the form S[sub w] = 1/2, where w is a unit vector pointing in some direction in three-dimensional space, and S[sub w] is the component of spin angular momentum in that direction. (Actually, S[sub w] = 1/2 corresponds to a whole collection of wavefunctions obtained from each other through multiplication by a complex number, and thus to a one-dimensional subspace of the Hilbert space.)

The nonclassical nature of quantum theory begins to appear when one asks about the relationship of these beables, or quantum states, for two different directions w. If the directions are opposite, for example +z and -z, the states S[sub z] = 1/2 and S[sub -z] = 1/2 are two mutually exclusive possibilities, one of which is the negation of the other. Thus they are related in the same way as the results of tossing a coin: if heads (S[sub z] = 1/2) is false, tails (S[sub z] = -1/2) is true, and vice versa. This means, in particular, that the proposition "S[sub z] = 1/2 and S[sub z] = -1/2" can never be true. It is always false.

That this is a reasonable way of understanding the relationship between S[sub z] = 1/2 and S[sub z] = -1/2 is confirmed by the fact that if a spin-1/2 particle is sent through a Stern-Gerlach apparatus with its magnetic field gradient in the z direction, the result will be either S[sub z] = 1/2 or -1/2, as shown by the position at which the particle emerges. Precisely the same applies to any other component of spin angular momentum. Thus, for example, S[sub x] = 1/2 is the negation of S[sub x] = -1/2. (As an amusing aside, we note that when Otto Stern proposed in 1921 to demonstrate the quantization of angular-momentum orientation, Born assured him that he would see nothing, because such spatial quantization was only a mathematical fiction.(n6))

But what is the relationship of beables that correspond to components of spin angular momentum for directions in space that are not opposite to each other? How, for example, is S[sub x] = 1/2 related to S[sub z] = 1/2? In consistent-histories quantum theory, "S[sub x] = 1/2 and S[sub z] = 1/2" is considered a meaningless expression, because it cannot be associated with any genuine quantum beable, that is, with any element of the quantum Hilbert space. Note that every non-zero element in that space corresponds to S[sub w] = 1/2 for some direction w, so there is nothing left over that could describe a situation in which two components of the spin angular momentum both have the value 1/2.

Putting it another way, there seems to be no sensible way to identify the assertion "S[sub x] = 1/2 and S[sub z] = 1/2," with S[sub w] = 1/2 for some particular direction w. (For a more detailed discussion, see section 4A of reference 7.) That agrees, by the way, with what all students learn in introductory quantum mechanics: There is no possible way to measure S[sub x] and S[sub z] simultaneously for a spin-1/2 particle. From the consistent-histories perspective, this impossibility is no surprise: What is meaningless does not exist, and what does not exist cannot be measured.
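
The algebra behind this incompatibility can be seen directly from the 2 x 2 spin matrices. A Python/NumPy sketch (spin components in units of h-bar): the two operators fail to commute, and the S_x = 1/2 state is not an eigenstate of S_z.

import numpy as np

Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)

print(Sx @ Sz - Sz @ Sx)                              # nonzero commutator

x_up = np.array([1, 1], dtype=complex) / np.sqrt(2)   # the S_x = 1/2 state in the S_z basis
print(np.allclose(Sx @ x_up, 0.5 * x_up))             # True: eigenstate of S_x
print(Sz @ x_up)                                      # not a multiple of x_up: no definite S_z value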

Meaningless or simply false?

It is very important to distinguish a meaningless statement from a statement that is always false. "S[sub z] = 1/2 and S[sub z] = 1/2" is always false, because S[sub z] = 1/2 and S[sub z] = -1/2 are mutually exclusive alternatives. The negation of a statement that is always false is a statement which is always true. By contrast, the negation of a meaningless statement is equally meaningless. The negation of the meaningless assertion "S[sub x] = 1/2 and S[sub z] = 1/2," following the ordinary rules of logic, is "S[sub x] =-1/2 or S[sub z] =-1/2." In consistent-histories quantum theory, this latter assertion is just as meaningless as the former. How, after all, would one go about testing it by means of an experiment?

This spin-1/2 example is the simplest illustration of quantum incompatibility: Two quantum beables A and B, each of which can be imagined to be part of some correct description of a quantum system, have the property that they cannot both be present simultaneously in a meaningful quantum description. That is, phrases like "A and B" or "A or B," or any other attempt to combine or compare A and B, cannot refer to a real physical state of affairs. Many instances of quantum incompatibility come about because of the mathematical structure of Hilbert space and the way in which quantum physicists understand the negation of propositions. Others are consequences of violations of consistency conditions for histories. In either case, the concept of quantum, incompatibility plays a central role in consistent histories. Failure to appreciate this has, unfortunately, led to some misunderstanding of consistent-histories ideas.

Now let us return to the discussion of the histories in (3) and (4). The two histories in (4) are mutually exclusive; if one occurs, the other cannot. Think of them as analogous to S[sub z] = 1/2 and S[sub z] = -1/2 for a spin-1/2 particle. On the other hand, each of the histories in (4) is incompatible, in the quantum sense, with the history in (3), which one can think of as analogous to S[sub x] = 1/2. Indeed, the relationship between the state |S> in (3) and the states |C>|D[sup *]> and |C[sup *]>|D> in (4) is formally the same as that between the state S[sub x] = 1/2 and the states S[sub z] = 1/2 and S[sub z] = -1/2. Consequently, the question of whether (3) occurs rather than, or at the same time as, the histories in (4) makes no sense.

It may be helpful to push the spin analogy one step further. Imagine a classical spinning object subjected to random torques of a sort that leave L[sub x], the x component of angular momentum, unchanged while randomly altering the other two components, L[sub y] and L[sub z]. In such a case, a classical history that describes only L[sub x] will be deterministic; it will have a probability of 1. L[sub z], on the other hand, can be described by a collection of several mutually exclusive histories, each having a nonzero probability.

Of course, classical histories of this kind can always be combined into a single history, whereas the deterministic quantum history in (3), corresponding to the L[sub x] history in this analogy, cannot be combined with the stochastic histories in (4), the analogs of the L[sub z] histories. Nevertheless, the analogy has some value in that it suggests that (3) and (4) might be regarded intuitively as describing alternative aspects of the same physical situation. Although all classical analogies for quantum systems break down eventually, this one is less misleading than trying to think of (3) and the set of histories in (4) as mutually exclusive possibilities. It helps prevent us from undertaking a vain search for some "law of nature" that would tell us that (4) rather than (3) is the correct quantum description.

The second measurement problem

Particle physicists are always designing and building their experiments under the assumption that a measurement carried out in the real world can accurately reflect the state of affairs that existed just before the measurement. From a string of sparks or bubbles, for example, they infer the prior passage of an ionizing particle through the chamber. Extrapolating the tracks of several ionizing particles backward, they locate the point where the collision that produced the particles took place. But according to many textbook accounts of the quantum measuring process, retrodictions that use experimental results to infer what the particle was doing before this kind of measurement was made are not possible. Should we conclude, then, that experimenters don't take enough courses in quantum theory?

The consistent-histories analysis shows that the experimenters do, in fact, know what they are doing, and that such retrodictions are perfectly compatible with quantum theory. It also provides general rules for carrying out retrodictions safely, without producing contradictions or paradoxes. The consistent-histories approach even offers some insight into why the textbooks have often regarded retrodiction as dangerous.

The basic idea can be illustrated once again by reference to figure 1. Suppose the photon has been detected by detector C. In which channel was it just prior to detection: channel c or d? The very nature of the question tells us that (3) is of no help; we must resort to the histories in (4). But even they are inadequate, because they tell us nothing about what the photon is doing at intermediate times. To address that question, we must consider the following refinements of the histories in (4):

(5) |a>|C>|D> arrow right |c>|C>|D> arrow right |C[sup *]>|D>, |a>|C>|D> arrow right |d>|C>|D> arrow right |C>|D[sup *]>,

in which intermediate events have been added to describe the photon after it passes through the beam splitter, but before it is detected. The consistent-histories rules assign a probability of 1/2 to each of these histories. That means it is impossible, given the initial state, to predict whether the photon will leave the beam splitter through channel c or d. But if the final detector state is |C[sup *]>|D>, meaning that C has detected the photon, then the first history in (5), not the second, is the one that actually occurred. So, at the intermediate time, the photon was in state |c> rather than |d>. That is to say, it was in the c channel.

Why has this rather obvious way of solving the second measurement problem been overlooked for so long? Probably because a quantum physicist who grew up with the standard textbooks will describe the situation in figure 1 by means of a pair of histories

(6) |a>|C>|D> arrow right |s>|C>|D> arrow right |C[sup *]>|D>, |a>|C>|D> arrow right |s>|C>|D> arrow right |C>|D[sup *]>,

in which, at the intermediate time, the photon is in the superposition state |s> defined in (1). He will wait until the measurement takes place and then "collapse" the wavefunction for reasons that he may not understand very well. But at least they make more sense to him than does the macroscopic quantum superposition state |S> of (3).

From the standpoint of consistent histories, such a physicist is, in effect, employing the histories in (6), which are perfectly good quantum beables, as part of a stochastic quantum description. However, if the photon is in the superposition state |s> at the intermediate time, quantum incompatibility implies that it makes no sense to ask whether it is in the c channel or the d channel. That question can be asked only in the context of the histories in (5).

The existence of a quantum description employing the set of histories in (6), in which the question of the relationship between the measurement result and the location of the photon before the measurement is meaningless, does not invalidate the conclusion reached by means of the histories in (5), which provide a definite answer to that question. It is a quite general feature of quantum reasoning that various questions of physical interest can be addressed only by constructing an appropriate quantum description. That is quite unlike classical physics, where a single description, such as specifying a precise point in the phase space of a mechanical system, suffices to answer all meaningful questions.

Consistency conditions

The beables in consistent-histories quantum theory are a collection of mutually exclusive histories to which probabilities are assigned by the dynamical laws of quantum mechanics (Schrodinger's equation). If the histories involve just two times, as in (4), these probabilities are given by the usual Born rule--namely, the absolute square of the inner product of the time-evolved initial state and the final state in question. Histories involving three or more times, as in (5), require a generalization of the Born rule and additional consistency conditions to assure that the probabilities make physical sense.

Not all collections of mutually exclusive histories satisfy the mathematical conditions of consistency. The consistent-histories approach ascribes physical meaning only to histories that satisfy the consistency conditions. Other cases are regarded as meaningless; that is to say, they are rather like trying to simultaneously ascribe values for S[sub x] and S[sub z] to a spin-1/2 particle. (See the box above for additional remarks on consistency conditions.)

Consistency conditions are needed for a consistent discussion of the quantum double-slit experiment,(n8) in which a wavepacket approaches the slits at time t[sub 1], passes through one or the other slit just before t[sub 2], and arrives at time t[sub 3] at some point in the interference zone, where waves from the two slits interfere with each other. It turns out that histories in which the particle passes through a particular slit and then arrives at a particular point in the interference zone do not satisfy the consistency conditions, and thus do not constitute acceptable quantum beables. That will come as no surprise to generations of students who have been taught that asking which slit the particle passes through is not a sensible question. In this respect, the consistency conditions support the physicist's usual intuition at the same time as they provide a precise mathematical formulation applicable in other situations where intuitive arguments are not sufficient for precise analysis.

On the other hand, if there are detectors just behind the two slits, one's physical intuition says that it should be sensible to say which slit the particle passes through. Such intuition is used all the time in designing experiments in which collimators are placed in front of detectors. In that case, the relevant histories, which are the analogs of (5), turn out to be consistent. Furthermore, even if there are no detectors behind the slits, there are consistent histories in which the particle passes through a particular slit and then arrives in a spread-out wavepacket in the interference zone, rather than at a particular point. (See the box for more details in an analogous situation involving a Mach-Zehnder interferometer.)

The physical consequences of consistency conditions are still being explored, and there is not yet complete agreement even on their mathematical form. However, the different formulations one finds in references 9, 10, and 11 do not seem to make any significant difference in most physical applications.

Classical limit

Because classical mechanics provides an excellent description of the motion of macroscopic objects in the everyday world, one would expect that quantum theory, in an appropriate limit, would yield the laws of classical physics to very good approximation. This conclusion is supported by Paul Ehrenfest's argument, which one finds in elementary textbooks, to the effect that average values of certain quantum observables satisfy equations similar to those of classical mechanics. But that is not a satisfactory solution to the problem of the classical limit, for two reasons: One wants to know how individual systems behave, not just the ensemble to which such an average applies. Furthermore, such an average, in the usual textbook understanding of quantum theory, refers to the results of measurements, and is not valid when measurements are not made.

In the consistent-histories approach, the classical limit can be studied by using appropriate subspaces of the quantum Hilbert space as a "coarse graining," analogous to dividing up phase space into nonoverlapping cells in classical statistical mechanics. This coarse graining can then be used to construct quantum histories. It is necessary to show that the resulting family of histories is consistent, so that the probabilities assigned by quantum dynamics make good quantum mechanical sense. Finally, one needs to show that the resulting quantum dynamics is well approximated by appropriate classical equations.

Demonstrating all this in complete detail is a difficult problem. But so is the analogous problem of finding the behavior of a large number of particles governed by classical mechanics. Indeed, the problem of showing that a system of classical particles will exhibit thermodynamic irreversibility, a typical macroscopic phenomenon, has not yet been settled to everyone's satisfaction, despite a continuing effort that goes back to Ludwig Boltzmann's work a century ago. (See the articles by Joel Lebowitz in PHYSICS TODAY, September 1993, page 32, and by George Zaslavsky in this issue, page 39.)

Nonetheless, calculations carried out by one of us,(n11, n12) and by Gell-Mann and Hartle,(n10) indicate that, given a suitable consistent family, classical physics does indeed emerge from quantum theory. Of course the classical equations are only approximate. They must be supplemented by including a certain amount of random noise, as one would expect from the fact that quantum dynamics is a stochastic process. In many circumstances, this quantum noise will not have much influence, but it can be amplified in systems that exhibit (classical) chaotic behavior. Even so, because the classical dynamics of such systems is noisy for all practical purposes, even if it is deterministic in principle, they are not likely to exhibit distinctive quantum effects.

The consistency of a family of histories for a macroscopic system is often ensured by quantum decoherence, an effect closely related to thermodynamic irreversibility. (See the article by Wojciech Zurek in PHYSICS TODAY, October 1991, page 36.) Demonstrating that quantum systems actually exhibit irreversible behavior in the thermodynamic sense, on the other hand, is not trivial. There are conceptual and computational difficulties similar to those that arise when one considers a classical system of many particles. Nonetheless, there seems at present to be no difficulty, in principle, that prevents us from understanding macroscopic phenomena in quantum terms, including what happens in a real measurement apparatus. Thus, by interpreting quantum mechanics in a manner in which measurement plays no fundamental role, we can use quantum theory to understand how an actual measuring apparatus functions.

We are grateful to Todd Brun, Sheldon Goldstein, James Hartle, and Wojciech Zurek for comments on the manuscript. One of us (Griffiths) acknowledges financial support from the National Science Foundation through grant PHY 9602084.

Consistency Conditions: An Application

The consistency conditions as formulated in reference 9 are obtained by associating with each of the histories in a particular family a "weight" operator on the Hilbert space, and then requiring that the weight operators for mutually exclusive histories be orthogonal to each other--the operator inner product being generated by the trace. This somewhat abstract prescription is best understood by working through simple examples, such as the one in section 6C of reference 8. Here, we give an application of the consistency conditions to a situation of some physical interest.
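
In symbols (a compressed sketch; the precise statement is in reference 9), the weight operator of a history specified by projectors $P_1, P_2, \ldots, P_n$ at times $t_1 < t_2 < \cdots < t_n$ can be written as the chain operator

\[ K(Y) = P_n\, U(t_n, t_{n-1})\, P_{n-1} \cdots P_2\, U(t_2, t_1)\, P_1 , \]

with $U(t', t)$ the unitary time evolution, and the consistency condition on a family of histories is that

\[ \operatorname{Tr}\!\left[ K(Y)^{\dagger} K(Y') \right] = 0 \quad \text{for every pair of distinct histories } Y \neq Y' \text{ in the family.} \]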

Consider the Mach-Zehnder interferometer illustrated in figure 2. A wavepacket of light passing through the first beam splitter $B_1$ is reflected by a pair of mirrors, C and D, onto a second beam splitter $B_2$ preceding the output channels e and f. The effect of $B_1$ on the wavepacket $|a\rangle$ of a photon in the initial a channel at time $t_1$ is to produce, at a slightly later time $t_2$, the same kind of superposition $|s\rangle$ of wavepackets $|c\rangle$ and $|d\rangle$ in the c and d arms of the interferometer as we had in equation (1). The effect of the second beam splitter is given by

(7)   $|c\rangle \rightarrow (|e\rangle + |f\rangle)/\sqrt{2}, \qquad |d\rangle \rightarrow (-|e\rangle + |f\rangle)/\sqrt{2},$

where $|e\rangle$ and $|f\rangle$ are wavepackets in the output channels at $t_3$. The optical paths have been so arranged that the two $|e\rangle$ components in (7) appear with opposite phases.

Therefore, when we combine (1) and (7), we see that the photon entering at a must emerge in channel f, corresponding to the three-time history

(8)   $|a\rangle \rightarrow |s\rangle \rightarrow |f\rangle,$

which satisfies the consistency conditions simply because it is a solution of Schrödinger's equation.
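
Explicitly, if the superposition produced by the first beam splitter is written as $|s\rangle = (|c\rangle + |d\rangle)/\sqrt{2}$ (the phase convention assumed here; the precise form is fixed by equation (1)), then applying (7) gives

\[ |s\rangle \;\rightarrow\; \tfrac{1}{2}\bigl[(|e\rangle + |f\rangle) + (-|e\rangle + |f\rangle)\bigr] = |f\rangle , \]

so the $|e\rangle$ amplitudes cancel and the photon always emerges in the f channel.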

On the other hand, the pair of mutually exclusive histories

(9)   $|a\rangle \rightarrow |c\rangle \rightarrow |f\rangle \qquad \text{and} \qquad |a\rangle \rightarrow |d\rangle \rightarrow |f\rangle,$

in which the particle passes through either the c or d arm at the intermediate time $t_2$ and then emerges in the f channel, are not consistent, because the corresponding weight operators are not orthogonal. The reader may check this by the methods of reference 9, but it will require some work.

Consequently, it makes no sense to say that the particle passes through the c or the d arm and then emerges in the f channel. However, the two histories

(10)   $|a\rangle \rightarrow |c\rangle \rightarrow (|e\rangle + |f\rangle)/\sqrt{2} \qquad \text{and} \qquad |a\rangle \rightarrow |d\rangle \rightarrow (-|e\rangle + |f\rangle)/\sqrt{2}$

are consistent, because here the weight operators are orthogonal. Again we leave the proof as an exercise. Thus it makes perfectly good sense to say that the photon passes through the c arm and emerges in a certain coherent superposition of states in the two output channels, or through the d arm to emerge in a different superposition.
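
For readers who prefer to check these claims numerically rather than work through reference 9 by hand, the following minimal sketch (ours, not the article's) models the interferometer on a two-dimensional space, with the two basis vectors standing for the two channels available at each time: (a, b) before $B_1$, (c, d) between the beam splitters, and (e, f) after $B_2$. Both beam splitters then act as the same $2\times 2$ unitary, each history's weight operator is built by sandwiching the time-evolution operators between the history's projectors, and the trace inner products reproduce the statements above.

import numpy as np

# Two-dimensional model of the interferometer: the basis vectors stand for
# the two channels available at each time, (a, b) at t1, (c, d) at t2,
# and (e, f) at t3.  (Names and phase conventions are ours, chosen to
# match equations (1) and (7).)
a = c = e = np.array([1.0, 0.0])   # first channel at each time
d = f = np.array([0.0, 1.0])       # second channel at each time

# In this basis both beam splitters act as the same unitary:
# |c> -> (|e> + |f>)/sqrt(2),   |d> -> (-|e> + |f>)/sqrt(2)
B = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2.0)

def proj(v):
    # Projector |v><v| onto the normalized state v.
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def weight_op(P3, P2, P1):
    # Weight (chain) operator P3 U(t3,t2) P2 U(t2,t1) P1 of a three-time history.
    return P3 @ B @ P2 @ B @ P1

def overlap(K1, K2):
    # Trace inner product Tr[K1^dagger K2] of two weight operators.
    return np.trace(K1.conj().T @ K2)

# Pair (9):  a -> c -> f   and   a -> d -> f
K1 = weight_op(proj(f), proj(c), proj(a))
K2 = weight_op(proj(f), proj(d), proj(a))
print("pair (9): ", overlap(K1, K2))    # ~0.25, nonzero, so not consistent

# Pair (10): a -> c -> (e+f)/sqrt(2)   and   a -> d -> (-e+f)/sqrt(2)
K3 = weight_op(proj(e + f), proj(c), proj(a))
K4 = weight_op(proj(-e + f), proj(d), proj(a))
print("pair (10):", overlap(K3, K4))    # ~0, so consistent

Running this prints roughly 0.25 for the pair of histories in (9) and a value consistent with zero (up to rounding) for the pair in (10), in agreement with the statements above.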

This Mach-Zehnder example is analogous to the canonical double-slit experiment, if one regards passing through the c or d arm as analogous to passing through the upper or lower slit, and emerging in e or f as analogous to the particle arriving at a point of minimum or maximum intensity in the double-slit interference zone.

AN ANALYSIS OF GENERALIZATION IN THE XCS CLASSIFIER SYSTEM

Source: Evolutionary Computation, Summer99, Vol. 7 Issue 2, p125, 25p, 2 diagrams, 14 graphs

Author(s): Lanzi, Pier Luca

Abstract

The XCS classifier system represents a major advance in learning classifier systems research because (1) it has a sound and accurate generalization mechanism, and (2) its learning mechanism is based on Q-learning, a recognized learning technique. In taking XCS beyond its very first environments and parameter settings, we show that, in certain difficult sequential ("animat") environments, performance is poor. We suggest that this occurs because in the chosen environments, some conditions for proper functioning of the generalization mechanism do not hold, resulting in overly general classifiers that cause reduced performance. We hypothesize that one such condition is a lack of sufficiently wide exploration of the environment during learning. We show that if XCS is forced to explore its environment more completely, performance improves dramatically. We propose a technique, based on Sutton's Dyna concept, through which wider exploration would occur naturally. Separately, we demonstrate that the compactness of the representation evolved by XCS is limited by the number of instances of each generalization actually present in the environment. The paper shows that XCS's generalization mechanism is effective, but that the conditions under which it works must be clearly understood.

Keywords

Learning classifier systems, XCS, generalization, genetic operators.
