
REPORT CARDS

Thus far, we have described a variety of ways that vertically differentiated firms can reassure consumers about product quality: disclosure, warranties, and branding. When judging product quality, consumers do not have to rely on information provided by sellers; they may instead draw on their own experiences and those of friends and family. Before there were brands and warranties, this was the dominant way that consumers solved the shopping problem. One can imagine an eighteenth-century village square filled with townspeople debating where to get their horses shod and how to find the best seamstress in town. Consumers still ask friends and family for recommendations; this is how many people find a primary care physician, a restaurant, or a hair stylist. Consumers also rely on trusted agents to assist with shopping. Primary care physicians help patients choose specialists and hospitals. Real estate agents help home buyers. Wine merchants recommend the best grand cru burgundies.

Independent firms have emerged in many markets to codify quality evaluation. In 1894, the National Board of Fire Underwriters established the Underwriters’ Electrical Bureau (the predecessor to Underwriters Laboratories). The Underwriters’ Bureau charged a fee for testing and reporting on the safety of fittings and electrical devices. Manufacturers willingly paid the fee because doing otherwise could be interpreted as an indication of poor quality. The Underwriters’ Bureau report is an example of a quality report card—a grade that can be used to evaluate quality. The Underwriters’ Bureau provided a simple pass/fail grade. Other report cards can have much finer gradations, as we will see. Thanks to the dramatic reduction in the cost of obtaining and analyzing information about product quality, firms that construct report cards have the potential to create enormous value for consumers and capture much of that value for themselves in the form of profits.

Report cards are ubiquitous. Consumer Reports publishes rankings for hundreds of consumer products. Car and Driver magazine publishes an annual list of the 10 best cars. The British magazine Times Higher Education ranks the world’s 200 leading universities. Students rate professors on teacher evaluations. There are countless sources of rankings of U.S. hospital quality. If a report card is well constructed, high-quality sellers will stand a better than average chance of receiving a high ranking. Consumers may benefit from these rankings in three ways:

1. Consumers can more easily identify high-quality sellers.

2. Because consumers can identify high-quality sellers, the elasticity of demand with respect to quality increases. This gives sellers an incentive to improve quality.

3. Some consumers are willing to pay more for quality than others, and the highest-quality sellers may lack the capacity to serve all consumers. Report cards can improve sorting by matching consumers who highly value quality to the best sellers.

Because of the policy importance of raising health care quality, considerable research has been done on the effectiveness of hospital report cards, and advocates can point to evidence of all three effects. Hospitals with good report card scores gain market share, and report card scores increase across the board after report cards are introduced. (There is some doubt about whether the latter reflects quality increases or “gaming,” as explained in Example 10.5.) There is even evidence that surgical report cards facilitate sorting by steering the most complex cases to the best surgeons.5

Nearly all measures of quality are subject to random noise. One surgeon's patient may die while another surgeon's patient lives, for reasons that the surgeons cannot control. The Porsche tested by Car and Driver magazine may be the one lemon out of a thousand to roll off the Stuttgart assembly line. When quality is measured with noise, the objectively highest-quality seller in a market may not receive the top ranking. Even so, the report card can still be valuable. As long as rankings are positively correlated with actual quality, consumers will still steer their business to higher-ranking sellers, and sellers will gain customers by making further improvements to quality. But these benefits will be muted. Consumers may pay attention to other product attributes such as price and location if they are not sure about the accuracy of quality rankings, and sellers may invest less in quality if they are unsure whether the investment will translate into a higher market share.
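To see why a noisy report card can still be informative, consider the following small simulation. It is offered as a rough sketch rather than anything from the text; the sellers, their true quality levels, and the amount of measurement noise are all assumed values.

# Illustrative sketch (assumed sellers, quality levels, and noise): even a noisy
# report card ranks the objectively best seller first most of the time.
import random

random.seed(1)

true_quality = {"Seller A": 0.90, "Seller B": 0.85, "Seller C": 0.80}  # assumed values
NOISE_SD = 0.05   # random noise in each measured score

trials = 10_000
best_on_top = 0
for _ in range(trials):
    # Each seller's published score is its true quality plus random noise
    scores = {s: q + random.gauss(0, NOISE_SD) for s, q in true_quality.items()}
    if max(scores, key=scores.get) == "Seller A":   # Seller A is objectively best
        best_on_top += 1

print(f"The best seller tops the ranking in {best_on_top / trials:.0%} of report cards")
# The ranking is wrong in a minority of cases, but because it is positively
# correlated with true quality, it still steers most business toward better sellers.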

Unfortunately, noise is not the only problem afflicting some report cards. Report cards that cover some aspects of performance but not others can encourage a problem known as multitasking (or what is commonly known as "teaching to the test"). And if a report card score depends partly on which customers a seller serves, sellers may shun some business in order to boost their score. This is known as selection. Unless due care is taken with the construction of report cards, multitasking and selection can do more harm than good.

Multitasking: Teaching to the Test

Report cards usually measure some aspects of a product’s performance but not others, perhaps focusing on metrics that “make sense” and that are easy to measure. The result can be report cards that do more harm than good. To understand the danger of the quick and dirty report card, we must introduce the concept of agency, which we describe in much more detail in Chapter 12. In an agency relationship, one party (the agent) is hired by another (the principal) to take actions or make decisions that affect the payoff to the principal. In the present context, we can think of the architect of the report card as the principal who hopes to improve the performance of the agent.

When designing a contract or a report card, the principal needs to be aware of the potential for multitasking. In layperson's terms, multitasking means trying to do two or more things at once. In economics, the multitasking problem arises when efforts to promote improvement along one dimension of performance are confounded by changes in other dimensions of performance. This is sometimes known as teaching to the test. These efforts to promote improvement can involve direct financial incentives, as when you pay your son to clean his bedroom and he spends less time cleaning up after himself in the kitchen. But they can also involve report cards. For example, if automobile report cards emphasize fuel economy, manufacturers might respond by making their cars lighter, thereby jeopardizing safety.

Bengt Holmstrom and Paul Milgrom explain that multitasking is a potential problem whenever two conditions hold simultaneously6:

1. Incentive contracts or report cards are incomplete in the sense that they do not cover all relevant aspects of performance.

2. The agent (or the son in the domestic example) has limited resources that must be allocated across tasks, where different tasks affect different aspects of performance.

Unfortunately, these two conditions are present for nearly all experience and credence goods. The result is that a contract or report card designed to boost some aspects of performance will necessarily affect other aspects of performance not covered by the contract or report card. Example 10.3 presents an example of multitasking in education.


EXAMPLE 10.3 TEACHERS TEACHING TO THE TEST7

Accountability is the new buzzword in American public school education. The most important example is the federal “No Child Left Behind” Act, one of the signature achievements of President George W. Bush. This Act requires all government-run schools to administer annual statewide standardized tests. Schools must demonstrate regular improvements in test scores or be subject to various forms of oversight and restructuring, including replacement of staff. Students at schools that fail to show improvement must be given the option to enroll elsewhere, taking enrollment-based funding with them to the new school.

Although most schools have demonstrated test score gains since No Child Left Behind was implemented, critics contend that the Act has caused teachers to emphasize performance on a single standardized test at the expense of broader and potentially more important skills. Although the jury is still out on the Act, research on similar laws that have been tried in various states is not encouraging. The evidence seems to show that America’s public school teachers are very good at teaching to the test.

Texas was one of the first states to introduce accountability via standardized testing. In 1994, Texas introduced the Texas Assessment of Academic Skills (TAAS) program, which was designed to emphasize higher-order thinking skills. All students in grades 3–8 took the exam, and students had to pass an eleventh-grade-level exam to graduate from high school. The state held teachers and principals accountable for student performance by linking compensation and promotion to test scores. The TAAS program ended in 2003, when it was superseded by a new program tailored to meet the requirements of No Child Left Behind.

The TAAS program appeared to be an immediate success. Between 1994 and 1998, math and reading scores increased dramatically and the achievement gap between whites and minorities narrowed. However, other evidence suggests that the improvement on the high-stakes TAAS test may have masked more subtle changes in the curriculum. TAAS did not examine science, social studies, or art, and many schools suspended these classes for weeks so that students could have extra time to prepare for the math questions on the TAAS. In some cases, social studies and art teachers spent significant time performing grammar drills, while math teachers drilled basic skills rather than teaching higher-level concepts not covered by the TAAS. Some evidence of the impact of these curricular changes appears in scores on a "low-stakes" test, the National Assessment of Educational Progress (NAEP), which was not linked to compensation. Overall NAEP scores did not dramatically improve, while the white-minority achievement gap on the NAEP actually increased. The latter may reflect concern on the part of teachers at predominantly minority schools that their students would have difficulty passing the TAAS.

Many other cities and states hold their schools accountable for student performance on just one or two high-stakes standardized tests. Studies of schools in Chicago and Florida confirm the research evidence from Texas. Students score much higher on the high-stakes test but show much smaller gains, if any, on low-stakes tests. These findings give comfort to both sides of the debate about using standardized tests for accountability. When schools are held accountable, administration and teachers respond. But accountability has unintended consequences. If we are testing math but not science, we had better be certain that we want our students to get better at math but not science.

Recent events in Georgia suggest that the unintended consequences of accountability can deteriorate from debatable changes in curriculum to something far worse. In July 2011, 178 teachers and principals at 44 Atlanta schools were found to be directly involved in cheating on the state’s standardized test.


Bingxiao Wu provides an interesting example of "teaching to the test" by fertility clinics.8 In the early 2000s, the U.S. Department of Health and Human Services (HHS; the principal) began publishing report cards on fertility clinics (the agents). Originally, HHS highlighted the percentage of treatments that resulted in a live birth, and after the report cards were published, live birth scores improved significantly. But did this reflect improvements made through superior technology and training, or was something else going on? Clinics had another option for increasing their live birth score: they could implant multiple embryos, which raises the chance of a live birth but also the chance of a multiple birth. Sure enough, this is exactly what happened. A few years later, HHS revised the report card to highlight both the single live birth and the multiple live birth scores. Multiple live birth rates fell significantly, while single live birth rates rose above their baseline (pre-report-card) levels.

This example has two important features. First, HHS was able to include all relevant outcomes in its final report card. Wu’s results suggest that including all relevant outcomes eliminates harmful teaching to the test. This is easy in the case of fertility clinics, where two measures are sufficient to capture almost everything that might matter to a patient. In its widely read automobile report cards, Consumer Reports presents dozens of dimensions of quality and probably omits many others that consumers might care about. In many situations, it is simply impractical to report all relevant outcomes. Consider a report card for the treatment of prostate cancer. Relevant outcomes include mortality, pain, incontinence, and impotence. It would be very costly and perhaps impossible to measure and report all of these outcomes.

A second important feature of fertility clinic report cards is that the steps clinics initially took to improve the reported live birth score (implanting multiple embryos) harmed performance on the unreported multiple birth dimension. Multitasking does not always work out this badly: improvements on the measured dimension sometimes spill over to unmeasured dimensions. Reporting prostate cancer mortality rates might cause doctors to improve their surgical technique (for example, through retraining), and this might lead to reductions in all of the relevant outcomes, not just mortality. When constructing report cards, it is therefore useful to consider how the measured dimensions and the unmeasured actions interact. Suppose that the principal cares about two dimensions of performance, X and Y, but can measure and/or pay for only X. We expect the agent to increase X. The agent will also increase Y if X and Y are complements in production and will decrease Y if they are substitutes in production. This is summarized in Table 10.2.

TABLE 10.2

Does Paying for X Result in Less of Y?

Pay for X and:
If X and Y are complements: get more of X and more of Y.
If X and Y are substitutes: get more of X and less of Y.
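A minimal numerical sketch can make the logic of Table 10.2 concrete. The functional forms below (a linear payoff and a quadratic effort cost with an interaction term c) are assumptions chosen for the illustration, not a model taken from the text; c > 0 makes X and Y complements in production, and c < 0 makes them substitutes.

# Illustrative sketch (assumed functional forms). An agent chooses effort on two
# dimensions, X and Y. The report card pays p_x per unit of X; the agent gets an
# unpaid intrinsic return b_y per unit of Y. Effort cost is
# 0.5*x**2 + 0.5*y**2 - c*x*y, so c > 0 means complements and c < 0 substitutes.

def best_effort(p_x, c, b_y=1.0):
    # Closed-form maximizer of p_x*x + b_y*y - (0.5*x**2 + 0.5*y**2 - c*x*y), |c| < 1
    x = (p_x + c * b_y) / (1 - c ** 2)
    y = (b_y + c * p_x) / (1 - c ** 2)
    return x, y

for c, label in [(0.4, "complements"), (-0.4, "substitutes")]:
    x0, y0 = best_effort(p_x=1.0, c=c)   # before the report card rewards X
    x1, y1 = best_effort(p_x=2.0, c=c)   # report card raises the reward to X
    print(f"{label}: X {x0:.2f} -> {x1:.2f}, Y {y0:.2f} -> {y1:.2f}")

# With complements, paying more for X raises both X and Y; with substitutes,
# X rises while Y falls -- the pattern summarized in Table 10.2.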

What to Measure

Most report cards contain several different quality measures. For example, the Times Higher Education ranking of world universities is a weighted average of five scores, including teaching, international mix, and research. Other rankings of universities may include admission percentages and yields (the percentage of admitted students who enroll), and still others may report student satisfaction, the size of the library, and the success of the athletic program. To make sense of this dizzying array of report cards, we begin our discussion of quality measurement by providing a taxonomy of quality measures.

Health care sociologists have developed a taxonomy of health care quality measures. Donabedian divides quality measures into several categories.9

Outcome: This is what consumers ultimately care about. Most outcome measures tend to be specific to the good or service.

Process: Does the seller use accepted practices to produce the good or service? Process measures are useful if outcomes are hard to measure, there are concerns about multitasking, and good processes are known to lead to desirable outcomes. Process measures can promote multitasking by encouraging agents to invest in the reported processes but scale back on unreported processes.

Input: Is labor well trained? Does the seller use the latest manufacturing technologies? Input measures are useful if outcomes are hard to measure, there are concerns about multitasking, and good inputs are known to lead to desirable outcomes. Input measures can promote multitasking by encouraging agents to invest in the reported inputs but scale back on unreported inputs.

Table 10.3 gives examples of measures in each category for a range of goods and services. As seen in the table, process measures are more common for services than they are for products.

TABLE 10.3

Examples of Outcome, Process, and Input Measures of Quality

Automobiles
  Outcome: Acceleration, braking, fuel economy, safety

Smart Phones
  Outcome: Speed, network coverage
  Input: Operating system, number of available applications

Air Travel
  Outcome: On-time arrival rates; lost baggage
  Input: Pilot training, frequency of schedule, composition of the fleet

Restaurants
  Outcome: Customer satisfaction, hygiene
  Process: Method of food preparation
  Input: Training of kitchen staff, freshness and source of ingredients

Hair Salons
  Outcome: Customer satisfaction
  Input: Products used (e.g., Aveda)

Education
  Outcome: Test scores, college placement, income of graduates
  Process: Curriculum (e.g., number of Advanced Placement classes)
  Input: Certification of teachers

Hospitals
  Outcome: Customer satisfaction, mortality, morbidity
  Process: Prescribe appropriate tests, procedures, and drugs
  Input: Staffing of hospitals, credentials of doctors, availability of latest technology

Consumers mostly care about outcomes. Most diners want to know if the food is good (an outcome measure), but only the most dedicated foodies pay much attention to the method of preparation (process) or how the kitchen staff is trained (input). Parents will tolerate most any curriculum (process) and teacher credentials (input) if their children get high test scores and gain admission to good colleges (outcomes). But there are several reasons why it might make sense to report process and input measures of quality in addition to, or instead of, outcome measures:

Outcome data may be unavailable. When ranking universities, it is essential to consider the quality of teaching. But it is difficult to develop an outcome-based measure of teaching quality. Instead, report cards like Times Higher Education rely on input measures such as faculty/student ratios and the PhD/bachelor’s degree ratio (claiming that universities that award many doctorates have a more research-led teaching environment, which is assumed to enhance teaching).

It may be difficult to obtain outcome measures for a large sample. This can cause statistical imprecision when there is measurement noise. For example, it is common to rank public schools based on the standardized test scores of graduating students. But there are typically fewer than 100 students in a given grade in a U.S. public school, so a school’s ranking will have a big random component, based on which children were graduating that year, how they felt on the day of the test, and the specific questions appearing on that test. As a result, a high-quality school may have a low ranking due merely to random chance. In addition, a school near the top of the rankings might be statistically indistinguishable from a school at the bottom, making it difficult to base economic decisions on report card rankings.

Noisy rankings often exhibit mean reversion. In general, firms with high scores will have enjoyed more than their share of good luck; that is, the noise in the rankings has worked in their favor. Luck tends to even out (or it wouldn't be luck), so some high-scoring firms are likely to see their scores "revert to the mean" in subsequent rankings. For example, a public school may report high SAT scores because it happens to have an exceptional group of students that year. Next year, those unusually high scores are likely to be replaced by scores closer to the average. All organizations, be they schools or car makers, should avoid blowing their own horn when they receive unusually high report card scores. Nor should consumers be surprised when a seller's high ranking reverts to the mean. (A short simulation, sketched after this list, illustrates both the sampling noise and the resulting mean reversion.)

In some cases, differences in outcomes across sellers may reflect differences in the customers, rather than differences in seller quality. In health care, this is dealt with through case mix adjustment, which we describe in Example 10.5.
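The sampling-noise and mean-reversion points can be illustrated with a short simulation. It is only a sketch under assumed parameters (the number of schools, the cohort size, and the spread of true school quality and of individual student scores are all made up for the example), not an analysis of real test data.

# Illustrative sketch (assumed parameters): small cohorts make a school's annual
# score noisy, and the schools ranked highest in one year tend to slip back
# toward the mean the next year even though true quality is unchanged.
import random

random.seed(2)

N_SCHOOLS = 50
COHORT = 80          # students tested per school (fewer than 100, as in the text)
SCHOOL_SD = 20       # spread of true school quality around a mean of 500
STUDENT_SD = 60      # spread of individual student scores around the school mean

def reported_score(quality):
    # One year's report card score: the average over a small, noisy cohort
    return sum(random.gauss(quality, STUDENT_SD) for _ in range(COHORT)) / COHORT

gaps = []
for _ in range(200):   # repeat to average out simulation noise
    quality = [random.gauss(500, SCHOOL_SD) for _ in range(N_SCHOOLS)]
    year1 = [reported_score(q) for q in quality]
    year2 = [reported_score(q) for q in quality]   # same schools, new cohort
    top10 = sorted(range(N_SCHOOLS), key=lambda i: year1[i], reverse=True)[:10]
    gaps.append(sum(year1[i] - year2[i] for i in top10) / 10)

print(f"Schools in the year-1 top 10 score about {sum(gaps) / len(gaps):.1f} points lower in year 2")
# Part of a top score is good luck, and the luck does not repeat: that is mean reversion.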

Given these problems, it may not be feasible or desirable to report outcomes. Certifiers should consider using process and input measures when the following conditions hold:

1. Process and input measures are positively linked to favorable outcomes. For example, research demonstrates that giving beta blocker drugs to heart attack patients improves their prospects for survival. Some health care report cards list the percentage of heart attack patients who have been prescribed beta blockers.

2. It is relatively inexpensive to measure processes and inputs, and the same measures can be obtained from all firms. Data on beta blockers can be readily obtained from health care claims data.

3. Processes and inputs are not easily manipulated through multitasking. It takes virtually no time for a physician to prescribe beta blockers. It is unlikely that this takes away from other important tasks.

Although process and input measures can be useful, most certifiers focus on outcomes, which are, after all, what consumers ultimately care about.

EXAMPLE 10.4 CALORIE POSTING IN NEW YORK CITY RESTAURANTS

The United States is the most obese nation in the world. As of 2010, about one-third of U.S. adults were obese (with a body mass index over 30). On average, obesity reduces life expectancy by six to seven years and has been estimated to cost the U.S. economy at least $150 billion annually. Much of the blame goes to Americans’ penchant for eating out, especially at fast-food restaurants where a burger, fries, and soft drink can easily top 1,000 calories and 40 grams of fat.

Many policy analysts believe that Americans would avoid Big Macs, fried chicken, and donuts if only they knew how unhealthy the food was.

In 2006, the New York City Board of Health approved a new rule mandating calorie posting by restaurants. Following several legal challenges, the law was implemented in mid-2008. Other cities and states would soon follow suit. New York City health inspectors verify the information and can fine restaurants up to $2,000 for noncompliance. Economists Bryan Bollinger and Philip Leslie have performed the first systematic study of whether such "calorie posting" affects Americans' food choices.10 The results are perhaps slightly discouraging to those who hoped that a little bit of disclosure would help solve America's obesity epidemic.

Bollinger and Leslie obtained information on over 100 million transactions at all Starbucks coffee shops in New York City, Boston, and Philadelphia over a 14-month period spanning the implementation of the law. (Boston and Philadelphia serve as controls for time trends that might affect menu choices.) Starbucks fans may be surprised that the economists did not study McDonald's or Kentucky Fried Chicken. Their decision to study Starbucks was partly pragmatic—they had a personal connection that provided the data. But Starbucks is not off the hook when it comes to filling out Americans' waistlines: a "grande" caffè mocha and muffin can easily top 750 calories. In fact, one might argue that consumers already know that a Big Mac and fried chicken have a lot of calories, but the high caloric content of a simple Starbucks snack might have come as a big surprise. Studying Starbucks offered another advantage: many patrons use a "Starbucks Card" that entitles them to special deals but also allows the economists to track the same customer's purchases over time.

Bollinger and Leslie obtained the following results:

Mandatory calorie posting causes the average calories per transaction to decline by 6 percent, from 247 to 232 calories per transaction. The effect was long lasting; there was no evidence that, over time, consumers regressed to their old habits.

Average beverage calories per transaction changed little, if at all; almost all of the calorie reduction came from reduced food purchases or substitution to lower calorie food items.

Customers who averaged more than 250 calories prior to calorie posting (these would mainly be customers who made food purchases) decreased calories per transaction by 26 percent.

Starbucks did not experience a statistically significant change in revenue overall. However, Starbucks stores located within 100 meters of a Dunkin’ Donuts experienced a 3 percent increase in revenue. It could be that Dunkin’ Donuts customers discovered that a seemingly innocuous poppy seed bagel with cream cheese contains 560 calories, more than any food product at Starbucks.

As a final note, Bollinger and Leslie conjecture that none of this would have occurred without mandatory disclosure. Starbucks would probably be unwilling to be the first to disclose, inasmuch as its relatively “healthy” offerings pack hundreds of calories and might cause customers to take their business elsewhere. And if Starbucks did not begin the unraveling process, it is doubtful that Dunkin’ Donuts, McDonald’s, and the like would voluntarily disclose.


Outcome measures usually vary by industry. Health care report cards list mortality rates. Automobile report cards detail acceleration and braking statistics as well as reliability. New York City requires restaurants to post the calorie content of their food, as discussed in Example 10.4. One outcome measure that cuts across industries is customer satisfaction, which simply indicates whether current and past customers like the good or service. This is what individuals learned from each other when they met in the town squares of yore. Amazon and other web sites are the modern equivalents of the town square, with the added benefit that they aggregate the opinions of hundreds of customers. Consumers can scour the Internet and other sources to obtain customer satisfaction ratings for automobiles, electronics, books, music, movies, restaurants, universities . . . nearly anything that is available for purchase.

Despite the ubiquity of customer satisfaction report cards, a prospective consumer who uses customer satisfaction ratings to compare sellers is necessarily relying on a noisy and possibly biased quality measure, for several reasons:

Different customers may use different criteria to measure quality. One person may be satisfied with a restaurant because the food is fresh, while others may value flawless execution in the kitchen, attentive service, or an innovative wine list. As a result, different customers may generate different rankings.

Customers have an incentive to exaggerate their ratings in order to influence the average score. For example, someone who liked but did not love the movie The Hangover 2 might give the DVD an Amazon.com rating of five stars (out of five) so as to offset the low ratings given by most other reviewers, thereby helping to move the average higher.

Customers may be reluctant to leave negative feedback. One study found that less than 1 percent of eBay buyers offer negative feedback.11 Researchers have attributed the lack of negative feedback to desires to be "nice" and to fear of seller retaliation.12 The desire to be nice is especially important when face-to-face purchases are required.

Consumer feedback is unverifiable. As a result, individuals may offer feedback without having consumed the product. This invites sellers to leave favorable reviews of their own products while disparaging competitors, as is commonly alleged about reviews at Amazon, Angie's List (a web site that provides detailed customer feedback on local service providers), and other sites that make little effort to verify that reviewers actually purchased the product.

There are several ways to adjust consumer satisfaction scores to deal with many of these problems. Certifiers can report median rather than mean scores, thereby removing consumer incentives to exaggerate their scores. Certifiers can offer rewards to individuals whose ratings predict peer ratings.13 Certifiers can also report scores on individual dimensions (e.g., Zagat’s reports separate scores for a restaurant’s food quality, ambience, and service) or provide a weighted average score that gives more weight to more important dimensions of quality. The weights can be derived using the regression and survey techniques discussed in the Appendix to Chapter 9.
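The first and last of these adjustments are easy to see in a short sketch; the ratings, dimension names, and weights below are invented purely for illustration.

# Illustrative sketch (made-up numbers): medians blunt strategic ratings, and a
# weighted average combines dimension-level scores into a single grade.
from statistics import mean, median

# Nine honest ratings plus one strategic five-star rating meant to pull the average up
ratings = [3, 3, 3, 3, 4, 3, 3, 4, 3, 5]
print(f"mean = {mean(ratings):.2f}, median = {median(ratings):.1f}")
# The exaggerated rating raises the mean but leaves the median unchanged.

# Dimension-level scores (in the spirit of Zagat-style ratings) and assumed weights,
# e.g., weights estimated with the survey or regression techniques cited in the text
dimension_scores = {"food": 4.5, "service": 3.0, "ambience": 4.0}
weights = {"food": 0.6, "service": 0.3, "ambience": 0.1}
weighted_score = sum(weights[d] * s for d, s in dimension_scores.items())
print(f"weighted average score = {weighted_score:.2f}")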

Consumer satisfaction ratings may suffer from several other shortcomings. Response rates to voluntary surveys are notoriously low. For example, only about 10 percent of Consumer Reports subscribers respond to its annual survey. The result is motivation bias, whereby avid fans and disgruntled customers are disproportionately likely to turn in their surveys, leaving the average consumer unsure about how to
