
Journal of Occupational and Organizational Psychology (2002), 75, 109–114
© 2002 The British Psychological Society
www.bps.org.uk
Short research note
Exercise order and assessment centre performance
Peter Bycio* and Baniyelme Zoogah
Williams College of Business, Xavier University, USA
Ratings from an operational assessment centre (AC) were examined as a function of the order in which candidates participated in the assessments. Assessment scheduling had a significant multivariate effect on the preconsensus exercise-based ratings. The effect was small, however, accounting for at most 1% of the rating variance. Differences in exercise order were seen as being unlikely to result in serious unfairness to applicants, although programme designers should be sensitive to the possibility of early vs. late-day performance differences.
The assessment centre (AC) method has been used for many years to help make a variety of employment decisions (Thornton, 1992). Use of the method typically requires that candidates participate in a series of assessments, including job-related situational exercises. Assessor judgments are then made based on their observations from these multiple sources.
As with most tests, it is important that the AC experience be as standardized as possible. However, there are many ways in which non-standardization can creep in to potentially create unfairness (Cohen, 1978). These include design issues such as differences in the order that candidates participate in the assessments and variations in the length and nature of the breaks between assessments.
Cohen and Sands (1978) evaluated the impact of the order in which candidates participated in the exercises. Although they did not find significant differences, at least two aspects of their study may have masked a true effect. First, their order manipulation involved only two exercises in which assessees either completed the In-basket or a Leadership problem first. Also, the sample size was small (N=67) resulting in a relative lack of statistical power. In the present investigation, exercise order was re-examined using a substantially stronger manipulation involving 10 different orders of 9 assessment activities. Moreover, a large sample size was used, which increased statistical power.
Although specific hypotheses were not advanced, given only one previous study in the area, some empirical work supports the notion that exercise order could be a concern. For example, AC participation is anxiety-provoking, even for otherwise well-adjusted candidates, and these anxiety levels correlate negatively with overall AC performance (Fletcher & Kerslake, 1993). Practice in situational exercises has been proposed as a way to prevent undue anxiety and promote more accurate assessment
*Requests for reprints should be addressed to Peter Bycio, Williams College of Business, Xavier University, 3800 Victory Parkway, Cincinnati, Ohio 45207-5163, USA (e-mail: bycio@xu.edu).
(Fletcher & Kerslake, 1993). Thus, in terms of exercise order, candidates might benefit from a calming familiarity effect when they perform sequentially in similar assessments, relative to those who face frequent changes. For example, since assessee performance has been shown to vary with exercise form (individually oriented role-plays vs. group discussions; see Schneider & Schmitt, 1992), the potential impact of this type of change across orders is worthy of investigation.
Many other kinds of fundamental change are typically embedded within exercise orders, which could differentially affect candidate anxiety and performance. These include whether the assessor is a passive observer or an active participant (Zedeck, 1986) and the degree to which the assessment and/or the required candidate response is standardized (Thornton, 1992). One might also wonder whether performance in a given assessment is enhanced when it is preceded by a lunch break. The 10 orders examined in this study allowed for an empirical evaluation of the potential impact of many of these variables.
Method
Subjects
Data from a 1-day AC designed to meet the selection and developmental objectives for the position of manufacturing supervisor were studied. In each run of the AC, a group of 12 candidates was evaluated by five assessors. The sample consisted of ratings of either 1862 or 1914 candidates (depending on the variables involved) who were evaluated during the first 3 years of the programme.
AC activities
There were nine major AC activities, excluding lunch. Five were situational exercises including two group-oriented forms: the Problem Solving Group (PSG) and the Human Relations Group (HRG) and three individually oriented ones: the 24-item In-basket and subsequent Interview (IB-IN), the Role-Play (RP), and the Interview (INTER). See Bycio, Alvares, and Hahn (1987) for further details concerning the exercises.
The four remaining activities included the completion of a Personal Information Questionnaire (QUEST) that was the basis for questions during the INTER. A self-administered (no assessor present) 1-hour Videotape exercise (VIDEO) was used where candidates responded to a questionnaire concerning job-related situations. The responses were available at the consensus meeting, although no performance rating was made. Finally, an AC coordinator administered two paper-and-pencil tests concerning mechanical comprehension (MEC) and numerical (NUM) ability. These results were available at the consensus meeting as well.
Exercise order
Within a given run of the AC, each of the 12 assessees experienced the programme in a different order except for candidates 1 and 7 as well as 4 and 10 for whom the sequence was the same. Potentially important variations with respect to form were present in that some had a series of individually oriented assessments back-to-back. Moreover, some orders had periods free from the direct scrutiny of assessors interspersed between assessor-present activities, whereas others began the day with three consecutive assessor-present exercises. Finally, some candidates completed the paper-and-pencil tests sequentially while others did not. The specifics of the 10 different orders are shown in Table 1.
Assessor ratings
Each of the five assessors saw a given candidate in a different exercise and rated eight managerial abilities along with overall performance in the activity. At a subsequent meeting, assessors

Table 1. Variations in exercise order

                                        Exercise order
        1a      2       3       4a      5       6       7       8       9       10
        PSG     PSG     PSG     PSG     PSG     PSG     PSG     PSG     PSG     PSG
        IB-IN   VIDEO   RP      IB-IN   VIDEO   RP      RP      VIDEO   RP      VIDEO
        QUEST   IB-IN   QUEST   QUEST   IB-IN   QUEST   IB-IN   QUEST   IB-IN   QUEST
        HRG     QUEST   IB-IN   HRG     QUEST   IB-IN   QUEST   IB-IN   QUEST   IB-IN
        LUNCH   LUNCH   LUNCH   LUNCH   LUNCH   LUNCH   LUNCH   LUNCH   LUNCH   LUNCH
        MEC     MEC     INTER   MEC     MEC     INTER   INTER   MEC     INTER   MEC
        NUM     INTER   NUM     NUM     INTER   NUM     NUM     INTER   NUM     INTER
        INTER   NUM     MEC     INTER   NUM     MEC     MEC     NUM     MEC     NUM
        VIDEO   RP      HRG     RP      HRG     VIDEO   HRG     RP      VIDEO   HRG
        RP      HRG     VIDEO   VIDEO   RP      HRG     VIDEO   HRG     HRG     RP

Note. PSG=Problem-Solving Group; IB-IN=In-basket and interview; QUEST=Personal Information Questionnaire; HRG=Human Relations Group; MEC=Mechanical Test; NUM=Numerical Test; INTER=Interview; VIDEO=Videotape; RP=Role-Play. An assessor was present for the PSG, IB-IN, HRG, INTER, and the RP.
a Two of the 12 candidates experienced the programme in this order.

Table 2. Descriptive statistics for the dependent variables

                                Mean (M)    Standard deviation (SD)
Preconsensus data a
  Problem Solving Group          5.57        1.25
  In-basket+Interview            5.34        1.47
  Role-Play                      5.76        1.31
  Human Relations Group          5.61        1.22
  Interview                      5.74        1.25
  Mechanical Test               46.30        9.96
  Numerical Test                18.47        6.93
Consensus ratings b
  Organizing and planning        5.51        1.20
  Analysing                      5.57        1.20
  Decision making                5.60        1.16
  Controlling                    5.49        1.21
  Oral communications            5.87        1.17
  Interpersonal relations        5.76        1.00
  Influencing                    5.45        1.21
  Flexibility                    5.57        1.09
  Overall                        5.60        1.20

a N=1862 for preconsensus data. b N=1914 for consensus ratings.
reached consensus on the ability ratings and overall AC performance. A 9-point scale was used to collect all the ratings. The means (M) of the exercise-specific and consensus ratings ranged from 5.34 to 5.87, and the standard deviations (SD) were between 1.00 and 1.47. Descriptive statistics for the dependent variables are shown in Table 2. Although inter-rater estimates were not collected, a high level of internal consistency reliability was reflected in the correlations among the within-exercise ratings, which averaged .75 (see Bycio et al., 1987).
Statistical analyses
Two one-way MANOVAs were performed using assessment order (10 levels) as the independent variable. The preconsensus data (the exercise-specific overall ratings and the MEC and NUM test scores) were used as dependent variables in the first MANOVA, while the consensus ratings of the eight abilities and of overall AC performance were used in the second. t tests were employed to examine specific facets of variation within the orders.
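The original MANOVAs were, of course, run on the authors' rating data, which are not reproduced here. As a rough illustration of the multivariate test statistic involved, the sketch below computes Wilks' lambda for a one-way design on simulated data; the function name and the simulated dimensions (10 orders, 7 preconsensus measures, 50 cases per order) are illustrative assumptions, not the authors' code or sample.

```python
import numpy as np

def wilks_lambda(X, groups):
    """One-way MANOVA test statistic: Wilks' lambda = det(W) / det(W + B),
    where W is the within-groups SSCP matrix and B the between-groups
    SSCP matrix. Values near 1 indicate little multivariate group effect;
    smaller values indicate a larger effect."""
    X = np.asarray(X, dtype=float)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(groups):
        Xg = X[groups == g]
        centred = Xg - Xg.mean(axis=0)
        W += centred.T @ centred                 # within-group scatter
        diff = Xg.mean(axis=0) - grand_mean
        B += len(Xg) * np.outer(diff, diff)      # between-group scatter
    return np.linalg.det(W) / np.linalg.det(W + B)

# Simulated null data: no true order effect, so lambda should be near 1.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(10), 50)            # 10 orders, 50 cases each
X = rng.normal(size=(500, 7))                    # 7 preconsensus measures
lam = wilks_lambda(X, groups)
```

In practice such an analysis would be run with standard statistical software; the point of the sketch is only to make the logic of the multivariate test concrete.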
Results
Exercise order had a significant multivariate effect on the preconsensus data (F(63,12964)=1.46, p<.01). Univariate analyses revealed that only performance in the RP (F(9,1852)=2.00, p<.04) and the HRG (F(9,1852)=1.97, p<.04) were significantly affected by exercise order. A significant post hoc Bonferroni t test revealed that candidates in Order 6 received an overall RP rating that was a half-point higher than those in Order 2. Further analysis indicated that the RP was the second exercise of the day in Order 6, whereas it was the second-to-last assessment in Order 2. Moreover, unlike
any of the other exercises, the RP was always scheduled either near the beginning or near the end of the day. In fact, when early-day RP performance (in four orders) was compared to the late-day outcome (six orders), the difference was significant (t(1987)=2.74, p<.01), although small in magnitude at .17.
Another indication that the multivariate order effect on the preconsensus data was not especially robust was the lack of significant mean differences involving the HRG. Moreover, when the sample was split based on the overall AC performance rating, the multivariate preconsensus effect was sustained for the below-average performers (F(63,4765)=1.45, p<.01) but not for the above-average ones. For the sample as a whole, exercise order accounted for only about 1% of the variance in the RP and HRG ratings.
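The roughly 1% figure can be checked against the univariate results reported above: for a one-way design, eta-squared is recoverable from the F statistic and its degrees of freedom as eta² = F·df1 / (F·df1 + df2). A quick sketch (the formula is the standard one; the function name is ours, not the paper's):

```python
def eta_squared(F, df1, df2):
    """Recover eta-squared (proportion of variance explained) from a
    one-way ANOVA F statistic and its degrees of freedom."""
    return (F * df1) / (F * df1 + df2)

# Univariate results reported in the text: F(9, 1852) for the RP and HRG.
eta_rp = eta_squared(2.00, 9, 1852)   # Role-Play
eta_hrg = eta_squared(1.97, 9, 1852)  # Human Relations Group
```

Both values come out just under .01, consistent with the approximately 1% of rating variance reported.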
The MANOVA involving the overall consensus judgments was not significant (F(81,17136)=1.16, p<.15). Exercise order accounted for less than 1% of the variance in the overall rating of AC performance.
Although few of the overall effects were significant, we evaluated the possibility of differences within certain targeted subsets of the orders. For example, is performance on a given standardized paper-and-pencil test stronger in orders where it is immediately preceded by a test of a similar type? Similarly, is performance in a given one-on-one exercise (such as the RP) stronger in orders where it follows another one-on-one assessment (such as the INTER) as opposed to a group exercise (the PSG)? Further, is performance in a given assessment stronger in orders where it immediately follows a lunch break as opposed to when it does not? The only significant finding associated with these tests ran contrary to expectation: performance in the one-on-one RP was stronger in orders where it immediately followed the PSG as opposed to another one-on-one assessment, the INTER (t(996)= -2.22, p<.05).
Discussion
This study examined order differences in AC performance using a robust manipulation and a large sample size. Performance in the RP was stronger when the exercise was held in the morning as opposed to the late afternoon. Contrary to expectations, RP performance was also stronger when it followed the group-oriented PSG as opposed to the one-on-one INTER. Finally, all of the effects were small in magnitude and typically failed to hold when the analysis was confined to those who performed well in the AC overall.
Given the number of orders examined, more consistent meaningful performance effects might have been expected. However, not all of the potentially important forms of order variation were examined. For example, in no case did candidates participate in back-to-back group-oriented exercises. Also, although the AC under investigation was centrally developed and monitored, it was implemented at a variety of plants by different programme coordinators and assessors. Thus, although there were no significant order effects within any of the frequently used locations, site variation has the potential to complicate the evaluation of the programme as a whole (e.g. Clingenpeel, 1988).
At this point, it appears reasonable to conclude that order effects, if any, are likely to be small. Further, while there is no evidence that exercise form is an important sequencing variable, programme designers should be sensitive to the possibility of early versus late-day performance differences.
Acknowledgements
We thank Joyce Allen and the anonymous reviewers for their feedback concerning earlier versions of this paper. This research was supported by funding from the D. J. O’Conor Chairship.
References
Bycio, P., Alvares, K. M., & Hahn, J. (1987). Situational specificity in assessment center ratings: A confirmatory factor analysis. Journal of Applied Psychology, 72, 463–474.
Clingenpeel, R. E. (1988, April). A research program on General Motors’ foreman selection assessment center. Symposium conducted at the Third Annual Conference of the Society For Industrial and Organizational Psychology, Dallas, Texas.
Cohen, S. L. (1978). How well standardized is your organization’s assessment center? The Personnel Administrator, 23, 41–51.
Cohen, S. L., & Sands, L. (1978). The effects of order of exercise presentation on assessment center performance: One standardization concern. Personnel Psychology, 31, 35–46.
Fletcher, C., & Kerslake, C. (1993). Candidate anxiety level and assessment center performance.
Journal of Managerial Psychology, 8, 19–23.
Schneider, J. R., & Schmitt, N. (1992). An exercise design approach to understanding assessment center dimension and exercise constructs. Journal of Applied Psychology, 77, 32–41.
Thornton, G. C., III (1992). Assessment centers in human resource management. New York: Addison-Wesley.
Zedeck, S. (1986). A process analysis of the assessment center method. In B. M. Staw & L. L. Cummings (Eds.), Research in organizational behavior (pp. 259–296). Greenwich, CT: JAI Press.
Received 7 September 1999; revised version received 2 February 2001