
Assessment

9.3.6 Direct assessment/indirect assessment

Direct assessment is assessing what the candidate is actually doing: the assessor observes a live performance and matches it against criteria.

Indirect assessment, on the other hand, uses a test, usually on paper, which often assesses enabling skills.

Direct assessment is effectively limited to speaking, writing and listening in interaction, since you can never see receptive activity directly. Reading can, for example, only be assessed indirectly by requiring learners to demonstrate evidence of understanding by ticking boxes, finishing sentences, answering questions, etc. Linguistic range and control can be assessed either directly through judging the match to criteria or indirectly by interpreting and generalising from the responses to test questions. A classic direct test is an interview; a classic indirect test is a cloze.

Descriptors defining different aspects of competence at different levels in Chapter 5 can be used to develop assessment criteria for direct tests. The parameters in Chapter 4 can inform the selection of themes, texts and test tasks for direct tests of the productive skills and indirect tests of listening and reading. The parameters of Chapter 5 can in addition inform the identification of key linguistic competences to include in an indirect test of language knowledge, and of key pragmatic, sociolinguistic and linguistic competences to focus on in the formulation of test questions for item-based tests of the four skills.

9.3.7 Performance assessment/knowledge assessment

Performance assessment requires the learner to provide a sample of language in speech or writing in a direct test.

Knowledge assessment requires the learner to answer questions which can be of a range of different item types in order to provide evidence of the extent of their linguistic knowledge and control.

Unfortunately one can never test competences directly. All one ever has to go on is a range of performances, from which one seeks to generalise about proficiency. Proficiency can be seen as competence put to use. In this sense, therefore, all tests assess only performance, though one may seek to draw inferences as to the underlying competences from this evidence.

However, an interview requires more of a ‘performance’ than filling gaps in sentences, and gap-filling in turn requires more ‘performance’ than multiple choice. In this sense the word ‘performance’ is being used to mean the production of language. But the word ‘performance’ is used in a more restricted sense in the expression ‘performance tests’. Here the word is taken to mean a relevant performance in a (relatively) authentic and often work or study-related situation. In a slightly looser use of this term ‘performance assessment’, oral assessment procedures could be said to be performance tests in that they generalise about proficiency from performances in a range of discourse styles considered to be relevant to the learning context and needs of the learners. Some tests balance the performance assessment with an assessment of knowledge of the language as a system; others do not.

This distinction is very similar to the one between direct and indirect tests. The Framework can be exploited in a similar way. The Council of Europe specifications for different levels (Waystage, Threshold Level, Vantage Level) offer in addition appropriate detail on target language knowledge in the languages for which they are available.


9.3.8 Subjective assessment/objective assessment

Subjective assessment is a judgement by an assessor. What is normally meant by this is the judgement of the quality of a performance.

Objective assessment is assessment in which subjectivity is removed. What is normally meant by this is an indirect test in which the items have only one right answer, e.g. multiple choice.

However, the issue of subjectivity/objectivity is considerably more complex.

An indirect test is often described as an ‘objective test’ when the marker consults a definitive key to decide whether to accept or reject an answer and then counts correct responses to give the result. Some test types take this process a stage further by only having one possible answer to each question (e.g. multiple choice, and c-tests, which were developed from cloze for this reason), and machine marking is often adopted to eliminate marker error. In fact the objectivity of tests described as ‘objective’ in this way is somewhat over-stated since someone decided to restrict the assessment to techniques offering more control over the test situation (itself a subjective decision others may disagree with). Someone then wrote the test specification, and someone else may have written the item as an attempt to operationalise a particular point in the specification. Finally, someone selected the item from all the other possible items for this test. Since all those decisions involve an element of subjectivity, such tests are perhaps better described as objectively scored tests.
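To make the notion of ‘objectively scored’ concrete, here is a minimal Python sketch of the mechanical step the marker (or machine) performs; the function name, key and responses are invented for illustration and come from no real test:

    # Objective scoring reduced to its mechanical core: compare each response
    # with a definitive key and count the matches. Key and responses are
    # invented examples.
    def score_objectively(responses: dict[str, str], key: dict[str, str]) -> int:
        """Count responses that exactly match the definitive marking key."""
        return sum(1 for item, answer in responses.items() if key.get(item) == answer)

    key = {"q1": "b", "q2": "d", "q3": "a"}        # the definitive marking key
    responses = {"q1": "b", "q2": "c", "q3": "a"}  # one candidate's answers
    print(score_objectively(responses, key))       # -> 2

Everything subjective, such as choosing the technique, writing the specification and items and deciding what goes into the key, has already happened before this step runs.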

In direct performance assessment grades are generally awarded on the basis of a judgement. That means that the decision as to how well the learner performs is made subjectively, taking relevant factors into account and referring to any guidelines or criteria and experience. The advantage of a subjective approach is that language and communication are very complex, do not lend themselves to atomisation and are greater than the sum of their parts. It is very often difficult to establish what exactly a test item is testing. Therefore to target test items on specific aspects of competence or performance is a lot less straightforward than it sounds.

Yet, in order to be fair, all assessment should be as objective as possible. The effects of the personal value judgements involved in subjective decisions about the selection of content and the quality of performance should be reduced as far as possible, particularly where summative assessment is concerned. This is because test results are very often used by third parties to make decisions about the future of the persons who have been assessed.

Subjectivity in assessment can be reduced, and validity and reliability thus increased, by taking steps like the following:

• developing a specification for the content of the assessment, for example based upon a framework of reference common to the context involved
• using pooled judgements to select content and/or to rate performances
• adopting standard procedures governing how the assessments should be carried out
• providing definitive marking keys for indirect tests and basing judgements in direct tests on specific defined criteria
• requiring multiple judgements and/or weighting of different factors (a pooling sketch follows this list)
• undertaking appropriate training in relation to assessment guidelines
• checking the quality of the assessment (validity, reliability) by analysing assessment data
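As a minimal sketch of what pooled or multiple judgements can look like operationally, assuming ratings on a numeric band scale; the function, the one-band threshold and the data are illustrative assumptions, not Framework prescriptions:

    # Pooling several assessors' ratings of the same performance: average
    # them, and flag wide disagreement for discussion or re-rating.
    from statistics import mean

    def pool_ratings(ratings: list[float], max_spread: float = 1.0) -> tuple[float, bool]:
        """Return the pooled (mean) rating and a flag set when raters
        disagree by more than max_spread bands."""
        pooled = mean(ratings)
        needs_review = (max(ratings) - min(ratings)) > max_spread
        return pooled, needs_review

    print(pool_ratings([3, 4, 3]))  # -> (3.33..., False)
    print(pool_ratings([2, 5, 3]))  # -> (3.33..., True): refer back to the raters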

As discussed at the beginning of this chapter, the first step towards reducing the subjectivity of judgements made at all stages in the assessment process is to build a common understanding of the construct involved, a common frame of reference. The Framework seeks to offer such a basis for the specification of content and a source for the development of specific defined criteria for direct tests.

9.3.9 Rating on a scale/rating on a checklist

Rating on a scale: judging that a person is at a particular level or band on a scale made up of a number of such levels or bands.

Rating on a checklist: judging a person in relation to a list of points deemed to be relevant for a particular level or module.

In ‘rating on a scale’ the emphasis is on placing the person rated on a series of bands. The emphasis is vertical: how far up the scale does he/she come? The meaning of the different bands/levels should be made clear by scale descriptors. There may be several scales for different categories, and these may be presented on the same page as a grid or on different pages. There may be a definition for each band/level or for alternate ones, or for the top, bottom and middle.

The alternative is a checklist, on which the emphasis is on showing that relevant ground has been covered, i.e. the emphasis is horizontal: how much of the content of the module has he/she successfully accomplished? The checklist may be presented as a list of points like a questionnaire. It may on the other hand be presented as a wheel, or in some other shape. The response may be Yes/No. The response may be more differentiated, with a series of steps (e.g. 0–4) preferably with steps identified with labels, with definitions explaining how the labels should be interpreted.
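The vertical/horizontal contrast can be made concrete as two small data structures, sketched below in Python; the band wordings and checklist points are invented placeholders, not calibrated CEFR descriptors:

    # Rating on a scale: vertical - which band does the performance reach?
    scale = {
        1: "Can produce isolated phrases on the topic.",
        2: "Can sustain a simple exchange on the topic.",
        3: "Can discuss the topic fluently with few errors.",
    }

    def rate_on_scale(band: int) -> str:
        return f"Band {band}: {scale[band]}"

    # Rating on a checklist: horizontal - how much ground has been covered?
    checklist = {
        "Can introduce him/herself": True,
        "Can ask for directions": True,
        "Can describe past events": False,
    }

    def coverage(points: dict[str, bool]) -> float:
        return sum(points.values()) / len(points)

    print(rate_on_scale(2))
    print(f"{coverage(checklist):.0%} of the module's points accomplished")  # -> 67%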

Because the illustrative descriptors constitute independent criterion statements which have been calibrated to the levels concerned, they can be used as a source to produce both a checklist for a particular level, as in some versions of the Language Portfolio, and rating scales or grids covering all relevant levels, as presented in Chapter 3, for self-assessment in Table 2 and for examiner assessment in Table 3.

9.3.10 Impression/guided judgement

Impression: fully subjective judgement made on the basis of experience of the learner’s performance in class, without reference to specific criteria in relation to a specific assessment.

Guided judgement: judgement in which individual assessor subjectivity is reduced by complementing impression with conscious assessment in relation to specific criteria.

An ‘impression’ is here used to mean a rating given purely on the basis of a teacher’s or learner’s experience of performance in class, homework, etc. Many forms of subjective rating, especially those used in continuous assessment, involve rating an impression on the basis of reflection or memory, possibly focused by conscious observation of the person concerned over a period of time. Very many school systems operate on this basis.

The term ‘guided judgement’ is here used to describe the situation in which that impression is guided into a considered judgement through an assessment approach. Such an approach implies (a) an assessment activity with some form of procedure, and/or (b) a set of defined criteria which distinguish between the different scores or grades, and (c) some form of standardisation training. The advantage of the guided approach to judging is that, if a common framework of reference for the group of assessors concerned is established in this way, the consistency of judgements can be radically increased. This is especially the case if ‘benchmarks’ are provided in the form of samples of performance and fixed links to other systems. The importance of such guidance is underlined by the fact that research in a number of disciplines has repeatedly shown that, with untrained judgements, differences in the severity of assessors can account for nearly as much of the variation in learners’ results as does their actual ability, leaving results almost purely to chance.
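To illustrate the severity effect the research describes, here is a deliberately simplified Python sketch that approximates each rater’s severity as the deviation of their mean score from the grand mean, assuming every rater scores every learner; operational standard-setting would use a proper measurement model (e.g. many-facet Rasch analysis), and all numbers below are invented:

    # Fully crossed design: every rater scores learners A, B and C.
    from statistics import mean

    scores = {
        "rater1": [4, 3, 5],
        "rater2": [2, 1, 3],  # systematically harsher
        "rater3": [4, 2, 5],
    }

    grand_mean = mean(s for row in scores.values() for s in row)
    for rater, row in scores.items():
        offset = mean(row) - grand_mean  # negative = severe, positive = lenient
        print(f"{rater}: severity offset {offset:+.2f}")

Standardisation training and benchmark samples aim precisely at shrinking such offsets before live rating.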

The scales of descriptors for the common reference levels can be exploited to provide a set of defined criteria as described in (b) above, or to map the standards represented by existing criteria in terms of the common levels. In the future, benchmark samples of performance at different common reference levels may be provided to assist in standardisation training.

9.3.11 Holistic/analytic

Holistic assessment is making a global synthetic judgement. Different aspects are weighted intuitively by the assessor.

Analytic assessment is looking at different aspects separately.

There are two ways in which this distinction can be made: (a) in terms of what is looked for; (b) in terms of how a band, grade or score is arrived at. Systems sometimes combine an analytic approach at one level with a holistic approach at another.

a) What to assess: some approaches assess a global category like ‘speaking’ or ‘interaction’, assigning one score or grade. Others, more analytic, require the assessor to assign separate results to a number of independent aspects of performance. Yet other approaches require the assessor to note a global impression, analyse by different categories and then come to a considered holistic judgement. The advantage of the separate categories of an analytic approach is that they encourage the assessor to observe closely. They provide a metalanguage for negotiation between assessors, and for feedback to learners. The disadvantage is that a wealth of evidence suggests that assessors cannot easily keep the categories separate from a holistic judgement. They also get cognitive overload when presented with more than four or five categories.

b) Calculating the result: some approaches holistically match observed performance to descriptors on a rating scale, whether the scale is holistic (one global scale) or analytic (3–6 categories in a grid). Such approaches involve no arithmetic. Results are reported either as a single number or as a ‘telephone number’ across categories. Other more analytical approaches require giving a certain mark for a number of different aspects and then totalling them to produce a score.
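A minimal Python sketch of that arithmetic route, with invented categories and weights (the Framework prescribes neither); the no-arithmetic alternative would simply report the four band numbers side by side:

    # Analytic calculation: weight each category mark and total the result.
    marks = {"range": 3, "accuracy": 2, "fluency": 4, "coherence": 3}      # 0-5 each
    weights = {"range": 1.0, "accuracy": 1.5, "fluency": 1.0, "coherence": 1.0}

    score = sum(marks[c] * weights[c] for c in marks)
    maximum = sum(5 * w for w in weights.values())
    print(f"{score}/{maximum} -> {score / maximum:.0%}")  # 13.0/22.5 -> 58%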
