Assessment
Indirect assessment, on the other hand, uses a test, usually on paper, which often assesses enabling skills.
Direct assessment is effectively limited to speaking, writing and listening in interaction, since you can never see receptive activity directly. Reading can, for example, only be assessed indirectly by requiring learners to demonstrate evidence of understanding by ticking boxes, finishing sentences, answering questions, etc. Linguistic range and control can be assessed either directly through judging the match to criteria or indirectly by interpreting and generalising from the responses to test questions. A classic direct test is an interview; a classic indirect test is a cloze.
Descriptors defining different aspects of competence at different levels in Chapter 5 can be used to develop assessment criteria for direct tests. The parameters in Chapter 4 can inform the selection of themes, texts and test tasks for direct tests of the productive skills and indirect tests of listening and reading. The parameters of Chapter 5 can in addition inform the identification of key linguistic competences to include in an indirect test of language knowledge, and of key pragmatic, sociolinguistic and linguistic competences to focus on in the formulation of test questions for item-based tests of the four skills.
9.3.7 Performance assessment/knowledge assessment
Performance assessment requires the learner to provide a sample of language in speech or writing in a direct test.
Knowledge assessment requires the learner to answer questions which can be of a range of different item types in order to provide evidence of the extent of their linguistic knowledge and control.
Unfortunately one can never test competences directly. All one ever has to go on is a range of performances, from which one seeks to generalise about proficiency. Proficiency can be seen as competence put to use. In this sense, therefore, all tests assess only performance, though one may seek to draw inferences as to the underlying competences from this evidence.
However, an interview requires more of a ‘performance’ than filling gaps in sentences, and gap-filling in turn requires more ‘performance’ than multiple choice. In this sense the word ‘performance’ is being used to mean the production of language. But the word ‘performance’ is used in a more restricted sense in the expression ‘performance tests’. Here the word is taken to mean a relevant performance in a (relatively) authentic and often work or study-related situation. In a slightly looser use of this term ‘performance assessment’, oral assessment procedures could be said to be performance tests in that they generalise about proficiency from performances in a range of discourse styles considered to be relevant to the learning context and needs of the learners. Some tests balance the performance assessment with an assessment of knowledge of the language as a system; others do not.
This distinction is very similar to the one between direct and indirect tests. The Framework can be exploited in a similar way. The Council of Europe specifications for different levels (Waystage, Threshold Level, Vantage Level) offer in addition appropriate detail on target language knowledge in the languages for which they are available.
Common European Framework of Reference for Languages: learning, teaching, assessment
9.3.8 Subjective assessment/objective assessment
Subjective assessment is a judgement by an assessor. What is normally meant by this is the judgement of the quality of a performance.
Objective assessment is assessment in which subjectivity is removed. What is normally meant by this is an indirect test in which the items have only one right answer, e.g. multiple choice.
However the issue of subjectivity/objectivity is considerably more complex.
An indirect test is often described as an ‘objective test’ when the marker consults a definitive key to decide whether to accept or reject an answer and then counts correct responses to give the result. Some test types take this process a stage further by only having one possible answer to each question (e.g. multiple choice, and c-tests, which were developed from cloze for this reason), and machine marking is often adopted to eliminate marker error. In fact the objectivity of tests described as ‘objective’ in this way is somewhat over-stated since someone decided to restrict the assessment to techniques offering more control over the test situation (itself a subjective decision others may disagree with). Someone then wrote the test specification, and someone else may have written the item as an attempt to operationalise a particular point in the specification. Finally, someone selected the item from all the other possible items for this test. Since all those decisions involve an element of subjectivity, such tests are perhaps better described as objectively scored tests.
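The marking procedure described above — consulting a definitive key and counting correct responses — can be sketched in a few lines. This is an illustrative sketch only, not part of the Framework; the item codes and answer key are invented examples:

```python
# An "objectively scored" multiple-choice test: the marker (or machine)
# consults a definitive key and counts the responses that match it.
# Item codes and the key below are invented for illustration.

ANSWER_KEY = {"item1": "b", "item2": "d", "item3": "a", "item4": "c"}

def score_objective_test(responses: dict[str, str]) -> int:
    """Count responses that match the definitive key exactly."""
    return sum(
        1 for item, answer in responses.items()
        if ANSWER_KEY.get(item) == answer
    )

candidate = {"item1": "b", "item2": "a", "item3": "a", "item4": "c"}
print(score_objective_test(candidate))  # 3
```

Note that everything subjective — which items to include, what the key accepts — happens before this mechanical step, which is why such tests are better described as objectively scored.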
In direct performance assessment grades are generally awarded on the basis of a judgement. That means that the decision as to how well the learner performs is made subjectively, taking relevant factors into account and referring to any guidelines or criteria and experience. The advantage of a subjective approach is that language and communication are very complex, do not lend themselves to atomisation and are greater than the sum of their parts. It is very often difficult to establish what exactly a test item is testing. Therefore to target test items on specific aspects of competence or performance is a lot less straightforward than it sounds.
Yet, in order to be fair, all assessment should be as objective as possible. The effects of the personal value judgements involved in subjective decisions about the selection of content and the quality of performance should be reduced as far as possible, particularly where summative assessment is concerned. This is because test results are very often used by third parties to make decisions about the future of the persons who have been assessed.
Subjectivity in assessment can be reduced, and validity and reliability thus increased by taking steps like the following:
•developing a specification for the content of the assessment, for example based upon a framework of reference common to the context involved
•using pooled judgements to select content and/or to rate performances
•adopting standard procedures governing how the assessments should be carried out
•providing definitive marking keys for indirect tests and basing judgements in direct tests on specific defined criteria
•requiring multiple judgements and/or weighting of different factors
•undertaking appropriate training in relation to assessment guidelines
•checking the quality of the assessment (validity, reliability) by analysing assessment data
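The "multiple judgements" step in the list above can be made concrete: each performance is rated independently by several assessors, and the reported result pools those ratings. The following is a minimal sketch under invented assumptions (averaging as the pooling method, a one-band disagreement threshold for flagging a case for review):

```python
# Pooling independent judgements: average the ratings and flag the case
# for review if assessors disagree by more than `tolerance` bands.
# Rater names and the tolerance value are invented for illustration.

from statistics import mean

def pooled_rating(ratings: dict[str, float], tolerance: float = 1.0):
    """Return (pooled mark, needs_review) from independent ratings."""
    values = list(ratings.values())
    spread = max(values) - min(values)
    return round(mean(values), 2), spread > tolerance

result, needs_review = pooled_rating({"rater_A": 4.0, "rater_B": 5.0, "rater_C": 4.5})
print(result, needs_review)  # 4.5 False
```

Real schemes often use more robust pooling (e.g. discarding outliers or a third marking on disagreement); the point is that no single assessor's judgement determines the result.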
As discussed at the beginning of this chapter, the first step towards reducing the subjectivity of judgements made at all stages in the assessment process is to build a common understanding of the construct involved, a common frame of reference. The Framework seeks to offer such a basis for the specification for the content and a source for the development of specific defined criteria for direct tests.
9.3.9 Rating on a scale/rating on a checklist
Rating on a scale: judging that a person is at a particular level or band on a scale made up of a number of such levels or bands.
Rating on a checklist: judging a person in relation to a list of points deemed to be relevant for a particular level or module.
In ‘rating on a scale’ the emphasis is on placing the person rated on a series of bands. The emphasis is vertical: how far up the scale does he/she come? The meaning of the different bands/levels should be made clear by scale descriptors. There may be several scales for different categories, and these may be presented on the same page as a grid or on different pages. There may be a definition for each band/level or for alternate ones, or for the top, bottom and middle.
The alternative is a checklist, on which the emphasis is on showing that relevant ground has been covered, i.e. the emphasis is horizontal: how much of the content of the module has he/she successfully accomplished? The checklist may be presented as a list of points like a questionnaire. It may on the other hand be presented as a wheel, or in some other shape. The response may be Yes/No. The response may be more differentiated, with a series of steps (e.g. 0–4) preferably with steps identified with labels, with definitions explaining how the labels should be interpreted.
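The vertical/horizontal contrast above can be sketched as two small functions: the scale asks how far up the learner comes, the checklist asks how much of the module has been covered. The band cut-offs and checklist items below are invented examples, not Framework content:

```python
# Rating on a scale (vertical) vs. rating on a checklist (horizontal).
# Cut-off scores and checklist points are invented for illustration.

BAND_CUTOFFS = [("A1", 0), ("A2", 20), ("B1", 40), ("B2", 60), ("C1", 80)]

def rate_on_scale(score: float) -> str:
    """Vertical question: how far up the scale does the learner come?"""
    band = BAND_CUTOFFS[0][0]
    for name, cutoff in BAND_CUTOFFS:
        if score >= cutoff:
            band = name
    return band

def rate_on_checklist(checks: dict[str, bool]) -> float:
    """Horizontal question: what proportion of the module is accomplished?"""
    return sum(checks.values()) / len(checks)

print(rate_on_scale(65))  # B2
print(rate_on_checklist({"greets people": True, "orders food": True,
                         "asks directions": False, "books a room": False}))  # 0.5
```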
Because the illustrative descriptors constitute independent, criterion statements which have been calibrated to the levels concerned, they can be used as a source to produce both a checklist for a particular level, as in some versions of the Language Portfolio, and rating scales or grids covering all relevant levels, as presented in Chapter 3, for self-assessment in Table 2 and for examiner assessment in Table 3.
9.3.10 Impression/guided judgement
Impression: fully subjective judgement made on the basis of experience of the learner’s performance in class, without reference to specific criteria in relation to a specific assessment.
Guided judgement: judgement in which individual assessor subjectivity is reduced by complementing impression with conscious assessment in relation to specific criteria.
An ‘impression’ is here used to mean when a teacher or learner rates purely on the basis of their experience of performance in class, homework, etc. Many forms of subjective rating, especially those used in continuous assessment, involve rating an impression on the basis of reflection or memory possibly focused by conscious observation of the person concerned over a period of time. Very many school systems operate on this basis.
The term ‘guided judgement’ is here used to describe the situation in which that impression is guided into a considered judgement through an assessment approach. Such an approach implies (a) an assessment activity with some form of procedure, and/or (b) a set of defined criteria which distinguish between the different scores or grades, and (c) some form of standardisation training. The advantage of the guided approach to judging is that if a common framework of reference for the group of assessors concerned is established in this way, the consistency of judgements can be radically increased. This is especially the case if ‘benchmarks’ are provided in the form of samples of performance and fixed links to other systems. The importance of such guidance is underlined by the fact that research in a number of disciplines has shown repeatedly that with untrained judgements the differences in the severity of the assessors can account for nearly as much of the differences in the assessment of learners as does their actual ability, leaving results almost purely to chance.
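The rater-severity effect mentioned above can be illustrated with a toy calculation: on a set of performances that all assessors double-mark, each assessor's severity can be estimated as their average deviation from the pooled mean, and their raw marks adjusted by that amount. The figures and the simple mean-deviation method below are invented for illustration (operational schemes use more sophisticated models):

```python
# Estimating and correcting rater severity on double-marked performances.
# All data and the adjustment method are invented for illustration.

from statistics import mean

def severity(rater_marks: list[float], pooled_means: list[float]) -> float:
    """Average amount by which this rater marks below (+) or above (-) the pool."""
    return mean(p - r for r, p in zip(rater_marks, pooled_means))

pooled = [4.0, 3.0, 5.0]          # pooled mean mark for three performances
harsh_rater = [3.0, 2.0, 4.0]     # this rater is consistently one band low

adjust = severity(harsh_rater, pooled)
print(adjust)                              # 1.0
print([m + adjust for m in harsh_rater])   # [4.0, 3.0, 5.0]
```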
The scales of descriptors for the common reference levels can be exploited to provide a set of defined criteria as described in (b) above, or to map the standards represented by existing criteria in terms of the common levels. In the future, benchmark samples of performance at different common reference levels may be provided to assist in standardisation training.
9.3.11 Holistic/analytic
Holistic assessment is making a global synthetic judgement. Different aspects are weighted intuitively by the assessor.
Analytic assessment is looking at different aspects separately.
There are two ways in which this distinction can be made: (a) in terms of what is looked for; (b) in terms of how a band, grade or score is arrived at. Systems sometimes combine an analytic approach at one level with a holistic approach at another.
a) What to assess: some approaches assess a global category like ‘speaking’ or ‘interaction’, assigning one score or grade. Others, more analytic, require the assessor to assign separate results to a number of independent aspects of performance. Yet other approaches require the assessor to note a global impression, analyse by different categories and then come to a considered holistic judgement. The advantage of the separate categories of an analytic approach is that they encourage the assessor to observe closely. They provide a metalanguage for negotiation between assessors, and for feedback to learners. The disadvantage is that a wealth of evidence suggests that assessors cannot easily keep the categories separate from a holistic judgement. They also get cognitive overload when presented with more than four or five categories.
b) Calculating the result: some approaches holistically match observed performance to descriptors on a rating scale, whether the scale is holistic (one global scale) or analytic (3–6 categories in a grid). Such approaches involve no arithmetic. Results are reported either as a single number or as a ‘telephone number’ across categories. Other more analytical approaches require giving a certain mark for a number of dif-