  1. The main characteristics of tests

How do we distinguish a good test from a less good one? What makes a test good? We need to work out some guidelines by reviewing test characteristics – the criteria for evaluating tests.

A very important feature is practicality. A test should not be time-consuming in terms of class hours or our own time outside class. It should also be practical in terms of physical resources such as tape recorders and photocopiers. Finally, its preparation and administration should not demand too much money.

A test must have some degree of reliability, i.e. it must be consistent: under the same conditions and with the same students it should give similar results. There are several ways of making tests more reliable:

  • Administration: administer the same test for the same length of time under the same conditions; provide uniform, non-distracting conditions of administration.

  • Size: the larger the sample (the more tasks learners have to perform), the more likely the test as a whole is to be reliable. Accurate information does not come cheaply: the more important the decisions based on a test, the longer the test should be.

  • Layout and legibility: ensure that tests are well laid out and perfectly legible; they should not be badly typed or photocopied, or cram too much text into too small a space.

  • Instructions: these should be clear, concise and explicit. If it is possible to misinterpret the instructions, some candidates certainly will. It is not always the weakest candidates who are misled by ambiguous instructions; it is often the better candidates, who are able to supply the alternative interpretation. Test writers should not rely on students’ powers of telepathy to elicit the desired behaviour.

  • Familiarity of tasks: candidates should be familiar with both formats and testing techniques.

  • Scoring: choose appropriate criteria beforehand and inform students of them. Scoring should be objective (provide a detailed scoring key); in the case of subjective tests, reliability is achieved through standardization (agreeing on acceptable responses and appropriate scores) and through the training of raters (examiners).

There are various methods for measuring the reliability of a test, most of them statistical, but the simplest is test–retest (provided learners receive equal treatment in the interval).
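As a rough illustration of the test–retest method just mentioned, the two sets of scores can be compared with a Pearson correlation coefficient: the closer it is to 1, the more reliable the test. This is a minimal sketch; the scores are invented for illustration.

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test to the same learners.

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

first_sitting = [62, 75, 80, 55, 90, 68]    # scores on the first administration
second_sitting = [65, 73, 82, 58, 88, 70]   # same learners, same test, later

reliability = pearson(first_sitting, second_sitting)
```

Here the coefficient comes out close to 1, suggesting the test ranks the same learners consistently across the two sittings.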

It is also very important that our assessment has validity: we must be clear about what we want to assess, and we must be assessing that and not something else. For example, if we want to assess listening, we must consider only understanding, not our students’ ability to read or write or to produce accurate language. Likewise, the test item “Is photography an art or a science?” has low validity if we wish to measure only writing ability in a general English class, since it demands some knowledge of photography.

There are several types of validity:

  • Face validity: how acceptable a test is to the public (teachers, students, authorities, etc). The test should look right, be convincing.

  • Content validity: how representative of the learners’ needs and the syllabus content the test items are; how adequately the expected content has been sampled.

  • Concurrent validity: whether the candidates’ performance on this test is comparable with their performance on other tests, with students’ self-assessment, with teachers’ ratings etc.

  • Predictive validity: whether the test predicts how well the test-taker will perform in future, e.g. at final exams.

Reliability and validity are constantly in conflict: the greater a test’s reliability, the less validity it has. For example, writing down the translation equivalents of 500 words is a reliable but not a valid test of writing, while real-life tasks such as letter-writing have higher validity at the expense of reliability. The best approach is often to devise a valid test and then establish ways of increasing its reliability; the tester has to balance gains in one against losses in the other.

The last characteristic is the backwash (washback) effect – the influence of testing on teaching. What and how we test often predetermines what and how we teach, and what and how learners study. The influence can be either positive or negative. There is a tendency to test what is easiest to test rather than what is most important to test, and the weighting of the abilities tested may not correspond to the course objectives. If we claim to teach communicatively, we cannot use tests consisting mainly of multiple-choice grammar items. If we include items based on only one unit instead of the three covered, students will feel cheated. And cramming for half a year before an exam makes our teaching exam-oriented. Sometimes the effect is positive: for example, teachers can no longer ignore letter-writing in class now that it is included in the compulsory external independent evaluation.

Ways of achieving beneficial washback effect:

  • Test the abilities whose development you want to encourage;

  • Sample widely and unpredictably;

  • Use direct testing (authentic tasks, the skills we are interested in fostering);

  • Make testing criterion-referenced. (Scoring of test results may be norm-referenced or criterion-referenced. Norm referencing consists of ranking the students on a list or scale according to the marks they achieved in the test; a pass might be defined as the top 60% of students, with 40% failing. This is often used in public examinations but is not suitable for classroom testing. Criterion referencing consists of deciding what counts as a pass and what counts as a fail before the results are obtained, so each candidate’s performance is judged irrespective of the rest of the candidates.)

  • Base achievement tests on the objectives of the course.

  • Always go over tests and their results with students (students should see where they went wrong and what their strong and weak points are, and can think about what they need to do to get better results next time).

In these ways results from formal tests can feed into learning and give students, as well as the teacher, vital information about both performance and progress.
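The contrast between norm-referenced and criterion-referenced scoring described in the list above can be sketched briefly. The names, scores, 60% pass rate and 50-point pass mark below are all invented for illustration.

```python
# Two ways of deciding pass/fail on the same set of results.
scores = {"Olena": 72, "Ivan": 55, "Maria": 65, "Petro": 39, "Anna": 81}

# Norm-referenced: rank the candidates and pass the top 60%,
# whatever marks they actually achieved.
ranked = sorted(scores, key=scores.get, reverse=True)
cut = round(len(ranked) * 0.6)
norm_pass = set(ranked[:cut])

# Criterion-referenced: the pass mark (50 points) is fixed in advance,
# so each candidate is judged irrespective of the rest.
criterion_pass = {name for name, s in scores.items() if s >= 50}

print(sorted(norm_pass))       # ['Anna', 'Maria', 'Olena']
print(sorted(criterion_pass))  # ['Anna', 'Ivan', 'Maria', 'Olena']
```

Note that Ivan passes under criterion referencing but fails under norm referencing, even though his performance is identical in both cases – which is why norm referencing is unsuitable for classroom testing.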

LECTURE 2

WRITING AND CHOOSING TESTS

  1. Stages of test construction or selection.

  2. What a test consists of. Checklists.

  3. The best-known test-techniques and their analysis.

1. Stages of test construction or selection.

Whenever we start teaching a class we need to plan an assessment task programme, that is, to plan when (in which weeks), what (which skills, which areas of use of English) and for how long we shall have assessment tasks. If possible, inform students of this in advance.

An important step is to decide on the weighting between the different elements of the course. Your assessment should reflect your teaching and the syllabus you follow. This may seem obvious, but it is surprising how often “communicative” classes have tests which are grammar-based. This has a very negative washback effect on students: they quite naturally come to feel that while speaking and listening are good fun, what really matters is grammar.

Having decided on weighting, we need to establish priorities. We cannot test everything that students have done throughout the course. We must therefore look at our syllabus and choose a sample of areas to assess formally. For example, for a class of post-elementary students (4th form): lexical areas: classroom/animals/homes/food/travel; grammar: revision and introduction of present simple/continuous, past simple, present perfect, “going to” future, countables; listening: listening for gist and specific information, stories, dialogues, radio programmes; writing: letters and postcards about their own lives, etc.

Then we are to write specifications (if we are designing a test) or to study the specifications written for the test we want to choose. They should include:

  • the test purpose (what kind of test it is: progress, achievement, proficiency);

  • description of the test takers (young adults, small children, students of the mathematics department, applicants, etc);

  • test level (difficulty);

  • construct (the theoretical framework for the test);

  • description of a suitable language course or textbook;

  • number of sections (papers);

  • time for each section;

  • weighting for each section;

  • target language situation (what students need to perform the tasks for);

  • text types;

  • text length;

  • language skills to be tested;

  • language elements to be tested;

  • test tasks;

  • test methods;

  • rubrics (instructions);

  • criteria for marking;

  • descriptions of typical performance at each level.
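To illustrate how the per-section weightings from a specification combine into an overall mark, here is a minimal sketch; the sections, weights and scores are invented for illustration.

```python
# Per-section weightings from a (hypothetical) test specification.
weights = {"Listening": 0.25, "Reading": 0.25, "Writing": 0.3, "Speaking": 0.2}
assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must cover the whole test

# One candidate's raw percentage score per section (made-up data).
section_scores = {"Listening": 80, "Reading": 70, "Writing": 60, "Speaking": 90}

# The overall mark is the weighted sum of the section scores.
overall = sum(weights[s] * section_scores[s] for s in weights)
print(overall)  # 73.5
```

With these figures a strong Speaking performance counts for less than a weaker Writing one, which is exactly the kind of decision the weighting entry in the specification records.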

The next stage, in the case of test writing, is item writing and moderation (checking the items with colleagues, senior colleagues, etc); in the case of test selection, it is item analysis. Pre-testing, where possible, and analysis of the pretest results are recommended, followed by rejecting bad items and building an item bank.

Then come scoring schemes and their analysis; interpreting test results; standardisation (agreement between raters on the meaning and interpretation of the criteria used for assessment); and setting pass marks.

Finally, the test is improved, monitored and revised.

  2. What a test consists of. Checklists.

Each test may include the following parts.

  • Test handbook (especially for the so-called internationally recognized tests) – a publication for the stakeholders of a test (candidates, teachers) that contains information about the format and content of the test. Format means the test structure, including the time allocated to components, the weighting assigned to each component, the number of passages presented, and the item types (elicitation procedures for oral tests), with examples.

  • Test task – a separate task performed by candidates. It may be of different formats (true/false, multiple choice, short answer response, etc).

  • Item – an individual question in a test that requires the candidate to produce an answer.

  • Rubric – instructions given to candidates to guide their responses to a particular test task.

  • Stem – the stimulus in a multiple-choice task.

  • Options – the answers from which one is to be chosen as the right one.

  • Answer sheet – a special form (blank) which students fill in with the numbers of the selected answers, the chosen options, etc.

When analyzing and selecting a test, or separate test tasks and items, a teacher can use a checklist such as the following.

  • Is there more than one possible answer where the test is closed-ended?

  • Is there no correct answer?

  • Is there enough context provided to choose the correct answer?

  • Could a test-wise student guess the answer without reading or listening to the text?

  • Does it test what it says it is going to test? (or does it test something else?)

  • Does it test the ability to do puzzles, or IQ in general, rather than language?

  • Does it test students’ imagination rather than their linguistic ability?

  • Does it test students’ skills or content knowledge of other academic areas?

  • Does it test general knowledge of the world?

  • Does it test cultural knowledge rather than language?

  • Are the rubrics clear and concise? Is the language of the instructions more difficult than that of the test task, the task text or the individual items?

  • Will it be time-consuming to mark and difficult to work out scores?

  • Are there any typing errors that make it difficult to do?

During your practical classes you will have a chance to analyse real tests taken from published leaflets or books and, using the checklist, find out what is wrong with them.
