I am teaching a course for public school English teachers in Brasilia and one of the topics addressed is assessment. The aim of this part of the course is to improve teachers’ assessment literacy, allowing them to provide informed feedback on the assessment system used in their institution and develop assessment systems and tools that are in keeping with the most current assessment practices.
While going over the ELT assessment literature and discussing topics such as reliability, validity, washback, practicality, formative versus summative assessment, formal versus informal assessment, and alternative (or authentic) versus traditional assessment, we noticed some common misconceptions. The focus of this post, then, is to discuss the issues that seemed problematic to this group of teachers and that may be problematic to many others.
1- Validity and reliability
It is easy to confuse these terms. Validity addresses whether the assessment instrument accurately measures the construct it is intended to measure (e.g. communicative competence) and the content it was designed to assess, in a way that is congruent with how this content is addressed in the classroom. A composition test in a course that doesn’t teach composition skills is not a valid instrument.
I like to use the term alignment rather than validity when referring to classroom assessments. An assessment tool needs to be aligned with the learning objectives established for the course and the instructional strategies used in the classroom. If students practiced verb tenses, for example, by way of fill-in-the-blanks exercises only, the teacher can’t expect them to complete a dialogue with questions, using different verb tenses. The assessment items have to be in keeping with the instructional strategies.
Reliability has to do with the consistency of students’ scores. In a program that uses the same test for all students, a reliable instrument is one on which students would obtain the same result regardless of the teacher, the classroom, or the day and time. If one teacher gives students 30 minutes to take the test, and the one next door gives them one hour, this affects the reliability of the instrument.
Reliability and validity need to be taken very seriously in standardized, high-stakes tests. Classroom assessments also need to be valid and reliable, but in a different way. For example, it would be impossible for classroom performance assessments to reach the level of interrater reliability that an international proficiency exam of speaking or writing requires. However, some steps can be taken to ensure at least a minimum level of reliability, such as developing assessment rubrics and, if possible, training teachers to use the rubrics by way of calibration sessions.
Classroom assessments are usually achievement assessments, so a single instrument, unlike a proficiency test, will probably not encompass all the elements in a communicative competence model. It might not even encompass the four skills. It is important, though, that the assessment system as a whole include the four skills, with perhaps different instruments for each skill or for a combination of two skills.
2- Formative and summative versus formal and informal assessment
It is common for teachers to think that all formative types of assessment are informal and all summative types are formal. Informal assessment is the assessment we conduct every day in our classrooms when we want to confirm whether students have learned what we are teaching. We engage in informal assessment of our students on a daily basis, whether we are aware of it or not. Formative assessment is assessment that is done during the learning process and that is used to promote learning, rather than merely measure it. There is indeed an intersection between informal and formative assessment, but this doesn’t mean that formative assessment can only be informal. A piece of writing done in multiple drafts and assessed by way of rubrics, a project carried out in various stages, or even a graded exercise that students have the chance to retake are examples of formal yet formative types of assessment.
3- Performance and traditional versus formative and summative assessment
Traditional assessments usually contain selected-response items (e.g. multiple choice, matching, true or false) and limited constructed responses. Performance assessments are assessments of student production in real-life situations, such as an oral presentation, a piece of writing, or a project. While it is true that performance assessment lends itself more easily to formative uses, this doesn’t mean that all performance assessments are necessarily formative. For example, if a teacher has students do an oral presentation at the end of the course and grades students’ performance without giving them a chance for improvement, this is a summative type of assessment, though it is not traditional. On the other hand, if the teacher gives a multiple-choice quiz, readdresses the topics that students had difficulty with, and then reassesses them, this is an example of a traditional tool used formatively.
4- Performance assessment and rubrics
While many of the teachers I’m working with use performance types of assessment in their schools, they do not have a clear set of criteria to assess student work, so they assign grades based on their own internal criteria. In other words, they do not use an instrument that specifies what will be assessed and the level of performance expected. This is problematic in terms of reliability, for different teachers can value different aspects of performance, such as accuracy, fluency, or content. The same student performance, thus, can be assessed very differently by different teachers if the same performance descriptors are not used.
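For teachers who would like to see how consistently two colleagues apply the same rubric, here is a minimal sketch in Python. It is only an illustration, not a procedure from the course readings: the scores are made-up rubric bands (1 to 4) that two hypothetical teachers gave to the same ten student performances, and the function names are my own. It computes simple percent agreement and Cohen's kappa, a common statistic that corrects agreement for chance.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of performances that both raters placed in the same band."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same, non-empty set of performances.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same band
    # if each assigned bands independently at their own observed rates.
    expected = sum(counts_a[band] * counts_b[band] for band in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical band throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data: rubric bands given by two teachers to the same ten students.
teacher_a = [3, 2, 4, 3, 1, 2, 3, 4, 2, 3]
teacher_b = [3, 2, 3, 3, 1, 2, 4, 4, 2, 2]

print(f"Percent agreement: {percent_agreement(teacher_a, teacher_b):.2f}")
print(f"Cohen's kappa:     {cohen_kappa(teacher_a, teacher_b):.2f}")
```

With these invented scores, the two teachers agree on 7 of the 10 performances, and kappa comes out at roughly 0.58. Numbers like these could be compared before and after a calibration session to see whether shared performance descriptors are bringing teachers' judgments closer together.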
5- Washback and feedback
Teachers also tend to confuse washback and feedback. Feedback is what we should obtain after we have administered an assessment tool. This feedback can come from students: how they felt about the assessment, whether it was fair, whether it measured the intended content, whether the testing conditions were appropriate, and so on. We can also obtain feedback from the test itself, by way of test analysis. Washback, by contrast, is related to how the assessment affects teaching. This is not something that can be easily seen with a single assessment at a specific time; rather, it is the result of a whole assessment policy. For example, if teachers know that their students will be assessed by way of oral activities, they will be much more likely to focus on oral activities in their classroom than teachers whose students will be assessed only by way of written, multiple-choice tests. If the assessment criteria for these oral activities place much more emphasis on fluency than on accuracy, this will also affect how teachers deal with fluency versus accuracy in their classroom.
These were the issues brought up in our class discussions, based on the assigned readings for the course. Are there other assessment issues that you think are confusing, problematic or controversial?