Locally-Designed Direct Assessments

To many in the fields of Education, Rhetoric and Composition or Writing Studies, and psychometrics, writing assessment is a contentious topic. As Behizadeh and Engelhard, as well as Yancey ("Looking Back"), demonstrate, Writing Studies scholars and psychometricians have taken strikingly different approaches to writing assessment, with Writing Studies scholars typically expressing the most concern for the validity of an assessment and psychometricians placing the most emphasis on reliability, especially inter-rater reliability. These different foci have ultimately shaped what we know and what we think we can do with writing assessment: as Neal and Huot show us, the answers we are able to reach are determined by the questions that we ask. Thus, framing problems of writing assessment as inter-rater reliability problems leads researchers to focus solely on improving inter-rater reliability. Alternatively, framing problems of writing assessment as striking divergences between what is taught in the classroom and the practices used to place students into those classrooms or to determine their readiness to exit them enables researchers to consider a number of issues: How can we improve the validity of writing assessments? What relationships exist between reliability and validity? Should we assess authentic student texts instead of using multiple-choice tests on usage and punctuation or timed-essay exams? What types of writing should we be assessing? How many pieces of writing should we be assessing? Will these assessments change from context to context? Though answers to these questions certainly vary, having these conversations is an important part of the assessment process.

 

Many Writing Studies scholars agree that locally-designed assessments are best because they necessarily take the local context into consideration (e.g., Yancey, "Looking Back"; Neal; Huot; Broad; Whithaus; Wardle and Roozen; Hamp-Lyons and Condon; Anson et al.). Teachers can come together to discuss their values and beliefs, their classroom practices and experiences, and student work as a way of developing a set of shared outcomes for a course or program (see, especially, Broad on Dynamic Criteria Mapping). Most importantly, the outcomes and assessment practices should focus on student learning as the primary goal, saving the data-mining process for "the back end" (Neal 84). Assessment practices themselves also need to be evaluated and revised in response to local changes, new understandings of best practices from the field, and data on how the assessment affects multiple populations of students. Inherent in this approach to portfolio assessment is the understanding that Automated Essay Scoring (AES), indirect assessments of writing, holistic scoring of timed essay tests, and standardized high-stakes tests operate with invalid constructs of writing for a number of reasons:

 

1. Some of these assessments are multiple-choice and therefore do not examine actual samples of student writing.
2. The assessments that do examine samples of student writing collect those samples in abnormal contexts. In other words, students do not often write five-paragraph essays within a limited time span to be "read" by computers or highly calibrated readers (see Condon).
3. Many of these tests are, in effect, measurements of students' ability to take a test.
4. High-stakes standardized assessments are often skewed by race and class, favoring white and middle- and upper-class students over other populations (see Condon's introduction to Race and Writing Assessment).


Locally-designed direct assessments of authentic student writing--that is, writing that develops naturally in the classroom over the term--come closer to operating with the "sociocultural/contextualist construct dominant in contemporary writing research" (Dryer 24). For a more focused discussion of classroom assessment, response, and grading, see "Theory of Response and Grading." For an extended discussion of program assessment, see "Theory of Program Assessment."


Glossary

Psychometricians: educational testing specialists who focus on the quantitative measurement of identifiable elements of writing.

 

Validity: though definitions of validity have been changing for the past decade, many Writing Studies scholars define validity in terms of the "accuracy" and "appropriateness" of an assessment. Different types of validity exist, though these are often gathered under the heading of "unified construct validity." See Neal's account of "Validity and Reliability" for more information.

 

Reliability: consistently measuring what you intend to measure (Yancey, "Looking Back" 135). Cherry and Meyer see reliability as "a necessary but not a sufficient condition for validity" (30).

 

Inter-rater Reliability: the consistency with which different readers give the same scores to the same papers (Huot and Neal).

 

Holistic Scoring: a method of assessment focusing on "student writing as a unity without sub-scores or separable aspects" (White 19).
