Norm-Referenced vs. Criterion-Referenced Grading: A Key Window into Our Ed Values

One especially useful way to think about standards-based grading – and its various offshoots (outcomes-based learning, competency-based education, proficiency-based learning [as we call it in Vermont]) – is the distinction between criterion-referenced assessments and norm-referenced assessments. It’s also a revealing way to examine what we value about education.

Norm-referenced tests compare a test-taker to other test-takers. In many cases – the SAT, for instance – the tests are actually designed to establish a range of scores for an effective sorting of student achievement. Earlier in my career, working in a traditional grading system, I often found myself creating assessments in this way: asking challenging questions not because I expected all students to answer all of the questions correctly (or intended to insist that students reassess until they did), but because I wanted to see who had paid attention, studied hard, and could score highly. I didn’t expect most students to get certain questions correct; those questions were only on the test to help differentiate top students from others. Obviously I’d have liked all students to do well, but I didn’t expect it. I knew I had a range of student ability levels in my class, and a range of work ethics, and so I expected a good assessment to reflect that reality. So I expected a wide range of scores; anything short of that would have meant that the test was too easy.

Criterion-referenced tests are entirely different. There the focus is on demonstrating skill up to a certain standard; it simply doesn’t matter how one does relative to others. A driver’s test is a good example: the goal is simply to pass, to demonstrate a certain level of competence; beyond that, it doesn’t matter. You either pass, or you don’t. In a sense, it’s “pass or fail.”
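To make the contrast concrete, here is a minimal sketch in Python. The student names, the scores, and the cutoff of 70 are all invented for illustration; they don't come from any real class or test. The same set of scores is read two ways: once against a fixed standard, and once against the other students.

```python
# Hypothetical raw scores (percent correct) for a small class.
scores = {"Ana": 92, "Ben": 85, "Cam": 78, "Dee": 71, "Eli": 64}

PASSING_STANDARD = 70  # assumed cutoff, for illustration only


def criterion_referenced(scores, standard):
    """Judge each student against the fixed standard, ignoring classmates."""
    return {name: score >= standard for name, score in scores.items()}


def norm_referenced(scores):
    """Rank each student against classmates (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Suppose every student improves by 10 points.
improved = {name: score + 10 for name, score in scores.items()}

print(criterion_referenced(scores, PASSING_STANDARD))    # Eli falls short
print(criterion_referenced(improved, PASSING_STANDARD))  # everyone meets the standard
print(norm_referenced(scores) == norm_referenced(improved))  # the ranking is unchanged
```

When the whole class improves, the criterion-referenced reading changes (everyone now meets the standard), while the norm-referenced reading doesn't move at all.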

That said, it seems important to differentiate between a true criterion-referenced test and a norm-referenced test with a minimum score required to “pass.” The traditional grading system did require a certain level of grade in order to pass – usually a 60%. So in a sense, this was the “standard” or criterion required. But in my understanding, such a system is not truly criterion-referenced for the simple reason that the primary focus of the overall system was a comparison of student achievement. There was, for example, no overt focus on learning the exact amount of material required to receive a 60%. In practice, doing so required only some nebulous combination of work completion, class participation, and occasional flourishes of academic competence on real tests or quizzes. There was no carefully delineated standard – in many schools, attaining a minimum passing grade merely requires some variety of showing up, not goofing off too much in class, and just sort of hanging around until they pass you.

The problem, of course, is that one purpose of schools is in fact to rank students. Competition for scarce resources beyond high school (jobs, college placements, scholarships) means that high schools will always have to “rank” students in some fashion. Even under the current proficiency system in Vermont, which does its best to downplay norm-referencing (with many schools employing only a 1-4 grading system – in which the vast majority of students score only between 2 and 4), students and families are still going to be gunning to distance themselves from others by scoring as high as possible on tests and grade point averages.

That said, the benefit of criterion-referenced grading is its clarity around just what it is that students are supposed to be able to know and to do. As I said, under the traditional norm-referenced system, I rarely designed assessments (or units, in fact) with one very clear set of goals and purposes for students, and I rarely saw it as my direct mission to ensure that all students would be able to demonstrate success in attaining these goals or aims. A criterion-referenced system forces us as teachers always to keep our eyes on our targets, and on student achievement relative to those targets.

The contrast, though, runs deeper than just the assessment goals; it extends to our basic educational goals. Are we concerned that students are simply learning certain key material, or are we concerned that all students are learning similar amounts relative to each other?

To use the driver’s license metaphor: Are we concerned that all kids are passing their driver’s test, or are we concerned that all kids are scoring well above the passing mark, at similar levels to each other?

In an old post, the writer Freddie deBoer makes an interesting point about this. His view is that we pretend we’re concerned about criterion-referenced achievement, when actually we’re concerned about norm-referenced achievement. He makes this point by quoting an article from the Times, years back, about charter schools:

“What people really care about is relative performance, but they don’t know it, and lack a vocabulary to really understand what they’re asking for. So consider this passage:

Kristen Lewis, one of the directors of Measure of America, said the data revealed, in essence, two separate public school systems operating in the city. There are some great options for the families best equipped to navigate the application process. But there are not enough good choices for everyone, so every year thousands of children, including some very good students, end up in mediocre high schools, or worse.

“The average kid has to be able to get a good education, because most people are average,” Ms. Lewis said. “It’s great that the highfliers are succeeding, and they deserve the chance to succeed. But so do the average kids.”

“But: what then can a “good education” mean, if it’s combined with the idea that most people are average? What can succeeding mean if we concede that most people are average? What people typically mean by a good education is strong relative performance on academic benchmarks. But the average student can only ever reach an average position on those benchmarks, in the basic, Lake Wobegon sense. If a student climbs above others, those others necessarily move down the rankings.”

So deBoer’s point is that when we say we want all students to “get a good education,” we don’t mean that we want that student to learn a set amount of material to a certain level of proficiency. We actually mean that we want those children to be able to score well relative to other students. For deBoer, that’s a mirage we’re chasing – if “those” students we’re focusing on suddenly move up in the rankings, others move down – and then, what about those students? Don’t they deserve a “good education”?
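The arithmetic behind this point can be shown with a small Python sketch. The names and scores are again invented for illustration. One student’s score rises while everyone else’s stays exactly the same, and we compare the rankings before and after.

```python
def rank(scores):
    """Return each student's position in the class (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical scores; only Dee's score changes between the two snapshots.
before = {"Ana": 90, "Ben": 80, "Cam": 70, "Dee": 60}
after = dict(before, Dee=85)

ranks_before = rank(before)
ranks_after = rank(after)

for name in before:
    print(name, ranks_before[name], "->", ranks_after[name])

# The positions always add up to the same total, so one student's gain
# in rank has to be matched by other students' losses.
print(sum(ranks_before.values()), sum(ranks_after.values()))
```

Dee climbs from fourth to second. Ben and Cam each drop a place, even though they learned exactly as much as they had before.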

Here deBoer goes even further to diagnose the real problem, as he sees it. It’s not that we actually want to (or imagine we can) flatten the curve and simply get all students learning to the exact same level; it’s that we want to know that anyone can rise to the top:

“Rather, the implicit endorsement here and elsewhere seems to be of academic mobility, not of academic equality. Just as people implicitly treat the economic system as more just if any individual can rise out of poverty to get a good job, people think that the system is more just if a poor black kid can rise out of a Bronx public school and go to Yale.”

But for deBoer, that is, again, a mirage to be chasing:

“While I would love it if more poor black kids could rise out of Bronx public schools to Yale, those students would simply be leaving most other kids behind. Why should we value the interests of the risers over the mass of students?”

In deBoer’s view, the inherent tension is not necessarily the old division (familiar to anyone who has read ed philosophy) of excellence versus equality, but something even more specific:

“The fact of the matter is, mobility is necessarily antagonistic to equality. Every student who moves up pushes another one down. These values are in direct tension, and yet no one seems to pause for a moment and really critically evaluate what we’re asking for. If your interest is in promoting equality then you should agitate against mobility, as true mobility will result only in more outliers – both above and below the mean.”

In this sense, it’s not just that we want every student to have a chance at being excellent, but that we want any child, regardless of their origins or circumstances, to have a chance at performing well relative to others. That of course is very different from wanting every student to be able to receive a good education, to learn certain skills and content up to a certain level of proficiency. It’s also very different from wanting every student to identify what he or she is best at or likes most, and then pursue that goal through the educational system to the best of their ability. This is the problem, in a way, with the classic equality-of-opportunity metaphor – that of creating a “level playing field”:

Even if the playing field is level, someone is going to win and someone has to lose.