Grades are like sausages
(some comments on course grading)

- Andy Ruina, Cornell University

        Grades are like sausages.
            They cease to inspire respect
                   in proportion to
                        how much you know about
                            how they are made.*

What are grades for?

Grades serve various purposes:

Motivation. You study to learn. But playing the get-a-good-grade game might encourage you to study. Teachers try to make grading schemes which reward activities that they think correlate with learning.

Evaluation. You want to know how much you have learned compared to what you might have learned. You also want a label by which others can judge you. So you came to a school that assigns grades. For example, if one professor gives you some flavor of C instead of D or F, other professors interpret that to mean that you know enough to not drag down the classes you take from them, classes that depend on the material for which you just got that C grade. For whatever purposes, people can judge you as knowing little about a subject if you got a D, minimally competent if you got a C, on top of things with a B, and very good at the subject with an A.

Fairness. Something there is that loves a grade (Robert Frost wrote: Something there is that doesn't love a wall). We human beings like to have signs showing who has done more or less well. Grades are a reward and they are a punishment. This aspect of grades is not necessarily related to motivation or evaluation, but rather serves our sense of just rewards.

Tests make people learn. Modern research seems to show that just the act of taking a test on a topic increases long term retention of that topic. Something about the anxiety of trying to nail things down helps cement the ideas.

What's wrong with grades?

Despite the positives above, grading is not all good. Grades make people anxious. Grades cause unpleasant competition. And worst of all, at an institution that is supposedly all about learning, grades can distract students from learning. That's what I (Andy) mostly think. I'd rather give all As and tell everyone "You don't have to stay" and then just teach to people who are in it to learn. I'd still give tests, for the 4th reason above, but no grades.

But the peer pressure against this is too big. So big that I haven't even looked into the possible consequences. And I reluctantly accept that I basically have to give grades. And I do my best to serve the four things above, as seen by students and by others.

Grading schemes

If we wanted to grade based on net knowledge of the material, the best scheme would be to make the whole course grade be based only on a rigorous final exam. But we know that many students would then procrastinate. To maximize motivation we might grade based on homework and attendance, and that's all, or just on daily quizzes. But that doesn't show anything direct about knowledge and competence. And if we wanted to reward steady even performance, we wouldn't have any "drop the lowest". If you try to balance all of these things, and try to fix various things that students perceive as unfair (like getting dinged for having one bad day, for whatever reason), you end up with a formula that is as complicated as the US tax code.

One crazy feature of grading at, say, Cornell is that a number between 0 and 100, possibly given with 3 significant digits, is rounded somehow to give a letter grade with, effectively, less than one significant digit. Then that letter grade is averaged to give another number, the GPA, that is reported with up to 3 `significant' digits. That is, the rounding to make letter grades just adds random variation to the GPAs. On top of that, there is no universal policy for the meaning of an A, B or C. The meaning of a given grade varies from course to course, and from year to year within a course. So, whether you like grades or not, the schemes in force are certainly not designed to maximize accuracy of evaluation.
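
To make the rounding point concrete, here is a minimal sketch in Python. The cutoffs and grade points in SCALE are made up for illustration and are not any real Cornell policy. It compares two imaginary students whose raw scores differ by a tenth of a point in each of three courses.

```python
from statistics import mean

# Hypothetical scale (score floor, letter, grade points); not any real policy.
SCALE = [(90, "A", 4.0), (80, "B", 3.0), (70, "C", 2.0), (60, "D", 1.0), (0, "F", 0.0)]

def letter_and_points(score):
    """Return (letter, grade points) for the first floor the score reaches."""
    for floor, letter, points in SCALE:
        if score >= floor:
            return letter, points
    raise ValueError("score must be non-negative")

def gpa(scores):
    """Average the grade points of the letter grades, as a GPA does."""
    return mean(letter_and_points(s)[1] for s in scores)

# Two imaginary students, a tenth of a point apart in every course.
just_below = [89.9, 79.9, 69.9]
just_above = [90.0, 80.0, 70.0]

print("raw averages:", mean(just_below), mean(just_above))
print("GPAs:", gpa(just_below), gpa(just_above))
```

Under these assumed cutoffs the raw averages differ by about 0.1 while the GPAs come out as 2.0 and 3.0. This is a deliberately extreme case. For most students the rounding errors partly cancel, which is why the effect looks like random variation rather than a steady bias.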

More informative than a letter grade would be to instead report a student's rank in the class. This would also eliminate all of the suffering about grade cutoffs. But reporting class rank seems to suffer too much from the faults of `grading on a curve': it's too dry and too explicitly competitive. And reporting rank also has the problem of calculating the equivalent of a GPA: would a rank of 1 in a class of 3 be counted the same as a rank of 1 in a class of 103? Certainly not. But ranking 1/103 higher than 1/3 wouldn't be fair either, because then geniuses in small classes would have lower GPAs than geniuses who took large classes. So, for a given course, a rank and class size might be meaningful, but they are not too meaningful for comparing classes. Of course this could be fixed by reporting a rank and an error range. At the end, instead of a GPA, there would be a rank and a measure of its accuracy. I guess it is folly to think, however, that some single number describing a college career, however complicated the formula generating it from however complicated a course-grading scheme, will correlate much better with average success in law school, medical school, graduate school, or job performance than any other single number.

Grading on a curve?

What is grading on a curve? Usually in a large course each student's total work for a semester is translated into a single number between, say, 0 and 100. The collection of such numbers for a given class, one number for each student, has some distribution. In some ideal situations, for an infinitely large class, the distribution of grades might lie on a `bell curve' (that is, be a Gaussian distribution). Whether or not the distribution for a given class is close to Gaussian, a professor can calculate an average and standard deviation for the distribution. Then the professor can assign grades according to a rule like this: a mean score is a C, one standard deviation above the mean is a B, etc. Each point on the bell curve is at or above a given grade cutoff. Hence the name 'grading on a curve'. More generally, a professor may look at a ranked list of all student scores and decide on grade cutoffs based on how many A's, B's and C's the professor wants to give. This is also called grading on a curve even though the curve (the distribution) shape is never described with standard functions (Gaussians etc).
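
The mean-and-standard-deviation rule can be sketched in a few lines of Python. The text only pins down `mean is a C' and `one standard deviation above is a B'; the A, D and F lines in Z_CUTOFFS are my assumed extension of that pattern, and the scores are invented.

```python
from statistics import mean, pstdev

# Cutoffs in standard deviations from the class mean. The C and B lines
# follow the rule in the text; the A and D lines are assumptions.
Z_CUTOFFS = [(2.0, "A"), (1.0, "B"), (0.0, "C"), (-1.0, "D")]

def curve_grades(scores):
    """Assign a letter to each score from its distance to the class mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation of this class
    if sigma == 0:
        raise ValueError("all scores are equal, so there is no spread to curve on")
    grades = []
    for s in scores:
        z = (s - mu) / sigma
        # First cutoff that z reaches wins; anything below the last one is an F.
        grades.append(next((letter for cut, letter in Z_CUTOFFS if z >= cut), "F"))
    return grades

scores = [50, 60, 70, 80, 90]  # invented class of five
print(curve_grades(scores))                    # ['F', 'D', 'C', 'C', 'B']
print(curve_grades([s + 10 for s in scores]))  # same letters as above
```

The second call shows what makes this a curve: if everyone in the class scores 10 points higher, the letters do not change, because only each student's position relative to the rest of the class matters.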

What is the alternative? Not grading on a curve would be assigning a grade cutoff based on a total number with no accounting for the number of A's, B's and C's. For example, everyone with a total score over 80 would get at least a B. This might be called `straight' grading or `performance based' grading.
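
As a rough sketch of `straight' grading in Python: the cutoffs are fixed in advance and the number of each letter falls out however it falls out. Only the 80-point B line comes from the example above (treated here as `80 or more'); the other cutoffs and both classes are invented.

```python
from collections import Counter

# Hypothetical fixed cutoffs, chosen without looking at any class distribution.
FIXED_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def straight_grade(score):
    """Return the letter for the first fixed cutoff the score reaches."""
    for floor, letter in FIXED_CUTOFFS:
        if score >= floor:
            return letter
    return "F"

def grade_counts(scores):
    """Count how many of each letter a class ends up with."""
    return Counter(straight_grade(s) for s in scores)

typical_class = [55, 65, 75, 85, 95]  # invented
strong_class = [82, 85, 88, 91, 97]   # invented

print(grade_counts(typical_class))  # one each of F, D, C, B, A
print(grade_counts(strong_class))   # three B's and two A's, nothing lower
```

Nothing in this scheme limits how many A's or B's there are, so one student doing well takes nothing away from another.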

Grading on a curve is generally considered bad. If there are only so many A's to be had in a class, students are thus competing for the A's. One person doing well makes it harder for another to get an A. This is the `cut-throat' world of pre-med courses that people are so repulsed by. It rewards a student, if he could get away with it, for sabotaging another student's work. It gives no reward for students helping each other. Bad. Everyone thinks so.

Performance-based grading is almost impossible. However, great as it sounds, to grade really objectively is nearly impossible. In a performance-based scheme, a professor would have to set the cutoffs before the semester started, setting the exam problems before the course also. Further, the professor would have to do nothing in response to mid-semester scores. He or she couldn't adjust the harshness of grading the problems, and could do no adjustment such as teaching to the test, or giving extra help, or adjusting the grade cutoffs year to year, or giving special bonuses, etc.

Imagine a professor who claims to do merit-based `straight' grading. Now imagine that, for reasons unbeknownst to the professor, suddenly one year, all of the students in his class were of the caliber of the former top third. Would that professor then give nothing but A's from that point forward? Or imagine the converse, that suddenly Cornell becomes a 2nd tier school because of some wild rumor circulating amongst the nation's high-school students. Would that professor then give nothing but C's and D's? I don't think so. By some means or another, most (all?) professors rescale their schemes over time to make the distribution of A's, B's and C's what they think is reasonable.

So, somehow or another, no matter what they say, all professors grade on a curve. There is really no practical way around this.

How to get rid of the cut-throat competitiveness? One way is to notice students helping each other, and give reward for this. Another is, despite the facts, to claim that the course is not graded on a curve. For example, I usually say that `the grade I give to the median student will be higher if I feel that the overall class performance is high'. Another is to try to de-emphasize grades in the overall atmosphere of the class. Despite the fact that students want good grades and the professor has to assign a grade in the end, the professor can make it clear that his/her emphasis is on teaching and learning.

 

What does your grade really really mean, in a deep sense?

It means that a bunch of numbers were assigned to you by arbitrary rules and then cooked into an arbitrary formula to make a number which was rounded and converted to a symbol sequence involving the letters A to F possibly followed by a + or -. It's like the joke, which is also true: "What's the definition of IQ?" Answer: "It's what IQ tests measure." Your grade is just your grade. We try to make it correlate with what others would call your knowledge. But the correlation is not perfect. Some people who know little about the course content will have a high grade because they happened to memorize the right sample problems. And some people who have deep understandings of the material will have a low grade because they were thinking about something else on the final exam day. And a million other things make the correlation between grade and deep knowledge (however you want to define it) imperfect. What does your grade really mean? Your grade really means ... really ... deeply ... your grade really really means ... what the first sentence of this paragraph says.

 

Grade cutoffs

"I'm sooo close!" For a typical course there are about 10 cutoffs in about the 40 point range between 60 and 100. That means that about 25 out of 100 students are within one point of that crucial cutoff for that important grade that they really care about. And about 12 are within a half a point. And about 6 within a quarter of a point. Push everyone up a grade who is within a quarter of a point? Then that makes a new lower effective cutoff, and the same story repeats. Please ponder this if you are close to a cutoff. How do most professors deal with this? In a small class they put the cutoff at the high end of a gap in grades. In a big class they don't announce the cutoff. Or they kind of lie: they announce a cutoff and then push up everyone who is within two points. So the actual cutoff is hidden and no-one thinks they are close to the cutoff. Despite the deception (white lying?), the pressure on faculty to do one of these things is irresistible. Any way you cut it, at Cornell, every semester, there are about 10,000 students within 1 point of a grade cutoff for at least one of their courses, 1000 within a tenth of a point and 100 within one hundredth of a point.

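The arithmetic behind the `I'm sooo close' numbers can be checked with a short Python sketch. It assumes, as a simplification, that scores are spread evenly over the 40 point range and that the 10 cutoffs are evenly spaced, so there is a cutoff every 4 points. The function name and its defaults are mine, for illustration only.

```python
def expected_just_below_cutoff(n_students, margin, n_cutoffs=10, score_range=40.0):
    """Expected number of students within `margin` points below some cutoff.

    Assumes scores spread evenly over `score_range` and evenly spaced cutoffs.
    """
    spacing = score_range / n_cutoffs      # 4 points between neighboring cutoffs
    fraction = min(1.0, margin / spacing)  # share of each 4-point band that is "close"
    return n_students * fraction

for margin in (1.0, 0.5, 0.25):
    print(margin, expected_just_below_cutoff(100, margin))
# 1.0 25.0
# 0.5 12.5
# 0.25 6.25
```

These match the rough counts above of 25, 12 and 6 students per 100. The same function also shows why bumping up the close ones does not help: moving a cutoff leaves the spacing the same, so the same fraction of the class is just below the new line.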
 

Why did we bring up sausage? (See the top of this page)

Some people like to eat sausage but don't like to think about what sausage is. If you do think about it, you can figure it out. You don't really have to go into a sausage factory to know that animals are slaughtered there, that all manner of body parts are ground up and mixed together from, perhaps, all manner of animals.

Same with grades and grading. Even if you have not seen the spy-camera video of the professor and TAs talking while they assign grades, you know that all kinds of numbers have to be manipulated this way and that by mere mortals. You know that someone has to decide grade cutoffs by some more-or-less arbitrary rules. You know that every semester at Cornell about 10,000 students are within one point of the cutoff for a grade. These are unpleasant things about grades which you could figure out without being shown explicitly. But, above, we have made all the crudeness explicit.

*The original quote is "Laws, like sausages, cease to inspire respect in proportion as we know how they are made."
                                             - John Godfrey Saxe (1869)