04 October 2016

The SAT and zombies

The New York Times op-ed columnist Paul Krugman often talks about zombie ideas – ideas that are unsupported by any evidence but that continue to linger on in the mainstream, where they are kept alive by Very Serious People who should really know better, but who collectively choose to bury their heads in the sand because it suits their needs to do so.

As far as the SAT is concerned, I would like to nominate two myths in particular for zombie status:

1) Arcane vocabulary

As I’ve pointed out countless times before (hey, someone has to keep saying it), virtually all of the supposedly obscure vocabulary tested on the old SAT was in fact the type of moderately sophisticated and relatively common language found in the New York Times.

As I’ve also pointed out before, this is a misconception that could be clearly rectified by anyone willing to simply look at a test, but alas, people who hold very strong convictions are wont to reject or ignore any evidence to the contrary. Call this Exhibit A for confirmation bias. 

2) The guessing penalty

To be clear, there is no automatic correlation between guessing and answering questions correctly vs. incorrectly. A student can guess wildly and still get a question right, or answer with absolute certainty and get it wrong. In theory, it was possible to guess on every single question of the old SAT and still receive a perfect score; likewise, a student could conceivably answer every question confidently without getting a single one right.

The quarter-point deduction on the old SAT was designed as a counterbalance to prevent students from receiving scores that did not reflect their knowledge, and to prevent strategic guessers from exploiting the structure of the test to artificially inflate their scores (as can now be done on both the new SAT and the ACT).
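The arithmetic behind that counterbalance can be sketched in a few lines. This is only an illustration, not the College Board's scoring procedure: it assumes five answer choices per question and a quarter-point deduction per wrong answer on the old SAT, and four answer choices with no deduction on the new SAT. The function name `expected_raw_score` and the 50-question section length are hypothetical, chosen just for the example.

```python
from fractions import Fraction

def expected_raw_score(num_questions, num_choices, wrong_penalty):
    """Expected raw score from purely random guessing on every question.

    Assumes exactly one correct choice per question, +1 point for a
    right answer, and -wrong_penalty points for a wrong answer.
    """
    p_right = Fraction(1, num_choices)
    p_wrong = 1 - p_right
    per_question = p_right * 1 - p_wrong * wrong_penalty
    return num_questions * per_question

# Assumed old-SAT setup: 5 choices, quarter-point deduction.
old_sat = expected_raw_score(50, 5, Fraction(1, 4))
# Assumed new-SAT setup: 4 choices, no deduction.
new_sat = expected_raw_score(50, 4, Fraction(0))

print(old_sat)  # 0
print(new_sat)  # 25/2
```

Under these assumptions, a student who guesses blindly on a 50-question section gains nothing on average with the deduction in place, but picks up roughly 12.5 raw points on average without it.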

More than any other ideas, though, at least one of these two seems to make an appearance in virtually every article discussing the SAT, regardless of how valid the other points are.

For example, in a recent discussion of this year’s slight drop in SAT scores (old test), Nick Anderson of The Washington Post states that “The College Board jettisoned much of the old test’s arcane vocabulary questions, dropped the penalty for guessing and made the essay optional” – a sentence that remarkably contains not just one but two SAT words! 

And in her otherwise excellent Reuters article on the College Board’s failure to ensure that SAT math questions conformed to the test specifications, Renee Dudley makes several references to the “obscure” vocabulary on the old test. Just for grins, I went through her article looking closely at the choice of vocabulary and found nearly a dozen “SAT words” (including some real faves like prescient and succinct).

She also alludes to the fact that “The new test contains no penalty for guessing wrong, and the College Board encourages students to answer every question.” 

As I read Anderson and Dudley’s articles, it occurred to me that the inclusion of these zombie ideas has actually become a sort of rhetorical tic, one that anyone writing about the changes to the SAT is effectively obligated to mention. 

Obviously, these references involve two of the biggest changes to the test and can hardly be avoided, but I think that something more than just that is going on here. 

Consider, for example, what isn’t said: although it is sometimes stated that rSAT math problems are intended to have more of a “real world” basis, the fact that geometry has been almost entirely removed from the exam is almost never explicitly mentioned.

In addition, the kind of disparaging language used to describe SAT vocabulary is notably absent when it comes to math. I have yet to encounter any piece of writing in which geometry was dismissed as an “obscure” subject that lacked any relevance to (c’mon, say it with me) “college and career readiness.” Nor does one regularly read articles sympathetic to students who whine that they’ll never actually use the Pythagorean Theorem for anything outside geometry class.

Why? Because depicting a STEM subject – any STEM subject – that way would be taboo, given the current climate. Even if the College Board has decided that geometry isn’t one of the “skills that matter most,” the virtual elimination of that subject from the test is a matter that must be pussyfooted around. 

On a related note, the arcane vs. relevant discussion also plays to fears that students will be insufficiently prepared to compete in the 21st century economy. The goal in emphasizing “relevant” vocabulary is to provide reassurance that the students won’t fall behind; that the College Board can now be trusted to ensure they are prepared for the real world. 

At the same time, this is essentially a rhetorical sleight of hand designed to disparage the humanities without appearing too obviously to do so – a euphemism for people who do not know what euphemisms are because, of course, such words have been deemed irrelevant, and why bother to learn things that aren’t relevant?

The unspoken implication is that acquiring a genuinely rich, adult-level vocabulary is not really an important part of education; that it is possible to be prepared for college-level reading equipped with only middle school-level words; and that it is possible to develop “high level critical thinking skills” without having a commensurate level of vocabulary at one’s disposal. In short, that it is possible to be educated without being educated.

That is of course not possible, but it provides a comforting fantasy.

Call this the respectability politics of anti-intellectualism – a way of elevating ignorance to the level of knowledge by painting knowledge not as something overtly bad but as something merely irrelevant. That is a much subtler and more innocuous-sounding construction, and thus a far more insidious one.

As for the “guessing penalty” myth… This phrase is in part designed to reinforce a narrative of victimization. Its goal is to elicit pity for the poor, under-confident students whose scores did not reflect what they knew because they were just too intimidated to bring themselves to pick (C), even if they were almost sure it was the answer. 

Framing things in terms of guesses rather than wrong answers makes it much easier to evoke sympathy for these students. After all, why should anyone – especially a member of an already oppressed group – be punished for guessing?

The conflation of guessing and punishment also helps perpetuate a central American myth about education, namely that more confidence = higher achievement. By that logic, it is assumed that students (sometimes implicitly but often explicitly understood as female, underrepresented minority, and first-generation/low-income) would perform better if only they knew they wouldn’t lose additional points for taking a risk. If these students felt more confident, so the argument goes, their scores would improve as well.

In reality, however, there is often an inverse relationship between confidence and knowledge: if anything, the most confident students tend to be ones who least understand what they’re up against. (True story: the only student who ever told me he was going to answer every question right was scoring in the high 300s.) Helping these students feel more confident does nothing to increase their knowledge and can actually cause them to overestimate their abilities. In fact, when students begin to acquire more knowledge and obtain a more realistic understanding of where they actually stand, it is common for their confidence to actually decrease.

The really interesting part about the phrase “guessing penalty,” however, is that it can also be understood in another way – one that directly contradicts the way described above.

An alternate, perhaps more charitable, interpretation of this phrase is that students were formerly penalized for guessing too much. Not realizing that they would lose an extra quarter-point for wrong answers, they would try to answer every question, including ones they had no idea how to do, and lose many more points than was necessary.

Understood this way, the term “guessing penalty” refers to the fact that the scoring system made it almost impossible for students to wild-guess their way to a high score. I suspect that this was the original meaning of the term. (As a side note, I can’t help but wonder: when people argued for the elimination of the quarter-point penalty, did they realize that they were actually arguing in favor of making the SAT easier to game?)

According to this view, students who cannot afford tutors or classes to teach them “tricks” about which questions to skip cannot possibly compete with their more privileged peers. Here again, the obvious goal is to frame the issue in terms of equity.

At this point, one might observe a contradiction: students were on one hand described as being so cowed by the thought of losing ¼ of a point that they could not even bring themselves to guess, and yet they were on occasion also presented as being so oblivious to that penalty that they tried to answer every question.

But back to the subject at hand.

Another reason I suspect the socio-economic argument against the “guessing penalty” has so much traction is that it would seem to be backed up by commonsense reality.

While plenty of students managed to figure out the benefits of skipping sans coaching, it is also true that a certain type of student could benefit significantly from some help in that department. Given two students with the same level of foundational knowledge, starting scores, and ability to integrate new information, the one with the tutor would typically be at an advantage. That’s pretty hard to dispute.

Whether this particular type of help is inherently more problematic than other types of help – help that more privileged students will continue to receive, quarter-point penalty or no quarter-point penalty – is, however, subject to debate.

Based on my experience, I would argue that the quarter-point deduction actually made the old SAT a harder test to tutor overall than it would otherwise have been, and far less vulnerable to the kind of simple tricks and strategies that mid-range students can, to some extent, use on both the new SAT and the ACT.

The reality is that teaching students to skip questions on the old SAT was not always such a straightforward process; in some cases, it was a downright nightmare. It was only really effective when students had a good sense of which questions they were likely to answer incorrectly – that is, when the only questions they consistently got wrong were the ones they had difficulty answering. Unfortunately, this was usually only the case for about the top 10-15% of students.

In contrast, trying to help a student who was consistently both confident and wrong figure out which and how many questions to skip was often an exercise in futility. Because such students often didn’t know what they didn’t know, and had a corresponding tendency to overestimate their knowledge, there was no clear correlation between how they perceived themselves to be doing and how they were actually doing. This was most problematic on the reading section, where easy and hard questions were intermingled; there was no way to tell them, for example, to focus on the first twenty questions. 

When students’ knowledge was really spotty, it was difficult to determine whether they should even be encouraged to skip more than a few questions on the entire test because there was absolutely no guarantee they’d get enough of the questions they did answer right to save their score from being a complete disaster. And it was also necessary to be careful when discussing which question types to avoid because if students came across one such question phrased in an unfamiliar way, they might not recognize it as something to avoid. 

As a tutor, I came to loathe those situations because they forced me to treat the test as a cheap guessing game, particularly if the students were short term. Eventually, I stopped tutoring people in that situation altogether because things were so hit-or-miss. Often, their scores did not improve at all, and sometimes they even declined.

In addition, some students flat-out refused to even try skipping, regardless of how much I begged/pleaded with/cajoled them. I had students who repeatedly promised me they would try skipping some questions on their next practice test and then answered every question anyway, every time. I never even managed to figure out how many questions, if any, they should skip, and so I couldn’t advise them.

At the opposite extreme, I had students who knew – knew – that they could skip at most one or two questions suddenly freak out on the real test and skip seven.

The point here is that no matter how much tutoring they had received, and no matter how many thousands of dollars their parents had paid, the kids were the ones who ultimately had to self-assess in the moment and make the decisions about what they likely could and could not answer. Sometimes they stuck to the plan, and sometimes they panicked or got distracted by the kid sitting in front of them tapping his pencil and spontaneously threw out everything we’d discussed. No one could do it for them. And if their assessments were inaccurate and they messed up, their score inevitably took a real hit. The limits of tutoring were exposed in a very blatant way.

One last point:

On top of everything I’ve discussed so far, there is also the issue of which groups of students get compared in discussions about equity. When it comes to test-prep, there is a foundational level below which strategy-based tutoring is largely ineffective. If we’re talking about the most profoundly disadvantaged students, then it’s unlikely the kind of classes or tutoring that are generally blamed for the score gap would bring these students up to anywhere remotely close to the range of their middle-class peers.

Yes, certain individual students might draw considerable benefit, but on the whole, the results would probably be at best minuscule. The amount of intervention needed to truly close the gap would be staggering, and it would have to start long before eleventh grade. But that’s a deep systemic issue that goes far beyond the SAT, and thus it’s easier to simply make superficial changes to the test.

I suspect – although I do not have any hard evidence to back this up – that the effects of tutoring are felt most strongly somewhere in the middle: between, say, the lower-middle class student and the upper-middle class student who attend similarly good schools, take similar classes, and have similar skills and motivation levels – students who stand to benefit more or less equally from tutoring. If the former cannot even afford to take a class while the latter meets with her $150/hr. private tutor twice a week for six months, there’s a pretty good chance the difference will show up in their scores.

This is of course still a problem, but it’s a somewhat different problem than the one that usually gets discussed.

Moreover, the elimination of the wrong-answer penalty will give privileged mid-range students an even larger advantage. Yes, students who do not have access to coaching can now guess randomly without worrying about losing additional points, but students who do have access to coaching can be taught to guess strategically, filling in entire sections with the same letter to guarantee a certain number of points while spending time on the questions they’re most likely to answer correctly.

This is particularly true on the reading section. Because there are fewer question types, and the passages are not divided up over multiple sections, students on the lower end of average who have modest goals can be more easily taught to identify what to spend time on and what to skip than was the case before.

The result is that the achievement gap is unlikely to disappear anytime soon, regardless of the College Board’s machinations.

2 Responses

  1. In general, I agree wholeheartedly with your thoughts about the College Board and standardized testing in general. I, too, am a tutor for the SAT/ACT and, in fact, use and love your books in the course of my work with students.

    On the guessing penalty here, however, there are a couple of points to make.

    First, eliminating the guessing penalty did not change the scaled scores a jot. The scores are “forced” into a Bell curve with the College Board essentially choosing the center point (smoothing out a lot of statistical concepts that are boring and arcane). It is far more likely that the increase in mean scores we have seen moving from the old SAT to the rSAT is a result of a conscious effort on the College Board’s part to gain market share back from the ACT. The old averages linger in the public perception (witness the whole premise of your article), and the new scores therefore seem better, ergo: students focus on the new SAT rather than the upstart (and current champion by market share) ACT.

    Second, the guessing penalty did, in fact, verifiably impact certain segments of the test-taking population more than others, women in particular. Take a listen to the following podcast (or read the transcript) for a description of experiments with men and women in “guessing penalty” environments and “no guessing penalty” environments. http://freakonomics.com/podcast/gender-barriers/ The persistent gap in SAT achievement can be attributed to this phenomenon. For additional support, consider that on the ACT, where there has never been a guessing penalty, the mean composite scores are the same for the genders.

    As a tutor, I was thrilled to discover that the guessing penalty had been eliminated because it unnecessarily muddied the waters in testing strategies without changing how the raw to scaled scores relationship works.

    Thanks again for all of your informative, thoughtful, and thought-provoking blog posts!

    1. admin

      Hi Susan,

      Thank you for your thought-provoking comment. My goal in the post was primarily to focus on the rhetorical uses of the phrases in question, but yes, I am actually aware of the research on gender and guessing. I’m by no means an expert on the subject, but despite the study you cite, my impression is that the resulting effect on test scores is not entirely clear-cut and may in fact be negligible. In addition, there are a variety of interfering factors such as item difficulty and other skills foregrounded by the multiple-choice format that make the precise relationship between format, gender, and score difficult to tease out. (See: http://www.eajournals.org/wp-content/uploads/Gender-Differences-in-Guessing-Tendencies-in-Mathematics1.pdf, https://www.nite.org.il/files/reports/e215.pdf).

      In terms of the SAT, it seems reasonable to assume that in general, if boys are more likely to guess, they are more likely to guess on all questions – including ones that they get wrong. The quarter-point deduction should therefore serve as a counterbalance to what would otherwise be a guessing *bonus* rather than a guessing penalty. The fact that girls were leaving questions blank on the old SAT should have actually helped them.

      If, however, boys were only guessing on the questions they were pretty certain about but leaving blank the ones they truly did not know how to do, whereas girls were leaving both types blank, then yes, the guessing alone would seem to cause the discrepancy; however, knowing what to answer and what to skip requires some pretty sophisticated self-assessment skills that the vast majority of high school students are unlikely to possess. I could be wrong here, but that just doesn’t jibe with what I’ve observed.

      It is of course possible that higher levels of mathematical knowledge help boys recognize which questions they could safely guess on, but again, that’s the sort of meta-cognitive skill you really only tend to see among the highest achievers. I can see it being true for kids in the 650+ range, but not for the testing pool as a whole. And at any rate, that sort of advantage would still be due to higher levels of mathematical understanding, albeit in a less direct way.

      Re: the Freakonomics interview, I would be interested to know the sample size, as well as the composition of the subject pool. Coffman indicates that she’s at Harvard Business School, so it’s reasonable to ask who exactly were the participants in the experiment. If they were indeed Harvard students, or even students at Boston-area colleges (or even just plain old Cambridge residents), then they were extraordinarily un-representative of the national testing pool. Presumably, both male and female participants actually had a higher-than-average capability of answering the questions correctly, so it would be hard to extrapolate the results to a pool comprising subjects of both genders with a much lower, and more uneven, level of mathematical achievement.

