When scores for the June SAT were released last month, many students were in for a rude surprise. Although their raw scores were higher than on their previous exam(s), their scaled scores were lower, in some cases dramatically so.
An article in The Washington Post recounted the story of Campbell Taylor, who in March scored a 1470—20 points shy of the score he needed to qualify for a scholarship at his top-choice school:
[T]he 17-year-old resolved to take the test again in June and spent the intervening months buried in SAT preparation books and working with tutors. Taylor awoke at 7:30 a.m. Wednesday and checked his latest score online. The results were disappointing: He received a 1400.
He missed one more question overall in June than in March but his score, he said, dropped precipitously. And in the math portion of the exam, he actually missed fewer questions but scored lower: Taylor said he got a 770 in March after missing five math questions but received a 720 in June after missing just three math questions.
A student who contacted me, asking me to call attention to the situation, described something similar:
My personal experience is similar to others': my score dropped by the 90 points that most students are reporting. My June SAT score was a 1390, but with previous scales it should have been a 1480. My score was actually 10 points off from what most colleges that I am planning to apply to are expecting. Another girl I talked to had a June SAT score of 1150, but with the previous scale it should have been a 1240. She was looking to gain more scholarships and aid for the college she was accepted into.
When the student emailed me, she included a breakdown of the number of questions at each difficulty level on the last few exams, and in comparison to the May test, there were notably fewer hard questions on all three sections of the June test (17 vs. 21 for reading; 3 vs. 9 for writing; and 16 vs. 25 for math).
Now obviously, it is impossible to ensure perfect consistency from exam to exam, and an easier test should have a less forgiving scale. If you’re interested in the nitty-gritty of how scales get tinkered with post-exam, Brian McElroy of McElroy Tutoring has a detailed explanation of the process. But I would also argue that to get too caught up in the minutiae of equating exams pre-test, post-test, etc. is really to miss the point here.
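To make the idea of a "less forgiving scale" concrete, here is a toy sketch that converts raw math scores (out of the 58 math questions) to scaled scores using two made-up conversion tables, one for a harder form and one for an easier one. The tables are invented purely for illustration and are not the College Board's actual conversion charts; the only numbers taken from real life are the ones in the Taylor example above, where missing five questions on a harder form produced a 770 while missing only three on an easier form produced a 720.

```python
# Toy illustration of equating: the same raw performance converts to different
# scaled scores depending on how hard the particular test form was.
# These conversion tables are INVENTED for illustration only -- they are not
# the College Board's actual tables.

harder_march_like_form = {58: 800, 57: 790, 56: 780, 55: 780, 54: 770, 53: 770}
easier_june_like_form  = {58: 800, 57: 770, 56: 750, 55: 720, 54: 700, 53: 690}

def scaled_score(conversion_table, questions_missed, total_questions=58):
    """Convert the number of questions missed into a scaled score via a lookup table."""
    raw = total_questions - questions_missed
    return conversion_table[raw]

# Miss 5 questions on a harder form: still a 770.
print(scaled_score(harder_march_like_form, 5))   # 770
# Miss only 3 questions on an easier form: a 720.
print(scaled_score(easier_june_like_form, 3))    # 720
```

The mechanics, not the particular numbers, are the point: when a form contains too few hard questions, equating compresses the top of the scale, and a better raw performance can come out looking worse.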
Yes, there is no way to predict with 100% certainty how a particular group of test-takers will fare on a given exam. But that said, the very fact that the College Board somehow ended up with such strikingly different numbers of hard questions on back-to-back administrations suggests that something is very wrong.
To state the obvious, the number of questions at each difficulty level should remain more or less consistent from test to test; a student who answers more questions correctly on a retake should not see their score drop by these amounts. A swing of 10 or 20 points is fine, 30 maybe, but 50-100 is just too extreme. By definition, a standardized test must be consistent. If it isn't consistent, it isn't standardized. These kinds of wild swings simply did not occur before David Coleman took over, a fact that is even more notable when you consider that the old exam had five levels of difficulty rather than just three. That version of the test may have had its problems, but it was calibrated exceedingly carefully and produced remarkably stable results.
Even if you consider this level of variation acceptable, there seems to be an additional problem. A student who commented on the WaPo article also made the following point, which interestingly was not mentioned in the article itself:
There is also the fact that 4 questions were thrown out by CollegeBoard for this test, 2 in reading and 2 in writing. Throwing out 4 questions (marked “unscoreable”) is unheard of. It reeks of a flawed test that was rushed. CB’s response is that students weren’t penalized for those missing 4 questions, but they were. Why? Because they still had to spend time answering them! And if these questions were so flawed that they had to be thrown out, it is not a stretch to believe students spent an inordinate amount of time trying to answer them.
As I recall, the CB also threw out questions on one of the first administrations of the new exam. At the time, that could be passed off as a normal part of the transition period, but more than two years in, the excuse doesn't hold water.
To understand how this type of scaling inconsistency could happen—particularly when nothing comparable occurred prior to 2016—it is important to realize that although ETS is still playing a role in the administration of the SAT, the exam is now being written directly by the College Board for the first time in its history. That was a major shift, and one that never received anywhere near enough scrutiny.
According to sources I spoke with around the time the redesigned exam was introduced, the most experienced College Board psychometricians were left out of the development process for the new test and replaced by weaker hires from the ACT.
And while there is still an experimental section on the new exam, it is no longer universally administered (at least to the best of my knowledge), and the selection process for new questions does seem to have become notably less rigorous. In the past, questions were field-tested for several years with a variety of demographic groups to ensure scoring consistency, but the current fiasco suggests that things are a lot sloppier now.
If you're a senior already committed to taking the SAT, there is unfortunately little you can do at this point other than remain aware that scoring has the potential to be exceedingly inconsistent and that the published scales may not in fact be accurate. If you can stand to do so, you might want to leave room for one additional test date, in case something unexpected happens when you retake.
It's possible that the College Board will tread more carefully when constructing future tests. But then again, given the inroads the CB has made into the state testing market and in recapturing market share from the ACT, the organization doesn't have much of an incentive to be careful: huge numbers of students will still be required to take the SAT regardless of its scoring irregularities, and students who sign up for the Saturday test can be dismissed as whiners who don't properly appreciate the subtleties of the equating process. If things are working well enough, why bother to fix them? Besides, admitting error is not exactly something the College Board is known for, especially these days.
So if you’re just beginning the test-prep process, I would still strongly recommend taking a hard look at the ACT, which remains a far less risky prospect in terms of scoring consistency. This is particularly true if you are aiming for merit scholarships that have a clear cut-off. If your ability to pay for college is on the line, this is not a chance you should take.
I have found that a good, motivated student who is thoroughly tutored can score a 36 on the ACT, with less difficulty in English but also in reading. The SAT has questions that are sometimes so bizarre, petty, or convoluted that it is almost impossible to get an 800. I try to pass the buck to their guidance department, but if pressed (at all) I say that in my experience, despite the timing issue, the ACT is a clearer, fairer test of the actual hard work put in. I won't say of ability, because I also tell my students that exactly what it measures is unclear, but it is NOT their intelligence. Maybe perseverance, financial resources, attention to detail! I love tutoring, but I wish I felt the way I felt about the "old" (really old) SAT.
I am a tutor (to be fair, I am a teacher with 8 areas of certification and a principal's license). I agree with this assessment. Last year I tutored multiple students to a perfect ACT (and there were only 2,760 total given out). It is easier to get a 36 due to the consistency of content. However, the National Merit test is SAT-based. The best defense is for a student to study the SAT prior to the PSAT, then change to the ACT, then change back to the SAT. It provides the broadest skills coverage. The issue of the thrown-out questions accounts for a significant skew. If they threw out questions and did not readjust the curve, most people would have lost 50-75 points on English. The only way this changes is for a major media outlet to investigate. However, this will not help this year's seniors. Study hard – there are two more ACT tests! You can do it! You are already a top scholar!
I completely agree. First, use Erica’s book! Second, do as many practice tests as possible. The ACT is simply a better test.
Just a note about the August 25 administration of the SAT at Sleepy Hollow in NY. The mom's report: First, a long trip for my student, but the closest testing center. A 45-minute drive; she arrived at 7:25 for an 8:00 test (instructed to be there by 7:45). Students were told the room was too small to accommodate them. The test began at 10:00. The first break was a bathroom break; the second was 5 minutes, no eating allowed, and students had to remain in the room. My student had a stress/hunger headache. She had been up since 6:45 at a minimum. A lot of time and money went into prepping this bright young woman, who really wanted this to be the final verbal score. I can't wrap my head around how/why this happened. A long test made much longer, and an unfair assessment for these students, with not a lot of time left to get this behind them.
Can anyone out there help this tutor? I had an amazingly convoluted and downright stupid conversation with someone at the College Board, but got nowhere. My student (a math whiz) did quite well on the English/writing portion of the December 2018 SAT. However, with 5 wrong, he scored a 31 and was understandably disappointed. Is the raw score conversion data available for tests? How do you get from 31 to 40 with only 5 wrong? Color me confused. When I tried to explain this, I was told about ACT concordance tables; old versus new SATs; score report degree of difficulty (which he stated affected the score, throwing me off completely); score verification; QAS. Finally, information can only be given to a student or parent. Others' experiences?
I am commenting four years later only because I was searching for the date of the June fiasco test. I had one student whose test was totally different from everyone else's and whose scaling was very different. I could never get a straight answer out of the College Board as to why this happened. Is this something the CB does on all tests? When we compared the questions (details, not the actual questions), the types of questions were different.
Unlike most of the previous commenters (and you), I'm not a fan of the ACT. The timing is terrible, and the math is much more difficult (students who haven't had precalculus cannot answer about 20 questions). I have also found that, as with the SAT, the scaling is not totally consistent. Nothing compares to the June 2018 SAT exam, but the ACT isn't any better than the SAT at scaling.