The SAT will still have an experimental section — but not everyone will be taking it

The Washington Post reported yesterday that the new SAT will in fact continue to include an experimental section. According to James Murphy of the Princeton Review, guest-writing in Valerie Strauss’s column, the change was announced at a meeting for test-center coordinators in Boston on February 4th.

To sum up:

The SAT has traditionally included an extra section — either Reading, Writing, or Math — that is used for research purposes only and is not scored. In the past, every student taking the exam under regular conditions (that is, without extra time) received an exam that included one of these sections. On the new SAT, however, only students taking the test without the optional Essay will receive versions that include experimental multiple-choice questions, and even then only some of them. The College Board has not made clear what percentage of students will take the “experimental” version, nor has it indicated how those students will be selected.

Murphy writes:

In all the public relations the company has done for the new SAT, however, no mention has been made of an experimental section. This omission led test-prep professionals to conclude that the experimental section was dead.

He’s got that right — I certainly assumed the experimental section had been scrapped! And I spend a fair amount of time communicating with people who stay much more in the loop about the College Board’s less publicized wheeling and dealing than I do.

Murphy continues:

The College Board has not been transparent about the inclusion of this section. Even in that one place it mentions experimental questions—the counselors’ guide available for download as a PDF — you need to be familiar with the language of psychometrics to even know that what you’re actually reading is the announcement of experimental questions.

 The SAT will be given in a standard testing room (to students with no testing accommodations) and consist of four components — five if the optional 50-minute Essay is taken — with each component timed separately. The timed portion of the SAT with Essay (excluding breaks) is three hours and 50 minutes. To allow for pretesting, some students taking the SAT with no Essay will take a fifth, 20-minute section. Any section of the SAT may contain both operational and pretest items.

The College Board document defines neither “operational” nor “pretest.” Nor does this paragraph make it clear whether all the experimental questions will appear only on the fifth section, at the start or end of the test, or will be dispersed throughout the exam. During the session, I asked if all the questions on the extra section won’t count and was told they would not. This paragraph is less clear on that issue, since it suggests that experimental (“pretest”) questions can show up on any section.

When The Washington Post asked for clarification on this question, they were sent the counselor’s paragraph, verbatim. Once again, the terminology was not defined and it was not clarified that “pretest” does not mean before the exam, but experimental.

For starters, I was unaware that the term “pretest” could have a second meaning. Even by the College Board’s current standards, that’s pretty brazen (although closer to the norm than not).

Second, I’m not sure how it is possible to have a standardized test with versions of different lengths but a single set of scores. (Although students who took the old test with accommodations did not receive an experimental section, they presumably formed a group small enough not to skew the statistics.) To keep scores as valid as possible, it would seem reasonable to ensure that, at a bare minimum, as many students as possible receive the same version of the test.

As Murphy rightly points out, issues of fatigue and pacing can have a significant effect on students’ scores — a student who takes a longer test will almost certainly become more tired and thus more likely to miss questions that he or she would otherwise have gotten right.

Third, I’m no expert in statistics, but there would seem to be some problems with this method of data collection. Because the old experimental section was given to nearly all test-takers, any information gleaned from it could be assumed to hold true for the general population of test-takers.

The problem now is not simply that only one group of test-takers will be given experimental questions, but that the group given experimental questions and the group not given them may not be comparable.

If you consider that the colleges requiring the Essay are, for the most part, quite selective, and that students tend not to apply to those schools unless they’re somewhere in the ballpark academically, then it stands to reason that the group sitting for the Essay will be, on the whole, a higher-scoring group than the group not sitting for the Essay.

Consequently, the data obtained from the non-Essay group might not apply to test-takers across the board. Say, hypothetically, that test-takers in the Essay group are more likely to answer a certain question correctly than test-takers in the non-Essay group are. Because the only data collected will come from students in the non-Essay group, the number of students answering that question correctly will be lower than it would be if the entire population of test-takers were taken into account.
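To make that hypothetical concrete, here is a minimal simulation of a single experimental question. Every number in it, from the group sizes to the probabilities of answering correctly, is invented purely for illustration; the only assumption doing any work is that the Essay group answers correctly at a higher rate.

```python
import random

random.seed(0)

# Hypothetical group sizes and ability levels. All numbers here are
# invented purely for illustration.
N_ESSAY, N_NO_ESSAY = 60_000, 40_000
P_CORRECT_ESSAY = 0.70     # chance an Essay-group student answers the item correctly
P_CORRECT_NO_ESSAY = 0.55  # chance a non-Essay student answers it correctly

essay_correct = sum(random.random() < P_CORRECT_ESSAY for _ in range(N_ESSAY))
no_essay_correct = sum(random.random() < P_CORRECT_NO_ESSAY for _ in range(N_NO_ESSAY))

# Item difficulty (proportion answering correctly) estimated from everyone...
p_full = (essay_correct + no_essay_correct) / (N_ESSAY + N_NO_ESSAY)
# ...versus estimated from the non-Essay group alone, the only group that
# sees experimental questions under the new arrangement.
p_pretest = no_essay_correct / N_NO_ESSAY

print(f"difficulty estimate, full population: {p_full:.3f}")    # roughly 0.64
print(f"difficulty estimate, non-Essay only:  {p_pretest:.3f}")  # roughly 0.55
```

Under these made-up numbers, the question looks markedly harder when calibrated from the non-Essay group alone than it would if the entire testing population had seen it.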

If the same phenomenon repeats itself for many, or even every, experimental question, and new tests are created based on the data gathered from the two unequal groups, then the entire level of the test will eventually shift down — perhaps further erasing some of the score gap, but also giving a further advantage to the stronger (and likely more privileged) group of students on future tests.

All of this is speculation, of course. It’s possible that the College Board has some way of statistically adjusting for the difference in the two groups (maybe the Hive can help with that!), but even so, you have to wonder… Wouldn’t it just have been better to create a five-part exam and give the same test to everyone? 

Is the College Board playing a rigged game?

A couple of weeks ago, I posted a guest commentary entitled “For What It’s Worth,” detailing the College Board’s attempt not simply to recapture market share from the ACT but to marginalize that company completely. I’m planning to write a fuller response to the post another time; for now, however, I’d like to focus on one point that was lurking beneath the surface of the original article but that I think could stand to be made more explicit. It’s pretty apparent that the College Board is competing very hard to reestablish its traditional dominance in the testing market – that’s been clear for a while now – but what’s less apparent is that it may not be a fair fight.

I want to take a closer look at the three states that the College Board has managed to wrest away from the ACT: Michigan, Illinois, and Colorado. All three of these states had fairly longstanding relationships with the ACT, and the announcements that they would be switching to the SAT came abruptly and caught a lot of people off guard. Unsurprisingly, they’ve also engendered a fair amount of pushback.

So now we come to the end (well… sort of)

Here I was, all set for the SAT to take its final bow when, in a remarkable twist, it was announced that hundreds of testing centers would be closed and the January test postponed until Feb. 20th thanks to the blizzard about to descend on the East Coast.

Given that it was 60 degrees on Christmas Day in New York City and that this is the first real snowfall of the year, I can’t help but find this to be a bizarrely coincidental turn of events. It would seem that the SAT is not about to go quietly.

That notwithstanding, tomorrow is still the last official SAT test date, and thus I feel obligated to post a few words in tribute to an exam that’s had a disproportionately large impact on my life over these last few years. (Full disclosure: I’m also posting this now because I’ve gone to the trouble of writing this post, and if I wait another month, I might get caught up in something and forget to post it.)

I’ll do my best not to get all mushy and sentimental. 

From time to time, various students used to ask me hedgingly whether I loved the SAT. It was a reasonable question. After all, who would spend quite so much time tutoring and writing about a test they didn’t really, really like?

I can’t say, however, that I ever loved the SAT in a conventional sense. The test was something I happened to be good at more or less naturally (well, the verbal portion at least), and tutoring it was something I just happened to fall into. I didn’t start out with any particular agenda or viewpoint about the test; it was simply a necessary hurdle to be dealt with on the path to college, and as I saw it, my job was to make that hurdle as straightforward and painless as possible. To be sure, there were aspects of the tests that were genuinely interesting to discuss, and don’t even get me started on the let’s-use-Harry-Potter-examples-to-define-vocabulary-fests, but as I always told my students, “You don’t have to like it — you just have to take it.”

What I will say, though, is something I’ve heard from many tutors as well as from many students (and their parents), namely that after spending a certain amount of time grappling with the SAT, picking it apart and understanding its strengths as well as its shortcomings, you develop a sort of grudging respect for the test. For a lot of students, the SAT is the first truly challenging academic obstacle they’ve faced — the first test they couldn’t ace just by reading the Sparknotes version or programming their calculator with a bunch of formulas. For the students I tutored long-term, there was almost always a moment when it finally sank in: Oh. This test is actually difficult. I’m going to have to really work if I want to improve. And usually they rose to the challenge. 

But the interesting part is that what started out as no more than a nuisance, another hoop to jump through on the way to college, could sometimes turn into a real educational experience — one that left them noticeably more comfortable reading college-level material, whether or not they got all the way to where they wanted to go. And when they did improve, sometimes to levels beyond what their parents had thought them capable of, their sense of accomplishment was enormous. They had fought for those scores. Perhaps I lack imagination, but I just don’t see students having those types of experiences quite as often with the new test. 

That’s a best-case scenario, of course; I think the worst-case scenarios have been sufficiently rehashed elsewhere to make it unnecessary for me to go into all that here. But regardless of what you happen to think of the SAT, there’s a lot to be said for having the experience of wrestling with something just high enough above your level to be genuinely challenging but just close enough to be within reach. 

This test has also led me down roads I never could have foreseen. While I’ve always been primarily interested in the SAT’s role as a cultural flashpoint, in the way it sits right at the crux of a whole host of social and educational issues, it has also taught me more than I ever could have imagined about what constitutes effective teaching, about how the reading process works, and about the gap between high school and college learning. And I’ve met a lot of (mostly) great people because of it, many of whom have become not only colleagues but also friends. I never thought I’d say this, but I owe the SAT a lot. It wasn’t a perfect test, but considered within the narrow confines of what it could realistically be expected to demonstrate, it did its job pretty well.

So on that note, I’m going to say something that might sound odd: to those of you taking this last test, consider yourselves lucky. Consider yourselves lucky to have been given the opportunity to take a test that holds you to an actual standard; that gives you a snapshot of the type of vocabulary and reading that genuinely reflect what you’ll encounter in college; that isn’t designed to pander to your ego by twisting the numbers until they’re all but meaningless. 

And if you’ve been granted a reprieve for tomorrow, enjoy the snow day and catch up on your sleep. 


What is ETS’ role in the new SAT?

Update #2 (1/27/16): Based on the LinkedIn job notification I received yesterday, it seems that ETS will be responsible for overseeing essay grading on the new SAT. That’s actually a move away from Pearson, which has been grading the essays since 2005.  Not sure what to think of this. Maybe that’s the bone the College Board threw to ETS to compensate for having taken the actual test-writing away. Or maybe they’re just trying to distance themselves from Pearson. 

Update: Hardly had I published this post when I discovered recent information indicating that ETS is still playing a consulting role, along with other organizations/individuals, in the creation of the new SAT. I hope to clarify in further posts. Even so, the information below raises a number of significant questions. 

Original post: 

Thanks to Akil Bello over at Bell Curves for finally getting an answer:

[Screenshot of the Twitter conversation, January 19, 2016]

(In case the image is too small for you to read, the College Board’s Aaron Lemon-Strauss states that “with rSAT we manage all writing/form construction in-house. use some contractors for scale, but it’s all managed here now.” You can also view the original Twitter conversation here.) 

Now, some questions:

What is the nature of the College Board’s contract with ETS? 

Who exactly is writing the actual test questions?

Who are these contractors “used for scale,” and what are their qualifications? What percentage counts as “some”?

What effect will this have on the validity of the redesigned exam? (As I learned from Stanford’s Jim Milgram, one of the original Common Core validation committee members, many of the College Board’s most experienced psychometricians have been replaced.) 

Are the education officials who have mandated the redesigned SAT in Connecticut, Michigan, Colorado, Illinois, and New York City aware that the test is no longer being written by ETS?

Why has this not been reported in the media? I cannot recall a single article, in any outlet, about the rollout of the new test that even alluded to this issue. ETS has been involved in writing the SAT since the 1940s. It is almost impossible to overstate what a radical change this is. 

For what it’s worth (how the College Board stole the state-testing market from the ACT)

For those of you who haven’t been following the College Board’s recent exploits, the company is in the process of staging a massive, national attempt to recapture market share from the ACT. Traditionally, a number of states, primarily in the Midwest and South, have required the ACT for graduation. Over the past several months, however, several states known for their longstanding relationships with the ACT have abruptly – and unexpectedly – announced that they will be dropping the ACT and mandating the redesigned SAT. The following commentary was sent to me by a West Coast educator who has been closely following these developments.  

For What It’s Worth

On December 4, 2015, a 15-member evaluation committee met in Denver, Colorado, to begin the process of awarding a five-year state testing contract to either ACT, Inc. or the College Board. After meeting three more times (December 10, 11, and 18), the evaluation committee awarded the Colorado contract to the College Board on December 21, 2015. The committee’s meetings were not open to the public, and the names of the committee members were not known until about two weeks later.

Once the committee’s decision became public, parents complained that it placed an unfair burden on juniors who had been preparing for the ACT. Over 150 school officials responded by sending a protest letter to Interim Education Commissioner Elliott Asp. The letter emphasized the problem faced by juniors and also noted that Colorado would be abandoning a test for which it had 15 years of data in favor of a new test with no data.

Sleight of hand: an illustration of PSAT score inflation

A couple of posts back, I wrote about a recent Washington Post article in which a tutor named Ned Johnson pointed out that the College Board might be giving students an exaggeratedly rosy picture of their performance on the PSAT by creating two score percentiles: a “user” percentile based on the group of students who actually took the test, and a “national” percentile based on how the student would rank if every 11th (or 10th) grader in the United States took the test — a percentile almost guaranteed to be higher than the user percentile.
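To see why the national figure is all but guaranteed to come out higher, here is a rough sketch with entirely hypothetical score distributions. The one assumption doing the work is that the students who skip the test would, on average, score lower than the students who actually take it.

```python
import random
from bisect import bisect_left

random.seed(0)

# Entirely hypothetical score distributions, invented for illustration only.
takers = [random.gauss(1000, 180) for _ in range(50_000)]     # actual test-takers
non_takers = [random.gauss(850, 180) for _ in range(50_000)]  # imputed non-takers, assumed weaker

def percentile(score, population):
    """Percent of the population scoring below the given score."""
    ranked = sorted(population)
    return 100 * bisect_left(ranked, score) / len(ranked)

score = 1050
print(f"user percentile (actual test-takers):   {percentile(score, takers):.0f}")
print(f"national percentile (all 11th graders): {percentile(score, takers + non_takers):.0f}")
```

Pad the comparison pool with a lower-scoring imagined population, and every actual test-taker’s rank rises; the same score simply looks better against weaker company.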

When I read Johnson’s analysis, I assumed that both percentiles would be listed on the score report. But actually, there’s an additional layer of distortion not mentioned in the article. 

I stumbled on it quite by accident. I’d seen a PDF-form PSAT score report, and although I only recalled seeing one set of percentiles listed, I assumed that the other set must be on the report somewhere and that I simply hadn’t noticed it.

A few days ago, however, a longtime reader of this blog was kind enough to offer me access to her son’s PSAT so that I could see the actual test. Since it hasn’t been released in booklet form, the easiest way to give me access was simply to let me log in to her son’s account (it’s amazing what strangers trust me with!).

When I logged in, I did in fact see the two sets of percentiles, with the national, higher percentile of course listed first.  But then I noticed the “download report” button, and something occurred to me. The earlier PDF report I’d seen absolutely did not present the two sets of percentiles as clearly as the online report did — of that I was positive.

So I downloaded a report, and sure enough, only the national percentiles were listed. The user percentile — the ranking based on the group of students who actually took the test — was completely absent. I looked over every inch of that report, as well as the earlier report I’d seen, and I could not find the user percentile anywhere.

Unfortunately (well, fortunately for him, unfortunately for me), the student in question had scored extremely well, so the discrepancy between the two percentiles was barely noticeable. For a student with a score 200 points lower, the gap would be more pronounced. Nevertheless, I’m posting the two images here (with permission) to illustrate the difference in how the percentiles are reported on the different reports.

[Screenshots: the online score report and the downloaded PDF report, showing the difference in how percentiles are presented]

Somehow I didn’t think the College Board would be quite so brazen in its attempt to mislead students, but apparently I underestimated how dirty they’re willing to play. Giving two percentiles is one thing, but omitting the lower one entirely from the report format that most people will actually pay attention to is really a new low. 

I’ve been hearing tutors comment that they’ve never seen so many students obtain reading scores in the 99th percentile, which apparently extends all the way down to 680/760 for the national percentile, and 700/760 for the user percentile. Well…that’s what happens when a curve is designed to inflate scores. But hey, if it makes students and their parents happy, and boosts market share, that’s all that counts, right? Shareholders must be appeased. 

Incidentally, the “college readiness” benchmark for 11th-grade reading is now set at 390. 390. I confess: I tried to figure out what that corresponds to on the old test, but looking at the concordance chart gave me such a headache that I gave up. (If anyone wants to explain it to me, you’re welcome to do so.) At any rate, it’s still shockingly low — the benchmark on the old test was 550 — as well as a whopping 110 points lower than the math benchmark. There’s also an “approaching readiness” category, which further extends the wiggle room.

A few months back, before any of this had been released, I wrote that the College Board would create a curve to support the desired narrative. If the primary goal was to pave the way for a further set of reforms, then scores would fall; if the primary goal was to recapture market share, then scores would rise. I guess it’s clear now which way they decided to go.