Reuters breaks major story on SAT cheating in Asia

As predicted, the College Board’s decision to bar tutors from the first administration of the new SAT had little effect on the security of the test; questions from the March 5th administration quickly made an appearance on various Chinese websites as well as on College Confidential.

Reuters has now broken a major story detailing the SAT “cartels” that have sprung up in Asia, as well as the College Board’s inconsistent and lackluster response to what is clearly a serious and widespread problem.

It’s a two-part series, and it clearly takes the College Board to task for allowing the breaches.

As SAT was hit by security breaches, College Board went ahead with tests that had leaked

How Asian test-prep companies swiftly exposed the brand-new SAT

The fact that old SATs (particularly those used for international administrations) are regularly recycled has been the College Board’s dirty not-so-little secret for a while now. Apparently, that practice will continue with the new exam.

What even people who are aware that the College Board recycles tests do not typically realize, however, is that the organization does so according to a pattern, and thus tutors/companies in the know can often predict which test will be administered in a given location and prepare their students accordingly.

One way of mitigating the problem would of course be to create single-use tests; however, those would be more expensive to produce and would not solve the problem of test-takers in earlier time zones passing questions and answers to test-takers in later ones. 

In addition, SAT test forms have disappeared from the locked boxes in which they were sent (the problem is described in more detail on this College Confidential thread). The problem also lies with the testing centers and proctors themselves. And since students are reconstructing the tests, this is not something that can be solved by barring tutors from the exam.

Interestingly, this is the first major article I’ve encountered to openly call attention to the stake that the College Board — and colleges themselves — have in allowing the cheating to continue. Because international students almost uniformly pay sticker price, they are a major source of revenue for colleges. They also provide a steady stream of STEM students (although the numbers are higher at the graduate level than at the undergraduate level).

Not coincidentally, China sends more students to the United States than any other country; indeed, the number of Chinese students has nearly doubled from about 160,000 in 2010-11 to around 300,000 in 2014-15.

You can also read the College Board’s response, according to which the leaks are merely the fault of a handful of “bad actors.”

Tellingly, there is not a single mention of the recycled tests. 

Also tellingly, David Coleman failed to comment.

An analysis of problems with PSAT scores, courtesy of Compass Education

Apparently I’m not the only one who has noticed something very odd about PSAT score reports. California-based Compass Education has produced a report analyzing some of the inconsistencies in this year’s scores.

The report raises more questions than it answers, but the findings themselves are very interesting. For anyone who has the time and the inclination, it’s well worth reading.

Some of the highlights include:

  • Test-takers are compared to students who didn’t even take the test and may never take the test.
  • In calculating percentiles, the College Board relied on an undisclosed sampling method when it could have relied on scores from students who actually took the exam.
  • 3% of students scored in the 99th percentile (see the sketch below).
  • In some parts of the scale, scores were raised by as much as 10 percentage points between 2014 and 2015.
  • More sophomores than juniors obtained top scores.
  • Reading/writing benchmarks for both sophomores and juniors have been lowered by over 100 points; at the same time, the elimination of the wrong-answer penalty would permit a student to approach the benchmark while guessing randomly on every single question.
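
To see why that 99th-percentile figure is so odd, recall what a percentile is supposed to mean: a score at the 99th percentile beats the scores of 99% of the comparison group, so if percentiles were keyed to actual test-takers, only about 1% of them could land there. Here is a minimal sketch with made-up scores, purely to illustrate the definition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scores for 100,000 test-takers (illustrative only).
scores = rng.normal(loc=1000, scale=200, size=100_000)

# The 99th-percentile cutoff, computed from the test-takers themselves.
cutoff = np.percentile(scores, 99)

# By construction, only ~1% of the same group can sit at or above that cutoff.
# (Rounding and tied scores can nudge this slightly, but nowhere near 3%.)
share_at_top = (scores >= cutoff).mean()
print(f"Share at or above the 99th percentile: {share_at_top:.1%}")  # ~1.0%
```

The only way 3% of test-takers can “score in the 99th percentile” is if the percentiles are pegged to some other group, which is exactly what the first two bullets describe.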

Race to the bottom

Following the first administration of the new SAT, the College Board released a highly unscientific survey comparing 8,089 March 2016 test-takers to 6,494 March 2015 test-takers.

You can read the whole thing here, but in case you don’t care to, here are some highlights:

  • 75% of students said the Reading Test was the same as or easier than they expected.
  • 80% of students said the vocabulary on the test would be useful to them later in life, compared with 55% in March 2015.
  • 59% of students said the Math section tests the skills and knowledge needed for success in college and career.

Leaving aside the absence of some basic pieces of background information that would allow a reader to evaluate just how seriously to take this report (why were different numbers of test-takers surveyed in 2015 vs. 2016? who exactly were these students? how were they chosen for the survey? what were their socio-economic backgrounds? what sorts of high schools did they attend, and what sorts of classes did they take? what sorts of colleges did they intend to apply to? were the two groups demographically comparable? etc., etc.), this is quite a remarkable set of statements.

Think about it: the College Board is essentially bragging — bragging — about how much easier the new SAT is.

Had a survey like this appeared even a decade ago, it most likely would have been in The Onion. In 2016, however, the line between reality and satire is considerably more porous.

To state the obvious, most high school juniors have not ever taken an actual college class (that is, a class at a selective four-year college), and it is exceedingly unlikely that any of them have ever held a full-time, white collar job. They have no real way of knowing what skills — vocabulary, math, or otherwise — will actually be relevant to their futures.

Given that exceedingly basic reality, the fact that the College Board is touting the survey as being in any way indicative of the test’s value is simultaneously hilarious, pathetic, and absurd.

So, a few things.

First, I’ve said this before, but I’ll reiterate it here: the assertion that the SAT is now “more aligned with what students are learning in school” overlooks the fact that the entire purpose of the test has been altered. The SAT was always intended to be a “predictive” test, one that reflected the skills students would need in college. Unlike the ACT, it was never intended to be aligned with a high school curriculum in the first place.

Given the very significant gap between the skills required to be successful in the average American high school and the skills necessary to be successful at a selective, four-year college or university, there is a valid argument to be made for an admissions test aligned with the latter. But regardless of what one happens to think about the alignment issue, to ignore it is to sidestep what should be a major component of the conversation surrounding the SAT redesign.

Second, the College Board vs. ACT, Inc. competition illustrates the problem of applying the logic of the marketplace to education.

In order to lure customers from a competitor, a company must of course aim to provide those customers with an improved, more pleasurable experience. That principle works very well for a company that manufactures, say, cars, or electronics.

If your customers are students and your product is a test, however, then the principle becomes a bit more problematic.

The goal then becomes to provide students with a test that they will like. (Indeed, if I recall correctly, when the College Board first announced the redesign, the new test was promoted as offering an improved test-taking experience.)

What sort of test is that? 

A simpler test, of course.

A test that inflates scores, or at least percentile rankings.

A more gameable test: one on which it is technically possible to obtain a higher score by filling in the same letter for every single question than by answering any of the questions for real.
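
The arithmetic behind that claim is simple. On the old SAT, each multiple-choice question had five answer choices and a wrong answer cost a quarter of a point, so blind guessing had an expected value of zero; on the new SAT, with four choices and no deduction, bubbling the same letter on every question earns roughly a quarter of the available raw points. A back-of-the-envelope sketch (the question counts are illustrative, not official):

```python
def expected_guess_score(n_questions, n_choices, wrong_penalty):
    """Expected raw points from guessing blindly on every question."""
    p_right = 1 / n_choices
    return n_questions * (p_right * 1 + (1 - p_right) * -wrong_penalty)

# Old rules: 5 choices, 1/4-point deduction for each wrong answer.
old = expected_guess_score(n_questions=67, n_choices=5, wrong_penalty=0.25)

# New rules: 4 choices, no deduction (96 is an illustrative count for the
# combined Reading and Writing sections).
new = expected_guess_score(n_questions=96, n_choices=4, wrong_penalty=0.0)

print(f"Old rules: {old:+.1f} raw points")  # ~0
print(f"New rules: {new:+.1f} raw points")  # ~24
```

Under the old rules, a column of identical bubbles and a student who answered in earnest but got everything wrong both bottomed out around zero; under the new rules, the identical-bubble strategy walks away with a meaningfully positive raw score.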

A test that makes students feel good about themselves, while strategically avoiding anything that might directly expose gaps in their basic knowledge — gaps that their parents probably don’t know their children possess and whose existence they would most likely be astounded to discover. (Trust me; I’ve seen the looks on their faces.)

Most of the passages on the English portion of the ACT are written around a middle school level, as are the Writing passages on the new SAT. Unlike the ACT, which assigns separate scores to the English and Reading portions, the new SAT takes things a step further and combines Reading and Writing portions into a single Verbal score. As a result, the SAT allows students reading below grade level to hide their weaknesses much more effectively.

Indeed, I’d estimate that most of my ACT students, many of whom switched from the SAT because the reading was simply too difficult, were reading at somewhere between a seventh- and a ninth-grade level. Those students are pretty obviously the ones the College Board had in mind when it redesigned the verbal portion.

Forgive me for sounding like an old fogey from the dark ages of 1999 here, but should a college admissions test really be pandering to these types of students? (Sandra Stotsky, one of two members of the Common Core validation committee to reject the standards, has suggested that the high school Common Core standards be applied to middle school students as a benchmark for judging whether they are ready for high school.)

And for colleges, do the benefits of collapsing the distinction between solid-but-not-spectacular readers and the exceptional readers truly outweigh the drawbacks? Those sorts of differences are not always captured by grades; that is exactly what has traditionally made the SAT useful. 

Obviously, the achievement gap is the omnipresent elephant in the room. Part of the problem, however, is that the college admissions system poses such vastly different challenges for different types of students; there’s no way for a single test to meet everyone’s needs.

I’m not denying that for students aiming for elite colleges, the college admissions process can easily spiral out of control. I’ve stood on the front lines of it for a while now, and I’ve seen the havoc it can wreak — although much of the time, that havoc also stems from unrealistic expectations, some of which are driven by rampant grade inflation. An 1100 (1550) SAT was much easier to reconcile with B’s and an occasional C than with straight A’s. 

A big part of the stress, however, is simply a numbers game: there are too many applicants for too few slots at too few highly desirable schools. Changing the test won’t alter that fact. 

If anything, a test that produces more high-scoring applicants will ultimately increase stress levels because yet more students will apply to the most selective colleges, which will in turn rely more heavily on intangible factors. Consequently, their decisions are likely to become even more opaque. 

At the other extreme, the students at the bottom may in fact be lacking basic academic vocabulary such as “analyze” and “synthesize,” in which case it does seem borderline sadistic to test them on words like “redolent” and “obstreperous.”  It’s pretty safe, however, to assume that students in that category will generally not be applying to the most selective colleges. But in changing the SAT so that the bottom students are more likely to do passably well on it, the needs of the top end up getting seriously short shrift. No one would argue that words like “analyze” aren’t relevant to students applying to the Ivy League; the problem is that those students also need to know words like “esoteric” and “jargon” and “euphemism” and “predicated.”

The easiest way to reduce the gap between these two very disparate groups is of course to adjust the test downward to a lower common denominator while inflating scores. But does anyone seriously think that is a good solution? Lopping off the most challenging part of the test, at least on the verbal side, will not actually improve the skills of the students at the bottom. It also fails to expose the students at the top to the kind of reading they will be expected to do. And even if the formerly ubiquitous flashcards disappear and stress levels temporarily dip, the underlying issues will remain, and in one guise or another they will inevitably resurface.

I’m not naive enough to think that the SAT redesign will have an earth-shattering effect on most high school students. The students who have great vocabularies and read non-stop for pleasure won’t suddenly stop doing so because a handful of hard words are no longer directly tested on the SAT. The middling ones who were going to forget all of those flashcards they tried to memorize will come out pretty much the same in the end. The ones who never intended to take the test will sit through it in school because they have no choice, but I know of no research to suggest that they are more likely to complete a four-year degree as a result. Plenty of students whose parents initially thought Khan Academy could replace Princeton Review will discover that their children need some hand-holding after all and sign them up for a class — especially if all of their friends suddenly seem to be scoring above the 95th percentile. Not to mention the thousands of kids who will ignore the redesign altogether and take the ACT, just as they intended to do in the first place.

Rather, my real concern is about the message that the College Board is sending. Launching a smear campaign to rebrand the type of moderately challenging vocabulary that peppers serious adult writing as “obscure” might have been necessary to win back market share, but it was a cheap and irresponsible move. It promotes the view that a sophisticated vocabulary is something to be sneered at; that simple, everyday words are the only ones worth knowing. Even if that belief is rampant in the culture at large, shouldn’t an organization like the College Board have some obligation to rise above it? It suggests that knowledge acquired through memorization is inherently devoid of value. It misrepresents the type of reading and thinking that college-level work actually involves. It exploits the crassest type of American anti-intellectualism by smarmily wrapping it in a feel-good blanket of social justice. And it promotes the illusion that students can grapple with adult ideas while lacking the vocabulary to either fully comprehend them or to articulate cogent responses of their own. 

What is even more worrisome to me, however, is that the College Board’s assertions about the new test have largely been taken at face value. Virtually no one seems to have bothered to look at an actual recent SAT, or interviewed people who actually teach undergraduates (as opposed to administrators or admissions officers), or even stopped to consider whether the evidence actually supports the claims —  that whole “critical thinking” thing everyone claims to be so fond of. 

And that is a problem that goes far, far beyond the SAT. 

The College Board is kicking tutors out of the March SAT

This just in: earlier today I met with a tutor colleague who told me that the College Board had sent emails to at least 10 of his New York-area colleagues who were registered for the first administration of the new SAT, informing them that their registration for the March 5th exam had been transferred to the May exam. Not coincidentally, the May test will be released, whereas the March one will not.

Another tutor had his testing location moved to, get this… Miami. 

I also heard from a tutor in North Carolina whose registration was transferred to May for “security measures.” Apparently this is a national phenomenon. Incidentally, the email she received gave her no information about why her registration had been cancelled for the March test. She had to call the College Board and wait 45 minutes on hold to get even a semi-straight answer from a representative. Customer service, like releasing test scores on time, is not exactly the College Board’s strong suit.

I can’t imagine that the College Board is doing this out of concern that too many tutors taking the test will throw the curve. Proportionally, their number would be so small as to have a statistically negligible effect.

If the College Board sincerely thinks that they’re promoting equity by delaying tutors’ access to the real test for a couple of months, however, they clearly haven’t spent time on College Confidential. Some of the more avid posters will presumably reconstruct the entire test in a matter of hours, then pass the information along to various tutors who manage to stay in the loop (who will in turn pass it along to other tutors, and so on). Not to mention younger tutors, whose registration is less likely to be questioned. Surely the College Board can’t cancel the registration of every single potential test-taker over the age of 17!

At any rate, an even smaller and more select group will end up having access to the first test than would have access to it if the College Board just allowed all the tutors to take the test — a test that, presumably, will also be recycled as an international or makeup test at some point, perhaps even later this year. (I believe that when the test changed in 2005, again in March, the College Board did release that exam.) 

Besides, I don’t think it’s too much of a stretch to assume that pirated copies of the March 5th test will be showing up across Asia by, say, March 6th. Even if it’s not the same test given in the U.S., there’s a pretty good chance it will also be recycled somewhere at some point.

For a company so enamored of technology, the College Board can be astoundingly naive about its ability to impose test security in the Internet age. (Or perhaps it’s that they’re being deliberately naive, but that is a subject for another post.) 

At any rate, this kind of puts another dent in the College Board’s whole “transparency” thing… 

The SAT will still have an experimental section — but not everyone will be taking it

The Washington Post reported yesterday that the new SAT will in fact continue to include an experimental section. According to James Murphy of the Princeton Review, guest-writing in Valerie Strauss’s column, the change was announced at a meeting for test-center coordinators in Boston on February 4th.

To sum up:

The SAT has traditionally included an extra section — either Reading, Writing, or Math — that is used for research purposes only and is not scored. In the past, every student taking the exam under regular conditions (that is, without extra time) received an exam that included one of these sections. On the new SAT, however, only students taking the test without the optional Essay will be given versions of the test that include experimental multiple-choice questions, and then only some of those students. The College Board has not made it clear what percentage will take the “experimental” version, nor has it indicated how those students will be selected.

Murphy writes:

In all the public relations the company has done for the new SAT, however, no mention has been made of an experimental section. This omission led test-prep professionals to conclude that the experimental section was dead.

He’s got that right — I certainly assumed the experimental section had been scrapped! And I spend a fair amount of time communicating with people who stay much more in the loop about the College Board’s less publicized wheelings and dealings than I do.

Murphy continues:

The College Board has not been transparent about the inclusion of this section. Even in that one place it mentions experimental questions—the counselors’ guide available for download as a PDF — you need to be familiar with the language of psychometrics to even know that what you’re actually reading is the announcement of experimental questions.

 The SAT will be given in a standard testing room (to students with no testing accommodations) and consist of four components — five if the optional 50-minute Essay is taken — with each component timed separately. The timed portion of the SAT with Essay (excluding breaks) is three hours and 50 minutes. To allow for pretesting, some students taking the SAT with no Essay will take a fifth, 20-minute section. Any section of the SAT may contain both operational and pretest items.

The College Board document defines neither “operational” nor “pretest.” Nor does this paragraph make it clear whether all the experimental questions will appear only on the fifth section, at the start or end of the test, or will be dispersed throughout the exam. During the session, I asked if all the questions on the extra section won’t count and was told they would not. This paragraph is less clear on that issue, since it suggests that experimental (“pretest”) questions can show up on any section.

When The Washington Post asked for clarification on this question, they were sent the counselor’s paragraph, verbatim. Once again, the terminology was not defined and it was not clarified that “pretest” does not mean before the exam, but experimental.

For starters, I was unaware that the term “pretest” could have a second meaning. Even by the College Board’s current standards, that’s pretty brazen (although closer to the norm than not).

Second, I’m not sure how it is possible to have a standardized test that has different versions with different lengths, but one set of scores. (Although students who took the old test with accommodations did not receive an experimental section, they presumably formed a group small enough not to be statistically significant.) In order to ensure that scores are as valid as possible, it would seem reasonable to ensure that, at bare minimum, as many students as possible receive the same version of the test.

As Murphy rightly points out, issues of fatigue and pacing can have a significant effect on students’ scores — a student who takes a longer test will, almost certainly, become more tired and thus more likely to miss questions that he or she would otherwise have gotten right.

Third, I’m no expert in statistics, but there would seem to be some problems with this method of data collection. Because the old experimental section was given to nearly all test-takers, any information gleaned from it could be assumed to hold true for the general population of test-takers.

The problem now is not simply that only one group of testers will be given experimental questions, but that the group given experimental questions and the group not given them may not be comparable.

If you consider that the colleges requiring the Essay are, for the most part, quite selective, and that students tend not to apply to those schools unless they’re somewhere in the ballpark academically, then it stands to reason that the group sitting for the Essay will be, on the whole, a higher-scoring group than the group not sitting for the Essay.

As a result, the results obtained from the non-Essay group might not apply to test-takers across the board. Let’s say, hypothetically, that test takers in the Essay group are more likely to correctly answer a certain question than are test-takers in the non-Essay group. Because the only data obtained will be from students in the non-Essay group, the number of students answering that question correctly is lower than it would be if the entire group of test-takers were taken into account. 

If the same phenomenon repeats itself for many, or even every, experimental question, and new tests are created based on the data gathered from the two unequal groups, then the entire level of the test will eventually shift down — perhaps further erasing some of the score gap, but also giving a further advantage to the stronger (and likely more privileged) group of students on future tests.
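
A toy simulation makes the mechanism concrete. This is a minimal sketch that assumes, purely for illustration, that Essay-takers are on average stronger and that the chance of answering a pretest item correctly rises with ability; none of the numbers come from actual College Board data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative ability distributions: the Essay group is assumed to be stronger.
essay_group    = rng.normal(0.3, 1.0, 200_000)   # never sees pretest items
no_essay_group = rng.normal(-0.3, 1.0, 200_000)  # sole source of pretest data

def p_correct(ability, difficulty=0.0):
    """Toy logistic item model: higher ability -> more likely to answer correctly."""
    return 1 / (1 + np.exp(-(ability - difficulty)))

# Share answering the item correctly, as estimated from the pretest sample...
pretest_estimate = p_correct(no_essay_group).mean()
# ...versus the share among all test-takers, which the estimate is meant to represent.
full_population = p_correct(np.concatenate([no_essay_group, essay_group])).mean()

print(f"Estimated from pretest takers: {pretest_estimate:.1%}")  # lower
print(f"Across all test-takers:        {full_population:.1%}")   # higher
```

Every item calibrated against the weaker group looks harder than it actually is for the full population, so a test assembled to hit a target difficulty from those estimates will drift easier over time, which is exactly the shift described above.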

All of this is speculation, of course. It’s possible that the College Board has some way of statistically adjusting for the difference in the two groups (maybe the Hive can help with that!), but even so, you have to wonder… Wouldn’t it just have been better to create a five-part exam and give the same test to everyone? 

Is the College Board playing a rigged game?

A couple of weeks ago, I posted a guest commentary entitled “For What It’s Worth,” detailing the College Board’s attempt not simply to recapture market share from the ACT but to marginalize that company completely. I’m planning to write a fuller response to the post another time; for now, however, I’d like to focus on one point that was lurking between the lines of the original article but that I think could stand to be made more explicit. It’s pretty apparent that the College Board is competing very hard to reestablish its traditional dominance in the testing market – that’s been clear for a while now – but what’s less apparent is that it may not be a fair fight.

I want to take a closer look at the three states that the College Board has managed to wrest away from the ACT: Michigan, Illinois, and Colorado. All three of these states had fairly longstanding relationships with the ACT, and the announcements that they would be switching to the SAT came abruptly and caught a lot of people off guard. Unsurprisingly, they’ve also engendered a fair amount of pushback.

The fact that the College Board consistently underbid the ACT is entirely unsurprising. Unlike ACT, Inc., the College Board receives enormous amounts of revenue from the AP program: over $750 million a year. (In contrast, the ACT takes in only about $300 million per year.) With that kind of money, the College Board can afford to withstand some losses.

When you look at the College Board vs. ACT, Inc. numbers in the state-test bidding market, though, an interesting non-pattern emerges. In fact, the amounts by which the College Board underbid the ACT in Illinois, Colorado, and Michigan varied wildly.

The bids were as follows:

Illinois: SAT – $14.3 million, ACT – $15.67 million; underbid by $1.37 million

Colorado: SAT – $14.8 million, ACT – $23.4 million; underbid by $8.6 million

Michigan: SAT – $17.1 million, ACT – $32.5 million; underbid by $15.4 million

To emphasize: the College Board bid only $1.37 million less than ACT, Inc. in Illinois but over $15 million less in Michigan. That’s a pretty huge difference.
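
Put in relative terms, using the figures quoted above, the non-pattern is even starker (a quick sanity check, nothing more):

```python
# Bids as quoted above, in millions of dollars.
bids = {
    "Illinois": {"sat": 14.3, "act": 15.67},
    "Colorado": {"sat": 14.8, "act": 23.4},
    "Michigan": {"sat": 17.1, "act": 32.5},
}

for state, b in bids.items():
    underbid = b["act"] - b["sat"]
    print(f"{state}: underbid by ${underbid:.2f}M "
          f"({underbid / b['act']:.0%} below the ACT bid)")
```

Roughly 9 percent under the ACT in Illinois versus about 37 percent in Colorado and 47 percent in Michigan is hardly the signature of a single, consistent pricing strategy.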

Even though the bids were submitted confidentially, the College Board presumably could have easily obtained information about the ACT’s current contracts in those states and tailored its proposals based on those figures.

But why the huge range then? Yes, it seems clear that the ACT is cheaper to administer in Illinois than in Michigan or Colorado, but for the College Board’s Illinois bid to be even remotely consistent with the other two, a figure more in the range of $8-10 million would be expected.

So the question remains: how was the College Board able to succeed with such a close bid in Illinois when a significantly lower bid was necessary to win in Colorado, and a far lower bid was necessary to win in Michigan?

Thanks to densely populated Chicago and its suburbs, Illinois is a major prize; it seems unlikely that the College Board just didn’t care that much about winning that state.

A far more likely explanation is that David Coleman’s connections run deep in Illinois via the Grow Network, the company that Coleman and his CCSS buddy Jason Zimba founded to analyze test data for schools and that was later purchased by McGraw Hill; the company worked for the Chicago public schools back when their chief executive was none other than former US Secretary of Education Arne Duncan. Given that background, it strikes me as entirely possible that the College Board was advised that it only needed to provide a token underbid so that the switch to the SAT could be justified on financial grounds.

In contrast, the College Board’s ties to Michigan and Colorado weren’t nearly as entrenched – it was necessary to underbid by more than a token amount in order to wrest those states away from the ACT. Lucky for the College Board, it has hundreds of millions of dollars in revenue from the AP program to cushion the blow.

In Michigan, however, there were serious questions about the fairness of the whole process. As Jed Applerouth of Applerouth Tutoring recently posted on his company’s blog:

Everyone was shocked when Michigan, in the heart of ACT territory, dropped its 8-year relationship with the ACT and opted instead to give every junior a redesigned SAT. This decision resulted largely from an ambiguous request for proposal issued by Michigan’s Department of Education. The confusing RFP led the ACT, Inc. to submit a bid which incorporated the added expense of a graded essay per student, while the College Board submitted a bid without this substantial expense. The lower bid won the day. Despite howling protests by the ACT reps, the decision stood.

That seems like a rather large factor to have been overlooked. Why, one might ask, was the Michigan DOE not willing to at least reconsider its decision in light of that information?

In Colorado, things were even hairier. The announcement was made abruptly, right before Christmas break, and caught just about everyone off guard. Writing on Diane Ravitch’s blog, Colorado teacher Michael Mazenko offered this commentary:

In following the story [of Colorado’s switch to the SAT], I am particularly bothered by mention of the decision being made by “a selection committee” that no one I know following the issue has heard of. When Colorado passed HB1323 which required that the junior level test be put out to bid, there was no talk of a committee. Previous coverage and discussion of the subject made no mention of the committee. With no names of members, no one was available for questions and comment beyond CDE’s spokesman. Additionally, I am troubled by the connection to the PARCC test and implication that the decision is an attempt to force Colorado to remain tied to PARCC. Just a couple weeks ago, CDE interim head Eliot Asp and State Board of Education President Steve Durham implied that Colorado would leave PARCC after this spring’s test. Durham noted that a majority on the Board are “opposed to this test.” Yet, shortly after those comments, the state named Rich Crandell – of Arizona and Wyoming – as the sole choice to head CDE. That surprised many in state, for Crandell was instrumental in promoting CCSS and PARCC in Arizona. Prior to this week, most people expected that Colorado would stay with ACT and withdraw from PARCC to replace it with the ACT-Aspire for grades 3-10. Now, everything is up in the air, and schools will scramble to prepare for an entirely new test and system in just three months.

You can also click here to read the letter written jointly by a group of Colorado superintendents to the state’s Board of Education. It clearly outlines the problems involved in the switch to the SAT, including the loss of longitudinal data, and raises serious questions about whether due process was followed.

Colorado did agree to retain the ACT for this year’s juniors but will be switching to the SAT next year.

But why the rush? Why the lack of transparency?

Something about this scenario seems awfully familiar: this is basically how Common Core was put together – quickly and in secret, without input from or concern for the people most directly affected by the imposition of the standards, namely students and the teachers who have geared their curriculums toward helping students prepare for the ACT for the last decade-and-a-half.

Lest there be any remaining doubt, for all its talk about “creating opportunity,” the College Board has one purpose: to recapture market dominance and serve its shareholders, whose demands and goals may or may not be aligned with those of any of the other players involved in the testing game. That students will be required to serve as guinea pigs for what is effectively a brand new test of questionable authorship and validity is of no real concern. But it also seems as though the College Board may be receiving some help in these endeavors.