NYS Psychometric Hocus Pocus Explained


The Story of Jane and Johnny

In 2015, Jane took the 6th grade NYS math test. Out of 72 possible raw points, she earned 45, or 62% of the available raw points.

According to NYSED’s 2015 Raw Score to Scale Score Conversion Chart, Jane earned a scale score of 316.

NYSED’s Definitions of Performance Levels for the 2015 Grades 3-8 Mathematics Tests tells us that the range of scale scores for a proficient performance (level 3) on the 2015 6th grade math test is 318-339. A scale score of 318 is considered the “cut score” for a level 3 on the 2015 6th grade NYS Math test.

Jane almost made it! She earned 62% of the available raw points. Had Jane earned 46 of the available raw points, or 64%, her raw score would have been converted to a scale score of 318 and she would be considered proficient.

Eleven-year-old Jane has just been told that she is NOT on track to being career and college ready.

In 2016 Jane’s cousin Johnny took the 6th Grade NYS Math Test. Johnny earned 35 out of 67 possible raw points, or 52% of the available raw points.

According to NYSED’s 2016 Raw Score to Scale Score Conversion Chart, Johnny earned a scale score of 318.

According to NYSED’s Definitions of Performance Levels for the 2016 Grades 3-8 Mathematics Tests, the range of scale scores for a proficient performance (level 3) on the 2016 6th grade math test is 318-339. A scale score of 318 is considered the “cut score” for a level 3 on the 2016 6th grade NYS Math test.

What do you know? The cut score is the same as in 2015! Johnny, who earned only 52% of the available raw points on his test in 2016, is considered proficient. He is on track to career and college readiness, whatever that means.
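The percentages in the story are simple division: raw points earned over raw points available. Here is a minimal Python sketch using only the figures quoted above; `percent_of_raw` is a hypothetical helper for illustration, not anything NYSED publishes.

```python
def percent_of_raw(earned: int, possible: int) -> float:
    """Share of the available raw points earned, as a percentage."""
    return 100 * earned / possible

# Figures taken from the story above.
jane = percent_of_raw(45, 72)    # 2015: scale score 316, not proficient
johnny = percent_of_raw(35, 67)  # 2016: scale score 318, proficient

print(f"Jane:   {jane:.1f}%")
print(f"Johnny: {johnny:.1f}%")
print(f"Gap:    {jane - johnny:.1f} percentage points")
```

The exact values are 62.5% and about 52.2%; the story drops the decimals. Either way, Jane earned roughly ten percentage points more of her test than Johnny did, yet only Johnny reached the cut score.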

How can this be?

Jane earned 62% of the available raw points compared to Johnny’s 52%. You might be wondering if NYS made it easier for Johnny to pass the test than Jane. If you have read the newspaper, you know that the Commissioner of Education has assured us this is not true because the cut scores didn’t change. And they didn’t! (see above) Yet somehow, Johnny was deemed to be proficient while earning fewer raw points on the test than Jane. What gives?

Some people say that Jane and Johnny’s scores can be explained by something called equating.

The idea is that state tests can differ a little from year to year. One year a test might be slightly harder, the next year slightly easier. The folks who (mis)use these scores want to be able to compare them from year to year. To account for these fluctuations in test difficulty, the number of raw points needed to get the same scale score is occasionally tweaked from one year to the next. If the test is easier, maybe you need a point or two more to be considered proficient. If the test is harder, maybe you need a point or two fewer to earn the same scale score. Equating is the term used for making these adjustments.

If we compare the percentage of raw points needed to achieve a scale score of 318 (the cut score for proficiency) on the 6th grade math test in 2016 (52%) with 2015 (64%), we can see that it dropped twelve percentage points. Is that a lot? Well, let’s look at the raw score to scale score conversions going all the way back to 2013, when Common Core-based state tests began. Up until 2016, the percentage of raw points needed to achieve a proficient performance (level 3) typically went up or down by no more than 3 or 4 percentage points on any test. So yes, a drop of 12 percentage points is unprecedented. And here’s the thing: so far we are talking about only one test. If we look at all of the 2016 NYS ELA and Math tests in grades 3-8, the percentage of raw points needed to achieve the cut score for proficiency dropped on all but one test.
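The twelve-point drop comes from comparing the smallest raw score that reaches the 318 cut in each year’s conversion chart. The sketch below assumes partial charts containing only the rows quoted in this post (45 to 316 and 46 to 318 for 2015, 35 to 318 for 2016), not the full NYSED tables, so the lookup is only as reliable as those rows; `min_raw_for_cut` is a hypothetical helper.

```python
def min_raw_for_cut(chart: dict[int, int], cut: int) -> int:
    """Smallest raw score whose scale score meets or exceeds the cut."""
    return min(raw for raw, scale in chart.items() if scale >= cut)

CUT = 318  # level 3 cut score for 6th grade math in both 2015 and 2016

# Partial charts: only the rows quoted in this post, not the full NYSED tables.
chart_2015 = {45: 316, 46: 318}
chart_2016 = {35: 318}

pct_2015 = 100 * min_raw_for_cut(chart_2015, CUT) / 72  # 72 raw points in 2015
pct_2016 = 100 * min_raw_for_cut(chart_2016, CUT) / 67  # 67 raw points in 2016
drop = pct_2015 - pct_2016

print(f"2015: {pct_2015:.1f}%  2016: {pct_2016:.1f}%  drop: {drop:.1f} points")
```

Unrounded, the drop is about 11.7 percentage points (63.9% versus 52.2%), which rounds to the twelve points cited above.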

Eleven out of twelve, or ninety-two percent, of the 2016 tests were made easier to pass. Compare this to 2014, when fifty percent of the tests saw an increase in the percentage of raw points needed to earn a proficient performance and the other fifty percent saw a decrease. In 2015, the percentage of raw points required to earn a proficient performance was lowered on only forty-two percent of the tests. Clearly there was a great deal of equating going on in 2016.
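These shares are counts out of the twelve grades 3-8 ELA and math tests. The 2016 count (eleven) is stated above; the 2014 and 2015 counts below are inferred from the quoted percentages, assuming twelve tests in each year, so treat them as an assumption rather than published figures.

```python
TOTAL_TESTS = 12  # ELA and math, grades 3-8

# Tests on which the share of raw points needed for level 3 went down.
# The 2016 count is from the post; 2014 and 2015 are inferred from its percentages.
lowered = {2014: 6, 2015: 5, 2016: 11}

for year, count in lowered.items():
    pct = 100 * count / TOTAL_TESTS
    print(f"{year}: {count}/{TOTAL_TESTS} tests lowered ({pct:.0f}%)")
```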

[Chart: Lower Threshold]

As I mentioned earlier, equating is supposed to compensate for changes in test difficulty. To justify a large shift in the percentage of raw points needed to achieve a given performance level in a given year, there should be a marked change in the difficulty of the test from the year before. By that logic, one could infer that the State had to make it easier to pass the 2016 tests because they were significantly more difficult than the 2015 tests.

But there’s a problem with that argument. Commissioner Elia has made numerous, emphatic statements that the content of the 2015 and 2016 tests was comparable in difficulty (I refuse to use the “r” word). If the tests were comparable, what’s with the tinkering?

There is one definitive way that NYSED could prove that it was necessary to engage in significant equating of this year’s scores. After a test is given, each item on the test is assigned something called a p-value. Put simply, a test item’s p-value is the proportion of students who answered it correctly, which makes it a measure of the item’s relative difficulty. Ostensibly, any equating that takes place when a raw score is converted to a scale score is based on these p-values. Therefore, one can assume that NYSED has the p-values from the 2016 tests. These p-values are typically published in a technical report released about 10 months after the state tests are given. While NYSED may not have the technical report ready now, we know that the p-values for ALL test items have been calculated; the tests could not be scored without them. Why not release the p-values for ALL test items immediately and put all speculation to rest?
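For readers unfamiliar with the statistic, here is a minimal sketch of how a classical p-value is computed: for each item, divide the number of students who answered it correctly by the number of students who took it. The response matrix is invented for illustration, and this shows only the statistic itself, not NYSED’s actual equating procedure, which the State has not laid out here.

```python
# Each row is one student; each column is one test item (1 = correct, 0 = incorrect).
# These responses are made up for illustration only.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]

def item_p_values(matrix: list[list[int]]) -> list[float]:
    """Proportion of students answering each item correctly (classical p-value)."""
    n_students = len(matrix)
    n_items = len(matrix[0])
    return [sum(row[i] for row in matrix) / n_students for i in range(n_items)]

p_values = item_p_values(responses)
for item, p in enumerate(p_values, start=1):
    print(f"Item {item}: p = {p:.2f}")
```

Item 3, with p = 0.20, is the hardest of the four: only one student in five got it right. Comparing item p-values from one year to the next would help show whether the 2016 tests really were harder than the 2015 tests.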

[Chart: Sum of Change]

At the very least, NYSED owes us an explanation, preferably one that makes sense. Given the high stakes attached to these test scores, it matters whether or not they are reliable and valid. Based on these scores schools can be closed, teachers fired, and children labeled as being on track for future failure.

The NYS Education Department has an admitted history of inflating and manipulating test scores. Knowing the facts, it would be naïve not to question the State’s testing data. Rather than lash out at the parents asking questions, NYSED should offer a well-reasoned explanation and release all relevant test data.
