Student Assessments Aren’t Delivering on Their Promise for Accountability or Equity

Now Is the Time to Reassess Our Student Assessment Policies

By Erin Harless, Manager of Research and Learning, NewSchools Venture Fund

The education sector is in the midst of an overdue reckoning about its role in perpetuating systemic racism in this country, from curricula that perpetuate a Eurocentric worldview to an educator workforce that does not reflect the student population. Yet the assessment community, those responsible for creating and analyzing the tools that “reveal” inequity in our public school system, has been mostly absent from the conversation about racial justice. Why? My hunch is that the (flawed) perception of statistics and assessment as objective is inoculating the assessment community against conversations about racial injustice. But assessments are created and interpreted by people, and therefore they are inherently subjective. Moreover, the methodology and framing used to label schools and students as “good” or “bad,” “basic,” “proficient,” or “advanced” have very real implications.

There is a wide range of perspectives about the role that assessment should play in accountability, even among my colleagues here at NewSchools. My hope here is to share my personal lessons and perspective about the issues I believe are inherent in our current accountability system, as well as potential paths forward.

As a white researcher working in a mostly white field, I have always viewed standardized assessment as an equity tool: by publicly reporting on school and subgroup performance, we can hold schools and policymakers accountable. I was taught that you cannot fix what you don’t measure. But in recent months, my thinking has been challenged by researchers and activists who have brought forward compelling arguments to remake our accountability framework.

End-of-year standardized achievement tests provide little valuable information to teachers, parents, or school leaders. Because of their widespread use and high-stakes nature, standardized tests consist primarily of multiple-choice or short-answer items that are less expensive to score. These item types limit the kinds of skills that students are asked to demonstrate, as well as the quality of the feedback available to inform future instruction. A 2016 NWEA study underscored standardized assessments’ lack of instructional value, finding that only 37% of surveyed principals considered results from state accountability tests useful and that 60% of teachers rated their state accountability system as “poor” or “fair,” rather than “good” or “excellent.” And while some might argue that summative assessments do not necessarily need to inform instruction, my response would be, “Why not?” It is time to reflect on why we ask educators and students to spend significant energy and instructional time on an assessment that provides so little actionable information to support students’ growth.

Ostensibly, the purpose of standardized assessment is to capture whether students have met grade-level standards. However, these assessments are designed to rank-order students, creating an unnecessary zero-sum game. Many standardized assessments used in education measure student achievement on a unidimensional scale, and test companies deliberately select the items that produce the greatest spread of scores in order to ensure a wide distribution. As Susan Lyons wrote in her recent article for the Center for Assessment, “The most valued items for estimating [achievement scales] are those that are best at discriminating among examinees… The unidimensional scale is used because it is an excellent tool for doing exactly what it was developed to do, reliably rank-order individuals along a continuum.”

It’s important to note that the statistical tools Lyons describes are standard practice in the assessment community; in this context, “discrimination” refers to distinguishing among test-takers, not to prejudice. Norm-referenced assessments are also useful in specific contexts. For example, assessments that measure student growth, like NWEA’s MAP assessment, compare students to a national norm to contextualize how quickly they progress towards learning goals.

But does the education sector need to sort or rank students to capture whether they have met grade-level standards in a given year? I’m not convinced that we do, because proficiency is not a zero-sum game. In theory, my ability to demonstrate proficiency as a third grader should have no bearing on your ability to do the same. So this design principle appears at odds with the stated purpose of proficiency assessment in public schools.

Despite questions about their validity, standardized assessments have real, negative consequences for students and schools. Research suggests that standardized achievement measures are better indicators of students’ socioeconomic status than of their school’s ability to provide high-quality instruction. For example, a recent NWEA study found that the percentage of low-income students in a school accounts for about half of the variation in school-level achievement. The ability to predict student achievement by socioeconomic status raises serious doubt about whether the assessments actually measure student learning (what researchers call “construct validity”) or whether they measure access to economic resources. Interestingly, NWEA’s study did not find the same association between school-level poverty and growth, suggesting that growth may be a much more accurate and meaningful measure for accountability systems. Another common criticism of standardized assessment is that test designers rely on items that assume background knowledge most often held by higher-income and white students. Infamously, an old version of the SAT used the word “regatta” as the correct answer to a multiple-choice item, privileging students familiar with the world of boat racing.

And yet, standardized assessment results have significant implications for school governance and funding. Schools designated as “failing” can be taken over by the state or even (albeit rarely) closed. And since we know that proficiency results are strongly correlated with poverty, we effectively punish schools that serve mostly low-income students while rewarding schools that serve mostly affluent students. In a system that judges schools primarily by state test scores, standardized assessments may contribute to racial segregation in American public schools: when parents with resources choose a school or district based on their perception of school quality (meaning test scores), they are more likely to select a whiter and more affluent school. High-stakes testing can also incentivize a narrowing of the curriculum, in which teachers spend less time on untested subjects like art or history and more time on the remedial skills that a multiple-choice format can assess. This narrowing disproportionately disadvantages Black, Latino, and low-income students.

And finally, our relentless focus on achievement and outcomes contributes to a deficit-based discourse that blames historically marginalized students and families for their perceived underperformance, rather than focusing on the drivers of inequity: systemic racism, unequal access to high-quality teachers, and inequitable school funding policies, among many others. In How to Be an Antiracist, Ibram X. Kendi writes, “The acceptance of an academic-achievement gap is just the latest method of reinforcing the oldest racist idea: Black intellectual inferiority” (p. 101). For this reason, Gloria Ladson-Billings has advocated for a shift from a narrative about the “achievement gap” towards “education debt,” which acknowledges generations of inequitable resource allocation and reorients public policy solutions towards the systemic forces that have produced disparities.


Where do we go from here?
At this moment, I believe the assessment community has both the opportunity and the obligation to reflect on the purpose and consequences of our current policies. School closures last spring meant that states were unable to administer end-of-year standardized tests. Many organizations are now calling for a return to standardized assessment without using results for high-stakes accountability.

But this forced pause creates an opportunity to deeply reexamine the current paradigm rather than default back to business as usual. Now more than ever, school leaders and teachers need accurate and actionable assessments to ensure that students are getting relevant, high-quality instruction in a tremendously stressful and uncertain time. And it has become increasingly clear to me that our current accountability framework does little to remedy systemic inequity and, at worst, may be actively harming low-income students and students of color.

Several promising efforts across the field are giving me hope. The Assessment for Learning Project is a grant-making and field-building initiative aimed at redesigning educational assessments with equity at the forefront. In Massachusetts, a group of districts has created a fairer accountability system through the Massachusetts Consortium for Innovative Education Assessment. What I’ve learned from these organizations and other leaders in the field has shaped a few recommendations for how we can collectively change our accountability frameworks for the better:

  • Bring students, families, and teachers into the conversation. The current system is not providing actionable information to its most critical stakeholders. Policymakers and researchers should draw on the expertise that teachers, students, and families bring in order to develop measures that accurately assess progress and provide useful feedback on areas for growth.
  • Anchor more heavily on growth than on proficiency. Growth is a more sensitive metric than the blunt instrument of proficiency: it more accurately captures a school’s success in serving students who enter far below grade level but make significant progress over the year. And unlike proficiency, measures of growth are not strongly correlated with school-level poverty.
  • Focus on measuring inputs, access, and learning conditions (the “opportunity gap”). We know that the “achievement gap” discourse puts the onus on students to improve rather than on the inequitable distribution of resources that drives disparate outcomes. If we can only fix what we measure, let’s measure opportunity: access to highly qualified teachers, advanced coursework, mental health supports, and equitable school funding (see, e.g., EdBuild’s 2019 report on school funding).
  • Use multiple measures to paint a more holistic picture of student achievement and school quality. No Child Left Behind’s successor, the Every Student Succeeds Act (ESSA), allows each state to include multiple measures in its school accountability and improvement framework. States can take advantage of this provision to ensure that their accountability systems give teachers and school leaders valuable information that informs instruction, such as performance assessments and student portfolios, student GPA (a stronger predictor of first-year college success than SAT scores), and attendance. States can also expand their definition of student success to include students’ perceptions of their social-emotional development and of their school’s culture and climate. These student voice metrics are predictive of student achievement and can give school leaders important information about students’ experiences in school, such as whether they feel physically and emotionally safe and whether they feel valued and respected by teachers.
  • Expand the accountability paradigm. When evaluating schools, researchers and policymakers focus almost exclusively on traditional achievement measures, such as graduation rates, proficiency results, and college matriculation. But how can we expect the system to change without changing the incentives? I have been compelled by the work of Dr. Rochelle Gutiérrez, a professor at the University of Illinois, whose framework of equity includes both a “dominant axis” (the traditional measures of achievement and access) and a “critical axis” (student power and identity development) (Gutiérrez, 2009). Through this framework, Gutiérrez advocates for a system in which students are encouraged both to “play the game” and to “change the game.”

Together, these recommendations would move the sector towards a more equitable assessment approach that provides valid and actionable information about students, policies, and resource allocation. And while these recommendations reflect systems-level changes, I know that we cannot leave all the work to policymakers and politicians. As individuals, we must reflect on received wisdom (the lessons we have been taught, explicitly or implicitly) and let what we learn either confirm or challenge the status quo. No one is immune from this responsibility. I certainly am not, and I will continue to grapple with these questions in my work as an education researcher committed to equitable opportunities for all kids and all communities.