
STAAR tests meet state standards but readability of questions unclear, study finds

A sample of statewide STAAR tests mostly aligns with state standards, but experts found it difficult to assess whether individual questions meet reading grade levels, a study released Monday by the University of Texas at Austin found.

A panel of experts from the Meadows Center for Preventing Educational Risk at UT Austin looked at 17 State of Texas Assessments of Academic Readiness, more commonly known as STAAR tests, across subject areas that were used in third through eighth grade in 2019. Mandated by a new state law, the study’s goal was to assess whether the exams are written at an appropriate reading level for each grade and only include content aligned with state standards for that grade or earlier.

The overwhelming majority of test items aligned with Texas Essential Knowledge and Skills content standards, the report found. Within each subject area, the share of items that aligned with standards ranged from 93% in social studies to 100% in reading. Aligning with content standards means that students who have mastered the knowledge and skills in the standard’s expectations would be expected to answer the item correctly, according to the report.

However, researchers faced difficulties assessing the readability of test items, citing a lack of prior research on measuring the readability of such small portions of text. The annual STAAR tests have come under fire in the past for being too difficult and testing above grade level.

The study tried several methods to analyze whether test items met reading grade levels and found that the results shifted substantially depending on the metric used. Those metrics included word and sentence length and difficulty, syntax and vocabulary load.

“Because we do not have confidence in these results, we were forced to conclude that analyzing item readability in a reliable manner for this report is not possible,” the report read. “Unless and until additional research provides clear guidance and evidence of a reliable way to evaluate item readability, we cannot recommend conducting analyses of the grade-level readability of test items.”

Many readability formulas require a minimum number of words, such as 150, to produce stable estimates. However, the STAAR test items rarely met this threshold, the study said. Items ranged from three to 87 words, with an average of 27 words per item.
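
To illustrate why short items give unstable estimates, here is a minimal Python sketch of one widely used readability formula, the Flesch-Kincaid grade level. The article does not say which formulas the study used, so the choice of formula, the function names and the vowel-group syllable heuristic are all assumptions made for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels.

    This heuristic is an assumption for illustration; real tools
    typically use pronunciation dictionaries or more careful rules.
    """
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Count only alphabetic tokens as words; digits are ignored.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )


if __name__ == "__main__":
    # Two hypothetical short test items (not taken from any STAAR exam).
    for item in (
        "What is the sum of 3 and 4?",
        "Which organism is a producer?",
    ):
        print(f"{flesch_kincaid_grade(item):6.2f}  {item}")
```

With only a handful of words, swapping one or two short words for longer ones can move the estimate by several grade levels, which is consistent with the instability the researchers describe.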

But the study’s authors stressed that readability is just one component of determining how difficult a test question is, and not the central one.

“Research on accommodations for students with disabilities has shown that reading items to students without disabilities instead of having the students read the items on their own does not affect their test performance,” the study read. “These findings suggest that readability of items is not a significant factor in item difficulty.”

When looking at passages themselves in reading and writing exams, 86% to 97% of passages met the criteria for readability the study defined, or about 30-34 of the 35 passages analyzed.

“TEA is pleased with the report’s findings which show our assessments overwhelmingly align with the Texas Essential Knowledge and Skills (TEKS), with readability appropriate to students’ grade level,” TEA spokesman Jacob Kobersky wrote in an email. “The agency thrives on a culture of continuous improvement and studies like these provide us with additional benchmarks that we can effectively learn from in an effort to better serve Texas students.”

This past legislative session, Sen. Beverly Powell, D-Burleson, had authored a bill that would have required an audit of STAAR tests to ensure they accurately measure student success and proposed eliminating school bonuses based on third-grade test scores. Neither measure passed, and in a statement Monday night Powell stressed that “there are better solutions than an over-reliance on high-stakes standardized testing.”

“Parents, students, and teachers alike know that the STAAR test has flaws. We all know that the success of a student cannot be accurately measured by a single test on a single day,” Powell said. “I am committed to working with public education advocates and leaders at all levels to carefully review the results of this TEA report and other relevant studies, improve the STAAR, and ensure that we accurately measure the success of our students and teachers.”

The study’s findings contrast with a recent study by Texas A&M University-Commerce professors, published in March, that expanded on a 2012 study. After applying eight different readability measures to various exams and averaging the resulting scores, the researchers found that the difficulty of STAAR exams was often one to two years ahead of the grade level being assessed.

“Thus, it is believed that many students may be failing the STAAR test because the passages are written above their grade level,” the study read.

Powell said the varying results warrant further scrutiny and study.

“The fact that several studies show varying results is enough cause for concern. I would like to take an in-depth look to make sure that we assess the STAAR tests in a comprehensive way,” Powell said in a statement.

For years STAAR tests have faced pushback from advocates and parents, some of whom have joined a growing movement that opposes standardized tests and keeps kids out of the mandated assessments. Critics of the STAAR tests have argued they are too difficult and carry too much weight, because they are used to evaluate teacher effectiveness, determine whether a student is promoted a grade or graduates, and play a role in whether a school is taken over by the state.

The Texas Education Agency has defended the tests in the past amid worries that they test above grade level, and has pointed to Texas’ stagnant reading performance as an area that needs work and that, if improved, would help raise test scores.

The study of the 2018-19 STAAR tests is the first of two parts. The second part, which will review the 2019-20 tests to be used next year, is due by Feb. 1. The review was required by House Bill 3, the sweeping school finance bill passed in May that allocated about $6.5 billion more toward public education and $5.1 billion to cut school district taxes.

“If it tells us that there’s issues that need to be addressed, then we’ll address them. If it tells us that the issues are OK, then we’ll still implement the rest of (House Bill) 3906,” Commissioner of Education Mike Morath said of the study at a House Public Education hearing in late October.

Changes to how STAAR tests are implemented are on the horizon thanks to House Bill 3906, which was signed into law and requires a slew of modifications. Among other changes, the law calls for administering the tests in parts over the course of multiple days rather than one, administering the tests electronically by the 2022-23 school year, and capping the share of questions in multiple-choice format at 75%.

While lawmakers did pass a provision that required the study of the tests’ readability, bills that would have taken those measures a step further failed to pass. House Bill 4242 would have not only required a study, but also mandated that the tests’ content be at the reading level and aligned with state standards for each grade. It also would have paused the use of student performance as a factor in school closures or in preventing a student from moving up a grade level.

And bills that would have cut down on tests that aren’t federally mandated and essentially eliminated STAAR tests in their entirety didn’t even receive a committee hearing, one of the first steps in the legislative process, this past session.

This story was originally published December 2, 2019 at 6:40 PM.

Tessa Weinberg
Fort Worth Star-Telegram
Tessa Weinberg was a state government reporter for the Fort Worth Star-Telegram.