Computer grading of student essays is tested at Colleyville Heritage High

Posted Saturday, Jul. 02, 2011

Ask Judy Huston how many student essays she's read about To Kill a Mockingbird, and she'll likely say it's in the thousands.

After 15 years in the classroom, the last five years teaching pre-AP English at Colleyville Heritage High School, she's heard it all when it comes to Atticus Finch, his kids, Scout and Jem, and the tragically misunderstood "Boo" Radley.

So Huston, as part of a pilot project, enlisted a computer's help this spring when grading the latest submissions by her students on the happenings in the "tired old town" of Maycomb, Ala.

"I said, 'Let's try this out here at the end of the year and see how it works,'" Huston said.

She found that the computer scores for the essays were "very, very close" to hers.

Some parents and students will say it is just not right to turn over the grading of something so subjective to a computer. Some writing experts agree, calling into question the quality and integrity of computer-scoring programs.

But the Grapevine-Colleyville school district and other districts nationwide are committing themselves to an all-digital learning environment that includes some scoring software.

The essay-scoring program used by Huston aligns with the Grapevine-Colleyville district's dedication to creating a digitally integrated learning environment by 2021, school spokeswoman Megan Overman said.

"Technology is becoming a very integral part of instruction," Overman said. "This is another way we're incorporating it."

Writing by formula

Programs dedicated to computer essay scoring began appearing in the 1980s, primarily in higher education. The analytical writing assessment of the Graduate Management Admission Test was one of the first large-scale uses. Project Essay Grader and Writer's Workbench were also early tools that enabled educators to score thousands of essays in less time.

The program used in Grapevine-Colleyville comes from the Holt McDougal educational publishing company, which provides the online essay-scoring program to the district as an extra tool for teachers along with new literature textbooks. The program aligns with Texas standards and is touted as saving teachers time by giving students practice and feedback before their essays are turned in.

Bianca Olson, a Holt McDougal spokeswoman, explained how the program was developed.

A base set of essays is compiled by using prompts that reflect writing typical of student assignments. Student papers that respond to the prompts are fed into the computer database. The computer scoring engine is then trained to "recognize" student papers that coordinate with the set of prompts.

Two human scorers read all papers submitted as the training set is built, Olson said. "When these scorers do not agree on the rankings, a third scorer provides a resolution score."

The papers and their scores are fed into the program and pegged in a point system to serve as a grading reference.
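The two-scorer procedure Olson describes can be sketched roughly in Python. This is an illustration only, not Holt McDougal's actual process: the function name `resolve_training_score`, the rule that scorers "agree" only on an exact match, and the sample scores are all assumptions made for the example.

```python
from typing import Callable


def resolve_training_score(first: int, second: int,
                           third_scorer: Callable[[], int]) -> int:
    """Return the reference score for one training essay.

    Assumption: the two scorers "agree" only when their scores match
    exactly, and the third scorer's score is used as-is otherwise.
    """
    if first == second:
        return first
    # Disagreement: a third scorer supplies the resolution score.
    return third_scorer()


# Hypothetical training essays as (first scorer, second scorer) pairs.
pairs = [(3, 3), (2, 4), (1, 1)]
# Hypothetical third scorer who always awards a 3.
reference = [resolve_training_score(a, b, lambda: 3) for a, b in pairs]
print(reference)  # [3, 3, 1]
```

Only the second essay reaches the third scorer here, since the first two scorers disagreed on it.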

The Holt McDougal program, like others of its kind, breaks an essay down into a formula that assesses writing style, organization, grammar, spelling, word length and sentence complexity.
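To make the idea of reducing an essay to measurable features concrete, here is a minimal Python sketch. It is not the Holt McDougal formula, which the article does not disclose. The function `surface_features` and the three features it computes are assumptions chosen to mirror the word-length and sentence-complexity measures mentioned above.

```python
import re


def surface_features(essay: str) -> dict:
    """Compute a few surface measures of an essay (illustrative only)."""
    words = re.findall(r"[A-Za-z']+", essay)
    # Treat ., ! and ? as sentence boundaries; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "avg_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }


sample = "Lava flows are dangerous. Scientists have tried to stop them."
print(surface_features(sample))
```

A real scoring engine would presumably weigh many more signals. The point of the sketch is only that each one is a number computed from the text.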

So, for instance, the formula's structure for a good essay would include an attention-grabbing introduction, according to the program's example. ("What would you do if a fiery stream of molten rock -- at least 1,300 degrees Fahrenheit -- was aimed right at your doorstep?")

Then comes the thesis -- that scientists have been fighting back, trying different ways to stop lava flows or change their direction. That would be followed by the body of the essay, with subtopic 1 (the bombing of Mauna Loa in Hawaii in 1935 to reroute the flow) and subtopic 2 (using seawater to cool an Icelandic lava flow in 1972).

Through the computer program, a student can submit an assignment repeatedly to see what changes are needed before the final product goes to the teacher. Students receive a score from 1 to 4 almost immediately after hitting the "send" button.

"The great thing about it is that the student can resubmit the essay many times to better their score," Huston said. "It gives them immediate feedback; then it's up to them to revise their essays to resubmit them to get a better score."

Creativity stifled?

Critics say the formulas such programs use judge word complexity by average word length and favor verbosity over originality.

That criticism was raised by Les Perelman, director of writing at the Massachusetts Institute of Technology, in a recent USA Today article.

Perelman is a critic of all standardized writing tests, human or electronic.

The programs can be fooled because of their overemphasis on grammar, structure and word length, critics say. Good sentences, organization and clear transitions score well even if the essay makes no logical sense.
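The critics' point can be illustrated with a toy Python example. It assumes a hypothetical scorer that looks only at which words appear and how often, which is far simpler than any commercial product. The helper `bag_of_words` and both sentences are invented for the illustration.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count each word, ignoring case and the final period."""
    return Counter(text.lower().rstrip(".").split())


sensible = "Scientists cooled the lava with seawater."
nonsense = "Seawater cooled the scientists with lava."

# Both sentences are grammatical and use exactly the same words,
# so a scorer that sees only word counts cannot tell them apart.
print(bag_of_words(sensible) == bag_of_words(nonsense))  # True
```

Word length, vocabulary and sentence length are identical in the two sentences, yet only the first one makes sense.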

Such formulas are detrimental to student creativity, according to Bob Schaeffer, public education director for FairTest, the National Center for Fair and Open Testing, based in Boston.

The scoring "doesn't take into account all the different ways something can be written well," he said. "The writing styles of Hemingway and James Joyce would be judged as unacceptable."

Additionally, he said, computers can't judge whether an essay's statements are true or factually accurate.

"We're not opposed to the use of computers in grading," Schaeffer said, "but we need to examine the examiners.

"We are asked to take the claims of these companies at face value," he said. "They don't make their products available for independent review or testing. No government agency reviews to make sure they're accurate or fair, nor is there any independent agency that evaluates the claims the companies make."

Humans still in charge

Computerized essay scoring has become much more prominent in the last five years, Schaeffer said.

"One reason it's growing is the explosion of testing that has occurred on the public school level because of No Child Left Behind and the state accountability tests," he said. "This, combined with the budget crunch, because it is costly to have exams graded by human beings."

The Colleyville Heritage assignment counted as a grade, but only after the teacher had the last word. Teachers can't rely on the program alone to set students' grades, Huston said, and it will likely be a long time, if ever, before computer grades stand alone, especially for essay assignments.

"It gives the teacher a score for each student, but I have to read the essay and see if I agree with the score before it stands," Huston said.

Teachers' groups -- including the Texas State Teachers Association, the American Federation of Teachers and the United Educators Association of North Texas -- say they have issued no policy statements on the matter and have not heard from members concerned about a diminished role for teachers due to technological advances such as the computer scoring of essays.

"As a high school teacher for 13 years, I can't imagine not sitting down with my red pen to go over student essays," said Richard Kouri, assistant executive director of public affairs for TSTA. "On the other hand, when faced with 150 freshman essays, I can understand using it if it helps save time."

Kouri said the organization would defer to the opinions of teachers using the online programs.

Whether computer scoring is even permitted often depends on the test involved, state and local officials say.

"How a school district does their own testing is up to them," said DeEtta Culbertson, spokeswoman for the Texas Education Agency. "AP class testing is done through the College Board, and state TAKS tests are graded by readers."

Pre-AP testing is still largely a local affair, since the pre-AP test isn't directly administered by the College Board and doesn't count toward an SAT score or toward Advanced Placement college credit.

"All state-sanctioned tests, including the AP tests, must be read by people," said Lori Burton, spokeswoman for the Region XI Education Service Center. "If two readers cannot reach a similar score, then a third person reads it."

Did students themselves give passing marks to the program?

"I think they enjoyed the process," Huston said, "except they became a little frustrated with all the revision as they went along."

Ultimately, a very human impulse may act as the failsafe for keeping teachers in charge.

"If it were overused I think the kids would be looking for ways to beat the system," Huston said, "which is why you can't rely on the machine."

Shirley Jinkins, 817-390-7657
