A combination of human error and hardware failure was responsible for crashing the Keller school district’s computer network in November.
On Nov. 11, school district technicians got an error message that a network server storage disk was running out of room. When staff investigated, they realized that a software feature designed to write to other disks to save space had not been enabled when the system was installed. But after technicians turned the software feature on, the disk being written to failed because of debris from a broken gasket. Instead of correcting the problem, the software overloaded the server and caused six more disks to fail, said Joe Griffin, the district’s chief technology officer.
The result was that the district’s email network and other key websites and online applications crashed, including online lunch payments and the Home Access Center, which tracks students’ grades and attendance.
Most critical applications were restored by Nov. 17, but some features took several more days to bring back.
Griffin and Robyn Garrett, the executive in charge of Dell’s service contract with Keller schools, gave a detailed report of the diagnosis to the Keller school board during a recent meeting. The failed equipment was sent back to the manufacturer so the root cause could be determined.
Garrett said the outage was considered most severe, with reports going all the way up to Michael Dell, founder and CEO of Dell Computer. A team of more than 60 technicians and engineers from Dell, the Keller school district and several hardware and software vendors worked around the clock to restore service.
To have the combination of the two problems take out the entire system was something engineers with decades of experience had never seen before, Griffin said.
“The odds of that happening, well, I’d rather go to Vegas,” Garrett said.
Because all the data on the servers was backed up off site every night, district officials could still pay employees and gain access to critical student information through SunGard, a disaster recovery service in Pennsylvania.
“No data was breached, and we didn’t lose any data,” Griffin said.
The experience showed district officials the importance of having an outside source that allowed the district to complete important business while the system was down, Griffin said.
Griffin said he has been contacted by school districts from around the state to share what needs to be in place in case of a complete system crash.
Sandra Engelland, 817-431-2231