Popular AI detection tools used by schools nationwide are failing to reliably distinguish human writing from computer-generated work, with new Stanford research showing these systems incorrectly flag up to 97% of legitimate essays by non-native English speakers as AI-written.
“When I started using a checker, I was using it on everything,” said Dr. Dani Kachorsky, an English teacher at Brophy. “Now that I’ve been doing this for a year, I’ll probably use it between 20% and 25% of the time.”
This selective approach reflects growing concern about reliability. While companies like Turnitin claim 98% confidence under controlled conditions, recent Stanford University research reveals troubling accuracy gaps. The study found that while detectors performed well on essays by U.S.-born students, they incorrectly flagged more than 61% of essays written by non-native speakers for the Test of English as a Foreign Language (TOEFL) as AI-generated.
“Many of these checkers will get triggered by something like Grammarly usage,” Dr. Kachorsky explains.
Despite these limitations, the tools have helped surface genuine cases of misuse. In one instance, she caught “between 15 or 18 people using AI” on a summer reading assignment.
Some institutions are pulling back from automated detection altogether. Vanderbilt University disabled Turnitin’s AI detector, citing concerns about transparency and the potential impact on students. The University of Kansas’s Center for Teaching Excellence recommends treating detection software as one source of information rather than definitive proof.
“It doesn’t necessarily mean that they’re using it in a way they shouldn’t,” Dr. Kachorsky says of AI flags. “It just means that if you are using it to do your work and you’re not thinking at all, that’s not good for your learning.”