We've all felt the creeping suspicion that something we're reading was written by a large language model, but it's remarkably difficult to pin down. For a few months last year, everyone became convinced that specific words like "delve" or "underscore" could give models away, but the evidence is thin, and as models have grown more sophisticated, the telltale words have become harder to trace.
But as it turns out, the folks at Wikipedia have gotten pretty good at flagging AI-written prose, and the group's public guide to "Signs of AI writing" is the best resource I've found for nailing down whether your suspicions are warranted. (Credit to the poet Jameson Fitzpatrick, who pointed out the document on X.)
Since 2023, Wikipedia editors have been working to get a handle on AI submissions, a project they call Project AI Cleanup. With millions of edits coming in each day, there's plenty of material to draw on, and in classic Wikipedia-editor style, the group has produced a field guide that's both detailed and heavy on evidence.
To start with, the guide confirms what we already know: automated tools are basically useless. Instead, the guide focuses on habits and turns of phrase that are rare on Wikipedia but common on the internet at large (and thus, common in the model's training data). According to the guide, AI submissions will spend a lot of time emphasizing why a subject is important, usually in generic terms like "a pivotal moment" or "a broader movement." AI models will also spend a lot of time detailing minor media spots to make the subject seem notable: the kind of thing you'd expect from a personal bio, but not from an independent source.
The guide flags a particularly interesting quirk around tailing clauses with hazy claims of importance. Models will say some event or detail is "emphasizing the significance" of something or other, or "reflecting the continued relevance" of some general idea. (Grammar nerds will know this as the "present participle.") It's a bit hard to pin down, but once you can recognize it, you'll see it everywhere.
There's also a tendency towards vague marketing language, which is extremely common on the internet. Landscapes are always scenic, views are always breathtaking, and everything is clean and modern. As the editors put it, "it sounds more like the transcript of a TV commercial."
The guide is worth reading in full, and I came away very impressed. Before this, I would have said that LLM prose was developing too fast to pin down. But the habits flagged here are deeply embedded in the way AI models are trained and deployed. They can be disguised, but it will be hard to do away with them completely. And if the general public gets more savvy about identifying AI prose, it could have all sorts of interesting consequences.


