Unpacking AI and Wikipedia quality

With gratitude for the collaboration of Amy Pelman (Harker), Robin Gluck (Jewish Community High School of the Bay), Margi Putnam (Burr and Burton Academy), Hillel Gray (Ohio University), Sam Borbas, Cal Phillips, and a special librarian whose name we will not share due to the nature of their work.

Thank you so much to Alex for posting to our listserv about an article they read entitled “The Editors Protecting Wikipedia from AI Hoaxes.” Since our Wikipedia editing group meets Wednesday nights on Zoom, we decided to take a look and see if we could come up with a lesson plan for teaching students to recognize AI-generated content when they encounter it in Wikipedia.

We read, experimented, and chatted for a few hours, trying to figure out what would be most helpful. Ultimately, we did not construct a lesson plan, but we have a set of burgeoning ideas and thoughts about how to approach the topic. We look forward to collaborating with other members of this community to move forward, as needed.

Overall, while we see that some fallacious AI-generated content is making its way into Wikipedia, as it is into so many sources, we do not yet see evidence that it poses a particular danger to information quality within Wikipedia.

A very quick, vastly informal review of the literature investigating the quality of Wikipedia content reveals other themes entirely. Overall content quality checking was common in the late 2000s and early 2010s. At that time, most researchers found that Wikipedia tended to be fairly high quality, often higher than potential users perceived it to be. Over time, the understanding of “quality” and the research on Wikipedia have moved toward the same issues we question in more traditional research sources: identity-related gatekeeping – who is included, who is excluded, and how the identities of editors and of the creators of cited source materials affect the completeness of coverage on a given topic. As in the early days, articles that get more traffic tend to measure up well when quality checked (e.g., anatomy), meaning that more obscure articles (and, I would argue, articles less likely to be used by students for schoolwork) have a greater chance of retaining misinformation and errors. One study that looked closely at hoaxes reminded readers that, as of 2016, Wikipedia editors running “new article patrols” meant that 80% of new articles were checked within an hour of posting, and 95% within 24 hours.

Thus, a significantly larger issue facing Wikipedia today is the substantial fall-off in the number of editors in recent years, which means that page patrolling and other quality-supporting behaviors are also suffering. This is a very real issue. 

On the bright side, there are many more tools that help editors doing quality-sustaining work figure out where problems lie. I get notified whenever a page I (or my students) have worked on is edited, and when the changes are malicious, the vandalism has usually been corrected in the few minutes it takes me to get to the page to check it. While one of the first lines of defense – the “recent changes” page and its sophisticated, bot-driven advanced search – does not yet have a filter for suspected AI-created content, I am guessing that we will see that option before too long. Editors can already filter the list of recent changes in many ways, and from the vandalism training I did, I observed that the bigger problems tend to be dealt with extremely quickly.
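For anyone who wants to poke at this with a class, here is a minimal sketch (ours, to illustrate the idea, not an official tool) of pulling that same recent-changes feed through the MediaWiki API and printing the edit tags attached to each change. The tag names editors actually filter on vary over time, so treat the output as raw material to examine rather than as a vandalism or AI detector:

```python
# A minimal sketch: fetch recent changes from English Wikipedia via the
# MediaWiki API and show the edit tags that patrolling filters build on.
import requests

API = "https://en.wikipedia.org/w/api.php"

params = {
    "action": "query",
    "list": "recentchanges",
    "rcprop": "title|user|comment|tags|timestamp",
    "rclimit": 25,
    "format": "json",
    "formatversion": 2,
}

# Wikipedia asks API clients to send a descriptive User-Agent string.
resp = requests.get(API, params=params, headers={"User-Agent": "class-demo/0.1"})
resp.raise_for_status()

for change in resp.json()["query"]["recentchanges"]:
    # Each change may carry tags added by edit filters and bots.
    tags = ", ".join(change.get("tags", [])) or "(no tags)"
    print(f'{change["timestamp"]}  {change["title"]}  [{tags}]')
```

Running this and asking students what the different tags might mean is itself a nice conversation starter about how patrolling works.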

Ultimately, given that genAI content is showing up in so many places, there is no reason to suspect Wikipedia any more than, say, content in many of our databases. In fact, depending on the type of database, articles may have fewer eyes on the lookout for problematic content than Wikipedia does. Certainly, the high-profile Elsevier case and the growing use of AI in our “trusted” news outlets suggested to our editing group that we do not so much need to warn students off of Wikipedia as we need to teach them about the overall changing information landscape and how to work within it.

Here is our brainstorm of potential topics that we might integrate into our teaching that address the increased use of genAI in all sources and in Wikipedia:

Teach about:

– Critical reading of all potential source materials, including – but not limited to – Wikipedia 

– Recognizing AI-created content

– Identifying what on Wikipedia is “good information,” or learning when to use and not use Wikipedia

– Understanding that AI may be one of several factors that can add a level of inaccuracy to Wikipedia, and is one of many things editors watch out for with regularity

– Teaching about ethics of academic honesty

– Teaching about ethics of AI

– Teaching about AI and academic honesty

AI more generally:

– How do we recognize AI content?

– Google Search now generates AI responses to queries; does that make Wikipedia less relevant in our students’ information lives?

Wikipedia:

– Are there patterns on Wikipedia that are repeated with AI-generated content?

– Wikipedia:WikiProject AI Cleanup/AI Catchphrases is a wonderful source that records a number of phrases that may appear in AI-generated content, as does the Wikipedia:WikiProject AI Cleanup main page (one rough way to check an article against such phrases is sketched after this list)

– Category:Articles containing suspected AI-generated texts is another useful page

– There have been instances where text appeared on Wikipedia pages that even group members with almost no knowledge of generative AI recognized immediately, such as the (long-since fixed) page on the I Ching, where the pasted text even gave itself away quite explicitly

– What are positive uses of AI on Wikipedia? (examples: helping with grammar, helping with sources, flagging possible vandalism)

– Look, as a class, at Wikipedia:WikiProject AI Cleanup and follow links to read and discuss the various impacts of AI on Wikipedia, and possibly extend that learning to other types of sources as well

– How do Wikipedia reviewers recognize vandalism?

– How quickly is Wikipedia “cleaned up” after an issue is flagged?

– How quickly is AI “cleaned up”?

– Look at recent changes page

– What are Wikipedia’s rules regarding AI-generated content?

– Does AI-created content violate the “No original research” rule? (based on Village Pump article)
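For the AI Catchphrases idea above, here is a rough sketch of an exercise a class might try: pull an article’s plain-text extract through the MediaWiki API and check it against a handful of phrases of the sort that project page collects. The phrase list below is purely illustrative (copy the current list from Wikipedia:WikiProject AI Cleanup/AI Catchphrases before using it), and a hit only means “look closer,” not “AI wrote this”:

```python
# A rough classroom sketch: fetch an article's plain-text extract and flag
# phrases of the kind collected at Wikipedia:WikiProject AI Cleanup/AI
# Catchphrases. The phrases below are illustrative placeholders only.
import requests

API = "https://en.wikipedia.org/w/api.php"

SUSPECT_PHRASES = [
    "as an ai language model",
    "as of my last knowledge update",
    "i hope this helps",
]

def fetch_plain_text(title: str) -> str:
    """Fetch the article's plain-text extract via the MediaWiki API."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }
    resp = requests.get(API, params=params, headers={"User-Agent": "class-demo/0.1"})
    resp.raise_for_status()
    return resp.json()["query"]["pages"][0].get("extract", "")

def flag_phrases(title: str) -> list[str]:
    """Return any suspect phrases that appear in the article text."""
    text = fetch_plain_text(title).lower()
    return [phrase for phrase in SUSPECT_PHRASES if phrase in text]

if __name__ == "__main__":
    print(flag_phrases("I Ching") or "No listed phrases found (the expected result).")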

So, we apologize that this is kind of a quick-and-dirty set of thoughts without many clear answers. Once more, however, we were all in agreement: Wikipedia appears no more riddled with AI-generated disinformation than other types of information sources, so learning to assess the quality of whatever you are reading is key.
