Thursday, June 4, 2026

Fanfiction writers battle AI, one scrape at a time


In the online world of fanfiction writers, who pen stories inspired by their favorite movies, books, and games, and share them for free, there are unspoken codes of conduct. Among the most important: never charge money for your fanfic, and never steal other people’s work.

It makes sense, then, that fanfic writers were among the first creators to raise the alarm about their work being fed into the large language models powering generative AI without their knowledge or permission. But their efforts to stop the encroachment of AI into fan spaces are an uphill battle.

The latest salvo came in early April, when user nyuuzyou scraped 12.6 million fanfics from the online repository Archive of Our Own (AO3) and uploaded the dataset to Hugging Face, a company that hosts open-source AI models and software.

Nyuuzyou’s upload was quickly discovered by the Reddit community r/AO3, where hundreds of users posted furious reactions. A Tumblr account, ao3scrapesearch, built a search engine that allowed authors to search their usernames and see whether their work had been scraped by nyuuzyou.


Fanfic writers flooded the comment section of the dataset on Hugging Face, getting into arguments with AI defenders. One user, Dckchili, defended nyuuzyou’s scrape, claiming that it didn’t matter because Big Tech crawler bots had already scraped the archive numerous times. Another, RaraeAves, argued that “the creeps” were counting on fanfic writers not to fight back when their labor and creativity are being exploited.

When Nikki, a Star Wars fanfic writer who goes by infinitegalaxies online, typed her name into the search engine, she saw that more than 70 of her fics had been scraped. But one jumped out: a collective essay she’d co-authored with 11 other writers and uploaded to AO3 to raise awareness about the threat of AI to fandom. The irony did not escape her.

Nikki mostly writes fanfiction about Reylo, the romantic pairing (or “ship”) of the characters Rey and Kylo Ren from the Star Wars sequel trilogy. The Reylo fandom is close-knit and prolific, with more than 30,000 Reylo stories posted to AO3. About half are set in the canon Star Wars universe of lightsabers and space adventures, but the other half take place in alternative universes and explore everything from coffee-shop romances and workplace dramas to medieval knights and fairy kingdoms. One particularly beloved fic in the fandom is set in 1994 and recasts Kylo Ren as Kyril, a mafia boss in newly post-Soviet Russia. The fandom has produced writers like Ali Hazelwood and Thea Guanzon, who have made the leap from fanfic to become highly successful published romance authors.

For Nikki, the Reylo fandom offered a new sense of belonging. She found a home in the supportive community of writers and readers and relished the freedom to write whatever she wanted.

“Fandom is largely a gift economy. We’re just here to have fun and do things out of the goodness of our heart. And to give things to each other and make work in community,” Nikki says.

This sentiment is echoed by many others in the Reylo community, including Em, who writes under the pen name okapijones. Em fell in love with Rey and Kylo Ren because they embodied the light/dark, enemies-to-lovers archetype that reminded her of Beauty and the Beast and Pride and Prejudice. But she hated the way their story ended in the Star Wars sequel trilogy and went looking for other fans who wanted a different ending.

“Fic changed my life. I have met some of the best friends that I have ever had through fic and through the fanfiction community,” Em says. “There’s no rules, there’s no editors. It’s a pure creative playground, and that is going to breed innovation. Some of the most creative stories I’ve ever read, some of the wildest storytelling, is fanfic. And that excites me as a creator, because you can just do whatever you want.”

“This is something that takes time and effort and your heart and your soul, and you do this in a community,” Nikki says. “And then you’re telling me you’re just going to poop it out two seconds on a screen. And I was just like, who asked for this? This is gross.”

In 2023 came Sudowrite’s Story Engine, powered in part by OpenAI’s ChatGPT. Nikki remembers watching a video about the new “writing assistant” AI software, which lets users enter details about characters and plot points and generate an entire novel. She was so appalled that it made her cry. Nikki, who works for a software company, had already seen her workplace shift toward integrating AI. But she hadn’t imagined her hobby would be affected by it too.


Later that year, users found that Sudowrite readily produced highly specific sexual terms tied to Omegaverse, a fanfiction trope built around wolf biology, indicating that ChatGPT had likely been trained on fanfic without the authors’ knowledge.

Since then, Nikki and many others have been advocating against AI in all its forms in fandom, including using AI to generate fanfic or fanart.

“It’s theft at its core. There’s no ethical use of something that’s built on stolen labor,” Nikki says. Although she’s against genAI in principle because of its reliance on data taken without consent, she also says it breaks with fandom norms of free exchange.

“I did it because I love those characters, because I wanted to play in that sandbox, because I wanted people who also love them to read it. It is a gift,” Em says. “They stole it without my permission.”

But over the last few years, fanfic writers say, there have been numerous examples of genAI entrepreneurs trying to cash in on their work. Cliff Weitzman, the CEO of text-to-voice app Speechify, was found to have scraped thousands of fics from AO3 and uploaded them to WordStream, a website linked to his app, without the authors’ permission. (He swiftly removed them after fans pushed back on social media.) Then there was Lore.fm, a text-to-speech app from Wishroll Inc., which marketed itself on TikTok as “Audible for AO3.” The app was announced in May 2024 but was withdrawn later that month after fan pushback.

“It’s like a whack-a-mole thing. Every time you turn around, there’s, like, another grifter trying to steal your shit,” Nikki says.

It may seem odd to hear such a strong sentiment from a writer who, like most fanfic creators, uses copyrighted intellectual property as a “sandbox” to make up their own stories. But advocates for fanworks say they are “transformative,” meaning a “fanwork creator holds the rights to their own content, just the same as any professional author, artist, or other creator,” according to AO3. This is very different from what an LLM does when, for example, it generates a novel based on prompts. AI can’t replicate the creative human process of “transformation,” which involves inventing and integrating new ideas. LLMs can only reshuffle and regurgitate content that already exists.

And, unlike the AI-generated books flooding Amazon, one of the principles of fanfiction is that writers do not make any profit from their work.

That hasn’t stopped AI from infiltrating fandom in other controversial ways. Some readers, eager for new updates to their favorite fics, have taken to uploading them into ChatGPT to generate new chapters, much to the consternation of some authors. In response, some writers have locked their stories so that readers need an AO3 account to access them, or have deleted them from the internet altogether.

In the case of nyuuzyou’s scrape, fans coordinated online to file takedown notices under the Digital Millennium Copyright Act (DMCA), and the Organization for Transformative Works (OTW), the nonprofit that administers AO3, also filed a takedown. On April 9, Hugging Face disabled the dataset. OTW responded to user concerns about fanfics being scraped in a board meeting on April 26, saying, “We have added a CloudFlare tool to prevent AI scraping and other bots. This helps a lot but is not perfect. However, more robust solutions would have a significant negative impact on some of our users, especially those using older devices.”

Nyuuzyou remained unrepentant, filing a counternotice and reuploading the dataset to sites hosted in Russia and China, which are far less responsive to DMCA complaints. Contacted by The Verge via a Telegram account linked on his Hugging Face profile, nyuuzyou said he was an 18-year-old student and IT worker in Russia who is “not interested in fanfiction” and uploaded the dataset for “legitimate research purposes.”

“My goal was to support community research in areas like content moderation, anti-plagiarism tools, recommendation systems, and archival preservation,” nyuuzyou wrote via Telegram. “I think a lot of the disagreement comes from misunderstandings about why these datasets exist. This was never about creating chatbots or large language models for commercial use.”

Founded in 2016 by French entrepreneurs, Hugging Face started out building chatbots for teenagers. Since then, the company has expanded to hosting open-source models with the stated aim of “democratizing AI” by making machine-learning development accessible to the public.

“Our goal is to enable every company in the world to build their own AI,” Jeff Boudier, Hugging Face’s head of product, told Amazon Web Services (AWS) in February. But Hugging Face is deeply connected to large companies. In addition to its ongoing collaboration with AWS, IBM took part in a $235 million funding round for Hugging Face in 2023 and announced it was collaborating with the company on watsonx, IBM’s generative AI platform.

Nyuuzyou said he was surprised by OTW’s aggressive reaction to the dataset, writing, “I had hoped for dialogue about how research datasets might align with preservation goals.”

โ€œThatโ€™s really disingenuous,โ€ says Alex Hanna, director of research at the Distributed AI Research Institute and author of The AI Con: How to Fight Big Techโ€™s Hype and Create the Future We Want. Sheโ€™s skeptical of the idea that any dataset uploaded to Hugging Face wouldnโ€™t ultimately be used to train LLMs. โ€œWhy would you have a large tranche of unstructured data available on the web if not to train a language model?โ€

Although individual scrapers like nyuuzyou are small fry in the wider economy of genAI, which is dominated by billion-dollar companies like OpenAI, Hanna says it’s still up to sites like AO3 to aggressively protect their users’ work. As for fanfic writers themselves, she thinks Nikki’s whack-a-mole strategy is the way to go. “Trying to knock this stuff down, that’s probably the best thing that one can be doing now,” Hanna says.

Nikki and Em, the fanfic writers, had a more heated response to nyuuzyou’s explanation for the scrape.

โ€œFuck you, dude,โ€ Em says. โ€œWe do free labor for the love of the game and are not profiting off of it โ€” other than creating a community, gaining practice for our craft and creating content for characters and stories that we love. And that is being stolen to fuel things that have such larger implications.โ€

Nikki says she’s determined to keep pushing back against AI’s encroachment into fandom spaces.

“I don’t go looking for a fight,” she says. “But when people come to us with a fight, I will fight.”
