thisaintbc: (Default)
I heard about this on tumblr, and went digging a little to see what I could verify. Here's what I've got.

Earlier this year, someone scraped AO3 for a dataset, seemingly intended for AI training. Data scrapes aren't new, but this one was large and the person who did it has been sort of flippant about takedown requests. AO3 wasn't alone; several other sites were also scraped.

This writeup lays out the progression of events pretty comprehensively, but the tl;dr version is that the dataset was uploaded to HuggingFace, then to a few other websites. At some point, the scraper also created their own website to host it. After receiving DMCA notices, most sites have taken down or deleted the data. However, the scraper filed a counter-notice.

I reached out to the OTW (my full message is below the cut at the end of the post) asking whether they've pursued legal action against data scraping before & whether they intend to this time. I'll post an update when I get a response.

The takeaways: 
  • The dataset is currently down in most locations where we can realistically expect it to be taken down. PaperDemon advises against visiting the scraper's personal website and I'm inclined to agree; this person has already shown that they don't hold the same ethical values we do. It was also apparently still up on a site called datafish as of 6-7 hours ago; I'm not sure whether that's the personal website in question or something else, and frankly I haven't spent much time looking into that piece. You probably don't need to worry about filing DMCAs at this point.
  • That said, the dataset might go back up. At 10:29 GMT on April 11, the scraper commented that they had filed a counter-notice. From the date of receipt of the counter-notice, the copyright holder usually has 10-14 business days to file a lawsuit. If no legal action is taken, the data can go back up. Today is the 11th business day since the counter-notice.
  • HuggingFace's post says the DMCA was from AO3 (actually, "from the representatives of Transformative Works"), so presumably the counter-notice was also sent to AO3. It's possible that they're referring to AO3 when they actually mean an individual, but if we take their statements at face value that means the ball's in the OTW's court.
  • Things are a little heated on HuggingFace. AO3 users are using strong language, and in response some of the people on HuggingFace have started talking about things like torrents, private websites, etc. 
  • So far, the OTW has been silent. That isn't hugely unusual; this is far from the first data-scraping incident, and a lot of this stuff gets handled behind the scenes. But this incident has attracted significant attention, on tumblr at least, so I was hoping they would have some kind of statement.
  • Not all AO3 data scraping is unethical. There are plenty of people who scrape AO3 for legitimate fannish purposes, from gathering fandom stats to Auto AO3 (that website people use to look up gift exchange requests). If in some future world we decide to Do Something about data scraping, I think it's important to bear in mind the distinction between ethical vs unethical data scraping. The scraper in this incident has said that they aren't doing this for profit, but to me this absolutely feels like the same thing as a web novel website copying things from AO3 to try to sell.
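
For anyone who wants to sanity-check the counter-notice window from the second bullet, here's a rough Python sketch of the date math. It rests on assumptions I can't verify: that the counter-notice was received the same day the scraper commented (April 11), that the year is 2025, and that only weekends are skipped (no holidays). The function name is just something I made up for illustration.

```python
from datetime import date, timedelta


def add_business_days(start: date, n: int) -> date:
    """Return the date n business days (Mon-Fri) after start.

    The start date itself is not counted, and holidays are ignored.
    """
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday through Friday
            remaining -= 1
    return current


# Assumption: received on the day of the comment, in 2025.
received = date(2025, 4, 11)
window_opens = add_business_days(received, 10)
window_closes = add_business_days(received, 14)
print(window_opens, window_closes)
```

Under those assumptions the 10-14 business day window runs from April 25 to May 1. If the notice was actually received later, both ends shift accordingly.
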
This incident also highlights for me something I've been saying in private for a while: We need someone doing some kind of AO3 news reporting. The Daily Dot was sort of doing this for a while, but their fandom reporting seems to have fallen off - and they really only focused on major issues and interesting features. What we need is something more comparable to local journalism, something that helps us understand the AO3 equivalents of water rates and pothole repairs.

Right now, it feels like many of us rely on word of mouth and the occasional OTW press release for information. I'm not criticizing OTW's press releases, but like any organization they tend to be slow and careful in putting things out, only wanting to write after things are resolved and they can provide solid answers. But sometimes we need to know there are questions to be asked.

Maybe somebody's already doing this and I'm just unaware of it (lbr, if it exists it's probably here on dreamwidth), so if you know of something like this please drop it in the comments.

Here's what I sent OTW:



I'll let you know what kind of response I get.

kerfuffle

Jul. 30th, 2023 01:40 pm
 
Most of you are probably already aware of the recent kerfuffle about AO3's treatment of certain volunteers, the resignation of three board members, and the ongoing election. If you aren't, this journal does a decent job of summarizing things. In reading up on these events and trying to understand what's going on, I've found myself consistently surprised by pieces of information that just sort of come up in discussion.

A few things I have learned about AO3 over the past few days:
  • Board members can serve as Committee Chairs while serving their board term. I understand why it was done this way when the org was first started, but this so obviously seems to fly in the face of any reasonable system of checks and balances. 
  • Many OTW volunteers aren't members, and therefore can't vote in elections. I guess I should have known this, because membership is entirely donation-based and there's no mention of it anywhere on the volunteering page, but I didn't! The fact that someone can volunteer hundreds of hours and not get to vote, but I get to vote because I mailed AO3 a check for less than a meal at a cheap diner would cost me, is jaw-dropping.
  • It is incredibly difficult to find information about the OTW's internal structure. 
    • The board currently has seven seats. This number is nowhere on the Board's fanlore page and I also didn't find it on the OTW's Board of Directors "About" page; I figured this out by going back through the past few election cycles and counting the number of people on the board post-election. Presumably (hopefully) this is somewhere in the elections information and I just didn't see it, but having to hunt for something that seems like pretty basic info is frustrating.
    • The OTW currently has 18 committees. The About page has more information about this than about the board, but things like how committee chairs are selected are still incredibly opaque. I've been using AO3 for over a decade and before today I probably would've been able to name fewer than half of the committees.

Some (maybe a lot!) of you are probably already familiar with these things, but I feel like I don't see people talking about the first two except incidentally in regard to other topics? And I guess I was just wondering if anyone else was flabbergasted by this, or had other things you've learned about the structure of the OTW you wanted to share, or...anything.
thisaintbc: (Default)
In the wake of the OTW’s finance chat, a lot of people on twitter seem to be discussing the possibility of ao3 introducing either full time, part time, or contract-based paid positions, and I’m curious what people in my corner of fandom think those positions should be! I made a poll on twitter (I don’t think you need a twitter account to vote)—I don’t have a paid dw account, so I don’t think I can make a poll here, but if you have an opinion feel free to tell me about it on twitter, comment here, both, neither, wherever!

Page generated Jun. 17th, 2025 03:48 am
Powered by Dreamwidth Studios