Wikipedia, the online nonprofit encyclopedia, laid out a simple plan to ensure its website continues to be supported in the AI era, despite its declining traffic.
If AI companies paid fairly for their training data, they’d be making the biggest losses in human history.
It’s almost like all successful capitalist business is based on theft and exploitation.
In the age of AI slop that you can’t trust, Wikipedia use is going down??
People think they can trust the slop, is the thing. If they even think that far ahead, they probably assume that an answer that exists on Wikipedia will just be provided by the AI, saving them the time of searching for it themselves. I’ve heard more than one horror story of ChatGPT use in particular backfiring on someone who somehow legitimately thought it was just another form of search engine and didn’t verify the information provided.
Kind of funny: When Wikipedia was new, people often said that you couldn’t trust information on it because anyone could have written it, even if they were unqualified, biased, or deliberately deceptive. I guess that’s still true today, but with the advent of automated misinformation generators, the Wiki almost seems authoritative in comparison.
Yeah, when I was at school in the early 00s we were specifically banned from referencing Wikipedia as a source because it was seen as untrustworthy.
Which is ridiculous. Everybody knows that the reason you should be banned from referencing Wikipedia as a source is that an encyclopedia is not a source.
Uh, it’s a tertiary source. It’s still a source, just not one you should be directly citing. They’re great for finding other sources though.
I got an F for plagiarism when I looked up the wiki, dove deeper into the sources, and tried to incorporate the ideas without copying word for word. Apparently 65% was flagged as direct plagiarism from Wiki, even though I used the sources to write my essay. I was in 6th grade.
If we’re being pedantic, yeah, but ‘source’ without qualifiers to me would refer to the one you’d cite. Wikipedia is great for finding general information, and then as you say, finding the source for that information (and also generally a lot more depth to the summary that’s on Wiki).
Tl;dr use Wiki, don’t cite Wiki
You’re supposed to reference the articles that Wikipedia references, not Wikipedia itself
Can confirm, I’ve been a Wikipedia zealot the entire time and people really do seem to have accepted it. If you ignore what else makes them cheer, it’s a huge victory.
Doubtful
I don’t get it though… Why would any company use this when Wikimedia also offers a download of the entirety of Wikipedia, for free?
Maybe it’s because if the AI companies don’t know about the free download, Wikimedia can hopefully get a little money from them?
You think AI companies care what they scrape? Their systems are set up to scrape anything they can get.
Oh I know, I was just thinking that if the AI companies will make an exception for Wikipedia (by paying) like the Wikimedia people think, they could also download the complete thing for free. But yeah, they probably won’t do any of that, so this was kinda useless I think.
They can scrape an ongoing log of interactions between editors about the articles themselves, which is probably fairly worthwhile content honestly. More content there than in articles probably as well.
From skimming that linked page, I think that this download perhaps doesn’t include recent pages? Because in the section talking about enterprise stuff, it mentions the paid API for recent articles
It seems you’re right, I’m just dumb and didn’t read the article I linked
Can’t you just download the entire thing for free?
I imagine this would be discouraged for corporate entities. Corps shouldn’t freeload.
Honestly, this would only work if the AI companies were actually ethical, which… they’re not known to be.

Paid API…
Wikipedia = Reddit
Wikipedia (or the Wikimedia Foundation) is mostly driven by donations and volunteers, unlike Reddit…
Also, scraping every page on Wikipedia is incredibly heavy, especially compared to things like downloading a compressed copy of the entire site through torrents.








