It's not really their stuff though. The content comes from the users. And part of the reason users are willing to contribute to SE is because of the licensing model and the fact that the data is available outside of SE. Obviously this is more important to some users than others, and probably some percentage don't care about it at all. It's hard to say what those percentages actually look like though.
IF (and it's definitely an "IF") this is an intentional and permanent change by SE management, they are fundamentally changing the basic understanding between users and SE, and they have to understand that some subset of users are likely to quit using SE in response. Again, it's hard to say how many. Maybe enough to have a material impact, or maybe not. That would be the gamble they'd be taking though.
> they are fundamentally changing the basic understanding between users and SE
Given the way they've communicated in the last 2 weeks or so, this seems pretty clear. Before we had employees engaging as real human beings all over the place, and you were talking to Jon, Tim, Robert, Shog, etc. and not "Mr. Ericson, title such-and-such, representing Stack Exchange Inc."
Now all we have is a bunch of announcements, with no discussion, engagement, or any recognition that anything is even being read. It feels like pissing in the wind – disagreement is one thing, reasonable people can disagree, but ignoring is so much worse; it's like you're not even taken seriously.
Stack Exchange has gone through various phases (e.g. the "Jeff era" was different from the "stagnation era" that followed after he left), but the implied social contract was always that the community would offer their spare time and in return they would get a platform and some voice in how that platform is run. There have certainly been moments of friction in this relationship, but the basics of it never changed until now (not even with the whole debacle surrounding the firing of a moderator a few years back).
Before LLMs were released in a form that anyone could run, the amount of slurping was probably manageable. Now that anyone can train an LLM and SO/SE/Reddit/etc. are obvious places to go for training data, I can see how the systems would easily be overwhelmed. People contribute to SO/SE because it's a common place to go for community help. Training a for-profit chatbot on community data that the community didn't provide to the chatbot seems to break the spirit in which the contributions were made. I'm on the fence about the argument, but definitely leaning toward not liking all of the model training for free.
A lot of people were only willing to contribute to StackOverflow because of the CC licensing, trusting the knowledge wouldn't be locked up. As a business that depends on vast amounts of volunteer effort, they need to balance providing a site where people are willing to contribute against making as much money as they can.
I mean they would just scrape it if there's no data dump. It just makes it harder for the small guys. They probably scraped and are scraping HackerNews.
Generative AI doesn't respect copyright or even explicit software licenses, as we have seen with human signatures showing up in AI art and with Microsoft's GitHub Copilot.
There was always the possibility of some sort of aggregator/other front end sitting on top of the SO data. We just didn't know exactly what a successful one would look like until relatively recently. I always limited how much I contributed based on that as a likely outcome. Discontinuing the data dump is a much bigger deal to me and completely changes the value proposition of their various sites.
For what it's worth, as someone who has put a lot of writing online, I'm not bothered by having my writing included in the training sets of these LLMs. I write because I want to share knowledge, and it isn't important whether people get the knowledge directly from me versus mediated by friends, LLMs, etc.