Since this seems to be written partly in response to (and honestly, to take advantage of) the recent Slack AI training panic, I took a look to see how Slack have updated their materials since then.
I think these updates are really good - Slack's previous messaging around this (especially the way they conflated older machine learning models with new policies for generative AI) was confusing, and it wasn't surprising that it caused a widespread panic.
It's now very clear what Slack were trying to communicate: they have older ML models for features like channel recommendations, which work the way you would expect such models to work. They have a separate "Slack AI" add-on you can buy that adds RAG features powered by a foundation model that is never further trained on user data.
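For anyone unfamiliar with the term, RAG ("retrieval-augmented generation") just means fetching relevant private documents at query time and pasting them into the prompt; the model's weights never change. A rough sketch of the pattern in Python - purely illustrative, with made-up helper names, not Slack's actual implementation:

    # Illustrative RAG flow: user data only ever appears in the prompt
    # at query time; the foundation model's weights are never updated.

    def retrieve(query, documents, k=2):
        """Naive keyword-overlap retrieval over private documents."""
        def score(doc):
            return len(set(query.lower().split()) & set(doc.lower().split()))
        return sorted(documents, key=score, reverse=True)[:k]

    def build_prompt(query, context_docs):
        """The only place user data touches the model: as prompt context."""
        context = "\n".join("- " + doc for doc in context_docs)
        return "Answer using only this context:\n" + context + "\n\nQuestion: " + query

    def call_frozen_model(prompt):
        """Stand-in for a call to a hosted foundation model with fixed weights.
        A real system would send the prompt to an LLM API; no training step runs."""
        return "[model answer based on a prompt of %d characters]" % len(prompt)

    private_messages = [
        "Q3 launch moved to October",
        "Budget review is on Friday",
    ]
    question = "When is the budget review?"
    print(call_frozen_model(build_prompt(question, retrieve(question, private_messages))))

The point of the sketch is the thing Slack's updated docs are trying to say: in this architecture your data is an input to each request, not an input to training.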
I expect nobody will care. Once someone has decided that a company might "train AI" on private data you've already lost that person's trust. It's not clear to me if any company has figured out how they can overcome one of these AI training panics at this point.
>I expect nobody will care. Once someone has decided that a company might "train AI" on private data you've already lost that person's trust. It's not clear to me if any company has figured out how they can overcome one of these AI training panics at this point.
I think it goes beyond a single company, or rather a single instance of this panic. You're looking at each time this happens as an independent coin flip instead of as one in a series of dominoes that trigger a reaction in multiple directions.
What I mean by that is there's a counterculture sentiment building, based on the idea that people have seen this same pattern enough times at this point that they're distrustful of large-scale systems by default. It's happening with government institutions, politics, economics, and individual industries like gaming and streaming.
To that end, the "panic" is not just a reaction to Slack's (perceived) actions, but an expectation that Slack will be yet another domino in that line of companies that have done the same. It's also difficult to prove a negative (that Slack isn't using private data for training purposes even if they say they're not), so the messaging is up against a very solid wall.
The result here is that public announcements and messaging related to data are under heavy scrutiny, and the media is incentivized to try to make its reporting go viral (ironically, for the ad revenue) at the expense of actual journalism.
I'm not sure what the solution to this problem is, or if there even is one, but promoting self-hosting seems like an indicator that the default assumption is that data collected will be abused in some way. Honestly, based on the last couple of years, it's not an unreasonable assumption either.
Yeah, I think Slack's updated internal policies, as stated, are about as reasonable as one can hope for from a tech giant, if one can trust them to stand by those policies. Your article was on my mind when writing this; I guess I should have linked it.
The crux of the matter is whether you can trust a big tech company to do what they claim they will. They all think AI is worth infinite dollars. In that world, without some very clear, painful, straightforward contractual penalty ... well, we've seen that the tech giant plan is that rules are meant to constrain your competitors' behavior, not yours.
If they wrote "If any of your data is discovered to have been in an AI training model, Slack owes you 10x your lifetime payments to Slack, and any involved whistleblowers get 1% of the total paid" in their terms of service (meaning that if Slack screws this up, the company is immediately bankrupt), that might prove effective. But a promise in a "privacy principles" policy that doesn't appear to actually be incorporated into the core ToS does not have a lot of teeth.
This does seem to be one of the key challenges here: publishing a "principles" document doesn't mean much if you reserve the right to change those principles in the future!
I think you're right: the most convincing version of this would be actual legalese.
I wouldn't be surprised if Slack have this in the contracts they sign with their larger customers, but I don't think those are publicly available.
These documents are new in the last few days:
https://slack.com/blog/news/how-we-built-slack-ai-to-be-secu...
https://slack.com/intl/en-gb/blog/news/how-slack-protects-yo...
I wrote a bit about this back in December when it happened to Dropbox - there is an AI trust crisis at the moment: https://simonwillison.net/2023/Dec/14/ai-trust-crisis/