After launching Slack AI in February, Slack appears to be digging its heels in, defending its vague policy that by default sucks up customers’ data—including messages, content, and files—to train Slack’s global AI models.
↫ Ashley Belanger at Ars Technica
I’ve never used Slack and don’t intend to ever start, but the outcry about this reached far beyond Slack and its own communities. It’s been all over various forums and social media, and I’m glad Ars dove into it to collect the various conflicting statements, policies, and blog posts Slack has made about its “AI” policies. However, even after reading Ars’ article and the coverage at other outlets, I still have no idea what, exactly, Slack is or is not using to train its “AI” models.
I know a lot of people here think I am by definition against all forms of what companies are currently calling “AI”, but this is really not the case. I think there are countless areas where these technologies can make meaningful contributions, and a great example I encountered recently is the 4X strategy game Stellaris, one of my favourite games. The game recently got a big update called The Machine Age, which focuses on changing and improving the gameplay when you opt to play as cybernetically enhanced or outright robotic races.
As per Steam’s new rules regarding the use of AI in games, the Steam page included the following clarification about the use of “AI”:
We employ generative AI technologies during the creation of some assets. Typically this involves the ideation of content and visual reference material. These elements represent a minor component of the overall development. AI has been used to generate voices for an AI antagonist and a player advisor.
↫ The Machine Age Steam page
The game’s director explained that during the very early ideation phase, someone like him, who isn’t a creative person, might generate a piece of “AI” art and put it up on an ideation wall with tons of other assets just to get the point across, after which several rounds of artists and developers mould and shape some of those ideas into a final product. None of the early “AI” content makes it into the game. Similarly, while the game includes generated voices for an AI antagonist and a player advisor, the voice actors whose work was willingly used to generate those lines are receiving royalties for each of them.
I have no issues whatsoever with this, because here it’s clear everyone involved is participating in an informed manner and entirely willingly. Everything is above board, consent is freely given, and everybody knows what’s going on. This is a great example of ethical “AI” use: tools that help people make a product more easily, without stealing other people’s work or violating various licenses in the process.
What Slack is doing here – and what Copilot, OpenAI, and the various other tools do – is the exact opposite of this. Consent is only sought when the parties involved are big and powerful enough to cause problems, and even though they claim “AI” is not ripping anyone off, they also claim “AI” can’t work without taking other people’s work. Instead of being open and transparent about what they do, they hide themselves behind magical algorithms and shroud the origins of their “AI” training data in mystery.
If you’re using Slack – and odds are you are – I would strongly consider urging your boss to opt your organisation out of Slack’s “AI” data theft operation. You have no idea how much private information and corporate data is being exposed by these Salesforce clowns.
I moved my workplace from Slack to self-hosted Mattermost a couple of years ago and we haven’t looked back. Slack has more features that might be good for larger companies, but we’re small potatoes and just need reliable chats with searchable history, the ability to make channels and private spaces for siloing projects, and good mobile clients. It can be hosted on something as small as a Raspberry Pi, or in a container or VM.
Ditto here.
Mattermost has been a really good tool for us at $WORK.
I don’t have exact numbers, but somewhere between 120 and 150 people use it daily.
Thom,
Just for a moment, I thought I would agree with you. But once again we differ on the issue of AI… or, more properly, machine learning, as AI is usually secondary in these efforts.
Anyway, as a longtime AI/ML engineer and someone with a PhD in this area, I can probably say one thing:
The most important data source to improve a product is its own actual usage logs.
In the past, this was done by “Data Analysts” with statistical tools, answering questions like “how do we increase our usage metrics?” or “what are the main pain points in our products?”. There were also security folks who used tools like Nagios or SolarWinds to look for anomalies and strengthen the systems.
Today, those tasks are automated and done by the AI (okay, ML). And… I would argue it is better to have an agnostic model look at your data than a human being who is allowed to run SQL queries on actual user information (which they have to do in order to do their jobs).
So, overall, it is a win for everyone. The product gets richer, security and privacy improve, and users get a better experience in return.
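To make that concrete, here’s a toy sketch of what I mean by an “agnostic model” looking at usage data. It’s purely illustrative: the features and numbers are made up, and it uses scikit-learn’s IsolationForest rather than whatever Slack actually runs.

```python
# Toy sketch: an "agnostic" anomaly detector over usage logs,
# standing in for what a human analyst would otherwise query by hand.
# Feature names and data are made up for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Pretend usage logs: [messages_sent, files_uploaded, logins_per_day]
normal_usage = rng.normal(loc=[50, 5, 3], scale=[10, 2, 1], size=(1000, 3))
odd_usage = rng.normal(loc=[500, 80, 40], scale=[50, 10, 5], size=(10, 3))
logs = np.vstack([normal_usage, odd_usage])

# The model only ever sees aggregate numeric features,
# not message contents or anything a human would read.
detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(logs)

flags = detector.predict(logs)  # -1 = anomaly, 1 = normal
print(f"flagged {np.sum(flags == -1)} suspicious accounts out of {len(logs)}")
```

The point is that nobody has to read anyone’s messages to answer “is anything weird going on with usage?”.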
I would agree if we were only talking about ML, where the output is just how well the model performs. In this case, though, we are talking about LLMs, and that is a different beast. A machine learning model is not going to provide my secrets to a competitor in answer to a question. An ML model is not going to produce a new creative work of art in my own otherwise unique style.
I have two problems:
– I completely agree with the AI companies that for LLMs to work, they need access to ALL the information
– I completely agree with the original content owners that using their works to train LLMs is a copyright problem
tanishaj,
This entirely depends on how they handle the situation.
The ideal solution would be having one “model” for each organization they serve, or, more realistically, an “adaptation” of a base model for each one. (The base weights would be shared, possibly borrowed from a public model like Llama, but the per-org data stays private and small.)
Google, for example, does this for corporate customers:
https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-unveils-ai-and-ml-privacy-commitment
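Roughly what per-organization adaptation could look like in code, sketched with the Hugging Face peft library. The base model name, target modules, and hyperparameters are placeholders for illustration, not what Google, Slack, or anyone else actually runs.

```python
# Sketch of per-organization adaptation: one shared, frozen base model,
# plus a small set of private adapter weights trained per customer.
# Model name and hyperparameters are placeholders for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE_MODEL = "meta-llama/Llama-2-7b-hf"  # shared base weights

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# LoRA adapter: only these low-rank matrices are trained on the
# organization's data; the base weights stay frozen and shared.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
org_model = get_peft_model(base_model, lora_config)
org_model.print_trainable_parameters()  # a tiny fraction of the full model

# Training on the organization's private corpus would happen here (omitted);
# afterwards only the small adapter is saved, one per organization.
org_model.save_pretrained("adapters/acme-corp")
```

Each customer’s data only ever touches their own adapter, while everyone benefits from the shared base.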
The second-best option is using industry-standard privacy-preserving algorithms (such as differential privacy) that capture only very common patterns and never learn anything private to an organization (like credit card numbers).
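As a toy illustration of that idea, here is differential privacy in its simplest form: releasing a noisy aggregate so that common patterns survive but no single record can be recovered. The data and parameters are made up.

```python
# Toy sketch of the differential-privacy idea: release noisy aggregates
# so common patterns survive but no individual record can be recovered.
# All values here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Pretend per-user statistic, e.g. "sent a message containing a payment link" (0 or 1)
user_flags = rng.integers(0, 2, size=10_000)

def dp_count(values, epsilon=1.0, sensitivity=1.0):
    """Laplace mechanism: true count plus noise scaled to sensitivity/epsilon.

    One user joining or leaving changes the count by at most `sensitivity`,
    so the added noise masks any single user's contribution.
    """
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return values.sum() + noise

print("true count:   ", user_flags.sum())
print("private count:", round(dp_count(user_flags, epsilon=0.5), 1))
```

The aggregate stays useful, but no individual credit card number, message, or user ever shows up in what the model “learns”.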
What is AI, Thom?