Slack privacy furore: Company updates legals, says no customer data used to train genAI.

Communicating RAG/ML is hard. In a growing number of companies, engineering, legal and communications teams need to sit down and thrash this out.

“To develop AI/ML models, our systems analyse Customer Data (e.g. messages, content and files) submitted to Slack”, Slack said in language that had sat in its privacy principles for over six months before being belatedly noticed and shared by both users and multiple publications last week.

Journalists writing about this over the past few days have now been treated to warnings from Slack about “inaccuracies” in their reporting: “inaccuracies” that stemmed from Slack’s own documentation, which the company has belatedly updated.

As of Friday, May 17, Slack now explicitly says: “We do not develop LLMs or other generative models using customer data” – and specifies that its systems analyse customer data including files to “develop non-generative AI/ML models for features such as emoji and channel recommendations.”

It still warns in the revised documentation that if users “want to exclude your Customer Data from helping to train Slack global models, you can opt out” (i.e. customers are opted in to training of Slack “global models” by default and must email the company to opt out), but the company says it has a range of safeguards in place to protect customer privacy.

It points customers to a security whitepaper.

This training helps Slack offer services like “autocomplete” and “emoji” suggestions, it said. Slack added in updated documentation that for autocomplete, “suggestions are local and sourced from common public message phrases in the user’s workspace. Our algorithm that picks from potential suggestions is trained globally on previously suggested and accepted completions. We protect data privacy by using rules to score the similarity between the typed text and suggestion in various ways, including only using the numerical scores and counts of past interactions in the algorithm.” (Have fun unpacking that, casual users.)
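That paragraph is dense, but the pattern it describes is a recognisable one: candidate phrases are sourced locally from the workspace, while the globally trained ranker sees only derived numbers, never raw text. A rough sketch of that pattern in Python follows; the feature names, weights and scoring rule here are illustrative assumptions, not Slack’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class CandidateFeatures:
    """Numeric signals only: the raw suggestion text stays in the
    workspace and never enters the globally trained scorer."""
    similarity_score: float  # rule-based similarity between typed text and suggestion
    times_suggested: int     # how often this completion was previously offered
    times_accepted: int      # how often users accepted it when offered

def rank_suggestions(candidates: dict[str, CandidateFeatures]) -> list[str]:
    """Rank locally sourced phrases using only scores and counts."""
    def score(f: CandidateFeatures) -> float:
        acceptance_rate = f.times_accepted / max(f.times_suggested, 1)
        # Illustrative fixed weights; per Slack's description, a real model
        # would learn these globally from suggested/accepted completions.
        return 0.7 * f.similarity_score + 0.3 * acceptance_rate
    return sorted(candidates, key=lambda text: score(candidates[text]), reverse=True)
```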

Its use of generative AI is limited to its opt-in-only “Slack AI” add-on, launched in February 2024, it said. This “uses off-the-shelf LLMs where the models are not updated by and don't in other ways retain Customer Data after a request to them… because Slack AI hosts these models on its own AWS infrastructure, Customer Data never leaves Slack's trust boundary, and the providers of the LLM never have any access to the Customer Data.”
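The architecture Slack describes, serving an off-the-shelf model inside its own AWS estate so that prompts never reach the model’s original provider and the weights are never updated, is a common deployment pattern. A minimal, hypothetical sketch of such a stateless call (the endpoint URL and payload shape are invented for illustration):

```python
import json
import urllib.request

# Hypothetical endpoint inside the vendor's own cloud account: the model
# provider never sees this traffic, and nothing here updates model weights.
SELF_HOSTED_LLM = "https://llm.internal.example/v1/generate"

def summarise_channel(messages: list[str]) -> str:
    """One stateless request: customer data travels in the prompt for
    this call only and is not retained by the model afterwards."""
    payload = {
        "prompt": "Summarise these messages:\n" + "\n".join(messages),
        "max_tokens": 256,
    }
    request = urllib.request.Request(
        SELF_HOSTED_LLM,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```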

The incident, which resulted in numerous users shutting down their Slack workspaces, is the latest example of a software company struggling to communicate how it uses user data for generative AI applications.

The complexities of explaining retrieval-augmented generation (RAG) workflows as well as other machine learning approaches in a privacy policy appear to be a particular reputational risk for companies.

(In RAG, a model pulls relevant information from documentation such as customer files to help answer questions, using both that retrieved data and its training data to create responses. The documents fetched are not “held” in the LLM or used to inform its future responses; they do not simply “become” part of the LLM’s repeatable knowledge.)
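In code terms, retrieval simply assembles a prompt; nothing in the loop touches the model’s weights. A toy sketch makes the point, with naive keyword overlap standing in for a real vector search and `llm` standing in for any stateless completion function:

```python
def answer_with_rag(question: str, documents: list[str], llm) -> str:
    """Minimal RAG loop: retrieve, then generate. The fetched document is
    placed in the prompt for this one request; the model is never updated,
    so the document does not 'become' part of its repeatable knowledge."""
    # 1. Retrieve: naive keyword overlap as a stand-in for vector search.
    def overlap(doc: str) -> int:
        return len(set(question.lower().split()) & set(doc.lower().split()))
    context = max(documents, key=overlap)

    # 2. Generate: the context travels in the prompt, not in the weights.
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm(prompt)
```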

Dropbox faced a similar issue in December 2023, when confusion over a new default toggle set to “share with third-party AI” caused an uproar that even saw AWS’s CTO “draw the wrong conclusion”; he later apologised after publicly flagging his privacy concerns to Dropbox.

Dropbox later attempted to explain to worried users that “only [their] content relevant to an explicit request or command is sent to our third-party AI partners [OpenAI] to generate an answer, summary, or transcript… your data is never used to train their internal models, and is deleted from OpenAI’s servers within 30 days.”

Slack said: "Our guiding principle as we build this product is that the privacy and security of Customer Data is sacrosanct, as detailed in our privacy policy, security documentation and SPARC and the Slack Terms."

A review by The Stack on May 17 noted, however, that not one of those documents makes any mention of generative AI or machine learning.

Slack meanwhile also collects user data to “identify organisational trends and insights,” a privacy policy from the Salesforce-owned firm adds. 

At the time of publication, the company had yet to respond to questions about what kind of organisational trends it pulls from customer data.

We will update this story when we receive a response.