Slack outage: "It's always DNS" (and, well...)

"A small number of third-party DNS resolvers did not properly interpret the change"

A Slack outage late last week was down to a DNS (Domain Name System) configuration change to the slack.com domain, the workplace collaboration software provider has confirmed. It blamed "a small number of third-party DNS resolvers [that] did not properly interpret the change, causing some users to be unable to connect to Slack" for the outage, which affected tens of thousands of users globally on September 30.

"It's always DNS" goes the old IT chestnut, and once again it was. The problem persisted even after Slack rolled back the configuration change at 10:02am (PT) on September 30. As the company noted: "Some third-party DNS resolvers still expected the configuration to exist in our slack.com domain. As a result, some third-party DNS resolvers failed to return records for the slack.com domain. Because some internet service providers and public resolvers persist records for up to 24 hours as a standard practice, the issue continued for a percentage of users."

"In the meantime, we reached out to third-party DNS resolvers to request they refresh the record for the slack.com domain, which would resolve the issue for Slack users on that DNS resolver.

"We’re grateful to the internet service providers and DNS community members who worked quickly to refresh records around the globe, improving conditions for many impacted users."

Slack did not detail the precise nature of the configuration change.
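
To see why a rollback alone did not end the outage, here is a toy Python model of a caching resolver. It is a sketch under stated assumptions, not a description of how any particular resolver works. The CachingResolver class, the authoritative function and the 192.0.2.10 address (from a range reserved for documentation) are all made up for illustration, and Slack has not said that failed lookups were cached in exactly this way.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CachingResolver:
    """Toy resolver cache: answers (including failures) are reused until the TTL expires."""

    def __init__(
        self,
        upstream: Callable[[str], Optional[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._upstream = upstream  # hypothetical authoritative lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[Optional[str], float]] = {}

    def resolve(self, name: str) -> Optional[str]:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, so the upstream is not consulted
        answer = self._upstream(name)
        self._cache[name] = (answer, now + self._ttl)
        return answer

    def flush(self, name: str) -> None:
        """Drop a cached entry, as an operator might when asked to refresh a record."""
        self._cache.pop(name, None)


def demo() -> List[Optional[str]]:
    state = {"broken": True}  # True while the bad configuration is live
    now = [0.0]               # fake clock, in seconds

    def authoritative(name: str) -> Optional[str]:
        # Illustrative stand-in: no answer while broken, a documentation IP afterwards.
        return None if state["broken"] else "192.0.2.10"

    resolver = CachingResolver(authoritative, ttl_seconds=24 * 3600, clock=lambda: now[0])
    results = [resolver.resolve("slack.com")]      # failure is cached
    state["broken"] = False                        # the rollback happens
    now[0] = 3600.0                                # one hour later
    results.append(resolver.resolve("slack.com"))  # stale failure is still served
    resolver.flush("slack.com")                    # operator refreshes the record
    results.append(resolver.resolve("slack.com"))  # fresh answer at last
    return results
```

In this model, demo() returns a failed lookup both before and one hour after the rollback, and a good answer only once the cached entry is flushed. That mirrors Slack's request that resolver operators refresh the record rather than wait out the cache lifetime.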

A quick online check of Slack's DNS propagation shows it relying on 33 different DNS resolvers -- from better-known organisations like OpenDNS, Google, and Cloudflare, to smaller local ISPs.

On another note... DNS security is still problematic

Use of the Internet relies on translating domain names (like “slack.com”) to IP addresses: the job of DNS.
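
For readers who want to see that translation step, a minimal Python sketch follows. It uses the standard library's socket.getaddrinfo, which hands the name to whatever resolver the operating system is configured to use. The helper name lookup_addresses is our own, chosen for illustration.

```python
import socket
from typing import List


def lookup_addresses(hostname: str) -> List[str]:
    """Return the sorted, de-duplicated IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP string.
    return sorted({info[4][0] for info in infos})
```

On a machine with network access, lookup_addresses("slack.com") should return the addresses currently published for the domain. The exact values vary by time and location, and the call raises socket.gaierror if the resolver returns no records, which is roughly what affected users' machines would have seen during the outage.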

Widely recognised as problematic for a range of reasons, DNS has come in for increased scrutiny in recent years, not least in the wake of mass DNS hijacking attacks in 2017 and 2018, covered in some detail by Cisco Talos here and Mandiant here. (Those attacks caused CISA to release its first-ever emergency directive.) One long-standing concern is that DNS lookups are generally unencrypted, since they have to be handled by the network to direct traffic to the right locations.
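
To make the "unencrypted" point concrete, the Python sketch below builds a classic DNS query by hand, following the message layout in RFC 1035. The function name build_dns_query and the query ID 0x1234 are arbitrary choices for illustration. Nothing is sent over the network here; the aim is only to show what the bytes of a traditional lookup look like.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in the RFC 1035 wire format."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, then three zero counts.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN, the Internet class).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question
```

The packet for "slack.com" contains the labels "slack" and "com" as plain ASCII. When such a query travels over ordinary UDP on port 53, anything on the network path can read which name is being looked up.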

Among the more recent DNS-related incidents was one presented by the team at Israel's Wiz.io at Black Hat in August 2021. They discovered a vulnerability that allowed them to intercept a significant portion of the worldwide Dynamic DNS (DDNS) traffic going through managed DNS providers like Amazon and Google.

(Read details of that attack here.)

Exploitation of that issue allowed Wiz's team to “wiretap” the DNS traffic from over 15,000 organisations, including “Fortune 500 companies, 45 U.S. government agencies, and 85 international government agencies.” The information gathered included internal and external IP addresses, computer names, employee names, office locations and more.

See also: NSA: DNS-over-HTTPS “no panacea”. NCSC: Handy if *we* run it, though