LinkedIn suffers sweeping outage
Issue resolved in circa 50 minutes.
Updated 20:17 GMT, February 23. LinkedIn says a fix has been implemented and the issue resolved. It was first flagged at 18:56 UTC on its status page and described as fixed by 19:48 UTC.
LinkedIn this evening confirmed a sweeping outage that prevented companies and individuals from logging in to the platform -- with numerous sponsored corporate events being run on LinkedIn Live knocked offline.
"Some members may be experiencing an issue with accessing LinkedIn on mobile and desktop. We’re working on this as we speak and will provide updates as we have them. Thanks for your patience!" the company said on Twitter at 7.06pm GMT. A straw poll of users and a look at Twitter suggest the issue is global.
https://twitter.com/LinkedInHelp/status/1364290634459340806
The outage appeared briefly resolved by ~7.30pm GMT, with connection issues then resuming (certainly for The Stack). We know of at least one LinkedIn Live event featuring senior Microsoft executives knocked offline.
LinkedIn outage: lots of things to go wrong...
The Microsoft-owned platform runs a detailed engineering blog which gives a flavour of some of its back-end infrastructure. As an October 28 blog notes: "LinkedIn has 12,000 Multirepo codebases, referred to as multiproducts, which represent individual software components developed by our engineering team across the globe. Every day, thousands of code changes are pushed through our Continuous Integration (CI) pipeline.
"Our ecosystem is rapidly evolving and we have seen a double-digit growth in the number of multiproducts year after year. Our source control branching model is trunk-based development, meaning that developers push code changes frequently to the main branch of a multiproduct and avoid long-lived feature branches. A successful code push to our CI pipeline publishes a new version, which can then be deployed into production."
As another update notes, at any given time, LinkedIn’s experimentation platform is serving up to 41,000 A/B tests simultaneously on a user population of over 700 million members. It seems plausible that something buggy made it to production, although failovers should have kicked in. We'll keep our eyes peeled for a root cause analysis (RCA) in due course.
The incident is the first serious LinkedIn outage in some time, but as the platform aims to compete for business with numerous webinar and other live events platforms, it will have many questioning whether they should have looked elsewhere for more resilient hosts for their events.
HugOps, meanwhile, to those working on bringing the platform back.