The era of control loops for compliance
Balancing speed and safety in a world that's both technologically volatile and increasingly threatening is a challenge that nearly every organization is facing, writes Deepak Giridharagopal, CTO, Puppet. With major security breaches becoming more the norm than the exception, the stakes have never been higher. Companies are under enormous pressure to maintain constant vigilance over their infrastructure, ensuring that everything is safe, secure, and well looked-after. This pressure comes not just from regulators, but also from within, via internal security and governance policies. Compliance and policy enforcement are now a huge part of many an operations team's mandate. The problem is that it's harder than ever to be truly successful at it.
Right now, it's incredibly difficult to apply compliance standards to modern infrastructure. Historically, compliance in the enterprise has been based on the standards created by large trade bodies, consortiums, and governments. Given the pace of technological change, the huge uptake of cloud-based systems, and the speed with which engineers can (and do) deploy new systems, we're all continually playing catch-up. The rate of change is so high that even if we have great governance policies today, they may not be relevant tomorrow. The struggle is real!
Many enterprises deal with the problem in the traditional, capital-e-Enterprise way: by slowing things down, putting up more gates, and inserting more manual approvals into processes that touch production infrastructure. While massively slowing down the rate of change can help reduce one's exposure, it has major side effects. The slower you are at making changes, the worse you'll be at quickly responding to problems. And problems are inevitable.
Others have adopted a quasi-automated approach to compliance. They have tools that periodically scan their estate for compliance issues, and runbooks that outline what to do when problems arise (parts of which may even be automated). This is a definite improvement over the "talk to the hand" approach of gates and manual approvals. Yet many problems remain. Periodic scans leave you exposed in the interim. Manual remediation is slow, expensive, and error-prone. Policies are often company-specific, yet many scanners are difficult to customize and extend.
We can (and should) do better! A major theme in the world of operations is self-healing: systems that can identify issues and then automatically repair things. We see this pattern in all manner of operations tools from configuration management to container orchestration. Looking back upon several decades of technological innovation in these areas, I think the application of control loops to infrastructure management has turned out to be a pretty fantastic idea. Could we apply those principles to compliance?
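For readers who have not worked with one, a control loop boils down to three steps: observe the actual state, compare it with the desired state, and repair any drift. The following is a minimal sketch of that idea, not any particular tool's implementation. The `reconcile` function, the setting names, and the in-memory `system` dictionary are all hypothetical stand-ins for real infrastructure.

```python
from typing import Callable, Dict, List

State = Dict[str, str]


def reconcile(
    desired: State,
    observe: Callable[[], State],
    repair: Callable[[str, str], None],
) -> List[str]:
    """Run one pass of a control loop and return the settings that had drifted."""
    actual = observe()
    # A setting has drifted if its observed value differs from the desired one.
    drifted = [key for key, want in desired.items() if actual.get(key) != want]
    for key in drifted:
        repair(key, desired[key])
    return drifted


# Hypothetical in-memory "system" standing in for real infrastructure.
system = {"ssh_root_login": "yes", "disk_encryption": "on"}
desired = {"ssh_root_login": "no", "disk_encryption": "on"}

# The first pass finds and repairs the drift; the second pass finds nothing to do.
first_pass = reconcile(desired, lambda: dict(system), system.__setitem__)
second_pass = reconcile(desired, lambda: dict(system), system.__setitem__)
```

Running the same pass repeatedly is what makes the loop self-healing: any drift introduced between passes is picked up and corrected by the next one.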
The era of control loops for compliance
For me, a continuous compliance system should have the following properties:
- It should be event-driven, not exclusively schedule-driven. It should be able to respond to events in the infrastructure at the time they occur, rather than just waiting until Sunday at 3am to run a scan.
- It should feature policy-as-code as the mechanism for defining compliance rules. Every company's rules are different - there are lots of company-specific policies that are just as important as ones made by regulatory bodies. Expressing policies as code makes them easier for teams to create, modify, share, peer-review, and test in ways harmonious with the rest of their modern, infrastructure-as-code practices.
- It should be able to consume events from all the tools and platforms involved in keeping your infrastructure working properly: cloud platforms, container orchestrators, CI/CD tools, ticketing systems, inventory systems, etc. This lets the system respond to changes at different levels of the stack, and at different points in the application lifecycle.
- It should have a built-in framework for taking action in response. The more classes of compliance issues you can automatically repair, the more time and money saved. There is a continuum here that can range from simple script/task execution to sophisticated workflow engines, covering actions from filing tickets to sending Slack notifications all the way to automatically reconfiguring the affected system to repair the problem.
- It should present users with an always up-to-date assessment of the compliance status of their infrastructure. If we are receiving events in real time, and if we can respond in real time, then that makes real time reporting possible. Why settle for less?
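To make these properties concrete, here is a rough sketch of how they might fit together: events arrive as they happen, policies are ordinary code, each violation triggers an action, and a status report is always current. Everything named here (`Event`, `Policy`, `ComplianceEngine`, the `no-public-buckets` rule, and the event fields) is an assumption made for illustration, not the API of any existing product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Event:
    source: str               # illustrative: "cloud", "ci", "ticketing", ...
    resource: str
    attrs: Dict[str, object]


@dataclass
class Policy:
    """Policy-as-code: a rule is just a set of reviewable, testable functions."""
    name: str
    applies_to: Callable[[Event], bool]
    compliant: Callable[[Event], bool]
    remediate: Callable[[Event], str]   # returns a description of the action taken


@dataclass
class ComplianceEngine:
    policies: List[Policy]
    status: Dict[Tuple[str, str], bool] = field(default_factory=dict)
    actions: List[str] = field(default_factory=list)

    def handle(self, event: Event) -> None:
        """Evaluate every applicable policy at the moment the event arrives."""
        for policy in self.policies:
            if not policy.applies_to(event):
                continue
            ok = policy.compliant(event)
            if not ok:
                # Could be a ticket, a chat message, or an automatic repair.
                self.actions.append(policy.remediate(event))
            self.status[(policy.name, event.resource)] = ok

    def report(self) -> Dict[str, int]:
        """Always up-to-date summary of compliance status."""
        passing = sum(self.status.values())
        return {"passing": passing, "failing": len(self.status) - passing}


# Hypothetical company-specific rule: storage buckets must not be public.
no_public_buckets = Policy(
    name="no-public-buckets",
    applies_to=lambda e: e.source == "cloud" and e.attrs.get("kind") == "bucket",
    compliant=lambda e: not e.attrs.get("public", False),
    remediate=lambda e: f"set {e.resource} private",
)

engine = ComplianceEngine([no_public_buckets])
engine.handle(Event("cloud", "bucket/logs", {"kind": "bucket", "public": True}))
after_violation = engine.report()
engine.handle(Event("cloud", "bucket/logs", {"kind": "bucket", "public": False}))
after_fix = engine.report()
```

In this sketch a resource stays marked as failing until a later event confirms the fix, so the report reflects what has been observed rather than what a remediation was supposed to achieve.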
Such a system would be much better equipped to deal with the compliance challenges of tomorrow. We have all of these major pieces already - there are plenty of compliance scanning tools, event hubs, action engines, and reporting tools out there.
Enterprises may want to step back from their compliance point solutions and instead consider systems that better tie together detection and action to achieve more continuous compliance. If not for the infrastructure you have today, then for what you'll have tomorrow. Like just about everything else in IT, compliance will ultimately have to evolve from working at the speed of a human to working at the speed of the business.