How a platform focus helped Deutsche Bahn

A developer experience platform has been adopted by over 10,000 users, while a Kubernetes platform supports several hundred projects.

After German railway operator Deutsche Bahn began its race to the cloud in 2016, it was the “grassroots” that ultimately fostered a platform engineering approach, with Kubernetes at its heart, as the company sought to optimize its systems.

The company had decided to sell its on-prem datacentre infrastructure and rent it back until migration to the public cloud was complete. This was a reaction to the fact that the on-prem estate was slow, hard to scale, and increasingly expensive.

“The main focus was to clean up the data centres,” says Gualter Barbas Baptista, Lead Consultant for Platform Strategy and Enablement, Deutsche Bahn - DB Systel, speaking to The Stack at Kubecon. “Many of the systems are old or monolithic. Not really cloud, in any architecture sense.”

But while a lift and shift to cloud is not necessarily a transformation in itself, in Deutsche Bahn’s case it did lay the groundwork for real transformation: “Moving from traditional structures to a product-based approach, with self-organised teams, with their own business cases,” Baptista explains. “These were all necessary moves that completely transformed the IT landscape.”

This in turn prompted an effort to optimize processes step-by-step. In parallel, at the grassroots, he said, “because of this agile organization, there were teams that started to build up with Kubernetes.”

Some applications were eminently suitable for the Kubernetes treatment, he says, though some were not. That might be because they cannot handle a node being shut down, for instance during autoscaling. Or because licensing models, for example with a CRM system, just didn’t map to a Kubernetes architecture.

This potential for limitless developer choice, or anarchy, depending on your point of view, was part of the reason DB adopted a platform approach.

“Platforms are being pushed as the standard,” Baptista says. “Not only from the grassroots, but really it is recognised that we need more platforms, because we will not have the capacity to continue doing operations in the traditional way.”

As he put it, “They provide you with a layer which makes it simpler and more efficient to develop in the cloud.”

This reduces the complexity associated with the majority of use cases, he said, and eases the considerable compliance burdens DB has, given it is critical infrastructure, publicly owned, and subject to rigorous data laws both at a national and at an EU level.

In practice, this means developers have a choice of landing zones and platforms. “We have, for example, a service which is a way to access certain AWS cloud services … in a relatively simple way, where you don't need to really care for a known AWS account, for example, which requires a higher level of operational responsibility.”

At the same time, it has a developer experience platform. GitLab is central to this, along with testing tools, source code analysis, and modular pipelining for CI/CD. Baptista says this means developers are “relatively quickly and compliantly able to in one day, go from your source code to production. If you want.”

The tools are generally provided as inner source, he adds, and are also open to contributions. “We don't see only the small platform teams as producing everything for every need. We rather see the potential of it as an ecosystem where the contributions from the larger community will also increase the pace of development of the platform and its evolution.”

And that internal ecosystem is substantial. The developer experience platform has been adopted by over 10,000 users, he says, while its Kubernetes platform supports several hundred projects.

So, for example, if DB Cargo develops a new module for pipelining, it can also be picked up by the long-distance train organisation. “We don't need to repeat things six times, but you develop once and everyone can use it again.”

That focus on reuse and utilization also informs its Green Digitalization efforts, which began in earnest in 2022. Like DB’s uptake of Kubernetes, this was initially a grassroots effort, but DB’s CIOs quickly swung their support behind it, he said, in contrast with most of their peers at the time.

At the time, Baptista was product owner for two platform teams, one of them running Kubernetes, together with observability. “And with the observability service, we have the Grafana and Prometheus stack. And with the Kubernetes, we said okay, there is this Kepler component, someone had already looked at it and seen promising results.”

He said he and his team gained backing and funding to produce a prototype MVP and integrate it into the underlying platform. By adding the Kepler metrics on energy consumption, and providing them in near real time, the platform lets developers see their energy use and think about how to manage it.

“You start getting metrics for hundreds of applications,” he said. “Just by deploying centrally.”
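To illustrate what such energy metrics make possible, here is a minimal sketch of turning a cumulative energy counter into average power. It assumes a counter measured in joules, in the style of Kepler's `kepler_container_joules_total` metric. The `Sample` type, the `average_watts` function, and the numbers are hypothetical and do not come from DB's setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One reading of a cumulative energy counter (hypothetical shape)."""
    timestamp_s: float  # seconds since some reference point
    joules: float       # cumulative energy consumed so far


def average_watts(first: Sample, last: Sample) -> float:
    """Average power between two readings: energy delta / time delta."""
    elapsed = last.timestamp_s - first.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    consumed = last.joules - first.joules
    if consumed < 0:
        raise ValueError("counter decreased; it may have been reset")
    return consumed / elapsed


# Illustrative numbers only: 1,500 J consumed over 300 s.
print(average_watts(Sample(0.0, 12_000.0), Sample(300.0, 13_500.0)))
```

In a Prometheus and Grafana stack this calculation would more likely be expressed as a rate query on the dashboard itself; the Python version only spells out the arithmetic.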

This prompted devs to think about how to use tools such as the Vertical Pod Autoscaler, or kube-downscaler to scale down workloads: “Which are basic things, but which no one had thought until then… People were just wasting resources.”
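The idea behind scheduled downscaling can be sketched in a few lines. This is not kube-downscaler's implementation or configuration format; the working window (Monday to Friday, 07:00 to 19:00) and the function names are assumptions made for illustration.

```python
from datetime import datetime, time


def should_be_running(now: datetime,
                      start: time = time(7, 0),
                      end: time = time(19, 0)) -> bool:
    """Return True if `now` falls inside the assumed working window.

    Monday is weekday 0, so weekdays 0-4 are Monday to Friday.
    """
    is_weekday = now.weekday() < 5
    return is_weekday and start <= now.time() < end


def desired_replicas(now: datetime, normal_replicas: int) -> int:
    """Scale to zero outside working hours, otherwise keep the normal count."""
    return normal_replicas if should_be_running(now) else 0


# 2024-03-20 was a Wednesday; 2024-03-23 was a Saturday.
print(desired_replicas(datetime(2024, 3, 20, 10, 30), 3))
print(desired_replicas(datetime(2024, 3, 23, 10, 30), 3))
```

A real deployment would also need to handle time zones and let teams opt individual workloads out.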

The objective was to optimize at a platform level, he said, and, “because we had the shared platform,” to achieve 70% utilisation.
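The article does not say how DB measures utilisation or which resource the 70% figure refers to. As one possible reading, the sketch below treats it as total resources in use divided by total capacity across the nodes of a shared cluster. The node figures and the names `utilisation` and `TARGET` are hypothetical.

```python
TARGET = 0.70  # the 70% goal quoted above, read here as a fraction of capacity


def utilisation(used: list[float], capacity: list[float]) -> float:
    """Share of total capacity in use across all nodes."""
    if len(used) != len(capacity):
        raise ValueError("need one usage figure per node")
    total_capacity = sum(capacity)
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return sum(used) / total_capacity


# Illustrative numbers only: CPU cores in use and available on three nodes.
cores_used = [6.0, 3.0, 5.0]
cores_available = [8.0, 8.0, 8.0]

current = utilisation(cores_used, cores_available)
print(f"utilisation: {current:.1%}, target met: {current >= TARGET}")
```

Measuring across the whole platform rather than per project is what makes a shared target meaningful: spare capacity left by one team can be absorbed by another.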

Baptista’s thesis is that project teams and developers are often too conservative in their estimates for the resources they’ll need to keep their workloads running smoothly.

“So we needed to provide tools that tell them, ‘it's okay that you are conservative we have something that can sync that for you, and can adjust these resources as needed, or something that can shut down your development environments when you are not working.’ And so these were simple things.”
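What adjusting resources for an over-cautious team might look like can be sketched as follows. This is not the Vertical Pod Autoscaler's actual algorithm; it is a simplified stand-in that takes a high percentile of observed usage and adds a safety margin. The percentile, the headroom, the function name, and the sample data are all assumptions.

```python
def recommend_request(usage_samples: list[float],
                      percentile: int = 90,
                      headroom: float = 0.15) -> float:
    """Suggest a resource request from observed usage.

    Uses the nearest-rank percentile of the samples plus a safety margin.
    """
    if not usage_samples:
        raise ValueError("need at least one usage sample")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(usage_samples)
    # Integer ceiling of percentile/100 * n, clamped to at least rank 1.
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    return ordered[rank - 1] * (1 + headroom)


# Illustrative numbers only: CPU usage in millicores, with one short spike.
observed = [120, 150, 130, 400, 140, 135, 160, 145, 155, 125]
requested = 500  # what a cautious team might have asked for

suggestion = recommend_request(observed)
print(f"requested {requested}m, suggested about {suggestion:.0f}m")
```

The gap between the original request and the suggestion is the capacity that a conservative estimate would otherwise hold idle.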

“If you present this directly on the dashboards that they’re already looking at every day. And so they can make a change in code and apply some of the architecture principles on green IT and you can see how does this work effectively in practice.”

This is more important than ever, he says. As organizations begin to adopt AI – and struggle to find the power to support it – it will be important to reduce the chance of analysing “parallel data” and to build foundation models that can be reused rather than needing to be retrained from scratch. “You need a kind of platform ecosystem around this.”
