Unlocking the technical approaches of service mesh for business growth

The API-gateway and service mesh worlds are converging. This convergence can be attributed to the blurring of the line between the edge (“outside traffic”) and the mesh (“inside traffic”), as capabilities once reserved for the edge are brought into the mesh. The time has come for organisations to manage everything securely across a networked environment from a single control plane, without creating centralised bottlenecks, writes Brian Gracely, VP Product at Solo.io. All these factors directly affect how service mesh is evolving and what we can expect from it in the future.

Service mesh, which controls application-level service-to-service communication over a network, offers a way to manage application reliability, security, and observability, along with monitoring and managing application traffic. Istio has emerged as the de facto service mesh, a position solidified by Google’s announcement that it would donate Istio to the Cloud Native Computing Foundation (CNCF), a step that shows the growing importance of this technology to organisations. In fact, in our 2022 Service Mesh Adoption Survey, 89% of respondents reported a very positive impact on application reliability from using a service mesh. The survey also found that nearly half (49%) of companies are using a service mesh at some level, with a further 38% evaluating it for future use.

Even though service mesh is still considered a relatively new technology (around five years old), the positive survey responses lead us to anticipate that its future will come from innovations in the data plane. The data plane is made up of the network control points (or “proxies”) that requests and messages travel through, and it is these proxies that enable service-to-service communication. Of course, this isn’t a novel idea; we’ve been promoting it since 2019. Looking ahead, we expect it to gain momentum, with the data plane becoming the place where innovation continues to happen.

For us, the data plane is the future of service mesh. We’ve chosen the open-source Envoy proxy, which Istio uses as its data plane, as our data plane of choice. Envoy has an extremely vibrant open-source community, and our organisation is already using it to extend the data plane with GraphQL and WebAssembly. Let’s now turn to how Envoy has enabled us to create an environment where service mesh is integrated into its very DNA.

Power on tap

Extending the proxy is an essential first step towards ensuring that the Istio service mesh reaches its full potential. To this end, we’ve taken the base Envoy proxy and built additional filters and capabilities on top of it. Envoy has a filter architecture in which requests pass through a chain of filters, and new filters can be built into the proxy. With extensibility being a core capability of Envoy, we see this as essential to how service mesh will evolve.
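
To illustrate the filter-chain idea, here is a minimal conceptual sketch in Python. It is not Envoy's filter API (Envoy filters are typically written in C++ or compiled to WebAssembly); the `Request`, `Response`, and filter functions below are hypothetical stand-ins that only show how each filter can either pass a request on or answer it directly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Request:
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


# A filter returns None to continue down the chain, or a Response to stop
# processing and reply immediately (a simplified "continue vs. stop" model).
Filter = Callable[[Request], Optional[Response]]


def add_request_id(req: Request) -> Optional[Response]:
    # Hypothetical transformation filter: tag the request with an ID header.
    req.headers.setdefault("x-request-id", "generated-id")
    return None


def block_admin_paths(req: Request) -> Optional[Response]:
    # Hypothetical firewall-style filter: reject anything under /admin.
    if req.path.startswith("/admin"):
        return Response(403, "forbidden")
    return None


def run_chain(filters: List[Filter], req: Request,
              upstream: Callable[[Request], Response]) -> Response:
    for f in filters:
        result = f(req)
        if result is not None:
            return result  # a filter short-circuited the chain
    return upstream(req)  # every filter passed; forward to the service


def echo_upstream(req: Request) -> Response:
    # Stand-in for the real upstream service.
    return Response(200, f"{req.path} id={req.headers['x-request-id']}")


chain: List[Filter] = [add_request_id, block_admin_paths]
print(run_chain(chain, Request("/orders"), echo_upstream))
```

In this model, adding a capability means appending a function to the chain rather than changing the proxy core, which is the property the paragraph above attributes to Envoy's filter architecture.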

To date, we’ve built request transformation, data loss prevention, web application firewalling, and other functions as filters in the Envoy proxy. The result is a more powerful proxy that can be used at the edge or in the service mesh. We’ve also taken it a step further by building a GraphQL engine into Envoy, so users can now specify a GraphQL schema and how individual fields are resolved. Extending Envoy with GraphQL, and moving these capabilities up through the networking layers, brings a lot of power to the Istio service mesh.
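
As a rough illustration of what a data loss prevention filter does, the Python sketch below masks card-number-like digit runs in a response body, leaving only the last four digits. It is a stand-in built on assumptions, not Solo.io's filter: the regular expression and masking format are hypothetical, and a real filter would run inside the proxy on streamed bodies.

```python
import re

# Hypothetical pattern: 13 to 16 digits, optionally separated by spaces or dashes.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def mask_card_numbers(body: str) -> str:
    """Replace each card-like number with asterisks, keeping the last 4 digits."""
    def _mask(match: "re.Match[str]") -> str:
        digits = re.sub(r"\D", "", match.group())
        return "*" * (len(digits) - 4) + digits[-4:]

    return CARD_PATTERN.sub(_mask, body)


print(mask_card_numbers("paid with 4111 1111 1111 1111, order 12345"))
```

A production DLP filter would typically also validate candidates (for example with a Luhn checksum) to reduce false positives; that step is omitted here for brevity.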

The question now turns to how we optimise the data plane and the placement of Layer 7 (application-level) proxies. Fundamentally, any optimisation must focus on improving performance, managing resources better, supporting larger deployments, and building extensibility into the environment, which is why the data plane is so vital. Because every request and message flows through these proxies, upgrading them is an extremely sensitive undertaking. The proxies must also run within a tight security boundary, so that any exploit is contained to the smallest possible blast radius.
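
One common way to keep a sensitive proxy upgrade within a small blast radius is to roll it out in batches and halt at the first failure. The Python sketch below assumes a hypothetical `upgrade_and_check` callback that upgrades one proxy and reports whether its health checks pass; it is not an Istio or Envoy API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RolloutResult:
    upgraded: List[str] = field(default_factory=list)      # batches that passed checks
    failed_batch: List[str] = field(default_factory=list)  # batch where a check failed
    untouched: List[str] = field(default_factory=list)     # never attempted


def staged_upgrade(proxies: List[str], batch_size: int,
                   upgrade_and_check: Callable[[str], bool]) -> RolloutResult:
    """Upgrade proxies batch by batch, stopping as soon as a batch fails."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    result = RolloutResult()
    for start in range(0, len(proxies), batch_size):
        batch = proxies[start:start + batch_size]
        # Upgrade every proxy in the batch and collect its health-check result.
        outcomes = [upgrade_and_check(p) for p in batch]
        if not all(outcomes):
            # Halt: only this batch was exposed to the faulty upgrade.
            result.failed_batch = batch
            result.untouched = proxies[start + batch_size:]
            return result
        result.upgraded.extend(batch)
    return result


# Hypothetical fleet where the new proxy version breaks on "proxy-5".
fleet = [f"proxy-{i}" for i in range(10)]
outcome = staged_upgrade(fleet, batch_size=2,
                         upgrade_and_check=lambda p: p != "proxy-5")
```

Smaller batches shrink the blast radius at the cost of a slower rollout; rolling back the failed batch is left out of the sketch.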

To optimise the data plane and the placement of service proxies, we must consider all these dimensions as well as the different deployment patterns in use.

Pattern identification

The first pattern is the traditional sidecar, in which a sidecar proxy is injected next to each application instance. The sidecar therefore becomes another container in the Kubernetes pod, running alongside, but separately from, the application container. The drawback of this approach is that a proxy is deployed next to every application instance, and since large organisations can easily run tens of thousands of instances, this adds significant overhead.
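
To make the overhead concrete, the Python sketch below multiplies a per-proxy resource reservation by the number of application instances, since the sidecar pattern runs one proxy per instance. The per-proxy figures (100 millicores of CPU and 64 MiB of memory) and the 20,000-instance fleet are illustrative assumptions, not measurements; real sidecar usage depends on configuration and traffic.

```python
from typing import Tuple


def sidecar_overhead(instances: int, cpu_millicores_per_proxy: int,
                     mib_per_proxy: int) -> Tuple[float, float]:
    """Return (total vCPU, total GiB) reserved by sidecars, one per instance."""
    total_vcpu = instances * cpu_millicores_per_proxy / 1000
    total_gib = instances * mib_per_proxy / 1024
    return total_vcpu, total_gib


# Hypothetical fleet and per-proxy reservations (assumptions, not benchmarks).
vcpu, gib = sidecar_overhead(20_000, cpu_millicores_per_proxy=100, mib_per_proxy=64)
print(f"{vcpu:.0f} vCPU and {gib:.0f} GiB reserved for sidecars")
# prints "2000 vCPU and 1250 GiB reserved for sidecars"
```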

Despite this, many of our customers favour the sidecar because they can configure each application independently: changes made to one application’s proxy do not affect any other application. The sidecar also brings security granularity. For example, the proxy can represent a specific application and verify the authenticity of another application contacting it. Another benefit is that sidecar proxies can be upgraded in a fine-grained way, independently of the rest of the applications, which provides better control of the blast radius should anything go wrong.
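
The security granularity described above can be sketched as a per-application allow-list. In Istio, workload identities take the SPIFFE form `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>`. The Python sketch below assumes mutual TLS has already authenticated the caller and produced its SPIFFE ID (not shown), and checks that ID against a hypothetical policy for a single application; `ALLOWED_CALLERS` and the `shop`/`checkout` names are made up for illustration.

```python
from typing import Set, Tuple

Identity = Tuple[str, str, str]  # (trust domain, namespace, service account)


def parse_spiffe_id(spiffe_id: str) -> Identity:
    """Split spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    prefix = "spiffe://"
    if not spiffe_id.startswith(prefix):
        raise ValueError(f"not a SPIFFE ID: {spiffe_id!r}")
    trust_domain, *rest = spiffe_id[len(prefix):].split("/")
    if len(rest) != 4 or rest[0] != "ns" or rest[2] != "sa":
        raise ValueError(f"unexpected SPIFFE path: {spiffe_id!r}")
    return trust_domain, rest[1], rest[3]


# Hypothetical policy held by one application's sidecar (say, "payments"):
# only the checkout service account in the shop namespace may call it.
ALLOWED_CALLERS: Set[Identity] = {("cluster.local", "shop", "checkout")}


def is_caller_allowed(peer_spiffe_id: str) -> bool:
    """Authorise an already-authenticated peer against this app's allow-list."""
    try:
        return parse_spiffe_id(peer_spiffe_id) in ALLOWED_CALLERS
    except ValueError:
        return False  # malformed identities are rejected
```

Because each sidecar holds only its own application's policy, changing it cannot affect other applications. In a real Istio deployment this kind of rule would be expressed declaratively (for example as an `AuthorizationPolicy`) rather than in application code.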

Another approach is to share a single proxy across the entire node, so that traffic between applications flows through a node-based proxy. The applications then don’t need to know or care about sidecar proxies. In practice, this means securing traffic while bypassing the sidecar altogether.

The main benefit is better use of resources. Companies run a single proxy per node instead of the sidecar approach’s one proxy per application instance, which can mean hundreds of proxies, so the resource overhead is lower. However, they lose feature isolation, because application identities get watered down when one proxy is shared across many applications. Upgrades also need more care: when the proxy is shared with other applications, the blast radius of a bad upgrade is much larger than with a sidecar.
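
The trade-off can be made concrete by counting proxies for a given placement of application instances on nodes. In this Python sketch, "blast radius" is taken, as a simplifying assumption, to mean the number of application instances whose traffic passes through a single proxy; the node and instance names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_sidecar_vs_node(placements: List[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """placements: (app_instance, node) pairs. Returns proxy counts and the
    largest number of instances that share one proxy under each pattern."""
    instances_per_node = Counter(node for _, node in placements)
    return {
        # One proxy per instance: a failure touches a single instance.
        "sidecar": {"proxies": len(placements),
                    "max_blast_radius": 1 if placements else 0},
        # One proxy per node: a failure touches every instance on that node.
        "per_node": {"proxies": len(instances_per_node),
                     "max_blast_radius": max(instances_per_node.values(), default=0)},
    }


placements = [("cart-1", "node-a"), ("cart-2", "node-a"), ("pay-1", "node-a"),
              ("search-1", "node-a"), ("cart-3", "node-b"), ("pay-2", "node-b")]
print(compare_sidecar_vs_node(placements))
```

The node-based pattern needs far fewer proxies (2 instead of 6 here), but a single faulty proxy can affect every instance on its node.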

A third approach meets somewhere in the middle. Instead of a proxy per service instance at one end and a proxy per node at the other, it takes the best of both worlds: a service proxy per service account on each node. Each proxy can be specialised, keeping the blast radius low through security isolation while still allowing the environment to scale.
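
The middle-ground pattern amounts to grouping instances by (node, service account) and running one proxy per group. The Python sketch below shows that grouping for a hypothetical placement with illustrative names. Each proxy ends up serving a single identity, which is what preserves security isolation, while instances of the same service account on a node share one proxy.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


def proxies_per_service_account(
        placements: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """placements: (app_instance, node, service_account) triples.
    Returns one entry per proxy, keyed by (node, service_account)."""
    groups: DefaultDict[Tuple[str, str], List[str]] = defaultdict(list)
    for instance, node, service_account in placements:
        groups[(node, service_account)].append(instance)
    return dict(groups)


placements = [("cart-1", "node-a", "cart"), ("cart-2", "node-a", "cart"),
              ("pay-1", "node-a", "payments"), ("cart-3", "node-b", "cart"),
              ("pay-2", "node-b", "payments")]
proxies = proxies_per_service_account(placements)
# 5 instances on 2 nodes need 4 proxies here: fewer than one per instance,
# more than one per node, and no proxy mixes service accounts.
print(len(proxies))
```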

The final pattern involves moving the proxies off the node altogether and into a dedicated part of the architecture, so the same traffic, security, and other policies can be applied to any deployment model. When people deploy a large-scale service mesh, they care about securing traffic and about how best to control traffic routing when rolling out new deployments. This pattern helps them address requirements such as failover, high availability, and tenancy for teams. Once these are in place, the organisation can optimise resource usage as it goes while also refining the architecture.
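
Failover with an off-node proxy tier can be sketched as choosing a healthy proxy, preferring one in the client's own zone and falling back to another zone otherwise. The Python sketch below is a hypothetical selection routine; `RemoteProxy`, the zone names, and the health flag are assumptions, not part of Istio's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RemoteProxy:
    name: str
    zone: str
    healthy: bool


def pick_proxy(pool: List[RemoteProxy], client_zone: str) -> Optional[RemoteProxy]:
    """Prefer a healthy proxy in the client's zone; otherwise fail over to any
    healthy proxy elsewhere. Returns None if no proxy is healthy."""
    healthy = [p for p in pool if p.healthy]
    local = [p for p in healthy if p.zone == client_zone]
    if local:
        return local[0]
    return healthy[0] if healthy else None


pool = [RemoteProxy("gw-a1", "zone-a", healthy=False),
        RemoteProxy("gw-a2", "zone-a", healthy=True),
        RemoteProxy("gw-b1", "zone-b", healthy=True)]
chosen = pick_proxy(pool, "zone-a")
```

Tenancy could be layered on top by first filtering the pool down to proxies dedicated to a given team before applying the same selection.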

The value of service mesh

For any organisation looking to save time and reduce its operational expenses, the service mesh is vital. Given the growth of microservices in today's digital, agile business environment, a solution that provides platform-level automation while integrating with containerised application infrastructure becomes a critical competitive advantage.

The above approaches, and the technologies that result from them, can help organisations benefit from a more integrated environment. They also make applications more secure and allow them to be managed at a fine-grained level. As organisations add services and, with them, more network traffic, the Istio service mesh is an enabler for growth at a time when deploying applications quickly and securely is a priority.
