Only a few years ago, when we talked about infrastructure we meant physical infrastructure: servers, memory, disks, network switches, and all the cabling necessary to connect them. I used to have spreadsheets where I’d plug in some numbers and get back the specifications of the hardware needed to build a web application that could support thousands or even millions of users.
That’s all changed. First came virtual infrastructures, sitting on top of those physical racks of servers. With a set of hypervisors and software-defined networks and storage, I could specify the compute requirements of an application, and provision it and its virtual network on top of the physical hardware someone else managed for me. Today, in the hyperscale public cloud, we’re building distributed applications on top of orchestration frameworks that automatically manage scaling, both up and out.
Using a service mesh to manage distributed application infrastructures
Those new application infrastructures need their own infrastructure layer, one that’s intelligent enough to respond to automatic scaling, handle load-balancing and service discovery, and still support policy-driven security.
Sitting outside microservice containers, your application infrastructure is implemented as a service mesh, with each container linked to a proxy running as a sidecar. These proxies manage inter-container communication, allowing development teams to focus on their services and the APIs they host, with application operations teams managing the service mesh that connects them all.
Perhaps the biggest problem facing anyone implementing a service mesh is that there are too many of them: Google’s popular Istio, the open source Linkerd, HashiCorp’s Consul, or more experimental tools such as F5’s Aspen Mesh. It’s hard to choose one and harder still to standardize on one across an organization.
Currently if you want to use a service mesh with Azure Kubernetes Service, you’re advised to use Istio, Linkerd, or Consul, with instructions as part of the AKS documentation. It’s not the easiest of approaches, as you need a separate virtual machine to manage the service mesh as well as a running Kubernetes cluster on AKS. However, another approach under development is the Service Mesh Interface (SMI), which provides a standard set of interfaces for linking Kubernetes with service meshes. Azure has supported SMI for a while, as its Kubernetes team has been leading its development.
SMI: A common set of service mesh APIs
Like Kubernetes, SMI is a Cloud Native Computing Foundation project, though for now only at the sandbox stage. Being in the sandbox means it’s not yet seen as stable, with the prospect of significant change as it passes through the various stages of the CNCF development program. Certainly there’s plenty of backing, with cloud and Kubernetes vendors, as well as service mesh projects, sponsoring its development. SMI is intended to provide a set of basic APIs for Kubernetes to connect to SMI-compliant service meshes, so your scripts and operators can work with any service mesh; there’s no need to be locked in to a single provider.
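As a sketch of what those common APIs look like, here’s an SMI TrafficSplit resource, which shifts a share of traffic between two versions of a service. The service names, weights, and namespace are illustrative, and the exact API version varies between SMI releases:

```yaml
apiVersion: split.smi-spec.io/v1alpha2
kind: TrafficSplit
metadata:
  name: bookstore-split        # illustrative name
  namespace: bookstore         # illustrative namespace
spec:
  service: bookstore           # the root service clients address
  backends:
  - service: bookstore-v1      # current version keeps most traffic
    weight: 90
  - service: bookstore-v2      # canary release gets a small share
    weight: 10
```

Because this is just a custom resource, it’s applied with kubectl like any other Kubernetes object, and the same definition works against any SMI-compliant mesh.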
Built as a set of custom resource definitions and extension API servers, SMI can be installed on any certified Kubernetes distribution, such as AKS. Once in place, you can define connections between your applications and a service mesh using familiar tools and techniques. SMI should make applications portable; you can develop on a local Kubernetes instance with, say, Istio using SMI and take any application to a managed Kubernetes with an SMI-compliant service mesh without worrying about compatibility.
It’s important to remember that SMI isn’t a service mesh in its own right; it’s a specification that service meshes need to implement to have a common base set of features. There’s nothing to stop a service mesh going further and adding its own extensions and interfaces, but they’ll need to be compelling to be used by applications and application operations teams. The folks behind the SMI project also note that they’re not averse to new functions migrating into the SMI specification as the definition of a service mesh evolves and the list of expected functions changes.
Introducing Open Service Mesh, Microsoft’s SMI implementation
Microsoft recently announced the launch of its first Kubernetes service mesh, building on its work in the SMI community. Open Service Mesh is an SMI-compliant, lightweight service mesh being run as an open source project hosted on GitHub. Microsoft wants OSM to be a community-led project and intends to donate it to the CNCF as soon as possible. You can think of OSM as a reference implementation of SMI, one that builds on existing service mesh components and concepts.
Although Microsoft isn’t saying so explicitly, its announcement and documentation carry a note of its experience running service meshes on Azure, with a strong focus on the operator side of things. In the initial blog post, Michelle Noorali describes OSM as “effortless for Kubernetes operators to install, maintain, and run.” That’s a sensible decision. OSM is vendor-neutral, but it’s likely to become one of many service mesh options for AKS, so making it easy to install and manage is going to be an important part of driving acceptance.
OSM builds on work done in other service mesh projects. Although it has its own control plane, the data plane is built on Envoy. Again, it’s a pragmatic and sensible approach. SMI is about how you control and manage service mesh instances, so using the familiar Envoy to handle policies allows OSM to build on existing skill sets, reducing learning curves and allowing application operators to step beyond the limited set of SMI functions to more complex Envoy features where necessary.
Currently OSM implements a set of common service mesh features. These include support for traffic shifting, securing service-to-service links, applying access control policies, and handling observability into your services. OSM automatically adds new applications and services to a mesh by deploying the Envoy sidecar proxy automatically.
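The access control policies OSM applies are SMI resources too. A minimal sketch of a TrafficTarget, allowing one service account to call specific HTTP routes on another, might look like the following; the API versions, names, and route definitions here are assumptions based on the SMI specification rather than anything OSM-specific:

```yaml
apiVersion: access.smi-spec.io/v1alpha2
kind: TrafficTarget
metadata:
  name: bookstore-access
  namespace: bookstore
spec:
  destination:                 # the service account receiving traffic
    kind: ServiceAccount
    name: bookstore
    namespace: bookstore
  sources:                     # the service accounts allowed to call it
  - kind: ServiceAccount
    name: bookbuyer
    namespace: bookbuyer
  rules:                       # only these routes are permitted
  - kind: HTTPRouteGroup
    name: bookstore-routes
    matches:
    - buy-a-book
---
apiVersion: specs.smi-spec.io/v1alpha3
kind: HTTPRouteGroup
metadata:
  name: bookstore-routes
  namespace: bookstore
spec:
  matches:
  - name: buy-a-book
    pathRegex: "/buy"          # illustrative route
    methods: ["GET"]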
Deploying and using OSM
To start with the OSM alpha releases, download its command line interface, osm, from the project’s GitHub releases page. Running osm install adds the OSM control plane to a Kubernetes cluster with a default namespace and mesh name; you can change both at install time. With OSM installed and running, you can add services to your mesh, using policy definitions to add Kubernetes namespaces and automatically inject sidecar proxies into all pods in the managed namespaces.
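In practice that workflow is a short sequence of commands. This is a sketch against the OSM alpha CLI, run against whatever cluster your current kubeconfig context points at; the namespace and manifest names are illustrative, and flag behavior may differ between alpha releases:

```shell
# Install the OSM control plane into the cluster,
# using the default mesh name and control plane namespace
osm install

# Bring a namespace under OSM management; new pods created in it
# get the Envoy sidecar proxy injected automatically
osm namespace add bookstore

# Deploy the application into the managed namespace as usual
kubectl apply -f bookstore.yaml --namespace bookstore
```

Note that only pods created after the namespace is added get sidecars, so existing workloads need to be restarted to join the mesh.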
These will implement the policies you choose, so it’s a good idea to design a set of SMI policies before you start a deployment. Sample policies in the OSM GitHub repository will help you get started. Usefully, OSM includes the Prometheus monitoring toolkit and the Grafana visualization tools, so you can quickly see how your service mesh and your Kubernetes applications are running.
Kubernetes is an important infrastructure element in modern, cloud-native applications, so it’s important to start treating it as such. That requires you to manage it separately from the applications that run on it. A combination of AKS, OSM, Git, and Azure Arc should give you the foundations of a managed Kubernetes application environment. Application infrastructure teams manage AKS and OSM, setting policies for applications and services, while Git and Arc control application development and deployment, with real-time application metrics delivered via OSM’s observability tools.
It will be some time before all these elements fully gel, but it’s clear that Microsoft is making a significant commitment to distributed application management, along with the necessary tools. With AKS the foundational element of this suite, and both OSM and Arc adding to it, there’s no need to wait. You can build and deploy Kubernetes applications on Azure now, using an Envoy-based service mesh while prototyping with both OSM and Arc in the lab, ready for when they’re suitable for production. It shouldn’t be that long a wait.