Bridging Kubernetes, Cilium & eBPF to Networking Concepts for Network Engineers: Part 4 Services
In this blog
Understanding how Kubernetes transforms traditional IP networking concepts into a dynamic, software-defined environment is crucial. This article demystifies the Kubernetes service mesh, making it accessible to network engineers familiar with IPsec tunnels and Layer 7 firewalls. You may think this doesn't apply to you, but if you're planning a firewall refresh anytime soon, these concepts will need to be part of your vernacular going forward.
Part 4: Services – The New Layer for East-West Traffic (Comparable to Application Delivery Networks)
After getting comfortable with Kubernetes networking and security, you'll hear about service meshes (like Istio, Linkerd, or Cilium's service mesh mode). A service mesh focuses on the communication between microservices, providing features like dynamic routing, resiliency (retries, circuit breaking), and end-to-end security (mTLS) at the application layer. To a network engineer, many service mesh capabilities sound like things we've done before – routing, load balancing, ACLs – but implemented in a different place.
What is a service?
In simple terms, a Kubernetes Service is "a method for exposing a network application that is running as one or more Pods in your cluster." Each Service is a logical grouping of Pods that provides a function to the cluster, along with a policy that controls how that function is accessed and secured. This function can be almost anything, and Kubernetes assigns the Service a stable virtual (cluster) IP so it doesn't conflict with other workloads. The Service API also abstracts the back end to make the service seamless and consistent for the front end: if Pods are added to or removed from the Service, the connection criteria clients use don't change at all. Can you imagine if Windows Servers had this? You'd be able to use domain services just by hitting an API, and everything from onboarding to decommissioning users and machines would be done within that API. Furthermore, if Windows Servers were like containers, you'd never have to spin up a new Windows Server by hand; the Service API would automatically spin servers up and down for your domain environment depending on demand.
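As a concrete sketch, a minimal Service manifest might look like the following (the names, labels, and ports here are illustrative, not from any real deployment):

```yaml
# A hypothetical Service that groups all Pods labeled app=web
# and exposes them on a stable cluster IP.
apiVersion: v1
kind: Service
metadata:
  name: web-svc          # illustrative name; clients always connect to web-svc
spec:
  selector:
    app: web             # Pods matching this label become the Service's endpoints
  ports:
    - port: 80           # port the Service exposes on its cluster IP
      targetPort: 8080   # port the Pods actually listen on
```

Clients reach `web-svc:80` no matter how many Pods come and go behind it; Kubernetes keeps the endpoint list current automatically.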
What is a service mesh?
Cilium, Istio, Nginx, Kuma, Grey Matter, and more: a lot of tools fall into the service mesh classification. In simple terms, a service mesh is an infrastructure layer for service-to-service communication. Instead of network devices handling all routing and security, some of that logic is moved into the application layer. Each service is accompanied by a proxy (e.g., Envoy for Istio) that handles inbound and outbound requests for that service. These proxies form a "mesh" of L7 controls (typically HTTP) inside the cluster. The mesh's control plane configures these proxies with rules and the data plane enforces them; for example, "send 10% of traffic from service A to version 2 of service B" or "enforce mTLS between all services." This might remind you of policy-based routing or application delivery controllers, but instead of a few big appliances, it's distributed into every pod.
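To make the "send 10% of traffic to version 2" rule concrete, here is a hedged sketch of how it might be expressed as an Istio VirtualService. The service and subset names are illustrative, and the v1/v2 subsets would be defined in a companion DestinationRule:

```yaml
# Hypothetical weighted routing: 90% of requests to subset v1, 10% to v2.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: service-b-split       # illustrative name
spec:
  hosts:
    - service-b               # illustrative service name
  http:
    - route:
        - destination:
            host: service-b
            subset: v1
          weight: 90          # 90% of traffic stays on v1
        - destination:
            host: service-b
            subset: v2
          weight: 10          # 10% canaried to v2
```

Shifting the weights over time is how teams do gradual canary rollouts, the same idea as slowly draining a server pool behind a load balancer.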
However, service meshes do not have to be implemented with a proxy. Cilium is a good example of a service mesh built on kernel-level visibility via a mechanism called eBPF, rather than per-pod proxies. eBPF facilitates a wide range of uses, including performance monitoring, security enforcement, and network traffic management. If you're looking to learn more about eBPF, check out this article
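As an illustration of in-kernel enforcement, here is a sketch of a CiliumNetworkPolicy restricting which Pods may reach a back end; the labels and port are assumptions for the example. Cilium compiles L3/L4 rules like this into eBPF programs enforced in the kernel, with no sidecar in the data path:

```yaml
# Hypothetical policy: only Pods labeled app=frontend may reach
# Pods labeled app=backend on TCP/8080; everything else is dropped.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: backend-allow-frontend   # illustrative name
spec:
  endpointSelector:
    matchLabels:
      app: backend               # the policy applies to these Pods
  ingress:
    - fromEndpoints:
        - matchLabels:
            app: frontend        # only this identity is allowed in
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
```

Conceptually this is a stateful ACL keyed on workload identity (labels) rather than IP addresses, which is why it survives Pod churn.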
Regardless of how the service mesh traffic flow is handled, whether in kernel or via a network proxy, think of a service mesh as a set of host-based application routers/firewalls. They operate at Layer 7, but the concept of routing rules is still there, for example content-based routing (e.g., if the URL contains /v2, go to the v2 service), much like an F5 L7 switch would do. Service meshes also handle load balancing on the client side. Instead of relying on kube-proxy's simplistic round-robin, an Envoy proxy can do sophisticated load balancing (weighted, least-request, etc.) and even handle retries and timeouts per request. These are things network folks might have achieved with a combination of load balancers and custom logic in the past.
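A sketch of what content-based routing plus per-request retries and timeouts might look like in Istio (the host, subsets, and path are illustrative):

```yaml
# Hypothetical VirtualService: URI-based routing plus client-side
# retries and a timeout, enforced by the Envoy sidecar per request.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: reviews-routing       # illustrative name
spec:
  hosts:
    - reviews                 # illustrative service name
  http:
    - match:
        - uri:
            prefix: /v2       # content-based rule: /v2 paths go to subset v2
      route:
        - destination:
            host: reviews
            subset: v2
      retries:
        attempts: 3           # retry a failed request up to 3 times
        perTryTimeout: 2s
      timeout: 10s            # overall deadline for the request
    - route:                  # default rule: everything else goes to v1
        - destination:
            host: reviews
            subset: v1
```

This is the same decision logic an F5 iRule or ADC content switch would apply, just evaluated next to the client instead of at a central appliance.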
Ensuring Services are Available
Another key aspect is service discovery and health monitoring. In Kubernetes, service discovery is largely DNS-based (a service name resolves to a cluster IP). In a service mesh like Istio, each proxy knows about the endpoints for a service (via the control plane) and can apply intelligent logic – for instance, circuit breaking (stop sending traffic to an endpoint that's erroring) or outlier detection (similar to how a load balancer might mark a server as down after failures). So you could say a service mesh is like having a cluster of mini-F5 or Citrix ADCs, one co-located with each application, all programmed centrally to work together.
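Outlier detection and client-side load balancing of this kind might be configured in Istio roughly as follows (the host name and thresholds are illustrative):

```yaml
# Hypothetical DestinationRule: least-request load balancing plus
# outlier detection that ejects an endpoint after consecutive 5xx errors,
# much like a load balancer marking a pool member down.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: reviews-circuit-breaker   # illustrative name
spec:
  host: reviews                   # illustrative service name
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST       # prefer the least-loaded endpoint
    outlierDetection:
      consecutive5xxErrors: 5     # eject after 5 consecutive server errors
      interval: 30s               # how often endpoints are evaluated
      baseEjectionTime: 60s       # how long an ejected endpoint sits out
```

The ejected endpoint is automatically re-admitted later, the same probation behavior you'd expect from health monitors on an ADC.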
We talked about this in the first part of this series, but it's worth reiterating when talking about services… A container service mesh enhances service availability by providing intelligent traffic routing, load balancing, and automatic failover mechanisms. It ensures that requests are dynamically distributed across healthy service instances, reducing the risk of downtime caused by failures in individual containers or nodes. Through features like circuit breaking and retries, the mesh can detect and mitigate failures before they impact users. Additionally, built-in observability and health checks enable proactive issue resolution (more info in the third part of this series), while service discovery allows for seamless scaling and rolling updates without disrupting traffic. These capabilities collectively improve resilience, ensuring that microservices remain highly available and performant.
Securing a Service Mesh
Security in a service mesh also parallels network concepts: meshes often enforce Mutual Transport Layer Security (mTLS) for all service-to-service communication, which is akin to deploying IPsec tunnels or SSL offload devices between all your servers. The mesh automates certificate management and encryption, so two services talk over an encrypted channel without developers needing to implement it. This is similar to how in a traditional data center you might have deployed an internal PKI and configured all servers to use TLS, but here the proxies handle it. Additionally, a service mesh can do authorization at the service call level ("service A is allowed to call service B"), comparable to firewall rules but at the RPC layer. This may remind a Cisco engineer of micro-segmentation policies, but implemented via sidecar proxies rather than ACLs in the network. Here's an example of how to enable mTLS across the entire Istio service mesh.
kubectl apply -f - <<EOF
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: lock-it-down        # resource names must be lowercase DNS-1123 labels
  namespace: istio-system   # placing the policy in the Istio root namespace applies it mesh-wide
spec:
  mtls:
    mode: STRICT            # reject any plaintext service-to-service traffic
EOF
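And here is a hedged sketch of authorization at the service call level, allowing only one workload's identity to reach another (the namespace, labels, and service account names are assumptions for the example):

```yaml
# Hypothetical AuthorizationPolicy: only workloads running as the
# "service-a" service account may call Pods labeled app=service-b.
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: allow-service-a         # illustrative name
  namespace: default            # illustrative namespace
spec:
  selector:
    matchLabels:
      app: service-b            # the policy protects these Pods
  action: ALLOW
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/default/sa/service-a   # mTLS-verified identity
```

Because the caller's identity comes from its mTLS certificate rather than its source IP, this behaves like a firewall rule that can't be spoofed by address reuse.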
Finally, and arguably the most important security control for a service mesh, is strong RBAC (Role-Based Access Control)! Separation of privileges and least-privilege access only work when RBAC works. When following a Zero Trust framework, restricting users to the least privilege their job responsibilities require is vital to a secure environment. Likewise, a default "deny all" posture is only possible when you have a validated identity to tie to the user. That identity, and what it is authorized to access, is at the heart of RBAC, and it ensures separation of roles and responsibilities in your container environment. This is similar to how TACACS+ can separate command levels, and even specific commands, based on the identity of the admin accessing a switch. Providing separate levels of access for different team members administering a switch is the same idea as using RBAC to provide different levels of access to the container environment.
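The TACACS+ analogy can be sketched in Kubernetes RBAC terms, for example a read-only role bound to a team (all names here are illustrative):

```yaml
# Hypothetical read-only Role for a dev team, comparable to a limited
# TACACS+ privilege level that allows "show" commands but no config changes.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader              # illustrative name
  namespace: default            # illustrative namespace
rules:
  - apiGroups: [""]             # "" is the core API group
    resources: ["pods", "services"]
    verbs: ["get", "list", "watch"]   # view-only; no create/update/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dev-team-read-only      # illustrative name
  namespace: default
subjects:
  - kind: Group
    name: dev-team              # illustrative group from your identity provider
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

Anything not explicitly granted is denied by default, which is exactly the deny-all baseline a Zero Trust framework calls for.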
Key takeaways (Services):
- Service = a method for exposing a network application that is running as one or more Pods in your cluster.
- Service Mesh: Each service is typically accompanied by a proxy that handles inbound and outbound requests for that service. These proxies form a "mesh" of L7 controls inside the cluster.
- Cilium: a good example of how a service mesh can be implemented with kernel-level visibility (via eBPF) instead of a proxy.
- Service Discovery: is primarily DNS-based. Each client resolves the service's cluster IP, and the control plane tracks endpoint health. The environment can even cut off broken or degrading endpoints with mechanisms like circuit breaking.
- Service Availability: Comes from intelligent traffic routing, load balancing, and automatic failover mechanisms. Just like in enterprise networks.
- mTLS: encrypts all service-to-service communication, which is akin to deploying IPsec tunnels between all your servers.
- RBAC: allows for separation of privileges and least-privilege access in a service mesh. This is like setting up TACACS+ controls on switches for command-level access.
In summary, Service Mesh deployments vary from one tool to another, but all have the same principles of providing services to a container environment. Just like in enterprise networks where security and availability are vital, the same is true for containers to ensure services are readily available and secured within the environment.