Security Pattern – Service Mesh

Overview

The following security pattern describes the controls required to protect resources within a service mesh fabric.

A service mesh is an infrastructure layer that manages all service-to-service communication within a distributed microservice ecosystem. It typically accomplishes this through “sidecar” proxies deployed alongside each service, which transparently route traffic. Organisations implement service meshes to address scaling challenges in microservice architectures and to enhance availability and resiliency.

A service mesh comprises two primary logical functions: the Control Plane and the Data Plane.

Service mesh data plane: The sidecar proxies that intercept requests to microservices and manage service health checking, routing, load balancing, and authentication/authorisation.
Service mesh control plane: Provides service management, policy enforcement, and configuration across the service mesh. The control plane unifies all proxy services within data planes into a cohesive distributed system.

Technology examples for the Control Plane include HashiCorp Consul, AWS App Mesh, Istio, Azure Service Fabric Mesh, GCP Traffic Director, and Linkerd. For the Data Plane, examples of sidecar proxies include Envoy, Kong, NGINX, and the Linkerd proxy.

There are several architecture choices for service meshes, depending on the vendor and the deployment model. This pattern examines a combined API Gateway and Service Mesh architecture, along with options for securing these services.

Regardless of the architecture, service meshes commonly provide security enhancements as part of their native capabilities to address security threats associated with Microservices, Container Platforms, and Container Orchestration. These enhancements typically include service identity, mutual authentication, role-based access control (RBAC), and traffic encryption.

Despite these enhancements, several security challenges remain:

Typical Challenges

The distributed nature of service interconnectivity and communications makes end-to-end security assessment of data flows extremely difficult. This can lead to the “confused deputy” scenario, where a service is asked to perform a task on behalf of another service, system, or user that may not be authorised to perform that action.
An increased attack surface across the service mesh control plane and sidecar proxies could allow false registration of services, manipulation of service routing, or service disruption.
Microservices delegate authentication and authorisation controls to their sidecar proxies. Direct connectivity to a microservice, or any other path that bypasses its sidecar proxy, may therefore allow unauthorised access.
Service meshes rely heavily on application-level micro-segmentation. This requires strong naming standards and role-based service policies to ensure sufficient segregation and isolation across different trust domains.

Scope

The scope of this document is to address the security threats that relate to the following:

ID	Description	Example
01	Service mesh fabric for Control and Data Planes across container orchestration services.	Deployment of Istio across Kubernetes and Docker.
02	Ingress and Egress traffic. Although a service mesh primarily manages East-West traffic, this pattern extends to North-South traffic to address the typical challenges of using a service mesh.	Integration of Ingress and Egress Gateways within Service Mesh.

Out of Scope

ID	Description of exclusion	Reason for Exclusion
01	Service mesh fabric across non-containerised workloads.	Although service meshes can be built across virtual or physical servers, this is not considered a typical use case for service mesh.
02	Security threats to container orchestration	Service Mesh deployments are closely related to Container Orchestration, and the two may arguably be merged into a single pattern in the future. For the time being, these patterns are maintained separately to address the different concerns of orchestration versus service-to-service communication flows.
03	DNS related security threats such as hijacking or spoofing.	DNS is a core dependency for Service Meshes operating within an enterprise environment, but DNS threats are not direct threats to the Service Mesh Fabric.

Dependencies

ID	Description	Impact from dependency not met
01	Summarised security design principles outlined under Jericho Forum® Commandments. https://publications.opengroup.org/w124	Minimal impact, as these principles are used as an example set of baseline requirements within this security pattern.

Constraints

ID	Description	Impact from constraint
01

Assumptions

ID	Description	Impact if assumption is false
01	Assumes the use of a side-car proxy architecture within the Service Mesh.	The applicability of controls to specific assets within this pattern would need to be re-assessed.

Assets at Risk

The following section provides a list of assets affected by the problem statement:

Asset Title	Asset Description
Service Mesh Management	Manages the state of the service mesh, including registry and routing configuration across the data plane. Ensures deployment of configuration to Side Car proxies, including service authentication policies, RBAC and TLS certificate management. This forms part of the overall Service Mesh Control Plane.
Service Mesh Metadata Directory	Data store that maintains metadata for service registration and inventory. It also acts as the source of truth for policy and configuration assets within the service mesh. This forms part of the overall Service Mesh Control Plane.
Service Mesh Telemetry	Captures performance metrics, monitoring and insights across service mesh. This forms part of the overall Service Mesh Control Plane.
Side Car Proxy	Intercepts container-to-container traffic and operates as the enforcement point for policies from the service mesh control plane. Provides traffic inspection, service registration for the upstream/backend service, authentication and authorisation, load balancing and metrics capture. This forms part of the Service Mesh Data Plane.
Ingress/Egress API Gateways	API Gateway for inbound (ingress) and outbound (egress) cluster network traffic. API Gateways are established to either proxy or mediate traffic. This forms part of the Service Mesh Data Plane.

The following assets are also referenced within the pattern but are not in scope:

Container Orchestration – Underlying infrastructure for scheduling, discovery, and health management of container deployments hosting microservices.
Container Platform - Underlying infrastructure for hosting microservice containers. Also referenced in this document as ‘nodes’.
Enterprise DNS – Domain name address resolution.
Certificate Authority – Public Key Infrastructure for managing certificates issued to Ingress/Egress gateways and Side Car Proxies.

Threat Model

The following section provides a list of threats within the problem statement. Note that the threat modelling takes into account the security controls already provided within typical service mesh architectures for service-to-service authentication, authorisation and traffic encryption.

Threat Event (ID / Title)	Threat Description and Characteristics	Diagram
TE-13: Inadequate design and planning leading to improper deployment	The increased complexity of connectivity and relationships within a service mesh creates additional exposure to flaws in workflows and processes. Security controls implemented for upstream components could potentially be skipped or bypassed by directly accessing downstream components (known as the “confused deputy” scenario). Due to the highly distributed and interconnected nature of microservices within a service mesh, it’s challenging to assess end-to-end data flows and subsequent access to sensitive information.
TE-14: Inadequate workflows or processes leading to improper deployment	A malicious trusted insider or rogue user may bypass or circumvent controls applied on the sidecar proxy to gain unauthorised access to microservices. Controls may be bypassed by accessing microservices via the underlying Container Platform. Incorrect implementation of controls within sidecar proxies—such as improper certificate validation or misconfigured authorisation policies—may also allow unauthorised connections.
TE-25: Generation of false identities	A malicious entity may impersonate an authorised service or attempt to route traffic to a rogue node. This could allow the attacker to gain access to applications within the service mesh and/or access sensitive data.
TE-26: Abuse of resources through misconfiguration	Improper configuration or implementation of the service mesh control plane could compromise the integrity of the service mesh. This may be achieved through manipulation of the service metadata directory or hijacking of the service registration process. Manipulation of metadata could allow redirection of service requests to incorrect or malicious services, resulting in disabled or disrupted applications.
TE-35: Lack of security insights, monitoring or manipulation of audit log integrity	Lack of visibility and complex connections may allow security breaches or attacks to go unnoticed. This makes it harder to detect and investigate malicious activity or rogue processes across a large volume of service interactions. It further limits the ability to conduct forensics on traffic and application workflows/interactions if a breach is suspected.

Target State Solution

Summary

The target state solution is derived from the following design requirements, which in turn inform the design principles.

Design Requirements

The target state solution is required to meet the following requirements, as referenced under Dependencies, Assumptions and Constraints.

Requirement	Implication to Design Principles
1. The scope and level of protection are specific and appropriate to the asset at risk.	Maintain segregated policies for service mesh between containers and external clients
2. Security mechanisms must be pervasive, simple, scalable, and easy to manage.	The security pattern maintains clear security principles to be applied for service mesh
3. Assume context at your peril.	Controls defined in this security pattern are used to identify and measure problems, limitations or issues
4. Devices and applications must communicate using open, secure protocols.	Open and encrypted communication channels such as HTTPS are applied for ingress and egress interfaces to service mesh
5. All devices must be capable of maintaining their security policy on an un-trusted network.	Service mesh is protected against both external and internal threats.
6. All people, processes, and technology must have declared and transparent levels of trust for any transaction to take place.	Validate containers before joining service mesh
7. Mutual trust assurance levels must be determinable.	Establish mutual trust between service mesh and underlying container orchestration services
8. Authentication, authorization, and accountability must interoperate/exchange outside of your locus/area of control.	Apply authentication and authorization for internal, partner and public clients alike.
9. Access to data is controlled by security attributes of the data itself.	Maintain attributes for services within service mesh metadata
10. Data privacy (and security of any asset of sufficiently high value) requires a segregation of duties/privileges.	Apply restrictions to access of sensitive data within service mesh metadata
11. By default, data must be appropriately secured when stored, in transit, and in use.	Ensure protection of data flows within service mesh

Solution Overview

The overview below summarises the service mesh architecture for both the control and data planes. The control plane enforces restrictions to prevent registration of unauthorised nodes into the service mesh.

The data planes are responsible for intra-cluster communication as well as inbound (ingress) and outbound (egress) cluster network traffic.

Whether traffic is entering the mesh (ingressing) or leaving the mesh (egressing), application service traffic is directed first to the service proxy for handling. Side Car Proxies primarily handle East-West traffic, whereas the Ingress and Egress Gateways manage North-South traffic.

Additional Notes

Decentralise enforcement of access controls across East-West traffic but centralise enforcement of access controls for North-South traffic.
- Use side car proxies to control East-West traffic, but do not permit side car proxies to send or receive Ingress or Egress traffic across the Service Mesh boundary.
- Any North-South traffic to or from the service mesh is centralised via the Ingress and Egress gateway services.
Centralise identity management services but decentralise the enforcement of authentication and authorisation policies (applied to gateways and side car proxies).
Block self-registration of nodes into Service Mesh and tightly control registration of any new nodes.
Service mesh telemetry and performance metrics are available to, and/or integrated with, security event management to allow communication tracing and monitoring of abnormal network behaviour for microservices.
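The North-South rule in these notes can be expressed as a simple decision function. The sketch below is a minimal illustration under two assumptions. First, in-mesh destinations are identified by the Kubernetes default cluster domain `svc.cluster.local`. Second, a workload role named `egress-gateway` (hypothetical) is the only component permitted to reach external hosts. Real meshes express this through their own egress and sidecar configuration rather than application code.

```python
# Minimal sketch: sidecars handle East-West traffic only; North-South traffic
# must be centralised through the egress gateway.
# Assumptions (hypothetical): in-mesh hosts end with the Kubernetes default
# cluster domain, and the egress gateway runs with the role "egress-gateway".
MESH_DOMAIN = ".svc.cluster.local"
EGRESS_GATEWAY_ROLE = "egress-gateway"


def is_in_mesh(host: str) -> bool:
    """Return True if the destination host belongs to the service mesh."""
    return host.lower().rstrip(".").endswith(MESH_DOMAIN)


def permit_outbound(source_role: str, destination_host: str) -> bool:
    """Decide whether a workload may send traffic to destination_host."""
    if is_in_mesh(destination_host):
        # East-West: allowed here, still subject to the sidecar's own
        # authentication and authorisation policies.
        return True
    # North-South: only the egress gateway may leave the mesh.
    return source_role == EGRESS_GATEWAY_ROLE
```

For example, `permit_outbound("sidecar", "api.example.com")` is denied. A microservice that needs an external API must therefore route the call through the egress gateway, where centralised controls apply.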

Enforcement of Segregation

There are multiple options for defining and enforcing segregation within the service mesh data plane. These options can be summarised in the following categories.

Level	Description	Example
Network Level Segmentation	Segmentation using IP whitelisting and network firewalls to define service mesh boundaries	Layer 4 network firewalls restricting access to specific endpoints exposed on Ingress Gateways.
Session Level Segmentation	Segmentation using TLS mutual authentication to define application or host boundaries	TLS Mutual Authentication between Side-Car Proxies and/or Ingress Gateways
Namespace Level Segmentation	Segmentation using namespace policies within service mesh to define application boundaries	Restrictions applied within Side-Car Proxy for specific requests within allocated namespace.
API Level Segmentation	Segmentation using claims or attributes within access tokens to define API method or scope boundaries	Authorisation restrictions applied on API Gateways based on claims specified in JSON Web Tokens.

To avoid duplicating enforcement points, segmentation is applied in layers across the following assets.

Network Level segmentation is primarily managed on Ingress / Egress gateways.
Session Level segmentation is primarily managed on Ingress / Egress gateways and Side-Car proxies.
Namespace Level segmentation is primarily managed on Side-Car proxies.
API (service) Level segmentation is primarily managed on Side-Car proxies and within Microservices.
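To illustrate how these layers combine, the sketch below evaluates a single request against all four segmentation levels in order and reports which layer denies it. It is a simplified model, not a product configuration. In the layered approach described here, each check would run at a different enforcement point (gateway, sidecar or microservice). The SPIFFE-style identity string is an example format, and token signature verification is assumed to have already happened upstream.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Request:
    source_ip: str
    peer_identity: Optional[str]  # identity from the mTLS client certificate
    namespace: str                # namespace of the calling workload
    claims: Dict[str, object]     # claims from an already-verified access token


@dataclass
class LayeredPolicy:
    networks: List[ipaddress.IPv4Network] = field(default_factory=list)
    identities: Set[str] = field(default_factory=set)
    namespaces: Set[str] = field(default_factory=set)
    required_scope: str = ""  # must be set; an empty value denies everything


def evaluate(req: Request, policy: LayeredPolicy) -> Tuple[bool, str]:
    """Apply each segmentation layer in turn; report the first layer that denies."""
    # Network level: source address must fall within an allowed range.
    addr = ipaddress.ip_address(req.source_ip)
    if not any(addr in net for net in policy.networks):
        return False, "network"
    # Session level: a mutually authenticated, known peer identity is required.
    if req.peer_identity is None or req.peer_identity not in policy.identities:
        return False, "session"
    # Namespace level: the caller must sit in a permitted namespace.
    if req.namespace not in policy.namespaces:
        return False, "namespace"
    # API level: the token must carry the scope required for this operation.
    scopes = str(req.claims.get("scope", "")).split()
    if policy.required_scope not in scopes:
        return False, "api"
    return True, "allowed"
```

Reporting the denying layer, rather than a bare yes/no, makes it easier to correlate denials with the enforcement point responsible for them.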

The diagram below summarises inbound and outbound flows for the service mesh (described as North-South flows).

The diagram below summarises internal flows within the service mesh (described as East-West flows).

Additional Notes

Side Car Proxies are configured to terminate TLS sessions for inspection and access controls.
- Do not permit TLS passthrough, and prefer authentication via JSON Web Tokens.
Microservices establish mutual trust with Side-Car proxy to reduce potential hijacking or bypassing of controls.
Microservices are expected to pass context of originating systems via tokens appended within HTTP headers. JSON Web Tokens are used (as opposed to capturing context directly within HTTP parameters) to maintain context of the originating source systems and/or microservices requesting access to specific resources.
- A Side Car Proxy typically propagates context for only one hop, and relies on application logic in the microservice to maintain and transfer context.
- Policies assigned within side car proxies validate HTTP headers and ensure tokens are exchanged between microservices.
- The Side Car Proxy is not responsible for validating tokens or ensuring their appropriate use; however, it enforces policies that dictate their propagation into specific headers of inbound or outbound microservice traffic.
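The division of responsibility in these notes can be sketched as follows. The sidecar enforces which headers must be present and which may propagate, while token validation stays with the microservice. The header name `x-origin-context` and the outbound allowlist are hypothetical. The JWT check is deliberately a shape check only and does not verify signatures.

```python
from typing import Dict, Optional

# Hypothetical header name carrying the originating-context JWT.
CONTEXT_HEADER = "x-origin-context"
# Headers the sidecar policy allows to propagate on outbound requests (assumed).
ALLOWED_OUTBOUND = {"content-type", "accept", CONTEXT_HEADER}


def looks_like_jwt(value: str) -> bool:
    """Shape check only: a compact JWS has three non-empty dot-separated parts.
    The signature is NOT verified here; that remains the microservice's job."""
    parts = value.split(".")
    return len(parts) == 3 and all(parts)


def check_inbound(headers: Dict[str, str]) -> bool:
    """Admit an inbound request only if it carries the context header."""
    normalised = {k.lower(): v for k, v in headers.items()}
    return looks_like_jwt(normalised.get(CONTEXT_HEADER, ""))


def filter_outbound(headers: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Block outbound requests that drop the context; strip non-allowed headers."""
    normalised = {k.lower(): v for k, v in headers.items()}
    if not looks_like_jwt(normalised.get(CONTEXT_HEADER, "")):
        return None  # context was not carried forward by the application
    return {k: v for k, v in normalised.items() if k in ALLOWED_OUTBOUND}
```

Returning `None` from `filter_outbound` models the case where application logic failed to carry context forward across the hop. That is the gap the proxy cannot fill on its own.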

Isolating Compromised Services

Malicious activity, attempts to circumvent segmentation controls, or anomalies in traffic behaviour detected on Ingress/Egress gateways or Side-Car proxies trigger a ‘dead letter’ routing of that traffic.

Service mesh architecture allows traffic routing or mirroring to be configured from the control plane.

Where a security incident or breach is suspected, trigger the routing of traffic to a separate security namespace that operates either as a decoy (honeypot) or for forensic capture and analysis. Once a security incident is identified, traffic can either continue to be routed to forensic tooling or be sent to a ‘dead letter’ route to isolate the incident within the service mesh.
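The escalation described above can be summarised as a small decision function. This is a sketch only: the anomaly score, the 0.8 threshold and the route names are hypothetical. In practice the chosen route would be pushed to gateways and sidecars as routing or mirroring configuration from the control plane.

```python
from enum import Enum

# Hypothetical threshold above which traffic is treated as suspected malicious.
SUSPICION_THRESHOLD = 0.8


class Route(Enum):
    NORMAL = "normal"            # regular service routing
    DECOY = "decoy"              # security namespace acting as a honeypot
    FORENSICS = "forensics"      # security namespace for capture and analysis
    DEAD_LETTER = "dead-letter"  # isolate: traffic is not delivered onward


def select_route(anomaly_score: float, incident_confirmed: bool,
                 use_decoy: bool = False, keep_capturing: bool = False) -> Route:
    """Choose where the control plane should route a flow of traffic."""
    if incident_confirmed:
        # Confirmed incident: keep feeding forensic tooling, or isolate.
        return Route.FORENSICS if keep_capturing else Route.DEAD_LETTER
    if anomaly_score >= SUSPICION_THRESHOLD:
        # Suspected incident: divert to the separate security namespace.
        return Route.DECOY if use_decoy else Route.FORENSICS
    return Route.NORMAL
```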

Design Principles

The following design principles, derived from the requirements above, are applied in this pattern.

Maintain defence-in-depth approach to applying segmentation controls within Service Mesh.
Aggregate traffic entering the service mesh through the Ingress API Gateway to apply security controls and traffic inspection.
Validate authenticity of any nodes joining the Service Mesh.
Disable self-registration of nodes into Service Mesh.
Isolate services suspected of compromise and/or de-register them from the service mesh.
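The registration principles above can be sketched as a gate in front of the service registry: validate nodes before they join, disable self-registration, and de-register suspected nodes. The class and field names below are hypothetical. A real deployment would rely on the mesh's own registration controls and a validated workload identity rather than a plain `requested_by` string.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

NodeKey = Tuple[str, str, str]  # (node_id, service_name, namespace)


@dataclass(frozen=True)
class RegistrationRequest:
    node_id: str
    service_name: str
    namespace: str
    requested_by: str  # principal submitting the registration


@dataclass
class RegistrationGate:
    approved: Set[NodeKey] = field(default_factory=set)    # validated out-of-band
    registrars: Set[str] = field(default_factory=set)      # authorised administrators
    registered: Set[NodeKey] = field(default_factory=set)  # current registry view

    def register(self, req: RegistrationRequest) -> bool:
        key = (req.node_id, req.service_name, req.namespace)
        if req.requested_by == req.node_id:
            return False  # self-registration is disabled
        if req.requested_by not in self.registrars:
            return False  # only authorised registrars may add nodes
        if key not in self.approved:
            return False  # node was not validated before joining
        self.registered.add(key)
        return True

    def isolate(self, node_id: str) -> None:
        """De-register and revoke approval for a node suspected of compromise."""
        self.registered = {k for k in self.registered if k[0] != node_id}
        self.approved = {k for k in self.approved if k[0] != node_id}
```

Revoking approval in `isolate` (not just removing the registry entry) prevents an isolated node from simply being re-registered without a fresh validation.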

Actors

The following actors are involved in this pattern.

Actor Type	Actor Description
External clients	Clients outside of the service mesh consuming services presented via Ingress Gateway.
External providers	Service providers outside of the service mesh, accessed via the Egress Gateway.

Locations

This pattern applies to all locations where the assets are used.

Location	Location Description
Service Mesh	In the context of this pattern, this represents any microservices hosted within the service mesh. These are classified as residing in the Internal network domain.
External Services	In the context of this pattern, this represents any clients or service providers outside the service mesh. These services may reside in either external (Public or Partner) or Internal network domains.

Sequencing

The pattern is designed around the following sequences:

Stage gate	Description
Service Registration	Registration or de-registration of microservices within service mesh.
Service Run Time	Service interactions during deployment and execution within the service mesh.

Mapping Threats to Controls

The following provides a mapping of security threats to affected assets and the security control objectives required to mitigate them (further detailed in subsequent security pattern logical designs).

Threat Event	Affects Assets	Security Controls Objectives
TE-13: Inadequate design and planning leading to improper deployment	Side-Car Proxy; Ingress & Egress API Gateways	AC-03: Access Enforcement; AC-04: Information Flow Enforcement; AC-14: Permitted Actions Without Identification or Authentication; AC-16: Security and Privacy Attributes; AC-24: Access Control Decisions; IA-09: Service Identification and Authentication; SC-08: Transmission Confidentiality and Integrity; SC-16: Transmission of Security and Privacy Attributes; SI-10: Information Input Validation; SI-15: Information Output Filtering
TE-14: Inadequate workflows or processes leading to improper deployment	Side-Car Proxy	CM-02: Baseline Configuration; SC-02: Separation of System and User Functionality; SC-08: Transmission Confidentiality and Integrity; SC-11: Trusted Path; SC-13: Cryptographic Protection; SC-23: Session Authenticity
TE-25: Generation of false identities	Service Mesh Management	AC-03: Access Enforcement; AC-06: Least Privilege; AC-24: Access Control Decisions; IA-09: Service Identification and Authentication; SC-13: Cryptographic Protection; SC-17: Public Key Infrastructure Certificates; SC-22: Architecture and Provisioning for Name/Address Resolution Service; SR-11: Component Authenticity
TE-26: Abuse of resources through misconfiguration	Service Mesh Management; Service Mesh Metadata Directory	CM-02: Baseline Configuration; CM-06: Configuration Settings; RA-05: Vulnerability Monitoring and Scanning
TE-35: Lack of security insights, monitoring or manipulation of audit log integrity	Service Mesh Telemetry	AU-02: Event Logging; AU-03: Content of Audit Records; AU-06: Audit Record Review, Analysis, and Reporting; AU-12: Audit Record Generation; AU-14: Session Audit; IR-05: Incident Monitoring

Security Pattern

Pattern View:

Control list: Service Mesh Management

Control Objective	Control Description
AC-03: Access Enforcement	Restrict administrative access to service mesh interfaces and APIs through IAM policies. Block any self-registration of nodes or services into the service mesh.
AC-06: Least Privilege	All access control lists and policies are configured to deny by default.
AC-14: Permitted Actions Without Identification or Authentication	Disable any unauthenticated or anonymous access to service mesh control plane.
AC-24: Access Control Decisions	Define role-based access policies across side-car proxies at both system-level and namespace-level.
CM-02: Baseline Configuration	Ensure the baseline security configuration of service mesh services is hardened to industry or vendor best practice, including system permissions for configuration files and services. Ensure regular patching cycles are applied.
CP-09: System Backup	Regularly scheduled backups are applied across the control plane infrastructure.
CP-10: System Recovery and Reconstitution	Ensure reconstitution of the service mesh control plane and restoration of service clusters to operational states.
IA-09: Service Identification and Authentication	Validate identity of all services attempting registration within Service Mesh
IR-05: Incident Monitoring	Establish service routing or traffic mirroring for malicious traffic or packets to security services such as decoy honeypots or forensic tooling
RA-05: Vulnerability Monitoring and Scanning	Scan and remove unnecessary, unused or insecure communication flows across service mesh.
SC-13: Cryptographic Protection	Issue certificates with a short lifespan (90 days) to promote good security hygiene.
SC-17: Public Key Infrastructure Certificates	Use an intermediate CA chained to a trusted Root CA. Remove any use of self-signed certificates from within the service mesh.
SC-22: Architecture and Provisioning for Name/address Resolution Service	Ensure service-to-service communications exist within a dedicated domain namespace within DNS, restricted from external systems or endpoints that may attempt DNS hijack attacks to intercept or reroute communications.
SC-32: System Partitioning	Ensure separate system partitioning of control plane services and directory from the service mesh data plane. Isolate within separate security groups or segments.
SR-11: Component Authenticity	Ensure all sub-components within service mesh and dependent components within orchestration services are authenticated.
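As a sketch of how the SC-13 and SC-17 objectives might be checked during an inventory review, the function below flags certificates in three cases: the lifetime exceeds 90 days, the certificate is self-signed, or it was not issued by a trusted intermediate CA. It works on a simplified, hypothetical `CertInfo` record rather than parsed X.509 data. It compares issuer and subject names only, with no signature or revocation checking. The CA name in the usage is an example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Set

MAX_LIFETIME = timedelta(days=90)  # SC-13 target lifespan


@dataclass(frozen=True)
class CertInfo:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def cert_violations(cert: CertInfo, trusted_issuers: Set[str],
                    now: datetime) -> List[str]:
    """Return the policy checks this certificate fails (empty list = compliant)."""
    problems = []
    if cert.not_after - cert.not_before > MAX_LIFETIME:
        problems.append("lifetime exceeds 90 days")
    if cert.issuer == cert.subject:
        problems.append("self-signed")  # SC-17: remove self-signed certificates
    elif cert.issuer not in trusted_issuers:
        problems.append("issuer not a trusted intermediate CA")
    if not cert.not_before <= now <= cert.not_after:
        problems.append("outside validity period")
    return problems
```

For example, a certificate issued by `"Mesh Intermediate CA"` for exactly 90 days passes while it is within its validity period. The same certificate is flagged once `now` moves past `not_after`.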

Control list: Service Mesh Metadata Directory

Control Objective	Control Description
AC-14: Permitted Actions Without Identification or Authentication	Disable any unauthenticated or anonymous access to metadata directory
CM-02: Baseline Configuration	Ensure the baseline security configuration of the metadata directory is hardened to industry or vendor best practice, including system permissions for configuration files and services.
CP-09: System Backup	Regularly scheduled backups are applied to the metadata directory. Ensure mechanisms are employed to protect the integrity of system backups.
CP-10: System Recovery and Reconstitution	Ensure reconstitution of the service mesh control plane and restoration of service clusters to operational states.
RA-05: Vulnerability Monitoring and Scanning	Regularly monitor and scan for vulnerabilities in the metadata services and related sub-components
SC-13: Cryptographic Protection	Ensure metadata directory is encrypted at rest.
SC-32: System Partitioning	Ensure separate system partitioning of metadata services and directory from the service mesh data plane. Isolate within separate security groups or segments.
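One way to meet the CP-09 requirement to protect backup integrity is to tag each metadata directory backup with an HMAC and verify the tag before any restore. The sketch below uses only the Python standard library, and the snapshot structure is hypothetical. It assumes the HMAC key is held outside the backup store (for example in a separate key management service). An attacker who can modify backups then cannot also recompute the tag.

```python
import hashlib
import hmac
import json
from typing import Tuple


def sign_backup(snapshot: dict, key: bytes) -> Tuple[bytes, str]:
    """Serialise a metadata snapshot deterministically and compute an HMAC tag."""
    payload = json.dumps(snapshot, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def verify_backup(payload: bytes, tag: str, key: bytes) -> bool:
    """Recompute the tag before restore, using a constant-time comparison."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)
```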

Control list: Service Mesh Telemetry

Control Objective	Control Description
AU-02: Event Logging	Capture audit data from service mesh components for both control plane and data plane activities.
AU-03: Content of Audit Records	Data from service mesh telemetry and performance metrics are forwarded or integrated with security event management.
AU-06: Audit Record Review, Analysis, and Reporting	Event audit logs from service mesh are correlated against dependent services such as Enterprise DNS or Certificate Management services within security event management.
AU-12: Audit Record Generation	Capture audit logs from Ingress & Egress Gateways, Side-Car Proxies and Service Mesh Control Plane.
AU-14: Session Audit	Capture and monitor traffic sessions across data plane, for both service-to-service traffic (East-West), Inbound flows and Outbound (North-South) flows.
IR-05: Incident Monitoring	Enable behavioural monitoring and metrics across network and communication traffic within service mesh.
RA-10: Threat Hunting	Forward telemetry data to security event management to identify abnormalities within network and communication traffic.
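The sketch below illustrates how telemetry could support RA-10 threat hunting and the RA-05 objective of removing unused flows. It compares observed service-to-service calls against an approved baseline. Unexpected edges are candidates for investigation in security event management. Baseline edges never seen in traffic are candidates for removal. The edge representation and the existence of a maintained baseline are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (source service, destination service)


def compare_to_baseline(observed: Iterable[Edge],
                        baseline: Set[Edge]) -> Tuple[Dict[Edge, int], Set[Edge]]:
    """Return (unexpected flows with call counts, approved flows never observed)."""
    counts = Counter(observed)
    unexpected = {edge: n for edge, n in counts.items() if edge not in baseline}
    unused = set(baseline) - set(counts)
    return unexpected, unused
```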

Control list: Side-Car Proxy

Control Objective	Control Description
AC-03: Access Enforcement	Operate as enforcement point for all service-to-service (East-West) inbound and outbound traffic flows within Service Mesh.
AC-04: Information Flow Enforcement	Restrict ingress or egress traffic from service mesh (North-South) to traverse via Ingress and Egress Gateways
AC-14: Permitted Actions Without Identification or Authentication	Disable any unauthenticated or anonymous access for service-to-service flows within Service Mesh.
AC-16: Security and Privacy Attributes	Pass context of originating systems via tokens appended within HTTP headers. JSON Web Tokens are used (as opposed to capturing context directly within HTTP parameters) to maintain context of the originating source systems and/or microservices requesting access to specific resources.
AC-24: Access Control Decisions	Enforce access control based on TLS Mutual Authentication and access policies for service namespaces within service mesh.
CM-02: Baseline Configuration	Ensure the baseline security configuration of service mesh services is hardened to industry or vendor best practice, including system permissions for configuration files and services. Ensure regular patching cycles are applied.
IA-09: Service Identification and Authentication	System identities are encoded in TLS certificates, but service names are retrieved via the discovery service.
SC-02: Separation of System and User Functionality	Restrict local administrative rights to directly access microservices via underlying Container Platform.
SC-08: Transmission Confidentiality and Integrity	Enforce TLS Mutual Authentication across all service mesh traffic.
SC-11: Trusted Path	Validate the integrity of underlying host networking stacks to prevent direct manipulation or disruption. Ensure system access restrictions are applied to localhost or loopback network interfaces used for side-car proxies.
SC-13: Cryptographic Protection	Validate the installed certificate chain, certificate expiration and certificate revocation status for both TLS server and client certificates.
SC-16: Transmission of Security and Privacy Attributes	Validate HTTP headers to ensure appropriate tokens to pass context of the originating source user/application within requests.
SC-23: Session Authenticity	Microservices establish mutual trust with Side-Car proxy to reduce potential hijacking or bypassing of controls.
SI-10: Information Input Validation	Validate that inbound requests carry appropriate HTTP headers to pass context of the originating source user/application to the microservice.
SI-15: Information Output Filtering	Filter outbound requests from the microservice to ensure they carry appropriate HTTP headers passing context of the originating source user/application.
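The AC-24 and IA-09 objectives can be illustrated with a default-deny authorisation check keyed on the peer identity taken from the verified mTLS client certificate. The sketch assumes the SPIFFE-style identity format `spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>` used by Istio. It also assumes the certificate chain, expiry and revocation checks (SC-13) have already passed. The policy structure and the `*` wildcard are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Set, Tuple


class PeerIdentity(NamedTuple):
    trust_domain: str
    namespace: str
    service_account: str


def parse_spiffe_id(uri: str) -> Optional[PeerIdentity]:
    """Parse spiffe://<trust-domain>/ns/<namespace>/sa/<service-account>."""
    prefix = "spiffe://"
    if not uri.startswith(prefix):
        return None
    parts = uri[len(prefix):].split("/")
    if len(parts) != 5 or parts[1] != "ns" or parts[3] != "sa" or not all(parts):
        return None
    return PeerIdentity(parts[0], parts[2], parts[4])


# Destination namespace -> allowed (source namespace, service account) pairs.
Policy = Dict[str, Set[Tuple[str, str]]]


def authorise(peer_uri: Optional[str], dest_namespace: str,
              policy: Policy, trust_domain: str) -> bool:
    """Default-deny decision based on the peer's mTLS identity."""
    peer = parse_spiffe_id(peer_uri) if peer_uri else None
    if peer is None or peer.trust_domain != trust_domain:
        return False  # no verified identity from this trust domain (AC-14)
    allowed = policy.get(dest_namespace, set())  # unknown namespace: deny
    return ((peer.namespace, peer.service_account) in allowed
            or (peer.namespace, "*") in allowed)
```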

Control list: Ingress & Egress API Gateway

Control Objective	Control Description
AC-03: Access Enforcement	Access to APIs is authenticated and authorised using industry standard (OpenID Connect / OAuth2).
AC-04: Information Flow Enforcement	Restrict inbound traffic into service mesh to traverse Ingress API Gateway. Restrict outbound traffic from service mesh to traverse Egress API Gateway.
AC-14: Permitted Actions Without Identification or Authentication	Anonymous public APIs do not require authentication or authorisation. These APIs are exposed separately from authenticated endpoints.
AC-24: Access Control Decisions	Enforce network level access controls for endpoints exposed on API Gateways. Enforce API level access controls within API Gateways.
IA-09: Service Identification and Authentication	Enforce mutual authentication between API Gateways and Side Car Proxies.
SC-08: Transmission Confidentiality and Integrity	Enforce usage of secure protocols for data transmission.
SC-16: Transmission of Security and Privacy Attributes	Ensure inbound requests forward appropriate HTTP headers to pass context of the originating source user/application within requests.
SI-10: Information Input Validation	Enforce web application firewall rules to inspect API messages (e.g. CWE/SANS Top 25 or OWASP Top 10). Ensure API messages containing binary data or used for file transfer are scanned for malicious payloads or malware.
SI-15: Information Output Filtering	Outbound traffic to external public locations is forwarded to a web content filtering solution.
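The AC-03, AC-14 and AC-24 objectives for the gateway can be sketched as an API-level decision on access token claims. The example assumes the token signature has already been verified by the gateway's OpenID Connect / OAuth2 integration. It uses the standard `iss`, `aud`, `exp` and space-delimited `scope` claims, and the route tables and issuer URL are hypothetical. Unmapped routes are denied by default, while anonymous public routes are listed separately.

```python
import time
from typing import Dict, Optional, Tuple

# Hypothetical route tables for illustration.
PUBLIC_ROUTES = {("GET", "/public/status")}  # anonymous, exposed separately (AC-14)
ROUTE_SCOPES: Dict[Tuple[str, str], str] = {
    ("GET", "/orders"): "orders:read",
    ("POST", "/orders"): "orders:write",
}


def authorise_api(method: str, path: str, claims: Optional[dict], *,
                  issuer: str, audience: str, now: Optional[float] = None) -> bool:
    """API-level decision on claims from an access token whose signature
    has already been verified by the gateway's OIDC/OAuth2 integration."""
    if (method, path) in PUBLIC_ROUTES:
        return True
    if claims is None:
        return False  # authenticated routes require a token
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        return False
    aud = claims.get("aud")
    if audience not in (aud if isinstance(aud, list) else [aud]):
        return False
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        return False  # missing or expired token
    required = ROUTE_SCOPES.get((method, path))
    if required is None:
        return False  # unmapped route: default deny
    return required in str(claims.get("scope", "")).split()
```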

Appendix A – References

Please see below links to external sites for further reading

Appendix B - Disclosure Notice

This document is published as independent research only and is without warranty. It does not represent any publication from the National Institute of Standards and Technology (NIST) or other associated US government entities.