Security Pattern – Service Mesh

Overview

The following security pattern describes the controls required to protect resources within a service mesh fabric.

A service mesh is an infrastructure layer that manages all service-to-service communication within a distributed microservice ecosystem. It typically accomplishes this through “sidecar” proxies that are deployed alongside each service and transparently route its traffic. Organisations look to service meshes as a means of dealing with the scaling issues of microservice architectures and of increasing availability and resiliency.

Fundamentally, a service mesh is composed of two logical functions, best broken down as Control Plane and Data Plane.

Technology examples for the Control Plane include HashiCorp Consul, AWS App Mesh, Istio, Azure Service Fabric Mesh, GCP Traffic Director and Linkerd. Technology examples for the Data Plane include sidecar proxies such as Envoy, Kong, Nginx and Linkerd proxy.

There are a few different architecture choices for service meshes, depending on the vendor and the type of deployment. This pattern looks at a combined API Gateway and Service Mesh architecture and the options for securing those services.

Regardless of architecture, service meshes commonly provide security enhancements as a native capability to address security threats associated with Microservices, Container Platforms and Container Orchestration. These enhancements typically include service identity, mutual authentication, RBAC and traffic encryption.

Despite these enhancements, the following security challenges remain.

Typical Challenges

Scope

The scope of this document is to address the security threats that relate to:

ID | Description | Example
01 | Service mesh fabric for Control and Data Planes across container orchestration services. | Deployment of Istio across Kubernetes and Docker.
02 | Ingress and Egress traffic. Although a service mesh primarily manages East-West traffic, this pattern extends into North-South traffic to address the typical challenges of using a service mesh. | Integration of Ingress and Egress Gateways within Service Mesh.

Out of Scope 

ID | Description of exclusion | Reason for Exclusion
01 | Service mesh fabric across non-containerised workloads. | Although service meshes can be built across virtual or physical servers, this is not considered a typical use case for a service mesh.
02 | Security threats to container orchestration. | Service Mesh deployments are closely related to Container Orchestration, and the two may arguably become a single pattern in the future. For the time being, these patterns are maintained separately to address different concerns for orchestration versus service-to-service communication flows.
03 | DNS related security threats such as hijacking or spoofing. | DNS is a core dependency for Service Meshes operating within an enterprise environment, but DNS threats are not a direct threat to the Service Mesh Fabric.

Dependencies

ID | Description | Impact from dependency not met
01 | Summarised security design principles outlined under Jericho Forum® Commandments. https://publications.opengroup.org/w124 | Minimal impact, as these principles are used as an example set of baseline requirements within the security pattern.

Constraints

ID | Description | Impact from constraint
01

Assumptions

ID | Description | Impact if assumption is false
01 | Assumes the use of a side-car proxy architecture within the Service Mesh. | The applicability of controls to specific assets within this pattern needs to be re-assessed.

Assets at Risk

The following section provides a list of assets affected by the problem statement:

Asset Title | Asset Description
Service Mesh Management | Manages the state of the service mesh, including registry and routing configuration across the data plane. Ensures configuration is deployed to Side-Car Proxies, including service authentication policies, RBAC and TLS certificate management. This forms part of the overall Service Mesh Control Plane.
Service Mesh Metadata Directory | Data store that maintains metadata for service registration and inventory. It also maintains the source of truth for policy and configuration assets within the service mesh. This forms part of the overall Service Mesh Control Plane.
Service Mesh Telemetry | Captures performance metrics, monitoring and insights across the service mesh. This forms part of the overall Service Mesh Control Plane.
Side-Car Proxy | Intercepts container-to-container traffic and operates as the enforcement point for policies from the service mesh control plane. Provides traffic inspection, service registration for upstream/backend services, authentication and authorisation, and load balancing, and captures metrics. This forms part of the Service Mesh Data Plane.
Ingress/Egress API Gateways | API Gateway for inbound (ingress) and outbound (egress) cluster network traffic. API Gateways are established to either proxy or mediate traffic. This forms part of the Service Mesh Data Plane.

The following assets are also referenced within the pattern but not in scope

Threat Model

The following section provides a list of threats within the problem statement. Note that the threat modelling takes into account the security controls that service mesh architectures typically already provide for service-to-service authentication, authorisation and traffic encryption.

Threat Event (ID / Title) | Threat Description and Characteristics
TE-13: Inadequate design and planning leading to improper deployment | Increased complexity of connectivity and relationships within the service mesh creates additional exposure to flaws in workflows and processes. Security controls implemented for upstream components could potentially be skipped or bypassed by directly accessing downstream components (confused deputy scenario). Due to the increasingly distributed and interconnected nature of microservices within a service mesh, it is difficult to assess end-to-end data flows and the resulting access to sensitive information.
TE-14: Inadequate workflows or processes leading to improper deployment | A malicious trusted insider or rogue user may bypass or circumvent controls applied on the side-car proxy to gain unauthorised access to microservices. Controls may be bypassed by accessing microservices via the underlying Container Platform. Incorrect implementation of controls within side-car proxies, such as incorrect certificate validation or misconfigured authorisation policies, may also allow unauthorised connections.
TE-25: Generation of false identities | A malicious entity may impersonate an authorised service or attempt to route traffic to a rogue node. This may allow the attacker to gain access to applications within the service mesh and/or access sensitive data.
TE-26: Abuse of resources through misconfiguration | Improper configuration or implementation of the service mesh control plane could lead to compromise of service mesh integrity. This may be achieved through manipulation of the service metadata directory or hijacking of the service registration process. Manipulation of metadata could allow redirection of service requests to wrong or malicious services, resulting in disabled or disrupted applications.
TE-35: Lack of security insights, monitoring or manipulation of audit log integrity | Lack of visibility and the complexity of connections may allow security breaches or attacks to be overlooked. This makes it harder to detect and investigate malicious activity or rogue processes across a large number of service interactions. It further restricts the ability to conduct forensics on traffic and application workflows/interactions if a breach is suspected.

Target State Solution

Summary 

This section evaluates the following design requirements to derive the target state solution and its design principles.

Design Requirements  

The target state solution is required to meet the following requirements, as referenced under Dependencies, Assumptions and Constraints.

Requirement | Implication to Design Principles
1. The scope and level of protection are specific and appropriate to the asset at risk. | Maintain segregated policies for the service mesh between containers and external clients.
2. Security mechanisms must be pervasive, simple, scalable, and easy to manage. | The security pattern maintains clear security principles to be applied for the service mesh.
3. Assume context at your peril. | Controls defined in this security pattern are used to identify and measure problems, limitations or issues.
4. Devices and applications must communicate using open, secure protocols. | Open and encrypted communication channels such as HTTPS are applied for ingress and egress interfaces to the service mesh.
5. All devices must be capable of maintaining their security policy on an un-trusted network. | The service mesh is protected against both external and internal threats.
6. All people, processes, and technology must have declared and transparent levels of trust for any transaction to take place. | Validate containers before they join the service mesh.
7. Mutual trust assurance levels must be determinable. | Establish mutual trust between the service mesh and underlying container orchestration services.
8. Authentication, authorization, and accountability must interoperate/exchange outside of your locus/area of control. | Apply authentication and authorisation for internal, partner and public clients.
9. Access to data is controlled by security attributes of the data itself. | Maintain attributes for services within service mesh metadata.
10. Data privacy (and security of any asset of sufficiently high value) requires a segregation of duties/privileges. | Apply restrictions on access to sensitive data within service mesh metadata.
11. By default, data must be appropriately secured when stored, in transit, and in use. | Ensure protection of data flows within the service mesh.

Solution Overview

The overview below summarises the service mesh architecture for both control and data planes. The control plane enforces restrictions to prevent registration of unauthorised nodes into the service mesh.

The data plane is responsible for intra-cluster communication as well as inbound (ingress) and outbound (egress) cluster network traffic.

Whether traffic is entering the mesh (ingress) or leaving the mesh (egress), application service traffic is directed first to the service proxy for handling. Side-Car Proxies primarily handle East-West traffic, whereas Ingress and Egress Gateways manage North-South traffic.

Additional Notes

Enforcement of Segregation

There are multiple options for defining and enforcing segregation within the data plane of a service mesh. These options can be summarised into the following categories.

Level | Description | Example
Network Level Segmentation | Segmentation using IP whitelisting and network firewalls to define service mesh boundaries. | Layer 4 network firewalls restricting access to specific endpoints exposed on Ingress Gateways.
Session Level Segmentation | Segmentation using TLS mutual authentication to define application or host boundaries. | TLS Mutual Authentication between Side-Car Proxies and/or Ingress Gateways.
Namespace Level Segmentation | Segmentation using namespace policies within the service mesh to define application boundaries. | Restrictions applied within the Side-Car Proxy for specific requests within the allocated namespace.
API Level Segmentation | Segmentation using claims or attributes within access tokens to define API method or scope boundaries. | Authorisation restrictions applied on API Gateways based on claims specified in JSON Web Tokens.
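
The four levels above can be read as successive checks applied to a single request. The following is a minimal Python sketch of that layering, not a representation of how any particular mesh product evaluates policy. The `Request` fields, the allowed network range, the namespace pairs and the scope names are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network

# Hypothetical policy data, for illustration only.
ALLOWED_NETWORKS = [ip_network("10.0.0.0/8")]           # network level
ALLOWED_NAMESPACE_CALLS = {("payments", "ledger")}      # namespace level
REQUIRED_SCOPE = {"GET /accounts": "accounts:read"}     # API level


@dataclass
class Request:
    source_ip: str
    peer_cert_verified: bool       # outcome of the mutual TLS handshake
    source_namespace: str
    target_namespace: str
    operation: str
    token_scopes: set = field(default_factory=set)


def evaluate(req: Request) -> str:
    """Return the first segmentation level that rejects the request, or 'allow'."""
    if not any(ip_address(req.source_ip) in net for net in ALLOWED_NETWORKS):
        return "deny: network"
    if not req.peer_cert_verified:
        return "deny: session"
    if (req.source_namespace, req.target_namespace) not in ALLOWED_NAMESPACE_CALLS:
        return "deny: namespace"
    required = REQUIRED_SCOPE.get(req.operation)
    # Default deny: an operation with no declared scope is rejected.
    if required is None or required not in req.token_scopes:
        return "deny: api"
    return "allow"
```

In practice each check would sit at a different enforcement point (firewall, proxy, gateway) rather than in one function; collapsing them here only makes the ordering and the default-deny behaviour visible.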

To prevent duplication of enforcement points, the following assets focus on a layered approach for segmentation.

The diagram below summarises inbound and outbound flows for the service mesh (described as North-South flows).

The diagram below summarises internal flows within the service mesh (described as East-West flows).

Additional Notes

Isolating Compromised Services

Malicious activity, attempts to circumvent segmentation controls, or anomalies in traffic behaviour detected on Ingress/Egress Gateways or Side-Car Proxies trigger ‘dead letter’ routing of that traffic.

Service mesh architecture allows for routing or mirroring of traffic from the control plane.

Where a security incident or breach is suspected, trigger the routing of traffic to a separate security namespace that operates either as a decoy (honeypot) or for forensic capture and analysis. Once a security incident is identified, traffic can continue to be routed to forensic tooling, or a ‘dead letter’ route can be triggered to isolate the incident within the service mesh.
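
The routing decision described above can be sketched as a small function. This is an illustrative Python sketch only: the incident states, the `security-forensics` and `dead-letter` destination names and the `keep_for_forensics` flag are hypothetical, and a real mesh would express the same decision as routing or mirroring rules pushed from the control plane.

```python
from enum import Enum


class IncidentState(Enum):
    NONE = "none"
    SUSPECTED = "suspected"
    CONFIRMED = "confirmed"


# Hypothetical destination names, for illustration only.
SECURITY_NAMESPACE = "security-forensics"   # decoy or forensic capture
DEAD_LETTER = "dead-letter"                 # traffic is dropped and isolated


def select_route(service: str, state: IncidentState, keep_for_forensics: bool = True) -> str:
    """Choose where traffic addressed to `service` should be sent."""
    if state is IncidentState.NONE:
        return service
    if state is IncidentState.SUSPECTED:
        return SECURITY_NAMESPACE
    # Confirmed incident: keep capturing for forensics, or isolate outright.
    return SECURITY_NAMESPACE if keep_for_forensics else DEAD_LETTER
```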

Design Principles  

The following design principles are applied from this pattern, based on the requirements.

  1. Maintain a defence-in-depth approach to applying segmentation controls within the Service Mesh.
  2. Aggregate traffic entering the service mesh within the API Gateway to apply security controls and traffic inspection.
  3. Validate the authenticity of any node joining the Service Mesh.
  4. Disable self-registration of nodes into the Service Mesh.
  5. Isolate services suspected of compromise and/or de-register them from the service mesh.
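
Principles 3 to 5 can be illustrated with a toy registry that refuses self-registration and unverified identities. This is a hedged Python sketch: `ServiceRegistry`, `RegistrationError` and the `identity_verified` flag are hypothetical names, and the identity check itself (for example certificate validation) is assumed to happen elsewhere.

```python
class RegistrationError(Exception):
    """Raised when a registration request is rejected."""


class ServiceRegistry:
    def __init__(self, authorised_registrars):
        self._registrars = set(authorised_registrars)
        self._services = {}

    def register(self, service: str, requested_by: str, identity_verified: bool) -> None:
        # Principle 4: a service may not add itself to the mesh.
        if requested_by == service:
            raise RegistrationError("self-registration is disabled")
        if requested_by not in self._registrars:
            raise RegistrationError("registrar is not authorised")
        # Principle 3: the joining node must have a validated identity.
        if not identity_verified:
            raise RegistrationError("service identity could not be verified")
        self._services[service] = requested_by

    def deregister(self, service: str) -> None:
        # Principle 5: remove a service suspected of compromise.
        self._services.pop(service, None)

    def is_registered(self, service: str) -> bool:
        return service in self._services
```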

Actors

The following actors are involved in this pattern.

Actor Type | Actor Description
External clients | Clients outside of the service mesh consuming services presented via the Ingress Gateway.
External providers | Service providers outside of the service mesh that are accessed via the Egress Gateway.

Locations

This pattern applies to the following locations of the assets being utilised.

Location | Location Description
Service Mesh | In the context of this pattern, this represents any microservices hosted within the service mesh. These are classified as residing in the Internal network domain.
External Services | In the context of this pattern, this represents any clients or service providers outside the service mesh. These services may reside in either external (Public or Partner) or Internal network domains.

Sequencing

The pattern is designed around the following stages.

Stage gate | Description
Service Registration | Registration or de-registration of microservices within the service mesh.
Service Run Time | Interactions between microservices during deployment and execution within the service mesh.

Mapping Threats to Controls

The following provides a mapping of security threats to affected assets and the security control objectives required to mitigate them (further detailed in subsequent security pattern logical designs).  

Threat Event | Affected Assets | Security Control Objectives
TE-13: Inadequate design and planning leading to improper deployment | Side-Car Proxy; Ingress & Egress API Gateways | AC-03: Access Enforcement; AC-04: Information Flow Enforcement; AC-14: Permitted Actions Without Identification or Authentication; AC-16: Security and Privacy Attributes; AC-24: Access Control Decisions; IA-09: Service Identification and Authentication; SC-08: Transmission Confidentiality and Integrity; SC-16: Transmission of Security and Privacy Attributes; SI-10: Information Input Validation; SI-15: Information Output Filtering
TE-14: Inadequate workflows or processes leading to improper deployment | Side-Car Proxy | CM-02: Baseline Configuration; SC-02: Separation of System and User Functionality; SC-08: Transmission Confidentiality and Integrity; SC-11: Trusted Path; SC-13: Cryptographic Protection; SC-23: Session Authenticity
TE-25: Generation of false identities | Service Mesh Management | AC-03: Access Enforcement; AC-06: Least Privilege; AC-24: Access Control Decisions; IA-09: Service Identification and Authentication; SC-13: Cryptographic Protection; SC-17: Public Key Infrastructure Certificates; SC-22: Architecture and Provisioning for Name/address Resolution Service; SR-11: Component Authenticity
TE-26: Abuse of resources through misconfiguration | Service Mesh Management; Service Mesh Metadata Directory | CM-02: Baseline Configuration; CM-06: Configuration Settings; RA-05: Vulnerability Monitoring and Scanning
TE-35: Lack of security insights, monitoring or manipulation of audit log integrity | Service Mesh Telemetry | AU-02: Event Logging; AU-03: Content of Audit Records; AU-06: Audit Record Review, Analysis, and Reporting; AU-12: Audit Record Generation; AU-14: Session Audit; IR-05: Incident Monitoring

Security Pattern

Pattern View:

 Control list:  Service Mesh Management

Control Objective | Control Description
AC-03: Access Enforcement | Restrict administrative access to service mesh interfaces and APIs through IAM policies. Block any self-registration of nodes or services into the service mesh.
AC-06: Least Privilege | All access control lists and policies are enabled with default deny.
AC-14: Permitted Actions Without Identification or Authentication | Disable any unauthenticated or anonymous access to the service mesh control plane.
AC-24: Access Control Decisions | Define role-based access policies across side-car proxies at both system level and namespace level.
CM-02: Baseline Configuration | Ensure the baseline security configuration for service mesh services is hardened to industry or vendor best practice, including system permissions for configuration files and services. Ensure regular patching cycles are applied.
CP-09: System Backup | Regular scheduled backups are applied across control plane infrastructure.
CP-10: System Recovery and Reconstitution | Ensure reconstitution of the service mesh control plane and restoration of service clusters back to operational states.
IA-09: Service Identification and Authentication | Validate the identity of all services attempting registration within the Service Mesh.
IR-05: Incident Monitoring | Establish service routing or traffic mirroring of malicious traffic or packets to security services such as decoy honeypots or forensic tooling.
RA-05: Vulnerability Monitoring and Scanning | Scan for and remove unnecessary, unused or insecure communication flows across the service mesh.
SC-13: Cryptographic Protection | Issue certificates with a short lifespan (90 days) to promote good security hygiene.
SC-17: Public Key Infrastructure Certificates | Utilise a chained CA from a trusted Root CA. Remove any use of self-signed certificates from within the service mesh.
SC-22: Architecture and Provisioning for Name/address Resolution Service | Ensure service-to-service communications exist within a dedicated domain namespace within DNS, restricted from external systems or endpoints that may attempt DNS hijack attacks to intercept or re-route communications.
SC-32: System Partitioning | Ensure separate system partitioning of control plane services and directory from the service mesh data plane. Isolate within separate security groups or segments.
SR-11: Component Authenticity | Ensure all sub-components within the service mesh and dependent components within orchestration services are authenticated.
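
The certificate controls above (SC-13 and SC-17) lend themselves to an automated check. The following Python sketch works on already-parsed certificate metadata rather than on real X.509 files; the `CertInfo` fields and the `certificate_findings` helper are hypothetical, and comparing subject with issuer is only a simple heuristic for spotting self-signed certificates.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

MAX_LIFETIME = timedelta(days=90)  # lifespan named under SC-13


@dataclass
class CertInfo:
    subject: str
    issuer: str
    not_before: datetime
    not_after: datetime


def certificate_findings(cert: CertInfo, now: datetime) -> list:
    """Return a list of policy findings; an empty list means no issue was found."""
    findings = []
    if cert.not_after - cert.not_before > MAX_LIFETIME:
        findings.append("lifetime exceeds 90 days")
    # Heuristic only: a certificate whose issuer equals its subject.
    if cert.subject == cert.issuer:
        findings.append("self-signed")
    if not (cert.not_before <= now <= cert.not_after):
        findings.append("outside validity period")
    return findings
```

A check of this kind does not replace chain and revocation validation at the proxy; it only flags certificates that fall outside the stated policy.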

Control list:  Service Mesh Metadata Directory

Control Objective | Control Description
AC-14: Permitted Actions Without Identification or Authentication | Disable any unauthenticated or anonymous access to the metadata directory.
CM-02: Baseline Configuration | Ensure the baseline security configuration for the metadata directory is hardened to industry or vendor best practice, including system permissions for configuration files and services.
CP-09: System Backup | Regular scheduled backups are applied to the metadata directory. Ensure mechanisms are employed to protect the integrity of system backups.
CP-10: System Recovery and Reconstitution | Ensure reconstitution of the service mesh control plane and restoration of service clusters back to operational states.
RA-05: Vulnerability Monitoring and Scanning | Regularly monitor and scan for vulnerabilities in the metadata services and related sub-components.
SC-13: Cryptographic Protection | Ensure the metadata directory is encrypted at rest.
SC-32: System Partitioning | Ensure separate system partitioning of metadata services and directory from the service mesh data plane. Isolate within separate security groups or segments.

Control list:  Service Mesh Telemetry

Control Objective | Control Description
AU-02: Event Logging | Capture audit data from service mesh components for both control plane and data plane activities.
AU-03: Content of Audit Records | Data from service mesh telemetry and performance metrics is forwarded to or integrated with security event management.
AU-06: Audit Record Review, Analysis, and Reporting | Event audit logs from the service mesh are correlated against dependent services such as Enterprise DNS or Certificate Management services within security event management.
AU-12: Audit Record Generation | Capture audit logs from Ingress & Egress Gateways, Side-Car Proxies and the Service Mesh Control Plane.
AU-14: Session Audit | Capture and monitor traffic sessions across the data plane, for both service-to-service (East-West) flows and inbound and outbound (North-South) flows.
IR-05: Incident Monitoring | Enable behavioural monitoring and metrics across network and communication traffic within the service mesh.
RA-10: Threat Hunting | Forward telemetry data to security event management to identify abnormalities within network and communication traffic.
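
As one simple illustration of the behavioural monitoring described under IR-05 and RA-10, the Python sketch below flags source services that accumulate denied requests. The event shape (`source` and `decision` keys) and the threshold are hypothetical; real telemetry would come from proxy access logs and be analysed within security event management.

```python
from collections import Counter


def flag_suspect_sources(events, threshold=3):
    """Return source services with at least `threshold` denied requests."""
    denied = Counter(e["source"] for e in events if e["decision"] == "deny")
    return sorted(source for source, count in denied.items() if count >= threshold)
```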

Control list:  Side-Car Proxy

Control Objective | Control Description
AC-03: Access Enforcement | Operate as the enforcement point for all service-to-service (East-West) inbound and outbound traffic flows within the Service Mesh.
AC-04: Information Flow Enforcement | Restrict ingress or egress traffic from the service mesh (North-South) to traverse via Ingress and Egress Gateways.
AC-14: Permitted Actions Without Identification or Authentication | Disable any unauthenticated or anonymous access for service-to-service flows within the Service Mesh.
AC-16: Security and Privacy Attributes | Pass the context of originating systems via tokens appended within the HTTP header. JSON Web Tokens are used (as opposed to capturing context directly within HTTP parameters) to maintain the context of the originating source systems and/or microservices requesting access to specific resources.
AC-24: Access Control Decisions | Enforce access control based on TLS Mutual Authentication and access policies for service namespaces within the service mesh.
CM-02: Baseline Configuration | Ensure the baseline security configuration for service mesh services is hardened to industry or vendor best practice, including system permissions for configuration files and services. Ensure regular patching cycles are applied.
IA-09: Service Identification and Authentication | System identities are encoded in TLS certificates, but service names are retrieved via the discovery service.
SC-02: Separation of System and User Functionality | Restrict local administrative rights to directly access microservices via the underlying Container Platform.
SC-08: Transmission Confidentiality and Integrity | Enforce TLS Mutual Authentication across all service mesh traffic.
SC-11: Trusted Path | Validate the integrity of underlying host networking stacks to prevent direct manipulation or disruption. Ensure system access restrictions are applied to localhost or loopback network interfaces used for side-car proxies.
SC-13: Cryptographic Protection | Validate the installed certificate chain, certificate expiration and certificate revocation status, across both TLS server and client certificates.
SC-16: Transmission of Security and Privacy Attributes | Validate HTTP headers to ensure they carry appropriate tokens that pass the context of the originating source user/application within requests.
SC-23: Session Authenticity | Microservices establish mutual trust with the Side-Car Proxy to reduce potential hijacking or bypassing of controls.
SI-10: Information Input Validation | Validate that inbound requests carry appropriate HTTP headers that pass the context of the originating source user/application within requests to the microservice.
SI-15: Information Output Filtering | Filter outbound requests to ensure they carry appropriate HTTP headers that pass the context of the originating source user/application within requests from the microservice.

 

Control list:  Ingress & Egress API Gateway

Control Objective | Control Description
AC-03: Access Enforcement | Access to APIs is authenticated and authorised using industry standards (OpenID Connect / OAuth 2.0).
AC-04: Information Flow Enforcement | Restrict inbound traffic into the service mesh to traverse the Ingress API Gateway. Restrict outbound traffic from the service mesh to traverse the Egress API Gateway.
AC-14: Permitted Actions Without Identification or Authentication | Anonymous public APIs do not require authentication or authorisation. These APIs are exposed separately from authenticated endpoints.
AC-24: Access Control Decisions | Enforce network level access controls for endpoints exposed on API Gateways. Enforce API level access controls within API Gateways.
IA-09: Service Identification and Authentication | Enforce mutual authentication between API Gateways and Side-Car Proxies.
SC-08: Transmission Confidentiality and Integrity | Enforce usage of secure protocols for data transmission.
SC-16: Transmission of Security and Privacy Attributes | Ensure inbound requests forward appropriate HTTP headers that pass the context of the originating source user/application within requests.
SI-10: Information Input Validation | Enforce web application firewall rules to inspect API messages (e.g. CWE/SANS Top 25 or OWASP Top 10). Ensure API messages that contain binary data or are used for file transfer are scanned for malicious payloads or malware.
SI-15: Information Output Filtering | Outbound traffic to external public locations is forwarded to a web content filtering solution.

Appendix A – References

Please see below links to external sites for further reading

Appendix B - Disclosure Notice

This document is published as independent research only and is without warranty. It does not represent any publication from the National Institute of Standards and Technology (NIST) or other associated US government entities.