OPSAWG Working Group                                              L. Han
Internet-Draft                                                   M. Wang
Intended status: Informational                              China Mobile
Expires: 28 January 2024                                         X. Wang
                                                                 T. Zhou
                                                                  Huawei
                                                            27 July 2023


                     Inband Flow Learning Framework
                    draft-hwy-opsawg-ifl-framework-04

Abstract

   On-path telemetry techniques can provide high-precision inband flow
   insight and real-time network performance monitoring by embedding
   instructions or metadata into user packets.  They are beneficial,
   but they still have deployability and flexibility problems in
   large-scale deployment scenarios.  This document proposes a
   reference framework called Inband Flow Learning (IFL), which
   outlines the architecture and functional modules for automatic
   deployment and adjustment of flow-oriented monitoring using on-path
   telemetry techniques, as a reference solution to these problems.
   This document also describes different deployment approaches and
   considerations for practical network deployment.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

Han, et al.
Expires 28 January 2024 [Page 1]
Internet-Draft Inband Flow Learning Framework July 2023

   This Internet-Draft will expire on 28 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology and Conventions
     2.1.  Requirement Language
     2.2.  Terminology
   3.  Framework of Inband Flow Learning
     3.1.  Service Discovery
     3.2.  Inband Flow Information Telemetry Deployment
       3.2.1.  Telemetry Mode
       3.2.2.  Telemetry Policy
       3.2.3.  Telemetry Instance
   4.  Inband Flow Information Telemetry Adjustment
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Authors' Addresses

1.
Introduction

   On-path telemetry techniques described in
   [I-D.song-opsawg-ifit-framework], such as IOAM [RFC9197] and
   Alternate-Marking [RFC9341], can provide high-precision inband flow
   insight and real-time network performance monitoring (e.g., jitter,
   latency, packet loss) by embedding instructions or metadata into
   user packets.  They are beneficial for network operation because
   they monitor live traffic in the network, based on inband flow
   information telemetry along the entire forwarding path.

   However, when flow-oriented monitoring using on-path telemetry
   techniques is deployed on live traffic, flow characteristics or
   paths may change, which makes the traditional static configuration
   mode no longer applicable.  [I-D.hwyh-ippm-ps-inband-flow-learning]
   states the problems of flow identification when applying on-path
   telemetry techniques in real network scenarios, and describes the
   requirements for an inband flow learning mechanism that is intended
   to address the problems of deployability and flexibility.

   This document proposes a reference framework called Inband Flow
   Learning (IFL), which outlines the architecture and functional
   modules for automatic deployment and adjustment of flow-oriented
   monitoring using on-path telemetry techniques.  This document also
   describes different deployment approaches and considerations for
   practical network deployment.

   Note that this document focuses on the generation of inband flow
   telemetry objects; inband flow performance measurement methods are
   out of the scope of this document.

2.  Terminology and Conventions

2.1.  Requirement Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

2.2.
Terminology

   IFL:  Inband Flow Learning

   IFITI:  Inband Flow Information Telemetry Instance

3.  Framework of Inband Flow Learning

   The domain of inband flow information telemetry consists of ingress
   nodes, transit nodes, and egress nodes.  The ingress nodes are
   responsible for enabling monitoring functions, and the egress nodes
   are responsible for terminating them.  All the nodes in the domain
   may participate in inband flow learning by executing the
   corresponding functions in the framework of Inband Flow Learning
   (IFL).

   The framework of IFL includes three components: Service Discovery,
   Inband Flow Information Telemetry Deployment, and Inband Flow
   Information Telemetry Adjustment, as shown in Figure 1.  Across
   these components, inband flow learning is embodied in automatic
   service discovery, automatic flow telemetry deployment, and
   automatic flow telemetry adjustment.

   +---------+-------------------+------------------+------------------+
   |Component|      Service      |   Inband Flow    |   Inband Flow    |
   |         |     Discovery     |   Information    |   Information    |
   |         |                   |    Telemetry     |    Telemetry     |
   |         |                   |    Deployment    |    Adjustment    |
   +---------+-------------------+------------------+------------------+
   |Functions|  Sampling policy  | Telemetry policy |                  |
   |         |-------------------+------------------+      Aging       |
   |         |Flow characteristic|Telemetry instance|                  |
   |         |    acquisition    |                  |                  |
   +---------+-------------------+------------------+------------------+

               Figure 1  Framework of Inband Flow Learning

3.1.  Service Discovery

   Before telemetry is started on service flows, the service should be
   discovered in order to determine which flows should be monitored.
   The goal of the service discovery function is to obtain the flow
   characteristics, which are represented in terms of IP source
   address, IP destination address, TCP/UDP port number, VRF,
   incoming/outgoing interface, etc.
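
   As a non-normative illustration, the Python sketch below shows one
   way flow characteristics could be represented at 5-tuple
   granularity, and how filtering observed packets on a single
   characteristic (here a destination port) yields the distinct flows
   of a service.  The names FiveTuple and discover_flows, the field
   layout, and the sample addresses are hypothetical and are not
   defined by this document.

```python
from typing import Iterable, NamedTuple, Set


class FiveTuple(NamedTuple):
    """Flow characteristics at 5-tuple granularity (hypothetical layout)."""
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    src_port: int
    dst_port: int


def discover_flows(packets: Iterable[FiveTuple], dst_port: int) -> Set[FiveTuple]:
    """Return the distinct 5-tuples among packets sent to dst_port."""
    return {pkt for pkt in packets if pkt.dst_port == dst_port}


# Each entry stands for the header fields of one observed packet.
packets = [
    FiveTuple("192.0.2.1", "198.51.100.10", "tcp", 40001, 443),
    FiveTuple("192.0.2.1", "198.51.100.10", "tcp", 40001, 443),  # same flow
    FiveTuple("192.0.2.2", "198.51.100.10", "tcp", 40002, 443),
    FiveTuple("192.0.2.3", "198.51.100.20", "udp", 5000, 53),    # other port
]
flows = discover_flows(packets, dst_port=443)
```

   Using a set keeps one entry per flow regardless of how many packets
   of that flow are observed; a real forwarding plane would more likely
   use a hardware flow table with similar semantics.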

   Automatic service discovery is implemented based on the sampling
   policy delivered by the control plane and flow characteristic
   acquisition on the forwarding plane, which is usually performed on
   the ingress node.  A sampling policy is a set of rules that instruct
   the forwarding plane to identify service flow characteristics within
   a specific scope.  Flow characteristic acquisition is the process in
   which the forwarding plane identifies, extracts, and reports service
   flow characteristics from live traffic based on the sampling policy.

   For example, if the service traffic to be monitored uses a
   particular port number, then to automatically discover all flows of
   the service at 5-tuple granularity, a sampling policy can be
   configured to match live traffic with that port number and generate
   flow information at the 5-tuple granularity.  When live traffic
   passes through the ingress node, the forwarding plane filters the
   traffic based on the specified sampling policy, identifies all flows
   with that port number, and reports the flows with their 5-tuple
   information.

   The automatically discovered service flow information can be stored
   in a distributed manner on the ingress nodes or reported to the
   network controller for centralized management.

3.2.  Inband Flow Information Telemetry Deployment

   After the flow characteristics are acquired by service discovery,
   telemetry based on the inband flow information can be deployed
   automatically.  Automatic flow telemetry deployment is implemented
   by creating telemetry instances based on the telemetry policy, and
   it is executed on different types of network nodes in the domain
   according to the telemetry mode.

3.2.1.  Telemetry Mode

   There are two modes for deploying inband flow information
   telemetry: End-to-End (E2E) and Hop-by-Hop (HbH).

   For the majority of services, E2E telemetry of service flows can
   meet the requirements of network operators by providing overall
   performance insight into the service.  In E2E mode, shown in
   Figure 2, the ingress node discovers the characteristics of service
   flows and performs on-path telemetry on the flows to be monitored.
   The egress node needs to deploy the same monitored flows and
   complete the telemetry.  If the telemetry data is not carried in the
   data packet but is reported at each node, a flow identifier is
   required to correlate the data at the data consumer.  [RFC9326],
   [RFC9343], and [I-D.ietf-mpls-inband-pm-encapsulation] provide
   encapsulation formats for the flow identifier.

                      +-------------+
                      |Data Consumer|  compute E2E flow info
                      +-------------+
                          |     |
            ___flow info__|     |____flow info____
           |   telemetry          telemetry       |
           |                                      |
      +---------+   +---------+    +---------+   +---------+
      | Ingress |---| Transit | ...| Transit |---| Egress  |
      |  Node   |   |  Node   |    |  Node   |   |  Node   |
      +---------+   +---------+    +---------+   +---------+

                  Figure 2  End-to-End Telemetry Mode

   The difference between HbH mode and E2E mode is that transit nodes
   also participate in inband flow information learning and telemetry.
   In HbH mode, shown in Figure 3, telemetry covers the flow
   information on every node of the forwarding path that the flow
   packet traverses, which provides detailed flow information for each
   hop.  Hop-by-Hop telemetry is usually used for on-demand fault
   diagnosis.

                      +-------------+
                      |Data Consumer|  compute HbH flow info
                      +-------------+
                        | |     | |      flow info telemetry
          ______________| |     | |_________________
         |             ___|     |___                |
         |            |             |               |
      +---------+   +---------+    +---------+   +---------+
      | Ingress |---| Transit | ...| Transit |---| Egress  |
      |  Node   |   |  Node   |    |  Node   |   |  Node   |
      +---------+   +---------+    +---------+   +---------+

                  Figure 3  Hop-by-Hop Telemetry Mode

3.2.2.
Telemetry Policy

   Telemetry policy is used to determine which flows should be
   monitored.  A telemetry policy can raise the priority of learning
   and telemetry for critical flows and reduce or filter out the
   learning and telemetry of unimportant flows.  It is crucial to
   network deployment for two reasons: the number of flows can be huge,
   and the processing capability of both the controller and the network
   nodes is limited.  There might be millions of flows in a large-scale
   network, for example, a 5G mobile backhaul network.  It is important
   to choose the granularity of inband flow information telemetry
   wisely.

   For IP traffic, the telemetry policy can be based on one or a
   combination of flow characteristics, such as IP source/destination
   address, TCP/UDP port number, VRF, or network device interface.  An
   IP address with a flexible wildcard mask can also be used to apply a
   telemetry policy to an aggregation of flows.

3.2.3.  Telemetry Instance

   An Inband Flow Information Telemetry Instance (IFITI), called a
   telemetry instance for short, is the management object of a
   monitored flow for the deployment of flow-oriented on-path telemetry
   techniques under the IFL framework.  During its life cycle, an IFITI
   is responsible for providing performance telemetry data on the nodes
   traversed by the flow it monitors.

   On ingress nodes, IFITIs can be generated automatically, in either a
   distributed or a centralized way, by applying telemetry policies to
   automatically discovered service flows.  Transit nodes and egress
   nodes can also generate IFITIs automatically, without configured
   flow characteristics, by learning special information about the
   monitored flows that is embedded by the ingress nodes.
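
   As a non-normative illustration of the ingress-node case, the Python
   sketch below applies a telemetry policy to a list of discovered
   flows and creates one telemetry instance per selected flow.  Several
   details are assumptions made only for this example: flows are
   reduced to their destination address, a wildcard-mask bit set to 1
   means "ignore this bit", the first matching rule wins, unmatched
   flows are not monitored, and flow identifiers are allocated
   sequentially.  The names PolicyRule, TelemetryInstance, and
   generate_instances are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List


def ip_to_int(addr: str) -> int:
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))


@dataclass(frozen=True)
class PolicyRule:
    base: str       # destination address pattern
    wildcard: str   # assumed convention: bits set to 1 are ignored
    monitor: bool   # True: create an instance; False: filter the flow

    def matches(self, dst_ip: str) -> bool:
        care = ~ip_to_int(self.wildcard) & 0xFFFFFFFF
        return (ip_to_int(dst_ip) & care) == (ip_to_int(self.base) & care)


@dataclass(frozen=True)
class TelemetryInstance:
    flow_id: int    # hypothetical sequentially allocated identifier
    dst_ip: str


def generate_instances(dst_ips: Iterable[str],
                       rules: List[PolicyRule]) -> List[TelemetryInstance]:
    """Apply the first matching rule to each discovered flow."""
    instances: List[TelemetryInstance] = []
    next_flow_id = 1
    for dst_ip in dst_ips:
        rule = next((r for r in rules if r.matches(dst_ip)), None)
        if rule is not None and rule.monitor:
            instances.append(TelemetryInstance(next_flow_id, dst_ip))
            next_flow_id += 1
    return instances


rules = [
    # Monitor host .5 in every 10.1.x.0/24 (third octet is ignored).
    PolicyRule("10.1.0.5", "0.0.255.0", monitor=True),
    # Explicitly filter everything else in 10.0.0.0/8.
    PolicyRule("10.0.0.0", "0.255.255.255", monitor=False),
]
discovered = ["10.1.7.5", "10.1.7.6", "10.1.200.5", "192.0.2.9"]
instances = generate_instances(discovered, rules)
```

   With these inputs, instances are created only for 10.1.7.5 and
   10.1.200.5.  The first rule uses a non-contiguous mask, which is the
   kind of aggregation a prefix length alone cannot express.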

   A flow identifier is one such piece of special information.  It may
   be a value that is unique within a domain and is encapsulated in the
   service packets to establish the relationship among the
   characteristic information, the telemetry instance, and the service
   flow.  It can not only correlate the telemetry data of a flow across
   nodes, as mentioned in Section 3.2.1, but also serve as the key
   marker by which the forwarding plane identifies the monitored flow.
   For the forwarding plane, it is much easier to identify a piece of
   data in a service packet than to identify various types of flow
   characteristics.

   The following uses the flow identifier as an example to describe the
   flow learning process on transit and egress nodes.  Once the
   telemetry instance is created, the ingress node can start the
   telemetry of flow information using the on-path telemetry technique
   in use.  At the same time, the ingress node encodes inband
   monitoring information, including the flow identifier, in the
   service packets.  When a service flow packet passes through a
   transit node or egress node and the node detects that the packet
   contains a flow identifier, the node treats the packet as belonging
   to a monitored service flow and automatically creates a telemetry
   instance using the identifier as the key.

   The automatic creation of telemetry instances on network nodes
   greatly facilitates dynamic and incremental deployment.  On all
   types of nodes, network operators do not need to statically
   configure the characteristics of monitored flows, which saves
   considerable effort and reduces the probability of errors in
   large-scale deployment scenarios.  When the path of a monitored flow
   changes, the flow can be automatically detected on the nodes of the
   new path, and the corresponding telemetry instances can be
   automatically deployed.

4.  Inband Flow Information Telemetry Adjustment

   When route convergence occurs in the network, service flows may
   switch to other forwarding nodes.  When the traffic changes, the
   telemetry instances change as well.
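
   The interplay between automatic instance creation on transit and
   egress nodes (Section 3.2.3) and the timer-based removal of stale
   instances can be sketched as follows.  This Python fragment is a
   non-normative illustration: the InstanceTable class, its method
   names, the 30-second timeout, and the use of explicit timestamps are
   all assumptions made for the example, not behavior specified by this
   document.

```python
from typing import Dict, List, Optional


class InstanceTable:
    """Per-node table of telemetry instances keyed by flow identifier."""

    def __init__(self, aging_timeout: float) -> None:
        self.aging_timeout = aging_timeout
        self._last_seen: Dict[int, float] = {}

    def on_packet(self, flow_id: Optional[int], now: float) -> bool:
        """Learn from one packet; return True if an instance was created."""
        if flow_id is None:
            return False  # no flow identifier: not a monitored packet
        created = flow_id not in self._last_seen
        self._last_seen[flow_id] = now  # create or refresh the instance
        return created

    def age_out(self, now: float) -> List[int]:
        """Delete and return instances idle for at least aging_timeout."""
        stale = [fid for fid, seen in self._last_seen.items()
                 if now - seen >= self.aging_timeout]
        for fid in stale:
            del self._last_seen[fid]
        return stale


old_node = InstanceTable(aging_timeout=30.0)
new_node = InstanceTable(aging_timeout=30.0)

created_old = old_node.on_packet(7, now=0.0)     # flow 7 on original path
# Route convergence moves flow 7 to a path through new_node.
created_new = new_node.on_packet(7, now=10.0)    # learned automatically
refreshed = new_node.on_packet(7, now=20.0)      # existing instance
stale_old = old_node.age_out(now=40.0)           # flow 7 idle for 40 s
stale_new = new_node.age_out(now=40.0)           # flow 7 idle for 20 s
```

   In this run the old node deletes its instance for flow 7 while the
   new node keeps its own, with no static configuration on either
   node.  Passing the current time explicitly keeps the sketch
   deterministic; an implementation would typically use a monotonic
   clock.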

   For a telemetry instance running on the failed path, aging of the
   IFITI should be supported in order to reclaim network resources.  An
   IFITI should be deleted once it becomes stale.  To keep monitoring
   the same flow, a new telemetry instance needs to be added on the new
   transit or egress node.

   Note that aging and adjustment of an IFITI can be initiated by the
   controller or by the network node.  When the timer used for flow
   information telemetry expires, the IFITI is deleted to stop the
   telemetry of the flow.

5.  IANA Considerations

   This document makes no request of IANA.

6.  Security Considerations

   TBD

7.  References

7.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

7.2.  Informative References

   [I-D.hwyh-ippm-ps-inband-flow-learning]
              Han, L., Wang, M., Wang, X., and J. Huang, "Problem
              Statement and Requirement for Inband Flow Learning", Work
              in Progress, Internet-Draft,
              draft-hwyh-ippm-ps-inband-flow-learning-03, 27 July 2023.

   [I-D.ietf-mpls-inband-pm-encapsulation]
              Cheng, W., Min, X., Zhou, T., Dai, J., and Y. Peleg,
              "Encapsulation For MPLS Performance Measurement with
              Alternate Marking Method", Work in Progress,
              Internet-Draft,
              draft-ietf-mpls-inband-pm-encapsulation-06, 14 June 2023.

   [I-D.song-opsawg-ifit-framework]
              Song, H., Qin, F., Chen, H., Jin, J., and J. Shin,
              "Framework for In-situ Flow Information Telemetry", Work
              in Progress, Internet-Draft,
              draft-song-opsawg-ifit-framework-20, 24 April 2023.

   [RFC9197]  Brockners, F., Ed., Bhandari, S., Ed., and T.
              Mizrahi, Ed., "Data Fields for In Situ Operations,
              Administration, and Maintenance (IOAM)", RFC 9197,
              DOI 10.17487/RFC9197, May 2022.

   [RFC9326]  Song, H., Gafni, B., Brockners, F., Bhandari, S., and T.
              Mizrahi, "In Situ Operations, Administration, and
              Maintenance (IOAM) Direct Exporting", RFC 9326,
              DOI 10.17487/RFC9326, November 2022.

   [RFC9341]  Fioccola, G., Ed., Cociglio, M., Mirsky, G., Mizrahi, T.,
              and T. Zhou, "Alternate-Marking Method", RFC 9341,
              DOI 10.17487/RFC9341, December 2022.

   [RFC9343]  Fioccola, G., Zhou, T., Cociglio, M., Qin, F., and R.
              Pang, "IPv6 Application of the Alternate-Marking Method",
              RFC 9343, DOI 10.17487/RFC9343, December 2022.

Authors' Addresses

   Liuyan Han
   China Mobile
   Beijing
   China
   Email: hanliuyan@chinamobile.com

   Minxue Wang
   China Mobile
   Beijing
   China
   Email: wangminxue@chinamobile.com

   Xuanxuan Wang
   Huawei
   Nanjing
   China
   Email: wxxuan@huawei.com

   Tianran Zhou
   Huawei
   Beijing
   China
   Email: zhoutianran@huawei.com