COINRG                                                   L. M. Contreras
Internet-Draft                                                Telefonica
Intended status: Informational                              M. Boucadair
Expires: 11 January 2024                                          Orange
                                                                D. Lopez
                                                              Telefonica
                                                         C. J. Bernardos
                                        Universidad Carlos III de Madrid
                                                            10 July 2023


  An Evolution of Cooperating Layered Architecture for SDN (CLAS) for
                       Compute and Data Awareness
                draft-contreras-coinrg-clas-evolution-01

Abstract

   This document proposes an extension to the Cooperating Layered
   Architecture for Software-Defined Networking (SDN) by including
   compute resources and telemetry data processing capabilities.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 11 January 2024.

Copyright Notice

   Copyright (c) 2023 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document. Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions and Definitions
   3.  Cooperating Layered Architecture for Software-Defined
       Networking (CLAS)
   4.  Augmentation of CLAS with Compute and Telemetry Data Awareness
     4.1.  Compute Stratum
     4.2.  Telemetry Plane
     4.3.  Extended CLAS Architecture
   5.  Discussion on Research Aspects of the Proposed Architecture
     5.1.  Discussion Related to the Compute Stratum
     5.2.  Discussion Related to the Telemetry Plane
   6.  Applicability scenarios
     6.1.  Cloud-edge Continuum
     6.2.  Network-application Integration
   7.  TODO for next versions of this document
   8.  Security Considerations
   9.  IANA Considerations
   10. References
     10.1.  Normative References
     10.2.  Informative References
   Acknowledgments
   Authors' Addresses

1.  Introduction

   Telecommunication networks are evolving towards a tight integration
   of interconnected compute environments, offering in particular
   capabilities to instantiate virtualized network functions that
   interwork with physical variants of other network functions;
   together, these functions are used to build and deliver services.
   Moreover, network operations are adopting automation (e.g.,
   [RFC8969]) and programmability (e.g., [RFC7149][RFC7426]) with the
   introduction of closed-loop mechanisms, intent declarations, and
   Artificial Intelligence (AI) and Machine Learning (ML) techniques to
   facilitate informed (proactive) decisions as well as predictive
   behaviors enabling consistent automation.

   It is then necessary to provide a network management framework that
   can incorporate these technical components, structuring the
   different concerns (i.e., connectivity, processing, and telemetry
   data generation and analysis) and the interaction among the
   components operating the network.

   Existing approaches (e.g., [RFC8969]) focus only on the networking
   aspects (i.e., connectivity) without sufficient consideration of
   either the compute domain or telemetry data analysis.

   This document describes an evolution of the Cooperating Layered
   Architecture for Software-Defined Networking (CLAS) [RFC8597] to
   include the aforementioned aspects into the architecture.

2.  Conventions and Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Cooperating Layered Architecture for Software-Defined Networking
    (CLAS)

   [RFC8597] describes an SDN architecture structured in two different
   strata, namely the Service Stratum and the Transport Stratum. On the
   one hand, the Service Stratum contains the functions related to the
   provision of services and the capabilities offered to external
   applications. On the other hand, the Transport Stratum comprises the
   functions focused on the transfer of data between the communication
   endpoints (e.g., between end-user devices, between two service
   gateways, etc.).
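
   As an illustrative aid only, the division of responsibilities
   between the two strata can be sketched in Python as follows. The
   class names mirror the strata described above, but the methods
   (connect, provide_service) and the representation of a path as a
   pair of endpoint names are hypothetical; they are assumptions made
   for this sketch and are not defined by [RFC8597].

```python
from dataclasses import dataclass, field


@dataclass
class TransportStratum:
    """Transfers data between communication endpoints (hypothetical API)."""
    paths: list = field(default_factory=list)

    def connect(self, src: str, dst: str) -> tuple:
        # Record a data-transfer path between two endpoints.
        path = (src, dst)
        self.paths.append(path)
        return path


@dataclass
class ServiceStratum:
    """Provides services to external applications (hypothetical API)."""
    transport: TransportStratum

    def provide_service(self, name: str, endpoints: list) -> dict:
        # The service only states which endpoints take part; the transfer
        # of data between consecutive endpoints is left to the Transport
        # Stratum.
        paths = [self.transport.connect(a, b)
                 for a, b in zip(endpoints, endpoints[1:])]
        return {"service": name, "paths": paths}


transport = TransportStratum()
service = ServiceStratum(transport)
result = service.provide_service("video", ["device", "gw1", "gw2"])
```

   In this sketch, a service spanning an end-user device and two
   service gateways is resolved into one transfer path per pair of
   consecutive endpoints, all of them handled by the Transport Stratum.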

   Each of the strata is structured in different planes, as follows:

   *  The Control plane, which centralizes the control functions of
      each stratum and directly controls the corresponding resources.

   *  The Management plane, which logically centralizes the management
      functions for each stratum, including the management of the
      control and resource planes.

   *  The Resource plane, which comprises the resources for either the
      transport or the service functions.

   Figure 1 illustrates the original CLAS architecture.

                                    Applications
                                         /\
                                         ||
                                         ||
   +-------------------------------------||-------------+
   | Service Stratum                     ||             |
   |                                     \/             |
   |                       ...........................  |
   |                       . SDN Intelligence        .  |
   |                       .                         .  |
   |  +--------------+     .        +--------------+ .  |
   |  | Resource Pl. |     .        |  Mgmt. Pl.   | .  |
   |  |              |<===>. +--------------+      | .  |
   |  |              |     . |  Control Pl. |      | .  |
   |  +--------------+     . |              |------+ .  |
   |                       . |              |        .  |
   |                       . +--------------+        .  |
   |                       ...........................  |
   |                                     /\             |
   |                                     ||             |
   +-------------------------------------||-------------+
                                         ||    Standard
                                      -- || -- API
                                         ||
   +-------------------------------------||-------------+
   | Transport Stratum                   ||             |
   |                                     \/             |
   |                       ...........................  |
   |                       . SDN Intelligence        .  |
   |                       .                         .  |
   |  +--------------+     .        +--------------+ .  |
   |  | Resource Pl. |     .        |  Mgmt. Pl.   | .  |
   |  |              |<===>. +--------------+      | .  |
   |  |              |     . |  Control Pl. |      | .  |
   |  +--------------+     . |              |------+ .  |
   |                       . |              |        .  |
   |                       . +--------------+        .  |
   |                       ...........................  |
   |                                                    |
   |                                                    |
   +----------------------------------------------------+

      Figure 1: Cooperating Layered Architecture for SDN [RFC8597]

4.  Augmentation of CLAS with Compute and Telemetry Data Awareness

   The CLAS architecture was initially conceived from the perspective
   of exploiting the advantages of network programmability in
   operational networks.
   The evolution of current networks and the services they support is,
   however, introducing new aspects:

   *  Consideration of distributed computing capabilities attached to
      different points in the network, intended for hosting a variety
      of services and applications, usually in a virtualized manner
      (e.g., [I-D.contreras-alto-service-edge]).

   *  Introduction of evidence-driven techniques, such as analytics,
      Artificial Intelligence (AI), and Machine Learning (ML), in order
      to improve operations by means of closed-loop automation (e.g.,
      [I-D.francois-nmrg-ai-challenges]).

   With that in mind, this memo proposes augmentations to the original
   CLAS architecture that add the aforementioned aspects.

4.1.  Compute Stratum

   The CLAS architecture is extended by adding a new stratum, named the
   Compute Stratum. This stratum contains the control, management, and
   resource planes related to the computing aspects. This additional
   stratum cooperates with the other two in order to facilitate the
   overall service provision in the network.

   With this addition, and in order to make the scope of each stratum
   more explicit, the previously named Transport Stratum is renamed the
   Connectivity Stratum, reflecting the fact that this stratum's
   responsibility is the overall connectivity supporting the other two
   strata in the architecture.

4.2.  Telemetry Plane

   Telemetry is usually part of the management plane. To emphasize the
   "learning" aspects, this document defines telemetry as a dedicated
   plane, as it streams sensitive data that is instrumental for the
   CLAS strata.

   A further extension to the original CLAS architecture relates to the
   need to collect, process, and share relevant data from each of the
   considered strata. For that purpose, a telemetry plane is proposed
   to complement the existing planes in each stratum.
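
   The two augmentations described in Sections 4.1 and 4.2 can be
   summarized with a minimal Python sketch. It is only an illustration
   under stated assumptions: the Plane enumeration, the dictionary
   representation of the architecture, and the extend_clas function
   are hypothetical names introduced here, not part of CLAS or of any
   existing library.

```python
from enum import Enum


class Plane(Enum):
    CONTROL = "control"
    MANAGEMENT = "management"
    RESOURCE = "resource"
    TELEMETRY = "telemetry"  # the plane proposed in Section 4.2


# The three planes that every stratum has in the original architecture.
ORIGINAL_PLANES = (Plane.CONTROL, Plane.MANAGEMENT, Plane.RESOURCE)

# Original CLAS: two strata, three planes each (assumed representation).
ORIGINAL_CLAS = {
    "Service": ORIGINAL_PLANES,
    "Transport": ORIGINAL_PLANES,
}


def extend_clas(original: dict) -> dict:
    """Apply the augmentations of Sections 4.1 and 4.2 to a strata map."""
    extended = {}
    for stratum, planes in original.items():
        # Section 4.1: the Transport Stratum becomes the Connectivity Stratum.
        name = "Connectivity" if stratum == "Transport" else stratum
        # Section 4.2: every stratum gains a telemetry plane.
        extended[name] = tuple(planes) + (Plane.TELEMETRY,)
    # Section 4.1: a new Compute Stratum with the same set of planes.
    extended["Compute"] = ORIGINAL_PLANES + (Plane.TELEMETRY,)
    return extended


EXTENDED_CLAS = extend_clas(ORIGINAL_CLAS)
```

   The resulting structure contains the Service, Connectivity, and
   Compute strata, each with control, management, resource, and
   telemetry planes.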

   The telemetry plane is in charge of handling the data specificities
   of each stratum. Thus, the telemetry plane in the Service Stratum
   focuses on data relevant to the service as defined by the
   application or service owner, usually in terms of service Key
   Performance Indicators (KPIs) [TMV]. The telemetry plane in the
   Compute Stratum concentrates on data related to the computing
   capabilities in use (e.g., CPU load, RAM usage, storage utilization,
   etc.) [OpenStack]. Finally, the telemetry plane in the Connectivity
   Stratum is in charge of handling the monitoring and telemetry
   information obtained from the network (e.g.,
   [I-D.ietf-opsawg-service-assurance-yang]).

4.3.  Extended CLAS Architecture

   Figure 2 presents the proposed augmentation, showing the
   relationship among strata.

                                    Applications
                                         /\
                                         ||
   +-------------------------------------||-------------+
   | Service Stratum                     ||             |
   |                                     \/             |
   |  +--------------+     ...........................  |
   |  | Telemetry Pl.|     . SDN Intelligence        .  |
   |  |              |<===>.                         .  |
   |  +-----/\-------+     .        +--------------+ .  |
   |        ||             .        |  Mgmt. Pl.   | .  |
   |        ||             . +--------------+      | .  |
   |  +-----\/-------+     . |  Control Pl. |------+ .  |
   |  | Resource Pl. |     . |              |        .  |
   |  |              |<===>. +--------------+        .  |
   |  +--------------+     ...........................  |
   |                                /\             /\   |
   |                                ||             ||   |
   +--------------------------------||-------------||---+
                 Standard API    -- || --          ||
   +--------------------------------||-----+       ||
   | Compute Stratum                ||     |       ||
   |                                \/     |       ||
   | +----------+    ...................   |       ||
   | | Telemetry|    .       SDN       .   |  Std. ||
   | |  Plane   |<==>.  Intelligence   .   |  API  ||
   | +-----/\---+    .   +----------+  .   |    -- || --
   |       ||        .   | Mgmt. Pl.|  .   |       ||
   |       ||        . +----------+ |  .   |       ||
   | +-----\/---+    . | Control  |-+  .   |       ||
   | | Resource |    . |  Plane   |    .   |       ||
   | |  Plane   |<==>. +----------+    .   |       ||
   | +----------+    ...................   |       ||
   +----------------------------------/\---+       ||
                 Standard API      -- || --        ||
                  +-------------------||-----------||-----+
                  | Connectivity      ||           ||     |
                  | Stratum           ||           ||     |
                  |                   \/           \/     |
                  | +----------+    ...................   |
                  | | Telemetry|    .       SDN       .   |
                  | |  Plane   |<==>.  Intelligence   .   |
                  | +-----/\---+    .   +----------+  .   |
                  |       ||        .   | Mgmt. Pl.|  .   |
                  |       ||        . +----------+ |  .   |
                  | +-----\/---+    . | Control  |-+  .   |
                  | | Resource |    . |  Plane   |    .   |
                  | |  Plane   |<==>. +----------+    .   |
                  | +----------+    ...................   |
                  +---------------------------------------+

                  Figure 2: Extended CLAS architecture

5.  Discussion on Research Aspects of the Proposed Architecture

5.1.  Discussion Related to the Compute Stratum

   The inclusion of the Compute Stratum extends the resource
   layer/plane in such a manner that the network (i.e., including
   processing capabilities and the associated connectivity) can be
   programmed consistently and in an integrated way. This is very
   relevant when evolving to network architectures pursuing the
   cloud-edge continuum, even considering the extension to the very
   extreme edge.

   It is important to note that the aforementioned cloud-edge continuum
   could potentially comprise resources from multiple administrative
   domains. Enabling the management of multiple heterogeneous domains
   in a so-called "frictionless" manner needs to be explored.

5.2.  Discussion Related to the Telemetry Plane

   One of the aspects to investigate is the application of
   evidence-driven, "smart" management (most notably, based on AI) to
   network management and control. There are multiple issues to
   consider:

   *  Telemetry data generation and context information.

   *  The lifecycle of data flows in the closed loop, in both
      directions, from network to management and vice versa.

   *  The flows controlling the behavior (policies/intents), as defined
      by network administrators, and potentially users, towards the
      management elements.

   *  Feedback patterns (e.g., predictions and suggested actions) from
      management elements to network administrators and users.

   *  Metadata models to represent sources and consumers of data at the
      telemetry plane, supporting the dynamic attachment of new sources
      and consumers, including data composition elements.

   *  Flow patterns and models facilitating the cooperation among
      distinct telemetry planes, implying knowledge sharing among
      different segments, and data and knowledge aggregation at
      different strata of control.

   *  Security and privacy issues regarding the usage of data flows,
      considering their provenance and potential attestation methods.

   A potential way forward is the definition of a common, model-based
   approach, also defining a recursive structure that could become a
   generalization of the CLAS model.

6.  Applicability scenarios

   This section describes deployment scenarios suitable for the CLAS
   architecture evolution.

6.1.  Cloud-edge Continuum

   Increasingly, computing facilities are being deployed by network
   service providers to satisfy a number of use cases requiring
   distributed compute processing capabilities (e.g., for micro-service
   instantiation), some of them as edge nodes because of the need for
   proximity. Use cases in [I-D.irtf-coinrg-use-cases] exemplify those
   needs.

   Such distributed computing facilities form what is known as the
   cloud-edge continuum. Those distributed facilities need to be
   interconnected to accomplish end-to-end services, based on the
   interaction of multiple applications or service functions placed in
   different compute nodes for performance or resource efficiency
   reasons.

   Current ways of deploying services follow the cloud-native approach
   of instantiating a set of micro-services that can be located at
   different compute nodes. Typical cloud management systems such as
   Kubernetes take care of the allocation of cloud-edge resources
   across distinct nodes or clusters. For the networking part, it is
   necessary to interact with network controllers capable of providing
   the necessary connectivity with certain guarantees.

   The extended CLAS architecture represents a framework where the
   cloud-native resources can be handled in combination with the
   connectivity part, assuring the service not only at the provisioning
   phase but during the complete service lifecycle, and supporting it
   across different domains.

   Features that can be expected to be satisfied in this type of
   scenario are:

   *  Overall optimization of system resources at different levels
      (e.g., compute and network). This can imply a process of learning
      and inferring status based on historical resource usage data.

   *  Assurance of Service Level Objectives (SLOs) by acting on either
      the compute or the connectivity parts. This can motivate the need
      to migrate compute workloads between compute nodes during the
      service lifetime, which requires adapting the connectivity to the
      new placement.

   *  Secure transfer of data across the cloud-edge continuum, with the
      necessary isolation of services among users, and the required
      confidentiality and integrity properties, which can imply the
      application of isolation capabilities and trustworthiness
      verification in both the compute and connectivity strata.

6.2.  Network-application Integration

   Nowadays, applications make service decisions mostly decoupled from
   network status and conditions. Similarly, the network is not aware
   of application needs, so in certain cases it is not possible to
   satisfy them. Thus, the need emerges for further collaboration or
   integration between applications and the network.

   [RFC9419] discusses principles for designing mechanisms allowing
   application-network collaboration. Such collaboration can
   proactively or reactively trigger actions in the network at run
   time.

   Features that can be expected to be satisfied in this type of
   scenario are:

   *  Monitoring information that could be relevant for either the
      application or the network. The monitoring information will be a
      composition of information from both the compute and connectivity
      strata.

   *  Exposure of capabilities from the network, including (and even
      combining) both the compute and connectivity strata.

   *  Usage of metadata for either the connectivity or the processing
      of the information at the service level.

7.  TODO for next versions of this document

   This version is a work in progress. Next versions of the document
   will address further aspects such as:

   *  Communication between strata (and planes).

   *  Deployment scenarios (including legacy ones).

   *  Potential use cases (especially in alignment with ongoing
      activities in COINRG / NMRG).

8.  Security Considerations

   The security considerations reflected in [RFC8597] with regard to
   the strata architecture also apply here. Apart from that, the
   introduction of the telemetry plane for data management raises
   additional security concerns. (TODO: elaborate on data-related
   security issues).

9.  IANA Considerations

   This document has no IANA actions.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

10.2.  Informative References

   [I-D.contreras-alto-service-edge]
              Contreras, L. M., Randriamasy, S., Ros-Giralt, J., Perez,
              D. A. L., and C. E. Rothenberg, "Use of ALTO for
              Determining Service Edge", Work in Progress,
              Internet-Draft, draft-contreras-alto-service-edge-08,
              10 July 2023.

   [I-D.francois-nmrg-ai-challenges]
              François, J., Clemm, A., Papadimitriou, D., Fernandes,
              S., and S. Schneider, "Research Challenges in Coupling
              Artificial Intelligence and Network Management", Work in
              Progress, Internet-Draft,
              draft-francois-nmrg-ai-challenges-02, 13 March 2023.

   [I-D.ietf-opsawg-service-assurance-yang]
              Claise, B., Quilbeuf, J., Lucente, P., Fasano, P., and T.
              Arumugam, "YANG Modules for Service Assurance", Work in
              Progress, Internet-Draft,
              draft-ietf-opsawg-service-assurance-yang-11,
              3 January 2023.

   [I-D.irtf-coinrg-use-cases]
              Kunze, I., Wehrle, K., Trossen, D., Montpetit, M., de
              Foy, X., Griffin, D., and M. Rio, "Use Cases for
              In-Network Computing", Work in Progress, Internet-Draft,
              draft-irtf-coinrg-use-cases-04, 30 June 2023.

   [RFC7149]  Boucadair, M. and C. Jacquenet, "Software-Defined
              Networking: A Perspective from within a Service Provider
              Environment", RFC 7149, DOI 10.17487/RFC7149, March 2014.

   [RFC7426]  Haleplidis, E., Ed., Pentikousis, K., Ed., Denazis, S.,
              Hadi Salim, J., Meyer, D., and O. Koufopavlou,
              "Software-Defined Networking (SDN): Layers and
              Architecture Terminology", RFC 7426,
              DOI 10.17487/RFC7426, January 2015.

   [RFC8597]  Contreras, LM., Bernardos, CJ., Lopez, D., Boucadair, M.,
              and P. Iovanna, "Cooperating Layered Architecture for
              Software-Defined Networking (CLAS)", RFC 8597,
              DOI 10.17487/RFC8597, May 2019.

   [RFC8969]  Wu, Q., Ed., Boucadair, M., Ed., Lopez, D., Xie, C., and
              L. Geng, "A Framework for Automating Service and Network
              Management with YANG", RFC 8969, DOI 10.17487/RFC8969,
              January 2021.

   [RFC9419]  Arkko, J., Hardie, T., Pauly, T., and M. Kühlewind,
              "Considerations on Application - Network Collaboration
              Using Path Signals", RFC 9419, DOI 10.17487/RFC9419,
              July 2023.

   [TMV]      "Service performance measurement methods over 5G
              experimental networks", May 2021.

Acknowledgments

   This work has been partially funded by the European Union under
   Horizon Europe projects NEMO (NExt generation Meta Operating
   system), grant number 101070118, and CODECO (COgnitive,
   Decentralised Edge-Cloud Orchestration), grant number 101092696.

Authors' Addresses

   Luis M. Contreras
   Telefonica
   Ronda de la Comunicacion, s/n
   28050 Madrid
   Spain
   Email: luismiguel.contrerasmurillo@telefonica.com
   URI:   http://lmcontreras.com

   Mohamed Boucadair
   Orange
   35000 Rennes
   France
   Email: mohamed.boucadair@orange.com

   Diego R. Lopez
   Telefonica
   Seville
   Spain
   Email: diego.r.lopez@telefonica.com

   Carlos J. Bernardos
   Universidad Carlos III de Madrid
   Av. Universidad, 30
   28911 Leganes, Madrid
   Spain
   Email: cjbc@it.uc3m.es