Network Working Group C. Jennings Internet-Draft S. Nandakumar Intended status: Informational M. Zanaty Expires: 11 January 2024 Cisco 10 July 2023 MOQ Usages for audio and video applications draft-jennings-moq-usages-00 Abstract Media over QUIC Transport (MOQT) defines a publish/subscribe based unified media delivery protocol for delivering media for streaming and interactive applications over QUIC. This specification defines details for building audio and video applications over MOQT, more specifically, provides information on mapping application media objects to the MOQT object model and possible mapping of the same to underlying QUIC transport. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 11 January 2024. Copyright Notice Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction 1.1. Requirements Notation and Conventions 2. MOQT QUIC Mapping 2.1. Stream per MOQT Group 2.2. Stream per MOQT Object 2.3. Stream per MOQT Track 2.4. Stream per Priority 2.5. Stream per multiple MOQT Tracks 3. MoQ Audio Objects 4. MoQ Video Objects 4.1. Encoded Frame 4.2. Encoded Slice 4.3. CMAF Chunk 4.4. CMAF Fragment 5. MOQT Track 5.1. Single Quality Media Streams 5.2. Multiple Quality Media Streams 5.2.1. Simulcast 5.2.2. Scalable Video Coding (SVC) 5.2.3. k-SVC 6. Object and Track Priorities 7. Relay Considerations 8. Bitrate Adaptation 9. Usage Mode identification 10. References 10.1. Normative References 10.2. Informative References Appendix A. Security Considerations Appendix B. IANA Considerations Appendix C. Acknowledgments Authors' Addresses 1. Introduction Media Over QUIC Transport (MOQT) [MoQTransport] allows set of publishers and subscribers to participate in the media delivery over QUIC for streaming and interactive applications. The MOQT specification defines the necessary protocol machinery for the endpoints and relays to participate, however, it doesn't provide recommendations for media applications on using MOQT object model and mapping the same to underlying QUIC transport. This document introduces MOQT's object model mapping to underlying QUIC transport in Section 2. Section 3 and Section 4 describe various grouping of application level media objects and their mapping to the MOQT object model. Section 5.2 discusses considerations when using multiple quality video applications, such as simulcast and/or layer coding, over the MOQT protocol.Section 8 describes considerations for adaptive bitrate techniques and finally the Section 6 discusses interactions when priorities are used on objects and tracks. Below picture captures the conceptual model showing mapping at various levels of a typical media application stack using MOQT delivery protocol. +------------------------------+ | Application Data | ----+ frames, slices, segments +---------------+--------------+ | | v | +-------------------------------+ | | Tracks, Groups, Objects | | +-------------------------------+ +--------------v---------------+ | MOQT Object Model | +--------------+---------------+ | +----------------------------------------+ | |Stream per Group, Stream per Object, .. | | +----------------------------------------+ +---------------v--------------+ | QUIC | +------------------------------+ 1.1. Requirements Notation and Conventions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD","SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. 2. MOQT QUIC Mapping In a typical MOQT media application, the captured media from a media source is encoded (compressed), encrypted (based on the encryption scheme), packaged (based on the container format) and mapped onto MOQT object model. Applications (such as Media producers, Relays) deliver MOQT objects over QUIC transport by choosing the mapping that is appropriate based on the context. | Encoded (and/or) Encrypted Media Stream V +-------------------+ | MOQT | +-------------------+ | MOQ Tracks V / ===== QUIC Stream per Group +---------------------+ | | QUIC Transport | --| ===== QUIC Stream per Object +---------------------+ | | ===== QUIC Stream per Track | | ===== QUIC Stream per multiple Tracks | \ ===== QUIC Stream per Priority Subsections below describes a few possibilities to consider when mapping MOQT objects to QUIC Streams. 2.1. Stream per MOQT Group In this mode, an unidirectional QUIC stream is setup per MOQT Group (section 2.2 of [MoQTransport]). Following observations can be made about such a setup: * MOQT groups typically represent things that share some kind of relationship (eg decodability, priority) and having the objects of a group share the same underlying stream context allows them to delivered coherently. * Media consumers can map each incoming QUIC stream into decoding context in the order of their arrival per group. * Since the objects within group share the QUIC stream, there is likelihood of increased end to end latency due to head-of-line blocking under losses. 2.2. Stream per MOQT Object In this mode, an unidirectional QUIC stream is setup per MOQT Object (section 2.1 of [MoQTransport]). Following observations can be made about such a setup: * Using a single stream per object can help reduce latency at the source especially when objects represent smaller units of the application data (say a single encoded frame). * The impact of on-path losses to the end to end latency is scoped to the object duration. The smaller the object duration, lesser the impact on the end to end latency. * One stream per object may end up in creating large number of streams, especially when the objects durations is small. * Media consumers may need to re-organize the incoming streams for handling objects within a group, since streams may arrive out of order. 2.3. Stream per MOQT Track In this mode, there is one unidirectional QUIC stream per MOQT Track (section 2.3 [MoQTransport]). Following observations can be made about such a setup: * This scheme is the simplest in its implementation and the streams stays active until the track exists. Endpoints need to maintain just one stream context per track. * Since all the objects within the track share the same stream, the may be impact on end to end latency due to HOL blocking under loss. 2.4. Stream per Priority In this mode, there is one unidirectional QUIC stream per MOQT Track/ Object priority. Following observations can be made about such a setup: * This scheme is relatively simpler in its implementation and number of stream contexts match the number of priority levels. * Such a scheme can be used at Relays when forwarding decisions can be naturally mapped to priorities carried in the object header. 2.5. Stream per multiple MOQT Tracks This mode is similar Section 2.3 but with more than one MOQT track delivered over a single unidirectional QUIC stream, thus allowing implementations to map multiple incoming QUIC streams to a few outgoing QUIC Streams. Similar to Section 2.3, this mode is relatively simple in its implementation and at the same time may suffer from latency impacts under losses. 3. MoQ Audio Objects Each chunk of encoded audio data, say 10ms, represents a MOQ Object. In this setup, there is one MOQT Object per MOQT Group, where, the Group Sequence in the object header is increment by one for each encoded audio chunk and the Object Sequence is defaulted to value 0. When mapped to the underlying QUIC Stream, each such unitary group is sent over individual unidirectional QUIC stream (similar to Section 2.2/Section 2.1). Future sub 2 kbps audio codecs may take advantage of a rapidly updated model that are needed to decode the audio which could result in audio needing to use groups with multiple objects to ensure all the objects needed to decode some audio are in the same group. 4. MoQ Video Objects The decision on what constitutes a MOQ object/group and its preferred mapping to the underlying QUIC transport for video streams is governed by the granularity of encoded bitstream, as chosen by the application. The smallest unit of such an application defined encoded bitstream will be referred to as "Video Atom" in this specification and they are mapped 1:1 to MOQ Objects. The size and duration of a video atom is application controlled and follows various strategies driven by application requirements such as latency, quality, bandwidth and so on. Following subsections identify various granularities defining the video atoms and their corresponding mapping to MOQT object model and the underlying QUIC transport. 4.1. Encoded Frame In this scheme, the video atom is a single encoded video frame. The Group Sequence is incremented by 1 at IDR Frame boundaries. The Object Sequence is increment by 1 for each video frame, starting at 0 and resetting to 0 at the start of new group. The first video frame (Object Sequence 0) should be IDR Frame and the rest of the video frames within a MOQT group are typically dependent frames (delta frames) and organized in the decode order. When using QUIC mapping scheme defined in Section 2.2, each unidirectional QUIC stream is used to deliver one encoded frame. In this mode, the receiver application should manage out of order streams to ensure the MOQ Objects are delivered to the decoder in the increasing order of the Object Sequence within a group and then in the increasing order of the Group Sequence. When using QUIC mapping as defined in {spg}, one unidirectional QUIC stream is setup to deliver all the encoded frames (objects) within a group. 4.2. Encoded Slice In Slice-based encoding a single video frame is “sliced” into separate sections and are encoded simultaneously in parallel. Once encoded, each slice can then be immediately streamed to a decoder instead of waiting for the entire frame to be encoded first. In this scheme, the video atom is a encoded slice, starting with the IDR frame as Object Sequence of 0 for that slice and followed by delta frames with Object Sequence incremented by 1 successively. A MOQT Group is identified by set of such objects at each IDR frame boundaries. To be able to successfully decode and render at the media consumer, the identifier of the containing video frame for the slice needs to be carried end to end. This would allow the media consumer to map the slices to right decoding context of the frame being processed. Note: The video frame identifier may be carried either as part of encoded object's payload header or introduce a group header for conveying the frame identifier. When using QUIC mapping scheme defined in Section 2.2, each unidirectional QUIC stream is used to deliver one encoded slice of a video frame. When using QUIC mapping as defined in {spg}, each unidirectional QUIC stream is setup to deliver all the encoded slices (objects) within a group. 4.3. CMAF Chunk CMAF [CMAF] chunks are CMAF addressable media objects that contain a consecutive subset of the media samples in a CMAF fragment. CMAF chunks can be used by a delivery protocol to deliver media samples as soon as possible during live encoding and streaming, i.e., typically less than a second. CMAF chunks enable the progressive encoding, delivery, and decoding of each CMAF fragment. A given video application may choose to have chunk duration to span more than one encoded video frame. When using CMAF chunks, the video atom is a CMAF chunk. The CMAF chunk containing the IDR Frame shall have Object Sequence set to 0, with each additional chunk with its Object Sequence incremented by 1. The Group Sequence is incremented at every IDR interval and all the CMAF chunks within a given IDR interval shall be part of the same MOQT Group. When using QUIC mapping scheme defined in Section 2.2, each unidirectional QUIC stream is used to deliver a CMAF Chunk. When using QUIC mapping as defined in {spg}, each unidirectional QUIC stream is setup to deliver all the CMAF chunks (objects) within a group. When using QUIC mapping defined in Section 2.3 CMAF chunks corresponding to CMAF track are delivered over the same unidirectional QUIC stream. 4.4. CMAF Fragment CMAF fragments are the media objects that are encoded and decoded. For scenarios, where the fragments contains one or more complete coded and independently decodable video sequences, each such fragment is identified as single MOQT Object and it forms its own MOQT Group. There is one unidirectional QUIC stream per such an object Section 2.2. Media senders should stream the bytes of the object, in the decode order, as they are generated in order the reduce the latencies. 5. MOQT Track MOQT Tracks are typically characterized by having a single encoding and optionally a encryption configuration. Applications can encoded a captured source stream into one or more qualities as described in the sub sections below. 5.1. Single Quality Media Streams For scenarios where the media producer intents to publish single quality audio and video streams, applications shall map the objects from such audio and video streams to individual tracks enabling each track to represent a single quality. 5.2. Multiple Quality Media Streams It is not uncommon for applications to support multiple qualities (renditions) per source stream to support receivers with varied capabilities, enabling adaptive bitrate media flows, for example. We describe 2 common approaches for supporting multiple qualities(renditions/encodings) - Simulcast and Layered Coding. 5.2.1. Simulcast In simulcast, each MOQT track is an time-aligned alternate encoding (say, multiple resolutions) of the same source content. Simulcasting allows consumers to switch between tracks at group boundaries seamlessly. Few observations: * Catalog should identify time-aligned relationship between the simulcasted tracks. * All the alternate encodings shall matching base timestamp and duration. * All the alternate encodings are for the same source media stream. * Media consumers can pick and choose the right quality by subscribing to the appropriate track. * Media consumers react to changing network/bandwidth situations by subscribing to different quality track at the group boundaries. 5.2.2. Scalable Video Coding (SVC) SVC defines a coded video representation in which a given bitstream offers representations of the source material at different levels of fidelity (spatial, quality, temporal) structured in a hierarchical manner. Such an organization allows bitstream to be extracted at lower bit rate than the complete sequence to enable decoding of pictures with multiple image structures (for sequences encoded with spatial scalability), pictures at multiple picture rates (for sequences encoded with temporal scalability), and/or pictures with multiple levels of image quality (for sequences encoded with SNR/ quality scalability). Different layers can be separated into different bitstreams. All decoders access the base stream; more capable decoders can access enhancement streams. 5.2.2.1. All layers in a single MOQT Track In this mode, the video application transmits all the SVC layers under a single MOQT Track. When mapping to the MOQT object model, any of the methods described in Section 4 can be leveraged to mapped the encoded bitstream into MOQT groups and objects. When transmitting all the layers as part of a single track, following properties needs to be considered: * Catalog should identify the SVC Codec information in its codec definition. * Media producer should map each video atom to the MOQ object in the decode order and can utilize any of the QUIC mapping methods described in Section 2. * Dependency information for all the layers (such as spatial/ temporal layer identifiers, dependent descriptions) are encoded in the bitstream and/or container. The scheme to map all the layers to a single track is simple to implement and allows subscribers/media consumers to make independent layer drop decisions without needing any protocol exchanges (as needed in Section 5.2.1). However, such a scheme is constrained by disallowing selective subscriptions to the layers of interest. 5.2.2.2. One SVC layer per MOQT Track In this mode, each SVC layer is mapped to a MOQT Track. Each unique combination of fidelity (say spatial and temporal) is identified by a MOQT Track ( see example below). +-----------+ +-----------+ | S0T0 | --------> | Track1 | +-----------+ +-----------+ +-----------+ +-----------+ | S0T1 | --------> | Track2 | +-----------+ +-----------+ +-----------+ +-----------+ | S1T0 | --------> | Track3 | +-----------+ +-----------+ +-----------+ +-----------+ | S1T1 | --------> | Track4 | +-----------+ +-----------+ ex: 2-layer spatial and 2-layer temporal scalability encoding The catalog should identify the complete list of dependent tracks for each track that is part of layered coding for a given media stream. For example the figure below shows a sample layer dependency structure (2 spatial and temporal layers) and corresponding tracks dependencies. +----------+ +----------->| S1T1 | | | Track4 | | +----------+ | ^ | | +----------+ | | S1TO | | | Track3 | | +----------+ +-----+----+ ^ | SOT1 | | | Track2 | | +----------+ | ^ +----------+ | | SOTO | | | Track1 |---------+ +----------+ Catalog Track Dependencies: Track2 depends on Track1 Track3 depends on Track1 Track4 depends on Track2 and Track3 Within each track, the encoded media for the given layer can follow mappings defined in Section 4 and can choose from the options defined in Section 2 for transporting the mapped objects over QUIC. The bitstream and/or the container should carry the neccessary to capture video frame level dependencies. Media consumers would need to consider information from catalog to group the the related tracks and gather information from the bistream to establish frame level depedencies. This would allow the consumer to appropriately map the incoming QUIC streams and MOQ objects to the right decoder context. 5.2.3. k-SVC k-SVC is a flavor of layered coding wherein the encoded frames within a layer depend only on the frames within the same layer, with the exception that the IDR frame in the enhancement layers depends on the IDR frame in the next level lower fidelity layer. When each layer of a k-SVC encoded bitstream is mapped to a MOQT track, following needs to be taken into consideration: * Catalog should identify the tracks are related via k-SVC dependency * MOQT protocol should be extended to propose a group header that enables track for the enhancement layer to identify the group sequence of its dependent track for satisfying IDR Frame dependency. 6. Object and Track Priorities Media producers are free to prioritize media delivery between the tracks by encoding priority information in the MOQT Object Header for a given track. Relay can utilize these priorities to make forwarding decisions. "draft-zanaty-moq-priority" specifies a prioritization mechanism for objects delivered using the Media over QUIC Transport (MOQT) protocol. 7. Relay Considerations Relays are not allowed to modify MOQT object header, as it might break encryption and authentication. However, Relays are free to apply any of the transport mappings defined in Section 2 that it sees fit based on the local decisions. For example, a well engineered Relay network may choose to take multiple incoming QUIC streams and map it to few outgoing QUIC streams (similar to one defined in Section 2.3) or the Relays may choose MOQT object priorities Section 2.4 to decide the necessary transport mapping. It is important to observe that such decisions cam be made solely considering the MOQT Object header information. 8. Bitrate Adaptation TODO: add considerations for client side ABR and possible options for server side ABR. 9. Usage Mode identification This specification explores 2 possible usage modes for applications to consider when using MOQT media delivery protocol : 1. Transport Mapping Section 2 2. MOQT Object model mapping For interoperability purposes, media producers should communicate its usage modes to the media consumers. Same can be achieved in one of the following ways 1. Via out of band, application specific mechanism. This approach limits the interoperability across applications, however. 2. Exchange the usage modes via Catalog. This approach enables consumers of catalog to setup their transport/media stacks appropriately based on the sender's preference. 10. References 10.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . 10.2. Informative References [MoQTransport] Curley, L., Pugin, K., Nandakumar, S., and V. Vasiliev, "Media over QUIC Transport", Work in Progress, Internet- Draft, draft-ietf-moq-transport-00, 5 July 2023, . [CMAF] "Information technology -- Multimedia application format (MPEG-A) -- Part 19: Common media application format (CMAF) for segmented media", March 2020. Appendix A. Security Considerations This section needs more work. Appendix B. IANA Considerations This document doesn't recommend any changes to IANA registries. Appendix C. Acknowledgments Thanks to MoQ WG for all the discussions that inspired this document's existence. Authors' Addresses Cullen Jennings Cisco Email: fluffy@iii.ca Suhas Nandakumar Cisco Email: snandaku@cisco.com Mo Zanaty Cisco Email: mzanaty@cisco.com