Mapping RTP Streams to Controlling Multiple Streams for Telepresence (CLUE) Media Captures

Tel Aviv


          Israel


        ron.even.tlv@gmail.com


    
      8x8, Inc. / Jitsi
      
        
          
          Jersey City
          NJ
          07302
          United States of America
        
        jonathan.lennox@8x8.com
      
    
    
    
      
   This document describes how the Real-time Transport Protocol (RTP) is used
   in the context of the Controlling Multiple Streams for Telepresence (CLUE)
   protocol.  It also describes the mechanisms and recommended practice for
   mapping RTP media streams, as defined in the Session Description Protocol
   (SDP), to CLUE Media Captures and defines a new RTP header extension
   (CaptureID).
    
    
      
        Status of This Memo
        
            This is an Internet Standards Track document.
        
        
            This document is a product of the Internet Engineering Task Force
            (IETF).  It represents the consensus of the IETF community.  It has
            received public review and has been approved for publication by
            the Internet Engineering Steering Group (IESG).  Further
            information on Internet Standards is available in Section 2 of 
            RFC 7841.
        
        
            Information about the current status of this document, any
            errata, and how to provide feedback on it may be obtained at
            .
        
      
      
        Copyright Notice
        
            Copyright (c) 2021 IETF Trust and the persons identified as the
            document authors. All rights reserved.
        
        
            This document is subject to BCP 78 and the IETF Trust's Legal
            Provisions Relating to IETF Documents
            () in effect on the date of
            publication of this document. Please review these documents
            carefully, as they describe your rights and restrictions with
            respect to this document. Code Components extracted from this
            document must include Simplified BSD License text as described in
            Section 4.e of the Trust Legal Provisions and are provided without
            warranty as described in the Simplified BSD License.
        
      
    
    
      
        Table of Contents
        
          
            .  Introduction
          
          
            .  Terminology
          
          
            .  RTP Topologies for CLUE
          
          
            .  Mapping CLUE Capture Encodings to RTP Streams
          
          
            .  MCC Constituent CaptureID Definition
            
              
                .  RTCP CaptureID SDES Item
              
              
                .  RTP Header Extension
              
            
          
          
            .  Examples
          
          
            .  Communication Security
          
          
            .  IANA Considerations
          
          
            .  Security Considerations
          
          
            . References
            
              
                .  Normative References
              
              
                .  Informative References
              
            
          
          
            Acknowledgments
          
          
            Authors' Addresses


  
    
      Introduction
      
   Telepresence systems can send and receive multiple media streams.
   The CLUE Framework  defines Media Captures
   (MCs) as a source of Media, from one or more Capture Devices.  A Media
   Capture may also be constructed from other Media streams.  A middlebox
   can express conceptual Media Captures that it constructs from
   Media streams it receives.  A Multiple Content Capture (MCC) is a
   special Media Capture composed of multiple Media Captures.
      
   SIP Offer/Answer uses SDP to describe the RTP media streams.  Each
   RTP stream has a unique Synchronization Source (SSRC) within its RTP
   session.  The content of the RTP stream is created by an encoder in
   the endpoint.  This may be original content from a camera or content
   created by an intermediary device such as a Multipoint Control Unit
   (MCU).
      
   This document makes recommendations for the CLUE architecture about
   how RTP and RTP Control Protocol (RTCP) streams should be encoded and transmitted and how
   their relation to CLUE Media Captures should be communicated.  The
   proposed solution supports multiple RTP topologies .
      
   With regard to the media (audio, video, and timed text), systems that
   support CLUE use RTP for the media, SDP for codec and media transport
   negotiation (CLUE individual encodings), and the CLUE protocol for
   Media Capture description and selection.  In order to associate the
   media in the different protocols, there are three mappings that need to
   be specified:


      
        CLUE individual encodings to SDP
        RTP streams to SDP (this is not a CLUE-specific mapping)
        RTP streams to MC to map the received RTP stream to the current MC
in the MCC.
      
    
    
      Terminology
      
    The key words "MUST", "MUST NOT",
    "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT",
    "RECOMMENDED", "NOT RECOMMENDED",
    "MAY", and "OPTIONAL" in this document are
    to be interpreted as described in BCP 14 
         when, and only when, they appear in all capitals,
    as shown here.

      
   Definitions from the CLUE Framework
   (see ) are used by this document as
   well.
    
    
      RTP Topologies for CLUE
      
   The typical RTP topologies used by CLUE telepresence systems specify
   different behaviors for RTP and RTCP distribution.  A number of RTP
   topologies are described in .  For CLUE telepresence, the
   relevant topologies include Point-to-Point, as well as Media-Mixing
   Mixers, Media-Switching Mixers, and Selective Forwarding Middleboxes.
      
   In the Point-to-Point topology, one peer communicates directly with a
   single peer over unicast.  There can be one or more RTP sessions,
   each sent on a separate 5-tuple, that have a separate SSRC space,
   with each RTP session carrying multiple RTP streams identified by
   their SSRC.  All SSRCs are recognized by the peers based on the
   information in the RTCP Source description (SDES) report that
   includes the Canonical Name (CNAME) and SSRC of the sent RTP streams.  There are
   different Point-to-Point use cases as specified in the CLUE use case
   .  In some cases, a CLUE session that, at a high level, is
   Point-to-Point may nonetheless have an RTP stream that is best
   described by one of the mixer topologies.  For example, a CLUE
   endpoint can produce composite or switched captures for use by a
   receiving system with fewer displays than the sender has cameras.
   The Media Capture may be described using an MCC.
      
   For the media mixer topology , the peers communicate only
   with the mixer.  The mixer provides mixed or composited media
   streams, using its own SSRC for the sent streams.  If needed by the CLUE
   endpoint, the conference roster information including conference
   participants, endpoints, media, and media-id (SSRC) can be determined
   using the conference event package  element.
      
   Media-Switching Mixers and Selective Forwarding Middleboxes behave as
   described in .
    
    
      Mapping CLUE Capture Encodings to RTP Streams
      
   The different topologies described in  create different SSRC
   distribution models and RTP stream multiplexing points.
      
   Most video conferencing systems today can separate multiple RTP
   sources by placing them into RTP sessions using the SDP description;
   the video conferencing application can also have some knowledge about
   the purpose of each RTP session.  For example, video conferencing
   applications that have a primary video source and a slides video
   source can send each media source in a separate RTP session with a
   content attribute , enabling different application behavior
   for each received RTP media source.  Demultiplexing is
   straightforward because each Media Capture is sent as a single RTP
   stream, with each RTP stream being sent in a separate RTP session, on
   a distinct UDP 5-tuple.  This will also be true for mapping the RTP
   streams to Capture Encodings, if each Capture Encoding
   uses a separate RTP session and the consumer can identify it based
   on the receiving RTP port.  In this case, SDP only needs to label the
   RTP session with an identifier that can be used to identify the Media
   Capture in the CLUE description.  The SDP label attribute serves as
   this identifier.
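      
   As a non-normative illustration, the following Python sketch shows
   how a consumer might index the "m=" sections of a session
   description by their SDP label attribute, so that a label carried in
   the CLUE description can be resolved to an RTP session.  The function
   name and the sample SDP are hypothetical and are not defined by this
   document.

```python
def labels_by_media_section(sdp: str) -> dict:
    """Map each a=label value to the (media type, port) of its m= section."""
    result = {}
    current = None  # (media, port) of the m= section being scanned
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()
            # The port field may carry a "/count" suffix; keep the base port.
            current = (fields[0], int(fields[1].split("/")[0]))
        elif line.startswith("a=label:") and current is not None:
            result[line[len("a=label:"):]] = current
    return result


# Hypothetical two-session offer: main video and slides, each labeled.
SAMPLE_SDP = """v=0
m=video 49170 RTP/SAVPF 96
a=label:A
m=video 49172 RTP/SAVPF 96
a=label:B
"""
```

   With the sample above, label "A" resolves to the video session on
   port 49170 and label "B" to the one on port 49172.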
      
   Each Capture Encoding MUST be sent as a separate RTP stream.  CLUE
   endpoints MUST support sending each such RTP stream in a separate RTP
   session signaled by an SDP "m=" line.  They MAY also support sending
   some or all of the RTP streams in a single RTP session, using the
   mechanism described in  to
   relate RTP streams to SDP "m=" lines.
      
   MCCs bring another mapping issue, in that an MCC represents multiple
   Media Captures that can be sent as part of the MCC if configured by
   the consumer.  When receiving an RTP stream that is mapped to the
   MCC, the consumer needs to know which original MC it is in order to
   get the MC parameters from the advertisement.  If a consumer
   requested an MCC, the original MC does not have a Capture Encoding, so
   it cannot be associated with an "m=" line using a label as described in
   "CLUE Signaling" .  It is important, for
   example, to get correct scaling information for the original MC,
   which may be different for the various MCs that are contributing to
   the MCC.
    
    
      MCC Constituent CaptureID Definition
      
   For an MCC that can represent multiple switched MCs, there is a need
   to know which MC is represented in the current RTP stream at any
   given time.  This requires a mapping from the SSRC of the RTP stream
   conveying a particular MCC to the constituent MC.  In order to
   address this mapping, this document defines an RTP header extension
   and SDES item that includes the captureID of the original MC,
   allowing the consumer to use the MC's original source attributes like
   the spatial information.
      
   This mapping temporarily associates the SSRC of the RTP stream
   conveying a particular MCC with the captureID of the single original
   MC that is currently switched into the MCC.  This mapping cannot be
   used for a composed case where more than one original MC is
   composed into the MCC simultaneously.
      
   If there is only one MC in the MCC, then the media provider MUST send
   the captureID of the current constituent MC in the RTP header
   extension and as an RTCP CaptureID SDES item.  When the media provider
   switches the MC it sends within an MCC, it MUST send the captureID
   value for the MC that just switched into the MCC in an RTP header
   extension and as an RTCP CaptureID SDES item as specified in .
      
   If there is more than one MC composed into the MCC, then the media
   provider MUST NOT send any of the MCs' captureIDs using this
   mechanism.  However, if an MCC is sending Contributing Source (CSRC)
   information in the RTP header for a composed capture, it MAY send the
   captureID values in the RTCP SDES packets giving source information
   for the SSRC values sent as CSRCs.
      
   If the media provider sends the captureID of a single MC switched
   into an MCC, then later sends one composed stream of multiple MCs in
   the same MCC, it MUST send the special value "-", a single-dash
   character, as the captureID RTP header extension and RTCP CaptureID
   SDES item.  The single-dash character indicates there is no
   applicable value for the MCC constituent CaptureID.  The media
   consumer interprets this as meaning that any previous CaptureID value
   associated with this SSRC no longer applies. As
    defines the captureID syntax as
   "xs:ID", the single-dash character is not a legal captureID value, so
   there is no possibility of confusing it with an actual captureID.
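      
   The consumer-side bookkeeping implied by these rules can be sketched
   as follows.  This is a minimal, non-normative Python illustration:
   the class and method names are hypothetical, and it assumes the
   consumer has already extracted the CaptureID value from either the
   RTP header extension or the RTCP SDES item.

```python
NO_CAPTURE = "-"  # special value: no applicable constituent CaptureID


class CaptureIdTracker:
    """Tracks which constituent MC is currently switched into each SSRC."""

    def __init__(self):
        self._by_ssrc = {}

    def on_capture_id(self, ssrc: int, capture_id: str) -> None:
        """Record a CaptureID received for an SSRC."""
        if capture_id == NO_CAPTURE:
            # Any previous CaptureID for this SSRC no longer applies.
            self._by_ssrc.pop(ssrc, None)
        else:
            # The mapping is temporary; a later value replaces it.
            self._by_ssrc[ssrc] = capture_id

    def current_capture(self, ssrc: int):
        """Return the current CaptureID for the SSRC, or None if unknown."""
        return self._by_ssrc.get(ssrc)
```

   A consumer would look up the returned CaptureID in the advertisement
   to obtain the original MC's attributes, and fall back to the MCC's
   own attributes when no mapping is present.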
      
        RTCP CaptureID SDES Item
        This document specifies a new RTCP SDES item.
        
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   CaptId=14   |     length    | CaptureID                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   ....        |
+-+-+-+-+-+-+-+-+

        
   This CaptureID is a variable-length UTF-8 string corresponding to either
   a CaptureID negotiated in the CLUE protocol or the single
   character "-".
        
   This SDES item MUST be sent in an SDES packet within a compound RTCP
   packet unless support for Reduced-Size RTCP has been negotiated as
   specified in RFC 5506 , in which case it can be sent as an
   SDES packet in a noncompound RTCP packet.
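        
   As a non-normative illustration of the layout shown above, the
   following Python sketch serializes and parses a single item: a
   one-octet item type (14), a one-octet length, and the UTF-8
   CaptureID.  The function names are hypothetical.  The sketch rejects
   an empty value on the assumption that a CaptureID or "-" is always
   at least one octet long; the 255-octet upper bound follows from the
   one-octet length field.

```python
CAPTID_SDES_TYPE = 14  # SDES item type registered for CaptId


def build_captid_sdes_item(capture_id: str) -> bytes:
    """Serialize a CaptId SDES item: type, length, UTF-8 text."""
    text = capture_id.encode("utf-8")
    if not 1 <= len(text) <= 255:
        raise ValueError("CaptureID must be 1..255 octets in UTF-8")
    return bytes([CAPTID_SDES_TYPE, len(text)]) + text


def parse_captid_sdes_item(item: bytes) -> str:
    """Parse a CaptId SDES item and return the CaptureID string."""
    if len(item) < 2 or item[0] != CAPTID_SDES_TYPE:
        raise ValueError("not a CaptId SDES item")
    length = item[1]
    if len(item) < 2 + length:
        raise ValueError("truncated CaptId SDES item")
    return item[2:2 + length].decode("utf-8")
```

   The item produced here is only one element of an SDES chunk; building
   the enclosing chunk, padding, and RTCP packet is out of scope for the
   sketch.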
      
      
        RTP Header Extension
        
   The CaptureID is also carried in an RTP header extension ,
   using the mechanism defined in .
        
   Support is negotiated within SDP using the URN "urn:ietf:params:rtp-hdrext:sdes:CaptId".
        
   The CaptureID is sent in an RTP header extension because for switched
   captures, receivers need to know which original MC corresponds to the
   media being sent for an MCC, in order to correctly apply geometric
   adjustments to the received media.
        
   As discussed in , there is no need to send the CaptId Header
   Extension with all RTP packets.  Senders MAY choose to send it only
   when a new MC is sent.  If such a mode is being used, the header
   extension SHOULD be sent in the first few RTP packets to reduce the
   risk of losing it due to packet loss.  See  for further discussion.
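        
   The following non-normative Python sketch shows how a CaptureID
   could be packed into a single header extension element.  It assumes
   the one-byte element format of the general RTP header extension
   mechanism (a 4-bit local identifier followed by a 4-bit "length
   minus one" field), and that the local identifier was agreed in SDP
   for the CaptId URN.  The function name is hypothetical.

```python
def encode_captid_element(ext_id: int, capture_id: str) -> bytes:
    """Encode a CaptureID as a one-byte-header extension element."""
    data = capture_id.encode("utf-8")
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs are limited to 1..14")
    if not 1 <= len(data) <= 16:
        raise ValueError("one-byte header elements carry 1..16 octets")
    # High nibble: negotiated ID; low nibble: data length minus one.
    return bytes([(ext_id << 4) | (len(data) - 1)]) + data
```

   CaptureID values longer than 16 octets do not fit this element form
   and would need the two-byte header form, which the sketch does not
   cover.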
      
    
    
      Examples
      
   In this partial advertisement, the media provider advertises a
   composed capture VC7 made of a big picture representing the current
   speaker (VC3) and two picture-in-picture boxes representing the
   previous speakers (the previous one -- VC5 -- and the oldest one -- VC6).
      
  <ns2:mediaCapture
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:type="ns2:videoCaptureType" captureID="VC7"
       mediaType="video">
         <ns2:captureSceneIDREF>CS1</ns2:captureSceneIDREF>
         <ns2:nonSpatiallyDefinable>true</ns2:nonSpatiallyDefinable>
         <ns2:content>
               <ns2:captureIDREF>VC3</ns2:captureIDREF>
               <ns2:captureIDREF>VC5</ns2:captureIDREF>
               <ns2:captureIDREF>VC6</ns2:captureIDREF>
         </ns2:content>
                 <ns2:maxCaptures>3</ns2:maxCaptures>
           <ns2:allowSubsetChoice>false</ns2:allowSubsetChoice>
         <ns2:description lang="en">big picture of the current
           speaker pips about previous speakers</ns2:description>
           <ns2:priority>1</ns2:priority>
           <ns2:lang>it</ns2:lang>
           <ns2:mobility>static</ns2:mobility>
           <ns2:view>individual</ns2:view>
       </ns2:mediaCapture>

      
   In this case, the media provider will send capture IDs VC3, VC5, or VC6
   as an RTP header extension and RTCP SDES message for the RTP stream
   associated with the MC.
      
   Note that this is part of the full advertisement message example from
   the CLUE data model example  and is not a
   valid XML document.
    
    
      Communication Security
      
   CLUE endpoints MUST support RTP/SAVPF profiles and the Secure Real-time Transport Protocol (SRTP) .
   CLUE endpoints MUST support DTLS  and DTLS-SRTP 
         for SRTP keying.
      
   All media channels SHOULD be secure via SRTP and the RTP/SAVPF
   profile unless the RTP media and its associated RTCP are secure by
   other means (see  and ).
      
   All CLUE implementations MUST support DTLS 1.2 with the
   TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 cipher suite and the P-256
   curve .  The DTLS-SRTP protection profile
   SRTP_AES128_CM_HMAC_SHA1_80 MUST be supported for SRTP.
   Implementations MUST favor cipher suites that support Perfect
   Forward Secrecy (PFS) over non-PFS cipher suites and SHOULD favor
   Authenticated Encryption with Associated Data (AEAD) over non-AEAD
   cipher suites.  Encrypted SRTP Header extensions  MUST be supported.

      
   Implementations SHOULD implement DTLS 1.2 with the
   TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 cipher suite.
   Implementations MUST favor cipher suites that support Perfect Forward Secrecy (PFS) over non-
   PFS cipher suites and SHOULD favor Authenticated Encryption with Associated Data (AEAD) over non-AEAD cipher suites.
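      
   The preference rule above can be sketched, non-normatively, as an
   ordering over candidate cipher suites.  The Python below classifies
   suites by substrings of their names, which is an assumption made
   only for illustration; an implementation would normally consult the
   properties exposed by its TLS library instead.  All function names
   are hypothetical.

```python
def is_pfs(suite: str) -> bool:
    """Heuristic: ephemeral Diffie-Hellman key exchange implies PFS."""
    return "_ECDHE_" in suite or "_DHE_" in suite


def is_aead(suite: str) -> bool:
    """Heuristic: GCM, CCM, and ChaCha20-Poly1305 are AEAD modes."""
    return any(tag in suite for tag in ("_GCM_", "_CCM", "CHACHA20_POLY1305"))


def order_by_preference(suites):
    """Sort PFS suites first; within each group, AEAD suites first."""
    return sorted(suites, key=lambda s: (not is_pfs(s), not is_aead(s)))
```

   Because PFS is the primary sort key, a PFS suite without AEAD is
   still ranked ahead of a non-PFS AEAD suite, reflecting the MUST
   versus SHOULD strength of the two preferences.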
      
   NULL Protection profiles MUST NOT be used for RTP or RTCP.
      
   CLUE endpoints MUST generate short-term persistent RTCP CNAMEs, as
   specified in , so that the CNAMEs can't be used for long-term
   tracking of the users.
    
    
      IANA Considerations
      
   This document defines a new extension URI in the "RTP SDES Compact
   Header Extensions" subregistry of the "Real-Time Transport Protocol
   (RTP) Parameters" registry, according to the following data:
      
         Extension URI:  urn:ietf:params:rtp-hdrext:sdes:CaptId
         Description:    CLUE CaptId
         Contact:        <ron.even.tlv@gmail.com>
         Reference:      RFC 8849
      
       The IANA has registered one new RTCP SDES item in the
"RTCP SDES Item Types" registry, as follows:
      
        
          
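         +-------+--------+-------------+-----------+
         | Value | Abbrev | Name        | Reference |
         +-------+--------+-------------+-----------+
         | 14    | CCID   | CLUE CaptId | RFC 8849  |
         +-------+--------+-------------+-----------+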
          
        
      
    
    
      Security Considerations
      
   The security considerations of the RTP specification, the RTP/SAVPF
   profile, and the various RTP/RTCP extensions and RTP payload formats
   that form the complete protocol suite described in this memo apply.
   It is believed that there are no new security considerations
   resulting from the combination of these various protocol extensions.
      
   The "Extended Secure RTP Profile for Real-time Transport Control
   Protocol (RTCP)-Based Feedback (RTP/SAVPF)" document  provides
   the handling of fundamental issues by offering confidentiality, integrity,
   and partial source authentication.  A mandatory-to-implement-and-use
   media security solution is created by combining this secured RTP
   profile and DTLS-SRTP keying  as defined in the
   communication security section of this memo ().
      
      
   RTCP packets convey a CNAME identifier that is used
   to associate RTP packet streams that need to be synchronized across
   related RTP sessions.  Inappropriate choice of CNAME values can be a
   privacy concern, since long-term persistent CNAME identifiers can be
   used to track users across multiple calls.  The communication
   security section of this memo () mandates the generation of short-
   term persistent RTCP CNAMEs, as specified in , so they can't
   be used for long-term tracking of the users.
      
   Some potential denial-of-service attacks exist if the RTCP reporting
   interval is configured to an inappropriate value.  This could be done
   by configuring the RTCP bandwidth fraction to an excessively large or
   small value using the SDP "b=RR:" or "b=RS:" lines , or some
   similar mechanism, or by choosing an excessively large or small value
   for the RTP/AVPF minimal receiver report interval (if using SDP, this
   is the "a=rtcp-fb:... trr-int" parameter) . The risks are as
   follows:
      
        The RTCP bandwidth could be configured to make the regular
       reporting interval so large that effective congestion control
       cannot be maintained, potentially leading to denial of service
       due to congestion caused by the media traffic;
        The RTCP interval could be configured to a very small value,
       causing endpoints to generate high-rate RTCP traffic, which potentially
       leads to denial of service due to the non-congestion-controlled
       RTCP traffic; and
        RTCP parameters could be configured differently for each
       endpoint, with some of the endpoints using a large reporting
       interval and some using a smaller interval, leading to denial of
       service due to premature participant timeouts, which are due to mismatched
       timeout periods that are based on the reporting interval (this
       is a particular concern if endpoints use a small but non-zero
       value for the RTP/AVPF minimal receiver report interval (trr-int)
       , as discussed in ).
      
      
   Premature participant timeout can be avoided by using the fixed (non-
   reduced) minimum interval when calculating the participant timeout
   .  To address the other
   concerns, endpoints SHOULD ignore parameters that configure the RTCP
   reporting interval to be significantly longer than the default five-second
   interval specified in  (unless the media data rate is
   so low that the longer reporting interval roughly corresponds to 5%
   of the media data rate) or that configure the RTCP reporting
   interval small enough that the RTCP bandwidth would exceed the media
   bandwidth.
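      
   One way an endpoint might apply this guidance is sketched below in
   Python.  The sketch is non-normative and rests on stated assumptions:
   this document does not quantify "significantly longer" or "roughly
   corresponds to 5%", so the long_factor and share_floor parameters
   are hypothetical tuning values, as is the function name.

```python
DEFAULT_RTCP_INTERVAL_S = 5.0  # default reporting interval, in seconds


def should_ignore_rtcp_config(interval_s: float, rtcp_bps: float,
                              media_bps: float, long_factor: float = 2.0,
                              share_floor: float = 0.04) -> bool:
    """Return True if the signaled RTCP parameters should be ignored."""
    # Interval so small that RTCP would exceed the media bandwidth.
    if rtcp_bps > media_bps:
        return True
    # Interval much longer than the default is tolerated only when the
    # RTCP share of the media rate is still roughly 5%.
    too_long = interval_s > long_factor * DEFAULT_RTCP_INTERVAL_S
    low_share = rtcp_bps < share_floor * media_bps
    return too_long and low_share
```

   An endpoint that ignores the signaled values would fall back to its
   default RTCP timing rather than reject the session.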
      
   The guidelines in  apply when using variable bit rate (VBR)
   audio codecs such as Opus.
      
   Encryption of the header extensions is RECOMMENDED,
   unless there are known reasons, like RTP middleboxes performing voice-activity-based
   source selection or third-party monitoring that will
   greatly benefit from the information, and this has been expressed
   using API or signaling.  If further evidence is produced to show
   that information leakage is significant from audio level indications,
   then the use of encryption needs to be mandated at that time.
      
   In multi-party communication scenarios using RTP middleboxes,
   the middleboxes are REQUIRED, by this protocol, to not weaken the
   sessions' security.  The middlebox SHOULD maintain
   confidentiality, maintain integrity, and perform source authentication.  The
   middlebox MAY perform checks that prevent any endpoint participating
   in a conference from impersonating another.  Some additional security
   considerations regarding multi-party topologies can be found in
   .
      
   The CaptureID is created as part of the CLUE protocol.  The CaptId
   SDES item is used to convey the same CaptureID value in the SDES
   item.  When sending the SDES item, the security considerations
   specified in  and in the
   communication security section of this memo (see ) are applicable.
   Note that since the CaptureID is also carried in CLUE protocol
   messages, it is RECOMMENDED that this SDES item use at least similar
   protection profiles as the CLUE protocol messages carried in the CLUE
   data channel.
    
  
  
    
      References
      
        Normative References
        
          
            Key words for use in RFCs to Indicate Requirement Levels
            
              
            
            
            
              In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.
            
          
          
          
          
        
        
          
            The Secure Real-time Transport Protocol (SRTP)
            
              
            
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              This document describes the Secure Real-time Transport Protocol (SRTP), a profile of the Real-time Transport Protocol (RTP), which can provide confidentiality, message authentication, and replay protection to the RTP traffic and to the control traffic for RTP, the Real-time Transport Control Protocol (RTCP).   [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Framework for Establishing a Secure Real-time Transport Protocol (SRTP) Security Context Using Datagram Transport Layer Security (DTLS)
            
              
            
            
              
            
            
              
            
            
            
              This document specifies how to use the Session Initiation Protocol (SIP) to establish a Secure Real-time Transport Protocol (SRTP) security context using the Datagram Transport Layer Security (DTLS) protocol.  It describes a mechanism of transporting a fingerprint attribute in the Session Description Protocol (SDP) that identifies the key that will be presented during the DTLS handshake.  The key exchange travels along the media path as opposed to the signaling path.  The SIP Identity mechanism can be used to protect the integrity of the fingerprint attribute from modification by intermediate proxies.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Datagram Transport Layer Security (DTLS) Extension to Establish Keys for the Secure Real-time Transport Protocol (SRTP)
            
              
            
            
              
            
            
            
              This document describes a Datagram Transport Layer Security (DTLS) extension to establish keys for Secure RTP (SRTP) and Secure RTP Control Protocol (SRTCP) flows.  DTLS keying happens on the media path, independent of any out-of-band signalling channel present. [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Datagram Transport Layer Security Version 1.2
            
              
            
            
              
            
            
            
              This document specifies version 1.2 of the Datagram Transport Layer Security (DTLS) protocol.  The DTLS protocol provides communications privacy for datagram protocols.  The protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery.  The DTLS protocol is based on the Transport Layer Security (TLS) protocol and provides equivalent security guarantees.  Datagram semantics of the underlying transport are preserved by the DTLS protocol.  This document updates DTLS 1.0 to work with TLS version 1.2.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Encryption of Header Extensions in the Secure Real-time Transport Protocol (SRTP)
            
              
            
            
            
              The Secure Real-time Transport Protocol (SRTP) provides authentication, but not encryption, of the headers of Real-time Transport Protocol (RTP) packets.  However, RTP header extensions may carry sensitive information for which participants in multimedia sessions want confidentiality.  This document provides a mechanism, extending the mechanisms of SRTP, to selectively encrypt RTP header extensions in SRTP.
              This document updates RFC 3711, the Secure Real-time Transport Protocol specification, to require that all future SRTP encryption transforms specify how RTP header extensions are to be encrypted.
            
          
          
          
        
        
          
            RTP Header Extension for the RTP Control Protocol (RTCP) Source Description Items
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              Source Description (SDES) items are normally transported in the RTP Control Protocol (RTCP).  In some cases, it can be beneficial to speed up the delivery of these items.  The main case is when a new synchronization source (SSRC) joins an RTP session and the receivers need this source's identity, relation to other sources, or its synchronization context, all of which may be fully or partially identified using SDES items.  To enable this optimization, this document specifies a new RTP header extension that can carry SDES items.
            
          
          
          
        
        
          
            Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words
            
              
            
            
            
              RFC 2119 specifies common key words that may be used in protocol  specifications.  This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the  defined special meanings.
            
          
          
          
          
        
        
          
            Negotiating Media Multiplexing Using the Session Description Protocol (SDP)
            
              
            
            
              
            
            
              
            
            
          
          
          
        
        
          
            Framework for Telepresence Multi-Streams
            
              
            
            
              
            
            
              
            
            
          
          
          
        
        
          
            An XML Schema for the Controlling Multiple Streams for Telepresence (CLUE) Data Model
            
              
            
            
              
            
            
          
          
          
        
      
      
        Informative References
        
          
            Digital Signature Standard (DSS)
            
            
              National Institute of Standards and Technology (NIST)
            
            
          
          FIPS, PUB 186-4
        
        
          
            An Offer/Answer Model with Session Description Protocol (SDP)
            
              
            
            
              
            
            
            
              This document defines a mechanism by which two entities can make use of the Session Description Protocol (SDP) to arrive at a common view of a multimedia session between them.  In the model, one participant offers the other a description of the desired session from their perspective, and the other participant answers with the desired session from their perspective.  This offer/answer model is most useful in unicast sessions where information from both participants is needed for the complete view of the session.  The offer/answer model is used by protocols like the Session Initiation Protocol (SIP).  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            RTP: A Transport Protocol for Real-Time Applications
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              This memorandum describes RTP, the real-time transport protocol.  RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services.  RTP does not address resource reservation and does not guarantee quality-of- service for real-time services.  The data transport is augmented by a control protocol (RTCP) to allow monitoring of the data delivery in a manner scalable to large multicast networks, and to provide minimal control and identification functionality.  RTP and RTCP are designed to be independent of the underlying transport and network layers.  The protocol supports the use of RTP-level translators and mixers. Most of the text in this memorandum is identical to RFC 1889 which it obsoletes.  There are no changes in the packet formats on the wire, only changes to the rules and algorithms governing how the protocol is used. The biggest change is an enhancement to the scalable timer algorithm for calculating when to send RTCP packets in order to minimize transmission in excess of the intended rate when many participants join a session simultaneously.  [STANDARDS-TRACK]
            
          
          
          
          
        
        
          
            Session Description Protocol (SDP) Bandwidth Modifiers for RTP Control Protocol (RTCP) Bandwidth
            
              
            
            
            
              This document defines an extension to the Session Description Protocol (SDP) to specify two additional modifiers for the bandwidth attribute. These modifiers may be used to specify the bandwidth allowed for RTP Control Protocol (RTCP) packets in a Real-time Transport Protocol (RTP) session.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            SDP: Session Description Protocol
            
              
            
            
              
            
            
              
            
            
            
              This memo defines the Session Description Protocol (SDP).  SDP is intended for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of multimedia session initiation.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            A Session Initiation Protocol (SIP) Event Package for Conference State
            
              
            
            
              
            
            
              
            
            
            
              This document defines a conference event package for tightly coupled conferences using the Session Initiation Protocol (SIP) events framework, along with a data format used in notifications for this package.  The conference package allows users to subscribe to a conference Uniform Resource Identifier (URI).  Notifications are sent about changes in the membership of this conference and optionally about changes in the state of additional conference components.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Extended RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/AVPF)
            
              
            
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              Real-time media streams that use RTP are, to some degree, resilient against packet losses.  Receivers may use the base mechanisms of the Real-time Transport Control Protocol (RTCP) to report packet reception statistics and thus allow a sender to adapt its transmission behavior in the mid-term.  This is the sole means for feedback and feedback-based error repair (besides a few codec-specific mechanisms).  This document defines an extension to the Audio-visual Profile (AVP) that enables receivers to provide, statistically, more immediate feedback to the senders and thus allows for short-term adaptation and efficient feedback-based repair mechanisms to be implemented.  This early feedback profile (AVPF) maintains the AVP bandwidth constraints for RTCP and preserves scalability to large groups.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            The Session Description Protocol (SDP) Content Attribute
            
              
            
            
              
            
            
            
              This document defines a new Session Description Protocol (SDP) media-level attribute, 'content'.  The 'content' attribute defines the content of the media stream to a more detailed level than the media description line.  The sender of an SDP session description can attach the 'content' attribute to one or more media streams.  The receiving application can then treat each media stream differently (e.g., show it on a big or small screen) based on its content.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Extended Secure RTP Profile for Real-time Transport Control Protocol (RTCP)-Based Feedback (RTP/SAVPF)
            
              
            
            
              
            
            
            
              An RTP profile (SAVP) for secure real-time communications and another profile (AVPF) to provide timely feedback from the receivers to a sender are defined in RFC 3711 and RFC 4585, respectively.  This memo specifies the combination of both profiles to enable secure RTP communications with feedback.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Support for Reduced-Size Real-Time Transport Control Protocol (RTCP): Opportunities and Consequences
            
              
            
            
              
            
            
            
              This memo discusses benefits and issues that arise when allowing Real-time Transport Control Protocol (RTCP) packets to be transmitted with reduced size.  The size can be reduced if the rules on how to create compound packets outlined in RFC 3550 are removed or changed.  Based on that analysis, this memo defines certain changes to the rules to allow feedback messages to be sent as Reduced-Size RTCP packets under certain conditions when using the RTP/AVPF (Real-time Transport Protocol / Audio-Visual Profile with Feedback) profile (RFC 4585). This document updates RFC 3550, RFC 3711, and RFC 4585.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Guidelines for the Use of Variable Bit Rate Audio with Secure RTP
            
              
            
            
              
            
            
            
              This memo discusses potential security issues that arise when using variable bit rate (VBR) audio with the secure RTP profile.  Guidelines to mitigate these issues are suggested.  [STANDARDS-TRACK]
            
          
          
          
        
        
          
            Guidelines for Choosing RTP Control Protocol (RTCP) Canonical Names (CNAMEs)
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              The RTP Control Protocol (RTCP) Canonical Name (CNAME) is a persistent transport-level identifier for an RTP endpoint.  While the Synchronization Source (SSRC) identifier of an RTP endpoint may change if a collision is detected or when the RTP application is restarted, its RTCP CNAME is meant to stay unchanged, so that RTP endpoints can be uniquely identified and associated with their RTP media streams.
              For proper functionality, RTCP CNAMEs should be unique within the participants of an RTP session.  However, the existing guidelines for choosing the RTCP CNAME provided in the RTP standard (RFC 3550) are insufficient to achieve this uniqueness.  RFC 6222 was published to update those guidelines to allow endpoints to choose unique RTCP CNAMEs.  Unfortunately, later investigations showed that some parts of the new algorithms were unnecessarily complicated and/or ineffective.  This document addresses these concerns and replaces RFC 6222.
            
          
          
          
        
        
          
            Options for Securing RTP Sessions
            
              
            
            
              
            
            
            
              The Real-time Transport Protocol (RTP) is used in a large number of different application domains and environments.  This heterogeneity implies that different security mechanisms are needed to provide services such as confidentiality, integrity, and source authentication of RTP and RTP Control Protocol (RTCP) packets suitable for the various environments.  The range of solutions makes it difficult for RTP-based application developers to pick the most suitable mechanism.  This document provides an overview of a number of security solutions for RTP and gives guidance for developers on how to choose the appropriate security mechanism.
            
          
          
          
        
        
          
            Securing the RTP Framework: Why RTP Does Not Mandate a Single Media Security Solution
            
              
            
            
              
            
            
            
              This memo discusses the problem of securing real-time multimedia sessions.  It also explains why the Real-time Transport Protocol (RTP) and the associated RTP Control Protocol (RTCP) do not mandate a single media security mechanism.  This is relevant for designers and reviewers of future RTP extensions to ensure that appropriate security mechanisms are mandated and that any such mechanisms are specified in a manner that conforms with the RTP architecture.
            
          
          
          
        
        
          
            Use Cases for Telepresence Multistreams
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              Telepresence conferencing systems seek to create an environment that gives users (or user groups) that are not co-located a feeling of co-located presence through multimedia communication that includes at least audio and video signals of high fidelity.  A number of techniques for handling audio and video streams are used to create this experience.  When these techniques are not similar, interoperability between different systems is difficult at best, and often not possible.  Conveying information about the relationships between multiple streams of media would enable senders and receivers to make choices to allow telepresence systems to interwork.  This memo describes the most typical and important use cases for sending multiple streams in a telepresence conference.
            
          
          
          
        
        
          
            RTP Topologies
            
              
            
            
              
            
            
            
              This document discusses point-to-point and multi-endpoint topologies used in environments based on the Real-time Transport Protocol (RTP). In particular, centralized topologies commonly employed in the video conferencing industry are mapped to the RTP terminology.
            
          
          
          
        
        
          
            Sending Multiple RTP Streams in a Single RTP Session
            
              
            
            
              
            
            
              
            
            
              
            
            
            
              This memo expands and clarifies the behavior of Real-time Transport Protocol (RTP) endpoints that use multiple synchronization sources (SSRCs).  This occurs, for example, when an endpoint sends multiple RTP streams in a single RTP session.  This memo updates RFC 3550 with regard to handling multiple SSRCs per endpoint in RTP sessions, with a particular focus on RTP Control Protocol (RTCP) behavior.  It also updates RFC 4585 to change and clarify the calculation of the timeout of SSRCs and the inclusion of feedback messages.
            
          
          
          
        
        
          
            A General Mechanism for RTP Header Extensions
            
              
            
            
              
            
            
              
            
            
            
              This document provides a general mechanism to use the header extension feature of RTP (the Real-time Transport Protocol).  It provides the option to use a small number of small extensions in each RTP packet, where the universe of possible extensions is large and registration is decentralized.  The actual extensions in use in a session are signaled in the setup information for that session.  This document obsoletes RFC 5285.
            
          
          
          
        
        
          
            Session Signaling for Controlling Multiple Streams for Telepresence (CLUE)
            
              
            
            
              
            
            
              
            
            
              
            
            
          
          
          
        
      
    
    
      Acknowledgments
      
   The authors would like to thank  and
    for
   contributing text to this work.   helped draft
   the security section.
    
    
      Authors' Addresses
      
        
        
          
            
            Tel Aviv
            
            Israel
          
          ron.even.tlv@gmail.com
        

      
      
        8x8, Inc. / Jitsi
        
          
            
            Jersey City
            NJ
            07302
            United States of America
          
          jonathan.lennox@8x8.com