teas                                                            Q. Xiong
Internet-Draft                                           ZTE Corporation
Intended status: Standards Track                             K. Kompella
Expires: 31 December 2026                                            HPE
                                                                 D. King
                                                    Lancaster University
                                                            29 June 2026


                  HPC/AI Scheduler Job Metadata Model
              draft-xkk-teas-hpc-scheduler-job-metadata-00

Abstract

   This document defines a scheduler-facing metadata model for High
   Performance Computing (HPC) and AI workloads.  The model captures
   common job, workload, scheduler, tenant, timing, and task metadata
   that can be mapped from heterogeneous workload managers and
   orchestration platforms and used as context for network service
   intent.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 31 December 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents (https://trustee.ietf.org/
   license-info) in effect on the date of publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components


Xiong, et al.           Expires 31 December 2026                [Page 1]

Internet-Draft        HPC/AI scheduler job metadata            June 2026


   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
     2.1.  Requirements Language
   3.  Terminology
   4.  Model Scope
   5.  Model Structure
   6.  Relationship to Other Models
   7.  YANG Data Model
   8.  Security Considerations
   9.  IANA Considerations
   10. Acknowledgements
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Example
   Authors' Addresses

1.  Introduction

   HPC and AI workloads are commonly managed by workload managers and
   orchestration systems such as batch schedulers, Kubernetes-based
   training systems, workflow engines, and higher-level AI platforms.
   These systems maintain metadata about jobs, tasks, users, tenants,
   timing, resource requests, and workload structure.

   Examples of such systems include HPC workload managers such as Slurm,
   PBS Pro/OpenPBS, IBM Spectrum LSF, and Grid Engine-style schedulers,
   as well as AI and machine learning orchestration platforms based on
   Kubernetes, Kubeflow, Ray, Volcano, Kueue, Red Hat OpenShift AI,
   NVIDIA Base Command Manager, and NVIDIA Run:ai.  These examples are
   illustrative; the model is intended to be independent of any specific
   scheduler or orchestration platform.

   The requirements reflected in this model are derived from the types
   of information commonly exposed by such workload schedulers and AI
   orchestration platforms, including workload identity, job structure,
   task or role information, timing, placement context, tenant or
   project context, and correlation identifiers.  The intent is to carry
   the network-relevant subset of this information without requiring the
   network domain to adopt the native data model of any one scheduler.


   The representation of this metadata is platform-specific.  For
   example, an HPC scheduler may identify jobs using scheduler-local job
   identifiers and queues, while a Kubernetes-based AI platform may use
   namespaces, custom resources, pod sets, and workload admission
   objects.  A common metadata model allows the network-relevant
   portions of these platform-specific job descriptions to be
   represented in a consistent form.
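
   As a non-normative illustration, the following Python sketch maps
   two platform-specific job records into the common form.  The input
   field names (for example "JobId", "Partition", and "queueName") and
   the helper functions are hypothetical; they are assumptions of this
   sketch and are not taken from this document or from any scheduler
   API.  Only the output leaf names follow the model in Section 5.

```python
# Non-normative sketch.  The input field names are hypothetical and
# are not taken from any real scheduler API.


def from_batch_record(record: dict) -> dict:
    """Map a hypothetical HPC batch-scheduler record to common leaves."""
    return {
        "scheduler": {"scheduler-type": "slurm"},
        "submitter": {
            "user-id": record["UserId"],
            "account-id": record["Account"],
        },
        "workload": {"queue": record["Partition"]},
        # Scheduler-local numeric identifiers are carried as strings.
        "job": {
            "job-id": str(record["JobId"]),
            "job-name": record["JobName"],
        },
    }


def from_kubernetes_object(obj: dict) -> dict:
    """Map a hypothetical Kubernetes-style object to common leaves."""
    meta = obj["metadata"]
    return {
        "scheduler": {"scheduler-type": "kubernetes"},
        "submitter": {"namespace": meta["namespace"]},
        "workload": {"queue": obj["spec"]["queueName"]},
        "job": {"job-id": meta["uid"], "job-name": meta["name"]},
    }


batch_job = from_batch_record(
    {"JobId": 4211, "JobName": "cfd-run", "UserId": "alice",
     "Account": "physics", "Partition": "gpu"}
)
k8s_job = from_kubernetes_object(
    {"metadata": {"uid": "7f3c", "name": "llm-train",
                  "namespace": "ml-training"},
     "spec": {"queueName": "gpu-high-priority"}}
)

# Both results use the same top-level containers.
print(sorted(batch_job) == sorted(k8s_job))
```

   In this sketch, a consumer of the output needs to understand only
   the common leaf names, not the native record layout of either
   platform.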

   The broader High Performance Wide Area Network (HP-WAN) context and
   current deployment considerations are described in
   [I-D.kcrh-hpwan-state-of-art] and [I-D.xhy-hpwan-framework].  This
   document focuses on the scheduler and job metadata needed to relate
   workload context to that network environment.

   Related work on machine learning cluster scheduling, including
   [I-D.kompella-rtgwg-mlnwsched], illustrates that job timing,
   placement, and resource context can be relevant beyond the compute
   scheduler itself.  This document provides a platform-neutral way to
   carry scheduler and job metadata that can be used for correlation
   with network service intent.

   This document defines a YANG model for scheduler and job metadata.
   It does not define the requested network service itself and does not
   define how that service is realized in the network.  The metadata
   defined here is intended to be used by a service intent model that
   expresses the desired connectivity outcome for the workload.

2.  Conventions Used in This Document

2.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in BCP
   14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  Terminology

   This document defines common terminology used by the HPC/AI scheduler
   job metadata model, the HPC/AI service intent model, and the HPC/AI
   tunnel realization model.

   Workload:  A unit of work submitted to, or managed by, a workload
      manager or orchestration platform.  A workload can be an HPC batch
      workload, an AI training workload, an inference workload, a data
      movement workflow, or another scheduled application-level
      activity.


   Job:  A scheduler-visible execution object associated with a
      workload.  A job is identified by the originating scheduler or
      orchestration platform and can contain one or more tasks, roles,
      replicas, or execution units.

   Task:  A component of a job that represents a schedulable or
      executable part of the workload.  Examples include an HPC task, an
      MPI rank group, a training worker, a parameter-server role, or a
      workflow stage.

   Scheduler:  A workload manager or orchestration system that creates,
      admits, places, or manages workloads and jobs.  Examples include
      HPC batch schedulers and Kubernetes-based AI orchestration
      systems.

   Scheduler Job Metadata:  Platform-neutral context describing the
      originating scheduler, submitter, workload, job, task structure,
      and timing information.  Scheduler job metadata identifies and
      describes the workload but does not request network connectivity.

   Service Intent:  A request for a network service associated with a
      workload or job.  Service intent describes the desired
      connectivity outcome, including endpoints, communication pattern,
      timing, data movement, performance objectives, policy preferences,
      and admission state.  It does not prescribe the network mechanism
      used to realize the service.

   Tunnel Realization:  The network-side realization of an admitted
      service intent.  A tunnel realization can reference tunnels,
      paths, policy, protection, resource allocation, lifecycle state,
      and performance monitoring associated with the service intent.

   Correlation Identifier:  An identifier used to associate scheduler
      job metadata, service intent, and tunnel realization state across
      systems that may use different native identifiers.

4.  Model Scope

   The scheduler job metadata model provides workload context that can
   be consumed by a network service intent system.  It includes
   identifiers and descriptive attributes that allow a network
   controller, orchestrator, or broker to correlate a network service
   request with the originating workload manager and job.

   The model is intended to be independent of a specific workload
   manager.  Platform-specific identifiers are carried as metadata and
   do not imply that the network controller understands the internal
   scheduling behavior of the originating platform.


   This model is intended to provide a stable boundary between workload
   scheduling systems and IETF-defined interfaces used by data center
   and inter-data-center network orchestration systems.

5.  Model Structure

         module: ietf-hpc-scheduler-job-metadata
           +--rw hpc-scheduler-job-metadata
              +--rw scheduler
              |  +--rw scheduler-id?            string
              |  +--rw scheduler-name?          string
              |  +--rw scheduler-type?          identityref
              |  +--rw platform-instance?       string
              +--rw submitter
              |  +--rw tenant-id?               string
              |  +--rw project-id?              string
              |  +--rw namespace?               string
              |  +--rw user-id?                 string
              |  +--rw account-id?              string
              +--rw workload
              |  +--rw workload-id?             string
              |  +--rw workload-name?           string
              |  +--rw workload-type?           identityref
              |  +--rw framework?               identityref
              |  +--rw priority?                priority-type
              |  +--rw queue?                   string
              |  +--rw correlation-id?          string
              +--rw job
              |  +--rw job-id?                  string
              |  +--rw job-name?                string
              |  +--rw job-array-id?            string
              |  +--rw job-size?                uint32
              |  +--rw task* [task-id]
              |     +--rw task-id               string
              |     +--rw task-name?            string
              |     +--rw task-role?            identityref
              |     +--rw task-index?           uint32
              +--rw timing
                 +--rw submit-time?             yang:date-and-time
                 +--rw earliest-start-time?     yang:date-and-time
                 +--rw requested-start-time?    yang:date-and-time
                 +--rw deadline?                yang:date-and-time
                 +--rw requested-duration?      uint32
                 +--rw duration-unit?           identityref

                 Figure 1: Scheduler job metadata model structure

6.  Relationship to Other Models

   The scheduler job metadata, service intent, and tunnel realization
   models are related hierarchically:

   *  Scheduler job metadata, defined in this document, identifies and
      describes the workload.

   *  A service intent, as defined in draft-xkk-teas-hpc-service-intent,
      identifies the network service requested for that workload.

   *  A tunnel realization, as defined in
      draft-xkk-teas-hpc-tunnel-realization, identifies the network
      resources used to realize an admitted service intent.

         .----------------------------.
         | Scheduler/Job Metadata     |
         | workload-id, job-id,       |
         | task-id, correlation-id    |
         '-------------+--------------'
                       |
                       | referenced by
                       v
         .-------------+--------------.
         | Service Intent             |
         | intent-id, workload-ref,   |
         | endpoints, objectives      |
         '-------------+--------------'
                       |
                       | admitted and realized by
                       v
         .-------------+--------------.
         | Tunnel Realization         |
         | realization-id, intent-ref,|
         | tunnel/path references     |
         '----------------------------'

       Figure 2: Relationship between metadata, intent, and realization

   A workload or job can have zero or more service intent instances.  A
   service intent instance can have zero or more tunnel realization
   instances.  A tunnel realization instance is associated with one
   service intent instance, although the underlying network service may
   use one or more tunnels, paths, or technology-specific constructs.

   The scheduler job metadata model provides context for a separate
   service intent request.  A service intent instance can refer to the
   metadata instance using a workload identifier, job identifier, or
   correlation identifier.  This separation allows multiple service
   intent requests to be associated with a single workload, and allows
   one service intent request to be updated or replaced without changing
   the scheduler metadata.
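
   The reference relationship described above can be sketched in
   Python as follows.  This is a non-normative illustration: the
   class names and the "job_ref" and "correlation_ref" fields are
   hypothetical, and the actual reference leaves are a matter for the
   service intent model, not this document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Metadata:
    workload_id: str
    job_id: str
    correlation_id: str


@dataclass
class ServiceIntent:
    intent_id: str
    # Hypothetical reference fields; a real intent model may differ.
    workload_ref: Optional[str] = None
    job_ref: Optional[str] = None
    correlation_ref: Optional[str] = None


def refers_to(intent: ServiceIntent, md: Metadata) -> bool:
    """True if any reference set on the intent matches the metadata."""
    return (
        intent.workload_ref == md.workload_id
        or intent.job_ref == md.job_id
        or intent.correlation_ref == md.correlation_id
    )


def intents_for(md: Metadata, intents: List[ServiceIntent]) -> List[str]:
    """Return the identifiers of all intents that refer to md."""
    return [i.intent_id for i in intents if refers_to(i, md)]


md = Metadata("distributed-training-001", "job-2026-04-23-001",
              "corr-ai-training-001")
intents = [
    ServiceIntent("intent-a", workload_ref="distributed-training-001"),
    ServiceIntent("intent-b", correlation_ref="corr-ai-training-001"),
    ServiceIntent("intent-c", job_ref="some-other-job"),
]

# Two intents refer to the same workload; the third does not.
print(intents_for(md, intents))
```

   Because the metadata instance holds no pointer to any intent, an
   intent can be added, replaced, or removed in this sketch without
   modifying the metadata instance.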

7.  YANG Data Model

   The YANG data model is as follows:


module ietf-hpc-scheduler-job-metadata {
  yang-version 1.1;
  namespace "urn:ietf:params:xml:ns:yang:ietf-hpc-scheduler-job-metadata";
  prefix hpc-sched;

  import ietf-yang-types {
    prefix yang;
    reference
      "RFC 6991: Common YANG Data Types";
  }

  organization
    "IETF Traffic Engineering Architecture and Signaling (TEAS)
     Working Group";
  contact
    "WG Web:   <https://datatracker.ietf.org/wg/teas/>
     WG List:  <mailto:teas@ietf.org>

     Editor:   Quan Xiong
               <mailto:xiong.quan@zte.com.cn>

     Editor:   Kireeti Kompella
               <mailto:kireeti.ietf@gmail.com>

     Editor:   Daniel King
               <mailto:d.king@lancaster.ac.uk>";

  description
    "This module defines a scheduler-facing metadata model for
     High Performance Computing (HPC) and AI workloads. The model
     captures common job, workload, scheduler, tenant, timing, and
     task metadata that can be mapped from heterogeneous workload
     managers and orchestration platforms.

     Copyright (c) 2026 IETF Trust and the persons identified as
     authors of the code. All rights reserved.

     Redistribution and use in source and binary forms, with or
     without modification, is permitted pursuant to, and subject
     to the license terms contained in, the Revised BSD License
     set forth in Section 4.c of the IETF Trust's Legal Provisions
     Relating to IETF Documents
     (https://trustee.ietf.org/license-info).

     This version of this YANG module is part of RFC XXXX; see
     the RFC itself for full legal notices.";

  revision 2026-04-23 {
    description
      "Initial version of the HPC/AI scheduler job metadata model.";
    reference
      "RFC XXXX: HPC/AI Scheduler Job Metadata Model";
  }

  /*
   * Identity definitions
   */
  identity scheduler-type {
    description
      "Base identity for scheduler types.";
  }

  identity slurm {
    base scheduler-type;
    description
      "Slurm workload manager.";
  }

  identity pbs {
    base scheduler-type;
    description
      "PBS Pro/OpenPBS workload manager.";
  }

  identity lsf {
    base scheduler-type;
    description
      "IBM Spectrum LSF workload manager.";
  }

  identity kubernetes {
    base scheduler-type;
    description
      "Kubernetes-based orchestration platform.";
  }

  identity kubeflow {
    base scheduler-type;
    description
      "Kubeflow AI orchestration platform.";
  }

  identity workload-type {
    description
      "Base identity for workload types.";
  }

  identity hpc-batch {
    base workload-type;
    description
      "HPC batch workload.";
  }

  identity ai-training {
    base workload-type;
    description
      "AI training workload.";
  }

  identity ai-inference {
    base workload-type;
    description
      "AI inference workload.";
  }

  identity data-movement {
    base workload-type;
    description
      "Data movement workload.";
  }

  identity framework {
    description
      "Base identity for workload frameworks.";
  }

  identity mpi {
    base framework;
    description
      "Message Passing Interface (MPI) framework.";
  }

  identity tensorflow {
    base framework;
    description
      "TensorFlow machine learning framework.";
  }

  identity pytorch {
    base framework;
    description
      "PyTorch machine learning framework.";
  }

  identity task-role {
    description
      "Base identity for task roles.";
  }

  identity worker {
    base task-role;
    description
      "Worker role in distributed computation.";
  }

  identity parameter-server {
    base task-role;
    description
      "Parameter server role in distributed training.";
  }

  identity master {
    base task-role;
    description
      "Master/coordinator role.";
  }

  identity duration-unit {
    description
      "Base identity for duration units.";
  }

  identity seconds {
    base duration-unit;
    description
      "Duration in seconds.";
  }

  identity minutes {
    base duration-unit;
    description
      "Duration in minutes.";
  }


  identity hours {
    base duration-unit;
    description
      "Duration in hours.";
  }

  /*
   * Typedefs
   */
  typedef priority-type {
    type uint32 {
      range "0..1000";
    }
    description
      "Priority value in the range 0 to 1000.  Higher values
       indicate higher priority.";
  }

  /*
   * Groupings
   */
  grouping scheduler-grouping {
    description
      "Scheduler identification and metadata.";
    leaf scheduler-id {
      type string;
      description
        "Unique identifier for the scheduler instance.";
    }
    leaf scheduler-name {
      type string;
      description
        "Human-readable name of the scheduler.";
    }
    leaf scheduler-type {
      type identityref {
        base scheduler-type;
      }
      description
        "Type of scheduler or orchestration platform.";
    }
    leaf platform-instance {
      type string;
      description
        "Platform-specific instance identifier or version.";
    }
  }

  grouping submitter-grouping {
    description
      "Submitter and tenant context.";
    leaf tenant-id {
      type string;
      description
        "Tenant identifier for multi-tenant environments.";
    }
    leaf project-id {
      type string;
      description
        "Project identifier within the tenant.";
    }
    leaf namespace {
      type string;
      description
        "Namespace identifier (e.g., Kubernetes namespace).";
    }
    leaf user-id {
      type string;
      description
        "User identifier who submitted the workload.";
    }
    leaf account-id {
      type string;
      description
        "Accounting or billing account identifier.";
    }
  }

  grouping workload-grouping {
    description
      "Workload identification and metadata.";
    leaf workload-id {
      type string;
      description
        "Unique identifier for the workload.";
    }
    leaf workload-name {
      type string;
      description
        "Human-readable name of the workload.";
    }
    leaf workload-type {
      type identityref {
        base workload-type;
      }
      description
        "Type of workload.";
    }
    leaf framework {
      type identityref {
        base framework;
      }
      description
        "Computational framework used by the workload.";
    }
    leaf priority {
      type priority-type;
      description
        "Priority of the workload.";
    }
    leaf queue {
      type string;
      description
        "Queue or partition where the workload is submitted.";
    }
    leaf correlation-id {
      type string;
      description
        "Correlation identifier for cross-system tracing.";
    }
  }

  grouping task-grouping {
    description
      "Task-level metadata.";
    leaf task-id {
      type string;
      mandatory true;
      description
        "Unique identifier for the task within the job.";
    }
    leaf task-name {
      type string;
      description
        "Human-readable name of the task.";
    }
    leaf task-role {
      type identityref {
        base task-role;
      }
      description
        "Functional role of the task in the workload.";
    }
    leaf task-index {
      type uint32;
      description
        "Index or sequence number of the task.";
    }
  }

  grouping job-grouping {
    description
      "Job structure and task information.";
    leaf job-id {
      type string;
      description
        "Scheduler-specific job identifier.";
    }
    leaf job-name {
      type string;
      description
        "Human-readable job name.";
    }
    leaf job-array-id {
      type string;
      description
        "Job array identifier for array jobs.";
    }
    leaf job-size {
      type uint32;
      description
        "Total number of tasks or execution units in the job.";
    }
    list task {
      key "task-id";
      description
        "List of tasks comprising the job.";
      uses task-grouping;
    }
  }

  grouping timing-grouping {
    description
      "Timing and scheduling information.";
    leaf submit-time {
      type yang:date-and-time;
      description
        "Time when the workload was submitted to the scheduler.";
    }
    leaf earliest-start-time {
      type yang:date-and-time;
      description
        "Earliest time when the workload can start.";
    }
    leaf requested-start-time {
      type yang:date-and-time;
      description
        "Requested start time for the workload.";
    }
    leaf deadline {
      type yang:date-and-time;
      description
        "Deadline by which the workload should complete.";
    }
    leaf requested-duration {
      type uint32;
      description
        "Requested duration for the workload execution, expressed
         in the unit given by the 'duration-unit' leaf.";
    }
    leaf duration-unit {
      type identityref {
        base duration-unit;
      }
      description
        "Unit for the requested duration.";
    }
  }

  /*
   * Top-level container
   */
  container hpc-scheduler-job-metadata {
    description
      "Top-level container for HPC/AI scheduler job metadata.";

    container scheduler {
      description
        "Scheduler identification and metadata.";
      uses scheduler-grouping;
    }

    container submitter {
      description
        "Submitter and tenant context.";
      uses submitter-grouping;
    }

    container workload {
      description
        "Workload identification and metadata.";
      uses workload-grouping;
    }

    container job {
      description
        "Job structure and task information.";
      uses job-grouping;
    }

    container timing {
      description
        "Timing and scheduling information.";
      uses timing-grouping;
    }
  }
}
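
   As a non-normative illustration, the following Python sketch
   checks a JSON-decoded instance against a few constraints of the
   module above: the range of priority-type, the uniqueness of the
   task list key, and the identities defined in this module.  The
   function names are hypothetical, and the sketch is not a
   substitute for a YANG validator; in particular, it treats
   identities that another module might derive from these bases as
   unknown.

```python
MODULE = "ietf-hpc-scheduler-job-metadata"
SCHEDULER_TYPES = {"slurm", "pbs", "lsf", "kubernetes", "kubeflow"}
TASK_ROLES = {"worker", "parameter-server", "master"}
PRIORITY_MIN, PRIORITY_MAX = 0, 1000  # range of priority-type


def local_name(identity: str) -> str:
    """Strip this module's name from a qualified identityref value."""
    prefix = MODULE + ":"
    if identity.startswith(prefix):
        return identity[len(prefix):]
    return identity


def check_instance(md: dict) -> list:
    """Return a list of problems found; an empty list means none."""
    problems = []

    sched = md.get("scheduler", {}).get("scheduler-type")
    if sched is not None and local_name(sched) not in SCHEDULER_TYPES:
        problems.append(f"unknown scheduler-type: {sched}")

    # Assumes priority, when present, has been decoded as an integer.
    priority = md.get("workload", {}).get("priority")
    if priority is not None and not (
        PRIORITY_MIN <= priority <= PRIORITY_MAX
    ):
        problems.append(f"priority out of range: {priority}")

    seen = set()
    for task in md.get("job", {}).get("task", []):
        task_id = task.get("task-id")
        if task_id is None:
            problems.append("task entry without task-id")
        elif task_id in seen:
            problems.append(f"duplicate task-id: {task_id}")
        else:
            seen.add(task_id)
        role = task.get("task-role")
        if role is not None and local_name(role) not in TASK_ROLES:
            problems.append(f"unknown task-role: {role}")

    return problems


good = {
    "scheduler": {"scheduler-type": "kubernetes"},
    "workload": {"priority": 100},
    "job": {"task": [
        {"task-id": "worker-1", "task-role": "worker"},
        {"task-id": "worker-2", "task-role": "worker"},
    ]},
}
bad = {
    "workload": {"priority": 5000},
    "job": {"task": [{"task-id": "w"}, {"task-id": "w"}]},
}

print(check_instance(good))
print(check_instance(bad))
```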

8.  Security Considerations

   Scheduler and job metadata can reveal user, tenant, project,
   workload, timing, and operational information.  Implementations need
   to protect the confidentiality and integrity of this information and
   restrict access to authorized workload managers, controllers,
   orchestrators, and network management systems.
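
   One way a deployment might limit exposure is to give less-trusted
   consumers a reduced view of the metadata.  The following
   non-normative Python sketch removes selected submitter leaves from
   a copy of an instance before it is exported.  The choice of which
   leaves to treat as sensitive and the function name are assumptions
   of this sketch; this document does not define a redaction policy.

```python
import copy

# Assumption of this sketch: these submitter leaves are withheld.
SENSITIVE_SUBMITTER_LEAVES = ("user-id", "account-id")


def redact_for_export(md: dict) -> dict:
    """Return a copy of md without the sensitive submitter leaves."""
    redacted = copy.deepcopy(md)  # leave the caller's instance intact
    submitter = redacted.get("submitter", {})
    for leaf in SENSITIVE_SUBMITTER_LEAVES:
        submitter.pop(leaf, None)
    return redacted


full = {
    "submitter": {
        "tenant-id": "ai-research-lab",
        "user-id": "researcher-bob",
        "account-id": "project-alpha",
    },
    "workload": {"workload-id": "distributed-training-001"},
}
exported = redact_for_export(full)

print(sorted(exported["submitter"]))
```

   Redaction of this kind complements, and does not replace, access
   control and transport security on the interface that carries the
   metadata.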

9.  IANA Considerations

   IANA is requested to register one URI in the "IETF XML Registry"
   [RFC3688].  Following the format in [RFC3688], the following
   registration is requested:

      URI: urn:ietf:params:xml:ns:yang:ietf-hpc-scheduler-job-metadata

      Registrant Contact: The IESG.

      XML: N/A; the requested URI is an XML namespace.

   IANA is requested to register the following YANG module in the "YANG
   Module Names" registry [RFC6020].

    name: ietf-hpc-scheduler-job-metadata

    namespace: urn:ietf:params:xml:ns:yang:ietf-hpc-scheduler-job-metadata

    prefix: hpc-sched

    reference: RFC XXXX


10.  Acknowledgements

   The authors acknowledge the related HP-WAN framework and problem
   statement work that provides the broader context for this scheduler
   job metadata model.

11.  References

11.1.  Normative References

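   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <https://www.rfc-editor.org/info/rfc2119>.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              DOI 10.17487/RFC3688, January 2004,
              <https://www.rfc-editor.org/info/rfc3688>.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010,
              <https://www.rfc-editor.org/info/rfc6020>.

   [RFC6991]  Schoenwaelder, J., Ed., "Common YANG Data Types",
              RFC 6991, DOI 10.17487/RFC6991, July 2013,
              <https://www.rfc-editor.org/info/rfc6991>.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017, <https://www.rfc-editor.org/info/rfc8174>.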

11.2.  Informative References

   [I-D.kcrh-hpwan-state-of-art]
              King, D., Chown, T., Rapier, C., Huang, D., and K. Yao,
              "Current State of the Art for High Performance Wide Area
              Networks", Work in Progress, Internet-Draft, draft-kcrh-
              hpwan-state-of-art-03, 20 October 2025,
              <https://datatracker.ietf.org/doc/html/draft-kcrh-hpwan-
              state-of-art-03>.

   [I-D.kompella-rtgwg-mlnwsched]
              Kompella, K., Beeram, V. P., Mahale, A., Bhargava, R., and
              N. Geyer, "Scheduling Network Resources for Machine
              Learning Clusters", Work in Progress, Internet-Draft,
              draft-kompella-rtgwg-mlnwsched-02, 1 March 2026,
              <https://datatracker.ietf.org/doc/html/draft-kompella-
              rtgwg-mlnwsched-02>.

   [I-D.xhy-hpwan-framework]
              Xiong, Q., Huang, G., Yao, K., and C. Lin, "Framework for
              High Performance Wide Area Network (HP-WAN)", Work in
              Progress, Internet-Draft, draft-xhy-hpwan-framework-03, 20
              October 2025, <https://datatracker.ietf.org/doc/html/
              draft-xhy-hpwan-framework-03>.


Appendix A.  Example

   This section provides an example of scheduler job metadata for a
   distributed AI training workload.  The example demonstrates how
   platform-specific job information from a Kubernetes-based AI
   orchestration system is mapped to the common metadata model.

   Consider a scenario where a user submits a distributed PyTorch
   training job to a Kubernetes-based AI orchestration platform.  The
   job consists of three worker tasks; this example does not include
   parameter-server tasks.


      {
        "ietf-hpc-scheduler-job-metadata:hpc-scheduler-job-metadata": {
          "scheduler": {
            "scheduler-id": "ai-orchestrator-1",
            "scheduler-name": "AI-Training-Orchestrator",
            "scheduler-type": "kubernetes",
            "platform-instance": "nvidia-base-command-2.0"
          },
          "submitter": {
            "tenant-id": "ai-research-lab",
            "project-id": "distributed-ml-project",
            "namespace": "ml-training",
            "user-id": "researcher-bob",
            "account-id": "project-alpha"
          },
          "workload": {
            "workload-id": "distributed-training-001",
            "workload-name": "large-scale-llm-training",
            "workload-type": "ai-training",
            "framework": "pytorch",
            "priority": 100,
            "queue": "gpu-high-priority",
            "correlation-id": "corr-ai-training-001"
          },
          "job": {
            "job-id": "job-2026-04-23-001",
            "job-name": "llm-13b-distributed",
            "job-size": 3,
            "task": [
              {
                "task-id": "worker-1",
                "task-name": "gpu-worker-west-1",
                "task-role": "worker",
                "task-index": 0
              },
              {
                "task-id": "worker-2",
                "task-name": "gpu-worker-west-2",
                "task-role": "worker",
                "task-index": 1
              },
              {
                "task-id": "worker-3",
                "task-name": "gpu-worker-east-1",
                "task-role": "worker",
                "task-index": 2
              }
            ]
          },
          "timing": {
            "submit-time": "2026-04-23T09:00:00Z",
            "earliest-start-time": "2026-04-23T09:45:00Z",
            "requested-start-time": "2026-04-23T10:00:00Z",
            "deadline": "2026-04-23T12:00:00Z",
            "requested-duration": 120,
            "duration-unit": "minutes"
          }
        }
      }
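
   The timing leaves in this example can be cross-checked
   arithmetically.  The following non-normative Python sketch
   converts requested-duration to seconds and tests whether a run
   that begins at requested-start-time would finish by the deadline.
   That interpretation of the leaves, and the helper names, are
   assumptions of this sketch; this document does not define how the
   timing leaves relate to one another.

```python
from datetime import datetime, timedelta

# Seconds per unit for the duration-unit identities in the module.
UNIT_SECONDS = {"seconds": 1, "minutes": 60, "hours": 3600}


def parse_time(value: str) -> datetime:
    """Parse a timestamp such as 2026-04-23T09:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in
    # fromisoformat(), so it is rewritten as an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def requested_seconds(timing: dict) -> int:
    """Return requested-duration converted to seconds."""
    unit = timing["duration-unit"]
    return timing["requested-duration"] * UNIT_SECONDS[unit]


def fits_before_deadline(timing: dict) -> bool:
    """True if start plus duration does not pass the deadline."""
    start = parse_time(timing["requested-start-time"])
    end = start + timedelta(seconds=requested_seconds(timing))
    return end <= parse_time(timing["deadline"])


timing = {
    "requested-start-time": "2026-04-23T10:00:00Z",
    "deadline": "2026-04-23T12:00:00Z",
    "requested-duration": 120,
    "duration-unit": "minutes",
}

print(requested_seconds(timing))      # 7200
print(fits_before_deadline(timing))   # True
```

   With the example values, 120 minutes from 10:00 UTC ends at 12:00
   UTC, which coincides with the deadline.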

Authors' Addresses

   Quan Xiong
   ZTE Corporation
   Email: xiong.quan@zte.com.cn


   Kireeti Kompella
   HPE
   Email: kireeti.ietf@gmail.com


   Daniel King
   Lancaster University
   Email: d.king@lancaster.ac.uk

