Network Working Group                                      J. Rosenberg
Internet-Draft                                                    Five9
Intended status: Informational                                 P. White
Expires: 22 April 2026                                          Bitwave
                                                        19 October 2025

          Normalized API for AI Agents Calling Tools (N-ACT)
                   draft-rosenberg-aiproto-nact-00

Abstract

   This document defines a protocol that facilitates integration of tools into the design and run-time operations of AI Agents.  The focus is on enterprise AI agents that need to make use of APIs exposed by third-party providers.  This protocol, called the Normalized API for AI Agents Calling Tools (N-ACT) - pronounced like "enact" - defines an OpenAPI specification with two principal features: enumeration of tools and invocation of tools.  The enumeration API enables a human - the AI Agent designer employed by the enterprise - to select and include tools from third-party vendors into operating procedures (also known as skills or instructions) which direct the behavior of AI Agents, including how and when to invoke those tools.  The enumeration API can also be used (optionally) at run time for the LLM to obtain tool descriptions.  The second feature of the API - invocation - allows the AI Agent executor to perform the inter-domain invocation of the tool at run time.  By standardizing these two API functions, the time and cost of integration of Internet APIs into AI Agents can be reduced.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF).  Note that other groups may also distribute working documents as Internet-Drafts.  The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time.  It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 22 April 2026.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document.  Please review these documents carefully, as they describe your rights and restrictions with respect to this document.  Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Background
   2.  Terminology
   3.  Problem Statement
     3.1.  Tool Curation
       3.1.1.  Tool Sculpting
   4.  The N-ACT Protocol Goals
   5.  Overview of Operation
     5.1.  Tool Signatures
     5.2.  Enumeration Endpoint
     5.3.  Invocation Endpoint
     5.4.  Versioning
   6.  Protocol Details
     6.1.  Enumeration API
     6.2.  Details for a Specific Tool
     6.3.  Tool Version Retrieval
     6.4.  Pagination
   7.  Invocation API
     7.1.  OpenAPI Specification
   8.  Relationship to MCP
     8.1.  Similarities
     8.2.  Differences
   9.  Informative References
   Authors' Addresses
1.  Background

   AI Agents are applications built using Large Language Models (LLMs).  AI Agents typically operate on behalf of some human or organization, and are capable of taking actions through the invocation of functions, also known as tools.  [I-D.rosenberg-ai-protocols] provides an overview of protocols and use cases for AI Agents.

   In enterprise applications of AI Agents - including customer support use cases - it is very common for the AI Agent to invoke tools provided by other organizations.  As an example, consider an enterprise that is a healthcare provider.  They have an AI Agent that needs to facilitate appointment scheduling, appointment cancellation, prescription refills, store location and hours lookup, and vaccine scheduling.  These actions are all supported by tools which are exposed to the AI Agent through its instructions.

   The tools, in this case, could be implemented by servers within the enterprise domain - built by their IT department, for example.  Or, they might be implemented by B2B SaaS providers that offer these capabilities to the enterprise.  For example, in the healthcare space, Epic, Cerner, and Meditech are a few of the B2B SaaS providers that are often used by retail healthcare providers.  It is very common for enterprises to have dozens, if not hundreds, of different systems providing APIs that might need to be accessed by an AI Agent.

   This high cardinality is not new.  Prior to the advent of large language model technologies, voice and chat bots - and even classic "press 1 for sales, and 2 for support" Interactive Voice Response (IVR) systems - needed to access large numbers of APIs to provide customer support for end users.  Integration of these APIs into IVRs, voice bots and chat bots often represented the most expensive and complex part of deploying those systems - not the training and optimization of the AI model which invokes them.

   The advent of AI Agents - if anything - only increases the likely cardinality of the number of distinct API systems that need to be integrated.  This is due to the increased capabilities of LLMs compared to prior generation AI models.  Whereas, in the past, certain use cases were just too difficult to automate (for example, multi-room booking for hotels, which is difficult to do on websites and almost never attempted by voice or chat bots), these may now be within reach for automation using AI Agent technology.  As a result, we will see more use cases, and therefore even more APIs that need to be integrated.

   For an AI agent to be effective, it needs more direction than just the enumeration of tools with names and descriptions.
   Particularly in enterprise and customer support use cases, detailed instructions are required which direct when, how, if and why those tools should be invoked.  These instructions guide the AI agent on how to converse with the customer, what kind of information to collect, how to use it to invoke the tools, how to handle the differing responses, and what to do with the data that is returned.

   We use the term operating procedure to describe an instruction that provides this detail to an AI Agent.  An operating procedure is typically specific to a particular task (for example, booking an appointment for a healthcare visit).  An operating procedure, at the end of the day, is a document containing natural language text with references to tools, tool inputs, and tool outputs.  Others have referred to these as skills (see the recent Anthropic blog post at https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills), instructions or even prompts.  Here is an abbreviated example of an operating procedure for prescription refill that illustrates this:

   ## Goal
   Your goal is to help the user with prescription refill inquiries.

   ## Prescription Refill Status
   Follow this flow if the user:
   - is asking about the status of a prescription refill request they've made;
   - is asking when their prescription refill(s) will be ready.

   For example:
   - What's the status of my prescription request?
   - I just want an update regarding my prescriptions.
   - When will my prescription refill be ready?

   ### Prescription Lookup
   - Use look_up_prescriptions() to look up the user's active prescriptions.  You must have previously authenticated the user and have their account ID.  If you don't have their accountID yet, perform the authentication flow.

   ### Present Status
   - Find the prescriptions which have a status other than null and present them to the user along with their status.

   ### Sending Confirmation
   - Ask the user if they want to receive a confirmation when the refill has been sent to the pharmacy.
   - If you have the user's preferred contact method (which you would have gotten earlier from look_up_ani_user_by_ani()), use that information and send them the confirmation of the refill status along with the refill details.

                 Figure 1: Example Operating Procedure

   As you can see from the example, the operating procedure is a detailed recipe book - a training manual of sorts - for the AI Agent.  It describes API sequencing (for example, that the lookup of prescriptions can only be done after authentication), usage of parameters (that accountID must be included from prior authentication), handling of results (show the list to the user), and so on.

   Operating procedures are analogous to training manuals for human agents, which similarly instruct them in the sequencing and handling of a call.  Instead of invoking tools, humans use web applications to do a similar job.  In traditional contact centers, entire teams of people are accountable for creating and maintaining these training manuals.  They are essential for ensuring quality of customer support.  At the end of the day, operating procedures are what make the AI Agent work.
   While their generation may be assisted by an AI to facilitate authoring, operating procedures are crafted by human beings using traditional software.  The person whose role is to create the operating procedure is called the AI Agent designer - or just designer for short.  The software used by the designer to create an AI Agent, including the authoring of the operating procedure, is called the AI Agent platform.  To properly create an operating procedure, the designer must be a domain expert - familiar with the appropriate process for performing the task the AI Agent is to perform.

   Part of the task of creating an AI Agent is the selection of the tools to be used.  Typically, this selection happens through the authoring of the operating procedure, which makes explicit reference to the tool as in the example above.  It is also possible for the designer to select a tool for usage by the AI Agent but not provide any guidance on its usage in the operating procedure (most of the current APIs for AI Agents separate the tool list from the instruction, thus making this possible).  While technically this can work, it often results in higher rates of hallucinated tool usage.  The choice by a designer of the set of tools that are shown to the LLM (we use the verb show as a synonym for inclusion of information into the model's context window) is referred to as tool selection.

   The process of tool selection and authorship of the operating procedures happen ahead of the deployment and actual usage of the AI Agent.  We refer to this as design time, whereas the usage of the AI Agent by end users is called run time.  At run time, a user will begin a session with an AI Agent.  At the beginning of the session, the AI Agent is shown the operating procedure and tools, and begins operation.

   It is becoming increasingly common for AI Agent design systems to make use of multiple distinct AI Agents (called sub-agents) which are each equipped to handle specific tasks.  These are called multi-agent systems.  Once again using our healthcare example, one sub-agent might handle prescription refills, while another handles appointment scheduling and cancellation.  Each sub-agent would have its own distinct set of operating procedures and tools, appropriate for those tasks.  This further reduces the size of the context window at any point in time, reducing hallucination rates.

   There are various techniques for how these sub-agents are strung together.  At the end of the day, it is often done through tool calling, asking the LLM to invoke a tool that transfers the session to a different AI Agent.  This is (interestingly) analogous to live human agents in contact centers, which typically have human agents with differing skills.  When a customer asks for something which is outside of the expertise of the human agent they are talking to, the human agent transfers the call to a different human agent with the right skill.  In essence, that is what is happening in these multi-agent systems.

   The usage of a designer to select only those tools needed for an agent, and the usage of multi-agent systems, provides progressive tool disclosure.  Progressive tool disclosure is a concept wherein an AI Agent is shown only the set of tools it needs at a given time, and as the conversation flows, more tools are shown.
   The tools selected by the designer, and incorporated into the operating procedure, are often implemented by the executor through the invocation of APIs on some server.  The server exposing those APIs is called an API Server.

2.  Terminology

   Terminology is becoming foundationally important in this emerging AI Agent field.  This section summarizes the key terminology and concepts from above.

   *  AI Agent: A software application that makes use of an LLM to perform tasks that require interaction with software systems.  With an AI Agent, the LLM can output text for the user, but it can also request the invocation of tools, which are an interface the LLM can use to access those software systems.  AI Agents are composed of two elements: an LLM and an executor.  The executor is built using traditional software (e.g., Java, Node.js, Python) and interfaces with the LLM.

   *  Tool: An interface representing a software system.  A tool has a name, a description, inputs and outputs.  In an AI Agent system, the LLM is responsible for collecting the inputs and requesting the invocation of the tool.  The executor actually executes the tool, often through the invocation of APIs, and feeds the results back to the LLM.  By convention, tool names use underscores and are written with terminating parentheses, even though they may also take parameters.  For example, when we say "the LLM invokes schedule_appointment()", this means that the LLM has made a decision to invoke the tool named schedule_appointment and has synthesized the needed input parameters to the tool.

   *  Executor: Also called the orchestrator or orchestration engine.  It represents the traditional software component of an AI Agent.  The executor funnels messages to and from the end user into the LLM.  It receives the outputs of the LLM, and if that output is a request to perform a tool call, the executor runs the tool and provides its output back to the LLM.

   *  Show: The process wherein the executor provides input to the LLM, inclusive of tool call results, by adding it to the context window of the LLM.  For example, we say, "the tool call result is shown to the LLM".

   *  Operating Procedure: A document authored by a human, and shown to an LLM during run-time operation, which instructs the AI Agent on the sequence of tools to invoke, how to interact with the user to collect the data needed for tool invocation, what to do with the results of the tool call, how to handle different conditions based on the output of the tools, and how to handle exceptional conditions.  Operating procedures reference tools and their inputs by name.

   *  AI Agent Designer: Synonymous with designer.

   *  Designer: A human being whose job is to create operating procedures and perform tool selection during the design phase of the AI Agent.

   *  Tool Selection: The process of selection of the right set of tools to show to an AI Agent for a particular task.  By selecting only those tools which are needed for a task, the likelihood of tool hallucination is reduced.  Tool selection in the N-ACT framework happens at design time (though the protocol allows this to happen at run time in an interoperable fashion, should that mode be selected by the AI Agent platform vendor).

   *  Design Time: A point in time during which the designer is interacting with the AI Agent platform in the production of an operating procedure and selection of tools.
   *  Enterprise: The administrative entity on whose behalf an AI Agent operates.  Typically the designer is an employee of the enterprise, though this can be outsourced to consulting firms or the AI Agent Platform vendor.

   *  Run Time: A point in time during which an end user is interacting with an AI Agent.

   *  Multi-Agent Systems: An AI Agent whose job is broken down into a smaller set of discrete tasks, each handled by a sub-agent that is equipped (via its operating procedure and tool list) to perform that specific task.

   *  Sub-Agent: An AI Agent that performs a narrow set of tasks.  Multiple Sub-Agents are woven together to form the AI Agent in a Multi-Agent System.

   *  Progressive Tool Disclosure: A technique for reduction of hallucination wherein, as an AI Agent session unfolds, the set of tools shown to the LLM evolves, offering it only the tools it needs to do its job at any point in time.

   *  API Server: A server that exposes APIs which can be invoked via a tool call by the AI Agent.

   *  API Provider: The administrative entity that is offering APIs that are of interest to an AI Agent.  Typically, the API Provider is different from the enterprise, and is also different in turn from the AI Agent Platform vendor.

   *  AI Agent Session: A period of time that begins when the executor shows the LLM the initial operating procedure and tool selection - establishing initial context - and the AI Agent starts interacting with the end user, and ends when that interaction completes.

   *  AI Agent Platform: A piece of software which includes a design-time experience for designers, and a run-time capability for handling AI Agent sessions.  Typically built as a cloud service by the AI Agent Platform vendor.  Though the AI Agent Platform can be provided by the vendor of the LLM itself, these can be distinct vendors.

   *  AI Agent Platform Vendor: The administrative entity that has developed the AI Agent Platform, and made it available for usage by enterprises to author and execute AI Agents.

3.  Problem Statement

   There are multiple problems that are addressed by N-ACT, covering design-time and run-time challenges.

3.1.  Tool Curation

   At design time, the AI Agent platform needs to enable the designer to author the operating procedure and, as a result, select the tools that will be made available to the AI Agent (or the sub-agent).  To do this, the designer needs to know what the available tools are, from which they can select.  In essence, the designer is performing a curation task, choosing amongst a large inventory of tools available from many possible API servers.

   Most often, the enterprise deploying the AI Agent is different from the vendor of the AI Agent platform, which is different still from the vendors of the API servers.  As a result, the designer needs to know the API surface area of the various vendors of API servers, and know which API to select and use.  Sometimes, API Servers offer OpenAPI specifications for their APIs, but these are tailored for software developers, not for consumption by AI Agent designers, and certainly not by LLMs.  Consequently, they lack the required natural language descriptions to aid in the curation process by the designer.

   Designers need to spend time researching and studying APIs, learning what they do, and determining whether the right APIs exist, and if so, what they are.
   They have to determine whether the API can be used as is, or whether changes are needed to make the API more easily (and reliably) consumable by the automation.  This process is historically long and tedious, and is exacerbated by the fact that the APIs are spread across dozens of API servers from the many vendors used by a particular enterprise.  The lengthy time required for API integration is one of the most significant parts of the time required to build automation solutions (including AI Agents) within enterprises.

3.1.1.  Tool Sculpting

   Oftentimes existing APIs on API servers - usually REST these days - are complex and (clearly) not optimized for usage by an LLM.  They have been designed to be consumed by traditional software authored by software developers.  Traditional REST API signatures have complex inputs - URI parameters, header parameters, and JSON bodies.  They also tend to mix programmatic elements and semantic elements.  Programmatic elements are those parts of the API parameters that are meaningful only to the software which is invoking those APIs.  These include resource identifiers like UUIDs, trace IDs, URLs, timestamps, metadata and so on.  Semantic elements are those which are ultimately the input from the end user which drove the execution of the API.  As an example, when doing something as simple as looking up the weather, the city and the date (today or the future) are the semantic elements.  But the lookup API might need to provide programmatic elements like a weather station ID, which is not meaningful to the user request.

   It is also the case that a particular end user request (e.g., give me the weather tomorrow in Manhattan) doesn't map to a single API call.  Instead, a sequence of API calls needs to be orchestrated together to actually provide this simple ask.  As an example, the APIs at the US National Weather Service provide weather forecasting based on 2.5 km by 2.5 km grids.  Consequently, to provide the weather for Manhattan, the location must first be mapped to a coordinate, and then the coordinates fed into the forecast API.

   There are two ways these problems are solved today.  The first is to just let the LLM figure it out.  A naive implementation would, in essence, use the existing OpenAPI specs for an API as a tool, and request the LLM to synthesize inputs, process outputs, and chain API calls together.  While this is possible - and indeed works for simple use cases - it introduces hallucination risk.  It also complicates the design process, requiring the designer to understand the details of these APIs.  The designer must - via the operating procedure - direct the AI Agent on how to use the APIs and navigate them.

   This approach has many problems.  First, it increases the time required to author the operating procedure, since it must now include the manually generated instructions on how to use the APIs.  Second - and most importantly - it increases hallucination risk.  The larger the gap between how the API works and what the semantically meaningful operation is, the larger the surface area for mistakes by the LLM.  The lesson is: never ask an LLM to do what a normal piece of software can do better.

   The second way this is solved today is through the development of automation on top of existing APIs.  Again using the US National Weather Service example, a piece of traditional software can be written which exposes an API called lookup_weather_by_city.  This API would, under the hood, first convert the city to geo-coordinates, and then look up the weather for those coordinates.  The resulting simplified API can then be shown to the LLM as a tool; a non-normative sketch of this approach appears at the end of this section.

   The process of adding a layer of traditional software on top of existing APIs, in order to produce tools that more directly map to user input and thereby reduce tool call hallucination risk, is what we call tool sculpting.  Specifically, tool sculpting is the process of crafting an API which:

   1.  Minimizes the inputs for each tool to only those which are semantically meaningful and must be synthesized by an AI Agent

   2.  Minimizes the outputs for each tool to only those which would be relevant to an AI Agent

   3.  Orchestrates multi-step backend API calls so that the AI Agent doesn't need to do it
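   As an illustration, the following is a minimal, non-normative sketch of what such a sculpted tool might look like when implemented in traditional software.  The geocode_city() helper is hypothetical, and the National Weather Service response fields are simplified; the point is only that the wrapper absorbs the orchestration and programmatic details, exposing a single semantically meaningful tool.

   # Non-normative sketch of tool sculpting: wrapping multi-step weather
   # APIs behind a single, LLM-friendly tool.  geocode_city() is a
   # hypothetical helper; the response fields are simplified.
   import requests

   def geocode_city(city: str) -> tuple[float, float]:
       # Hypothetical: map a city name to (latitude, longitude) using
       # whatever geocoding capability the API provider already has.
       raise NotImplementedError

   def lookup_weather_by_city(city: str) -> dict:
       """Sculpted tool: one semantic input (city), one semantic output."""
       lat, lon = geocode_city(city)

       # Programmatic details hidden from the LLM: resolve the forecast
       # grid for the coordinates, then fetch the forecast for that grid.
       point = requests.get(f"https://api.weather.gov/points/{lat},{lon}",
                            timeout=10).json()
       forecast = requests.get(point["properties"]["forecast"],
                               timeout=10).json()

       # Return only the semantically meaningful output.
       period = forecast["properties"]["periods"][0]
       return {"temperature_fahrenheit": period["temperature"]}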
4.  The N-ACT Protocol Goals

   The N-ACT protocol (N-ACT is pronounced like the word enact) addresses the underlying problems of tool curation and tool minimization.  It has, at its core, the following goals:

   *  Reduce the time and costs for designers to create operating procedures and perform tool selection by simplifying the process of curation

   *  Enable development of AI Agents for enterprises which need access to tools across dozens or hundreds of API servers, typically sold to the enterprise by third-party B2B SaaS vendors, by consolidating decision logic on tool selection in the hands of the designer

   *  Simplify adoption by tool vendors by building on top of the existing API platforms and APIs they already have.  The closer the current API surface area is to the ideal tool set for an AI Agent, the less effort is required to implement

   *  Standardize an API signature for tool calling

   *  Tools are just APIs that are optimized for an LLM, but can also be used by traditional software applications

   *  Authentication and Authorization use the existing techniques used for REST APIs offered by the API Vendor

   *  Reduce hallucination risk through tool sculpting

   *  Separation of responsibility - API vendors perform tool sculpting, and the AI Agent designer crafts the operating procedure which instructs the AI Agent on how to use the many tools (across vendors even) that it can use.  In other words - one vendor makes tools which are independent of the agent, and the other vendor makes agents which make use of those tools.

5.  Overview of Operation

   N-ACT is a web API that defines a pair of REST endpoints - one for enumerating tools available on the server, and the other for invoking a tool on the server.  Both of these make use of a tool signature, which is a standardized interface for what a tool looks like - how to describe a tool, and how to invoke it.  The API client is implemented by the AI Agent platform, and the REST endpoints are implemented by the vendor wishing to expose its APIs as N-ACT tools.  It is envisioned that these just end up being new REST APIs added to the existing API surface area of the API Provider.

5.1.  Tool Signatures

   N-ACT standardizes the signature for tool calling.  A tool can be thought of as a Java interface, and the implementation of the tool happens over the wire by having the executor take the tool call from the LLM and send it to the API server.  Every tool in N-ACT is composed of the following descriptive signature:

   *  The tool ID

   *  The name of the tool

   *  The description of the tool

   *  The set of input parameters.  For each input parameter:

      -  The name of the input parameter, and whether it is required or optional

      -  The type of the input parameter - one of a small set (string, int, boolean, enum)

      -  The description of the input parameter

      -  Any constraints on the input types:

         o  Strings: max length

         o  Int: min and max values

         o  Enum: list of enumerated values that are permitted, along with a description of each value.

   *  The output, of which there can be one or more.  Each output is:

      -  The name of the output

      -  A description of the output

      -  The type of the output (string, int, enum, json)

   Invoking a tool requires the following to be provided:

   *  the name of the tool

   *  an input parameter list, which is an array of name/value pairs

   Consequently, the N-ACT protocol defines two JSON objects - the signature and the invocation.  The N-ACT protocol carries these in its message bodies.  The signature ensures that the executor can take the output of the LLM, validate it against the signature, and execute it by sending the invocation over the wire.  For example, if the LLM synthesizes a tool call request with a missing parameter, or with an enum value that is not amongst the allowed values, the executor can reject it and ask the LLM to try again, or take other error handling action.
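   The following is a minimal, non-normative sketch of how an executor might perform this validation against a cached signature before sending the invocation over the wire.  The field names (input_parameters, required, type, max, allowed-values) are those of the ToolSignature object defined in Section 6.1; the error handling strategy is entirely up to the executor.

   # Non-normative sketch: validating an LLM-synthesized tool call against
   # a cached ToolSignature (field names per Section 6.1) before invoking
   # the tool over the wire.
   def validate_tool_call(signature: dict, call: dict) -> list[str]:
       errors = []
       supplied = {p["name"]: p["value"]
                   for p in call.get("input_parameters", [])}

       for param in signature.get("input_parameters", []):
           name = param["name"]
           if name not in supplied:
               if param.get("required", True):   # default is required
                   errors.append(f"missing required parameter: {name}")
               continue

           value = supplied[name]
           ptype = param.get("type", "string")   # default type is string
           if ptype == "int":
               if not isinstance(value, int):
                   errors.append(f"{name} must be an integer")
               elif value > param.get("max", 65535):
                   errors.append(f"{name} exceeds its maximum value")
           elif ptype == "enum":
               allowed = [v["name"] for v in param["allowed-values"]]
               if value not in allowed:
                   errors.append(f"{name} must be one of {allowed}")

       # Reject parameters that the signature does not define.
       known = {p["name"] for p in signature.get("input_parameters", [])}
       errors += [f"unknown parameter: {n}" for n in supplied if n not in known]

       # An empty list means the invocation can be sent to the API server;
       # otherwise the executor can ask the LLM to try again.
       return errors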
5.2.  Enumeration Endpoint

   The enumeration endpoint - as the name implies - returns a list of the tools available.  This API endpoint returns an array; each element in the array is a signature as noted above.  The API supports tagging and versioning, with filtering on those parameters, to facilitate design-time interactions between the AI Agent platform and the API server.

   This API facilitates the following design-time workflow:

   1.  The designer identifies the set of API vendors whose tools the AI Agent should be allowed to use.

   2.  For each such API vendor, the designer configures the AI Agent platform with the URI for the N-ACT tool enumeration endpoint.  This specification suggests a common naming practice of using the root API endpoint followed by "tools" (e.g., https://api.weather.gov/tools) to simplify this when possible.

   3.  The administrator configures the appropriate credentials needed for AuthN and AuthZ (typically an OAuth grant flow) - this is no different from any other endpoint from the API provider.

   4.  The designer accesses the AI Agent platform, and the platform invokes the enumeration API and retrieves the tool signatures for all of the tools exposed on the API server for all of the configured API providers.  This provides a list of tools to the platform.  This list can be viewed, searched, filtered and sorted by the designer.  The UI of the design-time experience can help the designer find and select tools for placement into the operating procedure.

   The above steps can be done just once, and then apply to all AI Agents subsequently built on the platform.  Alternatively, the enumeration APIs can be explored by the AI Agent platform at design time, as the designer works.

   Once the designer has selected the tool, the tool signature can be retrieved by the AI Agent platform, stored, and then provided to the LLM at run time.  With N-ACT, it is also possible to use the enumeration API to retrieve a fresh tool signature at run time, rather than use the version retrieved and cached at design time.  There are pros and cons to both approaches.

   Retrieving and caching the signature at design time ensures consistency.  Designers seeking greater consistency - wherein testing of the agent performed before launch matches run-time behavior - will prefer to cache the descriptions.  Indeed, the usage of caching allows the designer to tune and tweak the natural language text in the signature to improve accuracy, without depending on the API vendor to do so.  This facilitates the separation of concerns built into the N-ACT protocol - that the AI Agent designer can control all aspects of the behavior of the AI Agent, including the authorship of all natural language provided to the AI Agent.  The API vendor provides the tool - which is purely programmatic.  Alternatively, if the AI Agent platform fetches the tool signature each time at run time, the latest-and-greatest can be used.  N-ACT does not technically prohibit this, allowing this mode to be implemented in an interoperable fashion.

   The key to efficacy of tool usage is tool sculpting, a responsibility which sits with the API vendor in the N-ACT protocol.  The API server must distill down its APIs into a set of tools, each of which performs a targeted, focused task.  Tool sculpting has a double benefit.  First, it reduces the surface area of information that the LLM needs to synthesize to invoke the tool, and reduces the amount it needs to comprehend to process its outputs.  Second, it simplifies the job of the designer in selection of the tool, and in crafting the operating procedure to instruct the LLM on when and how to invoke it.

   N-ACT also provides an API, part of the enumeration endpoint, for retrieving the signature for a specific tool.  This is useful for the run-time fetching of the signature.
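   As a non-normative sketch of the design-time portion of this workflow, the following shows how an AI Agent platform might retrieve and cache the full tool inventory from a configured enumeration endpoint.  The response envelope (items plus a paging element) and the pageLimit/pageCursor parameters are those defined in Sections 6.1 and 6.4; the bearer-token authentication is just one example of reusing the API provider's existing credential mechanisms.

   # Non-normative sketch: design-time retrieval and caching of tool
   # signatures from an N-ACT enumeration endpoint.  The endpoint URL and
   # credentials come from platform configuration; pagination fields
   # (items, paging.next, pageCursor, pageLimit) follow Sections 6.1/6.4.
   import requests

   def fetch_all_tool_signatures(platform_root: str,
                                 access_token: str) -> list[dict]:
       signatures = []
       params = {"pageLimit": 50}
       headers = {"Authorization": f"Bearer {access_token}"}
       while True:
           resp = requests.get(f"{platform_root}/tools", params=params,
                               headers=headers, timeout=10)
           resp.raise_for_status()
           body = resp.json()
           signatures.extend(body["items"])
           next_cursor = body.get("paging", {}).get("next")
           if not next_cursor:
               break
           params["pageCursor"] = next_cursor
       return signatures

   # The platform can store these signatures alongside the operating
   # procedure, so that run-time behavior matches what the designer tested.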
5.3.  Invocation Endpoint

   The invocation endpoint is used by the executor at run time.  When the LLM creates a tool request, it provides the tool name and input values.  Using the signature, the executor can validate the tool call request as being valid for the tool, and then use the wire protocol to invoke the tool on the API server.  When the response arrives, it can then be shown to the LLM.

   The N-ACT invocation endpoint allows versioning, so that a specific version of the tool can be used.

5.4.  Versioning

   A common problem historically in the development of voice and chat bots - the precursors to modern AI Agents - is that API vendors sometimes change APIs, breaking operation of the bot.  This problem remains with AI Agents, and N-ACT addresses it by natively adding version support.

   The N-ACT protocol requires versioning for tools.  It mandates that each version of a tool is backwards compatible.  A new version of a tool can only add new optional input parameters, or add new output parameters.  To ensure consistent behavior, N-ACT requires API servers to allow invocation of a tool against a specific version, allowing AI Agent designers to select when to upgrade when new versions become available.
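   As a non-normative illustration of these rules (using the ToolSignature fields defined in Section 6.1, with descriptions and outputs omitted for brevity, and a hypothetical optional "Units" parameter), version 2 of a tool may add an optional input, but may not remove or change the inputs defined by version 1:

   {
     "toolId" : "0479a45d-ad0a-49d4-94db-75edf00d2ca4",
     "version" : 1,
     "currentVersion" : 2,
     "input_parameters" : [
       { "id" : "city", "name" : "City", "type" : "string",
         "required" : true }
     ]
   }

   {
     "toolId" : "0479a45d-ad0a-49d4-94db-75edf00d2ca4",
     "version" : 2,
     "currentVersion" : 2,
     "input_parameters" : [
       { "id" : "city", "name" : "City", "type" : "string",
         "required" : true },
       { "id" : "units", "name" : "Units", "type" : "enum",
         "required" : false,
         "allowed-values" : [
           { "name" : "FAHRENHEIT",
             "description" : "Return temperatures in Fahrenheit." },
           { "name" : "CELSIUS",
             "description" : "Return temperatures in Celsius." }
         ] }
     ]
   }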
6.  Protocol Details

   The following provides a detailed specification of the protocol.

6.1.  Enumeration API

   N-ACT servers MUST implement the enumeration API, which is structured as follows:

   GET {platform-root}/tools

   It is RECOMMENDED that {platform-root} be the same as the API root endpoint that the API vendor already provides.

   The body returned in the result, which supports pagination (described below), includes an element called items, a JSON array that contains the signatures.  Each element of the array is a JSON object of type ToolSignature.  An example of a valid JSON object of this type is:

   {
     "toolId" : "0479a45d-ad0a-49d4-94db-75edf00d2ca4",
     "name" : "lookup_weather_by_city",
     "description" : "Invoke this tool to look up the weather for a given city.",
     "img" : "{optional URL for the tool}",
     "version" : "{versionNum}",
     "currentVersion" : "{versionNum}",
     "tags" : ["system", "retrievals"],
     "input_parameters" : [
       {
         "id" : "city",
         "name" : "City",
         "type" : "string",
         "description" : "The city for the weather lookup.  For example, Boston or Los Angeles.",
         "required" : true
       }
     ],
     "output_parameters" : [
       {
         "id" : "temp-fh",
         "name" : "Temperature in Fahrenheit",
         "type" : "int",
         "description" : "The current temperature in the named city."
       }
     ]
   }

                 Figure 2: ToolSignature Object Example

   The toolId is always a UUID, ensuring tool uniqueness on the server.

   The name MUST be less than 255 characters, and MUST be unique across all other tools on the API server.  Names SHOULD use snake case, and be sufficiently long to be usefully descriptive without the description.  Longer names also reduce the likelihood of tool name collisions across API servers, in cases where an AI Agent is being shown tools from different API servers.  However, it is ultimately the responsibility of the AI Agent platform to make sure tool names shown to the LLM are unique.  An example of a good tool name is:

   lookup_weather_by_city

   The tool description MUST be less than 2000 characters.

   Tool names and descriptions MUST be in English.  Names and descriptions are consumed by both the designer (a human) and the LLM.  Localization is only needed for the human, and is the responsibility of the AI Agent platform.

   For a particular version of the tool, the signature is completely locked.  Meaning, the API server MUST NOT change any part of the signature (with the notable exception of the value of currentVersion) without changing the version.  The version number of a tool MUST be a positive integer, and MUST increase monotonically, starting at 1.  The value of currentVersion reflects the most current version of the tool.  The version returned at the top-level tool enumeration endpoint MUST be the latest version of the tool, in which case the values of version and currentVersion are always identical.

   Input parameters are typed.  The type value is optional to include in the JSON.  If omitted, it is "string" by default.  The other valid values are "int", "boolean" and "enum".  More complex input types are not supported.  This is to constrain the complexity of objects that the LLM is asked to synthesize.  [OPEN ISSUE: lists maybe??]

   If the type is "int", the optional "max" parameter can be included, which represents the maximum value for the integer.  The default is 65535.

   If the type is "enum", the input parameter must include the attribute "allowed-values", which contains an array of name-description pairs.  As an example:

   {
     "id" : "flight_class",
     "name" : "Flight Class",
     "type" : "enum",
     "description" : "The cabin class for the flight reservation",
     "allowed-values" : [
       {
         "name" : "ECONOMY",
         "description" : "Economy class, the least expensive cabin class.  Also known as coach."
       },
       {
         "name" : "PREMIUM_ECONOMY",
         "description" : "Premium economy class, the second seat tier.  More legroom."
       },
       {
         "name" : "BUSINESS",
         "description" : "Business class, the next to top seat tier.  Offers lie-down seating."
       },
       {
         "name" : "FIRST",
         "description" : "The top tier.  Lie down seating, luxury meal service, lounge access."
       }
     ],
     "required" : true
   }

          Figure 3: ToolSignature Input Parameters Enum Support

   For enumerated strings, names MUST be snake case but capitalized, with a maximum length of 255 characters.  Descriptions have a maximum length of 2000 characters.

   Input parameters have IDs and names.  The IDs are meant to provide uniqueness across versions, facilitating design-time software processing of version changes.  The IDs need only be unique within the tool.  Input parameter names must also be unique within the tool.  The LLM is asked to synthesize the name, not the ID.  Consequently, there is no reason to show the parameter ID to the LLM at run time.

   Output parameters include the additional type "json".  This is used for cases where a more complex structure is returned, and the expectation is that the LLM can parse the structure for comprehension.  There is an explicit assumption in the signatures that it is more important to reduce hallucination on input synthesis than it is to risk mis-comprehension of tool output.  This is why the protocol is asymmetric - allowing JSON output, but not input.

   For input parameters, the "required" field is optional; if absent, the default is true.  Optional inputs should be used sparingly.  It is better to have more tools with fewer inputs than a single tool with many optional inputs.  This reduces the likelihood of the LLM hallucinating tool usage.

6.2.  Details for a Specific Tool

   The API allows for a signature to be retrieved for a specific tool via:

   GET {platform-root}/tools/{toolID}

   The server MUST return the current version of the specific tool.

6.3.  Tool Version Retrieval

   The API allows all versions of a tool to be retrieved via:

   GET {platform-root}/tools/{toolID}/versions

   The result is paginated, and includes the full signature for each version.  These MUST be sorted from most recent to oldest.  An individual version can then be retrieved via:

   GET {platform-root}/tools/{toolID}/versions/{versionNum}

6.4.  Pagination

   Pagination is supported on the enumeration APIs (any of the above endpoints).  All of the APIs support pageCursor and pageLimit URI parameters.  The pageLimit parameter specifies the maximum number of items to return in a single page.

   The array holding the enumeration is included in the JSON body using an attribute called items which contains the array.  A peer element, called paging, holds the pagination state for the query.  It contains two values: pageLimit - containing the actual page limit the server is offering - and next.  The next parameter contains the value that the client should provide in the pageCursor URI parameter to retrieve the next page.
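   As a non-normative illustration (the tool entries are abbreviated and the cursor value is an opaque, hypothetical token), a paginated enumeration exchange might look like this:

   GET {platform-root}/tools?pageLimit=2

   {
     "items" : [
       { "toolId" : "...", "name" : "lookup_weather_by_city", ... },
       { "toolId" : "...", "name" : "lookup_weather_alerts_by_city", ... }
     ],
     "paging" : {
       "pageLimit" : 2,
       "next" : "b2Zmc2V0PTI"
     }
   }

   GET {platform-root}/tools?pageLimit=2&pageCursor=b2Zmc2V0PTI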
7.  Invocation API

   At run time, a tool is invoked with the invocation API:

   POST {platform-root}/tools/{toolID}:invoke

   The body of this request takes the invocation object, an example of which is:

   {
     "name" : "lookup_weather_by_city",
     "input_parameters" : [
       {
         "name" : "City",
         "value" : "Omaha, Nebraska"
       }
     ]
   }

   The response should contain the result of the invocation, an example of which looks like this:

   {
     "output_parameters" : [
       {
         "name" : "Temperature in Fahrenheit",
         "value" : 80
       }
     ]
   }

   The output MUST include a name which matches the output name in the signature.

   It is also possible to request invocation of a specific version of the tool:

   POST {platform-root}/tools/{toolID}/versions/{versionNum}:invoke

   Traditional HTTP response codes apply to the response, including 5xx for temporary failures and 4xx if the request is formatted incorrectly.  Note that a 4xx should never happen if the client follows the specification defined here.  If the request generates a 5xx, the executor SHOULD retry, rather than show the error to the LLM.  However, it is at the discretion of the AI Agent designer how long to wait before showing the LLM the error and asking it what to do.

7.1.  OpenAPI Specification

   To be added when the specification is closer to complete.

8.  Relationship to MCP

   N-ACT is similar in many ways to the Model Context Protocol (MCP) (https://modelcontextprotocol.io/introduction), which is being driven by Anthropic, but different in others.  This section covers both similarities and differences.  On the whole, the two protocols tackle a similar problem (facilitating tool calling and serving tool context to LLMs), but differ in their approach (MCP is better suited to use cases where less designer control is appropriate, and N-ACT to cases where more designer control is appropriate).

8.1.  Similarities

   Both MCP and N-ACT support APIs for enumerating tools and invoking tools.

   Both MCP and N-ACT can be used to feed context to the LLM at run time.

8.2.  Differences

   The biggest difference is the assumption about control.  N-ACT is very much targeted at use cases where there is a designer persona crafting an operating procedure.  Consequently, the tool enumeration API is consumed by traditional software in N-ACT, enabling the designer to perform the process of tool selection and design of the operating procedure, both at design time.  In MCP, the tool enumeration API is primarily consumed by the AI Agent at run time.  For this reason, MCP has the idea of progressive tool disclosure that happens on the server side, in order to reduce model context.  N-ACT, on the other hand, assumes progressive tool disclosure happens as a consequence of a design-time operation, and thus, in simple terms, is client side.

   In MCP, the logic for progressive tool disclosure is distributed in cases where there are multiple MCP servers, with each server executing its own, non-standardized logic for progressive tool disclosure (though the protocol does not require this behavior, it is purportedly one of the main reasons for the session-oriented aspect of the protocol).  In N-ACT, the logic for progressive tool disclosure lives in the AI Agent, with the logic being crafted at design time by the designer.  In this way, N-ACT is more compatible with Anthropic's skills concept than it is with its MCP protocol.
   Indeed, N-ACT could be viewed as the offspring of skills and MCP, adding over-the-wire tool calling to skills.

   These foundational differences have consequences in the other parts of the protocols.

   MCP is session-oriented.  A session-oriented protocol is required when the server is responsible for progressive tool disclosure.  N-ACT is stateless, because it is not performing server-side progressive tool disclosure.

   MCP uses JSON-RPC and runs over either stdio (for local usage on a PC) or streamable HTTP for network connections.  N-ACT is a normal REST API and uses traditional HTTPS and OpenAPI specifications.

   MCP typically requires a dedicated server that provides its functionality (termination of the persistent connection, session handling, and gateway to existing APIs).  N-ACT is designed to be added to existing API servers as just another API - albeit one optimized for LLMs.

   MCP APIs are not easily consumable by non-LLM applications.  N-ACT APIs are just APIs and can be consumed by both traditional software and AI Agents.

9.  Informative References

   [I-D.rosenberg-ai-protocols]
              Rosenberg, J. and C. F. Jennings, "Framework, Use Cases
              and Requirements for AI Agent Protocols", Work in
              Progress, Internet-Draft, draft-rosenberg-ai-protocols-00,
              5 May 2025.

Authors' Addresses

   Jonathan Rosenberg
   Five9

   Email: jdrosen@jdrosen.net

   Pat White
   Bitwave

   Email: pat.white@traego.com