add explainer for the declarative api #26

MiguelsPizza · 2025-09-17T03:24:36Z

This draft proposal outlines a declarative WebMCP API that enables web pages to expose tools via HTML, using minimal attributes like tool-name and standard form semantics. Currently, it's a compilation of my notes and ideas from developing this approach, and I'm sharing it to gather feedback before the September 18th working group meeting. I'm particularly interested in your thoughts on the open questions (e.g., JSON vs. HTML responses, elicitation flows), tradeoffs, and overall API design.

The proposed API was shaped by building a real application and polyfill during the MCP enterprise hackathon, where our team successfully implemented it (and took home the win, which was exciting validation!). You can see a video of a Rails app using declarative WebMCP tools to enable complex browser automation without client-side JavaScript: link.

Based on your feedback, I'll refine this draft to align more closely with the structured format and narrative style of other explainers in the repo.

bwalderman · 2025-10-08T19:50:27Z

This is great work. I do have one general question. Was reusing ARIA attributes instead of introducing new tool-* attributes considered?

There are already attributes such as aria-label and aria-description and others for labelling and describing elements and so it might be helpful to define WebMCP mappings/behaviors for these instead of introducing entirely new HTML attributes.

One benefit of using these existing attributes is that they are also surfaced in native accessibility APIs, so assistive tools that already use these APIs to access the page's accessibility tree would be able to access WebMCP tools declarations as well.

MiguelsPizza · 2025-10-15T16:13:55Z

@bwalderman This is a good idea. I'll put the PR in draft while I re-implement the ARIA-based polyfill.

The only thing I can think of is that we still need a way to make exposing tools to the agent opt-in (or opt-out).

Maybe we still tag elements with a tool-name to expose them to the agent? This would also help prevent duplicate tool names, which cause errors in most inference providers.
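
A minimal sketch of the duplicate-name check mentioned above, assuming the tagged elements have already been collected into plain records. The `DeclaredTool` shape and the `findDuplicateToolNames` helper are hypothetical names used for illustration; only the `tool-name` attribute comes from the proposal.

```typescript
// Hypothetical record for one tagged element; only `tool-name` is from the proposal.
interface DeclaredTool {
  toolName: string; // value of the element's tool-name attribute
  source: string;   // free-form locator used for diagnostics, e.g. "form#search"
}

// Group declarations by name and return only the names declared more than once.
function findDuplicateToolNames(tools: DeclaredTool[]): Map<string, string[]> {
  const byName = new Map<string, string[]>();
  tools.forEach((tool) => {
    const sources = byName.get(tool.toolName) ?? [];
    sources.push(tool.source);
    byName.set(tool.toolName, sources);
  });

  const duplicates = new Map<string, string[]>();
  byName.forEach((sources, name) => {
    if (sources.length > 1) duplicates.set(name, sources);
  });
  return duplicates;
}
```

A polyfill could run a check like this before exposing the catalog and either warn the author or drop the conflicting entries; which of the two is appropriate is left open here.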

vsakaria · 2025-11-07T21:55:29Z

The concern with the HTML method is, of course, the iframe. Realistically speaking, it has stood up well for some years now. Would an iframe in a browser be more trustworthy? I would prefer JSON and rendering on the client. I am sure web components can be distributed with framework payloads and CSS, but the build process for this type of architecture would have to change. I would prefer that design.

The trade-off, really, is that the payload would need to be distributed more frequently.

anssiko · 2025-11-11T03:59:12Z

@matatk to review for the accessibility group's perspective (aka APA WG).

anssiko · 2025-11-25T13:55:45Z

A new paper and implementation experience:

https://arxiv.org/abs/2511.11287v1
https://svenschultze.github.io/VOIX/

@svenschultze & team, this W3C community group is developing a WebMCP API that is complemented with a declarative mechanism explored in this PR.

Let’s join forces to explore this space. Here’s how to join:
https://webmachinelearning.github.io/community/#join

anssiko · 2025-11-25T14:46:13Z

That was fast. I’m excited to welcome @svenschultze to the WebML Community Group! 🎉

svenschultze · 2025-11-25T16:10:08Z

Hi @anssiko, thank you for making me aware of this project! It is great to see the community converging on this. I'm happy to share some insights from our work on VOIX, where we implemented a similar declarative framework and tested it with developers.

We established a more explicit interface where MCP tools are separated from standard UI HTML elements. This ensures the agent only accesses data and actions the developer specifically intended to share. I think this is also relevant for the discussion about including ARIA attributes: it is important not to just reuse ARIA, since this could lead to conflicts of interest between optimizing for accessibility and optimizing for agents.

Is there an equivalent idea for declarative context/resources in this spec? We found that it was really helpful to explicitly set agent-only text elements (in our case, specific <context name="mouse_position"> elements). This avoids long context inputs of the full HTML text, hides potentially sensitive data like credit card numbers, and enables high-fidelity synergetic multimodal interaction where UI hover/selection states can be explicitly exposed to the agent. This way, you can interact with websites using commands like "move this to here" without requiring long chains of tool calls.

bwalderman · 2025-12-03T23:09:07Z

Following up on my earlier suggestion to look at ARIA: using ARIA attributes for browsing agents was discussed at TPAC and, from what I understand, the consensus is that this is not actually a good idea. One of the concerns raised was the same one that @svenschultze mentioned above: reusing ARIA could lead to conflicts of interest and possibly incentivize web developers to optimize for machines instead of for people using a11y tools.

domfarolino · 2025-12-11T18:49:24Z

docs/declarative.md

+- To preserve correctness and avoid race conditions, the browser MUST enforce this ordering for a single tool invocation:
+  1) Execute the tool and fully evaluate its body (JS `execute` or form submission/response parsing).
+  2) Deliver the tool result to the client (agent) over MCP.
+  3) Recompute the catalog (scan DOM + JS registrations), compute deltas, and if changed, emit [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) to all connected clients for this server.


The user agent wouldn't have a step here to actively recompute the catalogue, right? I would assume that any side effects that result from executing the last tool (i.e., adding new elements to the DOM that have a tool-name attribute, or invoking script which registers a new tool imperatively) would just keep the catalogue up-to-date always, right?

Apologies if I'm missing how MCP's notifications/tools/list_changed fits in here.
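
For reference, the ordering in the quoted draft text could be sketched roughly as below. This is only an illustration of steps 1 to 3 as written, not a statement of how a user agent would implement them: the `InvocationHooks` interface and every hook name are hypothetical, and the sketch is synchronous for brevity even though real tool execution would be asynchronous.

```typescript
// Hypothetical hooks; the explainer does not define this interface.
interface InvocationHooks<R> {
  execute: () => R;                   // run the tool body (JS execute or form submission)
  deliverResult: (result: R) => void; // send the tool result to the agent over MCP
  recomputeCatalog: () => boolean;    // rescan DOM + JS registrations; true if anything changed
  notifyListChanged: () => void;      // emit notifications/tools/list_changed
}

// Runs one tool invocation in the order the draft text requires.
function invokeInOrder<R>(hooks: InvocationHooks<R>): R {
  const result = hooks.execute();   // 1) execute and fully evaluate the tool
  hooks.deliverResult(result);      // 2) deliver the result before anything else
  if (hooks.recomputeCatalog()) {   // 3) recompute, and notify only if the catalog changed
    hooks.notifyListChanged();
  }
  return result;
}
```

If the catalog were instead kept up to date incrementally by DOM mutations and registrations, as the question above assumes, `recomputeCatalog` would reduce to reading a "dirty" flag rather than rescanning.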

domfarolino · 2025-12-11T18:50:39Z

docs/declarative.md

+  1) Execute the tool and fully evaluate its body (JS `execute` or form submission/response parsing).
+  2) Deliver the tool result to the client (agent) over MCP.
+  3) Recompute the catalog (scan DOM + JS registrations), compute deltas, and if changed, emit [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) to all connected clients for this server.
+  4) Apply any navigation/unmounts/redirects (e.g., those implied by `_meta.uiRedirect` or HTTP redirects).


Is this the only thing that can specify an action that the browser should take after the response is obtained? Having a full list of actions that could be done (navigations, displaying more UI / appending things to the DOM) would be great—I'm not sure what MCP specifies is possible outside of _meta.uiRedirect.

domfarolino · 2025-12-11T18:52:19Z

docs/declarative.md

+  2) Deliver the tool result to the client (agent) over MCP.
+  3) Recompute the catalog (scan DOM + JS registrations), compute deltas, and if changed, emit [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) to all connected clients for this server.
+  4) Apply any navigation/unmounts/redirects (e.g., those implied by `_meta.uiRedirect` or HTTP redirects).
+- In particular, the tool that is currently executing MUST NOT be unmounted (e.g., DOM removal or JS unregistration) until after its result has been delivered to the client and any [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) notification has been sent.


I'm not super clear about this part. Imagine the user submits a tool_name= form, but before the agent receives the response, some random script comes in and calls form.remove(), removing it from the DOM. Is this line saying that we shouldn't let that DOM API work? That's not really feasible. What is the intent here?

domfarolino · 2025-12-11T18:53:58Z

docs/declarative.md

+
+Notification shape
+- Method: [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed)
+- Use when: any change to the catalog (addition, removal, or metadata/schema change) is observed after a recomputation.


I think this makes sense to me. Basically this would mean the browser sends a list_changed notification to the ... agent? ... whenever a declarative tool is appended to or removed from the DOM, or whenever the tool-name attribute is added to an existing <form>, <a>, or <button>, right?
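
One way to picture "any change to the catalog (addition, removal, or metadata/schema change)" is as a diff between two snapshots. The sketch below is an assumption-laden illustration: the `ToolEntry` and `CatalogDelta` shapes and the `diffCatalog` and `shouldNotify` helpers are hypothetical, and metadata changes are detected with a simple `JSON.stringify` comparison, which is sensitive to key order.

```typescript
// Hypothetical snapshot entry for one tool in the catalog.
interface ToolEntry {
  name: string;
  description: string;
  inputSchema: unknown;
}

interface CatalogDelta {
  added: string[];
  removed: string[];
  changed: string[];
}

// Compare two catalog snapshots by tool name.
function diffCatalog(prev: ToolEntry[], next: ToolEntry[]): CatalogDelta {
  const prevByName = new Map<string, ToolEntry>();
  prev.forEach((tool) => prevByName.set(tool.name, tool));

  const nextNames = new Set<string>();
  const delta: CatalogDelta = { added: [], removed: [], changed: [] };

  next.forEach((tool) => {
    nextNames.add(tool.name);
    const old = prevByName.get(tool.name);
    if (old === undefined) {
      delta.added.push(tool.name);
    } else if (JSON.stringify(old) !== JSON.stringify(tool)) {
      // Key-order-sensitive comparison; adequate for a sketch only.
      delta.changed.push(tool.name);
    }
  });

  prev.forEach((tool) => {
    if (!nextNames.has(tool.name)) delta.removed.push(tool.name);
  });
  return delta;
}

// A notification is warranted when any of the three lists is non-empty.
function shouldNotify(delta: CatalogDelta): boolean {
  return delta.added.length + delta.removed.length + delta.changed.length > 0;
}
```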

domfarolino · 2025-12-11T18:54:45Z

docs/declarative.md

+```
+interface ToolListChangedNotification {
+  method: "notifications/tools/list_changed";
+  params?: { _meta?: { [key: string]: unknown }; [key: string]: unknown };


Just so I'm clear, is this the shape of the notification that's sent to the agent? (I'm still cutting my teeth on the whole MCP spec, so bear with me if this is a dumb question...)
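
For orientation, MCP messages are JSON-RPC 2.0 messages, and a notification is a message with a `method` and no `id`. A minimal sketch of what the quoted interface would look like once wrapped for the wire is below; the `WireToolListChangedNotification` type and both helper functions are hypothetical names, and the `params` shape is copied from the quoted diff.

```typescript
// Quoted shape plus the JSON-RPC envelope field; the type name is hypothetical.
interface WireToolListChangedNotification {
  jsonrpc: "2.0";
  method: "notifications/tools/list_changed";
  params?: { _meta?: { [key: string]: unknown }; [key: string]: unknown };
}

// Build the notification with no params, which the quoted interface allows.
function makeToolListChangedNotification(): WireToolListChangedNotification {
  return { jsonrpc: "2.0", method: "notifications/tools/list_changed" };
}

// Narrow an incoming message on the receiving (agent) side.
function isToolListChangedNotification(
  message: unknown
): message is WireToolListChangedNotification {
  if (typeof message !== "object" || message === null) return false;
  const candidate = message as { jsonrpc?: unknown; method?: unknown; id?: unknown };
  return (
    candidate.jsonrpc === "2.0" &&
    candidate.method === "notifications/tools/list_changed" &&
    candidate.id === undefined // notifications carry no id
  );
}
```

The notification carries no tool data itself; a client that receives it is expected to request the tool list again.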

domfarolino · 2025-12-11T18:57:14Z

docs/declarative.md

+  P-->>B: result
+else Declarative form
+  B->>S: fetch(action, method, body/query) [Accept: JSON]
+  S-->>B: JSON result (optional Location)


What is Location? Is this the meta UI redirect?

domfarolino · 2025-12-11T18:58:41Z

docs/declarative.md

+  S-->>B: JSON result (optional Location)
+end
+B-->>A: Return CallToolResult
+B->>P: Recompute catalog (DOM + JS)


I think this step is what I was commenting on earlier. I'm not totally sure why we'd need to recompute the list of available tools at this stage. I'd assumed this would be automatically kept up to date by operations that add/remove declarative tools (or imperative, JS-based tools) on its own. Is it possible that the CallToolResult JSON specifies new tools to be added?

domfarolino · 2025-12-11T19:04:57Z

docs/declarative.md

+- Slightly delays visible navigation until result delivery.
+- Page authors can’t forcibly interrupt an in-flight call (must cancel cooperatively).
+
+Strategy B — Navigate-early (allow unmount during execution)


I want to make sure I'm clear on what "Navigate-early" means here—what is the navigation referring to? Are you referring to the action attribute's URL for the form? Or the navigation that the agent would tell the browser to do, once it receives the meta UI redirect? I think you're referring to the former.

On that assumption, I'm not sure what Strategy B really looks like. Given what you say below in "Matches default web navigation behavior", it sounds like submitting a form would trigger a navigation to the form's action attribute URL. So what does the agent get in return? Does it also fetch the Accept: JSON request alongside the navigation, race the two requests, and sometimes the agent receives the JSON before the navigation completes, and other times the navigation completes first and the agent receives nothing? Is that "Strategy B"?

domfarolino · 2025-12-11T19:06:41Z

docs/declarative.md

+- Visibility: Ignore `tool-elicit` on non-interactive controls such as `input[type=hidden]`, `disabled`, or `readonly` controls. Authors should instead expose an interactive control if they want user input.
+- Merging: Final submitted values come from the elicitation UI for elicited fields (prefilled by agent/defaults); all other fields submit resolved values from agent/defaults/DOM as usual.
+- Validation: Standard HTML constraints gate submission. `required` still applies; `tool-elicit` doesn’t change schema.
+- UX: The browser chooses the UI, but SHOULD display each elicited field’s label or `tool-param-title` and (optionally) `tool-param-description`.


So the UI resource used for the elicitation of user input is entirely in the control of the browser, right? The agent does not provide any HTML/CSS resources to display to the user to collect such information?
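
The "Visibility" and "Merging" bullets quoted above can be read as two small rules, sketched below under stated assumptions. The `FieldState` record and both function names are hypothetical; the only behavior taken from the draft is that `tool-elicit` is ignored on hidden, disabled, or readonly controls, and that elicited fields take their final value from the elicitation UI while all other fields keep their resolved values.

```typescript
// Hypothetical flattened view of one form control.
interface FieldState {
  name: string;
  type: string;        // input type, e.g. "text" or "hidden"
  disabled: boolean;
  readOnly: boolean;
  toolElicit: boolean; // whether the control carries tool-elicit
}

// Visibility rule: tool-elicit only counts on interactive controls.
function isElicitable(field: FieldState): boolean {
  return field.toolElicit && field.type !== "hidden" && !field.disabled && !field.readOnly;
}

// Merging rule: elicited fields take the value from the elicitation UI;
// every other field keeps the value resolved from agent/defaults/DOM.
function mergeSubmission(
  fields: FieldState[],
  resolved: Map<string, string>,
  elicited: Map<string, string>
): Map<string, string> {
  const merged = new Map<string, string>(resolved);
  fields.forEach((field) => {
    const userValue = elicited.get(field.name);
    if (isElicitable(field) && userValue !== undefined) {
      merged.set(field.name, userValue);
    }
  });
  return merged;
}
```

Standard HTML constraint validation would still run on the merged values before submission; that step is not modeled here.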

domfarolino · 2025-12-11T21:34:19Z

Thinking through it a little more, I'm a little concerned about the Accept: application/json part of the proposal. Why does the proposal lean on it? It seems to lean on it to increase the likelihood that the agent gets structured JSON back from the action URL endpoint, instead of HTML. But we need to consider the fact that this header does not enforce JSON responses, it's just used for content negotiation. So:

1. Do we really expect server authors to supply JSON/agent-readable responses alongside traditional HTML responses, at the same endpoint? Historically, https://wiki.whatwg.org/wiki/Why_not_conneg#Negotiating_by_format says this is pretty rare, so I'm less sure about leaning on it. At least initially, almost all front-end authors that slap tool-name on their forms will be feeding straight HTML into the agent, which presumably will be rejected either by the agent, or by WebMCP's spec text that processes the response before feeding it to the agent.
2. Do we have to worry about old/legacy action URL endpoints that don't expect to be hit as a result of agent actuation, suddenly being hit by it? I guess not; maybe the whole WebMCP proposal relies on the assumption that the AI agent would only call tools (and submit forms) that the user would do themselves if they weren't working through an agent, so maybe we don't need to worry about it. I'm just trying to make sure we don't end up in a situation where people think slapping tool-name on a <form> only hits the "agent-ready" version of the endpoint if it exists and nothing else, and that it's therefore free to call and doesn't have the same destructive properties as traditional forms. (This is not true, since Accept: application/json makes no such guarantees; it just calls the usual endpoint, and the server has to be ready to take the hint as a part of content negotiation.)

If either of the above concerns is big enough, I wonder if it makes sense to use something like tool-url that overrides action...
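
To make the two concerns concrete, here is a minimal sketch of what response handling and endpoint selection might look like. Everything in it is an assumption for illustration: `interpretToolResponse` shows one possible way to reject non-JSON responses (the `Accept` header is only a hint, so the `Content-Type` of the response still has to be checked), and `resolveToolEndpoint` models the `tool-url` idea floated above, which is not part of the current draft.

```typescript
type ToolResponse =
  | { kind: "json"; value: unknown }
  | { kind: "rejected"; reason: string };

// The server may ignore Accept: application/json, so inspect what actually came back.
function interpretToolResponse(contentType: string | null, body: string): ToolResponse {
  const mediaType = ((contentType ?? "").split(";")[0] ?? "").trim().toLowerCase();
  const looksLikeJson = mediaType === "application/json" || /\+json$/.test(mediaType);
  if (!looksLikeJson) {
    return { kind: "rejected", reason: "unexpected content type: " + (mediaType || "(none)") };
  }
  try {
    return { kind: "json", value: JSON.parse(body) };
  } catch {
    return { kind: "rejected", reason: "body is not valid JSON" };
  }
}

// Hypothetical tool-url attribute: when present, it overrides the form's action.
function resolveToolEndpoint(form: { action: string; toolUrl?: string }): string {
  return form.toolUrl ?? form.action;
}
```

With a separate `tool-url`, an author who has not built an agent-ready endpoint would simply omit the attribute, and whether the form should still be exposed as a tool in that case is the open design question.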

add wip explainer for the declarative api

e3c23c0

MiguelsPizza force-pushed the declarative branch from 4558fa1 to e3c23c0 Compare September 17, 2025 04:07

MiguelsPizza marked this pull request as ready for review September 17, 2025 04:08

anssiko mentioned this pull request Sep 18, 2025

Declarative API Equivalent #22

Open

MiguelsPizza marked this pull request as draft October 15, 2025 16:13

anssiko mentioned this pull request Oct 16, 2025

WebML WG/CG F2F Agenda - TPAC 2025 (Kobe, Japan) webmachinelearning/meetings#35

Closed

anssiko added the Agenda+ label Oct 24, 2025

anssiko mentioned this pull request Dec 11, 2025

Transition from the WebMCP explainer to a Community Group spec draft #60

Open

domfarolino reviewed Dec 11, 2025
