add explainer for the declarative api #26
|
This is great work. I do have one general question. Was reusing ARIA attributes instead of introducing new There are already attributes such as One benefit of using these existing attributes is that they are also surfaced in native accessibility APIs, so assistive tools that already use these APIs to access the page's accessibility tree would be able to access WebMCP tools declarations as well. |
|
@bwalderman This is a good idea, I'll put the PR in draft while I re-implement the ARIA based polyfill. The only thing I can think is that we still need a way to make exposing tools to the agent opt-in (or opt-out) Maybe we still tag elements with a |
|
The concern with the HTML method is of course the iframe. Realistically speaking, it's stood up well for some years now. Would an iframe in a browser be more trustworthy? I would prefer JSON and rendering on the client. I am sure web components can be distributed with framework payloads and CSS, but the build process for this type of architecture would have to change. I would prefer that design. The trade-off is that the payload would need to be distributed more frequently. |
|
@matatk to review for the accessibility group's perspective (aka APA WG). |
|
A new paper and implementation experience: https://arxiv.org/abs/2511.11287v1 @svenschultze & team, this W3C community group is developing a WebMCP API that is complemented with a declarative mechanism explored in this PR. Let’s join forces to explore this space. Here’s how to join: |
|
That was fast. I’m excited to welcome @svenschultze to the WebML Community Group! 🎉 |
|
Hi @anssiko, thank you for making me aware of this project! It is great to see the community converging on this. I'm happy to share some insights from our work on VOIX, where we implemented a similar declarative framework and tested it with developers.
|
|
Following up on my earlier suggestion to look at ARIA: ARIA attributes for browsing agents was discussed at TPAC and from what I understand, the consensus is this is not actually a good idea. One of the concerns raised was the same that @svenschultze mentioned above, that reusing ARIA could lead to conflicts of interest and possibly incentivize web developers to optimize for machines instead of people using a11y tools. |
| - To preserve correctness and avoid race conditions, the browser MUST enforce this ordering for a single tool invocation: | ||
| 1) Execute the tool and fully evaluate its body (JS `execute` or form submission/response parsing). | ||
| 2) Deliver the tool result to the client (agent) over MCP. | ||
| 3) Recompute the catalog (scan DOM + JS registrations), compute deltas, and if changed, emit [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) to all connected clients for this server. |
The user agent wouldn't have a step here to actively recompute the catalogue, right? I would assume that any side effects that result from executing the last tool (i.e., adding new elements to the DOM that have a tool-name attribute, or invoking script which registers a new tool imperatively) would just keep the catalogue up-to-date always, right?
Apologies if I'm missing how MCP's notifications/tools/list_changed fits in here.
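For readers following this thread, here is a minimal sketch of what "compute deltas, and if changed, emit" in step 3 could mean. It is not the polyfill's code: `ToolEntry`, `diffCatalog`, and `shouldNotify` are hypothetical names, and comparing entries with `JSON.stringify` is an assumption that treats key order as significant. The same diff works whether the catalog is rebuilt after each call or kept up to date incrementally, as the comment above suggests.

```typescript
// Hypothetical catalog entry; field names are assumptions for illustration.
interface ToolEntry {
  name: string;
  description: string;
  inputSchema: unknown; // any JSON-serializable schema
}

interface CatalogDelta {
  added: string[];
  removed: string[];
  changed: string[];
}

// Index a snapshot by tool name for constant-time lookups.
function indexByName(tools: ToolEntry[]): { [name: string]: ToolEntry } {
  const index: { [name: string]: ToolEntry } = {};
  for (const tool of tools) index[tool.name] = tool;
  return index;
}

// Compare two catalog snapshots and report additions, removals, and
// metadata/schema changes by tool name.
function diffCatalog(before: ToolEntry[], after: ToolEntry[]): CatalogDelta {
  const prev = indexByName(before);
  const next = indexByName(after);
  const has = (index: { [name: string]: ToolEntry }, name: string): boolean =>
    Object.prototype.hasOwnProperty.call(index, name);

  return {
    added: Object.keys(next).filter((n) => !has(prev, n)),
    removed: Object.keys(prev).filter((n) => !has(next, n)),
    // Assumption: serialized equality is a good enough "unchanged" check.
    changed: Object.keys(next).filter(
      (n) => has(prev, n) && JSON.stringify(prev[n]) !== JSON.stringify(next[n])
    ),
  };
}

// A list_changed notification is only warranted when something differs.
function shouldNotify(delta: CatalogDelta): boolean {
  return delta.added.length + delta.removed.length + delta.changed.length > 0;
}
```

Note that `notifications/tools/list_changed` carries no delta itself; the client is expected to re-list tools, so the delta here only decides whether to notify.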
| 1) Execute the tool and fully evaluate its body (JS `execute` or form submission/response parsing). | ||
| 2) Deliver the tool result to the client (agent) over MCP. | ||
| 3) Recompute the catalog (scan DOM + JS registrations), compute deltas, and if changed, emit [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) to all connected clients for this server. | ||
| 4) Apply any navigation/unmounts/redirects (e.g., those implied by `_meta.uiRedirect` or HTTP redirects). |
Is this the only thing that can specify an action that the browser should take after the response is obtained? Having a full list of actions that could be done (navigations, displaying more UI / appending things to the DOM) would be great—I'm not sure what MCP specifies is possible outside of _meta.uiRedirect.
| 2) Deliver the tool result to the client (agent) over MCP. | ||
| 3) Recompute the catalog (scan DOM + JS registrations), compute deltas, and if changed, emit [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) to all connected clients for this server. | ||
| 4) Apply any navigation/unmounts/redirects (e.g., those implied by `_meta.uiRedirect` or HTTP redirects). | ||
| - In particular, the tool that is currently executing MUST NOT be unmounted (e.g., DOM removal or JS unregistration) until after its result has been delivered to the client and any [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) notification has been sent. |
I'm not super clear about this part. Imagine the user submits a tool_name= form, but before the agent receives the response, some random script comes in and calls form.remove(), removing it from the DOM. Is this line saying that we shouldn't let that DOM API work? That's not really feasible. What is the intent here?
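To make the quoted ordering concrete, the sketch below models only the steps the browser itself performs for one invocation: execute, deliver the result, notify if the catalog changed, then apply any deferred navigation. All names (`InvocationHooks`, `runInvocation`) are hypothetical, and it is synchronous for brevity where a real implementation would be asynchronous. It deliberately says nothing about page script calling `form.remove()` mid-flight, which is the open question raised above.

```typescript
type Step = "execute" | "deliver_result" | "list_changed" | "navigate";

// Hypothetical hooks standing in for browser internals.
interface InvocationHooks {
  execute: () => string;              // run the tool body and return its result
  deliver: (result: string) => void;  // send the result to the agent
  catalogChanged: () => boolean;      // did the tool list change during execution?
  notifyListChanged: () => void;      // emit notifications/tools/list_changed
  pendingNavigation?: () => void;     // deferred navigation or redirect, if any
}

// Run one invocation in the order given by the draft and return a trace
// of the steps that actually ran.
function runInvocation(hooks: InvocationHooks): Step[] {
  const trace: Step[] = [];

  const result = hooks.execute();
  trace.push("execute");

  hooks.deliver(result);
  trace.push("deliver_result");

  if (hooks.catalogChanged()) {
    hooks.notifyListChanged();
    trace.push("list_changed");
  }

  // Navigation is applied last so it cannot preempt result delivery.
  if (hooks.pendingNavigation) {
    hooks.pendingNavigation();
    trace.push("navigate");
  }
  return trace;
}
```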
|
|
||
| Notification shape | ||
| - Method: [`notifications/tools/list_changed`](https://modelcontextprotocol.io/specification/2025-06-18/schema#notifications%2Ftools%2Flist-changed) | ||
| - Use when: any change to the catalog (addition, removal, or metadata/schema change) is observed after a recomputation. |
I think this makes sense to me. Basically this would mean the browser sends a list_changed notification to the ... agent? ... whenever a declarative tool is appended to or removed from the DOM, or whenever the tool-name attribute is added to an existing <form>, <a>, or <button>, right?
| ``` | ||
| interface ToolListChangedNotification { | ||
| method: "notifications/tools/list_changed"; | ||
| params?: { _meta?: { [key: string]: unknown }; [key: string]: unknown }; |
Just so I'm clear, is this the shape of the notification that's sent to the agent? (I'm still cutting my teeth on the whole MCP spec, so bear with me if this is a dumb question.)
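As a point of reference for the question above: MCP messages are JSON-RPC 2.0, so on the wire the notification also carries a `jsonrpc` field and, being a notification, no `id`. The sketch below builds and recognizes such a message. The helper names `makeToolListChanged` and `isToolListChanged` are hypothetical and not part of the explainer or of any MCP SDK.

```typescript
// Wire shape of the notification, with the JSON-RPC envelope field added.
interface ToolListChangedNotification {
  jsonrpc: "2.0";
  method: "notifications/tools/list_changed";
  params?: { _meta?: { [key: string]: unknown }; [key: string]: unknown };
}

// Build the notification; params is omitted unless metadata is supplied.
function makeToolListChanged(
  meta?: { [key: string]: unknown }
): ToolListChangedNotification {
  const msg: ToolListChangedNotification = {
    jsonrpc: "2.0",
    method: "notifications/tools/list_changed",
  };
  if (meta !== undefined) msg.params = { _meta: meta };
  return msg;
}

// Recognize the notification on the receiving (agent) side.
// JSON-RPC notifications never carry an "id" member.
function isToolListChanged(value: unknown): value is ToolListChangedNotification {
  if (typeof value !== "object" || value === null) return false;
  const msg = value as { [key: string]: unknown };
  return (
    msg["jsonrpc"] === "2.0" &&
    msg["method"] === "notifications/tools/list_changed" &&
    !("id" in msg)
  );
}
```

Because the message carries no tool data, a client that receives it would typically respond by requesting the tool list again.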
| P-->>B: result | ||
| else Declarative form | ||
| B->>S: fetch(action, method, body/query) [Accept: JSON] | ||
| S-->>B: JSON result (optional Location) |
What is Location? Is this the meta UI redirect?
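The `fetch(action, method, body/query) [Accept: JSON]` step in the diagram can be sketched as a pure function that turns a declarative form call into a request description. This is an illustration under assumptions: `FormToolCall`, `RequestPlan`, and `planRequest` are hypothetical names, `application/json` is assumed as the `Accept` value, the action URL is assumed to have no fragment, and `encodeURIComponent` only approximates real form encoding (it writes spaces as `%20`, not `+`).

```typescript
// Hypothetical description of a declarative tool call resolved from a <form>.
interface FormToolCall {
  action: string;                       // the form's action URL
  method: "GET" | "POST";
  fields: { [name: string]: string };   // resolved field values
}

interface RequestPlan {
  url: string;
  method: "GET" | "POST";
  headers: { [name: string]: string };
  body?: string;
}

// Approximate urlencoded serialization of the fields.
function encodeFields(fields: { [name: string]: string }): string {
  return Object.keys(fields)
    .map((k) => encodeURIComponent(k) + "=" + encodeURIComponent(fields[k]))
    .join("&");
}

// GET puts the fields in the query string; POST puts them in the body.
function planRequest(call: FormToolCall): RequestPlan {
  const encoded = encodeFields(call.fields);
  const headers: { [name: string]: string } = { Accept: "application/json" };

  if (call.method === "GET") {
    // HTML form submission replaces any existing query on the action URL.
    const base = call.action.split("?")[0];
    return { url: encoded ? base + "?" + encoded : base, method: "GET", headers };
  }

  headers["Content-Type"] = "application/x-www-form-urlencoded";
  return { url: call.action, method: "POST", headers, body: encoded };
}
```

The resulting plan could be handed to `fetch(plan.url, { method: plan.method, headers: plan.headers, body: plan.body })`; how the response's optional `Location` is interpreted is exactly what the question above asks.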
| S-->>B: JSON result (optional Location) | ||
| end | ||
| B-->>A: Return CallToolResult | ||
| B->>P: Recompute catalog (DOM + JS) |
I think this step is what I was commenting on earlier. I'm not totally sure why we'd need to recompute the list of available tools at this stage. I'd assumed this would be automatically kept up to date by operations that add/remove declarative tools (or imperative, JS-based tools) on its own. Is it possible that the CallToolResult JSON specifies new tools to be added?
| - Slightly delays visible navigation until result delivery. | ||
| - Page authors can’t forcibly interrupt an in-flight call (must cancel cooperatively). | ||
|
|
||
| Strategy B — Navigate-early (allow unmount during execution) |
I want to make sure I'm clear on what "Navigate-early" means here—what is the navigation referring to? Are you referring to the action attribute's URL for the form? Or the navigation that the agent would tell the browser to do, once it receives the meta UI redirect? I think you're referring to the former.
On that assumption, I'm not sure what Strategy B really looks like. Given what you say below in "Matches default web navigation behavior", it sounds like submitting a form would trigger a navigation to the form's action attribute URL. So what does the agent get in return? Does it also fetch the Accept: JSON request alongside the navigation, race the two requests, and sometimes the agent receives the JSON before the navigation completes, and other times the navigation completes first and the agent receives nothing? Is that "Strategy B"?
| - Visibility: Ignore `tool-elicit` on non-interactive controls such as `input[type=hidden]`, `disabled`, or `readonly` controls. Authors should instead expose an interactive control if they want user input. | ||
| - Merging: Final submitted values come from the elicitation UI for elicited fields (prefilled by agent/defaults); all other fields submit resolved values from agent/defaults/DOM as usual. | ||
| - Validation: Standard HTML constraints gate submission. `required` still applies; `tool-elicit` doesn’t change schema. | ||
| - UX: The browser chooses the UI, but SHOULD display each elicited field’s label or `tool-param-title` and (optionally) `tool-param-description`. |
So the UI resource used for the elicitation of user input is entirely in the control of the browser, right? The agent does not provide any HTML/CSS resources to display to the user to collect such information?
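The Visibility and Merging rules quoted above can be sketched without any UI. The code below is an illustration only: `FieldState`, `isElicited`, and `mergeSubmission` are hypothetical names, and each field's `resolved` value is assumed to have already been computed from agent/defaults/DOM, since the draft does not spell out that precedence here.

```typescript
// Hypothetical snapshot of one form control at submission time.
interface FieldState {
  name: string;
  type: string;         // e.g. "text" or "hidden"
  disabled: boolean;
  readOnly: boolean;
  toolElicit: boolean;  // whether the control carries tool-elicit
  resolved: string;     // value already resolved from agent/defaults/DOM
}

// Visibility rule: tool-elicit is ignored on hidden, disabled,
// or readonly controls.
function isElicited(field: FieldState): boolean {
  return field.toolElicit && field.type !== "hidden" && !field.disabled && !field.readOnly;
}

// Merging rule: elicited fields take their value from the elicitation UI;
// every other field submits its resolved value.
function mergeSubmission(
  fields: FieldState[],
  elicitedValues: { [name: string]: string }
): { [name: string]: string } {
  const submitted: { [name: string]: string } = {};
  for (const field of fields) {
    // Disabled controls are not submitted by standard HTML form submission.
    if (field.disabled) continue;
    const fromUi =
      isElicited(field) &&
      Object.prototype.hasOwnProperty.call(elicitedValues, field.name);
    submitted[field.name] = fromUi ? elicitedValues[field.name] : field.resolved;
  }
  return submitted;
}
```

One consequence of this reading is that a value supplied for a hidden field through the elicitation path is dropped in favor of the resolved value, which keeps hidden inputs such as tokens out of the user-facing prompt.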
|
Thinking through it a little more, I'm a little concerned about the
If either of the above concerns is big enough, I wonder if it makes sense to use something like |
This draft proposal outlines a declarative WebMCP API that enables web pages to expose tools via HTML, using minimal attributes like tool-name and standard form semantics. Currently, it's a compilation of my notes and ideas from developing this approach, and I'm sharing it to gather feedback before the September 18th working group meeting. I'm particularly interested in your thoughts on the open questions (e.g., JSON vs. HTML responses, elicitation flows), tradeoffs, and overall API design.
The proposed API was shaped by building a real application and polyfill during the MCP enterprise hackathon, where our team successfully implemented it (and took home the win, which was exciting validation!). You can see a video of a Rails app using declarative WebMCP tools to enable complex browser automation without client-side JavaScript: link.
Based on your feedback, I'll refine this draft to align more closely with the structured format and narrative style of other explainers in the repo.