
Commit 4f7147a

Merge branch 'main' into fix-tool-retry-error-message
2 parents: 09a0ca4 + d2b08ad

23 files changed (+2812, -815 lines)

docs/agents.md

Lines changed: 4 additions & 3 deletions
@@ -125,10 +125,11 @@ It also takes an optional `event_stream_handler` argument that you can use to ga
 The example below shows how to stream events and text output. You can also [stream structured output](output.md#streaming-structured-output).
 
 !!! note
-    As the `run_stream()` method will consider the first output matching the [output type](output.md#structured-output) to be the final output,
-    it will stop running the agent graph and will not execute any tool calls made by the model after this "final" output.
+    The `run_stream()` method will consider the first output that matches the [output type](output.md#structured-output) to be the final output of the agent run, even when the model generates tool calls after this "final" output.
 
-    If you want to always run the agent graph to completion and stream all events from the model's streaming response and the agent's execution of tools,
+    These "dangling" tool calls will not be executed unless the agent's [`end_strategy`][pydantic_ai.agent.Agent.end_strategy] is set to `'exhaustive'`, and even then their results will not be sent back to the model as the agent run will already be considered completed.
+
+    If you want to always keep running the agent when it performs tool calls, and stream all events from the model's streaming response and the agent's execution of tools,
     use [`agent.run_stream_events()`][pydantic_ai.agent.AbstractAgent.run_stream_events] or [`agent.iter()`][pydantic_ai.agent.AbstractAgent.iter] instead, as described in the following sections.
 
 ```python {title="run_stream_event_stream_handler.py"}

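To make the behavior described in the updated note concrete, here is a minimal sketch of streaming with `end_strategy='exhaustive'`. The model string `'test'`, the tool, and the prompt are placeholders for illustration and are not part of the commit.

```python
import asyncio

from pydantic_ai import Agent

# Assumed setup for illustration: 'test' selects the built-in test model.
agent = Agent('test', end_strategy='exhaustive')


@agent.tool_plain
def record_audit(entry: str) -> str:
    """A side-effecting tool that we don't want silently skipped."""
    print(f'audit: {entry}')
    return 'recorded'


async def main():
    async with agent.run_stream('What is the capital of Mexico?') as result:
        async for text in result.stream_text():
            print(text)
    # With end_strategy='exhaustive', tool calls made alongside the final output
    # are still executed, but their results are not sent back to the model.


asyncio.run(main())
```
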
docs/durable_execution/temporal.md

Lines changed: 19 additions & 18 deletions
@@ -86,8 +86,8 @@ from temporalio.worker import Worker
 
 from pydantic_ai import Agent
 from pydantic_ai.durable_exec.temporal import (
-    AgentPlugin,
     PydanticAIPlugin,
+    PydanticAIWorkflow,
     TemporalAgent,
 )
 
@@ -101,26 +101,27 @@ temporal_agent = TemporalAgent(agent) # (1)!
 
 
 @workflow.defn
-class GeographyWorkflow:  # (2)!
+class GeographyWorkflow(PydanticAIWorkflow):  # (2)!
+    __pydantic_ai_agents__ = [temporal_agent]  # (3)!
+
     @workflow.run
     async def run(self, prompt: str) -> str:
-        result = await temporal_agent.run(prompt)  # (3)!
+        result = await temporal_agent.run(prompt)  # (4)!
         return result.output
 
 
 async def main():
-    client = await Client.connect(  # (4)!
-        'localhost:7233',  # (5)!
-        plugins=[PydanticAIPlugin()],  # (6)!
+    client = await Client.connect(  # (5)!
+        'localhost:7233',  # (6)!
+        plugins=[PydanticAIPlugin()],  # (7)!
     )
 
-    async with Worker(  # (7)!
+    async with Worker(  # (8)!
         client,
         task_queue='geography',
         workflows=[GeographyWorkflow],
-        plugins=[AgentPlugin(temporal_agent)],  # (8)!
     ):
-        output = await client.execute_workflow(  # (9)!
+        output = await client.execute_workflow(  # (10)!
            GeographyWorkflow.run,
            args=['What is the capital of Mexico?'],
            id=f'geography-{uuid.uuid4()}',
@@ -131,15 +132,15 @@ async def main():
 ```
 
 1. The original `Agent` cannot be used inside a deterministic Temporal workflow, but the `TemporalAgent` can.
-2. As explained above, the workflow represents a deterministic piece of code that can use non-deterministic activities for operations that require I/O.
-3. [`TemporalAgent.run()`][pydantic_ai.durable_exec.temporal.TemporalAgent.run] works just like [`Agent.run()`][pydantic_ai.Agent.run], but it will automatically offload model requests, tool calls, and MCP server communication to Temporal activities.
-4. We connect to the Temporal server which keeps track of workflow and activity execution.
-5. This assumes the Temporal server is [running locally](https://github.com/temporalio/temporal#download-and-start-temporal-server-locally).
-6. The [`PydanticAIPlugin`][pydantic_ai.durable_exec.temporal.PydanticAIPlugin] tells Temporal to use Pydantic for serialization and deserialization, and to treat [`UserError`][pydantic_ai.exceptions.UserError] exceptions as non-retryable.
-7. We start the worker that will listen on the specified task queue and run workflows and activities. In a real world application, this might be run in a separate service.
-8. The [`AgentPlugin`][pydantic_ai.durable_exec.temporal.AgentPlugin] registers the `TemporalAgent`'s activities with the worker.
-9. We call on the server to execute the workflow on a worker that's listening on the specified task queue.
-10. The agent's `name` is used to uniquely identify its activities.
+2. As explained above, the workflow represents a deterministic piece of code that can use non-deterministic activities for operations that require I/O. Subclassing [`PydanticAIWorkflow`][pydantic_ai.durable_exec.temporal.PydanticAIWorkflow] is optional but provides proper typing for the `__pydantic_ai_agents__` class variable.
+3. List the `TemporalAgent`s used by this workflow. The [`PydanticAIPlugin`][pydantic_ai.durable_exec.temporal.PydanticAIPlugin] will automatically register their activities with the worker. Alternatively, if modifying the worker initialization is easier than the workflow class, you can use [`AgentPlugin`][pydantic_ai.durable_exec.temporal.AgentPlugin] to register agents directly on the worker.
+4. [`TemporalAgent.run()`][pydantic_ai.durable_exec.temporal.TemporalAgent.run] works just like [`Agent.run()`][pydantic_ai.Agent.run], but it will automatically offload model requests, tool calls, and MCP server communication to Temporal activities.
+5. We connect to the Temporal server which keeps track of workflow and activity execution.
+6. This assumes the Temporal server is [running locally](https://github.com/temporalio/temporal#download-and-start-temporal-server-locally).
+7. The [`PydanticAIPlugin`][pydantic_ai.durable_exec.temporal.PydanticAIPlugin] tells Temporal to use Pydantic for serialization and deserialization, treats [`UserError`][pydantic_ai.exceptions.UserError] exceptions as non-retryable, and automatically registers activities for agents listed in `__pydantic_ai_agents__`.
+8. We start the worker that will listen on the specified task queue and run workflows and activities. In a real world application, this might be run in a separate service.
+9. The agent's `name` is used to uniquely identify its activities.
+10. We call on the server to execute the workflow on a worker that's listening on the specified task queue.
 
 _(This example is complete, it can be run "as is" — you'll need to add `asyncio.run(main())` to run `main`)_

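Pieced together from the added lines above, the updated Temporal example would look roughly like the sketch below. The agent definition (model string and `name`) and the `task_queue` argument to `execute_workflow` are assumed from the surrounding docs example, not shown in this diff.

```python
import uuid

from temporalio import workflow
from temporalio.client import Client
from temporalio.worker import Worker

from pydantic_ai import Agent
from pydantic_ai.durable_exec.temporal import (
    PydanticAIPlugin,
    PydanticAIWorkflow,
    TemporalAgent,
)

# Assumed agent setup; the docs example defines this above the diffed hunks.
agent = Agent('openai:gpt-4o', name='geography')
temporal_agent = TemporalAgent(agent)


@workflow.defn
class GeographyWorkflow(PydanticAIWorkflow):
    # Agents listed here get their activities registered by PydanticAIPlugin.
    __pydantic_ai_agents__ = [temporal_agent]

    @workflow.run
    async def run(self, prompt: str) -> str:
        result = await temporal_agent.run(prompt)
        return result.output


async def main():
    client = await Client.connect('localhost:7233', plugins=[PydanticAIPlugin()])

    async with Worker(client, task_queue='geography', workflows=[GeographyWorkflow]):
        output = await client.execute_workflow(
            GeographyWorkflow.run,
            args=['What is the capital of Mexico?'],
            id=f'geography-{uuid.uuid4()}',
            task_queue='geography',  # assumed; truncated out of the visible hunk
        )
        print(output)
```
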
docs/output.md

Lines changed: 9 additions & 0 deletions
@@ -306,6 +306,15 @@ print(repr(result.output))
 
 _(This example is complete, it can be run "as is")_
 
+##### Parallel Output Tool Calls
+
+When the model calls other tools in parallel with an output tool, you can control how tool calls are executed by setting the agent's [`end_strategy`][pydantic_ai.agent.Agent.end_strategy]:
+
+- `'early'` (default): Output tools are executed first. Once a valid final result is found, remaining function and output tool calls are skipped
+- `'exhaustive'`: Output tools are executed first, then all function tools are executed. The first valid output tool result becomes the final output
+
+The `'exhaustive'` strategy is useful when tools have important side effects (like logging, sending notifications, or updating metrics) that should always execute.
+
 #### Native Output
 
 Native Output mode uses a model's native "Structured Outputs" feature (aka "JSON Schema response format"), where the model is forced to only output text matching the provided JSON schema. Note that this is not supported by all models, and sometimes comes with restrictions. For example, Gemini cannot use tools at the same time as structured output, and attempting to do so will result in an error.

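As a rough illustration of the new "Parallel Output Tool Calls" section, the sketch below opts into the `'exhaustive'` strategy so a side-effecting function tool still runs when the model calls it in parallel with the output tool. The model string `'test'`, the output model, and the tool are placeholders, not part of the commit.

```python
from pydantic import BaseModel

from pydantic_ai import Agent


class CityInfo(BaseModel):
    city: str
    country: str


# 'early' (the default) would skip remaining tool calls once a valid output
# tool result is found; 'exhaustive' runs the function tools as well.
agent = Agent('test', output_type=CityInfo, end_strategy='exhaustive')


@agent.tool_plain
def send_notification(message: str) -> str:
    """A tool whose side effect should always happen."""
    print(f'notify: {message}')
    return 'sent'


result = agent.run_sync('What is the capital of Mexico?')
print(repr(result.output))
```
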
docs/tools-advanced.md

Lines changed: 39 additions & 0 deletions
@@ -371,6 +371,38 @@ def my_flaky_tool(query: str) -> str:
 
 Raising `ModelRetry` also generates a `RetryPromptPart` containing the exception message, which is sent back to the LLM to guide its next attempt. Both `ValidationError` and `ModelRetry` respect the `retries` setting configured on the `Tool` or `Agent`.
 
+### Tool Timeout
+
+You can set a timeout for tool execution to prevent tools from running indefinitely. If a tool exceeds its timeout, it is treated as a failure and a retry prompt is sent to the model (counting towards the retry limit).
+
+```python
+import asyncio
+
+from pydantic_ai import Agent
+
+# Set a default timeout for all tools on the agent
+agent = Agent('test', tool_timeout=30)
+
+
+@agent.tool_plain
+async def slow_tool() -> str:
+    """This tool will use the agent's default timeout (30 seconds)."""
+    await asyncio.sleep(10)
+    return 'Done'
+
+
+@agent.tool_plain(timeout=5)
+async def fast_tool() -> str:
+    """This tool has its own timeout (5 seconds) that overrides the agent default."""
+    await asyncio.sleep(1)
+    return 'Done'
+```
+
+- **Agent-level timeout**: Set `tool_timeout` on the [`Agent`][pydantic_ai.Agent] to apply a default timeout to all tools.
+- **Per-tool timeout**: Set `timeout` on individual tools via [`@agent.tool`][pydantic_ai.Agent.tool], [`@agent.tool_plain`][pydantic_ai.Agent.tool_plain], or the [`Tool`][pydantic_ai.tools.Tool] dataclass. This overrides the agent-level default.
+
+When a timeout occurs, the tool is considered to have failed and the model receives a retry prompt with the message `"Timed out after {timeout} seconds."`. This counts towards the tool's retry limit just like validation errors or explicit [`ModelRetry`][pydantic_ai.exceptions.ModelRetry] exceptions.
+
 ### Parallel tool calls & concurrency
 
 When a model returns multiple tool calls in one response, Pydantic AI schedules them concurrently using `asyncio.create_task`.
@@ -381,6 +413,13 @@ Async functions are run on the event loop, while sync functions are offloaded to
 !!! note "Limiting tool executions"
     You can cap tool executions within a run using [`UsageLimits(tool_calls_limit=...)`](agents.md#usage-limits). The counter increments only after a successful tool invocation. Output tools (used for [structured output](output.md)) are not counted in the `tool_calls` metric.
 
+#### Output Tool Calls
+
+When a model calls an [output tool](output.md#tool-output) in parallel with other tools, the agent's [`end_strategy`][pydantic_ai.agent.Agent.end_strategy] parameter controls how these tool calls are executed.
+The `'exhaustive'` strategy ensures all tools are executed even after a final result is found, which is useful when tools have side effects (like logging, sending notifications, or updating metrics) that should always execute.
+
+For more information of how `end_strategy` works with both function tools and output tools, see the [Output Tool](output.md#parallel-output-tool-calls) docs.
+
 ## See Also
 
 - [Function Tools](tools.md) - Basic tool concepts and registration

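The bullets in the new "Tool Timeout" section also mention setting `timeout` through the `Tool` dataclass, which the diffed example doesn't show. A possible sketch, assuming the per-tool `timeout` parameter is accepted there as described:

```python
import asyncio

from pydantic_ai import Agent
from pydantic_ai.tools import Tool


async def lookup(query: str) -> str:
    """A slow lookup we want to cap at 2 seconds."""
    await asyncio.sleep(5)
    return f'result for {query}'


# Assumption based on the bullet above: Tool accepts a per-tool `timeout`,
# which overrides the agent-level `tool_timeout` default.
agent = Agent('test', tool_timeout=30, tools=[Tool(lookup, timeout=2)])
```

If `lookup` exceeds two seconds, the model should receive the `"Timed out after 2 seconds."` retry prompt described above, counted against the tool's retry limit.
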
pydantic_ai_slim/pydantic_ai/_agent_graph.py

Lines changed: 48 additions & 29 deletions
@@ -60,11 +60,6 @@
 S = TypeVar('S')
 NoneType = type(None)
 EndStrategy = Literal['early', 'exhaustive']
-"""The strategy for handling multiple tool calls when a final result is found.
-
-- `'early'`: Stop processing other tool calls once a final result is found
-- `'exhaustive'`: Process all tool calls even after finding a final result
-"""
 DepsT = TypeVar('DepsT')
 OutputT = TypeVar('OutputT')
 
@@ -865,35 +860,56 @@ async def process_tool_calls(  # noqa: C901
 
     # First, we handle output tool calls
     for call in tool_calls_by_kind['output']:
-        if final_result:
-            if final_result.tool_call_id == call.tool_call_id:
-                part = _messages.ToolReturnPart(
-                    tool_name=call.tool_name,
-                    content='Final result processed.',
-                    tool_call_id=call.tool_call_id,
-                )
-            else:
-                yield _messages.FunctionToolCallEvent(call)
-                part = _messages.ToolReturnPart(
-                    tool_name=call.tool_name,
-                    content='Output tool not used - a final result was already processed.',
-                    tool_call_id=call.tool_call_id,
-                )
-                yield _messages.FunctionToolResultEvent(part)
-
+        # `final_result` can be passed into `process_tool_calls` from `Agent.run_stream`
+        # when streaming and there's already a final result
+        if final_result and final_result.tool_call_id == call.tool_call_id:
+            part = _messages.ToolReturnPart(
+                tool_name=call.tool_name,
+                content='Final result processed.',
+                tool_call_id=call.tool_call_id,
+            )
+            output_parts.append(part)
+        # Early strategy is chosen and final result is already set
+        elif ctx.deps.end_strategy == 'early' and final_result:
+            yield _messages.FunctionToolCallEvent(call)
+            part = _messages.ToolReturnPart(
+                tool_name=call.tool_name,
+                content='Output tool not used - a final result was already processed.',
+                tool_call_id=call.tool_call_id,
+            )
+            yield _messages.FunctionToolResultEvent(part)
             output_parts.append(part)
+        # Early strategy is chosen and final result is not yet set
+        # Or exhaustive strategy is chosen
         else:
             try:
                 result_data = await tool_manager.handle_call(call)
             except exceptions.UnexpectedModelBehavior as e:
-                ctx.state.increment_retries(
-                    ctx.deps.max_result_retries, error=e, model_settings=ctx.deps.model_settings
-                )
-                raise e  # pragma: lax no cover
+                # If we already have a valid final result, don't fail the entire run
+                # This allows exhaustive strategy to complete successfully when at least one output tool is valid
+                if final_result:
+                    # If output tool fails when we already have a final result, skip it without retrying
+                    yield _messages.FunctionToolCallEvent(call)
+                    part = _messages.ToolReturnPart(
+                        tool_name=call.tool_name,
+                        content='Output tool not used - output failed validation.',
+                        tool_call_id=call.tool_call_id,
+                    )
+                    output_parts.append(part)
+                    yield _messages.FunctionToolResultEvent(part)
+                else:
+                    # No valid result yet, so this is a real failure
+                    ctx.state.increment_retries(
+                        ctx.deps.max_result_retries, error=e, model_settings=ctx.deps.model_settings
+                    )
+                    raise e  # pragma: lax no cover
             except ToolRetryError as e:
-                ctx.state.increment_retries(
-                    ctx.deps.max_result_retries, error=e, model_settings=ctx.deps.model_settings
-                )
+                # If we already have a valid final result, don't increment retries for invalid output tools
+                # This allows the run to succeed if at least one output tool returned a valid result
+                if not final_result:
+                    ctx.state.increment_retries(
+                        ctx.deps.max_result_retries, error=e, model_settings=ctx.deps.model_settings
+                    )
                 yield _messages.FunctionToolCallEvent(call)
                 output_parts.append(e.tool_retry)
                 yield _messages.FunctionToolResultEvent(e.tool_retry)
@@ -904,7 +920,10 @@
                     tool_call_id=call.tool_call_id,
                 )
                 output_parts.append(part)
-                final_result = result.FinalResult(result_data, call.tool_name, call.tool_call_id)
+
+                # In both `early` and `exhaustive` modes, use the first output tool's result as the final result
+                if not final_result:
+                    final_result = result.FinalResult(result_data, call.tool_name, call.tool_call_id)
 
     # Then, we handle function tool calls
     calls_to_run: list[_messages.ToolCallPart] = []

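The new branching in `process_tool_calls` is easier to follow as a standalone decision helper. The function below is a simplified restatement for illustration only; the real code also emits `FunctionToolCallEvent`/`FunctionToolResultEvent`s, builds `ToolReturnPart`s, and handles validation failures and retries.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class OutputCallDecision:
    run_tool: bool
    note: str


def classify_output_call(
    end_strategy: Literal['early', 'exhaustive'],
    have_final_result: bool,
    is_streamed_final_call: bool,
) -> OutputCallDecision:
    """Simplified restatement of the branching added above (illustration only)."""
    if is_streamed_final_call:
        # The call that already produced the streamed final result is only acknowledged.
        return OutputCallDecision(False, 'Final result processed.')
    if end_strategy == 'early' and have_final_result:
        # Early strategy skips remaining output tools once a final result exists.
        return OutputCallDecision(False, 'Output tool not used - a final result was already processed.')
    # No final result yet, or exhaustive strategy: run the output tool.
    # The first valid output tool result becomes the final result.
    return OutputCallDecision(True, 'Run the output tool; first valid result wins.')


print(classify_output_call('early', True, False))
print(classify_output_call('exhaustive', True, False))
```
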
pydantic_ai_slim/pydantic_ai/_tool_manager.py

Lines changed: 1 addition & 3 deletions
@@ -172,9 +172,7 @@ async def _call_tool(
                call.args or {}, allow_partial=pyd_allow_partial, context=ctx.validation_context
            )
 
-            result = await self.toolset.call_tool(name, args_dict, ctx, tool)
-
-            return result
+            return await self.toolset.call_tool(name, args_dict, ctx, tool)
        except (ValidationError, ModelRetry) as e:
            max_retries = tool.max_retries if tool is not None else 1
            current_retry = self.ctx.retries.get(name, 0)
