Skip to content

Conversation

@shayne-fletcher
Copy link

Summary:
this change extends the explicit-shutdown work from the previous diff into the provisioner layer. the provisioner was still leaving HostMesh instances (including the implicit local host) alive at teardown, which meant their underlying processes could still be terminated via SIGTERM during interpreter or allocator shutdown. that put us back on the SIGTERM path that produces the unwanted folly backtrace on exit.

we now explicitly await host_mesh.shutdown() for every HostMesh managed by the provisioner, and then call shutdown_context() to tear down the remaining actor runtime state. this ensures all meshes and their processes exit cleanly under their own shutdown path rather than being dropped and signaled, keeping the shutdown sequence free of SIGTERM-driven backtraces.

i expect this diff together with

Differential Revision: D88856830

@meta-codesync
Copy link

meta-codesync bot commented Dec 10, 2025

@shayne-fletcher has exported this pull request. If you are a Meta employee, you can view the originating Diff in D88856830.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Dec 10, 2025
@allenwang28
Copy link
Contributor

hey @shayne-fletcher , thanks for the PR! Unfortunately we cannot check this in yet as it breaks CI which is still pinned to the latest Monarch stable cc @JenniferWang

@shayne-fletcher
Copy link
Author

hey @shayne-fletcher , thanks for the PR! Unfortunately we cannot check this in yet as it breaks CI which is still pinned to the latest Monarch stable cc @JenniferWang

thanks allen. the diff is still in review but assuming it's accepted, i would appreciate a steer on the procedures for getting it landed.

Summary:
this change extends the explicit-shutdown work from D88796970 into the provisioner layer. the provisioner was leaving `HostMesh` instances (including the implicit local host) alive at teardown, which meant underlying processes are terminated via `SIGTERM `during interpreter or allocator shutdown. that in trun produces an unwanted folly backtrace on exit.

we now explicitly `await host_mesh.shutdown()` for every `HostMesh` managed by the provisioner, and then call `shutdown_context()` to tear down the remaining actor runtime state. this ensures all meshes and their processes exit cleanly under their own shutdown path rather than being dropped and signaled, keeping the shutdown sequence free of `SIGTERM`-driven backtraces.

i expect this diff together with D88796970 will fix the issue reported in this workplace post: https://fburl.com/workplace/ezbeze89 (T246657799).

Differential Revision: D88856830
@JenniferWang
Copy link
Contributor

hey @shayne-fletcher , thanks for the PR! Unfortunately we cannot check this in yet as it breaks CI which is still pinned to the latest Monarch stable cc @JenniferWang

thanks allen. the diff is still in review but assuming it's accepted, i would appreciate a steer on the procedures for getting it landed.

Sorry for the delayed response @shayne-fletcher. We're actively working on moving forge nightly build and CI, in the future, shall be run on the Monarch nightly build. Tracked in #625 .

Not sure what's the common practice for OSS but i am thinking merging this diff once we closed the above issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot. fb-exported meta-exported

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants