-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
gh-132657: Add maybe_enable_deferred_ref_count(). #142843
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
If we are specializing to LOAD_GLOBAL_MODULE, set deferred reference counting for the value, if it meets criteria. For now, it's only done for frozenset objects.
Use it for LOAD_ATTR_MODULE in addition to LOAD_GLOBAL_MODULE. Don't enable deferred ref counts if the object is owned by the current thread. Specialized bytecode is per-thread so this works. Enable for frozensets, tuples and type objects.
colesbury
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense and is worth doing. Let's also add a scaling test to ftscalingbench.py
Python/specialize.c
Outdated
| PyObject *op; | ||
| if (PyDict_GetItemRef(dict, name, &op) != 1) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's combine this lookup with the earlier _PyDict_LookupIndex. _PyDict_LookupIndex already gets the value internally and throws it away.
Python/specialize.c
Outdated
| if (_Py_IsOwnedByCurrentThread(op) || | ||
| !PyType_IS_GC(Py_TYPE(op)) || | ||
| _PyObject_HasDeferredRefcount(op)) { | ||
| Py_DECREF(op); | ||
| return; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The !PyType_IS_GC and _PyObject_HasDeferredRefcount checks aren't really necessary because PyUnstable_Object_EnableDeferredRefcount handles those cases internally.
Python/specialize.c
Outdated
| if (PyFrozenSet_Check(op) || PyTuple_Check(op) || PyType_Check(op)) { | ||
| PyUnstable_Object_EnableDeferredRefcount(op); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this for all objects, not just a few types. Otherwise, I think we will keep run into scaling bottlenecks with global variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... That's going to push a lot of objects to the GC for deallocation in programs that do a lot of work at the global level (but do have a few functions that are called enough that use those globals.) I'm thinking of naively written "scripts". Then again, it would only do that for objects that participate in GC (i.e. not strings or integers), so it's probably fine. I mean, nobody's too surprised when things aren't deallocated quite as quickly as they expect... right?
Yeah, I think this should be fine for main. I definitely wouldn't backport it to 3.14.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely agree regarding not backporting to 3.14.
I think the above check that excludes owned by the current thread should help with most scripts -- it'll only affect global variables accessed by multiple threads.
Another thing that may help is that I think the PyUnstable_Object_EnableDeferredRefcount call only happens at specialization time, not every time a variable is updated or accessed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm a bit worried about increased memory usage. Since it's only for objects shared between threads and only for hot LOAD_GLOBAL/LOAD_ATTR instructions, maybe it's okay. Perhaps we should consider changing gc_should_collect() to check not only the young object count but also the process memory size increase as well. The condition with count < gcstate->long_lived_total / 4 could be done with an "or" condition with the process size check.
Also, avoid PyDict_GetItemRef() call by returning the value when the index is looked up (as previously discarded).
This is taken from the PR pythonGH-132658.
If we are specializing to LOAD_GLOBAL_MODULE, set deferred reference counting for the value, if it meets criteria. For now, it's only done for frozenset objects.