Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible leak in double-booking cursor #31

Open
ankostis opened this issue Oct 23, 2016 · 8 comments
Open

Possible leak in double-booking cursor #31

ankostis opened this issue Oct 23, 2016 · 8 comments

Comments

@ankostis
Copy link
Contributor

Probably 70fae1f made a double acounting of region-clients in smmap.mman.py#L136, where a "manual" increment_client_count() follows a _obtain_region() which it also increments the client-count.

From gitdb TC gitdb.test.test_pack:TestPack.test_pack_entity() I can see that this is indeed the case, where it is increased 2 x 3 times for 3 mmaps, while it is decremented just 3 times.

ankostis added a commit to ankostis/smmap that referenced this issue Oct 24, 2016
Possible fixing gitpython-developers#31 double mmap increment, stop decrement on
destruction, rely on with... resources.
ankostis added a commit to ankostis/smmap that referenced this issue Oct 24, 2016
Possible fixing gitpython-developers#31 double mmap increment, stop decrement on
destruction, rely on with... resources.
ankostis added a commit to ankostis/smmap that referenced this issue Oct 24, 2016
Possible fixing gitpython-developers#31 by stop decrement on
destruction, rely only on `with...` resources.
ankostis added a commit to ankostis/smmap that referenced this issue Oct 24, 2016
+ Possible fixing gitpython-developers#31 by stop decrement on
destruction, rely only on `with...` resources.
+ feat(mmap): utility to check if regions have been closed (PY3-only).
@ankostis
Copy link
Contributor Author

ankostis commented Oct 25, 2016

OK, false alarm.
If I understand it well, mmap-manager always keeps a region-list for some file, even it is empty (no regions remain), so client-counts are always 1. You have to invoke mman.collect() to clear them.

On Windows, this is necessary before renaming/deleting git-repo, since if any (mmap) files open to some index/pack objects forbid these objects to move/delete.
This is explained in smmap.mman.force_map_handle_removal_win(), but actually, invoking mmap.collect() SHOULD be enough, if client codes does not wast regions (leaks).

I have done some minor refactorings (bba086a) and style changes to go along with leak-job done in gitdb, plus one assertion method to check if there are any regions with closed undelying mmaps (works only for PY3).

@ankostis
Copy link
Contributor Author

@Byron I'm reopening this to ask whether this behavior (of keeping always one ref to a region) was on purpose?

I mean, now with I have retrofitted gitdb to use contextmanagers for streams and packers, and it works - all regions are de-registered in the end. But mmaps (and underlying files) are not closed automatically, and i have to invoke mman.collect() in specific places. This can be done with 2 ways:

  1. retrofit StaticWindowMapManager as a context-manager, and invoke collect() at __exit__();
  2. close mmaps automatically when client-count reaches to 1.

Do you have any view on that?

@ankostis ankostis reopened this Oct 25, 2016
@Byron
Copy link
Member

Byron commented Oct 25, 2016

@ankostis Real quick (at lunch break): unfortunately I wrote it so long ago, that I have no idea about my intentions anymore. I just know that the entire ref-counting business was a mess, and at some point I just wanted to make it work while getting the impression it is still correct.
To me it feels it would indeed be best to use a context-manager, as they semantically exactly what we need: prevent leakage thanks to proper scoping.
Given my lack of memory, I believe by now you know the code better than I do, and should have free reign over how it has to change to actually work.

@Byron
Copy link
Member

Byron commented Oct 25, 2016

By the way, I am super happy to see you are paving your way through the entire codebase, and I am extremely grateful for that (knowing that it must be very frustrating and all).

@ankostis
Copy link
Contributor Author

@Byron you should have warned me that since GitPython#307, smmap is not used at all - all db-access is routed solely through GitCmdDB class. I think I just wasted 2-weeks job on a defunct project...pitty, since I believe I can now redesign the whole management of resources & buffers with context-managers and all :-(

@Byron
Copy link
Member

Byron commented Oct 30, 2016

I am very sorry you feel that way, as I had no intention to misinform you.
It's just that people can use the pure python version as backend, even
though it's entirely optional.
In the past issues with leaked resources forced me to change the default
from python to cgit. Now that this is fixed I would rather change it back
again.
What do you think?
On Sun, 30 Oct 2016 at 15:35, Kostis Anagnostopoulos <
[email protected]> wrote:

@Byron https://github.com/Byron you should have warned me that since
GitPython#307
gitpython-developers/GitPython#546, smmap is
not used at all - all db-access is routed solely through GitCmdDB class.
I think I just wasted 2-weeks job on a defunct project...pitty, since I
believe I can now redesign the whole management of resources & buffers with
context-managers and all :-(


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub
#31 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AAD4hqXUVaqvx-Vp6XmME8xB3-rUmB5Cks5q5KshgaJpZM4KeSsx
.

@ankostis
Copy link
Contributor Author

ankostis commented Oct 30, 2016

No, no pun intended.
And no, you cannot plug smmap back, it is not ready.

But I will write down some conclusions, for my efforts not to go in vain:

  1. in PY3 (and Windows more so) the main problem is rogue memoryview instantiations, preventing underlying mmap instances to close (see this QnD notebook).
  2. Resource management coupled with non-deterministic destructors is not working - either you have to be strict (only one entrance/exit allowed, to debug missuses) or you have to waste hours to build flexible APIs that work both as context-managers and with destructors, and eventually...break under the burden of bad client usage!
  3. Resource accounting is easier when centralized, ala DB, i.e., having only immutable handles (regions and cursors) delegating associations to the central "manager" who accomplishes that with "indexes" (enhanced dictionaries). In case of errors, or half-finished associations due to errors, you can easily inspect those "indexes" to detect irregularities, than inspecting dispersed instances.
  4. Window-handles (regions & cursors) are easier when immutable and implemented with a very simple context-management behavior: "fire once, enter/exit, and then die".
  5. Maintaining PY2.6, PY3.3 is heavily taxing this project.
  6. I'm not sure it is easy to provide a sound API for a tiling memory-manager & sliding cursors. [Edit:] Maybe weakrefs is the answer.

@Byron
Copy link
Member

Byron commented Dec 8, 2016

Reading all this it dawns on me that not using ContextManagers everywhere was the worst mistake that GitPython suffers from to date. It's all based on my initial observations that __del__() is actually called consistently when the last reference to an object is dropped.

Add the split-personality of the Python ecosystem (py2 vs py3), and one ends up in a situation where python isn't really supposed to be used for anything that doesn't yet have the massively matured and proven implementation one might need to solve the problem at hand.

It really aches me to see all this time invested just to finally have to succumb to the overall brokenness of the system. Certainly there is a fix for everything, but a rewrite might be easier after all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

2 participants