Conversation
…ctionaries to manage memory-based zarr storage. Instances of `ManagedMemoryStore` have a URL representation based on the `id` of the backing dict, e.g. `memory://<id>/`. This means the same memory-backed store can be accessed without passing an explicit store reference.
Ensure that `ManagedMemoryStore` names do not contain the separator character "/" Ensure that `ManagedMemoryStore` instances are tied the PID of the creating process
…at/memory-store-registry
…at/memory-store-registry
|
Do you still show the memory address after these changes @d-v-b? Like David and Deepak, I have found that behavior useful (xref #2906 (comment)). |
In this PR I don't think there's any use for displaying the memory address, because for reference, here's the current behavior in this PR: |
|
Using an automatically-chosen name is opt-in. You can always specify your own name if you want: >>> ManagedMemoryStore(name="foo")
ManagedMemoryStore('memory://foo')
>>> ManagedMemoryStore(name=str(id({})))
ManagedMemoryStore('memory://4396301440') |
maxrjones
left a comment
There was a problem hiding this comment.
Thanks for putting this together @d-v-b. the memory:// URL syntax is a nice UX improvement and the weakref-based lifecycle management is a clean design. I have one bug, a couple of design questions, and some nits.
Bug
file:// URLs create a broken LocalStore
In make_store line 380:
elif parsed.scheme == "file" or not parsed.scheme:
return await make_store(Path(store_like), mode=mode, storage_options=storage_options)Path("file:///tmp/data") produces file:/tmp/data, so LocalStore ends up with root file://file:/tmp/data. This should use the parsed path:
return await make_store(Path(parsed.path), mode=mode, storage_options=storage_options)Verified:
>>> from zarr.storage._common import make_store
>>> store = await make_store("file:///tmp/zarr-test", mode="w")
>>> str(store)
'file://file:/tmp/zarr-test'Design questions
Thread safety of _ManagedStoreDictRegistry
get_or_create has a TOCTOU gap between self._registry.get(name) and self._registry[name] = store_dict — two threads creating a store with the same name could produce two separate backing dicts with one silently overwriting the other's registry entry. The _counter increment for auto-generated names has a similar race. The GIL makes the window narrow in CPython, but a threading.Lock around get_or_create would close it properly.
Duplicated memory:// routing in make_store and make_store_path
Both functions independently parse and handle memory:// URLs. The make_store_path version combines the URL path with the explicit path parameter, while make_store only uses the URL path — is that difference intentional? If so, a comment explaining why would help. If not, could make_store_path fall through to make_store for memory URLs like it does for other URL types?
Nits
-
from_urlerror message for non-memory schemes is misleading —ManagedMemoryStore.from_url("file:///tmp/test")says "The store may have been garbage collected" when the real issue is a wrong scheme. Maybe checkparsed.scheme != "memory"and raise a different message for that case vs. a missing registry entry. -
parse_store_urlis called twice inmake_store— once at line 337 (insidestorage_optionsvalidation) and again at line 373 (routing). The first result could be reused. -
TestManagedMemoryStore.set/getfixtures write directly to_store_dict[key]without path prefixing, so theStoreTestsbase class tests don't exercise the path prefix logic. The dedicated path tests cover this, but flagging in case it matters for the base suite's coverage.
|
thanks max! I will fix the bugs. and in general, no duplicated functionality was intended. the needless store / storepath split just makes this duplication very easy to fall into... |
…into feat/memory-store-registry
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3679 +/- ##
==========================================
+ Coverage 93.11% 93.20% +0.09%
==========================================
Files 85 85
Lines 11221 11330 +109
==========================================
+ Hits 10448 10560 +112
+ Misses 773 770 -3
🚀 New features to boost your workflow:
|
Also, change the `ExpectFail` helper class to require the `msg` parameter be set. Allowing a default value of `None` will make all `pytest.raises(case.exception, match=case.msg)` checks pass, which is not the intended behavior.
…/zarr-python into feat/memory-store-registry
This PR adds a new managed memory store class (
ManagedMemoryStore) that allows requesting in-memory stores with URL syntax like"memory://my-store/"That's the upside.The downside is that users can't use their own mutable mappings with this store, because we can't make weak references to generic mutable mappings, and so tracking user-defined mutable mappings would keep them from being garbage collected. Instead, instances of
ManagedMemoryStoreare constructed with name and path parameters.A note about pickling:
ManagedMemoryStorecan be pickled, but if it's unpickled in a separate process from where it was created, an exception is raised. Otherwise, writes in a multiprocessing context would appear successful when in fact they were writing to totally separate stores. I don't think there are any threading issues to worry about, but someone who knows more than me should check that.Basic usage:
closes #2906