shared-tensor 0.2.2__tar.gz → 0.2.4__tar.gz

@@ -1,6 +1,6 @@
  Metadata-Version: 2.4
  Name: shared-tensor
- Version: 0.2.2
+ Version: 0.2.4
  Summary: Local endpoint-oriented RPC for same-host same-GPU PyTorch IPC
  Author-email: Athena Team <contact@world-sim-dev.org>
  Maintainer-email: Athena Team <contact@world-sim-dev.org>
@@ -25,6 +25,7 @@ Classifier: Topic :: System :: Distributed Computing
  Requires-Python: >=3.10
  Description-Content-Type: text/markdown
  License-File: LICENSE
+ Requires-Dist: cloudpickle>=3.0.0
  Requires-Dist: numpy<2
  Requires-Dist: requests>=2.25.0
  Requires-Dist: torch>=2.2.0
@@ -63,6 +64,7 @@ Dynamic: license-file
  - task-backed slow object construction
  - endpoint-level serialization and cache-key singleflight
  - zero-branch auto mode driven by `SHARED_TENSOR_ROLE`
+ - auto mode is gated by `SHARED_TENSOR_ENABLED=1`
  - port routing by `base_port + cuda_device_index`

  ## What It Does Not Support
@@ -104,9 +106,40 @@ conda activate shared-tensor-dev
  pip install -e ".[dev,test]"
  ```

- ## Zero-Branch Example
+ ## Enabling Auto Mode

- One file:
+ `SharedTensorProvider()` now defaults to safe local mode unless you explicitly enable shared-tensor behavior.
+
+ Global default:
+
+ ```bash
+ export SHARED_TENSOR_ENABLED=1
+ ```
+
+ Per-provider override:
+
+ ```python
+ provider = SharedTensorProvider(enabled=True)
+ provider = SharedTensorProvider(enabled=False)
+ provider = SharedTensorProvider(enabled=None)
+ ```
+
+ `enabled=None` means do not override and keep using the environment variable.
+
+ Then `execution_mode="auto"` behaves like this:
+
+ - `enabled=False`: provider stays in local mode
+ - `enabled=True` and `SHARED_TENSOR_ROLE=server`: auto-start server and execute locally on the server side
+ - `enabled=True` and no role set: provider becomes a client wrapper
+ - `enabled=None`: fall back to `SHARED_TENSOR_ENABLED`
+
+ This makes accidental opt-in much less likely in scripts that import shared endpoints but did not intend to start RPC behavior.
+
+ ## Example 1: Zero-Branch Auto Mode
+
+ See [examples/zero_branch_env.py](./examples/zero_branch_env.py).
+
+ One file, two processes, no branch in user code:

  ```python
  import torch
@@ -141,55 +174,56 @@ if __name__ == "__main__":
  Run process A as the auto server:

  ```bash
- SHARED_TENSOR_ROLE=server python demo.py
+ SHARED_TENSOR_ENABLED=1 SHARED_TENSOR_ROLE=server python demo.py
  ```

  Run process B as the client with the exact same file:

  ```bash
+ SHARED_TENSOR_ENABLED=1 python demo.py
+ ```
+
+ Equivalent stepwise form:
+
+ ```bash
+ export SHARED_TENSOR_ENABLED=1
+ SHARED_TENSOR_ROLE=server python demo.py
  python demo.py
  ```

  Behavior:

+ - `SHARED_TENSOR_ENABLED=1` enables shared-tensor auto behavior for providers that keep `enabled=None`
  - `SHARED_TENSOR_ROLE=server` makes the provider auto-start a background localhost daemon
  - in the server process, shared functions still execute locally
  - in the client process, the same function names become RPC wrappers
  - no `SHARED_TENSOR_HOST` is used; transport is fixed to `127.0.0.1`
  - the final port is `SHARED_TENSOR_BASE_PORT + current_cuda_device_index`

- ## Endpoint Semantics
-
- Each endpoint is registered once and then supports two client-side call styles.
+ Why this works:

- - `fn(...)` or `provider.call(name, ...)`
- - synchronous
- - blocks until the result is ready
- - `fn.submit(...)` or `provider.submit(name, ...)`
- - asynchronous
- - returns a task id
+ ```text
+ same code file

- Endpoint options:
+ Process A Process B
+ SHARED_TENSOR_ENABLED=1 SHARED_TENSOR_ENABLED=1
+ SHARED_TENSOR_ROLE=server SHARED_TENSOR_ROLE unset
+ ---------------------------------- ----------------------------------
+ provider.share(...) provider.share(...)
+ provider auto-starts localhost daemon provider builds RPC wrappers
+ shared fn executes locally shared fn becomes RPC call

- - `execution="direct"`
- - sync calls run the function directly on the server
- - best for fast tensor transforms
- - `execution="task"`
- - sync calls still block, but they block on the task system
- - async submit is the natural path for slow model construction
- - `concurrency="parallel"`
- - multiple server executions may run at once
- - `concurrency="serialized"`
- - only one execution of that endpoint runs at a time
- - `singleflight=True`
- - identical in-flight cache keys collapse to one execution
- - this is the recommended model-loading default
+ load_model(...) load_model(...)
+ -> local CUDA model -> JSON-RPC to localhost daemon
+ identity(x) -> receives CUDA IPC-backed result
+ -> local tensor return
+ ```

- ## Common Scenarios
+ Use this mode when you want the cleanest operator experience: one script, one env var difference, server side stays local, client side becomes remote automatically.

- ### 1. Fast Tensor Transform
+ ## Example 2: Fast Tensor Transform

- Use this for cheap operations such as clone, view-like transforms, elementwise scaling, or lightweight preprocessing.
+ See [examples/model_service.py](./examples/model_service.py).

  ```python
  @provider.share(execution="direct", cache=False)
@@ -197,6 +231,14 @@ def scale_tensor(tensor: torch.Tensor, factor: torch.Tensor) -> torch.Tensor:
  return tensor * factor
  ```

+ What happens on the wire:
+
+ ```text
+ client tensor -> direct RPC -> server runs function immediately -> CUDA result back
+ ```
+
+ Use this for cheap tensor math, lightweight preprocessing, and request-scoped outputs.
+
  Recommended combination:

  - `execution="direct"`
@@ -204,130 +246,120 @@ Recommended combination:
  - `managed=False`
  - `concurrency="parallel"`

- Why:
-
- - direct execution has the lowest overhead
- - these calls are request-scoped, so caching is usually wrong
- - parallel execution is usually fine because the work is short-lived
+ ## Example 3: Reusable Model Service

- ### 2. Slow Model Construction
-
- Use this for loading or building a CUDA model that may take hundreds of milliseconds or multiple seconds.
+ See [examples/model_service.py](./examples/model_service.py).

  ```python
  @provider.share(
  execution="task",
  managed=True,
  concurrency="serialized",
- cache_format_key="model:{hidden_size}",
+ cache_format_key="model:{input_dim}:{output_dim}",
  )
- def load_model(hidden_size: int) -> torch.nn.Module:
- return torch.nn.Linear(hidden_size, 2, device="cuda")
+ def load_linear_model(input_dim: int = 16, output_dim: int = 4) -> torch.nn.Module:
+ ...
  ```

- Recommended combination:
-
- - `execution="task"`
- - `managed=True`
- - `concurrency="serialized"`
- - `singleflight=True`
- - explicit `cache_format_key`
-
- Why:
-
- - task mode gives you both blocking sync calls and true async submission
- - managed handles let the client release the remote object explicitly
- - serialized execution avoids multiple concurrent heavy loads on one GPU
- - singleflight prevents duplicate in-flight construction for the same model key
-
- ### 3. Reusable Shared Model Service
+ What happens when two clients ask for the same model key:

- Use this when the model should be built once and reused by many client calls.
+ ```text
+ Client A Server Client B
+ ------------------------ ------------------------ ------------------------
+ call("load_model", k) -----> cache miss call("load_model", k)
+ build object once -------------> same key in flight
+ object_id = obj-123 wait on same future
+ <------------------------- return handle(obj-123) <------------- return handle(obj-123)

- ```python
- @provider.share(
- execution="task",
- managed=True,
- cache_format_key="model:{model_name}:{dtype}",
- )
- def load_model(model_name: str, dtype: str) -> torch.nn.Module:
- ...
+ release(obj-123) ----------> refcount 2 -> 1
+ release(obj-123) ------------------------------------------> refcount 1 -> 0 -> destroy
  ```

- Recommended combination:
+ Use this for big reusable models. The important mix is:

- - `cache=True`
+ - `execution="task"`
  - `managed=True`
+ - `concurrency="serialized"`
  - `singleflight=True`
- - explicit stable cache key
-
- Why:
+ - explicit `cache_format_key`

- - caching makes the endpoint act like a model registry
- - managed handles keep explicit lifecycle control
- - stable cache keys prevent accidental duplication from argument shape changes
+ `managed=True` gives explicit lifecycle control. `cache_format_key` turns the endpoint into a model registry. `singleflight=True` ensures duplicate in-flight loads collapse to one build.

- ### 4. Fire-and-Poll Background Warmup
+ ## Example 4: Fire-And-Poll Warmup

- Use this when the caller should not block, for example prewarming a model or allocating a large reusable tensor in the background.
+ This is the same task-backed endpoint style, but the caller chooses async use:

  ```python
  task_id = load_model.submit(hidden_size=8192)
  model_handle = provider.wait_for_task(task_id)
  ```

- Recommended combination:
-
- - endpoint uses `execution="task"`
- - caller uses `.submit(...)`
- - optionally add `managed=True` for long-lived objects
-
- Why:
+ Runtime shape:

- - the endpoint stays declarative
- - the caller decides whether to block now or poll later
+ ```text
+ submit now -> task queue -> slow build on server -> poll later -> consume handle/result
+ ```

- ### 5. Strictly Non-Reusable Per-Request Work
+ Use this when the build is slow enough that the caller should not block immediately.

- Use this when every request must create a fresh result and reuse is wrong.
+ ## Example 5: Serialized Fragile Path

  ```python
- @provider.share(execution="task", cache=False, singleflight=False)
- def build_request_tensor(template: torch.Tensor) -> torch.Tensor:
- return template.clone()
+ @provider.share(execution="task", concurrency="serialized", cache=False, singleflight=False)
+ def compact_memory(tensor: torch.Tensor) -> torch.Tensor:
+ ...
  ```

- Recommended combination:
-
- - `cache=False`
- - `singleflight=False`
- - choose `execution="direct"` or `execution="task"` based on runtime cost
+ Execution model:

- Why:
+ ```text
+ request A -> lock -> run -> unlock
+ request B -> wait -> lock -> run -> unlock
+ ```

- - disabling cache avoids cross-request reuse
- - disabling singleflight ensures independent requests stay independent
+ Use this for GPU-heavy paths that must not overlap with themselves.

- ### 6. Endpoint That Must Run One At A Time
+ ## Endpoint Semantics

- Use this when the endpoint mutates shared state, temporarily spikes memory, or must not overlap with itself.
+ Each endpoint is registered once and then supports two client-side call styles.

- ```python
- @provider.share(execution="task", concurrency="serialized", cache=False, singleflight=False)
- def compact_memory(tensor: torch.Tensor) -> torch.Tensor:
- ...
- ```
+ - `fn(...)` or `provider.call(name, ...)`
+ - synchronous
+ - blocks until the result is ready
+ - `fn.submit(...)` or `provider.submit(name, ...)`
+ - asynchronous
+ - returns a task id

- Recommended combination:
+ Endpoint options:

+ - `execution="direct"`
+ - sync calls run the function directly on the server
+ - use this for fast tensor transforms
  - `execution="task"`
+ - sync calls still block, but they block on the task system
+ - use this for slow construction, warmup, and reusable model loading
+ - `concurrency="parallel"`
+ - multiple server executions may run at once
  - `concurrency="serialized"`
- - usually `cache=False`
-
- Why:
+ - only one execution of that endpoint runs at a time
+ - `singleflight=True`
+ - identical in-flight cache keys collapse to one execution
+ - this is the recommended model-loading default

- - serialization is endpoint-wide, not just per cache key
- - useful for fragile GPU-heavy paths where overlap is unsafe or wasteful
+ ## Scenario Map
+
+ - Fast tensor transform:
+ use `execution="direct"`, `cache=False`, `managed=False`
+ - Slow model construction:
+ use `execution="task"`, `managed=True`, `concurrency="serialized"`
+ - Reusable model registry:
+ add stable `cache_format_key` and keep `singleflight=True`
+ - Background warmup:
+ keep endpoint as task-backed and use `.submit(...)`
+ - Fragile non-overlapping GPU path:
+ use `concurrency="serialized"`
+ - Fresh per-request work:
+ disable cache and usually disable singleflight

  ## Parameter Guide

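The README above fixes the transport to `127.0.0.1` and routes the port as `SHARED_TENSOR_BASE_PORT + current_cuda_device_index`. As a minimal illustration of that rule only, under stated assumptions: the helper name below is hypothetical and not a shared-tensor API, and the default base port 2537 is taken from the provider signature later in this diff.

```python
# Illustration of the port-routing rule stated in the README diff above.
# resolve_port is a hypothetical helper, not part of the shared-tensor package.
import os

import torch


def resolve_port(base_port: int = 2537) -> int:
    # SHARED_TENSOR_BASE_PORT overrides the default base port when set.
    base = int(os.getenv("SHARED_TENSOR_BASE_PORT", str(base_port)))
    # Each CUDA device gets its own localhost port: base + device index.
    device_index = torch.cuda.current_device() if torch.cuda.is_available() else 0
    return base + device_index


# Example: with SHARED_TENSOR_BASE_PORT=2537 and cuda:1 active, the daemon port is 2538.
```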
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

  [project]
  name = "shared-tensor"
- version = "0.2.2"
+ version = "0.2.4"
  description = "Local endpoint-oriented RPC for same-host same-GPU PyTorch IPC"
  readme = "README.md"
  license = "Apache-2.0"
@@ -42,6 +42,7 @@ classifiers = [
  ]
  requires-python = ">=3.10"
  dependencies = [
+ "cloudpickle>=3.0.0",
  "numpy<2",
  "requests>=2.25.0",
  "torch>=2.2.0",
@@ -19,4 +19,4 @@ __all__ = [
  "TaskStatus",
  ]

- __version__ = "0.2.2"
+ __version__ = "0.2.4"
@@ -15,6 +15,7 @@ class AsyncSharedTensorProvider(SharedTensorProvider):
  base_port: int = 2537,
  poll_interval: float = 1.0,
  *,
+ enabled: bool | None = None,
  server_host: str = "127.0.0.1",
  device_index: int | None = None,
  timeout: float = 30.0,
@@ -23,6 +24,7 @@ class AsyncSharedTensorProvider(SharedTensorProvider):
  ) -> None:
  super().__init__(
  base_port=base_port,
+ enabled=enabled,
  server_host=server_host,
  device_index=device_index,
  timeout=timeout,
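A tiny usage sketch for the keyword added above; the import path is an assumption, not shown in this diff:

```python
# Usage sketch only: AsyncSharedTensorProvider now forwards `enabled` to the
# SharedTensorProvider base class, so the same opt-in rules apply to it.
from shared_tensor import AsyncSharedTensorProvider  # assumed import path

provider = AsyncSharedTensorProvider(enabled=True)   # force shared-tensor behavior
provider = AsyncSharedTensorProvider(enabled=None)   # defer to SHARED_TENSOR_ENABLED
```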
@@ -21,6 +21,7 @@ from shared_tensor.utils import (

  EndpointExecution = Literal["direct", "task"]
  EndpointConcurrency = Literal["parallel", "serialized"]
+ SHARED_TENSOR_ENABLED_ENV = "SHARED_TENSOR_ENABLED"


  @dataclass(slots=True)
@@ -36,8 +37,14 @@ class EndpointDefinition:
  singleflight: bool = True


- def _resolve_execution_mode(execution_mode: str) -> tuple[str, bool]:
+ def _resolve_execution_mode(
+ execution_mode: str,
+ *,
+ enabled: bool | None = None,
+ ) -> tuple[str, bool]:
  if execution_mode == "auto":
+ if not _is_shared_tensor_enabled(enabled):
+ return "local", True
  env_role = os.getenv("SHARED_TENSOR_ROLE", "").strip().lower()
  if env_role in {"server", "client", "local"}:
  return env_role, True
@@ -49,6 +56,13 @@ def _resolve_execution_mode(execution_mode: str) -> tuple[str, bool]:
  return execution_mode, False


+ def _is_shared_tensor_enabled(enabled: bool | None) -> bool:
+ if enabled is not None:
+ return enabled
+ raw = os.getenv(SHARED_TENSOR_ENABLED_ENV, "").strip().lower()
+ return raw in {"1", "true", "yes", "on"}
+
+
  def _validate_endpoint_options(
  *,
  execution: EndpointExecution,
@@ -71,15 +85,20 @@ class SharedTensorProvider:
  self,
  base_port: int = 2537,
  *,
+ enabled: bool | None = None,
  server_host: str = "127.0.0.1",
  device_index: int | None = None,
  timeout: float = 30.0,
  execution_mode: str = "auto",
  verbose_debug: bool = False,
  ) -> None:
- resolved_mode, auto_mode = _resolve_execution_mode(execution_mode)
+ resolved_mode, auto_mode = _resolve_execution_mode(
+ execution_mode,
+ enabled=enabled,
+ )
  self.server_host = server_host
  self.base_port = resolve_server_base_port(base_port)
+ self.enabled = enabled
  self.device_index = device_index
  self.timeout = timeout
  self.execution_mode = resolved_mode
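To make the gating concrete, here is a standalone sketch of how `execution_mode="auto"` resolves under the change above. It re-states the logic for illustration only; it is not the package's code, and the "no role set means client" default is taken from the README portion of this diff rather than from lines visible in this hunk.

```python
# Standalone sketch of the auto-mode resolution added above (illustration only).
import os


def sketch_resolve_auto(enabled: bool | None) -> str:
    # Gate: an explicit constructor argument wins; None falls back to the env var.
    if enabled is None:
        raw = os.getenv("SHARED_TENSOR_ENABLED", "").strip().lower()
        enabled = raw in {"1", "true", "yes", "on"}
    if not enabled:
        return "local"  # disabled providers stay in safe local mode
    role = os.getenv("SHARED_TENSOR_ROLE", "").strip().lower()
    if role in {"server", "client", "local"}:
        return role
    return "client"  # assumed default when no role is set, per the README above


os.environ.pop("SHARED_TENSOR_ROLE", None)
os.environ.pop("SHARED_TENSOR_ENABLED", None)
assert sketch_resolve_auto(None) == "local"   # nothing opted in -> local
assert sketch_resolve_auto(False) == "local"  # explicit opt-out always wins
assert sketch_resolve_auto(True) == "client"  # opted in, no role -> client wrapper
os.environ["SHARED_TENSOR_ROLE"] = "server"
assert sketch_resolve_auto(True) == "server"  # opted in, server role -> auto server
```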
@@ -3,6 +3,7 @@
  from __future__ import annotations

  import logging
+ import cloudpickle
  import multiprocessing as mp
  import os
  import threading
@@ -10,7 +11,6 @@ import time
  from concurrent.futures import Future
  from dataclasses import dataclass
  from http.server import BaseHTTPRequestHandler, HTTPServer
- from multiprocessing.process import BaseProcess
  from socketserver import ThreadingMixIn
  from typing import Any

@@ -37,7 +37,7 @@ from shared_tensor.utils import (
  capability_snapshot,
  deserialize_payload,
  serialize_payload,
- validate_payload_for_transport,
+ validate_call_payload_for_transport,
  )

  logger = logging.getLogger(__name__)
@@ -118,7 +118,7 @@ class SharedTensorServer:
  self.max_workers = max_workers
  self.result_ttl = result_ttl
  self.server: ThreadedHTTPServer | None = None
- self.server_process: BaseProcess | None = None
+ self.server_process: Any | None = None
  self.running = False
  self.started_at: float | None = None
  self.stats = {
@@ -231,6 +231,12 @@ class SharedTensorServer:
  if inflight_key is not None:
  future, owner = self._acquire_inflight(inflight_key)
  if not owner:
+ if definition.managed:
+ payload = future.result()
+ object_id = payload.get("object_id")
+ if object_id is not None:
+ self._managed_objects.add_ref(object_id)
+ return payload
  return future.result()
  else:
  future = None
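The hunk above closes a lifecycle gap: when singleflight collapses concurrent loads of a managed object, each non-owner waiter now adds a reference to the shared `object_id` before returning, so every client's later release decrements its own count. A toy counter (not the package's `_managed_objects` implementation) sketches the intended arithmetic, matching the refcount diagram in the README portion of this diff.

```python
# Toy stand-in for a managed-object refcount registry (illustration only).
from dataclasses import dataclass, field


@dataclass
class ToyManagedRegistry:
    refs: dict[str, int] = field(default_factory=dict)

    def register(self, object_id: str) -> None:
        self.refs[object_id] = 1      # the owner that built the object holds one ref

    def add_ref(self, object_id: str) -> None:
        self.refs[object_id] += 1     # every singleflight waiter adds one more

    def release(self, object_id: str) -> bool:
        self.refs[object_id] -= 1
        if self.refs[object_id] == 0:
            del self.refs[object_id]  # last release destroys the remote object
            return True
        return False


registry = ToyManagedRegistry()
registry.register("obj-123")                   # Client A's call builds the object
registry.add_ref("obj-123")                    # Client B waited on the same future
assert registry.release("obj-123") is False    # first release: refcount 2 -> 1
assert registry.release("obj-123") is True     # second release: 1 -> 0 -> destroy
```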
@@ -401,8 +407,8 @@ class SharedTensorServer:
  "Control encoding is reserved for empty args/kwargs only"
  )
  return endpoint, args, kwargs
- validate_payload_for_transport(args)
- validate_payload_for_transport(kwargs, allow_dict_keys=True)
+ validate_call_payload_for_transport(args)
+ validate_call_payload_for_transport(kwargs, allow_dict_keys=True)
  return endpoint, args, kwargs

  def _encode_result(self, value: Any, *, object_id: str | None = None) -> dict[str, str | None]:
@@ -437,7 +443,7 @@ class SharedTensorServer:
  uptime = 0.0 if self.started_at is None else time.time() - self.started_at
  return {
  "server": "SharedTensorServer",
- "version": "0.2.2",
+ "version": "0.2.4",
  "host": self.host,
  "port": self.port,
  "uptime": uptime,
@@ -454,16 +460,52 @@ class SharedTensorServer:
  return
  if os.name != "posix":
  raise SharedTensorConfigurationError(
- "Non-blocking shared_tensor servers require POSIX fork semantics"
+ "Non-blocking shared_tensor servers require POSIX multiprocessing support"
  )
- ctx = mp.get_context("fork")
- process = ctx.Process(target=self._serve_forever, name=f"shared-tensor-daemon:{self.port}")
+ payload = cloudpickle.dumps(self.provider)
+ process = mp.get_context("spawn").Process(
+ target=self._serve_forever_from_payload,
+ args=(
+ payload,
+ self.host,
+ self.port,
+ self.max_request_bytes,
+ self.max_workers,
+ self.result_ttl,
+ self.verbose_debug,
+ ),
+ name=f"shared-tensor-daemon:{self.port}",
+ )
  process.start()
  self.server_process = process
  self.running = True
  self.started_at = time.time()

+ @staticmethod
+ def _serve_forever_from_payload(
+ payload: bytes,
+ host: str,
+ port: int,
+ max_request_bytes: int,
+ max_workers: int,
+ result_ttl: float,
+ verbose_debug: bool,
+ ) -> None:
+ SharedTensorServer._configure_cuda_runtime()
+ provider = cloudpickle.loads(payload)
+ server = SharedTensorServer(
+ provider,
+ host=host,
+ port=port,
+ max_request_bytes=max_request_bytes,
+ max_workers=max_workers,
+ result_ttl=result_ttl,
+ verbose_debug=verbose_debug,
+ )
+ server._serve_forever()
+
  def _serve_forever(self) -> None:
+ self._configure_cuda_runtime()
  self.server = ThreadedHTTPServer((self.host, self.port), SharedTensorRequestHandler)
  self.server.shared_tensor_server = self # type: ignore[attr-defined]
  self.running = True
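Replacing the fork-based daemon with a spawn context plus cloudpickle is the standard way to hand a non-importable object (such as a provider built in `__main__`) to a freshly spawned process: serialize it to bytes in the parent, rebuild it in the child. A self-contained sketch of that pattern, unrelated to shared-tensor's classes:

```python
# Generic spawn + cloudpickle hand-off sketch (names here are illustrative only).
import multiprocessing as mp

import cloudpickle


def child_main(payload: bytes) -> None:
    fn = cloudpickle.loads(payload)  # rebuild the callable inside the spawned child
    fn()


def main() -> None:
    message = "hello from the spawned child"

    def greet() -> None:             # nested function: plain pickle cannot ship this
        print(message)

    payload = cloudpickle.dumps(greet)
    process = mp.get_context("spawn").Process(target=child_main, args=(payload,))
    process.start()
    process.join()


if __name__ == "__main__":
    main()
```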
@@ -5,6 +5,7 @@ from __future__ import annotations
  import hashlib
  import inspect
  import io
+ import multiprocessing.reduction as mp_reduction
  import os
  import pickle
  from collections.abc import Callable
@@ -35,7 +36,17 @@ _EMPTY_DICT: dict[str, Any] = {}
  def _torch_forking_pickler() -> type | None:
  if TORCH_MODULE is None:
  return None
- return cast(type, TORCH_MODULE.multiprocessing.reductions.ForkingPickler)
+ reductions = TORCH_MODULE.multiprocessing.reductions
+ init_reductions = getattr(reductions, "init_reductions", None)
+ if callable(init_reductions):
+ try:
+ init_reductions()
+ except Exception:
+ return cast(type, mp_reduction.ForkingPickler)
+ pickler = getattr(reductions, "ForkingPickler", None)
+ if pickler is not None:
+ return cast(type, pickler)
+ return cast(type, mp_reduction.ForkingPickler)


  def _raise_unsupported_payload(message: str) -> None:
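The fallback above covers torch builds where `torch.multiprocessing.reductions` no longer re-exports its own `ForkingPickler`; calling `init_reductions()` registers the CUDA-aware reducers on the standard-library pickler instead, which is then used as the last resort. A short sketch of serializing a CUDA tensor through that path, assuming a CUDA device is available:

```python
# Sketch: CUDA-IPC-aware tensor serialization via ForkingPickler (assumes CUDA).
import io
import pickle
from multiprocessing.reduction import ForkingPickler

import torch
from torch.multiprocessing.reductions import init_reductions

init_reductions()  # register torch's CUDA/shared-memory reducers on ForkingPickler

tensor = torch.ones(4, device="cuda")
buffer = io.BytesIO()
ForkingPickler(buffer, pickle.HIGHEST_PROTOCOL).dump(tensor)
payload = buffer.getvalue()  # bytes carrying CUDA IPC handles, valid on the same host only
```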
@@ -91,11 +102,55 @@ def _validate_torch_payload(obj: Any, *, allow_dict_keys: bool = False) -> None:
  _raise_unsupported_payload(f"Unsupported payload type: {type(obj).__name__}")


+ def _validate_call_payload(obj: Any, *, allow_dict_keys: bool = False) -> None:
+ if TORCH_MODULE is None:
+ raise SharedTensorCapabilityError("PyTorch is required for shared_tensor")
+
+ if isinstance(obj, (str, int, float, bool, type(None), bytes)):
+ return
+
+ if isinstance(obj, TORCH_MODULE.Tensor):
+ if not obj.is_cuda:
+ _raise_unsupported_payload("CPU torch.Tensor payloads are not supported")
+ return
+
+ if isinstance(obj, TORCH_MODULE.nn.Module):
+ _validate_module_device(obj)
+ return
+
+ if isinstance(obj, tuple):
+ for item in obj:
+ _validate_call_payload(item)
+ return
+
+ if isinstance(obj, list):
+ for item in obj:
+ _validate_call_payload(item)
+ return
+
+ if isinstance(obj, dict):
+ for key, value in obj.items():
+ if allow_dict_keys:
+ if not isinstance(key, str):
+ _raise_unsupported_payload("Dictionary payload keys must be strings")
+ else:
+ _validate_call_payload(key)
+ _validate_call_payload(value)
+ return
+
+ _raise_unsupported_payload(f"Unsupported payload type: {type(obj).__name__}")
+
+
  def validate_payload_for_transport(obj: Any, *, allow_dict_keys: bool = False) -> None:
  """Validate that a payload fits the supported CUDA torch transport contract."""
  _validate_torch_payload(obj, allow_dict_keys=allow_dict_keys)


+ def validate_call_payload_for_transport(obj: Any, *, allow_dict_keys: bool = False) -> None:
+ """Validate RPC call args/kwargs, allowing scalar controls alongside CUDA payloads."""
+ _validate_call_payload(obj, allow_dict_keys=allow_dict_keys)
+
+
  def _torch_serialize(obj: Any) -> bytes:
  pickler_cls = _torch_forking_pickler()
  if pickler_cls is None:
@@ -140,8 +195,8 @@ def serialize_call_payloads(
  return CONTROL_ENCODING, args_payload, serialize_empty_payload(_EMPTY_DICT)[1]

  try:
- _validate_torch_payload(args)
- _validate_torch_payload(kwargs, allow_dict_keys=True)
+ _validate_call_payload(args)
+ _validate_call_payload(kwargs, allow_dict_keys=True)
  return TORCH_ENCODING, pickle.dumps(args, protocol=pickle.HIGHEST_PROTOCOL), pickle.dumps(
  kwargs, protocol=pickle.HIGHEST_PROTOCOL
  )
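The new `validate_call_payload_for_transport` relaxes the original validator for call arguments: scalars, `bytes`, and `None` may ride alongside CUDA tensors, while CPU tensors are still rejected. A short sketch of what the rules above accept and reject, assuming a CUDA-enabled torch install; the import path follows the `shared_tensor.utils` import shown in the server hunk earlier in this diff:

```python
# Sketch of the call-payload rules added above (assumes CUDA-enabled torch).
import torch
from shared_tensor.utils import validate_call_payload_for_transport

cuda_tensor = torch.ones(2, device="cuda")

# Accepted: scalars, bytes, and CUDA tensors, nested in tuples/lists/dicts.
validate_call_payload_for_transport((cuda_tensor, 2.0, "mode", b"raw"))
validate_call_payload_for_transport({"tensor": cuda_tensor, "steps": 3}, allow_dict_keys=True)

# Rejected: CPU tensors raise the unsupported-payload error.
try:
    validate_call_payload_for_transport(torch.ones(2))  # CPU tensor
except Exception as error:
    print(f"rejected as expected: {error}")
```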