pos3-0.1.0.tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
pos3-0.1.0/PKG-INFO ADDED
@@ -0,0 +1,152 @@
Metadata-Version: 2.4
Name: pos3
Version: 0.1.0
Summary: S3 Simple Sync - Make using S3 as simple as using local files
Author-email: Positronic Robotics <hi@positronic.ro>
License: Apache-2.0
Project-URL: Homepage, https://github.com/Positronic-Robotics/pos3
Project-URL: Repository, https://github.com/Positronic-Robotics/pos3
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: boto3>=1.26.0
Requires-Dist: tqdm>=4.65.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: pre-commit; extra == "dev"

# pos3

**PO**sitronic **S3** — Make using S3 as simple as using local files.

`pos3` provides a Pythonic context manager for syncing directories and files with S3. It is designed for data processing pipelines and machine learning workflows where you need to integrate S3 with code that **only understands local files**.

> The main value of `pos3` is enabling you to pass S3 data to **third-party libraries or legacy scripts** that expect local file paths (e.g., `opencv`, `pandas.read_csv`, or model training scripts). Instead of rewriting their I/O logic to support S3, `pos3` transparently bridges the gap.

## Core Concepts

- **Context Manager**: All operations run within a `with pos3.mirror():` block.
  - **Enter**: Initializes the sync environment (threads, cache).
  - **Body**: You explicitly call `pos3.download()` to fetch files and `pos3.upload()` to register outputs.
  - **Exit**: Uploads registered output paths (mirroring local to S3).
- **Lazy & Efficient**: Only transfers files that have changed (based on size/presence).
- **Local Paths**: All API calls return a `pathlib.Path` to the local file/directory. If you pass a local path instead of an S3 URL, it is passed through unchanged (no copy).
- **Background Sync**: Can optionally upload changes in the background (e.g., every 60s) for long-running jobs.

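The **Local Paths** behavior can be illustrated with a small sketch of how an S3 URL plausibly maps onto the cache. The `url_to_cache_path` helper below is illustrative only, not part of the pos3 API; the `<cache_root>/bucket/key` layout is inferred from the Quick Start comments.

```python
from pathlib import Path

def url_to_cache_path(url: str, cache_root: str = "~/.cache/positronic/s3") -> Path:
    """Illustrative helper: map s3://bucket/key under the cache root,
    and pass local paths through unchanged (no copy)."""
    if not url.startswith("s3://"):
        return Path(url)                      # local path: returned as-is
    bucket_and_key = url[len("s3://"):]       # "bucket/key"
    return Path(cache_root).expanduser() / bucket_and_key

print(url_to_cache_path("s3://bucket/data"))  # e.g. ~/.cache/positronic/s3/bucket/data
print(url_to_cache_path("/tmp/local.csv"))    # /tmp/local.csv (unchanged)
```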
## Quick Start

The primary API is the `pos3.mirror()` context manager.

```python
import pos3

# 1. Start the context
with pos3.mirror(cache_root='~/.cache/positronic/s3'):

    # 2. Download input
    # - Downloads s3://bucket/data to cache
    # - Deletes local files that don't exist in S3 (mirroring)
    # - Returns a local Path object
    dataset_path = pos3.download('s3://bucket/data')

    # 3. Sync output (resume & upload)
    # - Downloads existing checkpoints (to resume)
    # - Registers the path for background uploads
    checkpoints_path = pos3.sync('s3://bucket/ckpt', interval=60, delete_remote=False)

    # 4. Upload logs (write-only)
    # - Creates the local directory
    # - Uploads new files to S3 on exit/interval
    logs_path = pos3.upload('s3://bucket/logs', interval=30)

    # 5. Use standard local file paths
    print(f"Reading from {dataset_path}")    # -> ~/.cache/positronic/s3/bucket/data
    print(f"Writing to {checkpoints_path}")  # -> ~/.cache/positronic/s3/bucket/ckpt
    print(f"Logging to {logs_path}")         # -> ~/.cache/positronic/s3/bucket/logs

    train(dataset_path, checkpoints_path, logs_path)
```

## API Guide

> **Note**: All operational methods (`download`, `upload`, `sync`, `ls`) must be called within an active `pos3.mirror()` context. Calling them outside will raise a `RuntimeError`.

### `pos3.mirror(...)` / `@pos3.with_mirror(...)`

Context manager (or decorator) that activates the sync environment.

**Parameters:**

- `cache_root` (default: `'~/.cache/positronic/s3/'`): Base directory for caching downloaded files.
- `show_progress` (default: `True`): Display tqdm progress bars.
- `max_workers` (default: `10`): Number of threads for parallel S3 operations.

**Decorator Example:**

```python
@pos3.with_mirror(cache_root='/tmp/cache')
def main():
    # The mirror context is active only while main() runs
    data_path = pos3.download('s3://bucket/data')
    train(data_path)

if __name__ == "__main__":
    main()
```

### `pos3.download(remote, local=None, delete=True, exclude=None)`

Registers a path for download and ensures the local copy matches S3 immediately.

- `remote`: S3 URL (e.g., `s3://bucket/key`) or local path.
- `local`: Explicit local destination. Defaults to the standard cache path.
- `delete`: If `True` (default), deletes local files NOT in S3 ("mirror" behavior).
- `exclude`: List of glob patterns to skip.

**Returns**: `pathlib.Path` to the local directory/file.

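The size/presence-based diffing described under Core Concepts can be modeled in a few lines. This is a toy model of the decision, not pos3's actual implementation:

```python
def plan_download(remote_files: dict, local_files: dict, delete: bool = True):
    """Toy model: decide what to fetch and what to remove locally.
    Both arguments map relative paths to file sizes in bytes."""
    to_fetch = [path for path, size in remote_files.items()
                if local_files.get(path) != size]      # missing locally or size differs
    to_delete = ([path for path in local_files if path not in remote_files]
                 if delete else [])                    # mirror: drop local-only files
    return to_fetch, to_delete

fetch, drop = plan_download(
    remote_files={"a.csv": 10, "b.csv": 20},
    local_files={"a.csv": 10, "stale.csv": 5},
)
print(fetch)  # ['b.csv']
print(drop)   # ['stale.csv']
```

With `delete=False`, `stale.csv` would simply be left in place, which is the non-mirroring behavior.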
### `pos3.upload(remote, local=None, interval=300, delete=True, sync_on_error=False, exclude=None)`

Registers a local path for upload. Uploads on exit and, optionally, in the background.

- `remote`: Destination S3 URL.
- `local`: Local source path. Auto-resolved from the cache path if `None`.
- `interval`: Seconds between background syncs. `None` for exit-only.
- `delete`: If `True` (default), deletes S3 files NOT present locally.
- `sync_on_error`: If `True`, syncs even if the context exits with an exception.

**Returns**: `pathlib.Path` to the local directory/file.

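The `interval` behavior can be approximated with a daemon thread that re-syncs until the context exits, plus one final sync on exit. This is a minimal sketch under that assumption; it is not pos3's real machinery:

```python
import threading
import time

class PeriodicSync:
    """Minimal sketch: call sync_fn every `interval` seconds until stop(),
    then sync once more (mirroring the upload-on-exit behavior)."""
    def __init__(self, sync_fn, interval: float):
        self.sync_fn, self.interval = sync_fn, interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # Event.wait returns True once stop() is called, ending the loop
        while not self._stop.wait(self.interval):
            self.sync_fn()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.sync_fn()  # final sync on exit

calls = []
syncer = PeriodicSync(lambda: calls.append(time.time()), interval=0.05)
syncer.start()
time.sleep(0.12)   # long enough for at least one background sync
syncer.stop()      # plus the guaranteed final sync
```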
### `pos3.sync(remote, local=None, interval=300, delete_local=True, delete_remote=True, sync_on_error=False, exclude=None)`

Bi-directional helper. Performs `download()` and then registers `upload()`. Useful for jobs that work on existing files, such as resuming training from a checkpoint.

- `delete_local`: Clean up local files during download.
- `delete_remote`: Clean up remote files during upload. Consider setting this to `False` when resuming jobs, to avoid deleting history.

**Returns**: `pathlib.Path` to the local directory/file.

### `pos3.ls(prefix, recursive=False)`

Lists files/objects under a local directory or S3 prefix.

- `prefix`: S3 URL or local path.
- `recursive`: If `True`, also lists the contents of subdirectories.

**Returns**: List of full S3 URLs or local paths.

## Comparison with Other Libraries

Why use `pos3` instead of other Python libraries?

| Feature | `pos3` | `boto3` | `s3fs` / `fsspec` |
| :--- | :--- | :--- | :--- |
| **Abstraction Level** | **High** (Context Manager) | **Low** (API Client) | **Medium** (File System) |
| **Sync Logic** | **Built-in** (Differential) | Manual Implementation | `put`/`get` (Recursive) |
| **Lifecycle** | **Automated** (Open/Close) | Manual | Manual |
| **Background Upload** | **Yes** (Non-blocking) | Manual Threading | No (Blocking) |
| **Local I/O Speed** | **Native** (SSD) | Native | Network-bound (Virtual FS) |
| **Use Case** | **ML / Pipelines / 3rd-Party Code** | App Development | DataFrames / Interactive |

- **vs `boto3`**: `boto3` is the raw AWS SDK. `pos3` wraps it to provide "mirroring" logic, threading, and diffing out of the box.
- **vs `s3fs`**: `s3fs` treats S3 as a filesystem. `pos3` treats S3 as a persistence layer for your high-speed local storage, so reads and writes always run at native I/O speed.

pos3-0.1.0/README.md ADDED
@@ -0,0 +1,130 @@
# pos3

**PO**sitronic **S3** — Make using S3 as simple as using local files.

`pos3` provides a Pythonic context manager for syncing directories and files with S3. It is designed for data processing pipelines and machine learning workflows where you need to integrate S3 with code that **only understands local files**.

> The main value of `pos3` is enabling you to pass S3 data to **third-party libraries or legacy scripts** that expect local file paths (e.g., `opencv`, `pandas.read_csv`, or model training scripts). Instead of rewriting their I/O logic to support S3, `pos3` transparently bridges the gap.

## Core Concepts

- **Context Manager**: All operations run within a `with pos3.mirror():` block.
  - **Enter**: Initializes the sync environment (threads, cache).
  - **Body**: You explicitly call `pos3.download()` to fetch files and `pos3.upload()` to register outputs.
  - **Exit**: Uploads registered output paths (mirroring local to S3).
- **Lazy & Efficient**: Only transfers files that have changed (based on size/presence).
- **Local Paths**: All API calls return a `pathlib.Path` to the local file/directory. If you pass a local path instead of an S3 URL, it is passed through unchanged (no copy).
- **Background Sync**: Can optionally upload changes in the background (e.g., every 60s) for long-running jobs.

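The enter/body/exit lifecycle above can be modeled with `contextlib`. This is a toy model only; `toy_mirror` is illustrative and not part of pos3:

```python
from contextlib import contextmanager

@contextmanager
def toy_mirror(uploaded: list):
    """Toy lifecycle: the body registers output paths; exit flushes them."""
    registered = []                  # enter: set up sync state
    try:
        yield registered.append      # body: stand-in for pos3.upload() registration
    finally:
        uploaded.extend(registered)  # exit: stand-in for the upload-on-exit sync

uploaded = []
with toy_mirror(uploaded) as register:
    register("s3://bucket/logs")     # registered, but not uploaded yet
    assert uploaded == []
print(uploaded)  # ['s3://bucket/logs']
```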
## Quick Start

The primary API is the `pos3.mirror()` context manager.

```python
import pos3

# 1. Start the context
with pos3.mirror(cache_root='~/.cache/positronic/s3'):

    # 2. Download input
    # - Downloads s3://bucket/data to cache
    # - Deletes local files that don't exist in S3 (mirroring)
    # - Returns a local Path object
    dataset_path = pos3.download('s3://bucket/data')

    # 3. Sync output (resume & upload)
    # - Downloads existing checkpoints (to resume)
    # - Registers the path for background uploads
    checkpoints_path = pos3.sync('s3://bucket/ckpt', interval=60, delete_remote=False)

    # 4. Upload logs (write-only)
    # - Creates the local directory
    # - Uploads new files to S3 on exit/interval
    logs_path = pos3.upload('s3://bucket/logs', interval=30)

    # 5. Use standard local file paths
    print(f"Reading from {dataset_path}")    # -> ~/.cache/positronic/s3/bucket/data
    print(f"Writing to {checkpoints_path}")  # -> ~/.cache/positronic/s3/bucket/ckpt
    print(f"Logging to {logs_path}")         # -> ~/.cache/positronic/s3/bucket/logs

    train(dataset_path, checkpoints_path, logs_path)
```

## API Guide

> **Note**: All operational methods (`download`, `upload`, `sync`, `ls`) must be called within an active `pos3.mirror()` context. Calling them outside will raise a `RuntimeError`.

### `pos3.mirror(...)` / `@pos3.with_mirror(...)`

Context manager (or decorator) that activates the sync environment.

**Parameters:**

- `cache_root` (default: `'~/.cache/positronic/s3/'`): Base directory for caching downloaded files.
- `show_progress` (default: `True`): Display tqdm progress bars.
- `max_workers` (default: `10`): Number of threads for parallel S3 operations.

**Decorator Example:**

```python
@pos3.with_mirror(cache_root='/tmp/cache')
def main():
    # The mirror context is active only while main() runs
    data_path = pos3.download('s3://bucket/data')
    train(data_path)

if __name__ == "__main__":
    main()
```

### `pos3.download(remote, local=None, delete=True, exclude=None)`

Registers a path for download and ensures the local copy matches S3 immediately.

- `remote`: S3 URL (e.g., `s3://bucket/key`) or local path.
- `local`: Explicit local destination. Defaults to the standard cache path.
- `delete`: If `True` (default), deletes local files NOT in S3 ("mirror" behavior).
- `exclude`: List of glob patterns to skip.

**Returns**: `pathlib.Path` to the local directory/file.

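The `exclude` patterns can be understood as glob filters over relative paths. The sketch below uses the standard library's `fnmatch`; pos3's exact matching semantics may differ:

```python
from fnmatch import fnmatch

def filter_excluded(paths, exclude=None):
    """Keep only paths that match none of the exclude glob patterns."""
    patterns = exclude or []
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]

kept = filter_excluded(
    ["data/a.csv", "data/tmp/a.tmp", "logs/run.log"],
    exclude=["*.tmp", "logs/*"],
)
print(kept)  # ['data/a.csv']
```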
### `pos3.upload(remote, local=None, interval=300, delete=True, sync_on_error=False, exclude=None)`

Registers a local path for upload. Uploads on exit and, optionally, in the background.

- `remote`: Destination S3 URL.
- `local`: Local source path. Auto-resolved from the cache path if `None`.
- `interval`: Seconds between background syncs. `None` for exit-only.
- `delete`: If `True` (default), deletes S3 files NOT present locally.
- `sync_on_error`: If `True`, syncs even if the context exits with an exception.

**Returns**: `pathlib.Path` to the local directory/file.

### `pos3.sync(remote, local=None, interval=300, delete_local=True, delete_remote=True, sync_on_error=False, exclude=None)`

Bi-directional helper. Performs `download()` and then registers `upload()`. Useful for jobs that work on existing files, such as resuming training from a checkpoint.

- `delete_local`: Clean up local files during download.
- `delete_remote`: Clean up remote files during upload. Consider setting this to `False` when resuming jobs, to avoid deleting history.

**Returns**: `pathlib.Path` to the local directory/file.

### `pos3.ls(prefix, recursive=False)`

Lists files/objects under a local directory or S3 prefix.

- `prefix`: S3 URL or local path.
- `recursive`: If `True`, also lists the contents of subdirectories.

**Returns**: List of full S3 URLs or local paths.

## Comparison with Other Libraries

Why use `pos3` instead of other Python libraries?

| Feature | `pos3` | `boto3` | `s3fs` / `fsspec` |
| :--- | :--- | :--- | :--- |
| **Abstraction Level** | **High** (Context Manager) | **Low** (API Client) | **Medium** (File System) |
| **Sync Logic** | **Built-in** (Differential) | Manual Implementation | `put`/`get` (Recursive) |
| **Lifecycle** | **Automated** (Open/Close) | Manual | Manual |
| **Background Upload** | **Yes** (Non-blocking) | Manual Threading | No (Blocking) |
| **Local I/O Speed** | **Native** (SSD) | Native | Network-bound (Virtual FS) |
| **Use Case** | **ML / Pipelines / 3rd-Party Code** | App Development | DataFrames / Interactive |

- **vs `boto3`**: `boto3` is the raw AWS SDK. `pos3` wraps it to provide "mirroring" logic, threading, and diffing out of the box.
- **vs `s3fs`**: `s3fs` treats S3 as a filesystem. `pos3` treats S3 as a persistence layer for your high-speed local storage, so reads and writes always run at native I/O speed.