containerio 1.0.0__tar.gz

Metadata-Version: 2.3
Name: containerio
Version: 1.0.0
Summary: Azure Blob Storage handler with unified local/cloud interface
Author: Alliance SwissPass
Requires-Dist: polars>=1.30.0
Requires-Dist: azure-storage-blob>=12.26.0
Requires-Dist: azure-identity>=1.25.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: rich>=13.0.0
Requires-Python: >=3.11
Project-URL: Repository, https://codefloe.com/Alliance-SwissPass/py-containerio
Description-Content-Type: text/markdown

# py-containerio

Shared Azure Blob Storage handler for internal and collaboration projects of [Alliance SwissPass](https://www.allianceswisspass.ch/).
Provides a unified interface for reading and writing data to both the local filesystem and Azure Blob Storage, allowing seamless switching between local development and cloud environments.

## Features

- **Unified API**: Same code works for local files and Azure Blob Storage
- **Environment-based switching**: Use `STORAGE_ENV` to switch between local/staging/prod
- **[Polars](https://pola.rs/) integration**: Native support for Polars DataFrames and LazyFrames
- **Multiple auth methods**: Device code flow (default), client credentials (CI/CD), or SAS tokens
- **CLI tools**: Unified `io` command for listing, downloading, uploading, and deleting blobs, and for generating SAS tokens
- **Memory-efficient**: Streaming support with `scan_*` and `sink_*` methods

## Install containerio

### From PyPI

The package is available on PyPI: [https://pypi.org/project/containerio/](https://pypi.org/project/containerio/)

```bash
pip install containerio
```

Or with uv:

```bash
uv add containerio
uv sync
```

## Use containerio as a package dependency

### From PyPI (recommended)

Add to your project's `pyproject.toml`:

```toml
[project]
dependencies = [
    "containerio",
]
```

Then run:

```bash
uv sync
```

### From Forgejo Package Registry

If you need the version published to the Forgejo package registry instead, configure a custom index:

```toml
[project]
dependencies = [
    "containerio",
]

[tool.uv.sources]
containerio = { index = "forgejo" }

[[tool.uv.index]]
name = "forgejo"
url = "https://codefloe.com/api/packages/Alliance-SwissPass/pypi/simple"
```

Then run:

```bash
uv sync
```

### From Git

For development or testing, install directly from the repository:

```toml
[project]
dependencies = [
    "containerio",
]

[tool.uv.sources]
containerio = { git = "https://codefloe.com/Alliance-SwissPass/py-containerio.git", tag = "v1.0.0" }
```

## Usage in Other Projects

When installed as a dependency, containerio integrates seamlessly with your project:

- **CLI tools are available**: run `uv run io` directly from your project
- **`.env` files live in your project**: the storage module reads `.env` and `.env.staging` from your current working directory, not from the containerio package
- **No configuration inside containerio needed**: just add the dependency and configure everything in your project

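Roughly, the project-local lookup behaves like the sketch below. This is an illustration of the described behaviour, not containerio's actual code, and the helper name is hypothetical:

```python
from pathlib import Path


def env_file_for(storage_env: str) -> Path:
    """Resolve the dotenv file for a STORAGE_ENV value.

    Staging reads `.env.staging`, prod reads `.env`, and both are looked
    up in the current working directory (your project), not inside the
    installed containerio package.
    """
    name = ".env.staging" if storage_env == "staging" else ".env"
    return Path.cwd() / name
```

Because the lookup is anchored at `Path.cwd()`, each consuming project keeps its own credentials next to its own code.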
Example workflow in your project:

```bash
# Your project directory
cd my-project

# Create your .env.staging file here
cp /path/to/.env.example .env.staging
# Edit .env.staging with your credentials

# List blobs in both containers
uv run io ls --env staging

# List blobs in local mode (./data/ directory)
uv run io ls

# Generate SAS tokens (updates .env.staging)
uv run io generate-sas --env staging --update-env
```

## Example Usage

```python
import os

from containerio import StorageHandler

# Set environment (defaults to "local", which uses the ./data/ directory)
os.environ["STORAGE_ENV"] = "staging"  # or "prod"

# Read operations (uses the read container)
handler = StorageHandler(container="ro")

# Read files
df = handler.read_csv_blob("path/to/file.csv")
df = handler.read_excel_blob("path/to/file.xlsx")
df, metadata = handler.read_parquet_blob("path/to/file.parquet")

# Lazy scan (memory-efficient for large files)
lf, metadata = handler.scan_parquet_blob("large_file.parquet")
lf = handler.scan_csv_blob("large_file.csv")

# Write operations (uses the write container)
handler_rw = StorageHandler(container="rw")

handler_rw.write_csv_blob("output.csv", df)
handler_rw.write_parquet_blob("output.parquet", df)
handler_rw.write_excel_blob("output.xlsx", df)

# Stream a LazyFrame directly to storage
handler_rw.sink_parquet_blob("output.parquet", lf)

# Upload any file (JSON, YAML, images, etc.)
handler_rw.upload_blob("config.json", "./local/config.json")

# List, download, delete, move
blobs = handler.list_blobs()  # Returns List[BlobInfo]
handler.download_blob("file")  # Download to local ./data/
handler_rw.delete_blob("file")  # Delete a blob or folder
handler_rw.move_blob("old.csv", "archive/old.csv")  # Move/rename
```

## Configuration

### Environment Variables

Set `STORAGE_ENV` to control the storage backend:

| Value | Behavior |
|-------|----------|
| `local` (default) | Uses local filesystem (`./data/` directory) |
| `staging` | Uses Azure staging storage (reads from `.env.staging`) |
| `prod` | Uses Azure production storage (reads from `.env`) |

### Azure Configuration

For Azure storage, create a `.env` file (for prod) or `.env.staging` (for staging).

**Important:** Restrict permissions on your `.env` file so only you can access it:

```bash
chmod 600 .env
```

#### Example `.env` (production)

```bash
AZURE_STORAGE_ACCOUNT=<azure_storage_account>
AZURE_CONTAINER_NAME_READ_PROD=<azure_container_name_read_prod>
AZURE_CONTAINER_NAME_WRITE_PROD=<azure_container_name_write_prod>
AZURE_TENANT_ID=<azure_tenant_id>
AZURE_CLIENT_ID=<azure_client_id>

# SAS tokens (optional, if not using device code auth)
AZURE_STORAGE_SAS_TOKEN_READ=<azure_storage_sas_token_read>
AZURE_STORAGE_SAS_TOKEN_WRITE=<azure_storage_sas_token_write>
```

#### Example `.env.staging`

```bash
AZURE_STORAGE_ACCOUNT=<azure_storage_account>
AZURE_CONTAINER_NAME_READ_STAGING=<azure_container_name_read_staging>
AZURE_CONTAINER_NAME_WRITE_STAGING=<azure_container_name_write_staging>
AZURE_TENANT_ID=<azure_tenant_id>
AZURE_CLIENT_ID=<azure_client_id>

# SAS tokens (optional, if not using device code auth)
AZURE_STORAGE_SAS_TOKEN_READ=<azure_storage_sas_token_read>
AZURE_STORAGE_SAS_TOKEN_WRITE=<azure_storage_sas_token_write>
```

### Authentication

Use `io auth` to manage authentication explicitly.
Device-code login and status auto-discover the environment from your `.env` / `.env.staging` file, so `--env` is optional.
SAS token and client credentials login require an explicit `--env`:

```bash
# Device code login (default); auto-discovers env from the .env file
uv run io auth

# Device code login with explicit env
uv run io auth --env staging

# Check auth status across all methods
uv run io auth status

# SAS token login; requires --env (wrong env = wrong token)
uv run io auth login --method sas-token --env staging

# Client credentials; requires --env (for CI/CD pipelines)
uv run io auth login --method client-credentials --env prod

# Clear cached device code tokens
uv run io auth clear

# Force re-authentication (clears cache first)
uv run io auth login --force
```

The chosen auth method is saved to the session. Subsequent commands (`ls`, `download`, etc.) automatically use it.

### Authentication Methods

1. **Device Code Flow** (recommended for development)
   - Interactive browser-based authentication
   - Tokens are cached locally for 90 days
   - No credentials stored in files

2. **Client Credentials**
   - For CI/CD pipelines
   - Use `io auth login --method client-credentials` to authenticate
   - Requires:
     - `AZURE_TENANT_ID`: Service principal's tenant ID
     - `AZURE_CLIENT_ID_CONTAINER`: Service principal's client ID
     - `AZURE_CLIENT_SECRET_CONTAINER_PROD`: Service principal's client secret for production
     - `AZURE_CLIENT_SECRET_CONTAINER_STAGING`: Service principal's client secret for staging

3. **SAS Token**
   - Use pre-generated tokens with limited validity
   - Good for sharing temporary access

## CLI Tools

The package provides a unified `io` command with subcommands for all storage operations. All commands accept `--env prod|staging` to override `STORAGE_ENV`.

| Command | Usage | Description |
|---------|-------|-------------|
| `auth` | `uv run io auth [login\|status\|clear] [--method M] [--force]` | Manage authentication |
| `ls` | `uv run io ls [--container ro\|rw\|ro-rw]` | List blobs in storage |
| `download` | `uv run io download <blob_name> [--container ro\|rw]` | Download a blob to the local `data/` directory |
| `upload` | `uv run io upload <blob_name> <file_path>` | Upload a local file to the write container |
| `rm` | `uv run io rm <path>` | Delete a blob or folder from the write container |
| `mv` | `uv run io mv <source> <dest> [--overwrite]` | Move (rename) a blob within the write container |
| `generate-sas` | `uv run io generate-sas [--container ro\|rw\|ro-rw] [--days N] [--update-env]` | Generate Azure Storage SAS tokens |

### Examples

```bash
# Authenticate with device code flow (auto-discovers env)
uv run io auth

# Check authentication status (auto-discovers env)
uv run io auth status

# List blobs in local mode (./data/)
uv run io ls

# List both read and write containers (default --container ro-rw)
uv run io ls --env staging

# Download a blob to the local data/ directory
uv run io download path/to/file.csv --env staging

# Upload a local file to the write container
uv run io upload path/to/blob.csv ./local/file.csv --env staging

# Delete a blob or folder from the write container
uv run io rm path/to/blob.csv --env staging
uv run io rm old-folder/ --env staging

# Move a blob within the write container
uv run io mv data/old.csv archive/old.csv --env staging

# Overwrite an existing destination
uv run io mv data/old.csv data/existing.csv --env staging --overwrite

# Generate SAS tokens and update the .env.staging file
uv run io generate-sas --env staging --update-env

# Generate only a read container token, valid for 3 days
uv run io generate-sas --env prod --container ro --days 3
```

### Session

When running multiple operations against the same container, use `session` to save defaults so you don't have to repeat `--env` and `--container` on every command.
Both `--env` and `--container` are required to start a session.

```bash
# Start a session (saves env and container)
uv run io session --env staging --container rw

# Show the active session
uv run io session

# These now use the saved defaults
uv run io ls
uv run io rm old-folder/
uv run io mv data/old.csv archive/old.csv

# Explicit flags still override the session
uv run io ls --container ro

# End the session
uv run io session clear
```

The session is stored in `.containerio/session.json` in the current directory. It is project-local and git-ignored.

The session is also available programmatically via the `Session` class:

```python
from containerio.session import Session

session = Session()
print(session.auth_method)  # e.g. 'client-credentials'
print(session.env)  # e.g. 'staging'
print(session.is_active)  # True if any session data exists
```

Options for `generate-sas`:

- `--env`: `prod` or `staging` (default: staging for `generate-sas`; other commands fall back to `STORAGE_ENV`)
- `--container`: `ro`, `rw`, or `ro-rw` (default: `ro-rw`)
- `--days`: Token validity in days, max 7 (default: 1)
- `--update-env`: Write tokens to the `.env` file instead of printing them

**Note:** SAS tokens are generated using a *user delegation key*, which has a maximum validity of **7 days** (an Azure limitation). See the [Microsoft documentation](https://learn.microsoft.com/en-us/rest/api/storageservices/get-user-delegation-key#request-body) for details.

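For illustration, generating such a token with `azure-storage-blob` looks roughly like this. The helper names are hypothetical and this is a sketch, not the code behind `io generate-sas`:

```python
from datetime import datetime, timedelta, timezone

MAX_SAS_DAYS = 7  # Azure cap for user delegation keys


def sas_window(days: int) -> tuple[datetime, datetime]:
    # Clamp the requested validity to the 7-day Azure maximum.
    start = datetime.now(timezone.utc)
    return start, start + timedelta(days=min(days, MAX_SAS_DAYS))


def container_read_sas(service_client, container: str, days: int = 1) -> str:
    """Create a read/list user delegation SAS for one container."""
    # Lazy import: only needed when actually talking to Azure.
    from azure.storage.blob import ContainerSasPermissions, generate_container_sas

    start, expiry = sas_window(days)
    key = service_client.get_user_delegation_key(start, expiry)
    return generate_container_sas(
        account_name=service_client.account_name,
        container_name=container,
        user_delegation_key=key,
        permission=ContainerSasPermissions(read=True, list=True),
        start=start,
        expiry=expiry,
    )
```

The delegation key itself is bound to the same window, which is why no SAS issued this way can outlive the 7-day cap.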
## API Reference

### StorageHandler

Main class for storage operations.

```python
StorageHandler(
    container: Literal["ro", "rw"] = "ro",
    auth_type: Optional[str] = None,
    session: Optional[Session] = None,
)
```

- `auth_type=None`: Falls back to the session auth method, then device code flow
- `auth_type="device-code"`: Device code flow (interactive)
- `auth_type="sas-token"`: SAS token authentication
- `auth_type="client-credentials"`: Client credentials (service principal)
- `session=None`: Creates a new `Session` (skipped in local mode). Pass an existing `Session` instance for dependency injection.
- `container="ro"`: Uses the read container (`AZURE_CONTAINER_NAME_READ_{ENV}`)
- `container="rw"`: Uses the write container (`AZURE_CONTAINER_NAME_WRITE_{ENV}`)

#### Read Methods

| Method | Description |
|--------|-------------|
| `read_parquet_blob(blob_name)` | Read parquet file, returns `(DataFrame, metadata)` |
| `read_csv_blob(blob_name, **kwargs)` | Read CSV file into DataFrame |
| `read_excel_blob(blob_name, **kwargs)` | Read Excel file into DataFrame |
| `scan_parquet_blob(blob_name)` | Lazy scan parquet, returns `(LazyFrame, metadata)` |
| `scan_csv_blob(blob_name, **kwargs)` | Lazy scan CSV file |

#### Write Methods

| Method | Description |
|--------|-------------|
| `write_parquet_blob(blob_name, df)` | Write DataFrame to parquet |
| `write_csv_blob(blob_name, df, **kwargs)` | Write DataFrame to CSV |
| `write_excel_blob(blob_name, df, **kwargs)` | Write DataFrame to Excel |
| `sink_parquet_blob(blob_name, lf)` | Stream LazyFrame to parquet |

#### Utility Methods

| Method | Description |
|--------|-------------|
| `upload_blob(blob_name, file_path, progress_hook)` | Upload a local file to storage |
| `delete_blob(blob_name)` | Delete a blob, file, or folder recursively. Returns `List[str]` of deleted paths |
| `move_blob(source, destination, overwrite)` | Move (rename) a blob within the same container |
| `download_blob(blob_name, progress_hook)` | Download blob to local `data/` directory. Returns `Optional[Path]` |
| `list_blobs()` | List all blobs, returns `List[BlobInfo]` |

### AzureStorageConfig

Configuration dataclass for Azure storage settings.

```python
from containerio import AzureStorageConfig

# Load from environment
config = AzureStorageConfig.from_environment()  # Uses STORAGE_ENV
config = AzureStorageConfig.from_environment(environment="staging")

# Access configuration
print(config.storage_account)
print(config.container_name_read)
```

### Environment

Enum for storage environments.

```python
from containerio import Environment

# Parse from string
env = Environment.from_string("prod")  # Environment.PROD

# Use enum directly
if env == Environment.LOCAL:
    print("Using local storage")
```

## Development

```bash
# Clone and install
git clone ssh://git@codefloe.com/Alliance-SwissPass/py-containerio.git
cd py-containerio
uv sync --group dev

# Run tests
uv run pytest

# Lint and format
uv run ruff check
uv run ruff format
```

## Version Management with git-sv

This project uses [git-sv](https://github.com/thegeeklab/git-sv) for semantic versioning and changelog generation.

The `.gitsv/config.yaml` file defines how versions are bumped based on commit messages.

1. **Commit messages**: Follow the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) format
2. **Version bump**: The CI pipeline automatically runs `git-sv bump` based on commit history
3. **Changelog**: The CI pipeline generates the changelog using `git-sv changelog` and pins it as a Git issue
4. **Release**: The CI pipeline automatically handles tagging and publishing

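Under conventional commits, the commit type drives the bump. The exact mapping is whatever `.gitsv/config.yaml` defines; a typical scheme looks like:

```
fix: handle empty container listing      -> patch (x.y.Z)
feat: add move command to the io CLI     -> minor (x.Y.0)
feat!: drop support for Python 3.10     -> major (X.0.0)
```
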
The CI pipeline will automatically:

- Bump the version based on commit history
- Generate and pin the changelog as a Git issue
- Build and publish the package to Forgejo
- Create a release with the changelog notes

## Publishing

### Via CI Pipeline (recommended)

The repository includes a Crow CI pipeline that automatically builds, tests, and publishes the package when a git tag is pushed:

1. Commit and push your changes
2. Create and push a git tag (e.g., `git tag v1.0.0 && git push --tags`)

The pipeline will:

- Run linting and tests
- Update the version in `pyproject.toml` automatically
- Generate a changelog from git history using `git-sv`
- Build and publish the package to Forgejo and PyPI
- Create a release with changelog notes
- Push version updates back to the repository

**Note:** For CI publishing, ensure the `UV_PUBLISH_TOKEN` secret is configured in your Crow CI pipeline settings.

## Security Best Practices

1. **Never commit credentials**: Always add `.env*` to your `.gitignore` file
2. **Restrict file permissions**: Use `chmod 600 .env` to limit access to your credentials
3. **Use short-lived tokens**: SAS tokens have a maximum validity of 7 days
4. **Rotate secrets regularly**: Especially for production environments and CI/CD pipelines
5. **Use device code flow in development**: It avoids storing credentials in files
6. **Isolate environments**: Use separate credentials for staging and production
7. **Limit token scope**: When generating SAS tokens, specify the minimum required permissions

## References

- [Delegate access with shared access signature](https://learn.microsoft.com/en-us/rest/api/storageservices/delegate-access-with-shared-access-signature)
- [Create user delegation SAS](https://learn.microsoft.com/en-us/rest/api/storageservices/create-user-delegation-sas)
- [Get user delegation key](https://learn.microsoft.com/en-us/rest/api/storageservices/get-user-delegation-key)

## License

Alliance SwissPass <https://allianceswisspass.ch>