pydorky 2.1.8
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.devcontainer/devcontainer.json +17 -0
- package/.github/FUNDING.yml +15 -0
- package/.github/workflows/e2e-integration.yml +57 -0
- package/.github/workflows/publish.yml +24 -0
- package/.nvmrc +1 -0
- package/LICENSE +21 -0
- package/README.md +156 -0
- package/bin/index.js +19 -0
- package/bin/legacy.js +432 -0
- package/docs/doc#1 get-started/README.md +105 -0
- package/docs/doc#2 features-wishlist +33 -0
- package/docs/doc#2.5 python-port +31 -0
- package/docs/doc#3 the-correct-node-version +107 -0
- package/docs/doc#4 why-where-python +42 -0
- package/docs/doc#5 how-do-endpoints-cli-work +0 -0
- package/dorky-usage-aws.svg +1 -0
- package/dorky-usage-google-drive.svg +1 -0
- package/google-drive-credentials.json +16 -0
- package/openapi/openapi.yaml +257 -0
- package/package.json +46 -0
- package/python-client/README.md +19 -0
- package/python-client/dorky_client/__init__.py +3 -0
- package/python-client/dorky_client/client.py +32 -0
- package/python-client/pyproject.toml +13 -0
- package/python-client/tests/test_integration.py +20 -0
- package/rectdorky.png +0 -0
- package/server/index.js +193 -0
- package/server/package.json +12 -0
- package/todo/01-core-infrastructure.md +84 -0
- package/todo/02-storage-providers.md +104 -0
- package/todo/03-compression-formats.md +94 -0
- package/todo/04-python-client.md +126 -0
- package/todo/05-metadata-versioning.md +116 -0
- package/todo/06-performance-concurrency.md +130 -0
- package/todo/07-security-encryption.md +114 -0
- package/todo/08-developer-experience.md +175 -0
- package/todo/README.md +37 -0
- package/web-app/README.md +70 -0
- package/web-app/package-lock.json +17915 -0
- package/web-app/package.json +43 -0
- package/web-app/public/favicon.ico +0 -0
- package/web-app/public/index.html +43 -0
- package/web-app/public/logo192.png +0 -0
- package/web-app/public/logo512.png +0 -0
- package/web-app/public/manifest.json +25 -0
- package/web-app/public/robots.txt +3 -0
- package/web-app/src/App.css +23 -0
- package/web-app/src/App.js +84 -0
- package/web-app/src/App.test.js +8 -0
- package/web-app/src/PrivacyPolicy.js +26 -0
- package/web-app/src/TermsAndConditions.js +41 -0
- package/web-app/src/index.css +3 -0
- package/web-app/src/index.js +26 -0
- package/web-app/src/logo.svg +1 -0
- package/web-app/src/reportWebVitals.js +13 -0
- package/web-app/src/setupTests.js +5 -0
- package/web-app/tailwind.config.js +10 -0

package/todo/01-core-infrastructure.md
@@ -0,0 +1,84 @@
# Core Infrastructure

Core HTTP service, CLI foundation, and Node version alignment.

## P0: HTTP Service Foundation

### Upload & Download
- [ ] Finalize OpenAPI spec (multipart form, JSON responses, error codes)
- [ ] Implement `POST /artifacts` endpoint with streaming support
- [ ] Implement `GET /artifacts/{id}` endpoint with streaming download
- [ ] Add HTTP 201 Created response for uploads
- [ ] Add HTTP 200 OK for downloads
- [ ] Handle large file uploads (>100MB) efficiently

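A minimal sketch of how a client might exercise the two endpoints above once they exist, using `requests`; the base URL, multipart field name, and `id` response field are assumptions rather than the finalized OpenAPI contract.

```python
# Illustrative only: calls the planned POST /artifacts and GET /artifacts/{id}
# endpoints. Base URL, "file" form field, and "id" response key are assumed.
import requests

BASE_URL = "http://localhost:8080"  # assumed local dorky server

def upload(path):
    """Upload a file as multipart form data; the spec calls for 201 Created."""
    with open(path, "rb") as fh:
        resp = requests.post(f"{BASE_URL}/artifacts", files={"file": fh})
    resp.raise_for_status()
    return resp.json()["id"]

def download(artifact_id, dest):
    """Stream the artifact to disk in 1 MiB chunks instead of buffering it."""
    with requests.get(f"{BASE_URL}/artifacts/{artifact_id}", stream=True) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                out.write(chunk)

artifact_id = upload("model.bin")
download(artifact_id, "model-copy.bin")
```

For uploads larger than 100MB the multipart body itself would also need to be streamed (for example with `requests-toolbelt`), which is what the last checklist item above is about.
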
### Metadata Operations
- [ ] Implement `GET /artifacts/{id}/metadata` endpoint
- [ ] Implement `PATCH /artifacts/{id}/metadata` endpoint
- [ ] Support custom key-value metadata
- [ ] Store `content_hash`, `file_size`, `last_updated`, `uploaded_by`

### Idempotency
- [ ] Add idempotency key support in upload endpoint
- [ ] Store idempotency key → artifact ID mapping
- [ ] Return cached result if same key uploaded twice
- [ ] Auto-clean old idempotency entries (TTL)

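A sketch of the idempotent upload behaviour described above: sending the same key twice should return the cached artifact rather than create a duplicate. The `Idempotency-Key` header name and response shape are assumptions, not the finalized API.

```python
# Hypothetical idempotent upload: the "Idempotency-Key" header and "id" field
# are assumptions about the eventual API, shown only to illustrate the flow.
import uuid
import requests

BASE_URL = "http://localhost:8080"  # assumed local dorky server

def upload_idempotent(path, key):
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{BASE_URL}/artifacts",
            files={"file": fh},
            headers={"Idempotency-Key": key},
        )
    resp.raise_for_status()
    return resp.json()

key = str(uuid.uuid4())
first = upload_idempotent("data.csv", key)
second = upload_idempotent("data.csv", key)  # same key: server returns cached result
assert first["id"] == second["id"]
```
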
### Health & Status
- [ ] Add `GET /health` endpoint
- [ ] Add `GET /status` with service diagnostics
- [ ] Return storage backend status

## P0: Node Version & CI

### Node Version Management
- [ ] Add `.nvmrc` file with `16` in repo root
- [ ] Add `engines.node` to `package.json` (`>=16`)
- [ ] Update devcontainer image to Node 16 base
- [ ] Document Node version requirement in `README.md` and `docs/doc#1 get-started`

### CI/CD Alignment
- [ ] Ensure publish workflow runs on Node 16
- [ ] Add CI matrix job for Node 16 + Node 20 testing
- [ ] Run integration tests on both versions
- [ ] Plan migration steps to Node 20 (documented)

## P1: CLI Foundation

### Commit/Stage/Push Workflow
- [ ] Implement `dorky init` — scaffold project config
- [ ] Implement `dorky status` — show staged/changed files
- [ ] Implement `dorky stage <path>` — add file to staging
- [ ] Implement `dorky unstage <path>` — remove from staging
- [ ] Implement `dorky commit <message>` — create commit with staged files
- [ ] Implement `dorky push` — upload commits to remote bucket
- [ ] Implement `dorky pull` — download latest from bucket

### Configuration
- [ ] Add `.dorkyrc` project config file support
- [ ] Support environment variables for bucket URL, credentials
- [ ] Store local state in `.dorky/` directory

### Error Handling
- [ ] Add `--verbose` flag for debugging
- [ ] Add `--help` for all commands
- [ ] Return clear error messages with actionable hints

## P2: Testing & Documentation

### Testing
- [ ] Add unit tests for upload/download logic
- [ ] Add integration tests with local server
- [ ] Add E2E tests for CLI workflow
- [ ] Test with various file sizes (small, medium, large >1GB)

### Documentation
- [ ] Document OpenAPI spec with examples
- [ ] Add CLI command reference to README
- [ ] Add architecture diagram
- [ ] Add troubleshooting guide

---

**Status**: Core service + CLI in dev, Node alignment pending
**Next**: Finalize OpenAPI spec, implement upload/download endpoints

package/todo/02-storage-providers.md
@@ -0,0 +1,104 @@
# Storage Providers (BYOB)

Bring-your-own-bucket support for AWS S3, Google Cloud Storage, Azure, and local backends.

## P0: Existing Providers

### AWS S3
- [x] Basic S3 client integration exists
- [ ] Verify streaming upload works (multipart)
- [ ] Verify streaming download works
- [ ] Add region configuration support
- [ ] Test with various file sizes
- [ ] Add S3 access error handling

### Google Cloud Storage (GCS)
- [x] Basic GCS integration exists
- [ ] Verify streaming upload works
- [ ] Verify streaming download works
- [ ] Add project ID configuration
- [ ] Test bucket access and permissions
- [ ] Add GCS-specific error handling

### Google Drive
- [x] Integration exists (documented in docs)
- [ ] Test with OAuth 2.0 flow
- [ ] Verify file sync and versioning

## P1: New Providers

### Azure Blob Storage
- [ ] Implement Azure Blob Storage client
- [ ] Add connection string support
- [ ] Implement streaming upload (block blobs)
- [ ] Implement streaming download
- [ ] Add container creation support
- [ ] Test with various blob sizes

### MinIO / S3-Compatible
- [ ] Implement S3-compatible endpoint support
- [ ] Allow custom endpoint URL configuration
- [ ] Test with MinIO local instance
- [ ] Test with other S3-compatible services (DigitalOcean Spaces, etc.)

### Local Filesystem (Dev/Testing)
- [ ] Implement local filesystem backend
- [ ] Use `.dorky-data/` directory
- [ ] Support all operations (upload, download, metadata, delete)
- [ ] Add for easy local testing without cloud credentials

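The providers above all need the same handful of operations, so a thin backend interface keeps them interchangeable. Below is a sketch in Python for illustration (the Node package would use an equivalent abstraction); the class and method names are hypothetical, with the local filesystem backend shown because it needs no cloud credentials.

```python
# Hypothetical backend interface plus the local .dorky-data/ backend from the
# checklist. Names are illustrative; dorky's real provider API may differ.
from abc import ABC, abstractmethod
import os
import shutil

class StorageBackend(ABC):
    """Operations every bring-your-own-bucket backend would have to support."""

    @abstractmethod
    def upload(self, key, stream): ...

    @abstractmethod
    def download(self, key, dest_stream): ...

    @abstractmethod
    def delete(self, key): ...

class LocalFilesystemBackend(StorageBackend):
    """Dev/testing backend that stores blobs under .dorky-data/."""

    def __init__(self, root=".dorky-data"):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key)

    def upload(self, key, stream):
        path = self._path(key)
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out)

    def download(self, key, dest_stream):
        with open(self._path(key), "rb") as src:
            shutil.copyfileobj(src, dest_stream)

    def delete(self, key):
        os.remove(self._path(key))
```
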
## P2: Bucket Configuration & Management

### Bucket Provisioning
- [ ] Implement `dorky bucket create` command
- [ ] Auto-detect cloud provider (S3, GCS, Azure)
- [ ] Create bucket with proper permissions
- [ ] Store credentials securely

### Bucket Operations
- [ ] Implement `dorky bucket list` — list all configured buckets
- [ ] Implement `dorky bucket delete` — remove bucket reference
- [ ] Implement `dorky bucket validate` — test connection and permissions
- [ ] Add bucket selection for multi-bucket projects

### Credentials Management
- [ ] Support AWS IAM credentials (env vars or ~/.aws/credentials)
- [ ] Support GCS service account keys
- [ ] Support Azure connection strings
- [ ] Add `dorky config credentials` command

## P2: Hierarchical Sync

### Folder Structure
- [ ] Create hierarchical folder structure on bucket (e.g., `projects/myproject/data/`)
- [ ] Map local file structure to bucket paths
- [ ] Support nested directories at arbitrary depth

### Sync Operations
- [ ] Implement `dorky sync` — two-way sync of local/remote
- [ ] Implement `dorky upload-folder <path>` — upload entire directory
- [ ] Implement `dorky download-folder <path>` — download entire directory
- [ ] Support `.dorkyignore` patterns (like `.gitignore`)

### Incremental Sync
- [ ] Track synced files and timestamps
- [ ] Only sync changed files on subsequent runs
- [ ] Add `--force` flag to re-sync everything
- [ ] Detect conflicts and prompt user

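A sketch of the bookkeeping behind incremental sync: record each file's modification time and size in a local manifest, then on later runs only sync entries whose signature changed. The `.dorky/sync-state.json` path and format are assumptions, not dorky's actual on-disk state.

```python
# Illustrative change tracking for incremental sync.
import json
import os

STATE_FILE = ".dorky/sync-state.json"

def load_state():
    try:
        with open(STATE_FILE) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}

def files_to_sync(root, force=False):
    state = {} if force else load_state()  # --force re-syncs everything
    pending = []
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            stat = os.stat(path)
            signature = [stat.st_mtime, stat.st_size]
            if state.get(path) != signature:
                pending.append(path)
            state[path] = signature
    os.makedirs(os.path.dirname(STATE_FILE), exist_ok=True)
    with open(STATE_FILE, "w") as fh:
        json.dump(state, fh)
    return pending
```
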
## P3: Advanced Features

### Multi-Cloud
- [ ] Support multiple buckets across different cloud providers
- [ ] Implement replication across providers
- [ ] Add failover logic

### Performance
- [ ] Add concurrent uploads to multiple buckets (backup)
- [ ] Add bandwidth limiting for uploads/downloads
- [ ] Cache bucket listings for faster operations

---

**Status**: S3, GCS, Google Drive exist; Azure/MinIO/local pending
**Next**: Test existing providers, implement Azure + local filesystem

package/todo/03-compression-formats.md
@@ -0,0 +1,94 @@
# Compression & Data Formats

Compress artifacts during upload and support data format conversions (CSV to Parquet, etc.).

## P1: Core Compression

### Gzip
- [ ] Implement gzip compression in upload flow
- [ ] Store `Content-Encoding: gzip` in metadata
- [ ] Auto-decompress on download
- [ ] Add `--compress=gzip` CLI flag
- [ ] Add compression level configuration (1-9)

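A minimal sketch of the gzip path described above: compress through a stream so large artifacts are never fully buffered, with the level mapped to the 1-9 range from the checklist, and decompress transparently when metadata says the content is gzipped. Wiring into the dorky upload flow is not shown.

```python
# Streaming gzip compression/decompression around upload and download.
import gzip
import shutil

def compress_for_upload(src, dest, level=6):
    """Write a gzip copy of src to dest (level 1 = fastest, 9 = smallest)."""
    with open(src, "rb") as fin, gzip.open(dest, "wb", compresslevel=level) as fout:
        shutil.copyfileobj(fin, fout)

def decompress_after_download(src, dest):
    """Undo the compression when metadata records Content-Encoding: gzip."""
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)

compress_for_upload("report.json", "report.json.gz", level=9)
```
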
### LZ4 (Fast Compression)
- [ ] Add LZ4 compression support (faster than gzip)
- [ ] Store compression type in metadata
- [ ] Add `--compress=lz4` CLI flag
- [ ] Test performance vs gzip

### Brotli
- [ ] Add Brotli compression support (better ratio)
- [ ] Add `--compress=brotli` CLI flag
- [ ] Test performance vs gzip/lz4

### Decompression on Download
- [ ] Auto-detect compression from metadata
- [ ] Decompress transparently during download
- [ ] Support partial decompression (streaming)
- [ ] Add `--no-decompress` flag to keep compressed

## P1: Compression Configuration

### Project-Level Config
- [ ] Add compression preference in `.dorkyrc`
- [ ] Support per-file pattern compression (e.g., `*.json` → gzip)
- [ ] Add `dorky config compression` command

### Metadata Tracking
- [ ] Store `compression_type` in metadata
- [ ] Store `compressed_size` alongside `file_size`
- [ ] Store `compression_ratio` for reporting
- [ ] Add `--show-compression` flag to display stats

## P2: Data Format Conversions

### CSV to Parquet
- [ ] Create Python worker service for conversions
- [ ] Implement CSV → Parquet conversion
- [ ] Infer schema from CSV headers
- [ ] Support schema config file (JSON)
- [ ] Add `dorky convert csv-to-parquet <input> <output>` command

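The conversion itself is a few lines with `pyarrow`, which infers the schema from the CSV headers; a sketch of what the Python worker would run, without the CLI wiring around it.

```python
# CSV -> Parquet conversion as the Python worker might perform it.
import pyarrow.csv as pv
import pyarrow.parquet as pq

def csv_to_parquet(csv_path, parquet_path):
    table = pv.read_csv(csv_path)        # infers column names and types
    pq.write_table(table, parquet_path)  # columnar output, compressed by default

csv_to_parquet("events.csv", "events.parquet")
```
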
### JSON Compression
- [ ] Minify JSON before compression
- [ ] Remove whitespace and comments
- [ ] Add `--minify-json` flag
- [ ] Store original schema separately for reconstruction

### Format Detection
- [ ] Auto-detect file type on upload (.json, .csv, .parquet, etc.)
- [ ] Suggest optimal compression based on file type
- [ ] Add `--format` hint flag for ambiguous files

## P2: Data Quality & Validation

### Integrity Checking
- [ ] Store SHA256 hash of original file in metadata
- [ ] Verify hash on download before decompression
- [ ] Add `--verify` flag for integrity check
- [ ] Report any corrupted files

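A sketch of the integrity check: hash the original file before upload, store the digest as `content_hash`, and re-hash on download (before decompression) to catch corruption. How the digest reaches metadata is not shown.

```python
# SHA256 integrity verification for uploaded and downloaded artifacts.
import hashlib

def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

expected = sha256_of("dataset.parquet")      # stored as content_hash on upload
# ... later, after downloading ...
if sha256_of("downloaded.parquet") != expected:
    raise ValueError("content_hash mismatch: downloaded file is corrupted")
```
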
### Schema Validation
- [ ] Optional schema validation for structured formats
- [ ] Store schema in `.dorkyschema` or metadata
- [ ] Validate before uploading
- [ ] Warn on schema changes

## P3: Advanced Conversions

### Other Formats
- [ ] Parquet → Arrow IPC
- [ ] JSON → MessagePack
- [ ] Binary format conversions
- [ ] Add `dorky convert list` to show available conversions

### Streaming Conversions
- [ ] Stream large CSV → Parquet without buffering
- [ ] Support incremental conversions
- [ ] Allow on-demand conversion (download → convert → return)

---

**Status**: Compression infrastructure pending, format conversions planned
**Next**: Implement gzip/lz4, add compression config, create Python conversion service

package/todo/04-python-client.md
@@ -0,0 +1,126 @@
# Python Client (PyPI)

Lightweight Python package on PyPI that mirrors the Node client feature set.

## P0: Core Blocking Client

### Basic Operations
- [x] Scaffold `python-client/` with `pyproject.toml`
- [x] Implement `DorkyClient.upload()` (blocking)
- [x] Implement `DorkyClient.download()` (blocking)
- [ ] Add `DorkyClient.get_metadata()` method
- [ ] Add `DorkyClient.delete()` method
- [ ] Verify streaming for large files (no buffering)

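A usage sketch of the existing blocking client; the constructor argument and exact method signatures are assumptions based on the checklist above, not the published `dorky_client` API.

```python
# Assumed usage of the blocking client; real signatures live in
# python-client/dorky_client/client.py and may differ from this sketch.
from dorky_client import DorkyClient

client = DorkyClient(base_url="http://localhost:8080")  # assumed argument

artifact_id = client.upload("model.bin")        # blocking upload
client.download(artifact_id, "model-copy.bin")  # blocking download

# Planned additions from the checklist above:
# metadata = client.get_metadata(artifact_id)
# client.delete(artifact_id)
```
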
### Idempotency & Conflict Handling
- [ ] Add idempotency key support in `upload()`
- [ ] Handle 409 Conflict responses
- [ ] Add retry logic with exponential backoff
- [ ] Support `--force` option to overwrite

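The retry item above amounts to wrapping the HTTP call in exponential backoff; a generic sketch follows, with delays and attempt counts as illustrative defaults rather than dorky's settings.

```python
# Exponential backoff with jitter around an arbitrary operation (e.g. upload).
import random
import time

def with_retries(operation, attempts=5, base_delay=0.5):
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of retries: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)              # 0.5s, 1s, 2s, 4s, ... plus jitter

# with_retries(lambda: client.upload("data.csv"))
```
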
### Error Handling
- [ ] Catch and wrap HTTP errors
- [ ] Provide clear error messages
- [ ] Add retry-after handling (429 Rate Limited)
- [ ] Add timeout configuration

## P1: Async Client

### Async Implementation
- [ ] Create `AsyncDorkyClient` using `httpx`
- [ ] Mirror all sync methods in async version
- [ ] Add async `upload()`, `download()`, `get_metadata()`, `delete()`
- [ ] Add connection pooling and session reuse
- [ ] Support async context manager (`async with`)

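A sketch of what `AsyncDorkyClient` could look like on top of `httpx`, with connection pooling via `httpx.AsyncClient` and `async with` support; only the class name comes from the checklist, while the endpoints and response fields are assumptions.

```python
# Hypothetical async client shape; not the actual dorky_client implementation.
import httpx

class AsyncDorkyClient:
    def __init__(self, base_url):
        self._client = httpx.AsyncClient(base_url=base_url)  # pooled connections

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        await self._client.aclose()

    async def upload(self, path):
        with open(path, "rb") as fh:
            resp = await self._client.post("/artifacts", files={"file": fh})
        resp.raise_for_status()
        return resp.json()["id"]            # assumed response field

    async def get_metadata(self, artifact_id):
        resp = await self._client.get(f"/artifacts/{artifact_id}/metadata")
        resp.raise_for_status()
        return resp.json()

# async with AsyncDorkyClient("http://localhost:8080") as client:
#     artifact_id = await client.upload("model.bin")
```
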
### Testing
- [ ] Add integration tests for async client
- [ ] Test concurrent uploads/downloads
- [ ] Verify no blocking calls in async code

## P1: CLI Wrapper

### CLI Commands
- [ ] Add `dorky-py` or `dorky` CLI entry point
- [ ] Implement `dorky-py commit <message>` command
- [ ] Implement `dorky-py stage <path>` command
- [ ] Implement `dorky-py push` command
- [ ] Implement `dorky-py pull` command
- [ ] Implement `dorky-py download <artifact_id> <dest>` command
- [ ] Implement `dorky-py status` command

### Configuration
- [ ] Read from `.dorkyrc` in project root
- [ ] Support environment variables (DORKY_URL, DORKY_BUCKET, etc.)
- [ ] Add `dorky-py config` command to view/edit settings
- [ ] Support `~/.dorky/config` for user defaults

### Progress & Output
- [ ] Show progress bar for uploads/downloads
- [ ] Add `--quiet` / `-q` flag
- [ ] Add `--verbose` / `-v` flag for debugging
- [ ] Pretty-print metadata and file listings

## P1: PyPI Publishing

### Package Setup
- [ ] Finalize `pyproject.toml` with metadata
- [ ] Add PyPI classifiers (Python 3.7+)
- [ ] Set up build script (uses setuptools/wheel)
- [ ] Test local install: `pip install -e .`

### Publishing Workflow
- [ ] Create TestPyPI account
- [ ] Publish to TestPyPI first
- [ ] Test install from TestPyPI: `pip install --index-url https://test.pypi.org/simple/ dorky-client`
- [ ] Add GitHub Actions workflow for PyPI publish
- [ ] Publish to PyPI (production)

### Version Management
- [ ] Follow semantic versioning (MAJOR.MINOR.PATCH)
- [ ] Update version in `pyproject.toml` for each release
- [ ] Tag releases in git (e.g., `py-v0.1.0`)
- [ ] Auto-generate changelog from commits

## P2: Compression & Format Support

### Compression
- [ ] Add `upload(compress='gzip')` parameter
- [ ] Support `compress='lz4'`, `compress='brotli'`
- [ ] Auto-decompress downloads based on metadata
- [ ] Add `--compress` flag to CLI

### Data Formats
- [ ] Add optional `pyarrow` dependency for Parquet
- [ ] Support `upload(..., format='parquet')`
- [ ] Add format conversion helpers
- [ ] Make heavy dependencies optional (`dorky[parquet]`, `dorky[async]`)

## P2: Optional Extras

### Package Extras
- [ ] `dorky[parquet]` — includes `pyarrow` for Parquet support
- [ ] `dorky[async]` — includes `httpx` for async client
- [ ] `dorky[all]` — includes all optional dependencies
- [ ] Document in README

### Security Extras
- [ ] `dorky[encrypt]` — includes `cryptography` for encryption
- [ ] Support encrypted upload/download with client-side keys

## P3: Advanced Features

### Batch Operations
- [ ] Implement `upload_batch(files: List[str])` for multiple files
- [ ] Implement `download_batch(ids: List[str], dest_dir: str)`
- [ ] Support concurrent batch operations with asyncio

### Streaming & Chunking
- [ ] Expose raw streaming download (generator/iterator)
- [ ] Support range requests for partial downloads
- [ ] Add chunk size configuration

---

**Status**: Core client exists, async/CLI/publishing pending
**Next**: Add async client, CLI commands, publish to TestPyPI

package/todo/05-metadata-versioning.md
@@ -0,0 +1,116 @@
# Metadata & Versioning

Artifact versioning, conflict detection, and complete change history.

## P1: Enhanced Metadata

### Standard Metadata Fields
- [ ] Implement `last_updated` timestamp (UTC ISO 8601)
- [ ] Implement `content_hash` (SHA256 hex string)
- [ ] Implement `uploaded_by` (user/email from config)
- [ ] Implement `file_size` (bytes)
- [ ] Implement `compressed_size` (bytes, if compressed)
- [ ] Implement `compression_type` (gzip, lz4, brotli, or null)

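For illustration, a record carrying the standard fields above might look like this; the values are made up and the actual wire format is whatever the OpenAPI spec ends up defining.

```python
# Example metadata record with the standard fields; values are fabricated.
artifact_metadata = {
    "last_updated": "2024-05-01T12:34:56Z",   # UTC ISO 8601
    "content_hash": (
        "9f86d081884c7d659a2feaa0c55ad015"
        "a3bf4f1b2b0b822cd15d6c15b0f00a08"    # SHA256 hex string
    ),
    "uploaded_by": "dev@example.com",
    "file_size": 10485760,                    # bytes, uncompressed
    "compressed_size": 2621440,               # bytes, after gzip
    "compression_type": "gzip",               # gzip | lz4 | brotli | null
}
```
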
### Custom Metadata
- [ ] Support user-defined key-value metadata
- [ ] Store custom metadata in artifact metadata section
- [ ] Merge custom + standard metadata in response
- [ ] Document metadata schema in OpenAPI

### Metadata Update
- [ ] Implement `PATCH /artifacts/{id}/metadata` endpoint
- [ ] Allow updating custom metadata without re-uploading file
- [ ] Store update timestamp and updater info
- [ ] Maintain update history

## P1: Artifact Versioning

### Version Storage
- [ ] Store multiple versions of same artifact
- [ ] Use version ID (integer or UUID) for each version
- [ ] Track version creation timestamp and uploader
- [ ] Keep version metadata (size, hash, compression)

### Version Endpoints
- [ ] Implement `GET /artifacts/{id}/versions` — list all versions
- [ ] Implement `GET /artifacts/{id}?version={v}` — get specific version
- [ ] Implement `GET /artifacts/{id}/versions/{v}/metadata` — version metadata
- [ ] Implement `DELETE /artifacts/{id}/versions/{v}` — delete version (keep data)

### CLI Commands
- [ ] Add `dorky history <artifact_id>` — show version history
- [ ] Add `dorky show <artifact_id> --version=<v>` — view specific version
- [ ] Add `dorky checkout <artifact_id> --version=<v>` — download version
- [ ] Add `dorky info <artifact_id>` — show current + version info

## P1: Conflict Detection & Resolution

### Race Condition Detection
- [ ] Detect concurrent uploads of same file
- [ ] Implement optimistic locking using ETags
- [ ] Return HTTP 409 Conflict if updated concurrently
- [ ] Include conflict details in error response

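A sketch of the optimistic-locking flow: read the metadata, remember its ETag, and send it back in `If-Match` so a concurrent update surfaces as a conflict instead of a silent overwrite. The header usage and status codes here are assumptions about the eventual API.

```python
# Hypothetical ETag-based optimistic locking against the metadata endpoint.
import requests

BASE_URL = "http://localhost:8080"  # assumed local dorky server

def update_metadata(artifact_id, changes):
    current = requests.get(f"{BASE_URL}/artifacts/{artifact_id}/metadata")
    current.raise_for_status()
    etag = current.headers.get("ETag")

    resp = requests.patch(
        f"{BASE_URL}/artifacts/{artifact_id}/metadata",
        json=changes,
        headers={"If-Match": etag} if etag else {},
    )
    if resp.status_code in (409, 412):
        raise RuntimeError("metadata changed concurrently; re-read and retry")
    resp.raise_for_status()
    return resp.json()
```
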
### Conflict Resolution Strategies
- [ ] `--force` flag to overwrite
- [ ] `--merge` strategy for mergeable files
- [ ] `--abort` flag to fail on conflicts
- [ ] Show conflict info (who, when, hash diff)

### Merge Conflicts for Structured Data
- [ ] Detect schema mismatches
- [ ] Support JSON merge (deep merge)
- [ ] Support CSV/Parquet schema evolution
- [ ] Warn on incompatible changes

## P2: Time Travel & Diffs

### Historical Queries
- [ ] `dorky log <artifact_id>` — show commit-like history
- [ ] `dorky diff <id> --version=<v1> --version=<v2>` — compare versions
- [ ] `dorky blame <artifact_id>` — show line-by-line history (for text)

### Diff Implementation
- [ ] For text files: line-by-line diff (unified format)
- [ ] For JSON: semantic diff (structure aware)
- [ ] For Parquet: schema + row count diff
- [ ] For binary: byte-level diff or hash comparison

### Rollback
- [ ] `dorky rollback <artifact_id> --version=<v>` — restore version
- [ ] Creates new version (doesn't delete old)
- [ ] Mark as rollback in metadata/history

## P2: Retention & Cleanup

### Version Retention Policies
- [ ] Keep last N versions of each artifact
- [ ] Auto-delete old versions after TTL (configurable)
- [ ] Support explicit version deletion
- [ ] Warn before deleting versions

### Garbage Collection
- [ ] Implement cleanup of orphaned blob data
- [ ] Run GC on schedule (configurable)
- [ ] Track storage usage and report
- [ ] Add `dorky storage stats` command

## P3: Audit & Compliance

### Audit Log
- [ ] Log all mutations (upload, update, delete)
- [ ] Store user, timestamp, action, artifact ID
- [ ] Implement `GET /audit-log` endpoint (admin only)
- [ ] Support audit log export

### Compliance Features
- [ ] Lock versions for immutability (compliance)
- [ ] Add `--immutable` flag to upload
- [ ] Prevent deletion of locked versions
- [ ] Generate compliance reports

---

**Status**: Standard metadata planned, versioning in design, conflict detection pending
**Next**: Implement standard metadata fields, version storage, conflict detection

package/todo/06-performance-concurrency.md
@@ -0,0 +1,130 @@
# Performance & Concurrency

Parallel uploads, streaming, incremental updates, and caching for high performance.

## P1: Parallel Upload Support

### Worker Threads
- [ ] Use Node `worker_threads` for compression
- [ ] Spawn `2 * num_cores - 1` worker threads
- [ ] Queue files/chunks for compression
- [ ] Collect compressed chunks and upload

### Chunked Upload (Multipart)
- [ ] Split large files into chunks (e.g., 5MB each)
- [ ] Upload chunks in parallel
- [ ] Use cloud provider multipart APIs (S3 MultipartUpload, GCS resumable)
- [ ] Implement retry logic per chunk
- [ ] Reconstruct file on server side

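A simplified sketch of the chunked upload: split the file into 5MB pieces and push them concurrently, leaving reassembly to the server. The per-chunk endpoint is invented for illustration; against S3 or GCS this would instead go through the multipart/resumable APIs named above.

```python
# Parallel chunk upload sketch; /artifacts/{id}/chunks/{n} is hypothetical and
# all chunks are held in memory while in flight.
from concurrent.futures import ThreadPoolExecutor
import os
import requests

BASE_URL = "http://localhost:8080"     # assumed local dorky server
CHUNK_SIZE = 5 * 1024 * 1024           # 5MB chunks, as in the checklist

def upload_chunk(artifact_id, index, data):
    resp = requests.put(f"{BASE_URL}/artifacts/{artifact_id}/chunks/{index}", data=data)
    resp.raise_for_status()

def chunked_upload(artifact_id, path, workers=4):
    size = os.path.getsize(path)
    with open(path, "rb") as fh, ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for index, offset in enumerate(range(0, size, CHUNK_SIZE)):
            fh.seek(offset)
            data = fh.read(CHUNK_SIZE)   # read sequentially in the main thread
            futures.append(pool.submit(upload_chunk, artifact_id, index, data))
        for future in futures:
            future.result()              # raises per-chunk errors so retry logic can kick in
```
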
### Progress Reporting
- [ ] Show upload progress (bytes/total, percentage)
- [ ] Display ETA based on current speed
- [ ] Update progress in real-time
- [ ] Support `--no-progress` flag

## P1: Streaming & Chunking Download

### Streaming Download
- [ ] Stream response directly to file (no buffering)
- [ ] Support range requests (`Range` header)
- [ ] Allow partial download resume on network error
- [ ] Show download progress with ETA

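A sketch combining the streaming and resume items above: stream straight to disk, and if a partial file is already present, continue from its last byte with a `Range` header. The endpoint and base URL are assumptions.

```python
# Streaming download with naive resume-from-last-byte support.
import os
import requests

BASE_URL = "http://localhost:8080"  # assumed local dorky server

def download_with_resume(artifact_id, dest):
    existing = os.path.getsize(dest) if os.path.exists(dest) else 0
    headers = {"Range": f"bytes={existing}-"} if existing else {}
    with requests.get(
        f"{BASE_URL}/artifacts/{artifact_id}", headers=headers, stream=True
    ) as resp:
        resp.raise_for_status()          # 200 for the full body, 206 on a resumed range
        mode = "ab" if resp.status_code == 206 else "wb"
        with open(dest, mode) as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                out.write(chunk)
```
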
### Chunk-based Download
- [ ] For very large files, download in parallel chunks
- [ ] Use range requests to fetch chunks concurrently
- [ ] Reconstruct file from chunks in correct order
- [ ] Verify integrity per chunk

### Resume on Failure
- [ ] Store partially downloaded files
- [ ] Resume from last byte on retry
- [ ] Clean up stale partial downloads

## P2: Incremental / Delta Updates

### File Change Detection
- [ ] Compute file hash on upload and download
- [ ] Detect changes by comparing hashes
- [ ] Only upload changed files in batch operations

### Rsync-like Delta Sync
- [ ] Implement rolling hash algorithm (rsync)
- [ ] Compute delta between local and remote versions
- [ ] Upload only differences (not whole file)
- [ ] Reconstruct file on server side
- [ ] Add `--incremental` flag to upload

### Partial File Updates
- [ ] Support append-only updates (logs)
- [ ] Support range writes for binary files
- [ ] Track which ranges have been written
- [ ] Merge multiple partial writes

## P2: Caching

### Local Artifact Cache
- [ ] Cache downloaded artifacts in `~/.dorky/cache/`
- [ ] Use artifact ID as cache key
- [ ] Check cache before downloading
- [ ] Validate cache using content_hash from metadata

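A sketch of the cache lookup described above: check `~/.dorky/cache/` keyed by artifact ID, validate the entry against the `content_hash` from metadata, and only download on a miss or mismatch. `get_metadata` and `download` stand in for the real client calls.

```python
# Content-hash-validated local cache; the callables are placeholders for the client.
import hashlib
import os
import shutil

CACHE_DIR = os.path.expanduser("~/.dorky/cache")

def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()

def fetch(artifact_id, dest, get_metadata, download):
    cached = os.path.join(CACHE_DIR, artifact_id)
    expected = get_metadata(artifact_id)["content_hash"]
    if not (os.path.exists(cached) and sha256_of(cached) == expected):
        os.makedirs(CACHE_DIR, exist_ok=True)
        download(artifact_id, cached)    # cache miss or stale entry
    shutil.copyfile(cached, dest)
```
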
### Cache Invalidation
- [ ] Invalidate cache if remote file changes (hash mismatch)
- [ ] Support TTL-based expiration (configurable)
- [ ] Manual cache clear: `dorky cache clear`
- [ ] Cache statistics: `dorky cache stats`

### Metadata Caching
- [ ] Cache artifact metadata to avoid repeated GET requests
- [ ] Use `Last-Modified` / `ETag` headers for validation
- [ ] Respect `Cache-Control` headers from server

## P2: Connection Pooling & Optimization

### HTTP Session Reuse
- [ ] Reuse HTTP connections across operations
- [ ] Implement connection pool with size limit
- [ ] Keep-alive to reduce connection overhead
- [ ] Graceful shutdown of idle connections

### Request Batching
- [ ] Batch metadata queries in single request
- [ ] Use GraphQL-like query language for complex operations
- [ ] Reduce round-trips for multi-operation workflows

## P3: Advanced Performance

### Bandwidth Limiting
- [ ] Add `--bandwidth-limit=<bytes/sec>` flag
- [ ] Throttle uploads to prevent network saturation
- [ ] Support upload/download speed limits

### Priority Queue
- [ ] Prioritize recent/important files in batch uploads
- [ ] Support `--priority=<high|normal|low>` flag
- [ ] Pre-warm cache with priority files

### Load Balancing
- [ ] Support multiple server replicas
- [ ] Distribute uploads across replicas
- [ ] Health check and failover

## P3: Monitoring & Metrics

### Performance Metrics
- [ ] Track upload/download speed (bytes/sec)
- [ ] Record compression ratio and time
- [ ] Monitor worker thread utilization
- [ ] Add `--metrics` flag to capture timing data

### Diagnostics
- [ ] Log slow operations (threshold configurable)
- [ ] Report bottlenecks (network, CPU, disk)
- [ ] Add `dorky profile <command>` for profiling

---

**Status**: Streaming partially exists, parallel/chunking/caching pending
**Next**: Implement multipart/chunked upload, local caching, worker threads for compression