pydorky 2.1.8

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (57)
  1. package/.devcontainer/devcontainer.json +17 -0
  2. package/.github/FUNDING.yml +15 -0
  3. package/.github/workflows/e2e-integration.yml +57 -0
  4. package/.github/workflows/publish.yml +24 -0
  5. package/.nvmrc +1 -0
  6. package/LICENSE +21 -0
  7. package/README.md +156 -0
  8. package/bin/index.js +19 -0
  9. package/bin/legacy.js +432 -0
  10. package/docs/doc#1 get-started/README.md +105 -0
  11. package/docs/doc#2 features-wishlist +33 -0
  12. package/docs/doc#2.5 python-port +31 -0
  13. package/docs/doc#3 the-correct-node-version +107 -0
  14. package/docs/doc#4 why-where-python +42 -0
  15. package/docs/doc#5 how-do-endpoints-cli-work +0 -0
  16. package/dorky-usage-aws.svg +1 -0
  17. package/dorky-usage-google-drive.svg +1 -0
  18. package/google-drive-credentials.json +16 -0
  19. package/openapi/openapi.yaml +257 -0
  20. package/package.json +46 -0
  21. package/python-client/README.md +19 -0
  22. package/python-client/dorky_client/__init__.py +3 -0
  23. package/python-client/dorky_client/client.py +32 -0
  24. package/python-client/pyproject.toml +13 -0
  25. package/python-client/tests/test_integration.py +20 -0
  26. package/rectdorky.png +0 -0
  27. package/server/index.js +193 -0
  28. package/server/package.json +12 -0
  29. package/todo/01-core-infrastructure.md +84 -0
  30. package/todo/02-storage-providers.md +104 -0
  31. package/todo/03-compression-formats.md +94 -0
  32. package/todo/04-python-client.md +126 -0
  33. package/todo/05-metadata-versioning.md +116 -0
  34. package/todo/06-performance-concurrency.md +130 -0
  35. package/todo/07-security-encryption.md +114 -0
  36. package/todo/08-developer-experience.md +175 -0
  37. package/todo/README.md +37 -0
  38. package/web-app/README.md +70 -0
  39. package/web-app/package-lock.json +17915 -0
  40. package/web-app/package.json +43 -0
  41. package/web-app/public/favicon.ico +0 -0
  42. package/web-app/public/index.html +43 -0
  43. package/web-app/public/logo192.png +0 -0
  44. package/web-app/public/logo512.png +0 -0
  45. package/web-app/public/manifest.json +25 -0
  46. package/web-app/public/robots.txt +3 -0
  47. package/web-app/src/App.css +23 -0
  48. package/web-app/src/App.js +84 -0
  49. package/web-app/src/App.test.js +8 -0
  50. package/web-app/src/PrivacyPolicy.js +26 -0
  51. package/web-app/src/TermsAndConditions.js +41 -0
  52. package/web-app/src/index.css +3 -0
  53. package/web-app/src/index.js +26 -0
  54. package/web-app/src/logo.svg +1 -0
  55. package/web-app/src/reportWebVitals.js +13 -0
  56. package/web-app/src/setupTests.js +5 -0
  57. package/web-app/tailwind.config.js +10 -0

package/todo/01-core-infrastructure.md
@@ -0,0 +1,84 @@
# Core Infrastructure

Core HTTP service, CLI foundation, and Node version alignment.

## P0: HTTP Service Foundation

### Upload & Download
- [ ] Finalize OpenAPI spec (multipart form, JSON responses, error codes)
- [ ] Implement `POST /artifacts` endpoint with streaming support (client-side sketch after this list)
- [ ] Implement `GET /artifacts/{id}` endpoint with streaming download
- [ ] Add HTTP 201 Created response for uploads
- [ ] Add HTTP 200 OK for downloads
- [ ] Handle large file uploads (>100MB) efficiently

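As a client-side illustration of the two endpoints above, a minimal sketch using `requests`; the base URL, port, and the `id` response field are assumptions, since the real shapes belong to the OpenAPI spec that is still being finalized:

```python
import requests

BASE_URL = "http://localhost:3000"  # hypothetical local dorky server

def upload(path: str) -> str:
    """Upload a file as multipart form data; expect 201 Created."""
    with open(path, "rb") as fh:
        resp = requests.post(f"{BASE_URL}/artifacts", files={"file": fh})
    resp.raise_for_status()              # 201 Created on success
    return resp.json()["id"]             # assumed response field

def download(artifact_id: str, dest: str) -> None:
    """Stream the download to disk so >100MB files never sit in memory."""
    with requests.get(f"{BASE_URL}/artifacts/{artifact_id}", stream=True) as resp:
        resp.raise_for_status()          # 200 OK on success
        with open(dest, "wb") as out:
            for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
                out.write(chunk)
```
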
### Metadata Operations
- [ ] Implement `GET /artifacts/{id}/metadata` endpoint
- [ ] Implement `PATCH /artifacts/{id}/metadata` endpoint
- [ ] Support custom key-value metadata
- [ ] Store `content_hash`, `file_size`, `last_updated`, `uploaded_by`

### Idempotency
- [ ] Add idempotency key support in upload endpoint
- [ ] Store idempotency key → artifact ID mapping
- [ ] Return cached result if same key uploaded twice (see the sketch after this list)
- [ ] Auto-clean old idempotency entries (TTL)

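A minimal sketch of the key-to-artifact mapping with TTL expiry. The real endpoint lives in the Node server; this Python version only pins down the intended semantics:

```python
import time

class IdempotencyCache:
    """Maps idempotency key -> artifact ID, expiring entries after a TTL."""

    def __init__(self, ttl_seconds=24 * 3600):
        self.ttl = ttl_seconds
        self._entries = {}  # key -> (artifact_id, stored_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        artifact_id, stored_at = entry
        if time.time() - stored_at > self.ttl:
            del self._entries[key]          # lazily clean expired entries
            return None
        return artifact_id

    def put(self, key, artifact_id):
        self._entries[key] = (artifact_id, time.time())


def handle_upload(cache, idempotency_key, store_file):
    """Reuse the cached artifact ID instead of storing the same upload twice."""
    cached = cache.get(idempotency_key)
    if cached is not None:
        return cached
    artifact_id = store_file()              # caller-supplied storage callback
    cache.put(idempotency_key, artifact_id)
    return artifact_id
```
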
### Health & Status
- [ ] Add `GET /health` endpoint
- [ ] Add `GET /status` with service diagnostics
- [ ] Return storage backend status

## P0: Node Version & CI

### Node Version Management
- [ ] Add `.nvmrc` file with `16` in repo root
- [ ] Add `engines.node` to `package.json` (`>=16`)
- [ ] Update devcontainer image to Node 16 base
- [ ] Document Node version requirement in `README.md` and `docs/doc#1 get-started`

### CI/CD Alignment
- [ ] Ensure publish workflow runs on Node 16
- [ ] Add CI matrix job for Node 16 + Node 20 testing
- [ ] Run integration tests on both versions
- [ ] Plan migration steps to Node 20 (documented)

## P1: CLI Foundation

### Commit/Stage/Push Workflow
- [ ] Implement `dorky init` — scaffold project config
- [ ] Implement `dorky status` — show staged/changed files
- [ ] Implement `dorky stage <path>` — add file to staging
- [ ] Implement `dorky unstage <path>` — remove from staging
- [ ] Implement `dorky commit <message>` — create commit with staged files
- [ ] Implement `dorky push` — upload commits to remote bucket
- [ ] Implement `dorky pull` — download latest from bucket

### Configuration
- [ ] Add `.dorkyrc` project config file support
- [ ] Support environment variables for bucket URL, credentials
- [ ] Store local state in `.dorky/` directory

### Error Handling
- [ ] Add `--verbose` flag for debugging
- [ ] Add `--help` for all commands
- [ ] Return clear error messages with actionable hints

## P2: Testing & Documentation

### Testing
- [ ] Add unit tests for upload/download logic
- [ ] Add integration tests with local server
- [ ] Add E2E tests for CLI workflow
- [ ] Test with various file sizes (small, medium, large >1GB)

### Documentation
- [ ] Document OpenAPI spec with examples
- [ ] Add CLI command reference to README
- [ ] Add architecture diagram
- [ ] Add troubleshooting guide

---

**Status**: Core service + CLI in dev, Node alignment pending
**Next**: Finalize OpenAPI spec, implement upload/download endpoints

package/todo/02-storage-providers.md
@@ -0,0 +1,104 @@
# Storage Providers (BYOB)

Bring-your-own-bucket support for AWS S3, Google Cloud Storage, Azure, and local backends.

## P0: Existing Providers

### AWS S3
- [x] Basic S3 client integration exists
- [ ] Verify streaming upload works (multipart)
- [ ] Verify streaming download works
- [ ] Add region configuration support
- [ ] Test with various file sizes
- [ ] Add S3 access error handling

### Google Cloud Storage (GCS)
- [x] Basic GCS integration exists
- [ ] Verify streaming upload works
- [ ] Verify streaming download works
- [ ] Add project ID configuration
- [ ] Test bucket access and permissions
- [ ] Add GCS-specific error handling

### Google Drive
- [x] Integration exists (documented in docs)
- [ ] Test with OAuth 2.0 flow
- [ ] Verify file sync and versioning

## P1: New Providers

### Azure Blob Storage
- [ ] Implement Azure Blob Storage client
- [ ] Add connection string support
- [ ] Implement streaming upload (block blobs)
- [ ] Implement streaming download
- [ ] Add container creation support
- [ ] Test with various blob sizes

### MinIO / S3-Compatible
- [ ] Implement S3-compatible endpoint support
- [ ] Allow custom endpoint URL configuration
- [ ] Test with MinIO local instance
- [ ] Test with other S3-compatible services (DigitalOcean Spaces, etc.)

### Local Filesystem (Dev/Testing)
- [ ] Implement local filesystem backend
- [ ] Use `.dorky-data/` directory
- [ ] Support all operations (upload, download, metadata, delete)
- [ ] Provide easy local testing without cloud credentials (see the sketch after this list)

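One possible shape for a pluggable backend with the local-filesystem case filled in; the `StorageBackend` interface and its method names are hypothetical, not the existing S3/GCS code:

```python
from abc import ABC, abstractmethod
from pathlib import Path

class StorageBackend(ABC):
    """Hypothetical interface the cloud providers could also implement."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...

class LocalFilesystemBackend(StorageBackend):
    """Stores blobs under .dorky-data/ so tests need no cloud credentials."""

    def __init__(self, root: str = ".dorky-data"):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        return self.root / key

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return self._path(key).read_bytes()

    def delete(self, key: str) -> None:
        self._path(key).unlink(missing_ok=True)
```
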
## P2: Bucket Configuration & Management

### Bucket Provisioning
- [ ] Implement `dorky bucket create` command
- [ ] Auto-detect cloud provider (S3, GCS, Azure)
- [ ] Create bucket with proper permissions
- [ ] Store credentials securely

### Bucket Operations
- [ ] Implement `dorky bucket list` — list all configured buckets
- [ ] Implement `dorky bucket delete` — remove bucket reference
- [ ] Implement `dorky bucket validate` — test connection and permissions
- [ ] Add bucket selection for multi-bucket projects

### Credentials Management
- [ ] Support AWS IAM credentials (env vars or ~/.aws/credentials)
- [ ] Support GCS service account keys
- [ ] Support Azure connection strings
- [ ] Add `dorky config credentials` command

## P2: Hierarchical Sync

### Folder Structure
- [ ] Create hierarchical folder structure on bucket (e.g., `projects/myproject/data/`)
- [ ] Map local file structure to bucket paths
- [ ] Support nested directories at arbitrary depth

### Sync Operations
- [ ] Implement `dorky sync` — two-way sync of local/remote
- [ ] Implement `dorky upload-folder <path>` — upload entire directory
- [ ] Implement `dorky download-folder <path>` — download entire directory
- [ ] Support `.dorkyignore` patterns (like `.gitignore`)

### Incremental Sync
- [ ] Track synced files and timestamps
- [ ] Only sync changed files on subsequent runs (change-detection sketch after this list)
- [ ] Add `--force` flag to re-sync everything
- [ ] Detect conflicts and prompt user

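A minimal sketch of hash-based change detection against a manifest saved by the previous run; the manifest location under `.dorky/` is an assumption:

```python
import hashlib
import json
from pathlib import Path

MANIFEST = Path(".dorky/sync-manifest.json")   # hypothetical location

def file_hash(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def changed_files(root: str = ".") -> list:
    """Return files that are new or modified since the last recorded sync."""
    previous = json.loads(MANIFEST.read_text()) if MANIFEST.exists() else {}
    current, changed = {}, []
    for path in Path(root).rglob("*"):
        if not path.is_file() or ".dorky" in path.parts:
            continue
        digest = file_hash(path)
        current[str(path)] = digest
        if previous.get(str(path)) != digest:
            changed.append(str(path))
    MANIFEST.parent.mkdir(parents=True, exist_ok=True)
    MANIFEST.write_text(json.dumps(current, indent=2))
    return changed
```
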
## P3: Advanced Features

### Multi-Cloud
- [ ] Support multiple buckets across different cloud providers
- [ ] Implement replication across providers
- [ ] Add failover logic

### Performance
- [ ] Add concurrent uploads to multiple buckets (backup)
- [ ] Add bandwidth limiting for uploads/downloads
- [ ] Cache bucket listings for faster operations

---

**Status**: S3, GCS, Google Drive exist; Azure/MinIO/local pending
**Next**: Test existing providers, implement Azure + local filesystem

package/todo/03-compression-formats.md
@@ -0,0 +1,94 @@
# Compression & Data Formats

Compress artifacts during upload and support data format conversions (CSV to Parquet, etc.).

## P1: Core Compression

### Gzip
- [ ] Implement gzip compression in upload flow (see the sketch after this list)
- [ ] Store `Content-Encoding: gzip` in metadata
- [ ] Auto-decompress on download
- [ ] Add `--compress=gzip` CLI flag
- [ ] Add compression level configuration (1-9)

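A sketch of gzip-on-upload with the compression metadata recorded alongside it, assuming the `compression_type` / `compressed_size` field names used elsewhere in these todos; how the metadata travels to the server is still open:

```python
import gzip

def compress_for_upload(data: bytes, level: int = 6):
    """Return (compressed_bytes, metadata); level 1-9 maps to gzip levels."""
    compressed = gzip.compress(data, compresslevel=level)
    metadata = {
        "compression_type": "gzip",
        "file_size": len(data),
        "compressed_size": len(compressed),
        "compression_ratio": round(len(compressed) / len(data), 3) if data else 1.0,
    }
    return compressed, metadata

def decompress_on_download(data: bytes, metadata: dict) -> bytes:
    """Auto-decompress based on the stored compression_type."""
    if metadata.get("compression_type") == "gzip":
        return gzip.decompress(data)
    return data
```
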
### LZ4 (Fast Compression)
- [ ] Add LZ4 compression support (faster than gzip)
- [ ] Store compression type in metadata
- [ ] Add `--compress=lz4` CLI flag
- [ ] Test performance vs gzip

### Brotli
- [ ] Add Brotli compression support (better ratio)
- [ ] Add `--compress=brotli` CLI flag
- [ ] Test performance vs gzip/lz4

### Decompression on Download
- [ ] Auto-detect compression from metadata
- [ ] Decompress transparently during download
- [ ] Support partial decompression (streaming)
- [ ] Add `--no-decompress` flag to keep compressed

## P1: Compression Configuration

### Project-Level Config
- [ ] Add compression preference in `.dorkyrc`
- [ ] Support per-file pattern compression (e.g., `*.json` → gzip)
- [ ] Add `dorky config compression` command

### Metadata Tracking
- [ ] Store `compression_type` in metadata
- [ ] Store `compressed_size` alongside `file_size`
- [ ] Store `compression_ratio` for reporting
- [ ] Add `--show-compression` flag to display stats

## P2: Data Format Conversions

### CSV to Parquet
- [ ] Create Python worker service for conversions
- [ ] Implement CSV → Parquet conversion (see the `pyarrow` sketch after this list)
- [ ] Infer schema from CSV headers
- [ ] Support schema config file (JSON)
- [ ] Add `dorky convert csv-to-parquet <input> <output>` command

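The conversion core itself is small with `pyarrow` (schema is inferred from the CSV header and sampled rows by default); wiring it into a worker service and the `dorky convert` command is the larger task:

```python
import pyarrow.csv as pa_csv
import pyarrow.parquet as pq

def csv_to_parquet(input_path: str, output_path: str) -> None:
    table = pa_csv.read_csv(input_path)          # schema inferred from headers
    pq.write_table(table, output_path, compression="snappy")

if __name__ == "__main__":
    csv_to_parquet("data.csv", "data.parquet")
```
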
### JSON Compression
- [ ] Minify JSON before compression
- [ ] Remove whitespace and comments
- [ ] Add `--minify-json` flag
- [ ] Store original schema separately for reconstruction

### Format Detection
- [ ] Auto-detect file type on upload (.json, .csv, .parquet, etc.)
- [ ] Suggest optimal compression based on file type
- [ ] Add `--format` hint flag for ambiguous files

## P2: Data Quality & Validation

### Integrity Checking
- [ ] Store SHA256 hash of original file in metadata
- [ ] Verify hash on download before decompression
- [ ] Add `--verify` flag for integrity check
- [ ] Report any corrupted files

### Schema Validation
- [ ] Optional schema validation for structured formats
- [ ] Store schema in `.dorkyschema` or metadata
- [ ] Validate before uploading
- [ ] Warn on schema changes

## P3: Advanced Conversions

### Other Formats
- [ ] Parquet → Arrow IPC
- [ ] JSON → MessagePack
- [ ] Binary format conversions
- [ ] Add `dorky convert list` to show available conversions

### Streaming Conversions
- [ ] Stream large CSV → Parquet without buffering
- [ ] Support incremental conversions
- [ ] Allow on-demand conversion (download → convert → return)

---

**Status**: Compression infrastructure pending, format conversions planned
**Next**: Implement gzip/lz4, add compression config, create Python conversion service

package/todo/04-python-client.md
@@ -0,0 +1,126 @@
# Python Client (PyPI)

Lightweight Python package on PyPI that mirrors the Node client's feature set.

## P0: Core Blocking Client

### Basic Operations
- [x] Scaffold `python-client/` with `pyproject.toml`
- [x] Implement `DorkyClient.upload()` (blocking)
- [x] Implement `DorkyClient.download()` (blocking)
- [ ] Add `DorkyClient.get_metadata()` method
- [ ] Add `DorkyClient.delete()` method
- [ ] Verify streaming for large files (no buffering)

### Idempotency & Conflict Handling
- [ ] Add idempotency key support in `upload()`
- [ ] Handle 409 Conflict responses
- [ ] Add retry logic with exponential backoff (see the sketch after this list)
- [ ] Support `--force` option to overwrite

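One way the retry/backoff behavior could look; which status codes retry and the `DorkyConflictError` name are assumptions, not the shipped `client.py`:

```python
import time
import requests

class DorkyConflictError(Exception):
    """Raised on 409 Conflict so callers can decide whether to force-overwrite."""

def request_with_retries(method: str, url: str, max_attempts: int = 5, **kwargs):
    delay = 0.5
    for attempt in range(1, max_attempts + 1):
        resp = requests.request(method, url, **kwargs)
        if resp.status_code == 409:
            raise DorkyConflictError(resp.text)
        if resp.status_code == 429 or resp.status_code >= 500:
            if attempt == max_attempts:
                resp.raise_for_status()
            # Honor Retry-After when the server sends it, else back off exponentially.
            retry_after = resp.headers.get("Retry-After")
            time.sleep(float(retry_after) if retry_after else delay)
            delay *= 2
            continue
        resp.raise_for_status()
        return resp
```
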
### Error Handling
- [ ] Catch and wrap HTTP errors
- [ ] Provide clear error messages
- [ ] Add retry-after handling (429 Rate Limited)
- [ ] Add timeout configuration

## P1: Async Client

### Async Implementation
- [ ] Create `AsyncDorkyClient` using `httpx` (see the sketch after this list)
- [ ] Mirror all sync methods in async version
- [ ] Add async `upload()`, `download()`, `get_metadata()`, `delete()`
- [ ] Add connection pooling and session reuse
- [ ] Support async context manager (`async with`)

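A skeleton of what `AsyncDorkyClient` could look like on top of `httpx`; the endpoint paths and the `id` response field are carried over as assumptions from the HTTP-service todo, while the pooling and async context manager come from `httpx` itself:

```python
import httpx

class AsyncDorkyClient:
    def __init__(self, base_url: str):
        self._client = httpx.AsyncClient(base_url=base_url)

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await self._client.aclose()

    async def upload(self, path: str) -> str:
        with open(path, "rb") as fh:
            resp = await self._client.post("/artifacts", files={"file": fh})
        resp.raise_for_status()
        return resp.json()["id"]

    async def get_metadata(self, artifact_id: str) -> dict:
        resp = await self._client.get(f"/artifacts/{artifact_id}/metadata")
        resp.raise_for_status()
        return resp.json()

    async def delete(self, artifact_id: str) -> None:
        resp = await self._client.delete(f"/artifacts/{artifact_id}")
        resp.raise_for_status()

# Usage:
#     async with AsyncDorkyClient("http://localhost:3000") as client:
#         artifact_id = await client.upload("model.bin")
```
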
### Testing
- [ ] Add integration tests for async client
- [ ] Test concurrent uploads/downloads
- [ ] Verify no blocking calls in async code

## P1: CLI Wrapper

### CLI Commands
- [ ] Add `dorky-py` or `dorky` CLI entry point
- [ ] Implement `dorky-py commit <message>` command
- [ ] Implement `dorky-py stage <path>` command
- [ ] Implement `dorky-py push` command
- [ ] Implement `dorky-py pull` command
- [ ] Implement `dorky-py download <artifact_id> <dest>` command
- [ ] Implement `dorky-py status` command

### Configuration
- [ ] Read from `.dorkyrc` in project root
- [ ] Support environment variables (DORKY_URL, DORKY_BUCKET, etc.)
- [ ] Add `dorky-py config` command to view/edit settings
- [ ] Support `~/.dorky/config` for user defaults

### Progress & Output
- [ ] Show progress bar for uploads/downloads
- [ ] Add `--quiet` / `-q` flag
- [ ] Add `--verbose` / `-v` flag for debugging
- [ ] Pretty-print metadata and file listings

## P1: PyPI Publishing

### Package Setup
- [ ] Finalize `pyproject.toml` with metadata
- [ ] Add PyPI classifiers (Python 3.7+)
- [ ] Set up build script (uses setuptools/wheel)
- [ ] Test local install: `pip install -e .`

### Publishing Workflow
- [ ] Create TestPyPI account
- [ ] Publish to TestPyPI first
- [ ] Test install from TestPyPI: `pip install --index-url https://test.pypi.org/simple/ dorky-client`
- [ ] Add GitHub Actions workflow for PyPI publish
- [ ] Publish to PyPI (production)

### Version Management
- [ ] Follow semantic versioning (MAJOR.MINOR.PATCH)
- [ ] Update version in `pyproject.toml` for each release
- [ ] Tag releases in git (e.g., `py-v0.1.0`)
- [ ] Auto-generate changelog from commits

## P2: Compression & Format Support

### Compression
- [ ] Add `upload(compress='gzip')` parameter
- [ ] Support `compress='lz4'`, `compress='brotli'`
- [ ] Auto-decompress downloads based on metadata
- [ ] Add `--compress` flag to CLI

### Data Formats
- [ ] Add optional `pyarrow` dependency for Parquet
- [ ] Support `upload(..., format='parquet')`
- [ ] Add format conversion helpers
- [ ] Make heavy dependencies optional (`dorky[parquet]`, `dorky[async]`)

## P2: Optional Extras

### Package Extras
- [ ] `dorky[parquet]` — includes `pyarrow` for Parquet support
- [ ] `dorky[async]` — includes `httpx` for async client
- [ ] `dorky[all]` — includes all optional dependencies
- [ ] Document in README

### Security Extras
- [ ] `dorky[encrypt]` — includes `cryptography` for encryption
- [ ] Support encrypted upload/download with client-side keys

## P3: Advanced Features

### Batch Operations
- [ ] Implement `upload_batch(files: List[str])` for multiple files
- [ ] Implement `download_batch(ids: List[str], dest_dir: str)`
- [ ] Support concurrent batch operations with asyncio

### Streaming & Chunking
- [ ] Expose raw streaming download (generator/iterator)
- [ ] Support range requests for partial downloads
- [ ] Add chunk size configuration

---

**Status**: Core client exists, async/CLI/publishing pending
**Next**: Add async client, CLI commands, publish to TestPyPI

package/todo/05-metadata-versioning.md
@@ -0,0 +1,116 @@
# Metadata & Versioning

Artifact versioning, conflict detection, and complete change history.

## P1: Enhanced Metadata

### Standard Metadata Fields
- [ ] Implement `last_updated` timestamp (UTC ISO 8601)
- [ ] Implement `content_hash` (SHA256 hex string)
- [ ] Implement `uploaded_by` (user/email from config)
- [ ] Implement `file_size` (bytes)
- [ ] Implement `compressed_size` (bytes, if compressed)
- [ ] Implement `compression_type` (gzip, lz4, brotli, or null)

### Custom Metadata
- [ ] Support user-defined key-value metadata
- [ ] Store custom metadata in artifact metadata section
- [ ] Merge custom + standard metadata in response
- [ ] Document metadata schema in OpenAPI

### Metadata Update
- [ ] Implement `PATCH /artifacts/{id}/metadata` endpoint
- [ ] Allow updating custom metadata without re-uploading file
- [ ] Store update timestamp and updater info
- [ ] Maintain update history

## P1: Artifact Versioning

### Version Storage
- [ ] Store multiple versions of same artifact
- [ ] Use version ID (integer or UUID) for each version
- [ ] Track version creation timestamp and uploader
- [ ] Keep version metadata (size, hash, compression)

### Version Endpoints
- [ ] Implement `GET /artifacts/{id}/versions` — list all versions
- [ ] Implement `GET /artifacts/{id}?version={v}` — get specific version
- [ ] Implement `GET /artifacts/{id}/versions/{v}/metadata` — version metadata
- [ ] Implement `DELETE /artifacts/{id}/versions/{v}` — delete version (keep data)

### CLI Commands
- [ ] Add `dorky history <artifact_id>` — show version history
- [ ] Add `dorky show <artifact_id> --version=<v>` — view specific version
- [ ] Add `dorky checkout <artifact_id> --version=<v>` — download version
- [ ] Add `dorky info <artifact_id>` — show current + version info

## P1: Conflict Detection & Resolution

### Race Condition Detection
- [ ] Detect concurrent uploads of same file
- [ ] Implement optimistic locking using ETags (client-side sketch after this list)
- [ ] Return HTTP 409 Conflict if updated concurrently
- [ ] Include conflict details in error response

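A client-side sketch of the optimistic-locking flow, assuming the server returns an `ETag` on reads and checks `If-Match` on writes. The list above targets 409 for conflicts; a strict `If-Match` failure is conventionally 412, so the sketch treats both as conflicts:

```python
import requests

def update_metadata(base_url: str, artifact_id: str, patch: dict) -> dict:
    meta_url = f"{base_url}/artifacts/{artifact_id}/metadata"

    # Read the current metadata to learn which revision we are editing.
    current = requests.get(meta_url)
    current.raise_for_status()
    etag = current.headers.get("ETag")

    # Echo the ETag back; the server rejects the write if someone changed it meanwhile.
    headers = {"If-Match": etag} if etag else {}
    resp = requests.patch(meta_url, json=patch, headers=headers)
    if resp.status_code in (409, 412):
        raise RuntimeError(f"Concurrent update detected for {artifact_id}: {resp.text}")
    resp.raise_for_status()
    return resp.json()
```
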
### Conflict Resolution Strategies
- [ ] `--force` flag to overwrite
- [ ] `--merge` strategy for mergeable files
- [ ] `--abort` flag to fail on conflicts
- [ ] Show conflict info (who, when, hash diff)

### Merge Conflicts for Structured Data
- [ ] Detect schema mismatches
- [ ] Support JSON merge (deep merge)
- [ ] Support CSV/Parquet schema evolution
- [ ] Warn on incompatible changes

## P2: Time Travel & Diffs

### Historical Queries
- [ ] `dorky log <artifact_id>` — show commit-like history
- [ ] `dorky diff <id> --version=<v1> --version=<v2>` — compare versions
- [ ] `dorky blame <artifact_id>` — show line-by-line history (for text)

### Diff Implementation
- [ ] For text files: line-by-line diff (unified format)
- [ ] For JSON: semantic diff (structure aware)
- [ ] For Parquet: schema + row count diff
- [ ] For binary: byte-level diff or hash comparison

### Rollback
- [ ] `dorky rollback <artifact_id> --version=<v>` — restore version
- [ ] Creates new version (doesn't delete old)
- [ ] Mark as rollback in metadata/history

## P2: Retention & Cleanup

### Version Retention Policies
- [ ] Keep last N versions of each artifact
- [ ] Auto-delete old versions after TTL (configurable)
- [ ] Support explicit version deletion
- [ ] Warn before deleting versions

### Garbage Collection
- [ ] Implement cleanup of orphaned blob data
- [ ] Run GC on schedule (configurable)
- [ ] Track storage usage and report
- [ ] Add `dorky storage stats` command

## P3: Audit & Compliance

### Audit Log
- [ ] Log all mutations (upload, update, delete)
- [ ] Store user, timestamp, action, artifact ID
- [ ] Implement `GET /audit-log` endpoint (admin only)
- [ ] Support audit log export

### Compliance Features
- [ ] Lock versions for immutability (compliance)
- [ ] Add `--immutable` flag to upload
- [ ] Prevent deletion of locked versions
- [ ] Generate compliance reports

---

**Status**: Standard metadata planned, versioning in design, conflict detection pending
**Next**: Implement standard metadata fields, version storage, conflict detection

package/todo/06-performance-concurrency.md
@@ -0,0 +1,130 @@
# Performance & Concurrency

Parallel uploads, streaming, incremental updates, and caching for high performance.

## P1: Parallel Upload Support

### Worker Threads
- [ ] Use Node `worker_threads` for compression
- [ ] Spawn `2 * num_cores - 1` worker threads
- [ ] Queue files/chunks for compression
- [ ] Collect compressed chunks and upload

### Chunked Upload (Multipart)
- [ ] Split large files into chunks (e.g., 5MB each)
- [ ] Upload chunks in parallel
- [ ] Use cloud provider multipart APIs (S3 MultipartUpload, GCS resumable); see the S3 sketch after this list
- [ ] Implement retry logic per chunk
- [ ] Reconstruct file on server side

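A sketch of parallel multipart upload against the S3 `MultipartUpload` API via `boto3`; the bucket, key, and worker count are placeholders, and per-part retries plus the GCS resumable path are omitted to keep the flow visible:

```python
import os
from concurrent.futures import ThreadPoolExecutor

import boto3

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the S3 minimum part size

def multipart_upload(path: str, bucket: str, key: str) -> None:
    s3 = boto3.client("s3")
    upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = upload["UploadId"]

    def upload_part(part_number: int, offset: int) -> dict:
        # Each worker opens its own handle and reads just its chunk.
        with open(path, "rb") as fh:
            fh.seek(offset)
            body = fh.read(CHUNK_SIZE)
        resp = s3.upload_part(Bucket=bucket, Key=key, UploadId=upload_id,
                              PartNumber=part_number, Body=body)
        return {"PartNumber": part_number, "ETag": resp["ETag"]}

    offsets = range(0, os.path.getsize(path), CHUNK_SIZE)
    with ThreadPoolExecutor(max_workers=4) as pool:
        parts = list(pool.map(upload_part, range(1, len(offsets) + 1), offsets))

    # S3 stitches the parts back together in PartNumber order.
    s3.complete_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id,
                                 MultipartUpload={"Parts": parts})
```
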
### Progress Reporting
- [ ] Show upload progress (bytes/total, percentage)
- [ ] Display ETA based on current speed
- [ ] Update progress in real-time
- [ ] Support `--no-progress` flag

## P1: Streaming & Chunking Download

### Streaming Download
- [ ] Stream response directly to file (no buffering)
- [ ] Support range requests (`Range` header)
- [ ] Allow partial download resume on network error
- [ ] Show download progress with ETA

### Chunk-based Download
- [ ] For very large files, download in parallel chunks
- [ ] Use range requests to fetch chunks concurrently
- [ ] Reconstruct file from chunks in correct order
- [ ] Verify integrity per chunk

### Resume on Failure
- [ ] Store partially downloaded files
- [ ] Resume from last byte on retry
- [ ] Clean up stale partial downloads

## P2: Incremental / Delta Updates

### File Change Detection
- [ ] Compute file hash on upload and download
- [ ] Detect changes by comparing hashes
- [ ] Only upload changed files in batch operations

### Rsync-like Delta Sync
- [ ] Implement rolling hash algorithm (rsync); see the checksum sketch after this list
- [ ] Compute delta between local and remote versions
- [ ] Upload only differences (not whole file)
- [ ] Reconstruct file on server side
- [ ] Add `--incremental` flag to upload

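The weak rolling checksum is the primitive that makes rsync-style deltas cheap; a minimal Adler-32-style version follows (a full delta sync would pair it with a strong per-block hash and a block-matching pass):

```python
class RollingChecksum:
    """Adler-32-style weak checksum over a fixed-size window, rollable in O(1)."""

    MOD = 1 << 16

    def __init__(self, window: bytes):
        self.size = len(window)
        self.a = sum(window) % self.MOD
        self.b = sum((self.size - i) * byte for i, byte in enumerate(window)) % self.MOD

    def digest(self) -> int:
        return (self.b << 16) | self.a

    def roll(self, outgoing: int, incoming: int) -> int:
        """Slide the window one byte: drop `outgoing`, append `incoming`."""
        self.a = (self.a - outgoing + incoming) % self.MOD
        self.b = (self.b - self.size * outgoing + self.a) % self.MOD
        return self.digest()

# Rolling over data[i : i + W] for every i costs O(1) per step instead of O(W).
data = b"the quick brown fox jumps over the lazy dog"
W = 16
rc = RollingChecksum(data[:W])
for i in range(1, len(data) - W + 1):
    rc.roll(data[i - 1], data[i + W - 1])   # checksum of data[i : i + W]
```
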
### Partial File Updates
- [ ] Support append-only updates (logs)
- [ ] Support range writes for binary files
- [ ] Track which ranges have been written
- [ ] Merge multiple partial writes

## P2: Caching

### Local Artifact Cache
- [ ] Cache downloaded artifacts in `~/.dorky/cache/`
- [ ] Use artifact ID as cache key
- [ ] Check cache before downloading (see the sketch after this list)
- [ ] Validate cache using content_hash from metadata

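A sketch of the cache-first lookup: key by artifact ID under `~/.dorky/cache/`, re-validate against `content_hash`, and fall back to a caller-supplied download; the `fetch_fn` callback and the metadata dict shape are assumptions:

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path.home() / ".dorky" / "cache"

def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def get_artifact(artifact_id: str, metadata: dict, fetch_fn) -> Path:
    """Return a local path, downloading only when the cache is missing or stale."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / artifact_id
    expected = metadata.get("content_hash")

    if cached.exists() and expected and sha256_of(cached) == expected:
        return cached                      # cache hit, hash matches

    data = fetch_fn(artifact_id)           # caller-supplied download (bytes)
    cached.write_bytes(data)
    return cached
```
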
### Cache Invalidation
- [ ] Invalidate cache if remote file changes (hash mismatch)
- [ ] Support TTL-based expiration (configurable)
- [ ] Manual cache clear: `dorky cache clear`
- [ ] Cache statistics: `dorky cache stats`

### Metadata Caching
- [ ] Cache artifact metadata to avoid repeated GET requests
- [ ] Use `Last-Modified` / `ETag` headers for validation
- [ ] Respect `Cache-Control` headers from server

## P2: Connection Pooling & Optimization

### HTTP Session Reuse
- [ ] Reuse HTTP connections across operations
- [ ] Implement connection pool with size limit
- [ ] Keep-alive to reduce connection overhead
- [ ] Graceful shutdown of idle connections

### Request Batching
- [ ] Batch metadata queries in single request
- [ ] Use GraphQL-like query language for complex operations
- [ ] Reduce round-trips for multi-operation workflows

## P3: Advanced Performance

### Bandwidth Limiting
- [ ] Add `--bandwidth-limit=<bytes/sec>` flag
- [ ] Throttle uploads to prevent network saturation
- [ ] Support upload/download speed limits

### Priority Queue
- [ ] Prioritize recent/important files in batch uploads
- [ ] Support `--priority=<high|normal|low>` flag
- [ ] Pre-warm cache with priority files

### Load Balancing
- [ ] Support multiple server replicas
- [ ] Distribute uploads across replicas
- [ ] Health check and failover

## P3: Monitoring & Metrics

### Performance Metrics
- [ ] Track upload/download speed (bytes/sec)
- [ ] Record compression ratio and time
- [ ] Monitor worker thread utilization
- [ ] Add `--metrics` flag to capture timing data

### Diagnostics
- [ ] Log slow operations (threshold configurable)
- [ ] Report bottlenecks (network, CPU, disk)
- [ ] Add `dorky profile <command>` for profiling

---

**Status**: Streaming partially exists, parallel/chunking/caching pending
**Next**: Implement multipart/chunked upload, local caching, worker threads for compression