pydorky 2.1.8
- package/.devcontainer/devcontainer.json +17 -0
- package/.github/FUNDING.yml +15 -0
- package/.github/workflows/e2e-integration.yml +57 -0
- package/.github/workflows/publish.yml +24 -0
- package/.nvmrc +1 -0
- package/LICENSE +21 -0
- package/README.md +156 -0
- package/bin/index.js +19 -0
- package/bin/legacy.js +432 -0
- package/docs/doc#1 get-started/README.md +105 -0
- package/docs/doc#2 features-wishlist +33 -0
- package/docs/doc#2.5 python-port +31 -0
- package/docs/doc#3 the-correct-node-version +107 -0
- package/docs/doc#4 why-where-python +42 -0
- package/docs/doc#5 how-do-endpoints-cli-work +0 -0
- package/dorky-usage-aws.svg +1 -0
- package/dorky-usage-google-drive.svg +1 -0
- package/google-drive-credentials.json +16 -0
- package/openapi/openapi.yaml +257 -0
- package/package.json +46 -0
- package/python-client/README.md +19 -0
- package/python-client/dorky_client/__init__.py +3 -0
- package/python-client/dorky_client/client.py +32 -0
- package/python-client/pyproject.toml +13 -0
- package/python-client/tests/test_integration.py +20 -0
- package/rectdorky.png +0 -0
- package/server/index.js +193 -0
- package/server/package.json +12 -0
- package/todo/01-core-infrastructure.md +84 -0
- package/todo/02-storage-providers.md +104 -0
- package/todo/03-compression-formats.md +94 -0
- package/todo/04-python-client.md +126 -0
- package/todo/05-metadata-versioning.md +116 -0
- package/todo/06-performance-concurrency.md +130 -0
- package/todo/07-security-encryption.md +114 -0
- package/todo/08-developer-experience.md +175 -0
- package/todo/README.md +37 -0
- package/web-app/README.md +70 -0
- package/web-app/package-lock.json +17915 -0
- package/web-app/package.json +43 -0
- package/web-app/public/favicon.ico +0 -0
- package/web-app/public/index.html +43 -0
- package/web-app/public/logo192.png +0 -0
- package/web-app/public/logo512.png +0 -0
- package/web-app/public/manifest.json +25 -0
- package/web-app/public/robots.txt +3 -0
- package/web-app/src/App.css +23 -0
- package/web-app/src/App.js +84 -0
- package/web-app/src/App.test.js +8 -0
- package/web-app/src/PrivacyPolicy.js +26 -0
- package/web-app/src/TermsAndConditions.js +41 -0
- package/web-app/src/index.css +3 -0
- package/web-app/src/index.js +26 -0
- package/web-app/src/logo.svg +1 -0
- package/web-app/src/reportWebVitals.js +13 -0
- package/web-app/src/setupTests.js +5 -0
- package/web-app/tailwind.config.js +10 -0
@@ -0,0 +1,107 @@

Node version: current decision and migration plan
-------------------------------------------------

Canonical choice (short)
- Target Node version now: **Node 16.x** (maximum current compatibility for consumers and CI). Plan to migrate to **Node 20.x** in a future scheduled upgrade.

Why Node 16 now
- Broadest compatibility: many users and older hosting environments still run Node 16.
- CI already uses `node-version: 16.x` in the publish workflow; keeping Node 16 avoids immediate CI churn.
- Minimizes breaking changes for native modules and downstream consumers.

Why migrate to Node 20 later
- Better performance, updated V8, improved diagnostics, and a longer support window.
- Native `fetch` and runtime improvements simplify clients and reduce polyfills.

What we will change/track now
- Add a source-of-truth file: create `.nvmrc` set to `16` and add `engines.node` to `package.json` to declare minimum required Node.
- Keep CI at `16.x` for now; add an optional migration branch that tests `20.x` in a short-lived matrix.
- Update `README.md` and `docs/doc#1 get-started` with the chosen Node version and `nvm` instructions.
- When ready to migrate to Node 20:
  1. Update devcontainer image to a Node 20 base.
  2. Run `npm ci` and full test suite under Node 20; regenerate `package-lock.json` if necessary.
  3. Run a CI matrix for `16.x` and `20.x` for a short stabilization period, then switch default to `20.x`.

Checklist (short)
- [ ] Add `.nvmrc` with `16`
- [ ] Add `engines.node` to `package.json` (`>=16`)
- [ ] Update devcontainer image to match when migrating
- [ ] Add CI matrix job for Node 20 during migration window


Full discussion and rationale (expanded)
---------------------------------------

Summary of alternatives we considered
- Node 16 (current baseline): oldest LTS in our compatibility target set, broadest runtime compatibility for users and CI. Fewer surprises with native modules and older hosting environments.
- Node 18: adds built-in Web APIs (`fetch`, `Blob`, `FormData`, Web Streams) and more modern stdlib conveniences. Lower upgrade surface than jumping directly to Node 20, but we decided to keep compatibility at Node 16 for now.
- Node 20: recommended future target, with improved V8, better performance for crypto/hashing and IO, improved diagnostics, and a longer LTS window. Offers native `fetch` and core runtime improvements.
- Node 22: longest-term future-proofing with the newest V8 and JS features, but may introduce early compatibility friction with some ecosystem packages.

Why we picked Node 16 now (detailed)
- CI and many users still rely on Node 16 LTS; keeping Node 16 minimizes immediate friction for publishing and consumer installs.
- Native modules (e.g., binary bindings used by compression or Parquet-related packages) are more likely to remain compatible on Node 16 without rebuilds or pinned binaries.
- Our development and publishing pipelines are minimal today; a conservative choice reduces risk while we implement architecture changes (HTTP service + clients).

Why migrate later to Node 20
- Node 20 gives us measurable runtime and crypto performance improvements which matter for artifact hashing, streaming, and compression workloads.
- Later migration lets us focus first on the architecture (language-agnostic HTTP service + thin clients) and test compatibility in CI before committing to a new runtime.

How Node 16 maps to our wishlist (detailed compatibility)
- BYOB (bring-your-own-bucket): fully supported. Storage clients for S3/GCS/Azure work on Node 16.
- Commit / stage / push workflow: entirely application-level, so supported.
- Hierarchical sync, metadata, versioning, idempotency: supported at the service layer; Node 16 supports required primitives (streams, `worker_threads`, `crypto`).
- Compression: `zlib` is built in and includes Brotli; `lz4` is available via npm. Parquet conversions are better handled in Python (`pyarrow`), so we will provide a Python client or a conversion service for heavy data formats.
- Concurrency: use `worker_threads` (available and stable in Node 16) and streams to parallelize compression and uploads.
- Partial/incremental updates and ranged fetches: implementable with streaming and backend range/compose APIs (S3 multipart, GCS compose).
- Encryption and KMS integration: supported via Node `crypto` and provider SDKs.
- Native modules: possible, but they require CI pins and prebuilds; mitigate with CI and optional community-supported prebuilds.
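
The ranged-fetch and multipart item above comes down to splitting an object into byte ranges. A minimal sketch of that arithmetic follows, written in Python to match the planned Python client. `part_ranges` and `range_header` are hypothetical helper names, and the 8 MiB part size is an assumed value rather than anything the project specifies.

```python
from typing import Iterator, Tuple


def part_ranges(total_size: int, part_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets covering total_size bytes."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    start = 0
    while start < total_size:
        # The last part may be shorter than part_size.
        end = min(start + part_size, total_size) - 1
        yield start, end
        start = end + 1


def range_header(start: int, end: int) -> str:
    """Format an HTTP Range header value; both offsets are inclusive."""
    return f"bytes={start}-{end}"


if __name__ == "__main__":
    # Hypothetical 20 MiB object fetched in 8 MiB parts (assumed sizes).
    for start, end in part_ranges(20 * 1024 * 1024, 8 * 1024 * 1024):
        print(range_header(start, end))
```

The same ranges could drive either ranged downloads or the part boundaries of a multipart upload; how the service maps them onto S3 or GCS calls is left open here.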

Risks and mitigations
- Native module incompatibility: mitigate by adding CI builds and publishing prebuilt binaries (or using pure-JS fallbacks). Test on Node 16 and Node 20 during migration.
- Missing Web APIs (global `fetch`) on Node 16: use `undici` or `node-fetch` polyfills in clients; keep network layer modular to swap implementations when moving to Node 20.
- Performance-sensitive workloads (large artifact compression/hash): use `worker_threads`, native modules, or offload heavy conversions to the Python client/service.
- Lockfile/regeneration issues when moving Node versions: regenerate `package-lock.json` under target Node during migration and run `npm ci` in CI.

Migration plan (concrete steps)
1. Declare current Node target: add `.nvmrc` with `16` and `engines.node` to `package.json` (`>=16`).
2. Keep CI default at `16.x` for publishing and quick feedback.
3. Add an optional CI job that runs the test matrix on `16.x` and `20.x` for a short stabilization period.
4. Run the full test suite and integration tests (uploads/downloads, idempotency, metadata, compression) for Node 20 in CI, and iterate to fix compatibility issues.
5. Once stable, update `devcontainer.json` to a Node 20 base image, update docs to say `Node 20` as recommended runtime, and switch CI default to `20.x`.

Recommended immediate actions (next PRs)
- Add `.nvmrc` with `16` to repo root.
- Add `engines.node` to `package.json`: `"engines": { "node": ">=16" }`.
- Add an integration test job to CI that runs key scenarios on Node 16.
- Add a short-lived CI matrix job for Node 20 to surface issues early.
- Document these changes in `README.md` and in `docs/doc#1 get-started`.
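
Once `.nvmrc` and `engines.node` both exist, a small check can keep them from drifting apart. The sketch below is one hypothetical way to do that and is not part of the package: the function names are invented for illustration, and it deliberately handles only plain `.nvmrc` values such as `16` or `v16.20.2` and `engines.node` ranges of the form `>=N`.

```python
import json
import re
from pathlib import Path


def nvmrc_major(text: str) -> int:
    """Extract the major version from .nvmrc content such as '16' or 'v16.20.2'."""
    match = re.match(r"v?(\d+)", text.strip())
    if match is None:
        # Aliases such as 'lts/gallium' are out of scope for this sketch.
        raise ValueError(f"unrecognised .nvmrc content: {text!r}")
    return int(match.group(1))


def engines_min_major(package_json: dict) -> int:
    """Read the minimum major from an engines.node range of the form '>=N'."""
    node_range = package_json.get("engines", {}).get("node", "")
    match = re.fullmatch(r">=\s*(\d+)(?:\.\d+){0,2}", node_range.strip())
    if match is None:
        raise ValueError(f"unsupported engines.node range: {node_range!r}")
    return int(match.group(1))


def check_repo(root: Path) -> bool:
    """Return True when the .nvmrc major satisfies the engines.node minimum."""
    pinned = nvmrc_major((root / ".nvmrc").read_text())
    minimum = engines_min_major(json.loads((root / "package.json").read_text()))
    return pinned >= minimum
```

Calling `check_repo(Path("."))` from the repo root in a CI step would fail fast if someone bumps one file and forgets the other.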

Decision record
- Decision: use **Node 16** as the canonical immediate target for compatibility. Migrate to **Node 20** after stabilization and testing.
- Rationale: maximize compatibility now; plan migration to gain runtime and dev ergonomics benefits later.

Links and references
- Get-started: see `docs/doc#1 get-started` for local dev steps.
- Python client guidance: see `docs/doc#2.5 python-port` for PyPI client plans and Parquet/pyarrow recommendations.

Appendix: commands and examples
- Set Node and run locally (recommended for developers):

```bash
nvm install 16
nvm use 16
npm ci
npm run build
```

- Run quick compatibility test in CI (example matrix snippet for GitHub Actions):

```yaml
strategy:
  matrix:
    node-version: [16.x, 20.x]
```

@@ -0,0 +1,42 @@

Why Python and where it fits
---------------------------

Overview
- Role split: Node (canonical HTTP service + JS client/CLI) handles core storage logic, streaming, idempotency, and integration with object stores. Python is used for:
  - Data-format heavy lifting (Parquet, pyarrow conversions, complex serialization)
  - Language-native clients for data teams (PyPI package)
  - Optional transformation microservices that are easier to implement with Python data libraries
  - Integration tests and ETL-friendly tooling that call the Node HTTP API

How Python integrates with Node
- Language-agnostic HTTP API: The Node service exposes an OpenAPI-defined HTTP surface (uploads, downloads, metadata, idempotency). Python clients call this API directly.
- Sidecar/worker pattern: For heavy conversions (CSV -> Parquet, complex schema transforms) Python workers run as separate processes or services and either:
  - push transformed artifacts to the Node service via the same upload API, or
  - run as an on-demand conversion service invoked by Node (HTTP call or queue message).
- CLI interoperability: The Python client will include a small CLI that mirrors the Node CLI (`commit`, `stage`, `push`) by calling the HTTP service; teams can use whichever CLI fits their environment.
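
To make the "Python clients call this API directly" point concrete, here is a minimal standard-library sketch of how a client might assemble an upload request. The real paths and payloads are defined in `openapi/openapi.yaml`; the `/upload/<name>` path, the `X-Metadata` header, and the `build_upload_request` helper are all placeholders assumed for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def build_upload_request(base_url: str, name: str, payload: bytes,
                         metadata: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a PUT request for one artifact.

    ASSUMPTION: the '/upload/<name>' path and the 'X-Metadata' header are
    placeholders; consult the OpenAPI spec for the actual contract.
    """
    url = f"{base_url.rstrip('/')}/upload/{urllib.parse.quote(name)}"
    headers = {"Content-Type": "application/octet-stream"}
    if metadata:
        # Sorted keys keep the header stable across runs.
        headers["X-Metadata"] = json.dumps(metadata, sort_keys=True)
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


if __name__ == "__main__":
    req = build_upload_request("http://localhost:3000", "table.parquet",
                               b"example bytes", {"table": "events"})
    # Sending would be urllib.request.urlopen(req) against a running service.
    print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the transport easy to swap (for example `requests` or `httpx`) and lets the request shape be tested without a running service.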

Why this is practical for our checklist
- Parquet & data formats: Python has mature libraries (`pyarrow`, `pandas`) for conversions and efficient columnar storage, which avoids complex native bindings in Node.
- Testing parity: Integration tests can exercise the same HTTP API across Node and Python clients to ensure behaviour parity (idempotency, metadata, hierarchical sync).
- Maintainability: One canonical service in Node reduces duplication of business logic; Python focuses on data transformations and client ergonomics.
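
One way parity on idempotency could be tested is to have every client derive the same key for the same artifact. The document does not say how the service implements idempotency, so the following is only a sketch under that assumption: `idempotency_key` is a hypothetical helper that hashes the payload together with canonicalised metadata.

```python
import hashlib
import json


def idempotency_key(payload: bytes, metadata: dict) -> str:
    """Derive a deterministic hex key from artifact bytes plus metadata."""
    digest = hashlib.sha256()
    digest.update(payload)
    # Separator byte so payload and metadata bytes cannot run together.
    digest.update(b"\x00")
    # Sorted keys and fixed separators make the JSON form order-independent.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()


if __name__ == "__main__":
    key_a = idempotency_key(b"rows", {"table": "events", "v": 1})
    key_b = idempotency_key(b"rows", {"v": 1, "table": "events"})
    print(key_a == key_b)
```

A Node client would need to produce the same canonical JSON bytes for the keys to match, which is exactly the kind of behaviour a shared integration test could pin down.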

Examples (usage patterns)
- Python client uploading a transformed Parquet file to the Node service:

```python
from dorky_client import DorkyClient

client = DorkyClient('http://localhost:3000')
client.upload('data/table.parquet', metadata={'table':'events'})
```

- Node service calls a Python conversion service (HTTP) to convert CSV to Parquet, then ingests the result into the bucket via internal upload flow.

Packaging & publishing
- The Python client will be a small PyPI package (see `python-client/pyproject.toml`) that depends on `requests` (sync) and optionally `httpx` for async.
- Heavy dependencies like `pyarrow` are optional extras (`dorky[parquet]`) to keep the core client lightweight.
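
With `pyarrow` as an optional extra, Parquet code paths need to fail clearly when it is missing. A minimal sketch of such a guard, assuming hypothetical helper names (`has_module`, `require_parquet_support`) that are not taken from the client scaffold:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the named top-level module can be imported."""
    return importlib.util.find_spec(name) is not None


def require_parquet_support(module: str = "pyarrow") -> None:
    """Raise a helpful error when the optional Parquet dependency is absent."""
    if not has_module(module):
        raise ImportError(
            f"{module} is not installed; install the optional extra, "
            "e.g. pip install 'dorky[parquet]'"
        )


if __name__ == "__main__":
    print("parquet support available:", has_module("pyarrow"))
```

Calling the guard at the top of any Parquet-specific function keeps the core import path free of heavy dependencies while still giving users an actionable message.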

Next steps
- We added a minimal OpenAPI spec and a Python client scaffold in `python-client/` to start implementation and tests.

File without changes