pycharter 0.0.24__py3-none-any.whl → 0.0.26__py3-none-any.whl
This diff shows the contents of publicly available package versions that have been released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- pycharter/__init__.py +6 -0
- pycharter/api/README.md +340 -0
- {api → pycharter/api}/__init__.py +1 -1
- {api → pycharter/api}/dependencies/__init__.py +2 -2
- pycharter/api/dependencies/auth.py +158 -0
- {api → pycharter/api}/main.py +32 -4
- {api → pycharter/api}/models/__init__.py +4 -4
- pycharter/api/models/etl.py +66 -0
- {api → pycharter/api}/routes/v1/__init__.py +5 -1
- pycharter/api/routes/v1/auth.py +97 -0
- {api → pycharter/api}/routes/v1/contracts.py +14 -12
- {api → pycharter/api}/routes/v1/docs.py +2 -2
- pycharter/api/routes/v1/etl.py +131 -0
- {api → pycharter/api}/routes/v1/evolution.py +2 -2
- {api → pycharter/api}/routes/v1/metadata.py +5 -5
- {api → pycharter/api}/routes/v1/quality.py +3 -3
- {api → pycharter/api}/routes/v1/schemas.py +1 -1
- {api → pycharter/api}/routes/v1/settings.py +1 -1
- {api → pycharter/api}/routes/v1/tracking.py +1 -1
- {api → pycharter/api}/routes/v1/validation.py +2 -2
- {api → pycharter/api}/routes/v1/validation_jobs.py +3 -3
- pycharter/cli.py +9 -11
- pycharter/config.py +69 -0
- pycharter/contract_builder/builder.py +32 -37
- pycharter/data/seed/compliance_frameworks.yaml +22 -0
- pycharter/data/seed/contracts.yaml +130 -0
- pycharter/data/seed/data_feeds.yaml +22 -0
- pycharter/data/seed/domains.yaml +13 -0
- pycharter/data/seed/environments.yaml +19 -0
- pycharter/data/seed/owners.yaml +21 -0
- pycharter/data/seed/systems.yaml +13 -0
- pycharter/data/seed/tags.yaml +25 -0
- pycharter/data/templates/contract/README.md +161 -0
- pycharter/data/templates/contract/template_contract.yaml +37 -0
- pycharter/data/templates/etl/README.md +1 -1
- pycharter/data/templates/etl/extract_with_validation.yaml +86 -0
- pycharter/data/templates/etl/load_with_validation.yaml +111 -0
- pycharter/data/templates/etl/settings.yaml +55 -0
- pycharter/db/README.md +179 -0
- pycharter/db/cli.py +126 -4
- pycharter/db/migrations/versions/20260122000000_change_artifact_unique_constraints_to_title_version.py +2 -2
- pycharter/db/schemas/README.md +96 -0
- pycharter/etl_generator/ASYNC_AND_EXECUTION.md +91 -0
- pycharter/etl_generator/INTERFACES.md +142 -0
- pycharter/etl_generator/README.md +271 -0
- pycharter/etl_generator/TRANSFORMATION_GUIDE.md +452 -0
- pycharter/etl_generator/__init__.py +47 -11
- pycharter/etl_generator/config_models.py +673 -0
- pycharter/etl_generator/config_validator.py +133 -157
- pycharter/etl_generator/context.py +3 -0
- pycharter/etl_generator/database.py +5 -1
- pycharter/etl_generator/extractors/__init__.py +4 -2
- pycharter/etl_generator/extractors/cloud_storage.py +9 -9
- pycharter/etl_generator/extractors/database.py +2 -2
- pycharter/etl_generator/extractors/factory.py +15 -33
- pycharter/etl_generator/extractors/file.py +2 -2
- pycharter/etl_generator/extractors/http.py +2 -2
- pycharter/etl_generator/extractors/mongodb.py +393 -0
- pycharter/etl_generator/extractors/streaming.py +2 -2
- pycharter/etl_generator/loaders/__init__.py +15 -9
- pycharter/etl_generator/loaders/{cloud_storage_loader.py → cloud_storage.py} +95 -2
- pycharter/etl_generator/loaders/factory.py +16 -29
- pycharter/etl_generator/loaders/file.py +135 -1
- pycharter/etl_generator/loaders/mongodb.py +416 -0
- pycharter/etl_generator/pipeline.py +283 -164
- pycharter/etl_generator/result.py +16 -0
- pycharter/etl_generator/schemas/__init__.py +71 -42
- pycharter/etl_generator/transformers/config.py +3 -2
- pycharter/etl_generator/transformers/simple_operations.py +57 -4
- pycharter/etl_generator/validation.py +551 -0
- pycharter/metadata_store/README.md +229 -0
- pycharter/quality/README.md +235 -0
- pycharter/runtime_validator/__init__.py +7 -0
- pycharter/runtime_validator/utils.py +33 -0
- pycharter/runtime_validator/validator.py +13 -10
- pycharter/ui/.eslintrc.json +4 -0
- pycharter/ui/README.md +186 -0
- {ui → pycharter/ui}/__init__.py +3 -3
- pycharter/ui/components.json +17 -0
- pycharter/ui/package-lock.json +6617 -0
- pycharter/ui/package.json +37 -0
- {ui → pycharter/ui}/server.py +7 -8
- pycharter/ui/static/404/index.html +1 -0
- pycharter/ui/static/404.html +1 -0
- pycharter/ui/static/__next.__PAGE__.txt +10 -0
- pycharter/ui/static/__next._full.txt +30 -0
- pycharter/ui/static/__next._head.txt +7 -0
- pycharter/ui/static/__next._index.txt +9 -0
- pycharter/ui/static/__next._tree.txt +2 -0
- pycharter/ui/static/_next/static/YCnlK66gA7FV5vvcixspB/_clientMiddlewareManifest.json +1 -0
- pycharter/ui/static/_next/static/chunks/0fc1f70b787b8845.js +1 -0
- pycharter/ui/static/_next/static/chunks/17bb8075d7b75663.css +1 -0
- pycharter/ui/static/_next/static/chunks/381932864dcbfdb8.js +1 -0
- pycharter/ui/static/_next/static/chunks/4c951b8e4507e2b3.js +1 -0
- pycharter/ui/static/_next/static/chunks/68b87a6f65abd3ed.js +1 -0
- pycharter/ui/static/_next/static/chunks/78572617b8fae189.js +1 -0
- pycharter/ui/static/_next/static/chunks/8b7be2803e3fe184.js +1 -0
- pycharter/ui/static/_next/static/chunks/a8e529fd1e67f121.js +1 -0
- pycharter/ui/static/_next/static/chunks/c35d998f80be3ff5.js +1 -0
- pycharter/ui/static/_next/static/chunks/e453aa5d01c32c17.js +1 -0
- pycharter/ui/static/_next/static/chunks/f2d240eb057f898a.js +970 -0
- pycharter/ui/static/_next/static/chunks/f7722448f6040846.js +1 -0
- pycharter/ui/static/_not-found/__next._full.txt +17 -0
- pycharter/ui/static/_not-found/__next._head.txt +7 -0
- pycharter/ui/static/_not-found/__next._index.txt +9 -0
- pycharter/ui/static/_not-found/__next._not-found.__PAGE__.txt +5 -0
- pycharter/ui/static/_not-found/__next._not-found.txt +4 -0
- pycharter/ui/static/_not-found/__next._tree.txt +2 -0
- pycharter/ui/static/_not-found/index.html +1 -0
- pycharter/ui/static/_not-found/index.txt +17 -0
- pycharter/ui/static/contracts/__next._full.txt +21 -0
- pycharter/ui/static/contracts/__next._head.txt +7 -0
- pycharter/ui/static/contracts/__next._index.txt +9 -0
- pycharter/ui/static/contracts/__next._tree.txt +2 -0
- pycharter/ui/static/contracts/__next.contracts.__PAGE__.txt +9 -0
- pycharter/ui/static/contracts/__next.contracts.txt +4 -0
- pycharter/ui/static/contracts/index.html +1 -0
- pycharter/ui/static/contracts/index.txt +21 -0
- pycharter/ui/static/documentation/__next._full.txt +21 -0
- pycharter/ui/static/documentation/__next._head.txt +7 -0
- pycharter/ui/static/documentation/__next._index.txt +9 -0
- pycharter/ui/static/documentation/__next._tree.txt +2 -0
- pycharter/ui/static/documentation/__next.documentation.__PAGE__.txt +9 -0
- pycharter/ui/static/documentation/__next.documentation.txt +4 -0
- pycharter/ui/static/documentation/index.html +93 -0
- pycharter/ui/static/documentation/index.txt +21 -0
- pycharter/ui/static/etl/__next._full.txt +21 -0
- pycharter/ui/static/etl/__next._head.txt +7 -0
- pycharter/ui/static/etl/__next._index.txt +9 -0
- pycharter/ui/static/etl/__next._tree.txt +2 -0
- pycharter/ui/static/etl/__next.etl.__PAGE__.txt +9 -0
- pycharter/ui/static/etl/__next.etl.txt +4 -0
- pycharter/ui/static/etl/index.html +2 -0
- pycharter/ui/static/etl/index.txt +21 -0
- pycharter/ui/static/index.html +1 -0
- pycharter/ui/static/index.txt +30 -0
- pycharter/ui/static/metadata/__next._full.txt +21 -0
- pycharter/ui/static/metadata/__next._head.txt +7 -0
- pycharter/ui/static/metadata/__next._index.txt +9 -0
- pycharter/ui/static/metadata/__next._tree.txt +2 -0
- pycharter/ui/static/metadata/__next.metadata.__PAGE__.txt +9 -0
- pycharter/ui/static/metadata/__next.metadata.txt +4 -0
- pycharter/ui/static/metadata/index.html +1 -0
- pycharter/ui/static/metadata/index.txt +21 -0
- pycharter/ui/static/quality/__next._full.txt +21 -0
- pycharter/ui/static/quality/__next._head.txt +7 -0
- pycharter/ui/static/quality/__next._index.txt +9 -0
- pycharter/ui/static/quality/__next._tree.txt +2 -0
- pycharter/ui/static/quality/__next.quality.__PAGE__.txt +9 -0
- pycharter/ui/static/quality/__next.quality.txt +4 -0
- pycharter/ui/static/quality/index.html +2 -0
- pycharter/ui/static/quality/index.txt +21 -0
- pycharter/ui/static/rules/__next._full.txt +21 -0
- pycharter/ui/static/rules/__next._head.txt +7 -0
- pycharter/ui/static/rules/__next._index.txt +9 -0
- pycharter/ui/static/rules/__next._tree.txt +2 -0
- pycharter/ui/static/rules/__next.rules.__PAGE__.txt +9 -0
- pycharter/ui/static/rules/__next.rules.txt +4 -0
- pycharter/ui/static/rules/index.html +1 -0
- pycharter/ui/static/rules/index.txt +21 -0
- pycharter/ui/static/schemas/__next._full.txt +21 -0
- pycharter/ui/static/schemas/__next._head.txt +7 -0
- pycharter/ui/static/schemas/__next._index.txt +9 -0
- pycharter/ui/static/schemas/__next._tree.txt +2 -0
- pycharter/ui/static/schemas/__next.schemas.__PAGE__.txt +9 -0
- pycharter/ui/static/schemas/__next.schemas.txt +4 -0
- pycharter/ui/static/schemas/index.html +1 -0
- pycharter/ui/static/schemas/index.txt +21 -0
- pycharter/ui/static/settings/__next._full.txt +21 -0
- pycharter/ui/static/settings/__next._head.txt +7 -0
- pycharter/ui/static/settings/__next._index.txt +9 -0
- pycharter/ui/static/settings/__next._tree.txt +2 -0
- pycharter/ui/static/settings/__next.settings.__PAGE__.txt +9 -0
- pycharter/ui/static/settings/__next.settings.txt +4 -0
- pycharter/ui/static/settings/index.html +1 -0
- pycharter/ui/static/settings/index.txt +21 -0
- pycharter/ui/static/static/_next/static/2gKjNv6YvE6BcIdFthBLs/_clientMiddlewareManifest.json +1 -0
- pycharter/ui/static/static/static/_next/static/0rYA78L88aUyD2Uh38hhX/_clientMiddlewareManifest.json +1 -0
- pycharter/ui/static/static/static/_next/static/chunks/f7d1a90dd75d2572.js +1 -0
- pycharter/ui/static/static/static/static/.gitkeep +0 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/222442f6da32302a.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/247eb132b7f7b574.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/297d55555b71baba.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/414e77373f8ff61c.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/652ad0aa26265c47.js +2 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/9c23f44fff36548a.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/a6dad97d9634a72d.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/b32a0963684b9933.js +4 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/db913959c675cea6.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/f2e7afeab1178138.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/ff1a16fafef87110.js +1 -0
- pycharter/ui/static/static/static/static/_next/static/chunks/turbopack-ffcb7ab6794027ef.js +3 -0
- pycharter/ui/static/static/static/static/_next/static/tNTkVW6puVXC4bAm4WrHl/_buildManifest.js +11 -0
- pycharter/ui/static/static/static/static/_next/static/tNTkVW6puVXC4bAm4WrHl/_clientMiddlewareManifest.json +1 -0
- pycharter/ui/static/static/static/static/_next/static/tNTkVW6puVXC4bAm4WrHl/_ssgManifest.js +1 -0
- pycharter/ui/static/validation/__next._full.txt +21 -0
- pycharter/ui/static/validation/__next._head.txt +7 -0
- pycharter/ui/static/validation/__next._index.txt +9 -0
- pycharter/ui/static/validation/__next._tree.txt +2 -0
- pycharter/ui/static/validation/__next.validation.__PAGE__.txt +9 -0
- pycharter/ui/static/validation/__next.validation.txt +4 -0
- pycharter/ui/static/validation/index.html +1 -0
- pycharter/ui/static/validation/index.txt +21 -0
- pycharter/ui/tsconfig.json +42 -0
- pycharter/worker/README.md +187 -0
- pycharter/worker/backends/__init__.py +8 -0
- pycharter/worker/backends/base.py +46 -0
- pycharter/worker/backends/spark.py +233 -0
- {worker → pycharter/worker}/cli.py +1 -1
- {worker → pycharter/worker}/processor.py +2 -2
- pycharter/worker/queue/__init__.py +8 -0
- pycharter/worker/queue/redis_queue.py +147 -0
- {pycharter-0.0.24.dist-info → pycharter-0.0.26.dist-info}/METADATA +57 -26
- pycharter-0.0.26.dist-info/RECORD +702 -0
- pycharter-0.0.26.dist-info/top_level.txt +1 -0
- pycharter/etl_generator/config_loader.py +0 -394
- pycharter/etl_generator/loaders/cloud.py +0 -87
- pycharter/etl_generator/loaders/file_loader.py +0 -130
- pycharter-0.0.24.dist-info/RECORD +0 -543
- pycharter-0.0.24.dist-info/top_level.txt +0 -4
- {api → pycharter/api}/dependencies/database.py +0 -0
- {api → pycharter/api}/dependencies/store.py +0 -0
- {api → pycharter/api}/models/contracts.py +0 -0
- {api → pycharter/api}/models/docs.py +0 -0
- {api → pycharter/api}/models/evolution.py +0 -0
- {api → pycharter/api}/models/metadata.py +0 -0
- {api → pycharter/api}/models/metadata_entities.py +0 -0
- {api → pycharter/api}/models/quality.py +0 -0
- {api → pycharter/api}/models/schemas.py +0 -0
- {api → pycharter/api}/models/tracking.py +0 -0
- {api → pycharter/api}/models/validation.py +0 -0
- {api → pycharter/api}/routes/__init__.py +0 -0
- {api → pycharter/api}/routes/v1/templates.py +0 -0
- {api → pycharter/api}/utils.py +0 -0
- {ui → pycharter/ui}/build.py +0 -0
- {ui → pycharter/ui}/dev.py +0 -0
- {ui → pycharter/ui}/static/.gitkeep +0 -0
- {ui/static/_next/static/2gKjNv6YvE6BcIdFthBLs → pycharter/ui/static/_next/static/YCnlK66gA7FV5vvcixspB}/_buildManifest.js +0 -0
- {ui/static/_next/static/2gKjNv6YvE6BcIdFthBLs → pycharter/ui/static/_next/static/YCnlK66gA7FV5vvcixspB}/_ssgManifest.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/222442f6da32302a.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/247eb132b7f7b574.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/297d55555b71baba.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/414e77373f8ff61c.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/652ad0aa26265c47.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/9c23f44fff36548a.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/a6dad97d9634a72d.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/b32a0963684b9933.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/db913959c675cea6.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/f2e7afeab1178138.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/f7d1a90dd75d2572.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/ff1a16fafef87110.js +0 -0
- {ui → pycharter/ui}/static/_next/static/chunks/turbopack-ffcb7ab6794027ef.js +0 -0
- {ui → pycharter/ui}/static/static/.gitkeep +0 -0
- {ui → pycharter/ui/static}/static/404/index.html +0 -0
- {ui → pycharter/ui/static}/static/404.html +0 -0
- {ui → pycharter/ui/static}/static/__next.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/__next._tree.txt +0 -0
- {ui/static/static/_next/static/0rYA78L88aUyD2Uh38hhX → pycharter/ui/static/static/_next/static/2gKjNv6YvE6BcIdFthBLs}/_buildManifest.js +0 -0
- {ui/static/static/_next/static/0rYA78L88aUyD2Uh38hhX → pycharter/ui/static/static/_next/static/2gKjNv6YvE6BcIdFthBLs}/_ssgManifest.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/13d4a0fbd74c1ee4.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/222442f6da32302a.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/247eb132b7f7b574.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/26dfc590f7714c03.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/297d55555b71baba.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/2ab439ce003cd691.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/2edb43b48432ac04.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/34d289e6db2ef551.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/414e77373f8ff61c.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/49ca65abd26ae49e.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/652ad0aa26265c47.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/9667e7a3d359eb39.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/99508d9d5869cc27.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/9c23f44fff36548a.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/a6dad97d9634a72d.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/b313c35a6ba76574.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/b32a0963684b9933.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/c69f6cba366bd988.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/d2363397e1b2bcab.css +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/db913959c675cea6.js +0 -0
- {ui → pycharter/ui/static}/static/_next/static/chunks/f061a4be97bfc3b3.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/f2e7afeab1178138.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/f7d1a90dd75d2572.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/ff1a16fafef87110.js +0 -0
- {ui → pycharter/ui}/static/static/_next/static/chunks/turbopack-ffcb7ab6794027ef.js +0 -0
- {ui → pycharter/ui/static}/static/_not-found/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/_not-found/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/_not-found/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/_not-found/__next._not-found.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/_not-found/__next._not-found.txt +0 -0
- {ui → pycharter/ui/static}/static/_not-found/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/_not-found/index.html +0 -0
- {ui → pycharter/ui/static}/static/_not-found/index.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/__next.contracts.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/__next.contracts.txt +0 -0
- {ui → pycharter/ui/static}/static/contracts/index.html +0 -0
- {ui → pycharter/ui/static}/static/contracts/index.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/__next.documentation.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/__next.documentation.txt +0 -0
- {ui → pycharter/ui/static}/static/documentation/index.html +0 -0
- {ui → pycharter/ui/static}/static/documentation/index.txt +0 -0
- {ui → pycharter/ui/static}/static/index.html +0 -0
- {ui → pycharter/ui/static}/static/index.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/__next.metadata.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/__next.metadata.txt +0 -0
- {ui → pycharter/ui/static}/static/metadata/index.html +0 -0
- {ui → pycharter/ui/static}/static/metadata/index.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/__next.quality.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/__next.quality.txt +0 -0
- {ui → pycharter/ui/static}/static/quality/index.html +0 -0
- {ui → pycharter/ui/static}/static/quality/index.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/__next.rules.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/__next.rules.txt +0 -0
- {ui → pycharter/ui/static}/static/rules/index.html +0 -0
- {ui → pycharter/ui/static}/static/rules/index.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/__next.schemas.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/__next.schemas.txt +0 -0
- {ui → pycharter/ui/static}/static/schemas/index.html +0 -0
- {ui → pycharter/ui/static}/static/schemas/index.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/__next.settings.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/__next.settings.txt +0 -0
- {ui → pycharter/ui/static}/static/settings/index.html +0 -0
- {ui → pycharter/ui/static}/static/settings/index.txt +0 -0
- {ui → pycharter/ui}/static/static/static/.gitkeep +0 -0
- {ui → pycharter/ui/static}/static/static/404/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/404.html +0 -0
- {ui → pycharter/ui/static}/static/static/__next.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/__next._tree.txt +0 -0
- {ui/static/static/static/_next/static/tNTkVW6puVXC4bAm4WrHl → pycharter/ui/static/static/static/_next/static/0rYA78L88aUyD2Uh38hhX}/_buildManifest.js +0 -0
- {ui/static/static/static/_next/static/tNTkVW6puVXC4bAm4WrHl → pycharter/ui/static/static/static/_next/static/0rYA78L88aUyD2Uh38hhX}/_ssgManifest.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/13d4a0fbd74c1ee4.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/222442f6da32302a.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/247eb132b7f7b574.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/297d55555b71baba.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/2ab439ce003cd691.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/2edb43b48432ac04.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/414e77373f8ff61c.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/49ca65abd26ae49e.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/5e04d10c4a7b58a3.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/652ad0aa26265c47.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/75d88a058d8ffaa6.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/8c89634cf6bad76f.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/9667e7a3d359eb39.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/9c23f44fff36548a.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/a6dad97d9634a72d.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/b32a0963684b9933.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/c4fa4f4114b7c352.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/c69f6cba366bd988.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/d2363397e1b2bcab.css +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/db913959c675cea6.js +0 -0
- {ui → pycharter/ui/static}/static/static/_next/static/chunks/f061a4be97bfc3b3.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/f2e7afeab1178138.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/ff1a16fafef87110.js +0 -0
- {ui → pycharter/ui}/static/static/static/_next/static/chunks/turbopack-ffcb7ab6794027ef.js +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/__next._not-found.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/__next._not-found.txt +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/_not-found/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/__next.contracts.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/__next.contracts.txt +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/contracts/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/__next.documentation.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/__next.documentation.txt +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/documentation/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/__next.metadata.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/__next.metadata.txt +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/metadata/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/__next.quality.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/__next.quality.txt +0 -0
- {ui → pycharter/ui/static}/static/static/quality/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/quality/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/__next.rules.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/__next.rules.txt +0 -0
- {ui → pycharter/ui/static}/static/static/rules/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/rules/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/__next.schemas.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/__next.schemas.txt +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/schemas/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/__next.settings.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/__next.settings.txt +0 -0
- {ui → pycharter/ui/static}/static/static/settings/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/settings/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/404/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/404.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/__next.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/2ab439ce003cd691.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/49ca65abd26ae49e.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/4e310fe5005770a3.css +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/5e04d10c4a7b58a3.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/5fc14c00a2779dc5.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/75d88a058d8ffaa6.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/8c89634cf6bad76f.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/9667e7a3d359eb39.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/b584574fdc8ab13e.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/c69f6cba366bd988.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/d5989c94d3614b3a.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_next/static/chunks/f061a4be97bfc3b3.js +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/__next._not-found.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/__next._not-found.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/_not-found/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/__next.contracts.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/__next.contracts.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/contracts/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/__next.documentation.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/__next.documentation.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/documentation/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/__next.metadata.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/__next.metadata.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/metadata/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/__next.quality.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/__next.quality.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/quality/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/__next.rules.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/__next.rules.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/rules/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/__next.schemas.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/__next.schemas.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/schemas/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/__next.settings.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/__next.settings.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/settings/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/__next.validation.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/__next.validation.txt +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/static/validation/index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/__next.validation.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/__next.validation.txt +0 -0
- {ui → pycharter/ui/static}/static/static/validation/index.html +0 -0
- {ui → pycharter/ui/static}/static/static/validation/index.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/__next._full.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/__next._head.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/__next._index.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/__next._tree.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/__next.validation.__PAGE__.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/__next.validation.txt +0 -0
- {ui → pycharter/ui/static}/static/validation/index.html +0 -0
- {ui → pycharter/ui/static}/static/validation/index.txt +0 -0
- {worker → pycharter/worker}/__init__.py +0 -0
- {worker → pycharter/worker}/models.py +0 -0
- {pycharter-0.0.24.dist-info → pycharter-0.0.26.dist-info}/WHEEL +0 -0
- {pycharter-0.0.24.dist-info → pycharter-0.0.26.dist-info}/entry_points.txt +0 -0
- {pycharter-0.0.24.dist-info → pycharter-0.0.26.dist-info}/licenses/LICENSE +0 -0
|
@@ -0,0 +1,91 @@
|
|
|
1
|
+
# Async and Execution Model
|
|
2
|
+
|
|
3
|
+
This document describes how PyCharter's ETL pipeline uses async execution, where the event loop runs, and how to run pipelines from scripts and long-running applications.
|
|
4
|
+
|
|
5
|
+
## Pipeline execution is async
|
|
6
|
+
|
|
7
|
+
The `Pipeline.run()` method is **async** and returns a coroutine. Extractors yield batches asynchronously, and loaders perform I/O asynchronously. You must run the pipeline within an event loop.
|
|
8
|
+
|
|
9
|
+
## Running a pipeline
|
|
10
|
+
|
|
11
|
+
### From a script (one-off run)
|
|
12
|
+
|
|
13
|
+
Use `asyncio.run()` to run the pipeline. This creates an event loop, runs the pipeline, and closes the loop when done.
|
|
14
|
+
|
|
15
|
+
```python
import asyncio
from pycharter import Pipeline

async def main():
    pipeline = Pipeline.from_config_dir("pipelines/users/")
    result = await pipeline.run()
    print(f"Loaded {result.rows_loaded} rows")

if __name__ == "__main__":
    asyncio.run(main())
```
|
|
27
|
+
|
|
28
|
+
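**Important:** Use `asyncio.run(main())` as the single top-level entry point of the script. It cannot be called while an event loop is already running in the same thread, so do not nest `asyncio.run()` calls.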
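The nesting restriction can be observed with the standard library alone. The sketch below uses no PyCharter code; `inner` and `outer` are placeholder coroutines introduced only for illustration.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    # Calling asyncio.run() while a loop is already running raises RuntimeError.
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # the coroutine never started; close it to avoid a warning
        return f"RuntimeError: {exc}"
    return "no error"


message = asyncio.run(outer())
print(message)
```

Inside `outer`, the correct call is simply `await inner()`.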
|
|
29
|
+
|
|
30
|
+
### From an async application (FastAPI, Celery async, etc.)
|
|
31
|
+
|
|
32
|
+
If you are already inside an async context (e.g. a FastAPI route or an async Celery task), **await** the pipeline directly. Do not use `asyncio.run()` there: it cannot be called from a running event loop and raises `RuntimeError`.
|
|
33
|
+
|
|
34
|
+
```python
from fastapi import APIRouter
from pycharter import Pipeline

router = APIRouter()

@router.post("/run-etl")
async def run_etl():
    pipeline = Pipeline.from_config_dir("pipelines/users/")
    result = await pipeline.run()
    return {"rows_loaded": result.rows_loaded}
```
|
|
46
|
+
|
|
47
|
+
### Where the event loop runs
|
|
48
|
+
|
|
49
|
+
| Context | Event loop | How to run the pipeline |
|---------|------------|-------------------------|
| Script (e.g. `python run.py`) | Created by `asyncio.run()` | `asyncio.run(main())` |
| Jupyter / IPython | Built-in loop | `await pipeline.run()` |
| FastAPI / Starlette | Uvicorn’s loop | `await pipeline.run()` |
| Async Celery task | Worker’s loop | `await pipeline.run()` |
| Sync code (no loop) | None | Use `asyncio.run(main())` |
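When it is unclear which row of the table applies, code can check at runtime whether a loop is already running. This is a minimal sketch using only `asyncio`; `in_event_loop` is a hypothetical helper, not part of PyCharter.

```python
import asyncio


def in_event_loop() -> bool:
    """Return True when called from code running inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # get_running_loop() raises when no loop is running in this thread.
        return False
    return True


async def check_inside() -> bool:
    return in_event_loop()


outside = in_event_loop()               # plain synchronous code
inside = asyncio.run(check_inside())    # code running under asyncio.run()
print(outside, inside)
```

If the helper returns `True`, await the pipeline; otherwise start it with `asyncio.run()`.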
|
|
56
|
+
|
|
57
|
+
## Error handling and error mode
|
|
58
|
+
|
|
59
|
+
`Pipeline.run()` accepts an optional `error_context` (from `pycharter.shared.errors`). The default error context’s mode controls whether failures **raise** or are **collected**:
|
|
60
|
+
|
|
61
|
+
- **STRICT (default for many paths):** Extraction or load failures raise exceptions. Use when you want fail-fast behavior.
|
|
62
|
+
- **LENIENT:** Failures are logged and appended to `result.errors`; the pipeline continues where possible.
|
|
63
|
+
- **COLLECT:** Same as lenient but errors are also collected on the context for later inspection.
|
|
64
|
+
|
|
65
|
+
```python
from pycharter import Pipeline
from pycharter.shared.errors import get_error_context, set_error_mode, ErrorMode

# Optional: set global mode (e.g. lenient for a script)
set_error_mode(ErrorMode.LENIENT)

pipeline = Pipeline.from_config_dir("pipelines/users/")
result = await pipeline.run()

if not result.success:
    for err in result.errors:
        print(err)
```
|
|
79
|
+
|
|
80
|
+
You can also pass a specific `ErrorContext` into `run()` instead of using the global default:
|
|
81
|
+
|
|
82
|
+
```python
from pycharter.shared.errors import ErrorContext, ErrorMode

ctx = ErrorContext(mode=ErrorMode.LENIENT)
result = await pipeline.run(error_context=ctx)
```
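The difference between raising and collecting can be illustrated without PyCharter. The sketch below is a stand-in under stated assumptions: `Mode` and `load_all` are hypothetical names, and the logic only mirrors the behavior described for the strict and lenient modes, not the library's implementation.

```python
from enum import Enum
from typing import List, Optional, Tuple


class Mode(Enum):  # hypothetical stand-in, not pycharter's ErrorMode
    STRICT = "strict"
    LENIENT = "lenient"


def load_all(batches: List[Optional[list]], mode: Mode) -> Tuple[int, List[str]]:
    """Count loaded rows; raise on failure in STRICT, collect errors in LENIENT."""
    loaded, errors = 0, []
    for batch in batches:
        try:
            if batch is None:
                raise ValueError("empty batch")
            loaded += len(batch)
        except ValueError as exc:
            if mode is Mode.STRICT:
                raise  # fail fast
            errors.append(str(exc))  # keep going and report later
    return loaded, errors


lenient_result = load_all([[1, 2], None, [3]], Mode.LENIENT)
print(lenient_result)
```

With `Mode.STRICT` the same input raises `ValueError` at the second batch.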
|
|
88
|
+
|
|
89
|
+
## No async context manager (yet)
|
|
90
|
+
|
|
91
|
+
The pipeline does not provide an `async with` context manager. Connections (e.g. DB, HTTP) are managed inside the extractor and loader. For cleanup, instantiate and run the pipeline in a scope where you control resource lifetime, or wrap the run in your own try/finally or async context manager.
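One way to wrap a run in your own async context manager is `contextlib.asynccontextmanager`. In this sketch `FakeResource` and `fake_run` are hypothetical placeholders for a resource you own and for `await pipeline.run()`; nothing here is PyCharter API.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeResource:
    """Placeholder for something you own, e.g. a connection pool."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


@asynccontextmanager
async def managed(resource: FakeResource):
    try:
        yield resource
    finally:
        # Runs on success and on error, like a try/finally around the run.
        await resource.close()


async def fake_run(resource: FakeResource) -> dict:
    # Stands in for `await pipeline.run()`.
    return {"rows_loaded": 3}


async def main():
    resource = FakeResource()
    async with managed(resource) as res:
        result = await fake_run(res)
    return result, resource.closed


result, closed = asyncio.run(main())
print(result, closed)
```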
|
|
@@ -0,0 +1,142 @@
|
|
|
1
|
+
# ETL Generator — Main Interfaces
|
|
2
|
+
|
|
3
|
+
This document describes the main public interfaces of `pycharter.etl_generator`. Use these when building or extending ETL pipelines.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## Primary interface: run pipelines
|
|
8
|
+
|
|
9
|
+
### `ETLOrchestrator`
|
|
10
|
+
|
|
11
|
+
**Import:** `from pycharter.etl_generator import ETLOrchestrator`
|
|
12
|
+
|
|
13
|
+
The main entry point for running ETL. It runs: **Extract → Transform → Load** from contract artifacts and ETL configs (`extract.yaml`, `transform.yaml`, `load.yaml`).
|
|
14
|
+
|
|
15
|
+
- **Constructor:** `ETLOrchestrator(contract_dir=None, contract_file=None, contract_dict=None, contract_metadata=None, checkpoint_dir=None, progress_callback=None, verbose=True, max_memory_mb=None, config_context=None, extract_config=None, transform_config=None, load_config=None, extract_file=None, transform_file=None, load_file=None)`
|
|
16
|
+
- **Main methods:**
|
|
17
|
+
- `run(**kwargs)` → `Dict` — Run the full pipeline (async). Pass `dry_run=True` to transform/load without writing. Input params (e.g. `symbol`, `start_date`) come from `extract.yaml`’s `input_params` and `**kwargs`.
|
|
18
|
+
- `extract_stream(batch_size=None, max_records=None, **kwargs)` → `AsyncIterator[List[Dict]]` — Stream batches from extract only.
|
|
19
|
+
- **Config sources (priority):** direct dict args > file paths > files in `contract_dir`.
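The priority rule can be read as a simple fall-through. The sketch below is an illustration only: `resolve_source` and its return shape are hypothetical and do not reflect how `ETLOrchestrator` is implemented.

```python
from pathlib import Path
from typing import Any, Dict, Optional, Tuple, Union

Source = Tuple[str, Union[Dict[str, Any], Path]]


def resolve_source(
    config_dict: Optional[Dict[str, Any]],
    config_file: Optional[str],
    contract_dir: Optional[str],
    default_name: str,
) -> Source:
    """Return (kind, value) for the highest-priority source that is set."""
    if config_dict is not None:
        return ("dict", config_dict)
    if config_file is not None:
        return ("file", Path(config_file))
    if contract_dir is not None:
        return ("contract_dir", Path(contract_dir) / default_name)
    raise ValueError(f"no source given for {default_name}")


# All three sources given: the dict wins.
chosen = resolve_source({"type": "http"}, "custom/extract.yaml", "contracts/users", "extract.yaml")
# Only the directory given: fall back to the file inside it.
fallback = resolve_source(None, None, "contracts/users", "extract.yaml")
print(chosen[0], fallback[0])
```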
|
|
20
|
+
|
|
21
|
+
### `create_orchestrator(contract_dir=None, **kwargs) -> ETLOrchestrator`
|
|
22
|
+
|
|
23
|
+
**Import:** `from pycharter.etl_generator import create_orchestrator`
|
|
24
|
+
|
|
25
|
+
Factory helper that returns `ETLOrchestrator(contract_dir=contract_dir, **kwargs)`.
|
|
26
|
+
|
|
27
|
+
---
|
|
28
|
+
|
|
29
|
+
## Pipeline discovery
|
|
30
|
+
|
|
31
|
+
### `PipelineFactory`
|
|
32
|
+
|
|
33
|
+
**Import:** `from pycharter.etl_generator import PipelineFactory`
|
|
34
|
+
|
|
35
|
+
Discovers pipelines from a root directory: each subdir that contains `extract.yaml`, `transform.yaml`, and `load.yaml` is treated as one pipeline.
|
|
36
|
+
|
|
37
|
+
- **Constructor:** `PipelineFactory(config_root="configs", excluded_dirs=None, required_files=None)`
|
|
38
|
+
- **Methods:**
|
|
39
|
+
- `get_pipeline_names() -> List[str]`
|
|
40
|
+
- `get_contract_dir(pipeline_name: str) -> Optional[str]`
|
|
41
|
+
- `create_orchestrator(pipeline_name, config_context=None, verbose=True, **orchestrator_kwargs) -> ETLOrchestrator`
|
|
42
|
+
- `refresh()` — Rescan `config_root`
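The discovery rule described above (a subdirectory counts as a pipeline when it contains all required files) can be sketched with `pathlib`. `discover_pipelines` is a hypothetical function written for illustration; it is not the `PipelineFactory` implementation.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence

REQUIRED_FILES = ("extract.yaml", "transform.yaml", "load.yaml")


def discover_pipelines(config_root: str, required: Sequence[str] = REQUIRED_FILES) -> List[str]:
    """Return names of subdirectories that contain every required file."""
    root = Path(config_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and all((child / name).is_file() for name in required)
    )


with tempfile.TemporaryDirectory() as tmp:
    complete = Path(tmp) / "users"
    partial = Path(tmp) / "orders"
    complete.mkdir()
    partial.mkdir()
    for name in REQUIRED_FILES:
        (complete / name).touch()
    (partial / "extract.yaml").touch()  # missing transform.yaml and load.yaml
    found = discover_pipelines(tmp)

print(found)
```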
|
|
43
|
+
|
|
44
|
+
---
|
|
45
|
+
|
|
46
|
+
## Extract
|
|
47
|
+
|
|
48
|
+
### `extract_with_pagination_streaming(...) -> AsyncIterator[List[Dict]]`
|
|
49
|
+
|
|
50
|
+
**Import:** `from pycharter.etl_generator.extractors import extract_with_pagination_streaming`
|
|
51
|
+
|
|
52
|
+
Async generator that yields batches of records. Dispatches to the right extractor via `ExtractorFactory` using `type` (or auto-detection from `extract_config`).
|
|
53
|
+
|
|
54
|
+
- **Args:** `extract_config`, `params`, `headers`, `contract_dir=None`, `batch_size=1000`, `max_records=None`, `config_context=None`
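How `batch_size` and `max_records` could interact in an async generator is sketched below. `stream_batches` is a hypothetical stand-in that reads from an in-memory list; the real extractors read from HTTP, files, databases, or cloud storage and may treat these arguments differently.

```python
import asyncio
from typing import Any, AsyncIterator, Dict, Iterable, List, Optional


async def stream_batches(
    records: Iterable[Dict[str, Any]],
    batch_size: int = 1000,
    max_records: Optional[int] = None,
) -> AsyncIterator[List[Dict[str, Any]]]:
    """Yield lists of at most batch_size records, stopping after max_records."""
    batch: List[Dict[str, Any]] = []
    emitted = 0
    for record in records:
        if max_records is not None and emitted >= max_records:
            break
        batch.append(record)
        emitted += 1
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


async def collect() -> List[int]:
    rows = [{"id": i} for i in range(7)]
    return [len(b) async for b in stream_batches(rows, batch_size=3, max_records=5)]


sizes = asyncio.run(collect())
print(sizes)
```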
|
|
55
|
+
|
|
56
|
+
### `ExtractorFactory.create(extract_config) -> BaseExtractor`
|
|
57
|
+
|
|
58
|
+
**Import:** `from pycharter.etl_generator.extractors import ExtractorFactory`
|
|
59
|
+
|
|
60
|
+
- `ExtractorFactory.create(extract_config)` — Returns an extractor instance for the given config.
|
|
61
|
+
- **Auto-detection:** `base_url`/`api_endpoint` → http; `file_path` → file; `database` → database; `storage` → cloud_storage.
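The auto-detection mapping can be expressed as a small lookup. This is a sketch under assumptions: `detect_extractor_type` is a hypothetical function, and the order in which keys are checked is a guess, since the text above lists the mapping but not the precedence.

```python
from typing import Any, Dict

# Config keys -> extractor type, checked in this (assumed) order.
DETECTION_RULES = (
    (("base_url", "api_endpoint"), "http"),
    (("file_path",), "file"),
    (("database",), "database"),
    (("storage",), "cloud_storage"),
)


def detect_extractor_type(extract_config: Dict[str, Any]) -> str:
    """Prefer an explicit `type`; otherwise infer it from the keys present."""
    explicit = extract_config.get("type")
    if explicit:
        return explicit
    for keys, type_name in DETECTION_RULES:
        if any(key in extract_config for key in keys):
            return type_name
    raise ValueError("cannot detect extractor type from config")


detected = [
    detect_extractor_type({"base_url": "https://example.com"}),
    detect_extractor_type({"file_path": "data/*.csv"}),
    detect_extractor_type({"type": "database", "file_path": "ignored.csv"}),
]
print(detected)
```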
|
|
62
|
+
|
|
63
|
+
### Extractors (implementations of `BaseExtractor`)
|
|
64
|
+
|
|
65
|
+
**Import:** `from pycharter.etl_generator.extractors import HTTPExtractor, FileExtractor, DatabaseExtractor, CloudStorageExtractor, BaseExtractor`
|
|
66
|
+
|
|
67
|
+
- **BaseExtractor** — Abstract base. Subclasses implement `validate_config(extract_config)` and `extract_streaming(extract_config, params, headers, contract_dir, batch_size, max_records, config_context)`.
|
|
68
|
+
- **HTTPExtractor** — HTTP/API (single or paginated).
|
|
69
|
+
- **FileExtractor** — Local files (CSV, JSON, Parquet, etc., including glob).
|
|
70
|
+
- **DatabaseExtractor** — SQL over PostgreSQL, MySQL, SQLite, MSSQL, Oracle.
|
|
71
|
+
- **CloudStorageExtractor** — S3, GCS, Azure Blob.
|
|
72
|
+
|
|
73
|
+
Custom extractors: `ExtractorFactory.register(type_name, extractor_class)`.
|
|
74
|
+
|
|
75
|
+
---
|
|
76
|
+
|
|
77
|
+
## Transform
|
|
78
|
+
|
|
79
|
+
### `apply_transforms(data, transform_config, **kwargs) -> List[Dict]`
|
|
80
|
+
|
|
81
|
+
**Import:** `from pycharter.etl_generator.transformers import apply_transforms`
|
|
82
|
+
|
|
83
|
+
Runs the transform pipeline on a list of records. Order: **simple_ops → jsonata → custom_function**. Each step is skipped if not present in config.
|
|
84
|
+
|
|
85
|
+
- **Args:** `data: List[Dict]`, `transform_config: Dict`, plus any `**kwargs` passed to custom functions.
|
|
86
|
+
- **Config shape:** Supports canonical `transform: { rename, convert, defaults, add, select, drop }` and/or top-level `jsonata`, `custom_function`. See `transformers.config.normalize_transform_config` and `pycharter/data/templates/etl/`.
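The fixed step order with skipping can be sketched as a loop over step names. Everything here is illustrative: `run_steps` and the toy handlers are hypothetical, and the config keys (`transform`, `jsonata`, `custom_function`) are taken from the config shape described above.

```python
from typing import Any, Callable, Dict, List

Records = List[Dict[str, Any]]
Step = Callable[[Records, Any], Records]

# Simple operations first, then JSONata, then the custom function.
STEP_ORDER = ("transform", "jsonata", "custom_function")


def run_steps(data: Records, config: Dict[str, Any], handlers: Dict[str, Step]) -> Records:
    """Apply each configured step in order; steps missing from config are skipped."""
    for name in STEP_ORDER:
        if name in config:
            data = handlers[name](data, config[name])
    return data


calls: List[str] = []


def make_handler(label: str) -> Step:
    def handler(data: Records, step_config: Any) -> Records:
        calls.append(label)  # record that this step ran
        return data

    return handler


handlers = {name: make_handler(name) for name in STEP_ORDER}
# Key order in the config does not matter; `jsonata` is absent and is skipped.
run_steps([{"id": 1}], {"custom_function": {}, "transform": {}}, handlers)
print(calls)
```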
|
|
87
|
+
|
|
88
|
+
---
|
|
89
|
+
|
|
90
|
+
## Load
|
|
91
|
+
|
|
92
|
+
### `load_to_file(data, load_config, ...) -> Dict`
|
|
93
|
+
|
|
94
|
+
**Import:** `from pycharter.etl_generator.loaders import load_to_file`
|
|
95
|
+
|
|
96
|
+
Writes records to a local file. Used when `load_config` has `destination_type: file` (or implies it via `file_path`).
|
|
97
|
+
|
|
98
|
+
- **Args:** `data`, `load_config`, `contract_dir=None`, `config_context=None`
|
|
99
|
+
- **Returns:** `{ "written": n, "total": n }` (and related metadata).
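The shape of the return value can be illustrated with a small writer. This sketch assumes JSON Lines output and a `file_path` key; `write_json_lines` is a hypothetical function, not `load_to_file` itself, which supports its own formats and options.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def write_json_lines(data: List[Dict[str, Any]], load_config: Dict[str, Any]) -> Dict[str, int]:
    """Write one JSON object per line and report how many records were written."""
    path = Path(load_config["file_path"])
    path.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with path.open("w", encoding="utf-8") as handle:
        for record in data:
            handle.write(json.dumps(record) + "\n")
            written += 1
    return {"written": written, "total": len(data)}


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out" / "users.jsonl"
    summary = write_json_lines([{"id": 1}, {"id": 2}], {"file_path": str(target)})
    line_count = len(target.read_text(encoding="utf-8").splitlines())

print(summary, line_count)
```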
|
|
100
|
+
|
|
101
|
+
### `load_to_cloud_storage(data, load_config, ...) -> Dict`
|
|
102
|
+
|
|
103
|
+
**Import:** `from pycharter.etl_generator.loaders import load_to_cloud_storage`
|
|
104
|
+
|
|
105
|
+
Writes to S3, GCS, or Azure Blob. Used when `destination_type: cloud_storage` or config has `storage`.
|
|
106
|
+
|
|
107
|
+
- **Args:** `data`, `load_config`, `contract_dir=None`, `config_context=None`
|
|
108
|
+
- **Returns:** `{ "written": n, "total": n }` (and related metadata).
|
|
109
|
+
|
|
110
|
+
Database loading is handled inside the orchestrator via `pycharter.etl_generator.database.load_data` when `destination_type` is omitted or `database` (and `load_config` has `database`, `target_table`, `schema_name`, etc.).
|
|
111
|
+
|
|
112
|
+
---
|
|
113
|
+
|
|
114
|
+
## Config generation
|
|
115
|
+
|
|
116
|
+
### `generate_etl_config(...)` / `generate_etl_config_from_contract(...)` / `generate_etl_config_from_store(...)`
|
|
117
|
+
|
|
118
|
+
**Import:** `from pycharter.etl_generator import generate_etl_config, generate_etl_config_from_contract, generate_etl_config_from_store`
|
|
119
|
+
|
|
120
|
+
Helpers to produce ETL config dicts (extract/transform/load) from contracts or from the metadata store. See docstrings and `config_generator.py` for parameters.
|
|
121
|
+
|
|
122
|
+
---
|
|
123
|
+
|
|
124
|
+
## Utilities (progress, checkpoint, DLQ)
|
|
125
|
+
|
|
126
|
+
- **Progress:** `ETLProgress`, `ProgressTracker` — `from pycharter.etl_generator import ETLProgress, ProgressTracker`
|
|
127
|
+
- **Checkpoint/resume:** `CheckpointManager`, `CheckpointState` — `from pycharter.etl_generator import CheckpointManager, CheckpointState`
|
|
128
|
+
- **Dead letter:** `DeadLetterQueue`, `DeadLetterRecord`, `DLQReason` — `from pycharter.etl_generator import DeadLetterQueue, DeadLetterRecord, DLQReason`
|
|
129
|
+
|
|
130
|
+
---
|
|
131
|
+
|
|
132
|
+
## Quick reference: import map
|
|
133
|
+
|
|
134
|
+
| Use case | Import |
|----------|--------|
| Run ETL | `from pycharter.etl_generator import ETLOrchestrator, create_orchestrator` |
| Discover pipelines | `from pycharter.etl_generator import PipelineFactory` |
| Extract (streaming) | `from pycharter.etl_generator.extractors import extract_with_pagination_streaming, ExtractorFactory` |
| Transform | `from pycharter.etl_generator.transformers import apply_transforms` |
| Load to file/cloud | `from pycharter.etl_generator.loaders import load_to_file, load_to_cloud_storage` |
| Config generation | `from pycharter.etl_generator import generate_etl_config, generate_etl_config_from_contract, generate_etl_config_from_store` |
| Progress / checkpoint / DLQ | `from pycharter.etl_generator import ETLProgress, ProgressTracker, CheckpointManager, CheckpointState, DeadLetterQueue, DeadLetterRecord, DLQReason` |
|
|
@@ -0,0 +1,271 @@
|
|
|
1
|
+
# ETL Orchestrator - User Guide
|
|
2
|
+
|
|
3
|
+
This document describes the ETL Orchestrator features, including simple transformations, JSONata support, and custom functions.
|
|
4
|
+
|
|
5
|
+
## Transformation Capabilities
|
|
6
|
+
|
|
7
|
+
The ETL orchestrator supports **three levels of transformation complexity**, applied in order:
|
|
8
|
+
|
|
9
|
+
1. **Simple Operations** (declarative, easy to use) - NEW! ✅
|
|
10
|
+
2. **JSONata** (powerful query language for complex transformations)
|
|
11
|
+
3. **Custom Functions** (Python functions for advanced logic)
|
|
12
|
+
|
|
13
|
+
### Simple Operations (Recommended for Most Use Cases)
|
|
14
|
+
|
|
15
|
+
Simple, declarative operations that cover the most common transformation needs:
|
|
16
|
+
|
|
17
|
+
```yaml
# transform.yaml
transform:
  rename:
    oldName: new_name
    camelCase: snake_case
  convert:
    price: float
    quantity: integer
  defaults:
    status: "pending"
  add:
    full_name: "${first_name} ${last_name}"
    created_at: "now()"
  select:
    - field1
    - field2
  drop:
    - internal_id
```
|
|
37
|
+
|
|
38
|
+
**See [TRANSFORMATION_GUIDE.md](TRANSFORMATION_GUIDE.md) for complete documentation.**
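To make the effect of the declarative keys concrete, here is a rough Python equivalent for four of them (`rename`, `convert`, `defaults`, `drop`). It is a sketch under assumptions: `apply_simple_ops` is hypothetical, the order of operations and the rule that defaults fill missing or `None` values are guesses, and `add`/`select` are omitted.

```python
from typing import Any, Dict

# Type names used in `convert` mapped to Python casts (assumed subset).
CASTS = {"float": float, "integer": int, "string": str}


def apply_simple_ops(record: Dict[str, Any], ops: Dict[str, Any]) -> Dict[str, Any]:
    out = dict(record)
    for old, new in ops.get("rename", {}).items():
        if old in out:
            out[new] = out.pop(old)
    for field, type_name in ops.get("convert", {}).items():
        if out.get(field) is not None:
            out[field] = CASTS[type_name](out[field])
    for field, value in ops.get("defaults", {}).items():
        if out.get(field) is None:
            out[field] = value
    for field in ops.get("drop", []):
        out.pop(field, None)
    return out


ops = {
    "rename": {"oldName": "new_name"},
    "convert": {"price": "float", "quantity": "integer"},
    "defaults": {"status": "pending"},
    "drop": ["internal_id"],
}
row = apply_simple_ops(
    {"oldName": "a", "price": "9.5", "quantity": "3", "internal_id": 7}, ops
)
print(row)
```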
|
|
39
|
+
|
|
40
|
+
### JSONata (Advanced)
|
|
41
|
+
|
|
42
|
+
Full JSONata support for complex transformations:
|
|
43
|
+
|
|
44
|
+
```yaml
jsonata:
  expression: |
    $.{
      "ticker": symbol,
      "avg_price": $average(prices),
      "total_volume": $sum(volumes)
    }
  mode: "batch"  # or "record"
```
|
|
54
|
+
|
|
55
|
+
### Custom Functions
|
|
56
|
+
|
|
57
|
+
Import and run external Python modules/functions:
|
|
58
|
+
|
|
59
|
+
```yaml
custom_function:
  module: "myproject.transforms"
  function: "optimize_data"
  mode: "batch"
  kwargs:
    method: "min_volatility"
```
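A function matching this config might look like the sketch below. The calling convention is an assumption: in batch mode the function is taken to receive the full list of records plus the configured `kwargs`. The body (sorting by a `volatility` field) is invented purely for illustration.

```python
from typing import Any, Dict, List


def optimize_data(
    records: List[Dict[str, Any]], method: str = "min_volatility"
) -> List[Dict[str, Any]]:
    """Hypothetical batch transform: order records by the chosen criterion."""
    if method == "min_volatility":
        return sorted(records, key=lambda record: record["volatility"])
    raise ValueError(f"unknown method: {method}")


rows = [
    {"symbol": "AAPL", "volatility": 0.30},
    {"symbol": "MSFT", "volatility": 0.22},
]
ordered = optimize_data(rows, method="min_volatility")
print([row["symbol"] for row in ordered])
```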
|
|
67
|
+
|
|
68
|
+
**All three can be used together!** Simple operations → JSONata → Custom functions.
|
|
69
|
+
|
|
70
|
+
## Enhanced Features
|
|
71
|
+
|
|
72
|
+
## Features Overview
|
|
73
|
+
|
|
74
|
+
### Phase 1: Core Streaming Infrastructure ✅
|
|
75
|
+
- **Streaming/Incremental ETL**: Process data in batches (Extract-Batch → Transform-Batch → Load-Batch)
|
|
76
|
+
- **Generator-based Extraction**: Async generators for memory-efficient data extraction
|
|
77
|
+
- **Memory Management**: Automatic memory monitoring and limits
|
|
78
|
+
|
|
79
|
+
### Phase 2: Observability & Configuration ✅
|
|
80
|
+
- **Configurable Processing Modes**: Choose between `full`, `streaming`, or `hybrid` modes
|
|
81
|
+
- **Progress Tracking**: Real-time progress reporting with callbacks
|
|
82
|
+
- **Error Recovery**: Retry strategies and error threshold management
|
|
83
|
+
|
|
84
|
+
### Phase 3: Advanced Features ✅
|
|
85
|
+
- **Checkpoint/Resume**: Save and resume long-running jobs
|
|
86
|
+
- **Multiple Runs Support**: Process multiple parameter sets efficiently with rate limiting
|
|
87
|
+
|
|
88
|
+
## Usage Examples
|
|
89
|
+
|
|
90
|
+
### Basic Usage (Backward Compatible)
|
|
91
|
+
|
|
92
|
+
```python
from pycharter.etl_generator import ETLOrchestrator

# Default behavior (full mode) - backward compatible
orchestrator = ETLOrchestrator(contract_dir="data/examples/my_contract")
result = await orchestrator.run()
```
|
|
99
|
+
|
|
100
|
+
### Streaming Mode (Memory Efficient)
|
|
101
|
+
|
|
102
|
+
Configure in `extract.yaml`:
|
|
103
|
+
```yaml
processing_mode: streaming
batch_size: 1000
```
|
|
107
|
+
|
|
108
|
+
Or use programmatically:
|
|
109
|
+
```python
orchestrator = ETLOrchestrator(contract_dir="data/examples/my_contract")
result = await orchestrator.run_streaming(batch_size=1000)
```
|
|
113
|
+
|
|
114
|
+
### Progress Tracking
|
|
115
|
+
|
|
116
|
+
```python
from pycharter.etl_generator import ETLOrchestrator, ETLProgress

def log_progress(progress: ETLProgress):
    print(f"{progress}")

orchestrator = ETLOrchestrator(
    contract_dir="data/examples/my_contract",
    progress_callback=log_progress,
    verbose=True
)
result = await orchestrator.run()
```
|
|
129
|
+
|
|
130
|
+
### Checkpoint/Resume
|
|
131
|
+
|
|
132
|
+
```python
orchestrator = ETLOrchestrator(
    contract_dir="data/examples/my_contract",
    checkpoint_dir="./checkpoints"
)

# Run with checkpoint
result = await orchestrator.run(checkpoint_id="my_job_001")

# Resume from checkpoint
result = await orchestrator.run(
    checkpoint_id="my_job_001",
    resume=True
)
```
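The general idea behind checkpoint/resume can be shown with a JSON file that records the next batch to process. This is a conceptual sketch only: `process` and the file layout are hypothetical and unrelated to `CheckpointManager`'s actual storage format.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional


def process(batches: List[str], checkpoint_file: Path, fail_at: Optional[int] = None) -> List[str]:
    """Process batches in order, persisting the index of the next batch after each one."""
    start = 0
    if checkpoint_file.exists():
        start = json.loads(checkpoint_file.read_text())["next_batch"]
    processed = []
    for index in range(start, len(batches)):
        if fail_at is not None and index == fail_at:
            raise RuntimeError("simulated crash")
        processed.append(batches[index])
        checkpoint_file.write_text(json.dumps({"next_batch": index + 1}))
    return processed


batches = ["b0", "b1", "b2", "b3"]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "my_job_001.json"
    try:
        process(batches, ckpt, fail_at=2)  # first run stops before the third batch
    except RuntimeError:
        pass
    resumed = process(batches, ckpt)  # second run picks up from the checkpoint

print(resumed)
```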
|
|
147
|
+
|
|
148
|
+
### Memory Management
|
|
149
|
+
|
|
150
|
+
```python
orchestrator = ETLOrchestrator(
    contract_dir="data/examples/my_contract",
    max_memory_mb=2048  # Limit to 2GB
)
result = await orchestrator.run()
```
|
|
157
|
+
|
|
158
|
+
### Multiple Runs Processing
|
|
159
|
+
|
|
160
|
+
Run the same ETL pipeline multiple times with different parameters efficiently:
|
|
161
|
+
|
|
162
|
+
```python
orchestrator = ETLOrchestrator(contract_dir="data/examples/my_contract")

# Simple case: vary a single parameter (e.g., symbols)
results = await orchestrator.run_multiple(
    param_name='symbol',
    param_values=["AAPL", "MSFT", "GOOGL", "TSLA"],
    batch_size=5,
    delay_between_runs=1.0
)

for result in results:
    params = result['params']
    print(f"{params}: {result['success']} - {result.get('records', 0)} records")

# Complex case: vary multiple parameters
results = await orchestrator.run_multiple(
    param_sets=[
        {'symbol': 'AAPL', 'date': '2024-01-01'},
        {'symbol': 'MSFT', 'date': '2024-01-02'},
        {'symbol': 'GOOGL', 'date': '2024-01-03'},
    ],
    batch_size=3,
    delay_between_runs=0.5
)
```
|
|
188
|
+
|
|
189
|
+
### Error Recovery
|
|
190
|
+
|
|
191
|
+
```python
result = await orchestrator.run_streaming(
    batch_size=1000,
    max_retries=3,
    error_threshold=0.1  # Abort if >10% of batches fail
)
```
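One plausible reading of `max_retries` and `error_threshold` is sketched below: each batch is retried up to `max_retries` times, and the run aborts once the share of failed batches exceeds the threshold. `run_with_recovery` is hypothetical and synchronous for brevity; the orchestrator's actual retry strategy may differ.

```python
from typing import Callable, Dict, List


def run_with_recovery(
    batches: List[str],
    handler: Callable[[str], None],
    max_retries: int = 3,
    error_threshold: float = 0.1,
) -> Dict[str, int]:
    failed = 0
    for batch in batches:
        for attempt in range(max_retries + 1):  # first try plus retries
            try:
                handler(batch)
                break
            except ValueError:
                if attempt == max_retries:
                    failed += 1  # retries exhausted for this batch
        if failed / len(batches) > error_threshold:
            raise RuntimeError("error threshold exceeded")
    return {"batches": len(batches), "failed": failed}


attempts: Dict[str, int] = {}


def flaky(batch: str) -> None:
    # Fails on the first attempt for each batch, then succeeds.
    attempts[batch] = attempts.get(batch, 0) + 1
    if attempts[batch] == 1:
        raise ValueError("transient failure")


summary = run_with_recovery(["a", "b", "c"], flaky)
print(summary, attempts)
```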
|
|
198
|
+
|
|
199
|
+
## Configuration Options
|
|
200
|
+
|
|
201
|
+
### extract.yaml
|
|
202
|
+
|
|
203
|
+
```yaml
# Processing mode: 'full', 'streaming', or 'hybrid'
processing_mode: streaming

# Batch size for processing
batch_size: 1000

# Memory limit (optional)
max_memory_mb: 2048

# Checkpoint configuration (optional)
checkpoint:
  enabled: true
  interval: 100  # Checkpoint every N batches
```
|
|
218
|
+
|
|
219
|
+
## Processing Modes
|
|
220
|
+
|
|
221
|
+
### Full Mode (Default)
|
|
222
|
+
- **Behavior**: Extract all → Transform all → Load all
|
|
223
|
+
- **Use Case**: Small to medium datasets, backward compatible
|
|
224
|
+
- **Memory**: All data in memory at once
|
|
225
|
+
|
|
226
|
+
### Streaming Mode
|
|
227
|
+
- **Behavior**: Extract-Batch → Transform-Batch → Load-Batch (incremental)
|
|
228
|
+
- **Use Case**: Large datasets, memory-constrained environments
|
|
229
|
+
- **Memory**: Bounded by the batch size rather than the total dataset size
|
|
230
|
+
|
|
231
|
+
### Hybrid Mode
|
|
232
|
+
- **Behavior**: Extract in chunks, transform/load in batches
|
|
233
|
+
- **Use Case**: Fast extraction but slow transformation/loading
|
|
234
|
+
- **Memory**: Moderate memory usage
|
|
235
|
+
|
|
236
|
+
## Architecture
|
|
237
|
+
|
|
238
|
+
### New Modules
|
|
239
|
+
|
|
240
|
+
- **`progress.py`**: Progress tracking and observability
|
|
241
|
+
- **`checkpoint.py`**: Checkpoint/resume functionality
|
|
242
|
+
- **`extractors/`**: Modular extraction (HTTP, file, database, cloud) with streaming entry point `extract_with_pagination_streaming`
|
|
243
|
+
- **`orchestrator.py`**: Enhanced with all new features
|
|
244
|
+
|
|
245
|
+
### Backward Compatibility
|
|
246
|
+
|
|
247
|
+
All new features are **100% backward compatible**. Existing code continues to work without changes. New features are opt-in via:
|
|
248
|
+
- Configuration in `extract.yaml`
|
|
249
|
+
- Optional constructor parameters
|
|
250
|
+
- New method calls (`run_streaming()`, `run_multiple()`, etc.)
|
|
251
|
+
|
|
252
|
+
## Performance Considerations
|
|
253
|
+
|
|
254
|
+
### Memory Usage
|
|
255
|
+
- **Full Mode**: O(n) where n = total records
|
|
256
|
+
- **Streaming Mode**: O(b) where b = batch size
|
|
257
|
+
- **Hybrid Mode**: O(c) where c = chunk size
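The O(n) versus O(b) difference can be seen by counting how many records are held at once. The two functions below are illustrative stand-ins, not PyCharter code; they only track the peak number of records in memory.

```python
from typing import Iterable


def peak_full(source: Iterable[int]) -> int:
    """Full mode: materialize everything, so the peak equals the dataset size."""
    data = list(source)
    return len(data)


def peak_streaming(source: Iterable[int], batch_size: int) -> int:
    """Streaming mode: hold one batch at a time, so the peak equals the batch size."""
    peak = 0
    batch = []
    for record in source:
        batch.append(record)
        peak = max(peak, len(batch))
        if len(batch) == batch_size:
            batch = []  # batch handed off to transform/load, then released
    return peak


full_peak = peak_full(range(10_000))
stream_peak = peak_streaming(range(10_000), batch_size=100)
print(full_peak, stream_peak)
```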
|
|
258
|
+
|
|
259
|
+
### Throughput
|
|
260
|
+
- **Full Mode**: Fastest for small datasets (single pass)
|
|
261
|
+
- **Streaming Mode**: Slower, but dataset size is not limited by available memory
|
|
262
|
+
- **Hybrid Mode**: Balanced approach
|
|
263
|
+
|
|
264
|
+
## Best Practices
|
|
265
|
+
|
|
266
|
+
1. **Use streaming mode** for datasets > 100K records
|
|
267
|
+
2. **Enable checkpoints** for jobs expected to run > 1 hour
|
|
268
|
+
3. **Set memory limits** to prevent OOM crashes
|
|
269
|
+
4. **Use progress callbacks** for monitoring long-running jobs
|
|
270
|
+
5. **Configure error thresholds** based on data quality expectations
|
|
271
|
+
|