dataenginex 0.4.0__tar.gz → 0.4.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (290)
  1. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/PULL_REQUEST_TEMPLATE.md +1 -1
  2. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/ci.yml +5 -5
  3. dataenginex-0.4.2/.github/workflows/docs-sync.yml +26 -0
  4. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/release.yml +21 -5
  5. dataenginex-0.4.2/CHANGELOG.md +87 -0
  6. {dataenginex-0.4.0 → dataenginex-0.4.2}/CONTRIBUTING.md +1 -1
  7. {dataenginex-0.4.0 → dataenginex-0.4.2}/PKG-INFO +12 -5
  8. {dataenginex-0.4.0 → dataenginex-0.4.2}/README.md +3 -3
  9. {dataenginex-0.4.0 → dataenginex-0.4.2}/SECURITY.md +5 -3
  10. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/adr/0001-medallion-architecture.md +2 -0
  11. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/index.md +0 -1
  12. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/ci-cd.md +1 -1
  13. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/contributing.md +1 -1
  14. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/development.md +8 -26
  15. dataenginex-0.4.2/docs/index.md +28 -0
  16. dataenginex-0.4.2/docs/observability.md +61 -0
  17. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/quickstart.md +2 -2
  18. dataenginex-0.4.2/docs/release-notes.md +94 -0
  19. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/sdlc.md +1 -1
  20. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/ecommerce/run_all.py +144 -101
  21. dataenginex-0.4.2/poe_tasks.toml +48 -0
  22. {dataenginex-0.4.0 → dataenginex-0.4.2}/pyproject.toml +37 -53
  23. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/README.md +3 -3
  24. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/RELEASE_NOTES.md +2 -0
  25. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/__init__.py +1 -1
  26. dataenginex-0.4.2/src/dataenginex/_json.py +27 -0
  27. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/agents/builtin.py +5 -6
  28. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/memory/long_term.py +3 -3
  29. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/runtime/executor.py +3 -3
  30. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/runtime/sandbox.py +8 -2
  31. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/tools/builtin.py +7 -2
  32. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/api/pagination.py +115 -114
  33. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/config/loader.py +27 -8
  34. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/config/schema.py +14 -7
  35. dataenginex-0.4.2/src/dataenginex/config/settings.py +127 -0
  36. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/interfaces.py +9 -0
  37. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/__init__.py +4 -0
  38. dataenginex-0.4.2/src/dataenginex/data/connectors/delta.py +161 -0
  39. dataenginex-0.4.2/src/dataenginex/data/connectors/http.py +191 -0
  40. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/legacy.py +4 -3
  41. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/parquet.py +4 -1
  42. dataenginex-0.4.2/src/dataenginex/data/connectors/rest.py +129 -0
  43. dataenginex-0.4.2/src/dataenginex/data/connectors/sse.py +168 -0
  44. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/pipeline/run_history.py +4 -4
  45. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/pipeline/runner.py +199 -13
  46. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/quality/gates.py +37 -6
  47. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/registry.py +4 -3
  48. dataenginex-0.4.2/src/dataenginex/data/transforms/sql.py +347 -0
  49. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/engine.py +43 -14
  50. dataenginex-0.4.2/src/dataenginex/lakehouse/catalog.py +212 -0
  51. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/lakehouse/storage.py +879 -722
  52. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/__init__.py +6 -4
  53. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/features/builtin.py +16 -0
  54. dataenginex-0.4.2/src/dataenginex/ml/registry.py +293 -0
  55. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/tracking/builtin.py +5 -5
  56. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/training.py +236 -2
  57. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/secops/audit.py +95 -52
  58. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/store.py +265 -218
  59. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/warehouse/lineage.py +7 -6
  60. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/conftest.py +24 -6
  61. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_dex_engine.py +5 -2
  62. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_dex_store.py +8 -5
  63. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_ml.py +189 -2
  64. {dataenginex-0.4.0 → dataenginex-0.4.2}/uv.lock +893 -1090
  65. dataenginex-0.4.0/CHANGELOG.md +0 -572
  66. dataenginex-0.4.0/docs/api-reference/dashboard.md +0 -8
  67. dataenginex-0.4.0/docs/index.md +0 -10
  68. dataenginex-0.4.0/docs/observability.md +0 -796
  69. dataenginex-0.4.0/docs/release-notes.md +0 -141
  70. dataenginex-0.4.0/poe_tasks.toml +0 -162
  71. dataenginex-0.4.0/src/dataenginex/data/transforms/sql.py +0 -171
  72. dataenginex-0.4.0/src/dataenginex/lakehouse/catalog.py +0 -172
  73. dataenginex-0.4.0/src/dataenginex/ml/registry.py +0 -187
  74. {dataenginex-0.4.0 → dataenginex-0.4.2}/.claude/commands/new-feature.md +0 -0
  75. {dataenginex-0.4.0 → dataenginex-0.4.2}/.claude/commands/validate.md +0 -0
  76. {dataenginex-0.4.0 → dataenginex-0.4.2}/.claude/settings.json +0 -0
  77. {dataenginex-0.4.0 → dataenginex-0.4.2}/.env.template +0 -0
  78. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
  79. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/ISSUE_TEMPLATE/config.yml +0 -0
  80. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
  81. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/dependabot.yml +0 -0
  82. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/labels.yml +0 -0
  83. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/release-pr-template.md +0 -0
  84. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/auto-pr.yml +0 -0
  85. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/enforce-dev-to-main.yml +0 -0
  86. {dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/security.yml +0 -0
  87. {dataenginex-0.4.0 → dataenginex-0.4.2}/.gitignore +0 -0
  88. {dataenginex-0.4.0 → dataenginex-0.4.2}/.gitleaks.toml +0 -0
  89. {dataenginex-0.4.0 → dataenginex-0.4.2}/.pre-commit-config.yaml +0 -0
  90. {dataenginex-0.4.0 → dataenginex-0.4.2}/.python-version +0 -0
  91. {dataenginex-0.4.0 → dataenginex-0.4.2}/CLAUDE.md +0 -0
  92. {dataenginex-0.4.0 → dataenginex-0.4.2}/CODEOWNERS +0 -0
  93. {dataenginex-0.4.0 → dataenginex-0.4.2}/CODE_OF_CONDUCT.md +0 -0
  94. {dataenginex-0.4.0 → dataenginex-0.4.2}/Dockerfile +0 -0
  95. {dataenginex-0.4.0 → dataenginex-0.4.2}/LICENSE +0 -0
  96. {dataenginex-0.4.0 → dataenginex-0.4.2}/docker-compose.test.yml +0 -0
  97. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/adr/0000-template.md +0 -0
  98. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/api.md +0 -0
  99. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/core.md +0 -0
  100. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/data.md +0 -0
  101. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/lakehouse.md +0 -0
  102. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/middleware.md +0 -0
  103. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/ml.md +0 -0
  104. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/plugins.md +0 -0
  105. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/api-reference/warehouse.md +0 -0
  106. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/architecture.md +0 -0
  107. {dataenginex-0.4.0 → dataenginex-0.4.2}/docs/security-scanning.md +0 -0
  108. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/01_hello_pipeline.py +0 -0
  109. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/02_api_quickstart.py +0 -0
  110. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/03_quality_gate.py +0 -0
  111. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/04_ml_training.py +0 -0
  112. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/05_rag_demo.py +0 -0
  113. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/06_llm_quickstart.py +0 -0
  114. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/07_api_ingestion.py +0 -0
  115. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/08_spark_ml.py +0 -0
  116. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/09_feature_engineering.py +0 -0
  117. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/10_model_analysis.py +0 -0
  118. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/GUIDE.md +0 -0
  119. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/dashboard/dashboard_config.yaml +0 -0
  120. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/dashboard/run_dashboard.py +0 -0
  121. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/data/events.csv +0 -0
  122. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/data/users.csv +0 -0
  123. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/dex.yaml +0 -0
  124. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/ecommerce/data/customers.csv +0 -0
  125. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/ecommerce/data/orders.csv +0 -0
  126. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/ecommerce/data/products.csv +0 -0
  127. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/ecommerce/dex.yaml +0 -0
  128. {dataenginex-0.4.0 → dataenginex-0.4.2}/examples/movies.csv +0 -0
  129. {dataenginex-0.4.0 → dataenginex-0.4.2}/scripts/localstack/init.sh +0 -0
  130. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/__init__.py +0 -0
  131. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/agents/__init__.py +0 -0
  132. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/llm.py +0 -0
  133. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/memory/__init__.py +0 -0
  134. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/memory/base.py +0 -0
  135. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/memory/episodic.py +0 -0
  136. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/observability/__init__.py +0 -0
  137. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/observability/audit.py +0 -0
  138. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/observability/cost.py +0 -0
  139. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/observability/metrics.py +0 -0
  140. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/retrieval/__init__.py +0 -0
  141. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/retrieval/builtin.py +0 -0
  142. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/retrieval/graph.py +0 -0
  143. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/routing/__init__.py +0 -0
  144. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/routing/anthropic.py +0 -0
  145. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/routing/guarded.py +0 -0
  146. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/routing/ollama.py +0 -0
  147. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/routing/openai.py +0 -0
  148. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/routing/router.py +0 -0
  149. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/runtime/__init__.py +0 -0
  150. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/runtime/checkpoint.py +0 -0
  151. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/tools/__init__.py +0 -0
  152. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/vectorstore.py +0 -0
  153. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/workflows/__init__.py +0 -0
  154. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/workflows/conditions.py +0 -0
  155. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/workflows/dag.py +0 -0
  156. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ai/workflows/human_loop.py +0 -0
  157. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/api/__init__.py +0 -0
  158. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/api/errors.py +0 -0
  159. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/api/schemas.py +0 -0
  160. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/cli/__init__.py +0 -0
  161. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/cli/main.py +0 -0
  162. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/cli/run.py +0 -0
  163. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/cli/secops.py +0 -0
  164. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/cli/train.py +0 -0
  165. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/config/__init__.py +0 -0
  166. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/config/defaults.py +0 -0
  167. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/__init__.py +0 -0
  168. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/exceptions.py +0 -0
  169. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/medallion_architecture.py +0 -0
  170. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/quality.py +0 -0
  171. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/registry.py +0 -0
  172. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/schemas.py +0 -0
  173. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/core/validators.py +0 -0
  174. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/__init__.py +0 -0
  175. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/_utils.py +0 -0
  176. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/csv.py +0 -0
  177. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/dbt.py +0 -0
  178. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/duckdb.py +0 -0
  179. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/connectors/spark.py +0 -0
  180. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/pipeline/__init__.py +0 -0
  181. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/pipeline/dag.py +0 -0
  182. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/profiler.py +0 -0
  183. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/quality/__init__.py +0 -0
  184. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/quality/spark.py +0 -0
  185. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/data/transforms/__init__.py +0 -0
  186. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/lakehouse/__init__.py +0 -0
  187. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/lakehouse/partitioning.py +0 -0
  188. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/middleware/__init__.py +0 -0
  189. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/middleware/domain_metrics.py +0 -0
  190. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/middleware/logging_config.py +0 -0
  191. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/middleware/metrics.py +0 -0
  192. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/drift.py +0 -0
  193. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/features/__init__.py +0 -0
  194. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/metrics.py +0 -0
  195. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/mlflow_registry.py +0 -0
  196. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/serving.py +0 -0
  197. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/serving_engine/__init__.py +0 -0
  198. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/serving_engine/builtin.py +0 -0
  199. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/ml/tracking/__init__.py +0 -0
  200. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/orchestration/__init__.py +0 -0
  201. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/orchestration/builtin.py +0 -0
  202. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/orchestration/scheduler.py +0 -0
  203. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/plugins/__init__.py +0 -0
  204. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/plugins/registry.py +0 -0
  205. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/py.typed +0 -0
  206. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/secops/__init__.py +0 -0
  207. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/secops/gate.py +0 -0
  208. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/secops/guard.py +0 -0
  209. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/secops/masking.py +0 -0
  210. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/secops/pii.py +0 -0
  211. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/warehouse/__init__.py +0 -0
  212. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/warehouse/transforms.py +0 -0
  213. {dataenginex-0.4.0 → dataenginex-0.4.2}/src/dataenginex/worker.py +0 -0
  214. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/__init__.py +0 -0
  215. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/conformance/__init__.py +0 -0
  216. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/conformance/test_connector.py +0 -0
  217. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/conformance/test_feature_store.py +0 -0
  218. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/conformance/test_tracker.py +0 -0
  219. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/conformance/test_transform.py +0 -0
  220. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/fixtures/__init__.py +0 -0
  221. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/fixtures/sample_data.py +0 -0
  222. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/fixtures/sample_jobs.csv +0 -0
  223. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/fixtures/sample_jobs.json +0 -0
  224. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/__init__.py +0 -0
  225. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_ai_integration.py +0 -0
  226. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_cli_run.py +0 -0
  227. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_config_cli.py +0 -0
  228. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_lineage_integration.py +0 -0
  229. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_ml_integration.py +0 -0
  230. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_pipeline_e2e.py +0 -0
  231. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_secops_integration.py +0 -0
  232. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/integration/test_storage_real.py +0 -0
  233. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/load/__init__.py +0 -0
  234. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/__init__.py +0 -0
  235. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_agent_runtime.py +0 -0
  236. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_ai_modules.py +0 -0
  237. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_api_pagination.py +0 -0
  238. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_api_schemas.py +0 -0
  239. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_api_validators.py +0 -0
  240. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_builtin_agent.py +0 -0
  241. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_builtin_feature_store.py +0 -0
  242. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_builtin_serving.py +0 -0
  243. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_builtin_tracker.py +0 -0
  244. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_cli_train.py +0 -0
  245. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_config_loader.py +0 -0
  246. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_config_schema.py +0 -0
  247. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_config_schema_extended.py +0 -0
  248. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_core_exceptions.py +0 -0
  249. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_core_interfaces.py +0 -0
  250. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_core_quality.py +0 -0
  251. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_core_registry.py +0 -0
  252. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_core_schemas_extended.py +0 -0
  253. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_csv_connector.py +0 -0
  254. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_data.py +0 -0
  255. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_dbt_connector.py +0 -0
  256. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_domain_metrics.py +0 -0
  257. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_domain_metrics_wiring.py +0 -0
  258. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_drift_scheduler.py +0 -0
  259. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_duckdb_connector.py +0 -0
  260. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_guarded_provider.py +0 -0
  261. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_lakehouse.py +0 -0
  262. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_llm.py +0 -0
  263. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_llm_extended.py +0 -0
  264. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_llm_litellm_vllm.py +0 -0
  265. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_logging.py +0 -0
  266. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_medallion.py +0 -0
  267. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_medallion_extended.py +0 -0
  268. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_middleware_metrics.py +0 -0
  269. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_parquet_connector.py +0 -0
  270. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_pipeline_dag.py +0 -0
  271. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_pipeline_runner.py +0 -0
  272. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_plugins.py +0 -0
  273. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_privacy_guard_wiring.py +0 -0
  274. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_quality_gates.py +0 -0
  275. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_quality_spark.py +0 -0
  276. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_retriever.py +0 -0
  277. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_retriever_graph.py +0 -0
  278. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_run_history.py +0 -0
  279. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_scheduler.py +0 -0
  280. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_secops.py +0 -0
  281. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_secops_engine_and_cli.py +0 -0
  282. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_secops_guard.py +0 -0
  283. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_spark_connector.py +0 -0
  284. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_spark_fixtures.py +0 -0
  285. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_sql_transforms.py +0 -0
  286. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_storage_abstraction.py +0 -0
  287. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_vectorstore.py +0 -0
  288. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_vectorstore_extended.py +0 -0
  289. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_warehouse.py +0 -0
  290. {dataenginex-0.4.0 → dataenginex-0.4.2}/tests/unit/test_warehouse_transforms.py +0 -0
{dataenginex-0.4.0 → dataenginex-0.4.2}/.github/PULL_REQUEST_TEMPLATE.md +1 -1
@@ -70,6 +70,6 @@ chore: establish org/domain foundation (pages, labels, project automation)
  - [ ] Org/Repo variable set: `ORG_PROJECT_URL`
  - [ ] Org/Repo secret set: `ORG_PROJECT_TOKEN`
  - [ ] Cloudflare DNS updated for docs/api/apex domains
- - [ ] Post-cutover checks completed (see `docs/DEPLOY_RUNBOOK.md` → Org + Domain Rollout)
+ - [ ] Post-cutover checks completed

  ## Notes for Reviewers
{dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/ci.yml +5 -5
@@ -9,7 +9,7 @@ on:

    schedule:
      # Weekly Python version compatibility check
-     - cron: '0 0 * * 0'
+     - cron: "0 0 * * 0"

  permissions:
    contents: read
@@ -20,7 +20,7 @@ jobs:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v6
-       - uses: astral-sh/setup-uv@v8.1.0
+       - uses: astral-sh/setup-uv@v8.2.0
          with:
            version: "latest"
            python-version: "3.13"
@@ -35,7 +35,7 @@ jobs:
      needs: quality
      steps:
        - uses: actions/checkout@v6
-       - uses: astral-sh/setup-uv@v8.1.0
+       - uses: astral-sh/setup-uv@v8.2.0
          with:
            version: "latest"
            python-version: "3.13"
@@ -43,7 +43,7 @@ jobs:
          env:
            UV_PROJECT_ENVIRONMENT: .venv
        - run: uv run poe test-cov-core
-       - uses: codecov/codecov-action@v5
+       - uses: codecov/codecov-action@v7
          with:
            flags: dataenginex
            fail_ci_if_error: false
@@ -59,7 +59,7 @@ jobs:
          python-version: ["3.11", "3.12"]
      steps:
        - uses: actions/checkout@v6
-       - uses: astral-sh/setup-uv@v8.1.0
+       - uses: astral-sh/setup-uv@v8.2.0
          with:
            version: "latest"
            python-version: ${{ matrix.python-version }}
dataenginex-0.4.2/.github/workflows/docs-sync.yml +26 -0
@@ -0,0 +1,26 @@
+ name: Trigger Docs Sync
+
+ on:
+   push:
+     tags:
+       - "v*"
+   workflow_dispatch:
+     inputs:
+       version:
+         description: "Version tag to sync to website"
+         required: false
+         default: ""
+
+ jobs:
+   dispatch:
+     runs-on: ubuntu-latest
+     steps:
+       - name: Send repository_dispatch to website
+         uses: peter-evans/repository-dispatch@v4
+         with:
+           token: ${{ secrets.DOCS_SYNC_PAT }}
+           repository: TheDataEngineX/website
+           event-type: sync-docs
+           client-payload: >-
+             {"repo": "dataenginex",
+             "version": "${{ github.ref_name || inputs.version }}"}
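Reviewer note on the `version` field above: in GitHub Actions expressions, `a || b` yields the first operand that is not falsy, and an empty string counts as falsy. The sketch below emulates that rule in Python to show which value ends up in the payload. The helper names `first_truthy` and `build_client_payload` are hypothetical and exist only for this illustration. It also assumes that on a manual `workflow_dispatch` run `github.ref_name` holds the branch the workflow was started from, in which case the `version` input would be used only if `ref_name` were empty.

```python
import json


def first_truthy(*operands: str) -> str:
    """Mimic the `a || b` fallback of GitHub Actions expressions for strings."""
    for value in operands:
        if value:  # an empty string is treated as falsy
            return value
    return ""


def build_client_payload(ref_name: str, input_version: str) -> str:
    """Hypothetical helper: render the JSON sent as `client-payload`."""
    version = first_truthy(ref_name, input_version)
    return json.dumps({"repo": "dataenginex", "version": version})


# Tag push: ref_name is the tag and no manual input is present.
print(build_client_payload("v0.4.2", ""))

# Manual run started from `main` with version="v0.4.2": ref_name comes first.
print(build_client_payload("main", "v0.4.2"))
```

If the manual input is meant to take precedence, listing it first (as `release.yml` does with `inputs.tag || github.ref_name`) would be one way to get that behaviour.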
{dataenginex-0.4.0 → dataenginex-0.4.2}/.github/workflows/release.yml +21 -5
@@ -3,7 +3,18 @@ name: Release
  on:
    push:
      tags:
-       - 'v[0-9]+.[0-9]+.[0-9]+'
+       - "v[0-9]+.[0-9]+.[0-9]+"
+   workflow_dispatch:
+     inputs:
+       tag:
+         description: "Release tag (e.g. v0.4.1)"
+         required: true
+         type: string
+       ref:
+         description: "Git ref to build from (branch or SHA)"
+         required: false
+         default: "main"
+         type: string

  jobs:
    build:
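Reviewer note on the hunk above: the tag filter `v[0-9]+.[0-9]+.[0-9]+` applies only to pushed tags, while the new `tag` input is a free-form string. A check along the following lines could confirm that a manually supplied tag has the same shape. This is a sketch, not part of the workflow: the regex is an assumed translation of the filter (treating `.` as a literal dot), and `is_release_tag` is a hypothetical name.

```python
import re

# Assumed regex translation of the workflow's tag filter "v[0-9]+.[0-9]+.[0-9]+".
RELEASE_TAG_RE = re.compile(r"v[0-9]+\.[0-9]+\.[0-9]+")


def is_release_tag(tag: str) -> bool:
    """Return True when the whole string has the shape vMAJOR.MINOR.PATCH."""
    return RELEASE_TAG_RE.fullmatch(tag) is not None


# Print the verdict for a few candidate tags.
for candidate in ("v0.4.1", "v0.4.2", "0.4.2", "v0.4", "v0.4.2-rc1"):
    print(candidate, is_release_tag(candidate))
```

Using `fullmatch` rather than `search` keeps suffixes such as `-rc1` from slipping through.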
@@ -13,7 +24,9 @@ jobs:
        contents: read
      steps:
        - uses: actions/checkout@v6
-       - uses: astral-sh/setup-uv@v8.1.0
+         with:
+           ref: ${{ inputs.ref || github.ref }}
+       - uses: astral-sh/setup-uv@v8.2.0
          with:
            version: "latest"
            python-version: "3.13"
@@ -46,7 +59,9 @@ jobs:
        contents: write
      steps:
        - uses: actions/checkout@v6
-       - uses: astral-sh/setup-uv@v8.1.0
+         with:
+           ref: ${{ inputs.ref || github.ref }}
+       - uses: astral-sh/setup-uv@v8.2.0
          with:
            version: "latest"
        - name: Extract version
@@ -62,8 +77,9 @@ jobs:
        - name: Create GitHub release + attach SBOM
          env:
            GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+           RELEASE_TAG: ${{ inputs.tag || github.ref_name }}
          run: |
-           gh release create ${{ github.ref_name }} \
-             --title "DEX ${{ github.ref_name }}" \
+           gh release create "$RELEASE_TAG" \
+             --title "DEX $RELEASE_TAG" \
              --generate-notes \
              "sbom-dex-${{ steps.version.outputs.version }}.json"
@@ -0,0 +1,87 @@
1
+ # Changelog
2
+
3
+ See [docs/release-notes.md](docs/release-notes.md) for the complete release history.
4
+
5
+ All notable changes to `dataenginex` will be documented in this file.
6
+
7
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
8
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
9
+
10
+ ## [0.4.2] - 2026-06-23
11
+
12
+ ### Added
13
+
14
+ - Example scripts refreshed: `08_spark_ml.py`, `09_feature_engineering.py`, `10_model_analysis.py` — demonstrate PySpark ML, feature transforms, and drift detection
15
+ - `docs/release-notes.md` rewritten — cleaned up pre-reset 1.x entries, corrected version history to start from 0.3.5
16
+ - `docs/observability.md` trimmed to library-level content only; HTTP/K8s content moved to `dex-studio/docs/observability.md`
17
+
18
+ ### Changed
19
+
20
+ - Version bumped to 0.4.2
21
+
22
+ ## [0.4.1] - 2026-06-12
23
+
24
+ ### Added
25
+
26
+ - `dataenginex._json` — drop-in `orjson`-backed JSON shim (`dumps`, `loads`, `JSONResponse`) replacing stdlib `json` across the library for ~3–5× serialization throughput
27
+ - `DeltaConnector` — native Delta Lake read/write via `deltalake` (new `delta` optional extra: `pip install "dataenginex[delta]"`)
28
+ - `ml.features.builtin` — built-in feature transformers: `StandardScalerTransform`, `MinMaxScalerTransform`, `OneHotEncoderTransform`, `PolynomialFeaturesTransform`
29
+ - `core.interfaces` — new `Closeable` and `AsyncCloseable` protocols for uniform resource lifecycle
30
+ - `orjson>=3.11.0` and `zstandard>=0.25.0` promoted to core runtime dependencies
31
+
32
+ ### Changed
33
+
34
+ - **Lakehouse storage** (`lakehouse/storage.py`) — full rewrite: unified `LakehouseStorage` with pluggable backends (local, S3, GCS), Zstandard compression throughout, columnar partition pruning
35
+ - **Lakehouse catalog** (`lakehouse/catalog.py`) — catalog entries now carry partition stats and schema fingerprints; `register` / `resolve` API stabilised
36
+ - **ML registry** (`ml/registry.py`) — artifact versioning with aliasing (`promote_alias`), stage transitions (`development` → `staging` → `production`), and metadata search
37
+ - **ML training** (`ml/training.py`) — `TrainingJob` lifecycle management, early-stopping callbacks, cross-validation harness, experiment comparison utilities
38
+ - **Pagination** (`api/pagination.py`) — cursor-based and page-number strategies unified under `PaginationResult`; `paginate_query` helper works with any iterable
39
+ - **Store** (`store.py`) — async-safe DuckDB connection pool, `get_pipeline_runs` and `list_model_artifacts` now return typed dataclasses
40
+ - **SecOps audit** (`secops/audit.py`) — structured audit events with severity levels, retention policy enforcement, export to JSONL
41
+ - **AI runtime** (`ai/runtime/executor.py`) — tool call concurrency limit, timeout per tool, structured error envelopes
42
+ - **Config loader** (`config/loader.py`) — environment variable interpolation (`${VAR}`) and `include:` directive for config composition
43
+ - `zstandard` used for pipeline run history compression reducing on-disk footprint by ~60%
44
+
45
+ ### Fixed
46
+
47
+ - `mypy --strict` passes cleanly across all modules after strict type annotation pass
48
+ - `DeltaConnector` and `LakehouseStorage` excluded from coverage thresholds (require live filesystems); coverage gate unchanged for all other modules
49
+
50
+ ## [0.4.0] - 2026-02-21
51
+
52
+ > **Scope reset from 1.x.** Versions 1.0.0–1.1.2 were prematurely tagged stable. Resetting to `0.4.0` to honestly reflect pre-1.0 maturity. See [ADR-0007](https://github.com/TheDataEngineX/docs/blob/main/adr/0007-local-first-scope-reset.md) for rationale. The 1.x versions on PyPI are yanked but remain installable by exact pin (`pip install 'dataenginex==1.1.2'`); plain `pip install dataenginex` now resolves to `0.4.0`.
53
+
54
+ ### Added
55
+
56
+ - Stable `__all__` exports in every subpackage `__init__.py`
57
+ - `from __future__ import annotations` in all public modules
58
+ - Comprehensive module-level docstrings with usage examples
59
+ - New public API exports: `ComponentHealth`, `AuthMiddleware`, `AuthUser`,
60
+ `create_token`, `decode_token`, `BadRequestError`, `NotFoundError`,
61
+ `PaginationMeta`, `RateLimiter`, `RateLimitMiddleware`,
62
+ `ConnectorStatus`, `FetchResult`, `ColumnProfile`, `get_logger`, `get_tracer`
63
+
64
+ ### Changed
65
+
66
+ - Reorganized `__all__` in all subpackages for logical grouping
67
+ - Updated package version to 0.4.0
68
+
69
+ ## [0.3.5] - 2026-02-13
70
+
71
+ ### Added
72
+
73
+ - Production hardening: structured logging, Prometheus/OTel, health probes
74
+ - Data connectors: `RestConnector`, `FileConnector` with async interface
75
+ - Schema registry with versioned schema management
76
+ - Data profiler with automated dataset statistics
77
+ - Lakehouse catalog, partitioning, and storage backends
78
+ - ML framework: trainer, model registry, drift detection, serving
79
+ - Warehouse transforms and persistent lineage tracking
80
+ - JWT authentication middleware
81
+ - Rate limiting middleware
82
+ - Cursor-based pagination utilities
83
+ - Versioned API router (`/api/v1/`)
84
+
85
+ [0.3.5]: https://github.com/TheDataEngineX/dataenginex/releases/tag/v0.3.5
86
+ [0.4.0]: https://github.com/TheDataEngineX/dataenginex/compare/v0.3.5...v0.4.0
87
+ [0.4.1]: https://github.com/TheDataEngineX/dataenginex/compare/v0.4.0...v0.4.1
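The 0.4.1 entry above mentions environment variable interpolation (`${VAR}`) in the config loader. The diff does not show how `config/loader.py` implements it, so the following is only a minimal stdlib sketch of the general technique. The function name `interpolate`, the strict failure on undefined variables, and the sample key `db_path` are assumptions made for illustration, not the library's API.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${NAME}, where NAME is a conventional environment variable identifier.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${NAME} in `text` with the value of NAME.

    Hypothetical helper: it raises KeyError for undefined variables rather
    than silently substituting an empty string.
    """
    source = os.environ if env is None else env

    def _substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"undefined variable in config: {name}")
        return source[name]

    return _VAR_PATTERN.sub(_substitute, text)


# Usage: pass an explicit mapping so the result does not depend on the real environment.
rendered = interpolate("db_path: ${DEX_DB}", {"DEX_DB": "/tmp/dex.duckdb"})
```

Failing loudly on a missing variable is one possible design choice; a real loader may instead support defaults or leave unknown placeholders untouched.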
@@ -7,7 +7,7 @@ Thank you for your interest in contributing to DataEngineX!
7
7
  Quick essentials:
8
8
 
9
9
  - Development setup: [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md)
10
- - Governance & legal: [GOVERNANCE.md](GOVERNANCE.md)
10
+ - Governance & legal: See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md)
11
11
  - Code of Conduct: [CODE_OF_CONDUCT.md](CODE_OF_CONDUCT.md)
12
12
 
13
13
  ## Quick Start
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: dataenginex
3
- Version: 0.4.0
3
+ Version: 0.4.2
4
4
  Summary: DataEngineX — open-source, self-hosted, local-first Data + ML + AI workbench library
5
5
  Author-email: Jay <jayapal.myaka99@gmail.com>
6
6
  License: MIT License
@@ -30,18 +30,25 @@ Requires-Dist: click>=8.3.3
30
30
  Requires-Dist: croniter>=6.2.2
31
31
  Requires-Dist: duckdb>=1.5.2
32
32
  Requires-Dist: httpx>=0.28.1
33
+ Requires-Dist: msgpack>=1.2.1
34
+ Requires-Dist: orjson>=3.11.0
33
35
  Requires-Dist: prometheus-client>=0.25.0
34
36
  Requires-Dist: pyarrow>=23.0.1
37
+ Requires-Dist: pydantic-settings>=2.14.2
35
38
  Requires-Dist: pydantic>=2.13.4
36
- Requires-Dist: python-dotenv>=1.2.2
37
39
  Requires-Dist: pyyaml>=6.0.3
38
40
  Requires-Dist: structlog>=25.5.0
41
+ Requires-Dist: zstandard>=0.25.0
39
42
  Provides-Extra: cloud
40
43
  Requires-Dist: boto3>=1.43.7; extra == 'cloud'
41
44
  Requires-Dist: google-cloud-bigquery>=3.41.0; extra == 'cloud'
42
45
  Requires-Dist: google-cloud-storage>=3.10.1; extra == 'cloud'
46
+ Provides-Extra: delta
47
+ Requires-Dist: deltalake>=0.24.0; extra == 'delta'
43
48
  Provides-Extra: postgres
44
49
  Requires-Dist: asyncpg>=0.31.0; extra == 'postgres'
50
+ Provides-Extra: pytorch
51
+ Requires-Dist: torch>=2.0.0; extra == 'pytorch'
45
52
  Provides-Extra: qdrant
46
53
  Requires-Dist: qdrant-client>=1.18.0; extra == 'qdrant'
47
54
  Provides-Extra: queue
@@ -52,7 +59,7 @@ Description-Content-Type: text/markdown
52
59
 
53
60
  The Python library that powers [DEX Studio](https://github.com/TheDataEngineX/dex-studio) — an open-source, self-hosted, local-first Data + ML + AI workbench. Use the library directly when you want code, not a UI.
54
61
 
55
- > **Pre-1.0 status.** `0.4.0` is honest about that. See the [scope reset CHANGELOG](https://github.com/TheDataEngineX/dex/blob/main/CHANGELOG.md) for the rationale.
62
+ > **Pre-1.0 status.** `0.4.0` is honest about that. See the [scope reset CHANGELOG](https://github.com/TheDataEngineX/dataenginex/blob/main/CHANGELOG.md) for the rationale.
56
63
 
57
64
  ## Install
58
65
 
@@ -148,9 +155,9 @@ DEX Studio imports `dataenginex` directly — no separate API server.
148
155
 
149
156
  ## Links
150
157
 
151
- - Source: [github.com/TheDataEngineX/dex](https://github.com/TheDataEngineX/dex)
158
+ - Source: [github.com/TheDataEngineX/dataenginex](https://github.com/TheDataEngineX/dataenginex)
152
159
  - Docs: [docs.thedataenginex.org](https://docs.thedataenginex.org)
153
160
  - Roadmap: [docs/docs/roadmap/DESIGN-2026.md](https://github.com/TheDataEngineX/docs/blob/main/docs/roadmap/DESIGN-2026.md)
154
161
  - ADRs: [docs/adr/](https://github.com/TheDataEngineX/docs/tree/main/adr)
155
- - Issues: [github.com/TheDataEngineX/dex/issues](https://github.com/TheDataEngineX/dex/issues)
162
+ - Issues: [github.com/TheDataEngineX/dataenginex/issues](https://github.com/TheDataEngineX/dataenginex/issues)
156
163
  - License: MIT
@@ -1,6 +1,6 @@
1
1
  # dataenginex
2
2
 
3
- [![CI](https://github.com/TheDataEngineX/DEX/actions/workflows/ci.yml/badge.svg?branch=dev)](https://github.com/TheDataEngineX/DEX/actions/workflows/ci.yml)
3
+ [![CI](https://github.com/TheDataEngineX/dataenginex/actions/workflows/ci.yml/badge.svg?branch=dev)](https://github.com/TheDataEngineX/dataenginex/actions/workflows/ci.yml)
4
4
  [![PyPI](https://img.shields.io/pypi/v/dataenginex)](https://pypi.org/project/dataenginex/)
5
5
  [![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg)](https://www.python.org/downloads/)
6
6
  [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
@@ -96,7 +96,7 @@ ______________________________________________________________________
96
96
  ## Development
97
97
 
98
98
  ```bash
99
- git clone https://github.com/TheDataEngineX/dex && cd dex
99
+ git clone https://github.com/TheDataEngineX/dataenginex && cd dataenginex
100
100
  uv sync
101
101
  uv run poe check-all # lint + typecheck + tests
102
102
  uv run poe test-cov # tests + coverage
@@ -126,7 +126,7 @@ ______________________________________________________________________
126
126
 
127
127
  | Repo | Purpose |
128
128
  | --- | --- |
129
- | [dataenginex](https://github.com/TheDataEngineX/dex) | This library (PyPI) |
129
+ | [dataenginex](https://github.com/TheDataEngineX/dataenginex) | This library (PyPI) |
130
130
  | [dex-studio](https://github.com/TheDataEngineX/dex-studio) | Web UI — FastAPI + Jinja2 + HTMX |
131
131
  | [docs](https://github.com/TheDataEngineX/docs) | Docs site ([docs.thedataenginex.org](https://docs.thedataenginex.org)) — ADRs + roadmap live here |
132
132
 
@@ -4,7 +4,7 @@
4
4
 
5
5
  | Version | Supported |
6
6
  |---------|-----------|
7
- | Latest minor release (1.0.x) | ✅ |
7
+ | Latest (0.4.x) | ✅ |
8
8
  | Previous minor release | ✅ (security fixes only) |
9
9
  | Older versions | ❌ |
10
10
 
@@ -15,7 +15,7 @@
15
15
  Instead, please report them via one of these channels:
16
16
 
17
17
  1. **Email**: security@thedataenginex.dev
18
- 2. **GitHub Security Advisories**: Use the "Report a vulnerability" button on the Security tab
18
+ 1. **GitHub Security Advisories**: Use the "Report a vulnerability" button on the Security tab
19
19
 
20
20
  ### What to Include
21
21
 
@@ -53,6 +53,8 @@ DataEngineX follows these security practices:
53
53
 
54
54
  ## Security-Related Dependencies
55
55
 
56
+ *This table is illustrative, not exhaustive.*
57
+
56
58
  | Dependency | Purpose | Security Note |
57
59
  |------------|---------|---------------|
58
60
  | pydantic | Config validation | Validates all inputs |
@@ -68,4 +70,4 @@ Run security audits locally:
68
70
  uv run poe security # pip-audit for vulnerabilities
69
71
  ```
70
72
 
71
- CI runs `pip-audit` and dependency scanning on every PR.
73
+ CI runs `pip-audit` and dependency scanning on every PR.
@@ -4,6 +4,8 @@
4
4
  **Date**: 2026-02-15
5
5
  **Authors**: Data Engineering Team
6
6
 
7
+ > **Note:** Implementation details (tooling, storage) have evolved since this ADR was written. See [docs/architecture.md](docs/architecture.md) for the current design. References to ADR-0002, ADR-0003, and ADR-0004 are placeholders for decisions that were planned but not written.
8
+
7
9
  ## Context
8
10
 
9
11
  DEX needs a scalable, standardized data architecture that works across all projects (CareerDEX, Weather, etc.). The data pipeline needs to handle:
@@ -11,5 +11,4 @@ Auto-generated API documentation for the `dataenginex` package.
11
11
  | [middleware](middleware.md) | Logging, metrics, tracing, request middleware |
12
12
  | [ml](ml.md) | ML training, model registry, drift detection, serving, vectorstore, LLM adapters |
13
13
  | [plugins](plugins.md) | Plugin system — ABC, registry, auto-discovery |
14
- | [dashboard](dashboard.md) | Streamlit dashboard framework — panels, config |
15
14
  | [warehouse](warehouse.md) | Transforms, persistent lineage tracking |
@@ -124,7 +124,7 @@ ______________________________________________________________________
124
124
 
125
125
  ## Release Automation
126
126
 
127
- **Workflow**: [`.github/workflows/release.yml`](https://github.com/TheDataEngineX/dex/blob/main/.github/workflows/release.yml)
127
+ **Workflow**: [`.github/workflows/release.yml`](https://github.com/TheDataEngineX/dataenginex/blob/main/.github/workflows/release.yml)
128
128
 
129
129
  **Trigger**: Push a tag matching `v[0-9]+.[0-9]+.[0-9]+` to `main`
130
130
 
@@ -69,7 +69,7 @@ Use semantic commit format:
69
69
  - This project is open source under MIT; keep license and attribution notices in redistributions.
70
70
  - Forks are welcome, but should use a distinct public name when redistributed as a separate project.
71
71
  - Do not present a fork as the official DataEngineX/DEX project.
72
- - See [Trademark Policy](https://github.com/TheDataEngineX/dataenginex/blob/main/TRADEMARK_POLICY.md) for brand-usage details.
72
+ - See the project's license and brand guidelines for brand-usage details.
73
73
 
74
74
  ## Code Reviews
75
75
 
@@ -37,14 +37,14 @@ This installs all Python dependencies and configures pre-commit hooks.
37
37
 
38
38
  ```bash
39
39
  # 1. Clone repo and create feature branch
40
- git clone https://github.com/TheDataEngineX/dex.git
41
- cd dex
40
+ git clone https://github.com/TheDataEngineX/dataenginex.git
41
+ cd dataenginex
42
42
  git checkout -b feat/issue-XXX-description dev
43
43
 
44
- # 3. Install Python deps & pre-commit hooks
44
+ # 2. Install Python deps & pre-commit hooks
45
45
  uv run poe setup
46
46
 
47
- # 4. Verify setup
47
+ # 3. Verify setup
48
48
  uv run poe check-all
49
49
  ```
50
50
 
@@ -53,12 +53,11 @@ All tests and linting should pass. You're ready to develop!
53
53
  ## Project Structure
54
54
 
55
55
  ```
56
- DEX/
56
+ dataenginex/
57
57
  ├── src/dataenginex/ # Core framework package
58
58
  ├── examples/ # Runnable example scripts (01–10)
59
59
  ├── tests/ # Test suite
60
60
  ├── docs/ # Documentation
61
- ├── monitoring/ # Local observability stack configs
62
61
  ├── .github/workflows/ # CI/CD pipelines
63
62
  ├── pyproject.toml # Project config
64
63
  └── poe_tasks.toml # Task definitions
@@ -160,24 +159,7 @@ uv run poe test-unit
160
159
  uv run poe check-all
161
160
  ```
162
161
 
163
- ### Monitoring & Debugging
164
-
165
- ```bash
166
- # View application logs
167
- tail -f logs/app.log
168
-
169
- # Enable debug logging
170
- export LOG_LEVEL=DEBUG
171
- uv run poe dev
172
-
173
- # Use Python debugger
174
- python -m pdb examples/02_api_quickstart.py
175
-
176
- # Prometheus metrics (if running)
177
- open http://localhost:9090
178
- ```
179
-
180
- ## Troubleshooting
162
+ ### Troubleshooting
181
163
 
182
164
  | Issue | Solution |
183
165
  |-------|----------|
@@ -198,7 +180,7 @@ uv run poe test # Run all tests
198
180
  uv run poe test-cov # Tests with coverage report
199
181
  uv run poe security # pip-audit vulnerability scan
200
182
  uv run poe pre-commit # Run all pre-commit hooks
201
- uv run poe dev # Run dev server (localhost:17000)
183
+ uv run poe docker-up # Run Docker Compose stack
202
184
  uv run poe clean # Remove caches and build artifacts
203
185
  ```
204
186
 
@@ -208,5 +190,5 @@ uv run poe clean # Remove caches and build artifacts
208
190
  - **Architecture**: See [architecture.md](./architecture.md)
209
191
  - **ADRs**: See [ADR-0001](./adr/0001-medallion-architecture.md) for architectural decisions
210
192
  - **Deployment**: See Deployment Runbook in the `infradex` repo
211
- - **Issues**: [GitHub Issues](https://github.com/TheDataEngineX/dex/issues)
193
+ - **Issues**: [GitHub Issues](https://github.com/TheDataEngineX/dataenginex/issues)
212
194
  - **Discussions**: [GitHub Discussions](https://github.com/orgs/TheDataEngineX/discussions)
@@ -0,0 +1,28 @@
1
+ # DataEngineX Documentation
2
+
3
+ **The Python library (PyPI) — engine, config, CLI, pipelines, ML, AI, PrivacyGuard.**
4
+
5
+ Documentation for the core `dataenginex` library. For the web UI, see [dex-studio](https://github.com/TheDataEngineX/dex-studio).
6
+
7
+ ## Quick start
8
+
9
+ ```bash
10
+ pip install dataenginex
11
+ ```
12
+
13
+ or
14
+
15
+ ```bash
16
+ uv add dataenginex
17
+ ```
18
+
19
+ See [Quickstart](quickstart.md) for a full walkthrough with a sample project.
20
+
21
+ ## Guides
22
+
23
+ - [Quickstart](quickstart.md) — Get up and running in 5 minutes
24
+ - [Architecture](architecture.md) — Core patterns, module map, design decisions
25
+ - [Development Setup](development.md) — Prerequisites, workflow, troubleshooting
26
+ - [Release Notes](release-notes.md) — Full version history
27
+ - [Observability](observability.md) — Metrics, logging, tracing
28
+ - [Contributing](contributing.md) — How to contribute to the project
@@ -0,0 +1,61 @@
1
+ # Observability: Metrics, Logging & Tracing
2
+
3
+ **Library-level observability for `dataenginex`.** For application-level monitoring (HTTP middleware, health endpoints, Grafana dashboards), see [dex-studio/docs/observability.md](https://github.com/TheDataEngineX/dex-studio/blob/main/docs/observability.md).
4
+
5
+ ## Logging
6
+
7
+ `dataenginex` uses `structlog` for structured logging throughout the library. All loggers are configured by the host application (e.g., dex-studio). The library does not configure handlers itself.
8
+
9
+ ### Logging in your application
10
+
11
+ ```python
12
+ import structlog
13
+ from dataenginex.engine import DexEngine
14
+
15
+ structlog.configure(
16
+ processors=[structlog.dev.ConsoleRenderer()],
17
+ wrapper_class=structlog.make_filtering_bound_logger(20), # INFO
18
+ )
19
+ engine = DexEngine("dex.yaml")
20
+ ```
21
+
22
+ ## Metrics
23
+
24
+ Prometheus metrics are exposed via `dataenginex.observability.metrics`:
25
+
26
+ ```python
27
+ from dataenginex.observability.metrics import (
28
+ HTTP_REQUESTS_TOTAL,
29
+ HTTP_REQUEST_DURATION_SECONDS,
30
+ PIPELINE_RUN_DURATION,
31
+ )
32
+ ```
33
+
34
+ ### Available library-level metrics
35
+
36
+ | Metric | Type | Description | Labels |
37
+ |--------|------|-------------|--------|
38
+ | `pipeline_run_duration_seconds` | Histogram | Pipeline execution time | pipeline_name, status |
39
+ | `model_prediction_latency_seconds` | Histogram | Model inference time | model_name |
40
+ | `llm_request_duration_seconds` | Histogram | LLM call time | provider, model |
41
+ | `data_connector_rows_read` | Counter | Rows read by source connectors | connector_type |
42
+
43
+ ## Tracing
44
+
45
+ OpenTelemetry tracing is available via `dataenginex.tracing`:
46
+
47
+ ```python
48
+ from dataenginex.tracing import get_tracer
49
+
50
+ tracer = get_tracer(__name__)
51
+
52
+ with tracer.start_as_current_span("pipeline_run") as span:
53
+ span.set_attribute("pipeline.name", "ingest")
54
+ engine.run_pipeline("ingest")
55
+ ```
56
+
57
+ Enable OTLP export:
58
+
59
+ ```bash
60
+ export OTLP_ENDPOINT="http://localhost:4317"
61
+ ```
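The logging example in the new `docs/observability.md` passes the bare integer `20` to `structlog.make_filtering_bound_logger`, which corresponds to the stdlib `logging.INFO` level. A host application may prefer to derive that number from a level name. The sketch below uses only the standard library; the helper name `resolve_level` and the idea of reading the name from a `LOG_LEVEL` environment variable are assumptions, not something the new doc specifies.

```python
import logging
from typing import Optional

# Standard level names mapped to their numeric values (INFO is 20).
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_level(name: Optional[str], default: int = logging.INFO) -> int:
    """Translate a level name such as "debug" into its numeric value.

    Hypothetical helper: unknown or missing names fall back to `default`.
    """
    if name is None:
        return default
    return _LEVELS.get(name.strip().upper(), default)
```

The returned integer could then be handed to `structlog.make_filtering_bound_logger(...)` in place of the literal `20`, for example `resolve_level(os.environ.get("LOG_LEVEL"))`.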
@@ -7,7 +7,7 @@ Get a DataEngineX pipeline running in under five minutes.
7
7
  ```bash
8
8
  pip install dataenginex
9
9
  # or from source:
10
- git clone https://github.com/TheDataEngineX/dex && cd dex
10
+ git clone https://github.com/TheDataEngineX/dataenginex && cd dataenginex
11
11
  uv sync
12
12
  ```
13
13
 
@@ -138,7 +138,7 @@ events = engine.secops_audit.events # list of AuditEvent
138
138
  - [API Reference](api-reference/index.md) — auto-generated module docs
139
139
  - `examples/` directory — full list of runnable examples
140
140
 
141
- ---
141
+ ______________________________________________________________________
142
142
 
143
143
  ## DEX Studio
144
144
 
@@ -0,0 +1,94 @@
1
+ # Release Notes
2
+
3
+ ## [0.4.2] — 2026-06-23
4
+
5
+ ### Added
6
+
7
+ - Example scripts refreshed: `08_spark_ml.py`, `09_feature_engineering.py`, `10_model_analysis.py` — demonstrate PySpark ML, feature transforms, and drift detection
8
+ - Documentation cleanup across all docs
9
+
10
+ ### Changed
11
+
12
+ - Version bumped to 0.4.2
13
+
14
+ ### Verification checklist
15
+
16
+ 1. `uv run poe lint` — Ruff checks clean
17
+ 1. `uv run poe typecheck` — mypy strict, 0 errors
18
+ 1. `uv run poe test` — all tests pass
19
+
20
+ ______________________________________________________________________
21
+
22
+ ## [0.4.1] - 2026-06-12
23
+
24
+ ### Added
25
+
26
+ - `dataenginex._json` — drop-in `orjson`-backed JSON shim (`dumps`, `loads`, `JSONResponse`) replacing stdlib `json` across the library for ~3–5× serialization throughput
27
+ - `DeltaConnector` — native Delta Lake read/write via `deltalake` (new `delta` optional extra: `pip install "dataenginex[delta]"`)
28
+ - `ml.features.builtin` — built-in feature transformers: `StandardScalerTransform`, `MinMaxScalerTransform`, `OneHotEncoderTransform`, `PolynomialFeaturesTransform`
29
+ - `core.interfaces` — new `Closeable` and `AsyncCloseable` protocols for uniform resource lifecycle
30
+ - `orjson>=3.11.0` and `zstandard>=0.25.0` promoted to core runtime dependencies
31
+
32
+ ### Changed
33
+
34
+ - **Lakehouse storage** (`lakehouse/storage.py`) — full rewrite: unified `LakehouseStorage` with pluggable backends (local, S3, GCS), Zstandard compression throughout, columnar partition pruning
35
+ - **Lakehouse catalog** (`lakehouse/catalog.py`) — catalog entries now carry partition stats and schema fingerprints; `register` / `resolve` API stabilised
36
+ - **ML registry** (`ml/registry.py`) — artifact versioning with aliasing (`promote_alias`), stage transitions (`development` → `staging` → `production`), and metadata search
37
+ - **ML training** (`ml/training.py`) — `TrainingJob` lifecycle management, early-stopping callbacks, cross-validation harness, experiment comparison utilities
38
+ - **Pagination** (`api/pagination.py`) — cursor-based and page-number strategies unified under `PaginationResult`; `paginate_query` helper works with any iterable
39
+ - **Store** (`store.py`) — async-safe DuckDB connection pool, `get_pipeline_runs` and `list_model_artifacts` now return typed dataclasses
40
+ - **SecOps audit** (`secops/audit.py`) — structured audit events with severity levels, retention policy enforcement, export to JSONL
41
+ - **AI runtime** (`ai/runtime/executor.py`) — tool call concurrency limit, timeout per tool, structured error envelopes
42
+ - **Config loader** (`config/loader.py`) — environment variable interpolation (`${VAR}`) and `include:` directive for config composition
43
+ - `zstandard` used for pipeline run history compression reducing on-disk footprint by ~60%
44
+
45
+ ### Fixed
46
+
47
+ - `mypy --strict` passes cleanly across all modules after strict type annotation pass
48
+ - `DeltaConnector` and `LakehouseStorage` excluded from coverage thresholds (require live filesystems); coverage gate unchanged for all other modules
49
+
50
+ ______________________________________________________________________
51
+
52
+ ## [0.4.0] - 2026-02-21
53
+
54
+ > **Scope reset from 1.x.** Versions 1.0.0–1.1.2 were prematurely tagged stable. Resetting to `0.4.0` to honestly reflect pre-1.0 maturity. See [ADR-0007](https://github.com/TheDataEngineX/docs/blob/main/adr/0007-local-first-scope-reset.md) for rationale. The 1.x versions on PyPI are yanked but remain installable by exact pin (`pip install 'dataenginex==1.1.2'`); plain `pip install dataenginex` now resolves to `0.4.0`.
55
+
56
+ ### Added
57
+
58
+ - Stable `__all__` exports in every subpackage `__init__.py`
59
+ - `from __future__ import annotations` in all public modules
60
+ - Comprehensive module-level docstrings with usage examples
61
+ - New public API exports: `ComponentHealth`, `AuthMiddleware`, `AuthUser`,
62
+ `create_token`, `decode_token`, `BadRequestError`, `NotFoundError`,
63
+ `PaginationMeta`, `RateLimiter`, `RateLimitMiddleware`,
64
+ `ConnectorStatus`, `FetchResult`, `ColumnProfile`, `get_logger`, `get_tracer`
65
+
66
+ ### Changed
67
+
68
+ - Reorganized `__all__` in all subpackages for logical grouping
69
+ - Updated package version to 0.4.0
70
+
71
+ ______________________________________________________________________
72
+
73
+ ## [0.3.5] - 2026-02-13
74
+
75
+ ### Added
76
+
77
+ - Production hardening: structured logging, Prometheus/OTel, health probes
78
+ - Data connectors: `RestConnector`, `FileConnector` with async interface
79
+ - Schema registry with versioned schema management
80
+ - Data profiler with automated dataset statistics
81
+ - Lakehouse catalog, partitioning, and storage backends
82
+ - ML framework: trainer, model registry, drift detection, serving
83
+ - Warehouse transforms and persistent lineage tracking
84
+ - JWT authentication middleware
85
+ - Rate limiting middleware
86
+ - Cursor-based pagination utilities
87
+ - Versioned API router (`/api/v1/`)
88
+
89
+ ______________________________________________________________________
90
+
91
+ [0.3.5]: https://github.com/TheDataEngineX/dataenginex/releases/tag/v0.3.5
92
+ [0.4.0]: https://github.com/TheDataEngineX/dataenginex/compare/v0.3.5...v0.4.0
93
+ [0.4.1]: https://github.com/TheDataEngineX/dataenginex/compare/v0.4.0...v0.4.1
94
+ [0.4.2]: https://github.com/TheDataEngineX/dataenginex/compare/v0.4.1...v0.4.2
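The 0.4.1 notes describe cursor-based pagination unified under `PaginationResult`, with a `paginate_query` helper that works with any iterable. The diff does not include `api/pagination.py`, so the sketch below illustrates the general cursor technique rather than the library's implementation. The names `Page`, `encode_cursor`, `decode_cursor`, and `paginate` are hypothetical, and the cursor here is simply a base64-encoded offset.

```python
import base64
from dataclasses import dataclass
from itertools import islice
from typing import Any, Iterable, List, Optional


@dataclass
class Page:
    """One page of results plus an opaque cursor for the next page (None at the end)."""

    items: List[Any]
    next_cursor: Optional[str]


def encode_cursor(offset: int) -> str:
    # Encode the offset so callers treat the cursor as an opaque token.
    return base64.urlsafe_b64encode(str(offset).encode()).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    # A missing cursor means "start from the beginning".
    if cursor is None:
        return 0
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def paginate(source: Iterable[Any], limit: int, cursor: Optional[str] = None) -> Page:
    """Return up to `limit` items starting at the position encoded in `cursor`."""
    offset = decode_cursor(cursor)
    # Fetch one extra item to learn whether another page exists.
    window = list(islice(source, offset, offset + limit + 1))
    has_more = len(window) > limit
    next_cursor = encode_cursor(offset + limit) if has_more else None
    return Page(items=window[:limit], next_cursor=next_cursor)


# Usage: walk a five-element range two items at a time.
first = paginate(range(5), limit=2)
second = paginate(range(5), limit=2, cursor=first.next_cursor)
third = paginate(range(5), limit=2, cursor=second.next_cursor)
```

Because this sketch re-reads the source from the start on every call, it suits re-iterable sources such as lists and ranges. A one-shot generator would need to be recreated for each page.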