py-data-engine 0.1.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- py_data_engine-0.1.0/PKG-INFO +330 -0
- py_data_engine-0.1.0/README.md +264 -0
- py_data_engine-0.1.0/pyproject.toml +120 -0
- py_data_engine-0.1.0/setup.cfg +4 -0
- py_data_engine-0.1.0/src/data_engine/__init__.py +37 -0
- py_data_engine-0.1.0/src/data_engine/application/__init__.py +39 -0
- py_data_engine-0.1.0/src/data_engine/application/actions.py +42 -0
- py_data_engine-0.1.0/src/data_engine/application/catalog.py +151 -0
- py_data_engine-0.1.0/src/data_engine/application/control.py +213 -0
- py_data_engine-0.1.0/src/data_engine/application/details.py +73 -0
- py_data_engine-0.1.0/src/data_engine/application/runtime.py +449 -0
- py_data_engine-0.1.0/src/data_engine/application/workspace.py +62 -0
- py_data_engine-0.1.0/src/data_engine/authoring/__init__.py +14 -0
- py_data_engine-0.1.0/src/data_engine/authoring/builder.py +31 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/__init__.py +6 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/app.py +6 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/context.py +82 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/continuous.py +176 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/grouped.py +106 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/logging.py +83 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/polling.py +135 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/runner.py +210 -0
- py_data_engine-0.1.0/src/data_engine/authoring/execution/single.py +171 -0
- py_data_engine-0.1.0/src/data_engine/authoring/flow.py +361 -0
- py_data_engine-0.1.0/src/data_engine/authoring/helpers.py +160 -0
- py_data_engine-0.1.0/src/data_engine/authoring/model.py +59 -0
- py_data_engine-0.1.0/src/data_engine/authoring/primitives.py +430 -0
- py_data_engine-0.1.0/src/data_engine/authoring/services.py +42 -0
- py_data_engine-0.1.0/src/data_engine/devtools/__init__.py +3 -0
- py_data_engine-0.1.0/src/data_engine/devtools/project_ast_map.py +503 -0
- py_data_engine-0.1.0/src/data_engine/docs/__init__.py +1 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/_static/custom.css +13 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/api.rst +42 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/conf.py +37 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/app-runtime-and-workspaces.md +397 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/authoring-flow-modules.md +215 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/configuring-flows.md +185 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/core-concepts.md +208 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/database-methods.md +107 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/duckdb-helpers.md +462 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/flow-context.md +538 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/flow-methods.md +206 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/getting-started.md +271 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/project-inventory.md +5683 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/project-map.md +118 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/guides/recipes.md +268 -0
- py_data_engine-0.1.0/src/data_engine/docs/sphinx_source/index.rst +22 -0
- py_data_engine-0.1.0/src/data_engine/domain/__init__.py +92 -0
- py_data_engine-0.1.0/src/data_engine/domain/actions.py +69 -0
- py_data_engine-0.1.0/src/data_engine/domain/catalog.py +128 -0
- py_data_engine-0.1.0/src/data_engine/domain/details.py +214 -0
- py_data_engine-0.1.0/src/data_engine/domain/diagnostics.py +56 -0
- py_data_engine-0.1.0/src/data_engine/domain/errors.py +104 -0
- py_data_engine-0.1.0/src/data_engine/domain/inspection.py +99 -0
- py_data_engine-0.1.0/src/data_engine/domain/logs.py +118 -0
- py_data_engine-0.1.0/src/data_engine/domain/operations.py +172 -0
- py_data_engine-0.1.0/src/data_engine/domain/operator.py +72 -0
- py_data_engine-0.1.0/src/data_engine/domain/runs.py +155 -0
- py_data_engine-0.1.0/src/data_engine/domain/runtime.py +279 -0
- py_data_engine-0.1.0/src/data_engine/domain/source_state.py +17 -0
- py_data_engine-0.1.0/src/data_engine/domain/support.py +54 -0
- py_data_engine-0.1.0/src/data_engine/domain/time.py +23 -0
- py_data_engine-0.1.0/src/data_engine/domain/workspace.py +159 -0
- py_data_engine-0.1.0/src/data_engine/flow_modules/__init__.py +1 -0
- py_data_engine-0.1.0/src/data_engine/flow_modules/flow_module_compiler.py +179 -0
- py_data_engine-0.1.0/src/data_engine/flow_modules/flow_module_loader.py +201 -0
- py_data_engine-0.1.0/src/data_engine/helpers/__init__.py +25 -0
- py_data_engine-0.1.0/src/data_engine/helpers/duckdb.py +705 -0
- py_data_engine-0.1.0/src/data_engine/hosts/__init__.py +1 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/__init__.py +23 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/app.py +221 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/bootstrap.py +69 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/client.py +465 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/commands.py +64 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/composition.py +310 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/constants.py +15 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/entrypoints.py +97 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/lifecycle.py +191 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/manager.py +272 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/ownership.py +126 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/runtime_commands.py +188 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/runtime_control.py +31 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/server.py +84 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/shared_state.py +147 -0
- py_data_engine-0.1.0/src/data_engine/hosts/daemon/state_sync.py +101 -0
- py_data_engine-0.1.0/src/data_engine/platform/__init__.py +1 -0
- py_data_engine-0.1.0/src/data_engine/platform/identity.py +35 -0
- py_data_engine-0.1.0/src/data_engine/platform/local_settings.py +146 -0
- py_data_engine-0.1.0/src/data_engine/platform/theme.py +259 -0
- py_data_engine-0.1.0/src/data_engine/platform/workspace_models.py +190 -0
- py_data_engine-0.1.0/src/data_engine/platform/workspace_policy.py +333 -0
- py_data_engine-0.1.0/src/data_engine/runtime/__init__.py +1 -0
- py_data_engine-0.1.0/src/data_engine/runtime/file_watch.py +185 -0
- py_data_engine-0.1.0/src/data_engine/runtime/ledger_models.py +116 -0
- py_data_engine-0.1.0/src/data_engine/runtime/runtime_db.py +938 -0
- py_data_engine-0.1.0/src/data_engine/runtime/shared_state.py +523 -0
- py_data_engine-0.1.0/src/data_engine/services/__init__.py +49 -0
- py_data_engine-0.1.0/src/data_engine/services/daemon.py +64 -0
- py_data_engine-0.1.0/src/data_engine/services/daemon_state.py +40 -0
- py_data_engine-0.1.0/src/data_engine/services/flow_catalog.py +102 -0
- py_data_engine-0.1.0/src/data_engine/services/flow_execution.py +48 -0
- py_data_engine-0.1.0/src/data_engine/services/ledger.py +85 -0
- py_data_engine-0.1.0/src/data_engine/services/logs.py +65 -0
- py_data_engine-0.1.0/src/data_engine/services/runtime_binding.py +105 -0
- py_data_engine-0.1.0/src/data_engine/services/runtime_execution.py +126 -0
- py_data_engine-0.1.0/src/data_engine/services/runtime_history.py +62 -0
- py_data_engine-0.1.0/src/data_engine/services/settings.py +58 -0
- py_data_engine-0.1.0/src/data_engine/services/shared_state.py +28 -0
- py_data_engine-0.1.0/src/data_engine/services/theme.py +59 -0
- py_data_engine-0.1.0/src/data_engine/services/workspace_provisioning.py +224 -0
- py_data_engine-0.1.0/src/data_engine/services/workspaces.py +74 -0
- py_data_engine-0.1.0/src/data_engine/ui/__init__.py +3 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/__init__.py +19 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/app.py +161 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/commands_doctor.py +178 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/commands_run.py +80 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/commands_start.py +100 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/commands_workspace.py +97 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/dependencies.py +44 -0
- py_data_engine-0.1.0/src/data_engine/ui/cli/parser.py +56 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/__init__.py +25 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/app.py +116 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/bootstrap.py +487 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/bootstrapper.py +140 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/cache_models.py +23 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/control_support.py +185 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/controllers/__init__.py +6 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/controllers/flows.py +439 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/controllers/runtime.py +245 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/dialogs/__init__.py +12 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/dialogs/messages.py +88 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/dialogs/previews.py +222 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/helpers/__init__.py +62 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/helpers/inspection.py +81 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/helpers/lifecycle.py +112 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/helpers/scroll.py +28 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/helpers/theming.py +87 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/dark_light.svg +12 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/documentation.svg +1 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/failed.svg +3 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/group.svg +4 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/home.svg +2 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/manual.svg +2 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/poll.svg +2 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/schedule.svg +4 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/settings.svg +2 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/started.svg +3 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/success.svg +3 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons/view-log.svg +3 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/icons.py +50 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/launcher.py +48 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/__init__.py +72 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/docs.py +140 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/logs.py +58 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/runtime_projection.py +29 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/sidebar.py +88 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/steps.py +148 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/workspace.py +39 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/workspace_binding.py +75 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/presenters/workspace_settings.py +182 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/preview_models.py +37 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/render_support.py +241 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/rendering/__init__.py +12 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/rendering/artifacts.py +95 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/rendering/icons.py +50 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/runtime.py +47 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/state_support.py +193 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/support.py +214 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/surface.py +209 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/theme.py +720 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/widgets/__init__.py +34 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/widgets/config.py +41 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/widgets/logs.py +62 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/widgets/panels.py +507 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/widgets/sidebar.py +130 -0
- py_data_engine-0.1.0/src/data_engine/ui/gui/widgets/steps.py +84 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/__init__.py +5 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/app.py +222 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/bootstrap.py +475 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/bootstrapper.py +117 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/controllers/__init__.py +6 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/controllers/flows.py +349 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/controllers/runtime.py +167 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/runtime.py +34 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/state_support.py +141 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/support.py +63 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/theme.py +204 -0
- py_data_engine-0.1.0/src/data_engine/ui/tui/widgets.py +123 -0
- py_data_engine-0.1.0/src/data_engine/views/__init__.py +109 -0
- py_data_engine-0.1.0/src/data_engine/views/actions.py +80 -0
- py_data_engine-0.1.0/src/data_engine/views/artifacts.py +58 -0
- py_data_engine-0.1.0/src/data_engine/views/flow_display.py +69 -0
- py_data_engine-0.1.0/src/data_engine/views/logs.py +54 -0
- py_data_engine-0.1.0/src/data_engine/views/models.py +96 -0
- py_data_engine-0.1.0/src/data_engine/views/presentation.py +133 -0
- py_data_engine-0.1.0/src/data_engine/views/runs.py +62 -0
- py_data_engine-0.1.0/src/data_engine/views/state.py +39 -0
- py_data_engine-0.1.0/src/data_engine/views/status.py +13 -0
- py_data_engine-0.1.0/src/data_engine/views/text.py +109 -0
- py_data_engine-0.1.0/src/py_data_engine.egg-info/PKG-INFO +330 -0
- py_data_engine-0.1.0/src/py_data_engine.egg-info/SOURCES.txt +247 -0
- py_data_engine-0.1.0/src/py_data_engine.egg-info/dependency_links.txt +1 -0
- py_data_engine-0.1.0/src/py_data_engine.egg-info/entry_points.txt +2 -0
- py_data_engine-0.1.0/src/py_data_engine.egg-info/requires.txt +54 -0
- py_data_engine-0.1.0/src/py_data_engine.egg-info/top_level.txt +1 -0
- py_data_engine-0.1.0/tests/test_application.py +1200 -0
- py_data_engine-0.1.0/tests/test_authoring_helpers.py +179 -0
- py_data_engine-0.1.0/tests/test_authoring_services.py +89 -0
- py_data_engine-0.1.0/tests/test_builder.py +991 -0
- py_data_engine-0.1.0/tests/test_cli.py +372 -0
- py_data_engine-0.1.0/tests/test_cli_helpers.py +162 -0
- py_data_engine-0.1.0/tests/test_daemon.py +1759 -0
- py_data_engine-0.1.0/tests/test_domain_actions.py +71 -0
- py_data_engine-0.1.0/tests/test_domain_catalog.py +63 -0
- py_data_engine-0.1.0/tests/test_domain_details.py +107 -0
- py_data_engine-0.1.0/tests/test_domain_errors.py +25 -0
- py_data_engine-0.1.0/tests/test_domain_inspection.py +31 -0
- py_data_engine-0.1.0/tests/test_domain_logs.py +101 -0
- py_data_engine-0.1.0/tests/test_domain_operations.py +50 -0
- py_data_engine-0.1.0/tests/test_domain_operator.py +30 -0
- py_data_engine-0.1.0/tests/test_domain_runs.py +115 -0
- py_data_engine-0.1.0/tests/test_domain_runtime.py +126 -0
- py_data_engine-0.1.0/tests/test_domain_support.py +20 -0
- py_data_engine-0.1.0/tests/test_domain_workspace.py +90 -0
- py_data_engine-0.1.0/tests/test_export_project_bundle_script.py +144 -0
- py_data_engine-0.1.0/tests/test_flow_module_compiler.py +179 -0
- py_data_engine-0.1.0/tests/test_flow_module_loader.py +323 -0
- py_data_engine-0.1.0/tests/test_helpers_duckdb.py +856 -0
- py_data_engine-0.1.0/tests/test_integration.py +661 -0
- py_data_engine-0.1.0/tests/test_live_runtime_suite.py +830 -0
- py_data_engine-0.1.0/tests/test_local_settings.py +70 -0
- py_data_engine-0.1.0/tests/test_logs.py +246 -0
- py_data_engine-0.1.0/tests/test_platform_identity.py +29 -0
- py_data_engine-0.1.0/tests/test_project_ast_map.py +121 -0
- py_data_engine-0.1.0/tests/test_qt_ui.py +2476 -0
- py_data_engine-0.1.0/tests/test_runtime_db.py +307 -0
- py_data_engine-0.1.0/tests/test_runtime_history_service.py +193 -0
- py_data_engine-0.1.0/tests/test_services.py +505 -0
- py_data_engine-0.1.0/tests/test_shared_state.py +602 -0
- py_data_engine-0.1.0/tests/test_sources.py +230 -0
- py_data_engine-0.1.0/tests/test_theme.py +28 -0
- py_data_engine-0.1.0/tests/test_tui.py +730 -0
- py_data_engine-0.1.0/tests/test_tui_widgets.py +147 -0
- py_data_engine-0.1.0/tests/test_ui_bootstrap.py +420 -0
- py_data_engine-0.1.0/tests/test_ui_models.py +364 -0
- py_data_engine-0.1.0/tests/test_ui_runtime_theme.py +91 -0
- py_data_engine-0.1.0/tests/test_ui_state.py +83 -0
- py_data_engine-0.1.0/tests/test_views_helpers.py +409 -0
- py_data_engine-0.1.0/tests/test_workspace_provisioning.py +75 -0
@@ -0,0 +1,330 @@
Metadata-Version: 2.4
Name: py-data-engine
Version: 0.1.0
Summary: Workbook-driven workflow jobs with strict filename parsing and DuckDB-backed processing.
Author: Data Engine contributors
License-Expression: MIT
Project-URL: Homepage, https://github.com/bj-data-eng/data-engine
Project-URL: Repository, https://github.com/bj-data-eng/data-engine
Project-URL: Issues, https://github.com/bj-data-eng/data-engine/issues
Keywords: duckdb,excel,parquet,flow,polars
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Database
Classifier: Topic :: Office/Business :: Financial :: Spreadsheet
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.14
Description-Content-Type: text/markdown
Requires-Dist: duckdb>=1.5.0
Requires-Dist: fastexcel>=0.19.0
Requires-Dist: pyarrow>=23.0.1
Requires-Dist: PySide6>=6.9.0
Requires-Dist: textual>=0.74.0
Requires-Dist: xlsxwriter>=3.2.9
Provides-Extra: polars
Requires-Dist: polars>=1.39.0; extra == "polars"
Provides-Extra: polars-lts
Requires-Dist: polars-lts-cpu>=1.33.1; extra == "polars-lts"
Provides-Extra: synthetic
Requires-Dist: numpy>=2.4.0; extra == "synthetic"
Requires-Dist: openpyxl>=3.1.0; extra == "synthetic"
Requires-Dist: pandas>=3.0.0; extra == "synthetic"
Provides-Extra: notebook
Requires-Dist: ipykernel>=7.0.0; extra == "notebook"
Requires-Dist: ipython>=9.0.0; extra == "notebook"
Requires-Dist: jupyterlab>=4.4.0; extra == "notebook"
Requires-Dist: notebook>=7.4.0; extra == "notebook"
Provides-Extra: docs
Requires-Dist: myst-parser>=4.0.0; extra == "docs"
Requires-Dist: Sphinx>=9.1.0; extra == "docs"
Requires-Dist: sphinx_rtd_theme>=3.1.0; extra == "docs"
Provides-Extra: package
Requires-Dist: pyinstaller>=6.15.0; extra == "package"
Provides-Extra: test
Requires-Dist: openpyxl>=3.1.0; extra == "test"
Requires-Dist: pandas>=3.0.0; extra == "test"
Requires-Dist: pydocstyle>=6.3.0; extra == "test"
Requires-Dist: pytest>=9.0.0; extra == "test"
Requires-Dist: pytest-cov>=7.0.0; extra == "test"
Provides-Extra: dev
Requires-Dist: ipykernel>=7.0.0; extra == "dev"
Requires-Dist: ipython>=9.0.0; extra == "dev"
Requires-Dist: jupyterlab>=4.4.0; extra == "dev"
Requires-Dist: myst-parser>=4.0.0; extra == "dev"
Requires-Dist: notebook>=7.4.0; extra == "dev"
Requires-Dist: numpy>=2.4.0; extra == "dev"
Requires-Dist: openpyxl>=3.1.0; extra == "dev"
Requires-Dist: pandas>=3.0.0; extra == "dev"
Requires-Dist: pydocstyle>=6.3.0; extra == "dev"
Requires-Dist: pyinstaller>=6.15.0; extra == "dev"
Requires-Dist: pytest>=9.0.0; extra == "dev"
Requires-Dist: pytest-cov>=7.0.0; extra == "dev"
Requires-Dist: Sphinx>=9.1.0; extra == "dev"
Requires-Dist: sphinx_rtd_theme>=3.1.0; extra == "dev"

# Data Engine

Data Engine is a pre-alpha workflow runtime for file-driven jobs. A flow declares:

- a group
- an optional runtime trigger via `watch(...)`
- ordered generic `step(...)` callables

The runtime orchestrates source handling, scheduling, and mirrored output routing. Poll freshness is tracked in the runtime ledger rather than by comparing output mtimes. Step functions use native libraries directly, such as Polars for dataframe work and DuckDB for SQL work.

## Install

### Installer scripts

Use the installer that matches your environment:

- macOS: [INSTALL/INSTALL MAC.command](INSTALL/INSTALL%20MAC.command)
- Windows: [INSTALL/INSTALL WINDOWS.bat](INSTALL/INSTALL%20WINDOWS.bat)
- Windows VM / CPU-safe Polars test path: [INSTALL/INSTALL WINDOWS_VM.bat](INSTALL/INSTALL%20WINDOWS_VM.bat)

The macOS and standard Windows installers install the regular Polars package. The Windows VM installer installs the `polars-lts-cpu` variant instead.

### Manual install

Data Engine now uses explicit Polars extras so you can choose the runtime package:

```bash
python -m pip install -e ".[dev,polars]"
```

For the CPU-safe LTS variant:

```bash
python -m pip install -e ".[dev,polars-lts]"
```

Launch the GUI with:

```bash
python -m data_engine.ui.cli.app start gui
```

## Public API

```python
from data_engine import Flow, FlowContext, discover_flows, load_flow, run
```

## Headless CLI

Data Engine now ships with a headless CLI:

```bash
data-engine list
data-engine show example_summary
data-engine run --once example_summary
data-engine run
```

`data-engine run` starts the automated engine headlessly for discovered automated flows and keeps running until stopped. Use `--once` to force a single pass instead.

## Workspace Model

Data Engine discovers workspaces from a collection root resolved from:

- `DATA_ENGINE_WORKSPACE_COLLECTION_ROOT`, when explicitly set
- `DATA_ENGINE_WORKSPACE_ROOT`, when binding directly to one authored workspace
- otherwise the machine-local app settings store

Each immediate child folder containing `flow_modules/` is treated as a workspace, for example:

- `workspaces/example_workspace/flow_modules/`
- `workspaces/claims2/flow_modules/`

The app resolves per-workspace local artifacts under:

- `artifacts/workspace_cache/<workspace_id>/`
- `artifacts/runtime_state/<workspace_id>/`

Shared lease and checkpoint state lives inside each authored workspace:

- `workspaces/<workspace_id>/.workspace_state/`

The app's workspace selection and collection-root preference are machine-local state, not repo-local config checked into the project tree.
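
The directory rule in this section (each immediate child folder containing `flow_modules/` is a workspace) can be sketched with `pathlib` alone. This is a minimal illustration, not Data Engine's own discovery code: `discover_workspaces` and `demo_discovery` are hypothetical names, and the sketch takes the collection root as an argument instead of modeling the environment-variable or settings-store lookup.

```python
from pathlib import Path
import tempfile


def discover_workspaces(collection_root: Path) -> list[str]:
    """Return ids of immediate child folders that contain flow_modules/.

    Hypothetical helper: it models the rule stated in the README,
    not the package's actual discovery implementation.
    """
    if not collection_root.is_dir():
        return []
    return sorted(
        child.name
        for child in collection_root.iterdir()
        if child.is_dir() and (child / "flow_modules").is_dir()
    )


def demo_discovery() -> list[str]:
    """Build a throwaway collection root and run discovery against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "example_workspace" / "flow_modules").mkdir(parents=True)
        (root / "claims2" / "flow_modules").mkdir(parents=True)
        (root / "not_a_workspace").mkdir()  # no flow_modules/, so it is skipped
        return discover_workspaces(root)


print(demo_discovery())
```

Only immediate children are inspected, so a `flow_modules/` folder nested deeper in the tree would not register as a workspace under this reading of the rule.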

## Basic shape

```python
from data_engine import Flow
import polars as pl


def read_claims(context):
    return pl.read_excel(context.source.path)


def keep_open(context):
    return context.current.filter(pl.col("status") == "OPEN")


def write_parquet(context):
    output = context.mirror.with_suffix(".parquet")
    context.current.write_parquet(output)
    return output


def build():
    return (
        Flow(group="Claims")
        .watch(
            mode="poll",
            source="../../../example_data/Input/claims_flat",
            interval="5s",
            extensions=[".xlsx", ".xls", ".xlsm"],
            settle=1,
        )
        .mirror(root="../../../example_data/Output/example_mirror")
        .step(read_claims, save_as="raw_df")
        .step(keep_open, use="raw_df", save_as="filtered_df")
        .step(write_parquet, use="filtered_df")
    )
```

## Batch helpers

For batch-oriented flows, use `Flow.collect(...)` and either `Flow.map(...)` or `Flow.step_each(...)` instead of importing extra helpers or hand-managing raw lists.

```python
from data_engine import Flow


def validate_workbook(context, file_ref):
    return {
        "name": file_ref.name,
        "path": file_ref.path,
        "ok": file_ref.exists(),
    }


def build():
    return (
        Flow(group="Claims")
        .watch(mode="schedule", run_as="batch", interval="15m", source="../../../example_data/Input/claims_flat")
        .collect([".xlsx"])
        .map(validate_workbook)
    )
```

`Flow.collect(...)` returns a `Batch` of `FileRef` items. `Flow.map(...)` applies one callable to each item and returns a new `Batch`. `Flow.step_each(...)` is the equivalent readability-first alias. If the batch is empty, both forms raise immediately so the mapped step gets the useful failure.
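
The map-and-fail-fast behavior described above can be modeled in a few lines. The sketch below is a toy, not the package's `Batch`: `ToyBatch` and `map_batch` are hypothetical names, and `ValueError` is an assumed exception type, since the README does not say which error is raised for an empty batch.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class ToyBatch(Generic[T]):
    """Hypothetical stand-in for the README's Batch: an immutable item tuple."""

    items: tuple[T, ...]

    @classmethod
    def of(cls, values: Iterable[T]) -> "ToyBatch[T]":
        return cls(tuple(values))


def map_batch(batch: ToyBatch[T], fn: Callable[[T], U]) -> ToyBatch[U]:
    """Apply fn to every item and return a new batch; reject empty input early."""
    if not batch.items:
        # Assumed behavior, modeled on "raise immediately" in the README.
        raise ValueError("cannot map over an empty batch")
    return ToyBatch.of(fn(item) for item in batch.items)


names = map_batch(ToyBatch.of(["a.xlsx", "b.xlsx"]), str.upper)
print(names.items)
```

Raising before the callable runs keeps the error attached to the mapping step itself, which seems to be the intent behind the README's wording.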

## Flow API

- `Flow(group=...)`
- `.watch(mode="manual", source=None, run_as="individual")`
- `.watch(mode="poll", source=..., interval=..., extensions=None, settle=1, run_as="individual")`
- `.watch(mode="schedule", interval=..., source=None, run_as="individual" | "batch")`
- `.watch(mode="schedule", time="HH:MM", source=None, run_as="individual" | "batch")`
- `.watch(mode="schedule", time=["08:15", "14:45"], source=..., run_as="individual" | "batch")`
- `.mirror(root=...)`
- `.step(fn, use=None, save_as=None, label=None)`
- `.collect(extensions, root=None, recursive=False, use=None, save_as=None, label=None)`
- `.map(fn, use=None, save_as=None, label=None)`
- `.step_each(fn, use=None, save_as=None, label=None)`
- `.preview(use=None)`
- `.run_once()`
- `.run()`
- `.show()`
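
The `watch(...)` signatures above take interval strings such as `"5s"` and `"15m"` and clock times such as `"08:15"`. How the package parses them is not shown in this README, so the following is only a sketch under stated assumptions: `parse_interval` and `next_run` are hypothetical helpers, the `h` unit is a guess beyond the two suffixes the README shows, and "strictly after now" is an assumed scheduling rule.

```python
from datetime import datetime, time, timedelta

# Assumed unit suffixes; the README only shows "5s" and "15m".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(text: str) -> timedelta:
    """Parse strings such as "5s" or "15m" into a timedelta (hypothetical helper)."""
    value, unit = text[:-1], text[-1:]
    if unit not in _UNIT_SECONDS or not value.isdigit():
        raise ValueError(f"unsupported interval: {text!r}")
    return timedelta(seconds=int(value) * _UNIT_SECONDS[unit])


def next_run(now: datetime, times: list[str]) -> datetime:
    """Return the first "HH:MM" slot strictly after now, rolling over to tomorrow."""
    slots = sorted(time.fromisoformat(t) for t in times)
    for slot in slots:
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # Every slot has already passed today, so use the earliest slot tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), slots[0])


print(parse_interval("15m"))
print(next_run(datetime(2025, 1, 1, 9, 0), ["08:15", "14:45"]))
```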

`step()` callables always receive one `FlowContext` parameter and return the next value for `context.current`.
`map()` and `step_each()` callables accept either `(item)` or `(context, item)` and return a mapped `Batch`.
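
One way to support both `(item)` and `(context, item)` callables is to count positional parameters with `inspect.signature`. This is a sketch of that general technique, not a description of how Data Engine dispatches internally; `call_mapped` and the two sample callables are hypothetical.

```python
import inspect
from typing import Any, Callable


def call_mapped(fn: Callable[..., Any], context: Any, item: Any) -> Any:
    """Call fn as fn(item) or fn(context, item), based on its positional arity.

    Hypothetical dispatcher; the package may decide this differently.
    """
    positional_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    params = [
        p
        for p in inspect.signature(fn).parameters.values()
        if p.kind in positional_kinds
    ]
    if len(params) == 1:
        return fn(item)
    if len(params) == 2:
        return fn(context, item)
    raise TypeError(f"{fn!r} must accept (item) or (context, item)")


def stem_only(item: str) -> str:
    """One-argument form: receives only the batch item."""
    return item.rsplit(".", 1)[0]


def tag_with_group(context: dict, item: str) -> str:
    """Two-argument form: receives the context first, then the item."""
    return f"{context['group']}:{item}"


ctx = {"group": "Claims"}
print(call_mapped(stem_only, ctx, "a.xlsx"))
print(call_mapped(tag_with_group, ctx, "a.xlsx"))
```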

For notebook authoring, `preview()` is usually the most useful inspection helper:

```python
build().preview(use="raw_df").head(10)
build().preview(use="filtered_df")
```

`preview(use="name")` runs the flow until that `save_as="name"` object exists, then returns the real object without running later steps.

## Flow context

`FlowContext` exposes the active run state:

- `context.source`
- `context.mirror`
- `context.current`
- `context.objects`
- `context.metadata`
- `context.source_metadata()`

`context.source` is the resolved input namespace for the active source. The most useful helpers are:

- `context.source.path`
- `context.source.with_extension(".json")`
- `context.source.with_suffix(".json")`
- `context.source.file("notes.json")`
- `context.source.namespaced_file("notes.json")`
- `context.source.root_file("lookup.csv")`

`context.mirror` is the mirrored output namespace for the active source. The core helpers are:

- `context.mirror.with_extension(".parquet")`
- `context.mirror.with_suffix(".parquet")`
- `context.mirror.file("open_claims.parquet")`
- `context.mirror.namespaced_file("open_claims.parquet")`

`with_extension(...)` is the clearer extension-changing helper. `with_suffix(...)` remains available as the pathlib-style alias.
`file(...)` stays in the mirrored/source folder. `namespaced_file(...)` creates a source-stem namespace for multi-output cases.
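
The path helpers above can be approximated with standard `pathlib` operations. The functions below are standalone illustrations that happen to share names with the README's methods; they are not the package's implementation. In particular, the `<folder>/<stem>/<name>` layout used for `namespaced_file` is an assumption, because the README says only that a source-stem namespace is created.

```python
from pathlib import Path


def with_extension(path: Path, extension: str) -> Path:
    """Swap the final extension; same behavior as pathlib's with_suffix."""
    return path.with_suffix(extension)


def sibling_file(path: Path, name: str) -> Path:
    """Place name in the same folder as path (the README's file(...) idea)."""
    return path.parent / name


def namespaced_file(path: Path, name: str) -> Path:
    """Place name in a folder named after path's stem.

    Assumed layout: <folder>/<stem>/<name> is a guess at what
    "source-stem namespace" means.
    """
    return path.parent / path.stem / name


mirror = Path("Output/example_mirror/claims_2024.xlsx")
print(with_extension(mirror, ".parquet"))
print(sibling_file(mirror, "open_claims.parquet"))
print(namespaced_file(mirror, "open_claims.parquet"))
```

Under this assumed layout, two sources in the same folder can each write an `open_claims.parquet` without overwriting one another, which fits the multi-output use case the README mentions.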

When a step writes one inspectable artifact, return that existing `Path`. The UI uses returned output paths to enable the `Inspect` button for that step.

`use="name"` loads `context.objects["name"]` into `context.current` before the step runs. `save_as="name"` stores the returned value into `context.objects["name"]`. Those same saved names are what `build().preview(use="name")` uses in notebooks.
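
The `use` / `save_as` handoff, and the stop-early behavior of `preview(use=...)`, can be modeled with a toy runner. Everything here is hypothetical scaffolding (`ToyContext`, `ToyStep`, `run_steps`, `stop_at`); it mirrors the semantics the README describes and says nothing about how the real runner is built.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for FlowContext with only current and objects."""

    current: Any = None
    objects: dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyStep:
    """One step: a callable plus optional use / save_as names."""

    fn: Callable[[ToyContext], Any]
    use: Optional[str] = None
    save_as: Optional[str] = None


def run_steps(steps: list[ToyStep], stop_at: Optional[str] = None) -> ToyContext:
    """Run steps in order; stop once stop_at has been saved (preview-like)."""
    context = ToyContext()
    for step in steps:
        if step.use is not None:
            context.current = context.objects[step.use]  # load before the step
        context.current = step.fn(context)  # return value becomes current
        if step.save_as is not None:
            context.objects[step.save_as] = context.current
            if step.save_as == stop_at:
                break  # later steps never run
    return context


steps = [
    ToyStep(lambda ctx: [3, 1, 2], save_as="raw"),
    ToyStep(lambda ctx: sorted(ctx.current), use="raw", save_as="ordered"),
    ToyStep(lambda ctx: sum(ctx.current), use="ordered"),
]

print(run_steps(steps).current)
print(run_steps(steps, stop_at="raw").objects)
```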

## Discovery

Flows are code-defined. Starter flow modules live in:

- `workspaces/<workspace_id>/flow_modules/`
- `artifacts/workspace_cache/<workspace_id>/compiled_flow_modules/`

Each flow module must export:

- optional `DESCRIPTION`
- `build() -> Flow`

The flow-module filename is the flow identity. Authored flow modules should use `Flow(group=...)` and let the loader inject the name from the module filename.

Authored flow modules compile into `artifacts/workspace_cache/<workspace_id>/compiled_flow_modules/*.py`, and the runtime loads discovered flows from those compiled modules.
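
The contract above (filename as flow identity, a required `build()`, an optional `DESCRIPTION`) can be illustrated with the standard `importlib` machinery. This is a minimal sketch assuming a plain `.py` file on disk; `load_flow_module` and `demo_load` are hypothetical names, and the package's own loader and compiler are not shown in this README.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_flow_module(path: Path) -> tuple[str, ModuleType]:
    """Import one flow-module file and return (flow_name, module).

    Hypothetical loader: the flow name is taken from the filename stem and
    the module must define a callable build(); DESCRIPTION stays optional.
    """
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not callable(getattr(module, "build", None)):
        raise AttributeError(f"{path.name} must export build()")
    return path.stem, module


def demo_load() -> tuple[str, str, str]:
    """Write a throwaway module to disk and load it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example_summary.py"
        # build() returns a plain string here; a real module returns a Flow.
        path.write_text(
            'DESCRIPTION = "Demo flow"\n\ndef build():\n    return "flow-object"\n'
        )
        name, module = load_flow_module(path)
        return name, getattr(module, "DESCRIPTION", ""), module.build()


print(demo_load())
```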
|
|
296
|
+
|
|
297
|
+
## Workspace layout
|
|
298
|
+
|
|
299
|
+
- `src/data_engine/`
|
|
300
|
+
Runtime package and desktop UI
|
|
301
|
+
- `workspaces/<workspace_id>/flow_modules/`
|
|
302
|
+
Authored flow sources (`.py` or `.ipynb`)
|
|
303
|
+
- `workspaces/<workspace_id>/.workspace_state/`
|
|
304
|
+
Shared lease markers and checkpoint parquet snapshots
|
|
305
|
+
- `artifacts/workspace_cache/<workspace_id>/compiled_flow_modules/`
|
|
306
|
+
Generated/importable flow modules
|
|
307
|
+
- `artifacts/runtime_state/<workspace_id>/`
|
|
308
|
+
Internal runtime ledger state for one workspace
|
|
309
|
+
- `artifacts/documentation/`
|
|
310
|
+
Generated documentation output
|
|
311
|
+
- `example_data/Input`
|
|
312
|
+
Example input files
|
|
313
|
+
- `example_data/Settings`
|
|
314
|
+
Example single-file inputs
|
|
315
|
+
- `example_data/Output`
|
|
316
|
+
Flow outputs
|
|
317
|
+
- `example_data/databases`
|
|
318
|
+
DuckDB files created on demand
|
|
319
|
+
|
|
320
|
+
## Live Smoke Suite
|
|
321
|
+
|
|
322
|
+
The live smoke suite is intentionally self-contained:
|
|
323
|
+
|
|
324
|
+
- `tests/test_live_runtime_suite.py`
|
|
325
|
+
|
|
326
|
+
It generates temporary workspaces from scratch, generates temporary `example_data/` and `data2/` directories with the real starter-data generator, adds notebook-authored poll/schedule/manual flows, runs the daemons, and tears the whole environment down afterward. It does not rely on the existing `workspaces/example_workspace` or `workspaces/claims2` workspaces, or on live repo data directories.
|
|
327
|
+
|
|
328
|
+
## Status
|
|
329
|
+
|
|
330
|
+
This project is pre-alpha. Backwards compatibility is not a goal; the API should stay small and explicit while the runtime architecture settles.
|
|
@@ -0,0 +1,264 @@
|
|
|
1
|
+
# Data Engine
|
|
2
|
+
|
|
3
|
+
Data Engine is a pre-alpha workflow runtime for file-driven jobs. A flow declares:
|
|
4
|
+
|
|
5
|
+
- a group
|
|
6
|
+
- an optional runtime trigger via `watch(...)`
|
|
7
|
+
- ordered generic `step(...)` callables
|
|
8
|
+
|
|
9
|
+
The runtime orchestrates source handling, scheduling, and mirrored output routing. Poll freshness is tracked in the runtime ledger rather than by comparing output mtimes. Step functions use native libraries directly, such as Polars for dataframe work and DuckDB for SQL work.
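One way to read "freshness tracked in the runtime ledger" is that the runtime records a signature for each processed source and compares new observations against that record, instead of looking at output files. The sketch below works under that assumption only; the `Ledger` class and its `(mtime_ns, size)` signature are illustrative and do not describe the package's actual ledger schema.

```python
import tempfile
from pathlib import Path
from typing import Dict, Tuple


class Ledger:
    """Hypothetical in-memory ledger mapping source paths to signatures."""

    def __init__(self) -> None:
        self._seen: Dict[str, Tuple[int, int]] = {}

    @staticmethod
    def signature(path: Path) -> Tuple[int, int]:
        stat = path.stat()
        return (stat.st_mtime_ns, stat.st_size)

    def is_stale(self, path: Path) -> bool:
        """True when the source is new or changed since it was recorded."""
        return self._seen.get(str(path)) != self.signature(path)

    def record(self, path: Path) -> None:
        self._seen[str(path)] = self.signature(path)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "claims.csv"
    source.write_text("id,status\n1,OPEN\n")
    ledger = Ledger()
    first_check = ledger.is_stale(source)   # never recorded
    ledger.record(source)
    second_check = ledger.is_stale(source)  # unchanged since record()
    source.write_text("id,status\n1,OPEN\n2,CLOSED\n")
    third_check = ledger.is_stale(source)   # file size changed
```

Because the decision depends only on the recorded source signature, deleting or touching an output file would not by itself trigger a rerun in this model.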
|
|
10
|
+
|
|
11
|
+
## Install
|
|
12
|
+
|
|
13
|
+
### Installer scripts
|
|
14
|
+
|
|
15
|
+
Use the installer that matches your environment:
|
|
16
|
+
|
|
17
|
+
- macOS: [INSTALL/INSTALL MAC.command](INSTALL/INSTALL%20MAC.command)
|
|
18
|
+
- Windows: [INSTALL/INSTALL WINDOWS.bat](INSTALL/INSTALL%20WINDOWS.bat)
|
|
19
|
+
- Windows VM / CPU-safe Polars test path: [INSTALL/INSTALL WINDOWS_VM.bat](INSTALL/INSTALL%20WINDOWS_VM.bat)
|
|
20
|
+
|
|
21
|
+
The macOS and standard Windows installers install the regular Polars package. The Windows VM installer installs the `polars-lts-cpu` variant instead.
|
|
22
|
+
|
|
23
|
+
### Manual install
|
|
24
|
+
|
|
25
|
+
Data Engine now uses explicit Polars extras so you can choose the runtime package:
|
|
26
|
+
|
|
27
|
+
```bash
|
|
28
|
+
python -m pip install -e ".[dev,polars]"
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
For the CPU-safe LTS variant:
|
|
32
|
+
|
|
33
|
+
```bash
|
|
34
|
+
python -m pip install -e ".[dev,polars-lts]"
|
|
35
|
+
```
|
|
36
|
+
|
|
37
|
+
Launch the GUI with:
|
|
38
|
+
|
|
39
|
+
```bash
|
|
40
|
+
python -m data_engine.ui.cli.app start gui
|
|
41
|
+
```
|
|
42
|
+
|
|
43
|
+
## Public API
|
|
44
|
+
|
|
45
|
+
```python
|
|
46
|
+
from data_engine import Flow, FlowContext, discover_flows, load_flow, run
|
|
47
|
+
```
|
|
48
|
+
|
|
49
|
+
## Headless CLI
|
|
50
|
+
|
|
51
|
+
Data Engine now ships with a headless CLI:
|
|
52
|
+
|
|
53
|
+
```bash
|
|
54
|
+
data-engine list
|
|
55
|
+
data-engine show example_summary
|
|
56
|
+
data-engine run --once example_summary
|
|
57
|
+
data-engine run
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
`data-engine run` starts the automated engine headlessly for discovered automated flows and keeps running until stopped. Use `--once` to force a single pass instead.
|
|
61
|
+
|
|
62
|
+
## Workspace Model
|
|
63
|
+
|
|
64
|
+
Data Engine discovers workspaces from a collection root resolved from:
|
|
65
|
+
|
|
66
|
+
- `DATA_ENGINE_WORKSPACE_COLLECTION_ROOT`, when explicitly set
|
|
67
|
+
- `DATA_ENGINE_WORKSPACE_ROOT`, when binding directly to one authored workspace
|
|
68
|
+
- otherwise the machine-local app settings store
|
|
69
|
+
|
|
70
|
+
Each immediate child folder containing `flow_modules/` is treated as a workspace, for example:
|
|
71
|
+
|
|
72
|
+
- `workspaces/example_workspace/flow_modules/`
|
|
73
|
+
- `workspaces/claims2/flow_modules/`
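The "immediate child folder containing `flow_modules/`" rule can be sketched with `pathlib`. `find_workspaces` is a hypothetical helper written for this example; it assumes the folder name doubles as the workspace id and is not the package's own discovery code.

```python
import tempfile
from pathlib import Path
from typing import List


def find_workspaces(collection_root: Path) -> List[str]:
    """Return ids of immediate child folders that contain flow_modules/."""
    return sorted(
        child.name
        for child in collection_root.iterdir()
        if child.is_dir() and (child / "flow_modules").is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "example_workspace" / "flow_modules").mkdir(parents=True)
    (root / "claims2" / "flow_modules").mkdir(parents=True)
    (root / "notes").mkdir()  # no flow_modules/, so not a workspace
    workspace_ids = find_workspaces(root)

print(workspace_ids)
```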
|
|
74
|
+
|
|
75
|
+
The app resolves per-workspace local artifacts under:
|
|
76
|
+
|
|
77
|
+
- `artifacts/workspace_cache/<workspace_id>/`
|
|
78
|
+
- `artifacts/runtime_state/<workspace_id>/`
|
|
79
|
+
|
|
80
|
+
Shared lease and checkpoint state lives inside each authored workspace:
|
|
81
|
+
|
|
82
|
+
- `workspaces/<workspace_id>/.workspace_state/`
|
|
83
|
+
|
|
84
|
+
The app's workspace selection and collection-root preference are machine-local state, not repo-local config checked into the project tree.
|
|
85
|
+
|
|
86
|
+
## Basic shape
|
|
87
|
+
|
|
88
|
+
```python
from data_engine import Flow
import polars as pl


def read_claims(context):
    return pl.read_excel(context.source.path)


def keep_open(context):
    return context.current.filter(pl.col("status") == "OPEN")


def write_parquet(context):
    output = context.mirror.with_suffix(".parquet")
    context.current.write_parquet(output)
    return output


def build():
    return (
        Flow(group="Claims")
        .watch(
            mode="poll",
            source="../../../example_data/Input/claims_flat",
            interval="5s",
            extensions=[".xlsx", ".xls", ".xlsm"],
            settle=1,
        )
        .mirror(root="../../../example_data/Output/example_mirror")
        .step(read_claims, save_as="raw_df")
        .step(keep_open, use="raw_df", save_as="filtered_df")
        .step(write_parquet, use="filtered_df")
    )
```
|
|
123
|
+
|
|
124
|
+
## Batch helpers
|
|
125
|
+
|
|
126
|
+
For batch-oriented flows, use `Flow.collect(...)` and either `Flow.map(...)` or `Flow.step_each(...)` instead of importing extra helpers or hand-managing raw lists.
|
|
127
|
+
|
|
128
|
+
```python
from data_engine import Flow


def validate_workbook(context, file_ref):
    return {
        "name": file_ref.name,
        "path": file_ref.path,
        "ok": file_ref.exists(),
    }


def build():
    return (
        Flow(group="Claims")
        .watch(mode="schedule", run_as="batch", interval="15m", source="../../../example_data/Input/claims_flat")
        .collect([".xlsx"])
        .map(validate_workbook)
    )
```
|
|
148
|
+
|
|
149
|
+
`Flow.collect(...)` returns a `Batch` of `FileRef` items. `Flow.map(...)` applies one callable to each item and returns a new `Batch`. `Flow.step_each(...)` is an alias for `Flow.map(...)` that reads more naturally in some flows. If the batch is empty, both forms raise immediately, so the failure surfaces at the mapped step.
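The map-and-fail-fast behavior can be modeled in a few lines. `ToyBatch` below is a hypothetical stand-in, not the package's `Batch` class, and the `ValueError` is an assumed exception type chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, Tuple, TypeVar

T = TypeVar("T")
U = TypeVar("U")


@dataclass(frozen=True)
class ToyBatch(Generic[T]):
    """Hypothetical stand-in for Batch: an immutable sequence of items."""
    items: Tuple[T, ...]

    def __iter__(self) -> Iterator[T]:
        return iter(self.items)

    def map(self, fn: Callable[[T], U]) -> "ToyBatch[U]":
        # Fail at the mapping step instead of passing an empty batch along.
        if not self.items:
            raise ValueError("cannot map over an empty batch")
        return ToyBatch(tuple(fn(item) for item in self.items))


names = ToyBatch(("a.xlsx", "b.xlsx")).map(str.upper)

try:
    ToyBatch(()).map(str.upper)
except ValueError as exc:
    empty_error = str(exc)
```

Raising at the map call keeps the error next to the step that needed input, rather than letting a later step fail on an empty result.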
|
|
150
|
+
|
|
151
|
+
## Flow API
|
|
152
|
+
|
|
153
|
+
- `Flow(group=...)`
|
|
154
|
+
- `.watch(mode="manual", source=None, run_as="individual")`
|
|
155
|
+
- `.watch(mode="poll", source=..., interval=..., extensions=None, settle=1, run_as="individual")`
|
|
156
|
+
- `.watch(mode="schedule", interval=..., source=None, run_as="individual" | "batch")`
|
|
157
|
+
- `.watch(mode="schedule", time="HH:MM", source=None, run_as="individual" | "batch")`
|
|
158
|
+
- `.watch(mode="schedule", time=["08:15", "14:45"], source=..., run_as="individual" | "batch")`
|
|
159
|
+
- `.mirror(root=...)`
|
|
160
|
+
- `.step(fn, use=None, save_as=None, label=None)`
|
|
161
|
+
- `.collect(extensions, root=None, recursive=False, use=None, save_as=None, label=None)`
|
|
162
|
+
- `.map(fn, use=None, save_as=None, label=None)`
|
|
163
|
+
- `.step_each(fn, use=None, save_as=None, label=None)`
|
|
164
|
+
- `.preview(use=None)`
|
|
165
|
+
- `.run_once()`
|
|
166
|
+
- `.run()`
|
|
167
|
+
- `.show()`
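For the `time="HH:MM"` and `time=[...]` schedule forms listed above, picking the next run can be sketched as follows. `next_run` is a hypothetical helper; it assumes naive local datetimes and a "strictly after now" rule, neither of which is confirmed by the package.

```python
from datetime import datetime, time, timedelta
from typing import Iterable


def next_run(now: datetime, times: Iterable[str]) -> datetime:
    """Return the first "HH:MM" slot strictly after `now`, rolling to tomorrow."""
    slots = sorted(time.fromisoformat(value) for value in times)
    if not slots:
        raise ValueError("at least one time is required")
    for slot in slots:
        candidate = datetime.combine(now.date(), slot)
        if candidate > now:
            return candidate
    # Every slot for today has passed, so use the earliest slot tomorrow.
    return datetime.combine(now.date() + timedelta(days=1), slots[0])


morning = next_run(datetime(2024, 1, 1, 9, 0), ["08:15", "14:45"])
evening = next_run(datetime(2024, 1, 1, 15, 0), ["08:15", "14:45"])
```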
|
|
168
|
+
|
|
169
|
+
`step()` callables always receive one `FlowContext` parameter and return the next value for `context.current`.
|
|
170
|
+
`map()` and `step_each()` callables accept either `(item)` or `(context, item)`; the per-item return values are collected into the mapped `Batch`.
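Supporting both `(item)` and `(context, item)` callables usually comes down to inspecting the callable's signature. The sketch below shows one way to do that with `inspect`; `call_mapped` is a hypothetical helper and the package may dispatch differently.

```python
import inspect
from typing import Any, Callable


def call_mapped(fn: Callable[..., Any], context: Any, item: Any) -> Any:
    """Call fn(item) or fn(context, item) depending on its positional arity."""
    params = [
        p for p in inspect.signature(fn).parameters.values()
        if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)
    ]
    if len(params) == 1:
        return fn(item)
    if len(params) == 2:
        return fn(context, item)
    raise TypeError(f"{fn.__name__} must accept (item) or (context, item)")


def stem_only(item):
    return item.split(".")[0]


def tag_with_group(context, item):
    return f"{context['group']}:{item}"


ctx = {"group": "Claims"}  # a plain dict stands in for a FlowContext
results = [
    call_mapped(stem_only, ctx, "a.xlsx"),
    call_mapped(tag_with_group, ctx, "a.xlsx"),
]
```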
|
|
171
|
+
|
|
172
|
+
For notebook authoring, `preview()` is usually the most useful inspection helper:
|
|
173
|
+
|
|
174
|
+
```python
|
|
175
|
+
build().preview(use="raw_df").head(10)
|
|
176
|
+
build().preview(use="filtered_df")
|
|
177
|
+
```
|
|
178
|
+
|
|
179
|
+
`preview(use="name")` runs the flow until that `save_as="name"` object exists, then returns the real object without running later steps.
|
|
180
|
+
|
|
181
|
+
## Flow context
|
|
182
|
+
|
|
183
|
+
`FlowContext` exposes the active run state:
|
|
184
|
+
|
|
185
|
+
- `context.source`
|
|
186
|
+
- `context.mirror`
|
|
187
|
+
- `context.current`
|
|
188
|
+
- `context.objects`
|
|
189
|
+
- `context.metadata`
|
|
190
|
+
- `context.source_metadata()`
|
|
191
|
+
|
|
192
|
+
`context.source` is the resolved input namespace for the active source. The most useful helpers are:
|
|
193
|
+
|
|
194
|
+
- `context.source.path`
|
|
195
|
+
- `context.source.with_extension(".json")`
|
|
196
|
+
- `context.source.with_suffix(".json")`
|
|
197
|
+
- `context.source.file("notes.json")`
|
|
198
|
+
- `context.source.namespaced_file("notes.json")`
|
|
199
|
+
- `context.source.root_file("lookup.csv")`
|
|
200
|
+
|
|
201
|
+
`context.mirror` is the mirrored output namespace for the active source. The core helpers are:
|
|
202
|
+
|
|
203
|
+
- `context.mirror.with_extension(".parquet")`
|
|
204
|
+
- `context.mirror.with_suffix(".parquet")`
|
|
205
|
+
- `context.mirror.file("open_claims.parquet")`
|
|
206
|
+
- `context.mirror.namespaced_file("open_claims.parquet")`
|
|
207
|
+
|
|
208
|
+
`with_extension(...)` is the clearer extension-changing helper. `with_suffix(...)` remains available as the pathlib-style alias.
|
|
209
|
+
`file(...)` stays in the mirrored/source folder. `namespaced_file(...)` creates a source-stem namespace for multi-output cases.
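The difference between the helpers is easiest to see as paths. The sketch below models one plausible layout: the mirrored path keeps the source's location relative to the watched root, `file(...)` yields a sibling in that folder, and `namespaced_file(...)` adds a folder named after the source stem. `ToyMirror` and this exact layout are assumptions for illustration, not the package's implementation.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ToyMirror:
    """Hypothetical model of a mirrored output namespace for one source file."""
    source_root: PurePosixPath
    mirror_root: PurePosixPath
    source_path: PurePosixPath

    @property
    def mirrored(self) -> PurePosixPath:
        # Same relative location as the source, under the mirror root.
        return self.mirror_root / self.source_path.relative_to(self.source_root)

    def with_extension(self, extension: str) -> PurePosixPath:
        return self.mirrored.with_suffix(extension)

    def file(self, name: str) -> PurePosixPath:
        # Sibling file in the mirrored folder.
        return self.mirrored.parent / name

    def namespaced_file(self, name: str) -> PurePosixPath:
        # Extra folder named after the source stem (assumed layout).
        return self.mirrored.parent / self.mirrored.stem / name


mirror = ToyMirror(
    source_root=PurePosixPath("Input/claims_flat"),
    mirror_root=PurePosixPath("Output/example_mirror"),
    source_path=PurePosixPath("Input/claims_flat/2024/jan.xlsx"),
)
converted = str(mirror.with_extension(".parquet"))
sibling = str(mirror.file("open_claims.parquet"))
namespaced = str(mirror.namespaced_file("open_claims.parquet"))
```

Under this layout, two sources in the same folder that both write `open_claims.parquet` would collide with `file(...)` but not with `namespaced_file(...)`.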
|
|
210
|
+
|
|
211
|
+
When a step writes one inspectable artifact, return that existing `Path`. The UI uses returned output paths to enable the `Inspect` button for that step.
|
|
212
|
+
|
|
213
|
+
`use="name"` loads `context.objects["name"]` into `context.current` before the step runs. `save_as="name"` stores the returned value into `context.objects["name"]`. Those same saved names are what `build().preview(use="name")` uses in notebooks.
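The `use`/`save_as` threading, and the way a preview can stop once a named object exists, can be sketched with a toy runner. `ToyContext`, `ToyStep`, and `run_steps` are hypothetical names for this example; only the `current`/`objects` behavior described above is modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ToyContext:
    """Hypothetical stand-in for FlowContext with only the fields used here."""
    current: Any = None
    objects: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToyStep:
    fn: Callable[[ToyContext], Any]
    use: Optional[str] = None
    save_as: Optional[str] = None


def run_steps(steps: List[ToyStep], stop_after: Optional[str] = None) -> ToyContext:
    """Run steps in order; optionally stop once `stop_after` has been saved."""
    context = ToyContext()
    for step in steps:
        if step.use is not None:
            context.current = context.objects[step.use]  # load before the step
        context.current = step.fn(context)  # return value becomes current
        if step.save_as is not None:
            context.objects[step.save_as] = context.current
            if step.save_as == stop_after:
                break  # preview-style early exit; later steps never run
    return context


calls = []


def read(context):
    calls.append("read")
    return [3, 1, 2]


def keep_big(context):
    calls.append("keep_big")
    return [n for n in context.current if n > 1]


def total(context):
    calls.append("total")
    return sum(context.current)


steps = [
    ToyStep(read, save_as="raw"),
    ToyStep(keep_big, use="raw", save_as="filtered"),
    ToyStep(total, use="filtered"),
]

preview = run_steps(steps, stop_after="filtered").objects["filtered"]
```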
|
|
214
|
+
|
|
215
|
+
## Discovery
|
|
216
|
+
|
|
217
|
+
Flows are code-defined. Starter flow modules live in:
|
|
218
|
+
|
|
219
|
+
- `workspaces/<workspace_id>/flow_modules/`
|
|
220
|
+
- `artifacts/workspace_cache/<workspace_id>/compiled_flow_modules/`
|
|
221
|
+
|
|
222
|
+
Each flow module exports:
|
|
223
|
+
|
|
224
|
+
- optional `DESCRIPTION`
|
|
225
|
+
- `build() -> Flow`
|
|
226
|
+
|
|
227
|
+
The flow-module filename is the flow identity. Authored flow modules should use `Flow(group=...)` and let the loader inject the name from the module filename.
|
|
228
|
+
|
|
229
|
+
Authored flow modules compile into `artifacts/workspace_cache/<workspace_id>/compiled_flow_modules/*.py`, and the runtime loads discovered flows from those compiled modules.
|
|
230
|
+
|
|
231
|
+
## Workspace layout
|
|
232
|
+
|
|
233
|
+
- `src/data_engine/`
|
|
234
|
+
Runtime package and desktop UI
|
|
235
|
+
- `workspaces/<workspace_id>/flow_modules/`
|
|
236
|
+
Authored flow sources (`.py` or `.ipynb`)
|
|
237
|
+
- `workspaces/<workspace_id>/.workspace_state/`
|
|
238
|
+
Shared lease markers and checkpoint parquet snapshots
|
|
239
|
+
- `artifacts/workspace_cache/<workspace_id>/compiled_flow_modules/`
|
|
240
|
+
Generated/importable flow modules
|
|
241
|
+
- `artifacts/runtime_state/<workspace_id>/`
|
|
242
|
+
Internal runtime ledger state for one workspace
|
|
243
|
+
- `artifacts/documentation/`
|
|
244
|
+
Generated documentation output
|
|
245
|
+
- `example_data/Input`
|
|
246
|
+
Example input files
|
|
247
|
+
- `example_data/Settings`
|
|
248
|
+
Example single-file inputs
|
|
249
|
+
- `example_data/Output`
|
|
250
|
+
Flow outputs
|
|
251
|
+
- `example_data/databases`
|
|
252
|
+
DuckDB files created on demand
|
|
253
|
+
|
|
254
|
+
## Live Smoke Suite
|
|
255
|
+
|
|
256
|
+
The live smoke suite is intentionally self-contained:
|
|
257
|
+
|
|
258
|
+
- `tests/test_live_runtime_suite.py`
|
|
259
|
+
|
|
260
|
+
It generates temporary workspaces from scratch, generates temporary `example_data/` and `data2/` directories with the real starter-data generator, adds notebook-authored poll/schedule/manual flows, runs the daemons, and tears the whole environment down afterward. It does not rely on the existing `workspaces/example_workspace` or `workspaces/claims2` workspaces, or on live repo data directories.
|
|
261
|
+
|
|
262
|
+
## Status
|
|
263
|
+
|
|
264
|
+
This project is pre-alpha. Backwards compatibility is not a goal; the API should stay small and explicit while the runtime architecture settles.
|