jerry-thomas 1.0.3__py3-none-any.whl → 2.0.0__py3-none-any.whl

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (192)
  1. datapipeline/analysis/vector/collector.py +0 -1
  2. datapipeline/build/tasks/config.py +0 -2
  3. datapipeline/build/tasks/metadata.py +0 -2
  4. datapipeline/build/tasks/scaler.py +0 -2
  5. datapipeline/build/tasks/schema.py +0 -2
  6. datapipeline/build/tasks/utils.py +0 -2
  7. datapipeline/cli/app.py +201 -81
  8. datapipeline/cli/commands/contract.py +145 -283
  9. datapipeline/cli/commands/demo.py +13 -0
  10. datapipeline/cli/commands/domain.py +4 -4
  11. datapipeline/cli/commands/dto.py +11 -0
  12. datapipeline/cli/commands/filter.py +2 -2
  13. datapipeline/cli/commands/inspect.py +0 -68
  14. datapipeline/cli/commands/list_.py +30 -13
  15. datapipeline/cli/commands/loader.py +11 -0
  16. datapipeline/cli/commands/mapper.py +82 -0
  17. datapipeline/cli/commands/parser.py +45 -0
  18. datapipeline/cli/commands/run_config.py +1 -3
  19. datapipeline/cli/commands/serve_pipeline.py +5 -7
  20. datapipeline/cli/commands/source.py +106 -18
  21. datapipeline/cli/commands/stream.py +286 -0
  22. datapipeline/cli/visuals/common.py +0 -2
  23. datapipeline/cli/visuals/sections.py +0 -2
  24. datapipeline/cli/workspace_utils.py +0 -3
  25. datapipeline/config/context.py +0 -2
  26. datapipeline/config/dataset/feature.py +1 -0
  27. datapipeline/config/metadata.py +0 -2
  28. datapipeline/config/project.py +0 -2
  29. datapipeline/config/resolution.py +10 -2
  30. datapipeline/config/tasks.py +9 -9
  31. datapipeline/domain/feature.py +3 -0
  32. datapipeline/domain/record.py +7 -7
  33. datapipeline/domain/sample.py +0 -2
  34. datapipeline/domain/vector.py +6 -8
  35. datapipeline/integrations/ml/adapter.py +0 -2
  36. datapipeline/integrations/ml/pandas_support.py +0 -2
  37. datapipeline/integrations/ml/rows.py +0 -2
  38. datapipeline/integrations/ml/torch_support.py +0 -2
  39. datapipeline/io/output.py +0 -2
  40. datapipeline/io/serializers.py +26 -16
  41. datapipeline/mappers/synthetic/time.py +9 -2
  42. datapipeline/pipeline/artifacts.py +3 -5
  43. datapipeline/pipeline/observability.py +0 -2
  44. datapipeline/pipeline/pipelines.py +118 -34
  45. datapipeline/pipeline/stages.py +42 -17
  46. datapipeline/pipeline/utils/spool_cache.py +142 -0
  47. datapipeline/pipeline/utils/transform_utils.py +27 -2
  48. datapipeline/services/artifacts.py +1 -4
  49. datapipeline/services/constants.py +1 -0
  50. datapipeline/services/factories.py +4 -6
  51. datapipeline/services/project_paths.py +0 -2
  52. datapipeline/services/runs.py +0 -2
  53. datapipeline/services/scaffold/contract_yaml.py +76 -0
  54. datapipeline/services/scaffold/demo.py +141 -0
  55. datapipeline/services/scaffold/discovery.py +115 -0
  56. datapipeline/services/scaffold/domain.py +21 -13
  57. datapipeline/services/scaffold/dto.py +31 -0
  58. datapipeline/services/scaffold/filter.py +2 -1
  59. datapipeline/services/scaffold/layout.py +96 -0
  60. datapipeline/services/scaffold/loader.py +61 -0
  61. datapipeline/services/scaffold/mapper.py +116 -0
  62. datapipeline/services/scaffold/parser.py +56 -0
  63. datapipeline/services/scaffold/plugin.py +14 -2
  64. datapipeline/services/scaffold/source_yaml.py +91 -0
  65. datapipeline/services/scaffold/stream_plan.py +110 -0
  66. datapipeline/services/scaffold/utils.py +187 -0
  67. datapipeline/sources/data_loader.py +0 -2
  68. datapipeline/sources/decoders.py +49 -8
  69. datapipeline/sources/factory.py +9 -6
  70. datapipeline/sources/foreach.py +18 -3
  71. datapipeline/sources/synthetic/time/parser.py +1 -1
  72. datapipeline/sources/transports.py +10 -4
  73. datapipeline/templates/demo_skeleton/demo/contracts/equity.ohlcv.yaml +33 -0
  74. datapipeline/templates/demo_skeleton/demo/contracts/time.ticks.hour_sin.yaml +22 -0
  75. datapipeline/templates/demo_skeleton/demo/contracts/time.ticks.linear.yaml +22 -0
  76. datapipeline/templates/demo_skeleton/demo/data/APPL.jsonl +19 -0
  77. datapipeline/templates/demo_skeleton/demo/data/MSFT.jsonl +19 -0
  78. datapipeline/templates/demo_skeleton/demo/dataset.yaml +19 -0
  79. datapipeline/templates/demo_skeleton/demo/postprocess.yaml +19 -0
  80. datapipeline/templates/demo_skeleton/demo/project.yaml +19 -0
  81. datapipeline/templates/demo_skeleton/demo/sources/sandbox.ohlcv.yaml +17 -0
  82. datapipeline/templates/{plugin_skeleton/example → demo_skeleton/demo}/sources/synthetic.ticks.yaml +1 -1
  83. datapipeline/templates/demo_skeleton/demo/tasks/metadata.yaml +2 -0
  84. datapipeline/templates/demo_skeleton/demo/tasks/scaler.yaml +3 -0
  85. datapipeline/templates/demo_skeleton/demo/tasks/schema.yaml +2 -0
  86. datapipeline/templates/demo_skeleton/demo/tasks/serve.test.yaml +4 -0
  87. datapipeline/templates/demo_skeleton/demo/tasks/serve.train.yaml +4 -0
  88. datapipeline/templates/demo_skeleton/demo/tasks/serve.val.yaml +4 -0
  89. datapipeline/templates/demo_skeleton/scripts/run_dataframe.py +20 -0
  90. datapipeline/templates/demo_skeleton/scripts/run_torch.py +23 -0
  91. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/__init__.py +0 -0
  92. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/domains/equity/__init__.py +0 -0
  93. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/domains/equity/model.py +18 -0
  94. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/dtos/__init__.py +0 -0
  95. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/dtos/sandbox_ohlcv_dto.py +14 -0
  96. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/mappers/__init__.py +0 -0
  97. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/mappers/map_sandbox_ohlcv_dto_to_equity.py +26 -0
  98. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/parsers/__init__.py +0 -0
  99. datapipeline/templates/demo_skeleton/src/{{PACKAGE_NAME}}/parsers/sandbox_ohlcv_dto_parser.py +46 -0
  100. datapipeline/templates/plugin_skeleton/README.md +57 -136
  101. datapipeline/templates/plugin_skeleton/jerry.yaml +12 -24
  102. datapipeline/templates/plugin_skeleton/reference/jerry.yaml +28 -0
  103. datapipeline/templates/plugin_skeleton/reference/reference/contracts/composed.reference.yaml +29 -0
  104. datapipeline/templates/plugin_skeleton/reference/reference/contracts/ingest.reference.yaml +31 -0
  105. datapipeline/templates/plugin_skeleton/reference/reference/contracts/overview.reference.yaml +34 -0
  106. datapipeline/templates/plugin_skeleton/reference/reference/dataset.yaml +29 -0
  107. datapipeline/templates/plugin_skeleton/reference/reference/postprocess.yaml +25 -0
  108. datapipeline/templates/plugin_skeleton/reference/reference/project.yaml +32 -0
  109. datapipeline/templates/plugin_skeleton/reference/reference/sources/foreach.http.reference.yaml +24 -0
  110. datapipeline/templates/plugin_skeleton/reference/reference/sources/foreach.reference.yaml +21 -0
  111. datapipeline/templates/plugin_skeleton/reference/reference/sources/fs.reference.yaml +16 -0
  112. datapipeline/templates/plugin_skeleton/reference/reference/sources/http.reference.yaml +17 -0
  113. datapipeline/templates/plugin_skeleton/reference/reference/sources/overview.reference.yaml +18 -0
  114. datapipeline/templates/plugin_skeleton/reference/reference/sources/synthetic.reference.yaml +15 -0
  115. datapipeline/templates/plugin_skeleton/reference/reference/tasks/metadata.reference.yaml +11 -0
  116. datapipeline/templates/plugin_skeleton/reference/reference/tasks/scaler.reference.yaml +10 -0
  117. datapipeline/templates/plugin_skeleton/reference/reference/tasks/schema.reference.yaml +10 -0
  118. datapipeline/templates/plugin_skeleton/reference/reference/tasks/serve.reference.yaml +28 -0
  119. datapipeline/templates/plugin_skeleton/src/{{PACKAGE_NAME}}/domains/__init__.py +2 -0
  120. datapipeline/templates/plugin_skeleton/src/{{PACKAGE_NAME}}/dtos/__init__.py +0 -0
  121. datapipeline/templates/plugin_skeleton/src/{{PACKAGE_NAME}}/loaders/__init__.py +0 -0
  122. datapipeline/templates/plugin_skeleton/src/{{PACKAGE_NAME}}/mappers/__init__.py +1 -0
  123. datapipeline/templates/plugin_skeleton/src/{{PACKAGE_NAME}}/parsers/__init__.py +0 -0
  124. datapipeline/templates/plugin_skeleton/your-dataset/dataset.yaml +12 -11
  125. datapipeline/templates/plugin_skeleton/your-dataset/postprocess.yaml +4 -13
  126. datapipeline/templates/plugin_skeleton/your-dataset/project.yaml +7 -10
  127. datapipeline/templates/plugin_skeleton/your-dataset/tasks/metadata.yaml +1 -2
  128. datapipeline/templates/plugin_skeleton/your-dataset/tasks/scaler.yaml +1 -7
  129. datapipeline/templates/plugin_skeleton/your-dataset/tasks/schema.yaml +1 -1
  130. datapipeline/templates/plugin_skeleton/your-dataset/tasks/serve.test.yaml +1 -1
  131. datapipeline/templates/plugin_skeleton/your-dataset/tasks/serve.train.yaml +1 -25
  132. datapipeline/templates/plugin_skeleton/your-dataset/tasks/serve.val.yaml +1 -1
  133. datapipeline/templates/plugin_skeleton/your-interim-data-builder/dataset.yaml +9 -0
  134. datapipeline/templates/plugin_skeleton/your-interim-data-builder/postprocess.yaml +1 -0
  135. datapipeline/templates/plugin_skeleton/your-interim-data-builder/project.yaml +14 -0
  136. datapipeline/templates/plugin_skeleton/your-interim-data-builder/tasks/serve.all.yaml +8 -0
  137. datapipeline/templates/stubs/contracts/composed.yaml.j2 +10 -0
  138. datapipeline/templates/stubs/contracts/ingest.yaml.j2 +25 -0
  139. datapipeline/templates/stubs/dto.py.j2 +1 -1
  140. datapipeline/templates/stubs/loaders/basic.py.j2 +11 -0
  141. datapipeline/templates/stubs/mappers/composed.py.j2 +13 -0
  142. datapipeline/templates/stubs/mappers/ingest.py.j2 +17 -0
  143. datapipeline/templates/stubs/parser.py.j2 +4 -0
  144. datapipeline/templates/stubs/record.py.j2 +0 -1
  145. datapipeline/templates/stubs/source.yaml.j2 +1 -1
  146. datapipeline/transforms/debug/identity.py +34 -16
  147. datapipeline/transforms/debug/lint.py +14 -11
  148. datapipeline/transforms/feature/scaler.py +5 -12
  149. datapipeline/transforms/filter.py +73 -17
  150. datapipeline/transforms/interfaces.py +58 -0
  151. datapipeline/transforms/record/floor_time.py +10 -7
  152. datapipeline/transforms/record/lag.py +8 -10
  153. datapipeline/transforms/sequence.py +2 -3
  154. datapipeline/transforms/stream/dedupe.py +5 -7
  155. datapipeline/transforms/stream/ensure_ticks.py +39 -24
  156. datapipeline/transforms/stream/fill.py +34 -25
  157. datapipeline/transforms/stream/filter.py +25 -0
  158. datapipeline/transforms/stream/floor_time.py +16 -0
  159. datapipeline/transforms/stream/granularity.py +52 -30
  160. datapipeline/transforms/stream/lag.py +17 -0
  161. datapipeline/transforms/stream/rolling.py +72 -0
  162. datapipeline/transforms/utils.py +42 -10
  163. datapipeline/transforms/vector/drop/horizontal.py +0 -3
  164. datapipeline/transforms/vector/drop/orchestrator.py +0 -3
  165. datapipeline/transforms/vector/drop/vertical.py +0 -2
  166. datapipeline/transforms/vector/ensure_schema.py +0 -2
  167. datapipeline/utils/paths.py +0 -2
  168. datapipeline/utils/placeholders.py +0 -2
  169. datapipeline/utils/rich_compat.py +0 -3
  170. datapipeline/utils/window.py +0 -2
  171. jerry_thomas-2.0.0.dist-info/METADATA +282 -0
  172. jerry_thomas-2.0.0.dist-info/RECORD +264 -0
  173. {jerry_thomas-1.0.3.dist-info → jerry_thomas-2.0.0.dist-info}/WHEEL +1 -1
  174. {jerry_thomas-1.0.3.dist-info → jerry_thomas-2.0.0.dist-info}/entry_points.txt +7 -3
  175. datapipeline/services/scaffold/mappers.py +0 -55
  176. datapipeline/services/scaffold/source.py +0 -191
  177. datapipeline/templates/plugin_skeleton/example/contracts/time.ticks.hour_sin.yaml +0 -31
  178. datapipeline/templates/plugin_skeleton/example/contracts/time.ticks.linear.yaml +0 -30
  179. datapipeline/templates/plugin_skeleton/example/dataset.yaml +0 -18
  180. datapipeline/templates/plugin_skeleton/example/postprocess.yaml +0 -29
  181. datapipeline/templates/plugin_skeleton/example/project.yaml +0 -23
  182. datapipeline/templates/plugin_skeleton/example/tasks/metadata.yaml +0 -3
  183. datapipeline/templates/plugin_skeleton/example/tasks/scaler.yaml +0 -9
  184. datapipeline/templates/plugin_skeleton/example/tasks/schema.yaml +0 -2
  185. datapipeline/templates/plugin_skeleton/example/tasks/serve.test.yaml +0 -4
  186. datapipeline/templates/plugin_skeleton/example/tasks/serve.train.yaml +0 -28
  187. datapipeline/templates/plugin_skeleton/example/tasks/serve.val.yaml +0 -4
  188. datapipeline/templates/stubs/mapper.py.j2 +0 -22
  189. jerry_thomas-1.0.3.dist-info/METADATA +0 -827
  190. jerry_thomas-1.0.3.dist-info/RECORD +0 -198
  191. {jerry_thomas-1.0.3.dist-info → jerry_thomas-2.0.0.dist-info}/licenses/LICENSE +0 -0
  192. {jerry_thomas-1.0.3.dist-info → jerry_thomas-2.0.0.dist-info}/top_level.txt +0 -0
@@ -0,0 +1,46 @@
+ from datetime import datetime, timezone
+ from typing import Any
+
+ from datapipeline.sources.models.parser import DataParser
+
+ from {{PACKAGE_NAME}}.dtos.sandbox_ohlcv_dto import SandboxOhlcvDTO
+
+
+ def _parse_time(value: Any) -> datetime | None:
+     if isinstance(value, datetime):
+         if value.tzinfo is None:
+             return value.replace(tzinfo=timezone.utc)
+         return value
+     if isinstance(value, str):
+         try:
+             dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
+         except ValueError:
+             return None
+         if dt.tzinfo is None:
+             dt = dt.replace(tzinfo=timezone.utc)
+         return dt
+     return None
+
+
+ class SandboxOhlcvDTOParser(DataParser[SandboxOhlcvDTO]):
+     def parse(self, raw: Any) -> SandboxOhlcvDTO | None:
+         """
+         Convert one raw item (row/dict/tuple/record) into a SandboxOhlcvDTO.
+
+         - Return a DTO instance to keep the item, or None to drop it.
+         - Keep this logic thin and mirror your source data.
+         """
+         if not isinstance(raw, dict):
+             return None
+         parsed_time = _parse_time(raw.get("time"))
+         if parsed_time is None:
+             return None
+         return SandboxOhlcvDTO(
+             time=parsed_time,
+             open=float(raw["open"]),
+             high=float(raw["high"]),
+             low=float(raw["low"]),
+             close=float(raw["close"]),
+             volume=float(raw["volume"]),
+             symbol=str(raw["symbol"]),
+         )
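Note: the scaffolded parser above either returns a DTO or `None` to drop the row. A hedged usage sketch follows; the `myplugin` import path stands in for the generated `{{PACKAGE_NAME}}` package and is an assumption, not part of this diff.

```python
# Hypothetical usage of the generated demo parser; assumes the plugin was
# scaffolded and installed under the name "myplugin".
from myplugin.parsers.sandbox_ohlcv_dto_parser import SandboxOhlcvDTOParser

parser = SandboxOhlcvDTOParser()

kept = parser.parse({
    "time": "2024-01-02T09:30:00Z",  # ISO string; "Z" is normalized to +00:00
    "open": 187.15, "high": 188.44, "low": 186.10,
    "close": 188.02, "volume": 5210000, "symbol": "AAPL",
})
dropped = parser.parse({"open": 1.0})  # no usable "time" -> None, row is dropped

print(kept)     # SandboxOhlcvDTO(time=datetime(...), ..., symbol='AAPL')
print(dropped)  # None
```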
@@ -1,142 +1,63 @@
  # {{DIST_NAME}}
 
- Minimal plugin skeleton for the Jerry Thomas (datapipeline) framework.
-
- Quick start
- - Initialize a plugin (already done if you’re reading this here):
-   - `jerry plugin init {{DIST_NAME}}`
- - Add a source via CLI (transport-specific placeholders are scaffolded):
-   - File data: `jerry source add <provider> <dataset> -t fs -f <csv|json|json-lines|pickle>`
-   - HTTP data: `jerry source add <provider>.<dataset> -t http -f <json|json-lines|csv>`
-   - Synthetic: `jerry source add -p <provider> -d <dataset> -t synthetic`
- - Edit the generated `config/sources/*.yaml` to fill in the `path`, delimiter, etc.
- - `jerry.yaml` is placed in your workspace root (alongside the plugin folder) so
-   you can run CLI commands from there; `plugin_root` points back to this plugin.
- - Reinstall after EP changes (pyproject.toml) and restart Python processes:
-   - Core: `cd lib/datapipeline && python -m pip install -e .`
-   - This plugin: `python -m pip install -e .`
-
- Folder layout
- - `example/`
-   - `project.yaml` project root (paths, globals, cadence/split)
-   - `dataset.yaml` — feature/target declarations (uses `${group_by}` from globals)
-   - `postprocess.yaml` — postprocess transforms
-   - `contracts/*.yaml` — canonical stream definitions
-   - `sources/*.yaml` — raw source definitions (one file per source)
-   - `tasks/*.yaml` — task specs (schema/scaler/metadata/serve)
- - Every dataset `project.yaml` declares a `name`; reference it via `${project_name}`
-   inside other config files (e.g., `paths.artifacts: ../artifacts/${project_name}`) to
-   avoid hard-coding per-dataset directories.
- - `src/{{PACKAGE_NAME}}/`
-   - `sources/<provider>/<dataset>/dto.py` — DTO model for the source
-   - `sources/<provider>/<dataset>/parser.py` — parse raw → DTO
-   - Optional: `sources/<provider>/<dataset>/loader.py` for synthetic sources
-   - `domains/<domain>/model.py` — domain record models
-   - `mappers/*.py` — map DTOs → domain records
-
- How loaders work
- - For fs/http, sources use the generic loader entry point:
-   - `loader.entrypoint: "{{DEFAULT_IO_LOADER_EP}}"`
-   - `loader.args` include `transport`, `format`, and source-specific args (placeholders are provided):
-     - fs: `path`, `glob`, `encoding`, plus `delimiter` for csv
-     - http: `url`, `headers`, `params`, `encoding`, optional `count_by_fetch`
- - Synthetic sources generate data in-process and keep a small loader stub.
-
- Run data flows
- - Build artifacts once: `jerry build --project example/project.yaml`
- - Preview records (stage 1): `jerry serve --project example/project.yaml --stage 1 --limit 100`
- - Preview features (stage 3): `jerry serve --project example/project.yaml --stage 3 --limit 100`
- - Preview vectors (stage 7): `jerry serve --project example/project.yaml --stage 7 --limit 100`
-
- Analyze vectors
- - `jerry inspect report --project example/project.yaml` (console only)
- - `jerry inspect partitions --project example/project.yaml` (writes build/partitions.json)
- - `jerry inspect matrix --project example/project.yaml --format html` (writes build/matrix.html)
- - `jerry inspect expected --project example/project.yaml` (writes build/expected.txt)
- - Use post-processing transforms in `postprocess.yaml` to keep coverage high
-   (history/horizontal fills, constants, or drop rules) before serving vectors.
-   Add `payload: targets` inside a transform when you need to mutate label vectors.
-
- Train/Val/Test splits (deterministic)
- - Configure split mechanics once in your project file:
-   - Edit `example/project.yaml` and set:
-     ```yaml
-     globals:
-       group_by: 10m # dataset cadence; reused as contract cadence
-       split:
-         mode: hash # hash|time
-         key: group # group or feature:<id> (entity-stable)
-         seed: 42 # deterministic hash seed
-         ratios: {train: 0.8, val: 0.1, test: 0.1}
-     ```
- - Select the active slice via `example/tasks/serve.<name>.yaml` (or `--keep`):
-   ```yaml
-   kind: serve
-   name: train # defaults to filename stem when omitted
-   keep: train # any label defined in globals.split; null disables filtering
-   output:
-     transport: stdout # stdout | fs
-     format: print # print | json-lines | json | csv | pickle
-   limit: 100 # cap vectors per serve run (null = unlimited)
-   throttle_ms: null # sleep between vectors (milliseconds)
-   # visuals: AUTO # AUTO | TQDM | RICH | OFF
-   # progress: AUTO # AUTO | SPINNER | BARS | OFF
-   ```
- - Add additional `kind: serve` files (e.g., `serve.val.yaml`, `serve.test.yaml`) and the CLI will run each enabled file in order unless you pass `--run <name>`.
- - Serve examples (change the serve task or pass `--keep val|test`):
-   - `jerry serve -p example/project.yaml --out-transport stdout --out-format json-lines > train.jsonl`
-   - `jerry serve -p example/project.yaml --keep val --out-transport stdout --out-format json-lines > val.jsonl`
-   - Add `--visuals rich --progress bars` for a richer interactive UI; defaults to `AUTO`.
- - For shared workspace defaults (visual renderer, progress display, build mode), drop a `jerry.yaml` next to your workspace root and set `shared.visuals`, `shared.progress`, etc. CLI commands walk up from the current directory to find it.
- - The split is applied at the end (after postprocess transforms), and assignment
-   is deterministic (hash-based) with a fixed seed; no overlap across runs.
-
- Key selection guidance
- - `key: group` hashes the group key (commonly the time bucket). This yields a uniform random split per group but may allow the same entity to appear in multiple splits across time.
- - `key: feature:<id>` hashes a specific feature value, e.g., `feature:entity_id` or `feature:station_id`, ensuring all vectors for that entity land in the same split (recommended to avoid leakage).
-
- Postprocess expected IDs
- - Build once with `jerry build --project config/project.yaml` (or run `jerry inspect expected …`) to materialize `<paths.artifacts>/expected.txt`.
- - Bootstrap registers the artifact; postprocess transforms read it automatically. Per-transform `expected:` overrides are no longer required or supported — the build output is the single source of truth.
-
- Scaler statistics
- - Jerry computes scaler stats automatically. If you need custom paths or settings, add `tasks/scaler.yaml` and override the defaults.
- - The build writes `<paths.artifacts>/scaler.pkl`; runtime scaling requires this artifact. If it is missing, scaling transforms raise a runtime error.
-
- Tips
- - Keep parsers thin — mirror source schema and return DTOs; use the identity parser only if your loader already emits domain records.
- - Prefer small, composable configs over monolithic ones: one YAML per source is easier to review and reuse.
-
- Composed streams (engineered domains)
- - Declare engineered streams that depend on other canonical streams directly in contracts. The runtime builds each input to stage 4, stream-aligns by partition+timestamp, runs your composer, and emits fresh records for the derived stream.
-
- ```yaml
- # example/contracts/air_density.processed.yaml
- kind: composed
- id: air_density.processed
- inputs:
-   - p=pressure.processed
-   - t=temp_dry.processed
- partition_by: station_id
- sort_batch_size: 20000
-
- mapper:
-   entrypoint: {{PACKAGE_NAME}}.mappers.air_density:mapper
-   args:
-     driver: p # optional; defaults to first input alias
-
- # Optional post-compose policies (same as any stream):
- # record: [...]
- # stream: [...]
- # debug: [...]
+ Minimal plugin skeleton for the Jerry Thomas (datapipeline) runtime.
+
+ ## Quick start
+
+ ```bash
+ python -m pip install -U jerry-thomas
+
+ jerry plugin init {{DIST_NAME}} --out .
+ python -m pip install -e {{DIST_NAME}}
+
+ # One-stop wizard: source YAML + DTO/parser + domain + mapper + contract.
+ jerry inflow create
+
+ # If a workspace-level `jerry.yaml` was created (fresh workspace), you can use the dataset alias:
+ jerry serve --dataset your-dataset --limit 3
+ #
+ # If you already had a workspace `jerry.yaml`, `jerry plugin init` will not overwrite it.
+ # In that case, either add a dataset alias to your existing `jerry.yaml` or pass `--project`:
+ # jerry serve --project your-dataset/project.yaml --limit 3
  ```
 
- Then reference the composed stream in your dataset:
+ ## After scaffolding: what you must edit
+
+ - `your-dataset/sources/*.yaml`
+   - Replace placeholders (`path`/`url`, headers/params, delimiter, etc.)
+ - `your-dataset/dataset.yaml`
+   - Ensure `record_stream:` points at the contract id you created.
+   - Select a `field:` for each feature/target (record attribute to use as value).
+   - Ensure `group_by` matches `^\d+(m|min|h|d)$` (e.g. `10m`, `1h`, `1d`).
+
+ If you add/edit entry points in `pyproject.toml`, reinstall the plugin:
 
- ```yaml
- # example/dataset.yaml
- group_by: ${group_by}
- features:
-   - id: air_density
-     record_stream: air_density.processed
+ ```bash
+ python -m pip install -e .
  ```
+
+ ## Folder layout
+
+ YAML config (dataset project root):
+
+ - `your-dataset/`
+   - `project.yaml` (paths, globals, split)
+   - `sources/*.yaml` (raw source definitions)
+   - `contracts/*.yaml` (canonical streams)
+   - `dataset.yaml` (features/targets)
+   - `postprocess.yaml` (vector-level transforms)
+   - `tasks/*.yaml` (serve/build tasks; optional overrides)
+
+ Python plugin code:
+
+ - `src/{{PACKAGE_NAME}}/`
+   - `dtos/` (DTO models)
+   - `parsers/` (raw -> DTO)
+   - `domains/<domain>/model.py` (domain record models)
+   - `mappers/` (DTO -> domain records)
+   - `loaders/` (optional custom loaders)
+
+ ## Learn more
+
+ - Pipeline stages and split/build timing: the Jerry Thomas runtime `README.md` ("Pipeline Stages (serve --stage)").
+ - Deep dives: runtime `docs/config.md`, `docs/transforms.md`, `docs/artifacts.md`, `docs/extending.md`, `docs/architecture.md`.
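Note: the new README pins `group_by` to the pattern `^\d+(m|min|h|d)$`. A standalone sketch of that documented check (illustrative only, not the runtime's validator):

```python
import re

# The cadence pattern documented in the scaffold README: digits followed by
# m, min, h, or d (e.g. 10m, 1h, 1d).
GROUP_BY_RE = re.compile(r"^\d+(m|min|h|d)$")

for candidate in ("10m", "1h", "1d", "90min", "10 m", "h"):
    print(f"{candidate!r}: {bool(GROUP_BY_RE.match(candidate))}")
# '10m', '1h', '1d', '90min' pass; '10 m' and 'h' fail.
```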
@@ -1,34 +1,22 @@
- # Workspace defaults. The scaffolder copies this to your workspace root (where
- # you ran `jerry plugin init`). CLI commands walk upward from cwd to find it.
-
- # Relative path from this workspace file back to the plugin root.
- plugin_root: . # e.g., "lib/myplugin" if your plugin lives under lib/
-
- # Dataset aliases for `--dataset`; values may be dirs (auto-append project.yaml).
+ # See reference/jerry.yaml for full options and explanations.
+ plugin_root: .
  datasets:
-   example: example/project.yaml
-   your-second-example-dataset: your-dataset/project.yaml
+   your-dataset: your-dataset/project.yaml
+   interim-builder: your-interim-data-builder/project.yaml # use this to build interim data used by other datasets
 
- # Default dataset alias when --dataset/--project are omitted.
- default_dataset: example
+ default_dataset: your-dataset
 
- # Shared fallbacks used by all commands (unless overridden).
  shared:
-   visuals: AUTO # AUTO | TQDM | RICH | OFF
-   progress: BARS # AUTO | SPINNER | BARS | OFF
-   log_level: INFO # Default log level when not set elsewhere
+   visuals: AUTO
+   progress: BARS
+   log_level: INFO
 
- # Defaults for `jerry serve` (run-time options).
  serve:
-   # log_level: INFO # Uncomment to force INFO for serve runs
-   limit: null # Cap vectors; null means unlimited
-   stage: null # Preview a specific stage; null runs the full pipeline
+   limit: null
+   stage: null
    output:
      transport: stdout
-     format: print # stdout: print|json-lines|json|csv|pickle
-     # directory: artifacts/serve # Required when transport=fs
+     format: print
 
- # Defaults for `jerry build` (artifact materialization).
  build:
-   # log_level: INFO # Uncomment to set build log level
-   mode: AUTO # AUTO | FORCE | OFF
+   mode: AUTO
@@ -0,0 +1,28 @@
+ # Jerry workspace config reference (all options).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # plugin_root: ./path/to/your/plugin # optional
+ #
+ # datasets: # optional
+ #   default: example/project.yaml # optional (relative to jerry.yaml)
+ # default_dataset: default # optional (must be a key in datasets)
+ #
+ # shared: # optional
+ #   visuals: AUTO # optional; AUTO | TQDM | RICH | OFF
+ #   progress: AUTO # optional; AUTO | SPINNER | BARS | OFF
+ #   log_level: INFO # optional; CRITICAL | ERROR | WARNING | INFO | DEBUG
+ #
+ # serve: # optional
+ #   log_level: INFO # optional
+ #   limit: 100 # optional
+ #   stage: 8 # optional
+ #   throttle_ms: 0 # optional
+ #   output: # optional
+ #     transport: stdout # optional; stdout | fs
+ #     format: json-lines # optional; stdout: print | json-lines | json
+ #     payload: sample # optional; sample | vector
+ #     # directory: artifacts/serve # optional; fs only; relative to jerry.yaml
+ #
+ # build: # optional
+ #   log_level: INFO # optional
+ #   mode: AUTO # optional; AUTO | FORCE | OFF (false -> OFF)
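Note: both the old and new workspace configs say that CLI commands walk upward from the current directory to find `jerry.yaml`. A minimal sketch of that lookup idea (not the runtime's actual implementation):

```python
from pathlib import Path

# Walk from the current directory toward the filesystem root and return the
# first jerry.yaml found, mirroring the documented discovery behaviour.
def find_workspace_config(start: Path | None = None) -> Path | None:
    current = (start or Path.cwd()).resolve()
    for directory in (current, *current.parents):
        candidate = directory / "jerry.yaml"
        if candidate.is_file():
            return candidate
    return None

print(find_workspace_config())  # e.g. /home/me/workspace/jerry.yaml, or None
```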
@@ -0,0 +1,29 @@
+ # Composed contract reference (kind: composed).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # kind: composed
+ # id: domain.dataset.variant
+ # inputs: # required (list of stream ids or alias=stream)
+ #   - alias=upstream.stream.id
+ #   - other.stream.id
+ # mapper: # optional (defaults to identity)
+ #   entrypoint: my_composer
+ #   args: {}
+ # cadence: ${group_by} # optional per-contract variable for interpolation
+ # partition_by: station_id # optional; string or list of strings
+ # sort_batch_size: 100000 # optional; in-memory chunk size for sorting
+ #
+ # record transforms (one-key mappings; optional):
+ # - filter: { field: time, operator: ge, comparand: "${start_time}" }
+ # - floor_time: { cadence: "${cadence}" }
+ # - lag: { lag: "${cadence}" }
+ #
+ # stream transforms (one-key mappings; optional; operate on record fields):
+ # - dedupe: {}
+ # - granularity: { field: close, to: close, mode: first }
+ # - ensure_cadence: { field: close, to: close, cadence: "${cadence}" }
+ # - fill: { field: close, to: close, statistic: median, window: 6, min_samples: 1 }
+ #
+ # debug transforms (one-key mappings; optional):
+ # - lint: { mode: error, tick: "${cadence}" }
+ # - identity: {}
@@ -0,0 +1,31 @@
+ # Ingest contract reference (kind: ingest).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # kind: ingest
+ # id: domain.dataset.variant
+ # source: source.alias # required
+ # mapper: # optional (defaults to identity)
+ #   entrypoint: my_mapper
+ #   args: {}
+ # cadence: ${group_by} # optional per-contract variable for interpolation
+ # partition_by: station_id # optional; string or list of strings
+ # sort_batch_size: 100000 # optional; in-memory chunk size for sorting
+ #
+ # record transforms (one-key mappings; optional):
+ # - filter: { field: time, operator: ge, comparand: "${start_time}" }
+ # - floor_time: { cadence: "${cadence}" }
+ # - lag: { lag: "${cadence}" }
+ #
+ # stream transforms (one-key mappings; optional; operate on record fields):
+ # - floor_time: { cadence: "${cadence}" }
+ # - lag: { lag: "${cadence}" }
+ # - filter: { field: close, operator: ge, comparand: 1000000 }
+ # - dedupe: {}
+ # - granularity: { field: close, to: close, mode: first }
+ # - ensure_cadence: { field: close, to: close, cadence: "${cadence}" }
+ # - rolling: { field: dollar_volume, to: adv60, window: 60, statistic: mean, min_samples: 60 }
+ # - fill: { field: close, to: close, statistic: median, window: 6, min_samples: 1 }
+ #
+ # debug transforms (one-key mappings; optional):
+ # - lint: { mode: error, tick: "${cadence}" }
+ # - identity: {}
@@ -0,0 +1,34 @@
+ # Contract config reference (overview).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # kind: ingest | composed
+ # id: domain.dataset.variant
+ # source: source.alias # required when kind: ingest
+ # inputs: [stream.id] # required when kind: composed
+ # mapper: # optional (defaults to identity)
+ #   entrypoint: my_mapper
+ #   args: {}
+ # cadence: ${group_by} # optional per-contract variable for interpolation
+ # partition_by: station_id # optional; string or list of strings
+ # sort_batch_size: 100000 # optional; in-memory chunk size for sorting
+ #
+ # record transforms (one-key mappings; optional):
+ # - filter: { field: time, operator: ge, comparand: "${start_time}" }
+ #   # operator: eq|ne|lt|le|gt|ge|in|nin (aliases: ==, !=, >=, <=, etc.)
+ # - floor_time: { cadence: "${cadence}" }
+ # - lag: { lag: "${cadence}" }
+ #
+ # stream transforms (one-key mappings; optional):
+ # - dedupe: {}
+ # - granularity: { field: close, to: close, mode: first } # first | last | mean | median
+ # - ensure_cadence: { field: close, to: close, cadence: "${cadence}" }
+ # - fill: { field: close, to: close, statistic: median, window: 6, min_samples: 1 }
+ #   # statistic: mean | median; window must be > 0
+ #
+ # debug transforms (one-key mappings; optional):
+ # - lint: { mode: error, tick: "${cadence}" } # mode: warn | error
+ # - identity: {}
+ #
+ # See also:
+ # - ingest.reference.yaml
+ # - composed.reference.yaml
@@ -0,0 +1,29 @@
+ # Dataset config reference (all options).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # Feature/vector stages require group_by.
+ # group_by must match ^\d+(m|min|h|d)$ (e.g., 10m, 1h, 1d).
+ #
+ # group_by: ${group_by} # required for feature/vector stages
+ #
+ # features: # optional
+ #   - id: time_linear
+ #     record_stream: time.ticks.linear
+ #     field: value
+ #     scale: true # optional; true | false | mapping (see below)
+ #     # scale:
+ #     #   model_path: ../artifacts/example/v1/scaler.pkl
+ #     #   with_mean: true
+ #     #   with_std: true
+ #     #   epsilon: 1.0e-12
+ #     #   on_none: skip # skip | error
+ #     sequence: { size: 6, stride: 1 } # optional
+ #
+ # targets: # optional
+ #   - id: some_target
+ #     record_stream: time.ticks.linear
+ #     field: value
+ #     scale: false # optional
+ #     sequence: null # optional
+ #
+ # Record-only stage uses only record_stream entries; id/field/scale/sequence are ignored.
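Note: the `sequence: { size: 6, stride: 1 }` option above describes windowed features. A rough sketch of the windowing idea under that reading (an assumption about semantics, not the runtime's code):

```python
# Overlapping windows of `size` consecutive grouped values, advancing by
# `stride` each step.
def windows(values, size, stride=1):
    return [values[i:i + size] for i in range(0, len(values) - size + 1, stride)]

print(windows(list(range(8)), size=6, stride=1))
# [[0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]]
```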
@@ -0,0 +1,25 @@
+ # Postprocess config reference (vector transforms).
+ # This file is documentation only; uncomment the keys you want to use.
+ # Each list item is a one-key mapping: <transform_name>: <params>.
+ #
+ # - drop:
+ #     axis: vertical # optional; horizontal | vertical
+ #     payload: targets # optional; features | targets | both (both only for horizontal)
+ #     threshold: 0.9 # required; 0.0 - 1.0
+ #     only: [feature_id] # optional
+ #     exclude: [feature_id] # optional
+ #
+ # - fill:
+ #     statistic: median # optional; mean | median
+ #     window: 48 # optional; rolling window size
+ #     min_samples: 6 # optional
+ #     payload: features # optional; features | targets | both
+ #     only: [feature_id] # optional
+ #     exclude: [feature_id] # optional
+ #
+ # - replace:
+ #     value: 0.0 # required
+ #     payload: targets # optional; features | targets | both
+ #     target: null # optional; replace only when value equals target; default replaces missing
+ #     only: [feature_id] # optional
+ #     exclude: [feature_id] # optional
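Note: the `replace` transform above is documented as substituting a constant, either for missing entries (the default) or for entries equal to `target`. A tiny sketch of that documented behaviour (illustrative only, not the runtime's transform):

```python
# Replace missing entries (None) with `value`; when `target` is given, replace
# only entries equal to that target instead.
def replace_values(vector, value=0.0, target=None):
    if target is None:
        return [value if item is None else item for item in vector]
    return [value if item == target else item for item in vector]

print(replace_values([1.0, None, 3.0]))              # [1.0, 0.0, 3.0]
print(replace_values([1.0, -1.0, 3.0], 0.0, -1.0))   # [1.0, 0.0, 3.0]
```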
@@ -0,0 +1,32 @@
+ # Project config reference (all options).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # version: 1 # optional
+ # name: example # optional
+ # paths:
+ #   streams: ./contracts # required
+ #   sources: ./sources # required
+ #   dataset: dataset.yaml # required
+ #   postprocess: postprocess.yaml # required
+ #   artifacts: ../artifacts/${project_name}/v${version} # required
+ #   tasks: ./tasks # optional
+ # globals: # optional; available via ${var}
+ #   group_by: 1h # optional; dataset cadence
+ #   start_time: 2021-01-01T00:00:00Z # optional; used in contracts
+ #   end_time: 2021-01-02T00:00:00Z # optional; used in contracts
+ #   split: # optional; applied at serve time after postprocess
+ #     mode: hash # hash | time
+ #     ratios: { train: 0.8, val: 0.1, test: 0.1 } # must sum to 1.0
+ #     seed: 42 # deterministic hash seed
+ #     key: group # group | feature:<id>
+ #
+ # Time-based split (labels length must be len(boundaries) + 1):
+ # globals:
+ #   split:
+ #     mode: time
+ #     boundaries:
+ #       - 2021-01-01T00:00:00Z # first cutover (train -> val)
+ #       - 2021-01-02T00:00:00Z # second cutover (val -> test)
+ #     labels: [train, val, test]
+ #
+ # Any extra keys under globals are allowed and can be referenced via ${var}.
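Note: the hash split above is described as deterministic: a fixed `seed`, a `key` (the group or a feature value), and `ratios` that sum to 1.0. A conceptual sketch of one way such an assignment can be made deterministic (not the runtime's exact algorithm):

```python
import hashlib

# Map a stable hash of (seed, key value) into [0, 1] and bucket it by the
# cumulative ratios; the same inputs always yield the same label.
def assign_split(key_value: str, seed: int, ratios: dict[str, float]) -> str:
    digest = hashlib.sha256(f"{seed}:{key_value}".encode("utf-8")).hexdigest()
    position = int(digest[:8], 16) / 0xFFFFFFFF
    cumulative = 0.0
    for label, ratio in ratios.items():
        cumulative += ratio
        if position <= cumulative:
            return label
    return next(reversed(ratios))  # guard against float rounding

ratios = {"train": 0.8, "val": 0.1, "test": 0.1}
print(assign_split("station_42", seed=42, ratios=ratios))  # stable across runs
```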
@@ -0,0 +1,24 @@
+ # Foreach + HTTP loader reference (core.foreach + core.io).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # id: provider.dataset # required
+ # parser:
+ #   entrypoint: my_pkg.sources.provider.dataset:parse # required
+ #   args: {} # optional
+ # loader:
+ #   entrypoint: core.foreach
+ #   args:
+ #     foreach:
+ #       page: [1, 2, 3] # required; exactly one key; list of values
+ #     loader:
+ #       entrypoint: {{DEFAULT_IO_LOADER_EP}}
+ #       args:
+ #         transport: http
+ #         format: json-lines
+ #         url: "https://example/api?page=${page}" # required
+ #         headers: { Authorization: "Bearer ..." } # optional
+ #         params: {} # optional
+ #         encoding: utf-8 # optional
+ #         count_by_fetch: false # optional
+ #     # inject_field: page # optional
+ #     # throttle_seconds: 0 # optional
@@ -0,0 +1,21 @@
+ # Foreach loader reference (core.foreach).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # id: provider.dataset # required
+ # parser:
+ #   entrypoint: my_pkg.sources.provider.dataset:parse # required
+ #   args: {} # optional
+ # loader:
+ #   entrypoint: core.foreach
+ #   args:
+ #     foreach:
+ #       month: ["2024-01", "2024-02"] # required; exactly one key; list of values
+ #     loader:
+ #       entrypoint: {{DEFAULT_IO_LOADER_EP}}
+ #       args:
+ #         transport: fs
+ #         format: csv # required
+ #         path: ./data/${month}.csv # required
+ #     # inject_field: month # optional; mapping output only
+ #     # inject: { month: "${month}" } # optional; mapping output only
+ #     # throttle_seconds: 0 # optional
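Note: the foreach wrapper above takes exactly one key with a list of values and substitutes each value into the inner loader's args (e.g. `./data/${month}.csv`). A rough sketch of that expansion (illustrative; the real `core.foreach` loader lives in the runtime):

```python
from string import Template

# Expand the single foreach key over its values, substituting ${key} into the
# inner loader's args for each value.
def expand_foreach(foreach: dict[str, list], inner_args: dict[str, str]) -> list[dict]:
    (key, values), = foreach.items()  # exactly one key is allowed
    expanded = []
    for value in values:
        args = {
            name: Template(raw).safe_substitute({key: value})
            for name, raw in inner_args.items()
        }
        expanded.append(args)
    return expanded

print(expand_foreach(
    {"month": ["2024-01", "2024-02"]},
    {"transport": "fs", "format": "csv", "path": "./data/${month}.csv"},
))
# [{'transport': 'fs', 'format': 'csv', 'path': './data/2024-01.csv'}, ...]
```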
@@ -0,0 +1,16 @@
+ # FS loader reference (generic I/O loader).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # id: provider.dataset # required
+ # parser:
+ #   entrypoint: my_pkg.sources.provider.dataset:parse # required
+ #   args: {} # optional
+ # loader:
+ #   entrypoint: {{DEFAULT_IO_LOADER_EP}}
+ #   args:
+ #     transport: fs
+ #     format: csv # required; csv | json | json-lines | pickle
+ #     path: ./data/file.csv # optional (use path or glob)
+ #     glob: ./data/*.csv # optional (use path or glob)
+ #     encoding: utf-8 # optional
+ #     delimiter: "," # optional; csv only
@@ -0,0 +1,17 @@
+ # HTTP loader reference (generic I/O loader).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # id: provider.dataset # required
+ # parser:
+ #   entrypoint: my_pkg.sources.provider.dataset:parse # required
+ #   args: {} # optional
+ # loader:
+ #   entrypoint: {{DEFAULT_IO_LOADER_EP}}
+ #   args:
+ #     transport: http
+ #     format: json-lines # required; json | json-lines | csv
+ #     url: https://example/api # required
+ #     params: { key: value } # optional
+ #     headers: { Authorization: "Bearer ..." } # optional
+ #     encoding: utf-8 # optional
+ #     count_by_fetch: false # optional
@@ -0,0 +1,18 @@
+ # Source config reference (overview).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # Required shape:
+ # id: provider.dataset # required
+ # parser:
+ #   entrypoint: module.path:callable # required
+ #   args: {} # optional
+ # loader:
+ #   entrypoint: module.path:callable # required
+ #   args: {} # optional
+ #
+ # See also:
+ # - fs.reference.yaml
+ # - http.reference.yaml
+ # - synthetic.reference.yaml
+ # - foreach.reference.yaml
+ # - foreach.http.reference.yaml
@@ -0,0 +1,15 @@
+ # Synthetic source reference.
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # id: synthetic.ticks # required
+ #
+ # parser:
+ #   entrypoint: core.synthetic.ticks # required
+ #   args: {} # optional
+ #
+ # loader:
+ #   entrypoint: core.synthetic.ticks # required
+ #   args:
+ #     start: "${start_time}" # optional
+ #     end: "${end_time}" # optional
+ #     frequency: "${group_by}" # optional
@@ -0,0 +1,11 @@
+ # Metadata task reference (all options).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # version: 1 # optional
+ # kind: metadata
+ # name: metadata # optional (defaults to filename stem)
+ # enabled: true # optional
+ #
+ # output: metadata.json # optional; relative to project.paths.artifacts
+ # cadence_strategy: max # optional; currently only "max"
+ # window_mode: intersection # optional; union | intersection | strict | relaxed
@@ -0,0 +1,10 @@
+ # Scaler task reference (all options).
+ # This file is documentation only; uncomment the keys you want to use.
+ #
+ # version: 1 # optional
+ # kind: scaler
+ # name: scaler # optional (defaults to filename stem)
+ # enabled: true # optional
+ #
+ # output: scaler.pkl # optional; relative to project.paths.artifacts
+ # split_label: train # optional; split label from globals.split
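Note: the scaler task above writes `scaler.pkl` into the project's artifacts directory, fitted on the split named by `split_label`. A hedged sketch of what such an artifact could conceptually hold; the field names and layout here are assumptions for illustration, not the runtime's actual schema:

```python
import pickle
from statistics import mean, pstdev

# Per-feature mean/std fitted on the selected split, then pickled. Purely a
# conceptual stand-in for the scaler artifact described above.
def fit_scaler(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    return {
        feature_id: {"mean": mean(values), "std": pstdev(values)}
        for feature_id, values in columns.items()
    }

stats = fit_scaler({"time_linear": [0.0, 1.0, 2.0, 3.0]})
with open("scaler.pkl", "wb") as handle:
    pickle.dump(stats, handle)
print(stats)  # {'time_linear': {'mean': 1.5, 'std': 1.118...}}
```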