PyPI - json2vec - Versions diffs - 0.4.4__tar.gz → 0.4.6__tar.gz - Mend

json2vec 0.4.4__tar.gz → 0.4.6__tar.gz

Files changed (69)

{json2vec-0.4.4/src/json2vec.egg-info → json2vec-0.4.6}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: json2vec
-Version: 0.4.4
+Version: 0.4.6
 Summary: {...} -> [*]
 License-Expression: Apache-2.0
 Requires-Python: >=3.12
@@ -27,16 +27,18 @@ Requires-Dist: pydantic-settings>=2.10.1; extra == "serving"
 Provides-Extra: text
 Requires-Dist: transformers>=4.55.0; extra == "text"
 Provides-Extra: docs
+Requires-Dist: litserve>=0.2.13; extra == "docs"
 Requires-Dist: mkdocs-material>=9.6; extra == "docs"
 Requires-Dist: mkdocs-jupyter>=0.26.3; extra == "docs"
 Requires-Dist: mkdocstrings[python]>=0.27; extra == "docs"
+Requires-Dist: pydantic-settings>=2.10.1; extra == "docs"
 Dynamic: license-file
 <p align="center">
-  <img src="https://json2vec.github.io/json2vec/diagrams/json2vec.png" alt="JSON2Vec logo" width="180">
+  <img src="https://json2vec.github.io/json2vec/diagrams/json2vec.png" alt="json2vec logo" width="180">
 </p>
-<h1 align="center">JSON2Vec</h1>
+<h1 align="center"><code>json2vec</code></h1>
 <p align="center">
   <img alt="Python 3.12+" src="https://img.shields.io/badge/python-3.12%2B-3776AB?logo=python&amp;logoColor=white" />
@@ -69,14 +71,14 @@ schemas, and checkpoints private.
 - **Extensible data types for predictive modeling.** Masked values,
   targeted fields, and explicit supervised targets all flow through the same
   datatype-specific heads. A new
-  [tensorfield type](https://json2vec.github.io/json2vec/guides/tensorfields/) brings its own embedding,
+  [tensorfield type](https://json2vec.github.io/json2vec/data-types/tensorfields/) brings its own embedding,
   decoding, loss, and writing logic, so the framework stays reusable as schemas
   grow.
 - **Schema evolution is a first-class workflow.** Between training loops
   (pretraining, finetuning, refitting, and task adaptation), the model can be
   mutated. Fields can be added (`model.extend`), removed (`model.delete`),
   updated (`model.update` / `with model.override`), and reset (`model.reset`).
-  See the [model update guide](https://json2vec.github.io/json2vec/guides/model-update/).
+  See the [mutations guide](https://json2vec.github.io/json2vec/core-concepts/mutations/).
 - **Production semantics for missingness.** `null`, `padded`, `masked`, and
   `valued` are distinct states in the tensorfield type system.
   They are not collapsed into one generic missing-value bucket.
@@ -104,7 +106,7 @@ Use `json2vec` when the hierarchy is part of the signal:
   multi-target prediction over nested records
 For more context on the modeling problem, read
-[Why JSON2Vec](https://json2vec.github.io/json2vec/motivation/).
+[Why `json2vec`](https://json2vec.github.io/json2vec/motivation/).
 ## What It Does Not Do
@@ -171,8 +173,8 @@ model = j2v.Model.from_schema(
     optimizer=lambda module: torch.optim.AdamW(module.parameters(), lr=1e-2),
 )
-datamodule = j2v.PolarsDataModule.from_model(
-    model,
+datamodule = j2v.PolarsDataModule(
+    model=model,
     train=records,
     validate=records,
     num_workers=0,
@@ -195,14 +197,11 @@ trainer = lit.Trainer(
 trainer.fit(model=model, datamodule=datamodule)
-batch = [[record] for record in records.to_dicts()[:3]]
-pprint(model.predict(batch))
-pprint(model.embed(batch))
+pprint(model.predict(records.to_dicts()[:3]))
 ```
-The prediction call returns a typed result for `record/species`. The embedding
-call returns the configured `record` embedding for each input observation.
+The prediction call returns a typed result for `record/species` and the
+configured `record` embedding for each input observation.
 ## Documentation
@@ -216,16 +215,20 @@ uv run --extra docs mkdocs build --strict
 Useful entry points:
 - [Getting Started](https://json2vec.github.io/json2vec/getting-started/)
-- [Why JSON2Vec](https://json2vec.github.io/json2vec/motivation/)
-- [Schemas & Queries](https://json2vec.github.io/json2vec/guides/model-schemas/)
-- [Model Updates](https://json2vec.github.io/json2vec/guides/model-update/)
+- [AI Quickstart](https://json2vec.github.io/json2vec/ai-quickstart/)
+- [Why `json2vec`](https://json2vec.github.io/json2vec/motivation/)
+- [Query Paths](https://json2vec.github.io/json2vec/core-concepts/querypaths/)
+- [Built-In Data Types](https://json2vec.github.io/json2vec/core-concepts/data-types/)
+- [Learning Modes & Embeddings](https://json2vec.github.io/json2vec/core-concepts/embeddings/)
+- [Model Tree](https://json2vec.github.io/json2vec/core-concepts/model-tree/)
+- [Mutations](https://json2vec.github.io/json2vec/core-concepts/mutations/)
 - [Hello World](https://json2vec.github.io/json2vec/tutorials/hello-world/)
-- [Masked Pretraining](https://json2vec.github.io/json2vec/tutorials/pretraining/)
 - [Nested Supervised Training](https://json2vec.github.io/json2vec/tutorials/nested-supervised-training/)
+- [Masked Pretraining](https://json2vec.github.io/json2vec/tutorials/pretraining/)
 - [Supervised Tabular Training](https://json2vec.github.io/json2vec/tutorials/supervised-tabular-training/)
-- [Field Ablation](https://json2vec.github.io/json2vec/guides/field-ablation/)
+- [Field Importance](https://json2vec.github.io/json2vec/guides/field-importance/)
 - [Preprocessors](https://json2vec.github.io/json2vec/guides/preprocessors/)
-- [Tensorfield Extensions](https://json2vec.github.io/json2vec/guides/tensorfields/)
+- [Custom Data Types](https://json2vec.github.io/json2vec/data-types/tensorfields/)
 - [Serving](https://json2vec.github.io/json2vec/tutorials/serving/)
 - [API Reference](https://json2vec.github.io/json2vec/reference/api/)
 - [Whitepaper](https://json2vec.github.io/json2vec/whitepaper.pdf)
@@ -292,7 +295,7 @@ Configured `dataset.kwargs` are passed into the preprocessor, with unsupported k
 Each tensorfield plugin provides a request schema plus the model components
 needed to encode values, decode predictions, compute losses, and optionally
-serialize outputs. See [Tensorfield Extensions](https://json2vec.github.io/json2vec/guides/tensorfields/)
+serialize outputs. See [Custom Data Types](https://json2vec.github.io/json2vec/data-types/tensorfields/)
 for a custom plugin walkthrough. Built-in tensorfields share the base leaf
 options `name`, `query`, `pooling`, `weight`, `n_heads`, `n_linear`, `dropout`,
 `p_mask`, and `p_prune`.

{json2vec-0.4.4 → json2vec-0.4.6}/README.md RENAMED

@@ -1,8 +1,8 @@
 <p align="center">
-  <img src="https://json2vec.github.io/json2vec/diagrams/json2vec.png" alt="JSON2Vec logo" width="180">
+  <img src="https://json2vec.github.io/json2vec/diagrams/json2vec.png" alt="json2vec logo" width="180">
 </p>
-<h1 align="center">JSON2Vec</h1>
+<h1 align="center"><code>json2vec</code></h1>
 <p align="center">
   <img alt="Python 3.12+" src="https://img.shields.io/badge/python-3.12%2B-3776AB?logo=python&amp;logoColor=white" />
@@ -35,14 +35,14 @@ schemas, and checkpoints private.
 - **Extensible data types for predictive modeling.** Masked values,
   targeted fields, and explicit supervised targets all flow through the same
   datatype-specific heads. A new
-  [tensorfield type](https://json2vec.github.io/json2vec/guides/tensorfields/) brings its own embedding,
+  [tensorfield type](https://json2vec.github.io/json2vec/data-types/tensorfields/) brings its own embedding,
   decoding, loss, and writing logic, so the framework stays reusable as schemas
   grow.
 - **Schema evolution is a first-class workflow.** Between training loops
   (pretraining, finetuning, refitting, and task adaptation), the model can be
   mutated. Fields can be added (`model.extend`), removed (`model.delete`),
   updated (`model.update` / `with model.override`), and reset (`model.reset`).
-  See the [model update guide](https://json2vec.github.io/json2vec/guides/model-update/).
+  See the [mutations guide](https://json2vec.github.io/json2vec/core-concepts/mutations/).
 - **Production semantics for missingness.** `null`, `padded`, `masked`, and
   `valued` are distinct states in the tensorfield type system.
   They are not collapsed into one generic missing-value bucket.
@@ -70,7 +70,7 @@ Use `json2vec` when the hierarchy is part of the signal:
   multi-target prediction over nested records
 For more context on the modeling problem, read
-[Why JSON2Vec](https://json2vec.github.io/json2vec/motivation/).
+[Why `json2vec`](https://json2vec.github.io/json2vec/motivation/).
 ## What It Does Not Do
@@ -137,8 +137,8 @@ model = j2v.Model.from_schema(
     optimizer=lambda module: torch.optim.AdamW(module.parameters(), lr=1e-2),
 )
-datamodule = j2v.PolarsDataModule.from_model(
-    model,
+datamodule = j2v.PolarsDataModule(
+    model=model,
     train=records,
     validate=records,
     num_workers=0,
@@ -161,14 +161,11 @@ trainer = lit.Trainer(
 trainer.fit(model=model, datamodule=datamodule)
-batch = [[record] for record in records.to_dicts()[:3]]
-pprint(model.predict(batch))
-pprint(model.embed(batch))
+pprint(model.predict(records.to_dicts()[:3]))
 ```
-The prediction call returns a typed result for `record/species`. The embedding
-call returns the configured `record` embedding for each input observation.
+The prediction call returns a typed result for `record/species` and the
+configured `record` embedding for each input observation.
 ## Documentation
@@ -182,16 +179,20 @@ uv run --extra docs mkdocs build --strict
 Useful entry points:
 - [Getting Started](https://json2vec.github.io/json2vec/getting-started/)
-- [Why JSON2Vec](https://json2vec.github.io/json2vec/motivation/)
-- [Schemas & Queries](https://json2vec.github.io/json2vec/guides/model-schemas/)
-- [Model Updates](https://json2vec.github.io/json2vec/guides/model-update/)
+- [AI Quickstart](https://json2vec.github.io/json2vec/ai-quickstart/)
+- [Why `json2vec`](https://json2vec.github.io/json2vec/motivation/)
+- [Query Paths](https://json2vec.github.io/json2vec/core-concepts/querypaths/)
+- [Built-In Data Types](https://json2vec.github.io/json2vec/core-concepts/data-types/)
+- [Learning Modes & Embeddings](https://json2vec.github.io/json2vec/core-concepts/embeddings/)
+- [Model Tree](https://json2vec.github.io/json2vec/core-concepts/model-tree/)
+- [Mutations](https://json2vec.github.io/json2vec/core-concepts/mutations/)
 - [Hello World](https://json2vec.github.io/json2vec/tutorials/hello-world/)
-- [Masked Pretraining](https://json2vec.github.io/json2vec/tutorials/pretraining/)
 - [Nested Supervised Training](https://json2vec.github.io/json2vec/tutorials/nested-supervised-training/)
+- [Masked Pretraining](https://json2vec.github.io/json2vec/tutorials/pretraining/)
 - [Supervised Tabular Training](https://json2vec.github.io/json2vec/tutorials/supervised-tabular-training/)
-- [Field Ablation](https://json2vec.github.io/json2vec/guides/field-ablation/)
+- [Field Importance](https://json2vec.github.io/json2vec/guides/field-importance/)
 - [Preprocessors](https://json2vec.github.io/json2vec/guides/preprocessors/)
-- [Tensorfield Extensions](https://json2vec.github.io/json2vec/guides/tensorfields/)
+- [Custom Data Types](https://json2vec.github.io/json2vec/data-types/tensorfields/)
 - [Serving](https://json2vec.github.io/json2vec/tutorials/serving/)
 - [API Reference](https://json2vec.github.io/json2vec/reference/api/)
 - [Whitepaper](https://json2vec.github.io/json2vec/whitepaper.pdf)
@@ -258,7 +259,7 @@ Configured `dataset.kwargs` are passed into the preprocessor, with unsupported k
 Each tensorfield plugin provides a request schema plus the model components
 needed to encode values, decode predictions, compute losses, and optionally
-serialize outputs. See [Tensorfield Extensions](https://json2vec.github.io/json2vec/guides/tensorfields/)
+serialize outputs. See [Custom Data Types](https://json2vec.github.io/json2vec/data-types/tensorfields/)
 for a custom plugin walkthrough. Built-in tensorfields share the base leaf
 options `name`, `query`, `pooling`, `weight`, `n_heads`, `n_linear`, `dropout`,
 `p_mask`, and `p_prune`.

{json2vec-0.4.4 → json2vec-0.4.6}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "json2vec"
-version = "0.4.4"
+version = "0.4.6"
 description = "{...} -> [*]"
 readme = "README.md"
 license = "Apache-2.0"
@@ -31,9 +31,11 @@ text = [
     "transformers>=4.55.0",
 ]
 docs = [
+    "litserve>=0.2.13",
     "mkdocs-material>=9.6",
     "mkdocs-jupyter>=0.26.3",
     "mkdocstrings[python]>=0.27",
+    "pydantic-settings>=2.10.1",
 ]
 [dependency-groups]

{json2vec-0.4.4 → json2vec-0.4.6}/src/json2vec/__init__.py RENAMED

@@ -1,4 +1,4 @@
-"""Public JSON2Vec SDK surface.
+"""Public `json2vec` SDK surface.
 The top-level package exports the constructors and helpers used by most
 applications: `Model.from_schema(...)` for model construction, tensorfield
@@ -6,6 +6,8 @@ request constructors such as `Category` and `Number`, data modules, schema
 mutation predicates, and the `@preprocess` decorator.
 """
+from typing import TYPE_CHECKING, Any
 from json2vec.architecture.root import (
     Model,
     MutationLockCallback,
@@ -15,6 +17,7 @@ from json2vec.architecture.root import (
     SchedulerConfig,
 )
 from json2vec.data.datasets import PolarsDataModule, StreamingDataModule
+from json2vec.inference.callback import Postprocessor, Writer
 from json2vec.preprocessors import PREPROCESSORS, Preprocessor, PreprocessorMode, preprocess
 from json2vec.structs.enums import AttentionMode, Component, Metric, ShardingStrategy, Strata, Suffix, TensorKey, Tokens
 from json2vec.structs.experiment import (
@@ -37,20 +40,73 @@ from json2vec.tensorfields.extensions.text import Request as Text
 from json2vec.tensorfields.extensions.vector import Request as Vector
 from json2vec.tensorfields.shared.vocabulary import VocabularySyncCallback
+if TYPE_CHECKING:
+    from json2vec.inference.deployment import (
+        API,
+        Accelerator,
+        BatchItem,
+        Deployment,
+        ErrorItem,
+        Input,
+        ModelSource,
+        UpdateOperation,
+    )
+_SERVING_EXPORTS = {
+    "API",
+    "Accelerator",
+    "BatchItem",
+    "Deployment",
+    "ErrorItem",
+    "Input",
+    "ModelSource",
+    "UpdateOperation",
+}
+def __getattr__(name: str) -> Any:
+    if name not in _SERVING_EXPORTS:
+        raise AttributeError(f"module 'json2vec' has no attribute {name!r}")
+    try:
+        from json2vec.inference import deployment
+    except ModuleNotFoundError as error:
+        if error.name in {"litserve", "pydantic_settings"}:
+            raise ModuleNotFoundError(
+                f"json2vec.{name} requires the serving extra; install with `pip install json2vec[serving]`."
+            ) from error
+        raise
+    value = getattr(deployment, name)
+    globals()[name] = value
+    return value
+def __dir__() -> list[str]:
+    return sorted([*globals(), *_SERVING_EXPORTS])
 __all__ = [
     "Address",
+    "API",
+    "Accelerator",
     "Array",
     "AttentionMode",
+    "BatchItem",
     "Category",
     "Component",
     "DateParts",
     "DecoderBase",
+    "Deployment",
     "EmbedderBase",
     "Entity",
+    "ErrorItem",
     "Hyperparameters",
+    "Input",
     "Leaf",
     "Metric",
     "Model",
+    "ModelSource",
     "MutationLockCallback",
     "NodeAttribute",
     "NodePredicate",
@@ -59,6 +115,7 @@ __all__ = [
     "PREPROCESSORS",
     "Plugin",
     "PolarsDataModule",
+    "Postprocessor",
     "Preprocessor",
     "PreprocessorMode",
     "RequestBase",
@@ -76,8 +133,10 @@ __all__ = [
     "TensorKey",
     "Text",
     "Tokens",
+    "UpdateOperation",
     "Vector",
     "VocabularySyncCallback",
+    "Writer",
     "predicate",
     "preprocess",
     "where",

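Note on the `__init__.py` hunk above: the serving names are resolved on first access through a module-level `__getattr__` (the PEP 562 hook) instead of being imported eagerly. The following is a minimal, self-contained sketch of that pattern, not the package's own code. `demo_pkg`, `_load_serving_namespace`, and the `load_calls` list are hypothetical names introduced for illustration, and the stand-in loader takes the place of the real `from json2vec.inference import deployment` import.

```python
import types

# Names that should resolve lazily (the diff uses a larger set of serving exports).
_LAZY_EXPORTS = {"Deployment"}

# Hypothetical bookkeeping so the sketch can show how often the loader runs.
load_calls: list[str] = []


def _load_serving_namespace() -> types.SimpleNamespace:
    """Stand-in for the deferred import of an optional, heavy submodule."""
    return types.SimpleNamespace(Deployment=type("Deployment", (), {}))


# A throwaway module object plays the role of the package's __init__.py.
demo_pkg = types.ModuleType("demo_pkg")


def _module_getattr(name: str):
    # Called only when normal attribute lookup on the module fails.
    if name not in _LAZY_EXPORTS:
        raise AttributeError(f"module 'demo_pkg' has no attribute {name!r}")
    load_calls.append(name)
    value = getattr(_load_serving_namespace(), name)
    # Cache the resolved object so later lookups bypass this hook.
    setattr(demo_pkg, name, value)
    return value


# Placing the function in the module namespace installs the PEP 562 hook.
demo_pkg.__getattr__ = _module_getattr

first = demo_pkg.Deployment   # triggers the hook and the deferred load
second = demo_pkg.Deployment  # served from the module namespace
print(first is second, load_calls)
```

Writing the resolved value back into the module namespace, as the diff does with `globals()[name] = value`, means the hook runs at most once per name. The `TYPE_CHECKING` import block in the diff serves a separate purpose: it keeps the names visible to static type checkers without importing the optional dependencies at runtime.
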
{json2vec-0.4.4 → json2vec-0.4.6}/src/json2vec/architecture/checkpoint.py RENAMED

@@ -1,4 +1,4 @@
-"""Checkpoint serialization helpers for JSON2Vec models."""
+"""Checkpoint serialization helpers for `json2vec` models."""
 from __future__ import annotations

{json2vec-0.4.4 → json2vec-0.4.6}/src/json2vec/architecture/contracts.py RENAMED

@@ -93,7 +93,7 @@ def sanitize(
         require_core_tensors(module, address, tensorfield)
         require_tensor_devices(module, address, tensorfield)
         require_target_contract(module, address, tensorfield, strata=normalized)
-        require_mask_contract(module, address, tensorfield)
+        require_mask_contract(module, address, tensorfield, strata=normalized)
 def is_backoff_index(index: int, *, periodic_interval: int) -> bool:
@@ -292,7 +292,7 @@ def require_tensor_devices(module: "Model", address: Address, tensorfield: Tenso
         )
-def require_mask_contract(module: "Model", address: Address, tensorfield: TensorFieldBase) -> None:
+def require_mask_contract(module: "Model", address: Address, tensorfield: TensorFieldBase, *, strata: Strata) -> None:
     state = tensorfield.state
     trainable = tensorfield.trainable
     is_masked = state.eq(Tokens.masked.value)
@@ -301,7 +301,7 @@ def require_mask_contract(module: "Model", address: Address, tensorfield: Tensor
     if trainable.any() and not state.masked_select(trainable).eq(Tokens.masked.value).all():
         raise ForwardContractError(f"forward input '{address}' trainable positions must have masked state")
-    if not is_target and (is_masked & ~trainable).any():
+    if strata != Strata.predict and not is_target and (is_masked & ~trainable).any():
         raise ForwardContractError(f"forward input '{address}' has masked state where trainable is false")
     if not trainable.any():
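
Note on the `contracts.py` hunk above: the second mask rule is now skipped when the forward pass runs under `Strata.predict`. The sketch below restates both rules on plain Python lists rather than tensors, purely as an illustration. The `Strata` enum here is a local stand-in with an assumed `train` member, `is_target` is passed as a parameter because its derivation is not shown in the hunk, and `ForwardContractError` is redefined locally.

```python
from enum import Enum


class Strata(Enum):
    # Stand-in enum; only `predict` appears in the diff, `train` is assumed.
    train = "train"
    predict = "predict"


class ForwardContractError(ValueError):
    """Local stand-in for the package's contract error."""


def require_mask_contract(
    is_masked: list[bool],
    trainable: list[bool],
    *,
    strata: Strata,
    is_target: bool = False,
) -> None:
    pairs = list(zip(is_masked, trainable))
    # Rule 1 (unchanged): every trainable position must carry the masked state.
    if any(t and not m for m, t in pairs):
        raise ForwardContractError("trainable positions must have masked state")
    # Rule 2 (relaxed in 0.4.6): masked-but-not-trainable positions are
    # rejected for non-target fields, except when predicting.
    if strata != Strata.predict and not is_target and any(m and not t for m, t in pairs):
        raise ForwardContractError("masked state where trainable is false")


# A masked, non-trainable position passes at predict time.
require_mask_contract([True, False], [False, False], strata=Strata.predict)

# The same input is still rejected under the assumed training stratum.
try:
    require_mask_contract([True, False], [False, False], strata=Strata.train)
except ForwardContractError as error:
    rejected = str(error)
```

Read this way, the change lets callers submit inputs that contain masked values at prediction time without marking those positions as trainable, while the stricter check still applies in the other strata.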

{json2vec-0.4.4 → json2vec-0.4.6}/src/json2vec/architecture/encoder.py RENAMED

@@ -76,7 +76,7 @@ class ArrayEncoder(torch.nn.Module):
         self.encoder = torch.nn.ModuleList(layers)
         self.pool = LearnedQueryCrossAttention(
-            n_context=array.n_outputs,
+            n_context=1,
             d_model=hyperparameters.d_model,
             nhead=array.n_heads,
             dropout=dropout,

{json2vec-0.4.4 → json2vec-0.4.6}/src/json2vec/architecture/graph.py RENAMED

@@ -8,6 +8,8 @@ from typing import TYPE_CHECKING
 import torch
 from json2vec.architecture.node import NodeModule
+from json2vec.data.datasets.base import EncodedInput
+from json2vec.structs.enums import Strata
 from json2vec.structs.experiment import Hyperparameters
 from json2vec.structs.tree import Address, Node
@@ -19,9 +21,19 @@ class ModelGraph:
     """Build and rebuild runtime modules from schema hyperparameters."""
     @staticmethod
-    def build(hyperparameters: Hyperparameters, batch_size: int) -> tuple[torch.nn.ModuleDict, torch.Tensor]:
+    def example_forward_kwargs(hyperparameters: Hyperparameters, batch_size: int) -> dict[str, EncodedInput | Strata]:
         from json2vec.data.iterables import mock
+        return {
+            "inputs": mock(hyperparameters=hyperparameters, batch_size=batch_size),
+            "strata": Strata.predict,
+        }
+    @staticmethod
+    def build(
+        hyperparameters: Hyperparameters,
+        batch_size: int,
+    ) -> tuple[torch.nn.ModuleDict, dict[str, EncodedInput | Strata]]:
         nodes: torch.nn.ModuleDict[str, NodeModule] = torch.nn.ModuleDict()
         for address in hyperparameters.requests | hyperparameters.arrays:
@@ -31,7 +43,7 @@ class ModelGraph:
                 batch_size=batch_size,
             )
-        return nodes, mock(hyperparameters=hyperparameters, batch_size=batch_size)
+        return nodes, ModelGraph.example_forward_kwargs(hyperparameters=hyperparameters, batch_size=batch_size)
     @staticmethod
     def install(module: "Model") -> None:
@@ -72,8 +84,6 @@ class ModelGraph:
     @staticmethod
     def reset_selected(module: "Model", selected: list[Node], *, descendants: bool = False) -> None:
-        from json2vec.data.iterables import mock
         selected_by_address: dict[Address, Node] = {}
         for node in selected:
             if node.address in module.nodes:
@@ -94,7 +104,10 @@ class ModelGraph:
                 batch_size=module.batch_size,
             )
-        module.example_input_array = mock(hyperparameters=module.hyperparameters, batch_size=module.batch_size)
+        module.example_input_array = ModelGraph.example_forward_kwargs(
+            hyperparameters=module.hyperparameters,
+            batch_size=module.batch_size,
+        )
         device = module.device
         if isinstance(device, torch.device):
             module.to(device=device)
