PyPI - data-annotations - Versions diffs - 2.6.0__tar.gz → 2.7.0__tar.gz - Mend

data-annotations 2.6.0 tar.gz → 2.7.0 tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (37)

{data_annotations-2.6.0 → data_annotations-2.7.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: data-annotations
-Version: 2.6.0
+Version: 2.7.0
 Summary: Annotate data artifacts with provenance and descriptions
 Keywords: annotations,data,metadata,provenance,reproducibility
 Author: Rodrigo C. G. Pena
@@ -18,7 +18,7 @@ Classifier: Programming Language :: Python :: 3.14
 Classifier: Topic :: Scientific/Engineering
 Classifier: Topic :: Software Development :: Libraries :: Python Modules
 Requires-Dist: pydantic>=2.13.1
-Requires-Dist: pyyaml>=6.0.2 ; extra == 'cli'
+Requires-Dist: pyyaml>=6.0.2
 Requires-Dist: questionary>=2.1.1 ; extra == 'cli'
 Requires-Dist: typer>=0.16.0 ; extra == 'cli'
 Requires-Python: >=3.12
@@ -71,8 +71,9 @@ Or add it to a project with [uv](https://astral.sh/uv/):
 uv add data-annotations
 ```
-The command-line interface uses optional dependencies. Install the package with
-CLI support when you want to run `data-annotations` commands:
+The command-line interface uses optional dependencies for prompting and command
+parsing. Install the package with CLI support when you want to run
+`data-annotations` commands:
 ```bash
 pip install "data-annotations[cli]"
@@ -378,6 +379,39 @@ metadata to vary per call instead of staying fixed at decoration time, use
 `write_directory_annotation(...)` directly instead. See the example gallery in
 `examples/` for runnable examples of all approaches.
+The Python API can also load the same YAML answers payloads used by the
+CLI:
+```python
+from data_annotations.annotations import (
+    annotate_directory,
+    annotate_file,
+    record_file_annotation,
+)
+annotate_file(answers="participants.yaml")
+annotate_directory(answers="run-001.yaml")
+annotate_file(
+    "outputs/summary.txt",
+    answers={"title": "Run Summary", "summary": "Validation run summary."},
+)
+@record_file_annotation(answers="participants.yaml")
+def write_participants(artifact_path, input_path):
+    ...
+```
+If an answers payload includes `target`, the positional artifact path or directory
+may be omitted. When both are provided, they must resolve to the same path.
+Explicit Python keyword arguments override values from `answers`. Environment
+variables such as `$DATA_ROOT` and `${DATA_ROOT}` are expanded inside string
+values in both YAML files and mapping payloads.
+For directory decorators, the wrapped function still provides the produced output
+inventory. Matching `answers.artifacts` entries can supply titles, summaries,
+kinds, fields, primary keys, and missing-value codes for those returned paths.
 ### When To Use Decorators Vs Direct Functions
 If a function is only a final serializer for already-prepared data, prefer the
@@ -772,6 +806,32 @@ Answers files may also use schema-style aliases such as `subject.path`,
 `description.artifacts`, `description.artifact_groups`, `provenance.inputs`,
 and `provenance.params`.
+Answers can request selected provenance fields from the current runtime instead
+of taking them from the payload:
+```yaml
+target: path/to/run-001
+title: Processing outputs
+summary: Files produced by the shell processing workflow.
+provenance:
+    command: bash generate_some_data_artifact.sh
+    script: generate_some_data_artifact.sh
+    infer_from_runtime:
+        - runtime
+        - git
+        - source_code
+```
+`runtime` covers `created_at`, `hostname`, `username`, and `slurm_job_id`. `git`
+covers Git commit, branch, dirty state, remote, tags, and `git describe`.
+`source_code` derives the source-code reference from runtime Git metadata.
+This is especially useful for timestamps, host/user and SLURM context, Git state,
+and derived `source_code`. Provide generation `command` and `script` explicitly
+in CLI answers files, because the runtime command and script would describe the
+`data-annotations annotate ...` invocation rather than the script that generated
+the artifact.
 For source-code recovery, `provenance.source_code.kind` may be `git`, `archive`,
 `file`, or `uri`. Git sources use `uri` plus `revision`; archive and file
 sources use `uri` or `download_uri` plus an optional `sha256`; `path` points to
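
Reviewer note: the README text added in this hunk says that a `target` in the answers payload may replace the positional path, that both must resolve to the same path when given together, and that explicit keyword arguments override `answers`. The sketch below illustrates those three rules in isolation. The names `resolve_target` and `pick` are hypothetical and are not part of the package; unlike the package's own helper, this sketch always returns the resolved path.

```python
from pathlib import Path


def resolve_target(explicit, answers_target):
    """Return the path to annotate, or raise if the two sources disagree."""
    explicit_path = (
        Path(explicit).expanduser().resolve() if explicit is not None else None
    )
    answers_path = (
        Path(answers_target).expanduser().resolve()
        if answers_target and answers_target.strip()
        else None
    )
    if explicit_path is None and answers_path is None:
        raise ValueError("a target is required unless answers supplies one")
    if (
        explicit_path is not None
        and answers_path is not None
        and explicit_path != answers_path
    ):
        raise ValueError(f"target mismatch: {explicit_path} != {answers_path}")
    return explicit_path if explicit_path is not None else answers_path


def pick(keyword_value, answers_value):
    """Explicit keyword wins; answers fill in only when the keyword is None."""
    return keyword_value if keyword_value is not None else answers_value
```

With this shape, `pick("Run Summary", "Title from YAML")` keeps the keyword value, while `pick(None, "Title from YAML")` falls back to the answers payload.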

{data_annotations-2.6.0 → data_annotations-2.7.0}/README.md RENAMED

@@ -41,8 +41,9 @@ Or add it to a project with [uv](https://astral.sh/uv/):
 uv add data-annotations
 ```
-The command-line interface uses optional dependencies. Install the package with
-CLI support when you want to run `data-annotations` commands:
+The command-line interface uses optional dependencies for prompting and command
+parsing. Install the package with CLI support when you want to run
+`data-annotations` commands:
 ```bash
 pip install "data-annotations[cli]"
@@ -348,6 +349,39 @@ metadata to vary per call instead of staying fixed at decoration time, use
 `write_directory_annotation(...)` directly instead. See the example gallery in
 `examples/` for runnable examples of all approaches.
+The Python API can also load the same YAML answers payloads used by the
+CLI:
+```python
+from data_annotations.annotations import (
+    annotate_directory,
+    annotate_file,
+    record_file_annotation,
+)
+annotate_file(answers="participants.yaml")
+annotate_directory(answers="run-001.yaml")
+annotate_file(
+    "outputs/summary.txt",
+    answers={"title": "Run Summary", "summary": "Validation run summary."},
+)
+@record_file_annotation(answers="participants.yaml")
+def write_participants(artifact_path, input_path):
+    ...
+```
+If an answers payload includes `target`, the positional artifact path or directory
+may be omitted. When both are provided, they must resolve to the same path.
+Explicit Python keyword arguments override values from `answers`. Environment
+variables such as `$DATA_ROOT` and `${DATA_ROOT}` are expanded inside string
+values in both YAML files and mapping payloads.
+For directory decorators, the wrapped function still provides the produced output
+inventory. Matching `answers.artifacts` entries can supply titles, summaries,
+kinds, fields, primary keys, and missing-value codes for those returned paths.
 ### When To Use Decorators Vs Direct Functions
 If a function is only a final serializer for already-prepared data, prefer the
@@ -742,6 +776,32 @@ Answers files may also use schema-style aliases such as `subject.path`,
 `description.artifacts`, `description.artifact_groups`, `provenance.inputs`,
 and `provenance.params`.
+Answers can request selected provenance fields from the current runtime instead
+of taking them from the payload:
+```yaml
+target: path/to/run-001
+title: Processing outputs
+summary: Files produced by the shell processing workflow.
+provenance:
+    command: bash generate_some_data_artifact.sh
+    script: generate_some_data_artifact.sh
+    infer_from_runtime:
+        - runtime
+        - git
+        - source_code
+```
+`runtime` covers `created_at`, `hostname`, `username`, and `slurm_job_id`. `git`
+covers Git commit, branch, dirty state, remote, tags, and `git describe`.
+`source_code` derives the source-code reference from runtime Git metadata.
+This is especially useful for timestamps, host/user and SLURM context, Git state,
+and derived `source_code`. Provide generation `command` and `script` explicitly
+in CLI answers files, because the runtime command and script would describe the
+`data-annotations annotate ...` invocation rather than the script that generated
+the artifact.
 For source-code recovery, `provenance.source_code.kind` may be `git`, `archive`,
 `file`, or `uri`. Git sources use `uri` plus `revision`; archive and file
 sources use `uri` or `download_uri` plus an optional `sha256`; `path` points to
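
Reviewer note: the README change states that `$DATA_ROOT` and `${DATA_ROOT}` are expanded inside string values of both YAML files and mapping payloads. The package's own pattern and helper are not fully visible in this diff, so the following is only a sketch of the idea using the standard library's `os.path.expandvars`. The function name `expand_env` and the sample payload are hypothetical, and how the package treats undefined variables is not shown here (`expandvars` leaves them unchanged).

```python
import os


def expand_env(value):
    """Recursively expand environment variables in every string value."""
    if isinstance(value, str):
        return os.path.expandvars(value)
    if isinstance(value, dict):
        return {key: expand_env(item) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item) for item in value]
    return value  # numbers, booleans, and None pass through untouched


# Hypothetical payload and variable, for illustration only.
os.environ["DATA_ROOT"] = "/srv/data"
payload = {
    "target": "$DATA_ROOT/run-001",
    "inputs": ["${DATA_ROOT}/raw.csv"],
    "params": {"seed": 7},
}
expanded = expand_env(payload)
```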

{data_annotations-2.6.0 → data_annotations-2.7.0}/pyproject.toml RENAMED

@@ -1,6 +1,6 @@
 [project]
 name = "data-annotations"
-version = "2.6.0"
+version = "2.7.0"
 description = "Annotate data artifacts with provenance and descriptions"
 readme = "README.md"
 authors = [
@@ -9,7 +9,7 @@ authors = [
 license = "BSD-3-Clause"
 license-files = ["LICENSE"]
 requires-python = ">=3.12"
-dependencies = ["pydantic>=2.13.1"]
+dependencies = ["pydantic>=2.13.1", "PyYAML>=6.0.2"]
 keywords = ["annotations", "data", "metadata", "provenance", "reproducibility"]
 classifiers = [
   "Development Status :: 4 - Beta",
@@ -30,7 +30,7 @@ Changelog = "https://gitlab.com/ceda-unibas/tools/data-annotations/-/blob/main/C
 Issues = "https://gitlab.com/ceda-unibas/tools/data-annotations/-/issues"
 [project.optional-dependencies]
-cli = ["PyYAML>=6.0.2", "questionary>=2.1.1", "typer>=0.16.0"]
+cli = ["questionary>=2.1.1", "typer>=0.16.0"]
 [project.scripts]
 data-annotations = "data_annotations.cli:main"

{data_annotations-2.6.0 → data_annotations-2.7.0}/src/data_annotations/annotations/__init__.py RENAMED

@@ -1,3 +1,10 @@
+from .answers import (
+    AnswersError,
+    DirectoryAnswers,
+    FileAnswers,
+    load_directory_answers,
+    load_file_answers,
+)
 from .models import (
     DirectoryAnnotationDocument,
     DirectoryAnnotationResult,
@@ -17,13 +24,18 @@ from .writers import (
 __all__ = [
     "annotate_directory",
     "annotate_file",
+    "load_directory_answers",
+    "load_file_answers",
     "record_directory_annotation",
     "record_file_annotation",
     "write_directory_annotation",
     "write_file_annotation",
+    "AnswersError",
+    "DirectoryAnswers",
     "DirectoryAnnotationDocument",
     "DirectoryAnnotationResult",
     "DirectoryArtifactSubject",
+    "FileAnswers",
     "FileAnnotationDocument",
     "FileAnnotationResult",
     "FileArtifactSubject",

{data_annotations-2.6.0/src/data_annotations/cli_app → data_annotations-2.7.0/src/data_annotations/annotations}/answers.py RENAMED

@@ -1,18 +1,20 @@
 import os
 import re
 import shlex
+from collections.abc import Mapping
 from pathlib import Path
-from typing import Any, Literal
+from typing import Any, Literal, TypeAlias
 import yaml
 from pydantic import BaseModel, ConfigDict, Field, ValidationError, field_validator
+from pydantic import model_validator
 from data_annotations.description import FieldDefinition
 from data_annotations.provenance.models import ArtifactKind, SourceCodeReference
 class AnswersError(ValueError):
-    """Raised when a CLI answers file cannot be used."""
+    """Raised when an annotation answers payload cannot be used."""
 _ENV_VAR_PATTERN = re.compile(
@@ -79,6 +81,46 @@ _PROVENANCE_KEYS = {
     "git_tags",
     "git_describe",
     "source_code",
+    "infer_from_runtime",
+}
+_RUNTIME_CONTEXT_FIELDS = {
+    "created_at",
+    "hostname",
+    "username",
+    "slurm_job_id",
+}
+_GIT_RUNTIME_FIELDS = {
+    "git_sha",
+    "git_branch",
+    "git_dirty",
+    "git_remote_name",
+    "git_remote_url",
+    "git_tags",
+    "git_describe",
+}
+_RUNTIME_INFERENCE_GROUPS = {
+    "runtime": _RUNTIME_CONTEXT_FIELDS,
+    "git": _GIT_RUNTIME_FIELDS,
+}
+_RUNTIME_INFERENCE_FIELDS = (
+    _RUNTIME_CONTEXT_FIELDS
+    | _GIT_RUNTIME_FIELDS
+    | set(_RUNTIME_INFERENCE_GROUPS)
+    | {"source_code"}
+)
+_UNSUPPORTED_RUNTIME_INFERENCE_FIELDS = {
+    "command",
+    "script",
+    "script_repo_path",
+}
+_EXPLICIT_PROVENANCE_OVERRIDE_FIELDS = {
+    "command",
+    "script",
+    "script_repo_path",
+    "function",
+    *_GIT_RUNTIME_FIELDS,
+    "source_code",
 }
@@ -97,6 +139,50 @@ class ProvenanceAnswers(BaseModel):
     git_tags: list[str] | None = None
     git_describe: str | None = None
     source_code: SourceCodeReference | None = None
+    infer_from_runtime: list[str] = Field(default_factory=list)
+    @field_validator("infer_from_runtime", mode="before")
+    @classmethod
+    def _coerce_runtime_inference_fields(cls, value: Any) -> Any:
+        if value is None:
+            return []
+        if isinstance(value, str):
+            return [value]
+        return value
+    @field_validator("infer_from_runtime")
+    @classmethod
+    def _validate_runtime_inference_fields(cls, values: list[str]) -> list[str]:
+        normalized: list[str] = []
+        for value in values:
+            if value in _UNSUPPORTED_RUNTIME_INFERENCE_FIELDS:
+                raise ValueError(
+                    "runtime inference is not supported for "
+                    f"provenance.{value}; provide it explicitly"
+                )
+            if value not in _RUNTIME_INFERENCE_FIELDS:
+                allowed = sorted(_RUNTIME_INFERENCE_FIELDS)
+                raise ValueError(
+                    f"unknown runtime inference field {value!r}; "
+                    "expected one of: " + ", ".join(allowed)
+                )
+            if value not in normalized:
+                normalized.append(value)
+        return normalized
+    @model_validator(mode="after")
+    def _validate_runtime_inference_conflicts(self) -> "ProvenanceAnswers":
+        inferred = self.runtime_inference_fields()
+        conflicts = sorted(
+            field
+            for field in inferred & _EXPLICIT_PROVENANCE_OVERRIDE_FIELDS
+            if field in self.model_fields_set
+        )
+        if conflicts:
+            raise ValueError(
+                "cannot both set and infer provenance field(s): " + ", ".join(conflicts)
+            )
+        return self
     def command_tokens(self) -> list[str] | None:
         if self.command is None:
@@ -108,6 +194,12 @@ class ProvenanceAnswers(BaseModel):
         except ValueError as exc:
             raise AnswersError(f"invalid provenance.command: {exc}") from exc
+    def runtime_inference_fields(self) -> set[str]:
+        fields: set[str] = set()
+        for field in self.infer_from_runtime:
+            fields.update(_RUNTIME_INFERENCE_GROUPS.get(field, {field}))
+        return fields
 class BaseAnswers(BaseModel):
     model_config = ConfigDict(extra="forbid")
@@ -177,12 +269,16 @@ class DirectoryAnswers(BaseAnswers):
     checksums: dict[str, str] = Field(default_factory=dict)
-def load_file_answers(path: str | Path) -> FileAnswers:
-    return _validate_answers(path, mode="file")
+FileAnswersInput: TypeAlias = str | Path | Mapping[str, Any] | FileAnswers
+DirectoryAnswersInput: TypeAlias = str | Path | Mapping[str, Any] | DirectoryAnswers
+def load_file_answers(source: FileAnswersInput) -> FileAnswers:
+    return _validate_answers(source, mode="file")
-def load_directory_answers(path: str | Path) -> DirectoryAnswers:
-    return _validate_answers(path, mode="directory")
+def load_directory_answers(source: DirectoryAnswersInput) -> DirectoryAnswers:
+    return _validate_answers(source, mode="directory")
 def check_answers(path: str | Path) -> tuple[Literal["file", "directory"], Path]:
@@ -230,17 +326,29 @@ def require_complete_directory_answers(
 def _validate_answers(
-    path: str | Path,
+    source: FileAnswersInput | DirectoryAnswersInput,
     *,
     mode: Literal["file", "directory"],
 ) -> Any:
-    normalized = _normalize_answers(_load_raw_answers(path))
+    if mode == "file" and isinstance(source, FileAnswers):
+        return source
+    if mode == "directory" and isinstance(source, DirectoryAnswers):
+        return source
+    normalized = _normalize_answers(_load_raw_answers(source))
     model = FileAnswers if mode == "file" else DirectoryAnswers
     return _model_validate(model, normalized)
-def _load_raw_answers(path: str | Path) -> dict[str, Any]:
-    answers_path = Path(path).expanduser()
+def _load_raw_answers(
+    source: str | Path | Mapping[str, Any] | BaseAnswers,
+) -> dict[str, Any]:
+    if isinstance(source, BaseAnswers):
+        return source.model_dump()
+    if isinstance(source, Mapping):
+        return _expand_env_vars(dict(source), path="$")
+    answers_path = Path(source).expanduser()
     if not answers_path.is_file():
         raise AnswersError(f"answers file not found: {answers_path}")
     try:
@@ -416,3 +524,22 @@ def _missing_required_common_fields(
 def _has_text(value: Any) -> bool:
     return isinstance(value, str) and bool(value.strip())
+__all__ = [
+    "AnswersError",
+    "BaseAnswers",
+    "ChildBundleAnswers",
+    "DirectoryAnswers",
+    "DirectoryAnswersInput",
+    "DirectoryArtifactAnswers",
+    "DirectoryArtifactGroupAnswers",
+    "FileAnswers",
+    "FileAnswersInput",
+    "ProvenanceAnswers",
+    "check_answers",
+    "load_directory_answers",
+    "load_file_answers",
+    "require_complete_directory_answers",
+    "require_complete_file_answers",
+]
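
Reviewer note: the `infer_from_runtime` handling in this file does two things. It expands group names (`runtime`, `git`) into individual provenance fields, and it rejects payloads that both set and infer the same field. The standalone sketch below mirrors that logic with plain sets so it can be read without Pydantic. The constant and function names are simplified stand-ins, and `explicitly_set` plays the role of `model_fields_set`.

```python
RUNTIME_FIELDS = {"created_at", "hostname", "username", "slurm_job_id"}
GIT_FIELDS = {
    "git_sha", "git_branch", "git_dirty", "git_remote_name",
    "git_remote_url", "git_tags", "git_describe",
}
GROUPS = {"runtime": RUNTIME_FIELDS, "git": GIT_FIELDS}
# Fields that a payload may set explicitly and that could clash with inference.
OVERRIDABLE = {
    "command", "script", "script_repo_path", "function", "source_code",
    *GIT_FIELDS,
}


def expand_inference(requested):
    """Turn group names into field names; plain field names map to themselves."""
    fields = set()
    for name in requested:
        fields |= GROUPS.get(name, {name})
    return fields


def find_conflicts(requested, explicitly_set):
    """Return the fields that are both inferred and explicitly provided."""
    return sorted(expand_inference(requested) & OVERRIDABLE & set(explicitly_set))
```

Under these rules the YAML example from the README is valid: it sets `command` and `script` explicitly while inferring `runtime`, `git`, and `source_code`, so no field appears on both sides.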

{data_annotations-2.6.0 → data_annotations-2.7.0}/src/data_annotations/annotations/decorators.py RENAMED

@@ -17,11 +17,81 @@ from data_annotations.description.models import DocumentedArtifact, FieldDefinit
 from data_annotations.provenance import writers as provenance_writers
 from data_annotations.provenance.models import ArtifactKind, ChecksumPolicy
+from . import answers as answer_payloads
 from .writers import annotate_directory, annotate_file
+def _documented_artifact_from_answer(
+    artifact: answer_payloads.DirectoryArtifactAnswers,
+) -> DocumentedArtifact:
+    return DocumentedArtifact(
+        path=artifact.path,
+        kind=artifact.kind,
+        title=artifact.title,
+        summary=artifact.summary,
+        fields=list(artifact.fields),
+        primary_key=list(artifact.primary_key),
+        missing_value_codes=dict(artifact.missing_value_codes),
+    )
+def _directory_relative_label(path: str | Path, output_dir: Path) -> str:
+    resolved_path = provenance_writers._resolve_directory_member_path(path, output_dir)
+    return provenance_writers._directory_relative_label(resolved_path, output_dir)
+def _merge_artifact_answer(
+    artifact: DocumentedArtifact,
+    answer: answer_payloads.DirectoryArtifactAnswers,
+) -> DocumentedArtifact:
+    updates: dict[str, Any] = {}
+    if "kind" in answer.model_fields_set:
+        updates["kind"] = answer.kind
+    if "title" in answer.model_fields_set:
+        updates["title"] = answer.title
+    if "summary" in answer.model_fields_set:
+        updates["summary"] = answer.summary
+    if "fields" in answer.model_fields_set:
+        updates["fields"] = list(answer.fields)
+    if "primary_key" in answer.model_fields_set:
+        updates["primary_key"] = list(answer.primary_key)
+    if "missing_value_codes" in answer.model_fields_set:
+        updates["missing_value_codes"] = dict(answer.missing_value_codes)
+    return artifact.model_copy(update=updates)
+def _merge_directory_artifacts_from_answers(
+    artifacts: list[DocumentedArtifact],
+    answers: answer_payloads.DirectoryAnswers | None,
+    *,
+    output_dir: Path,
+) -> list[DocumentedArtifact]:
+    if answers is None or not answers.artifacts:
+        return artifacts
+    answer_by_path = {
+        _directory_relative_label(answer.path, output_dir): answer
+        for answer in answers.artifacts
+    }
+    if not artifacts:
+        return [
+            _documented_artifact_from_answer(answer) for answer in answers.artifacts
+        ]
+    merged: list[DocumentedArtifact] = []
+    for artifact in artifacts:
+        answer = answer_by_path.get(
+            _directory_relative_label(artifact.path, output_dir)
+        )
+        merged.append(
+            _merge_artifact_answer(artifact, answer) if answer is not None else artifact
+        )
+    return merged
 def record_file_annotation(
     *,
+    answers: answer_payloads.FileAnswersInput | None = None,
     artifact_path_arg: str = "artifact_path",
     input_args: tuple[str, ...] = DEFAULT_INPUT_ARGS,
     title: str | None = None,
@@ -31,10 +101,9 @@ def record_file_annotation(
     missing_value_codes: dict[str, str] | None = None,
     acquisition_context: dict[str, Any] | None = None,
     generation_context: dict[str, Any] | None = None,
-    artifact_kind: ArtifactKind = "other",
+    artifact_kind: ArtifactKind | None = None,
     artifact_sha256: str | None = None,
     write_readme: bool = True,
-    write_schema: bool | None = None,
     annotation_suffix: str = ".annotation.json",
     readme_suffix: str = ".README.md",
     checksum_policy: ChecksumPolicy = "auto",
@@ -67,6 +136,7 @@ def record_file_annotation(
             inputs = extract_inputs(bound, input_args=input_args)
             annotate_file(
                 artifact_path,
+                answers=answers,
                 title=title,
                 summary=summary,
                 fields=fields,
@@ -80,7 +150,6 @@ def record_file_annotation(
                 inputs=inputs,
                 function=fn,
                 write_readme=write_readme,
-                write_schema=write_schema,
                 annotation_suffix=annotation_suffix,
                 readme_suffix=readme_suffix,
                 checksum_policy=checksum_policy,
@@ -96,6 +165,7 @@ def record_file_annotation(
 def record_directory_annotation(
     *,
+    answers: answer_payloads.DirectoryAnswersInput | None = None,
     output_dir_arg: str = "output_dir",
     input_args: tuple[str, ...] = DEFAULT_INPUT_ARGS,
     title: str | None = None,
@@ -103,7 +173,6 @@ def record_directory_annotation(
     acquisition_context: dict[str, Any] | None = None,
     generation_context: dict[str, Any] | None = None,
     write_readme: bool = True,
-    write_schema: bool | None = None,
     annotation_filename: str = "data-annotations.json",
     readme_filename: str = "README.md",
     checksum_policy: ChecksumPolicy = "auto",
@@ -138,10 +207,20 @@ def record_directory_annotation(
             non_child_items, child_bundles = split_child_bundles(items)
             artifact_items, artifact_groups = split_artifact_groups(non_child_items)
             output_dir = argument_path(bound, argument_name=output_dir_arg)
+            directory_answers = (
+                answer_payloads.load_directory_answers(answers)
+                if answers is not None
+                else None
+            )
             artifacts: list[DocumentedArtifact] = coerce_documented_artifacts(
                 artifact_items,
                 normalize_paths=False,
             )
+            artifacts = _merge_directory_artifacts_from_answers(
+                artifacts,
+                directory_answers,
+                output_dir=output_dir,
+            )
             params = extract_params(
                 bound,
                 target_args=(output_dir_arg,),
@@ -150,9 +229,10 @@ def record_directory_annotation(
             inputs = extract_inputs(bound, input_args=input_args)
             annotate_directory(
                 output_dir,
+                answers=directory_answers,
                 artifacts=artifacts,
-                artifact_groups=artifact_groups,
-                child_bundles=child_bundles,
+                artifact_groups=artifact_groups or None,
+                child_bundles=child_bundles or None,
                 title=title,
                 summary=summary,
                 acquisition_context=acquisition_context,
@@ -161,7 +241,6 @@ def record_directory_annotation(
                 inputs=inputs,
                 function=fn,
                 write_readme=write_readme,
-                write_schema=write_schema,
                 annotation_filename=annotation_filename,
                 readme_filename=readme_filename,
                 checksum_policy=checksum_policy,
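
Reviewer note: for directory decorators, the wrapped function still decides which artifacts exist, and matching `answers.artifacts` entries only contribute the metadata they explicitly set. The sketch below shows that merge rule with plain dictionaries. The name `merge_artifacts` and the sample data are hypothetical, each answer dict is assumed to contain only explicitly set keys (standing in for `model_fields_set`), and paths are compared as literal strings, whereas the package normalizes them relative to the output directory.

```python
MERGEABLE = (
    "kind", "title", "summary", "fields", "primary_key", "missing_value_codes",
)


def merge_artifacts(returned, answers):
    """Overlay explicitly set answer metadata onto the returned artifacts."""
    by_path = {answer["path"]: answer for answer in answers}
    if not returned:
        # Nothing returned by the function: fall back to the answers entries.
        return [dict(answer) for answer in answers]
    merged = []
    for artifact in returned:
        answer = by_path.get(artifact["path"], {})
        updates = {key: answer[key] for key in MERGEABLE if key in answer}
        merged.append({**artifact, **updates})
    return merged


# Hypothetical inventory returned by a decorated function.
returned = [
    {"path": "a.csv", "kind": "table", "title": None},
    {"path": "b.txt", "kind": "other", "title": "Log"},
]
# Hypothetical answers entries; "c.csv" matches nothing and is ignored.
answers = [
    {"path": "a.csv", "title": "Participants"},
    {"path": "c.csv", "title": "Unused"},
]
merged = merge_artifacts(returned, answers)
```

Here `a.csv` gains the title from the answers payload and keeps its returned `kind`, while `b.txt` is left as the function produced it.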

{data_annotations-2.6.0 → data_annotations-2.7.0}/src/data_annotations/annotations/writers.py RENAMED

@@ -3,8 +3,8 @@ from pathlib import Path
 from typing import Any, Callable
 from data_annotations.description import (
-    ArtifactGroupDescription,
     ArtifactDescription,
+    ArtifactGroupDescription,
     DirectoryDescription,
     DocumentedArtifact,
     DocumentedArtifactGroup,
@@ -16,12 +16,13 @@ from data_annotations.description import (
 from data_annotations.provenance import (
     ArtifactKind,
     BaseProvenance,
-    ChildBundle,
     ChecksumPolicy,
+    ChildBundle,
     ProducedFile,
 )
 from data_annotations.provenance import writers as provenance_writers
+from . import answers as answer_payloads
 from .models import (
     DirectoryAnnotationDocument,
     DirectoryAnnotationResult,
@@ -31,6 +32,21 @@ from .models import (
     FileArtifactSubject,
 )
+_PROVENANCE_ANSWER_OVERRIDE_FIELDS = (
+    "command",
+    "script",
+    "script_repo_path",
+    "function",
+    "git_sha",
+    "git_branch",
+    "git_dirty",
+    "git_remote_name",
+    "git_remote_url",
+    "git_tags",
+    "git_describe",
+    "source_code",
+)
 def _validated_file_readme_fields(
     *, title: str | None, summary: str | None
@@ -144,6 +160,121 @@ def _coerce_fields(
     return [FieldDefinition.model_validate(field) for field in (fields or [])]
+def _target_from_answers(value: str | None) -> Path | None:
+    if value is None or not value.strip():
+        return None
+    return Path(value).expanduser().resolve()
+def _resolve_answer_target(
+    explicit_target: str | Path | None,
+    answers_target: str | None,
+    *,
+    label: str,
+) -> str | Path:
+    explicit_path = (
+        Path(explicit_target).expanduser().resolve()
+        if explicit_target is not None
+        else None
+    )
+    answers_path = _target_from_answers(answers_target)
+    if explicit_path is None and answers_path is None:
+        raise ValueError(f"{label} is required unless answers supplies target")
+    if (
+        explicit_path is not None
+        and answers_path is not None
+        and explicit_path != answers_path
+    ):
+        raise ValueError(
+            f"{label} does not match answers target: {explicit_path} != {answers_path}"
+        )
+    if explicit_target is not None:
+        return explicit_target
+    if answers_path is None:
+        raise ValueError(f"{label} is required unless answers supplies target")
+    return answers_path
+def _normalize_answer_git_tags(value: list[str] | None) -> list[str]:
+    if value is None:
+        return []
+    return sorted({tag.strip() for tag in value if tag.strip()})
+def _provenance_overrides_from_answers(
+    answers: answer_payloads.BaseAnswers | None,
+    *,
+    function: Callable[..., Any] | None,
+) -> dict[str, Any] | None:
+    if answers is None:
+        return None
+    provenance = answers.provenance
+    explicit_fields = provenance.model_fields_set
+    inferred_fields = provenance.runtime_inference_fields()
+    overrides: dict[str, Any] = {}
+    for field in _PROVENANCE_ANSWER_OVERRIDE_FIELDS:
+        if field in inferred_fields or field not in explicit_fields:
+            continue
+        if field == "function" and function is not None:
+            continue
+        if field == "command":
+            overrides[field] = provenance.command_tokens()
+        elif field == "git_tags":
+            overrides[field] = _normalize_answer_git_tags(provenance.git_tags)
+        else:
+            overrides[field] = getattr(provenance, field)
+    return overrides or None
+def _documented_artifacts_from_answers(
+    artifacts: list[answer_payloads.DirectoryArtifactAnswers],
+) -> list[DocumentedArtifact]:
+    return [
+        DocumentedArtifact(
+            path=artifact.path,
+            kind=artifact.kind,
+            title=artifact.title,
+            summary=artifact.summary,
+            fields=list(artifact.fields),
+            primary_key=list(artifact.primary_key),
+            missing_value_codes=dict(artifact.missing_value_codes),
+        )
+        for artifact in artifacts
+    ]
+def _documented_artifact_groups_from_answers(
+    groups: list[answer_payloads.DirectoryArtifactGroupAnswers],
+) -> list[DocumentedArtifactGroup]:
+    return [
+        DocumentedArtifactGroup(
+            title=group.title,
+            summary=group.summary,
+            kind=group.kind,
+            paths=list(group.paths),
+            selector=group.selector,
+            fields=list(group.fields),
+            primary_key=list(group.primary_key),
+            missing_value_codes=dict(group.missing_value_codes),
+        )
+        for group in groups
+    ]
+def _child_bundles_from_answers(
+    child_bundles: list[answer_payloads.ChildBundleAnswers],
+) -> list[ChildBundle]:
+    return [
+        ChildBundle(
+            path=child_bundle.path,
+            annotation_path=child_bundle.annotation_path,
+            content_digest=child_bundle.content_digest,
+        )
+        for child_bundle in child_bundles
+    ]
 def _build_file_annotation_document(
     artifact_path: str | Path,
     *,
@@ -382,8 +513,9 @@ def write_directory_annotation(
 def annotate_file(
-    artifact_path: str | Path,
+    artifact_path: str | Path | None = None,
     *,
+    answers: answer_payloads.FileAnswersInput | None = None,
     title: str | None = None,
     summary: str | None = None,
     fields: list[FieldDefinition] | None = None,
@@ -391,33 +523,87 @@ def annotate_file(
     missing_value_codes: dict[str, str] | None = None,
     acquisition_context: dict[str, Any] | None = None,
     generation_context: dict[str, Any] | None = None,
-    artifact_kind: ArtifactKind = "other",
+    artifact_kind: ArtifactKind | None = None,
     artifact_sha256: str | None = None,
     params: dict[str, Any] | None = None,
     inputs: Sequence[str | Path] | None = None,
     function: Callable[..., Any] | None = None,
     write_readme: bool = True,
-    write_schema: bool | None = None,
     annotation_suffix: str = ".annotation.json",
     readme_suffix: str = ".README.md",
     checksum_policy: ChecksumPolicy = "auto",
     max_checksum_bytes: int = provenance_writers.DEFAULT_MAX_CHECKSUM_BYTES,
     checksum_overrides: Mapping[str | Path, str] | None = None,
 ) -> FileAnnotationResult:
-    document = _build_file_annotation_document(
+    file_answers = (
+        answer_payloads.load_file_answers(answers) if answers is not None else None
+    )
+    selected_artifact_path = _resolve_answer_target(
         artifact_path,
-        title=title,
-        summary=summary,
-        fields=fields,
-        primary_key=primary_key,
-        missing_value_codes=missing_value_codes,
+        file_answers.target if file_answers is not None else None,
+        label="artifact_path",
+    )
+    title_value = (
+        title if title is not None else (file_answers.title if file_answers else None)
+    )
+    summary_value = (
+        summary
+        if summary is not None
+        else (file_answers.summary if file_answers else None)
+    )
+    fields_value = (
+        fields
+        if fields is not None
+        else (list(file_answers.fields) if file_answers else None)
+    )
+    primary_key_value = (
+        primary_key
+        if primary_key is not None
+        else (list(file_answers.primary_key) if file_answers else None)
+    )
+    missing_value_codes_value = (
+        missing_value_codes
+        if missing_value_codes is not None
+        else (dict(file_answers.missing_value_codes) if file_answers else None)
+    )
+    artifact_kind_value = (
+        artifact_kind
+        if artifact_kind is not None
+        else (file_answers.kind if file_answers is not None else "other")
+    )
+    artifact_sha256_value = (
+        artifact_sha256
+        if artifact_sha256 is not None
+        else (file_answers.sha256 if file_answers else None)
+    )
+    params_value = (
+        params
+        if params is not None
+        else (dict(file_answers.params) if file_answers else None)
+    )
+    inputs_value = (
+        inputs
+        if inputs is not None
+        else (list(file_answers.inputs) if file_answers else None)
+    )
+    document = _build_file_annotation_document(
+        selected_artifact_path,
+        title=title_value,
+        summary=summary_value,
+        fields=fields_value,
+        primary_key=primary_key_value,
+        missing_value_codes=missing_value_codes_value,
         acquisition_context=acquisition_context,
         generation_context=generation_context,
-        artifact_kind=artifact_kind,
-        artifact_sha256=artifact_sha256,
-        params=params,
-        inputs=inputs,
+        artifact_kind=artifact_kind_value,
+        artifact_sha256=artifact_sha256_value,
+        params=params_value,
+        inputs=inputs_value,
         function=function,
+        provenance_overrides=_provenance_overrides_from_answers(
+            file_answers,
+            function=function,
+        ),
         checksum_policy=checksum_policy,
         max_checksum_bytes=max_checksum_bytes,
         checksum_overrides=checksum_overrides,
@@ -430,7 +616,7 @@ def annotate_file(
     readme_path: Path | None = None
     if write_readme:
-        _validated_file_readme_fields(title=title, summary=summary)
+        _validated_file_readme_fields(title=title_value, summary=summary_value)
         readme_path = write_file_readme(
             Path(str(artifact_path) + readme_suffix),
             artifact_path=document.subject.path,
@@ -446,9 +632,10 @@ def annotate_file(
 def annotate_directory(
-    output_dir: str | Path,
+    output_dir: str | Path | None = None,
     *,
-    artifacts: list[DocumentedArtifact],
+    answers: answer_payloads.DirectoryAnswersInput | None = None,
+    artifacts: list[DocumentedArtifact] | None = None,
     artifact_groups: list[DocumentedArtifactGroup] | None = None,
     child_bundles: list[ChildBundle] | None = None,
     title: str | None = None,
@@ -459,28 +646,97 @@ def annotate_directory(
     inputs: Sequence[str | Path] | None = None,
     function: Callable[..., Any] | None = None,
     write_readme: bool = True,
-    write_schema: bool | None = None,
     annotation_filename: str = "data-annotations.json",
     readme_filename: str = "README.md",
     checksum_policy: ChecksumPolicy = "auto",
     max_checksum_bytes: int = provenance_writers.DEFAULT_MAX_CHECKSUM_BYTES,
     checksum_overrides: Mapping[str | Path, str] | None = None,
 ) -> DirectoryAnnotationResult:
-    document = _build_directory_annotation_document(
+    directory_answers = (
+        answer_payloads.load_directory_answers(answers) if answers is not None else None
+    )
+    selected_output_dir = _resolve_answer_target(
         output_dir,
-        artifacts=artifacts,
-        artifact_groups=artifact_groups,
-        child_bundles=child_bundles,
-        title=title,
-        summary=summary,
+        directory_answers.target if directory_answers is not None else None,
+        label="output_dir",
+    )
+    artifacts_value = (
+        artifacts
+        if artifacts is not None
+        else (
+            _documented_artifacts_from_answers(directory_answers.artifacts)
+            if directory_answers
+            else None
+        )
+    )
+    if artifacts_value is None:
+        raise ValueError("artifacts is required unless answers supplies artifacts")
+    artifact_groups_value = (
+        artifact_groups
+        if artifact_groups is not None
+        else (
+            _documented_artifact_groups_from_answers(directory_answers.artifact_groups)
+            if directory_answers
+            else None
+        )
+    )
+    child_bundles_value = (
+        child_bundles
+        if child_bundles is not None
+        else (
+            _child_bundles_from_answers(directory_answers.child_bundles)
+            if directory_answers
+            else None
+        )
+    )
+    title_value = (
+        title
+        if title is not None
+        else (directory_answers.title if directory_answers else None)
+    )
+    summary_value = (
+        summary
+        if summary is not None
+        else (directory_answers.summary if directory_answers else None)
+    )
+    params_value = (
+        params
+        if params is not None
+        else (dict(directory_answers.params) if directory_answers else None)
+    )
+    inputs_value = (
+        inputs
+        if inputs is not None
+        else (list(directory_answers.inputs) if directory_answers else None)
+    )
+    checksum_overrides_value: dict[str | Path, str] | None
+    if directory_answers is not None and directory_answers.checksums:
+        checksum_overrides_value = dict(directory_answers.checksums)
+        if checksum_overrides is not None:
+            checksum_overrides_value.update(checksum_overrides)
+    else:
+        checksum_overrides_value = (
+            dict(checksum_overrides) if checksum_overrides is not None else None
+        )
+    document = _build_directory_annotation_document(
+        selected_output_dir,
+        artifacts=artifacts_value,
+        artifact_groups=artifact_groups_value,
+        child_bundles=child_bundles_value,
+        title=title_value,
+        summary=summary_value,
         acquisition_context=acquisition_context,
         generation_context=generation_context,
-        params=params,
-        inputs=inputs,
+        params=params_value,
+        inputs=inputs_value,
         function=function,
+        provenance_overrides=_provenance_overrides_from_answers(
+            directory_answers,
+            function=function,
+        ),
         checksum_policy=checksum_policy,
         max_checksum_bytes=max_checksum_bytes,
-        checksum_overrides=checksum_overrides,
+        checksum_overrides=checksum_overrides_value,
     )
     output_dir = Path(document.subject.path)
     annotation_path = _write_annotation_document(
@@ -490,7 +746,7 @@ def annotate_directory(
     readme_path: Path | None = None
     if write_readme:
-        _validated_directory_readme_fields(title=title, summary=summary)
+        _validated_directory_readme_fields(title=title_value, summary=summary_value)
         readme_path = write_directory_readme(
             output_dir / readme_filename,
             output_dir=document.subject.path,

{data_annotations-2.6.0 → data_annotations-2.7.0}/src/data_annotations/cli_app/annotate/helpers.py RENAMED

@@ -278,6 +278,11 @@ def _selected_source_code(
         source_sha256,
     ]
     if not any(value is not None for value in source_values):
+        if (
+            answers is not None
+            and "source_code" in answers.provenance.runtime_inference_fields()
+        ):
+            return None
         return answers.provenance.source_code if answers is not None else None
     if source_kind is None:
@@ -297,6 +302,34 @@ def _selected_source_code(
     )
+
+
+def _runtime_inferred_fields(answers: answer_files.BaseAnswers | None) -> set[str]:
+    if answers is None:
+        return set()
+    return answers.provenance.runtime_inference_fields()
+
+
+def _filter_runtime_inferred_overrides(
+    overrides: dict[str, Any],
+    *,
+    answers: answer_files.BaseAnswers | None,
+    explicit_fields: set[str],
+) -> dict[str, Any]:
+    inferred_fields = _runtime_inferred_fields(answers)
+    if not inferred_fields:
+        return overrides
+    if "source_code" in inferred_fields:
+        inferred_fields = inferred_fields | {
+            "git_remote_url",
+            "git_sha",
+            "script_repo_path",
+        }
+    return {
+        key: value
+        for key, value in overrides.items()
+        if key not in inferred_fields or key in explicit_fields
+    }
 def _validate_source_code_git_conflicts(
     source_code: SourceCodeReference | None,
     *,
@@ -374,6 +407,32 @@ def _collect_post_hoc_provenance_from_sources(
     selected_inputs = _selected_inputs(input_values, answers)
     selected_params = _selected_params(param_values, answers)
     answer_command_tokens = _command_tokens_from_answers(answers)
+    source_code_cli_values = {
+        source_kind,
+        source_uri,
+        source_download_uri,
+        source_path,
+        source_revision,
+        source_sha256,
+    }
+    explicit_override_fields = {
+        field_name
+        for field_name, is_explicit in {
+            "script": script is not None,
+            "script_repo_path": script_repo_path is not None,
+            "command": command is not None,
+            "function": function is not None,
+            "git_sha": git_sha is not None,
+            "git_branch": git_branch is not None,
+            "git_remote_name": git_remote_name is not None,
+            "git_remote_url": git_remote_url is not None,
+            "git_tags": git_tags is not None,
+            "git_describe": git_describe is not None,
+            "git_dirty": git_dirty is not None,
+            "source_code": any(value is not None for value in source_code_cli_values),
+        }.items()
+        if is_explicit
+    }
     selected_command = (
         command
         if command is not None
@@ -435,7 +494,15 @@ def _collect_post_hoc_provenance_from_sources(
         )
         if selected_source_code is not None:
             overrides["source_code"] = selected_source_code
-        return inputs, params, overrides
+        return (
+            inputs,
+            params,
+            _filter_runtime_inferred_overrides(
+                overrides,
+                answers=answers,
+                explicit_fields=explicit_override_fields,
+            ),
+        )
     command_tokens = (
         _parse_command_string(command)
@@ -475,7 +542,15 @@ def _collect_post_hoc_provenance_from_sources(
         "git_dirty": _provenance_value(git_dirty, answers, "git_dirty"),
         "source_code": selected_source_code,
     }
-    return selected_inputs or [], selected_params or {}, overrides
+    return (
+        selected_inputs or [],
+        selected_params or {},
+        _filter_runtime_inferred_overrides(
+            overrides,
+            answers=answers,
+            explicit_fields=explicit_override_fields,
+        ),
+    )
 def _documented_artifacts_from_answers(
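Note on the hunks above: `_filter_runtime_inferred_overrides` drops override keys that the answers payload marks for runtime inference, unless the caller supplied them explicitly, and treats `source_code` as implying three related git and script fields. The following is a minimal sketch of that filtering with plain sets in place of the answers object; `filter_inferred` and `SOURCE_CODE_IMPLIED` are hypothetical names for illustration.

```python
from typing import Any

# Fields that the diff also treats as inferred whenever "source_code" is inferred.
SOURCE_CODE_IMPLIED = {"git_remote_url", "git_sha", "script_repo_path"}


def filter_inferred(
    overrides: dict[str, Any],
    inferred_fields: set[str],
    explicit_fields: set[str],
) -> dict[str, Any]:
    """Drop inferred keys from overrides unless they were passed explicitly."""
    if not inferred_fields:
        # Nothing is marked for inference, so the overrides pass through as-is.
        return overrides
    inferred = set(inferred_fields)
    if "source_code" in inferred:
        inferred |= SOURCE_CODE_IMPLIED
    return {
        key: value
        for key, value in overrides.items()
        if key not in inferred or key in explicit_fields
    }
```

For example, with `source_code` marked as inferred, a `git_sha` override is discarded unless `git_sha` is also in the explicit set, while unrelated keys such as `git_branch` are kept.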

data_annotations-2.7.0/src/data_annotations/cli_app/answers.py ADDED

@@ -0,0 +1,35 @@
+from data_annotations.annotations.answers import (
+    AnswersError,
+    BaseAnswers,
+    ChildBundleAnswers,
+    DirectoryAnswers,
+    DirectoryAnswersInput,
+    DirectoryArtifactAnswers,
+    DirectoryArtifactGroupAnswers,
+    FileAnswers,
+    FileAnswersInput,
+    ProvenanceAnswers,
+    check_answers,
+    load_directory_answers,
+    load_file_answers,
+    require_complete_directory_answers,
+    require_complete_file_answers,
+)
+
+__all__ = [
+    "AnswersError",
+    "BaseAnswers",
+    "ChildBundleAnswers",
+    "DirectoryAnswers",
+    "DirectoryAnswersInput",
+    "DirectoryArtifactAnswers",
+    "DirectoryArtifactGroupAnswers",
+    "FileAnswers",
+    "FileAnswersInput",
+    "ProvenanceAnswers",
+    "check_answers",
+    "load_directory_answers",
+    "load_file_answers",
+    "require_complete_directory_answers",
+    "require_complete_file_answers",
+]
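Note on the file above: the new `cli_app/answers.py` contains only imports and an `__all__` list, so it appears to act as a thin re-export of `data_annotations.annotations.answers`. The following is a minimal sketch of how such a re-export module could be checked, using in-memory stand-in modules rather than the real package; `core`, `shim`, `missing_exports`, and `shares_objects` are hypothetical names for illustration.

```python
import types

# Hypothetical stand-in for the module that holds the real implementations.
core = types.ModuleType("core_answers")


class AnswersError(Exception):
    """Placeholder error type for the sketch."""


def load_file_answers(payload):
    """Placeholder loader that just copies a mapping."""
    return dict(payload)


core.AnswersError = AnswersError
core.load_file_answers = load_file_answers

# Hypothetical stand-in for the thin re-export module.
shim = types.ModuleType("cli_answers")
shim.__all__ = ["AnswersError", "load_file_answers"]
for name in shim.__all__:
    setattr(shim, name, getattr(core, name))


def missing_exports(module: types.ModuleType) -> list[str]:
    """Return names listed in __all__ that the module does not define."""
    return [name for name in module.__all__ if not hasattr(module, name)]


def shares_objects(shim_module: types.ModuleType, core_module: types.ModuleType) -> bool:
    """Check that every re-exported name is the same object as in the core module."""
    return all(
        getattr(shim_module, name) is getattr(core_module, name)
        for name in shim_module.__all__
    )
```

Because a re-export binds the same objects, an `except AnswersError` clause written against either import path would catch errors raised through the other.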