agentversion 0.1.0__tar.gz → 0.2.0__tar.gz

Files changed (129)
  1. agentversion-0.2.0/.github/dependabot.yml +19 -0
  2. {agentversion-0.1.0 → agentversion-0.2.0}/.github/workflows/ci.yml +4 -4
  3. {agentversion-0.1.0 → agentversion-0.2.0}/.github/workflows/publish.yml +8 -5
  4. {agentversion-0.1.0 → agentversion-0.2.0}/CHANGELOG.md +93 -0
  5. {agentversion-0.1.0 → agentversion-0.2.0}/CONTRIBUTING.md +2 -4
  6. {agentversion-0.1.0 → agentversion-0.2.0}/PKG-INFO +141 -42
  7. agentversion-0.2.0/README.md +316 -0
  8. agentversion-0.2.0/RELEASING.md +63 -0
  9. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/__init__.py +12 -0
  10. agentversion-0.2.0/agentversion/a2a.py +123 -0
  11. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/cli.py +2 -1
  12. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/compatibility.py +3 -1
  13. agentversion-0.2.0/agentversion/contract.py +255 -0
  14. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/decision.py +18 -0
  15. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/diff.py +212 -21
  16. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/hasher.py +39 -1
  17. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/ids.py +13 -0
  18. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/manifest.py +39 -2
  19. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/validator.py +111 -4
  20. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/output-schema-change/expected-diff.json +2 -2
  21. agentversion-0.2.0/examples/integrations/decimalai_bridge.py +100 -0
  22. agentversion-0.2.0/examples/integrations/langgraph_example.py +174 -0
  23. {agentversion-0.1.0 → agentversion-0.2.0}/examples/integrations/otel_mapping.md +10 -5
  24. agentversion-0.2.0/examples/scenarios/tool-rename-drift.md +71 -0
  25. agentversion-0.2.0/examples/scenarios/walkthrough.py +69 -0
  26. {agentversion-0.1.0 → agentversion-0.2.0}/pyproject.toml +24 -5
  27. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/agent-manifest.schema.json +6 -7
  28. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/compatibility-decision.schema.json +2 -0
  29. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/manifest-diff.schema.json +2 -1
  30. agentversion-0.2.0/scripts/release.sh +70 -0
  31. agentversion-0.2.0/spec/a2a-mapping.md +50 -0
  32. {agentversion-0.1.0 → agentversion-0.2.0}/spec/attestation.md +1 -1
  33. agentversion-0.2.0/spec/behavioral-policy.md +51 -0
  34. {agentversion-0.1.0 → agentversion-0.2.0}/spec/compatibility-batch.md +1 -1
  35. {agentversion-0.1.0 → agentversion-0.2.0}/spec/compatibility-decision.md +2 -0
  36. {agentversion-0.1.0 → agentversion-0.2.0}/spec/compatibility-policy.md +29 -14
  37. {agentversion-0.1.0 → agentversion-0.2.0}/spec/diff.md +13 -5
  38. agentversion-0.2.0/spec/hashing.md +106 -0
  39. {agentversion-0.1.0 → agentversion-0.2.0}/spec/manifest.md +21 -11
  40. {agentversion-0.1.0 → agentversion-0.2.0}/spec/reference.md +36 -20
  41. {agentversion-0.1.0 → agentversion-0.2.0}/spec/refs.md +6 -17
  42. {agentversion-0.1.0 → agentversion-0.2.0}/spec/versioning-policy.md +3 -1
  43. agentversion-0.2.0/tests/fixtures/frozen_hash_vectors.json +126 -0
  44. agentversion-0.2.0/tests/test_a2a_mapping.py +86 -0
  45. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_audit_v020.py +96 -0
  46. agentversion-0.2.0/tests/test_audit_v030.py +245 -0
  47. agentversion-0.2.0/tests/test_behavioral_policy.py +70 -0
  48. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_cli.py +10 -3
  49. agentversion-0.2.0/tests/test_component_surface_map.py +37 -0
  50. agentversion-0.2.0/tests/test_contract.py +115 -0
  51. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_diff.py +53 -0
  52. agentversion-0.2.0/tests/test_examples_runnable.py +38 -0
  53. agentversion-0.2.0/tests/test_extension_hatch.py +42 -0
  54. agentversion-0.2.0/tests/test_frozen_hash_vectors.py +83 -0
  55. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_hasher.py +43 -0
  56. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_ids.py +27 -0
  57. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_validator.py +29 -0
  58. agentversion-0.1.0/.github/dependabot.yml +0 -13
  59. agentversion-0.1.0/README.md +0 -217
  60. agentversion-0.1.0/examples/integrations/langgraph_example.py +0 -187
  61. agentversion-0.1.0/examples/scenarios/tool-rename-drift.md +0 -90
  62. agentversion-0.1.0/spec/hashing.md +0 -64
  63. {agentversion-0.1.0 → agentversion-0.2.0}/.gitignore +0 -0
  64. {agentversion-0.1.0 → agentversion-0.2.0}/CONFORMANCE.md +0 -0
  65. {agentversion-0.1.0 → agentversion-0.2.0}/LICENSE +0 -0
  66. {agentversion-0.1.0 → agentversion-0.2.0}/adrs/0000-template.md +0 -0
  67. {agentversion-0.1.0 → agentversion-0.2.0}/adrs/0001-version-spec-core.md +0 -0
  68. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/_shared.py +0 -0
  69. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/constants.py +0 -0
  70. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/dataset.py +0 -0
  71. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/py.typed +0 -0
  72. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/refs.py +0 -0
  73. {agentversion-0.1.0 → agentversion-0.2.0}/agentversion/replay.py +0 -0
  74. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/environment-region-change/after.json +0 -0
  75. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/environment-region-change/before.json +0 -0
  76. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/environment-region-change/expected-diff.json +0 -0
  77. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/model-runtime-provider-change/after.json +0 -0
  78. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/model-runtime-provider-change/before.json +0 -0
  79. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/model-runtime-provider-change/expected-diff.json +0 -0
  80. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/output-schema-change/after.json +0 -0
  81. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/output-schema-change/before.json +0 -0
  82. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/prompt-stack-edit/after.json +0 -0
  83. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/prompt-stack-edit/before.json +0 -0
  84. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/prompt-stack-edit/expected-diff.json +0 -0
  85. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/skill-registry-skill-removed/after.json +0 -0
  86. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/skill-registry-skill-removed/before.json +0 -0
  87. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/skill-registry-skill-removed/expected-diff.json +0 -0
  88. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/subagent-handoff-change/after.json +0 -0
  89. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/subagent-handoff-change/before.json +0 -0
  90. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/subagent-handoff-change/expected-diff.json +0 -0
  91. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/tool-rename/after.json +0 -0
  92. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/tool-rename/before.json +0 -0
  93. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/tool-rename/expected-diff.json +0 -0
  94. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/workflow-graph-change/after.json +0 -0
  95. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/workflow-graph-change/before.json +0 -0
  96. {agentversion-0.1.0 → agentversion-0.2.0}/compatibility-tests/workflow-graph-change/expected-diff.json +0 -0
  97. {agentversion-0.1.0 → agentversion-0.2.0}/examples/.gitkeep +0 -0
  98. {agentversion-0.1.0 → agentversion-0.2.0}/examples/manifest/finance-agent-v1.json +0 -0
  99. {agentversion-0.1.0 → agentversion-0.2.0}/examples/manifest/finance-agent-v2.json +0 -0
  100. {agentversion-0.1.0 → agentversion-0.2.0}/pyrightconfig.json +0 -0
  101. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/.gitkeep +0 -0
  102. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/compatibility-batch.schema.json +0 -0
  103. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/compatibility-policy.schema.json +0 -0
  104. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/compatibility-report.schema.json +0 -0
  105. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/dataset-snapshot.schema.json +0 -0
  106. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/episode.schema.json +0 -0
  107. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/replay-job.schema.json +0 -0
  108. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/replay-result.schema.json +0 -0
  109. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/step.schema.json +0 -0
  110. {agentversion-0.1.0 → agentversion-0.2.0}/schemas/task.schema.json +0 -0
  111. {agentversion-0.1.0 → agentversion-0.2.0}/spec/data-classification.md +0 -0
  112. {agentversion-0.1.0 → agentversion-0.2.0}/spec/dataset.md +0 -0
  113. {agentversion-0.1.0 → agentversion-0.2.0}/spec/environment.md +0 -0
  114. {agentversion-0.1.0 → agentversion-0.2.0}/spec/evaluation.md +0 -0
  115. {agentversion-0.1.0 → agentversion-0.2.0}/spec/ids.md +0 -0
  116. {agentversion-0.1.0 → agentversion-0.2.0}/spec/lifecycle.md +0 -0
  117. {agentversion-0.1.0 → agentversion-0.2.0}/spec/otel-mapping.md +0 -0
  118. {agentversion-0.1.0 → agentversion-0.2.0}/spec/replay-determinism.md +0 -0
  119. {agentversion-0.1.0 → agentversion-0.2.0}/spec/replay.md +0 -0
  120. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_conformance.py +0 -0
  121. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_dataset.py +0 -0
  122. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_decision_replay.py +0 -0
  123. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_environment.py +0 -0
  124. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_evaluation.py +0 -0
  125. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_lifecycle.py +0 -0
  126. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_manifest.py +0 -0
  127. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_refs.py +0 -0
  128. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_reproducible_replay.py +0 -0
  129. {agentversion-0.1.0 → agentversion-0.2.0}/tests/test_trust_observability.py +0 -0
@@ -0,0 +1,19 @@
+ version: 2
+ updates:
+   - package-ecosystem: "pip"
+     directory: "/"
+     schedule:
+       interval: "weekly"
+     open-pull-requests-limit: 3
+
+   - package-ecosystem: "github-actions"
+     directory: "/"
+     schedule:
+       interval: "weekly"
+     open-pull-requests-limit: 3
+     ignore:
+       # CI audit 2026-06-21: skip MAJOR github-actions bumps — they churn + break
+       # (actions/checkout@v7 broke docs CI). Take majors deliberately + tested;
+       # minor/patch (incl. security) still auto-PR.
+       - dependency-name: "*"
+         update-types: ["version-update:semver-major"]
@@ -18,9 +18,9 @@ jobs:
          python-version: ["3.10", "3.11", "3.12"]
 
      steps:
-       - uses: actions/checkout@v4
+       - uses: actions/checkout@v6
 
-       - uses: actions/setup-python@v5
+       - uses: actions/setup-python@v6
          with:
            python-version: ${{ matrix.python-version }}
            cache: pip
@@ -41,9 +41,9 @@ jobs:
      name: Lint
      runs-on: ubuntu-latest
      steps:
-       - uses: actions/checkout@v4
+       - uses: actions/checkout@v6
 
-       - uses: actions/setup-python@v5
+       - uses: actions/setup-python@v6
          with:
            python-version: "3.12"
            cache: pip
@@ -27,9 +27,9 @@ jobs:
        matrix:
          python-version: ["3.10", "3.11", "3.12"]
      steps:
-       - uses: actions/checkout@v4
+       - uses: actions/checkout@v6
 
-       - uses: actions/setup-python@v5
+       - uses: actions/setup-python@v6
          with:
            python-version: ${{ matrix.python-version }}
 
@@ -46,9 +46,9 @@ jobs:
      environment: pypi
 
      steps:
-       - uses: actions/checkout@v4
+       - uses: actions/checkout@v6
 
-       - uses: actions/setup-python@v5
+       - uses: actions/setup-python@v6
          with:
            python-version: "3.12"
 
@@ -69,4 +69,7 @@ jobs:
            echo "Version match: $PKG_VERSION"
 
        - name: Publish to PyPI
-         uses: pypa/gh-action-pypi-publish@release/v1
+         # REL-3 (2026-06-20 audit): SHA-pinned (was the mutable @release/v1 branch ref).
+         # This job holds OIDC id-token:write → PyPI publish authority; a mutable ref would
+         # run whatever HEAD of that branch is at publish time. Dependabot maintains the pin.
+         uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # release/v1
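The REL-3 comment in this hunk replaces a mutable branch ref with a full commit SHA. As a rough sketch of how that policy could be linted, the check below tests whether a `uses:` reference ends in a 40-character hex ref. The `is_sha_pinned` helper is hypothetical: it is not part of `agentversion` or of GitHub Actions, and it treats every non-SHA ref (tag or branch) as mutable.

```python
import re

# Hypothetical helper, not part of agentversion. A ref made of exactly
# 40 lowercase hex characters is treated as a full commit SHA; anything
# else (a tag such as v6, a branch such as release/v1) counts as mutable.
_SHA_REF = re.compile(r"[0-9a-f]{40}")


def is_sha_pinned(uses: str) -> bool:
    """Return True if an 'owner/repo@ref' action reference pins a commit SHA."""
    _, sep, ref = uses.partition("@")
    return bool(sep) and _SHA_REF.fullmatch(ref) is not None
```

Run over the two versions of the publish step, the old `@release/v1` reference would be flagged and the new SHA-pinned one would pass.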
@@ -7,6 +7,99 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
 
8
8
  > **Package version ≠ spec version.** This file tracks the **package** version. The on-the-wire `spec_version` is independent and frozen at `1.0.0`; a pre-1.0 package can implement a stable 1.0 spec, which is exactly the situation today.
9
9
 
10
+ ## [0.2.0] - 2026-06-24
11
+
12
+ ### Added
13
+ - **`behavioral_policy` contract surface** — a first-class surface for a multi-turn agent's behavioral
14
+ policy (the rules it holds across turns: a refund/escalation policy, etc.), bound to skillevaluation's
15
+ conversation-mode `policy_check`. A change to the RULES diffs as **breaking** → `replay`/`drop`, where
16
+ previously a policy flip lived only in the prompt-stack hash and read `non_breaking` → `keep`, silently
17
+ retaining a now-invalid conversation eval set. Optional + omitted by default, so existing manifests'
18
+ `overall_hash` is **unchanged**. Reason code `behavioral_policy_changed`; spec `spec/behavioral-policy.md`.
19
+ - **A2A Agent Card mapping** (`a2a.manifest_to_agent_card`, exported): project a manifest onto an
20
+ [A2A (Agent2Agent)](https://a2a-protocol.org/) Agent Card, stamping the manifest's identity
21
+ (`manifest_id` + `overall_hash`) under an `x-agentversion` provenance block so a card consumer can pin
22
+ the exact versioned contract it describes. Positions AgentVersion as the version/diff/provenance layer
23
+ **on top of** the A2A interop standard rather than a competing descriptor. Spec: `spec/a2a-mapping.md`.
24
+ - **Extension hatch** (`AgentContract` is now `extra="allow"`): a custom / emerging contract surface
25
+ (RAG corpus, MCP server registry, memory policy, vendor extension) is hashed by the hasher and diffed
26
+ by the engine, and now also **survives `model_validate`** — previously it was silently dropped, so a
27
+ validate→re-serialize→re-hash round-trip changed `overall_hash` (a moat-breaking non-determinism). The
28
+ surface set can grow without forking the model. ASCII/known-surface hashes are unchanged.
29
+ - **`COMPONENT_TYPE_TO_SURFACE` + `surface_key_for_component()`** — the canonical routing from a
30
+ producer's flat `component_type` to the contract surface key it lands in (incl. the singular→plural
31
+ `guardrail`→`guardrails` rename). Exported so a producer (the SDK exporter) and a consumer (a diff
32
+ translator) share one source of truth instead of hand-copying the map — closing a cross-stack drift
33
+ class where the rename had to be applied independently in every translator copy.
34
+ - **`contract.contract_from_components()`** — the single source of truth for assembling a contract block
35
+ (every surface, in canonical shape) from a producer's flat component list. Both the DecimalAI SDK
36
+ exporter and the platform's hash path route through it, so they compute the *same* `jcs-sha256`
37
+ identity hash for the same agent — the platform can now make the canonical hash authoritative and a
38
+ customer reproduces the stored hash with the OSS CLI. Closes audit X-2 (two hand-written translators
39
+ with no shared code/test).
40
+
41
+ ### Fixed (hash-determinism + trust)
42
+ - **Canonical-hash domain** (`hasher.py`): NFC-normalize every string (keys + values) and reject
43
+ non-finite floats (`NaN`/`±Infinity`) **before** JCS canonicalization. JCS canonicalizes bytes but
44
+ does not Unicode-normalize, so a composed `"café"` and a decomposed `"café"` previously produced a
45
+ different `overall_hash` for the same agent (a cross-language reproduction break in the moat); and a
46
+ `NaN` raised an opaque `jcs` error instead of a clear domain error. NFC of ASCII is a no-op, so
47
+ **existing manifest hashes are unchanged** (frozen vectors still pass). Documented in `spec/hashing.md`.
48
+ - **Attestation integrity** (`validator.py`): an attestation's `signed_payload_hash` must equal the
49
+ manifest's declared `overall_hash` (error on mismatch) — the no-crypto linkage that proves the
50
+ attestation covers *this* manifest. Previously the envelope was parsed but never checked, so a
51
+ copy-pasted/tampered attestation rode along inertly. Cryptographic signature verification remains
52
+ explicitly delegated to a verifier (out of scope for the validator).
53
+
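The canonical-hash fix above normalizes strings to NFC and rejects non-finite floats before canonicalization. The sketch below illustrates that idea with the standard library only. It is not the package's `hasher.py`: the function names are hypothetical, and sorted-key compact JSON is used as a stand-in for real JCS (RFC 8785), which differs in number formatting and key ordering details.

```python
import hashlib
import json
import math
import unicodedata


def normalize(value):
    """NFC-normalize every string (keys and values); reject non-finite floats."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError(f"non-finite float is outside the hash domain: {value!r}")
    if isinstance(value, dict):
        return {normalize(k): normalize(v) for k, v in value.items()}
    if isinstance(value, list):
        return [normalize(v) for v in value]
    return value


def sketch_hash(contract: dict) -> str:
    # Assumption: sorted-key compact JSON stands in for JCS canonicalization.
    canonical = json.dumps(
        normalize(contract),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this normalization step, a composed `"caf\u00e9"` and a decomposed `"cafe\u0301"` hash identically, while pure-ASCII input is left untouched, which is why existing hashes would not move.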
54
+ ### Fixed (correctness audit)
55
+ - **Schema ↔ code drift on `behavioral_policy`** — the bundled `manifest-diff.schema.json` surface enum
56
+ was missing `behavioral_policy`, and neither `decision.REASON_CODES` nor `compatibility-decision.schema.json`
57
+ listed `behavioral_policy_changed`, even though the diff/compat code **emits** both. A diff or decision
58
+ touching the policy surface therefore failed its own bundled schema. Both schemas + `REASON_CODES` now
59
+ include them. (`spec_version` stays `1.0.0` — these are compatible additions to an already-declared surface.)
60
+ - **`model_runtime` reason code** — a model swap mapped to `prompt_policy_changed` (there was no model
61
+ reason code at all). Added `model_runtime_changed` to `REASON_CODES` + the decision schema, and remapped
62
+ the `model_runtime` surface to it.
63
+ - **`output_contract` severity tiers corrected to match the normative spec** (`spec/compatibility-policy.md`):
64
+ was format→moderate, schema→major, strict→(no bump); now **format→minor, schema→moderate, strict→major**
65
+ (a strict-mode flip newly *rejects* previously-valid outputs, so it is the most consumer-breaking). A
66
+ `strict` change is now also `breaking`. The `output-schema-change` conformance fixture's expected severity
67
+ was corrected `major`→`moderate` accordingly: the fixture encoded the buggy output, so this is a defect
68
+ correction, not a change to a stable conformance scenario.
69
+ - **Asymmetric add/remove diffs** — a surface that *appeared* or *vanished* bypassed its dedicated
70
+ classifier and got a flat generic verdict (add→moderate/non_breaking, remove→breaking/major), so
71
+ `diff(A,B)` was not the inverse of `diff(B,A)` (e.g. an added `output_contract{strict:true}` read as a
72
+ bland "moderate" instead of major). Add/remove now route through the dedicated classifier against an
73
+ empty sentinel, so a surface appearing/vanishing is severitied by the same logic as an in-place change.
74
+ Introducing a `behavioral_policy` stays additive (non_breaking); removing one stays breaking.
75
+ - **Model-family extraction missed compact date stamps** — `_extract_model_family` stripped dashed dates
76
+ (`gpt-4o-2024-08-06`) but not compact 8-digit Anthropic dates (`claude-3-5-sonnet-20241022`), so a routine
77
+ model date-rev read as a *different family* → spurious `major`/`replay`. Now strips `\d{8}` (and applies
78
+ iteratively); genuine family changes (`gpt-4` vs `gpt-4o`) are preserved.
79
+ - **Validator hardening** — `CompatibilityDecision.reason_codes` are now validated against `REASON_CODES`
80
+ (the schema enforced this on the wire; the model didn't); `validate_manifest(..., check_schema=True)` is a
81
+ new opt-in pass that validates against `agent-manifest.schema.json` (catches unknown top-level keys the
82
+ Pydantic models silently drop); a manifest whose contract can't be canonically hashed (e.g. a non-finite
83
+ float) is now an **error** (`hash_uncomputable`), not a soft warning that still validated.
84
+ - **Lower-severity:** unnamed subagents are keyed by content hash rather than list position (removing one of
85
+ several no longer reads as a positional rename); `prompt_severity` boundary at exactly 5.0% is now `minor`
86
+ to match the spec's `≤5%` (was `moderate`).
87
+
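The model-family fix above strips both dashed and compact 8-digit date stamps, applied iteratively. A minimal sketch of that behavior follows. The `model_family` name and the exact pattern are assumptions for illustration and are not the package's `_extract_model_family`.

```python
import re

# Trailing date stamp: dashed (-2024-08-06) or compact 8-digit (-20241022).
_DATE_SUFFIX = re.compile(r"-(?:\d{4}-\d{2}-\d{2}|\d{8})$")


def model_family(model: str) -> str:
    """Strip trailing date stamps repeatedly until none remain."""
    while True:
        stripped = _DATE_SUFFIX.sub("", model)
        if stripped == model:
            return model
        model = stripped
```

Because only a trailing date stamp is removed, `gpt-4` and `gpt-4o` still resolve to different families, while two date revisions of the same model resolve to the same one.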
88
+ ### Documentation & packaging
89
+ - **The wheel now bundles `spec/`, `examples/`, and `compatibility-tests/`** (previously only `schemas/`), so a
90
+ `pip install`ed user actually has the files the README points at. Dropped the false `pytest --pyargs agentversion`
91
+ claim (tests aren't bundled) in favor of a clone-and-test snippet.
92
+ - **New `examples/integrations/decimalai_bridge.py`** and a README **"From the DecimalAI SDK"** section showing
93
+ the `decimalai.export_manifest(snap)` → `agentversion diff` round-trip — the seam that makes agentversion the
94
+ open core of the paid platform was previously undocumented. The `decimalai-python` README now references it too.
95
+ - **Fixed the README** `evaluation.gates[]` example (was missing the required `ran_at` → failed validation on
96
+ copy-paste) and regenerated the "See it in action" diff table to match current output (the `environment` row now
97
+ shows real field-level changes instead of a bland "environment added").
98
+ - **`langgraph_example.py`** rewritten to compute real content hashes (was emitting placeholder `sha256:extract_from_*`
99
+ strings) and to actually validate; **`examples/scenarios/walkthrough.py`** is a new runnable, test-covered version
100
+ of the tool-rename drift scenario. A new smoke test runs every example so they can't bit-rot.
101
+ - Stale-terminology fixes: `compatibility-batch.md` example id `rcb_` → `cbt_`.
102
+
10
103
  ## [0.1.0] - 2026-05-29
11
104
 
12
105
  **First published release** — the first `agentversion` release on PyPI.
@@ -57,11 +57,9 @@ The conformance scenarios (`tests/test_conformance.py`) are non-negotiable. If y
57
57
 
58
58
  ## Releases
59
59
 
60
- The maintainer cuts releases. The flow is:
60
+ The maintainer cuts releases — see [`RELEASING.md`](./RELEASING.md) for the full runbook.
61
61
 
62
- 1. PR with the version bump in `pyproject.toml` + a `CHANGELOG.md` entry.
63
- 2. Tag `vX.Y.Z` on `main`.
64
- 3. CI publishes to PyPI via trusted publishing.
62
+ Today releases are published manually with `./scripts/release.sh` (local build + `twine` upload to PyPI). Once the repository is public, the target is tag-triggered CI via [trusted publishing](https://docs.pypi.org/trusted-publishers/), so no token is stored.
65
63
 
66
64
  ## License
67
65
 
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: agentversion
3
- Version: 0.1.0
3
+ Version: 0.2.0
4
4
  Summary: An open specification for versioning agent runtimes and keeping datasets valid.
5
5
  Project-URL: Homepage, https://github.com/decimal-labs/agentversion
6
6
  Project-URL: Documentation, https://github.com/decimal-labs/agentversion/tree/main/spec
@@ -22,30 +22,38 @@ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
22
22
  Classifier: Topic :: Software Development :: Libraries :: Python Modules
23
23
  Requires-Python: >=3.10
24
24
  Requires-Dist: click>=8.0
25
- Requires-Dist: jcs>=0.2
25
+ Requires-Dist: jcs<1,>=0.2
26
26
  Requires-Dist: pydantic>=2.0
27
27
  Requires-Dist: rich>=13.0
28
28
  Provides-Extra: dev
29
29
  Requires-Dist: jsonschema>=4.0; extra == 'dev'
30
- Requires-Dist: mypy>=1.0; extra == 'dev'
30
+ Requires-Dist: mypy<2.2,>=1.0; extra == 'dev'
31
31
  Requires-Dist: pytest-cov>=4.0; extra == 'dev'
32
32
  Requires-Dist: pytest>=7.0; extra == 'dev'
33
- Requires-Dist: ruff>=0.4; extra == 'dev'
33
+ Requires-Dist: ruff<0.16,>=0.15; extra == 'dev'
34
34
  Description-Content-Type: text/markdown
35
35
 
36
36
  # AgentVersion
37
37
 
38
38
  **Your agent changed. Is your saved data still valid?**
39
39
 
40
- `agentversion` turns an agent version into a diffable, hashable contract — so when prompts, tools, models, or graphs change, you know exactly what broke and which traces, eval sets, and training data survived.
41
-
42
- [![CI](https://github.com/decimal-labs/agentversion/actions/workflows/ci.yml/badge.svg)](https://github.com/decimal-labs/agentversion/actions/workflows/ci.yml)
43
40
  [![PyPI](https://img.shields.io/pypi/v/agentversion)](https://pypi.org/project/agentversion/)
44
41
  [![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/agentversion/)
45
- [![Spec](https://img.shields.io/badge/spec-v1.0-success)](https://github.com/decimal-labs/agentversion/tree/main/spec)
46
- [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://github.com/decimal-labs/agentversion/blob/main/LICENSE)
42
+ [![Spec](https://img.shields.io/badge/spec-v1.0-success)](https://pypi.org/project/agentversion/)
43
+ [![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://pypi.org/project/agentversion/)
44
+
45
+ When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT (supervised fine-tuning) examples — quietly drifts out of date. There's no `package.json` to pin an agent's contract, and no `git diff` to tell you what changed.
47
46
 
48
- When you ship a new version of an agent, everything you collected against the old one — production traces, eval datasets, SFT examples — quietly drifts out of date. There's no `package.json` to pin an agent's contract, and no `git diff` to tell you what changed. `agentversion` is that missing format: a JSON **manifest** describing an agent version, a **diff** that classifies every change as breaking or non-breaking, and a **compatibility decision** that tells you whether to keep, repair, replay, or drop your old data.
47
+ `agentversion` is that missing format. Three steps, one per noun:
48
+
49
+ ```
50
+ manifest → diff → compatibility decision
51
+ (what an (what (what to do with the data
52
+ agent changed, you already collected:
53
+ version is) per surface) keep / repair / replay / drop)
54
+ ```
55
+
56
+ A **surface** is one independently-versioned part of the agent — its prompts, its tools, its model, its graph, its output format — each hashed on its own, so any change can be pinned to exactly one of them. A **diff** classifies each changed surface as breaking or non-breaking; a **compatibility decision** turns that into a per-data verdict.
49
57
 
50
58
  It's a dependency-light Python package with a CLI — and an open spec any tool can implement.
51
59
 
@@ -64,14 +72,20 @@ $ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat
64
72
  ┏━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
65
73
  ┃ Surface ┃ Change Type ┃ Details ┃
66
74
  ┡━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
67
- │ environment │ non_breaking │ environment added
75
+ │ environment │ non_breaking │ deployment_id: None → 'prod-east-1'
76
+ │ │ │ region: None → 'us-east-1' │
77
+ │ │ │ infra_image_hash: None → │
78
+ │ │ │ 'sha256:img2img2img2img2img2img2img2img2img2img2im… │
79
+ │ │ │ runtime_versions added=app-runtime,python │
80
+ │ │ │ external_service_pins changed │
81
+ │ │ │ resource_limits changed │
68
82
  │ model_runtime │ breaking │ provider: 'google' → 'openai' │
69
83
  │ │ │ runtime_version: 'app-runtime@1.5.0' → │
70
84
  │ │ │ 'app-runtime@1.8.2' │
71
85
  │ │ │ envelope changed │
72
86
  │ output_contract │ breaking │ format: 'text' → 'json' │
73
- │ │ │ strict: False → True │
74
87
  │ │ │ output schema changed │
88
+ │ │ │ strict: False → True │
75
89
  │ prompt_stack │ non_breaking │ system_prompt hash changed │
76
90
  │ │ │ developer_prompt hash changed │
77
91
  │ subagents │ breaking │ subagents added: ['finance_subagent', │
@@ -96,7 +110,45 @@ $ agentversion diff finance-agent-v1.json finance-agent-v2.json --compat
96
110
 
97
111
  Between v1 and v2 the team swapped the model (Google → OpenAI), renamed a tool, added two subagents, and switched to strict JSON output. `agentversion` caught all five breaking surfaces and told you the old traces need a replay — not a guess, a classification you can gate CI on.
98
112
 
99
- > Try it yourself — both manifests live in [`examples/manifest/`](https://github.com/decimal-labs/agentversion/tree/main/examples/manifest).
113
+ The recommendation is one of **four verdicts** on what to do with each piece of data you collected against the old version:
114
+
115
+ | Verdict | What it means | Typical trigger |
116
+ |---|---|---|
117
+ | `keep` | Still valid as-is. | Only non-breaking surfaces changed. |
118
+ | `repair` | Salvageable with a transform — patch it, don't re-run the agent. | A recoverable output-contract change (the bundled default rules emit `repair` only for output-contract-only breaks). |
119
+ | `replay` | Re-run it through the new version for fresh outputs. | A breaking surface (tool, model, workflow) makes old *outputs* untrustworthy but the *inputs* still apply. |
120
+ | `drop` | No longer usable — discard it. | The inputs themselves no longer apply. (`drop` comes from a custom policy, not the default `diff --compat` rules.) |
121
+
122
+ In the demo above, five breaking surfaces (model swap, tool rename, new subagents, strict-JSON output, new graph) make the old *outputs* stale — but the old *inputs* still apply — so the verdict is `replay`.
123
+
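The verdict table above can be read as a small decision rule. The sketch below encodes only what the table states about the bundled defaults: `keep` when nothing breaking changed, `repair` for output-contract-only breaks, `replay` otherwise, and never `drop`. The `default_verdict` function is hypothetical and is not the package's decision engine.

```python
def default_verdict(breaking_surfaces: set[str]) -> str:
    """Map the set of breaking surfaces to a verdict, per the table's defaults."""
    if not breaking_surfaces:
        # Only non-breaking surfaces changed: the data is still valid as-is.
        return "keep"
    if breaking_surfaces == {"output_contract"}:
        # An output-contract-only break is treated as recoverable by a transform.
        return "repair"
    # Any other breaking surface makes old outputs stale but inputs still apply.
    # "drop" is never returned here: the table says it comes from a custom policy.
    return "replay"
```

For the demo above, the five breaking surfaces include more than `output_contract`, so this rule yields `replay`.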
124
+ ### What a manifest looks like
125
+
126
+ A manifest is plain JSON. The top says *which* version this is; `contract` holds one entry per **surface** — exactly the rows you saw in the diff above:
127
+
128
+ ```jsonc
129
+ {
130
+ "agent_name": "finance-agent",
131
+ "version_label": "2026-03-01.prod.1",
132
+ "identity": {
133
+ "overall_hash": "sha256:47301b25...", // stable id for this whole version
134
+ "hash_algorithm": "jcs-sha256"
135
+ },
136
+ "contract": {
137
+ "prompt_stack": { "system_prompt": { "version": "8", "hash": "sha256:aaa1..." }, "...": "..." },
138
+ "model_runtime": { "provider": "google", "model": "gemini-2.0-flash", "...": "..." },
139
+ "tool_registry": { "registry_version": "5", "tools": [ /* get_market_cap, search_population */ ] },
140
+ "workflow": { "graph_name": "finance-simple-graph", "graph_version": "3", "...": "..." },
141
+ "subagents": [],
142
+ "output_contract": { "format": "text", "strict": false, "...": "..." },
143
+ "guardrails": { "bundle_version": "3", "...": "..." },
144
+ "context_config": { "retrieval_config_version": "5", "...": "..." }
145
+ }
146
+ }
147
+ ```
148
+
149
+ Each surface is hashed on its own, so the diff can say *"`tool_registry` changed, `prompt_stack` didn't"* instead of just *"the manifest changed."*
150
+
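Per-surface hashing is what lets a diff name the surface that changed. A rough sketch of the idea, using hypothetical helper names and sorted-key compact JSON as a stand-in for the spec's `jcs-sha256`:

```python
import hashlib
import json


def surface_hash(surface) -> str:
    # Assumption: sorted-key compact JSON stands in for JCS canonicalization.
    canonical = json.dumps(surface, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_surfaces(old_contract: dict, new_contract: dict) -> list[str]:
    """Return the names of surfaces that were added, removed, or whose hash differs."""
    names = sorted(set(old_contract) | set(new_contract))
    return [
        name
        for name in names
        if name not in old_contract
        or name not in new_contract
        or surface_hash(old_contract[name]) != surface_hash(new_contract[name])
    ]
```

Given two contracts that differ only in `tool_registry`, the function reports that one surface and leaves `prompt_stack` out of the result.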
151
+ > Try it yourself — both [`examples/manifest/`](https://pypi.org/project/agentversion/) manifests ship inside the `agentversion` wheel.
100
152
 
101
153
  ---
102
154
 
@@ -110,7 +162,7 @@ You probably already have observability and a trace store. None of them answer *
110
162
  | A2A / ACP agent cards | runtime discovery + I/O types | version identity or data-compatibility |
111
163
  | OpenAI JSONL / SFT files | a training format | provenance — *which agent version* produced each row |
112
164
 
113
- **Isn't this A2A?** No — and they compose. A2A and ACP answer *"how does Agent A discover and talk to Agent B?"*. `agentversion` answers *"what changed in this agent, and what does that mean for my data?"*. An A2A Agent Card can carry an `agentversion` manifest hash so you know both at once.
165
+ **Isn't this A2A?** No — and they compose. A2A and ACP (the Agent-to-Agent and Agent Communication protocols) answer *"how does Agent A discover and talk to Agent B?"*. `agentversion` answers *"what changed in this agent, and what does that mean for my data?"*. An A2A Agent Card can carry an `agentversion` manifest hash so you know both at once.
114
166
 
115
167
  ---
116
168
 
@@ -120,30 +172,48 @@ You probably already have observability and a trace store. None of them answer *
120
172
  pip install agentversion
121
173
  ```
122
174
 
123
- Apache-2.0, no config — just needs Python 3.10+. It implements the frozen **v1.0 spec**, but the Python package itself is early: `0.1.0`, pre-1.0, with the API still settling.
175
+ Apache-2.0, no config — just needs Python 3.10+.
176
+
177
+ There are **two version numbers**, deliberately different:
178
+
179
+ - the **wire spec** is frozen at **v1.0** (stable format + conformance suite — safe to build against);
180
+ - this **Python package** is **0.1.0** — pre-1.0, so its API may still shift.
181
+
182
+ (The `spec-v1.0` and PyPI badges above show each one.)
124
183
 
125
184
  ## Quickstart
126
185
 
127
- **Diff two versions** (table by default; add `--json` for machine output, `--compat` for a keep/repair/replay/drop recommendation):
186
+ First five minutes: **init → hash → validate → diff → gate in CI**.
187
+
188
+ **1. Scaffold a manifest** for your agent (interactive):
128
189
 
129
190
  ```bash
130
- agentversion diff old-manifest.json new-manifest.json --compat
191
+ agentversion init
131
192
  ```
132
193
 
133
- **Gate breaking changes in CI**: `--fail-on-breaking` exits non-zero when any surface is breaking:
194
+ **2. Get its stable id and check it's valid:**
134
195
 
135
- ```yaml
136
- # .github/workflows/agent.yml
137
- - name: Block breaking agent changes
138
- run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking
196
+ ```bash
197
+ agentversion hash manifest.json # a content hash that ignores key order and
198
+ # whitespace, so the same agent always hashes the
199
+ # same id (JCS-SHA256 = JSON Canonicalization
200
+ # Scheme + SHA-256)
201
+ agentversion validate manifest.json # check it against the spec
139
202
  ```
140
203
 
141
- **Scaffold, hash, and validate** a manifest:
204
+ **3. Diff two versions**, runnable right now against the bundled examples (`--compat` adds the keep/repair/replay/drop recommendation; `--json` gives machine-readable output):
142
205
 
143
206
  ```bash
144
- agentversion init # interactively create a manifest
145
- agentversion hash manifest.json # canonical JCS-SHA256 identity hash
146
- agentversion validate manifest.json # check it against the spec
207
+ agentversion diff examples/manifest/finance-agent-v1.json \
208
+ examples/manifest/finance-agent-v2.json --compat
209
+ ```
210
+
211
+ **4. Gate breaking changes in CI** — `--fail-on-breaking` exits non-zero when any surface is breaking:
212
+
213
+ ```yaml
214
+ # .github/workflows/agent.yml
215
+ - name: Block breaking agent changes
216
+ run: agentversion diff baseline-manifest.json current-manifest.json --fail-on-breaking
147
217
  ```
148
218
 
149
219
  **Use it from Python** — every line below is exercised by the test suite:
@@ -197,7 +267,7 @@ The protocol is fully useful standalone:
197
267
 
198
268
  1. **Track versions locally** — `init` to scaffold, `hash` for a stable id, `diff` between any two. No account, fully offline.
199
269
  2. **Gate CI/CD** — `diff --fail-on-breaking` stops a breaking agent change from reaching production.
200
- 3. **Annotate traces** — stamp `identity.overall_hash` onto your OpenTelemetry spans as `agentversion.manifest_hash` for version-scoped filtering. See [`examples/integrations/otel_mapping.md`](https://github.com/decimal-labs/agentversion/blob/main/examples/integrations/otel_mapping.md).
270
+ 3. **Annotate traces** — stamp `identity.overall_hash` onto your OpenTelemetry spans as `agentversion.manifest_hash` for version-scoped filtering. See [`examples/integrations/otel_mapping.md`](https://pypi.org/project/agentversion/), bundled in the package.
201
271
  4. **Classify data compatibility** — `diff --compat` (or `decision generate`) gives a per-episode keep / repair / replay / drop verdict you can act on.
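
As a rough sketch of item 3 (annotating traces), the span attribute can be derived from a loaded manifest with plain Python. `manifest_span_attributes` is a hypothetical helper, not part of `agentversion`, and the hash value is the truncated placeholder from the example manifest. With the OpenTelemetry Python API, the resulting dict could then be passed to `span.set_attributes(...)`.

```python
def manifest_span_attributes(manifest):
    # Map the manifest's identity hash to the span attribute name used above.
    return {"agentversion.manifest_hash": manifest["identity"]["overall_hash"]}


# Placeholder manifest; a real one would be loaded from manifest.json.
manifest = {
    "agent_name": "finance-agent",
    "identity": {
        "overall_hash": "sha256:47301b25...",
        "hash_algorithm": "jcs-sha256",
    },
}

attrs = manifest_span_attributes(manifest)
print(attrs)
```

Keeping the mapping in one small function means every span from a given agent version carries the same attribute, so traces can be filtered by version.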
202
272
 
203
273
  It interoperates with LangSmith, Langfuse, Phoenix, and W&B — annotate their traces/datasets with a manifest hash, or read/write compatibility decisions alongside your eval pipeline.
@@ -208,13 +278,13 @@ It interoperates with LangSmith, Langfuse, Phoenix, and W&B — annotate their t
208
278
 
209
279
  `agentversion` is an open spec so any tool, in any language, can produce interoperable manifests and diffs:
210
280
 
211
- - [`spec/manifest.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/manifest.md) — the agent manifest
212
- - [`spec/diff.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/diff.md) — surface diffs, breaking vs non-breaking
213
- - [`spec/compatibility-decision.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/compatibility-decision.md) — keep / repair / replay / drop
214
- - [`spec/replay.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/replay.md) · [`spec/dataset.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/dataset.md) — replay jobs and dataset objects with provenance
215
- - [`spec/reference.md`](https://github.com/decimal-labs/agentversion/blob/main/spec/reference.md) — full schemas and validation rules · [`schemas/`](https://github.com/decimal-labs/agentversion/tree/main/schemas) — JSON Schemas
281
+ - [`spec/manifest.md`](https://pypi.org/project/agentversion/) — the agent manifest
282
+ - [`spec/diff.md`](https://pypi.org/project/agentversion/) — surface diffs, breaking vs non-breaking
283
+ - [`spec/compatibility-decision.md`](https://pypi.org/project/agentversion/) — keep / repair / replay / drop
284
+ - [`spec/replay.md`](https://pypi.org/project/agentversion/) · [`spec/dataset.md`](https://pypi.org/project/agentversion/) — replay jobs and dataset objects with provenance
285
+ - [`spec/reference.md`](https://pypi.org/project/agentversion/) — full schemas and validation rules · [`schemas/`](https://pypi.org/project/agentversion/) — JSON Schemas
216
286
 
217
- [`CONFORMANCE.md`](https://github.com/decimal-labs/agentversion/blob/main/CONFORMANCE.md) + [`compatibility-tests/`](https://github.com/decimal-labs/agentversion/tree/main/compatibility-tests) are golden in/out pairs that any implementation must reproduce to claim conformance.
287
+ The full spec and JSON Schemas ship inside the `agentversion` wheel. [`CONFORMANCE.md`](https://pypi.org/project/agentversion/) + [`compatibility-tests/`](https://pypi.org/project/agentversion/) are golden in/out pairs that any implementation must reproduce to claim conformance.
218
288
 
219
289
  ---
220
290
 
@@ -226,27 +296,56 @@ A manifest can carry the eval results that gated its release in `evaluation.gate
226
296
  {
227
297
  "evaluation": {
228
298
  "gates": [
229
- { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972, "passed": true }
299
+ { "name": "regression-suite", "threshold": 0.95, "actual_score": 0.972,
300
+ "passed": true, "ran_at": "2026-03-05T14:00:00Z" }
230
301
  ]
231
302
  }
232
303
  }
233
304
  ```
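
A consumer of the manifest might check these gates before promoting a version. The sketch below is hypothetical: `failed_gates` is not an `agentversion` function, and it assumes a gate passes when `actual_score >= threshold`, which the snippet above suggests but does not state.

```python
def failed_gates(manifest):
    # Collect gates that are marked failed, or whose recorded score
    # falls below the threshold (assumed rule: actual_score >= threshold).
    gates = manifest.get("evaluation", {}).get("gates", [])
    failed = []
    for gate in gates:
        meets_threshold = gate["actual_score"] >= gate["threshold"]
        if not gate["passed"] or not meets_threshold:
            failed.append(gate["name"])
    return failed


manifest = {
    "evaluation": {
        "gates": [
            {"name": "regression-suite", "threshold": 0.95, "actual_score": 0.972,
             "passed": True, "ran_at": "2026-03-05T14:00:00Z"},
        ]
    }
}

print(failed_gates(manifest))  # []
```

Checking the score against the threshold as well as the `passed` flag catches a manifest whose flag and numbers disagree.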
234
305
 
235
- Those scores come from [`skillevaluation`](https://github.com/decimal-labs/skillevaluation), the sibling open spec for A/B benchmarking skills. `agentversion` records *what an agent version is*; `skillevaluation` measures *whether it's better*.
306
+ Those scores come from [`skillevaluation`](https://pypi.org/project/skillevaluation/), the sibling open spec for A/B benchmarking skills. `agentversion` records *what an agent version is*; `skillevaluation` measures *whether it's better*.
236
307
 
237
- The [`decimalai`](https://github.com/decimal-labs/decimalai-python) Python SDK builds on `agentversion` to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.
308
+ The [`decimalai`](https://pypi.org/project/decimalai/) Python SDK builds on `agentversion` to add framework adapters (capture a manifest straight from your LangGraph/CrewAI app), trace capture, and managed replay — but you never need it to use the spec.
309
+
310
+ ---
311
+
312
+ ## From the DecimalAI SDK
313
+
314
+ If you use the [`decimalai`](https://pypi.org/project/decimalai/) SDK you don't hand-write manifests — it captures one straight from your running agent, and `export_manifest` hands it to the OSS tooling here:
315
+
316
+ ```python
317
+ import decimalai
318
+ from decimalai.schema.manifest import extract_from_config
319
+ from agentversion.diff import diff_manifests
320
+ from agentversion.compatibility import classify_compatibility
321
+
322
+ # Capture a manifest from your agent's config (or a framework adapter)…
323
+ snap = extract_from_config(
324
+ agent_name="support-agent",
325
+ prompts={"system": "You are a helpful support assistant."},
326
+ models={"default": {"provider": "openai", "model": "gpt-4o"}},
327
+ )
328
+ manifest = decimalai.export_manifest(snap) # → an agentversion manifest dict
329
+
330
+ # …then this package takes over: diff vs your last prod manifest, gate in CI.
331
+ diff = diff_manifests(last_prod_manifest, manifest)
332
+ print(classify_compatibility(diff).recommended_decision)
333
+ ```
334
+
335
+ This is the seam that makes `agentversion` the **open core** of the paid platform: the manifest the SDK captures *is* the format `agentversion diff` consumes, so you can reproduce the platform's diffs and verdicts entirely outside DecimalAI. A runnable version is in [`examples/integrations/decimalai_bridge.py`](https://pypi.org/project/agentversion/).
238
336
 
239
337
  ---
240
338
 
241
339
  ## Project
242
340
 
243
- The **spec** is stable at v1.0 (frozen wire format and conformance suite). The **package** is `0.1.0`: pre-1.0 under semantic versioning, so the Python API may still shift before it catches up. Design decisions are logged in [`adrs/`](https://github.com/decimal-labs/agentversion/tree/main/adrs), releases in [`CHANGELOG.md`](https://github.com/decimal-labs/agentversion/blob/main/CHANGELOG.md). Contributions — especially new conformance cases — are genuinely welcome; see [`CONTRIBUTING.md`](https://github.com/decimal-labs/agentversion/blob/main/CONTRIBUTING.md):
341
+ The spec is frozen at v1.0; the package is pre-1.0 (see [Install](#install)). Design decisions are logged in [`adrs/`](https://pypi.org/project/agentversion/), releases in [`CHANGELOG.md`](https://pypi.org/project/agentversion/). Contributions — especially new conformance cases — are genuinely welcome; see [`CONTRIBUTING.md`](https://pypi.org/project/agentversion/):
244
342
 
245
343
  ```bash
246
- git clone https://github.com/decimal-labs/agentversion
247
- cd agentversion
248
- pip install -e ".[dev]"
249
- pytest
344
+ pip install agentversion
345
+ agentversion --help
346
+ # run the conformance + unit suite from a clone:
347
+ git clone https://github.com/decimal-labs/agentversion && cd agentversion
348
+ pip install -e ".[dev]" && pytest
250
349
  ```
251
350
 
252
- Licensed under [Apache 2.0](https://github.com/decimal-labs/agentversion/blob/main/LICENSE).
351
+ Licensed under [Apache 2.0](https://pypi.org/project/agentversion/).