code-lens-cli 0.7.1__tar.gz → 0.9.1__tar.gz

Files changed (147)
  1. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/cicd/SKILL.md +1 -1
  2. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/cicd/scripts/portability-lint.sh +2 -2
  3. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/code-lookup/SKILL.md +1 -1
  4. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/code-lookup/scripts/classify.sh +1 -1
  5. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/code-lookup/scripts/grep.sh +1 -1
  6. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/code-lookup/scripts/recent.sh +1 -1
  7. code_lens_cli-0.9.1/.claude/skills/eval/SKILL.md +399 -0
  8. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/repo-map/SKILL.md +2 -2
  9. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/repo-map/scripts/connections.sh +1 -1
  10. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/repo-map/scripts/graph.sh +1 -1
  11. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/repo-map/scripts/profile.sh +1 -1
  12. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.github/workflows/publish.yml +5 -5
  13. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.github/workflows/security-checks.yml +2 -2
  14. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.github/workflows/tests.yml +5 -5
  15. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.markdownlint-cli2.yaml +1 -1
  16. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/CHANGELOG.md +27 -0
  17. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/CLAUDE.md +18 -18
  18. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/PKG-INFO +6 -6
  19. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/README.md +2 -2
  20. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/__init__.py +8 -7
  21. code_lens_cli-0.9.1/antoine/__main__.py +8 -0
  22. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/__init__.py +27 -27
  23. code_lens_cli-0.9.1/antoine/cli/_commands/__init__.py +1 -0
  24. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_commands/classify.py +4 -4
  25. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_commands/explain.py +7 -7
  26. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_commands/grep.py +3 -3
  27. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_commands/learn.py +8 -8
  28. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_commands/recent.py +3 -3
  29. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_commands/whoami.py +7 -7
  30. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_errors.py +7 -7
  31. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/cli/_output.py +4 -4
  32. code_lens_cli-0.9.1/antoine/lookup/__init__.py +25 -0
  33. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/lookup/ast_scope.py +1 -1
  34. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/lookup/classify.py +9 -9
  35. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/lookup/grep_context.py +11 -11
  36. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/lookup/recent_outline.py +16 -16
  37. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/lookup/render.py +1 -1
  38. code_lens_cli-0.9.1/antoine/repo/__init__.py +9 -0
  39. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/__main__.py +22 -22
  40. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/connections.py +9 -9
  41. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/errors.py +17 -17
  42. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/graph.py +8 -8
  43. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/manifest.py +2 -2
  44. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/profile.py +2 -2
  45. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/render.py +7 -7
  46. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/culture.yaml +1 -1
  47. code_lens_cli-0.9.1/docs/eval-rounds/2026-05-16-round-02.md +74 -0
  48. code_lens_cli-0.9.1/docs/skill-sources.md +29 -0
  49. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/README.md +2 -2
  50. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/RUNBOOK.md +13 -13
  51. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/_io.py +10 -7
  52. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/backfill.py +1 -1
  53. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/corpus.yaml +1 -1
  54. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/hooks/pre_tool.py +3 -3
  55. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/judge.py +230 -59
  56. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/report.py +43 -30
  57. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/summarize.py +153 -65
  58. code_lens_cli-0.9.1/experiments/scripts_eval/switch-arm.sh +58 -0
  59. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/trial.py +1 -1
  60. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/validate.py +2 -2
  61. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/pyproject.toml +10 -10
  62. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/sonar-project.properties +2 -2
  63. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_hooks_post_tool.py +8 -8
  64. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_hooks_pre_tool.py +11 -11
  65. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_io.py +10 -5
  66. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_judge.py +130 -0
  67. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_report.py +3 -2
  68. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_ast_scope.py +2 -2
  69. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_classify.py +9 -9
  70. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_classify_render.py +2 -2
  71. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_cli_chassis.py +5 -5
  72. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_cli_errors.py +10 -10
  73. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_cli_output.py +8 -8
  74. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_cli_stubs.py +8 -8
  75. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_grep_cmd.py +2 -2
  76. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_grep_context.py +7 -7
  77. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_package.py +12 -12
  78. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_recent_cmd.py +2 -2
  79. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_recent_outline.py +12 -12
  80. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_cli.py +3 -3
  81. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_config.py +2 -2
  82. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_connections.py +5 -5
  83. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_detect.py +2 -2
  84. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_errors.py +4 -4
  85. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_graph.py +5 -5
  86. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_manifest.py +5 -5
  87. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_profile.py +4 -4
  88. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/test_repo_render.py +6 -6
  89. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/uv.lock +51 -51
  90. code_lens_cli-0.7.1/.claude/skills/eval/SKILL.md +0 -263
  91. code_lens_cli-0.7.1/docs/skill-sources.md +0 -29
  92. code_lens_cli-0.7.1/experiments/scripts_eval/switch-arm.sh +0 -91
  93. code_lens_cli-0.7.1/seer/__main__.py +0 -8
  94. code_lens_cli-0.7.1/seer/cli/_commands/__init__.py +0 -1
  95. code_lens_cli-0.7.1/seer/lookup/__init__.py +0 -25
  96. code_lens_cli-0.7.1/seer/repo/__init__.py +0 -9
  97. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/settings.json +0 -0
  98. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/cicd/scripts/_resolve-nick.sh +0 -0
  99. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/cicd/scripts/pr-reply.sh +0 -0
  100. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/cicd/scripts/pr-status.sh +0 -0
  101. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/cicd/scripts/workflow.sh +0 -0
  102. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/communicate/SKILL.md +0 -0
  103. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/communicate/scripts/fetch-issues.sh +0 -0
  104. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/communicate/scripts/mesh-message.sh +0 -0
  105. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/communicate/scripts/post-comment.sh +0 -0
  106. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/communicate/scripts/post-issue.sh +0 -0
  107. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/communicate/scripts/templates/skill-update-brief.md +0 -0
  108. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/run-tests/SKILL.md +0 -0
  109. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/run-tests/scripts/test.sh +0 -0
  110. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/sonarclaude/SKILL.md +0 -0
  111. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/sonarclaude/scripts/sonar.sh +0 -0
  112. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/version-bump/SKILL.md +0 -0
  113. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills/version-bump/scripts/bump.py +0 -0
  114. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.claude/skills.local.yaml.example +0 -0
  115. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.flake8 +0 -0
  116. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.gitignore +0 -0
  117. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/.pre-commit-config.yaml +0 -0
  118. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/LICENSE +0 -0
  119. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/config.py +0 -0
  120. {code_lens_cli-0.7.1/seer → code_lens_cli-0.9.1/antoine}/repo/detect.py +0 -0
  121. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/eval-rounds/2026-05-15-round-01.md +0 -0
  122. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/eval-rounds/2026-05-15-smoke-02-examples.md +0 -0
  123. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/superpowers/plans/2026-05-15-repo-map.md +0 -0
  124. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/superpowers/plans/2026-05-15-scripts-eval-harness.md +0 -0
  125. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/superpowers/plans/2026-05-16-seer-classify.md +0 -0
  126. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/superpowers/specs/2026-05-15-repo-map-design.md +0 -0
  127. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/superpowers/specs/2026-05-15-scripts-eval-harness-design.md +0 -0
  128. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/docs/superpowers/specs/2026-05-16-seer-classify-design.md +0 -0
  129. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/__init__.py +0 -0
  130. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/__init__.py +0 -0
  131. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/corpus.py +0 -0
  132. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/hooks/__init__.py +0 -0
  133. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/hooks/post_tool.py +0 -0
  134. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/judge_rubric.md +0 -0
  135. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/manifest.py +0 -0
  136. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/experiments/scripts_eval/results/.gitkeep +0 -0
  137. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/__init__.py +0 -0
  138. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/__init__.py +0 -0
  139. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/fixtures/.gitkeep +0 -0
  140. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/fixtures/corpus_minimal.yaml +0 -0
  141. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/fixtures/sidechain_min.jsonl +0 -0
  142. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_backfill.py +0 -0
  143. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_corpus.py +0 -0
  144. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_manifest.py +0 -0
  145. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_summarize.py +0 -0
  146. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_trial.py +0 -0
  147. {code_lens_cli-0.7.1 → code_lens_cli-0.9.1}/tests/scripts_eval/test_validate.py +0 -0
@@ -9,7 +9,7 @@ description: >
  review feedback, polling CI status, or the user says "create PR",
  "review comments", "address feedback", "resolve threads". Renamed
  from `pr-review` in steward 0.7.0; rebased on agex in 0.12.0.
- seer-cli divergence: `scripts/portability-lint.sh` drops the GNU-only
+ antoine divergence: `scripts/portability-lint.sh` drops the GNU-only
  `xargs -r` flag for BSD/macOS portability — see `docs/skill-sources.md`.
  ---
 
@@ -21,7 +21,7 @@ esac
  [ -z "$files" ] && { echo "(no files to check)"; exit 0; }
 
  # ----- Check 1: hard-coded /home/<user>/... paths -----
- # seer-cli divergence: `xargs -r` is GNU-only and fails on BSD/macOS xargs.
+ # antoine divergence: `xargs -r` is GNU-only and fails on BSD/macOS xargs.
  # `$files` is already guarded non-empty above, so `-r` is redundant — dropped.
  hits1=$(echo "$files" | xargs grep -nE '/home/[a-z][a-z0-9_-]+/' 2>/dev/null || true)
 
@@ -31,7 +31,7 @@ hits1=$(echo "$files" | xargs grep -nE '/home/[a-z][a-z0-9_-]+/' 2>/dev/null ||
  # - ~/.culture/ Culture mesh data this skill is supposed to read
  md_yaml=$(echo "$files" | grep -E '\.(md|ya?ml|toml|json|jsonc)$' || true)
  if [ -n "$md_yaml" ]; then
- # seer-cli divergence: `xargs -r` is GNU-only; `$md_yaml` is guarded
+ # antoine divergence: `xargs -r` is GNU-only; `$md_yaml` is guarded
  # non-empty by the enclosing `if`, so `-r` is redundant — dropped.
  hits2=$(echo "$md_yaml" | xargs grep -nE '~/\.[A-Za-z]' 2>/dev/null \
  | grep -vE '~/\.claude/skills/[^[:space:]"]+/scripts/' \
@@ -96,5 +96,5 @@ One call each, no re-grepping.
 
  ## Engine
 
- `seer/lookup/` — `python -m seer <verb> …`. Each shell wrapper is a
+ `antoine/lookup/` — `python -m antoine <verb> …`. Each shell wrapper is a
  one-liner; the agent-facing contract is the verb and its flags.
@@ -1,4 +1,4 @@
  #!/usr/bin/env bash
  set -euo pipefail
  PROJECT_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"
- exec uv run --directory "$PROJECT_ROOT" python -m seer classify "$@"
+ exec uv run --directory "$PROJECT_ROOT" python -m antoine classify "$@"
@@ -1,4 +1,4 @@
  #!/usr/bin/env bash
  set -euo pipefail
  PROJECT_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"
- exec uv run --directory "$PROJECT_ROOT" python -m seer grep "$@"
+ exec uv run --directory "$PROJECT_ROOT" python -m antoine grep "$@"
@@ -1,4 +1,4 @@
  #!/usr/bin/env bash
  set -euo pipefail
  PROJECT_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"
- exec uv run --directory "$PROJECT_ROOT" python -m seer recent "$@"
+ exec uv run --directory "$PROJECT_ROOT" python -m antoine recent "$@"
@@ -0,0 +1,399 @@
+ ---
+ name: eval
+ description: >
+ Run one scripts-eval set — one `(target, question)` row from
+ `experiments/scripts_eval/corpus.yaml` × 3 trials × one arm — including
+ tester subagent dispatches + captures, plus (for arm C) judge subagent
+ dispatches + records, then `summarize` + commit to the round's
+ accumulator file. Use when the user says "run eval set", "eval",
+ "scripts-eval", "round-NN set", or asks to execute a row of the corpus.
+ Three arms: A (banned — rider forbids the antoine skills), B (directed
+ — rider instructs use of antoine skills), C (organic — rider permits
+ but doesn't direct). Two judge pairs: A-vs-B ("do the skills help
+ when used") and A-vs-C ("do the skills get adopted organically").
+ `judge prepare --pair AB|AC` selects the pair.
+ ---
+
+ # scripts-eval — running a set
+
+ This skill drives one **set** of the scripts-eval harness:
+ one `(target, question)` row × 3 trials × one arm.
+
+ The harness pipeline (`trial` / `validate` / `judge` / `summarize`) and
+ the corpus (`corpus.yaml`) are repo state — this skill is just the
+ operator procedure that sequences them per session.
+
+ ## When to push back
+
+ Before doing anything, verify the user's intent matches the session
+ state. Stop and ask if any of these hold:
+
+ - `env | grep ANTOINE_EVAL_RUN_ID` is empty → the harness hooks no-op, no
+ metrics get captured. Operator needs to re-launch with the env vars
+ exported.
+ - `ANTOINE_EVAL_ARM` is set to anything other than `A`, `B`, or `C` → bad config.
+ - User says "do arm C" but the matching arm-A cells don't exist on
+ disk under `experiments/scripts_eval/results/$ANTOINE_EVAL_RUN_ID/arm-A/`
+ → arm A must complete first; there's nothing to pair against.
+
+ All three arms run with `repo-map` and `code-lookup` enabled on disk.
+ Arm-A's constraint is **verbal** — the rider in the dispatched prompt
+ is the sole guard against the subagent using the antoine skills. Do not
+ edit the rider; copy it verbatim. (Earlier versions of this skill
+ physically moved `.claude/skills/repo-map/` aside for arm A as
+ defense-in-depth; that step was dropped because the rider proved
+ sufficient and the move-aside dance made operator setup brittle.)
+
+ Three arms, three questions they answer:
+
+ - **A (banned)** — verbal rider forbids both antoine skills. Establishes
+ the "without the new skills" baseline.
+ - **B (directed)** — verbal rider instructs the subagent to use the
+ antoine skills where applicable. Establishes the "with the new skills,
+ when actually used" upper bound.
+ - **C (organic)** — verbal rider permits but does not direct use of
+ the antoine skills. Measures organic adoption rate.
+
+ A-vs-B is the primary "do the skills help?" comparison; A-vs-C is the
+ adoption canary. The judge supports both pairs via the `--pair` flag.
+
+ ## Preflight (every session)
+
+ ```bash
+ env | grep -E "^ANTOINE_EVAL_(RUN_ID|ARM)="
+ # expect both set to the intended round / arm
+ ```
+
+ If unset, export them in your shell before launching `claude`:
+
+ ```bash
+ # arm-A session (banned):
+ export ANTOINE_EVAL_RUN_ID=2026-05-NN-round-XX ANTOINE_EVAL_ARM=A
+ # arm-B session (directed):
+ export ANTOINE_EVAL_RUN_ID=2026-05-NN-round-XX ANTOINE_EVAL_ARM=B
+ # arm-C session (organic):
+ export ANTOINE_EVAL_RUN_ID=2026-05-NN-round-XX ANTOINE_EVAL_ARM=C
+ ```
+
+ `experiments/scripts_eval/switch-arm.sh A|B|C <run_id>` does the same
+ thing.
+
+ If this is the first set of the run (idempotent, safe to re-run):
+
+ ```bash
+ uv run --group experiments python -m experiments.scripts_eval.manifest \
+ init --run $ANTOINE_EVAL_RUN_ID
+ ```
+
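Editorial sketch (not part of the package): the preflight and push-back conditions described above could be checked programmatically along these lines. The function name `preflight_problems` is hypothetical; only the two environment variable names and the allowed arm values `A`, `B`, `C` come from the skill text.

```python
import os

# Allowed arm values, as listed in the skill's "When to push back" section.
ALLOWED_ARMS = {"A", "B", "C"}


def preflight_problems(env):
    """Return human-readable problems; an empty list means the session looks OK.

    `env` is any mapping of environment variables (hypothetical helper,
    not an API of the harness).
    """
    problems = []
    if not env.get("ANTOINE_EVAL_RUN_ID"):
        # Per the skill text, an unset run id means the hooks no-op.
        problems.append("ANTOINE_EVAL_RUN_ID is unset: hooks would capture no metrics")
    arm = env.get("ANTOINE_EVAL_ARM", "")
    if arm not in ALLOWED_ARMS:
        problems.append(f"ANTOINE_EVAL_ARM={arm!r} is not one of A, B, C")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ):
        print(problem)
```

Returning a list rather than raising lets an operator see every misconfiguration in one pass before relaunching the session.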
+ ## Arm-A procedure
+
+ **For each trial in {1, 2, 3}:**
+
+ 1. Read the question template for the target's `question_id` from
+ `experiments/scripts_eval/corpus.yaml`. Look up the target's path
+ from the same file's `targets:` list.
+
+ 2. Substitute `{repo_path}` (or `{workspace_root}` for the workspace
+ question) in the template, then append **verbatim**:
+
+ ```text
+
+ Constraints (verbatim):
+ - You may NOT use the `repo-map` skill, `python -m antoine.repo`,
+ the `antoine.repo` Python module, or any `scripts/*.sh` paths under
+ `.claude/skills/repo-map/`.
+ - You may NOT use the `code-lookup` skill, the `antoine.lookup`
+ Python module, the `antoine grep` / `antoine recent` / `antoine classify`
+ CLI verbs, or any `scripts/*.sh` paths under
+ `.claude/skills/code-lookup/`.
+ If you cannot answer without them, say so explicitly and stop.
+ - Use only Read, Grep, Glob, and Bash.
+ - After answering, append two sections and stop:
+ ### tools_used
+ - <ToolName>: <count> (one line per distinct tool)
+ ### evidence
+ - <one path per line>
+ ```
+
+ 3. **Before dispatch** — start the trial. The script reads
+ `CLAUDE_CODE_SESSION_ID` from env, stamps an in-flight record, and
+ prints the `trial_id` to stdout:
+
+ ```bash
+ TRIAL_ID=$(uv run --group experiments python -m experiments.scripts_eval.trial \
+ start --run $ANTOINE_EVAL_RUN_ID --arm $ANTOINE_EVAL_ARM \
+ --target <target> --question <question_id> --trial <n>)
+ ```
+
+ (For the workspace-scope question, omit `--target`.)
+
+ 4. Dispatch **one** `Explore` subagent with the full prompt.
+
+ 5. After the subagent finishes, end the trial. The script reads the
+ subagent's sidechain transcript from
+ `$HOME/.claude/projects/<encoded_cwd>/<session>/subagents/agent-*.jsonl`
+ and writes the cell JSON:
+
+ ```bash
+ uv run --group experiments python -m experiments.scripts_eval.trial \
+ end --trial-id "$TRIAL_ID"
+ ```
+
+ 6. Confirm the cell JSON appeared under
+ `experiments/scripts_eval/results/$ANTOINE_EVAL_RUN_ID/arm-A/`.
+
+ **After all 3 trials**, summarize + commit:
+
+ ```bash
+ uv run --group experiments python -m experiments.scripts_eval.summarize \
+ --run $ANTOINE_EVAL_RUN_ID \
+ --out docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md
+
+ git add docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md
+ git commit -m "$ANTOINE_EVAL_RUN_ID: arm-A captured for <target>/<question_id> (3 trials)"
+ ```
+
+ Report back: cell count under arm-A/, what's pending for arm-B and
+ arm-C on this set, the next pending set per the run-state table in the
+ accumulator file.
+
+ ## Arm-B procedure
+
+ Arm-B captures the **directed** trials so the A-vs-B judge run can
+ assess "do the skills help when actually used?". Capture happens in
+ its own session (`ANTOINE_EVAL_ARM=B`); the A-vs-B judges then run in
+ the arm-C session's Judge phase, alongside the A-vs-C judges
+ (`judge prepare --pair AB`).
+
+ **For each trial in {1, 2, 3}:**
+
+ 1. Substitute the corpus question template (same target / question
+ resolution as arm A), then append **verbatim** the arm-B rider:
+
+ ```text
+
+ Constraints (verbatim):
+ - For this question, you MUST use the antoine skills where they
+ apply:
+ * `repo-map` (`scripts/profile.sh`, `scripts/connections.sh`,
+ `scripts/graph.sh` under `.claude/skills/repo-map/`) for
+ repo overview, dependencies, and workspace shape.
+ * `code-lookup` (`antoine grep`, `antoine recent`, `antoine classify`,
+ or the equivalent scripts under
+ `.claude/skills/code-lookup/`) for symbol references,
+ recent commit-symbol diffs, and project-kind classification.
+ Only fall back to Read / Grep / Glob / Bash for facts the
+ scripts do not cover.
+ - After answering, append two sections and stop:
+ ### tools_used
+ - <ToolName>: <count> (one line per distinct tool)
+ ### evidence
+ - <one path per line>
+ ```
+
+ 2. Bookend the dispatch with `trial start` and `trial end` exactly
+ as in arm A, just with `--arm B`:
+
+ ```bash
+ TRIAL_ID=$(uv run --group experiments python -m experiments.scripts_eval.trial \
+ start --run $ANTOINE_EVAL_RUN_ID --arm $ANTOINE_EVAL_ARM \
+ --target <target> --question <question_id> --trial <n>)
+ # dispatch one Explore subagent with the rendered prompt above
+ uv run --group experiments python -m experiments.scripts_eval.trial \
+ end --trial-id "$TRIAL_ID"
+ ```
+
+ 3. Confirm the cell JSON appeared under
+ `experiments/scripts_eval/results/$ANTOINE_EVAL_RUN_ID/arm-B/`.
+
+ **After all 3 trials**, summarize + commit:
+
+ ```bash
+ uv run --group experiments python -m experiments.scripts_eval.summarize \
+ --run $ANTOINE_EVAL_RUN_ID \
+ --out docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md
+
+ git add docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md
+ git commit -m "$ANTOINE_EVAL_RUN_ID: arm-B captured for <target>/<question_id> (3 trials)"
+ ```
+
+ Report back: cell count under arm-B/, whether the subagent actually
+ followed the directive (look at the `### tools_used` of each arm-B
+ cell — `B_did_not_use_scripts` is a finding, not a bug), the next
+ pending set per the run-state table.
+
+ ## Arm-C procedure
+
+ **Precondition check (mandatory):**
+
+ ```bash
+ ls experiments/scripts_eval/results/$ANTOINE_EVAL_RUN_ID/arm-A/<target>-<question_id>-t*.json
+ # expect: 3 files (t1, t2, t3)
+ ```
+
+ If fewer than 3, stop — arm A must complete first.
+
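Editorial sketch (not part of the package): the arm-C precondition above amounts to counting arm-A cell files for one set. The helper names `arm_a_cells` and `arm_c_may_start` are hypothetical. The file pattern `<target>-<question_id>-t*.json` is taken from the `ls` command in the skill text, and the naming used for the workspace-scope question (which has no target) is not shown there, so it is not covered here.

```python
from pathlib import Path


def arm_a_cells(results_root, run_id, target, question_id):
    """List arm-A cell files for one (target, question) set, sorted by name."""
    arm_dir = Path(results_root) / run_id / "arm-A"
    if not arm_dir.is_dir():
        # No arm-A directory at all means no cells have been captured.
        return []
    return sorted(arm_dir.glob(f"{target}-{question_id}-t*.json"))


def arm_c_may_start(results_root, run_id, target, question_id, trials=3):
    """True only when every arm-A trial cell for the set is on disk."""
    return len(arm_a_cells(results_root, run_id, target, question_id)) >= trials
```

A call such as `arm_c_may_start("experiments/scripts_eval/results", run_id, target, question_id)` would mirror the manual `ls` check before any arm-C dispatch.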
+ ### Tester phase
+
+ **For each trial in {1, 2, 3}:**
+
+ 1. Substitute the corpus question template (same as arm A) but with the
+ arm-C rider:
+
+ ```text
+
+ Constraints (verbatim):
+ - You may use the `repo-map` skill (and its scripts under
+ `.claude/skills/repo-map/`) and the `code-lookup` skill (and its
+ scripts under `.claude/skills/code-lookup/`) at your discretion.
+ This includes `antoine grep` / `antoine recent` / `antoine classify`.
+ - After answering, append two sections and stop:
+ ### tools_used
+ - <ToolName>: <count>
+ ### evidence
+ - <one path per line>
+ ```
+
+ 2. Bookend the dispatch with `trial start` and `trial end`:
+
+ ```bash
+ TRIAL_ID=$(uv run --group experiments python -m experiments.scripts_eval.trial \
+ start --run $ANTOINE_EVAL_RUN_ID --arm $ANTOINE_EVAL_ARM \
+ --target <target> --question <question_id> --trial <n>)
+ # dispatch one Explore subagent with the rendered prompt above
+ uv run --group experiments python -m experiments.scripts_eval.trial \
+ end --trial-id "$TRIAL_ID"
+ ```
+
+ (For the workspace-scope question, omit `--target`.)
+
+ ### Judge phase
+
+ Two pairs are judged independently:
+
+ - **A-vs-C** — the original "with vs without (organic)" comparison.
+ - **A-vs-B** — the new "with (directed) vs without" comparison; needs
+ arm-B cells captured first.
+
+ Both pairs use the same `prepare` / `record` flow; only `--pair`
+ (`AC` or `AB`) and the matching `--blind-label-for-<arm>` flags differ.
+
+ **For each trial in {1, 2, 3}**, run the A-vs-C judge first (if arm-C
+ cells exist) and then the A-vs-B judge (if arm-B cells exist):
+
+ 1. Prepare the blinded job. `--pair` defaults to `AC`; pass `--pair AB`
+ for the A-vs-B run.
+
+ ```bash
+ uv run --group experiments python -m experiments.scripts_eval.judge \
+ prepare --run $ANTOINE_EVAL_RUN_ID \
+ --pair AC \
+ --pair-key <target>/<question_id>/<n> \
+ --seed 0 > /tmp/judge-AC-<n>.json
+ ```
+
+ 2. Materialise the prompt to a text file for dispatch (`jq -j`
+ joins without adding a trailing newline, so the bytes match what
+ `prepare` emitted):
+
+ ```bash
+ jq -j '.prompt_text' /tmp/judge-AC-<n>.json > /tmp/judge-AC-<n>.txt
+ ```
+
+ 3. Dispatch the judge subagent. **The description prefix is
+ load-bearing** — the `pre_tool` hook recognises `scripts_eval judge:`
+ and skips logging, so the judge dispatch does not pollute the
+ harness's `raw/` directory:
+
+ - `subagent_type`: `general-purpose`
+ - `description`: `scripts_eval judge: AC <target>/<question_id>/<n>`
+ - `prompt`: the verbatim contents of `/tmp/judge-AC-<n>.txt`
+
+ 4. Capture the subagent's final-text response and record. The blind
+ labels for an AC pair come back as `blind_label_for_A` /
+ `blind_label_for_C`:
+
+ ```bash
+ A_LABEL=$(jq -r .blind_label_for_A /tmp/judge-AC-<n>.json)
+ C_LABEL=$(jq -r .blind_label_for_C /tmp/judge-AC-<n>.json)
+ uv run --group experiments python -m experiments.scripts_eval.judge \
+ record --run $ANTOINE_EVAL_RUN_ID \
+ --pair AC \
+ --pair-key <target>/<question_id>/<n> \
+ --blind-label-for-a "$A_LABEL" \
+ --blind-label-for-c "$C_LABEL" \
+ --verdict-file -
+ ```
+
+ 5. **Repeat the four steps with `--pair AB`** to judge the directed
+ arm. The job JSON for an AB pair carries `blind_label_for_A` and
+ `blind_label_for_B` (no `_C`); use `--blind-label-for-b` instead of
+ `--blind-label-for-c`:
+
+ ```bash
+ uv run --group experiments python -m experiments.scripts_eval.judge \
+ prepare --run $ANTOINE_EVAL_RUN_ID \
+ --pair AB \
+ --pair-key <target>/<question_id>/<n> \
+ --seed 0 > /tmp/judge-AB-<n>.json
+ jq -j '.prompt_text' /tmp/judge-AB-<n>.json > /tmp/judge-AB-<n>.txt
+ # …dispatch general-purpose subagent with description
+ # "scripts_eval judge: AB <target>/<question_id>/<n>" and the txt prompt…
+ A_LABEL=$(jq -r .blind_label_for_A /tmp/judge-AB-<n>.json)
+ B_LABEL=$(jq -r .blind_label_for_B /tmp/judge-AB-<n>.json)
+ uv run --group experiments python -m experiments.scripts_eval.judge \
+ record --run $ANTOINE_EVAL_RUN_ID \
+ --pair AB \
+ --pair-key <target>/<question_id>/<n> \
+ --blind-label-for-a "$A_LABEL" \
+ --blind-label-for-b "$B_LABEL" \
+ --verdict-file -
+ ```
+
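Editorial sketch (not part of the package): step 3 above says the `pre_tool` hook recognises the `scripts_eval judge:` description prefix and skips logging. The hook's real code is not shown in this part of the diff, so the following is only an assumed illustration of a prefix check and of building a matching description. Both function names are hypothetical.

```python
# The prefix string is quoted from the skill text; everything else is assumed.
JUDGE_PREFIX = "scripts_eval judge:"


def judge_description(pair, target, question_id, trial):
    """Build a dispatch description in the form the skill text prescribes."""
    return f"{JUDGE_PREFIX} {pair} {target}/{question_id}/{trial}"


def is_judge_dispatch(description):
    """True when a dispatch should be treated as a judge run and not logged."""
    return description.startswith(JUDGE_PREFIX)
```

Building the description through one helper keeps the prefix in a single place, which matters because a typo in it would silently send judge transcripts into the tester capture path.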
+ If `record` exits non-zero with `non-JSON` / `winner` / `margin` /
+ `blind_label` in the error, re-dispatch the judge subagent for that
+ trial and re-record. `record` is idempotent on replay — the operator's
+ recovery path is "re-dispatch + re-record"; no manual cell editing.
+
+ Storage: AC verdicts land under `cell["judges"]["AC"]` (and are mirrored
+ to `cell["judge"]` for back-compat with pre-phase-2 readers); AB verdicts
+ land under `cell["judges"]["AB"]` only.
+
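Editorial sketch (not part of the package): the storage rule above can be pictured as a small dictionary update. The function `store_verdict` is hypothetical and the verdict payload is treated as an opaque value. Only the key layout (`judges`, `AC`, `AB`, and the legacy `judge` mirror) comes from the skill text.

```python
def store_verdict(cell, pair, verdict):
    """Write a judge verdict into a cell dict following the described layout.

    AC verdicts are also mirrored to the legacy top-level "judge" key;
    AB verdicts are stored under "judges" only. Re-running with the same
    arguments leaves the cell unchanged, matching "idempotent on replay".
    """
    if pair not in ("AC", "AB"):
        raise ValueError(f"unknown pair: {pair!r}")
    cell.setdefault("judges", {})[pair] = verdict
    if pair == "AC":
        cell["judge"] = verdict  # back-compat mirror for older readers
    return cell
```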
362
+ ### Wrap-up
363
+
364
+ ```bash
365
+ uv run --group experiments python -m experiments.scripts_eval.validate \
366
+ --run $ANTOINE_EVAL_RUN_ID
367
+
368
+ uv run --group experiments python -m experiments.scripts_eval.summarize \
369
+ --run $ANTOINE_EVAL_RUN_ID \
370
+ --out docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md
371
+
372
+ git add docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md
373
+ git commit -m "$ANTOINE_EVAL_RUN_ID: completed <target>/<question_id> (both arms + judge)"
374
+ ```
375
+
376
+ Report back:
377
+ - A-vs-B winners (A / B / tie) and A-vs-C winners (A / C / tie).
378
+ - Whether arm B and arm C actually used the antoine scripts (look at the
379
+ `### tools_used` of each cell — `B_did_not_use_scripts` and
380
+ `C_did_not_use_scripts` are findings, not bugs).
381
+ - The next pending set per the run-state table.
382
+
383
+ ## Reading the run state
384
+
385
+ The committed run-state table and per-set verdicts live in
386
+ `docs/eval-rounds/$ANTOINE_EVAL_RUN_ID.md`, between the
387
+ `<!-- runstate:start -->` / `<!-- runstate:end -->` and
388
+ `<!-- evidence:start -->` / `<!-- evidence:end -->` markers. `summarize.py`
389
+ rewrites those regions idempotently — do not hand-edit them.
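An idempotent marker-region rewrite of the kind described above might look like the following. This is a minimal sketch, not the code in `summarize.py`: `replace_region` is a hypothetical name, and it assumes each marker pair appears exactly once in the file.

```python
import re


def replace_region(text: str, name: str, body: str) -> str:
    """Replace everything between <!-- name:start --> and <!-- name:end -->.

    Hypothetical helper. The markers themselves are kept, so running the
    function again with the same body returns the same text.
    """
    start = f"<!-- {name}:start -->"
    end = f"<!-- {name}:end -->"
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    replacement = f"{start}\n{body}\n{end}"
    # A callable replacement avoids backslash-escape processing of the body.
    new_text, count = pattern.subn(lambda _match: replacement, text, count=1)
    if count == 0:
        raise ValueError(f"markers for region {name!r} not found")
    return new_text
```

Since the whole region is regenerated from the markers inward, any hand edits made between them would be lost on the next run, which is why the text warns against editing those regions manually.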
390
+
391
+ The accumulator file is also the operator's source of truth for what's
392
+ pending: a row's `arm-A` or `arm-C` count below `3/3` means more trials
393
+ are needed; `judged` below the arm counts means judges still owe verdicts.
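The pending-work rule above can be expressed as a short check. This sketch makes several assumptions that the text does not state: the counts are passed in as plain integers, `pending_work` is a hypothetical name, and "judged below the arm counts" is read as "fewer verdicts than trials completed by both arms".

```python
def pending_work(arm_a: int, arm_c: int, judged: int, required: int = 3) -> list[str]:
    """List what is still owed for one run-state row (illustrative only)."""
    todo = []
    if arm_a < required:
        todo.append(f"arm-A needs {required - arm_a} more trial(s)")
    if arm_c < required:
        todo.append(f"arm-C needs {required - arm_c} more trial(s)")
    # Assumption: a trial can be judged once both arms have completed it.
    judgeable = min(arm_a, arm_c)
    if judged < judgeable:
        todo.append(f"judge owes {judgeable - judged} verdict(s)")
    return todo
```

An empty list means the row is complete under these assumptions.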
394
+
395
+ ## Cite-don't-import
396
+
397
+ This skill is original to antoine (the harness only exists here). When
398
+ promoted upstream, it would re-vendor into steward's skill suppliers —
399
+ update `docs/skill-sources.md` accordingly at that point.
@@ -97,8 +97,8 @@ Flags always override config.
97
97
 
98
98
  ## Engine
99
99
 
100
- The actual logic lives in `seer/repo/` and is invoked via
101
- `uv run python -m seer.repo <verb>`. The shell scripts are one-line wrappers; the
100
+ The actual logic lives in `antoine/repo/` and is invoked via
101
+ `uv run python -m antoine.repo <verb>`. The shell scripts are one-line wrappers; the
102
102
  agent-facing contract is the verbs and their flags, not the wrappers.
103
103
 
104
104
  > **Interpreter note:** the scripts use `uv run --directory <project-root>`
@@ -1,4 +1,4 @@
1
1
  #!/usr/bin/env bash
2
2
  set -euo pipefail
3
3
  PROJECT_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"
4
- exec uv run --directory "$PROJECT_ROOT" python -m seer.repo connections "$@"
4
+ exec uv run --directory "$PROJECT_ROOT" python -m antoine.repo connections "$@"
@@ -1,4 +1,4 @@
1
1
  #!/usr/bin/env bash
2
2
  set -euo pipefail
3
3
  PROJECT_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"
4
- exec uv run --directory "$PROJECT_ROOT" python -m seer.repo graph "$@"
4
+ exec uv run --directory "$PROJECT_ROOT" python -m antoine.repo graph "$@"
@@ -1,4 +1,4 @@
1
1
  #!/usr/bin/env bash
2
2
  set -euo pipefail
3
3
  PROJECT_ROOT="$(cd "$(dirname "$0")/../../../.." && pwd)"
4
- exec uv run --directory "$PROJECT_ROOT" python -m seer.repo profile "$@"
4
+ exec uv run --directory "$PROJECT_ROOT" python -m antoine.repo profile "$@"
@@ -5,12 +5,12 @@ on:
5
5
  branches: [main]
6
6
  paths:
7
7
  - "pyproject.toml"
8
- - "seer/**"
8
+ - "antoine/**"
9
9
  pull_request:
10
10
  branches: [main]
11
11
  paths:
12
12
  - "pyproject.toml"
13
- - "seer/**"
13
+ - "antoine/**"
14
14
 
15
15
  jobs:
16
16
  test:
@@ -57,7 +57,7 @@ jobs:
57
57
  - name: Build and publish each distribution to TestPyPI
58
58
  run: |
59
59
  set -euo pipefail
60
- for pkg in seer-cli kata-cli code-lens-cli; do
60
+ for pkg in antoine-cli kata-cli code-lens-cli; do
61
61
  echo "::group::TestPyPI publish $pkg"
62
62
  # Run the per-package steps in a subshell so set -e failures
63
63
  # don't skip the ::endgroup:: marker — keeps Actions logs
@@ -81,7 +81,7 @@ jobs:
81
81
  - name: Print install commands
82
82
  if: always()
83
83
  run: |
84
- for pkg in seer-cli kata-cli code-lens-cli; do
84
+ for pkg in antoine-cli kata-cli code-lens-cli; do
85
85
  echo "::notice::Test with: uv tool install --index-url https://test.pypi.org/simple/ --index-strategy unsafe-best-match $pkg==${DEV_VERSION}"
86
86
  done
87
87
 
@@ -105,7 +105,7 @@ jobs:
105
105
  - name: Build and publish each distribution
106
106
  run: |
107
107
  set -euo pipefail
108
- for pkg in seer-cli kata-cli code-lens-cli; do
108
+ for pkg in antoine-cli kata-cli code-lens-cli; do
109
109
  echo "::group::Publishing $pkg"
110
110
  # Run the per-package steps in a subshell so set -e failures
111
111
  # don't skip the ::endgroup:: marker — keeps Actions logs
@@ -25,11 +25,11 @@ jobs:
25
25
  - run: uv sync
26
26
 
27
27
  - name: Run Bandit
28
- run: uv run bandit -r seer/ -f json -o bandit-results.json -c pyproject.toml
28
+ run: uv run bandit -r antoine/ -f json -o bandit-results.json -c pyproject.toml
29
29
  continue-on-error: true
30
30
 
31
31
  - name: Run Pylint
32
- run: uv run pylint seer/ --output-format=json:pylint-results.json,text
32
+ run: uv run pylint antoine/ --output-format=json:pylint-results.json,text
33
33
  continue-on-error: true
34
34
 
35
35
  - name: Upload Security Results
@@ -30,7 +30,7 @@ jobs:
30
30
 
31
31
  - run: uv sync
32
32
 
33
- - run: uv run pytest -n auto --cov=seer --cov-report=xml:coverage.xml --cov-report=term -v
33
+ - run: uv run pytest -n auto --cov=antoine --cov-report=xml:coverage.xml --cov-report=term -v
34
34
 
35
35
  - name: SonarCloud Scan
36
36
  if: env.SONAR_TOKEN != ''
@@ -56,16 +56,16 @@ jobs:
56
56
  - run: uv sync
57
57
 
58
58
  - name: black --check
59
- run: uv run black --check seer tests
59
+ run: uv run black --check antoine tests
60
60
 
61
61
  - name: isort --check
62
- run: uv run isort --check-only seer tests
62
+ run: uv run isort --check-only antoine tests
63
63
 
64
64
  - name: flake8
65
- run: uv run flake8 --config=.flake8 seer/ tests/
65
+ run: uv run flake8 --config=.flake8 antoine/ tests/
66
66
 
67
67
  - name: bandit
68
- run: uv run bandit -c pyproject.toml -r seer
68
+ run: uv run bandit -c pyproject.toml -r antoine
69
69
 
70
70
  - name: markdownlint-cli2
71
71
  run: |
@@ -1,4 +1,4 @@
1
- # markdownlint-cli2 config for seer-cli.
1
+ # markdownlint-cli2 config for antoine.
2
2
  # markdownlint-cli2 stops walking at the git root, so a global
3
3
  # markdownlint config in the user's home directory isn't picked up from
4
4
  # inside the repo. Keep this file aligned with the global preset.
@@ -5,6 +5,33 @@ All notable changes to this project will be documented in this file.
5
5
  Format follows [Keep a Changelog](https://keepachangelog.com/). This project
6
6
  adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7
7
 
8
+ ## [0.9.1] - 2026-05-17
9
+
10
+ ### Changed
11
+
12
+ - PyPI distribution renamed from `antoine` to `antoine-cli` to avoid name collision and stay consistent with the `kata-cli` / `code-lens-cli` alt-publish naming convention. The Python module (`antoine`) and console script (`antoine`) are unchanged; only the wheel-distribution name moves. `_resolve_version()` fallback list and `.github/workflows/publish.yml` publish loop updated to match.
13
+
14
+ ## [0.9.0] - 2026-05-17
15
+
16
+ ### Changed
17
+
18
+ - **Repository rename: `seer-cli` → `antoine`.** GitHub remote moved to `agentculture/antoine`; primary PyPI distribution renamed from `seer-cli` to `antoine`; `kata-cli` and `code-lens-cli` alt-publishes preserved. Python module renamed `seer/` → `antoine/`; primary console script renamed `seer` → `antoine` (the `kata` alias is retained). All imports, error classes (`SeerError` → `AntoineError`, `_SeerArgumentParser` → `_AntoineArgumentParser`), env vars (`SEER_EVAL_*` → `ANTOINE_EVAL_*`), Sonar project key (`agentculture_seer-cli` → `agentculture_antoine`), `culture.yaml` agent suffix, vendored skill bodies, and the scripts-eval harness's banned-pattern detection updated accordingly. Historical `CHANGELOG.md` entries, `docs/eval-rounds/`, and dated `docs/superpowers/{specs,plans}/` files are intentionally left referring to `seer` — those describe past state.
19
+
20
+ ## [0.8.0] - 2026-05-16
21
+
22
+ ### Added
23
+
24
+ - scripts-eval: arm-B (directed-use) — rider explicitly instructs the subagent to use repo-map + code-lookup scripts. Cells captured under `results/<run>/arm-B/`. `corpus.yaml` arms field becomes `[A, B, C]`.
25
+ - scripts-eval: `judge.py` is now pair-aware. `judge prepare` / `judge record` take `--pair AC|AB|BC`; new label flag `--blind-label-for-b`. Verdicts land under `cell["judges"][pair]`; the AC pair still mirrors to legacy `cell["judge"]` for back-compat. New `iter_jobs_pair` / `record_verdict_pair` public APIs; old `iter_jobs` / `record_verdict` stay as AC-pair wrappers.
26
+ - scripts-eval: summarize.py renders both A-vs-B and A-vs-C winner tallies in the run-state table, with per-pair verdict tables in the evidence section.
27
+ - eval skill: arm-A rider tightened to also forbid the code-lookup skill / seer.lookup / seer grep/recent/classify so "without" means without both new skills. Added arm-B procedure section and pair-aware judge procedure (--pair AB | AC).
28
+
29
+ ### Changed
30
+
31
+ - eval skill: switch-arm.sh no longer moves .claude/skills/repo-map/ on disk; arm-A relies on the verbal rider alone (rider proved sufficient; move-aside dance made operator setup brittle).
32
+ - scripts-eval: report.py violation patterns now also flag code-lookup script use; report aggregates median validation across all three arms; per-cell view shows every captured arm.
33
+ - scripts-eval: validate.py / backfill.py iterate over _io.ARMS instead of hardcoding (A, C).
34
+
8
35
  ## [0.7.1] - 2026-05-16
9
36
 
10
37
  ### Changed