codebyplan 1.13.63 → 1.13.65

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. package/dist/cli.js +923 -443
  2. package/package.json +1 -1
  3. package/templates/github-workflows/ci.yml +5 -0
  4. package/templates/hooks/cbp-session-id-stamp.sh +67 -0
  5. package/templates/hooks/cbp-skill-context-guard.sh +33 -13
  6. package/templates/hooks/cbp-test-hooks.sh +271 -0
  7. package/templates/rules/architecture-map.md +6 -0
  8. package/templates/rules/cbp-operating-gotchas.md +23 -9
  9. package/templates/rules/e2e-mandatory.md +5 -0
  10. package/templates/rules/handoff-file-convention.md +65 -0
  11. package/templates/rules/model-invocation-convention.md +1 -1
  12. package/templates/rules/scope-vocabulary.md +5 -0
  13. package/templates/rules/task-routing-recommendation.md +1 -1
  14. package/templates/settings.project.base.json +26 -8
  15. package/templates/skills/cbp-build-cc-agent/SKILL.md +3 -0
  16. package/templates/skills/cbp-build-cc-agent/reference/frontmatter-fields.md +31 -0
  17. package/templates/skills/cbp-build-cc-agent/reference/permission-modes.md +6 -0
  18. package/templates/skills/cbp-build-cc-agent/templates/agent.md +1 -0
  19. package/templates/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md +12 -5
  20. package/templates/skills/cbp-build-cc-skill/SKILL.md +1 -0
  21. package/templates/skills/cbp-build-cc-skill/reference/frontmatter-fields.md +14 -0
  22. package/templates/skills/cbp-build-cc-skill/templates/skill.md +1 -0
  23. package/templates/skills/cbp-checkpoint-complete/SKILL.md +32 -0
  24. package/templates/skills/cbp-clear-continue/SKILL.md +23 -1
  25. package/templates/skills/cbp-clear-prep/SKILL.md +23 -2
  26. package/templates/skills/cbp-finalize/SKILL.md +40 -4
  27. package/templates/skills/cbp-finalize/reference/checkpoint-done-branching.md +1 -1
  28. package/templates/skills/cbp-finalize/reference/next-step-heuristic.md +1 -1
  29. package/templates/skills/cbp-session-end/SKILL.md +23 -25
  30. package/templates/skills/cbp-session-start/SKILL.md +24 -2
  31. package/templates/skills/cbp-standalone-task-complete/SKILL.md +32 -0
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "codebyplan",
3
- "version": "1.13.63",
3
+ "version": "1.13.65",
4
4
  "description": "CLI for CodeByPlan — AI-powered development planning and tracking",
5
5
  "type": "module",
6
6
  "bin": {
package/templates/github-workflows/ci.yml CHANGED
@@ -40,6 +40,8 @@ jobs:
40
40
  ci:
41
41
  name: Lint + typecheck + test + build
42
42
  runs-on: ubuntu-latest
43
+ # Cap the job so a hung step can't burn the 6h default and waste runner minutes.
44
+ timeout-minutes: 20
43
45
  steps:
44
46
  - name: Checkout
45
47
  uses: actions/checkout@v4
@@ -85,6 +87,9 @@ jobs:
85
87
  ci-strict:
86
88
  name: Strict whole-repo green{{STRICT_NAME_SUFFIX}}
87
89
  runs-on: ubuntu-latest{{STRICT_CONTINUE_ON_ERROR_LINE}}
90
+ # Whole-repo green is heavier than the soft tier; allow more headroom but
91
+ # still cap it so a hang can't run to the 6h default.
92
+ timeout-minutes: 30
88
93
  steps:
89
94
  - name: Checkout
90
95
  uses: actions/checkout@v4
package/templates/hooks/cbp-session-id-stamp.sh ADDED
@@ -0,0 +1,67 @@
1
+ #!/bin/bash
2
+ # @scope: org-shared
3
+ # Hook: SessionStart
4
+ # Purpose: Stamp the Claude harness session UUID into
5
+ # .codebyplan/state/session/session-id.json so the CLI create-log path
6
+ # (and other session-aware tools) can correlate log entries with the
7
+ # running Claude session.
8
+ #
9
+ # Hook-safe: all errors are swallowed, always exits 0. Non-fatal / best-effort.
10
+ # No-op when not inside a codebyplan repo or when the session UUID cannot be
11
+ # derived from the hook payload.
12
+ #
13
+ # UUID derivation order:
14
+ # 1. stdin JSON payload `.session_id` field
15
+ # 2. basename (without extension) of `.transcript_path` field
16
+ # 3. no-op — exit 0 without writing
17
+
18
+ # C0 — require jq; if absent, exit 0 (fail-open).
19
+ if ! command -v jq > /dev/null 2>&1; then
20
+ exit 0
21
+ fi
22
+
23
+ # Resolve the project dir: Claude Code sets CLAUDE_PROJECT_DIR; fall back to pwd.
24
+ PROJECT_DIR="${CLAUDE_PROJECT_DIR:-$(pwd)}"
25
+
26
+ # No-op when not inside a codebyplan repo (sentinel file absent).
27
+ if [ ! -f "$PROJECT_DIR/.codebyplan/repo.json" ]; then
28
+ exit 0
29
+ fi
30
+
31
+ # Read stdin once into a variable.
32
+ INPUT=$(cat)
33
+
34
+ # Derive the session UUID — try session_id field first.
35
+ SESSION_UUID=$(printf '%s' "$INPUT" | jq -r '.session_id // empty' 2>/dev/null)
36
+
37
+ # Fall back to transcript_path basename (strip directory + extension).
38
+ if [ -z "$SESSION_UUID" ]; then
39
+ TRANSCRIPT=$(printf '%s' "$INPUT" | jq -r '.transcript_path // empty' 2>/dev/null)
40
+ if [ -n "$TRANSCRIPT" ]; then
41
+ BASENAME=$(basename "$TRANSCRIPT")
42
+ SESSION_UUID="${BASENAME%.*}"
43
+ fi
44
+ fi
45
+
46
+ # If still no UUID, nothing to write — exit gracefully.
47
+ if [ -z "$SESSION_UUID" ]; then
48
+ exit 0
49
+ fi
50
+
51
+ # UUID guard — accept only a canonical UUID (8-4-4-4-12 hex) to avoid
52
+ # stamping garbage values (e.g. a literal "null" or an arbitrary path stem).
53
+ UUID_PATTERN='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
54
+ if ! printf '%s' "$SESSION_UUID" | grep -qE "$UUID_PATTERN"; then
55
+ exit 0
56
+ fi
57
+
58
+ # Ensure the target directory exists.
59
+ SESSION_DIR="$PROJECT_DIR/.codebyplan/state/session"
60
+ mkdir -p "$SESSION_DIR" 2>/dev/null || true
61
+
62
+ # Stamp the session UUID with an ISO timestamp.
63
+ STAMPED_AT=$(date -u +"%Y-%m-%dT%H:%M:%SZ" 2>/dev/null)
64
+ printf '{"claude_session_id":"%s","stamped_at":"%s"}\n' "$SESSION_UUID" "$STAMPED_AT" \
65
+ > "$SESSION_DIR/session-id.json" 2>/dev/null || true
66
+
67
+ exit 0
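The UUID guard and the transcript-basename fallback in the hook above can be exercised in isolation. The following is a minimal sketch, not the hook itself: the helper names `is_canonical_uuid` and `uuid_from_transcript` are hypothetical, and the jq payload parsing is left out so the snippet has no dependencies beyond `grep` and `basename`.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the codebyplan package.

# Same 8-4-4-4-12 hex pattern the hook uses.
UUID_PATTERN='^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'

# Succeeds (exit status 0) only when $1 is a canonical UUID.
is_canonical_uuid() {
  printf '%s' "$1" | grep -qE "$UUID_PATTERN"
}

# Prints the basename of a transcript path with its last extension stripped.
uuid_from_transcript() {
  local base
  base=$(basename "$1")
  printf '%s' "${base%.*}"
}

# Example: derive a candidate from a transcript path, then apply the guard.
candidate=$(uuid_from_transcript "/tmp/sessions/11112222-3333-4444-5555-666677778888.jsonl")
if is_canonical_uuid "$candidate"; then
  echo "would stamp: $candidate"
else
  echo "would skip"
fi
```

Keeping the guard after both derivation paths means an arbitrary transcript file stem is rejected the same way as a malformed `session_id`.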
package/templates/hooks/cbp-skill-context-guard.sh CHANGED
@@ -1,11 +1,16 @@
1
1
  #!/bin/bash
2
2
  # @scope: org-shared
3
3
  # Hook: PreToolUse (Skill)
4
- # Purpose: Deny heavy close-out skills when context window > CBP_CONTEXT_WARN_TOKENS (default 200000).
5
- # Reads transcript_path from stdin, sums the latest assistant message.usage — same logic
6
- # as cbp-context-window-notify.sh. If total exceeds threshold AND the skill is in the
7
- # heavy close-out allowlist, emits hookSpecificOutput.permissionDecision=deny directing
8
- # Claude to run /cbp-clear-prep. Always exits 0 fail-open.
4
+ # Purpose: Deny heavy close-out skills when context window exceeds the model-aware threshold:
5
+ # 200K tokens for standard models (CBP_CONTEXT_WARN_TOKENS, default 200000);
6
+ # 800K tokens for 1M-context models whose id contains "[1m]"
7
+ # (CBP_CONTEXT_WARN_TOKENS_1M, default 800000).
8
+ # Reads transcript_path from stdin, extracts the latest assistant message.usage
9
+ # AND model id in one pass — same logic as cbp-context-window-notify.sh.
10
+ # If total exceeds the tier threshold AND the skill is in the heavy close-out
11
+ # allowlist, emits hookSpecificOutput.permissionDecision=deny directing Claude
12
+ # to run /cbp-clear-prep. Always exits 0 — fail-open.
13
+ # No model id found → standard tier (fail-conservative).
9
14
 
10
15
  set -euo pipefail
11
16
 
@@ -17,8 +22,6 @@ TRANSCRIPT=$(echo "$INPUT" | jq -r '.transcript_path // ""' 2>/dev/null) || TRAN
17
22
  [ -z "$TRANSCRIPT" ] && exit 0
18
23
  [ ! -f "$TRANSCRIPT" ] && exit 0
19
24
 
20
- THRESHOLD="${CBP_CONTEXT_WARN_TOKENS:-200000}"
21
-
22
25
  # Heavy close-out allowlist (cbp-clear-prep + cbp-clear-continue deliberately excluded so
23
26
  # they always run even when context > threshold).
24
27
  HEAVY_SKILLS="cbp-round-build cbp-verify cbp-standalone-task-testing cbp-checkpoint-check cbp-checkpoint-end"
@@ -33,20 +36,37 @@ for heavy in $HEAVY_SKILLS; do
33
36
  done
34
37
  [ "$IS_HEAVY" = "false" ] && exit 0
35
38
 
36
- # Token sum same logic as cbp-context-window-notify.sh
37
- TOTAL=$(tail -n 400 "$TRANSCRIPT" \
39
+ # Extract model id AND token sum from the latest assistant message with usage, in one pass.
40
+ # Outputs two tab-separated fields: MODEL_ID <tab> TOTAL.
41
+ # Use cut -f1/f2 (not read -r A B) to correctly handle an empty first field (no .message.model).
42
+ _TSV=$(tail -n 400 "$TRANSCRIPT" \
38
43
  | jq -rR 'fromjson? | select(.message.usage != null)
39
- | (.message.usage
40
- | ((.input_tokens // 0) + (.cache_creation_input_tokens // 0) + (.cache_read_input_tokens // 0)))' \
41
- 2>/dev/null | tail -1) || TOTAL=0
44
+ | [ (.message.model // ""),
45
+ ((.message.usage.input_tokens // 0)
46
+ + (.message.usage.cache_creation_input_tokens // 0)
47
+ + (.message.usage.cache_read_input_tokens // 0)) ]
48
+ | @tsv' 2>/dev/null | tail -1) || _TSV=""
49
+ MODEL_ID=$(printf '%s' "${_TSV:-}" | cut -f1)
50
+ TOTAL=$(printf '%s' "${_TSV:-}" | cut -f2)
51
+ MODEL_ID="${MODEL_ID:-}"
42
52
  TOTAL="${TOTAL:-0}"
43
53
 
54
+ # Tier selection: [1m] in model id → 1M tier; otherwise standard (fail-conservative)
55
+ if printf '%s' "$MODEL_ID" | grep -qF '[1m]'; then
56
+ THRESHOLD="${CBP_CONTEXT_WARN_TOKENS_1M:-800000}"
57
+ TIER="1M"
58
+ else
59
+ THRESHOLD="${CBP_CONTEXT_WARN_TOKENS:-200000}"
60
+ TIER="standard"
61
+ fi
62
+
44
63
  if [ "$TOTAL" -ge "$THRESHOLD" ] 2>/dev/null; then
45
64
  jq -n \
46
65
  --argjson tokens "$TOTAL" \
47
66
  --argjson threshold "$THRESHOLD" \
48
67
  --arg skill "$SKILL_NAME" \
49
- '{hookSpecificOutput:{permissionDecision:"deny",permissionDecisionReason:("Context window at \($tokens) tokens (threshold \($threshold)) is too large to safely run /\($skill). Run /cbp-clear-prep now to capture a handoff, then /clear, then /cbp-clear-continue to resume.")}}'
68
+ --arg tier "$TIER" \
69
+ '{hookSpecificOutput:{permissionDecision:"deny",permissionDecisionReason:("Context window at \($tokens) tokens (\($tier) tier, threshold \($threshold)) is too large to safely run /\($skill). Run /cbp-clear-prep now to capture a handoff, then /clear, then /cbp-clear-continue to resume.")}}'
50
70
  fi
51
71
 
52
72
  exit 0
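The tier selection introduced in this hunk can be sketched as a standalone function. This is an illustrative reduction under stated assumptions: `select_tier` and `over_threshold` are hypothetical names (the hook inlines the logic), and the model id `example-model[1m]` is made up for the example. The environment variable names and defaults are taken from the diff.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the hook inlines this logic.

# Prints "<tier> <threshold>" for a model id.
# A literal "[1m]" in the id selects the 1M tier; anything else,
# including an empty id, falls back to the standard tier.
select_tier() {
  local model_id="$1"
  case "$model_id" in
    *'[1m]'*) printf '1M %s\n' "${CBP_CONTEXT_WARN_TOKENS_1M:-800000}" ;;
    *)        printf 'standard %s\n' "${CBP_CONTEXT_WARN_TOKENS:-200000}" ;;
  esac
}

# Succeeds when the token total is at or above the tier threshold
# (the hook compares with -ge).
over_threshold() {
  local total="$1" threshold
  threshold=$(select_tier "$2" | cut -d' ' -f2)
  [ "$total" -ge "$threshold" ]
}

select_tier "example-model[1m]"   # made-up model id carrying the [1m] marker
select_tier ""                    # no model id found: standard tier
```

Quoting `'[1m]'` inside the `case` pattern keeps the brackets literal, which plays the same role as `grep -qF` in the hook.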
package/templates/hooks/cbp-test-hooks.sh CHANGED
@@ -374,6 +374,87 @@ fi
374
374
 
375
375
  echo ""
376
376
 
377
+ # ===== HOOK SMOKE TESTS — cbp-mcp-caller-worktree-inject =====
378
+ echo "## Hook Smoke Tests — cbp-mcp-caller-worktree-inject (CHK-198)"
379
+
380
+ INJECT_HOOK="$HOOKS_DIR/cbp-mcp-caller-worktree-inject.sh"
381
+ # Absolute path — the fail-open test runs the hook from a temp cwd (to isolate it
382
+ # from this repo's git context), where the relative "$HOOKS_DIR" no longer resolves.
383
+ INJECT_HOOK_ABS="$(cd "$HOOKS_DIR" 2>/dev/null && pwd)/cbp-mcp-caller-worktree-inject.sh"
384
+
385
+ if [ ! -f "$INJECT_HOOK" ]; then
386
+ test_result "cbp-mcp-caller-worktree-inject.sh present" "passed" "missing"
387
+ else
388
+ test_result "cbp-mcp-caller-worktree-inject.sh present" "passed" "passed"
389
+
390
+ FIRST_LINE=$(head -1 "$INJECT_HOOK")
391
+ if echo "$FIRST_LINE" | grep -q '^#!/'; then
392
+ test_result "cbp-mcp-caller-worktree-inject.sh has shebang" "passed" "passed"
393
+ else
394
+ test_result "cbp-mcp-caller-worktree-inject.sh has shebang" "passed" "missing"
395
+ fi
396
+
397
+ if grep -q '@scope: org-shared' "$INJECT_HOOK"; then
398
+ test_result "cbp-mcp-caller-worktree-inject.sh has @scope: org-shared" "passed" "passed"
399
+ else
400
+ test_result "cbp-mcp-caller-worktree-inject.sh has @scope: org-shared" "passed" "missing"
401
+ fi
402
+
403
+ # Fail-open: run from a non-repo temp dir with no worktree cache and no
404
+ # CLAUDE_PROJECT_DIR — neither the cache nor the CLI fallback can resolve a
405
+ # worktree, so the hook must exit 0 with empty stdout (no updatedInput).
406
+ ISO=$(mktemp -d)
407
+ OUTPUT=$( (cd "$ISO" && env -u CLAUDE_PROJECT_DIR bash "$INJECT_HOOK_ABS" <<< '{"tool_input":{"task_id":"x"}}') 2>/dev/null )
408
+ EXIT_CODE=$?
409
+ if [ "$EXIT_CODE" = "0" ] && [ -z "$OUTPUT" ]; then
410
+ test_result "cbp-mcp-caller-worktree-inject.sh fail-open (unresolvable) exits 0 + empty stdout" "passed" "passed"
411
+ else
412
+ test_result "cbp-mcp-caller-worktree-inject.sh fail-open (unresolvable) exits 0 + empty stdout" "passed" "failed (exit=$EXIT_CODE)"
413
+ fi
414
+ rm -rf "$ISO"
415
+
416
+ # C6 — input already carries a non-empty caller_worktree_id → never overwrite;
417
+ # early-return with exit 0 and empty stdout (no resolution attempted).
418
+ OUTPUT=$(echo '{"tool_input":{"caller_worktree_id":"11111111-1111-1111-1111-111111111111"}}' | bash "$INJECT_HOOK" 2>/dev/null)
419
+ EXIT_CODE=$?
420
+ if [ "$EXIT_CODE" = "0" ] && [ -z "$OUTPUT" ]; then
421
+ test_result "cbp-mcp-caller-worktree-inject.sh C6 keeps existing caller_worktree_id (exit 0 + empty stdout)" "passed" "passed"
422
+ else
423
+ test_result "cbp-mcp-caller-worktree-inject.sh C6 keeps existing caller_worktree_id (exit 0 + empty stdout)" "passed" "failed (exit=$EXIT_CODE)"
424
+ fi
425
+
426
+ # Injection — a worktree.local.json whose .branch matches the current git branch
427
+ # makes the cache fast-path resolve. Use a synthetic UUID so the assertion proves
428
+ # the cache value (not the live CLI) was injected. Skipped when no concrete git
429
+ # branch resolves (detached HEAD / non-git checkout) or jq is unavailable.
430
+ CUR_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
431
+ if [ -n "$CUR_BRANCH" ] && [ "$CUR_BRANCH" != "HEAD" ] && command -v jq >/dev/null 2>&1; then
432
+ ISO=$(mktemp -d)
433
+ mkdir -p "$ISO/.codebyplan"
434
+ FAKE_WT="abcdef01-2345-6789-abcd-ef0123456789"
435
+ jq -n --arg b "$CUR_BRANCH" --arg w "$FAKE_WT" \
436
+ '{worktree_id:$w, branch:$b}' > "$ISO/.codebyplan/worktree.local.json"
437
+ OUTPUT=$(CLAUDE_PROJECT_DIR="$ISO" bash "$INJECT_HOOK" <<< '{"tool_input":{"task_id":"x"}}' 2>/dev/null)
438
+ EXIT_CODE=$?
439
+ INJECTED=$(echo "$OUTPUT" | jq -r '.hookSpecificOutput.updatedInput.caller_worktree_id // empty' 2>/dev/null)
440
+ # Sibling-key survival — CC's updatedInput REPLACES tool_input wholesale (it is
441
+ # not a partial merge), so the hook must echo back every original field merged
442
+ # with caller_worktree_id. Assert the non-target sibling key (task_id) survives;
443
+ # this is the assertion gap that let the replace-vs-merge bug ship in round 2.
444
+ PRESERVED=$(echo "$OUTPUT" | jq -r '.hookSpecificOutput.updatedInput.task_id // empty' 2>/dev/null)
445
+ if [ "$EXIT_CODE" = "0" ] && [ "$INJECTED" = "$FAKE_WT" ] && [ "$PRESERVED" = "x" ]; then
446
+ test_result "cbp-mcp-caller-worktree-inject.sh injects caller_worktree_id AND preserves sibling keys" "passed" "passed"
447
+ else
448
+ test_result "cbp-mcp-caller-worktree-inject.sh injects caller_worktree_id AND preserves sibling keys" "passed" "failed (exit=$EXIT_CODE injected=$INJECTED preserved=$PRESERVED)"
449
+ fi
450
+ rm -rf "$ISO"
451
+ else
452
+ test_result "cbp-mcp-caller-worktree-inject.sh injection test (no branch resolvable — skipped)" "passed" "passed"
453
+ fi
454
+ fi
455
+
456
+ echo ""
457
+
377
458
  # ===== HOOK SMOKE TESTS — cbp-session-start-hook =====
378
459
  echo "## Hook Smoke Tests — cbp-session-start-hook (CHK-178)"
379
460
 
@@ -533,6 +614,196 @@ else
533
614
 
534
615
  fi
535
616
 
617
+ # ===== HOOK SMOKE TESTS — cbp-session-id-stamp (CHK-231) =====
618
+ echo "## Hook Smoke Tests — cbp-session-id-stamp (CHK-231)"
619
+
620
+ STAMP_HOOK="$HOOKS_DIR/cbp-session-id-stamp.sh"
621
+
622
+ if [ ! -f "$STAMP_HOOK" ]; then
623
+ test_result "cbp-session-id-stamp.sh present" "passed" "missing"
624
+ else
625
+ test_result "cbp-session-id-stamp.sh present" "passed" "passed"
626
+
627
+ FIRST_LINE=$(head -1 "$STAMP_HOOK")
628
+ if echo "$FIRST_LINE" | grep -q '^#!/'; then
629
+ test_result "cbp-session-id-stamp.sh has shebang" "passed" "passed"
630
+ else
631
+ test_result "cbp-session-id-stamp.sh has shebang" "passed" "missing"
632
+ fi
633
+
634
+ if grep -q '@scope: org-shared' "$STAMP_HOOK"; then
635
+ test_result "cbp-session-id-stamp.sh has @scope: org-shared" "passed" "passed"
636
+ else
637
+ test_result "cbp-session-id-stamp.sh has @scope: org-shared" "passed" "missing"
638
+ fi
639
+
640
+ if bash -n "$STAMP_HOOK" 2>/dev/null; then
641
+ test_result "cbp-session-id-stamp.sh bash -n syntax ok" "passed" "passed"
642
+ else
643
+ test_result "cbp-session-id-stamp.sh bash -n syntax ok" "passed" "failed"
644
+ fi
645
+
646
+ # Case 1: session_id present in payload → session-id.json written with that UUID
647
+ ISO=$(mktemp -d)
648
+ mkdir -p "$ISO/.codebyplan"
649
+ printf '{}' > "$ISO/.codebyplan/repo.json"
650
+ FAKE_UUID="aaaabbbb-cccc-dddd-eeee-ffffffffffff"
651
+ PAYLOAD=$(jq -n --arg s "$FAKE_UUID" '{session_id:$s}')
652
+ EXIT_CODE=$(echo "$PAYLOAD" | CLAUDE_PROJECT_DIR="$ISO" bash "$STAMP_HOOK" >/dev/null 2>&1; echo $?)
653
+ STAMPED_ID=$(jq -r '.claude_session_id // empty' "$ISO/.codebyplan/state/session/session-id.json" 2>/dev/null)
654
+ if [ "$EXIT_CODE" = "0" ] && [ "$STAMPED_ID" = "$FAKE_UUID" ]; then
655
+ test_result "cbp-session-id-stamp.sh session_id present → session-id.json written with UUID" "passed" "passed"
656
+ else
657
+ test_result "cbp-session-id-stamp.sh session_id present → session-id.json written with UUID" "passed" "failed (exit=$EXIT_CODE stamped=$STAMPED_ID)"
658
+ fi
659
+ rm -rf "$ISO"
660
+
661
+ # Case 2: no session_id but transcript_path present (basename is a valid UUID) →
662
+ # UUID derived from transcript basename and written to session-id.json.
663
+ ISO=$(mktemp -d)
664
+ mkdir -p "$ISO/.codebyplan"
665
+ printf '{}' > "$ISO/.codebyplan/repo.json"
666
+ TRANSCRIPT_UUID="11112222-3333-4444-5555-666677778888"
667
+ PAYLOAD=$(jq -n --arg t "/tmp/sessions/${TRANSCRIPT_UUID}.jsonl" '{transcript_path:$t}')
668
+ EXIT_CODE=$(echo "$PAYLOAD" | CLAUDE_PROJECT_DIR="$ISO" bash "$STAMP_HOOK" >/dev/null 2>&1; echo $?)
669
+ STAMPED_ID=$(jq -r '.claude_session_id // empty' "$ISO/.codebyplan/state/session/session-id.json" 2>/dev/null)
670
+ if [ "$EXIT_CODE" = "0" ] && [ "$STAMPED_ID" = "$TRANSCRIPT_UUID" ]; then
671
+ test_result "cbp-session-id-stamp.sh transcript_path fallback → UUID derived from basename" "passed" "passed"
672
+ else
673
+ test_result "cbp-session-id-stamp.sh transcript_path fallback → UUID derived from basename" "passed" "failed (exit=$EXIT_CODE stamped=$STAMPED_ID)"
674
+ fi
675
+ rm -rf "$ISO"
676
+
677
+ # Case 3: neither session_id nor transcript_path → no-op (no file written), exit 0
678
+ ISO=$(mktemp -d)
679
+ mkdir -p "$ISO/.codebyplan"
680
+ printf '{}' > "$ISO/.codebyplan/repo.json"
681
+ PAYLOAD='{"tool_name":"some_tool"}'
682
+ EXIT_CODE=$(echo "$PAYLOAD" | CLAUDE_PROJECT_DIR="$ISO" bash "$STAMP_HOOK" >/dev/null 2>&1; echo $?)
683
+ STAMP_FILE="$ISO/.codebyplan/state/session/session-id.json"
684
+ if [ "$EXIT_CODE" = "0" ] && [ ! -f "$STAMP_FILE" ]; then
685
+ test_result "cbp-session-id-stamp.sh neither present → no-op, exit 0, no file written" "passed" "passed"
686
+ else
687
+ test_result "cbp-session-id-stamp.sh neither present → no-op, exit 0, no file written" "passed" "failed (exit=$EXIT_CODE file_exists=$([ -f "$STAMP_FILE" ] && echo yes || echo no))"
688
+ fi
689
+ rm -rf "$ISO"
690
+
691
+ # Case 4: outside a codebyplan repo (no .codebyplan/repo.json) → no-op, exit 0
692
+ ISO=$(mktemp -d)
693
+ FAKE_UUID="aaaabbbb-cccc-dddd-eeee-ffffffffffff"
694
+ PAYLOAD=$(jq -n --arg s "$FAKE_UUID" '{session_id:$s}')
695
+ EXIT_CODE=$(echo "$PAYLOAD" | CLAUDE_PROJECT_DIR="$ISO" bash "$STAMP_HOOK" >/dev/null 2>&1; echo $?)
696
+ STAMP_FILE="$ISO/.codebyplan/state/session/session-id.json"
697
+ if [ "$EXIT_CODE" = "0" ] && [ ! -f "$STAMP_FILE" ]; then
698
+ test_result "cbp-session-id-stamp.sh no repo.json → no-op, exit 0" "passed" "passed"
699
+ else
700
+ test_result "cbp-session-id-stamp.sh no repo.json → no-op, exit 0" "passed" "failed (exit=$EXIT_CODE file_exists=$([ -f "$STAMP_FILE" ] && echo yes || echo no))"
701
+ fi
702
+ rm -rf "$ISO"
703
+
704
+ # Case 5: session_id present but not a canonical UUID → guard rejects, no file written, exit 0
705
+ ISO=$(mktemp -d)
706
+ mkdir -p "$ISO/.codebyplan"
707
+ printf '{}' > "$ISO/.codebyplan/repo.json"
708
+ PAYLOAD=$(jq -n '{session_id:"not-a-valid-uuid"}')
709
+ EXIT_CODE=$(echo "$PAYLOAD" | CLAUDE_PROJECT_DIR="$ISO" bash "$STAMP_HOOK" >/dev/null 2>&1; echo $?)
710
+ STAMP_FILE="$ISO/.codebyplan/state/session/session-id.json"
711
+ if [ "$EXIT_CODE" = "0" ] && [ ! -f "$STAMP_FILE" ]; then
712
+ test_result "cbp-session-id-stamp.sh invalid UUID → guard rejects, no file written" "passed" "passed"
713
+ else
714
+ test_result "cbp-session-id-stamp.sh invalid UUID → guard rejects, no file written" "passed" "failed (exit=$EXIT_CODE file_exists=$([ -f "$STAMP_FILE" ] && echo yes || echo no))"
715
+ fi
716
+ rm -rf "$ISO"
717
+
718
+ fi
719
+
720
+ echo ""
721
+
722
+ echo ""
723
+
724
+ # ===== HOOK SMOKE TESTS — cbp-skill-context-guard model-aware tier (CHK-224) =====
725
+ echo "## Hook Smoke Tests — cbp-skill-context-guard model-aware tier (CHK-224)"
726
+
727
+ FIXTURES_GUARD_MODEL="$HOOKS_DIR/__test-fixtures__/cbp-skill-context-guard"
728
+
729
+ # Case M1: standard model @201K → deny (standard tier, 200K threshold)
730
+ if [ -f "$FIXTURES_GUARD_MODEL/over-threshold-standard.jsonl" ]; then
731
+ STDIN=$(jq -n \
732
+ --arg t "$FIXTURES_GUARD_MODEL/over-threshold-standard.jsonl" \
733
+ --arg s "cbp-round-build" \
734
+ '{transcript_path:$t,tool_input:{skill:$s}}')
735
+ OUTPUT=$(echo "$STDIN" | CBP_CONTEXT_WARN_TOKENS=200000 bash "$GUARD_HOOK" 2>/dev/null)
736
+ EXIT_CODE=$?
737
+ if [ "$EXIT_CODE" = "0" ] \
738
+ && echo "$OUTPUT" | jq -e '.hookSpecificOutput.permissionDecision == "deny"' >/dev/null 2>&1; then
739
+ test_result "cbp-skill-context-guard.sh standard model @201K → deny (standard tier)" "passed" "passed"
740
+ else
741
+ test_result "cbp-skill-context-guard.sh standard model @201K → deny (standard tier)" "passed" "failed (exit=$EXIT_CODE output=$(echo "$OUTPUT" | head -c 80))"
742
+ fi
743
+ fi
744
+
745
+ # Case M2: [1m] model @201K → empty stdout (allowed; 201K is below 800K 1M-tier threshold)
746
+ if [ -f "$FIXTURES_GUARD_MODEL/over-threshold-1m.jsonl" ]; then
747
+ STDIN=$(jq -n \
748
+ --arg t "$FIXTURES_GUARD_MODEL/over-threshold-1m.jsonl" \
749
+ --arg s "cbp-round-build" \
750
+ '{transcript_path:$t,tool_input:{skill:$s}}')
751
+ OUTPUT=$(echo "$STDIN" | CBP_CONTEXT_WARN_TOKENS_1M=800000 bash "$GUARD_HOOK" 2>/dev/null)
752
+ EXIT_CODE=$?
753
+ if [ "$EXIT_CODE" = "0" ] && [ -z "$OUTPUT" ]; then
754
+ test_result "cbp-skill-context-guard.sh [1m] model @201K → empty stdout (under 800K 1M tier)" "passed" "passed"
755
+ else
756
+ test_result "cbp-skill-context-guard.sh [1m] model @201K → empty stdout (under 800K 1M tier)" "passed" "failed (exit=$EXIT_CODE output=$(echo "$OUTPUT" | head -c 80))"
757
+ fi
758
+ fi
759
+
760
+ # Case M3: [1m] model @850K → deny (1M tier, 800K threshold)
761
+ if [ -f "$FIXTURES_GUARD_MODEL/over-threshold-1m-denied.jsonl" ]; then
762
+ STDIN=$(jq -n \
763
+ --arg t "$FIXTURES_GUARD_MODEL/over-threshold-1m-denied.jsonl" \
764
+ --arg s "cbp-round-build" \
765
+ '{transcript_path:$t,tool_input:{skill:$s}}')
766
+ OUTPUT=$(echo "$STDIN" | CBP_CONTEXT_WARN_TOKENS_1M=800000 bash "$GUARD_HOOK" 2>/dev/null)
767
+ EXIT_CODE=$?
768
+ if [ "$EXIT_CODE" = "0" ] \
769
+ && echo "$OUTPUT" | jq -e '.hookSpecificOutput.permissionDecision == "deny"' >/dev/null 2>&1; then
770
+ test_result "cbp-skill-context-guard.sh [1m] model @850K → deny (1M tier)" "passed" "passed"
771
+ else
772
+ test_result "cbp-skill-context-guard.sh [1m] model @850K → deny (1M tier)" "passed" "failed (exit=$EXIT_CODE output=$(echo "$OUTPUT" | head -c 80))"
773
+ fi
774
+ fi
775
+
776
+ # Case M4: env override CBP_CONTEXT_WARN_TOKENS=150000 + standard model @201K → deny
777
+ if [ -f "$FIXTURES_GUARD_MODEL/over-threshold-standard.jsonl" ]; then
778
+ STDIN=$(jq -n \
779
+ --arg t "$FIXTURES_GUARD_MODEL/over-threshold-standard.jsonl" \
780
+ --arg s "cbp-round-build" \
781
+ '{transcript_path:$t,tool_input:{skill:$s}}')
782
+ OUTPUT=$(echo "$STDIN" | CBP_CONTEXT_WARN_TOKENS=150000 bash "$GUARD_HOOK" 2>/dev/null)
783
+ EXIT_CODE=$?
784
+ if [ "$EXIT_CODE" = "0" ] \
785
+ && echo "$OUTPUT" | jq -e '.hookSpecificOutput.permissionDecision == "deny"' >/dev/null 2>&1; then
786
+ test_result "cbp-skill-context-guard.sh env CBP_CONTEXT_WARN_TOKENS=150000 + standard @201K → deny" "passed" "passed"
787
+ else
788
+ test_result "cbp-skill-context-guard.sh env CBP_CONTEXT_WARN_TOKENS=150000 + standard @201K → deny" "passed" "failed (exit=$EXIT_CODE output=$(echo "$OUTPUT" | head -c 80))"
789
+ fi
790
+ fi
791
+
792
+ # Case M5: env override CBP_CONTEXT_WARN_TOKENS_1M=900000 + [1m] model @850K → empty stdout (allowed)
793
+ if [ -f "$FIXTURES_GUARD_MODEL/over-threshold-1m-denied.jsonl" ]; then
794
+ STDIN=$(jq -n \
795
+ --arg t "$FIXTURES_GUARD_MODEL/over-threshold-1m-denied.jsonl" \
796
+ --arg s "cbp-round-build" \
797
+ '{transcript_path:$t,tool_input:{skill:$s}}')
798
+ OUTPUT=$(echo "$STDIN" | CBP_CONTEXT_WARN_TOKENS_1M=900000 bash "$GUARD_HOOK" 2>/dev/null)
799
+ EXIT_CODE=$?
800
+ if [ "$EXIT_CODE" = "0" ] && [ -z "$OUTPUT" ]; then
801
+ test_result "cbp-skill-context-guard.sh env CBP_CONTEXT_WARN_TOKENS_1M=900000 + [1m] @850K → empty stdout (override keeps allowed)" "passed" "passed"
802
+ else
803
+ test_result "cbp-skill-context-guard.sh env CBP_CONTEXT_WARN_TOKENS_1M=900000 + [1m] @850K → empty stdout (override keeps allowed)" "passed" "failed (exit=$EXIT_CODE output=$(echo "$OUTPUT" | head -c 80))"
804
+ fi
805
+ fi
806
+
536
807
  # ===== STRUCTURAL ASSERTIONS — cbp-clear-* skills (CHK-217) =====
537
808
  echo ""
538
809
  echo "## Structural Assertions — cbp-clear-* skills (CHK-217)"
package/templates/rules/architecture-map.md CHANGED
@@ -1,3 +1,9 @@
1
+ ---
2
+ paths:
3
+ - "apps/**"
4
+ - "packages/**"
5
+ ---
6
+
1
7
  # Architecture Map
2
8
 
3
9
  When `.claude/architecture/` exists in this repo, per-module map files may be present.
package/templates/rules/cbp-operating-gotchas.md CHANGED
@@ -25,13 +25,11 @@ SHARED tooling behavior only — repo-specific gotchas belong in that repo's own
25
25
  clobbers existing `decisions` / `discoveries` / `check_results`. Always read the current row,
26
26
  merge your change into the full object/array, then write the whole thing back.
27
27
 
28
- - **User-level locks are invisible until a mutation they block.** `get_checkpoints` /
29
- `get_tasks` succeed even when another user holds the assignment; the 403 fires only on
30
- `update_*` / `complete_*`. The lock keys on the JWT user (`ctx.userId`) vs the row's
31
- `assigned_user_id` (null = open). `caller_worktree_id` / `worktree_id` params are
32
- accepted-and-ignored — do not thread them. Verify `assigned_user_id` matches
33
- `npx codebyplan whoami` before mutating; recover a stale assignment with
34
- `release_assignment` (maintainer).
28
+ - **`resolve-worktree` empty output = a NULL `(device, path, branch)` tuple, not a broken
29
+ resolver.** When identity is unresolved the server can collapse the caller to the repo's main
30
+ worktree, so feat-locked writes get rejected. Pass `caller_worktree_id` on every MCP mutation,
31
+ and confirm ownership by matching the row's repo path + branch to the current directory before
32
+ mutating.
35
33
 
36
34
  - **Full-repo lint/type baselines are often pre-existing red.** A round must gate on the files
37
35
  it changed, not the whole-repo baseline — scope lint/tsc checks to the round's changed set so a
@@ -40,14 +38,30 @@ SHARED tooling behavior only — repo-specific gotchas belong in that repo's own
40
38
  - **`complete_task` checks file approval on the round's `files_changed`, not the task's.**
41
39
  Reconcile approvals via `update_round` (set each entry `user_approved: true`), not
42
40
  `update_task` alone — updating only the task leaves the round entries unapproved and
43
- `complete_task` rejects with "files are not approved".
41
+ `complete_task` rejects with "files are not approved". After a merge-main pulls in foreign
42
+ files or carried directory-slash round artifacts, `complete_task` can hard-fail "N files not
43
+ approved"; fix by re-writing each affected round's `files_changed` via `update_round`.
44
44
 
45
- - **CLI transport uses REST (reads) and OAuth+MCP (writes) — a 502 from `codebyplan round sync-approvals` is transient MCP churn, not an outage.** The CLI exits 0 with a warning and MCP tools still work. A missing `CODEBYPLAN_API_KEY` surfaces as an `ApiError`, not a 502. `sync-approvals` can also drag untracked per-device dirs into `files_changed` — run it from the repo root.
45
+ - **CLI transport uses REST (reads) and OAuth+MCP (writes) — a 502 from `codebyplan round sync-approvals` is transient MCP churn, not an outage.** The CLI exits 0 with a warning and MCP tools still work. A missing `CODEBYPLAN_API_KEY` surfaces as an `ApiError`, not a 502. `sync-approvals` can also drag untracked per-device dirs into `files_changed` — run it from the repo root or pass `--caller-worktree-id`.
46
46
 
47
47
  - **`codebyplan claude update` requires a TTY.** On non-TTY stdin (CI, piped) it half-applies then errors. Re-run with `--yes` to accept defaults non-interactively.
48
48
 
49
+ - **Checkpoint locks are invisible until a mutation they block.** `get_checkpoints` / `get_tasks` succeed even when another worktree holds the lock; the 403 fires only on `update_*` / `complete_*`. Verify the row's `worktree_id` matches the caller before mutating. A null-`worktree_id` checkpoint can still be actively shipped by whichever worktree physically holds its feat branch — check `git worktree list` first.
50
+
51
+ - **`update_task` accepts `caller_worktree_id` for lock-verify only — it does NOT assign ownership.** Ownership assignment goes through the web UI or the dedicated assignment path. Don't conflate `caller_worktree_id` with `assigned_worktree_id`.
52
+
49
53
  - **Re-run config-driven gates after merging main into a feat branch.** A merge can add or change `.codebyplan/shipment.json`, ports, branch config, `e2e.json`, and `eslint.json` — treat the post-merge state as a fresh baseline before continuing.
50
54
 
55
+ - **MCP write calls can return 403 via Cloudflare WAF when the JSONB payload contains DDL
56
+ keywords (table-drop statements, function-drop statements, or raw query clauses).** Paraphrase
57
+ or base64-encode any user-provided SQL content before embedding it in a JSONB field — never
58
+ pass raw DDL verbatim in MCP context payloads.
59
+
60
+ - **A "Merge origin/main" commit can integrate ZERO commits if it merged a stale local ref.**
61
+ Always `git fetch` and re-check the behind-count before treating a merge as complete — a
62
+ clean merge with no new commits is a sign the local ref was stale, not that the branch was
63
+ already up to date.
64
+
51
65
  ## Behavioral Preferences
52
66
 
53
67
  - **Never `git stash`** — for any reason. To inspect or compare other state use
package/templates/rules/e2e-mandatory.md CHANGED
@@ -1,3 +1,8 @@
1
+ ---
2
+ paths:
3
+ - "apps/**"
4
+ ---
5
+
1
6
  # E2E Mandatory Run Contract
2
7
 
3
8
  E2E is **opt-out, not opt-in**. Whenever a framework configured in `.codebyplan/e2e.json`
package/templates/rules/handoff-file-convention.md ADDED
@@ -0,0 +1,65 @@
1
+ # Handoff File Convention
2
+
3
+ Per-level handoff notes for cross-session context. Files live under
4
+ `.codebyplan/handoff/` and are committed (negation entry `!.codebyplan/handoff/`
5
+ in the managed `.gitignore` block).
6
+
7
+ ## Layout
8
+
9
+ | Level | Path template | Number format |
10
+ |------------|--------------------------------------------------|-------------------|
11
+ | repo | `.codebyplan/handoff/repo.md` | n/a (per-section) |
12
+ | checkpoint | `.codebyplan/handoff/checkpoint/<NNN>.md` | 3-digit zero-pad |
13
+ | task | `.codebyplan/handoff/task/<NNN>-<T>.md` | CHK zero-pad + T |
14
+ | standalone | `.codebyplan/handoff/standalone/<N>.md` | bare number |
15
+
16
+ ## repo.md — per-worktree sections
17
+
18
+ `repo.md` uses `## <worktree-label>` sections so multiple worktrees can write
19
+ to the same file without conflict. Label resolution order:
20
+
21
+ 1. Read `.codebyplan/state/worktrees.json`; find entry whose `branch` matches
22
+ the current git branch; use its `name`.
23
+ 2. Fallback: git branch name with `/` replaced by `-`.
24
+
25
+ Each CLI verb (`write`, `append`, `clear`) operates on the current worktree's
26
+ section only. Merge conflicts are structurally impossible: two worktrees write
27
+ to different `##` sections of the same file.
28
+
29
+ ## Empty = absent / whitespace-only
30
+
31
+ A handoff is considered **empty** when the file is absent OR its content is
32
+ whitespace-only. The CLI deletes the file when it becomes empty:
33
+
34
+ - Non-repo levels: `write --content ""` and `clear` both delete the file.
35
+ - repo level: `write --content ""` and `clear` remove the current worktree's
36
+ `##` section; the file is deleted when no non-empty sections remain.
37
+
38
+ All delete operations swallow ENOENT (idempotent).
39
+
40
+ ## CLI verbs
41
+
42
+ ```
43
+ codebyplan handoff read --level <l> [--number N] [--task T]
44
+ codebyplan handoff write --level <l> [--number N] [--task T] --content "..."
45
+ codebyplan handoff append --level <l> [--number N] [--task T] --content "..."
46
+ codebyplan handoff clear --level <l> [--number N] [--task T]
47
+ codebyplan handoff status [--json]
48
+ ```
49
+
50
+ ## status --json shape (stable contract)
51
+
52
+ ```json
53
+ {
54
+ "nonEmpty": [{ "level": "checkpoint", "identifier": "005", "path": "/abs/path" }],
55
+ "empty": []
56
+ }
57
+ ```
58
+
59
+ `status` is consumed by TASK-3 session-start/end gates to decide whether a
60
+ handoff prompt is shown. TASK-4 web UI reads the same files via the API.
61
+
62
+ ## Content format
63
+
64
+ Freeform markdown. No structured schema is enforced; the files are human-written
65
+ notes surfaced to Claude at session boundaries.
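The layout table and the fallback label rule in the convention above can be sketched as two small functions. This is an illustration only: `handoff_path` and `fallback_label` are hypothetical names, the real resolution lives in the CLI, and the sketch assumes numbers are passed without leading zeros.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; the real logic lives in the CLI.

# Prints the handoff file path for a level, following the layout table.
# $1 = level, $2 = checkpoint or standalone number, $3 = task number (task level only).
# Assumes $2 has no leading zeros (printf %d would read "010" as octal).
handoff_path() {
  local level="$1" number="${2:-}" task="${3:-}"
  case "$level" in
    repo)       printf '.codebyplan/handoff/repo.md\n' ;;
    checkpoint) printf '.codebyplan/handoff/checkpoint/%03d.md\n' "$number" ;;
    task)       printf '.codebyplan/handoff/task/%03d-%s.md\n' "$number" "$task" ;;
    standalone) printf '.codebyplan/handoff/standalone/%s.md\n' "$number" ;;
    *)          return 1 ;;
  esac
}

# Fallback worktree label: the git branch name with "/" replaced by "-".
fallback_label() {
  printf '%s\n' "$1" | tr '/' '-'
}

handoff_path checkpoint 5        # prints .codebyplan/handoff/checkpoint/005.md
handoff_path task 5 2            # prints .codebyplan/handoff/task/005-2.md
fallback_label feat/handoff-files  # prints feat-handoff-files
```

Only the checkpoint number is zero-padded; the standalone level uses the bare number, as the table specifies.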
package/templates/rules/model-invocation-convention.md CHANGED
@@ -32,7 +32,7 @@ not via `disable-model-invocation`:
32
32
  under what condition, so the intent is auditable and overridable.
33
33
 
34
34
  See `.claude/skills/cbp-build-cc-settings/reference/cbp-permission-policy.md` for the full
35
- `allow` vs `ask` split and the auto-trigger + 200K context-guard model.
35
+ `allow` vs `ask` split and the auto-trigger + model-aware context-guard (200K standard / 800K on `[1m]` sessions).
36
36
 
37
37
  ## Related
38
38
 
package/templates/rules/scope-vocabulary.md CHANGED
@@ -1,3 +1,8 @@
1
+ ---
2
+ paths:
3
+ - ".claude/**"
4
+ ---
5
+
1
6
  # Scope Vocabulary
2
7
 
3
8
  Canonical scope-marker enum for `.claude/` files. The marker classifies a structural file's distribution scope — but it is **required only on user-created files**, not on the assets the `codebyplan` package distributes.
package/templates/rules/task-routing-recommendation.md CHANGED
@@ -45,7 +45,7 @@ After task completion, routes use single-directive form (never A/B/C menus):
45
45
 
46
46
  **Checkpoint-bound task complete:**
47
47
  - More tasks in checkpoint → auto-triggers next task (same context)
48
- - Last task in checkpoint → auto-triggers `cbp-checkpoint-check` (ask-tier permission prompt is the human gate; the 200K context guard handles oversized contexts)
48
+ - Last task in checkpoint → auto-triggers `cbp-checkpoint-check` (ask-tier permission prompt is the human gate; the model-aware context guard handles oversized contexts — 200K standard, 800K on `[1m]` sessions)
49
49
 
50
50
  **Standalone task complete:**
51
51
  - Always → `Next: /cbp-session-end` (or `/cbp-standalone-task-create` for new work)