claude-dev-env 1.48.0 → 1.49.1

Files changed (39)
  1. package/audit-rubrics/category_rubrics/category-a-api-contracts.md +72 -0
  2. package/audit-rubrics/category_rubrics/category-b-selector-engine-compat.md +36 -0
  3. package/audit-rubrics/category_rubrics/category-c-resource-cleanup.md +35 -0
  4. package/audit-rubrics/category_rubrics/category-d-scoping-and-ordering.md +35 -0
  5. package/audit-rubrics/category_rubrics/category-e-dead-code.md +38 -0
  6. package/audit-rubrics/category_rubrics/category-f-silent-failures.md +38 -0
  7. package/audit-rubrics/category_rubrics/category-g-bounds-and-overflow.md +38 -0
  8. package/audit-rubrics/category_rubrics/category-h-security-boundaries.md +40 -0
  9. package/audit-rubrics/category_rubrics/category-i-concurrency.md +38 -0
  10. package/audit-rubrics/category_rubrics/category-j-code-rules-compliance.md +46 -0
  11. package/audit-rubrics/category_rubrics/category-k-codebase-conflicts.md +59 -0
  12. package/audit-rubrics/category_rubrics/category-l-behavior-equivalence.md +45 -0
  13. package/audit-rubrics/category_rubrics/category-m-producer-consumer-cardinality.md +44 -0
  14. package/audit-rubrics/category_rubrics/category-n-test-name-scenario-verifier.md +45 -0
  15. package/audit-rubrics/prompts/category-a-api-contracts.md +384 -0
  16. package/audit-rubrics/prompts/category-b-selector-engine-compat.md +401 -0
  17. package/audit-rubrics/prompts/category-c-resource-cleanup.md +420 -0
  18. package/audit-rubrics/prompts/category-d-scoping-and-ordering.md +414 -0
  19. package/audit-rubrics/prompts/category-e-dead-code.md +420 -0
  20. package/audit-rubrics/prompts/category-f-silent-failures.md +420 -0
  21. package/audit-rubrics/prompts/category-g-bounds-and-overflow.md +383 -0
  22. package/audit-rubrics/prompts/category-h-security-boundaries.md +423 -0
  23. package/audit-rubrics/prompts/category-i-concurrency.md +429 -0
  24. package/audit-rubrics/prompts/category-j-code-rules-compliance.md +463 -0
  25. package/audit-rubrics/prompts/category-k-codebase-conflicts.md +328 -0
  26. package/audit-rubrics/prompts/category-l-behavior-equivalence.md +128 -0
  27. package/audit-rubrics/prompts/category-m-producer-consumer-cardinality.md +129 -0
  28. package/audit-rubrics/prompts/category-n-test-name-scenario-verifier.md +132 -0
  29. package/audit-rubrics/source-material-section-types.md +51 -0
  30. package/hooks/blocking/plain_language_blocker.py +184 -0
  31. package/hooks/blocking/pr_description_enforcer.py +21 -1
  32. package/hooks/blocking/test_plain_language_blocker.py +247 -0
  33. package/hooks/blocking/test_pr_description_enforcer.py +68 -0
  34. package/hooks/hooks.json +15 -0
  35. package/hooks/hooks_constants/plain_language_blocker_constants.py +295 -0
  36. package/hooks/hooks_constants/pr_description_enforcer_constants.py +4 -0
  37. package/package.json +2 -1
  38. package/rules/plain-language.md +2 -0
  39. package/skills/bugteam/reference/teardown-publish-permissions.md +7 -2
@@ -0,0 +1,132 @@
1
+ Audit [REPO/ARTIFACT] [TARGET_ID] for **Category N only** (test-name scenario verifier). Skip A–M. Sub-bucket forced-exhaustion mode: Category N is decomposed into 9 sub-buckets below. Each sub-bucket REQUIRES at least one Shape A finding OR exactly one Shape B proof-of-absence with **at least 3 adversarial probes** specific to that sub-bucket. A sub-bucket returning neither is a protocol gap.
2
+
3
+ [ARTIFACT METADATA — include every changed test alongside the production code path it claims to cover]
4
+
5
+ - Title / one-line summary: [TITLE]
6
+ - Head ref / SHA at audit time: [HEAD_SHA]
7
+ - Changed test functions (file + line range + test name + first-line assertion): [CHANGED_TESTS]
8
+ - Production functions the tests claim to cover (file + line range + symbol name + branch structure): [PRODUCTION_TARGETS]
9
+ - Scenario fixtures / monkeypatches in scope (`monkeypatch.setattr`, `pytest.mark.skipif`, `freezegun.freeze_time`, `mock.patch`): [SCENARIO_GATES]
10
+ - Stated intent of each scenario-named test (what condition the test name claims to exercise): [INTENT]
11
+
12
+ ID prefix: `find`.
13
+
14
+ [ONE-PARAGRAPH FRAME: enumerate every test whose name includes a scenario claim (`_when_*`, `_at_*`, `_under_*`, `_with_*`, `_on_*`, `_after_*`, `_during_*`). State the audit goal: for each scenario-named test, verify the body sets up the named condition via fixture / monkeypatch / environment gate so the production code's scenario-named branch actually runs during the act phase.]
15
+
16
+ ## Source material ([N] files/sections, all lines in scope)
17
+
18
+ [INLINE every changed test function alongside the production function it claims to cover. Include the production function's branch structure so the audit can identify the no-op / early-return / default branches that scenario-named tests must NOT silently pass against.]
19
+
20
+ ## Sub-buckets (each requires Shape A finding OR Shape B with ≥3 adversarial probes)
21
+
22
+ **N1. Scenario-named tests demonstrate the scenario** ⭐ canonical N case
23
+ - For every test whose name contains `_when_X` / `_at_X` / `_under_X` / `_with_X` / `_on_X` / `_after_X` / `_during_X`, verify the body sets up condition X via fixture, monkeypatch, or environment gate before calling the system under test.
24
+ - Adversarial probes: (a) construct an input that satisfies the test's assertion but does NOT trigger the scenario-named code path — does the test still pass; (b) trace the production function's code path under the test's input — which branch executes during the act phase; (c) inspect the test's setup-phase for monkeypatch / fixture calls that gate the scenario.
25
+
26
+ **N2. Path-decision parametric matrices**
27
+ - For tests of `is_*_path` / `_resolve_*_path` / `*_path_exemptions` modules, verify the test corpus ships a parametric matrix covering: empty string, single filename, tilde-prefix, UNC path, drive-letter path, symlinked path, `..`-containing path, trailing-slash path.
28
+ - Adversarial probes: (a) walk the production function's path-classification branches — which branch does each input class hit; (b) check the test corpus for input shapes that hit only the default / no-classification branch; (c) for each input class missing from the matrix, construct a probe input and trace which branch executes.
29
+
30
+ **N3. Tests that pass "for the wrong reason"**
31
+ - For every assertion of the shape `assert <substring> in result`, verify the substring shape is unique to the scenario-named branch's output.
32
+ - Adversarial probe: walk the production function's branches; for each branch, build the output and test the substring against it. If the substring matches more than one branch's output, the assertion cannot discriminate which branch ran.
33
+
34
+ **N4. No-op branch exercised by scenario name**
35
+ - For every scenario-named test, identify the production function's no-op / early-return / no-feature-installed branch. Verify the test's constructed input does NOT hit that branch.
36
+ - Adversarial probes: (a) any test whose input fails the production function's first guard returns the no-op default and the assertion checks the default; (b) any test whose input is empty / None / missing returns early; (c) any test whose fixture is not installed at the test runtime hits the "feature missing" branch.
37
+
38
+ **N5. Assertion shape mismatch**
39
+ - For every assertion, verify the assertion's shape can fail by construction. `assert <substring> not in result` where the substring is misspelled relative to the production output, or `assert result == ""` when the production function returns `None` on the negative case, or `len(result) > 0` when the production function returns an empty list on the no-feature path.
40
+ - Adversarial probes: (a) inspect each assertion's shape against the production function's actual return-value space; (b) check for assertions where the substring shape never appears in the production output by construction; (c) check for `assert x is True` where the production function returns truthy non-bool values.
41
+
42
+ **N6. Cross-platform scenario gating**
43
+ - For every test named `_on_windows` / `_on_linux` / `_on_macos`, verify the body gates on `sys.platform`, `monkeypatch.setattr(os, "name", ...)`, or `@pytest.mark.skipif`.
44
+ - Bare scenario names that run unchanged across platforms claim more than they prove.
45
+ - Adversarial probes: (a) does the production function's platform-specific branch get skipped on the CI runner's actual platform; (b) does the test pass against the platform fallback rather than the platform-specific code; (c) is the platform fixture installed and respected by the test runner.
46
+
47
+ **N7. Time / clock scenario gating**
48
+ - For every test named `_after_<duration>` / `_at_midnight` / `_during_business_hours`, verify the body injects a frozen clock (`freezegun.freeze_time`, `monkeypatch.setattr(time, "time", ...)`, or `unittest.mock.patch` applied to the `datetime` name that the module under test imports).
49
+ - Wall-clock tests are non-deterministic and may pass against the wrong scenario.
50
+ - Adversarial probes: (a) does the test's act phase depend on the system clock being at a specific value; (b) does any timezone shift cause the test to flake; (c) does the production function read the clock during the act phase.
51
+
52
+ **N8. Concurrent / load scenario gating**
53
+ - For every test named `_under_load` / `_with_concurrent_writers` / `_under_contention`, verify the body spawns the concurrent workers and `wait()`s on them.
54
+ - Single-threaded tests cannot claim concurrent-scenario coverage.
55
+ - Adversarial probes: (a) does the test spawn `threading.Thread` / `multiprocessing.Process` / `asyncio.gather` / `concurrent.futures.ThreadPoolExecutor`; (b) does the test's act phase exercise the concurrency primitive the production function relies on; (c) does the test introduce a race window the production function's lock should serialize.
56
+
57
+ **N9. Neutral-named tests (out of scope)**
58
+ - Tests named `test_returns_empty_list_for_unknown_key` / `test_handles_y` / `test_raises_value_error` (no scenario claim in the name) are NOT subject to N1–N4 or N6–N8.
59
+ - For neutral-named tests, only N5 (assertion shape mismatch) applies.
60
+
61
+ ## Cross-bucket questions to answer at the end
62
+
63
+ Q1: Across all 9 sub-buckets, is there a scenario-named test that does not exercise the named scenario? Cite the test's file:line and the production function's scenario-named branch that should have been exercised.
64
+
65
+ Q2: What's the worst false-coverage signal introduced by the diff? Evaluate by (a) whether the test's name is load-bearing in the suite's coverage report, (b) whether the named scenario has any other coverage, and (c) whether removing the test would change the coverage percentage.
66
+
67
+ Q3: Which scenario-named test most likely will start passing for the wrong reason in a future refactor? Identify tests whose assertions match substrings that could appear in multiple branches — these are time bombs.
68
+
69
+ ## Output
70
+
71
+ Lead: `Total: N (P0=N, P1=N, P2=N)`. For each sub-bucket N1-N9, produce Shape A or Shape B (with ≥3 probes). Each Shape A finding must cite the test's file:line AND the production function's branch the test's name claims to cover. Cross-bucket Q1-Q3 answers after the per-sub-bucket walk. Adversarial second pass: "assume your first pass missed at least 3 scenario-named tests that exercise the no-op branch — find them." Open Questions section for ambiguities. Read-only. No edits, no commits.
72
+
73
+ ---
74
+
75
+ # Worked example: jl-cmd/claude-code-config PR #476
76
+
77
+ Audit jl-cmd/claude-code-config PR #476 for **Category N only** (test-name scenario verifier). Skip A–M. Sub-bucket forced-exhaustion mode: Category N is decomposed into 9 sub-buckets below.
78
+
79
+ PR: refactor(hooks): cross-platform path resolution for windows-rmtree-blocker
80
+ Head SHA: (the commit that landed the platform-conditional logic)
81
+ ID prefix: `find`.
82
+
83
+ The PR adds platform-conditional path-resolution logic to `windows_rmtree_blocker.py` and ships 5 new tests named `test_*_on_windows` and `test_*_on_linux` across `test_windows_rmtree_blocker.py`. The audit goal: verify each scenario-named test sets up the named platform via monkeypatch or skipif gate so the production function's platform-specific branch actually runs during the act phase.
84
+
85
+ ## Sub-buckets (each requires Shape A finding OR Shape B with ≥3 adversarial probes)
86
+
87
+ **N1. Scenario-named tests demonstrate the scenario** ⭐ canonical N case — Shape A findings F5, F21, F23, F26, F27
88
+ - `test_resolves_path_on_windows` calls `windows_rmtree_blocker.resolve_path("C:/Users/test")` and asserts the result equals `Path("C:/Users/test")`. The test neither calls `monkeypatch.setattr(sys, "platform", "win32")` in its body nor carries a `@pytest.mark.skipif(sys.platform != "win32", reason=...)` decorator. On a Linux CI runner, `sys.platform == "linux"` is in effect when the test runs; the production function's `if sys.platform == "win32":` branch is skipped, and the assertion succeeds against the Linux fallback branch's output (which happens to match `Path("C:/Users/test")` because `pathlib.PurePath` accepts Windows-style strings on Linux without normalization).
89
+ - The test's NAME claims Windows-branch coverage; the test's BODY exercises the Linux fallback. This is the canonical N1 finding shape.
90
+ - Adversarial probe (a): construct an input that the Windows branch would handle differently from the Linux branch — does the test catch the divergence? In F5's case, no: the assertion uses a string that both branches happen to produce, so the test cannot discriminate.
91
+ - Adversarial probe (b): the production function's `sys.platform == "win32"` branch performs UNC-prefix stripping; the Linux fallback does not. Inputs containing `\\?\` would yield different outputs on the two branches. The test does not use such inputs.
92
+ - Adversarial probe (c): the test runtime's `sys.platform` is `"linux"` on the CI runner. The act phase hits the fallback, full stop.
93
+ - **Severity P1** for each of F5, F21, F23, F26, F27: scenario-named tests claim platform-specific coverage they do not provide.
94
+ - **Fix**: wrap each `_on_windows`-named test in `@pytest.mark.skipif(sys.platform != "win32", reason="windows-specific path resolution")` AND duplicate as `_on_linux` for the Linux fallback branch; OR use `monkeypatch.setattr(sys, "platform", "win32")` to force the named platform during the act phase.
95
+
96
+ **N2. Path-decision parametric matrices**
97
+ - The production function `resolve_path` is a path-classifier — it qualifies for N2 coverage. The PR ships 5 inputs: drive-letter, UNC-prefix, tilde-prefix, `..`-containing, and trailing-slash. Missing: empty string, single filename, symlinked path. These three input classes have no test in the diff.
98
+ - Adversarial probes: (a) construct an empty-string input — does any branch handle it; (b) construct a single-filename input (no directory component) — does the function return as-is or attempt to resolve against cwd; (c) construct a symlinked path — does the function resolve through the symlink or preserve it.
99
+
100
+ **N3. Tests that pass "for the wrong reason"**
101
+ - See N1 findings F5, F21, F23, F26, F27 — each passes because the assertion's substring matches both the Windows-branch output and the Linux-fallback output. The assertion shape cannot discriminate which branch ran.
102
+
103
+ **N4. No-op branch exercised by scenario name**
104
+ - F5 finding above: the scenario-named test exercises the Linux-fallback no-op branch on the CI runner.
105
+
106
+ **N5. Assertion shape mismatch**
107
+ - All five tests use `assert result == Path(<expected>)`. The shape can fail by construction (Path equality is strict). N5 verified clean.
108
+
109
+ **N6. Cross-platform scenario gating** ⭐
110
+ - None of the five `_on_windows` / `_on_linux`-named tests has any platform gating. N6 is the structural lens on the N1 findings: every test's NAME claims platform coverage, and every test's BODY ignores the platform gate.
111
+ - See N1 F5 / F21 / F23 / F26 / F27.
112
+
113
+ **N7. Time / clock scenario gating**
114
+ - No time-named tests in scope. N7 verified clean.
115
+
116
+ **N8. Concurrent / load scenario gating**
117
+ - No concurrency-named tests in scope. N8 verified clean.
118
+
119
+ **N9. Neutral-named tests (out of scope)**
120
+ - One test in the diff is neutrally named (`test_returns_path_unchanged_when_already_absolute`). N9 marks it out of scope for N1-N4 / N6-N8; only N5 applies. The assertion is `assert result == input_path` — shape clean. Verified clean.
121
+
122
+ ## Cross-bucket questions to answer at the end
123
+
124
+ Q1: Five scenario-named tests (F5, F21, F23, F26, F27) do not gate on `sys.platform` and pass against the Linux-fallback branch on the CI runner. The Windows-specific code path has zero actual coverage despite the test names claiming it. Cite `test_windows_rmtree_blocker.py:42` (F5 first test) and `windows_rmtree_blocker.py:67` (the `if sys.platform == "win32":` branch) as the misclaim pair.
125
+
126
+ Q2: Worst false-coverage signal: F5 — the test's name `test_resolves_path_on_windows` reads as Windows-branch coverage in the PR review, but the act phase exercises the Linux fallback. A reviewer reading the test name during PR review would assume Windows coverage exists; it does not.
127
+
128
+ Q3: Once the Windows branch and the Linux branch diverge in their output for the same input — for example, a future PR that adds normalization to the Windows branch only — these five tests will start failing on Windows CI, exposing the false coverage retroactively.
129
+
130
+ ## Output
131
+
132
+ Lead: `Total: 6 (P0=0, P1=5, P2=1)`. F5, F21, F23, F26, F27 are the N1+N6 scenario-gate-missing findings. N2 has one finding (parametric matrix incomplete) at P2. N3 / N4 are subsumed by N1. N5 / N7 / N8 / N9 verified clean. Adversarial second pass: scan for any non-`_on_<platform>`-named test that exercises the platform-conditional branch; none found in this diff. Open Questions: whether the PR author intended any of the `_on_<platform>` tests to be platform-gated; resolve via reply on the audit thread. Read-only. No edits, no commits.
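The N1 / N6 fix in the worked example can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's code: `resolve_path` is a hypothetical stand-in for the production function, `EXTENDED_PREFIX` is an invented name, and `unittest.mock.patch.object` is swapped in for pytest's `monkeypatch` so the snippet runs without pytest installed.

```python
import sys
from pathlib import PurePosixPath, PureWindowsPath
from unittest import mock

# Hypothetical constant: the extended-length prefix \\?\ the Windows branch strips.
EXTENDED_PREFIX = "\\\\?\\"


def resolve_path(raw_path: str):
    """Hypothetical stand-in for the production function under audit."""
    if sys.platform == "win32":
        # Windows branch: strip the prefix, then parse with Windows rules.
        return PureWindowsPath(raw_path.removeprefix(EXTENDED_PREFIX))
    # Fallback branch: no prefix handling at all.
    return PurePosixPath(raw_path)


def test_resolves_path_on_windows() -> None:
    # Gate the scenario: force the named platform for the act phase.
    with mock.patch.object(sys, "platform", "win32"):
        result = resolve_path(EXTENDED_PREFIX + "C:\\Users\\test")
    # The input carries the prefix, so only the Windows branch can
    # produce this value; the fallback branch would fail the assertion.
    assert result == PureWindowsPath("C:\\Users\\test")


test_resolves_path_on_windows()
```

The input is chosen so the two branches disagree, which is what lets the assertion discriminate between them; an input both branches map to the same value would pass even with the gate removed.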
@@ -0,0 +1,51 @@
1
+ # What "section" means in the source-material block
2
+
3
+ Audit prompt templates ask you to inline the artifact under audit, broken into "sections." A section is **the natural chunk you'd quote and reference back to when reporting a finding.** The right chunk size depends on what you're auditing.
4
+
5
+ ## Lookup table
6
+
7
+ | If you're auditing… | A "section" is… | What you put in the code fence |
8
+ |---|---|---|
9
+ | A code PR | One file in the diff | Filename as header, full file content |
10
+ | A long Python module by itself | One function or class | Function name as header, just that function's body |
11
+ | A design doc / RFC | One named heading (e.g. "## Authentication") | The heading + all paragraphs under it |
12
+ | An essay or article | One section break or chapter | Section title + the paragraphs |
13
+ | A contract or terms-of-service | One clause | Clause number + clause text |
14
+ | A meeting transcript | One topic or speaker block | Topic name + the dialogue |
15
+ | An email thread | One message | Sender + timestamp + message body |
16
+ | A spreadsheet | One sheet or one logical table | Sheet name + the rows |
17
+ | A SQL schema | One table definition | Table name + the CREATE TABLE statement |
18
+ | A config file | One stanza | Stanza name + the keys/values |
19
+ | A test suite | One test file | Filename + all the test functions |
20
+
21
+ ## Picking the right size
22
+
23
+ The rule: **pick the chunk size that lets the agent cite a finding with `[section name]:[line/paragraph N]` and have the user know exactly where to look.**
24
+
25
+ - **Too small** (one sentence per section): the agent runs out of context per chunk and findings can't reference cross-chunk patterns.
26
+ - **Too big** (the whole document as one section): the agent can't anchor findings to a specific spot, and the `failure_mode` text becomes vague.
27
+ - **Sweet spot in the May 2026 audit experiment on PR #394**: 4 files, 11–102 lines each. Each finding cited `<filename>:<line>` and was easy to verify. Results were better than the same audit run with the diff fetched on demand instead of inlined.
28
+
29
+ ## Header format inside the source-material block
30
+
31
+ Use one `###` header per section so the agent can reference each one by name:
32
+
33
+ ````
34
+ ## Source material (4 files, all lines in scope)
35
+
36
+ ### packages/foo/bar.py
37
+ ```python
38
+ [content]
39
+ ```
40
+
41
+ ### packages/foo/baz.py
42
+ ```python
43
+ [content]
44
+ ```
45
+ ````
46
+
47
+ The header text becomes the anchor the agent quotes back when reporting findings — keep it stable, unambiguous, and copy-pasteable into a citation.
48
+
49
+ ## When the artifact has no natural section breaks
50
+
51
+ If you're auditing something monolithic (a single long function, a contract with no clauses, a stream of dialogue), impose your own breaks at logical hinge points and label them: `### lines 1–40 (parameter parsing)`, `### lines 41–120 (main loop)`, `### lines 121–200 (cleanup)`. Don't hand the agent a wall of text — without anchors, findings degrade to "somewhere in this file."
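The header format above could be generated with a small helper. This is a sketch only: `build_source_material` and its parameters are hypothetical names, not part of the package, and it assumes every section is a code file in the same language.

```python
def build_source_material(sections: dict[str, str], language: str = "python") -> str:
    """Render one '###' header plus one fenced block per section.

    Hypothetical helper: `sections` maps a stable section name (for
    example a file path) to the text the agent should be able to cite.
    """
    fence = "`" * 3  # built at runtime so this example nests cleanly
    parts = [f"## Source material ({len(sections)} files, all lines in scope)"]
    for each_name, each_content in sections.items():
        parts.append(
            f"### {each_name}\n{fence}{language}\n{each_content.rstrip()}\n{fence}"
        )
    return "\n\n".join(parts) + "\n"


print(
    build_source_material(
        {
            "packages/foo/bar.py": "x = 1\n",
            "packages/foo/baz.py": "y = 2\n",
        }
    )
)
```

Passing the section names in a dict keeps each header unique, which is the property a citation anchor needs.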
@@ -0,0 +1,184 @@
+ #!/usr/bin/env python3
+ """PreToolUse hook that blocks heavy words in AskUserQuestion prose and .md writes.
+
+ Reaches for the everyday word over the formal one: `use` over `utilize`,
+ `start` over `initiate`, `enough` over `sufficient`. Two surfaces are guarded --
+ AskUserQuestion (its question and option prose) and Write/Edit/MultiEdit targeting a .md
+ file. Code fences, inline code, blockquotes, URLs, and file paths are stripped
+ before matching so exact identifiers and paths are never flagged.
+
+ See the plain-language rule for the full guidance this hook enforces.
+ """
+
+ import json
+ import sys
+ from pathlib import Path
+ from typing import TextIO
+
+ _hooks_dir = str(Path(__file__).resolve().parent.parent)
+ if _hooks_dir not in sys.path:
+     sys.path.insert(0, _hooks_dir)
+
+ from hooks_constants.plain_language_blocker_constants import (  # noqa: E402
+     ALL_SOFTWARE_TERMS,
+     ALL_TERM_PATTERNS,
+     ALL_WRITE_EDIT_TOOL_NAMES,
+     ASK_USER_QUESTION_TOOL_NAME,
+     BLOCKQUOTE_LINE_PATTERN,
+     FENCED_CODE_BLOCK_PATTERN,
+     FILE_PATH_PATTERN,
+     INLINE_CODE_PATTERN,
+     MARKDOWN_EXTENSION,
+     URL_PATTERN,
+     USER_FACING_PLAIN_LANGUAGE_NOTICE,
+ )
+
+
+ def strip_non_prose_regions(text: str) -> str:
+     """Return text with code, quotes, URLs, and file paths removed.
+
+     These regions carry exact identifiers and references that plain language
+     leaves untouched, so they must not contribute matches.
+     """
+     without_fences = FENCED_CODE_BLOCK_PATTERN.sub("", text)
+     without_inline_code = INLINE_CODE_PATTERN.sub("", without_fences)
+     without_blockquotes = BLOCKQUOTE_LINE_PATTERN.sub("", without_inline_code)
+     without_urls = URL_PATTERN.sub("", without_blockquotes)
+     without_paths = FILE_PATH_PATTERN.sub("", without_urls)
+     return without_paths
+
+
+ def find_banned_terms(text: str) -> list[tuple[str, str]]:
+     """Return each (matched term, suggested replacement) found in the prose.
+
+     Each term appears at most once, in first-seen order. Matching is
+     case-insensitive and respects word boundaries; multi-word phrases match as
+     whole units. Terms in the software-term allowlist are exempt and never
+     flagged.
+     """
+     prose_text = strip_non_prose_regions(text)
+     all_matches: list[tuple[str, str]] = []
+     seen_terms: set[str] = set()
+     for each_pattern, each_replacement in ALL_TERM_PATTERNS:
+         first_match = each_pattern.search(prose_text)
+         if first_match is None:
+             continue
+         normalized_term = first_match.group(0).lower()
+         if normalized_term in seen_terms:
+             continue
+         if normalized_term in ALL_SOFTWARE_TERMS:
+             continue
+         seen_terms.add(normalized_term)
+         all_matches.append((normalized_term, each_replacement))
+     return all_matches
+
+
+ def build_block_reason(all_matches: list[tuple[str, str]]) -> str:
+     """Return a deny reason naming each flagged term and its plain replacement."""
+     swap_phrases = ", ".join(
+         f'use "{each_replacement}" instead of "{each_term}"'
+         for each_term, each_replacement in all_matches
+     )
+     return (
+         "BLOCKED: [PLAIN_LANGUAGE] Heavy words detected -- "
+         f"{swap_phrases}. Reach for the everyday word the reader understands "
+         "on the first pass."
+     )
+
+
+ def _collect_ask_user_question_prose(tool_input: dict) -> str:
+     all_questions = tool_input.get("questions", [])
+     if not isinstance(all_questions, list):
+         return ""
+     prose_segments: list[str] = []
+     for each_question in all_questions:
+         if not isinstance(each_question, dict):
+             continue
+         question_text = each_question.get("question", "")
+         if isinstance(question_text, str):
+             prose_segments.append(question_text)
+         all_options = each_question.get("options", [])
+         if isinstance(all_options, list):
+             for each_option in all_options:
+                 if isinstance(each_option, dict):
+                     option_label = each_option.get("label", "")
+                     if isinstance(option_label, str):
+                         prose_segments.append(option_label)
+                     option_description = each_option.get("description", "")
+                     if isinstance(option_description, str):
+                         prose_segments.append(option_description)
+     return "\n".join(prose_segments)
+
+
+ def _collect_write_edit_markdown_prose(tool_name: str, tool_input: dict) -> str:
+     file_path = tool_input.get("file_path", "")
+     if not isinstance(file_path, str) or not file_path.lower().endswith(MARKDOWN_EXTENSION):
+         return ""
+     if tool_name == "Write":
+         content = tool_input.get("content", "")
+         return content if isinstance(content, str) else ""
+     if tool_name == "Edit":
+         new_string = tool_input.get("new_string", "")
+         return new_string if isinstance(new_string, str) else ""
+     all_edits = tool_input.get("edits", [])
+     if not isinstance(all_edits, list):
+         return ""
+     prose_segments: list[str] = []
+     for each_edit in all_edits:
+         if isinstance(each_edit, dict):
+             new_string = each_edit.get("new_string", "")
+             if isinstance(new_string, str):
+                 prose_segments.append(new_string)
+     return "\n".join(prose_segments)
+
+
+ def _collect_prose_for_tool(tool_name: str, tool_input: dict) -> str:
+     if tool_name == ASK_USER_QUESTION_TOOL_NAME:
+         return _collect_ask_user_question_prose(tool_input)
+     if tool_name in ALL_WRITE_EDIT_TOOL_NAMES:
+         return _collect_write_edit_markdown_prose(tool_name, tool_input)
+     return ""
+
+
+ def _emit_deny(all_matches: list[tuple[str, str]], output_stream: TextIO) -> None:
+     deny_payload = {
+         "hookSpecificOutput": {
+             "hookEventName": "PreToolUse",
+             "permissionDecision": "deny",
+             "permissionDecisionReason": build_block_reason(all_matches),
+         },
+         "systemMessage": USER_FACING_PLAIN_LANGUAGE_NOTICE,
+         "suppressOutput": True,
+     }
+     output_stream.write(json.dumps(deny_payload))
+     output_stream.flush()
+
+
+ def main() -> None:
+     try:
+         input_data = json.load(sys.stdin)
+     except json.JSONDecodeError:
+         sys.exit(0)
+
+     if not isinstance(input_data, dict):
+         sys.exit(0)
+
+     tool_name = input_data.get("tool_name", "")
+     tool_input = input_data.get("tool_input", {})
+     if not isinstance(tool_name, str) or not isinstance(tool_input, dict):
+         sys.exit(0)
+
+     prose_text = _collect_prose_for_tool(tool_name, tool_input)
+     if not prose_text:
+         sys.exit(0)
+
+     all_matches = find_banned_terms(prose_text)
+     if not all_matches:
+         sys.exit(0)
+
+     _emit_deny(all_matches, sys.stdout)
+     sys.exit(0)
+
+
+ if __name__ == "__main__":
+     main()
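The strip-then-match flow of the hook can be tried in isolation with the sketch below. The regexes and the two-entry term table are assumptions made for illustration: the real values live in `plain_language_blocker_constants.py`, which is not shown in this part of the diff and may differ. URL and file-path stripping and the software-term allowlist are left out for brevity.

```python
import re

# Assumed stand-ins for the constants module; the shipped patterns may differ.
FENCED_CODE_BLOCK_PATTERN = re.compile(r"`{3}.*?`{3}", re.DOTALL)
INLINE_CODE_PATTERN = re.compile(r"`[^`\n]+`")
BLOCKQUOTE_LINE_PATTERN = re.compile(r"^[ \t]*>.*$", re.MULTILINE)
ALL_TERM_PATTERNS = [
    (re.compile(r"\butilize\b", re.IGNORECASE), "use"),
    (re.compile(r"\binitiate\b", re.IGNORECASE), "start"),
]


def strip_non_prose_regions(text: str) -> str:
    # Fences go first so backticks inside them are not read as inline code.
    for each_pattern in (
        FENCED_CODE_BLOCK_PATTERN,
        INLINE_CODE_PATTERN,
        BLOCKQUOTE_LINE_PATTERN,
    ):
        text = each_pattern.sub("", text)
    return text


def find_banned_terms(text: str) -> list[tuple[str, str]]:
    prose_text = strip_non_prose_regions(text)
    all_matches: list[tuple[str, str]] = []
    for each_pattern, each_replacement in ALL_TERM_PATTERNS:
        first_match = each_pattern.search(prose_text)
        if first_match is not None:
            all_matches.append((first_match.group(0).lower(), each_replacement))
    return all_matches


FENCE = "`" * 3
# Bare prose is flagged, case-insensitively:
print(find_banned_terms("We Utilize the pool, then initiate shutdown."))
# [('utilize', 'use'), ('initiate', 'start')]
# The same word inside a fenced block is exempt:
print(find_banned_terms(f"Start the pool.\n\n{FENCE}python\nutilize(pool)\n{FENCE}\n"))
# []
```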
@@ -31,6 +31,7 @@ from hooks_constants.pr_description_enforcer_constants import ( # noqa: E402
      ALL_HEAVY_TESTING_HEADERS,
      ALL_READABILITY_CLI_FLAG_TOKENS,
      ATOMIC_WRITE_TEMP_SUFFIX,
+     BLOCKQUOTE_LINE_PATTERN,
      BLOCKQUOTE_MARKER_PATTERN,
      BOLD_PAIR_PATTERN,
      BULLET_MARKER_PATTERN,
@@ -63,6 +64,7 @@ from hooks_constants.pr_description_enforcer_constants import ( # noqa: E402
      SELF_CLOSING_REFERENCE_MESSAGE_SUFFIX,
      SELF_REFERENCE_PATTERN_TEMPLATE,
      STANDARD_SHAPE,
+     TABLE_ROW_LINE_PATTERN,
      THIS_PR_OPENING_PATTERN,
      TRIVIAL_BODY_CHAR_THRESHOLD,
      TRIVIAL_SHAPE,
@@ -350,6 +352,23 @@ def _count_substantive_prose_chars(body: str) -> int:
      return len(body_collapsed)


+ def _extract_vague_scan_text(body: str) -> str:
+     """Return the prose to scan for vague language, with non-prose regions removed.
+
+     Drops whole blockquote lines and whole pipe-delimited table rows, then strips
+     the same Markdown ceremony as the prose-count path -- which removes fenced
+     code, inline code, and whole heading lines. This exempts vague phrases that
+     appear only inside code fences, inline code, Markdown headings, quoted
+     reviewer text, or pipe-delimited example tables -- those are not the author's
+     own prose. A pipe-delimited row carries at least two pipes; a line with a
+     single leading pipe, or a borderless table row with no leading pipe, stays in
+     scope.
+     """
+     without_blockquote_lines = BLOCKQUOTE_LINE_PATTERN.sub("", body)
+     without_table_rows = TABLE_ROW_LINE_PATTERN.sub("", without_blockquote_lines)
+     return _strip_markdown_ceremony(without_table_rows)
+
+
  def _iter_section_headers(body: str) -> list[str]:
      """Return every ATX heading line in the body, preserving canonical form.

@@ -813,7 +832,8 @@ def validate_pr_body(body: str, pr_number: int | None = None) -> list[str]:
              "(Adds, Fixes, Updates, Removes, Tightens, Ports)"
          )

-     vague_matches = VAGUE_LANGUAGE_PATTERN.findall(body)
+     vague_scan_text = _extract_vague_scan_text(body)
+     vague_matches = VAGUE_LANGUAGE_PATTERN.findall(vague_scan_text)
      if vague_matches:
          violations.append(
              f"Vague language detected: {', '.join(vague_matches)} -- "
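The scoping rule in the `_extract_vague_scan_text` docstring can be illustrated with the sketch below. All three regexes are assumptions written to match the docstring's wording (a table row needs a leading pipe plus at least one more pipe); the shipped `BLOCKQUOTE_LINE_PATTERN`, `TABLE_ROW_LINE_PATTERN`, and `VAGUE_LANGUAGE_PATTERN` are not shown in this diff and may differ. The `_strip_markdown_ceremony` step is omitted because its body is not shown either.

```python
import re

# Assumed patterns, inferred from the docstring only.
BLOCKQUOTE_LINE_PATTERN = re.compile(r"^[ \t]*>.*$", re.MULTILINE)
TABLE_ROW_LINE_PATTERN = re.compile(r"^[ \t]*\|.*\|.*$", re.MULTILINE)
# Hypothetical one-word vague list, for demonstration only.
VAGUE_LANGUAGE_PATTERN = re.compile(r"\bvarious\b", re.IGNORECASE)


def extract_vague_scan_text(body: str) -> str:
    """Drop whole blockquote lines, then whole pipe-delimited table rows."""
    without_blockquote_lines = BLOCKQUOTE_LINE_PATTERN.sub("", body)
    return TABLE_ROW_LINE_PATTERN.sub("", without_blockquote_lines)


body = "\n".join(
    [
        "Fixes the retry loop.",
        "> reviewer: this touches various modules",  # quoted text: exempt
        "| various | vague |",                       # table row: exempt
        "| lone leading pipe",                       # one pipe: stays in scope
        "name | verdict with various",               # no leading pipe: stays in scope
    ]
)

print(VAGUE_LANGUAGE_PATTERN.findall(body))
# ['various', 'various', 'various']
print(VAGUE_LANGUAGE_PATTERN.findall(extract_vague_scan_text(body)))
# ['various']
```

Scanning the filtered text rather than the raw body is what keeps quoted reviewer comments and example tables from triggering the vague-language violation.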
@@ -0,0 +1,247 @@
1
+ """Tests for the plain_language_blocker PreToolUse hook.
2
+
3
+ Covers the shared prose scanner (fenced code, inline code, blockquotes, URLs,
4
+ file paths), the word-boundary guard, multi-word phrase matching, case
5
+ insensitivity, the term -> replacement block message, and both registered
6
+ PreToolUse surfaces (AskUserQuestion and Write|Edit on .md targets).
7
+ """
8
+
9
+ import importlib.util
10
+ import json
11
+ import os
12
+ import subprocess
13
+ import sys
14
+ from pathlib import Path
15
+
16
+ HOOK_SCRIPT_PATH = Path(__file__).parent / "plain_language_blocker.py"
17
+ _HOOKS_DIR = str(Path(__file__).resolve().parent)
18
+ _HOOKS_ROOT = str(Path(__file__).resolve().parent.parent)
19
+ if _HOOKS_DIR not in sys.path:
20
+ sys.path.insert(0, _HOOKS_DIR)
21
+ if _HOOKS_ROOT not in sys.path:
22
+ sys.path.insert(0, _HOOKS_ROOT)
23
+
24
+
+def _load_hook_module() -> object:
+    module_spec = importlib.util.spec_from_file_location(
+        "plain_language_blocker_under_test", HOOK_SCRIPT_PATH
+    )
+    assert module_spec is not None and module_spec.loader is not None
+    loaded_module = importlib.util.module_from_spec(module_spec)
+    module_spec.loader.exec_module(loaded_module)
+    return loaded_module
+
+
+hook_module = _load_hook_module()
+find_banned_terms = hook_module.find_banned_terms
+strip_non_prose_regions = hook_module.strip_non_prose_regions
+build_block_reason = hook_module.build_block_reason
+
+
+def _run_hook_with_payload(payload: dict) -> subprocess.CompletedProcess[str]:
+    return subprocess.run(
+        [sys.executable, str(HOOK_SCRIPT_PATH)],
+        input=json.dumps(payload),
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+
+
+def _decision_from(completed: subprocess.CompletedProcess[str]) -> str | None:
+    if not completed.stdout:
+        return None
+    parsed = json.loads(completed.stdout)
+    return parsed.get("hookSpecificOutput", {}).get("permissionDecision")
+
+
+def test_canonical_hook_script_exists_at_expected_path() -> None:
+    assert HOOK_SCRIPT_PATH.is_file()
+
+
+def test_bare_prose_banned_term_is_detected() -> None:
+    matched = find_banned_terms("We initiate the worker pool at boot.")
+    assert any(each_term == "initiate" for each_term, _replacement in matched)
+
+
+def test_banned_term_inside_fenced_code_is_exempt() -> None:
+    prose = "Start the pool at boot.\n\n```python\nutilize(pool)\n```\n"
+    assert find_banned_terms(prose) == []
+
+
+def test_banned_term_inside_inline_code_is_exempt() -> None:
+    prose = "Call the `utilize` helper from the legacy module to migrate."
+    assert find_banned_terms(prose) == []
+
+
+def test_banned_term_inside_blockquote_is_exempt() -> None:
+    prose = "> The old guide said to utilize the pool.\n\nUse the pool directly now."
+    assert find_banned_terms(prose) == []
+
+
+def test_banned_term_inside_url_is_exempt() -> None:
+    prose = "See https://example.com/initiate-flow for the original write-up."
+    assert find_banned_terms(prose) == []
+
+
+def test_banned_term_inside_file_path_is_exempt() -> None:
+    prose = "Edit src/utilize_helpers/initiate.py to wire the new path."
+    assert find_banned_terms(prose) == []
+
+
+def test_word_boundary_guard_does_not_match_substring() -> None:
+    assert find_banned_terms("The reinitialize routine reruns the seed.") == []
+
+
+def test_case_insensitive_match() -> None:
+    matched_lower = find_banned_terms("utilize the cache.")
+    matched_upper = find_banned_terms("Utilize the cache.")
+    assert any(term == "utilize" for term, _ in matched_lower)
+    assert any(term == "utilize" for term, _ in matched_upper)
+
+
+def test_multi_word_phrase_matches_as_unit() -> None:
+    matched = find_banned_terms("Run the migration prior to the deploy step.")
+    assert any(term == "prior to" for term, _ in matched)
+
+
+def test_strip_non_prose_regions_removes_code_and_paths() -> None:
+    prose = "Use `utilize` and src/initiate.py and https://x.test/utilize here."
+    stripped = strip_non_prose_regions(prose)
+    assert "utilize" not in stripped
+    assert "initiate" not in stripped
+
+
+def test_block_reason_names_term_and_replacement() -> None:
+    reason = build_block_reason([("initiate", "start")])
+    assert "initiate" in reason
+    assert "start" in reason
+
+
+def test_ask_user_question_with_banned_term_is_denied() -> None:
+    payload = {
+        "tool_name": "AskUserQuestion",
+        "tool_input": {
+            "questions": [
+                {
+                    "question": "Should we utilize the new allocator now?",
+                    "header": "Allocator",
+                    "options": [{"label": "Yes", "description": "Switch now."}],
+                }
+            ]
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+
+
+def test_ask_user_question_banned_term_in_option_label_is_denied() -> None:
+    payload = {
+        "tool_name": "AskUserQuestion",
+        "tool_input": {
+            "questions": [
+                {
+                    "question": "Which path should we take?",
+                    "header": "Path",
+                    "options": [{"label": "Utilize the cache", "description": "Go fast."}],
+                }
+            ]
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+
+
+def test_clean_ask_user_question_passes_through() -> None:
+    payload = {
+        "tool_name": "AskUserQuestion",
+        "tool_input": {
+            "questions": [
+                {
+                    "question": "Should we switch the allocator now?",
+                    "header": "Allocator",
+                    "options": [{"label": "Yes", "description": "Switch now."}],
+                }
+            ]
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+
+
+def test_write_markdown_with_banned_term_is_denied(tmp_path: Path) -> None:
+    target = tmp_path / "notes.md"
+    payload = {
+        "tool_name": "Write",
+        "tool_input": {
+            "file_path": str(target),
+            "content": "This guide explains how to utilize the new cache layer.",
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+
+
+def test_write_non_markdown_is_ignored(tmp_path: Path) -> None:
+    target = tmp_path / "notes.txt"
+    payload = {
+        "tool_name": "Write",
+        "tool_input": {
+            "file_path": str(target),
+            "content": "This guide explains how to utilize the new cache layer.",
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+
+
+def test_edit_markdown_clean_content_passes_through(tmp_path: Path) -> None:
+    target = tmp_path / "notes.md"
+    payload = {
+        "tool_name": "Edit",
+        "tool_input": {
+            "file_path": str(target),
+            "new_string": "This guide explains how to use the new cache layer.",
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+
+
+def test_multiedit_markdown_with_banned_term_is_denied(tmp_path: Path) -> None:
+    target = tmp_path / "notes.md"
+    payload = {
+        "tool_name": "MultiEdit",
+        "tool_input": {
+            "file_path": str(target),
+            "edits": [
+                {"old_string": "intro", "new_string": "This section reads cleanly."},
+                {"old_string": "body", "new_string": "Then we utilize the new cache."},
+            ],
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+
+
+def test_other_tool_is_ignored() -> None:
+    payload = {"tool_name": "Bash", "tool_input": {"command": "echo utilize"}}
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+
+
+def test_software_allowlisted_term_is_not_flagged() -> None:
+    assert find_banned_terms("Run this command to start the worker.") == []
+
+
+def test_non_allowlisted_formal_term_still_flagged() -> None:
+    matched = find_banned_terms("Please utilize the cache now.")
+    assert any(term == "utilize" for term, _ in matched)
+
+
+def test_prose_slash_token_is_not_stripped_as_path() -> None:
+    assert "client/server" in strip_non_prose_regions("Use a client/server split here.")
+
+
+def test_real_file_path_is_still_stripped() -> None:
+    assert "initiate" not in strip_non_prose_regions("Edit src/initiate.py to wire it.")
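The tests above pin down the scanner's contract without showing its implementation: code, blockquotes, URLs, and file paths are exempt, matching is case-insensitive and word-bounded, and a slash-joined prose token such as `client/server` is not treated as a path. One way such a scanner could be built is sketched below. Every pattern, the term table, and the names `strip_non_prose` and `find_banned` are assumptions for illustration. Only the `initiate` -> `start` pair appears in the tests, and the real `plain_language_blocker.py` may work differently.

```python
import re

# Hypothetical term table: only ("initiate", "start") is confirmed by the tests.
BANNED_TERMS = {"initiate": "start", "utilize": "use", "prior to": "before"}

# Hypothetical exemption patterns, applied in this order.
FENCED_CODE = re.compile(r"```.*?```", re.DOTALL)
INLINE_CODE = re.compile(r"`[^`\n]*`")
BLOCKQUOTE_LINE = re.compile(r"^[ \t]*>.*$", re.MULTILINE)
URL = re.compile(r"https?://\S+")
# A slash-joined token counts as a path only when it ends in a file extension,
# which keeps prose such as "client/server" in scope.
FILE_PATH = re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.\w+\b")


def strip_non_prose(text: str) -> str:
    """Blank out regions that are not the writer's own prose."""
    for pattern in (FENCED_CODE, INLINE_CODE, BLOCKQUOTE_LINE, URL, FILE_PATH):
        text = pattern.sub(" ", text)
    return text


def find_banned(text: str) -> list[tuple[str, str]]:
    """Return (term, replacement) pairs found as whole words, ignoring case."""
    prose = strip_non_prose(text)
    found = []
    for term, replacement in BANNED_TERMS.items():
        if re.search(rf"\b{re.escape(term)}\b", prose, re.IGNORECASE):
            found.append((term, replacement))
    return found
```

Stripping before matching keeps each term check a plain word-boundary search, so a new exemption only needs one more pattern in the tuple.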