npm - claude-dev-env - Versions diffs - 1.48.0 → 1.49.1 - Mend

claude-dev-env 1.48.0 → 1.49.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

package/audit-rubrics/prompts/category-n-test-name-scenario-verifier.md ADDED

@@ -0,0 +1,132 @@
+Audit [REPO/ARTIFACT] [TARGET_ID] for **Category N only** (test-name scenario verifier). Skip A–M. Sub-bucket forced-exhaustion mode: Category N is decomposed into 9 sub-buckets below. Each sub-bucket REQUIRES at least one Shape A finding OR exactly one Shape B proof-of-absence with **at least 3 adversarial probes** specific to that sub-bucket. A sub-bucket returning neither is a protocol gap.
+[ARTIFACT METADATA — include every changed test alongside the production code path it claims to cover]
+- Title / one-line summary: [TITLE]
+- Head ref / SHA at audit time: [HEAD_SHA]
+- Changed test functions (file + line range + test name + first-line assertion): [CHANGED_TESTS]
+- Production functions the tests claim to cover (file + line range + symbol name + branch structure): [PRODUCTION_TARGETS]
+- Scenario fixtures / monkeypatches in scope (`monkeypatch.setattr`, `pytest.mark.skipif`, `freezegun.freeze_time`, `mock.patch`): [SCENARIO_GATES]
+- Stated intent of each scenario-named test (what condition the test name claims to exercise): [INTENT]
+ID prefix: `find`.
+[ONE-PARAGRAPH FRAME: enumerate every test whose name includes a scenario claim (`_when_*`, `_at_*`, `_under_*`, `_with_*`, `_on_*`, `_after_*`, `_during_*`). State the audit goal: for each scenario-named test, verify the body sets up the named condition via fixture / monkeypatch / environment gate so the production code's scenario-named branch actually runs during the act phase.]
+## Source material ([N] files/sections, all lines in scope)
+[INLINE every changed test function alongside the production function it claims to cover. Include the production function's branch structure so the audit can identify the no-op / early-return / default branches that scenario-named tests must NOT silently pass against.]
+## Sub-buckets (each requires Shape A finding OR Shape B with ≥3 adversarial probes)
+**N1. Scenario-named tests demonstrate the scenario** ⭐ canonical N case
+- For every test whose name contains `_when_X` / `_at_X` / `_under_X` / `_with_X` / `_on_X` / `_after_X` / `_during_X`, verify the body sets up condition X via fixture, monkeypatch, or environment gate before calling the system under test.
+- Adversarial probes: (a) construct an input that satisfies the test's assertion but does NOT trigger the scenario-named code path — does the test still pass; (b) trace the production function's code path under the test's input — which branch executes during the act phase; (c) inspect the test's setup-phase for monkeypatch / fixture calls that gate the scenario.
+**N2. Path-decision parametric matrices**
+- For tests of `is_*_path` / `_resolve_*_path` / `*_path_exemptions` modules, verify the test corpus ships a parametric matrix covering: empty string, single filename, tilde-prefix, UNC path, drive-letter path, symlinked path, `..`-containing path, trailing-slash path.
+- Adversarial probes: (a) walk the production function's path-classification branches — which branch does each input class hit; (b) check the test corpus for input shapes that hit only the default / no-classification branch; (c) for each input class missing from the matrix, construct a probe input and trace which branch executes.
+**N3. Tests that pass "for the wrong reason"**
+- For every assertion of the shape `assert <substring> in result`, verify the substring shape is unique to the scenario-named branch's output.
+- Adversarial probe: walk the production function's branches; for each branch, build the output and test the substring against it. If the substring matches more than one branch's output, the assertion cannot discriminate which branch ran.
+**N4. No-op branch exercised by scenario name**
+- For every scenario-named test, identify the production function's no-op / early-return / no-feature-installed branch. Verify the test's constructed input does NOT hit that branch.
+- Adversarial probes: (a) does the test's input fail the production function's first guard, so the call returns the no-op default and the assertion merely checks that default; (b) is the test's input empty / None / missing, so the function returns early; (c) is the test's fixture absent at runtime, so the call hits the "feature missing" branch.
+**N5. Assertion shape mismatch**
+- For every assertion, verify that its shape matches the production function's return-value space, so the assertion can fail by construction. Mismatches to look for: `assert <substring> not in result` where the substring is misspelled relative to the production output (it passes unconditionally); `assert result == ""` when the production function returns `None` on the negative case; `assert len(result) > 0` when the production function returns an empty list on the no-feature path.
+- Adversarial probes: (a) inspect each assertion's shape against the production function's actual return-value space; (b) check for assertions where the substring shape never appears in the production output by construction; (c) check for `assert x is True` where the production function returns truthy non-bool values.
+**N6. Cross-platform scenario gating**
+- For every test named `_on_windows` / `_on_linux` / `_on_macos`, verify the body gates on `sys.platform`, `monkeypatch.setattr(os, "name", ...)`, or `@pytest.mark.skipif`.
+- Bare scenario names that run unchanged across platforms claim more than they prove.
+- Adversarial probes: (a) does the production function's platform-specific branch get skipped on the CI runner's actual platform; (b) does the test pass against the platform fallback rather than the platform-specific code; (c) is the platform fixture installed and respected by the test runner.
+**N7. Time / clock scenario gating**
+- For every test named `_after_<duration>` / `_at_midnight` / `_during_business_hours`, verify the body injects a frozen clock (`freezegun.freeze_time`, `monkeypatch.setattr(time, "time", ...)`, or `unittest.mock.patch("<module>.datetime")`, which replaces the `datetime` name where the production module looks it up).
+- Wall-clock tests are non-deterministic and may pass against the wrong scenario.
+- Adversarial probes: (a) does the test's act phase depend on the system clock being at a specific value; (b) does any timezone shift cause the test to flake; (c) does the production function read the clock during the act phase.
+**N8. Concurrent / load scenario gating**
+- For every test named `_under_load` / `_with_concurrent_writers` / `_under_contention`, verify the body spawns the concurrent workers and `wait()`s on them.
+- Single-threaded tests cannot claim concurrent-scenario coverage.
+- Adversarial probes: (a) does the test spawn `threading.Thread` / `multiprocessing.Process` / `asyncio.gather` / `concurrent.futures.ThreadPoolExecutor`; (b) does the test's act phase exercise the concurrency primitive the production function relies on; (c) does the test introduce a race window the production function's lock should serialize.
+**N9. Neutral-named tests (out of scope)**
+- Tests named `test_returns_empty_list_for_unknown_key` / `test_handles_y` / `test_raises_value_error` (no scenario claim in the name) are NOT subject to N1–N4 or N6–N8.
+- For neutral-named tests, only N5 (assertion shape mismatch) applies.
+## Cross-bucket questions to answer at the end
+Q1: Across all 9 sub-buckets, is there a scenario-named test that does not exercise the named scenario? Cite the test's file:line and the production function's scenario-named branch that should have been exercised.
+Q2: What's the worst false-coverage signal introduced by the diff? Evaluate by (a) whether the test's name is load-bearing in the suite's coverage report, (b) whether the named scenario has any other coverage, and (c) whether removing the test would change the coverage percentage.
+Q3: Which scenario-named test is most likely to start passing for the wrong reason after a future refactor? Identify tests whose assertions match substrings that could appear in more than one branch's output; these are time bombs.
+## Output
+Lead: `Total: N (P0=N, P1=N, P2=N)`. For each sub-bucket N1-N9, produce Shape A or Shape B (with ≥3 probes). Each Shape A finding must cite the test's file:line AND the production function's branch the test's name claims to cover. Cross-bucket Q1-Q3 answers after the per-sub-bucket walk. Adversarial second pass: "assume your first pass missed at least 3 scenario-named tests that exercise the no-op branch — find them." Open Questions section for ambiguities. Read-only. No edits, no commits.
+---
+# Worked example: jl-cmd/claude-code-config PR #476
+Audit jl-cmd/claude-code-config PR #476 for **Category N only** (test-name scenario verifier). Skip A–M. Sub-bucket forced-exhaustion mode: Category N is decomposed into 9 sub-buckets below.
+PR: refactor(hooks): cross-platform path resolution for windows-rmtree-blocker
+Head SHA: (the commit that landed the platform-conditional logic)
+ID prefix: `find`.
+The PR adds platform-conditional path-resolution logic to `windows_rmtree_blocker.py` and ships 5 new tests named `test_*_on_windows` and `test_*_on_linux` across `test_windows_rmtree_blocker.py`. The audit goal: verify each scenario-named test sets up the named platform via monkeypatch or skipif gate so the production function's platform-specific branch actually runs during the act phase.
+## Sub-buckets (each requires Shape A finding OR Shape B with ≥3 adversarial probes)
+**N1. Scenario-named tests demonstrate the scenario** ⭐ canonical N case — Shape A findings F5, F21, F23, F26, F27
+- `test_resolves_path_on_windows` calls `windows_rmtree_blocker.resolve_path("C:/Users/test")` and asserts the result equals `Path("C:/Users/test")`. The body never calls `monkeypatch.setattr(sys, "platform", "win32")`, and the test carries no `@pytest.mark.skipif(sys.platform != "win32", ...)` decorator. On the Linux CI runner, `sys.platform` is `"linux"` when the test runs, so the production function's `if sys.platform == "win32":` branch is skipped and the assertion succeeds against the Linux fallback branch's output. That output happens to match `Path("C:/Users/test")` because on Linux `pathlib.PurePath` treats the Windows-style string as an ordinary relative POSIX path and does not normalize it.
+- The test's NAME claims Windows-branch coverage; the test's BODY exercises the Linux fallback. This is the canonical N1 finding shape.
+- Adversarial probe (a): construct an input that the Windows branch would handle differently from the Linux branch — does the test catch the divergence? In F5's case, no: the assertion uses a string that both branches happen to produce, so the test cannot discriminate.
+- Adversarial probe (b): the production function's `sys.platform == "win32"` branch performs UNC-prefix stripping; the Linux fallback does not. Inputs containing `\\?\` would yield different outputs on the two branches. The test does not use such inputs.
+- Adversarial probe (c): the test runtime's `sys.platform` is `"linux"` on the CI runner. The act phase hits the fallback, full stop.
+- **Severity P1** for each of F5, F21, F23, F26, F27: scenario-named tests claim platform-specific coverage they do not provide.
+- **Fix**: wrap each `_on_windows`-named test in `@pytest.mark.skipif(sys.platform != "win32", reason="windows-specific path resolution")` AND duplicate as `_on_linux` for the Linux fallback branch; OR use `monkeypatch.setattr(sys, "platform", "win32")` to force the named platform during the act phase.
+**N2. Path-decision parametric matrices**
+- The production function `resolve_path` is a path-classifier — it qualifies for N2 coverage. The PR ships 5 inputs: drive-letter, UNC-prefix, tilde-prefix, `..`-containing, and trailing-slash. Missing: empty string, single filename, symlinked path. These three input classes have no test in the diff.
+- Adversarial probes: (a) construct an empty-string input — does any branch handle it; (b) construct a single-filename input (no directory component) — does the function return as-is or attempt to resolve against cwd; (c) construct a symlinked path — does the function resolve through the symlink or preserve it.
+**N3. Tests that pass "for the wrong reason"**
+- See N1 findings F5, F21, F23, F26, F27: each passes because the expected value in its assertion equals both the Windows-branch output and the Linux-fallback output for the chosen input, so the assertion cannot discriminate which branch ran.
+**N4. No-op branch exercised by scenario name**
+- F5 finding above: the scenario-named test exercises the Linux-fallback no-op branch on the CI runner.
+**N5. Assertion shape mismatch**
+- All five tests use `assert result == Path(<expected>)`. The shape can fail by construction (Path equality is strict). N5 verified clean.
+**N6. Cross-platform scenario gating** ⭐
+- None of the five `_on_windows`- and `_on_linux`-named tests has any platform gating. N6 is the structural lens on the N1 findings: every test's NAME claims platform coverage, and every test's BODY ignores the platform gate.
+- See N1 F5 / F21 / F23 / F26 / F27.
+**N7. Time / clock scenario gating**
+- No time-named tests in scope. N7 verified clean.
+**N8. Concurrent / load scenario gating**
+- No concurrency-named tests in scope. N8 verified clean.
+**N9. Neutral-named tests (out of scope)**
+- One test in the diff is neutrally named (`test_returns_path_unchanged_when_already_absolute`). N9 marks it out of scope for N1-N4 / N6-N8; only N5 applies. The assertion is `assert result == input_path` — shape clean. Verified clean.
+## Cross-bucket questions to answer at the end
+Q1: Five scenario-named tests (F5, F21, F23, F26, F27) do not gate on `sys.platform` and pass against the Linux-fallback branch on the CI runner. The Windows-specific code path has zero actual coverage despite the test names claiming it. Cite `test_windows_rmtree_blocker.py:42` (F5 first test) and `windows_rmtree_blocker.py:67` (the `if sys.platform == "win32":` branch) as the misclaim pair.
+Q2: Worst false-coverage signal: F5. The name `test_resolves_path_on_windows` reads as Windows-branch coverage, so a reviewer scanning test names during PR review would assume that coverage exists; the act phase exercises only the Linux fallback.
+Q3: Once the Windows branch and the Linux branch diverge in their output for the same input — for example, a future PR that adds normalization to the Windows branch only — these five tests will start failing on Windows CI, exposing the false coverage retroactively.
+## Output
+Lead: `Total: 6 (P0=0, P1=5, P2=1)`. F5, F21, F23, F26, F27 are the N1+N6 scenario-gate-missing findings. N2 has one finding (parametric matrix incomplete) at P2. N3 / N4 are subsumed by N1. N5 / N7 / N8 / N9 verified clean. Adversarial second pass: scan for any non-`_on_<platform>`-named test that exercises the platform-conditional branch; verified none in this diff. Open Questions: whether the PR author intended any of the `_on_<platform>` tests to be platform-gated; resolve via reply on the audit thread. Read-only. No edits, no commits.
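The N1/N6 fix in the worked example can be sketched with the standard library alone. This is a minimal illustration, not code from PR #476: `resolve_path`, `LONG_PATH_PREFIX`, and the test names are hypothetical stand-ins, and the sketch assumes the production function reads `sys.platform` at call time (a module that caches the platform at import time would need the gate applied before import). `unittest.mock.patch.object(sys, "platform", ...)` plays the role of pytest's `monkeypatch.setattr(sys, "platform", ...)`. The ungated test mirrors F5; the gated pair uses an input carrying the `\\?\` prefix, which the two branches treat differently, so each assertion can only pass on its named branch.

```python
import sys
from unittest import mock

# Hypothetical: the Windows long-path prefix, four characters (\, \, ?, \).
LONG_PATH_PREFIX = "\\\\?\\"


def resolve_path(raw_path: str) -> str:
    """Hypothetical stand-in for the production function under audit.

    Reads sys.platform at call time, so patching it changes which branch runs.
    """
    if sys.platform == "win32":
        # Windows branch: strip the long-path prefix.
        if raw_path.startswith(LONG_PATH_PREFIX):
            return raw_path[len(LONG_PATH_PREFIX):]
        return raw_path
    # Fallback branch: return the input unchanged.
    return raw_path


def test_resolves_path_on_windows_ungated() -> None:
    # Mirrors F5: no platform gate, and both branches return this input unchanged,
    # so the test passes on any runner without proving anything about Windows.
    assert resolve_path("C:/Users/test") == "C:/Users/test"


def test_strips_long_path_prefix_on_windows() -> None:
    # Gate the named platform for the act phase only; patch.object restores
    # sys.platform on exit, as monkeypatch.setattr does at teardown.
    with mock.patch.object(sys, "platform", "win32"):
        result = resolve_path(LONG_PATH_PREFIX + "C:/Users/test")
    # Only the Windows branch strips the prefix, so this assertion discriminates.
    assert result == "C:/Users/test"


def test_keeps_long_path_prefix_on_linux() -> None:
    with mock.patch.object(sys, "platform", "linux"):
        result = resolve_path(LONG_PATH_PREFIX + "C:/Users/test")
    assert result == LONG_PATH_PREFIX + "C:/Users/test"


if __name__ == "__main__":
    test_resolves_path_on_windows_ungated()
    test_strips_long_path_prefix_on_windows()
    test_keeps_long_path_prefix_on_linux()
    print("scenario-gated tests passed")
```

Under pytest, `@pytest.mark.skipif(sys.platform != "win32", reason="windows-specific path resolution")` is the alternative when the test must run against the real operating system rather than a patched platform string.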

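Sub-bucket N7's frozen-clock check can be sketched the same way. The clock has to be patched where the production code looks it up: a module that calls `time.time()` is gated by patching `time.time`, while a module that did `from datetime import datetime` would need `<module>.datetime` patched instead. `is_session_expired`, `SESSION_TTL_SECONDS`, and the epoch value are hypothetical; the point is that each test body pins the clock to the exact offset its name claims, so the result no longer depends on when CI happens to run.

```python
import time
from unittest import mock

# Hypothetical session lifetime and creation timestamp used only for this sketch.
SESSION_TTL_SECONDS = 30 * 60
FIXED_CREATED_AT = 1_700_000_000.0


def is_session_expired(created_at_epoch: float) -> bool:
    """Hypothetical production function that reads the wall clock in the act phase."""
    # Looks up time.time on the module at call time, so patching time.time reaches it.
    return time.time() - created_at_epoch >= SESSION_TTL_SECONDS


def test_session_is_expired_after_thirty_minutes() -> None:
    # Pin the clock to exactly the offset the test name claims.
    frozen_now = FIXED_CREATED_AT + SESSION_TTL_SECONDS
    with mock.patch.object(time, "time", return_value=frozen_now):
        assert is_session_expired(FIXED_CREATED_AT) is True


def test_session_is_live_after_twenty_nine_minutes() -> None:
    frozen_now = FIXED_CREATED_AT + SESSION_TTL_SECONDS - 60
    with mock.patch.object(time, "time", return_value=frozen_now):
        assert is_session_expired(FIXED_CREATED_AT) is False


if __name__ == "__main__":
    test_session_is_expired_after_thirty_minutes()
    test_session_is_live_after_twenty_nine_minutes()
    print("frozen-clock tests passed")
```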
package/audit-rubrics/source-material-section-types.md ADDED

@@ -0,0 +1,51 @@
+# What "section" means in the source-material block
+Audit prompt templates ask you to inline the artifact under audit, broken into "sections." A section is **the natural chunk you'd quote and reference back to when reporting a finding.** The right chunk size depends on what you're auditing.
+## Lookup table
+| If you're auditing… | A "section" is… | What you put in the code fence |
+|---|---|---|
+| A code PR | One file in the diff | Filename as header, full file content |
+| A long Python module by itself | One function or class | Function name as header, just that function's body |
+| A design doc / RFC | One named heading (e.g. "## Authentication") | The heading + all paragraphs under it |
+| An essay or article | One section break or chapter | Section title + the paragraphs |
+| A contract or terms-of-service | One clause | Clause number + clause text |
+| A meeting transcript | One topic or speaker block | Topic name + the dialogue |
+| An email thread | One message | Sender + timestamp + message body |
+| A spreadsheet | One sheet or one logical table | Sheet name + the rows |
+| A SQL schema | One table definition | Table name + the CREATE TABLE statement |
+| A config file | One stanza | Stanza name + the keys/values |
+| A test suite | One test file | Filename + all the test functions |
+## Picking the right size
+The rule: **pick the chunk size that lets the agent cite a finding with `[section name]:[line/paragraph N]` and have the user know exactly where to look.**
+- **Too small** (one sentence per section): each chunk carries too little surrounding context, and findings can't reference patterns that span chunks.
+- **Too big** (the whole document as one section): the agent can't anchor findings to a specific spot, and the `failure_mode` text becomes vague.
+- **Sweet spot in the May 2026 audit experiment on PR #394**: 4 files, 11–102 lines each. Each finding cited `<filename>:<line>` and was easy to verify. Results were better than the same audit run with the diff fetched on demand instead of inlined.
+## Header format inside the source-material block
+Use one `###` header per section so the agent can reference each one by name:
+````
+## Source material (4 files, all lines in scope)
+### packages/foo/bar.py
+```python
+[content]
+```
+### packages/foo/baz.py
+```python
+[content]
+```
+````
+The header text becomes the anchor the agent quotes back when reporting findings — keep it stable, unambiguous, and copy-pasteable into a citation.
+## When the artifact has no natural section breaks
+If you're auditing something monolithic (a single long function, a contract with no clauses, a stream of dialogue), impose your own breaks at logical hinge points and label them: `### lines 1–40 (parameter parsing)`, `### lines 41–120 (main loop)`, `### lines 121–200 (cleanup)`. Don't hand the agent a wall of text — without anchors, findings degrade to "somewhere in this file."
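The labeling advice for monolithic artifacts can be sketched as a small helper that renders `###` sections keyed by line range. It is a hypothetical illustration, not part of the package: `SectionBreak` and `build_source_material` are invented names. The helper refuses gaps and overlaps so the `all lines in scope` claim in the block's heading stays true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SectionBreak:
    """One labeled hinge point: an inclusive, 1-indexed line range."""

    first_line: int
    last_line: int
    label: str


def build_source_material(
    all_lines: list[str], all_breaks: list[SectionBreak], fence_language: str
) -> str:
    """Render an artifact as one ### section per labeled line range."""
    # Refuse gaps and overlaps so the "all lines in scope" heading stays true.
    expected_first_line = 1
    for each_break in all_breaks:
        if each_break.first_line != expected_first_line:
            raise ValueError(f"gap or overlap at line {expected_first_line}")
        if each_break.last_line < each_break.first_line:
            raise ValueError(f"empty range starting at line {each_break.first_line}")
        expected_first_line = each_break.last_line + 1
    if expected_first_line != len(all_lines) + 1:
        raise ValueError("section breaks must end on the artifact's last line")

    rendered_lines = [f"## Source material ({len(all_breaks)} sections, all lines in scope)"]
    for each_break in all_breaks:
        # The header text is the anchor a finding quotes back in its citation.
        rendered_lines.append(
            f"### lines {each_break.first_line}–{each_break.last_line} ({each_break.label})"
        )
        rendered_lines.append(f"```{fence_language}")
        rendered_lines.extend(all_lines[each_break.first_line - 1 : each_break.last_line])
        rendered_lines.append("```")
    return "\n".join(rendered_lines)


if __name__ == "__main__":
    artifact_lines = [f"statement {n}" for n in range(1, 7)]
    all_breaks = [SectionBreak(1, 2, "parameter parsing"), SectionBreak(3, 6, "main loop")]
    print(build_source_material(artifact_lines, all_breaks, "python"))
```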

package/hooks/blocking/plain_language_blocker.py ADDED

@@ -0,0 +1,184 @@
+#!/usr/bin/env python3
+"""PreToolUse hook that blocks heavy words in AskUserQuestion prose and .md writes.
+Reaches for the everyday word over the formal one: `use` over `utilize`,
+`start` over `initiate`, `enough` over `sufficient`. Two surfaces are guarded --
+AskUserQuestion (its question and option prose) and Write/Edit/MultiEdit targeting a .md
+file. Code fences, inline code, blockquotes, URLs, and file paths are stripped
+before matching so exact identifiers and paths are never flagged.
+See the plain-language rule for the full guidance this hook enforces.
+"""
+import json
+import sys
+from pathlib import Path
+from typing import TextIO
+_hooks_dir = str(Path(__file__).resolve().parent.parent)
+if _hooks_dir not in sys.path:
+    sys.path.insert(0, _hooks_dir)
+from hooks_constants.plain_language_blocker_constants import (  # noqa: E402
+    ALL_SOFTWARE_TERMS,
+    ALL_TERM_PATTERNS,
+    ALL_WRITE_EDIT_TOOL_NAMES,
+    ASK_USER_QUESTION_TOOL_NAME,
+    BLOCKQUOTE_LINE_PATTERN,
+    FENCED_CODE_BLOCK_PATTERN,
+    FILE_PATH_PATTERN,
+    INLINE_CODE_PATTERN,
+    MARKDOWN_EXTENSION,
+    URL_PATTERN,
+    USER_FACING_PLAIN_LANGUAGE_NOTICE,
+)
+def strip_non_prose_regions(text: str) -> str:
+    """Return text with code, quotes, URLs, and file paths removed.
+    These regions carry exact identifiers and references that plain language
+    leaves untouched, so they must not contribute matches.
+    """
+    without_fences = FENCED_CODE_BLOCK_PATTERN.sub("", text)
+    without_inline_code = INLINE_CODE_PATTERN.sub("", without_fences)
+    without_blockquotes = BLOCKQUOTE_LINE_PATTERN.sub("", without_inline_code)
+    without_urls = URL_PATTERN.sub("", without_blockquotes)
+    without_paths = FILE_PATH_PATTERN.sub("", without_urls)
+    return without_paths
+def find_banned_terms(text: str) -> list[tuple[str, str]]:
+    """Return each (matched term, suggested replacement) found in the prose.
+    Each term appears at most once, in first-seen order. Matching is
+    case-insensitive and respects word boundaries; multi-word phrases match as
+    whole units. Terms in the software-term allowlist are exempt and never
+    flagged.
+    """
+    prose_text = strip_non_prose_regions(text)
+    all_matches: list[tuple[str, str]] = []
+    seen_terms: set[str] = set()
+    for each_pattern, each_replacement in ALL_TERM_PATTERNS:
+        first_match = each_pattern.search(prose_text)
+        if first_match is None:
+            continue
+        normalized_term = first_match.group(0).lower()
+        if normalized_term in seen_terms:
+            continue
+        if normalized_term in ALL_SOFTWARE_TERMS:
+            continue
+        seen_terms.add(normalized_term)
+        all_matches.append((normalized_term, each_replacement))
+    return all_matches
+def build_block_reason(all_matches: list[tuple[str, str]]) -> str:
+    """Return a deny reason naming each flagged term and its plain replacement."""
+    swap_phrases = ", ".join(
+        f'use "{each_replacement}" instead of "{each_term}"'
+        for each_term, each_replacement in all_matches
+    )
+    return (
+        "BLOCKED: [PLAIN_LANGUAGE] Heavy words detected -- "
+        f"{swap_phrases}. Reach for the everyday word the reader understands "
+        "on the first pass."
+    )
+def _collect_ask_user_question_prose(tool_input: dict) -> str:
+    all_questions = tool_input.get("questions", [])
+    if not isinstance(all_questions, list):
+        return ""
+    prose_segments: list[str] = []
+    for each_question in all_questions:
+        if not isinstance(each_question, dict):
+            continue
+        question_text = each_question.get("question", "")
+        if isinstance(question_text, str):
+            prose_segments.append(question_text)
+        all_options = each_question.get("options", [])
+        if isinstance(all_options, list):
+            for each_option in all_options:
+                if isinstance(each_option, dict):
+                    option_label = each_option.get("label", "")
+                    if isinstance(option_label, str):
+                        prose_segments.append(option_label)
+                    option_description = each_option.get("description", "")
+                    if isinstance(option_description, str):
+                        prose_segments.append(option_description)
+    return "\n".join(prose_segments)
+def _collect_write_edit_markdown_prose(tool_name: str, tool_input: dict) -> str:
+    file_path = tool_input.get("file_path", "")
+    if not isinstance(file_path, str) or not file_path.lower().endswith(MARKDOWN_EXTENSION):
+        return ""
+    if tool_name == "Write":
+        content = tool_input.get("content", "")
+        return content if isinstance(content, str) else ""
+    if tool_name == "Edit":
+        new_string = tool_input.get("new_string", "")
+        return new_string if isinstance(new_string, str) else ""
+    all_edits = tool_input.get("edits", [])
+    if not isinstance(all_edits, list):
+        return ""
+    prose_segments: list[str] = []
+    for each_edit in all_edits:
+        if isinstance(each_edit, dict):
+            new_string = each_edit.get("new_string", "")
+            if isinstance(new_string, str):
+                prose_segments.append(new_string)
+    return "\n".join(prose_segments)
+def _collect_prose_for_tool(tool_name: str, tool_input: dict) -> str:
+    if tool_name == ASK_USER_QUESTION_TOOL_NAME:
+        return _collect_ask_user_question_prose(tool_input)
+    if tool_name in ALL_WRITE_EDIT_TOOL_NAMES:
+        return _collect_write_edit_markdown_prose(tool_name, tool_input)
+    return ""
+def _emit_deny(all_matches: list[tuple[str, str]], output_stream: TextIO) -> None:
+    deny_payload = {
+        "hookSpecificOutput": {
+            "hookEventName": "PreToolUse",
+            "permissionDecision": "deny",
+            "permissionDecisionReason": build_block_reason(all_matches),
+        },
+        "systemMessage": USER_FACING_PLAIN_LANGUAGE_NOTICE,
+        "suppressOutput": True,
+    }
+    output_stream.write(json.dumps(deny_payload))
+    output_stream.flush()
+def main() -> None:
+    try:
+        input_data = json.load(sys.stdin)
+    except json.JSONDecodeError:
+        sys.exit(0)
+    if not isinstance(input_data, dict):
+        sys.exit(0)
+    tool_name = input_data.get("tool_name", "")
+    tool_input = input_data.get("tool_input", {})
+    if not isinstance(tool_name, str) or not isinstance(tool_input, dict):
+        sys.exit(0)
+    prose_text = _collect_prose_for_tool(tool_name, tool_input)
+    if not prose_text:
+        sys.exit(0)
+    all_matches = find_banned_terms(prose_text)
+    if not all_matches:
+        sys.exit(0)
+    _emit_deny(all_matches, sys.stdout)
+    sys.exit(0)
+if __name__ == "__main__":
+    main()
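`find_banned_terms` depends on constants in `hooks_constants/plain_language_blocker_constants.py`, which this diff does not include. The sketch below reconstructs the behavior the docstrings describe (strip non-prose regions first, then match case-insensitively on word boundaries, with multi-word phrases matching as one unit) using hypothetical patterns and a four-entry term table. The real patterns may differ, and the sketch omits the software-term allowlist; its crude file-path guard also drops any token containing a slash, such as `and/or`.

```python
import re

# Hypothetical term table; the real pairs live in the constants module.
HYPOTHETICAL_TERM_REPLACEMENTS = {
    "utilize": "use",
    "initiate": "start",
    "sufficient": "enough",
    "prior to": "before",
}

# Hypothetical non-prose patterns, applied in this order. Fences go first so
# their backticks are never paired up as inline code spans.
ALL_HYPOTHETICAL_NON_PROSE_PATTERNS = [
    re.compile(r"```.*?```", re.DOTALL),  # fenced code blocks
    re.compile(r"`[^`\n]*`"),  # inline code
    re.compile(r"^[ \t]*>.*$", re.MULTILINE),  # blockquote lines
    re.compile(r"https?://\S+"),  # URLs
    re.compile(r"\S*[/\\]\S*"),  # any token containing a slash (crude path guard)
]


def compile_term_pattern(term: str) -> re.Pattern[str]:
    """Whole-word, case-insensitive match; multi-word terms allow any whitespace between words."""
    escaped_words = [re.escape(each_word) for each_word in term.split()]
    return re.compile(r"\b" + r"\s+".join(escaped_words) + r"\b", re.IGNORECASE)


ALL_HYPOTHETICAL_TERM_PATTERNS = [
    (compile_term_pattern(each_term), each_replacement)
    for each_term, each_replacement in HYPOTHETICAL_TERM_REPLACEMENTS.items()
]


def find_hypothetical_banned_terms(text: str) -> list[tuple[str, str]]:
    """Return (term, replacement) pairs in table order, each term at most once."""
    prose_text = text
    for each_pattern in ALL_HYPOTHETICAL_NON_PROSE_PATTERNS:
        prose_text = each_pattern.sub("", prose_text)
    all_matches: list[tuple[str, str]] = []
    seen_terms: set[str] = set()
    for each_pattern, each_replacement in ALL_HYPOTHETICAL_TERM_PATTERNS:
        first_match = each_pattern.search(prose_text)
        if first_match is None:
            continue
        normalized_term = first_match.group(0).lower()
        if normalized_term in seen_terms:
            continue
        seen_terms.add(normalized_term)
        all_matches.append((normalized_term, each_replacement))
    return all_matches


if __name__ == "__main__":
    # → [('initiate', 'start'), ('prior to', 'before')]
    print(find_hypothetical_banned_terms("We initiate the pool prior to boot."))
```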

package/hooks/blocking/pr_description_enforcer.py CHANGED

@@ -31,6 +31,7 @@ from hooks_constants.pr_description_enforcer_constants import (  # noqa: E402
     ALL_HEAVY_TESTING_HEADERS,
     ALL_READABILITY_CLI_FLAG_TOKENS,
     ATOMIC_WRITE_TEMP_SUFFIX,
+    BLOCKQUOTE_LINE_PATTERN,
     BLOCKQUOTE_MARKER_PATTERN,
     BOLD_PAIR_PATTERN,
     BULLET_MARKER_PATTERN,
@@ -63,6 +64,7 @@ from hooks_constants.pr_description_enforcer_constants import (  # noqa: E402
     SELF_CLOSING_REFERENCE_MESSAGE_SUFFIX,
     SELF_REFERENCE_PATTERN_TEMPLATE,
     STANDARD_SHAPE,
+    TABLE_ROW_LINE_PATTERN,
     THIS_PR_OPENING_PATTERN,
     TRIVIAL_BODY_CHAR_THRESHOLD,
     TRIVIAL_SHAPE,
@@ -350,6 +352,23 @@ def _count_substantive_prose_chars(body: str) -> int:
     return len(body_collapsed)
+def _extract_vague_scan_text(body: str) -> str:
+    """Return the prose to scan for vague language, with non-prose regions removed.
+    Drops whole blockquote lines and whole pipe-delimited table rows, then strips
+    the same Markdown ceremony as the prose-count path -- which removes fenced
+    code, inline code, and whole heading lines. This exempts vague phrases that
+    appear only inside code fences, inline code, Markdown headings, quoted
+    reviewer text, or pipe-delimited example tables -- those are not the author's
+    own prose. A pipe-delimited row carries at least two pipes; a line with a
+    single leading pipe, or a borderless table row with no leading pipe, stays in
+    scope.
+    """
+    without_blockquote_lines = BLOCKQUOTE_LINE_PATTERN.sub("", body)
+    without_table_rows = TABLE_ROW_LINE_PATTERN.sub("", without_blockquote_lines)
+    return _strip_markdown_ceremony(without_table_rows)
 def _iter_section_headers(body: str) -> list[str]:
     """Return every ATX heading line in the body, preserving canonical form.
@@ -813,7 +832,8 @@ def validate_pr_body(body: str, pr_number: int | None = None) -> list[str]:
             "(Adds, Fixes, Updates, Removes, Tightens, Ports)"
         )
-    vague_matches = VAGUE_LANGUAGE_PATTERN.findall(body)
+    vague_scan_text = _extract_vague_scan_text(body)
+    vague_matches = VAGUE_LANGUAGE_PATTERN.findall(vague_scan_text)
     if vague_matches:
         violations.append(
             f"Vague language detected: {', '.join(vague_matches)} -- "

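`_extract_vague_scan_text` relies on `BLOCKQUOTE_LINE_PATTERN` and `TABLE_ROW_LINE_PATTERN`, whose definitions are not part of this diff. One hypothetical pair that satisfies the docstring's contract (a table row needs a leading pipe plus at least one more; a single-leading-pipe line and a borderless row stay in scope) is sketched below. It omits the `_strip_markdown_ceremony` step, and the vague-word list is invented for the demonstration.

```python
import re

# Hypothetical stand-ins for the two constants named in the diff.
HYPOTHETICAL_BLOCKQUOTE_LINE_PATTERN = re.compile(r"^[ \t]*>.*(?:\n|$)", re.MULTILINE)
# A leading pipe plus at least one more pipe later on the same line.
HYPOTHETICAL_TABLE_ROW_LINE_PATTERN = re.compile(
    r"^[ \t]*\|[^\n]*\|[^\n]*(?:\n|$)", re.MULTILINE
)
# Invented vague-word list, for the demonstration only.
HYPOTHETICAL_VAGUE_LANGUAGE_PATTERN = re.compile(r"\b(?:various|some|etc)\b", re.IGNORECASE)


def extract_hypothetical_vague_scan_text(body: str) -> str:
    """Drop whole blockquote lines, then whole pipe-delimited table rows."""
    without_blockquote_lines = HYPOTHETICAL_BLOCKQUOTE_LINE_PATTERN.sub("", body)
    return HYPOTHETICAL_TABLE_ROW_LINE_PATTERN.sub("", without_blockquote_lines)


if __name__ == "__main__":
    sample_body = (
        "Fixes the parser.\n"
        "> Reviewer: this handles various inputs\n"
        "| case | notes |\n"
        "|---|---|\n"
        "| empty | various edge cases |\n"
        "| a lone leading pipe still covers some cases\n"
    )
    scan_text = extract_hypothetical_vague_scan_text(sample_body)
    print(HYPOTHETICAL_VAGUE_LANGUAGE_PATTERN.findall(scan_text))  # ['some']
```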
package/hooks/blocking/test_plain_language_blocker.py ADDED

@@ -0,0 +1,247 @@
+"""Tests for the plain_language_blocker PreToolUse hook.
+Covers the shared prose scanner (fenced code, inline code, blockquotes, URLs,
+file paths), the word-boundary guard, multi-word phrase matching, case
+insensitivity, the term -> replacement block message, and both registered
+PreToolUse surfaces (AskUserQuestion and Write|Edit on .md targets).
+"""
+import importlib.util
+import json
+import os
+import subprocess
+import sys
+from pathlib import Path
+HOOK_SCRIPT_PATH = Path(__file__).parent / "plain_language_blocker.py"
+_HOOKS_DIR = str(Path(__file__).resolve().parent)
+_HOOKS_ROOT = str(Path(__file__).resolve().parent.parent)
+if _HOOKS_DIR not in sys.path:
+    sys.path.insert(0, _HOOKS_DIR)
+if _HOOKS_ROOT not in sys.path:
+    sys.path.insert(0, _HOOKS_ROOT)
+def _load_hook_module() -> object:
+    module_spec = importlib.util.spec_from_file_location(
+        "plain_language_blocker_under_test", HOOK_SCRIPT_PATH
+    )
+    assert module_spec is not None and module_spec.loader is not None
+    loaded_module = importlib.util.module_from_spec(module_spec)
+    module_spec.loader.exec_module(loaded_module)
+    return loaded_module
+hook_module = _load_hook_module()
+find_banned_terms = hook_module.find_banned_terms
+strip_non_prose_regions = hook_module.strip_non_prose_regions
+build_block_reason = hook_module.build_block_reason
+def _run_hook_with_payload(payload: dict) -> subprocess.CompletedProcess[str]:
+    return subprocess.run(
+        [sys.executable, str(HOOK_SCRIPT_PATH)],
+        input=json.dumps(payload),
+        capture_output=True,
+        text=True,
+        check=False,
+    )
+def _decision_from(completed: subprocess.CompletedProcess[str]) -> str | None:
+    if not completed.stdout:
+        return None
+    parsed = json.loads(completed.stdout)
+    return parsed.get("hookSpecificOutput", {}).get("permissionDecision")
+def test_canonical_hook_script_exists_at_expected_path() -> None:
+    assert HOOK_SCRIPT_PATH.is_file()
+def test_bare_prose_banned_term_is_detected() -> None:
+    matched = find_banned_terms("We initiate the worker pool at boot.")
+    assert any(each_term == "initiate" for each_term, _replacement in matched)
+def test_banned_term_inside_fenced_code_is_exempt() -> None:
+    prose = "Start the pool at boot.\n\n```python\nutilize(pool)\n```\n"
+    assert find_banned_terms(prose) == []
+def test_banned_term_inside_inline_code_is_exempt() -> None:
+    prose = "Call the `utilize` helper from the legacy module to migrate."
+    assert find_banned_terms(prose) == []
+def test_banned_term_inside_blockquote_is_exempt() -> None:
+    prose = "> The old guide said to utilize the pool.\n\nUse the pool directly now."
+    assert find_banned_terms(prose) == []
+def test_banned_term_inside_url_is_exempt() -> None:
+    prose = "See https://example.com/initiate-flow for the original write-up."
+    assert find_banned_terms(prose) == []
+def test_banned_term_inside_file_path_is_exempt() -> None:
+    prose = "Edit src/utilize_helpers/initiate.py to wire the new path."
+    assert find_banned_terms(prose) == []
+def test_word_boundary_guard_does_not_match_substring() -> None:
+    assert find_banned_terms("The reinitialize routine reruns the seed.") == []
+def test_case_insensitive_match() -> None:
+    matched_lower = find_banned_terms("utilize the cache.")
+    matched_upper = find_banned_terms("Utilize the cache.")
+    assert any(term == "utilize" for term, _ in matched_lower)
+    assert any(term == "utilize" for term, _ in matched_upper)
+def test_multi_word_phrase_matches_as_unit() -> None:
+    matched = find_banned_terms("Run the migration prior to the deploy step.")
+    assert any(term == "prior to" for term, _ in matched)
+def test_strip_non_prose_regions_removes_code_and_paths() -> None:
+    prose = "Use `utilize` and src/initiate.py and https://x.test/utilize here."
+    stripped = strip_non_prose_regions(prose)
+    assert "utilize" not in stripped
+    assert "initiate" not in stripped
+def test_block_reason_names_term_and_replacement() -> None:
+    reason = build_block_reason([("initiate", "start")])
+    assert "initiate" in reason
+    assert "start" in reason
+def test_ask_user_question_with_banned_term_is_denied() -> None:
+    payload = {
+        "tool_name": "AskUserQuestion",
+        "tool_input": {
+            "questions": [
+                {
+                    "question": "Should we utilize the new allocator now?",
+                    "header": "Allocator",
+                    "options": [{"label": "Yes", "description": "Switch now."}],
+                }
+            ]
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+def test_ask_user_question_banned_term_in_option_label_is_denied() -> None:
+    payload = {
+        "tool_name": "AskUserQuestion",
+        "tool_input": {
+            "questions": [
+                {
+                    "question": "Which path should we take?",
+                    "header": "Path",
+                    "options": [{"label": "Utilize the cache", "description": "Go fast."}],
+                }
+            ]
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+def test_clean_ask_user_question_passes_through() -> None:
+    payload = {
+        "tool_name": "AskUserQuestion",
+        "tool_input": {
+            "questions": [
+                {
+                    "question": "Should we switch the allocator now?",
+                    "header": "Allocator",
+                    "options": [{"label": "Yes", "description": "Switch now."}],
+                }
+            ]
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+def test_write_markdown_with_banned_term_is_denied(tmp_path: Path) -> None:
+    target = tmp_path / "notes.md"
+    payload = {
+        "tool_name": "Write",
+        "tool_input": {
+            "file_path": str(target),
+            "content": "This guide explains how to utilize the new cache layer.",
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+def test_write_non_markdown_is_ignored(tmp_path: Path) -> None:
+    target = tmp_path / "notes.txt"
+    payload = {
+        "tool_name": "Write",
+        "tool_input": {
+            "file_path": str(target),
+            "content": "This guide explains how to utilize the new cache layer.",
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+def test_edit_markdown_clean_content_passes_through(tmp_path: Path) -> None:
+    target = tmp_path / "notes.md"
+    payload = {
+        "tool_name": "Edit",
+        "tool_input": {
+            "file_path": str(target),
+            "new_string": "This guide explains how to use the new cache layer.",
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+def test_multiedit_markdown_with_banned_term_is_denied(tmp_path: Path) -> None:
+    target = tmp_path / "notes.md"
+    payload = {
+        "tool_name": "MultiEdit",
+        "tool_input": {
+            "file_path": str(target),
+            "edits": [
+                {"old_string": "intro", "new_string": "This section reads cleanly."},
+                {"old_string": "body", "new_string": "Then we utilize the new cache."},
+            ],
+        },
+    }
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) == "deny"
+def test_other_tool_is_ignored() -> None:
+    payload = {"tool_name": "Bash", "tool_input": {"command": "echo utilize"}}
+    completed = _run_hook_with_payload(payload)
+    assert _decision_from(completed) is None
+def test_software_allowlisted_term_is_not_flagged() -> None:
+    assert find_banned_terms("Run this command to start the worker.") == []
+def test_non_allowlisted_formal_term_still_flagged() -> None:
+    matched = find_banned_terms("Please utilize the cache now.")
+    assert any(term == "utilize" for term, _ in matched)
+def test_prose_slash_token_is_not_stripped_as_path() -> None:
+    assert "client/server" in strip_non_prose_regions("Use a client/server split here.")
+def test_real_file_path_is_still_stripped() -> None:
+    assert "initiate" not in strip_non_prose_regions("Edit src/initiate.py to wire it.")