npm - claude-dev-env - Versions diffs - 1.49.1 → 1.50.1 - Mend

claude-dev-env 1.49.1 → 1.50.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (34) hide show

package/audit-rubrics/category_rubrics/category-a-api-contracts.md CHANGED Viewed

@@ -21,10 +21,10 @@ The decomposition that worked best for PR #394 (a Python+PowerShell scheduled-ta
 | ID | Axis name | Concrete checks |
 |---|---|---|
-| A1 | Python function signatures vs internal call sites | Parameter count, names, defaults, kw-only barriers; every internal call binds correctly. |
-| A2 | Python return-type annotation vs every code path | Each function's return annotation is satisfied by every path: explicit `return X`, fall-through, exception-handler exit. |
+| A1 | Python function signatures vs internal call sites | Parameter count, names, defaults, kw-only barriers; every internal call binds correctly. Is the symbol `async def`? Confirm the exact access path a caller uses: free function vs instance method reached through an object attribute vs import path. A keyword-only parameter with no default is required; omitting it raises `TypeError`. |
+| A2 | Python return-type annotation vs every code path | Each function's return annotation is satisfied by every path: explicit `return X`, fall-through, exception-handler exit. The full failure contract is the return value AND every exception raised — trace the body and the docstring `Raises:` for each `raise`, including custom errors. A `-> bool` function that also raises is not fully described by "returns bool". |
 | A3 | argparse parser → Namespace contract | Every `add_argument(...)` produces the exact dest name accessed downstream; `type=` matches downstream usage; switches produce bools. |
-| A4 | Stdlib callback contracts | `os.walk(onerror=...)` callback shape; `os.path.getctime` / `os.rmdir` argument and exception contracts; `time.sleep` argument types. |
+| A4 | Stdlib callback contracts | `os.walk(onerror=...)` callback shape; `os.path.getctime` / `os.rmdir` argument and exception contracts; `time.sleep` argument types. Catch-site precision: for any claim that code "catches X", confirm the exact catch site and scope — an `except` around only a rollback inside `finally` does not catch the same error raised in the `with` body. |
 | A5 | subprocess invocation contract | `subprocess.run` kwargs valid for the targeted Python; `args=[list]` shape; exception propagation under `check=True`. |
 | A6 | PowerShell cmdlet parameter sets and binding | `param(...)` with `ParameterSetName=`; `[CmdletBinding(DefaultParameterSetName=…)]` presence; cmdlet parameter combinations valid per Microsoft docs. |
 | A7 | Cross-language argv boundary | The `-Argument` string composition → Windows process loader → C-runtime argv parser → Python `sys.argv` → argparse. Trailing-backslash and embedded-space hazards. |
@@ -34,6 +34,20 @@ Adapt these axes for your artifact. For a pure Python codebase, drop A6 and A7 a
 ---
+### Documentation as contract: verifying a doc claim about code
+When the audited artifact is documentation — a CLAUDE.md, a rule file, a README, a table mapping symbols to behavior — that asserts facts about the codebase, API-contract verification means checking every assertion against the current code, not just confirming the symbol exists and its return type matches. A doc that passes the happy-path contract can still be wrong on any of the seven checks below. Run all seven up front. Checks 1, 2, and 6 are the full-contract sharpening of sub-buckets A1, A2, and A4 applied to a doc claim; checks 3, 4, 5, and 7 are specific to documentation artifacts.
+1. **Full failure contract** — the failure signals of a function are its return value AND every exception it raises; trace the body and the docstring `Raises:` for every `raise`. _Example:_ a docs PR says a UI helper "returns `bool`", but it also raises a custom not-found error, and a database writer documented by its return type also raises `ValueError` / `RuntimeError` / a driver error, so "returns bool" understates the contract.
+2. **Call shape** — required versus optional parameters (a keyword-only parameter with NO default is required; omitting it raises `TypeError`), sync versus async, and the exact access path (free function versus instance method reached through an object attribute versus import path). _Example:_ a doc presents a helper as a free function, but it is an `async` instance method reached through an object attribute and one keyword-only parameter has no default, so the call example in the doc would raise `TypeError`.
+3. **Reuse-first** — before a doc endorses a hand-written snippet, search for a dedicated helper that already does it. _Example:_ a doc endorses hand-composing `normalize(name).lower()` inline while a dedicated `normalize_for_matching()` helper already does exactly that, contradicting the reuse-before-building rule the doc itself states.
+4. **Path resolution** — every file or directory path a doc cites resolves from the repository root. _Example:_ a doc cites a bare `snapshots/` directory as if it sat at the repo root, but the tree lives under `subsystem/snapshots/`.
+5. **Cross-entry consistency** — scan parallel rows, sections, and table entries for claims that contradict each other. _Example:_ two adjacent table rows map the same subsystem to two different exception base classes.
+6. **Catch-site precision** — when a doc claims code "catches X", confirm the exact site and scope of the catch. _Example:_ a doc says a context manager catches a driver error, but the `except` wraps only the rollback inside `finally`, so an error raised in the `with` body propagates uncaught.
+7. **Citation freshness** — re-derive every `file:line` claim against the current code; never trust a prior "verified" assertion or wording borrowed from a comment. _Example:_ an attribute name carried over from a review comment names a member the class does not define; the current code exposes it under a different name.
+---
 ## Sample prompt
 The literal text used in the May 2026 audit experiment is in [`../prompts/category-a-api-contracts.md`](../prompts/category-a-api-contracts.md). It produced 8–10 findings (P0=1–2, P1=2–6, P2=2–5) across two runs. Inline the full diff verbatim — do not ask the agent to fetch it.

package/audit-rubrics/prompts/category-a-api-contracts.md CHANGED Viewed

@@ -1,4 +1,4 @@
-Audit [REPO/ARTIFACT] [TARGET_ID] for **Category A only** (API contract verification). Skip B–K. Sub-bucket forced-exhaustion mode: Category A is decomposed into 8 sub-buckets below. Each sub-bucket REQUIRES at least one Shape A finding OR exactly one Shape B proof-of-absence with **at least 3 adversarial probes** specific to that sub-bucket. A sub-bucket returning neither is a protocol gap.
+Audit [REPO/ARTIFACT] [TARGET_ID] for **Category A only** (API contract verification). Skip B–N. Sub-bucket forced-exhaustion mode: Category A is decomposed into 9 sub-buckets below. Each sub-bucket REQUIRES at least one Shape A finding OR exactly one Shape B proof-of-absence with **at least 3 adversarial probes** specific to that sub-bucket. A sub-bucket returning neither is a protocol gap.
 [ARTIFACT METADATA: title / change description / head SHA or revision identifier / scope summary]
 ID prefix: `find`.
@@ -15,6 +15,7 @@ ID prefix: `find`.
 - Flag positional arguments passed to keyword-only parameters and vice versa.
 - Flag calls that omit a required parameter relying on a default that does not exist on the current branch.
 - Verify decorators (`@staticmethod`, `@classmethod`, `@property`) do not silently shift the parameter binding (e.g., `self` / `cls` insertion).
+- Confirm sync-vs-async (is the symbol `async def`?), the exact access path a caller uses (free function vs instance method via an object attribute vs import path), and that a keyword-only parameter with no default is required — omitting it raises `TypeError`.
 **A2. Return-type annotation vs every code path**
 - For each annotated function, walk every code path: explicit `return X`, fall-through to implicit `None`, exception-handler exit, generator `yield` paths, async coroutine return value.
@@ -22,6 +23,7 @@ ID prefix: `find`.
 - For functions that raise instead of returning on some path, confirm the annotation does not promise a value the caller will dereference.
 - Inspect `try/except/finally` chains for paths that return from `finally` and override `try`/`except` returns.
 - For async functions, confirm the annotation refers to the awaited type, not the coroutine wrapper.
+- The full failure contract is the return value AND every exception raised — list each `raise` in the body and the docstring `Raises:`; a `-> bool` that can also raise is not fully described as "returns bool".
 **A3. CLI/argument-parser declaration → downstream Namespace contract**
 - For every `add_argument(...)` (or equivalent CLI declaration), verify the auto-derived or explicit `dest=` matches the attribute name accessed downstream on the parsed namespace.
@@ -34,6 +36,7 @@ ID prefix: `find`.
 - Identify every callback handed to a library function (e.g., `os.walk(onerror=...)`, sort `key=`, `filter`, `map`, `re.sub(repl=callable)`, signal handlers, threading callbacks). Verify each callback's signature matches what the library calls it with — arity, positional-vs-keyword, return type the library consumes.
 - For every stdlib function the artifact calls, verify argument types and exception contracts: which exceptions can each call raise, and is each caller prepared (or deliberately not prepared) for them.
 - Verify kwargs to stdlib functions are spelled correctly for the targeted runtime version (deprecated/renamed kwargs, version-introduced kwargs).
+- Catch-site precision — for any "catches X" claim confirm the exact catch site and scope (an `except` around only a rollback inside `finally` does not catch the same error from the `with` body).
 - Flag callbacks whose return value the library consumes but the implementation returns `None` (or vice versa).
 - Confirm callback exception behavior: which exceptions in the callback bubble out, which are swallowed by the library, which terminate iteration.
@@ -66,6 +69,18 @@ ID prefix: `find`.
 - For write calls, verify the signature against the provider's own published API contract — their REST reference docs, OpenAPI spec, SDK source code, or `--help` output. When a read endpoint exposes the same state, call it to confirm the write contract.
 - Flag every call where documented parameters, types, or behavior diverge from the official API contract.
+**A9. Documentation claims about the codebase (when the artifact asserts facts about the code)**
+When the artifact is documentation that asserts facts about the codebase (symbol names, signatures, return types, exceptions, file paths), run all seven documentation-as-contract checks below; each yields a confirmation or a finding. For a pure-code artifact, A9 is one line of proof-of-absence (the artifact asserts no code facts).
+- Full failure contract — the failure signals of a function are its return value AND every exception it raises; trace the body and the docstring `Raises:` for every `raise`. _Example:_ a docs PR says a UI helper "returns `bool`", but it also raises a custom not-found error, so "returns bool" understates the contract.
+- Call shape — required versus optional parameters (a keyword-only parameter with NO default is required; omitting it raises `TypeError`), sync versus async, and the exact access path (free function versus instance method reached through an object attribute versus import path). _Example:_ a doc presents a helper as a free function, but it is an `async` instance method reached through an object attribute, so the doc's call example would raise `TypeError`.
+- Reuse-first — before a doc endorses a hand-written snippet, search for a dedicated helper that already does it. _Example:_ a doc endorses hand-composing `normalize(name).lower()` inline while a dedicated `normalize_for_matching()` helper already does exactly that.
+- Path resolution — every file or directory path a doc cites resolves from the repository root. _Example:_ a doc cites a bare `snapshots/` directory as if it sat at the repo root, but the tree lives under `subsystem/snapshots/`.
+- Cross-entry consistency — scan parallel rows, sections, and table entries for claims that contradict each other. _Example:_ two adjacent table rows map the same subsystem to two different exception base classes.
+- Catch-site precision — when a doc claims code "catches X", confirm the exact site and scope of the catch. _Example:_ a doc says a context manager catches a driver error, but the `except` wraps only the rollback inside `finally`, so an error raised in the `with` body propagates uncaught.
+- Citation freshness — re-derive every `file:line` claim against the current code; never trust a prior "verified" assertion or wording borrowed from a comment. _Example:_ an attribute name carried over from a review comment names a member the class does not define; the current code exposes it under a different name.
 ## Cross-bucket questions to answer at the end
 Q1: Are there any contracts that span two sub-buckets that single-bucket analysis would miss?
@@ -74,7 +89,7 @@ Q3: Where would a future refactor most likely break a cross-bucket or cross-lang
 ## Output
-Lead: `Total: N (P0=N, P1=N, P2=N)`. For each sub-bucket A1–A8, produce Shape A or Shape B (with ≥3 adversarial probes). Cross-bucket Q1–Q3 answers after the per-sub-bucket walk. Adversarial second pass: "assume your first pass missed at least 3 P1 bugs across these 8 sub-buckets — find them." Open Questions section for ambiguities. Read-only. No edits, no commits.
+Lead: `Total: N (P0=N, P1=N, P2=N)`. For each sub-bucket A1–A9, produce Shape A or Shape B (with ≥3 adversarial probes). Cross-bucket Q1–Q3 answers after the per-sub-bucket walk. Adversarial second pass: "assume your first pass missed at least 3 P1 bugs across these 9 sub-buckets — find them." Open Questions section for ambiguities. Read-only. No edits, no commits.
 ---

package/docs/CODE_RULES.md CHANGED Viewed

@@ -62,6 +62,8 @@ These rules are automatically enforced by `code_rules_enforcer.py`. Violations b
 | Test-mode branching in production | Reading `TESTING`, `PYTEST_CURRENT_TEST`, `IS_TEST`, etc. from production code creates two parallel implementations. Use dependency injection so production stays single-path. **Test files and hook infrastructure exempt.** |
 | Thin wrapper files | A non-`__init__.py` module whose body is only imports (optionally with an `__all__` assignment) is a re-export indirection with no payload. Callers should import from the real module. `__init__.py` is the canonical re-export surface and is exempt. |
 | Docstring format (Google-style) | Public functions/methods (no leading underscore, not dunder, body > 3 lines, not `@property`/`@abstractmethod`) require Google-style `Args:` / `Returns:` (or `Yields:`) / `Raises:` sections matching the signature. **Test files exempt.** |
+| Docstring Args match signature | A public function whose docstring `Args:` section names a parameter the signature does not declare is flagged — a rename that left the adjacent `Args:` line stale. Only the `Args:` section is compared against the signature; `Raises:` is left alone because callee-propagated exceptions cause false positives. **Test files and hook infrastructure exempt.** |
+| Ignored must-check return | A bare-statement call to a function whose return value is its only failure signal (the curated `find_and_click`, `write_outcome` set) is flagged — the discarded boolean lets the caller move on silently after a failure. Assign the return and check it. Assigned (`clicked = …`) and branched-on (`if …:`) calls are exempt. Attribute calls are matched by their terminal method name alone (the receiver type is not resolved), so an unrelated `obj.write_outcome()` or `widget.find_and_click()` whose method name collides with a curated name is also flagged. **Test files exempt.** |
 ### Where UPPER_SNAKE is allowed
@@ -124,7 +126,7 @@ Full words only. No mental translation.
 **Extended naming rules** :
 - Loop vars: `each_order`, `each_user` (prefix `each_`)
-- Booleans: `is_valid`, `has_permission`, `should_retry` (prefix `is_`/`has_`/`should_`/`can_`)
+- Booleans: `is_valid`, `has_permission`, `should_retry`, `was_clicked`, `did_succeed` (prefix `is_`/`has_`/`should_`/`can_`/`was_`/`did_`). The hook covers both boolean assignments and boolean-typed function parameters (a parameter annotated `bool` or defaulting to a boolean literal); `self`/`cls` and single-character names are exempt.
 - Collections: `all_orders`, `all_users` (prefix `all_`)
 - Maps: `price_by_product`, `user_by_id` (pattern `X_by_Y`)
 - Preposition params: `from_path=`, `to=`, `into=`
@@ -400,6 +402,9 @@ Hook will enforce:
 [⚡] No test-mode branching in production (TESTING / PYTEST_CURRENT_TEST)
 [⚡] No thin wrapper modules (imports only, optionally with __all__, outside __init__.py)
 [⚡] Public functions have Google-style Args:/Returns:/Raises: when warranted
+[⚡] Docstring Args: names match the signature (a stale renamed param is flagged)
+[⚡] Boolean names prefixed is_/has_/should_/can_/was_/did_ (assignments AND bool-typed parameters)
+[⚡] No discarded must-check return (assign and check find_and_click/write_outcome outcomes)
 Manual check:
 [ ] No abbreviations?

package/hooks/blocking/_gh_body_arg_utils.py CHANGED Viewed

@@ -1,4 +1,4 @@
-"""Shared gh body-arg parsing utilities for blocking hooks."""
+"""Shared shell-token and gh body-arg parsing utilities for blocking hooks."""
 from __future__ import annotations
@@ -55,6 +55,63 @@ _all_equals_prefixes_for_skip: tuple[str, ...] = tuple(
 bash_continuation_marker: str = "\\"
 powershell_continuation_marker: str = "`"
+shell_variable_sigil: str = "$"
+all_quote_characters: frozenset[str] = frozenset({'"', "'"})
+minimum_meaningful_token_length: int = 2
+non_body_value_flags: frozenset[str] = all_value_flags - {body_file_flag, body_file_short_flag}
+_non_body_value_flag_equals_prefixes: tuple[str, ...] = tuple(
+    sorted((f"{each_flag}=" for each_flag in non_body_value_flags), key=len, reverse=True)
+)
+def is_flag_shaped_token(token: str) -> bool:
+    """Report whether a token is flag-shaped for body/PR-number extraction.
+    Treats any token whose second character is "-" as flag-shaped, so bare
+    "--" and "--<digit>" tokens both count as flags. `_is_flag_shaped` applies
+    a stricter rule for token-stream scanning.
+    """
+    if len(token) < minimum_meaningful_token_length:
+        return False
+    if not token.startswith("-"):
+        return False
+    return token[1] == "-" or token[1].isalpha()
+def strip_surrounding_quotes(token: str) -> str:
+    if len(token) < minimum_meaningful_token_length:
+        return token
+    first_character = token[0]
+    last_character = token[-1]
+    if first_character in all_quote_characters and first_character == last_character:
+        return token[1:-1]
+    return token
+def is_unresolvable_shell_value(token: str) -> bool:
+    return token.startswith(shell_variable_sigil)
+def _match_prefix(token: str, all_prefixes: tuple[str, ...]) -> str | None:
+    for each_prefix in all_prefixes:
+        if token.startswith(each_prefix):
+            return each_prefix
+    return None
+def match_body_flag_equals_prefix(token: str) -> str | None:
+    return _match_prefix(token, all_body_flag_prefixes)
+def match_body_file_equals_prefix(token: str) -> str | None:
+    return _match_prefix(token, (body_file_flag_prefix, body_file_short_flag_prefix))
+def match_non_body_value_flag_equals_prefix(token: str) -> str | None:
+    return _match_prefix(token, _non_body_value_flag_equals_prefixes)
 def _count_trailing_run(text: str, marker_character: str) -> int:
     trailing_run_length = 0
@@ -91,7 +148,13 @@ def get_logical_first_line(command: str) -> str:
 def _is_flag_shaped(token: str) -> bool:
-    if len(token) < 2:
+    """Report whether a token is flag-shaped for token-stream scanning.
+    Requires an alphabetic character after "--", so bare "--" and "--<digit>"
+    tokens are not flag-shaped. `is_flag_shaped_token` applies a looser rule
+    for body/PR-number extraction.
+    """
+    if len(token) < minimum_meaningful_token_length:
         return False
     if not token.startswith("-"):
         return False
@@ -102,7 +165,7 @@ def _is_flag_shaped(token: str) -> bool:
 def _quoted_value_starts_split(value_token: str) -> bool:
-    if len(value_token) < 2:
+    if len(value_token) < minimum_meaningful_token_length:
         return False
     first_character = value_token[0]
     if first_character not in {'"', "'"}:
@@ -129,13 +192,6 @@ def count_extra_tokens_to_skip_for_split_quoted_value(
     return None
-def _match_equals_prefix_for_skip(token: str) -> str | None:
-    for each_prefix in _all_equals_prefixes_for_skip:
-        if token.startswith(each_prefix):
-            return each_prefix
-    return None
 def iter_significant_tokens(
     command: str,
     pre_tokenized: tuple[str, list[str]] | None = None,
@@ -175,7 +231,7 @@ def iter_significant_tokens(
     while token_index < len(all_tokens):
         current_token = all_tokens[token_index]
         remaining_tokens = all_tokens[token_index + 1:]
-        matched_equals_prefix = _match_equals_prefix_for_skip(current_token)
+        matched_equals_prefix = _match_prefix(current_token, _all_equals_prefixes_for_skip)
         if matched_equals_prefix is not None:
             value_token = current_token[len(matched_equals_prefix):]
             split_value_extra_tokens = count_extra_tokens_to_skip_for_split_quoted_value(

package/hooks/blocking/_md_to_html_blocker_test_support.py ADDED Viewed

@@ -0,0 +1,65 @@
+"""Shared subprocess-invocation helpers for the md_to_html_blocker test suites.
+Subprocess CWD is rooted in a per-session sandbox created lazily so that
+relative-path test cases canonicalize outside any `.claude-plugin/` ancestor,
+outside the OS temp directory, and outside the exempt home-relative
+subdirectories. The sandbox is a real repo root (it carries a `.git` marker) so
+relative `README.md` / `CHANGELOG.md` writes exercise the repo-root exemption
+path. This keeps the suites independent of where pytest itself is run.
+"""
+import functools
+import json
+import os
+import shutil
+import stat
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+HOOK_SCRIPT_PATH = os.path.join(os.path.dirname(__file__), "md_to_html_blocker.py")
+def _strip_read_only_and_retry(removal_function, target_path, *_exc_info):
+    try:
+        os.chmod(target_path, stat.S_IWRITE)
+        removal_function(target_path)
+    except OSError:
+        pass
+def _force_rmtree(target_path: str) -> None:
+    handler_kw = (
+        {"onexc": _strip_read_only_and_retry}
+        if sys.version_info >= (3, 12)
+        else {"onerror": _strip_read_only_and_retry}
+    )
+    try:
+        shutil.rmtree(target_path, **handler_kw)
+    except OSError:
+        pass
+@functools.lru_cache(maxsize=1)
+def _get_sandbox_parent_directory() -> str:
+    sandbox_parent = tempfile.mkdtemp(prefix="pytest_md_blocker_", dir=str(Path.home()))
+    git_marker_path = os.path.join(sandbox_parent, ".git")
+    Path(git_marker_path).touch()
+    return sandbox_parent
+class _RunHook:
+    def __call__(self, tool_name: str, tool_input: dict) -> subprocess.CompletedProcess:
+        payload = json.dumps({"tool_name": tool_name, "tool_input": tool_input})
+        return subprocess.run(
+            [sys.executable, HOOK_SCRIPT_PATH],
+            input=payload,
+            capture_output=True,
+            text=True,
+            check=False,
+            cwd=_get_sandbox_parent_directory(),
+        )
+_run_hook = _RunHook()