claude-dev-env 1.49.0 → 1.49.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (31)
  1. package/audit-rubrics/category_rubrics/category-a-api-contracts.md +72 -0
  2. package/audit-rubrics/category_rubrics/category-b-selector-engine-compat.md +36 -0
  3. package/audit-rubrics/category_rubrics/category-c-resource-cleanup.md +35 -0
  4. package/audit-rubrics/category_rubrics/category-d-scoping-and-ordering.md +35 -0
  5. package/audit-rubrics/category_rubrics/category-e-dead-code.md +38 -0
  6. package/audit-rubrics/category_rubrics/category-f-silent-failures.md +38 -0
  7. package/audit-rubrics/category_rubrics/category-g-bounds-and-overflow.md +38 -0
  8. package/audit-rubrics/category_rubrics/category-h-security-boundaries.md +40 -0
  9. package/audit-rubrics/category_rubrics/category-i-concurrency.md +38 -0
  10. package/audit-rubrics/category_rubrics/category-j-code-rules-compliance.md +46 -0
  11. package/audit-rubrics/category_rubrics/category-k-codebase-conflicts.md +59 -0
  12. package/audit-rubrics/category_rubrics/category-l-behavior-equivalence.md +45 -0
  13. package/audit-rubrics/category_rubrics/category-m-producer-consumer-cardinality.md +44 -0
  14. package/audit-rubrics/category_rubrics/category-n-test-name-scenario-verifier.md +45 -0
  15. package/audit-rubrics/prompts/category-a-api-contracts.md +384 -0
  16. package/audit-rubrics/prompts/category-b-selector-engine-compat.md +401 -0
  17. package/audit-rubrics/prompts/category-c-resource-cleanup.md +420 -0
  18. package/audit-rubrics/prompts/category-d-scoping-and-ordering.md +414 -0
  19. package/audit-rubrics/prompts/category-e-dead-code.md +420 -0
  20. package/audit-rubrics/prompts/category-f-silent-failures.md +420 -0
  21. package/audit-rubrics/prompts/category-g-bounds-and-overflow.md +383 -0
  22. package/audit-rubrics/prompts/category-h-security-boundaries.md +423 -0
  23. package/audit-rubrics/prompts/category-i-concurrency.md +429 -0
  24. package/audit-rubrics/prompts/category-j-code-rules-compliance.md +463 -0
  25. package/audit-rubrics/prompts/category-k-codebase-conflicts.md +328 -0
  26. package/audit-rubrics/prompts/category-l-behavior-equivalence.md +128 -0
  27. package/audit-rubrics/prompts/category-m-producer-consumer-cardinality.md +129 -0
  28. package/audit-rubrics/prompts/category-n-test-name-scenario-verifier.md +132 -0
  29. package/audit-rubrics/source-material-section-types.md +51 -0
  30. package/package.json +2 -1
  31. package/skills/bugteam/reference/teardown-publish-permissions.md +7 -2
package/audit-rubrics/prompts/category-g-bounds-and-overflow.md
@@ -0,0 +1,383 @@
+ Audit [REPO/ARTIFACT] [TARGET_ID] for **Category G only** (off-by-one, bounds, integer overflow). Skip A–F, H–K. Sub-bucket forced-exhaustion mode: Category G is decomposed into 8 sub-buckets below. Each sub-bucket REQUIRES at least one Shape A finding OR exactly one Shape B proof-of-absence with **at least 3 adversarial probes** specific to that sub-bucket. A sub-bucket returning neither is a protocol gap.
+
+ [ARTIFACT METADATA]
+ - Repository / artifact: [REPO_OR_ARTIFACT]
+ - Target ID (PR / commit / tag / file set): [TARGET_ID]
+ - Head SHA / revision: [HEAD_SHA]
+ - Title or summary: [TITLE]
+ - Languages / runtimes in scope: [LANGS]
+
+ ID prefix: `find`.
+
+ ## Source material
+
+ Inline the artifact under this section (full diff for a PR, full file bodies for a file-set audit, or a representative slice for an oversized artifact). For chunking strategy, file inclusion order, and "all lines in scope" framing, follow the companion chunking guide referenced by the rubric (`../source-material-section-types.md`). When a single artifact exceeds the prompt budget, split into ordered chunks and re-run this prompt per chunk; each chunk must independently satisfy the per-sub-bucket Shape A / Shape B requirement against the lines it contains.
+
+ ## Sub-buckets (each requires Shape A finding OR Shape B with ≥3 adversarial probes)
+
+ **G1. Loop bounds**
+ Scope: every `range(...)`, `while i < n`, `for i in range(len(x)+1)`, manual index counters, generator-driven loops, recursion depth bounds, and any inclusive-vs-exclusive iteration boundary in the artifact.
+ Required output: at least one Shape A finding citing the off-by-one site, OR exactly one Shape B proof-of-absence with ≥3 adversarial probes (e.g., (a) is there an implicit upper bound an iterator could miss — symlink loops, infinite generators, deep recursion? (b) does an empty-collection edge case skip the loop body cleanly? (c) does a manual counter that runs alongside the iterator drift by one when the underlying collection mutates mid-loop?).
+
+ **G2. Slice / substring indices**
+ Scope: every `s[i:j]`, `arr[-n:]`, `split(...)[i]`, character-level indexing, regex match groups treated as substrings, and any computed slice endpoint that could equal `len(x) + 1` or a negative index that clamps unexpectedly.
+ Required output: Shape A citing the bad slice, OR Shape B with ≥3 adversarial probes (e.g., (a) does any consumer downstream apply a length-dependent truncation? (b) does a path-splitting helper underflow when its input is exactly the root? (c) can a capture group that did not participate in the match report an index of `-1` (as Python's `Match.start()` / `Match.end()` do) that then becomes a slice endpoint, where a negative value is silently read as an offset from the end?).
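A minimal sketch of probe (c), assuming Python's `re` module: a capture group that takes no part in an otherwise successful match reports `-1` from `Match.start()` / `Match.end()`, and `-1` used as a slice endpoint silently means "from the last character". `PATTERN`, `naive_group_text`, and `group_text_or_none` are hypothetical names chosen for illustration.

```python
import re
from typing import Optional

# Group 1 is optional, so a match can succeed without it.
PATTERN = re.compile(r"(x)?b")


def naive_group_text(text: str) -> str:
    match = PATTERN.search(text)
    assert match is not None
    # Bug: start(1) is -1 when group 1 did not participate, and -1 as a
    # slice start means "one before the end" rather than raising.
    return text[match.start(1):]


def group_text_or_none(text: str) -> Optional[str]:
    """Hypothetical guarded version: check participation before slicing."""
    match = PATTERN.search(text)
    if match is None or match.start(1) == -1:
        return None
    return text[match.start(1):match.end(1)]


print(naive_group_text("ab"))    # prints: b (group 1 never matched)
print(group_text_or_none("ab"))  # prints: None
print(group_text_or_none("xb"))  # prints: x
```

Checking `match.start(1) == -1` (or `match.group(1) is None`) before slicing is the kind of guard this probe looks for.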
+
+ **G3. Array / list indexing with computed offsets**
+ Scope: every `arr[i + offset]`, `dict[computed_key]` where the key is numeric, off-the-end probes (`arr[len(arr)]`), iterator advancement that returns a sentinel the next call dereferences, and PowerShell `$collection[$index]` with computed `$index`.
+ Required output: Shape A citing the index site, OR Shape B with ≥3 adversarial probes (e.g., (a) can a lookup return `$null` / `None` that the next access dereferences? (b) does an argv-style list ever get indexed past its known length? (c) does a `foreach` element receive a `$null` member from an empty collection?).
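Python adds a wrinkle to computed offsets that the G3 probes can miss: a negative index does not raise, it wraps around. A minimal sketch with hypothetical helper names, showing an `index - 1` lookup that silently returns the last element at `index == 0`, next to a bounds-checked variant. Only the upper-bound case (`items[len(items)]`) raises `IndexError`.

```python
from typing import Optional, Sequence


def previous_item_naive(items: Sequence[str], index: int) -> str:
    # Bug: at index 0 this reads items[-1], the LAST element, without error.
    return items[index - 1]


def previous_item(items: Sequence[str], index: int) -> Optional[str]:
    """Hypothetical guarded lookup: None when the offset leaves the sequence."""
    offset_index = index - 1
    if 0 <= offset_index < len(items):
        return items[offset_index]
    return None


steps = ["parse", "plan", "apply"]
print(previous_item_naive(steps, 0))  # prints: apply (wrapped, no error)
print(previous_item(steps, 0))        # prints: None
print(previous_item(steps, 2))        # prints: plan
```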
+
+ **G4. Integer arithmetic overflow** ⭐ canonical surface
+ Scope: 32-bit vs 64-bit assumptions; PowerShell `[int]` overflow at 2^31; `time.time() * 1000` precision loss; multiplication that crosses platform `int` ceilings; counters seeded from user input; ticks / nanoseconds / milliseconds conversions; cross-language defaults that share a magnitude but not a source of truth.
+ Required output: Shape A citing the overflow site or the duplicated-default drift hazard, OR Shape B with ≥3 adversarial probes (e.g., (a) what happens at `2^31 - 1` and `2^31` for each `[int]`-typed parameter? (b) is `0` accepted as a degenerate value that produces a busy loop or a zero-interval scheduler entry? (c) are equivalent constants declared in two languages without a shared source of truth — and does drift on one side go undetected?).
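For probe (b), one way to keep `0` (or a negative value) away from a scheduler or sleep loop is to validate at the argument boundary. A minimal sketch, assuming an `argparse`-based CLI like the one in the worked example below; `positive_int`, `build_parser`, and the bounds are hypothetical choices, with the upper cap keeping the value inside the signed 32-bit range a PowerShell `[int]` consumer can bind.

```python
import argparse

INT32_MAX = 2**31 - 1  # upper bound of System.Int32 / PowerShell [int]


def positive_int(raw_value: str) -> int:
    """Parse an int in [1, INT32_MAX]; reject degenerate or overflowing input."""
    parsed_value = int(raw_value)  # ValueError is reported by argparse as "invalid"
    if parsed_value < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {parsed_value}")
    if parsed_value > INT32_MAX:
        raise argparse.ArgumentTypeError(f"must be <= {INT32_MAX}, got {parsed_value}")
    return parsed_value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag mirroring a poll interval in seconds.
    parser.add_argument("--interval", type=positive_int, default=30)
    return parser


parser = build_parser()
print(parser.parse_args(["--interval", "2147483647"]).interval)  # prints: 2147483647
```

Rejecting the value at parse time means a degenerate interval produces a usage error instead of a busy loop or a zero-interval task.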
+
+ **G5. Floating-point comparison**
+ Scope: every `==` / `!=` / `>=` / `<=` between floats; iterative accumulators where epsilon noise compounds; `0.1 + 0.2 != 0.3` patterns; mixed int-float comparison; filesystem-resolution rounding (FAT32 = 2s, NTFS = 100ns, ext4 = ns) interacting with sub-second thresholds.
+ Required output: Shape A citing the float-equality or epsilon-free comparison, OR Shape B with ≥3 adversarial probes (e.g., (a) is the comparison `==` vs `>=` / `<=` — and does the boundary semantics matter? (b) does sub-second filesystem-resolution rounding on one platform produce stale-equality results that another platform avoids? (c) is the float subtraction monotonic under wall-clock adjustment, or could it produce a negative result that flips the comparison direction?).
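A minimal sketch of the epsilon-free comparison this sub-bucket targets, using only the standard library: `==` fails on `0.1 + 0.2`, while `math.isclose` with an explicit tolerance succeeds. The tolerance values are illustrative assumptions; the right ones depend on the magnitudes being compared.

```python
import math

accumulated = 0.1 + 0.2

print(accumulated == 0.3)  # prints: False (binary rounding noise)

# rel_tol scales with magnitude; abs_tol covers values near zero.
print(math.isclose(accumulated, 0.3, rel_tol=1e-9, abs_tol=1e-12))  # prints: True

# Ordered comparisons avoid the equality trap, but rounding can still move a
# value across a boundary: 0.1 + 0.2 lands just above 0.3.
print(accumulated >= 0.3)  # prints: True
```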
+
+ **G6. Date / time arithmetic** ⭐ canonical surface
+ Scope: timezone math; DST transitions; leap seconds; `now - then >= threshold` precision; `time.time()` (wall-clock) vs `time.monotonic()` / `time.perf_counter()` selection; Unix epoch vs Windows FILETIME (100ns ticks since 1601); `[DateTime]` cast on strings without timezone suffix; cross-language datetime contracts.
+ Required output: Shape A citing the timezone-naive arithmetic or the wall-clock-vs-monotonic mismatch, OR Shape B with ≥3 adversarial probes (e.g., (a) is the threshold wide enough to absorb worst-case NTP slew (≤128ms typical) without flipping the comparison? (b) does a cross-language string-based datetime contract silently discard timezone information? (c) does the platform-dependent meaning of "ctime" (creation time on Windows, inode-change time on POSIX) match the test's assumption?).
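The Unix epoch vs Windows FILETIME item in the scope above reduces to a fixed offset plus a unit change. A minimal sketch, assuming FILETIME values arrive as plain Python integers (100 ns ticks since 1601-01-01 UTC); the function names are hypothetical. Integer arithmetic keeps full tick precision, which a round trip through float seconds would not.

```python
import datetime

# Seconds between the FILETIME epoch (1601-01-01 UTC) and the Unix epoch
# (1970-01-01 UTC), derived from the calendar rather than hardcoded.
EPOCH_DELTA_SECONDS = int(
    (datetime.datetime(1970, 1, 1) - datetime.datetime(1601, 1, 1)).total_seconds()
)
TICKS_PER_SECOND = 10_000_000  # one tick = 100 ns


def filetime_to_unix_ns(filetime_ticks: int) -> int:
    """Convert FILETIME ticks to Unix nanoseconds using integer math only."""
    return (filetime_ticks - EPOCH_DELTA_SECONDS * TICKS_PER_SECOND) * 100


def unix_ns_to_filetime(unix_ns: int) -> int:
    """Inverse conversion; floors any sub-tick remainder (< 100 ns)."""
    return unix_ns // 100 + EPOCH_DELTA_SECONDS * TICKS_PER_SECOND


print(EPOCH_DELTA_SECONDS)                           # prints: 11644473600
print(filetime_to_unix_ns(116_444_736_000_000_000))  # prints: 0 (the Unix epoch)
```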
+
+ **G7. Unicode codepoint vs byte length**
+ Scope: `len()` semantics (Python = codepoints, Go = bytes, JS = UTF-16 code units); UTF-8 encoded byte-length truncation that splits mid-codepoint; surrogate pairs; BMP vs non-BMP characters; argv encoding across `subprocess.run` / `CreateProcessW` / `execve`.
+ Required output: Shape A citing the codepoint/byte mismatch, OR Shape B with ≥3 adversarial probes (e.g., (a) does any consumer apply a byte-length cap that could split a UTF-8 codepoint? (b) does a directory walker decode names as raw bytes (POSIX `surrogateescape`) vs UTF-16 (Windows) in a way that affects which entries are seen? (c) do non-BMP characters round-trip correctly through subprocess argv encoding?).
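Probe (a) in concrete form: the three `len()` semantics listed in the scope disagree on a single non-BMP character, and a naive byte cap can cut a UTF-8 sequence in half. A minimal sketch; `truncate_utf8` is a hypothetical helper, and dropping the incomplete trailing sequence is one design choice among several.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cap the UTF-8 encoding at max_bytes without splitting a codepoint."""
    encoded = text.encode("utf-8")[:max_bytes]
    # The input was valid UTF-8, so only the cut tail can be incomplete;
    # errors="ignore" drops that partial sequence.
    return encoded.decode("utf-8", errors="ignore")


emoji = "\U0001F600"  # U+1F600, outside the BMP
print(len(emoji))                           # prints: 1 (Python counts codepoints)
print(len(emoji.encode("utf-8")))           # prints: 4 (Go-style byte length)
print(len(emoji.encode("utf-16-le")) // 2)  # prints: 2 (JS .length: a surrogate pair)

label = "h\u00e9llo"  # "héllo"; the é takes 2 bytes in UTF-8
print(label.encode("utf-8")[:2])  # prints: b'h\xc3' (a split sequence)
print(truncate_utf8(label, 2))    # prints: h
```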
+
+ **G8. Threshold and age comparisons** ⭐ canonical surface
+ Scope: every `>=` vs `>` boundary on age / size / count thresholds; inclusive-vs-exclusive semantics; docstring/help-text/code disagreement on boundary direction; tests that exercise comfortably-above and comfortably-below cases but skip the exact-boundary case; user-facing copy that uses one symbol while the code uses another.
+ Required output: Shape A citing the boundary-semantics conflict (code site + docstring/help-text/UI site), OR Shape B with ≥3 adversarial probes (e.g., (a) is the inclusive-vs-exclusive choice safe under sub-second filesystem-resolution rounding that could land a "fresh" value at exactly the boundary? (b) does any test seed a value at exactly the threshold to exercise the boundary? (c) do user-facing strings use `≥` / `>` symbols faithful to the code, and does the docstring agree?).
+
+ ## Cross-bucket questions to answer at the end
+
+ Q1: Are there boundary hazards that span two sub-buckets — e.g., a G6 timestamp imprecision that combines with a G8 inclusive comparison to flip a borderline case, or a G4 overflow that interacts with a G1 loop bound to produce an infinite iteration? Cite the line pair.
+ Q2: What's the worst boundary hazard introduced by this artifact? Cite `[file]:[line]` (and any companion file:line if the hazard is multi-site).
+ Q3: Which threshold or constant is most fragile to a future change in input scale (e.g., shifting from minute-scale ages to second-scale, or from 2-minute defaults to 2-millisecond defaults)? Identify the line(s) where the unit assumption is hardcoded.
+
+ ## Output
+
+ Lead: `Total: N (P0=N, P1=N, P2=N)`. For each sub-bucket G1-G8, produce Shape A or Shape B (with ≥3 probes). Each Shape A finding must cite the file:line where the boundary or numeric type fails. Cross-bucket Q1-Q3 answers after the per-sub-bucket walk. Adversarial second pass: "assume your first pass missed at least 3 P1 boundary or overflow bugs across these 8 sub-buckets — find them." Open Questions section for ambiguities. Read-only. No edits, no commits.
+
+ ---
+
+ # Worked example: jl-cmd/claude-code-config PR #394
+
+ Audit jl-cmd/claude-code-config PR #394 for **Category G only** (off-by-one, bounds, integer overflow). Skip A–F, H–K. Sub-bucket forced-exhaustion mode: Category G is decomposed into 8 sub-buckets below. Each sub-bucket REQUIRES at least one Shape A finding OR exactly one Shape B proof-of-absence with **at least 3 adversarial probes** specific to that sub-bucket. A sub-bucket returning neither is a protocol gap.
+
+ PR: feat(scripts): add sweep-empty-dirs utility and scheduled-task installer
+ Head SHA: 62c9c169ee7a44824e5da25c4cf8b74fdca08a53
+ ID prefix: `find`.
+
+ ## Sub-buckets (each requires Shape A finding OR Shape B with ≥3 adversarial probes)
+
+ **G1. Loop bounds**
+ - The only iteration in `sweep_empty_dirs.py` is `for each_directory_path, _, _ in os.walk(root, onerror=_log_walk_error, topdown=False):` at lines 23-25 — `os.walk` is iterator-driven with no explicit numeric range, no `range()`, no `while i < n` counter.
+ - The only `while` is `while True:` at line 67 inside `main()`'s loop branch, terminated by `KeyboardInterrupt` — no numeric bound to be off-by-one against.
+ - Shape B probes: (a) is there an implicit upper bound that `os.walk` could miss (symlink loops creating infinite descent, very deep trees hitting `MAXPATHLEN`)? (b) does `topdown=False` interact with the iteration count when the root contains 0 entries: does the loop body even execute, or is the post-condition `removed = []` returned cleanly? (c) the test at lines 62-65 (`test_empty_root_does_not_crash`) iterates a root that was itself aged to `time.time() - 300`: does `os.walk` yield the root itself, and if so, does `os.rmdir` attempt to delete `tempfile.TemporaryDirectory`'s own directory while the context manager still holds it?
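Probe (c) can be settled empirically. A minimal sketch using only the standard library: with `topdown=False`, `os.walk` yields the deepest directories first and the root itself last, so a loop shaped like `sweep()` reaches the root and, if the root is empty and old enough, tries to `os.rmdir` it. The directory layout is illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "parent", "child"))

    # Bottom-up order: children are yielded before their parents.
    visit_order = [directory for directory, _, _ in os.walk(root, topdown=False)]

    print(visit_order[0] == os.path.join(root, "parent", "child"))  # prints: True
    print(visit_order[-1] == root)  # prints: True, so the root is a deletion candidate
```

Whether the root should ever be removed is a design question for the PR; the sketch only shows that the loop will reach it.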
+
+ **G2. Slice / substring indices**
+ - No string or list slicing in any of the four files. No `s[i:j]`, no `arr[-n:]`, no `split(...)[i]` indexing, no `[0]`/`[1]`/`[-1]` element access.
+ - `os.path.join(tmp, "parent", "child", "leaf")` at test line 51 builds a path; no slice operations on the result.
+ - Shape B probes: (a) does the f-string `f"warning: cannot scan {os_error.filename} — {os_error.strerror}"` at line 14 of `sweep_empty_dirs.py` perform any implicit truncation that depends on filename length? (b) does `os.path.join` ever produce a string the consumer slices downstream? (c) the PowerShell `Split-Path -Parent $PSCommandPath` at line 46 of `Install-SweepEmptyDirs.ps1` — does it index a substring that could underflow when `$PSCommandPath` is exactly the drive root?
+
+ **G3. Array / list indexing with computed offsets**
+ - `removed: list[str] = []` at line 21 is only ever appended to (line 34) and returned (line 38) — no index access into `removed`.
+ - Test asserts use membership (`assert empty_dir in removed`, lines 36, 45, 57, 58, 59, 74) — no positional indexing.
+ - The PowerShell `foreach ($action in $task.Actions)` and `foreach ($trigger in $task.Triggers)` at lines 30, 34 of `Install-SweepEmptyDirs.ps1` iterate by element, not by index.
+ - Shape B probes: (a) does `Get-ScheduledTask` ever return `$null` in a way that makes `$task.Actions` an indexer-into-null? (b) does `_set_creation_time_windows` ever receive a path that exceeds `MAX_PATH` (260 chars on Windows pre–long-path) such that `subprocess.run`'s argv list overflows? (c) does the `["powershell", "-Command", ...]` argv list at lines 24-25 of the test file ever produce a fourth element that an `argv[i]` consumer would expect at index 2?
+
+ **G4. Integer arithmetic overflow** ⭐ canonical surface
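+ - Python side: `DEFAULT_AGE_SECONDS: int = 120` and `DEFAULT_POLL_INTERVAL: int = 30` at lines 3-4 of `sweep_config.py`. Python `int` is arbitrary precision, so there is no 32-bit ceiling. `argparse` `type=int` (lines 44, 48 of `sweep_empty_dirs.py`) likewise produces a Python int. `time.sleep(arguments.interval)` at line 69 is the one place that narrows the value: CPython converts the duration to an internal 64-bit nanosecond count, so a negative interval raises `ValueError` and an extremely large one (roughly beyond 292 years) raises `OverflowError` instead of sleeping. Neither is reachable with the defaults.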
+ - PowerShell side: `[int]$IntervalMinutes = 5` (line 7) and `[int]$AgeSeconds = 120` (line 10) of `Install-SweepEmptyDirs.ps1`. `[int]` in PowerShell is `System.Int32`, with range `-2147483648` to `2147483647`. Both defaults sit 7-8 orders of magnitude below the ceiling (2^31-1 is about 1.8×10⁷ times 120 and about 4.3×10⁸ times 5). The user-overridable values would have to exceed 2^31-1 (≈ 68 years in seconds, ≈ 4084 years in minutes) to overflow.
+ - `New-TimeSpan -Minutes $IntervalMinutes` at line 71 of `Install-SweepEmptyDirs.ps1`: `TimeSpan` is internally `Int64` ticks (100ns each). 5 minutes × 60 × 10⁷ = 3×10⁹ ticks — well within `Int64` range. Even 4084-year overflow on `[int]$IntervalMinutes` would be caught at the `[int]` cast, not at the `TimeSpan` construction.
+ - Cross-language defaults: PowerShell `[int]$AgeSeconds = 120` (line 10) is passed through argv as the bare token `120` to Python's `argparse type=int`. Python parses it back to an arbitrary-precision int. No precision drift at this magnitude. **But:** the two defaults (PS line 10 and Python sweep_config.py line 3) are independently hardcoded with no shared source of truth — drift in a future edit is the real G4 hazard, not arithmetic overflow today.
+ - Shape A candidate (P2): `[int]$AgeSeconds = 120` and `DEFAULT_AGE_SECONDS: int = 120` are duplicated literals across two files with no validation that they match. A future edit to one without the other produces a silent default drift. Cite `Install-SweepEmptyDirs.ps1:10` and `sweep_config.py:3`.
+ - Adversarial probes: (a) what happens if the user invokes `Install-SweepEmptyDirs.ps1 -AgeSeconds 2147483648`? PowerShell's `[int]` cast would throw at parameter binding — verify whether the script handles this or crashes uninformatively. (b) `New-TimeSpan -Minutes` accepts `[int32]` per Microsoft docs — is `$IntervalMinutes = 0` accepted, and does `-RepetitionInterval (TimeSpan.Zero)` register a degenerate task? (c) `time.sleep(arguments.interval)` at line 69 with `arguments.interval = 0` — does the busy loop spin without yielding to the OS?
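A minimal sketch of a drift guard for the duplicated default, under stated assumptions: it inlines a copy of the relevant `param()` lines and the Python constant so it runs standalone, whereas a real guard would read `Install-SweepEmptyDirs.ps1` from disk and import `DEFAULT_AGE_SECONDS` from `config.sweep_config`. The regular expression and function names are hypothetical.

```python
import re

# Stand-ins for the two sources of truth (a real test would read the .ps1
# file and import the Python constant).
POWERSHELL_SOURCE = """
    [Parameter(ParameterSetName = "install")]
    [int]$AgeSeconds = 120,
"""
DEFAULT_AGE_SECONDS = 120

AGE_DEFAULT_PATTERN = re.compile(r"\[int\]\$AgeSeconds\s*=\s*(\d+)")


def powershell_age_default(source: str) -> int:
    """Return the literal default assigned to [int]$AgeSeconds, or raise."""
    match = AGE_DEFAULT_PATTERN.search(source)
    if match is None:
        raise ValueError("no [int]$AgeSeconds default found")
    return int(match.group(1))


def test_age_default_matches_python_constant() -> None:
    # Fails as soon as either side is edited without the other.
    assert powershell_age_default(POWERSHELL_SOURCE) == DEFAULT_AGE_SECONDS


test_age_default_matches_python_constant()
```

A generated or shared config file would remove the duplication outright; the regex guard only makes drift loud.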
+
+ **G5. Floating-point comparison**
+ - `now = time.time()` at line 20 of `sweep_empty_dirs.py` returns `float` (seconds since epoch, sub-second precision).
+ - `created = os.path.getctime(each_directory_path)` at line 27 returns `float`.
+ - `if now - created >= min_age_seconds:` at line 30 subtracts two floats and compares the result against an `int` (`min_age_seconds: int`). Python compares mixed int/float operands by exact numeric value rather than by lossily converting the int, though at these magnitudes the distinction is moot. The comparison is `>=`, not `==`, so IEEE-754 epsilon noise does not produce stale-result equality bugs of the `0.1 + 0.2 != 0.3` kind.
+ - The float subtraction `now - created` carries ~15-16 significant decimal digits. For Unix timestamps in 2026 (~1.78×10⁹), epsilon-magnitude noise is on the order of 10⁻⁷ seconds — irrelevant against a 120-second threshold.
+ - Shape B probes: (a) for very small `min_age_seconds` (e.g., `min_age_seconds=0` as in `test_skips_nonempty_dir` at line 73), the comparison `now - created >= 0` is dominated by `getctime` filesystem-resolution rounding (on Windows `getctime` reports creation time, which FAT records at 10 ms granularity, versus 2 s for FAT last-write time and 100 ns for NTFS): does this matter for the test's intent? (b) does `time.time()` ever return a value equal to `os.path.getctime` for a directory just created, producing `0.0 >= 0` = True and immediate deletion? (c) is the float subtraction monotonic under wall-clock adjustment (NTP slew, manual clock change), or could `now < created` produce a negative result that compares False against a positive `min_age_seconds`?
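The magnitude claims above can be checked directly. A minimal sketch (Python 3.9+ for `math.ulp`): the spacing between adjacent doubles near 1.78×10⁹ is 2⁻²² s, about 2.4×10⁻⁷ s, and Python's int/float comparison is exact rather than a lossy conversion of the int.

```python
import math

timestamp_2026 = 1.78e9  # approximate Unix time in 2026, as in the text

# Gap to the next representable double: 2**-22 s for values in [2**30, 2**31).
spacing = math.ulp(timestamp_2026)
print(spacing)  # prints: 2.384185791015625e-07

# 2**53 + 1 is not representable as a double, yet Python still orders it
# correctly against float(2**53) because mixed comparisons are exact.
print(2**53 + 1 > float(2**53))  # prints: True
```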
+
+ **G6. Date / time arithmetic** ⭐ canonical surface
+ - `time.time()` at line 20 of `sweep_empty_dirs.py` returns a UTC-anchored Unix timestamp (seconds since 1970-01-01T00:00:00Z) per Python docs. There is no DST math, no timezone arithmetic, no leap-second handling in the file.
+ - `os.path.getctime` at line 27 also returns a UTC-anchored Unix timestamp (the platform-dependent "ctime" — on Windows this is creation time, on POSIX it is inode-change time). The two values are in the same units and the same epoch, so the subtraction at line 30 (`now - created`) is dimensionally consistent.
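+ - Wall-clock vs monotonic: `time.time()` is wall-clock and subject to NTP adjustment, manual clock changes, and (theoretically) leap-second smearing. For age comparisons against a 2-minute default this is robust; for sub-second thresholds it would not be. The rubric calls out `time.time()` vs `time.monotonic()` / `time.perf_counter()`, but a monotonic clock is not a drop-in fix here: `getctime` returns a wall-clock epoch timestamp, and monotonic readings have an undefined reference point, so the two cannot be subtracted. Verify instead whether the 2-minute default absorbs a worst-case correction: a slew of about 1 second is harmless, while a clock step (which `ntpd` applies by default to offsets above its 128 ms step threshold) can be arbitrarily large.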
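+ - The test file at lines 21-22 builds a UTC `datetime`, formats it as `"%Y-%m-%d %H:%M:%S"` (no timezone suffix and no fractional seconds), and passes it to PowerShell `[DateTime]'{date_str}'`. PowerShell's `[DateTime]` cast on a string with no timezone parses as `Kind=Unspecified`, and assignment to `.CreationTimeUtc` then treats it as UTC. Because `%S` truncates, the written time can be up to 1 s earlier than requested. **This is a fragile contract:** the UTC meaning lives only in the setter's handling of `Kind=Unspecified`, not in the string; switching to `.CreationTime`, or appending an offset (e.g., `%z`) that the cast may resolve differently, would change how the timestamp is interpreted without any visible signal.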
+ - Adversarial probes: (a) is the 120-second default wide enough that NTP slew (typically ≤128ms per `ntpd`) cannot push `now - created` below the threshold for a directory that should be deleted? (b) does `os.path.getctime` on Windows return file creation time or inode-change time — and does this match the test's assumption when `_set_creation_time_windows` mutates `CreationTimeUtc`? (c) the test at lines 34, 53, 54, 55, 64 passes `time.time() - 300` to `_set_creation_time_windows` — by the time PowerShell runs and writes `CreationTimeUtc`, additional wall-clock seconds have elapsed; does the 300-second offset have enough margin against a slow CI runner?
+ - Shape A candidate (P1, intended-behavior question): `if now - created >= min_age_seconds` at line 30 includes the boundary. Combined with the docstring at line 2 saying "older than 2 minutes," the boundary semantics may not match the docstring (see G8).
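A minimal sketch of what the test's format string actually transmits, using only `datetime`: `"%Y-%m-%d %H:%M:%S"` carries neither the UTC offset nor the fractional seconds, whereas `isoformat()` keeps both. Whether PowerShell's `[DateTime]` cast then honors an explicit offset is a separate assumption this sketch does not test.

```python
import datetime

utc_instant = datetime.datetime(2026, 1, 1, 12, 0, 0, 750_000, tzinfo=datetime.timezone.utc)

# Format used by _set_creation_time_windows: offset and microseconds are dropped.
print(utc_instant.strftime("%Y-%m-%d %H:%M:%S"))  # prints: 2026-01-01 12:00:00

# ISO 8601 keeps both, so the receiver does not have to assume a timezone.
print(utc_instant.isoformat())  # prints: 2026-01-01T12:00:00.750000+00:00

# %S truncates rather than rounds, so the written time is up to 1 s earlier.
written_back = datetime.datetime.strptime(
    utc_instant.strftime("%Y-%m-%d %H:%M:%S"), "%Y-%m-%d %H:%M:%S"
).replace(tzinfo=datetime.timezone.utc)
print((utc_instant - written_back).total_seconds())  # prints: 0.75
```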
+
+ **G7. Unicode codepoint vs byte length**
+ - Python `len(...)` is never called on any string in `sweep_empty_dirs.py` or `sweep_config.py`. No length-based decisions.
+ - The test file's `subprocess.run(["powershell", "-Command", f"(Get-Item '{path}').CreationTimeUtc = ..."])` at lines 23-27 embeds `path` directly into a PowerShell command string. If `path` contains a single quote, the embedded quote terminates the literal early, but this is a quoting/injection concern (Category H, security boundaries), not a codepoint-vs-byte-length concern. There is no `len(path)` check upstream.
+ - The PowerShell `Write-Host` lines (e.g., line 75 of `Install-SweepEmptyDirs.ps1`) use string interpolation but no character/byte counting.
+ - Shape B probes: (a) does any consumer of `_log_walk_error`'s output (line 14) — for example, log forwarding or stderr capture — apply a byte-length truncation that mid-codepoint splits a UTF-8-encoded `os_error.filename`? (b) does `os.walk` decode directory names from raw bytes (POSIX) vs UTF-16 (Windows) in a way that affects which entries are seen — the `surrogateescape` boundary on Linux? (c) PowerShell's `Get-Item '{path}'` at test line 25 — does `'{path}'` containing non-BMP characters (codepoints above U+FFFF) get correctly round-tripped through `subprocess.run`'s argv encoding (Windows uses UTF-16 internally, but `subprocess.run` on Python 3.6+ uses CreateProcessW so this is normally fine)?
+
+ **G8. Threshold and age comparisons** ⭐ canonical surface
+ - `if now - created >= min_age_seconds:` at line 30 of `sweep_empty_dirs.py` uses `>=`, which means a directory whose age is **exactly** `min_age_seconds` IS deleted (the boundary fires).
+ - The docstring at line 2 reads: `"""Delete empty directories older than 2 minutes under a given root."""` — strict reading of "older than" suggests `>` (exclusive boundary).
+ - The argparse help text at line 45 reads: `f"Minimum age in seconds (default: {DEFAULT_AGE_SECONDS} = 2 minutes)"` — "minimum age" wording is ambiguous: a "minimum of 2 minutes" can be interpreted as "≥ 2 minutes" (matches the code) OR as "must exceed the 2-minute floor" (matches the docstring).
+ - The test at lines 30-37 (`test_deletes_empty_dir_older_than_threshold`) seeds `time.time() - 300` (300s old) against `min_age_seconds=120` — comfortably above the boundary, does not exercise the exact-boundary case.
+ - The test at lines 40-46 (`test_skips_empty_dir_newer_than_threshold`) seeds a fresh directory (age ≈ 0s) against `min_age_seconds=120` — comfortably below the boundary, does not exercise the exact-boundary case.
+ - The test at lines 68-75 (`test_skips_nonempty_dir`) passes `min_age_seconds=0` against a fresh directory. With `>=`, `now - created >= 0` is True whenever the clock has not moved backwards, so the only thing keeping the directory alive is `os.rmdir` raising `OSError` because the directory is non-empty (lines 35-36, `except OSError: pass`). This test verifies non-emptiness rather than age, and it passes under either operator, so it does not pin the boundary: with `>` (strict), an age of exactly `0.0` would skip the directory on age alone (`0 > 0` is False), and the non-emptiness path would go unexercised.
+ - Shape A candidate (P1): Boundary semantics conflict between code (line 30: `>=`, inclusive) and docstring (line 2: "older than", suggests exclusive `>`). Cite the conflict pair.
+ - Shape A candidate (P2 alternative): The age threshold inclusivity is not exercised by any test. Cite line 30 (the `>=` site) and the absence of an exact-120s test in `test_sweep_empty_dirs.py`.
+ - Adversarial probes for the `>=` boundary: (a) does a directory created at exactly `now - 120.0` seconds match the spirit of "older than 2 minutes" (no, by strict reading) or the letter of the code (yes)? (b) are the inclusive-boundary semantics safe under filesystem timestamp rounding (FAT records creation time at 10 ms and last-write time at 2 s granularity, so a coarse timestamp could land a "fresh" directory at exactly the boundary)? (c) does the PowerShell installer's `Write-Host` at line 75 (`age ≥ ${AgeSeconds}s`) document the inclusive boundary correctly? The `≥` symbol in the user-facing message is faithful to the code at sweep_empty_dirs.py:30, but the Python docstring at sweep_empty_dirs.py:2 says "older than": three sites, two interpretations.
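The missing exact-boundary test can be written without touching the filesystem by pinning both timestamps. A minimal sketch: `is_old_enough` is a hypothetical extraction of the comparison on line 30 of `sweep_empty_dirs.py`, so the inclusive (`>=`) and strict (`>`) readings can be compared side by side at exactly 120 s.

```python
def is_old_enough(now: float, created: float, min_age_seconds: int, inclusive: bool = True) -> bool:
    """Hypothetical mirror of `now - created >= min_age_seconds` (inclusive=True)."""
    age_seconds = now - created
    if inclusive:
        return age_seconds >= min_age_seconds  # current code: the boundary fires
    return age_seconds > min_age_seconds       # strict reading of "older than"


NOW = 1_780_000_000.0
THRESHOLD = 120

# Exactly at the boundary the two readings disagree: this is the G8 conflict.
print(is_old_enough(NOW, NOW - THRESHOLD, THRESHOLD))                   # prints: True
print(is_old_enough(NOW, NOW - THRESHOLD, THRESHOLD, inclusive=False))  # prints: False
```

Both timestamps are integers or half-integers well below 2⁵³, so the subtraction is exact and the boundary case is deterministic.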
+
+ ## Cross-bucket questions to answer at the end
+
+ Q1: Are there boundary hazards that span two sub-buckets — e.g., a G6 timestamp imprecision that combines with a G8 inclusive comparison to flip a borderline case? Cite the line pair.
+ Q2: What's the worst boundary hazard introduced by this PR? Cite `packages/claude-dev-env/scripts/sweep_empty_dirs.py:<line>` (and any companion file:line if the hazard is multi-site).
+ Q3: Which threshold or constant is most fragile to a future change in input scale (e.g., shifting from minute-scale ages to second-scale, or from 2-minute defaults to 2-millisecond defaults)? Identify the line(s) where the unit assumption is hardcoded.
+
+ ## Output
+
+ Lead: `Total: N (P0=N, P1=N, P2=N)`. For each sub-bucket G1-G8, produce Shape A or Shape B (with ≥3 probes). Each Shape A finding must cite the file:line where the boundary or numeric type fails. Cross-bucket Q1-Q3 answers after the per-sub-bucket walk. Adversarial second pass: "assume your first pass missed at least 3 P1 boundary or overflow bugs across these 8 sub-buckets — find them." Open Questions section for ambiguities. Read-only. No edits, no commits.
+
+ ## Diff (4 new files, all lines in scope)
+
+ ### packages/claude-dev-env/scripts/sweep_empty_dirs.py
+ ```python
+ #!/usr/bin/env python3
+ """Delete empty directories older than 2 minutes under a given root."""
+
+ import argparse
+ import os
+ import sys
+ import time
+
+ from config.sweep_config import DEFAULT_AGE_SECONDS
+ from config.sweep_config import DEFAULT_POLL_INTERVAL
+
+
+ def _log_walk_error(os_error: OSError) -> None:
+     print(f"warning: cannot scan {os_error.filename} — {os_error.strerror}", file=sys.stderr)
+
+
+ def sweep(root: str, min_age_seconds: int) -> list[str]:
+     """Remove empty directories under *root* older than *min_age_seconds*."""
+
+     now = time.time()
+     removed: list[str] = []
+
+     for each_directory_path, _, _ in os.walk(
+         root, onerror=_log_walk_error, topdown=False
+     ):
+         try:
+             created = os.path.getctime(each_directory_path)
+         except OSError:
+             continue
+         if now - created >= min_age_seconds:
+             try:
+                 os.rmdir(each_directory_path)
+                 print(f"deleted: {each_directory_path}")
+                 removed.append(each_directory_path)
+             except OSError:
+                 pass
+
+     return removed
+
+
+ def _build_parser() -> argparse.ArgumentParser:
+     parser = argparse.ArgumentParser(description="Delete empty directories older than a given age.")
+     parser.add_argument("root", help="Root directory to scan")
+     parser.add_argument("--age", type=int, default=DEFAULT_AGE_SECONDS,
+                         help=f"Minimum age in seconds (default: {DEFAULT_AGE_SECONDS} = 2 minutes)")
+     parser.add_argument("--once", action="store_true",
+                         help="Single pass and exit instead of watching in a loop")
+     parser.add_argument("--interval", type=int, default=DEFAULT_POLL_INTERVAL,
+                         help=f"Poll interval in seconds when looping (default: {DEFAULT_POLL_INTERVAL})")
+     return parser
+
+
+ def main() -> None:
+     parser = _build_parser()
+     arguments = parser.parse_args()
+
+     if not os.path.isdir(arguments.root):
+         print(f"error: not a directory: {arguments.root}", file=sys.stderr)
+         sys.exit(1)
+
+     if arguments.once:
+         sweep(arguments.root, arguments.age)
+         return
+
+     print(f"watching {arguments.root} every {arguments.interval}s (age threshold: {arguments.age}s)")
+     try:
+         while True:
+             sweep(arguments.root, arguments.age)
+             time.sleep(arguments.interval)
+     except KeyboardInterrupt:
+         print("\nstopped.")
+
+
+ if __name__ == "__main__":
+     main()
+ ```
+
+ ### packages/claude-dev-env/scripts/config/sweep_config.py
+ ```python
+ """Centralized timing configuration for sweep_empty_dirs."""
+
+ DEFAULT_AGE_SECONDS: int = 120
+ DEFAULT_POLL_INTERVAL: int = 30
+ ```
+
+ ### packages/claude-dev-env/scripts/tests/test_sweep_empty_dirs.py
+ ```python
+ """Tests for sweep_empty_dirs.py"""
+
+ from __future__ import annotations
+
+ import datetime
+ import os
+ import subprocess
+ import sys
+ import tempfile
+ import time
+ from pathlib import Path
+
+ _SCRIPTS_DIR = Path(__file__).resolve().parent.parent
+ if str(_SCRIPTS_DIR) not in sys.path:
+     sys.path.insert(0, str(_SCRIPTS_DIR))
+
+ from sweep_empty_dirs import sweep # noqa: E402
+
+
+ def _set_creation_time_windows(path: str, timestamp: float) -> None:
+     dt = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
+     date_str = dt.strftime("%Y-%m-%d %H:%M:%S")
+     subprocess.run(
+         ["powershell", "-Command",
+          f"(Get-Item '{path}').CreationTimeUtc = [DateTime]'{date_str}'"],
+         check=True, capture_output=True,
+     )
+
+
+ def test_deletes_empty_dir_older_than_threshold() -> None:
+     with tempfile.TemporaryDirectory() as tmp:
+         empty_dir = os.path.join(tmp, "old_empty")
+         os.mkdir(empty_dir)
+         _set_creation_time_windows(empty_dir, time.time() - 300)
+         removed = sweep(tmp, min_age_seconds=120)
+         assert empty_dir in removed
+         assert not os.path.isdir(empty_dir)
+
+
+ def test_skips_empty_dir_newer_than_threshold() -> None:
+     with tempfile.TemporaryDirectory() as tmp:
+         fresh_dir = os.path.join(tmp, "fresh_empty")
+         os.mkdir(fresh_dir)
+         removed = sweep(tmp, min_age_seconds=120)
+         assert fresh_dir not in removed
+         assert os.path.isdir(fresh_dir)
+
+
+ def test_deletes_nested_empty_dirs() -> None:
+     with tempfile.TemporaryDirectory() as tmp:
+         leaf = os.path.join(tmp, "parent", "child", "leaf")
+         os.makedirs(leaf)
+         _set_creation_time_windows(os.path.join(tmp, "parent"), time.time() - 300)
+         _set_creation_time_windows(os.path.join(tmp, "parent", "child"), time.time() - 300)
+         _set_creation_time_windows(leaf, time.time() - 300)
+         removed = sweep(tmp, min_age_seconds=120)
+         assert leaf in removed
+         assert os.path.join(tmp, "parent", "child") in removed
+         assert os.path.join(tmp, "parent") in removed
+
+
+ def test_empty_root_does_not_crash() -> None:
+     with tempfile.TemporaryDirectory() as tmp:
+         _set_creation_time_windows(tmp, time.time() - 300)
+         sweep(tmp, min_age_seconds=120)
+
+
+ def test_skips_nonempty_dir() -> None:
+     with tempfile.TemporaryDirectory() as tmp:
+         nonempty_dir = os.path.join(tmp, "has_stuff")
+         os.mkdir(nonempty_dir)
+         Path(nonempty_dir, "keepme.txt").write_text("hello")
+         removed = sweep(tmp, min_age_seconds=0)
+         assert nonempty_dir not in removed
+         assert os.path.isdir(nonempty_dir)
+ ```
+
+ ### packages/claude-dev-env/scripts/Install-SweepEmptyDirs.ps1
+ ```powershell
+ #!/usr/bin/env pwsh
+ param(
+     [Parameter(ParameterSetName = "install")]
+     [string]$Target,
+
+     [Parameter(ParameterSetName = "install")]
+     [int]$IntervalMinutes = 5,
+
+     [Parameter(ParameterSetName = "install")]
+     [int]$AgeSeconds = 120,
+
+     [Parameter(ParameterSetName = "remove")]
+     [switch]$Remove,
+
+     [Parameter(ParameterSetName = "status")]
+     [switch]$Status
+ )
+
+ $TaskName = "SweepEmptyDirs"
+
+ if ($Status) {
+     $task = Get-ScheduledTask -TaskName $TaskName -ErrorAction SilentlyContinue
+     if (-not $task) {
+         Write-Host "STATUS: $TaskName is not registered."
+         return
+     }
+     Write-Host "STATUS: $TaskName is registered."
+     Write-Host " State: $($task.State)"
+     Write-Host " Actions:"
+     foreach ($action in $task.Actions) {
+         Write-Host " $($action.Execute) $($action.Arguments)"
+     }
+     Write-Host " Triggers:"
+     foreach ($trigger in $task.Triggers) {
+         Write-Host " $($trigger.Repetition.Interval) (starting $($trigger.StartBoundary))"
+     }
+     return
+ }
+
+ if ($Remove) {
+     Unregister-ScheduledTask -TaskName $TaskName -Confirm:$false -ErrorAction SilentlyContinue
+     Write-Host "$TaskName removed."
+     return
+ }
+
+ $ScriptDir = Split-Path -Parent $PSCommandPath
+ $ScriptPath = Join-Path $ScriptDir "sweep_empty_dirs.py"
+
+ if (-not (Test-Path $ScriptPath)) {
+     Write-Error "sweep_empty_dirs.py not found at: $ScriptPath"
+     exit 1
+ }
+
+ if (-not $Target) {
+     Write-Error "Parameter -Target is required (the directory to watch)."
+     exit 1
+ }
+
+ if (-not (Test-Path $Target)) {
+     Write-Error "Target directory does not exist: $Target"
+     exit 1
+ }
+
+ $_py = Get-Command py -ErrorAction SilentlyContinue
+ $PythonPath = if ($_py) { $_py.Source } else { (Get-Command python).Source }
+ if (-not $PythonPath) {
+     Write-Error "Cannot find Python (py or python) on PATH."
+     exit 1
+ }
+ $Action = New-ScheduledTaskAction -Execute $PythonPath -Argument "$ScriptPath --once --age $AgeSeconds ""$Target"""
+ $Trigger = New-ScheduledTaskTrigger -Daily -At "00:00" -RepetitionInterval (New-TimeSpan -Minutes $IntervalMinutes)
+ $Settings = New-ScheduledTaskSettingsSet -AllowStartIfOnBatteries -DontStopIfGoingOnBatteries -StartWhenAvailable
+
+ Register-ScheduledTask -TaskName $TaskName -Action $Action -Trigger $Trigger -Settings $Settings -Force | Out-Null
+ Write-Host "$TaskName registered — runs every ${IntervalMinutes}min against '$Target' (age ≥ ${AgeSeconds}s)."
+ ```