@bookedsolid/rea 0.22.0 → 0.23.0

Files changed (55)
  1. package/README.md +15 -0
  2. package/THREAT_MODEL.md +582 -0
  3. package/dist/audit/append.js +1 -1
  4. package/dist/cli/doctor.js +11 -12
  5. package/dist/cli/hook.d.ts +37 -3
  6. package/dist/cli/hook.js +167 -5
  7. package/dist/cli/init.js +14 -26
  8. package/dist/cli/install/canonical.js +18 -3
  9. package/dist/cli/install/commit-msg.js +1 -2
  10. package/dist/cli/install/copy.js +4 -13
  11. package/dist/cli/install/fs-safe.js +5 -16
  12. package/dist/cli/install/gitignore.js +1 -5
  13. package/dist/cli/install/pre-push.js +3 -8
  14. package/dist/cli/install/settings-merge.js +79 -16
  15. package/dist/cli/upgrade.js +14 -10
  16. package/dist/gateway/downstream.js +1 -2
  17. package/dist/gateway/live-state.js +3 -1
  18. package/dist/gateway/log.js +1 -3
  19. package/dist/gateway/middleware/audit.js +1 -1
  20. package/dist/gateway/middleware/injection.js +3 -9
  21. package/dist/gateway/middleware/policy.js +3 -1
  22. package/dist/gateway/middleware/redact.js +1 -1
  23. package/dist/gateway/observability/codex-telemetry.js +1 -2
  24. package/dist/gateway/reviewers/claude-self.js +10 -6
  25. package/dist/hooks/bash-scanner/blocked-scan.d.ts +26 -0
  26. package/dist/hooks/bash-scanner/blocked-scan.js +467 -0
  27. package/dist/hooks/bash-scanner/index.d.ts +41 -0
  28. package/dist/hooks/bash-scanner/index.js +62 -0
  29. package/dist/hooks/bash-scanner/parse-fail-closed.d.ts +31 -0
  30. package/dist/hooks/bash-scanner/parse-fail-closed.js +27 -0
  31. package/dist/hooks/bash-scanner/parser.d.ts +42 -0
  32. package/dist/hooks/bash-scanner/parser.js +92 -0
  33. package/dist/hooks/bash-scanner/protected-scan.d.ts +76 -0
  34. package/dist/hooks/bash-scanner/protected-scan.js +815 -0
  35. package/dist/hooks/bash-scanner/verdict.d.ts +80 -0
  36. package/dist/hooks/bash-scanner/verdict.js +49 -0
  37. package/dist/hooks/bash-scanner/walker.d.ts +165 -0
  38. package/dist/hooks/bash-scanner/walker.js +7954 -0
  39. package/dist/hooks/push-gate/base.js +2 -6
  40. package/dist/hooks/push-gate/codex-runner.js +3 -1
  41. package/dist/hooks/push-gate/index.js +9 -10
  42. package/dist/policy/loader.js +4 -1
  43. package/dist/registry/tofu-gate.js +2 -2
  44. package/hooks/blocked-paths-bash-gate.sh +142 -272
  45. package/hooks/protected-paths-bash-gate.sh +227 -511
  46. package/package.json +3 -2
  47. package/profiles/bst-internal-no-codex.yaml +1 -1
  48. package/profiles/bst-internal.yaml +1 -1
  49. package/profiles/client-engagement.yaml +1 -1
  50. package/profiles/lit-wc.yaml +1 -1
  51. package/profiles/minimal.yaml +1 -1
  52. package/profiles/open-source-no-codex.yaml +1 -1
  53. package/profiles/open-source.yaml +1 -1
  54. package/scripts/postinstall.mjs +1 -2
  55. package/scripts/run-vitest.mjs +117 -0
package/README.md CHANGED
@@ -152,6 +152,21 @@ PR-issue-link advisory, architecture advisory). Each hook uses
152
152
  runs a HALT check near the top. See [Hooks shipped](#hooks-shipped) for
153
153
  the full inventory.
154
154
 
155
+ **Bash-tier scanner (parser-backed since 0.23.0).** Two hooks —
156
+ `protected-paths-bash-gate.sh` and `blocked-paths-bash-gate.sh` — are
157
+ shims that forward stdin to `rea hook scan-bash`, a CLI subcommand
158
+ that parses the Bash command via `mvdan-sh@0.10.1`, walks the AST,
159
+ and emits a verdict JSON. Pre-0.23.0 these were 500-line bash regex
160
+ pipelines; the rewrite closes 24 known-bypass classes
161
+ (helix-021..023 + discord-ops Round 13 + codex round 1) by replacing
162
+ re-tokenization heuristics with structural matches against the parsed
163
+ argv tree. The other nine hooks remain regex-based bash. The shim
164
+ re-verifies the verdict JSON shape on return so a tampered
165
+ `REA_NODE_CLI` env var cannot bypass. See
166
+ [`docs/architecture/bash-scanner.md`](docs/architecture/bash-scanner.md)
167
+ for the AST-walker design and [`docs/migration/0.23.0.md`](docs/migration/0.23.0.md)
168
+ for consumer migration notes.
169
+
155
170
  The hook layer runs independently of the MCP gateway — bypassing one does
156
171
  not disable the other. That redundancy is intentional.
157
172
 
package/THREAT_MODEL.md CHANGED
@@ -524,3 +524,585 @@ REA operates two independent layers. Bypassing one does not disable the other.
524
524
  **Gateway layer** (runtime, `rea serve`): A middleware chain processes every proxied MCP tool call. Middleware enforces: audit, kill switch, policy/autonomy level, tier classification, blocked paths, rate limit, circuit breaker, prompt-injection classification (§5.21), secret redaction (pre and post), and result size cap. The gateway also supervises downstream child processes (§5.14), emits a `SESSION_BLOCKER` audit event on persistent failure (§5.15), and publishes a live per-downstream state snapshot to `.rea/serve.state.json` (§5.16) that `rea status` reads read-only. The `__rea__health` meta-tool short-circuits the chain for callability under HALT and runs a dedicated sanitizer on its response (§5.17).
525
525
 
526
526
  Both layers fail closed: on read failure, parse error, unknown errno on HALT, regex timeout, or any unexpected condition, the default action is deny (or for redaction specifically: replace with a sentinel — the content never escapes unscanned).
527
+
528
+ ---
529
+
530
+ ## 8. Bash-tier scanner (parser-backed, 0.23.0+)
531
+
532
+ Two of the shipped hooks — `protected-paths-bash-gate.sh` and
533
+ `blocked-paths-bash-gate.sh` — are thin shims that forward stdin to
534
+ the `rea hook scan-bash` CLI subcommand. The CLI parses the Bash
535
+ command via `mvdan-sh@0.10.1`, walks the AST in
536
+ `src/hooks/bash-scanner/walker.ts`, and applies per-utility detectors
537
+ that produce a `DetectedWrite[]`. The scanner then matches each
538
+ detection's path against the protected-paths or blocked_paths policy
539
+ and emits a verdict JSON. The shim re-verifies the verdict shape via
540
+ `node -e` before honoring the exit code (defense against a tampered
541
+ `REA_NODE_CLI` that returns exit 0 with empty stdout).
542
+
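The pipeline described above (detections in, verdict out) can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the source names `DetectedWrite` and a verdict JSON, but the field names, the `Verdict` type, and the `decide` function are assumptions made for this example.

```typescript
// Hypothetical shapes: field names here are assumptions for illustration only.
interface DetectedWrite {
  path: string;           // normalized, project-relative write target
  dynamic: boolean;       // true when the target could not be resolved statically
  isDestructive: boolean; // true for delete/move-style writes
}

type Verdict = { verdict: "allow" } | { verdict: "block"; reason: string };

// Decide a verdict from the walker's detections and a list of protected paths.
function decide(writes: DetectedWrite[], protectedPaths: string[]): Verdict {
  for (const w of writes) {
    // Refuse on uncertainty: a dynamic target could resolve to anything.
    if (w.dynamic) {
      return { verdict: "block", reason: "unresolvable write target" };
    }
    for (const p of protectedPaths) {
      const exact = w.path === p;
      // Protected ancestry: removing a directory hits every protected path under it.
      const ancestor = w.isDestructive && p.startsWith(w.path + "/");
      if (exact || ancestor) {
        return { verdict: "block", reason: `write to protected path ${p}` };
      }
    }
  }
  return { verdict: "allow" };
}
```

Under these assumptions, a destructive write against `.rea` blocks because `.rea/HALT` lives beneath it, while a plain write to an unrelated path is allowed.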
543
+ ### 8.1 Trust assumptions
544
+
545
+ The scanner trusts the following components. Each row names what we
546
+ trust, what would happen if the trust were violated, and what pins
547
+ the trust.
548
+
549
+ | Component | What we trust | If violated | Pinned by |
550
+ | ---------------------------------- | -------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------- | -------------------------------------------------------------------------------------- |
551
+ | `mvdan-sh@0.10.1` | Produces a faithful AST from any input bash accepts | Detector misses target on mis-shaped node | Pinned version, RedirOperator op-code snapshot tests, Class O exhaustiveness contract |
552
+ | Walker dispatch table | Enumerates every shape that produces a write | Novel utility silently allowed | Per-PR corpus fixture requirement; 18 corpus classes; convergence ladder |
553
+ | `fs.lstatSync` / `fs.readlinkSync` | Identify symlinks including dangling links | Symlink-out bypass class reopens | Codex round 1 F-2 closure + symlink corpus |
554
+ | `node` on PATH in the shim | Verdict-JSON verifier runs | Shim refuses on uncertainty (fail-closed) | Shim test for missing-node branch |
555
+ | Project-root realpath | `realpath(cli).startsWith(realpath(CLAUDE_PROJECT_DIR) + sep)` defends symlink-out | Forged CLI path inside `node_modules` defeats CLI-resolver | Codex round 5 F2 closure + corpus |
556
+ | OS realpath semantics | `node:fs.realpathSync` resolves symlinks consistently with the kernel; macOS `/var` ↔ `/private/var` aliasing handled via the rea_resolved_relative_form helper | Path-traversal escape | helix-021 closure + corpus |
557
+ | `@bookedsolid/rea` package install | `node_modules/@bookedsolid/rea/package.json#name === "@bookedsolid/rea"` AND realpath stays in project | Supply-chain compromise — see §8.3 | npm provenance; opt-in `policy.review.cli_sha256` (deferred to future minor) |
558
+
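The project-root realpath row in the table above can be illustrated with a small containment check. This is a sketch under stated assumptions: `cliStaysInProject` is a hypothetical name, and the `realpath` resolver is passed in as a parameter so the example stays self-contained (in practice it would presumably be something like `fs.realpathSync`).

```typescript
// Hypothetical helper: returns true only when the resolved CLI path sits
// strictly inside the resolved project directory.
function cliStaysInProject(
  cliPath: string,
  projectDir: string,
  realpath: (p: string) => string, // injected resolver (assumption for testability)
  sep: string = "/",
): boolean {
  let realCli: string;
  let realRoot: string;
  try {
    realCli = realpath(cliPath);
    realRoot = realpath(projectDir);
  } catch {
    // Fail closed: an unresolvable path is treated as an escape.
    return false;
  }
  // Appending the separator prevents "/proj-evil" from passing as "/proj".
  return realCli.startsWith(realRoot + sep);
}
```

The trailing separator matters: a bare prefix comparison would accept a sibling directory whose name merely begins with the project directory's name.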
559
+ The scanner does NOT trust:
560
+
561
+ - The Bash command string itself — every input is parsed with a
562
+ hostile-input-tolerant parser and walked with a deny-by-default
563
+ visitor.
564
+ - `REA_NODE_CLI` / any environment variable nominating the CLI path
565
+ (codex round 1 F-3 + round 2 R2-3 — env-var hijack class dropped).
566
+ - The CLI exit code alone — the shim re-parses verdict JSON via
567
+ `node -e` and cross-checks exit code matches verdict (round 1 F-3).
568
+ - The visitor's per-`Cmd`-kind enumeration — replaced with `syntax.Walk`
569
+ in round-6 (round 6 closure: walker-dispatch field-omission is
570
+ structurally impossible).
571
+ - mvdan-sh's `syntax.Walk` to visit every Word-bearing field — Class O
572
+ contract test pins reach (round-7 closure).
573
+ - `unshellEscape` to handle every DQ-escape sequence — bash spec
574
+ enumerated and pinned by Class P corpus (round-8 closure).
575
+
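The "exit code alone is not trusted" item above can be sketched as a cross-check between the CLI's stdout and its exit code. This is an illustration only: `honorVerdict` is a hypothetical name, and the convention that exit 0 means allow is an assumption of this sketch. The `{"verdict":"allow"}` shape is taken from the text; everything else is treated as a refusal.

```typescript
// Assumed convention for this sketch: exit code 0 pairs with "allow";
// any mismatch or malformed output is treated as "block".
function honorVerdict(stdout: string, exitCode: number): "allow" | "block" {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout);
  } catch {
    // Empty or malformed stdout (e.g. a tampered CLI exiting 0 silently).
    return "block";
  }
  if (typeof parsed !== "object" || parsed === null) {
    return "block";
  }
  const verdict = (parsed as { verdict?: unknown }).verdict;
  // Allow only when the JSON verdict and the exit code agree.
  if (verdict === "allow" && exitCode === 0) {
    return "allow";
  }
  return "block";
}
```

A CLI that exits 0 with empty stdout therefore yields a block, which is the failure mode the shim's re-verification is described as defending against.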
576
+ ### 8.2 Bypass classes structurally impossible
577
+
578
+ - **walker dispatch field-omission is structurally impossible**, and
579
+ **mvdan-sh `syntax.Walk` field gaps are pinned by a contract test**
580
+ (0.23.0 round-6 + round-7 layered closure).
581
+
582
+ Round 6 — our own dispatch. Pre-refactor `walkForWrites` dispatched
583
+ on AST `Cmd` kinds via an explicit `case` ladder
584
+ (`case 'WhileClause':`, `case 'ForClause':`, …) and manually
585
+ enumerated which fields each kind traversed. Any field NOT
586
+ enumerated in a case branch was silently dropped — that pattern
587
+ produced six rounds of P0 bypasses (rounds 1-5 patched detection
588
+ gaps; round 6 closed the structural class). Round 6 found two P0s
589
+ in the same class as round 5 (`WhileClause.Cond`,
590
+ `ForClause.CStyleLoop.{Init,Cond,Post}`) and the convergence ladder
591
+ 34→14→9→8→5→2 demonstrated the walker would never reach 0 with
592
+ patches alone — it was structurally a denylist over AST shapes.
593
+ The new walker uses `mvdan-sh`'s built-in `syntax.Walk(node, visit)`
594
+ which traverses every Cmd-kind dispatch exhaustively from OUR side.
595
+ Our dispatch is preserved (per-utility cp/mv/sed/find/etc.), but the
596
+ TRAVERSAL is no longer a denylist of OUR shapes. A new `Cmd` type
597
+ added to mvdan-sh, or a new field on an existing type, reaches OUR
598
+ dispatcher when Walk descends into its inner Stmts / CallExprs /
599
+ BinaryCmds.
600
+
601
+ Round 7 — Walk's own field gaps. Codex round 7 (P0) flagged that
602
+ the round-6 framing "Walk visits every field" was an overclaim:
603
+ mvdan-sh@0.10.1's `syntax.Walk` itself has empirically-verified
604
+ field gaps. Specifically, `ParamExp.Slice.Offset` and
605
+ `ParamExp.Slice.Length` (Word nodes that can hold CmdSubst payloads)
606
+ are NOT visited. Pre-fix this defeated 17 round-7 PoCs including
607
+ `${X:$(rm)}`, `${X:0:$(rm)}`, `${arr[@]:$(rm)}`, `${@:$(rm)}` —
608
+ every paramexp-slice form bypassed every detector. That made
609
+ 0.23.0 a regression vs 0.22.0 (whose bash regex caught
610
+ `${X:$(rm)}` directly). Round-7 closure layers on round-6:
611
+
612
+ 1. Tactical fix: `walkForWrites` declares its visitor up front and
613
+ manually re-enters `syntax.Walk` on `ParamExp.Slice.Offset` /
614
+ `Slice.Length` whenever the visitor sees a ParamExp. The
615
+ re-entry uses the SAME visitor, so nested ParamExp.Slice forms
616
+ (e.g. `${X:${Y:$(rm)}}`) recurse correctly.
617
+ 2. Structural pin: the **Class O exhaustiveness contract test**
618
+ (`__tests__/hooks/bash-scanner/walker-exhaustiveness.contract.test.ts`)
619
+ enumerates every named (node-type, field) Word-bearing position
620
+ mvdan-sh's parser populates and asserts the walker reaches each
621
+ one. If mvdan-sh@0.11.0+ adds a new node-with-Word-field that
622
+ Walk skips, the contract fails CI before runtime. The fix in
623
+ that case is always a one-line manual recursion in the visit
624
+ callback (same pattern as `recurseParamExpSlice`).
625
+
626
+ Combined: walker-dispatch field-omission bugs in OUR code are
627
+ structurally impossible (Walk-based traversal). Walk's own field
628
+ gaps are pinned by Class O. New mvdan-sh versions cannot silently
629
+ introduce new bypass classes — they fail the contract first.
630
+ - segment-splitter mis-detection (helix-014/015/016) — there is no
631
+ segmenter to bypass.
632
+ - shell-redirect ordering vs argv ordering ambiguity — the parser
633
+ attaches `Redirs` to the right `Stmt`, including FuncDecl Body
634
+ Stmts (codex round 1 F-1).
635
+ - nested-shell payload bypass (helix-017 #2 / helix-022 #3) —
636
+ `bash -c PAYLOAD` re-parses the payload up to a depth cap of 8.
637
+ - `find -exec bash -c` re-wrap (codex round 1 F-4) — inner argv
638
+ routes through `detectNestedShell` for re-parse.
639
+ - combined flag-cluster bypass (`bash -ic`, `node -pe`, `perl -E`,
640
+ codex round 1 F-13/F-8) — the eval-flag scanner accepts any
641
+ short-flag cluster containing the eval char.
642
+ - backslash-escape bypass (codex round 1 F-15) — normalize strips
643
+ bash's runtime backslash collapse.
644
+ - dangling-symlink bypass (codex round 1 F-2) — `lstat`+`readlink`
645
+ resolves the link manually.
646
+ - REA_NODE_CLI hijack (codex round 1 F-3, codex round 2 R2-3) — env
647
+ var dropped entirely. The 4-tier resolver was further narrowed in
648
+ codex round 4 Finding 2: tiers 1 (`command -v rea` PATH lookup) and
649
+ 2 (`node_modules/.bin/rea` symlink) were DROPPED because both are
650
+ workspace-attacker-controllable. The shim now uses a 2-tier sandboxed
651
+ resolver (`node_modules/@bookedsolid/rea/dist/cli/index.js` or
652
+ `dist/cli/index.js`). Workspace-bin attacker hijack via fake `rea`
653
+ binary at `node_modules/.bin/rea` is closed. The structural-impossibility
654
+ claim is bounded — see §8.3 for the supply-chain residual.
655
+ - symlink-out-of-project sandbox bypass (codex round 5 F2 — P0) — pre-fix
656
+ the realpath check only walked up looking for a `package.json` with
657
+ `name === "@bookedsolid/rea"`. An attacker who could write a symlink
658
+ at `node_modules/@bookedsolid/rea` pointing to `/tmp/sym-attacker/`
659
+ containing a forged `package.json` + a forged `dist/cli/index.js` that
660
+ exits 0 with `{"verdict":"allow"}` defeated the sandbox. Fix: PRIMARY
661
+ check is `realpath(cli).startsWith(realpath(CLAUDE_PROJECT_DIR) + sep)`;
662
+ any escape-out-of-project realpath refuses. The pkg.json walk-up
663
+ remains as the SECONDARY check guarding intra-project hijacks.
664
+ - assignment-side cmdsubst silent bypass (codex round 5 F1 — P0) —
665
+ `FOO=$(rm -rf .rea)`, `X=\`rm -rf .rea\``, `export FOO=$(rm)`,
666
+ `readonly X=$(rm)`, `local`/`declare`/`typeset`, `ARR=( $(rm) )`,
667
+ `[[ -n $(rm) ]]`, `case $(rm) in *) ;; esac`, `cat <<< $(rm)`,
668
+ `read X < <(rm)`, `(( $(rm | wc -l) ))`, `for x in $(rm)`. Pre-fix the
669
+ walker short-circuited at `args.length === 0` and ignored
670
+ `CallExpr.Assigns`; the AST cases `DeclClause`, `TestClause`,
671
+ `ArithmCmd`, `LetClause`, `SelectClause` and `CaseClause.Word` were
672
+ dropped at the walkCmd default. Stmt.Redirs's Word also wasn't walked
673
+ for embedded CmdSubst on read ops (here-string `<<<` 0x3f, procsubst-
674
+ on-stdin `< <(...)` 0x38). Fix: new `walkAssignsForSubstNodes` walks
675
+ every Assign.Value / Assign.Array.Elems[*].Value / Assign.Index for
676
+ embedded CmdSubst/ProcSubst/ArithmExp; new `walkTestExpr` recurses
677
+ through UnaryTest/BinaryTest/ParenTest leaves; walkCmd cases added for
678
+ every dropped clause type; extractStmtRedirects walks the Word for
679
+ cmdsubst regardless of operator.
680
+ - mixed-quote interpreter shell-out (codex round 5 F3 — P1) — pre-fix
681
+ the per-language `*_SHELL_OUT_RE` patterns used `["']([^"']+)["']` for
682
+ the inner-cmd capture, which truncated bodies whose source contained
683
+ the alternate quote (e.g. `os.system('rm "x"')` captured `rm `). Fix:
684
+ quote-aware variants `(["'])((?:(?!\1)[^\\]|\\.)+)\1` for python /
685
+ ruby / node / perl, plus a fail-closed shell-out fallback that emits a
686
+ dynamic detection when the payload contains a shell-out API token but
687
+ no shell-out regex extracted a clean payload.
688
+ - chained-interpreter multi-level escape bypass (codex round 5 F4 — P1)
689
+ — pre-fix `python -c "import os; os.system('node -e \"require(\\\"fs
690
+ \\\").rmSync(\\\".rea\\\", ...)\"')"` allowed because each layer
691
+ accumulates a `\\\"` shell-escape level and the per-language path-
692
+ quote regex rejects `(\\"` after the call paren. Fix: when a shell-
693
+ out body itself contains a known interpreter binary head followed by
694
+ an eval flag (`-c`/`-e`/`--eval`/`-pe`/`-ic`), emit a dynamic
695
+ detection (`looksLikeChainedInterpreter`). Refuse on uncertainty.
696
+ - non-string `tool_input.command` (codex round 1 F-31) — refused at
697
+ CLI input parse.
698
+ - absolute / relative-path command head (codex round 2 R2-14) —
699
+ basename-normalized before dispatcher switch; `/usr/bin/bash`,
700
+ `./sed`, `/usr/bin/env bash` all dispatch identically to the
701
+ bare-name form.
702
+ - decoupled-variable interpreter writes (codex round 2 R2-1) —
703
+ flat-scan over the payload: write API + any string-construction
704
+ primitive → dynamic detection.
705
+ - symlink cycles / deep chains (codex round 2 R2-2) — visited-set +
706
+ depth cap (32); cycle/cap returns sentinel that maps to "refuse on
707
+ uncertainty".
708
+ - joined `-t<DIR>` form (codex round 2 R2-4) — cp/mv/install/ln all
709
+ recognize the no-space form.
710
+ - tar `-C DIR`, rsync DEST, curl `-o`, wget `-O`, shred FILE,
711
+ eval payload, git checkout/restore/reset path (codex round 2 R2-7
712
+ through R2-13) — each utility now has a dedicated dispatcher.
713
+ - heredoc-into-shell (codex round 2 R2-12) — `bash <<EOF\n…\nEOF`
714
+ re-parses the heredoc body and walks the inner AST.
715
+ - eval re-parse (codex round 2 R2-13) — argv concat → re-parse →
716
+ walk; refuse on dynamic argv or parse failure.
717
+ - eval ordering with cmdsubst (codex round 3 Finding 1 — P0) —
718
+ `eval $(cmd)` no longer slips through the empty-inner short-circuit;
719
+ any-dynamic-argv emits a dynamic detection.
720
+ - pipe-into-bare-shell (codex round 3 Finding 2 — P1) — `cmd | bash`
721
+ with no `-c` is refuse-on-uncertainty.
722
+ - tar cluster `-xzfC` (codex round 3 Finding 3 — P1) — value-bearing
723
+ cluster chars consume subsequent argv tokens correctly.
724
+ - git top-level value-bearing flags (codex round 3 Finding 4 — P1) —
725
+ `-C`, `-c`, `--git-dir`, `--work-tree`, etc. are walked past before
726
+ identifying the subcommand.
727
+ - python shell-out shapes (codex round 3 Finding 5 — P1) —
728
+ `subprocess.* shell=True` and `subprocess.run(..., stdout=open())`
729
+ re-parse the inner shell.
730
+ - recursive directory delete bypass (codex round 4 Finding 1 — P0) —
731
+ `rm -rf .rea`, `rmdir .rea`, `find .rea -delete`, `shutil.rmtree(...)`,
732
+ `fs.rmSync(..., {recursive:true})`, `FileUtils.rm_rf(...)` etc. all
733
+ flag isDestructive on emit; the matcher's protected-ancestry path
734
+ treats writes against an ancestor directory as hits on every protected
735
+ pattern under it. Structurally closed: `PROTECTED_DIR_ANCESTORS` was
736
+ added to the corpus generator so the cross-product produces directory-
737
+ shaped destructive fixtures, eliminating the structural gap that
738
+ prevented detection.
739
+ - mv source-side bypass (codex round 4 Finding 3 — P1) — `mv` source
740
+ positionals are emitted as destructive writes too (mv removes content
741
+ at the source).
742
+ - find -delete unmodeled (codex round 4 Finding 4 — P1) — seed paths
743
+ are emitted as destructive write targets; with `-name PREDICATE`
744
+ present, the seed is emitted as dynamic+destructive (refuse on
745
+ uncertainty).
746
+ - interpreter shell-out breadth (codex round 4 Finding 5 — P1) — perl
747
+ exec/open-pipe, ruby Open3 / IO.popen, node spawn-with-bash-c, python
748
+ pty.spawn, opaque-spawn APIs (os.spawnv* / os.execv*) all detected
749
+ with re-parse-or-refuse.
750
+ - pathlib & File-class destructive APIs (codex round 4 Finding 6 — P1)
751
+ — Path(...).touch / .unlink / .rmdir / .rename, File.delete /
752
+ File.unlink / File.rename, ruby `open(F, 'w')` (bare), perl unlink /
753
+ rename all caught with isDestructive plumbed through where the API
754
+ semantic is removal.
755
+ - misc utility writes (codex round 4 Finding 7 — P1) — `patch`, `sort
756
+ -o`, `shuf -o`, `gpg --output`, `split <PREFIX>`, `trap "cmd" SIGNAL`
757
+ (re-parse trap command), `git config --file` all detected.
758
+ - procsubst-feeding-bash (codex round 4 Finding 7 — P1) — `bash <(cmd)`,
759
+ `bash 0< <(cmd)`, `bash <<< "cmd"` all emit dynamic detections; the
760
+ shell reads the FIFO/here-string as a script we cannot statically
761
+ resolve.
762
+ - nested-shell DQ-escape parity (codex round 8 P0) — `unshellEscape`
763
+ pre-fix collapsed only `\"` and `\'` from DQ payloads. Bash spec
764
+ collapses 5 DQ-significant escape sequences (`\$`, `` \` ``, `\"`,
765
+ `\\`, `\<newline>`). Pre-fix `bash -c "echo \"\$(rm .rea/HALT)\""`
766
+ was a verified end-to-end exploit: the real shell collapsed `\$` →
767
+ `$`, evaluated the CmdSubst, and deleted HALT, while the walker's
768
+ re-parse saw `\$` as a literal `$` (no CmdSubst child) and ALLOWED.
769
+ Fix: widened `unshellEscape`'s replace class to all five DQ-significant
770
+ escape characters. Class P corpus (≥560 fixtures across 5 shells × 16
771
+ DQ-escape shapes × 7 protected targets, plus 8 negatives) pins the
772
+ closure. Class O contract (round-7) was simultaneously tightened —
773
+ the lenient `|| w.dynamic` acceptance was replaced with opt-in
774
+ `acceptDynamic` per row, so contract-test passes can no longer mask
775
+ walker gaps via unrelated `nested_shell_inner` dynamic emits.
776
+ - wrapper-shell-exec class (codex round 9 F1 + round 10 P1) —
777
+ `<wrapper> <shell> -c PAYLOAD` shape where the wrapper transparently
778
+ forks/execs the next argv as the "real" command (`nice`, `timeout`,
779
+ `chronic`, `parallel`, `watch`, `dbus-launch`, ... and unbounded
780
+ future similar wrappers). Pre-round-9 `stripEnvAndModifiers` ignored
781
+ these wrappers, so the head-dispatch saw the wrapper name and missed
782
+ the inner `<shell> -c PAYLOAD`. Round 9 enumerated 21 wrappers; round
783
+ 10 surfaced 5 more — clear evidence the enumeration approach was
784
+ unbounded. Round-10 closure is **structural**: a new
785
+ `detectWrappedNestedShell` pass runs in `walkCallExpr`'s `default:`
786
+ case (head not in dispatcher's allow-list) and detects the bypass
787
+ shape `<UNRECOGNIZED-HEAD> [...flags...] <KNOWN-SHELL> -c PAYLOAD`
788
+ REGARDLESS of wrapper identity. Synthesizes a `[shell, -c, PAYLOAD,
789
+ ...]` slice and re-dispatches through `detectNestedShell`. False-
790
+ positive guards (a) skip when head is an introspection / output
791
+ utility (`echo`, `printf`, `man`, `which`, ...) and (b) skip when
792
+ argv[1] is itself an introspection head — covers
793
+ `<wrapper> echo bash` shapes. Three-token lookahead window between
794
+ shell positional and `-c` flag bounds false-positive risk. Bare-
795
+ shell-without-`-c` form refuses on uncertainty (stdin read).
796
+ Closes the bug class — every future unknown wrapper that
797
+ fork/execs a shell is caught without enumeration. Round 10 also
798
+ added explicit enumerations for `chronic`/`dbus-launch`/`watch`/
799
+ `script -c`/`parallel ::: ` for clean dispatch (no
800
+ refuse-on-uncertainty banner). Class S (233 wrapper-extension
801
+ positives + 38 negatives) and Class T (314 synthetic-wrapper
802
+ structural-guard positives + 29 false-positive-guard negatives)
803
+ pin the closure.
804
+ - find-exec placeholder, git history-rewrite seams, archive
805
+ extraction, parallel-stdin, more wrappers, php (codex round 11
806
+ F11-1..F11-7) — seven INDEPENDENT classes against the round-10
807
+ wrapper closure. None were variants of the wrapper family;
808
+ each landed in a different parser seam. (a) `find . -name HALT
809
+ -exec rm {} \;` — `{}` is a placeholder substituted at runtime
810
+ by find against the live filesystem; static analysis cannot
811
+ resolve which paths it expands to. Round-11 fix: synthetic
812
+ `find_exec_placeholder_unresolvable` dynamic detection emitted
813
+ whenever inner argv has `{}` AND the inner head is not in a
814
+ small read-only allow-list (`cat`, `grep`, `head`, `wc`, etc.).
815
+ (b) `git rm -f .rea/HALT` and `git mv .rea/HALT /tmp/x` were
816
+ not in the `TRACKED` subcommand set; round-11 added explicit
817
+ branches with `--cached` carve-out for `git rm`. (c) `git
818
+ filter-branch --tree-filter PAYLOAD` and `git rebase --exec`
819
+ / `-x` / `git bisect run` / `git commit --template=PATH` were
820
+ re-parse seams where git feeds PAYLOAD through `/bin/sh -c` at
821
+ runtime; round-11 added per-subcommand handlers feeding PAYLOAD
822
+ through `recurseShellPayload` → `parseBashCommand` →
823
+ `walkForWrites` (full top-level walker re-dispatch). (d)
824
+ archive extraction: `tar -xf x.tar -C . .rea/HALT` extracts the
825
+ protected member; `tar -xzf x.tgz` (no `-C`, no member list)
826
+ extracts every member — archive may contain `.rea/HALT`.
827
+ Round-11 fix: `detectTar` extended with extract-mode positional
828
+ harvesting, plus new `detectUnzip`/`detect7z`/`detectGzip`/
829
+ `detectPax` dispatchers. `bsdtar` aliases to `tar`. When -x is
830
+ set with no `-C` AND no explicit members, emit
831
+ `archive_extract_unresolvable`. (e) `echo .rea/HALT | parallel
832
+ rm` reads input from stdin (no `:::`); round-11 added pre-strip
833
+ detection in `walkCallExpr` that emits
834
+ `parallel_stdin_unresolvable` when parallel head has positional
835
+ template tokens AND no `:::`/`::::`/`:::+`/`::::+` separator.
836
+ (f) `fakeroot`/`flock`/`gtimeout`/`unshare`/`env --chdir=`/
837
+ `sudo -s --` were not stripped by `stripEnvAndModifiers`;
838
+ round-11 added each wrapper with appropriate flag arity
839
+ handling, plus shell-mode `sudo -s --` synthesis to `sh -c
840
+ PAYLOAD`. (g) `php -r CODE` was missing from interpreter
841
+ dispatch; round-11 added `PHP_EVAL_FLAGS` (-r, --run) and
842
+ `PHP_WRITE_PATTERNS` covering `unlink`/`file_put_contents`/
843
+ `rename`/`copy`/`fopen` (write modes)/`mkdir`/`rmdir`/`touch`/
844
+ `chmod`/`chown`/`chgrp`/`symlink`/`link`/`move_uploaded_file`.
845
+ Class U (round-11 find/git/archive/parallel — 360 positives +
846
+ 17 negatives) and Class V (round-11 wrappers + PHP — 124
847
+ positives + 8 negatives) pin the closure.
848
+ - adjacent-utility / cumulative-parity gaps (codex round 12
849
+ F12-1..F12-9 — nine INDEPENDENT findings against the round-11
850
+ surface). Not variants of any prior round; each landed in
851
+ PHP / archive-create / cmake / mkfifo+mknod / find-write-
852
+ predicate space where round-11 had not applied the cumulative
853
+ discipline established by earlier rounds. (a) F12-1 P0:
854
+ PHP `rename(SRC, DEST)` SOURCE-side blindspot — round-4 F3
855
+ established mv-shape source IS destructive; round-11 bundled
856
+ PHP rename with the destination-only group, so SRC slipped
857
+ past. Round-12 fix: split rename into TWO patterns + add
858
+ `rename(` to DESTRUCTIVE_API_TOKENS. (b) F12-2 P0: PHP
859
+ `rmdir(PATH)` not flagged destructive — bundled with mkdir/
860
+ touch (creates), so protected-ancestry never matched. Round-12
861
+ fix: split rmdir + add `rmdir(` to DESTRUCTIVE_API_TOKENS.
862
+ (c) F12-3 P0: PHP shell-out missing entirely —
863
+ `pickShellOutPatternsFor` had no php_r_path case. Round-12
864
+ fix: new PHP_SHELL_OUT_RE with quote-aware backref body
865
+ extraction covering system / exec / shell_exec / passthru /
866
+ popen / proc_open / backtick. (d) F12-4 P0: PHP -B/-E /
867
+ --process-begin / --process-end accept CODE same as -r;
868
+ round-11 PHP_EVAL_FLAGS only had -r/--run. Round-12 fix:
869
+ extend exactLong + shortChars (case-sensitive uppercase).
870
+ (e) F12-5 P0: archive CREATE direction missing — only EXTRACT
871
+ was checked. `tar -cf .rea/policy.yaml docs/`, `zip
872
+ .rea/policy.yaml docs/file`, `7z a .rea/policy.yaml docs/`
873
+ all silently overwrote the OUTPUT archive at the protected
874
+ path. Round-12 fix: detectTar gains isCreateOrAppend pass +
875
+ -f/-cf/--file emit; detect7z gains a/u/d compress branch;
876
+ new detectZip dispatcher (zip OUTPUT.zip [files...]).
877
+ (f) F12-6 P1: cmake -E utility surface — rm/remove/rename/
878
+ copy/copy_if_different/copy_directory/touch/remove_directory/
879
+ create_symlink/create_hardlink/make_directory all slipped past
880
+ pre-fix. Round-12 fix: new detectCmake with per-subcommand
881
+ argv shapes (cp-shape, mv-shape, variadic, second-positional).
882
+ (g) F12-7 P1: mkfifo / mknod create special files at protected
883
+ paths; no dispatchers existed. Round-12 fix: new detectMkfifo
884
+ (variadic) and detectMknod (NAME is first bare positional).
885
+ (h) F12-8 P1: find write-predicates -fls / -fprint / -fprintf
886
+ not in detectFind. Round-12 fix: scan for these predicates
887
+ and emit FILE as destructive write target (-fprintf consumes
888
+ TWO args). (i) F12-9 P2 false-positive regression: detectUnzip
889
+ emitted dynamic unresolvable for read-only flags `-p` (extract
890
+ to stdout), `-l/-Z/-t/-v/-z` (list/test/verbose/comment).
891
+ Round-12 fix: early-return ALLOW from detectUnzip when any
892
+ read-only flag (or cluster char) present. Class W (round-12
893
+ closures — 173 positives + 18 negatives) pins the closure.
894
+
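The round-8 item in the list above (nested-shell DQ-escape parity) turns on the five backslash sequences that bash collapses inside double quotes. A minimal sketch of that collapse follows. `collapseDqEscapes` is a hypothetical stand-in written for this example, not the package's `unshellEscape`.

```typescript
// Collapse the five double-quote-significant escapes: \$  \`  \"  \\  \<newline>.
// A backslash before any other character is left untouched, as in bash.
function collapseDqEscapes(body: string): string {
  return body.replace(/\\([$`"\\\n])/g, (_match: string, ch: string) =>
    // Backslash-newline is a line continuation: both characters disappear.
    ch === "\n" ? "" : ch,
  );
}

// The payload from the exploit described above, as the outer shell would
// hand it to the inner `bash -c`:
const payload = 'echo \\"\\$(rm .rea/HALT)\\"';
const collapsed = collapseDqEscapes(payload);
// `collapsed` now contains an unescaped `$(`, so a re-parse sees the CmdSubst.
```

Collapsing only `\"` would leave `\$` in place, and a re-parse would then read the `$` as a literal rather than the start of a command substitution.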
895
+ ### 8.3 Bypass classes still possible
896
+
897
+ - **`@bookedsolid/rea` package-tier supply-chain compromise** (codex
898
+ round 5 F5 — P1/P3 acknowledged residual). The bash-tier shim's
899
+ CLI-resolution sandbox check (codex round 4 #2 + round 5 F2)
900
+ defeats node_modules-symlink-out and workspace-bin hijack. It does
901
+ NOT defeat an attacker who can write a forged
902
+ `node_modules/@bookedsolid/rea/dist/cli/index.js` *and* a matching
903
+ `node_modules/@bookedsolid/rea/package.json` with `name ===
904
+ "@bookedsolid/rea"` directly inside the project's `node_modules/`.
905
+ Such an attacker has already compromised the package install
906
+ pipeline (e.g. via a compromised lockfile / dependency-confusion
907
+ attack / npm registry compromise). At that level the attacker can
908
+ also forge any other dependency the agent uses, so hook-tier defense
909
+ is past — the trust boundary is the `npm install` provenance check
910
+ (npm provenance + manifest verification) rather than the bash gate.
911
+ Hardening (opt-in): operators may set `policy.review.cli_sha256: <hex>`
912
+ in `.rea/policy.yaml` and the shim will refuse if the resolved CLI's
913
+ SHA-256 does not match. Defaults to unset; documented as an opt-in
914
+ belt-and-braces measure rather than a structural defense (see
915
+ `docs/architecture/bash-scanner.md` for the full rationale).
+ - **`git checkout REVISION PATH` and `git restore --source=REVISION
+ PATH` without the POSIX `--` argv separator** (codex round 9 F3 —
+ accepted false negative since 0.22.0). When an attacker invokes
+ `git checkout main .rea/HALT` (or `git restore --source=HEAD~1
+ .rea/policy.yaml`), git's pre-`--` argv shape is structurally
+ ambiguous between "REVISION PATH" and "PATH...". The `detectGit`
+ walker conservatively treats positionals after the subcommand as
+ destructive-overwrite targets ONLY when `--` is present, because
+ disambiguation requires a runtime ref-existence check that the
+ static walker cannot perform without filesystem I/O. Workaround:
+ the kill-switch invariants for protected files (`.rea/HALT`,
+ `.rea/policy.yaml`, `.claude/settings.json`, etc.) are still caught
+ by the symlink-resolution layer in `protected-paths-bash-gate.sh`
+ at file-write time when git actually opens the destination for
+ write — the bash-scanner's static layer is one of multiple
+ enforcement points. Pin: 0.24.0 milestone for a comprehensive fix
+ (likely a conservative refusal whenever the first positional
+ matches a known git-revspec shape AND following positionals exist).
+ - glob expansion in argv-based commands (`chmod +x bin/*.sh` on a
+ filesystem where `bin/.rea` exists). Glob detection is scoped to
+ redirect-form targets only because argv-globs in legitimate code
+ are common. Future: enumerate filesystem-level glob matches.
+ - awk `-f script-file` body. We currently emit a dynamic detection
+ (refuse on uncertainty). Future: read + scan the file.
+ - semantic obfuscation via `${!ind}` indirect expansion, `read -p`
+ prompts, computed-property attacks in interpreter payloads that
+ evade the flat-scan heuristic. These are parse-correct and the
+ detector is best-effort.
+ - WASM `sh-syntax` migration risk: parser bugs in the new library
+ could reopen bypass classes. Mitigated by the corpus fixture
+ suite (every closed bypass replays as a positive regression
+ test).
+ - **Round-13 deferred to 0.24.0** (utility-enumeration completeness
+ — same shape as round 12; convergence at this tier is asymptotic
+ per codex's explicit assessment):
+ 1. **PHP indirect-callable shell-out** —
+ `array_map("system", [...])`, `call_user_func("system", ...)`,
+ `eval("system(...);")`, variable-bound callable
+ `$f = "system"; $f(...)`. Round 12 closed direct calls;
+ indirect callable forms remain.
+ 2. **vim/emacs editor exec re-parse seam** — `vim -c "!cmd"`,
+ `vim -c "w PATH"`, `vim -c "source PATH"`, `emacs --eval
+ "(shell-command \"...\")"`, `emacs --eval "(delete-file
+ \"...\")"`. No vim/emacs dispatcher in 0.23.0.
+ 3. **Standalone compression utilities** — `xz -o FILE`,
+ `zstd -o FILE`, `lz4 INPUT FILE`, `lzma -o FILE`. Round 12
+ closed the tar/zip/7z/cpio/pax create direction; the adjacent
+ standalone compression family remains open.
+ 4. **Image-builder utilities** — `mksquashfs INPUT OUTPUT`,
+ `genisoimage -o OUTPUT`, `mkisofs -o OUTPUT`, `xorriso
+ -outdev OUTPUT`. CI/build pipelines use these.
+ 5. **`7zz` Linux-canonical 7zip alias** — `detect7z` only fires
+ on `7z`; the official Linux package binary is `7zz`.
+ 6. **Python argv-form subprocess** — `subprocess.Popen(['rm',
+ '...'])`, `subprocess.run(['rm', '...'])` etc. without
+ `shell=True`. Round 3 F5 closed the `shell=True` form; the
+ argv-list form is still allowed.
+ 7. **vim ex-mode literal-path write** (P2) — `vim -c "w PATH"`
+ writes to a literal PATH given in argv; this is a regex-level
+ miss independent of the broader `-c` re-parse seam.
+ - **Denylist scanner is structurally limited** (acknowledged by
+ codex round 4 and reaffirmed by codex round 13: "convergence on
+ enumeration completeness is asymptotic — every round will probably
+ find more"). A denylist enumerates the destructive shapes the
+ scanner knows. Novel shapes (interpreters or utilities not yet
+ modeled, language constructs we haven't seen) can in principle
+ bypass until added. Defense in depth:
+ 1. mvdan-sh AST parsing eliminates an entire class of regex/
+ segmenter mistakes.
+ 2. Comprehensive walker dispatchers per known destructive utility +
+ per known shell-out + per known interpreter API.
+ 3. Adversarial corpus generators span the parameter cross-product
+ so generators produce shapes Codex hasn't visited.
+ 4. Per-round Codex review surfaces gaps before release; the
+ convergence ladder
+ (round 1 → round 2 → round 3 → round 4 → round 5 ...) is the
+ audit trail.
+ 5. Fail-closed defaults: dynamic targets always block.
+ An allowlist scanner ("only known-safe commands pass") would close
+ this class structurally but is incompatible with the rea use case
+ (agentic workflows need arbitrary bash access).
+
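The opt-in `policy.review.cli_sha256` pin from the first bullet of §8.3 amounts to hashing the resolved CLI file and comparing the digest with the configured hex string. The sketch below is a minimal illustration under stated assumptions: `sha256Hex` and `matchesPin` are hypothetical helper names, not functions from the package, and the shim's actual check may be structured differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hex-encoded SHA-256 of the given bytes.
function sha256Hex(contents: Uint8Array | string): string {
  return createHash("sha256").update(contents).digest("hex");
}

// Hypothetical helper: true only when a pin is configured AND matches.
// An unset pin returns false here; the caller decides what "unset" means
// (the threat model says the option defaults to unset, i.e. no check).
function matchesPin(
  contents: Uint8Array | string,
  pinnedHex: string | undefined,
): boolean {
  if (pinnedHex === undefined) return false;
  // Normalize case and stray whitespace so a pasted digest still compares.
  return sha256Hex(contents) === pinnedHex.trim().toLowerCase();
}
```

A caller would read the resolved CLI file's bytes, pass them with the value from `.rea/policy.yaml`, and refuse to run on a `false` result whenever a pin is set.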
+ ### 8.4 Test surface
+
+ The fixture corpus at
+ `__tests__/hooks/bash-tier-corpus.test.ts` (≥185 entries) and
+ `__tests__/hooks/bash-tier-corpus-round2.test.ts` (≥186 entries,
+ codex round 2 bypass-class fixtures) locks every documented bypass
+ class as a regression-positive test.
+ The walker unit tests at `__tests__/hooks/bash-scanner/walker.test.ts`
+ pin the parser-emitted RedirOperator codes (codex round 1 F-33) so a
+ parser-library bump that re-numbers them fails LOUDLY. The verdict-
+ shape snapshot at `__tests__/hooks/bash-scanner/verdict-shape.test.ts`
+ locks the wire format for the bash shim consumers.
+
+ The cross-product corpus at
+ `__tests__/hooks/bash-scanner/adversarial-corpus.test.ts` runs ≥7700
+ fixtures across 18 classes (A–P plus extensions). Coverage assertion:
+ ≥3000 positive (must-block) and ≥1000 negative (must-allow) fixtures.
+
+ The Class O exhaustiveness contract test at
+ `__tests__/hooks/bash-scanner/walker-exhaustiveness.contract.test.ts`
+ pins the walker reach across every Word-bearing AST position
+ mvdan-sh's parser populates. Round-8 tightened the acceptance to
+ path-explicit-by-default; opt-in `acceptDynamic` per row is the only
+ way to accept a `dynamic: true` write as proof-of-reach.
+
+ The bash-shim subprocess sampling at
+ `adversarial-corpus.test.ts > "bash shim subprocess sampling"`
+ spawns the actual hook script under a clean env across 100
+ deterministically-sampled fixtures, parses verdict JSON, and
+ cross-checks against in-process scan. Catches drift between the
+ in-process verdict and what the shim's JSON verifier + 4-tier
+ resolver chain actually returns.
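
Deterministic sampling of the kind described for the subprocess test is commonly done with a seeded PRNG and a partial Fisher-Yates shuffle, so every run picks the same 100 fixtures. The following is a sketch under assumptions: `mulberry32` is a widely used small PRNG chosen here for illustration, and `sampleDeterministic` is a hypothetical name. The test suite's actual sampler is not shown in this diff.

```typescript
// Small seeded PRNG (mulberry32). Returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: pick `count` distinct items, reproducibly for a seed.
function sampleDeterministic<T>(
  items: readonly T[],
  count: number,
  seed: number,
): T[] {
  const pool = items.slice(); // never mutate the caller's corpus
  const rand = mulberry32(seed);
  const n = Math.min(count, pool.length);
  // Partial Fisher-Yates: only the first n slots need to be shuffled.
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    const tmp = pool[i] as T;
    pool[i] = pool[j] as T;
    pool[j] = tmp;
  }
  return pool.slice(0, n);
}
```

Fixing the seed keeps a failing sample reproducible across machines, which matters when the point of the test is to detect drift between two code paths.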
+
+ ### 8.5 Defense in depth
+
+ The bash gate is one layer. The full defensive stack:
+
+ 1. **Parser AST** (`mvdan-sh@0.10.1`) — eliminates regex/segmenter
+ tokenization mistakes.
+ 2. **Walker** (`syntax.Walk`-based deny-by-default traversal +
+ `recurseParamExpSlice` for Walk gaps) — visits every node type;
+ no Cmd-kind branch can silently drop a field.
+ 3. **Per-utility dispatchers** — comprehensive coverage of cp, mv,
+ sed, dd, tee, install, ln, awk, ed, ex, find, xargs, node,
+ python, ruby, perl, tar, rsync, curl, wget, shred, eval, git,
+ patch, sort, shuf, gpg, split, trap, bash/sh/zsh/dash/ksh.
+ 4. **Interpreter scanners** — write-API tokens for node fs, python
+ os/shutil/subprocess, ruby Pathname/FileUtils, perl unlink/rename;
+ shell-out re-parse for `system`/`subprocess.run shell=True`/`qx`/
+ backticks.
+ 5. **DQ-escape parity** — `unshellEscape` collapses all 5 DQ-significant
+ escapes (round-8) so re-parser sees the same syntax tree as bash.
+ 6. **Symlink resolver** — visited-set + depth cap (32); refuses on
+ cycle/cap; macOS `/var` ↔ `/private/var` aliasing handled by
+ `rea_resolved_relative_form` (helix-021).
+ 7. **2-tier sandboxed CLI resolver** — only
+ `node_modules/@bookedsolid/rea/dist/cli/index.js` and
+ `dist/cli/index.js` accepted; `realpath` containment check
+ refuses any escape from `CLAUDE_PROJECT_DIR`.
+ 8. **Verdict JSON verifier** — shim re-parses CLI output via
+ `node -e` and cross-checks exit code ↔ verdict (round-1 F-3).
+ 9. **Cross-product corpus** (Classes A–P) — ≥7700 fixtures span the
+ parameter space so generators produce shapes the round-by-round
+ manual review hadn't visited.
+ 10. **Class O exhaustiveness contract** — pins every Word-bearing
+ AST position so mvdan-sh upgrades cannot silently introduce
+ Walk-skip bypasses.
+ 11. **Codex adversarial review** — every release goes through
+ `/codex-review` before merge; convergence ladder is the audit
+ trail. 0.23.0 round count: 8 (and counting).
+ 12. **Middleware audit log** — every tool invocation is hash-chained
+ in `.rea/audit.jsonl` (append-only, tamper-evident).
+ 13. **Codex push-gate** (0.11.0+) — pre-push stateless review by
+ GPT-5.4 (codex-auto-review) catches semantic concerns the static
+ scanner cannot reason about.
+ 14. **Husky 9 hook chain** — `commit-msg`, `pre-push`, `pre-commit`
+ register every hook in the package; consumers can extend via
+ `.husky/{commit-msg,pre-push,pre-commit}.d/*` (helix-018 Option B).
+
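Item 6 of the list above (visited-set plus a depth cap of 32, refusing on cycle or cap) can be sketched as a loop over a link-lookup callback. This is a simplification under assumptions: it follows only the final path component, uses POSIX path rules, and takes a hypothetical `readLink` callback in place of real filesystem calls. The production resolver lives in a shell script and is not shown in this diff.

```typescript
import { posix } from "node:path";

// Hypothetical callback: returns the link target, or null if `p` is not a symlink.
type ReadLink = (p: string) => string | null;

type Resolution =
  | { ok: true; path: string }
  | { ok: false; reason: "cycle" | "depth-cap" };

function resolveChain(
  start: string,
  readLink: ReadLink,
  maxDepth = 32,
): Resolution {
  const visited = new Set<string>();
  let current = posix.resolve(start);
  // At most `maxDepth` lookups; anything longer is refused, not guessed at.
  for (let hops = 0; hops < maxDepth; hops++) {
    if (visited.has(current)) return { ok: false, reason: "cycle" };
    visited.add(current);
    const target = readLink(current);
    if (target === null) return { ok: true, path: current };
    // Relative targets are interpreted against the link's own directory.
    current = posix.resolve(posix.dirname(current), target);
  }
  return { ok: false, reason: "depth-cap" };
}
```

Both failure modes return a refusal rather than a best-effort path, which matches the fail-closed stance the section describes.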
+ A bypass requires defeating multiple layers simultaneously. The
+ trust boundary between this stack and the rest of the system is
+ package-tier integrity (npm provenance + manifest verification).
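
The `realpath` containment check named in item 7 reduces to asking whether a canonical path sits strictly beneath a canonical root. A minimal sketch, assuming POSIX path rules and that the caller has already canonicalized both paths (for example with `fs.realpathSync`); `isInsideRoot` is a hypothetical name, not the package's function.

```typescript
import { posix } from "node:path";

// Hypothetical helper. Both arguments are assumed to be absolute and
// already canonicalized by the caller, so no symlink can hide an escape.
function isInsideRoot(root: string, candidate: string): boolean {
  const rel = posix.relative(root, candidate);
  // Empty means candidate IS the root, which is not a file inside it.
  if (rel === "") return false;
  // A leading ".." segment means the path climbs out of the root.
  if (rel === ".." || rel.startsWith("../")) return false;
  // Defensive: a relative form should never be absolute.
  return !posix.isAbsolute(rel);
}
```

Comparing via `relative` rather than a string-prefix test avoids the classic mistake of accepting a sibling directory such as `/proj-evil` because it starts with `/proj`.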
+
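The hash-chained, tamper-evident audit log in item 12 can be illustrated with a generic chain in which each record commits to its predecessor's hash. This is a sketch only: the record shape, the all-zero `GENESIS` value, and the function names are assumptions. The package's own `GENESIS_HASH` and `computeHash` are imported in `dist/audit/append.js`, but their values and signatures are not visible in this diff.

```typescript
import { createHash } from "node:crypto";

// Hypothetical record shape for illustration.
interface ChainedRecord {
  payload: string; // serialized event
  prev_hash: string;
  hash: string;
}

// Assumed genesis value; the package's GENESIS_HASH is not shown.
const GENESIS = "0".repeat(64);

function hashRecord(prevHash: string, payload: string): string {
  return createHash("sha256")
    .update(prevHash)
    .update("\n")
    .update(payload)
    .digest("hex");
}

// Returns a new chain with one record appended; the input is not mutated.
function appendRecord(
  chain: readonly ChainedRecord[],
  payload: string,
): ChainedRecord[] {
  let prev = GENESIS;
  for (const r of chain) prev = r.hash;
  return [...chain, { payload, prev_hash: prev, hash: hashRecord(prev, payload) }];
}

// Index of the first record that breaks the chain, or -1 if intact.
function firstBrokenIndex(chain: readonly ChainedRecord[]): number {
  let prev = GENESIS;
  for (let i = 0; i < chain.length; i++) {
    const r = chain[i] as ChainedRecord;
    if (r.prev_hash !== prev || r.hash !== hashRecord(prev, r.payload)) return i;
    prev = r.hash;
  }
  return -1;
}
```

Editing or deleting any record changes every later expected hash, so a verifier walking from the genesis value finds the first point of tampering.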
+ ### 8.6 Out of scope
+
+ The bash gate explicitly does NOT defend against:
+
+ - **Kernel-level / LD_PRELOAD / ptrace attacks.** The OS kernel and
+ loader are trusted. An attacker with code execution at that tier
+ bypasses every userland defense.
+ - **Network-tier MITM during package install.** rea trusts the npm
+ registry (with provenance verification) and the system's TLS root
+ store. A compromised TLS chain at `npm install` time gives the
+ attacker the same authority as the package itself.
+ - **Supply-chain compromise of `@bookedsolid/rea` on npm.** A signed
+ release with malicious code defeats the gate because the gate IS
+ the malicious code. Mitigation is npm provenance + manifest
+ verification + (opt-in) `policy.review.cli_sha256`. See §8.3.
+ - **Out-of-band file modifications.** rea gates Bash tool calls and
+ Write/Edit/MultiEdit Write-tier hooks. Filesystem changes initiated
+ outside the harness (user editing files directly, language server
+ edits, other processes) are not gated.
+ - **Read-side policy leaks.** The bash gate concerns WRITES. Reading
+ `.rea/policy.yaml` is allowed by default — the policy is checked-in
+ and visible. `env-file-protection.sh` handles `.env*` reads at the
+ Write tier; bash-tier coverage of `.env*` reads is via
+ `dependency-audit-gate.sh` and the segmenter for those forms.
+ - **Attacker-controlled PATH at scanner runtime.** If `rea` resolves
+ to an attacker binary on PATH, the gate is defeated. Production
+ deployments pin PATH via the harness; `rea doctor` verifies PATH
+ integrity at install time but does not enforce it at runtime.
@@ -35,7 +35,7 @@
  import fs from 'node:fs/promises';
  import path from 'node:path';
  import { Tier, InvocationStatus } from '../policy/types.js';
- import { GENESIS_HASH, computeHash, fsyncFile, readLastRecord, withAuditLock, } from './fs.js';
+ import { GENESIS_HASH, computeHash, fsyncFile, readLastRecord, withAuditLock } from './fs.js';
  import { maybeRotate } from '../gateway/audit/rotator.js';
  const REA_DIR = '.rea';
  const AUDIT_FILE = 'audit.jsonl';