polin-guard 0.1.0 → 0.2.0

This diff shows the changes between two publicly released versions of the package, as they appear in the public registry. It is provided for informational purposes only.
package/README.md CHANGED
@@ -1,94 +1,141 @@
1
- # polin-guard
2
-
3
- **Block obfuscated build/commit-time code-injection payloads before they ever enter your repo.**
4
-
5
- `polin-guard` is a tiny, **zero-dependency** scanner that catches the family of
6
- malicious JavaScript "stagers" that hide a payload on a single, space-padded line
7
- inside an otherwise normal config or entry file — e.g. `tailwind.config.js`,
8
- `ecosystem.config.js`, `.eslintrc.js`, `postcss.config.js`, or `src/index.ts`.
9
-
10
- These payloads typically:
1
+ <h1 align="center">🛡️ polin-guard</h1>
2
+
3
+ <p align="center">
4
+ <strong>Stop obfuscated malware from being committed to your repo — automatically, on every commit.</strong>
5
+ </p>
6
+
7
+ <p align="center">
8
+ <a href="https://www.npmjs.com/package/polin-guard"><img alt="npm version" src="https://img.shields.io/npm/v/polin-guard?color=cb3837&logo=npm"></a>
9
+ <a href="https://www.npmjs.com/package/polin-guard"><img alt="npm downloads" src="https://img.shields.io/npm/dm/polin-guard?color=cb3837&logo=npm"></a>
10
+ <a href="https://github.com/Valentin-Shyaka/polin-guard/blob/main/LICENSE"><img alt="license" src="https://img.shields.io/npm/l/polin-guard?color=blue"></a>
11
+ <img alt="dependencies" src="https://img.shields.io/badge/dependencies-0-brightgreen">
12
+ <img alt="node" src="https://img.shields.io/node/v/polin-guard">
13
+ </p>
14
+
15
+ <p align="center">
16
+ <code>npm install --save-dev polin-guard</code>
17
+ </p>
18
+
19
+ <p align="center">
20
+ <img src="https://raw.githubusercontent.com/Valentin-Shyaka/polin-guard/main/assets/demo.svg" alt="polin-guard blocking a commit that contains an obfuscated injected payload" width="760">
21
+ </p>
22
+
23
+ ---
24
+
25
+ ## The problem it solves
26
+
27
+ One family of supply-chain attacks hides a malicious payload on a **single, space-padded
28
+ line** inside an ordinary-looking config or entry file — `tailwind.config.js`,
29
+ `ecosystem.config.js`, `.eslintrc.js`, `postcss.config.js`, `src/index.ts`, etc.
30
+ The line is hundreds of spaces wide, so the payload scrolls **off-screen** in your
31
+ editor and sails through code review:
32
+
33
+ ```js
34
+   plugins: [tailwindcssAnimate],
35
+ }; global['!']='…';var d=String.fromCharCode(127);…require;…Function(…)(…)
36
+ // ^ legitimate code ^ hundreds of spaces hide this →→→ obfuscated payload
37
+ ```
11
38
 
12
- - decode strings at runtime with a character-shuffle cipher,
13
- - re-expose Node's `require`/`module` as globals, and
14
- - execute a second stage through a `Function()` constructor
39
+ When the project is built, that line runs with **full Node.js access**, so it can
40
+ read your environment variables, `.env` files, SSH keys, and tokens, and pull a
41
+ second stage. Because it sits in a file that's `require`d during `dev`/`build`/`test`,
42
+ it executes silently and automatically.
15
43
 
16
- …all of which runs **automatically at build/dev/CI time** with full Node.js
17
- access to your environment variables, SSH keys, and tokens. Because the malicious
18
- code sits hundreds of spaces to the right of legitimate code, it is trivially
19
- missed in review. `polin-guard` makes it impossible to miss.
44
+ **polin-guard catches it before it can ever be committed.**
20
45
 
21
46
  > Built after a real incident in which this exact payload was committed across
22
- > multiple repositories. The detection rules are tuned for **high precision** —
23
- > a `CRITICAL` finding should be safe to block a commit on.
24
-
25
- ## What it detects
26
-
27
- | Rule | Severity | What it catches |
28
- |------|----------|-----------------|
29
- | `global-bang-key` | critical | `global['!']=…` stager marker |
30
- | `global-underscore-handle` | critical | `global[_$_…]=…` obfuscated handle |
31
- | `require-reexposed` | critical | `…]=require; typeof module` capability escalation |
32
- | `char-shuffle-cipher` | critical | `String.fromCharCode(127)` cipher delimiter |
33
- | `escape-density` | critical | a line with ≥25 `\xNN`/`\uNNNN` escapes (obfuscated blob) |
34
- | `iife-constructor` | critical | immediately-invoked `Function()` on a long line |
35
- | `oversized-line` | critical* | a source line > 1000 chars **with** exec/require tokens (the concealment trick) |
36
- | `oversized-line` | warning | a long line with no exec tokens (review) |
37
- | `eval` / `atob` / `child_process` | warning | weaker indicators |
38
-
39
- \* A long line **without** exec tokens is only a warning, to keep false positives near zero.
40
-
41
- Lockfiles, minified bundles, source maps, and `node_modules` are excluded automatically.
42
-
43
- ## Install
47
+ > multiple repositories and reached production branches.
48
+
49
+ ## How it detects (beyond signatures)
50
+
51
+ polin-guard doesn't just match known payload strings — those are trivially
52
+ renamed. It detects the **necessary conditions** of the attack and combines
53
+ independent signals into a weighted **risk score**. To stay hidden *and* execute
54
+ at build time, a payload is forced to do several of these at once:
55
+
56
+ | Signal | What it catches | Weight |
57
+ |--------|-----------------|-------:|
58
+ | `concealment` | code hidden after a long mid-line whitespace gap (the off-screen trick) | 80 |
59
+ | `signature` | known-family markers (`global['!']`, `global[_$_…]`, re-exposed `require`, `fromCharCode(127)`) | 60 |
60
+ | `escape-density` | ≥25 `\xNN`/`\uNNNN` escapes on a line (obfuscated blob) | 50 |
61
+ | `exec-sink` | dynamic execution: `Function()` / `eval` | 40 |
62
+ | `long-token` | unbroken ≥120-char token (encoded blob) | 35 |
63
+ | `ctor-chain` | `constructor.constructor` reach to `Function` | 35 |
64
+ | `oversized-line` | line > 1000 chars (+30 more if it carries exec/require tokens) | 25 |
65
+ | `entropy` | long, high-entropy line | 25 |
66
+ | `indirect-require` · `dyn-timer` · `vm-module` | `require(<var>)`, `setTimeout("…")`, `require('vm')` | 25 |
67
+ | `net-exec-combo` | network **+** code-exec/file-write together (runtime-fetched payload) | 30 |
68
+ | `network` · `capability` | `fetch`/`http(s)`, or `process.env`/`fs`/`child_process` | 15 / 10 |
69
+ | `autoloaded-context` | the above inside an auto-loaded config/entry file | +20 |
70
+
71
+ A **file-level pass** also aggregates across lines, so a payload **split across
72
+ many lines** or **fetched at runtime** still trips the score.
73
+
74
+ **Block at score ≥ 70, warn at ≥ 35** (configurable). Because the signals are
75
+ independent, evading one (rename, split, runtime-fetch, drop the padding) still
76
+ trips the others — so evasion becomes self-defeating: visible in review, inert,
77
+ readable, or capability-less. Lockfiles, minified bundles, source maps, and
78
+ `node_modules` are skipped to keep false positives near zero.
79
+
80
+ > **Evasion-tested.** The test suite verifies that **renamed** (signature-free),
81
+ > **split-across-lines**, and **runtime-fetched** payloads are all still blocked,
82
+ > while legitimate long-data lines and ordinary dynamic `require()` calls are not.
83
+
84
+ ## Quick start
44
85
 
45
86
  ```bash
46
87
  npm install --save-dev polin-guard
47
88
  ```
48
89
 
49
- Or run it without installing:
90
+ Scan right now:
50
91
 
51
92
  ```bash
52
- npx polin-guard --all
53
- ```
54
-
55
- No Node? Use the standalone script — copy `scan-injection.sh` into your repo.
56
-
57
- ## Usage
58
-
59
- ```bash
60
- polin-guard --staged # scan staged content (use in pre-commit; also covers `git commit --amend`)
61
- polin-guard --all # scan every tracked file
62
- polin-guard --ci # same as --all, for CI
63
- polin-guard path/to/file.js ... # scan specific files (no git required)
64
- polin-guard --strict # treat warnings as blocking too
93
+ npx polin-guard --all # scan every tracked file in the repo
65
94
  ```
66
95
 
67
- Exit code `1` means a blocking finding was detected.
68
-
69
- ### As a pre-commit hook (husky)
96
+ ### Block it on every commit (husky)
70
97
 
71
98
  ```bash
72
99
  npm install --save-dev polin-guard husky
73
100
  npx husky init
74
- # add the scan to the hook (runs on commit AND amend):
75
101
  echo 'npx --no-install polin-guard --staged' > .husky/pre-commit
76
102
  ```
77
103
 
78
- A ready-made hook is included at `.husky/pre-commit` in this package.
104
+ That's it. The hook runs on **every commit and every `git commit --amend`**, and
105
+ scans the exact content being committed. A malicious payload makes the commit fail.
79
106
 
80
- ### As a pre-commit hook (pre-commit.com framework)
107
+ ### Add the CI backstop (recommended)
81
108
 
82
- See `examples/pre-commit-config.yaml`. It calls the standalone `scan-injection.sh`,
83
- so it needs no Node.
109
+ A local hook can be skipped (`git commit --no-verify`) or sidestepped by a
110
+ force-push from a compromised machine. Re-scan on the server, where it can't be
111
+ skipped — copy [`examples/github-action.yml`](examples/github-action.yml) to
112
+ `.github/workflows/polin-guard.yml`.
84
113
 
85
- ### In CI (the bypass-proof backstop)
114
+ ### No Node? Use the standalone script
86
115
 
87
- A local hook can be skipped with `git commit --no-verify` or sidestepped by a
88
- force-push from a compromised machine. Add the server-side scan so history is
89
- always re-checked:
116
+ Drop [`scan-injection.sh`](scan-injection.sh) into your repo (works with the
117
+ [pre-commit framework](examples/pre-commit-config.yaml) too):
90
118
 
91
- Copy `examples/github-action.yml` to `.github/workflows/polin-guard.yml`.
119
+ ```bash
120
+ ./scan-injection.sh --staged
121
+ ```
122
+
123
+ ## Usage
124
+
125
+ ```text
126
+ polin-guard [options] [paths...]
127
+
128
+   --staged        Scan staged content (default; for pre-commit hooks; covers --amend)
129
+   --all           Scan all git-tracked files
130
+   --ci            Alias for --all (use in CI)
131
+   [paths...]      Scan specific files (no git required)
132
+   --strict        Treat warnings as blocking too
133
+   --quiet         Only print on findings
134
+   -h, --help      Show help
135
+   -v, --version   Show version
136
+
137
+ Exit 0 = clean · 1 = blocking finding · 2 = usage error
138
+ ```
92
139
 
93
140
  ## Configuration
94
141
 
@@ -103,55 +150,45 @@ Optional `.polinguardrc.json` in your repo root:
103
150
  }
104
151
  ```
105
152
 
106
- ### Acknowledging a verified false positive
153
+ **Acknowledging a verified false positive** (e.g. a legitimate inline blob):
107
154
 
108
- If a line is genuinely legitimate (a real minified-in-source blob, say):
109
-
110
- - put `// polinguard-allow-line` on the same line, **or**
111
- - put `// polinguard-allow-next-line` on the line above it, **or**
155
+ - add `// polinguard-allow-line` on the same line, **or**
156
+ - add `// polinguard-allow-next-line` on the line above it, **or**
112
157
  - raise `maxLineLength` / exclude the path in `.polinguardrc.json`.
113
158
 
114
- Never use `git commit --no-verify` to push past a finding you haven't understood.
115
-
116
- ## Audit an existing repo / whole org
159
+ > Never use `git commit --no-verify` to push past a finding you don't understand.
117
160
 
118
- One-off scan of a checked-out repo:
161
+ ## Audit an existing repo or whole org
119
162
 
120
163
  ```bash
164
+ # one repo
121
165
  npx polin-guard --all
122
- # or, without Node:
123
- git grep -nI '.\{1000,\}' # flag any suspiciously long line
124
- ```
125
-
126
- Scan every branch of every repo in a GitHub org:
127
166
 
128
- ```bash
167
+ # every branch of every repo in a GitHub org
129
168
  for r in $(gh repo list YOUR_ORG --limit 200 --json name --jq '.[].name'); do
130
- git clone --quiet "https://github.com/YOUR_ORG/$r.git" "/tmp/scan/$r" || continue
169
+ git clone -q "https://github.com/YOUR_ORG/$r.git" "/tmp/scan/$r" || continue
131
170
  ( cd "/tmp/scan/$r"
132
171
  for b in $(git branch -r | grep -v HEAD | sed 's# *origin/##'); do
133
- git checkout -q "$b" 2>/dev/null || continue
134
- npx --yes polin-guard --all || echo "FOUND in $r @ $b"
172
+ git checkout -q "$b" 2>/dev/null && { npx --yes polin-guard --all || echo "FOUND in $r @ $b"; }
135
173
  done )
136
174
  done
137
175
  ```
138
176
 
139
177
  ## How it works
140
178
 
141
- Pure Node, no dependencies (so the security tool itself adds no supply-chain risk).
142
- For each candidate file it reads the staged blob (`git show :file`) or the working
143
- copy, then analyzes every line for the signatures and concealment patterns above.
144
- It exits non-zero on any `critical` finding so a hook or CI step fails the build.
145
-
146
- ## Development
179
+ Pure Node, **zero dependencies** (so the security tool adds no supply-chain risk
180
+ of its own). For each candidate file it reads the staged blob (`git show :file`)
181
+ or the working copy, scores every line and then the file as a whole against the
182
+ signals above, and exits non-zero on any `critical` finding so your hook or CI step fails.
147
183
 
148
- ```bash
149
- npm test # runs the zero-dependency test suite (clean + inert-malicious fixtures)
150
- ```
184
+ ## Links
151
185
 
152
- The `test/fixtures/malicious.config.js` fixture contains the detection *signatures*
153
- but performs no real decode/exec — it exists only to prove the detector fires.
186
+ - 📦 **npm:** https://www.npmjs.com/package/polin-guard
187
+ - 🐙 **GitHub:** https://github.com/Valentin-Shyaka/polin-guard
188
+ - 🐛 **Issues:** https://github.com/Valentin-Shyaka/polin-guard/issues
189
+ - 🔒 **Security policy:** [SECURITY.md](SECURITY.md)
190
+ - 🤝 **Contributing:** [CONTRIBUTING.md](CONTRIBUTING.md)
154
191
 
155
192
  ## License
156
193
 
157
- MIT
194
+ [MIT](LICENSE) © Valentin Shyaka
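The README's scoring model reduces to summing the weights of the signals that fire and comparing the total against two thresholds. The sketch below is illustrative only: it assumes the weights and the default 70/35 thresholds from the README's signal table, and `scoreSignals` and `severity` are hypothetical helpers, not polin-guard's API.

```javascript
'use strict';

// Illustrative subset of the weights from the README's signal table.
const WEIGHTS = {
  concealment: 80,
  signature: 60,
  'escape-density': 50,
  'exec-sink': 40,
  'autoloaded-context': 20,
};

// Default thresholds from the README: block at >= 70, warn at >= 35.
const CRITICAL = 70;
const WARNING = 35;

// Hypothetical helper: sum the weights of the signals that fired on one line.
function scoreSignals(ids) {
  return ids.reduce((total, id) => total + (WEIGHTS[id] || 0), 0);
}

// Hypothetical helper: map a score to a severity (null = not reported).
function severity(score) {
  if (score >= CRITICAL) return 'critical';
  if (score >= WARNING) return 'warning';
  return null;
}

// Concealment alone already crosses the blocking threshold.
console.log(severity(scoreSignals(['concealment']))); // critical (80)
// eval() inside an auto-loaded config file is reported but does not block.
console.log(severity(scoreSignals(['exec-sink', 'autoloaded-context']))); // warning (60)
// Add the off-screen padding and the same line blocks.
console.log(severity(scoreSignals(['exec-sink', 'autoloaded-context', 'concealment']))); // critical (140)
```

Because the weights are additive, a single weak signal such as `network` (15 points) cannot reach the warning threshold on its own; it only matters in combination with others.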
package/bin/cli.js CHANGED
@@ -93,7 +93,8 @@ function main() {
93
93
  const tag = it.severity === 'critical'
94
94
  ? paint('CRITICAL', '1;31', c)
95
95
  : paint('warning ', '33', c);
96
- process.stderr.write(` ${tag} ${file}:${it.line} [${it.ruleId}]\n ${it.message}\n`);
96
+ const loc = it.line === 0 ? `${file} (file-level)` : `${file}:${it.line}`;
97
+ process.stderr.write(` ${tag} ${loc} [${it.ruleId}]\n ${it.message}\n`);
97
98
  }
98
99
  process.stderr.write('\n');
99
100
  }
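The `cli.js` change implies that file-level findings are reported with `line` set to `0`, which would otherwise print as a misleading `file:0`. A minimal sketch of that location rule, pulled out into a hypothetical `formatLocation` helper for illustration (the shipped code formats the location inline):

```javascript
'use strict';

// Hypothetical helper mirroring the cli.js change: line 0 marks a
// file-level finding, which is labelled instead of printed as "file:0".
function formatLocation(file, line) {
  return line === 0 ? `${file} (file-level)` : `${file}:${line}`;
}

console.log(formatLocation('tailwind.config.js', 12)); // tailwind.config.js:12
console.log(formatLocation('tailwind.config.js', 0)); // tailwind.config.js (file-level)
```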
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "polin-guard",
3
- "version": "0.1.0",
3
+ "version": "0.2.0",
4
4
  "description": "Block obfuscated build/commit-time code-injection payloads (hidden long-line JS stagers) before they enter your repo. Zero dependencies. Works as a pre-commit hook, in CI, or standalone.",
5
5
  "bin": {
6
6
  "polin-guard": "bin/cli.js"
package/src/patterns.js CHANGED
@@ -1,25 +1,40 @@
1
1
  'use strict';
2
2
 
3
3
  /**
4
- * Detection rules for polin-guard.
4
+ * Detection model for polin-guard (v0.2).
5
5
  *
6
- * These target the family of obfuscated build/commit-time JavaScript "stagers"
7
- * that hide a payload on a single, heavily space-padded line inside an otherwise
8
- * legitimate config or entry file (e.g. tailwind.config.js, ecosystem.config.js,
9
- * .eslintrc.js, postcss.config.js, src/index.ts). The payload typically:
10
- * - decodes strings at runtime via a character-shuffle cipher,
11
- * - re-exposes Node's `require`/`module` as globals, and
12
- * - runs a second stage through a Function() constructor.
6
+ * v0.1 was signature-only and therefore evadable. v0.2 detects the *necessary
7
+ * conditions* of the attack class and combines independent signals into a
8
+ * weighted RISK SCORE. To stay hidden yet execute at build time, the payload is
9
+ * forced to do several things at once — and each is a detector here:
13
10
  *
14
- * The goal is HIGH precision: a "critical" finding should almost never be a
15
- * false positive, so it is safe to BLOCK a commit on it.
11
+ * - HIDE visually -> concealment (code after a long whitespace gap),
12
+ * long unbroken tokens, dense escapes, high entropy
13
+ * - EXECUTE implicitly -> dynamic exec sinks (Function/eval/indirect require/
14
+ * constructor.constructor/vm/string-timer) in
15
+ * auto-loaded config/entry files
16
+ * - OBFUSCATE -> entropy / escape / long-token signals (token-agnostic)
17
+ * - REACH SECRETS -> env/fs/child_process + network capability
18
+ *
19
+ * Defeating one detector by renaming/splitting/runtime-fetching still trips the
20
+ * others, so evasion becomes self-defeating (visible, inert, readable, or
21
+ * capability-less). The known-family signatures remain as fast, high-weight hits.
16
22
  */
17
23
 
18
- // Default thresholds (override via .polinguardrc.json).
19
24
  const DEFAULTS = {
20
- maxLineLength: 1000, // a single source line longer than this is suspicious
21
- maxEscapes: 25, // count of \xNN / \uNNNN escapes on one line => obfuscated blob
22
- // Files / directories that legitimately contain long or generated lines.
25
+ // Scoring thresholds.
26
+ criticalScore: 70, // >= this blocks the commit
27
+ warningScore: 35, // >= this is reported (non-blocking unless --strict)
28
+
29
+ // Detector thresholds.
30
+ maxLineLength: 1000, // oversized source line
31
+ maxEscapes: 25, // \xNN / \uNNNN escapes on one line => obfuscated blob
32
+ maxTokenLength: 120, // unbroken non-whitespace run => encoded blob
33
+ minGapWhitespace: 80, // code hidden after this many mid-line spaces/tabs
34
+ entropyMinLen: 200, // only entropy-score lines at least this long
35
+ entropyThreshold: 4.3, // bits/char; obfuscated/encoded content runs high
36
+ fileEscapeTotal: 100, // total escapes across a file (catches split payloads)
37
+
23
38
  excludeDirs: [
24
39
  'node_modules', '.git', 'dist', 'build', 'out', 'coverage',
25
40
  '.next', '.nuxt', '.output', '.turbo', '.cache', 'vendor', '__snapshots__',
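The new detector thresholds in `DEFAULTS` (`minGapWhitespace: 80`, `maxTokenLength: 120`, `entropyThreshold: 4.3` bits/char) feed token-agnostic line checks in `scan.js`. A minimal sketch of those checks, mirroring the Shannon-entropy and longest-token logic from `scan.js`; the `concealed` helper name is hypothetical and stands in for the inline concealment test in `analyzeLine`:

```javascript
'use strict';

// Shannon entropy in bits per character, as in scan.js's entropy().
function entropy(s) {
  if (!s.length) return 0;
  const freq = Object.create(null);
  for (const ch of s) freq[ch] = (freq[ch] || 0) + 1;
  let h = 0;
  for (const k in freq) {
    const p = freq[k] / s.length;
    h -= p * Math.log2(p);
  }
  return h;
}

// Length of the longest whitespace-free run on a line.
function longestToken(line) {
  let max = 0;
  for (const t of line.split(/\s+/)) if (t.length > max) max = t.length;
  return max;
}

// Hypothetical helper: non-space text, then >= `gap` spaces/tabs, then more text.
// Leading indentation is stripped first, so ordinary indenting never counts.
function concealed(line, gap = 80) {
  const body = line.replace(/^[ \t]+/, '');
  return new RegExp(`\\S[ \\t]{${gap},}\\S`).test(body);
}

console.log(entropy('aaaa')); // 0
console.log(entropy('abcd')); // 2
console.log(longestToken('a bb  cccc')); // 4

const hidden = '};' + ' '.repeat(300) + "global['!']='x';";
console.log(concealed(hidden)); // true
console.log(concealed(' '.repeat(300) + 'module.exports = {};')); // false
```

With the default weights, entropy alone contributes 25 points, below the 35-point warning threshold, so a high-entropy data line needs a second signal before it is reported.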
@@ -30,63 +45,53 @@ const DEFAULTS = {
30
45
  /(^|\/)(package-lock\.json|pnpm-lock\.yaml|yarn\.lock|bun\.lockb?)$/i,
31
46
  /\.snap$/i,
32
47
  ],
33
- // Only these extensions are scanned. Covers JS/TS, Vue, configs, and the
34
- // Windows batch / shell droppers seen alongside the JS stager.
35
48
  includeExtensions: [
36
49
  '.js', '.cjs', '.mjs', '.jsx', '.ts', '.tsx', '.vue',
37
50
  '.json', '.bat', '.cmd', '.ps1', '.sh',
38
51
  ],
39
52
  };
40
53
 
41
- // Near-unique signatures of the known stager. Each match is CRITICAL on its own.
42
- const SIGNATURES = [
43
- {
44
- id: 'global-bang-key',
45
- re: /global\s*\[\s*['"`]!['"`]\s*\]/,
46
- message: "Assigns to global['!'] a known obfuscated-stager marker.",
47
- },
48
- {
49
- id: 'global-underscore-handle',
50
- re: /global\s*\[\s*_\$_/,
51
- message: 'Assigns to global[_$_…] — obfuscated stager variable handle.',
52
- },
53
- {
54
- id: 'require-reexposed',
55
- re: /\]\s*=\s*require\s*;[\s\S]{0,60}typeof\s+module/,
56
- message: 'Re-exposes require()/module as globals capability-escalation pattern.',
57
- },
58
- {
59
- id: 'char-shuffle-cipher',
60
- // String.fromCharCode(127) used as a sentinel/delimiter in the shuffle cipher.
61
- // Legitimate uses are virtually always inside excluded node_modules (e.g. websocket).
62
- re: /String\.fromCharCode\(\s*127\s*\)/,
63
- message: 'Uses fromCharCode(127) cipher delimiter — stager string-decoder pattern.',
64
- },
65
- ];
66
-
67
- // Immediately-invoked Function() / this[...] constructor: a second-stage exec sink.
68
- const IIFE_CONSTRUCTOR =
69
- /(?:\bFunction\b|this\s*\[[^\]]+\]|global\s*\[[^\]]+\])\s*\([^)]*\)\s*\(/;
70
-
71
- // Tokens that turn an over-long line from "suspicious" into "critical".
72
- const EXEC_TOKENS =
73
- /\b(require|eval|atob|unescape|child_process|execSync|spawnSync|Function)\b|process\s*\.\s*env|global\s*\[|String\.fromCharCode/;
54
+ // Per-signal weights (points added to a line's / file's risk score).
55
+ const WEIGHTS = {
56
+ signature: 60, // a known-family signature
57
+ concealment: 80, // code after a long mid-line whitespace gap (off-screen trick)
58
+ longToken: 35, // unbroken token >= maxTokenLength
59
+ escapeDense: 50, // >= maxEscapes on one line
60
+ entropy: 25, // long, high-entropy line
61
+ oversized: 25, // line longer than maxLineLength
62
+ oversizedExecBonus: 30, // ...and it also carries exec/require tokens
63
+ execSink: 40, // Function()/eval dynamic execution
64
+ indirectRequire: 25, // require(<non-literal>)
65
+ ctorChain: 35, // constructor.constructor / ['constructor']
66
+ dynTimer: 25, // setTimeout/Interval("string")
67
+ vmModule: 25, // require('vm')
68
+ network: 15, // fetch / http(s)/net/dns/tls
69
+ capability: 10, // process.env / fs / child_process
70
+ netExecCombo: 30, // network + exec/file-write on the same line
71
+ autoloadBonus: 20, // exec/capability/network inside an auto-loaded file
72
+ };
74
73
 
75
- // Standalone weaker indicators (reported as warnings, never block on their own).
76
- const SOFT_INDICATORS = [
77
- { id: 'eval-call', re: /\beval\s*\(/, message: 'Contains eval().' },
78
- { id: 'atob-call', re: /\batob\s*\(/, message: 'Contains atob() (base64 decode).' },
79
- {
80
- id: 'child-process-in-config',
81
- re: /require\(\s*['"`]child_process['"`]\s*\)/,
82
- message: "Loads child_process.",
83
- },
74
+ // Known-family signatures (fast, high-confidence). Each adds WEIGHTS.signature.
75
+ const SIGNATURES = [
76
+ { id: 'global-bang-key', re: /global\s*\[\s*['"`]!['"`]\s*\]/, message: "global['!'] stager marker" },
77
+ { id: 'global-underscore-handle', re: /global\s*\[\s*_\$_/, message: 'global[_$_…] obfuscated handle' },
78
+ { id: 'require-reexposed', re: /\]\s*=\s*require\s*;[\s\S]{0,60}typeof\s+module/, message: 'require/module re-exposed as globals' },
79
+ { id: 'char-shuffle-cipher', re: /String\.fromCharCode\(\s*127\s*\)/, message: 'fromCharCode(127) cipher delimiter' },
84
80
  ];
85
81
 
86
- module.exports = {
87
- DEFAULTS,
88
- SIGNATURES,
89
- IIFE_CONSTRUCTOR,
90
- EXEC_TOKENS,
91
- SOFT_INDICATORS,
82
+ // Behavioral / structural regexes (token-agnostic where possible).
83
+ const RE = {
84
+ execSink: /\b(?:new\s+)?Function\s*\(|\beval\s*\(/,
85
+ indirectRequire: /\brequire\s*\(\s*(?!['"`)])/, // require( not immediately a string
86
+ ctorChain: /\bconstructor\b\s*(?:\.\s*constructor|\[\s*['"`]\s*constructor)|\[\s*['"`]constructor['"`]\s*\]/,
87
+ dynTimer: /\bset(?:Timeout|Interval)\s*\(\s*['"`]/,
88
+ vmModule: /\brequire\s*\(\s*['"`]vm['"`]\s*\)/,
89
+ network: /\bfetch\s*\(|\bXMLHttpRequest\b|require\s*\(\s*['"`](?:https?|net|dns|tls|dgram)['"`]\s*\)|\bhttps?\s*\.\s*(?:get|request)\b/,
90
+ capability: /\bprocess\s*\.\s*env\b|require\s*\(\s*['"`](?:fs|os|child_process)['"`]\s*\)|\bchild_process\b/,
91
+ fsWrite: /\b(?:writeFileSync|writeFile|appendFileSync|appendFile|createWriteStream)\b/,
92
+ childProc: /\bchild_process\b|\b(?:execSync|spawnSync|spawn|fork)\s*\(|\bexec\s*\(/,
93
+ // tokens that upgrade an oversized line to critical
94
+ execTokens: /\b(?:require|eval|atob|unescape|child_process|Function)\b|process\s*\.\s*env|global\s*\[|String\.fromCharCode/,
92
95
  };
96
+
97
+ module.exports = { DEFAULTS, WEIGHTS, SIGNATURES, RE };
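`RE.indirectRequire` uses a negative lookahead to tell `require('fs')` apart from `require(someVar)`. A sketch of the same lookahead technique follows; the pattern includes `\s` in the excluded set because `\s*` backtracks, and without it the engine could succeed on the space in `require( 'fs' )` and report a literal as non-literal. `INDIRECT_REQUIRE` and the sample lines are illustrative.

```javascript
'use strict';

// Flag require( ... ) whose first argument is not a quoted literal.
// `\s` is excluded in the lookahead so that backtracking over `\s*`
// cannot land on whitespace that precedes a quoted module name.
const INDIRECT_REQUIRE = /\brequire\s*\(\s*(?![\s'"`)])/;

const samples = [
  "const fs = require('fs');", // literal: not flagged
  "const fs = require( 'fs' );", // padded literal: not flagged
  'const m = require(name);', // variable: flagged
  "const m = require(['a', 'b'].join('/'));", // computed: flagged
  'require();', // empty call: not flagged
];

for (const s of samples) {
  console.log(INDIRECT_REQUIRE.test(s) ? 'FLAG' : 'ok  ', s);
}
```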
package/src/scan.js CHANGED
@@ -3,27 +3,17 @@
3
3
  const fs = require('fs');
4
4
  const path = require('path');
5
5
  const { execFileSync } = require('child_process');
6
- const {
7
- DEFAULTS,
8
- SIGNATURES,
9
- IIFE_CONSTRUCTOR,
10
- EXEC_TOKENS,
11
- SOFT_INDICATORS,
12
- } = require('./patterns');
6
+ const { DEFAULTS, WEIGHTS, SIGNATURES, RE } = require('./patterns');
13
7
 
14
8
  const ALLOW_LINE_MARKER = 'polinguard-allow-next-line';
15
9
  const ALLOW_INLINE_MARKER = 'polinguard-allow-line';
16
10
 
17
11
  /** Load optional config file from the repo root or cwd. */
18
12
  function loadConfig(cwd) {
19
- const candidates = ['.polinguardrc.json', '.polinguard.json'];
20
- for (const name of candidates) {
13
+ for (const name of ['.polinguardrc.json', '.polinguard.json']) {
21
14
  const p = path.join(cwd, name);
22
15
  try {
23
- if (fs.existsSync(p)) {
24
- const user = JSON.parse(fs.readFileSync(p, 'utf8'));
25
- return { ...DEFAULTS, ...user };
26
- }
16
+ if (fs.existsSync(p)) return { ...DEFAULTS, ...JSON.parse(fs.readFileSync(p, 'utf8')) };
27
17
  } catch (e) {
28
18
  process.stderr.write(`polin-guard: ignoring invalid ${name}: ${e.message}\n`);
29
19
  }
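`loadConfig` merges the user's file over `DEFAULTS` with object spread, which is shallow: any key present in `.polinguardrc.json` replaces the default value wholesale. For array settings such as `excludeDirs`, that means a user list replaces the built-in list rather than extending it. A minimal sketch with abbreviated defaults (the short `DEFAULTS` and the `mergeConfig` helper are assumptions for illustration):

```javascript
'use strict';

// Abbreviated defaults for illustration; the real DEFAULTS has more keys.
const DEFAULTS = {
  maxLineLength: 1000,
  excludeDirs: ['node_modules', '.git', 'dist'],
};

// Same shallow merge as loadConfig: top-level user keys override defaults.
function mergeConfig(user) {
  return { ...DEFAULTS, ...user };
}

const cfg = mergeConfig({ maxLineLength: 2000, excludeDirs: ['fixtures'] });
console.log(cfg.maxLineLength); // 2000
console.log(cfg.excludeDirs); // [ 'fixtures' ]  (built-in dirs are gone)

// To extend rather than replace, the user list must repeat the defaults.
const extended = mergeConfig({ excludeDirs: [...DEFAULTS.excludeDirs, 'fixtures'] });
console.log(extended.excludeDirs); // [ 'node_modules', '.git', 'dist', 'fixtures' ]
```

Under this merge, a config that sets `excludeDirs` should list every directory it still wants skipped, including `node_modules`.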
@@ -34,34 +24,33 @@ function loadConfig(cwd) {
34
24
  function git(args, cwd) {
35
25
  return execFileSync('git', args, { cwd, encoding: 'utf8', maxBuffer: 1024 * 1024 * 64 });
36
26
  }
37
-
38
- /** Files staged for commit (added/copied/modified/renamed). */
39
27
  function getStagedFiles(cwd) {
40
- const out = git(['diff', '--cached', '--name-only', '--diff-filter=ACMR'], cwd);
41
- return out.split('\n').filter(Boolean);
28
+ return git(['diff', '--cached', '--name-only', '--diff-filter=ACMR'], cwd).split('\n').filter(Boolean);
42
29
  }
43
-
44
- /** All tracked files (for --all / --ci). */
45
30
  function getTrackedFiles(cwd) {
46
31
  return git(['ls-files'], cwd).split('\n').filter(Boolean);
47
32
  }
48
-
49
- /** Read the *staged* blob content (what will actually be committed). */
50
33
  function readStaged(file, cwd) {
51
- try {
52
- return git(['show', `:${file}`], cwd);
53
- } catch (e) {
54
- return null; // deleted or unreadable
55
- }
34
+ try { return git(['show', `:${file}`], cwd); } catch { return null; }
56
35
  }
57
36
 
58
37
  function isExcluded(file, cfg) {
59
38
  const parts = file.split(/[\\/]/);
60
39
  if (parts.some((p) => cfg.excludeDirs.includes(p))) return true;
61
40
  if (cfg.excludeFilePatterns.some((re) => re.test(file))) return true;
62
- const ext = path.extname(file).toLowerCase();
63
- if (!cfg.includeExtensions.includes(ext)) return true;
64
- return false;
41
+ return !cfg.includeExtensions.includes(path.extname(file).toLowerCase());
42
+ }
43
+
44
+ /** Files that a build/test tool loads automatically — exec here is high-risk. */
45
+ function isAutoLoaded(file) {
46
+ const base = (file.split(/[\\/]/).pop() || '').toLowerCase();
47
+ return (
48
+ /\.config\.(js|cjs|mjs|ts)$/.test(base) ||
49
+ /^\.?eslintrc(\.(js|cjs|json|yml|yaml))?$/.test(base) ||
50
+ /^(index|main|server|app)\.(js|cjs|mjs|ts)$/.test(base) ||
51
+ /^\.(babelrc|prettierrc|stylelintrc)/.test(base) ||
52
+ base === 'ecosystem.config.js'
53
+ );
65
54
  }
66
55
 
67
56
  function countEscapes(line) {
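`isAutoLoaded` decides which files earn the `autoloaded-context` bonus purely from the base name. The sketch below copies its regexes for illustration (it omits the `ecosystem.config.js` equality check, which the `*.config.js` pattern already covers) and shows which paths qualify:

```javascript
'use strict';

// Copy of the base-name checks in isAutoLoaded, for illustration only.
function isAutoLoaded(file) {
  const base = (file.split(/[\\/]/).pop() || '').toLowerCase();
  return (
    /\.config\.(js|cjs|mjs|ts)$/.test(base) ||
    /^\.?eslintrc(\.(js|cjs|json|yml|yaml))?$/.test(base) ||
    /^(index|main|server|app)\.(js|cjs|mjs|ts)$/.test(base) ||
    /^\.(babelrc|prettierrc|stylelintrc)/.test(base)
  );
}

const paths = [
  'tailwind.config.js', // *.config.js
  'ecosystem.config.js', // also matched by *.config.js
  '.eslintrc.js',
  'src/index.ts', // entry-point name
  'src\\server.js', // Windows separator handled by split(/[\\/]/)
  'src/utils.ts', // ordinary module
  'vite.config.mts', // .mts is not in the extension list
];

for (const p of paths) console.log(isAutoLoaded(p) ? 'auto' : '    ', p);
```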
@@ -69,103 +58,151 @@ function countEscapes(line) {
69
58
  return m ? m.length : 0;
70
59
  }
71
60
 
72
- /** Analyze a single line. Returns array of findings for that line. */
73
- function analyzeLine(line, cfg) {
74
- const out = [];
61
+ /** Shannon entropy (bits/char). Obfuscated/encoded blobs run high. */
62
+ function entropy(s) {
63
+ if (!s.length) return 0;
64
+ const freq = Object.create(null);
65
+ for (const ch of s) freq[ch] = (freq[ch] || 0) + 1;
66
+ let h = 0;
67
+ for (const k in freq) { const p = freq[k] / s.length; h -= p * Math.log2(p); }
68
+ return h;
69
+ }
75
70
 
76
- // 1) Near-unique stager signatures -> always critical.
77
- for (const sig of SIGNATURES) {
78
- if (sig.re.test(line)) {
79
- out.push({ ruleId: sig.id, severity: 'critical', message: sig.message });
80
- }
71
+ function longestToken(line) {
72
+ let max = 0;
73
+ for (const t of line.split(/\s+/)) if (t.length > max) max = t.length;
74
+ return max;
75
+ }
76
+
77
+ /**
78
+ * Score a single line. Returns { score, signals: [{id, weight, message}], flags }.
79
+ * `flags` exposes booleans the file-level pass aggregates.
80
+ */
81
+ function analyzeLine(line, cfg, ctx = {}) {
82
+ const signals = [];
83
+ const add = (id, weight, message) => signals.push({ id, weight, message });
84
+
85
+ for (const sig of SIGNATURES) if (sig.re.test(line)) add(sig.id, WEIGHTS.signature, sig.message);
86
+
87
+ // Concealment: code after a long mid-line whitespace gap (off-screen trick).
88
+ const body = line.replace(/^[ \t]+/, '');
89
+ if (new RegExp(`\\S[ \\t]{${cfg.minGapWhitespace},}\\S`).test(body)) {
90
+ add('concealment', WEIGHTS.concealment, `code hidden after ${cfg.minGapWhitespace}+ spaces (off-screen concealment)`);
81
91
  }
82
92
 
83
- // 2) Dense escape-sequence blob -> obfuscated payload.
93
+ const tok = longestToken(body);
94
+ if (tok >= cfg.maxTokenLength) add('long-token', WEIGHTS.longToken, `unbroken ${tok}-char token (encoded blob)`);
95
+
84
96
  const esc = countEscapes(line);
85
- if (esc >= cfg.maxEscapes) {
86
- out.push({
87
- ruleId: 'escape-density',
88
- severity: 'critical',
89
- message: `High escape-sequence density (${esc} \\x/\\u escapes) — obfuscated blob.`,
90
- });
91
- }
97
+ if (esc >= cfg.maxEscapes) add('escape-density', WEIGHTS.escapeDense, `${esc} \\x/\\u escapes (obfuscated blob)`);
92
98
 
93
- // 3) Immediately-invoked Function() constructor on a long line -> exec sink.
94
- if (IIFE_CONSTRUCTOR.test(line) && line.length > 200) {
95
- out.push({
96
- ruleId: 'iife-constructor',
97
- severity: 'critical',
98
- message: 'Immediately-invoked Function()/dynamic constructor on a long line — second-stage exec sink.',
99
- });
99
+ if (line.length >= cfg.entropyMinLen) {
100
+ const h = entropy(line);
101
+ if (h >= cfg.entropyThreshold) add('entropy', WEIGHTS.entropy, `high entropy ${h.toFixed(2)} over ${line.length} chars`);
100
102
  }
101
103
 
102
- // 4) Oversized line: critical if it also carries exec/require tokens, else a warning.
103
104
  if (line.length > cfg.maxLineLength) {
104
- const exec = EXEC_TOKENS.test(line);
105
- out.push({
106
- ruleId: 'oversized-line',
107
- severity: exec ? 'critical' : 'warning',
108
- message: `Line length ${line.length} exceeds limit (${cfg.maxLineLength})` +
109
- (exec ? ' and contains exec/require tokens — classic hidden-payload concealment.' : ' — review for hidden content.'),
110
- });
105
+ const exec = RE.execTokens.test(line);
106
+ add('oversized-line', WEIGHTS.oversized + (exec ? WEIGHTS.oversizedExecBonus : 0),
107
+ `line length ${line.length}${exec ? ' with exec/require tokens' : ''}`);
111
108
  }
112
109
 
113
- // 5) Soft indicators (warnings only).
114
- for (const ind of SOFT_INDICATORS) {
115
- if (ind.re.test(line)) {
116
- out.push({ ruleId: ind.id, severity: 'warning', message: ind.message });
117
- }
110
+ const execSink = RE.execSink.test(line);
111
+ if (execSink) add('exec-sink', WEIGHTS.execSink, 'dynamic code-exec sink (Function/eval)');
112
+ if (RE.indirectRequire.test(line)) add('indirect-require', WEIGHTS.indirectRequire, 'require() with a non-literal argument');
113
+ if (RE.ctorChain.test(line)) add('ctor-chain', WEIGHTS.ctorChain, 'constructor.constructor access (reaches Function)');
114
+ if (RE.dynTimer.test(line)) add('dyn-timer', WEIGHTS.dynTimer, 'setTimeout/Interval with a string body');
115
+ if (RE.vmModule.test(line)) add('vm-module', WEIGHTS.vmModule, 'loads the vm module');
116
+
117
+ const net = RE.network.test(line);
118
+ if (net) add('network', WEIGHTS.network, 'network access');
119
+ const cap = RE.capability.test(line);
120
+ if (cap) add('capability', WEIGHTS.capability, 'env/fs/child_process capability');
121
+ const fsWrite = RE.fsWrite.test(line);
122
+ const child = RE.childProc.test(line);
123
+
124
+ if (net && (execSink || RE.indirectRequire.test(line) || fsWrite || child)) {
125
+ add('net-exec-combo', WEIGHTS.netExecCombo, 'network + code-exec/file-write on one line (runtime-fetched payload)');
118
126
  }
127
+ if (ctx.autoLoaded && (execSink || cap || net || child)) {
128
+ add('autoloaded-context', WEIGHTS.autoloadBonus, 'in an auto-loaded config/entry file');
129
+ }
130
+
131
+ const score = signals.reduce((a, s) => a + s.weight, 0);
132
+ return { score, signals, flags: { execSink, net, cap, fsWrite, child, esc, longTok: tok >= cfg.maxTokenLength } };
133
+ }
119
134
 
120
- return out;
135
+ function severityFor(score, cfg) {
136
+ if (score >= cfg.criticalScore) return 'critical';
137
+ if (score >= cfg.warningScore) return 'warning';
138
+ return null;
121
139
  }
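`severityFor()` maps a numeric score onto the two existing severity levels, and `scanContent()` then reports the highest-weighted signal as the finding's `ruleId`. A minimal sketch of that mapping follows. The threshold values in `demoCfg` and the `toFinding()` helper are hypothetical; in the package, `criticalScore` and `warningScore` come from `loadConfig()`.

```javascript
// Hypothetical thresholds; the real values come from polin-guard's loaded config (cfg).
const demoCfg = { criticalScore: 70, warningScore: 35 };

function severityFor(score, cfg) {
  if (score >= cfg.criticalScore) return 'critical';
  if (score >= cfg.warningScore) return 'warning';
  return null; // below both thresholds: no finding
}

// Hypothetical helper mirroring how scanContent() builds a finding from a line's signals.
function toFinding(file, lineNo, signals, cfg) {
  const score = signals.reduce((a, s) => a + s.weight, 0);
  const severity = severityFor(score, cfg);
  if (!severity || signals.length === 0) return null;
  // Copy before sorting: Array.prototype.sort reorders in place.
  const top = signals.slice().sort((a, b) => b.weight - a.weight);
  return {
    file,
    line: lineNo,
    score,
    severity,
    ruleId: top[0].id, // the strongest signal names the finding
    message: `risk ${score} [${top.map((s) => s.id).join(', ')}]: ${top[0].message}`,
  };
}
```

The `signals.length === 0` guard only matters if a config sets `warningScore` to 0 or below; without it, `top[0]` would be undefined.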
122
140
 
123
- /** Scan one file's content (string). */
141
+ /** Scan one file's content (string) -> findings[]. */
124
142
  function scanContent(file, content, cfg) {
125
- const findings = [];
143
+ const ctx = { autoLoaded: isAutoLoaded(file) };
126
144
  const lines = content.split(/\r?\n/);
145
+ const findings = [];
146
+
147
+ let fileEsc = 0, anyExec = false, anyNet = false, anyChild = false, anyFsWrite = false, anyLongTok = false;
148
+
127
149
  for (let i = 0; i < lines.length; i++) {
128
150
  const line = lines[i];
129
- // Inline allow markers let a maintainer acknowledge a known-good long line.
130
151
  if (line.includes(ALLOW_INLINE_MARKER)) continue;
131
152
  if (i > 0 && lines[i - 1].includes(ALLOW_LINE_MARKER)) continue;
132
153
 
133
- const lineFindings = analyzeLine(line, cfg);
134
- for (const f of lineFindings) {
135
- findings.push({ ...f, file, line: i + 1 });
154
+ const { score, signals, flags } = analyzeLine(line, cfg, ctx);
155
+ fileEsc += flags.esc;
156
+ anyExec = anyExec || flags.execSink;
157
+ anyNet = anyNet || flags.net;
158
+ anyChild = anyChild || flags.child;
159
+ anyFsWrite = anyFsWrite || flags.fsWrite;
160
+ anyLongTok = anyLongTok || flags.longTok;
161
+
162
+ const sev = severityFor(score, cfg);
163
+ if (sev) {
164
+ const top = signals.slice().sort((a, b) => b.weight - a.weight);
165
+ findings.push({
166
+ file, line: i + 1, score, severity: sev,
167
+ ruleId: top[0].id,
168
+ message: `risk ${score} [${top.map((s) => s.id).join(', ')}] — ${top[0].message}`,
169
+ });
136
170
  }
137
171
  }
172
+
173
+ // File-level pass: catches payloads split across many lines or fetched at runtime.
174
+ const region = [];
175
+ if (fileEsc >= cfg.fileEscapeTotal && (anyExec || anyChild)) {
176
+ region.push({ s: 80, m: `${fileEsc} escape sequences across the file + a dynamic-exec sink (split/obfuscated payload)` });
177
+ }
178
+ if (anyLongTok && (anyExec || anyChild)) {
179
+ region.push({ s: 55, m: 'long encoded token(s) + a dynamic-exec sink' });
180
+ }
181
+ if (anyNet && (anyExec || anyChild || anyFsWrite)) {
182
+ region.push({ s: 55 + (ctx.autoLoaded ? WEIGHTS.autoloadBonus : 0), m: 'network access + code-exec/file-write across the file (runtime-fetched payload)' });
183
+ }
184
+ if (region.length) {
185
+ const best = region.sort((a, b) => b.s - a.s)[0];
186
+ const sev = severityFor(best.s, cfg);
187
+ if (sev) findings.push({ file, line: 0, score: best.s, severity: sev, ruleId: 'file-region', message: `file-level risk ${best.s} — ${best.m}` });
188
+ }
189
+
138
190
  return findings;
139
191
  }
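The new file-level pass exists because a payload can be split so that no single line crosses the warning threshold: escape sequences on one line, an `eval` on another, a `fetch` somewhere else. The sketch below isolates that aggregation under stated assumptions. `fileRegionRisk()` is a hypothetical name, the per-line flag objects mimic the `flags` returned by `analyzeLine()`, and the defaults for `fileEscapeTotal` and `autoloadBonus` are made up. The region scores 80 and 55 are the ones used in the diff.

```javascript
// Aggregates per-line flags (shaped like analyzeLine()'s `flags`) into one file-level risk.
// Default thresholds below are hypothetical placeholders, not polin-guard's config values.
function fileRegionRisk(lineFlags, { fileEscapeTotal = 50, autoLoaded = false, autoloadBonus = 10 } = {}) {
  let esc = 0, exec = false, net = false, child = false, fsWrite = false, longTok = false;
  for (const f of lineFlags) {
    esc += f.esc || 0;
    exec = exec || !!f.execSink;
    net = net || !!f.net;
    child = child || !!f.child;
    fsWrite = fsWrite || !!f.fsWrite;
    longTok = longTok || !!f.longTok;
  }

  const region = [];
  if (esc >= fileEscapeTotal && (exec || child)) region.push({ s: 80, m: `${esc} escape sequences + exec sink` });
  if (longTok && (exec || child)) region.push({ s: 55, m: 'long encoded token + exec sink' });
  if (net && (exec || child || fsWrite)) {
    region.push({ s: 55 + (autoLoaded ? autoloadBonus : 0), m: 'network + exec/write across the file' });
  }
  if (region.length === 0) return null;
  // Report only the highest-scoring region (first one wins on a tie).
  return region.reduce((best, r) => (r.s > best.s ? r : best));
}
```

For example, `fileRegionRisk([{ net: true }, { execSink: true }])` returns the 55-point region even though each of those lines would score low on its own.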
140
192
 
141
- /**
142
- * Run a scan.
143
- * @param {object} opts
144
- * @param {'staged'|'all'|'paths'} opts.mode
145
- * @param {string[]} [opts.paths] explicit paths when mode === 'paths'
146
- * @param {string} [opts.cwd]
147
- * @param {boolean} [opts.strict] treat warnings as blocking too
148
- */
149
193
  function run(opts = {}) {
150
194
  const cwd = opts.cwd || process.cwd();
151
195
  const cfg = loadConfig(cwd);
152
196
  const mode = opts.mode || 'staged';
153
197
 
154
- let files = [];
155
- let readFile;
156
-
198
+ let files, readFile;
157
199
  if (mode === 'paths') {
158
200
  files = opts.paths || [];
159
- readFile = (f) => {
160
- try { return fs.readFileSync(path.resolve(cwd, f), 'utf8'); } catch { return null; }
161
- };
201
+ readFile = (f) => { try { return fs.readFileSync(path.resolve(cwd, f), 'utf8'); } catch { return null; } };
162
202
  } else if (mode === 'all') {
163
203
  files = getTrackedFiles(cwd);
164
- readFile = (f) => {
165
- try { return fs.readFileSync(path.resolve(cwd, f), 'utf8'); } catch { return null; }
166
- };
204
+ readFile = (f) => { try { return fs.readFileSync(path.resolve(cwd, f), 'utf8'); } catch { return null; } };
167
205
  } else {
168
- // staged (default): scan the exact content that will be committed.
169
206
  files = getStagedFiles(cwd);
170
207
  readFile = (f) => readStaged(f, cwd);
171
208
  }
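In the default `staged` mode, `run()` reads files through `getStagedFiles()` and `readStaged()`, whose bodies are not part of this hunk. One common way to implement that pair with plain git is sketched below. It is an assumption about the approach, not the package's code. `gitRunner()`, `stagedFiles()` and `readStagedBlob()` are hypothetical names, and the git invocation is injected so the helpers can be exercised without a repository. Both git commands use paths relative to the repository root, so `cwd` is assumed to be the top-level directory.

```javascript
const { execFileSync } = require('child_process');

// Returns a function that runs git in `cwd` and yields stdout as a string.
function gitRunner(cwd) {
  return (args) =>
    execFileSync('git', args, { cwd, encoding: 'utf8', stdio: ['ignore', 'pipe', 'pipe'], maxBuffer: 64 * 1024 * 1024 });
}

// Files added, copied, modified or renamed in the index; -z uses NUL separators
// so filenames with spaces or newlines survive intact.
function stagedFiles(git) {
  return git(['diff', '--cached', '--name-only', '--diff-filter=ACMR', '-z'])
    .split('\0')
    .filter(Boolean);
}

// ":<path>" names the blob in the index, i.e. exactly what will be committed,
// even if the working-tree copy has been edited since it was staged.
function readStagedBlob(git, file) {
  try {
    return git(['show', `:${file}`]);
  } catch {
    return null; // not in the index (or git failed): skip, as readFile does in the diff
  }
}
```

Reading `:<path>` rather than the working-tree file is what makes staged scanning meaningful: a payload that was staged and then deleted from the working copy is still in the commit, and still gets scanned.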
@@ -183,8 +220,7 @@ function run(opts = {}) {
183
220
  const critical = findings.filter((f) => f.severity === 'critical');
184
221
  const warnings = findings.filter((f) => f.severity === 'warning');
185
222
  const blocking = opts.strict ? findings.length > 0 : critical.length > 0;
186
-
187
223
  return { filesScanned: scanned.length, findings, critical, warnings, blocking };
188
224
  }
189
225
 
190
- module.exports = { run, scanContent, analyzeLine, loadConfig };
226
+ module.exports = { run, scanContent, analyzeLine, loadConfig, entropy, isAutoLoaded };
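The export list now also exposes `entropy` and `isAutoLoaded`, whose definitions fall outside this hunk. Minimal sketches follow, assuming `entropy()` computes Shannon entropy in bits per character (the usual measure for spotting base64 or hex blobs) and `isAutoLoaded()` matches files that tooling executes automatically. The `AUTO_LOADED` patterns are hypothetical. They cover only the examples listed in the 0.1.0 README (`tailwind.config.js`, `ecosystem.config.js`, `.eslintrc.js`, `postcss.config.js`, `src/index.ts`).

```javascript
const path = require('path');

// Shannon entropy in bits per character: H = -sum(p_i * log2(p_i)) over distinct characters.
function entropy(str) {
  if (!str) return 0;
  const counts = new Map();
  for (const ch of str) counts.set(ch, (counts.get(ch) || 0) + 1);
  const n = [...str].length; // count code points, matching the for...of iteration above
  let h = 0;
  for (const c of counts.values()) {
    const p = c / n;
    h -= p * Math.log2(p);
  }
  return h;
}

// Hypothetical patterns for config files that build tools load and execute on their own.
const AUTO_LOADED = [
  /^tailwind\.config\.[cm]?[jt]s$/,
  /^postcss\.config\.[cm]?[jt]s$/,
  /^ecosystem\.config\.[cm]?js$/,
  /^\.eslintrc\.[cm]?js$/,
];

function isAutoLoaded(file) {
  if (AUTO_LOADED.some((re) => re.test(path.basename(file)))) return true;
  // Entry points such as src/index.ts; normalise Windows separators first.
  return /(^|\/)src\/index\.[cm]?[jt]sx?$/.test(file.replace(/\\/g, '/'));
}
```

Under this definition `entropy('abcd')` is 2 bits per character, and a long random base64 string approaches the 6-bit maximum of a 64-symbol alphabet, which is why long, high-entropy tokens make a useful signal.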