pureshellcheck 0.1.0__tar.gz → 0.2.0__tar.gz
- {pureshellcheck-0.1.0/src/pureshellcheck.egg-info → pureshellcheck-0.2.0}/PKG-INFO +39 -13
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/README.md +38 -12
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/pyproject.toml +1 -1
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/__init__.py +1 -1
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/analyzer.py +32 -10
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/astlib.py +3 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/commands.py +1 -1
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/misc.py +1 -1
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/variables.py +14 -14
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/parser.py +18 -2
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/shast.py +46 -7
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/varflow.py +65 -27
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/varscan.py +40 -17
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0/src/pureshellcheck.egg-info}/PKG-INFO +39 -13
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/LICENSE +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/MANIFEST.in +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/setup.cfg +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/__init__.py +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/quoting.py +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/cli.py +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/SOURCES.txt +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/dependency_links.txt +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/entry_points.txt +0 -0
- {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/top_level.txt +0 -0
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: pureshellcheck
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.2.0
|
|
4
4
|
Summary: A pure Python reimplementation of ShellCheck's most common checks
|
|
5
5
|
Author: adam2go
|
|
6
6
|
License: MIT
|
|
@@ -54,10 +54,10 @@ rm -rf $BUILD_DIR/*
|
|
|
54
54
|
`shellcheck-py` just download the 30 MB Haskell binary — useless in
|
|
55
55
|
WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
|
|
56
56
|
~7000 lines of stdlib-only Python.
|
|
57
|
-
- **In-process speed.** Calling `pureshellcheck.check()` takes ~
|
|
58
|
-
typical script vs ~
|
|
59
|
-
|
|
60
|
-
[Benchmarks](#benchmarks)).
|
|
57
|
+
- **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
|
|
58
|
+
a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
|
|
59
|
+
~33× faster than the binary even on 1200-line scripts; one-line snippets
|
|
60
|
+
check in ~40 µs (see [Benchmarks](#benchmarks)).
|
|
61
61
|
- **Verified against the real thing.** Test cases are extracted from
|
|
62
62
|
ShellCheck's own test suite and the output is differentially tested
|
|
63
63
|
against the shellcheck binary on real-world scripts.
|
|
@@ -136,19 +136,45 @@ the bash AST if you want to build your own analyses.
|
|
|
136
136
|
|
|
137
137
|
## Benchmarks
|
|
138
138
|
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
139
|
+
All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
|
|
140
|
+
experiments, each repeated in 3 independent sessions; medians shown
|
|
141
|
+
(session-to-session spread was < 4% everywhere). In both experiments the
|
|
142
|
+
findings are verified identical **before** any timing.
|
|
143
|
+
|
|
144
|
+
**vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
|
|
145
|
+
per session; both tools timed in the same session):
|
|
142
146
|
|
|
143
147
|
| workload | shellcheck | pureshellcheck | speedup |
|
|
144
148
|
|---|---|---|---|
|
|
145
|
-
|
|
|
146
|
-
| embedded `check()`,
|
|
147
|
-
|
|
|
148
|
-
|
|
|
149
|
+
| embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
|
|
150
|
+
| embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
|
|
151
|
+
| embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
|
|
152
|
+
| CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
|
|
153
|
+
| CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
|
|
149
154
|
|
|
150
155
|
The embedded rows are what an agent or editor integration pays per call:
|
|
151
|
-
no process spawn, no binary.
|
|
156
|
+
no process spawn, no binary. A one-line snippet checks in **~40 µs**
|
|
157
|
+
(~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
|
|
158
|
+
time is dominated by CPython interpreter startup (~20 ms).
|
|
159
|
+
|
|
160
|
+
**v0.2.0 vs v0.1.0** (controlled before/after,
|
|
161
|
+
`python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
|
|
162
|
+
the same interpreter, 25–200 in-process repeats, outputs verified
|
|
163
|
+
identical on every workload):
|
|
164
|
+
|
|
165
|
+
| workload | v0.1.0 | v0.2.0 | improvement |
|
|
166
|
+
|---|---|---|---|
|
|
167
|
+
| tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
|
|
168
|
+
| small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
|
|
169
|
+
| medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
|
|
170
|
+
| large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
|
|
171
|
+
|
|
172
|
+
The v0.2.0 speedups came from caching the AST child/parent structure and a
|
|
173
|
+
document-order node table (one traversal instead of dozens), making
|
|
174
|
+
variable states immutable tuples so branch snapshots are plain dict
|
|
175
|
+
copies, a banded Levenshtein for SC2153 (fuzz-tested against the
|
|
176
|
+
reference implementation on 20,000 random pairs), and memoizing repeated
|
|
177
|
+
word/command resolution.
|
|
152
178
|
|
|
153
179
|
## Compatibility notes
|
|
154
180
|
|
|
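The speedup columns in the benchmark tables above can be re-derived from the reported medians. The following is a minimal sketch of that arithmetic; the dictionary layout and names are illustrative and not part of the package. Because the tables show rounded timings, a recomputed ratio can differ slightly from the published one (the 75-line embedded row works out to about 39× from the rounded values, against the published 38×).

```python
# Reported medians from the README tables, in milliseconds.
# Each entry is (slower_time, faster_time).
vs_binary = {
    "embedded, brew.sh (1216 lines)": (720, 21.7),
    "embedded, 263-line script": (113, 5.1),
    "embedded, 75-line script": (51, 1.3),
    "CLI, brew.sh": (720, 51),
    "CLI, 75-line script": (51, 28),
}
v1_vs_v2 = {
    "tiny (1 line)": (0.061, 0.037),
    "small (75 lines)": (2.62, 1.30),
    "medium (263 lines)": (9.46, 4.81),
    "large (1216 lines)": (48.6, 21.3),
}


def ratios(table):
    """Map each workload to slower_time / faster_time."""
    return {name: slow / fast for name, (slow, fast) in table.items()}


binary_speedups = ratios(vs_binary)
release_speedups = ratios(v1_vs_v2)

# Throughput on the large script: 1216 lines checked in 21.3 ms.
lines_per_second = 1216 / (21.3 / 1000)
# Checks per second for a one-line snippet at roughly 40 microseconds each.
checks_per_second = 1 / 40e-6

for name, ratio in {**binary_speedups, **release_speedups}.items():
    print(f"{name}: {ratio:.1f}x")
print(f"{lines_per_second:,.0f} lines/s, {checks_per_second:,.0f} checks/s")
```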
@@ -27,10 +27,10 @@ rm -rf $BUILD_DIR/*
|
|
|
27
27
|
`shellcheck-py` just download the 30 MB Haskell binary — useless in
|
|
28
28
|
WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
|
|
29
29
|
~7000 lines of stdlib-only Python.
|
|
30
|
-
- **In-process speed.** Calling `pureshellcheck.check()` takes ~
|
|
31
|
-
typical script vs ~
|
|
32
|
-
|
|
33
|
-
[Benchmarks](#benchmarks)).
|
|
30
|
+
- **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
|
|
31
|
+
a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
|
|
32
|
+
~33× faster than the binary even on 1200-line scripts; one-line snippets
|
|
33
|
+
check in ~40 µs (see [Benchmarks](#benchmarks)).
|
|
34
34
|
- **Verified against the real thing.** Test cases are extracted from
|
|
35
35
|
ShellCheck's own test suite and the output is differentially tested
|
|
36
36
|
against the shellcheck binary on real-world scripts.
|
|
@@ -109,19 +109,45 @@ the bash AST if you want to build your own analyses.
|
|
|
109
109
|
|
|
110
110
|
## Benchmarks
|
|
111
111
|
|
|
112
|
-
|
|
113
|
-
|
|
114
|
-
|
|
112
|
+
All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
|
|
113
|
+
experiments, each repeated in 3 independent sessions; medians shown
|
|
114
|
+
(session-to-session spread was < 4% everywhere). In both experiments the
|
|
115
|
+
findings are verified identical **before** any timing.
|
|
116
|
+
|
|
117
|
+
**vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
|
|
118
|
+
per session; both tools timed in the same session):
|
|
115
119
|
|
|
116
120
|
| workload | shellcheck | pureshellcheck | speedup |
|
|
117
121
|
|---|---|---|---|
|
|
118
|
-
|
|
|
119
|
-
| embedded `check()`,
|
|
120
|
-
|
|
|
121
|
-
|
|
|
122
|
+
| embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
|
|
123
|
+
| embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
|
|
124
|
+
| embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
|
|
125
|
+
| CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
|
|
126
|
+
| CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
|
|
122
127
|
|
|
123
128
|
The embedded rows are what an agent or editor integration pays per call:
|
|
124
|
-
no process spawn, no binary.
|
|
129
|
+
no process spawn, no binary. A one-line snippet checks in **~40 µs**
|
|
130
|
+
(~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
|
|
131
|
+
time is dominated by CPython interpreter startup (~20 ms).
|
|
132
|
+
|
|
133
|
+
**v0.2.0 vs v0.1.0** (controlled before/after,
|
|
134
|
+
`python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
|
|
135
|
+
the same interpreter, 25–200 in-process repeats, outputs verified
|
|
136
|
+
identical on every workload):
|
|
137
|
+
|
|
138
|
+
| workload | v0.1.0 | v0.2.0 | improvement |
|
|
139
|
+
|---|---|---|---|
|
|
140
|
+
| tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
|
|
141
|
+
| small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
|
|
142
|
+
| medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
|
|
143
|
+
| large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
|
|
144
|
+
|
|
145
|
+
The v0.2.0 speedups came from caching the AST child/parent structure and a
|
|
146
|
+
document-order node table (one traversal instead of dozens), making
|
|
147
|
+
variable states immutable tuples so branch snapshots are plain dict
|
|
148
|
+
copies, a banded Levenshtein for SC2153 (fuzz-tested against the
|
|
149
|
+
reference implementation on 20,000 random pairs), and memoizing repeated
|
|
150
|
+
word/command resolution.
|
|
125
151
|
|
|
126
152
|
## Compatibility notes
|
|
127
153
|
|
|
@@ -81,6 +81,7 @@ class Context:
|
|
|
81
81
|
self.positions = Positions(source)
|
|
82
82
|
self.findings = []
|
|
83
83
|
self.cache = {}
|
|
84
|
+
self.nodes = None # doc-order node list, set by run_checks
|
|
84
85
|
|
|
85
86
|
# -- emission ------------------------------------------------------
|
|
86
87
|
|
|
@@ -125,6 +126,14 @@ class Context:
|
|
|
125
126
|
|
|
126
127
|
def command_resolution(self, cmd):
|
|
127
128
|
"""(name_word, index, wrapper_names) after skipping wrappers."""
|
|
129
|
+
cached = cmd.fields.get("_cmdres")
|
|
130
|
+
if cached is not None:
|
|
131
|
+
return cached
|
|
132
|
+
result = self._command_resolution(cmd)
|
|
133
|
+
cmd.fields["_cmdres"] = result
|
|
134
|
+
return result
|
|
135
|
+
|
|
136
|
+
def _command_resolution(self, cmd):
|
|
128
137
|
if cmd.kind != "T_SimpleCommand" or not cmd.words:
|
|
129
138
|
return None, -1, []
|
|
130
139
|
words = cmd.words
|
|
@@ -279,13 +288,24 @@ class Context:
|
|
|
279
288
|
return "other", None
|
|
280
289
|
|
|
281
290
|
|
|
282
|
-
|
|
291
|
+
STATEMENT_CONTAINER_KINDS = frozenset({
|
|
292
|
+
"T_Script", "T_BraceGroup", "T_Subshell", "T_WhileExpression",
|
|
293
|
+
"T_UntilExpression", "T_ForIn", "T_ForArithmetic", "T_SelectIn",
|
|
294
|
+
"T_IfExpression", "T_CaseItem", "T_DollarExpansion", "T_Backticked",
|
|
295
|
+
"T_ProcSub", "T_DollarBraceCommandExpansion", "T_CoProc", "T_BatsTest",
|
|
296
|
+
})
|
|
297
|
+
|
|
298
|
+
|
|
299
|
+
def statement_lists(nodes):
|
|
283
300
|
"""Yield every list of statement nodes in the tree."""
|
|
284
|
-
|
|
301
|
+
container = STATEMENT_CONTAINER_KINDS
|
|
302
|
+
for node in nodes:
|
|
303
|
+
if node.kind not in container:
|
|
304
|
+
continue
|
|
285
305
|
f = node.fields
|
|
286
|
-
for key in ("commands", "body", "condition", "else_body"
|
|
306
|
+
for key in ("commands", "body", "condition", "else_body"):
|
|
287
307
|
v = f.get(key)
|
|
288
|
-
if
|
|
308
|
+
if type(v) is list and v:
|
|
289
309
|
yield v
|
|
290
310
|
branches = f.get("branches")
|
|
291
311
|
if branches:
|
|
@@ -294,14 +314,14 @@ def statement_lists(root):
|
|
|
294
314
|
yield body
|
|
295
315
|
|
|
296
316
|
|
|
297
|
-
def apply_directives(findings, directives,
|
|
317
|
+
def apply_directives(findings, directives, nodes, source, positions):
|
|
298
318
|
"""Filter findings according to `# shellcheck disable=` directives."""
|
|
299
319
|
if not directives:
|
|
300
320
|
return findings
|
|
301
321
|
# statements eligible as directive targets, sorted by position
|
|
302
322
|
statements = []
|
|
303
323
|
seen = set()
|
|
304
|
-
for lst in statement_lists(
|
|
324
|
+
for lst in statement_lists(nodes):
|
|
305
325
|
for node in lst:
|
|
306
326
|
if id(node) not in seen:
|
|
307
327
|
seen.add(id(node))
|
|
@@ -372,7 +392,7 @@ def run_checks(source, shell=None, include_optional=False,
|
|
|
372
392
|
min(e.pos + 1, len(source)))
|
|
373
393
|
f.locate(Positions(source))
|
|
374
394
|
return [f], e
|
|
375
|
-
set_parents(root)
|
|
395
|
+
nodes = set_parents(root)
|
|
376
396
|
|
|
377
397
|
detected = shell_from_shebang(root.get("shebang"))
|
|
378
398
|
directives = parser.directives
|
|
@@ -386,16 +406,18 @@ def run_checks(source, shell=None, include_optional=False,
|
|
|
386
406
|
ctx.detected_shell = detected
|
|
387
407
|
ctx.explicit_shell = shell
|
|
388
408
|
ctx.directives = directives
|
|
409
|
+
ctx.nodes = nodes
|
|
389
410
|
|
|
390
|
-
|
|
391
|
-
|
|
411
|
+
get_checks = NODE_CHECKS.get
|
|
412
|
+
for node in nodes:
|
|
413
|
+
fns = get_checks(node.kind)
|
|
392
414
|
if fns:
|
|
393
415
|
for fn in fns:
|
|
394
416
|
fn(ctx, node)
|
|
395
417
|
for fn in TREE_CHECKS:
|
|
396
418
|
fn(ctx, root)
|
|
397
419
|
|
|
398
|
-
findings = apply_directives(ctx.findings, directives,
|
|
420
|
+
findings = apply_directives(ctx.findings, directives, nodes, source,
|
|
399
421
|
ctx.positions)
|
|
400
422
|
findings.sort(key=lambda f: (f.pos, f.code))
|
|
401
423
|
for f in findings:
|
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
"""Shared AST analysis helpers, ported from ShellCheck's ASTLib semantics."""
|
|
2
2
|
|
|
3
3
|
import re
|
|
4
|
+
from functools import lru_cache
|
|
4
5
|
|
|
5
6
|
from .shast import ancestors, walk
|
|
6
7
|
from .parser import literal_text, quoted_literal_text
|
|
@@ -108,6 +109,7 @@ def is_glob_free_literal(text):
|
|
|
108
109
|
# ----------------------------------------------------------------------
|
|
109
110
|
# ${...} decomposition
|
|
110
111
|
|
|
112
|
+
@lru_cache(maxsize=4096)
|
|
111
113
|
def braced_reference(content):
|
|
112
114
|
"""The variable name referenced by ${content}."""
|
|
113
115
|
s = content
|
|
@@ -119,6 +121,7 @@ def braced_reference(content):
|
|
|
119
121
|
return m.group(0) if m else s
|
|
120
122
|
|
|
121
123
|
|
|
124
|
+
@lru_cache(maxsize=4096)
|
|
122
125
|
def braced_modifier(content):
|
|
123
126
|
"""Everything after the name/indices in ${content}."""
|
|
124
127
|
s = content
|
|
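The two hunks above add `@lru_cache(maxsize=4096)` to `braced_reference` and `braced_modifier`, which take a single string and return a value derived only from it. A small sketch of why that works well for this kind of helper follows. The function `variable_name` and its regex are hypothetical stand-ins (the real function bodies are only partly visible in the diff), so this shows the caching pattern, not the package's parsing logic.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for a pure string helper such as braced_reference.
# The regex is an assumption made for illustration only.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@lru_cache(maxsize=4096)
def variable_name(content):
    """Return the leading identifier of a ${...} body, or the body itself."""
    m = _NAME_RE.match(content)
    return m.group(0) if m else content


# Scripts tend to repeat the same expansions many times, so most calls
# after the first are answered from the cache.
for body in ["HOME", "HOME:-/root", "HOME", "HOME:-/root", "1"]:
    variable_name(body)

info = variable_name.cache_info()
print(info.hits, info.misses)
```

The decorator is only safe here because the argument is a hashable string and the result depends on nothing else; a helper that read parser state or took an AST node would need a different cache key.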
@@ -277,7 +277,7 @@ def has_set_e(ctx):
|
|
|
277
277
|
if SET_E_SHEBANG_RE.search(shebang):
|
|
278
278
|
result = True
|
|
279
279
|
else:
|
|
280
|
-
for node in
|
|
280
|
+
for node in ctx.nodes:
|
|
281
281
|
if node.kind != "T_SimpleCommand" or not node.words:
|
|
282
282
|
continue
|
|
283
283
|
if first_word_basename(node) != "set":
|
|
@@ -94,7 +94,7 @@ def _has_execfail(ctx):
|
|
|
94
94
|
if ctx.shell in ("sh", "dash", "ash"):
|
|
95
95
|
ctx.cache["execfail"] = False
|
|
96
96
|
return False
|
|
97
|
-
for n in
|
|
97
|
+
for n in ctx.nodes:
|
|
98
98
|
if n.kind == "T_SimpleCommand" and \
|
|
99
99
|
first_word_basename(n) == "shopt":
|
|
100
100
|
if any(word_approx(w) == "execfail" for w in n.words[1:]):
|
|
@@ -54,7 +54,7 @@ COMMON_COMMANDS_HINT = frozenset({
|
|
|
54
54
|
def get_varscan(ctx):
|
|
55
55
|
scan = ctx.cache.get("varscan")
|
|
56
56
|
if scan is None:
|
|
57
|
-
scan = VarScan(ctx.root, ctx.shell)
|
|
57
|
+
scan = VarScan(ctx.root, ctx.shell, nodes=ctx.nodes)
|
|
58
58
|
ctx.cache["varscan"] = scan
|
|
59
59
|
return scan
|
|
60
60
|
|
|
@@ -196,21 +196,21 @@ def check_arithmetic_deref(ctx, node):
|
|
|
196
196
|
" variables.")
|
|
197
197
|
|
|
198
198
|
|
|
199
|
-
@
|
|
200
|
-
def check_assignment_index_deref(ctx,
|
|
199
|
+
@node_check("T_Assignment")
|
|
200
|
+
def check_assignment_index_deref(ctx, node):
|
|
201
201
|
# a[$i]=foo for indexed arrays
|
|
202
|
+
indices = node.get("indices")
|
|
203
|
+
if not indices:
|
|
204
|
+
return
|
|
202
205
|
scan = get_varscan(ctx)
|
|
203
|
-
|
|
204
|
-
|
|
205
|
-
|
|
206
|
-
if
|
|
207
|
-
|
|
208
|
-
|
|
209
|
-
|
|
210
|
-
|
|
211
|
-
idx.strip()):
|
|
212
|
-
ctx.style(node, 2004, "$/${} is unnecessary on arithmetic"
|
|
213
|
-
" variables.")
|
|
206
|
+
if node.name in scan.assoc_arrays:
|
|
207
|
+
return
|
|
208
|
+
for idx in indices:
|
|
209
|
+
if isinstance(idx, str) \
|
|
210
|
+
and re.fullmatch(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?",
|
|
211
|
+
idx.strip()):
|
|
212
|
+
ctx.style(node, 2004, "$/${} is unnecessary on arithmetic"
|
|
213
|
+
" variables.")
|
|
214
214
|
|
|
215
215
|
|
|
216
216
|
def _in_let(node):
|
|
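The index test in `check_assignment_index_deref` relies on `re.fullmatch` with the pattern shown in the hunk. A short sketch of what that pattern accepts and rejects is below; the wrapper name `is_simple_deref` is hypothetical, only the regex is taken from the diff.

```python
import re

# Pattern copied from the hunk: a bare $name or ${name} and nothing else.
SIMPLE_DEREF = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?")


def is_simple_deref(index_text):
    """True when an array index is just a single variable dereference."""
    if not isinstance(index_text, str):
        return False
    return SIMPLE_DEREF.fullmatch(index_text.strip()) is not None


for sample in ["$i", "${i}", " $i ", "i", "$i+1", "$1", "${i"]:
    print(repr(sample), is_simple_deref(sample))
```

Because the two braces are optional independently of each other, the pattern also accepts an unbalanced form such as `${i`. Whether that input can ever reach the check depends on the parser, which this sketch does not model.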
@@ -1023,7 +1023,15 @@ class Parser:
|
|
|
1023
1023
|
start = self.i
|
|
1024
1024
|
parts = []
|
|
1025
1025
|
src, n = self.src, self.n
|
|
1026
|
+
run_match = UNQUOTED_RUN.match
|
|
1026
1027
|
while self.i < n:
|
|
1028
|
+
m = run_match(src, self.i)
|
|
1029
|
+
if m is not None:
|
|
1030
|
+
parts.append(Node("T_Literal", self.i, m.end(),
|
|
1031
|
+
text=m.group(0)))
|
|
1032
|
+
self.i = m.end()
|
|
1033
|
+
if self.i >= n:
|
|
1034
|
+
break
|
|
1027
1035
|
c = src[self.i]
|
|
1028
1036
|
if c in METACHARS or c in stop_chars:
|
|
1029
1037
|
if c in "<>" and src[self.i + 1:self.i + 2] == "(" \
|
|
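The parser hunk above consumes a whole run of ordinary characters with one anchored regex match instead of stepping character by character. The definition of `UNQUOTED_RUN` is not part of the diff, so the sketch below uses an assumed pattern (`PLAIN_RUN`) and a hypothetical tokenizer, `split_runs`, to show only the technique: a compiled pattern's `match(string, pos)` starts matching at `pos` without slicing the source.

```python
import re

# Assumed pattern: the real UNQUOTED_RUN is not shown in the diff.
PLAIN_RUN = re.compile(r"[A-Za-z0-9_./-]+")


def split_runs(src):
    """Split src into ('run', text) pieces and single ('char', c) pieces."""
    pieces = []
    i, n = 0, len(src)
    run_match = PLAIN_RUN.match  # bind the method once, as the hunk does
    while i < n:
        m = run_match(src, i)  # anchored at position i
        if m is not None:
            pieces.append(("run", m.group(0)))
            i = m.end()
            if i >= n:
                break
        # Anything the fast path did not consume is handled one char at a time.
        pieces.append(("char", src[i]))
        i += 1
    return pieces


print(split_runs("foo/bar $x"))
```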
@@ -1917,13 +1925,21 @@ def literal_text(word):
|
|
|
1917
1925
|
"""The literal string of a word made only of literal parts, else None."""
|
|
1918
1926
|
if word is None or word.kind != "T_NormalWord":
|
|
1919
1927
|
return None
|
|
1928
|
+
fields = word.fields
|
|
1929
|
+
cached = fields.get("_lit", False)
|
|
1930
|
+
if cached is not False:
|
|
1931
|
+
return cached
|
|
1920
1932
|
out = []
|
|
1933
|
+
result = None
|
|
1921
1934
|
for p in word.parts:
|
|
1922
1935
|
if p.kind == "T_Literal":
|
|
1923
1936
|
out.append(p.text)
|
|
1924
1937
|
else:
|
|
1925
|
-
|
|
1926
|
-
|
|
1938
|
+
break
|
|
1939
|
+
else:
|
|
1940
|
+
result = "".join(out)
|
|
1941
|
+
fields["_lit"] = result
|
|
1942
|
+
return result
|
|
1927
1943
|
|
|
1928
1944
|
|
|
1929
1945
|
def heredoc_delimiter(word):
|
|
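The `literal_text` hunk caches its result on the node under `_lit` and uses `False` as the "not computed yet" marker, because `None` is itself a valid result. `command_resolution` earlier in the diff can test `is not None` instead, since its result is always a tuple. The sketch below illustrates the sentinel idea with hypothetical names (`Word`, `cached_literal`, `stats`) and a dedicated sentinel object in place of `False`; it is not the package's code.

```python
_MISSING = object()  # sentinel meaning "nothing cached yet"

stats = {"computed": 0}  # counts real computations, for demonstration


class Word:
    """Hypothetical stand-in for an AST word node with a fields dict."""

    def __init__(self, parts):
        self.fields = {"parts": parts}


def cached_literal(word):
    """Join the parts if all are plain strings, else None; cached per node."""
    fields = word.fields
    cached = fields.get("_lit", _MISSING)
    if cached is not _MISSING:
        return cached  # may legitimately be None
    stats["computed"] += 1
    parts = fields["parts"]
    if all(isinstance(p, str) for p in parts):
        result = "".join(parts)
    else:
        result = None
    fields["_lit"] = result
    return result


plain = Word(["ab", "cd"])
mixed = Word(["ab", 42])
print(cached_literal(plain), cached_literal(plain))
print(cached_literal(mixed), cached_literal(mixed))
print(stats["computed"])
```

With a plain `if cached is not None` test, the `mixed` word would be recomputed on every call, which is exactly the case the sentinel avoids.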
@@ -13,7 +13,7 @@ import bisect
|
|
|
13
13
|
|
|
14
14
|
|
|
15
15
|
class Node:
|
|
16
|
-
__slots__ = ("kind", "pos", "end", "parent", "fields")
|
|
16
|
+
__slots__ = ("kind", "pos", "end", "parent", "fields", "children")
|
|
17
17
|
|
|
18
18
|
def __init__(self, kind, pos, end, **fields):
|
|
19
19
|
self.kind = kind
|
|
@@ -21,6 +21,7 @@ class Node:
|
|
|
21
21
|
self.end = end
|
|
22
22
|
self.parent = None
|
|
23
23
|
self.fields = fields
|
|
24
|
+
self.children = None # filled by set_parents
|
|
24
25
|
|
|
25
26
|
def __getattr__(self, name):
|
|
26
27
|
try:
|
|
@@ -70,18 +71,56 @@ def iter_children(node):
|
|
|
70
71
|
def walk(node):
|
|
71
72
|
"""Yield node and all descendants in document order."""
|
|
72
73
|
stack = [node]
|
|
74
|
+
pop = stack.pop
|
|
73
75
|
while stack:
|
|
74
|
-
n =
|
|
76
|
+
n = pop()
|
|
75
77
|
yield n
|
|
76
|
-
children =
|
|
77
|
-
children
|
|
78
|
-
|
|
78
|
+
children = n.children
|
|
79
|
+
if children is None:
|
|
80
|
+
children = list(iter_children(n))
|
|
81
|
+
if children:
|
|
82
|
+
stack.extend(reversed(children))
|
|
79
83
|
|
|
80
84
|
|
|
81
85
|
def set_parents(root):
|
|
82
|
-
|
|
83
|
-
|
|
86
|
+
"""Link parents, cache children, and return all nodes in doc order."""
|
|
87
|
+
nodes = []
|
|
88
|
+
append = nodes.append
|
|
89
|
+
stack = [root]
|
|
90
|
+
pop = stack.pop
|
|
91
|
+
while stack:
|
|
92
|
+
n = pop()
|
|
93
|
+
append(n)
|
|
94
|
+
children = []
|
|
95
|
+
add = children.append
|
|
96
|
+
for value in n.fields.values():
|
|
97
|
+
tv = type(value)
|
|
98
|
+
if tv is Node:
|
|
99
|
+
add(value)
|
|
100
|
+
elif tv is list:
|
|
101
|
+
for item in value:
|
|
102
|
+
ti = type(item)
|
|
103
|
+
if ti is Node:
|
|
104
|
+
add(item)
|
|
105
|
+
elif ti is list or ti is tuple:
|
|
106
|
+
for sub in item:
|
|
107
|
+
ts = type(sub)
|
|
108
|
+
if ts is Node:
|
|
109
|
+
add(sub)
|
|
110
|
+
elif ts is list:
|
|
111
|
+
for s2 in sub:
|
|
112
|
+
if type(s2) is Node:
|
|
113
|
+
add(s2)
|
|
114
|
+
elif tv is tuple:
|
|
115
|
+
for item in value:
|
|
116
|
+
if type(item) is Node:
|
|
117
|
+
add(item)
|
|
118
|
+
n.children = children
|
|
119
|
+
for c in children:
|
|
84
120
|
c.parent = n
|
|
121
|
+
if children:
|
|
122
|
+
stack.extend(reversed(children))
|
|
123
|
+
return nodes
|
|
85
124
|
|
|
86
125
|
|
|
87
126
|
def ancestors(node):
|
|
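The new `set_parents` does three jobs in one pass: it links parents, stores each node's child list, and returns every node in document order so later passes can loop over a flat list instead of walking the tree again. The sketch below is a simplified illustration with a hypothetical `Node` and `link_tree`; it looks only at direct `Node` values and one level of lists, whereas the hunk also descends into nested lists and tuples.

```python
class Node:
    """Minimal hypothetical node: children live inside the fields dict."""

    __slots__ = ("kind", "parent", "fields", "children")

    def __init__(self, kind, **fields):
        self.kind = kind
        self.parent = None
        self.fields = fields
        self.children = None  # filled in by link_tree


def link_tree(root):
    """Link parents, cache children, return all nodes in document order."""
    nodes = []
    stack = [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        children = []
        for value in n.fields.values():
            if type(value) is Node:
                children.append(value)
            elif type(value) is list:
                children.extend(v for v in value if type(v) is Node)
        n.children = children
        for c in children:
            c.parent = n
        # Push in reverse so the first child is popped (visited) first.
        stack.extend(reversed(children))
    return nodes


tree = Node("Script", commands=[
    Node("Cmd", words=[Node("WordA"), Node("WordB")]),
    Node("Cmd2"),
])
order = link_tree(tree)
print([n.kind for n in order])
```

Reversing the children before pushing them is what turns the LIFO stack into a pre-order, left-to-right traversal.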
@@ -22,12 +22,9 @@ EXIT_COMMANDS = {"exit", "return"}
|
|
|
22
22
|
DECLARING_COMMANDS = {"declare", "typeset", "local", "export", "readonly"}
|
|
23
23
|
|
|
24
24
|
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
def __init__(self, status, integer=False):
|
|
29
|
-
self.status = status
|
|
30
|
-
self.integer = integer
|
|
25
|
+
def VarInfo(status, integer=False):
|
|
26
|
+
"""Variable state as an immutable (status, integer) tuple."""
|
|
27
|
+
return (status, integer)
|
|
31
28
|
|
|
32
29
|
|
|
33
30
|
class Scope:
|
|
@@ -83,21 +80,20 @@ class VarFlow:
|
|
|
83
80
|
break
|
|
84
81
|
old = scope.vars.get(name)
|
|
85
82
|
if integer is None:
|
|
86
|
-
integer = old
|
|
83
|
+
integer = old[1] if old is not None else False
|
|
87
84
|
elif self.conditional_depth:
|
|
88
85
|
# attribute only maybe applied: keep the weaker assumption
|
|
89
|
-
integer = integer and (old
|
|
86
|
+
integer = integer and (old[1] if old is not None else False)
|
|
90
87
|
if integer and status == DIRTY:
|
|
91
88
|
status = CLEAN
|
|
92
89
|
if self.conditional_depth and old is not None:
|
|
93
|
-
status = merge_status(old
|
|
90
|
+
status = merge_status(old[0], status)
|
|
94
91
|
elif self.conditional_depth and old is None:
|
|
95
92
|
status = DIRTY
|
|
96
|
-
scope.vars[name] =
|
|
93
|
+
scope.vars[name] = (status, integer)
|
|
97
94
|
|
|
98
95
|
def snapshot(self):
|
|
99
|
-
return [dict(
|
|
100
|
-
for k, v in s.vars.items()) for s in self.scopes]
|
|
96
|
+
return [dict(s.vars) for s in self.scopes]
|
|
101
97
|
|
|
102
98
|
def restore(self, snap):
|
|
103
99
|
for scope, vars_ in zip(self.scopes, snap):
|
|
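Replacing the mutable `VarInfo` object with a `(status, integer)` tuple is what lets `snapshot` become a plain `dict(s.vars)` per scope: a shallow copy is a faithful snapshot when the values can never be mutated in place. A minimal sketch of that property follows. The status labels and the `restore` body are assumptions for illustration; the diff shows only the first line of the real `restore`.

```python
CLEAN, DIRTY = "clean", "dirty"  # hypothetical status labels

# One dict of name -> (status, integer) per scope.
scopes = [{"x": (CLEAN, False)}, {"n": (CLEAN, True)}]


def snapshot(scopes):
    """Shallow-copy every scope; safe because the values are tuples."""
    return [dict(s) for s in scopes]


def restore(scopes, snap):
    """Assumed restore: put each scope back to its saved contents."""
    for scope, saved in zip(scopes, snap):
        scope.clear()
        scope.update(saved)


snap = snapshot(scopes)
scopes[0]["x"] = (DIRTY, False)    # rebinding a key, not mutating a value
scopes[1]["tmp"] = (DIRTY, False)  # a variable set only inside the branch
restore(scopes, snap)
print(scopes)
```

With mutable value objects, the same shallow copy would share state between the snapshot and the live scope, so each value would have to be copied individually.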
@@ -107,20 +103,41 @@ class VarFlow:
|
|
|
107
103
|
"""Merge variable states from multiple branches (worst case)."""
|
|
108
104
|
merged = []
|
|
109
105
|
for level in range(len(self.scopes)):
|
|
106
|
+
dicts = [snap[level] for snap in snaps]
|
|
107
|
+
if len(dicts) == 2:
|
|
108
|
+
d1, d2 = dicts
|
|
109
|
+
if d1 == d2:
|
|
110
|
+
merged.append(dict(d1))
|
|
111
|
+
continue
|
|
112
|
+
vars_ = {}
|
|
113
|
+
for name, i1 in d1.items():
|
|
114
|
+
i2 = d2.get(name)
|
|
115
|
+
if i2 is None:
|
|
116
|
+
vars_[name] = (merge_status(i1[0], DIRTY), False)
|
|
117
|
+
elif i1 == i2:
|
|
118
|
+
vars_[name] = i1
|
|
119
|
+
else:
|
|
120
|
+
vars_[name] = (merge_status(i1[0], i2[0]),
|
|
121
|
+
i1[1] and i2[1])
|
|
122
|
+
for name, i2 in d2.items():
|
|
123
|
+
if name not in d1:
|
|
124
|
+
vars_[name] = (merge_status(i2[0], DIRTY), False)
|
|
125
|
+
merged.append(vars_)
|
|
126
|
+
continue
|
|
110
127
|
allnames = set()
|
|
111
|
-
for
|
|
112
|
-
allnames.update(
|
|
128
|
+
for d in dicts:
|
|
129
|
+
allnames.update(d)
|
|
113
130
|
vars_ = {}
|
|
114
131
|
for name in allnames:
|
|
115
|
-
infos = [snap[level].get(name) for snap in snaps]
|
|
116
132
|
status = None
|
|
117
133
|
integer = True
|
|
118
|
-
for
|
|
119
|
-
|
|
120
|
-
|
|
134
|
+
for d in dicts:
|
|
135
|
+
info = d.get(name)
|
|
136
|
+
s = info[0] if info is not None else DIRTY
|
|
137
|
+
i = info[1] if info is not None else False
|
|
121
138
|
status = s if status is None else merge_status(status, s)
|
|
122
139
|
integer = integer and i
|
|
123
|
-
vars_[name] =
|
|
140
|
+
vars_[name] = (status, integer)
|
|
124
141
|
merged.append(vars_)
|
|
125
142
|
self.restore(merged)
|
|
126
143
|
|
|
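The merge hunk above adds a fast path for the common two-branch case (for example `if`/`else`): identical snapshots are copied as they are, and otherwise each name is merged pairwise, with a name missing from one branch treated as `DIRTY` and non-integer. The sketch below isolates that logic for a single scope level. `merge_status` and the numeric status values are assumptions, since the real definitions are not part of the diff.

```python
CLEAN, DIRTY = 0, 1  # assumed encoding: a higher value is the worse state


def merge_status(a, b):
    """Assumed worst-case merge; the package's real rule is not shown."""
    return max(a, b)


def merge_two(d1, d2):
    """Merge two branch snapshots of name -> (status, integer)."""
    if d1 == d2:
        return dict(d1)  # nothing diverged between the branches
    merged = {}
    for name, i1 in d1.items():
        i2 = d2.get(name)
        if i2 is None:
            # Assigned in only one branch: assume the worst for the other.
            merged[name] = (merge_status(i1[0], DIRTY), False)
        elif i1 == i2:
            merged[name] = i1
        else:
            merged[name] = (merge_status(i1[0], i2[0]), i1[1] and i2[1])
    for name, i2 in d2.items():
        if name not in d1:
            merged[name] = (merge_status(i2[0], DIRTY), False)
    return merged


then_branch = {"x": (CLEAN, True), "y": (CLEAN, False)}
else_branch = {"x": (DIRTY, True)}
print(merge_two(then_branch, else_branch))
```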
@@ -136,9 +153,9 @@ class VarFlow:
|
|
|
136
153
|
info = self.lookup(name)
|
|
137
154
|
if info is None:
|
|
138
155
|
return DIRTY, False
|
|
139
|
-
if info
|
|
156
|
+
if info[1]:
|
|
140
157
|
return CLEAN, True
|
|
141
|
-
return info
|
|
158
|
+
return info[0], False
|
|
142
159
|
|
|
143
160
|
def word_status(self, word):
|
|
144
161
|
"""SpaceStatus of a word's value (assignment RHS semantics)."""
|
|
@@ -249,7 +266,7 @@ class VarFlow:
|
|
|
249
266
|
if assign.get("append"):
|
|
250
267
|
old = self.lookup(assign.name)
|
|
251
268
|
if old is not None:
|
|
252
|
-
status = join_status(old
|
|
269
|
+
status = join_status(old[0], status)
|
|
253
270
|
self.assign(assign.name, status, integer=integer,
|
|
254
271
|
local=is_local, global_="g" in flags)
|
|
255
272
|
self.on_assign(assign.name, value, assign)
|
|
@@ -272,7 +289,7 @@ class VarFlow:
|
|
|
272
289
|
self.assign(t, DIRTY, integer=integer)
|
|
273
290
|
elif integer is not None:
|
|
274
291
|
old = self.lookup(t)
|
|
275
|
-
self.assign(t, old
|
|
292
|
+
self.assign(t, old[0] if old else EMPTY,
|
|
276
293
|
integer=integer)
|
|
277
294
|
elif cmd_name == "read":
|
|
278
295
|
self._apply_read(node)
|
|
@@ -514,6 +531,21 @@ class VarFlow:
|
|
|
514
531
|
if node is None or isinstance(node, str):
|
|
515
532
|
return
|
|
516
533
|
k = node.kind
|
|
534
|
+
if k == "T_Literal" or k == "T_SingleQuoted" or k == "T_Glob":
|
|
535
|
+
return
|
|
536
|
+
if k == "T_NormalWord":
|
|
537
|
+
for p in node.parts:
|
|
538
|
+
kp = p.kind
|
|
539
|
+
if kp != "T_Literal" and kp != "T_SingleQuoted" \
|
|
540
|
+
and kp != "T_Glob":
|
|
541
|
+
self.visit_word(p)
|
|
542
|
+
return
|
|
543
|
+
if k == "T_DoubleQuoted":
|
|
544
|
+
for p in node.parts:
|
|
545
|
+
kp = p.kind
|
|
546
|
+
if kp != "T_Literal":
|
|
547
|
+
self.visit_word(p)
|
|
548
|
+
return
|
|
517
549
|
if k == "T_DollarBraced":
|
|
518
550
|
name = braced_reference(node.content)
|
|
519
551
|
status, integer = self.ref_status(name)
|
|
@@ -544,8 +576,11 @@ class VarFlow:
|
|
|
544
576
|
self.visit_word(op)
|
|
545
577
|
self._apply_redirect_assign(node)
|
|
546
578
|
return
|
|
547
|
-
|
|
548
|
-
|
|
579
|
+
children = node.children
|
|
580
|
+
if children is None:
|
|
581
|
+
from .shast import iter_children
|
|
582
|
+
children = iter_children(node)
|
|
583
|
+
for c in children:
|
|
549
584
|
self.visit_word(c)
|
|
550
585
|
|
|
551
586
|
def visit_arith(self, node):
|
|
@@ -576,8 +611,11 @@ class VarFlow:
|
|
|
576
611
|
for p in node.parts:
|
|
577
612
|
self.visit_word(p)
|
|
578
613
|
return
|
|
579
|
-
|
|
580
|
-
|
|
614
|
+
children = node.children
|
|
615
|
+
if children is None:
|
|
616
|
+
from .shast import iter_children
|
|
617
|
+
children = iter_children(node)
|
|
618
|
+
for c in children:
|
|
581
619
|
if c.kind.startswith("TA_"):
|
|
582
620
|
self.visit_arith(c)
|
|
583
621
|
else:
|
|
@@ -43,19 +43,20 @@ class Assign:
|
|
|
43
43
|
|
|
44
44
|
|
|
45
45
|
class VarScan:
|
|
46
|
-
def __init__(self, root, shell="bash"):
|
|
46
|
+
def __init__(self, root, shell="bash", nodes=None):
|
|
47
47
|
self.refs = []
|
|
48
48
|
self.assigns = []
|
|
49
49
|
self.assoc_arrays = set()
|
|
50
50
|
self._suppressed = set()
|
|
51
51
|
self.root = root
|
|
52
52
|
self.shell = shell
|
|
53
|
+
self.nodes = nodes if nodes is not None else list(walk(root))
|
|
53
54
|
self._prescan_assoc()
|
|
54
55
|
self._scan()
|
|
55
56
|
|
|
56
57
|
def _prescan_assoc(self):
|
|
57
58
|
"""Find `declare -A name` declarations before the main scan."""
|
|
58
|
-
for node in
|
|
59
|
+
for node in self.nodes:
|
|
59
60
|
if node.kind != "T_SimpleCommand" or not node.words:
|
|
60
61
|
continue
|
|
61
62
|
cmd = literal_text(node.words[0])
|
|
@@ -73,12 +74,19 @@ class VarScan:
|
|
|
73
74
|
|
|
74
75
|
# ------------------------------------------------------------------
|
|
75
76
|
|
|
77
|
+
_METHOD_CACHE = {}
|
|
78
|
+
|
|
76
79
|
def _scan(self):
|
|
77
|
-
|
|
80
|
+
methods = self._METHOD_CACHE
|
|
81
|
+
cls = type(self)
|
|
82
|
+
for node in self.nodes:
|
|
78
83
|
k = node.kind
|
|
79
|
-
method =
|
|
84
|
+
method = methods.get(k, False)
|
|
85
|
+
if method is False:
|
|
86
|
+
method = getattr(cls, "_scan_" + k, None)
|
|
87
|
+
methods[k] = method
|
|
80
88
|
if method is not None:
|
|
81
|
-
method(node)
|
|
89
|
+
method(self, node)
|
|
82
90
|
|
|
83
91
|
def _ref(self, name, node, kind="normal"):
|
|
84
92
|
self.refs.append(Ref(name, node, kind))
|
|
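`VarScan._scan` now resolves the `_scan_<kind>` handler once per node kind and stores the plain function, or `None` when there is no handler, in a class-level dict. Later nodes of the same kind skip the `getattr` lookup. A small sketch of the pattern with a hypothetical `Scanner` class is below. Note that the cached object is the function taken from the class, so it is called as `method(self, payload)`.

```python
class Scanner:
    """Hypothetical scanner that dispatches on a node kind string."""

    _METHOD_CACHE = {}  # kind -> function or None, shared by all instances

    def __init__(self):
        self.seen = []

    def scan(self, nodes):
        methods = self._METHOD_CACHE
        cls = type(self)
        for kind, payload in nodes:
            method = methods.get(kind, False)
            if method is False:  # this kind has not been looked up yet
                method = getattr(cls, "_scan_" + kind, None)
                methods[kind] = method
            if method is not None:
                method(self, payload)  # plain function, so pass self

    def _scan_Assign(self, payload):
        self.seen.append(("assign", payload))

    def _scan_Ref(self, payload):
        self.seen.append(("ref", payload))


scanner = Scanner()
scanner.scan([("Assign", "x"), ("Literal", "a"), ("Ref", "x"), ("Literal", "b")])
print(scanner.seen)
```

One design consequence worth keeping in mind: the cache is keyed by kind only, so a subclass that overrides a handler would share entries with the base class unless it defines its own `_METHOD_CACHE`.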
@@ -492,20 +500,35 @@ def _flag_arg_attached(flag):
|
|
|
492
500
|
|
|
493
501
|
|
|
494
502
|
def levenshtein(a, b, cap=3):
|
|
495
|
-
"""
|
|
503
|
+
"""Banded edit distance: O(len * cap), returns `cap` once unbeatable."""
|
|
496
504
|
if a == b:
|
|
497
505
|
return 0
|
|
498
|
-
|
|
506
|
+
la, lb = len(a), len(b)
|
|
507
|
+
if abs(la - lb) >= cap:
|
|
499
508
|
return cap
|
|
500
|
-
if
|
|
501
|
-
a, b = b, a
|
|
502
|
-
|
|
503
|
-
|
|
504
|
-
|
|
505
|
-
|
|
506
|
-
|
|
507
|
-
|
|
508
|
-
|
|
509
|
+
if la > lb:
|
|
510
|
+
a, b, la, lb = b, a, lb, la
|
|
511
|
+
if la == 0:
|
|
512
|
+
return lb if lb < cap else cap
|
|
513
|
+
band = cap - 1
|
|
514
|
+
INF = cap + 1
|
|
515
|
+
prev = [j if j <= band else INF for j in range(la + 1)]
|
|
516
|
+
for i in range(1, lb + 1):
|
|
517
|
+
cb = b[i - 1]
|
|
518
|
+
lo = max(1, i - band)
|
|
519
|
+
hi = min(la, i + band)
|
|
520
|
+
cur = [INF] * (la + 1)
|
|
521
|
+
if lo == 1:
|
|
522
|
+
cur[0] = i if i <= band else INF
|
|
523
|
+
best = INF
|
|
524
|
+
for j in range(lo, hi + 1):
|
|
525
|
+
c = min(prev[j] + 1, cur[j - 1] + 1,
|
|
526
|
+
prev[j - 1] + (a[j - 1] != cb))
|
|
527
|
+
cur[j] = c
|
|
528
|
+
if c < best:
|
|
529
|
+
best = c
|
|
530
|
+
if best >= cap:
|
|
509
531
|
return cap
|
|
510
532
|
prev = cur
|
|
511
|
-
|
|
533
|
+
d = prev[la]
|
|
534
|
+
return d if d < cap else cap
|
|
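The README text in this diff says the banded Levenshtein was fuzz-tested against a reference implementation. The sketch below shows what such a differential check can look like. `banded` restates the new function from the hunk, with indentation inferred because the diff view does not preserve it. `reference` is a textbook full-table implementation written for this example, and the alphabet, string lengths and seed are arbitrary choices.

```python
import random


def banded(a, b, cap=3):
    """Edit distance capped at `cap`, computed only within a diagonal band."""
    if a == b:
        return 0
    la, lb = len(a), len(b)
    if abs(la - lb) >= cap:
        return cap
    if la > lb:
        a, b, la, lb = b, a, lb, la
    if la == 0:
        return lb if lb < cap else cap
    band = cap - 1
    INF = cap + 1
    prev = [j if j <= band else INF for j in range(la + 1)]
    for i in range(1, lb + 1):
        cb = b[i - 1]
        lo = max(1, i - band)
        hi = min(la, i + band)
        cur = [INF] * (la + 1)
        if lo == 1:
            cur[0] = i if i <= band else INF
        best = INF
        for j in range(lo, hi + 1):
            c = min(prev[j] + 1, cur[j - 1] + 1,
                    prev[j - 1] + (a[j - 1] != cb))
            cur[j] = c
            if c < best:
                best = c
        if best >= cap:
            return cap  # no cell in this row can lead to a result below cap
        prev = cur
    d = prev[la]
    return d if d < cap else cap


def reference(a, b, cap=3):
    """Textbook full-table edit distance, capped for comparison."""
    prev = list(range(len(a) + 1))
    for i, cb in enumerate(b, 1):
        cur = [i]
        for j, ca in enumerate(a, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return min(prev[-1], cap)


def fuzz(pairs=2000, seed=0):
    """Return the first disagreeing (a, b, cap) triple, or None."""
    rng = random.Random(seed)
    for _ in range(pairs):
        a = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
        b = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
        cap = rng.randint(1, 4)
        if banded(a, b, cap) != reference(a, b, cap):
            return a, b, cap
    return None


print(fuzz())
```

A short alphabet and short strings keep many pairs within the cap, which is where the band boundaries and the early exit are actually exercised.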
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: pureshellcheck
|
|
3
|
-
Version: 0.
|
|
3
|
+
Version: 0.2.0
|
|
4
4
|
Summary: A pure Python reimplementation of ShellCheck's most common checks
|
|
5
5
|
Author: adam2go
|
|
6
6
|
License: MIT
|
|
@@ -54,10 +54,10 @@ rm -rf $BUILD_DIR/*
|
|
|
54
54
|
`shellcheck-py` just download the 30 MB Haskell binary — useless in
|
|
55
55
|
WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
|
|
56
56
|
~7000 lines of stdlib-only Python.
|
|
57
|
-
- **In-process speed.** Calling `pureshellcheck.check()` takes ~
|
|
58
|
-
typical script vs ~
|
|
59
|
-
|
|
60
|
-
[Benchmarks](#benchmarks)).
|
|
57
|
+
- **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
|
|
58
|
+
a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
|
|
59
|
+
~33× faster than the binary even on 1200-line scripts; one-line snippets
|
|
60
|
+
check in ~40 µs (see [Benchmarks](#benchmarks)).
|
|
61
61
|
- **Verified against the real thing.** Test cases are extracted from
|
|
62
62
|
ShellCheck's own test suite and the output is differentially tested
|
|
63
63
|
against the shellcheck binary on real-world scripts.
|
|
@@ -136,19 +136,45 @@ the bash AST if you want to build your own analyses.
|
|
|
136
136
|
|
|
137
137
|
## Benchmarks
|
|
138
138
|
|
|
139
|
-
|
|
140
|
-
|
|
141
|
-
|
|
139
|
+
All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
|
|
140
|
+
experiments, each repeated in 3 independent sessions; medians shown
|
|
141
|
+
(session-to-session spread was < 4% everywhere). In both experiments the
|
|
142
|
+
findings are verified identical **before** any timing.
|
|
143
|
+
|
|
144
|
+
**vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
|
|
145
|
+
per session; both tools timed in the same session):
|
|
142
146
|
|
|
143
147
|
| workload | shellcheck | pureshellcheck | speedup |
|
|
144
148
|
|---|---|---|---|
|
|
145
|
-
|
|
|
146
|
-
| embedded `check()`,
|
|
147
|
-
|
|
|
148
|
-
|
|
|
149
|
+
| embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
|
|
150
|
+
| embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
|
|
151
|
+
| embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
|
|
152
|
+
| CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
|
|
153
|
+
| CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
|
|
149
154
|
|
|
150
155
|
The embedded rows are what an agent or editor integration pays per call:
|
|
151
|
-
no process spawn, no binary.
|
|
156
|
+
no process spawn, no binary. A one-line snippet checks in **~40 µs**
|
|
157
|
+
(~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
|
|
158
|
+
time is dominated by CPython interpreter startup (~20 ms).
|
|
159
|
+
|
|
160
|
+
**v0.2.0 vs v0.1.0** (controlled before/after,
|
|
161
|
+
`python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
|
|
162
|
+
the same interpreter, 25–200 in-process repeats, outputs verified
|
|
163
|
+
identical on every workload):
|
|
164
|
+
|
|
165
|
+
| workload | v0.1.0 | v0.2.0 | improvement |
|
|
166
|
+
|---|---|---|---|
|
|
167
|
+
| tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
|
|
168
|
+
| small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
|
|
169
|
+
| medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
|
|
170
|
+
| large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
|
|
171
|
+
|
|
172
|
+
The v0.2.0 speedups came from caching the AST child/parent structure and a
|
|
173
|
+
document-order node table (one traversal instead of dozens), making
|
|
174
|
+
variable states immutable tuples so branch snapshots are plain dict
|
|
175
|
+
copies, a banded Levenshtein for SC2153 (fuzz-tested against the
|
|
176
|
+
reference implementation on 20,000 random pairs), and memoizing repeated
|
|
177
|
+
word/command resolution.
|
|
152
178
|
|
|
153
179
|
## Compatibility notes
|
|
154
180
|
|
|