pureshellcheck 0.1.0__tar.gz → 0.2.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (24)
  1. {pureshellcheck-0.1.0/src/pureshellcheck.egg-info → pureshellcheck-0.2.0}/PKG-INFO +39 -13
  2. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/README.md +38 -12
  3. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/pyproject.toml +1 -1
  4. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/__init__.py +1 -1
  5. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/analyzer.py +32 -10
  6. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/astlib.py +3 -0
  7. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/commands.py +1 -1
  8. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/misc.py +1 -1
  9. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/variables.py +14 -14
  10. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/parser.py +18 -2
  11. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/shast.py +46 -7
  12. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/varflow.py +65 -27
  13. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/varscan.py +40 -17
  14. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0/src/pureshellcheck.egg-info}/PKG-INFO +39 -13
  15. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/LICENSE +0 -0
  16. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/MANIFEST.in +0 -0
  17. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/setup.cfg +0 -0
  18. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/__init__.py +0 -0
  19. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/quoting.py +0 -0
  20. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/cli.py +0 -0
  21. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/SOURCES.txt +0 -0
  22. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/dependency_links.txt +0 -0
  23. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/entry_points.txt +0 -0
  24. {pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pureshellcheck
3
- Version: 0.1.0
3
+ Version: 0.2.0
4
4
  Summary: A pure Python reimplementation of ShellCheck's most common checks
5
5
  Author: adam2go
6
6
  License: MIT
@@ -54,10 +54,10 @@ rm -rf $BUILD_DIR/*
54
54
  `shellcheck-py` just download the 30 MB Haskell binary — useless in
55
55
  WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
56
56
  ~7000 lines of stdlib-only Python.
57
- - **In-process speed.** Calling `pureshellcheck.check()` takes ~2 ms for a
58
- typical script vs ~40 ms to spawn the shellcheck binary and it's
59
- 8–13× faster than the binary even on 1200-line scripts (see
60
- [Benchmarks](#benchmarks)).
57
+ - **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
58
+ a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
59
+ ~33× faster than the binary even on 1200-line scripts; one-line snippets
60
+ check in ~40 µs (see [Benchmarks](#benchmarks)).
61
61
  - **Verified against the real thing.** Test cases are extracted from
62
62
  ShellCheck's own test suite and the output is differentially tested
63
63
  against the shellcheck binary on real-world scripts.
@@ -136,19 +136,45 @@ the bash AST if you want to build your own analyses.
136
136
 
137
137
  ## Benchmarks
138
138
 
139
- Measured with `python tools/bench.py` (median of 7 runs, after verifying
140
- both tools report identical findings on the workload; CPython 3.12,
141
- shellcheck 0.11.0, Apple Silicon):
139
+ All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
140
+ experiments, each repeated in 3 independent sessions; medians shown
141
+ (session-to-session spread was < 4% everywhere). In both experiments the
142
+ findings are verified identical **before** any timing.
143
+
144
+ **vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
145
+ per session; both tools timed in the same session):
142
146
 
143
147
  | workload | shellcheck | pureshellcheck | speedup |
144
148
  |---|---|---|---|
145
- | CLI, brew.sh (1216 lines) | 604 ms | 68 ms | **8.9×** |
146
- | embedded `check()`, brew.sh | 604 ms | 45 ms | **13.3×** |
147
- | CLI, 75-line script | 42 ms | 24 ms | 1.8× |
148
- | embedded `check()`, 75-line script | 42 ms | 2.4 ms | **17×** |
149
+ | embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
150
+ | embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
151
+ | embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
152
+ | CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
153
+ | CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
149
154
 
150
155
  The embedded rows are what an agent or editor integration pays per call:
151
- no process spawn, no binary.
156
+ no process spawn, no binary. A one-line snippet checks in **~40 µs**
157
+ (~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
158
+ time is dominated by CPython interpreter startup (~20 ms).
159
+
160
+ **v0.2.0 vs v0.1.0** (controlled before/after,
161
+ `python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
162
+ the same interpreter, 25–200 in-process repeats, outputs verified
163
+ identical on every workload):
164
+
165
+ | workload | v0.1.0 | v0.2.0 | improvement |
166
+ |---|---|---|---|
167
+ | tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
168
+ | small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
169
+ | medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
170
+ | large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
171
+
172
+ The v0.2.0 speedups came from caching the AST child/parent structure and a
173
+ document-order node table (one traversal instead of dozens), making
174
+ variable states immutable tuples so branch snapshots are plain dict
175
+ copies, a banded Levenshtein for SC2153 (fuzz-tested against the
176
+ reference implementation on 20,000 random pairs), and memoizing repeated
177
+ word/command resolution.
152
178
 
153
179
  ## Compatibility notes
154
180
 
@@ -27,10 +27,10 @@ rm -rf $BUILD_DIR/*
27
27
  `shellcheck-py` just download the 30 MB Haskell binary — useless in
28
28
  WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
29
29
  ~7000 lines of stdlib-only Python.
30
- - **In-process speed.** Calling `pureshellcheck.check()` takes ~2 ms for a
31
- typical script vs ~40 ms to spawn the shellcheck binary and it's
32
- 8–13× faster than the binary even on 1200-line scripts (see
33
- [Benchmarks](#benchmarks)).
30
+ - **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
31
+ a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
32
+ ~33× faster than the binary even on 1200-line scripts; one-line snippets
33
+ check in ~40 µs (see [Benchmarks](#benchmarks)).
34
34
  - **Verified against the real thing.** Test cases are extracted from
35
35
  ShellCheck's own test suite and the output is differentially tested
36
36
  against the shellcheck binary on real-world scripts.
@@ -109,19 +109,45 @@ the bash AST if you want to build your own analyses.
109
109
 
110
110
  ## Benchmarks
111
111
 
112
- Measured with `python tools/bench.py` (median of 7 runs, after verifying
113
- both tools report identical findings on the workload; CPython 3.12,
114
- shellcheck 0.11.0, Apple Silicon):
112
+ All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
113
+ experiments, each repeated in 3 independent sessions; medians shown
114
+ (session-to-session spread was < 4% everywhere). In both experiments the
115
+ findings are verified identical **before** any timing.
116
+
117
+ **vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
118
+ per session; both tools timed in the same session):
115
119
 
116
120
  | workload | shellcheck | pureshellcheck | speedup |
117
121
  |---|---|---|---|
118
- | CLI, brew.sh (1216 lines) | 604 ms | 68 ms | **8.9×** |
119
- | embedded `check()`, brew.sh | 604 ms | 45 ms | **13.3×** |
120
- | CLI, 75-line script | 42 ms | 24 ms | 1.8× |
121
- | embedded `check()`, 75-line script | 42 ms | 2.4 ms | **17×** |
122
+ | embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
123
+ | embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
124
+ | embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
125
+ | CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
126
+ | CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
122
127
 
123
128
  The embedded rows are what an agent or editor integration pays per call:
124
- no process spawn, no binary.
129
+ no process spawn, no binary. A one-line snippet checks in **~40 µs**
130
+ (~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
131
+ time is dominated by CPython interpreter startup (~20 ms).
132
+
133
+ **v0.2.0 vs v0.1.0** (controlled before/after,
134
+ `python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
135
+ the same interpreter, 25–200 in-process repeats, outputs verified
136
+ identical on every workload):
137
+
138
+ | workload | v0.1.0 | v0.2.0 | improvement |
139
+ |---|---|---|---|
140
+ | tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
141
+ | small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
142
+ | medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
143
+ | large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
144
+
145
+ The v0.2.0 speedups came from caching the AST child/parent structure and a
146
+ document-order node table (one traversal instead of dozens), making
147
+ variable states immutable tuples so branch snapshots are plain dict
148
+ copies, a banded Levenshtein for SC2153 (fuzz-tested against the
149
+ reference implementation on 20,000 random pairs), and memoizing repeated
150
+ word/command resolution.
125
151
 
126
152
  ## Compatibility notes
127
153
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "pureshellcheck"
7
- version = "0.1.0"
7
+ version = "0.2.0"
8
8
  description = "A pure Python reimplementation of ShellCheck's most common checks"
9
9
  readme = "README.md"
10
10
  requires-python = ">=3.9"
@@ -6,7 +6,7 @@ common checks.
6
6
  ... print(finding.line, finding.column, finding.code, finding.message)
7
7
  """
8
8
 
9
- __version__ = "0.1.0"
9
+ __version__ = "0.2.0"
10
10
 
11
11
  from .analyzer import Finding, run_checks # noqa: F401
12
12
  from .parser import ParseError, parse # noqa: F401
@@ -81,6 +81,7 @@ class Context:
81
81
  self.positions = Positions(source)
82
82
  self.findings = []
83
83
  self.cache = {}
84
+ self.nodes = None # doc-order node list, set by run_checks
84
85
 
85
86
  # -- emission ------------------------------------------------------
86
87
 
@@ -125,6 +126,14 @@ class Context:
125
126
 
126
127
  def command_resolution(self, cmd):
127
128
  """(name_word, index, wrapper_names) after skipping wrappers."""
129
+ cached = cmd.fields.get("_cmdres")
130
+ if cached is not None:
131
+ return cached
132
+ result = self._command_resolution(cmd)
133
+ cmd.fields["_cmdres"] = result
134
+ return result
135
+
136
+ def _command_resolution(self, cmd):
128
137
  if cmd.kind != "T_SimpleCommand" or not cmd.words:
129
138
  return None, -1, []
130
139
  words = cmd.words
@@ -279,13 +288,24 @@ class Context:
279
288
  return "other", None
280
289
 
281
290
 
282
- def statement_lists(root):
291
+ STATEMENT_CONTAINER_KINDS = frozenset({
292
+ "T_Script", "T_BraceGroup", "T_Subshell", "T_WhileExpression",
293
+ "T_UntilExpression", "T_ForIn", "T_ForArithmetic", "T_SelectIn",
294
+ "T_IfExpression", "T_CaseItem", "T_DollarExpansion", "T_Backticked",
295
+ "T_ProcSub", "T_DollarBraceCommandExpansion", "T_CoProc", "T_BatsTest",
296
+ })
297
+
298
+
299
+ def statement_lists(nodes):
283
300
  """Yield every list of statement nodes in the tree."""
284
- for node in walk(root):
301
+ container = STATEMENT_CONTAINER_KINDS
302
+ for node in nodes:
303
+ if node.kind not in container:
304
+ continue
285
305
  f = node.fields
286
- for key in ("commands", "body", "condition", "else_body", "init"):
306
+ for key in ("commands", "body", "condition", "else_body"):
287
307
  v = f.get(key)
288
- if isinstance(v, list) and v and isinstance(v[0], object):
308
+ if type(v) is list and v:
289
309
  yield v
290
310
  branches = f.get("branches")
291
311
  if branches:
@@ -294,14 +314,14 @@ def statement_lists(root):
294
314
  yield body
295
315
 
296
316
 
297
- def apply_directives(findings, directives, root, source, positions):
317
+ def apply_directives(findings, directives, nodes, source, positions):
298
318
  """Filter findings according to `# shellcheck disable=` directives."""
299
319
  if not directives:
300
320
  return findings
301
321
  # statements eligible as directive targets, sorted by position
302
322
  statements = []
303
323
  seen = set()
304
- for lst in statement_lists(root):
324
+ for lst in statement_lists(nodes):
305
325
  for node in lst:
306
326
  if id(node) not in seen:
307
327
  seen.add(id(node))
@@ -372,7 +392,7 @@ def run_checks(source, shell=None, include_optional=False,
372
392
  min(e.pos + 1, len(source)))
373
393
  f.locate(Positions(source))
374
394
  return [f], e
375
- set_parents(root)
395
+ nodes = set_parents(root)
376
396
 
377
397
  detected = shell_from_shebang(root.get("shebang"))
378
398
  directives = parser.directives
@@ -386,16 +406,18 @@ def run_checks(source, shell=None, include_optional=False,
386
406
  ctx.detected_shell = detected
387
407
  ctx.explicit_shell = shell
388
408
  ctx.directives = directives
409
+ ctx.nodes = nodes
389
410
 
390
- for node in walk(root):
391
- fns = NODE_CHECKS.get(node.kind)
411
+ get_checks = NODE_CHECKS.get
412
+ for node in nodes:
413
+ fns = get_checks(node.kind)
392
414
  if fns:
393
415
  for fn in fns:
394
416
  fn(ctx, node)
395
417
  for fn in TREE_CHECKS:
396
418
  fn(ctx, root)
397
419
 
398
- findings = apply_directives(ctx.findings, directives, root, source,
420
+ findings = apply_directives(ctx.findings, directives, nodes, source,
399
421
  ctx.positions)
400
422
  findings.sort(key=lambda f: (f.pos, f.code))
401
423
  for f in findings:
@@ -1,6 +1,7 @@
1
1
  """Shared AST analysis helpers, ported from ShellCheck's ASTLib semantics."""
2
2
 
3
3
  import re
4
+ from functools import lru_cache
4
5
 
5
6
  from .shast import ancestors, walk
6
7
  from .parser import literal_text, quoted_literal_text
@@ -108,6 +109,7 @@ def is_glob_free_literal(text):
108
109
  # ----------------------------------------------------------------------
109
110
  # ${...} decomposition
110
111
 
112
+ @lru_cache(maxsize=4096)
111
113
  def braced_reference(content):
112
114
  """The variable name referenced by ${content}."""
113
115
  s = content
@@ -119,6 +121,7 @@ def braced_reference(content):
119
121
  return m.group(0) if m else s
120
122
 
121
123
 
124
+ @lru_cache(maxsize=4096)
122
125
  def braced_modifier(content):
123
126
  """Everything after the name/indices in ${content}."""
124
127
  s = content
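Note on the two `@lru_cache(maxsize=4096)` additions above: they only make sense if the decorated helpers are pure functions of a hashable argument, which a `${...}` body string is. The sketch below illustrates the effect. `variable_name` and its regex are hypothetical stand-ins, not the package's `braced_reference` logic.

```python
import re
from functools import lru_cache

# Hypothetical stand-in for a pure helper such as braced_reference();
# the regex is an assumption, not the package's actual parsing rule.
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@lru_cache(maxsize=4096)
def variable_name(content):
    """Return the leading identifier of a ${...} body, or the body itself."""
    m = _NAME_RE.match(content)
    return m.group(0) if m else content


# Repeated expansions of the same text are served from the cache.
for body in ["HOME", "HOME", "arr[0]", "HOME", "arr[0]"]:
    variable_name(body)

info = variable_name.cache_info()
print(info.hits, info.misses)  # prints: 3 2
```

Shell scripts tend to reference the same few variables many times, so a bounded cache like this likely absorbs most calls. The `maxsize` bound keeps memory flat in a long-lived process.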
@@ -277,7 +277,7 @@ def has_set_e(ctx):
277
277
  if SET_E_SHEBANG_RE.search(shebang):
278
278
  result = True
279
279
  else:
280
- for node in walk(ctx.root):
280
+ for node in ctx.nodes:
281
281
  if node.kind != "T_SimpleCommand" or not node.words:
282
282
  continue
283
283
  if first_word_basename(node) != "set":
@@ -94,7 +94,7 @@ def _has_execfail(ctx):
94
94
  if ctx.shell in ("sh", "dash", "ash"):
95
95
  ctx.cache["execfail"] = False
96
96
  return False
97
- for n in walk(ctx.root):
97
+ for n in ctx.nodes:
98
98
  if n.kind == "T_SimpleCommand" and \
99
99
  first_word_basename(n) == "shopt":
100
100
  if any(word_approx(w) == "execfail" for w in n.words[1:]):
@@ -54,7 +54,7 @@ COMMON_COMMANDS_HINT = frozenset({
54
54
  def get_varscan(ctx):
55
55
  scan = ctx.cache.get("varscan")
56
56
  if scan is None:
57
- scan = VarScan(ctx.root, ctx.shell)
57
+ scan = VarScan(ctx.root, ctx.shell, nodes=ctx.nodes)
58
58
  ctx.cache["varscan"] = scan
59
59
  return scan
60
60
 
@@ -196,21 +196,21 @@ def check_arithmetic_deref(ctx, node):
196
196
  " variables.")
197
197
 
198
198
 
199
- @tree_check
200
- def check_assignment_index_deref(ctx, root):
199
+ @node_check("T_Assignment")
200
+ def check_assignment_index_deref(ctx, node):
201
201
  # a[$i]=foo for indexed arrays
202
+ indices = node.get("indices")
203
+ if not indices:
204
+ return
202
205
  scan = get_varscan(ctx)
203
- for node in walk(root):
204
- if node.kind != "T_Assignment":
205
- continue
206
- if node.name in scan.assoc_arrays:
207
- continue
208
- for idx in node.get("indices", ()):
209
- if isinstance(idx, str) \
210
- and re.fullmatch(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?",
211
- idx.strip()):
212
- ctx.style(node, 2004, "$/${} is unnecessary on arithmetic"
213
- " variables.")
206
+ if node.name in scan.assoc_arrays:
207
+ return
208
+ for idx in indices:
209
+ if isinstance(idx, str) \
210
+ and re.fullmatch(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?",
211
+ idx.strip()):
212
+ ctx.style(node, 2004, "$/${} is unnecessary on arithmetic"
213
+ " variables.")
214
214
 
215
215
 
216
216
  def _in_let(node):
@@ -1023,7 +1023,15 @@ class Parser:
1023
1023
  start = self.i
1024
1024
  parts = []
1025
1025
  src, n = self.src, self.n
1026
+ run_match = UNQUOTED_RUN.match
1026
1027
  while self.i < n:
1028
+ m = run_match(src, self.i)
1029
+ if m is not None:
1030
+ parts.append(Node("T_Literal", self.i, m.end(),
1031
+ text=m.group(0)))
1032
+ self.i = m.end()
1033
+ if self.i >= n:
1034
+ break
1027
1035
  c = src[self.i]
1028
1036
  if c in METACHARS or c in stop_chars:
1029
1037
  if c in "<>" and src[self.i + 1:self.i + 2] == "(" \
@@ -1917,13 +1925,21 @@ def literal_text(word):
1917
1925
  """The literal string of a word made only of literal parts, else None."""
1918
1926
  if word is None or word.kind != "T_NormalWord":
1919
1927
  return None
1928
+ fields = word.fields
1929
+ cached = fields.get("_lit", False)
1930
+ if cached is not False:
1931
+ return cached
1920
1932
  out = []
1933
+ result = None
1921
1934
  for p in word.parts:
1922
1935
  if p.kind == "T_Literal":
1923
1936
  out.append(p.text)
1924
1937
  else:
1925
- return None
1926
- return "".join(out)
1938
+ break
1939
+ else:
1940
+ result = "".join(out)
1941
+ fields["_lit"] = result
1942
+ return result
1927
1943
 
1928
1944
 
1929
1945
  def heredoc_delimiter(word):
@@ -13,7 +13,7 @@ import bisect
13
13
 
14
14
 
15
15
  class Node:
16
- __slots__ = ("kind", "pos", "end", "parent", "fields")
16
+ __slots__ = ("kind", "pos", "end", "parent", "fields", "children")
17
17
 
18
18
  def __init__(self, kind, pos, end, **fields):
19
19
  self.kind = kind
@@ -21,6 +21,7 @@ class Node:
21
21
  self.end = end
22
22
  self.parent = None
23
23
  self.fields = fields
24
+ self.children = None # filled by set_parents
24
25
 
25
26
  def __getattr__(self, name):
26
27
  try:
@@ -70,18 +71,56 @@ def iter_children(node):
70
71
  def walk(node):
71
72
  """Yield node and all descendants in document order."""
72
73
  stack = [node]
74
+ pop = stack.pop
73
75
  while stack:
74
- n = stack.pop()
76
+ n = pop()
75
77
  yield n
76
- children = list(iter_children(n))
77
- children.reverse()
78
- stack.extend(children)
78
+ children = n.children
79
+ if children is None:
80
+ children = list(iter_children(n))
81
+ if children:
82
+ stack.extend(reversed(children))
79
83
 
80
84
 
81
85
  def set_parents(root):
82
- for n in walk(root):
83
- for c in iter_children(n):
86
+ """Link parents, cache children, and return all nodes in doc order."""
87
+ nodes = []
88
+ append = nodes.append
89
+ stack = [root]
90
+ pop = stack.pop
91
+ while stack:
92
+ n = pop()
93
+ append(n)
94
+ children = []
95
+ add = children.append
96
+ for value in n.fields.values():
97
+ tv = type(value)
98
+ if tv is Node:
99
+ add(value)
100
+ elif tv is list:
101
+ for item in value:
102
+ ti = type(item)
103
+ if ti is Node:
104
+ add(item)
105
+ elif ti is list or ti is tuple:
106
+ for sub in item:
107
+ ts = type(sub)
108
+ if ts is Node:
109
+ add(sub)
110
+ elif ts is list:
111
+ for s2 in sub:
112
+ if type(s2) is Node:
113
+ add(s2)
114
+ elif tv is tuple:
115
+ for item in value:
116
+ if type(item) is Node:
117
+ add(item)
118
+ n.children = children
119
+ for c in children:
84
120
  c.parent = n
121
+ if children:
122
+ stack.extend(reversed(children))
123
+ return nodes
85
124
 
86
125
 
87
126
  def ancestors(node):
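Note on the `set_parents` hunk above: it folds three jobs into one traversal, namely linking parents, caching each node's child list, and returning the document-order node table that later passes iterate over. The sketch below shows the same idea in reduced form. It is an illustration rather than the package's code: this `Node` has fewer slots, and `link` only descends one level into lists, whereas the real function also handles tuples and nested lists.

```python
class Node:
    __slots__ = ("kind", "parent", "fields", "children")

    def __init__(self, kind, **fields):
        self.kind = kind
        self.parent = None
        self.fields = fields
        self.children = None  # filled in by link()


def link(root):
    """Set parent/children on every node; return nodes in document order."""
    nodes = []
    stack = [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        children = []
        for value in n.fields.values():
            if type(value) is Node:
                children.append(value)
            elif type(value) is list:
                children.extend(v for v in value if type(v) is Node)
        n.children = children
        for c in children:
            c.parent = n
        # Reversed so the first child is popped, and so visited, first.
        stack.extend(reversed(children))
    return nodes


tree = Node("T_Script", commands=[
    Node("T_SimpleCommand", words=[Node("T_Literal", text="echo"),
                                   Node("T_Literal", text="hi")]),
    Node("T_SimpleCommand", words=[Node("T_Literal", text="ls")]),
])
nodes = link(tree)
print([n.fields["text"] for n in nodes if n.kind == "T_Literal"])
```

Once the list exists, a check that used to call `walk(ctx.root)` can loop over `ctx.nodes` instead, which is the substitution made in commands.py, misc.py and varscan.py in this diff.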
@@ -22,12 +22,9 @@ EXIT_COMMANDS = {"exit", "return"}
22
22
  DECLARING_COMMANDS = {"declare", "typeset", "local", "export", "readonly"}
23
23
 
24
24
 
25
- class VarInfo:
26
- __slots__ = ("status", "integer")
27
-
28
- def __init__(self, status, integer=False):
29
- self.status = status
30
- self.integer = integer
25
+ def VarInfo(status, integer=False):
26
+ """Variable state as an immutable (status, integer) tuple."""
27
+ return (status, integer)
31
28
 
32
29
 
33
30
  class Scope:
@@ -83,21 +80,20 @@ class VarFlow:
83
80
  break
84
81
  old = scope.vars.get(name)
85
82
  if integer is None:
86
- integer = old.integer if old is not None else False
83
+ integer = old[1] if old is not None else False
87
84
  elif self.conditional_depth:
88
85
  # attribute only maybe applied: keep the weaker assumption
89
- integer = integer and (old.integer if old is not None else False)
86
+ integer = integer and (old[1] if old is not None else False)
90
87
  if integer and status == DIRTY:
91
88
  status = CLEAN
92
89
  if self.conditional_depth and old is not None:
93
- status = merge_status(old.status, status)
90
+ status = merge_status(old[0], status)
94
91
  elif self.conditional_depth and old is None:
95
92
  status = DIRTY
96
- scope.vars[name] = VarInfo(status, integer)
93
+ scope.vars[name] = (status, integer)
97
94
 
98
95
  def snapshot(self):
99
- return [dict((k, VarInfo(v.status, v.integer))
100
- for k, v in s.vars.items()) for s in self.scopes]
96
+ return [dict(s.vars) for s in self.scopes]
101
97
 
102
98
  def restore(self, snap):
103
99
  for scope, vars_ in zip(self.scopes, snap):
@@ -107,20 +103,41 @@ class VarFlow:
107
103
  """Merge variable states from multiple branches (worst case)."""
108
104
  merged = []
109
105
  for level in range(len(self.scopes)):
106
+ dicts = [snap[level] for snap in snaps]
107
+ if len(dicts) == 2:
108
+ d1, d2 = dicts
109
+ if d1 == d2:
110
+ merged.append(dict(d1))
111
+ continue
112
+ vars_ = {}
113
+ for name, i1 in d1.items():
114
+ i2 = d2.get(name)
115
+ if i2 is None:
116
+ vars_[name] = (merge_status(i1[0], DIRTY), False)
117
+ elif i1 == i2:
118
+ vars_[name] = i1
119
+ else:
120
+ vars_[name] = (merge_status(i1[0], i2[0]),
121
+ i1[1] and i2[1])
122
+ for name, i2 in d2.items():
123
+ if name not in d1:
124
+ vars_[name] = (merge_status(i2[0], DIRTY), False)
125
+ merged.append(vars_)
126
+ continue
110
127
  allnames = set()
111
- for snap in snaps:
112
- allnames.update(snap[level])
128
+ for d in dicts:
129
+ allnames.update(d)
113
130
  vars_ = {}
114
131
  for name in allnames:
115
- infos = [snap[level].get(name) for snap in snaps]
116
132
  status = None
117
133
  integer = True
118
- for info in infos:
119
- s = info.status if info is not None else DIRTY
120
- i = info.integer if info is not None else False
134
+ for d in dicts:
135
+ info = d.get(name)
136
+ s = info[0] if info is not None else DIRTY
137
+ i = info[1] if info is not None else False
121
138
  status = s if status is None else merge_status(status, s)
122
139
  integer = integer and i
123
- vars_[name] = VarInfo(status, integer)
140
+ vars_[name] = (status, integer)
124
141
  merged.append(vars_)
125
142
  self.restore(merged)
126
143
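Note on the `VarInfo` and `snapshot` changes above: once each variable state is an immutable `(status, integer)` tuple, a branch snapshot only has to copy the dict, not the values in it. A minimal sketch of why the shallow copy is safe follows. The status strings and the bare `scopes` list are hypothetical placeholders for the package's constants and `Scope` objects.

```python
CLEAN, DIRTY = "clean", "dirty"   # hypothetical status labels

# One dict per scope, mapping variable name -> (status, integer) tuple.
scopes = [{"x": (CLEAN, False)}, {"n": (CLEAN, True)}]


def snapshot(scopes):
    # Shallow copies suffice: the tuples are immutable, so no later
    # assignment can alter a value the snapshot still refers to.
    return [dict(s) for s in scopes]


before = snapshot(scopes)
scopes[0]["x"] = (DIRTY, False)    # rebinds the key, does not mutate a value
scopes[1]["tmp"] = (DIRTY, False)  # new variable, invisible to the snapshot

print(before[0]["x"], scopes[0]["x"])
```

With the old mutable `VarInfo` objects, a shallow copy would have shared state between branches, which is presumably why 0.1.0 rebuilt every `VarInfo` on each snapshot.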
 
@@ -136,9 +153,9 @@ class VarFlow:
136
153
  info = self.lookup(name)
137
154
  if info is None:
138
155
  return DIRTY, False
139
- if info.integer:
156
+ if info[1]:
140
157
  return CLEAN, True
141
- return info.status, False
158
+ return info[0], False
142
159
 
143
160
  def word_status(self, word):
144
161
  """SpaceStatus of a word's value (assignment RHS semantics)."""
@@ -249,7 +266,7 @@ class VarFlow:
249
266
  if assign.get("append"):
250
267
  old = self.lookup(assign.name)
251
268
  if old is not None:
252
- status = join_status(old.status, status)
269
+ status = join_status(old[0], status)
253
270
  self.assign(assign.name, status, integer=integer,
254
271
  local=is_local, global_="g" in flags)
255
272
  self.on_assign(assign.name, value, assign)
@@ -272,7 +289,7 @@ class VarFlow:
272
289
  self.assign(t, DIRTY, integer=integer)
273
290
  elif integer is not None:
274
291
  old = self.lookup(t)
275
- self.assign(t, old.status if old else EMPTY,
292
+ self.assign(t, old[0] if old else EMPTY,
276
293
  integer=integer)
277
294
  elif cmd_name == "read":
278
295
  self._apply_read(node)
@@ -514,6 +531,21 @@ class VarFlow:
514
531
  if node is None or isinstance(node, str):
515
532
  return
516
533
  k = node.kind
534
+ if k == "T_Literal" or k == "T_SingleQuoted" or k == "T_Glob":
535
+ return
536
+ if k == "T_NormalWord":
537
+ for p in node.parts:
538
+ kp = p.kind
539
+ if kp != "T_Literal" and kp != "T_SingleQuoted" \
540
+ and kp != "T_Glob":
541
+ self.visit_word(p)
542
+ return
543
+ if k == "T_DoubleQuoted":
544
+ for p in node.parts:
545
+ kp = p.kind
546
+ if kp != "T_Literal":
547
+ self.visit_word(p)
548
+ return
517
549
  if k == "T_DollarBraced":
518
550
  name = braced_reference(node.content)
519
551
  status, integer = self.ref_status(name)
@@ -544,8 +576,11 @@ class VarFlow:
544
576
  self.visit_word(op)
545
577
  self._apply_redirect_assign(node)
546
578
  return
547
- from .shast import iter_children
548
- for c in iter_children(node):
579
+ children = node.children
580
+ if children is None:
581
+ from .shast import iter_children
582
+ children = iter_children(node)
583
+ for c in children:
549
584
  self.visit_word(c)
550
585
 
551
586
  def visit_arith(self, node):
@@ -576,8 +611,11 @@ class VarFlow:
576
611
  for p in node.parts:
577
612
  self.visit_word(p)
578
613
  return
579
- from .shast import iter_children
580
- for c in iter_children(node):
614
+ children = node.children
615
+ if children is None:
616
+ from .shast import iter_children
617
+ children = iter_children(node)
618
+ for c in children:
581
619
  if c.kind.startswith("TA_"):
582
620
  self.visit_arith(c)
583
621
  else:
@@ -43,19 +43,20 @@ class Assign:
43
43
 
44
44
 
45
45
  class VarScan:
46
- def __init__(self, root, shell="bash"):
46
+ def __init__(self, root, shell="bash", nodes=None):
47
47
  self.refs = []
48
48
  self.assigns = []
49
49
  self.assoc_arrays = set()
50
50
  self._suppressed = set()
51
51
  self.root = root
52
52
  self.shell = shell
53
+ self.nodes = nodes if nodes is not None else list(walk(root))
53
54
  self._prescan_assoc()
54
55
  self._scan()
55
56
 
56
57
  def _prescan_assoc(self):
57
58
  """Find `declare -A name` declarations before the main scan."""
58
- for node in walk(self.root):
59
+ for node in self.nodes:
59
60
  if node.kind != "T_SimpleCommand" or not node.words:
60
61
  continue
61
62
  cmd = literal_text(node.words[0])
@@ -73,12 +74,19 @@ class VarScan:
73
74
 
74
75
  # ------------------------------------------------------------------
75
76
 
77
+ _METHOD_CACHE = {}
78
+
76
79
  def _scan(self):
77
- for node in walk(self.root):
80
+ methods = self._METHOD_CACHE
81
+ cls = type(self)
82
+ for node in self.nodes:
78
83
  k = node.kind
79
- method = getattr(self, "_scan_" + k, None)
84
+ method = methods.get(k, False)
85
+ if method is False:
86
+ method = getattr(cls, "_scan_" + k, None)
87
+ methods[k] = method
80
88
  if method is not None:
81
- method(node)
89
+ method(self, node)
82
90
 
83
91
  def _ref(self, name, node, kind="normal"):
84
92
  self.refs.append(Ref(name, node, kind))
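Note on the `_scan` hunk above: the per-node `getattr(self, "_scan_" + k, None)` becomes a dict lookup that stores the plain function found on the class, which is why the call site changes to `method(self, node)`. A reduced sketch of the dispatch follows. `Scanner` and its `(kind, payload)` node pairs are hypothetical.

```python
class Scanner:
    _METHOD_CACHE = {}   # kind -> plain function, or None when no handler

    def __init__(self):
        self.seen = []

    def _scan_T_Assignment(self, payload):
        self.seen.append(payload)

    def scan(self, nodes):
        methods = self._METHOD_CACHE
        cls = type(self)
        for kind, payload in nodes:
            method = methods.get(kind, False)    # False: not looked up yet
            if method is False:
                method = getattr(cls, "_scan_" + kind, None)
                methods[kind] = method           # caches misses (None) too
            if method is not None:
                method(self, payload)            # unbound: pass self explicitly


s = Scanner()
s.scan([("T_Assignment", "a=1"), ("T_Literal", "x"), ("T_Assignment", "b=2")])
print(s.seen)  # prints: ['a=1', 'b=2']
```

One property of this sketch worth keeping in mind: the cache lives on the class and is keyed only by kind, so a subclass that overrides a handler would share entries with its parent unless it defines its own `_METHOD_CACHE`.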
@@ -492,20 +500,35 @@ def _flag_arg_attached(flag):
492
500
 
493
501
 
494
502
  def levenshtein(a, b, cap=3):
495
- """Edit distance, returning `cap` early once it can't be beaten."""
503
+ """Banded edit distance: O(len * cap), returns `cap` once unbeatable."""
496
504
  if a == b:
497
505
  return 0
498
- if abs(len(a) - len(b)) >= cap:
506
+ la, lb = len(a), len(b)
507
+ if abs(la - lb) >= cap:
499
508
  return cap
500
- if len(a) > len(b):
501
- a, b = b, a
502
- prev = list(range(len(a) + 1))
503
- for i, cb in enumerate(b):
504
- cur = [i + 1]
505
- for j, ca in enumerate(a):
506
- cur.append(min(prev[j + 1] + 1, cur[j] + 1,
507
- prev[j] + (ca != cb)))
508
- if min(cur) >= cap:
509
+ if la > lb:
510
+ a, b, la, lb = b, a, lb, la
511
+ if la == 0:
512
+ return lb if lb < cap else cap
513
+ band = cap - 1
514
+ INF = cap + 1
515
+ prev = [j if j <= band else INF for j in range(la + 1)]
516
+ for i in range(1, lb + 1):
517
+ cb = b[i - 1]
518
+ lo = max(1, i - band)
519
+ hi = min(la, i + band)
520
+ cur = [INF] * (la + 1)
521
+ if lo == 1:
522
+ cur[0] = i if i <= band else INF
523
+ best = INF
524
+ for j in range(lo, hi + 1):
525
+ c = min(prev[j] + 1, cur[j - 1] + 1,
526
+ prev[j - 1] + (a[j - 1] != cb))
527
+ cur[j] = c
528
+ if c < best:
529
+ best = c
530
+ if best >= cap:
509
531
  return cap
510
532
  prev = cur
511
- return prev[-1]
533
+ d = prev[la]
534
+ return d if d < cap else cap
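Note on the `levenshtein` hunk above: the README describes the banded version as fuzz-tested against a reference implementation. The sketch below shows one way such a check could look. `banded` follows the added lines of the hunk. `reference` and the fuzz loop are assumptions written for this note, not the package's test code.

```python
import random


def reference(a, b):
    """Plain full-matrix edit distance (Wagner-Fischer), no cutoff."""
    prev = list(range(len(a) + 1))
    for i, cb in enumerate(b, 1):
        cur = [i]
        for j, ca in enumerate(a, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def banded(a, b, cap=3):
    """Edit distance clamped to `cap`, filling only cells with |i - j| < cap."""
    if a == b:
        return 0
    la, lb = len(a), len(b)
    if abs(la - lb) >= cap:
        return cap
    if la > lb:
        a, b, la, lb = b, a, lb, la
    if la == 0:
        return lb if lb < cap else cap
    band = cap - 1
    INF = cap + 1          # stands in for any value known to be >= cap
    prev = [j if j <= band else INF for j in range(la + 1)]
    for i in range(1, lb + 1):
        cb = b[i - 1]
        lo = max(1, i - band)
        hi = min(la, i + band)
        cur = [INF] * (la + 1)
        if lo == 1:
            cur[0] = i if i <= band else INF
        best = INF
        for j in range(lo, hi + 1):
            c = min(prev[j] + 1, cur[j - 1] + 1,
                    prev[j - 1] + (a[j - 1] != cb))
            cur[j] = c
            if c < best:
                best = c
        if best >= cap:    # the whole row already exceeds the cutoff
            return cap
        prev = cur
    d = prev[la]
    return d if d < cap else cap


# Fuzz: short strings over a tiny alphabet produce many near-misses.
rng = random.Random(0)
mismatches = 0
for _ in range(5000):
    a = "".join(rng.choice("ab_") for _ in range(rng.randint(0, 8)))
    b = "".join(rng.choice("ab_") for _ in range(rng.randint(0, 8)))
    if banded(a, b) != min(reference(a, b), 3):
        mismatches += 1
print(mismatches)
```

The band is valid because a cell with `|i - j| >= cap` needs at least `cap` insertions or deletions, so it can never contribute to a distance below the cutoff. The comparison clamps the reference with `min(..., cap)` because the banded function only promises exact results below `cap`.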
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: pureshellcheck
3
- Version: 0.1.0
3
+ Version: 0.2.0
4
4
  Summary: A pure Python reimplementation of ShellCheck's most common checks
5
5
  Author: adam2go
6
6
  License: MIT
@@ -54,10 +54,10 @@ rm -rf $BUILD_DIR/*
54
54
  `shellcheck-py` just download the 30 MB Haskell binary — useless in
55
55
  WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
56
56
  ~7000 lines of stdlib-only Python.
57
- - **In-process speed.** Calling `pureshellcheck.check()` takes ~2 ms for a
58
- typical script vs ~40 ms to spawn the shellcheck binary and it's
59
- 8–13× faster than the binary even on 1200-line scripts (see
60
- [Benchmarks](#benchmarks)).
57
+ - **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
58
+ a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
59
+ ~33× faster than the binary even on 1200-line scripts; one-line snippets
60
+ check in ~40 µs (see [Benchmarks](#benchmarks)).
61
61
  - **Verified against the real thing.** Test cases are extracted from
62
62
  ShellCheck's own test suite and the output is differentially tested
63
63
  against the shellcheck binary on real-world scripts.
@@ -136,19 +136,45 @@ the bash AST if you want to build your own analyses.
136
136
 
137
137
  ## Benchmarks
138
138
 
139
- Measured with `python tools/bench.py` (median of 7 runs, after verifying
140
- both tools report identical findings on the workload; CPython 3.12,
141
- shellcheck 0.11.0, Apple Silicon):
139
+ All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
140
+ experiments, each repeated in 3 independent sessions; medians shown
141
+ (session-to-session spread was < 4% everywhere). In both experiments the
142
+ findings are verified identical **before** any timing.
143
+
144
+ **vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
145
+ per session; both tools timed in the same session):
142
146
 
143
147
  | workload | shellcheck | pureshellcheck | speedup |
144
148
  |---|---|---|---|
145
- | CLI, brew.sh (1216 lines) | 604 ms | 68 ms | **8.9×** |
146
- | embedded `check()`, brew.sh | 604 ms | 45 ms | **13.3×** |
147
- | CLI, 75-line script | 42 ms | 24 ms | 1.8× |
148
- | embedded `check()`, 75-line script | 42 ms | 2.4 ms | **17×** |
149
+ | embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
150
+ | embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
151
+ | embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
152
+ | CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
153
+ | CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
149
154
 
150
155
  The embedded rows are what an agent or editor integration pays per call:
151
- no process spawn, no binary.
156
+ no process spawn, no binary. A one-line snippet checks in **~40 µs**
157
+ (~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
158
+ time is dominated by CPython interpreter startup (~20 ms).
159
+
160
+ **v0.2.0 vs v0.1.0** (controlled before/after,
161
+ `python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
162
+ the same interpreter, 25–200 in-process repeats, outputs verified
163
+ identical on every workload):
164
+
165
+ | workload | v0.1.0 | v0.2.0 | improvement |
166
+ |---|---|---|---|
167
+ | tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
168
+ | small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
169
+ | medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
170
+ | large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
171
+
172
+ The v0.2.0 speedups came from caching the AST child/parent structure and a
173
+ document-order node table (one traversal instead of dozens), making
174
+ variable states immutable tuples so branch snapshots are plain dict
175
+ copies, a banded Levenshtein for SC2153 (fuzz-tested against the
176
+ reference implementation on 20,000 random pairs), and memoizing repeated
177
+ word/command resolution.
152
178
 
153
179
  ## Compatibility notes
154
180
 
File without changes
File without changes