pureshellcheck 0.1.0 (tar.gz) → 0.2.0 (tar.gz)

Files changed (24)

{pureshellcheck-0.1.0/src/pureshellcheck.egg-info → pureshellcheck-0.2.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pureshellcheck
-Version: 0.1.0
+Version: 0.2.0
 Summary: A pure Python reimplementation of ShellCheck's most common checks
 Author: adam2go
 License: MIT
@@ -54,10 +54,10 @@ rm -rf $BUILD_DIR/*
   `shellcheck-py` just download the 30 MB Haskell binary — useless in
   WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
   ~7000 lines of stdlib-only Python.
-- **In-process speed.** Calling `pureshellcheck.check()` takes ~2 ms for a
-  typical script vs ~40 ms to spawn the shellcheck binary — and it's
-  8–13× faster than the binary even on 1200-line scripts (see
-  [Benchmarks](#benchmarks)).
+- **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
+  a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
+  ~33× faster than the binary even on 1200-line scripts; one-line snippets
+  check in ~40 µs (see [Benchmarks](#benchmarks)).
 - **Verified against the real thing.** Test cases are extracted from
   ShellCheck's own test suite and the output is differentially tested
   against the shellcheck binary on real-world scripts.
@@ -136,19 +136,45 @@ the bash AST if you want to build your own analyses.
 ## Benchmarks
-Measured with `python tools/bench.py` (median of 7 runs, after verifying
-both tools report identical findings on the workload; CPython 3.12,
-shellcheck 0.11.0, Apple Silicon):
+All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
+experiments, each repeated in 3 independent sessions; medians shown
+(session-to-session spread was < 4% everywhere). In both experiments the
+findings are verified identical **before** any timing.
+**vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
+per session; both tools timed in the same session):
 | workload | shellcheck | pureshellcheck | speedup |
 |---|---|---|---|
-| CLI, brew.sh (1216 lines) | 604 ms | 68 ms | **8.9×** |
-| embedded `check()`, brew.sh | 604 ms | 45 ms | **13.3×** |
-| CLI, 75-line script | 42 ms | 24 ms | 1.8× |
-| embedded `check()`, 75-line script | 42 ms | 2.4 ms | **17×** |
+| embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
+| embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
+| embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
+| CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
+| CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
 The embedded rows are what an agent or editor integration pays per call:
-no process spawn, no binary.
+no process spawn, no binary. A one-line snippet checks in **~40 µs**
+(~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
+time is dominated by CPython interpreter startup (~20 ms).
+**v0.2.0 vs v0.1.0** (controlled before/after,
+`python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
+the same interpreter, 25–200 in-process repeats, outputs verified
+identical on every workload):
+| workload | v0.1.0 | v0.2.0 | improvement |
+|---|---|---|---|
+| tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
+| small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
+| medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
+| large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
+The v0.2.0 speedups came from caching the AST child/parent structure and a
+document-order node table (one traversal instead of dozens), making
+variable states immutable tuples so branch snapshots are plain dict
+copies, a banded Levenshtein for SC2153 (fuzz-tested against the
+reference implementation on 20,000 random pairs), and memoizing repeated
+word/command resolution.
 ## Compatibility notes

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/README.md RENAMED

@@ -27,10 +27,10 @@ rm -rf $BUILD_DIR/*
   `shellcheck-py` just download the 30 MB Haskell binary — useless in
   WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
   ~7000 lines of stdlib-only Python.
-- **In-process speed.** Calling `pureshellcheck.check()` takes ~2 ms for a
-  typical script vs ~40 ms to spawn the shellcheck binary — and it's
-  8–13× faster than the binary even on 1200-line scripts (see
-  [Benchmarks](#benchmarks)).
+- **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
+  a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
+  ~33× faster than the binary even on 1200-line scripts; one-line snippets
+  check in ~40 µs (see [Benchmarks](#benchmarks)).
 - **Verified against the real thing.** Test cases are extracted from
   ShellCheck's own test suite and the output is differentially tested
   against the shellcheck binary on real-world scripts.
@@ -109,19 +109,45 @@ the bash AST if you want to build your own analyses.
 ## Benchmarks
-Measured with `python tools/bench.py` (median of 7 runs, after verifying
-both tools report identical findings on the workload; CPython 3.12,
-shellcheck 0.11.0, Apple Silicon):
+All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
+experiments, each repeated in 3 independent sessions; medians shown
+(session-to-session spread was < 4% everywhere). In both experiments the
+findings are verified identical **before** any timing.
+**vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
+per session; both tools timed in the same session):
 | workload | shellcheck | pureshellcheck | speedup |
 |---|---|---|---|
-| CLI, brew.sh (1216 lines) | 604 ms | 68 ms | **8.9×** |
-| embedded `check()`, brew.sh | 604 ms | 45 ms | **13.3×** |
-| CLI, 75-line script | 42 ms | 24 ms | 1.8× |
-| embedded `check()`, 75-line script | 42 ms | 2.4 ms | **17×** |
+| embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
+| embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
+| embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
+| CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
+| CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
 The embedded rows are what an agent or editor integration pays per call:
-no process spawn, no binary.
+no process spawn, no binary. A one-line snippet checks in **~40 µs**
+(~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
+time is dominated by CPython interpreter startup (~20 ms).
+**v0.2.0 vs v0.1.0** (controlled before/after,
+`python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
+the same interpreter, 25–200 in-process repeats, outputs verified
+identical on every workload):
+| workload | v0.1.0 | v0.2.0 | improvement |
+|---|---|---|---|
+| tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
+| small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
+| medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
+| large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
+The v0.2.0 speedups came from caching the AST child/parent structure and a
+document-order node table (one traversal instead of dozens), making
+variable states immutable tuples so branch snapshots are plain dict
+copies, a banded Levenshtein for SC2153 (fuzz-tested against the
+reference implementation on 20,000 random pairs), and memoizing repeated
+word/command resolution.
 ## Compatibility notes

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "pureshellcheck"
-version = "0.1.0"
+version = "0.2.0"
 description = "A pure Python reimplementation of ShellCheck's most common checks"
 readme = "README.md"
 requires-python = ">=3.9"

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/__init__.py RENAMED

@@ -6,7 +6,7 @@ common checks.
 ...     print(finding.line, finding.column, finding.code, finding.message)
 """
-__version__ = "0.1.0"
+__version__ = "0.2.0"
 from .analyzer import Finding, run_checks  # noqa: F401
 from .parser import ParseError, parse  # noqa: F401

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/analyzer.py RENAMED

@@ -81,6 +81,7 @@ class Context:
         self.positions = Positions(source)
         self.findings = []
         self.cache = {}
+        self.nodes = None  # doc-order node list, set by run_checks
     # -- emission ------------------------------------------------------
@@ -125,6 +126,14 @@ class Context:
     def command_resolution(self, cmd):
         """(name_word, index, wrapper_names) after skipping wrappers."""
+        cached = cmd.fields.get("_cmdres")
+        if cached is not None:
+            return cached
+        result = self._command_resolution(cmd)
+        cmd.fields["_cmdres"] = result
+        return result
+    def _command_resolution(self, cmd):
         if cmd.kind != "T_SimpleCommand" or not cmd.words:
             return None, -1, []
         words = cmd.words
@@ -279,13 +288,24 @@ class Context:
         return "other", None
-def statement_lists(root):
+STATEMENT_CONTAINER_KINDS = frozenset({
+    "T_Script", "T_BraceGroup", "T_Subshell", "T_WhileExpression",
+    "T_UntilExpression", "T_ForIn", "T_ForArithmetic", "T_SelectIn",
+    "T_IfExpression", "T_CaseItem", "T_DollarExpansion", "T_Backticked",
+    "T_ProcSub", "T_DollarBraceCommandExpansion", "T_CoProc", "T_BatsTest",
+})
+def statement_lists(nodes):
     """Yield every list of statement nodes in the tree."""
-    for node in walk(root):
+    container = STATEMENT_CONTAINER_KINDS
+    for node in nodes:
+        if node.kind not in container:
+            continue
         f = node.fields
-        for key in ("commands", "body", "condition", "else_body", "init"):
+        for key in ("commands", "body", "condition", "else_body"):
             v = f.get(key)
-            if isinstance(v, list) and v and isinstance(v[0], object):
+            if type(v) is list and v:
                 yield v
         branches = f.get("branches")
         if branches:
@@ -294,14 +314,14 @@ def statement_lists(root):
                 yield body
-def apply_directives(findings, directives, root, source, positions):
+def apply_directives(findings, directives, nodes, source, positions):
     """Filter findings according to `# shellcheck disable=` directives."""
     if not directives:
         return findings
     # statements eligible as directive targets, sorted by position
     statements = []
     seen = set()
-    for lst in statement_lists(root):
+    for lst in statement_lists(nodes):
         for node in lst:
             if id(node) not in seen:
                 seen.add(id(node))
@@ -372,7 +392,7 @@ def run_checks(source, shell=None, include_optional=False,
                     min(e.pos + 1, len(source)))
         f.locate(Positions(source))
         return [f], e
-    set_parents(root)
+    nodes = set_parents(root)
     detected = shell_from_shebang(root.get("shebang"))
     directives = parser.directives
@@ -386,16 +406,18 @@ def run_checks(source, shell=None, include_optional=False,
     ctx.detected_shell = detected
     ctx.explicit_shell = shell
     ctx.directives = directives
+    ctx.nodes = nodes
-    for node in walk(root):
-        fns = NODE_CHECKS.get(node.kind)
+    get_checks = NODE_CHECKS.get
+    for node in nodes:
+        fns = get_checks(node.kind)
         if fns:
             for fn in fns:
                 fn(ctx, node)
     for fn in TREE_CHECKS:
         fn(ctx, root)
-    findings = apply_directives(ctx.findings, directives, root, source,
+    findings = apply_directives(ctx.findings, directives, nodes, source,
                                 ctx.positions)
     findings.sort(key=lambda f: (f.pos, f.code))
     for f in findings:

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/astlib.py RENAMED

@@ -1,6 +1,7 @@
 """Shared AST analysis helpers, ported from ShellCheck's ASTLib semantics."""
 import re
+from functools import lru_cache
 from .shast import ancestors, walk
 from .parser import literal_text, quoted_literal_text
@@ -108,6 +109,7 @@ def is_glob_free_literal(text):
 # ----------------------------------------------------------------------
 # ${...} decomposition
+@lru_cache(maxsize=4096)
 def braced_reference(content):
     """The variable name referenced by ${content}."""
     s = content
@@ -119,6 +121,7 @@ def braced_reference(content):
     return m.group(0) if m else s
+@lru_cache(maxsize=4096)
 def braced_modifier(content):
     """Everything after the name/indices in ${content}."""
     s = content
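
The `@lru_cache(maxsize=4096)` decorators in this hunk memoize pure string-to-string helpers, so a `${...}` body that recurs across a script is decomposed once. A minimal sketch of the effect, assuming a hypothetical stand-in `var_name` (the real `braced_reference` body is only partly visible in the diff, so it is not reproduced here):

```python
import re
from functools import lru_cache

# Hypothetical stand-in for a pure helper such as braced_reference().
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@lru_cache(maxsize=4096)
def var_name(content):
    """Return the leading identifier of a ${...} body, or the body itself."""
    m = _NAME_RE.match(content)
    return m.group(0) if m else content


# The same body is looked up three times; only the first call does the work.
for _ in range(3):
    name = var_name("BUILD_DIR:-/tmp")

info = var_name.cache_info()
print(name, info.hits, info.misses)
```

This only works because the argument is a hashable `str` and the function has no side effects; a helper that took a mutable node would need a different caching strategy.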

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/commands.py RENAMED

@@ -277,7 +277,7 @@ def has_set_e(ctx):
     if SET_E_SHEBANG_RE.search(shebang):
         result = True
     else:
-        for node in walk(ctx.root):
+        for node in ctx.nodes:
             if node.kind != "T_SimpleCommand" or not node.words:
                 continue
             if first_word_basename(node) != "set":

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/misc.py RENAMED

@@ -94,7 +94,7 @@ def _has_execfail(ctx):
         if ctx.shell in ("sh", "dash", "ash"):
             ctx.cache["execfail"] = False
             return False
-        for n in walk(ctx.root):
+        for n in ctx.nodes:
             if n.kind == "T_SimpleCommand" and \
                     first_word_basename(n) == "shopt":
                 if any(word_approx(w) == "execfail" for w in n.words[1:]):

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/checks/variables.py RENAMED

@@ -54,7 +54,7 @@ COMMON_COMMANDS_HINT = frozenset({
 def get_varscan(ctx):
     scan = ctx.cache.get("varscan")
     if scan is None:
-        scan = VarScan(ctx.root, ctx.shell)
+        scan = VarScan(ctx.root, ctx.shell, nodes=ctx.nodes)
         ctx.cache["varscan"] = scan
     return scan
@@ -196,21 +196,21 @@ def check_arithmetic_deref(ctx, node):
               " variables.")
-@tree_check
-def check_assignment_index_deref(ctx, root):
+@node_check("T_Assignment")
+def check_assignment_index_deref(ctx, node):
     # a[$i]=foo for indexed arrays
+    indices = node.get("indices")
+    if not indices:
+        return
     scan = get_varscan(ctx)
-    for node in walk(root):
-        if node.kind != "T_Assignment":
-            continue
-        if node.name in scan.assoc_arrays:
-            continue
-        for idx in node.get("indices", ()):
-            if isinstance(idx, str) \
-                    and re.fullmatch(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?",
-                                     idx.strip()):
-                ctx.style(node, 2004, "$/${} is unnecessary on arithmetic"
-                          " variables.")
+    if node.name in scan.assoc_arrays:
+        return
+    for idx in indices:
+        if isinstance(idx, str) \
+                and re.fullmatch(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?",
+                                 idx.strip()):
+            ctx.style(node, 2004, "$/${} is unnecessary on arithmetic"
+                      " variables.")
 def _in_let(node):

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/parser.py RENAMED

@@ -1023,7 +1023,15 @@ class Parser:
         start = self.i
         parts = []
         src, n = self.src, self.n
+        run_match = UNQUOTED_RUN.match
         while self.i < n:
+            m = run_match(src, self.i)
+            if m is not None:
+                parts.append(Node("T_Literal", self.i, m.end(),
+                                  text=m.group(0)))
+                self.i = m.end()
+                if self.i >= n:
+                    break
             c = src[self.i]
             if c in METACHARS or c in stop_chars:
                 if c in "<>" and src[self.i + 1:self.i + 2] == "(" \
@@ -1917,13 +1925,21 @@ def literal_text(word):
     """The literal string of a word made only of literal parts, else None."""
     if word is None or word.kind != "T_NormalWord":
         return None
+    fields = word.fields
+    cached = fields.get("_lit", False)
+    if cached is not False:
+        return cached
     out = []
+    result = None
     for p in word.parts:
         if p.kind == "T_Literal":
             out.append(p.text)
         else:
-            return None
-    return "".join(out)
+            break
+    else:
+        result = "".join(out)
+    fields["_lit"] = result
+    return result
 def heredoc_delimiter(word):
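
The `literal_text` change caches its result on the node, and because `None` is a legitimate answer ("not a pure literal"), the lookup needs a sentinel other than `None` to mean "not computed yet"; the diff uses `False`. A minimal sketch of the same pattern, assuming hypothetical `(kind, text)` tuples in place of real `Node` parts and a dedicated sentinel object:

```python
_MISSING = object()  # hypothetical sentinel; the diff uses False instead

calls = []  # records how often the slow path actually runs


def cached_literal(fields, parts):
    """Join parts if all are T_Literal, else None; cache either outcome."""
    cached = fields.get("_lit", _MISSING)
    if cached is not _MISSING:
        return cached
    calls.append(1)
    result = None
    for kind, _text in parts:
        if kind != "T_Literal":
            break
    else:
        # Loop finished without break: every part was a literal.
        result = "".join(text for _kind, text in parts)
    fields["_lit"] = result
    return result


mixed_fields = {}
mixed = [("T_Literal", "ls"), ("T_DollarBraced", "x")]
first = cached_literal(mixed_fields, mixed)
second = cached_literal(mixed_fields, mixed)  # served from the cache

pure_fields = {}
pure = cached_literal(pure_fields, [("T_Literal", "rm"), ("T_Literal", "-rf")])
print(first, second, pure, len(calls))
```

With a plain `fields.get("_lit")` check, the `None` result for the mixed word would be recomputed on every call, which is exactly the case the cache is meant to speed up.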

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/shast.py RENAMED

@@ -13,7 +13,7 @@ import bisect
 class Node:
-    __slots__ = ("kind", "pos", "end", "parent", "fields")
+    __slots__ = ("kind", "pos", "end", "parent", "fields", "children")
     def __init__(self, kind, pos, end, **fields):
         self.kind = kind
@@ -21,6 +21,7 @@ class Node:
         self.end = end
         self.parent = None
         self.fields = fields
+        self.children = None  # filled by set_parents
     def __getattr__(self, name):
         try:
@@ -70,18 +71,56 @@ def iter_children(node):
 def walk(node):
     """Yield node and all descendants in document order."""
     stack = [node]
+    pop = stack.pop
     while stack:
-        n = stack.pop()
+        n = pop()
         yield n
-        children = list(iter_children(n))
-        children.reverse()
-        stack.extend(children)
+        children = n.children
+        if children is None:
+            children = list(iter_children(n))
+        if children:
+            stack.extend(reversed(children))
 def set_parents(root):
-    for n in walk(root):
-        for c in iter_children(n):
+    """Link parents, cache children, and return all nodes in doc order."""
+    nodes = []
+    append = nodes.append
+    stack = [root]
+    pop = stack.pop
+    while stack:
+        n = pop()
+        append(n)
+        children = []
+        add = children.append
+        for value in n.fields.values():
+            tv = type(value)
+            if tv is Node:
+                add(value)
+            elif tv is list:
+                for item in value:
+                    ti = type(item)
+                    if ti is Node:
+                        add(item)
+                    elif ti is list or ti is tuple:
+                        for sub in item:
+                            ts = type(sub)
+                            if ts is Node:
+                                add(sub)
+                            elif ts is list:
+                                for s2 in sub:
+                                    if type(s2) is Node:
+                                        add(s2)
+            elif tv is tuple:
+                for item in value:
+                    if type(item) is Node:
+                        add(item)
+        n.children = children
+        for c in children:
             c.parent = n
+        if children:
+            stack.extend(reversed(children))
+    return nodes
 def ancestors(node):
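
The rewritten `set_parents` does three jobs in one pass: it links parents, caches each node's children, and returns every node in document order so later checks can iterate a flat list instead of re-walking the tree. A simplified sketch of that idea, assuming a hypothetical cut-down `Node` and `index_tree` that only look one level into list-valued fields (the real function also descends into nested lists and tuples):

```python
class Node:
    """Cut-down stand-in for the package's Node (illustrative only)."""
    __slots__ = ("kind", "parent", "children", "fields")

    def __init__(self, kind, **fields):
        self.kind = kind
        self.parent = None
        self.children = None  # filled by index_tree
        self.fields = fields


def index_tree(root):
    """Link parents, cache children, return all nodes in document order."""
    nodes = []
    stack = [root]
    while stack:
        n = stack.pop()
        nodes.append(n)
        children = []
        for value in n.fields.values():
            if type(value) is Node:
                children.append(value)
            elif type(value) is list:
                children.extend(v for v in value if type(v) is Node)
        n.children = children
        for c in children:
            c.parent = n
        # Push in reverse so the first child is popped (visited) first.
        stack.extend(reversed(children))
    return nodes


tree = Node("T_Script", commands=[
    Node("T_SimpleCommand", words=[Node("T_NormalWord"), Node("T_NormalWord")]),
    Node("T_SimpleCommand", words=[Node("T_NormalWord")]),
])
nodes = index_tree(tree)
kinds = [n.kind for n in nodes]
print(kinds)
```

Callers that previously ran `for node in walk(root)` can then loop over `nodes` directly, which is the substitution made throughout the check modules in this release.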

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/varflow.py RENAMED

@@ -22,12 +22,9 @@ EXIT_COMMANDS = {"exit", "return"}
 DECLARING_COMMANDS = {"declare", "typeset", "local", "export", "readonly"}
-class VarInfo:
-    __slots__ = ("status", "integer")
-    def __init__(self, status, integer=False):
-        self.status = status
-        self.integer = integer
+def VarInfo(status, integer=False):
+    """Variable state as an immutable (status, integer) tuple."""
+    return (status, integer)
 class Scope:
@@ -83,21 +80,20 @@ class VarFlow:
                     break
         old = scope.vars.get(name)
         if integer is None:
-            integer = old.integer if old is not None else False
+            integer = old[1] if old is not None else False
         elif self.conditional_depth:
             # attribute only maybe applied: keep the weaker assumption
-            integer = integer and (old.integer if old is not None else False)
+            integer = integer and (old[1] if old is not None else False)
         if integer and status == DIRTY:
             status = CLEAN
         if self.conditional_depth and old is not None:
-            status = merge_status(old.status, status)
+            status = merge_status(old[0], status)
         elif self.conditional_depth and old is None:
             status = DIRTY
-        scope.vars[name] = VarInfo(status, integer)
+        scope.vars[name] = (status, integer)
     def snapshot(self):
-        return [dict((k, VarInfo(v.status, v.integer))
-                     for k, v in s.vars.items()) for s in self.scopes]
+        return [dict(s.vars) for s in self.scopes]
     def restore(self, snap):
         for scope, vars_ in zip(self.scopes, snap):
@@ -107,20 +103,41 @@ class VarFlow:
         """Merge variable states from multiple branches (worst case)."""
         merged = []
         for level in range(len(self.scopes)):
+            dicts = [snap[level] for snap in snaps]
+            if len(dicts) == 2:
+                d1, d2 = dicts
+                if d1 == d2:
+                    merged.append(dict(d1))
+                    continue
+                vars_ = {}
+                for name, i1 in d1.items():
+                    i2 = d2.get(name)
+                    if i2 is None:
+                        vars_[name] = (merge_status(i1[0], DIRTY), False)
+                    elif i1 == i2:
+                        vars_[name] = i1
+                    else:
+                        vars_[name] = (merge_status(i1[0], i2[0]),
+                                       i1[1] and i2[1])
+                for name, i2 in d2.items():
+                    if name not in d1:
+                        vars_[name] = (merge_status(i2[0], DIRTY), False)
+                merged.append(vars_)
+                continue
             allnames = set()
-            for snap in snaps:
-                allnames.update(snap[level])
+            for d in dicts:
+                allnames.update(d)
             vars_ = {}
             for name in allnames:
-                infos = [snap[level].get(name) for snap in snaps]
                 status = None
                 integer = True
-                for info in infos:
-                    s = info.status if info is not None else DIRTY
-                    i = info.integer if info is not None else False
+                for d in dicts:
+                    info = d.get(name)
+                    s = info[0] if info is not None else DIRTY
+                    i = info[1] if info is not None else False
                     status = s if status is None else merge_status(status, s)
                     integer = integer and i
-                vars_[name] = VarInfo(status, integer)
+                vars_[name] = (status, integer)
             merged.append(vars_)
         self.restore(merged)
@@ -136,9 +153,9 @@ class VarFlow:
         info = self.lookup(name)
         if info is None:
             return DIRTY, False
-        if info.integer:
+        if info[1]:
             return CLEAN, True
-        return info.status, False
+        return info[0], False
     def word_status(self, word):
         """SpaceStatus of a word's value (assignment RHS semantics)."""
@@ -249,7 +266,7 @@ class VarFlow:
             if assign.get("append"):
                 old = self.lookup(assign.name)
                 if old is not None:
-                    status = join_status(old.status, status)
+                    status = join_status(old[0], status)
             self.assign(assign.name, status, integer=integer,
                         local=is_local, global_="g" in flags)
             self.on_assign(assign.name, value, assign)
@@ -272,7 +289,7 @@ class VarFlow:
                         self.assign(t, DIRTY, integer=integer)
                     elif integer is not None:
                         old = self.lookup(t)
-                        self.assign(t, old.status if old else EMPTY,
+                        self.assign(t, old[0] if old else EMPTY,
                                     integer=integer)
         elif cmd_name == "read":
             self._apply_read(node)
@@ -514,6 +531,21 @@ class VarFlow:
         if node is None or isinstance(node, str):
             return
         k = node.kind
+        if k == "T_Literal" or k == "T_SingleQuoted" or k == "T_Glob":
+            return
+        if k == "T_NormalWord":
+            for p in node.parts:
+                kp = p.kind
+                if kp != "T_Literal" and kp != "T_SingleQuoted" \
+                        and kp != "T_Glob":
+                    self.visit_word(p)
+            return
+        if k == "T_DoubleQuoted":
+            for p in node.parts:
+                kp = p.kind
+                if kp != "T_Literal":
+                    self.visit_word(p)
+            return
         if k == "T_DollarBraced":
             name = braced_reference(node.content)
             status, integer = self.ref_status(name)
@@ -544,8 +576,11 @@ class VarFlow:
             self.visit_word(op)
             self._apply_redirect_assign(node)
             return
-        from .shast import iter_children
-        for c in iter_children(node):
+        children = node.children
+        if children is None:
+            from .shast import iter_children
+            children = iter_children(node)
+        for c in children:
             self.visit_word(c)
     def visit_arith(self, node):
@@ -576,8 +611,11 @@ class VarFlow:
             for p in node.parts:
                 self.visit_word(p)
             return
-        from .shast import iter_children
-        for c in iter_children(node):
+        children = node.children
+        if children is None:
+            from .shast import iter_children
+            children = iter_children(node)
+        for c in children:
             if c.kind.startswith("TA_"):
                 self.visit_arith(c)
             else:
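
Replacing the `VarInfo` class with a `(status, integer)` tuple is what lets `snapshot()` shrink to `dict(s.vars)`: a shallow copy is a safe snapshot only when the values cannot be mutated in place. A minimal sketch of the difference, assuming hypothetical status strings and a throwaway `MutableInfo` class standing in for the old `VarInfo`:

```python
class MutableInfo:
    """Hypothetical stand-in for the old mutable VarInfo class."""
    def __init__(self, status, integer=False):
        self.status = status
        self.integer = integer


def snapshot(scopes):
    """Shallow-copy each scope dict, as the new snapshot() does."""
    return [dict(s) for s in scopes]


# Immutable tuples: later assignments rebind keys, so the snapshot is stable.
scopes = [{"x": ("clean", False)}]
snap = snapshot(scopes)
scopes[0]["x"] = ("dirty", False)
scopes[0]["y"] = ("clean", True)
tuple_snapshot_intact = snap == [{"x": ("clean", False)}]

# Mutable objects: the shallow copy shares them, so in-place edits leak.
old_scopes = [{"x": MutableInfo("clean")}]
old_snap = snapshot(old_scopes)
old_scopes[0]["x"].status = "dirty"
mutable_snapshot_leaked = old_snap[0]["x"].status == "dirty"

print(tuple_snapshot_intact, mutable_snapshot_leaked)
```

The old code avoided the leak by rebuilding a fresh `VarInfo` for every variable on every snapshot; with tuples that per-variable allocation disappears.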

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0}/src/pureshellcheck/varscan.py RENAMED

@@ -43,19 +43,20 @@ class Assign:
 class VarScan:
-    def __init__(self, root, shell="bash"):
+    def __init__(self, root, shell="bash", nodes=None):
         self.refs = []
         self.assigns = []
         self.assoc_arrays = set()
         self._suppressed = set()
         self.root = root
         self.shell = shell
+        self.nodes = nodes if nodes is not None else list(walk(root))
         self._prescan_assoc()
         self._scan()
     def _prescan_assoc(self):
         """Find `declare -A name` declarations before the main scan."""
-        for node in walk(self.root):
+        for node in self.nodes:
             if node.kind != "T_SimpleCommand" or not node.words:
                 continue
             cmd = literal_text(node.words[0])
@@ -73,12 +74,19 @@ class VarScan:
     # ------------------------------------------------------------------
+    _METHOD_CACHE = {}
     def _scan(self):
-        for node in walk(self.root):
+        methods = self._METHOD_CACHE
+        cls = type(self)
+        for node in self.nodes:
             k = node.kind
-            method = getattr(self, "_scan_" + k, None)
+            method = methods.get(k, False)
+            if method is False:
+                method = getattr(cls, "_scan_" + k, None)
+                methods[k] = method
             if method is not None:
-                method(node)
+                method(self, node)
     def _ref(self, name, node, kind="normal"):
         self.refs.append(Ref(name, node, kind))
@@ -492,20 +500,35 @@ def _flag_arg_attached(flag):
 def levenshtein(a, b, cap=3):
-    """Edit distance, returning `cap` early once it can't be beaten."""
+    """Banded edit distance: O(len * cap), returns `cap` once unbeatable."""
     if a == b:
         return 0
-    if abs(len(a) - len(b)) >= cap:
+    la, lb = len(a), len(b)
+    if abs(la - lb) >= cap:
         return cap
-    if len(a) > len(b):
-        a, b = b, a
-    prev = list(range(len(a) + 1))
-    for i, cb in enumerate(b):
-        cur = [i + 1]
-        for j, ca in enumerate(a):
-            cur.append(min(prev[j + 1] + 1, cur[j] + 1,
-                           prev[j] + (ca != cb)))
-        if min(cur) >= cap:
+    if la > lb:
+        a, b, la, lb = b, a, lb, la
+    if la == 0:
+        return lb if lb < cap else cap
+    band = cap - 1
+    INF = cap + 1
+    prev = [j if j <= band else INF for j in range(la + 1)]
+    for i in range(1, lb + 1):
+        cb = b[i - 1]
+        lo = max(1, i - band)
+        hi = min(la, i + band)
+        cur = [INF] * (la + 1)
+        if lo == 1:
+            cur[0] = i if i <= band else INF
+        best = INF
+        for j in range(lo, hi + 1):
+            c = min(prev[j] + 1, cur[j - 1] + 1,
+                    prev[j - 1] + (a[j - 1] != cb))
+            cur[j] = c
+            if c < best:
+                best = c
+        if best >= cap:
             return cap
         prev = cur
-    return prev[-1]
+    d = prev[la]
+    return d if d < cap else cap
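
The README hunk says the banded Levenshtein was fuzz-tested against a reference implementation. The sketch below shows one way such a differential check could look. The `levenshtein` body is transcribed from the `+` lines above; `reference_distance`, the alphabet, the pair count, and the seed are assumptions for illustration, not the project's actual harness. Because the banded version caps its result, the comparison is against `min(reference, cap)`.

```python
import random


def levenshtein(a, b, cap=3):
    """Banded edit distance, capped at `cap` (transcribed from the diff)."""
    if a == b:
        return 0
    la, lb = len(a), len(b)
    if abs(la - lb) >= cap:
        return cap
    if la > lb:
        a, b, la, lb = b, a, lb, la
    if la == 0:
        return lb if lb < cap else cap
    band = cap - 1
    INF = cap + 1
    prev = [j if j <= band else INF for j in range(la + 1)]
    for i in range(1, lb + 1):
        cb = b[i - 1]
        lo = max(1, i - band)
        hi = min(la, i + band)
        cur = [INF] * (la + 1)
        if lo == 1:
            cur[0] = i if i <= band else INF
        best = INF
        for j in range(lo, hi + 1):
            c = min(prev[j] + 1, cur[j - 1] + 1,
                    prev[j - 1] + (a[j - 1] != cb))
            cur[j] = c
            if c < best:
                best = c
        if best >= cap:
            return cap
        prev = cur
    d = prev[la]
    return d if d < cap else cap


def reference_distance(a, b):
    """Plain full-matrix edit distance (hypothetical reference)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


rng = random.Random(0)  # fixed seed so a failure is reproducible
mismatches = 0
for _ in range(2000):
    a = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
    b = "".join(rng.choice("abc") for _ in range(rng.randint(0, 8)))
    if levenshtein(a, b) != min(reference_distance(a, b), 3):
        mismatches += 1
print(mismatches)
```

A small alphabet and short strings keep most pairs within a few edits of each other, which exercises the in-band cells rather than the early-exit paths.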

{pureshellcheck-0.1.0 → pureshellcheck-0.2.0/src/pureshellcheck.egg-info}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: pureshellcheck
-Version: 0.1.0
+Version: 0.2.0
 Summary: A pure Python reimplementation of ShellCheck's most common checks
 Author: adam2go
 License: MIT
@@ -54,10 +54,10 @@ rm -rf $BUILD_DIR/*
   `shellcheck-py` just download the 30 MB Haskell binary — useless in
   WASM, Lambda layers, or hermetic build sandboxes. pureshellcheck is
   ~7000 lines of stdlib-only Python.
-- **In-process speed.** Calling `pureshellcheck.check()` takes ~2 ms for a
-  typical script vs ~40 ms to spawn the shellcheck binary — and it's
-  8–13× faster than the binary even on 1200-line scripts (see
-  [Benchmarks](#benchmarks)).
+- **In-process speed.** Calling `pureshellcheck.check()` takes ~1.3 ms for
+  a typical script vs ~50 ms to spawn the shellcheck binary (38×), and is
+  ~33× faster than the binary even on 1200-line scripts; one-line snippets
+  check in ~40 µs (see [Benchmarks](#benchmarks)).
 - **Verified against the real thing.** Test cases are extracted from
   ShellCheck's own test suite and the output is differentially tested
   against the shellcheck binary on real-world scripts.
@@ -136,19 +136,45 @@ the bash AST if you want to build your own analyses.
 ## Benchmarks
-Measured with `python tools/bench.py` (median of 7 runs, after verifying
-both tools report identical findings on the workload; CPython 3.12,
-shellcheck 0.11.0, Apple Silicon):
+All numbers: CPython 3.12, shellcheck 0.11.0, Apple Silicon. Two
+experiments, each repeated in 3 independent sessions; medians shown
+(session-to-session spread was < 4% everywhere). In both experiments the
+findings are verified identical **before** any timing.
+**vs the shellcheck binary** (`python tools/bench.py`, median of 9 runs
+per session; both tools timed in the same session):
 | workload | shellcheck | pureshellcheck | speedup |
 |---|---|---|---|
-| CLI, brew.sh (1216 lines) | 604 ms | 68 ms | **8.9×** |
-| embedded `check()`, brew.sh | 604 ms | 45 ms | **13.3×** |
-| CLI, 75-line script | 42 ms | 24 ms | 1.8× |
-| embedded `check()`, 75-line script | 42 ms | 2.4 ms | **17×** |
+| embedded `check()`, brew.sh (1216 lines) | 720 ms | 21.7 ms | **33×** |
+| embedded `check()`, 263-line script | 113 ms | 5.1 ms | **22×** |
+| embedded `check()`, 75-line script | 51 ms | 1.3 ms | **38×** |
+| CLI end-to-end, brew.sh | 720 ms | 51 ms | **14×** |
+| CLI end-to-end, 75-line script | 51 ms | 28 ms | 1.8× |
 The embedded rows are what an agent or editor integration pays per call:
-no process spawn, no binary.
+no process spawn, no binary. A one-line snippet checks in **~40 µs**
+(~25,000 checks/second); throughput on large scripts is ~57k lines/s. CLI
+time is dominated by CPython interpreter startup (~20 ms).
+**v0.2.0 vs v0.1.0** (controlled before/after,
+`python tools/bench_compare.py`: baseline wheel from PyPI vs this tree in
+the same interpreter, 25–200 in-process repeats, outputs verified
+identical on every workload):
+| workload | v0.1.0 | v0.2.0 | improvement |
+|---|---|---|---|
+| tiny (1 line) | 0.061 ms | 0.037 ms | 1.6× |
+| small (75 lines) | 2.62 ms | 1.30 ms | 2.0× |
+| medium (263 lines) | 9.46 ms | 4.81 ms | 2.0× |
+| large (1216 lines) | 48.6 ms | 21.3 ms | **2.3×** |
+The v0.2.0 speedups came from caching the AST child/parent structure and a
+document-order node table (one traversal instead of dozens), making
+variable states immutable tuples so branch snapshots are plain dict
+copies, a banded Levenshtein for SC2153 (fuzz-tested against the
+reference implementation on 20,000 random pairs), and memoizing repeated
+word/command resolution.
 ## Compatibility notes