PyPI - outliner-cli - Versions diffs - 0.3.0__tar.gz → 0.4.0__tar.gz - Mend

outliner-cli 0.3.0__tar.gz → 0.4.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (62)

{outliner_cli-0.3.0 → outliner_cli-0.4.0}/PKG-INFO RENAMED

@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: outliner-cli
-Version: 0.3.0
+Version: 0.4.0
 Summary: Print the structural outline of source files for LLM navigation
 Author: Per Cederberg
 License-Expression: MIT
@@ -13,7 +13,7 @@ Dynamic: license-file
 # outliner
-Print the structural outline of source files — useful declarations and callable
+Print the structural outline of source files — declarations and callable
 landmarks with line ranges — so an LLM agent (or human) can navigate a file
 without reading it whole.
@@ -23,17 +23,20 @@ without reading it whole.
 outliner-cli [OPTIONS] [FILE...]
 ```
-| Option              | Description                                                                   |
-| ------------------- | ----------------------------------------------------------------------------- |
-| `-g, --grep EXPR`   | Only show items whose signature matches EXPR (case-insensitive)               |
-| `-s, --syntax LANG` | Override syntax auto-detection when it is ambiguous                           |
-| `-t, --type LANG`   | Only include files of this language (repeatable, accepts name or extension)   |
-| `-w, --width COLS`  | Truncate output lines to COLS (`0`=unlimited, `auto`=terminal, default=`120`) |
+| Option               | Description                                          |
+| -------------------- | ---------------------------------------------------- |
+| `-g, --grep EXPR`    | Only show items whose signature matches EXPR         |
+| `-s, --syntax LANG`  | Override syntax auto-detection when ambiguous        |
+| `-t, --type LANG`    | Only include files of this language (repeatable)     |
+| `-w, --width COLS`   | Truncate lines (`0`=off, `auto`=fit, default `120`)  |
+| `-x, --exclude GLOB` | Exclude files from directory walks (gitignore-style) |
 Pass a file, a directory (walked recursively), or omit arguments to read stdin.
-Use `-` to read stdin explicitly. `--syntax` is only needed when content
-auto-detection cannot identify the language (e.g. an ambiguous extensionless
-script piped on stdin).
+Use `-` to read stdin explicitly. Directory walks honor `.gitignore` and skip
+hidden directories; all other files are listed, with binary and unrecognized
+files shown as one-line `binary file` / `unsupported file` summaries. `--syntax`
+is only needed when content auto-detection cannot identify the language (e.g. an
+ambiguous extensionless script piped on stdin).
 ## Output
@@ -47,7 +50,7 @@ Each line: `<start>,<count>  <signature>`
 - `start` — 1-based line number, right-aligned
 - `count` — number of lines covered by the item (including doc-comments above)
-- `signature` — first non-comment line of the declaration; multi-line signatures
+- `signature` — first non-comment line of a declaration; multi-line signatures
   are merged into one line; lines longer than the output width are truncated
   with `...`
@@ -80,13 +83,13 @@ uv run pytest
 ## Supported Languages
-AsciiDoc, C/C++, C#, Clojure, Go, HTML, Java, JavaScript/TypeScript,
-JSON/NDJSON, Markdown, Org-mode, Perl, PHP, Python, reStructuredText, Ruby,
-Rust, Scala, Shell, Swift, XML, and Zig.
+AsciiDoc, C/C++, C#, Clojure, Go, HTML, Java, JavaScript/TypeScript (incl.
+Svelte, Vue, and Astro components), JSON/NDJSON, Markdown, Org-mode, Perl, PHP,
+Python, reStructuredText, Ruby, Rust, Scala, Shell, Swift, XML, and Zig.
 ## Example Use Cases
-**Structural overview** — Run on a directory to see all declarations across many
+**Structural overview** — Run on a directory to see declarations across all
 files before reading anything:
 ```
@@ -159,6 +162,6 @@ $ uvx outliner-cli pubmed26n0001.xml
     <MedlineCitation>            elem
       @Status                    attr -- "MEDLINE"
       <Article>                  elem
-        <ArticleTitle>           text -- "Formate assay in body fluids: applica..."
+        <ArticleTitle>           text -- "Formate assay in body fluids..."
         <Abstract>               elem?
 ```
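
The README documents each output line as `<start>,<count>  <signature>`. As a minimal sketch, assuming exactly that shape (a right-aligned start, a comma, the count, two spaces, then the signature), a consumer could turn lines back into inclusive line ranges as below. `parse_outline_line` and `Item` are hypothetical names, not part of the package. Lines in any other shape, which presumably include the one-line `binary file` / `unsupported file` summaries, return `None` instead of being guessed at.

```python
import re
from typing import NamedTuple, Optional

# Assumed from the README: "<start>,<count>  <signature>", start right-aligned.
_LINE_RE = re.compile(r"^\s*(\d+),(\d+)  (.*)$")


class Item(NamedTuple):
    start: int      # 1-based first line
    end: int        # 1-based last line, inclusive
    signature: str  # may keep leading indentation used for nesting


def parse_outline_line(line: str) -> Optional[Item]:
    """Parse one outline line; return None for lines in any other shape."""
    m = _LINE_RE.match(line.rstrip("\n"))
    if not m:
        return None
    start, count = int(m.group(1)), int(m.group(2))
    # count covers the item itself, so the last line is start + count - 1
    return Item(start, start + count - 1, m.group(3))
```

For example, `parse_outline_line("  12,5  def main(argv):")` yields `Item(start=12, end=16, signature='def main(argv):')`, since five lines starting at line 12 end at line 16.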

{outliner_cli-0.3.0 → outliner_cli-0.4.0}/README.md RENAMED

@@ -1,6 +1,6 @@
 # outliner
-Print the structural outline of source files — useful declarations and callable
+Print the structural outline of source files — declarations and callable
 landmarks with line ranges — so an LLM agent (or human) can navigate a file
 without reading it whole.
@@ -10,17 +10,20 @@ without reading it whole.
 outliner-cli [OPTIONS] [FILE...]
 ```
-| Option              | Description                                                                   |
-| ------------------- | ----------------------------------------------------------------------------- |
-| `-g, --grep EXPR`   | Only show items whose signature matches EXPR (case-insensitive)               |
-| `-s, --syntax LANG` | Override syntax auto-detection when it is ambiguous                           |
-| `-t, --type LANG`   | Only include files of this language (repeatable, accepts name or extension)   |
-| `-w, --width COLS`  | Truncate output lines to COLS (`0`=unlimited, `auto`=terminal, default=`120`) |
+| Option               | Description                                          |
+| -------------------- | ---------------------------------------------------- |
+| `-g, --grep EXPR`    | Only show items whose signature matches EXPR         |
+| `-s, --syntax LANG`  | Override syntax auto-detection when ambiguous        |
+| `-t, --type LANG`    | Only include files of this language (repeatable)     |
+| `-w, --width COLS`   | Truncate lines (`0`=off, `auto`=fit, default `120`)  |
+| `-x, --exclude GLOB` | Exclude files from directory walks (gitignore-style) |
 Pass a file, a directory (walked recursively), or omit arguments to read stdin.
-Use `-` to read stdin explicitly. `--syntax` is only needed when content
-auto-detection cannot identify the language (e.g. an ambiguous extensionless
-script piped on stdin).
+Use `-` to read stdin explicitly. Directory walks honor `.gitignore` and skip
+hidden directories; all other files are listed, with binary and unrecognized
+files shown as one-line `binary file` / `unsupported file` summaries. `--syntax`
+is only needed when content auto-detection cannot identify the language (e.g. an
+ambiguous extensionless script piped on stdin).
 ## Output
@@ -34,7 +37,7 @@ Each line: `<start>,<count>  <signature>`
 - `start` — 1-based line number, right-aligned
 - `count` — number of lines covered by the item (including doc-comments above)
-- `signature` — first non-comment line of the declaration; multi-line signatures
+- `signature` — first non-comment line of a declaration; multi-line signatures
   are merged into one line; lines longer than the output width are truncated
   with `...`
@@ -67,13 +70,13 @@ uv run pytest
 ## Supported Languages
-AsciiDoc, C/C++, C#, Clojure, Go, HTML, Java, JavaScript/TypeScript,
-JSON/NDJSON, Markdown, Org-mode, Perl, PHP, Python, reStructuredText, Ruby,
-Rust, Scala, Shell, Swift, XML, and Zig.
+AsciiDoc, C/C++, C#, Clojure, Go, HTML, Java, JavaScript/TypeScript (incl.
+Svelte, Vue, and Astro components), JSON/NDJSON, Markdown, Org-mode, Perl, PHP,
+Python, reStructuredText, Ruby, Rust, Scala, Shell, Swift, XML, and Zig.
 ## Example Use Cases
-**Structural overview** — Run on a directory to see all declarations across many
+**Structural overview** — Run on a directory to see declarations across all
 files before reading anything:
 ```
@@ -146,6 +149,6 @@ $ uvx outliner-cli pubmed26n0001.xml
     <MedlineCitation>            elem
       @Status                    attr -- "MEDLINE"
       <Article>                  elem
-        <ArticleTitle>           text -- "Formate assay in body fluids: applica..."
+        <ArticleTitle>           text -- "Formate assay in body fluids..."
         <Abstract>               elem?
 ```
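
The 0.4.0 README says directory walks honor `.gitignore`, skip hidden directories, and accept `-x, --exclude GLOB` patterns. The sketch below approximates only the hidden-directory and exclude parts with the standard library's `fnmatch`. It is not the package's `_is_ignored` logic: it does not read `.gitignore` files and ignores negation (`!`) and anchored (`/`) patterns. The name `walk_sources` is hypothetical.

```python
from __future__ import annotations

import fnmatch
import os


def walk_sources(root: str, excludes: list[str] | None = None) -> list[str]:
    """List files under root in sorted order, skipping hidden directories and
    anything whose *name* matches an exclude glob (a rough gitignore subset)."""
    pats = list(excludes or [])

    def excluded(name: str, is_dir: bool) -> bool:
        for pat in pats:
            if pat.endswith("/"):
                # gitignore-style "build/" patterns only match directories.
                if is_dir and fnmatch.fnmatchcase(name, pat[:-1]):
                    return True
            elif fnmatch.fnmatchcase(name, pat):
                return True
        return False

    result = []
    for cur, dirs, files in os.walk(root):
        # Assigning to dirs[:] prunes the walk: os.walk won't descend into
        # hidden or excluded directories.
        dirs[:] = sorted(d for d in dirs
                         if not d.startswith(".") and not excluded(d, True))
        for name in sorted(files):
            if not excluded(name, False):
                result.append(os.path.join(cur, name))
    return result
```

Pruning `dirs` in place is what keeps `os.walk` out of skipped directories; sorting it there also makes traversal order deterministic, as the 0.4.0 `_expand_sources` does.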

{outliner_cli-0.3.0 → outliner_cli-0.4.0}/pyproject.toml RENAMED

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "outliner-cli"
-version = "0.3.0"
+version = "0.4.0"
 description = "Print the structural outline of source files for LLM navigation"
 authors = [{name = "Per Cederberg"}]
 license = "MIT"

{outliner_cli-0.3.0 → outliner_cli-0.4.0}/src/outliner/cli.py RENAMED

@@ -8,6 +8,7 @@ import shutil
 import sys
 from outliner.parsers import NAMES, EXTENSIONS, detect, outline, syntax
+from outliner.parsers.util import format_count, format_size
 from outliner.types import OutlineItem
 _TEXT_CONTROLS = "\n\r\t\f\b"
@@ -56,22 +57,27 @@ def _is_ignored(name: str, root: str, gi: dict[str, list[str]], is_dir: bool) ->
     return False
-def _expand_sources(sources: list[str], types: set[str] | None = None) -> list[str]:
+def _expand_sources(
+    sources: list[str],
+    types: set[str] | None = None,
+    excludes: list[str] | None = None,
+) -> list[str]:
     result = []
     for src in sources:
         if src == "-" or not os.path.isdir(src):
             result.append(src)
             continue
-        gi: dict[str, list[str]] = {}
+        # CLI excludes behave like a .gitignore in the walk root
+        gi: dict[str, list[str]] = {os.path.normpath(src): list(excludes)} if excludes else {}
         for root, dirs, files in os.walk(src):
             pats = _load_gitignore(root)
             if pats:
-                gi[root] = pats
-            dirs[:] = sorted(d for d in dirs if not _is_ignored(d, root, gi, True))
+                gi[root] = gi.get(root, []) + pats
+            dirs[:] = sorted(d for d in dirs
+                             if not d.startswith(".") and not _is_ignored(d, root, gi, True))
             for name in sorted(files):
-                match = guess_syntax(name)
-                supported = match and (not types or match in types)
-                if supported and not _is_ignored(name, root, gi, False):
+                wanted = not types or guess_syntax(name) in types
+                if wanted and not _is_ignored(name, root, gi, False):
                     result.append(os.path.join(root, name))
     return result
@@ -96,32 +102,40 @@ def _looks_binary(head: str) -> bool:
     return False
-def _format_size(size_bytes: int) -> str:
-    if size_bytes >= 1_000_000_000:
-        return f"{size_bytes / 1_000_000_000:.1f} GB"
-    if size_bytes >= 1_000_000:
-        return f"{size_bytes / 1_000_000:.1f} MB"
-    if size_bytes >= 1_000:
-        return f"{size_bytes / 1_000:.1f} KB"
-    return f"{size_bytes} B"
+def _unsupported_items(size: int, line_count: int) -> list[OutlineItem]:
+    plural = "s" if line_count != 1 else ""
+    sig = f"{format_size(size)} \u00b7 {format_count(line_count)} line{plural}"
+    return [OutlineItem(locator="unsupported file", signature=sig)]
-def _outline_source(src: str, selected: str | None) -> tuple[list[OutlineItem] | None, str | None]:
+def _outline_source(src: str, selected: str | None) -> tuple[list[OutlineItem] | None, str]:
     if src == "-":
         if selected:
             return outline(selected, sys.stdin), selected
-        text = sys.stdin.read()
-        match = selected or detect(text)
-        return (outline(match, text) if match else None), match
-    with open(src, encoding="utf-8", errors="replace") as fh:
+        text = sys.stdin.read().removeprefix("\ufeff")
+        match = detect(text)
+        if match:
+            return outline(match, text), match
+        if not text.strip():
+            return [], "unsupported"
+        return _unsupported_items(len(text), len(text.splitlines())), "unsupported"
+    with open(src, encoding="utf-8-sig", errors="replace") as fh:
         head = fh.read(4096)
         if _looks_binary(head):
-            size = _format_size(os.path.getsize(src))
+            size = format_size(os.path.getsize(src))
             return [OutlineItem(locator="binary file", signature=size)], "binary"
         match = selected or guess_syntax(src) or detect(head)
-        fh.seek(0)
-        return (outline(match, fh) if match else None), match
+        if match:
+            fh.seek(0)
+            return outline(match, fh), match
+        line_count, tail = head.count("\n"), head
+        while chunk := fh.read(1 << 20):
+            line_count += chunk.count("\n")
+            tail = chunk
+        if tail and not tail.endswith("\n"):
+            line_count += 1
+        return _unsupported_items(os.path.getsize(src), line_count), "unsupported"
 def main(argv: list[str] | None = None) -> int:
@@ -139,6 +153,8 @@ def main(argv: list[str] | None = None) -> int:
                     help="Only include files of this language or extension (repeatable)")
     ap.add_argument("-w", "--width", metavar="COLS", default="120",
                     help="Truncate output lines to COLS (0=unlimited, auto=terminal width, default=120)")
+    ap.add_argument("-x", "--exclude", action="append", metavar="PATTERN",
+                    help="Exclude matching files from directory walks, like .gitignore (repeatable)")
     args = ap.parse_args(argv)
     grep_re: re.Pattern | None = None
@@ -178,31 +194,18 @@ def main(argv: list[str] | None = None) -> int:
     if sources == ["-"] and sys.stdin.isatty():
         ap.print_help()
         return 0
-    sources = _expand_sources(sources, types)
+    sources = _expand_sources(sources, types, args.exclude)
     multi = len(sources) > 1
     exit_code = 0
     for src in sources:
         try:
-            items, match = _outline_source(src, args.syntax)
+            items, _ = _outline_source(src, args.syntax)
         except OSError as exc:
             print(f"outliner: {exc}", file=sys.stderr)
             exit_code = 1
             continue
-        if match is None:
-            print(f"outliner: cannot auto-detect syntax for '{src}'; use --syntax",
-                  file=sys.stderr)
-            exit_code = 2
-            continue
-        if items is None:
-            available = ", ".join(NAMES)
-            print(f"outliner: unsupported syntax '{match}'; available: {available}",
-                  file=sys.stderr)
-            exit_code = 2
-            continue
         output_lines = _format_items(items, grep_re, line_width)
         if output_lines:
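
For files with no recognized syntax, 0.4.0 replaces the old `cannot auto-detect syntax` error with a one-line `unsupported file` summary of size and line count, counting newlines chunk by chunk instead of loading the whole file. A standalone sketch of that counting and formatting follows. `format_size` reuses the decimal thresholds of the `_format_size` helper removed above (the new `parsers.util.format_size` is not shown and may differ), and the thousands separator in the hypothetical `unsupported_summary` is an assumption standing in for the unshown `format_count`.

```python
from typing import TextIO


def count_lines(fh: TextIO, head: str = "", chunk_size: int = 1 << 20) -> int:
    """Count lines in fh; `head` is text already read from it. A final line
    without a trailing newline still counts as a line."""
    line_count, tail = head.count("\n"), head
    # Read fixed-size chunks until read() returns "" at end of file.
    while chunk := fh.read(chunk_size):
        line_count += chunk.count("\n")
        tail = chunk
    if tail and not tail.endswith("\n"):
        line_count += 1
    return line_count


def format_size(size_bytes: int) -> str:
    # Decimal units, mirroring the _format_size helper removed in 0.4.0.
    for limit, unit in ((1_000_000_000, "GB"), (1_000_000, "MB"), (1_000, "KB")):
        if size_bytes >= limit:
            return f"{size_bytes / limit:.1f} {unit}"
    return f"{size_bytes} B"


def unsupported_summary(size: int, line_count: int) -> str:
    plural = "s" if line_count != 1 else ""
    # "{:,}" thousands separators are an assumed stand-in for format_count.
    return f"{format_size(size)} \u00b7 {line_count:,} line{plural}"
```

Counting `\n` and adding one for a non-empty final chunk without a trailing newline means `"a\nb"` and `"a\nb\n"` both report 2 lines.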

{outliner_cli-0.3.0 → outliner_cli-0.4.0}/src/outliner/parsers/__init__.py RENAMED

@@ -57,7 +57,7 @@ def outline(syntax: str, content: str | TextIO) -> list[OutlineItem] | None:
 def _outline_text(mod, content: str) -> list[OutlineItem]:
-    m = _FRONTMATTER_RE.match(content)
+    m = _FRONTMATTER_RE.match(content) if getattr(mod, "STRIP_FRONTMATTER", True) else None
     if not m:
         return list(mod.parse(content))
     offset = m.group(0).count('\n')
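
The `parsers/__init__.py` change lets a parser module opt out of front-matter stripping by defining `STRIP_FRONTMATTER = False`; `getattr` with a default of `True` preserves every existing module's behavior. A minimal sketch of the same check follows. `FRONTMATTER_RE` is a hypothetical YAML-style `---` pattern, since the real `_FRONTMATTER_RE` is not part of this diff, and `split_frontmatter` is an illustrative name. Any object with attributes stands in for a module, for example `types.SimpleNamespace(STRIP_FRONTMATTER=False)`. One plausible motivation is the newly listed Astro support, since Astro components open with a `---`-fenced script block that is code rather than metadata.

```python
import re

# Hypothetical pattern: the real _FRONTMATTER_RE is not shown in this diff.
FRONTMATTER_RE = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)


def split_frontmatter(mod, content: str) -> tuple[int, str]:
    """Return (line_offset, body). Parser modules opt out by defining
    STRIP_FRONTMATTER = False; modules without the attribute default to True."""
    m = FRONTMATTER_RE.match(content) if getattr(mod, "STRIP_FRONTMATTER", True) else None
    if not m:
        return 0, content
    # Number of lines removed, so callers can shift item line numbers back.
    return m.group(0).count("\n"), content[m.end():]
```

Returning the number of stripped lines lets a caller map item positions back to the original file, which appears to be the purpose of the `offset` computed in `_outline_text`.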

{outliner_cli-0.3.0 → outliner_cli-0.4.0}/src/outliner/parsers/html.py RENAMED

@@ -25,10 +25,6 @@ _BORING_EXCERPTS = {
     "advertisement", "close", "menu", "navigation", "open menu",
     "search", "skip advertisement", "skip to content",
 }
-_VOID_TAGS = {
-    "area", "base", "br", "col", "embed", "hr", "img", "input",
-    "link", "meta", "param", "source", "track", "wbr",
-}
 @dataclass
@@ -109,12 +105,10 @@ class _Parser(HTMLParser):
                 depth=self._tag_depth(tag),
             )
             self.nodes.append(node)
-            if tag not in _VOID_TAGS:
-                self._stack.append(node)
-        elif tag in _VOID_TAGS:
-            return
+            self._stack.append(node)
         if _is_heading(tag):
+            self._flush_heading()
             self._heading = _Heading(
                 tag=tag,
                 level=int(tag[1]),
@@ -123,7 +117,7 @@ class _Parser(HTMLParser):
                 base_depth=self._heading_base_depth(),
                 context_key=self._heading_context_key(),
             )
-        elif tag == "title" and self._inside_document_head():
+        elif tag == "title" and not self._inside_content():
             self._title = _Heading(
                 tag=tag,
                 level=0,
@@ -160,95 +154,89 @@ class _Parser(HTMLParser):
         line = self.getpos()[0]
         if self._heading and tag == self._heading.tag:
-            heading = self._heading
-            text = _clean("".join(heading.text_parts))
-            depth = self._heading_depth(heading)
-            self.headings.append((
-                heading.start,
-                heading.start_col,
-                heading.level,
-                f"{'  ' * depth}<{heading.tag}>{text}</{heading.tag}>",
-            ))
-            for node in reversed(self._stack):
-                if node.tag in _LANDMARKS and not node.heading_text:
-                    node.heading_text = text
-                    break
-            self._heading = None
+            self._flush_heading()
         elif self._title and tag == "title":
-            title = self._title
-            text = _clean("".join(title.text_parts))
-            if text:
-                self.titles.append((title.start, title.start_col, OutlineItem(
-                    start=title.start,
-                    count=max(1, line - title.start + 1),
-                    signature=f"{'  ' * title.base_depth}<title>{text}</title>",
-                )))
-            self._title = None
+            self._flush_title(line)
         if tag in _OUTLINE_TAGS:
             for idx in range(len(self._stack) - 1, -1, -1):
-                node = self._stack[idx]
-                if node.tag == tag:
-                    node.end = line
+                if self._stack[idx].tag == tag:
+                    for node in self._stack[idx:]:
+                        node.end = line
                     del self._stack[idx:]
                     break
     def handle_data(self, data: str) -> None:
+        if self._text_skip:
+            return
         if self._heading:
             self._heading.text_parts.append(data)
         elif self._title:
             self._title.text_parts.append(data)
-        elif not self._text_skip:
+        else:
             self._add_text(data)
     def handle_entityref(self, name: str) -> None:
+        if self._text_skip:
+            return
         if self._heading:
             self._heading.text_parts.append(f"&{name};")
         elif self._title:
             self._title.text_parts.append(f"&{name};")
-        elif not self._text_skip:
+        else:
             self._add_text(f"&{name};", glue=True)
     def handle_charref(self, name: str) -> None:
+        if self._text_skip:
+            return
         if self._heading:
             self._heading.text_parts.append(f"&#{name};")
         elif self._title:
             self._title.text_parts.append(f"&#{name};")
-        elif not self._text_skip:
+        else:
             self._add_text(f"&#{name};", glue=True)
     def close(self) -> None:
         super().close()
-        if self._heading:
-            heading = self._heading
-            text = _clean("".join(heading.text_parts))
-            depth = self._heading_depth(heading)
-            self.headings.append((
-                heading.start,
-                heading.start_col,
-                heading.level,
-                f"{'  ' * depth}<{heading.tag}>{text}</{heading.tag}>",
-            ))
-            self._heading = None
-        if self._title:
-            title = self._title
-            text = _clean("".join(title.text_parts))
-            if text:
-                self.titles.append((title.start, title.start_col, OutlineItem(
-                    start=title.start,
-                    count=max(1, self.line_count - title.start + 1),
-                    signature=f"{'  ' * title.base_depth}<title>{text}</title>",
-                )))
-            self._title = None
+        self._flush_heading()
+        self._flush_title(self.line_count)
         for node in self._stack:
             node.end = self.line_count
+    def _flush_heading(self) -> None:
+        heading = self._heading
+        if not heading:
+            return
+        text = _clean("".join(heading.text_parts))
+        depth = self._heading_depth(heading)
+        self.headings.append((
+            heading.start,
+            heading.start_col,
+            heading.level,
+            f"{'  ' * depth}<{heading.tag}>{text}</{heading.tag}>",
+        ))
+        for node in reversed(self._stack):
+            if node.tag in _LANDMARKS and not node.heading_text:
+                node.heading_text = text
+                break
+        self._heading = None
+    def _flush_title(self, end_line: int) -> None:
+        title = self._title
+        if not title:
+            return
+        text = _clean("".join(title.text_parts))
+        if text:
+            self.titles.append((title.start, title.start_col, OutlineItem(
+                start=title.start,
+                count=max(1, end_line - title.start + 1),
+                signature=f"{'  ' * title.base_depth}<title>{text}</title>",
+            )))
+        self._title = None
     def _inside_content(self) -> bool:
         return any(node.tag in _CONTENT_TAGS for node in self._stack)
-    def _inside_document_head(self) -> bool:
-        return any(node.tag == "head" for node in self._stack) and not self._inside_content()
     def _outline_depth(self) -> int:
         return len([
             node for node in self._stack
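
In `html.py`, closing a tag now stamps an end line on every node opened after the matching one, not only on the match itself; in 0.3.0 those inner nodes were dropped from the stack with no end line. A reduced sketch of that stack discipline follows, with a hypothetical `Node` class and `close_tag` function in place of the parser's own types. It presumably also covers elements that never receive an end tag, now that the `_VOID_TAGS` special case is gone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    tag: str
    start: int
    end: Optional[int] = None  # set when the element is closed


def close_tag(stack: list[Node], tag: str, line: int) -> None:
    """Close the innermost open node with this tag; nodes opened inside it
    that are still on the stack are closed on the same line."""
    for idx in range(len(stack) - 1, -1, -1):
        if stack[idx].tag == tag:
            for node in stack[idx:]:
                node.end = line
            del stack[idx:]
            break
```

An end tag with no open match, such as a stray `</aside>`, leaves the stack untouched.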
