PyPI - sourcecrumb - Versions diffs - 0.1.0__tar.gz - Mend

sourcecrumb 0.1.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (51)

sourcecrumb-0.1.0/.beads/.gitignore +48 -0
sourcecrumb-0.1.0/.beads/.jsonl.lock +0 -0
sourcecrumb-0.1.0/.beads/.local_version +1 -0
sourcecrumb-0.1.0/.beads/README.md +81 -0
sourcecrumb-0.1.0/.beads/beads.db +0 -0
sourcecrumb-0.1.0/.beads/beads.db-shm +0 -0
sourcecrumb-0.1.0/.beads/beads.db-wal +0 -0
sourcecrumb-0.1.0/.beads/beads.left.jsonl +42 -0
sourcecrumb-0.1.0/.beads/beads.left.meta.json +1 -0
sourcecrumb-0.1.0/.beads/config.yaml +67 -0
sourcecrumb-0.1.0/.beads/daemon.lock +7 -0
sourcecrumb-0.1.0/.beads/daemon.log +1041 -0
sourcecrumb-0.1.0/.beads/daemon.pid +1 -0
sourcecrumb-0.1.0/.beads/interactions.jsonl +0 -0
sourcecrumb-0.1.0/.beads/issues.jsonl +42 -0
sourcecrumb-0.1.0/.beads/last-touched +1 -0
sourcecrumb-0.1.0/.beads/metadata.json +4 -0
sourcecrumb-0.1.0/.beads/sync-state.json +7 -0
sourcecrumb-0.1.0/.claude/hooks/ruff-fix.sh +1 -0
sourcecrumb-0.1.0/.claude/rules/python.md +1 -0
sourcecrumb-0.1.0/.claude/settings.json +16 -0
sourcecrumb-0.1.0/.claude/settings.local.json +51 -0
sourcecrumb-0.1.0/.gitattributes +3 -0
sourcecrumb-0.1.0/.gitignore +17 -0
sourcecrumb-0.1.0/.python-version +1 -0
sourcecrumb-0.1.0/AGENTS.md +40 -0
sourcecrumb-0.1.0/CLAUDE.md +22 -0
sourcecrumb-0.1.0/PKG-INFO +127 -0
sourcecrumb-0.1.0/README.md +108 -0
sourcecrumb-0.1.0/pyproject.toml +56 -0
sourcecrumb-0.1.0/sourcecrumb/__init__.py +5 -0
sourcecrumb-0.1.0/sourcecrumb/cli.py +120 -0
sourcecrumb-0.1.0/sourcecrumb/discovery.py +134 -0
sourcecrumb-0.1.0/sourcecrumb/graph.py +80 -0
sourcecrumb-0.1.0/sourcecrumb/languages.py +90 -0
sourcecrumb-0.1.0/sourcecrumb/models.py +64 -0
sourcecrumb-0.1.0/sourcecrumb/parsing.py +182 -0
sourcecrumb-0.1.0/sourcecrumb/queries/__init__.py +1 -0
sourcecrumb-0.1.0/sourcecrumb/queries/python.scm +25 -0
sourcecrumb-0.1.0/sourcecrumb/ranking.py +42 -0
sourcecrumb-0.1.0/sourcecrumb/toon.py +123 -0
sourcecrumb-0.1.0/tests/__init__.py +0 -0
sourcecrumb-0.1.0/tests/conftest.py +185 -0
sourcecrumb-0.1.0/tests/test_cli.py +135 -0
sourcecrumb-0.1.0/tests/test_discovery.py +192 -0
sourcecrumb-0.1.0/tests/test_graph.py +116 -0
sourcecrumb-0.1.0/tests/test_languages.py +49 -0
sourcecrumb-0.1.0/tests/test_parsing.py +150 -0
sourcecrumb-0.1.0/tests/test_ranking.py +85 -0
sourcecrumb-0.1.0/tests/test_toon.py +117 -0
sourcecrumb-0.1.0/uv.lock +412 -0

sourcecrumb-0.1.0/.beads/.gitignore ADDED

@@ -0,0 +1,48 @@
+# SQLite databases
+*.db
+*.db?*
+*.db-journal
+*.db-wal
+*.db-shm
+# Daemon runtime files
+daemon.lock
+daemon.log
+daemon.pid
+bd.sock
+sync-state.json
+last-touched
+# Local version tracking (prevents upgrade notification spam after git ops)
+.local_version
+# Legacy database files
+db.sqlite
+bd.db
+# Worktree redirect file (contains relative path to main repo's .beads/)
+# Must not be committed as paths would be wrong in other clones
+redirect
+# Merge artifacts (temporary files from 3-way merge)
+beads.base.jsonl
+beads.base.meta.json
+beads.left.jsonl
+beads.left.meta.json
+beads.right.jsonl
+beads.right.meta.json
+# Sync state (local-only, per-machine)
+# These files are machine-specific and should not be shared across clones
+.sync.lock
+sync_base.jsonl
+export-state/
+# Process semaphore slot files (runtime concurrency limiting)
+sem/
+# NOTE: Do NOT add negation patterns (e.g., !issues.jsonl) here.
+# They would override fork protection in .git/info/exclude, allowing
+# contributors to accidentally commit upstream issue databases.
+# The JSONL files (issues.jsonl, interactions.jsonl) and config files
+# are tracked by git by default since no pattern above ignores them.
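Every pattern above except `export-state/` and `sem/` contains no slash, and git matches such a pattern against a file's name at any depth. The sketch below checks the 18 `.beads/` files shipped in this sdist against those patterns. It uses the standard-library `fnmatch` module as a stand-in for git's wildmatch. That substitution is only an approximation: it is close enough for these simple `*` and `?` patterns, but it is not a full gitignore implementation.

```python
from fnmatch import fnmatchcase

# Slash-free patterns copied from .beads/.gitignore above. Git matches such
# patterns against a name at any depth, so comparing them with each file's
# basename approximates gitignore behavior for plain files.
# (The directory-only patterns export-state/ and sem/ are left out.)
PATTERNS = [
    "*.db", "*.db?*", "*.db-journal", "*.db-wal", "*.db-shm",
    "daemon.lock", "daemon.log", "daemon.pid", "bd.sock",
    "sync-state.json", "last-touched", ".local_version",
    "db.sqlite", "bd.db", "redirect",
    "beads.base.jsonl", "beads.base.meta.json",
    "beads.left.jsonl", "beads.left.meta.json",
    "beads.right.jsonl", "beads.right.meta.json",
    ".sync.lock", "sync_base.jsonl",
]

# The .beads/ files listed in this package's "Files changed" section.
SHIPPED = [
    ".gitignore", ".jsonl.lock", ".local_version", "README.md",
    "beads.db", "beads.db-shm", "beads.db-wal", "beads.left.jsonl",
    "beads.left.meta.json", "config.yaml", "daemon.lock", "daemon.log",
    "daemon.pid", "interactions.jsonl", "issues.jsonl", "last-touched",
    "metadata.json", "sync-state.json",
]


def is_ignored(name: str) -> bool:
    """Return True if any pattern matches the basename (case-sensitive)."""
    return any(fnmatchcase(name, pattern) for pattern in PATTERNS)


ignored = [f for f in SHIPPED if is_ignored(f)]
tracked = [f for f in SHIPPED if not is_ignored(f)]
print("ignored:", ignored)
print("tracked:", tracked)
```

With this approximation, the database, daemon, sync-state, and merge-artifact files come out as ignored. `issues.jsonl`, `interactions.jsonl`, `config.yaml`, `metadata.json`, and `README.md` come out as not ignored. That split agrees with the NOTE at the end of the file.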

sourcecrumb-0.1.0/.beads/.jsonl.lock ADDED

File without changes

sourcecrumb-0.1.0/.beads/.local_version ADDED

@@ -0,0 +1 @@
+0.49.4

sourcecrumb-0.1.0/.beads/README.md ADDED

@@ -0,0 +1,81 @@
+# Beads - AI-Native Issue Tracking
+Welcome to Beads! This repository uses **Beads** for issue tracking - a modern, AI-native tool designed to live directly in your codebase alongside your code.
+## What is Beads?
+Beads is issue tracking that lives in your repo, making it perfect for AI coding agents and developers who want their issues close to their code. No web UI required - everything works through the CLI and integrates seamlessly with git.
+**Learn more:** [github.com/steveyegge/beads](https://github.com/steveyegge/beads)
+## Quick Start
+### Essential Commands
+```bash
+# Create new issues
+bd create "Add user authentication"
+# View all issues
+bd list
+# View issue details
+bd show <issue-id>
+# Update issue status
+bd update <issue-id> --status in_progress
+bd update <issue-id> --status done
+# Sync with git remote
+bd sync
+```
+### Working with Issues
+Issues in Beads are:
+- **Git-native**: Stored in `.beads/issues.jsonl` and synced like code
+- **AI-friendly**: CLI-first design works perfectly with AI coding agents
+- **Branch-aware**: Issues can follow your branch workflow
+- **Always in sync**: Auto-syncs with your commits
+## Why Beads?
+✨ **AI-Native Design**
+- Built specifically for AI-assisted development workflows
+- CLI-first interface works seamlessly with AI coding agents
+- No context switching to web UIs
+🚀 **Developer Focused**
+- Issues live in your repo, right next to your code
+- Works offline, syncs when you push
+- Fast, lightweight, and stays out of your way
+🔧 **Git Integration**
+- Automatic sync with git commits
+- Branch-aware issue tracking
+- Intelligent JSONL merge resolution
+## Get Started with Beads
+Try Beads in your own projects:
+```bash
+# Install Beads
+curl -sSL https://raw.githubusercontent.com/steveyegge/beads/main/scripts/install.sh | bash
+# Initialize in your repo
+bd init
+# Create your first issue
+bd create "Try out Beads"
+```
+## Learn More
+- **Documentation**: [github.com/steveyegge/beads/docs](https://github.com/steveyegge/beads/tree/main/docs)
+- **Quick Start Guide**: Run `bd quickstart`
+- **Examples**: [github.com/steveyegge/beads/examples](https://github.com/steveyegge/beads/tree/main/examples)
+---
+*Beads: Issue tracking that moves at the speed of thought* ⚡
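The README above says issues are stored as git-native records in `.beads/issues.jsonl`. The `beads.left.jsonl` diff further down shows the shape of those records: one JSON object per line, with an `id`, a `status`, and a `dependencies` list whose `parent-child` entries link each task to its epic. Here is a minimal sketch of reading that format with only the standard library. It assumes nothing beyond those fields. The sample records are trimmed copies of ones in the diff. `load_issues` and `children_by_parent` are hypothetical helpers, not part of the `bd` CLI.

```python
import json
from collections import Counter, defaultdict

# Trimmed copies of records from the beads.left.jsonl diff; most fields omitted.
SAMPLE = """\
{"id":"repoguide-3t0","status":"open","priority":2,"issue_type":"epic"}
{"id":"repoguide-3t0.1","status":"closed","priority":2,"issue_type":"task","dependencies":[{"issue_id":"repoguide-3t0.1","depends_on_id":"repoguide-3t0","type":"parent-child"}]}
{"id":"repoguide-3t0.12","status":"open","priority":2,"issue_type":"task","dependencies":[{"issue_id":"repoguide-3t0.12","depends_on_id":"repoguide-3t0","type":"parent-child"}]}
{"id":"repoguide-3t0.16","status":"open","priority":3,"issue_type":"task","dependencies":[{"issue_id":"repoguide-3t0.16","depends_on_id":"repoguide-3t0","type":"parent-child"}]}
"""


def load_issues(text: str) -> list[dict]:
    """Parse JSONL: one JSON object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def children_by_parent(issues: list[dict]) -> dict[str, list[str]]:
    """Map each parent id to the ids declaring a parent-child dependency on it."""
    tree: dict[str, list[str]] = defaultdict(list)
    for issue in issues:
        for dep in issue.get("dependencies", []):
            if dep.get("type") == "parent-child":
                tree[dep["depends_on_id"]].append(issue["id"])
    return dict(tree)


issues = load_issues(SAMPLE)
status_counts = Counter(issue["status"] for issue in issues)
tree = children_by_parent(issues)
print(status_counts)
print(tree)
```

To read a real file, pass `Path(".beads/issues.jsonl").read_text(encoding="utf-8")` to `load_issues` instead of the sample string.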

sourcecrumb-0.1.0/.beads/beads.db ADDED

Binary file

sourcecrumb-0.1.0/.beads/beads.db-shm ADDED

Binary file

sourcecrumb-0.1.0/.beads/beads.db-wal ADDED

Binary file
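SQLite uses the `-wal` and `-shm` suffixes for the write-ahead log and the shared-memory index of a database in WAL journal mode. These are the files that the `*.db-wal` and `*.db-shm` patterns in `.beads/.gitignore` target. The sketch below assumes `beads.db` runs in WAL mode. Both sidecar files are present, which suggests it does, but the package does not say so. The demo uses the standard-library `sqlite3` module on a throwaway database. Its table is invented for the demo and is not the Beads schema.

```python
import os
import sqlite3
import tempfile

# Throwaway demo: put a scratch database into WAL mode and list the files
# SQLite keeps next to it while a connection is open and after it closes.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "beads.db")
    conn = sqlite3.connect(db_path)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    conn.execute("CREATE TABLE demo (id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO demo VALUES ('repoguide-3t0')")
    conn.commit()
    # While the connection is open, the write-ahead log (-wal) and the
    # shared-memory index (-shm) sit beside the main database file.
    open_files = sorted(os.listdir(tmp))
    conn.close()
    # Closing the last connection checkpoints the WAL into beads.db and, by
    # default, deletes the -wal and -shm files.
    closed_files = sorted(os.listdir(tmp))

print(mode, open_files, closed_files)
```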

sourcecrumb-0.1.0/.beads/beads.left.jsonl ADDED

@@ -0,0 +1,42 @@
+{"id":"repoguide-3t0","title":"Review: repoguide package full review (2026-02-11 17:52)","description":"Full code review of all source files in repoguide/ package","status":"open","priority":2,"issue_type":"epic","owner":"loki77@gmail.com","created_at":"2026-02-11T17:52:00.939245-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:52:00.939245-08:00","labels":["code-review"]}
+{"id":"repoguide-3t0.1","title":"Cache freshness check crashes on deleted files","description":"**File**: `/Users/mike/git/repoguide/repoguide/cli.py`\n**Line(s)**: 20-25\n**Description**: `_cache_is_fresh` calls `stat()` on every source file but does not handle the case where a file has been deleted since discovery. If a file listed in `files` no longer exists on disk, `stat()` will raise `FileNotFoundError`, crashing the cache-freshness check.\n\nAdditionally, `files` is a `list[tuple[Path, str]]` but the unpacking on line 25 uses `rel, _` — meaning the first element is the relative path. This is correct, but the function receives `files` from `discover_files` which returns `Path` objects (relative), while the function also accepts a `root` parameter and constructs `root / rel`. If `discover_files` ever returns absolute paths, this would break silently by creating doubled paths.\n\n**Suggested Fix**: Wrap the stat call in a try/except for `OSError`, treating missing files as \"cache is stale\":\n\n```python\ndef _cache_is_fresh(cache: Path, root: Path, files: list[tuple[Path, str]]) -\u003e bool:\n    \"\"\"Check if cache file exists and is newer than all discovered source files.\"\"\"\n    if not cache.is_file():\n        return False\n    cache_mtime = cache.stat().st_mtime\n    try:\n        return all((root / rel).stat().st_mtime \u003c cache_mtime for rel, _ in files)\n    except OSError:\n        return False\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:52:47.899428-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:04:55.059785-08:00","closed_at":"2026-02-11T18:04:55.059785-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.1","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:52:47.903082-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.10","title":"Overly broad exception handler and inconsistent error output in CLI parsing loop","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 93-95\n**Description**: The exception handler in the file-parsing loop uses a bare `except Exception` that catches all exceptions, including programming errors like `TypeError`, `AttributeError`, or `KeyError`. This can silently swallow bugs in the parsing logic and make debugging very difficult. The warning message is also printed via `print()` while all other error output in the CLI uses `typer.echo(..., err=True)`, creating an inconsistency in the error-reporting approach.\n\n**Suggested Fix**: \n1. Narrow the exception type to something more specific (e.g., catch `OSError` for file-read failures and `tree_sitter`-specific exceptions for parse failures).\n2. Use `typer.echo(..., err=True)` instead of `print(..., file=sys.stderr)` for consistency with the rest of the CLI module.\n\n```python\n# Current code:\ntry:\n    tags = extract_tags(abs_path, lang_config)\nexcept Exception as exc:\n    print(f\"Warning: failed to parse {rel_path}: {exc}\", file=sys.stderr)\n    continue\n\n# Suggested fix:\ntry:\n    tags = extract_tags(abs_path, lang_config)\nexcept (OSError, UnicodeDecodeError) as exc:\n    typer.echo(f\"Warning: failed to parse {rel_path}: {exc}\", err=True)\n    continue\n```\n","status":"closed","priority":1,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:24.029135-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:59:03.569187-08:00","closed_at":"2026-02-11T17:59:03.569189-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.10","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:24.030578-08:00","created_by":"Michael Barrett"}],"comments":[{"id":1,"issue_id":"repoguide-3t0.10","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.5 — same broad except Exception issue in cli.py lines 91-95","created_at":"2026-02-12T01:59:13Z"}]}
+{"id":"repoguide-3t0.11","title":"Unrestricted cache file path allows arbitrary file read/write","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 62-67, 83-84, 115-116\n**Description**: The `--cache` option accepts an arbitrary file path with no validation or restriction. The cache file is both read from (line 84, `cache.read_text`) and written to (line 116, `cache.write_text`). While the tool is a local CLI and the user controls the arguments, there is no guard against overwriting important files. If repoguide were integrated into an automated pipeline where the cache path is constructed from untrusted input, this could be used to overwrite arbitrary files.\n\nThe cache freshness check (`_cache_is_fresh`) only verifies modification times, not content integrity. A tampered cache file would be served directly to stdout without any validation.\n\n**Suggested Fix**: Consider restricting the cache path to be within the repository root or a known cache directory. Additionally, validate that cache content appears well-formed before serving it (e.g., starts with expected TOON header).\n\n```python\n# Add cache path validation\nif cache:\n    cache = cache.resolve()\n    cache_parent = cache.parent.resolve()\n    # Optionally restrict to repo root or known locations\n    if not cache_parent.is_relative_to(root):\n        typer.echo(\"Error: cache file must be within the repository root.\", err=True)\n        raise typer.Exit(1)\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:25.637651-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:12:41.456492-08:00","closed_at":"2026-02-11T18:12:41.456492-08:00","close_reason":"Won't fix: local CLI tool where the user controls all arguments. If someone controls your CLI args, they already have code execution. Adding path restrictions would be bad UX — users reasonably want --cache /tmp/map.cache or --cache ~/.cache/repoguide/project.cache.","labels":["code-review","reviewer:security"],"dependencies":[{"issue_id":"repoguide-3t0.11","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:25.639196-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.12","title":"Linear-scan duplicate check and repeated sorting in graph construction","description":"**File**: /Users/mike/git/repoguide/repoguide/graph.py\n**Line(s)**: 39-48\n**Description**: In the reference-matching loop, `sorted(defines.get(tag.name, set()))` is called for every reference tag. If a symbol has many definitions (unlikely but possible), this sort is repeated each time the symbol is referenced. More importantly, the duplicate-symbol check on line 47 (`if tag.name not in edge_symbols[(fi.path, def_file)]`) performs a linear scan of a list for each edge, making it O(n) per check. If many references resolve to the same file pair, this grows quadratically.\n\nAdditionally, line 46 adds an edge to the `MultiDiGraph` unconditionally, meaning duplicate parallel edges for the same symbol between the same two files are added. This inflates the graph with redundant edges that do not contribute to the PageRank calculation, increasing memory usage and slowing down PageRank.\n\n**Suggested Fix**: \n1. Pre-sort the `defines` dict values once rather than sorting on every access.\n2. Use a set for `edge_symbols` values instead of a list to make the duplicate check O(1).\n3. Track seen edges and avoid adding duplicate edges to the graph.\n\n```python\n# Pre-sort definitions once\nsorted_defines: dict[str, list[Path]] = {\n    name: sorted(paths) for name, paths in defines.items()\n}\n\nedge_symbols: dict[tuple[Path, Path], set[str]] = defaultdict(set)\n\nfor fi in file_infos:\n    for tag in fi.tags:\n        if tag.kind != TagKind.REFERENCE:\n            continue\n        for def_file in sorted_defines.get(tag.name, []):\n            if def_file == fi.path:\n                continue\n            edge_key = (fi.path, def_file)\n            if tag.name not in edge_symbols[edge_key]:\n                graph.add_edge(fi.path, def_file, symbol=tag.name)\n                edge_symbols[edge_key].add(tag.name)\n\ndependencies = [\n    Dependency(source=src, target=tgt, symbols=sorted(syms))\n    for (src, tgt), syms in edge_symbols.items()\n]\n```\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:26.69063-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:53:26.69063-08:00","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.12","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:26.692162-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.13","title":"Discovery traverses all files in skip dirs before filtering","description":"**File**: `/Users/mike/git/repoguide/repoguide/discovery.py`\n**Line(s)**: 57\n**Description**: `root.rglob(\"*\")` traverses the entire directory tree including all files in skipped directories (like `node_modules`, `.git`, `__pycache__`) before the filtering happens on lines 63-64. For large repositories with big `node_modules` or `.git` directories, this is a significant performance bottleneck because Python must enumerate every file path before the filter discards it.\n\nThis is particularly problematic because `.git` can contain thousands of pack files and objects, and `node_modules` in JavaScript repos can easily contain 100k+ files.\n\n**Suggested Fix**: Use `os.walk()` with `topdown=True` and modify `dirs` in-place to prune skipped directories before descending, then filter individual files:\n\n```python\nimport os\n\nresults: list[tuple[Path, str]] = []\nfor dirpath, dirnames, filenames in os.walk(root, topdown=True):\n    rel_dir = Path(dirpath).relative_to(root)\n    \n    # Prune skip dirs and hidden dirs in-place\n    dirnames[:] = [\n        d for d in dirnames\n        if d not in SKIP_DIRS and not d.startswith(\".\")\n    ]\n    \n    for filename in filenames:\n        if filename.startswith(\".\"):\n            continue\n        rel = rel_dir / filename\n        # ... rest of filtering\n```\n","status":"closed","priority":1,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:36.335338-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:59:36.888234-08:00","closed_at":"2026-02-11T17:59:36.888239-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.13","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:36.33699-08:00","created_by":"Michael Barrett"}],"comments":[{"id":3,"issue_id":"repoguide-3t0.13","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.2 -- same rglob traversal issue in discovery.py","created_at":"2026-02-12T01:59:36Z"}]}
+{"id":"repoguide-3t0.14","title":"Redundant EXTENSION_MAP duplicates data already in LANGUAGES registry","description":"**File**: /Users/mike/git/repoguide/repoguide/languages.py\n**Line(s)**: 15-17\n**Description**: `EXTENSION_MAP` is a separate mapping from extensions to language names, but `TreeSitterLanguage` already stores `extensions` as a tuple. This creates a data duplication problem: when adding a new language, the developer must update both `EXTENSION_MAP` and `LANGUAGES`, and they can easily become out of sync. The `EXTENSION_MAP` is redundant because the same information can be derived from the `LANGUAGES` dictionary.\n\n**Suggested Fix**: Derive `EXTENSION_MAP` automatically from the `LANGUAGES` registry to eliminate the duplication.\n\n```python\nLANGUAGES: dict[str, TreeSitterLanguage] = {\n    \"python\": TreeSitterLanguage(name=\"python\", extensions=(\".py\",)),\n}\n\n# Derive the extension map from the languages registry\nEXTENSION_MAP: dict[str, str] = {\n    ext: lang.name\n    for lang in LANGUAGES.values()\n    for ext in lang.extensions\n}\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:51.552794-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:59:44.80578-08:00","closed_at":"2026-02-11T17:59:44.805784-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.14","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:51.554334-08:00","created_by":"Michael Barrett"}],"comments":[{"id":4,"issue_id":"repoguide-3t0.14","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.39 -- same EXTENSION_MAP duplication issue in languages.py","created_at":"2026-02-12T01:59:44Z"}]}
+{"id":"repoguide-3t0.15","title":"Absolute filesystem path disclosure in TOON output","description":"**File**: /Users/mike/git/repoguide/repoguide/toon.py\n**Line(s)**: 26\n**Description**: The TOON encoder outputs the absolute filesystem path of the repository root via `str(repo_map.root)` on line 26. This leaks the full directory structure of the user's machine (e.g., `/Users/mike/git/myproject`) into the output. When the output is shared with LLMs, included in documentation, or posted publicly, this reveals the user's home directory, username, and directory layout, which is a minor information disclosure concern.\n\n**Suggested Fix**: Consider omitting the absolute root path from the output, or replacing it with a relative or anonymized representation. At minimum, document that the output contains the absolute root path so users are aware.\n\n```python\n# Current\nparts.append(f\"root: {_encode_value(str(repo_map.root))}\")\n\n# Suggested: use just the directory name or a relative path\nparts.append(f\"root: {_encode_value(repo_map.root.name)}\")\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:20.009883-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:17:46.248976-08:00","closed_at":"2026-02-11T18:17:46.248976-08:00","close_reason":"Closed","labels":["code-review","reviewer:security"],"dependencies":[{"issue_id":"repoguide-3t0.15","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:20.011536-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.16","title":"Regex recompiled on each call in _collapse_whitespace","description":"**File**: /Users/mike/git/repoguide/repoguide/parsing.py\n**Line(s)**: 187-191\n**Description**: The `_collapse_whitespace` function imports `re` inside the function body on every call. While Python caches imported modules, the import lookup still has overhead. More importantly, the regex pattern is recompiled every call via `re.sub(r\"\\s+\", \" \", text)` because no pre-compiled pattern object is used.\n\nThis function is called once per function/method signature, so for large repos it could be invoked thousands of times.\n\n**Suggested Fix**: Move the `import re` to the module top level (it is not there currently) and use a pre-compiled regex pattern at module scope.\n\n```python\n# At module level:\nimport re\n\n_WHITESPACE_RE = re.compile(r\"\\s+\")\n\n# In the function:\ndef _collapse_whitespace(text: str) -\u003e str:\n    \"\"\"Collapse multi-line whitespace into single spaces.\"\"\"\n    return _WHITESPACE_RE.sub(\" \", text)\n```\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:21.299708-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:54:21.299708-08:00","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.16","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:21.302037-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.17","title":"Only root .gitignore is respected, subdirectory gitignores are ignored","description":"**File**: `/Users/mike/git/repoguide/repoguide/discovery.py`\n**Line(s)**: 84-90\n**Description**: `_load_gitignore` only reads the root `.gitignore` file. Git repositories can have `.gitignore` files in subdirectories, a global gitignore (`core.excludesFile`), and `.git/info/exclude`. None of these are respected. This means files that are truly git-ignored (e.g., by a subdirectory `.gitignore`) will still be discovered and parsed.\n\nThis is a design limitation rather than a bug, but it could lead to surprising behavior for users who expect git-standard ignore behavior.\n\n**Suggested Fix**: Consider using `git ls-files` or `git check-ignore` for repos that have a `.git` directory, falling back to the current pathspec-based approach for non-git directories. Alternatively, document this limitation clearly. The `pathspec` library also supports combining multiple ignore files.\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:22.981762-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:28:03.135315-08:00","closed_at":"2026-02-11T18:28:03.135315-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.17","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:22.983162-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.18","title":"Cache freshness check lacks error handling and has incomplete documentation","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 20-25\n**Description**: The `_cache_is_fresh` function accesses `(root / rel).stat().st_mtime` for every file but does not handle `FileNotFoundError` if a file has been deleted since discovery. Since `discover_files` runs first and then `_cache_is_fresh` checks all discovered files, a race condition exists where a file could be deleted between discovery and the staleness check. Additionally, the function lacks a docstring explaining its caching semantics -- e.g., what \"fresh\" means precisely and that it compares modification times.\n\nThe `files` parameter type is `list[tuple[Path, str]]` but the function only uses the first element of each tuple. This is not immediately clear from reading the function signature alone.\n\n**Suggested Fix**: Add error handling for missing files and improve documentation.\n\n```python\ndef _cache_is_fresh(cache: Path, root: Path, files: list[tuple[Path, str]]) -\u003e bool:\n    \"\"\"Check if cache file exists and is newer than all discovered source files.\n\n    A cache is considered fresh when it exists and its modification time\n    is more recent than every file in the discovered list. Files that\n    disappear between discovery and this check cause the cache to be\n    considered stale.\n\n    Args:\n        cache: Path to the cache file.\n        root: Repository root directory.\n        files: List of (relative_path, language_name) tuples from discovery.\n\n    Returns:\n        True if cache is fresh and can be reused.\n    \"\"\"\n    if not cache.is_file():\n        return False\n    cache_mtime = cache.stat().st_mtime\n    for rel, _ in files:\n        try:\n            if (root / rel).stat().st_mtime \u003e= cache_mtime:\n                return False\n        except FileNotFoundError:\n            return False\n    return True\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:25.065081-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:00:21.163657-08:00","closed_at":"2026-02-11T18:00:21.16366-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.18","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:25.066523-08:00","created_by":"Michael Barrett"}],"comments":[{"id":9,"issue_id":"repoguide-3t0.18","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.1 -- same _cache_is_fresh crash on deleted files in cli.py","created_at":"2026-02-12T02:00:20Z"}]}
+{"id":"repoguide-3t0.19","title":"No file size limit on source file parsing","description":"**File**: /Users/mike/git/repoguide/repoguide/parsing.py\n**Line(s)**: 33\n**Description**: The `extract_tags` function reads the entire file into memory with `file_path.read_bytes()` on line 33. There is no file size limit check. If the discovery module finds an extremely large file (e.g., a multi-gigabyte generated file, minified JavaScript bundle, or a large binary that happens to have a `.py` extension), this could cause excessive memory consumption or an `MemoryError`, leading to a denial-of-service condition.\n\nThis is relevant when the tool is used in automated pipelines or CI systems where repository contents may not be fully trusted.\n\n**Suggested Fix**: Add a file size check before reading, skipping files above a reasonable threshold (e.g., 1 MB for source code analysis).\n\n```python\nimport os\n\nMAX_FILE_SIZE = 1_000_000  # 1 MB\n\ndef extract_tags(file_path: Path, language: TreeSitterLanguage) -\u003e list[Tag]:\n    if file_path.stat().st_size \u003e MAX_FILE_SIZE:\n        return []\n    source = file_path.read_bytes()\n    # ... rest of function\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:30.574609-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:00:14.589458-08:00","closed_at":"2026-02-11T18:00:14.58946-08:00","labels":["code-review","reviewer:security"],"dependencies":[{"issue_id":"repoguide-3t0.19","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:30.576085-08:00","created_by":"Michael Barrett"}],"comments":[{"id":8,"issue_id":"repoguide-3t0.19","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.26 -- same file size limit issue in parsing.py","created_at":"2026-02-12T02:00:14Z"}]}
+{"id":"repoguide-3t0.2","title":"Inefficient directory traversal: rglob visits skip dirs before filtering","description":"**File**: /Users/mike/git/repoguide/repoguide/discovery.py\n**Line(s)**: 57-81\n**Description**: `root.rglob(\"*\")` recursively enumerates every file and directory in the tree before any filtering occurs. For large repositories, this eagerly materializes a massive sorted list of all filesystem entries (line 57: `sorted(root.rglob(\"*\"))`). The `sorted()` call forces the entire generator into memory before iteration begins, and entries in skip directories (e.g., `node_modules`, `build`) are visited and stat'd even though they will immediately be discarded.\n\n**Suggested Fix**: Replace `sorted(root.rglob(\"*\"))` with `os.walk()` or a custom `rglob` that prunes skip directories at the directory level, avoiding descending into them entirely. This can yield order-of-magnitude speedups on repos with large `node_modules` or `build` directories.\n\n```python\n# Current approach: visits everything then filters\nfor path in sorted(root.rglob(\"*\")):\n    if not path.is_file():\n        continue\n    rel = path.relative_to(root)\n    if _is_hidden(rel) or _in_skip_dir(rel):\n        continue\n    ...\n\n# Suggested: prune directories early with os.walk\nimport os\n\nresults: list[tuple[Path, str]] = []\nfor dirpath, dirnames, filenames in os.walk(root):\n    rel_dir = Path(dirpath).relative_to(root)\n    # Prune skip dirs and hidden dirs IN PLACE to prevent descent\n    dirnames[:] = [\n        d for d in dirnames\n        if d not in SKIP_DIRS and not d.startswith(\".\")\n    ]\n    for fname in filenames:\n        rel = rel_dir / fname\n        if _is_hidden(rel):\n            continue\n        if gitignore.match_file(str(rel)):\n            continue\n        if extra_spec and extra_spec.match_file(str(rel)):\n            continue\n        lang = language_for_extension(Path(fname).suffix)\n        if lang is None:\n            continue\n        if language_filter and lang.name != language_filter:\n            continue\n        results.append((rel, lang.name))\nresults.sort()\n```\n","status":"closed","priority":1,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:52:48.760509-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:07:54.311973-08:00","closed_at":"2026-02-11T18:07:54.311973-08:00","close_reason":"Closed","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.2","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:52:48.762096-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.20","title":"Symbol name collisions create spurious dependency edges","description":"**File**: `/Users/mike/git/repoguide/repoguide/graph.py`\n**Line(s)**: 27-48\n**Description**: The graph construction has an ambiguity problem. When the same symbol name (e.g., `run`, `get`, `create`) is defined in multiple files, `build_graph` creates edges from the referencing file to **all** files that define that symbol. This is a fundamental limitation because common names like `__init__`, `main`, `setup`, `run`, `get`, `create`, `update`, `delete` will generate many false-positive dependency edges.\n\nFor example, if 10 files all define a `run()` function, any file that calls `run()` will get 9 dependency edges (excluding self), most of which are spurious. This will distort PageRank scores significantly, promoting files with common symbol names.\n\n**Suggested Fix**: Consider scoping symbol resolution more narrowly:\n1. Prefer definitions in the same package/directory\n2. Use import statements to resolve which definition is actually referenced\n3. Weight edges by confidence (direct import = high, same-package = medium, global match = low)\n\nAt minimum, the import reference tags could be used to build an import graph first, and then symbol references could be resolved against that import graph to filter false positives.\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:44.608845-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:54:44.608845-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.20","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:44.610339-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.21","title":"Cache freshness check stats every file in the repository","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 20-25\n**Description**: `_cache_is_fresh` calls `stat()` on every discovered source file to compare modification times, even though it short-circuits with `all()`. For large repositories with thousands of files, this performs thousands of syscalls just to validate the cache. In the worst case (cache is fresh), every single file is stat'd.\n\nThis is inherently O(n) in the number of files and cannot be avoided entirely, but the cost could be reduced by storing a content hash manifest alongside the cache, or by using an early-exit strategy based on the directory mtime.\n\n**Suggested Fix**: This is a low-priority observation. The current implementation is correct and the `all()` lazy evaluation helps when the cache is stale. For very large repos, consider storing a manifest of file paths and their mtimes alongside the cache file to allow batch comparison, or use `os.scandir()` which is faster than `Path.stat()` for bulk stat operations. 
Alternatively, skip the per-file check and just use the git HEAD commit hash as the cache key.\n\n```python\ndef _cache_is_fresh(cache: Path, root: Path, files: list[tuple[Path, str]]) -\u003e bool:\n    \"\"\"Check if cache is fresh using a manifest file.\"\"\"\n    manifest_path = cache.with_suffix(\".manifest\")\n    if not cache.is_file() or not manifest_path.is_file():\n        return False\n    stored = manifest_path.read_text(\"utf-8\").strip()\n    # Compare against current file list hash (much cheaper than stat'ing all files)\n    current = _compute_manifest(root, files)\n    return stored == current\n```\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:46.717346-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:54:46.717346-08:00","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.21","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:46.718815-08:00","created_by":"Michael Barrett"}]}
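The snippet in this issue calls `_compute_manifest` without defining it. One possible definition is sketched below. It assumes the same `(relative_path, language_name)` tuples the issue uses; `compute_manifest`, `cache_is_fresh` and `write_manifest` are hypothetical names.

A digest that includes mtimes still needs one `stat()` per file, so it does not remove the O(n) cost. What it adds is detection of deleted or renamed files, which an mtime-only comparison against the cache misses. Hashing the path list alone would skip the stats, but it would also miss in-place edits.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def compute_manifest(root: Path, files: list[tuple[Path, str]]) -> str:
    """Return a SHA-256 hex digest over every file's path, language, size and mtime."""
    digest = hashlib.sha256()
    for rel_path, lang_name in sorted(files):
        st = (root / rel_path).stat()
        # One record per file; NUL separators keep the fields unambiguous.
        record = f"{rel_path}\0{lang_name}\0{st.st_size}\0{st.st_mtime_ns}\n"
        digest.update(record.encode("utf-8"))
    return digest.hexdigest()


def _manifest_path(cache: Path) -> Path:
    # Store the manifest next to the cache file, e.g. map.toon -> map.toon.manifest.
    return cache.with_name(cache.name + ".manifest")


def cache_is_fresh(cache: Path, root: Path, files: list[tuple[Path, str]]) -> bool:
    """The cache is fresh when its stored manifest matches the current file set."""
    manifest = _manifest_path(cache)
    if not cache.is_file() or not manifest.is_file():
        return False
    return manifest.read_text("utf-8").strip() == compute_manifest(root, files)


def write_manifest(cache: Path, root: Path, files: list[tuple[Path, str]]) -> None:
    """Record the current digest; call this right after rewriting the cache."""
    _manifest_path(cache).write_text(compute_manifest(root, files) + "\n", "utf-8")
```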
+{"id":"repoguide-3t0.22","title":"Fragile check ordering in _encode_value lacks explanatory comments","description":"**File**: /Users/mike/git/repoguide/repoguide/toon.py\n**Line(s)**: 86-113\n**Description**: The `_encode_value` function has subtle ordering dependencies between its checks that are not documented. Specifically:\n1. Leading/trailing whitespace check (line 98) must come before other checks.\n2. The keyword check (line 101) lowercases the value, which could collide with other checks.\n3. The `_LOOKS_NUMERIC` regex match (line 104) returns the value unquoted, but the subsequent `startswith(\"-\")` check (line 110) would also match negative numbers -- this is only safe because `_LOOKS_NUMERIC` is checked first.\n\nThese ordering dependencies are fragile: reordering the checks would introduce bugs, but there are no comments explaining why the order matters. A brief inline comment documenting the check ordering rationale would improve maintainability.\n\n**Suggested Fix**: Add comments explaining the ordering dependencies in `_encode_value`.\n\n```python\ndef _encode_value(value: str) -\u003e str:\n    \"\"\"Encode a single value, quoting if necessary per TOON rules.\"\"\"\n    if not value:\n        return '\"\"'\n\n    # Must check whitespace first: values with leading/trailing spaces always need quoting\n    if value != value.strip():\n        return _quote(value)\n\n    # Quote TOON keywords (true/false/null) to avoid ambiguity\n    if value.lower() in _KEYWORDS:\n        return _quote(value)\n\n    # Numeric literals pass through unquoted (must check before startswith(\"-\")\n    # since negative numbers like \"-3.14\" should not be quoted)\n    if _LOOKS_NUMERIC.match(value):\n        return value\n\n    # Values containing TOON structural characters need quoting\n    if _NEEDS_QUOTING.search(value):\n        return _quote(value)\n\n    # Non-numeric values starting with \"-\" need quoting to avoid parser confusion\n    if 
value.startswith(\"-\"):\n        return _quote(value)\n\n    return value\n```\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:54:49.28756-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:54:49.28756-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.22","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:54:49.28965-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.23","title":"Method name_text mutation leaks across inner loop iterations","description":"**File**: `/Users/mike/git/repoguide/repoguide/parsing.py`\n**Line(s)**: 60-68\n**Description**: When a method is detected inside a class, `name_text` is mutated to `f\"{class_name}.{name_text}\"` (line 68). However, `name_text` was extracted from `name_node` on line 49 and is reused across all capture names in the inner loop (line 51). If a match has multiple capture entries that pass the `_CAPTURE_MAP` check, the second iteration would see the already-modified `name_text` (e.g., `ClassName.method_name`) and could double-prefix it.\n\nIn practice this may not occur with the current Python query file because each match likely produces only one relevant capture besides `@name`, but the code structure is fragile — adding a new query pattern that produces multiple captures per match would trigger the bug.\n\n**Suggested Fix**: Compute the effective name inside the `if` branch or reset `name_text` at the start of each inner loop iteration:\n\n```python\nfor capture_name, nodes in match_dict.items():\n    if capture_name == \"name\":\n        continue\n    if capture_name not in _CAPTURE_MAP:\n        continue\n\n    tag_kind, symbol_kind = _CAPTURE_MAP[capture_name]\n    def_node = nodes[0]\n    effective_name = name_text  # Use original name\n\n    if (\n        tag_kind == TagKind.DEFINITION\n        and symbol_kind == SymbolKind.FUNCTION\n        and _is_method(def_node)\n    ):\n        symbol_kind = SymbolKind.METHOD\n        class_name = _get_enclosing_class_name(def_node)\n        if class_name:\n            effective_name = f\"{class_name}.{name_text}\"\n    # ... 
use effective_name instead of name_text\n```\n","status":"closed","priority":1,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:18.945175-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:04:55.052373-08:00","closed_at":"2026-02-11T18:04:55.052373-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.23","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:18.946799-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.24","title":"CLI main function mixes multiple concerns without clear phase separation","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 87-103\n**Description**: The `main` function in `cli.py` directly orchestrates parsing, graph building, ranking, file selection, and encoding in a single linear block of approximately 30 lines. While this is manageable at the current size, the function mixes several concerns: file parsing with error handling, graph construction, ranking, map assembly, selection, and output/caching. The function has no clear separation between these phases, which will make it harder to test individual phases or add features (e.g., different output formats, progress reporting).\n\nThe `main` function's docstring (\"Generate a repository map and print it to stdout.\") is accurate but minimal for the entry point of the entire tool.\n\n**Suggested Fix**: Consider extracting the parsing loop (lines 87-96) into a separate function like `_parse_files(root, files) -\u003e list[FileInfo]`. 
This would improve testability and make the main function read as a clear pipeline.\n\n```python\ndef _parse_files(\n    root: Path, files: list[tuple[Path, str]]\n) -\u003e list[FileInfo]:\n    \"\"\"Parse discovered files and extract tags, skipping failures.\n\n    Args:\n        root: Repository root directory.\n        files: List of (relative_path, language_name) tuples.\n\n    Returns:\n        List of successfully parsed FileInfo objects.\n    \"\"\"\n    file_infos: list[FileInfo] = []\n    for rel_path, lang_name in files:\n        abs_path = root / rel_path\n        lang_config = LANGUAGES[lang_name]\n        try:\n            tags = extract_tags(abs_path, lang_config)\n        except (OSError, UnicodeDecodeError) as exc:\n            typer.echo(f\"Warning: failed to parse {rel_path}: {exc}\", err=True)\n            continue\n        file_infos.append(FileInfo(path=rel_path, language=lang_name, tags=tags))\n    return file_infos\n```\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:20.72582-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:55:20.72582-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.24","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:20.727819-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.25","title":"Redundant Path-to-string conversions in TOON encoder","description":"**File**: /Users/mike/git/repoguide/repoguide/toon.py\n**Line(s)**: 28-60\n**Description**: The `encode` function builds the output via repeated string concatenation using `\"\\n\".join(parts)` where each `parts` entry is itself built by `\"\\n\".join(lines)` in `_format_tabular`. For large repos with thousands of files and symbols, this creates many intermediate string objects. Additionally, `str(fi.path)` is called multiple times for the same file -- once in the `file_rows` list comprehension (line 28), and again for every definition tag in the symbol_rows loop (line 37). These path-to-string conversions are repeated unnecessarily.\n\n**Suggested Fix**: Pre-compute the string form of each file path once and reuse it. The overall string building pattern with lists and join is already reasonable for Python; the main gain is avoiding redundant `str()` conversions.\n\n```python\ndef encode(repo_map: RepoMap) -\u003e str:\n    parts: list[str] = []\n    parts.append(f\"repo: {_encode_value(repo_map.repo_name)}\")\n    parts.append(f\"root: {_encode_value(str(repo_map.root))}\")\n\n    # Pre-compute string paths once\n    path_strs = {fi.path: str(fi.path) for fi in repo_map.files}\n\n    file_rows = [[path_strs[fi.path], fi.language, f\"{fi.rank:.4f}\"] for fi in repo_map.files]\n    parts.append(_format_tabular(\"files\", [\"path\", \"language\", \"rank\"], file_rows))\n\n    symbol_rows: list[list[str]] = []\n    for fi in repo_map.files:\n        fi_path_str = path_strs[fi.path]\n        for tag in fi.tags:\n            if tag.kind == TagKind.DEFINITION:\n                symbol_rows.append([fi_path_str, tag.name, tag.symbol_kind.value, str(tag.line), tag.signature])\n    ...\n```\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:25.262684-08:00","created_by":"Michael 
Barrett","updated_at":"2026-02-11T17:55:25.262684-08:00","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.25","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:25.264168-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.26","title":"No file size limit in extract_tags allows unbounded memory use","description":"**File**: `/Users/mike/git/repoguide/repoguide/parsing.py`\n**Line(s)**: 33-35\n**Description**: The function reads the file with `file_path.read_bytes()` and then checks `if not source: return []`. An empty bytes object `b\"\"` is falsy, so empty files return early, which is fine. However, there is no size limit or guard against very large files. If a user runs `repoguide` on a repository containing large generated files (e.g., concatenated bundles, SQLite databases with `.py` extension, or minified files), `read_bytes()` will load the entire file into memory, and the tree-sitter parse will attempt to process it, potentially consuming excessive memory and time.\n\n**Suggested Fix**: Add a configurable file size limit:\n\n```python\n_MAX_FILE_SIZE = 1_000_000  # 1 MB\n\ndef extract_tags(file_path: Path, language: TreeSitterLanguage) -\u003e list[Tag]:\n    source = file_path.read_bytes()\n    if not source or len(source) \u003e _MAX_FILE_SIZE:\n        return []\n    # ...\n```\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:30.507861-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:55:30.507861-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.26","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:30.509314-08:00","created_by":"Michael Barrett"}]}
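In this issue's suggested fix, `len(source)` is checked only after `read_bytes()` has already loaded the whole file. That bounds parse time but not memory. The sketch below rejects large files before reading them. It uses a hypothetical `read_source_if_small` helper and the same 1 MB default the issue proposes.

```python
from __future__ import annotations

from pathlib import Path

MAX_FILE_SIZE = 1_000_000  # 1 MB default; should be configurable


def read_source_if_small(file_path: Path, max_size: int = MAX_FILE_SIZE) -> bytes | None:
    """Return the file's bytes, or None if it is empty or larger than max_size.

    Never reads more than max_size + 1 bytes, so an oversized file is not
    loaded into memory even if it grows after the stat() check.
    """
    if file_path.stat().st_size > max_size:  # cheap early reject, no read at all
        return None
    with file_path.open("rb") as fh:
        source = fh.read(max_size + 1)
    if not source or len(source) > max_size:
        return None
    return source
```

With this helper, `extract_tags` would start with `source = read_source_if_small(file_path)` and return `[]` when it gets `None`. The bounded `read(max_size + 1)` keeps the check safe even if a file grows between `stat()` and `open()`.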
+{"id":"repoguide-3t0.27","title":"Deeply nested loop body in extract_tags with subtle name mutation","description":"**File**: /Users/mike/git/repoguide/repoguide/parsing.py\n**Line(s)**: 44-83\n**Description**: The `extract_tags` function contains a deeply nested loop body (lines 51-83) that handles capture-name classification, method detection with class-name prefixing, and signature extraction -- all within a nested `for capture_name, nodes in match_dict.items()` loop. The nesting depth reaches 4+ levels (function -\u003e for matches -\u003e for captures -\u003e if/if/if). Additionally, the local variable `name_text` is mutated inside the inner loop (line 68: `name_text = f\"{class_name}.{name_text}\"`) which means the mutation affects the name for subsequent captures within the same match, introducing a subtle ordering dependency.\n\n**Suggested Fix**: Extract the inner loop body into a helper function like `_process_capture(capture_name, nodes, name_text, file_path) -\u003e Tag | None` to reduce nesting and make the name-mutation semantics explicit.\n\n```python\ndef _process_capture(\n    capture_name: str,\n    nodes: list[Node],\n    name_text: str,\n    file_path: Path,\n    name_node: Node,\n) -\u003e Tag | None:\n    \"\"\"Process a single capture from a tree-sitter query match.\n\n    Args:\n        capture_name: The capture group name (e.g., \"definition.function\").\n        nodes: The captured AST nodes.\n        name_text: The symbol name text.\n        file_path: The source file path.\n        name_node: The name node for line information.\n\n    Returns:\n        A Tag if the capture is recognized, or None otherwise.\n    \"\"\"\n    if capture_name == \"name\" or capture_name not in _CAPTURE_MAP:\n        return None\n\n    tag_kind, symbol_kind = _CAPTURE_MAP[capture_name]\n    def_node = nodes[0]\n\n    if (\n        tag_kind == TagKind.DEFINITION\n        and symbol_kind == SymbolKind.FUNCTION\n        and _is_method(def_node)\n    ):\n        
symbol_kind = SymbolKind.METHOD\n        class_name = _get_enclosing_class_name(def_node)\n        if class_name:\n            name_text = f\"{class_name}.{name_text}\"\n\n    signature = \"\"\n    if tag_kind == TagKind.DEFINITION:\n        signature = _extract_signature(def_node, symbol_kind)\n\n    return Tag(\n        name=name_text,\n        kind=tag_kind,\n        symbol_kind=symbol_kind,\n        line=name_node.start_point[0] + 1,\n        file=file_path,\n        signature=signature,\n    )\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:37.296597-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:00:35.254553-08:00","closed_at":"2026-02-11T18:00:35.254556-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.27","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:37.298263-08:00","created_by":"Michael Barrett"}],"comments":[{"id":11,"issue_id":"repoguide-3t0.27","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.23 -- same name_text mutation issue in extract_tags inner loop; .23 identifies the bug while .27 focuses on readability","created_at":"2026-02-12T02:00:35Z"}]}
+{"id":"repoguide-3t0.28","title":"Sequential file parsing is a bottleneck for large repos","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 87-96\n**Description**: Files are parsed sequentially in a single-threaded loop. Tree-sitter parsing is CPU-bound work, and for repositories with hundreds or thousands of files, this becomes the main bottleneck. Each iteration reads a file from disk, parses it with tree-sitter, and runs the tag query -- all of which could be parallelized.\n\n**Suggested Fix**: Use `concurrent.futures.ProcessPoolExecutor` or `ThreadPoolExecutor` to parse files in parallel. Since tree-sitter parsing releases the GIL during the C-level parse, even a `ThreadPoolExecutor` could provide meaningful speedup. The file I/O is also parallelizable.\n\n```python\nfrom concurrent.futures import ThreadPoolExecutor, as_completed\n\ndef _parse_file(root: Path, rel_path: Path, lang_name: str) -\u003e FileInfo | None:\n    abs_path = root / rel_path\n    lang_config = LANGUAGES[lang_name]\n    try:\n        tags = extract_tags(abs_path, lang_config)\n    except Exception as exc:\n        print(f\"Warning: failed to parse {rel_path}: {exc}\", file=sys.stderr)\n        return None\n    return FileInfo(path=rel_path, language=lang_name, tags=tags)\n\n# In main():\nwith ThreadPoolExecutor() as executor:\n    futures = {\n        executor.submit(_parse_file, root, rel_path, lang_name): rel_path\n        for rel_path, lang_name in files\n    }\n    for future in as_completed(futures):\n        result = future.result()\n        if result is not None:\n            file_infos.append(result)\n```\n\nNote: This requires ensuring `get_parser()` and `get_tag_query()` are thread-safe. 
If using the suggested caching from the languages.py issue, the cache should use a thread-safe mechanism.\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:39.028898-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:55:39.028898-08:00","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.28","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:39.030297-08:00","created_by":"Michael Barrett"}]}
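One detail in the suggested code: `as_completed` yields futures in completion order, so `file_infos` could come out in a different order on each run. `Executor.map` returns results in submission order while still running the calls concurrently.

The sketch below is generic. It uses a hypothetical `parallel_parse` helper, with a `parse_one` callable standing in for the issue's `_parse_file`. Whether threads actually speed up tree-sitter parsing depends on the bindings releasing the GIL, so it is worth confirming with a benchmark before adopting this.

```python
from __future__ import annotations

import sys
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallel_parse(
    items: Iterable[T],
    parse_one: Callable[[T], R],
    max_workers: int | None = None,
) -> list[R]:
    """Apply parse_one to every item on a thread pool, keeping input order.

    Failures are reported to stderr and skipped, mirroring the sequential loop.
    Items whose result is None are dropped as well.
    """

    def safe(item: T) -> R | None:
        try:
            return parse_one(item)
        except Exception as exc:  # keep going if one file fails
            print(f"Warning: failed to parse {item}: {exc}", file=sys.stderr)
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in submission order, unlike as_completed().
        results = list(executor.map(safe, items))
    return [result for result in results if result is not None]
```

With the issue's `_parse_file`, the call would be roughly `parallel_parse(files, lambda item: _parse_file(root, *item))`. Keeping input order makes the result deterministic, which keeps the generated map reproducible between runs, for example when files tie on rank.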
+{"id":"repoguide-3t0.29","title":"Inline import of re module in _collapse_whitespace","description":"**File**: `/Users/mike/git/repoguide/repoguide/parsing.py`\n**Line(s)**: 187-191\n**Description**: The `re` module is imported inside the `_collapse_whitespace` function. This function is called once per function/method definition signature extraction, which means the `import re` statement executes on every call. While Python caches module imports after the first load, the import machinery still has overhead from the `sys.modules` lookup on each call.\n\nMore importantly, this is inconsistent with the project's convention of placing imports at the top of the module. All other imports in this file are at the top level.\n\n**Suggested Fix**: Move `import re` to the top of the file alongside the other imports:\n\n```python\nfrom __future__ import annotations\n\nimport re\nfrom pathlib import Path\n\nfrom tree_sitter import Node, QueryCursor\n# ...\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:40.119174-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:59:58.694561-08:00","closed_at":"2026-02-11T17:59:58.694564-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.29","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:40.120495-08:00","created_by":"Michael Barrett"}],"comments":[{"id":6,"issue_id":"repoguide-3t0.29","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.16 -- same inline re import in _collapse_whitespace","created_at":"2026-02-12T01:59:58Z"}]}
+{"id":"repoguide-3t0.3","title":"Duplicated AST class-detection logic in _is_method and _get_enclosing_class_name","description":"**File**: /Users/mike/git/repoguide/repoguide/parsing.py\n**Line(s)**: 88-138\n**Description**: The `_is_method` function and `_get_enclosing_class_name` function contain duplicated logic for detecting whether a function node is inside a class. Both functions independently traverse the AST parent chain with the same two-branch check (direct parent block-\u003eclass_definition, and decorated_definition-\u003eblock-\u003eclass_definition). If this detection logic changes (e.g., to support nested classes or other decorators), it must be updated in two places.\n\n**Suggested Fix**: Extract the shared \"find enclosing class node\" logic into a single helper function and reuse it in both `_is_method` and `_get_enclosing_class_name`.\n\n```python\n# Extract shared logic into a single helper:\ndef _find_enclosing_class(func_node: Node) -\u003e Node | None:\n    \"\"\"Return the enclosing class_definition node, or None.\"\"\"\n    parent = func_node.parent\n    if parent is None:\n        return None\n    if (\n        parent.type == \"block\"\n        and parent.parent\n        and parent.parent.type == \"class_definition\"\n    ):\n        return parent.parent\n    if parent.type == \"decorated_definition\":\n        grandparent = parent.parent\n        if (\n            grandparent\n            and grandparent.type == \"block\"\n            and grandparent.parent\n            and grandparent.parent.type == \"class_definition\"\n        ):\n            return grandparent.parent\n    return None\n\n\ndef _is_method(func_node: Node) -\u003e bool:\n    \"\"\"Check if a function_definition node is a method (inside a class).\"\"\"\n    return _find_enclosing_class(func_node) is not None\n\n\ndef _get_enclosing_class_name(func_node: Node) -\u003e str | None:\n    \"\"\"Return the name of the enclosing class, or None.\"\"\"\n    class_node = 
_find_enclosing_class(func_node)\n    if class_node is None:\n        return None\n    for child in class_node.children:\n        if child.type == \"identifier\":\n            return child.text.decode(\"utf-8\")\n    return None\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:52:50.214993-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:28:02.965395-08:00","closed_at":"2026-02-11T18:28:02.965395-08:00","close_reason":"Closed","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.3","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:52:50.216583-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.30","title":"Undocumented mutable lifecycle and naming asymmetry in FileInfo/Tag path fields","description":"**File**: /Users/mike/git/repoguide/repoguide/models.py\n**Line(s)**: 38-45\n**Description**: The `FileInfo` dataclass uses a mutable default (`rank: float = 0.0`) and is not frozen, which means instances can be mutated after creation. This is used intentionally in `graph.py:rank_files()` where `fi.rank` is set after construction. However, this mutation pattern is not documented -- the `rank` field has no docstring or comment explaining that it starts at zero and is later populated by the ranking phase. For readers unfamiliar with the pipeline, it is unclear when and why `rank` gets updated.\n\nAdditionally, the `Tag` dataclass (line 27) has a `file` field of type `Path` which stores the absolute path, while `FileInfo.path` stores a relative path. This naming asymmetry (`file` vs `path` for the same concept at different abstraction levels) can confuse readers.\n\n**Suggested Fix**: Add field-level comments documenting the lifecycle of mutable fields, and consider renaming `Tag.file` to `Tag.abs_path` or `FileInfo.path` to `FileInfo.rel_path` to make the distinction explicit.\n\n```python\n@dataclass\nclass FileInfo:\n    \"\"\"Metadata and extracted tags for a single source file.\"\"\"\n\n    path: Path  # Relative to repository root\n    language: str\n    tags: list[Tag] = field(default_factory=list)\n    rank: float = 0.0  # Populated by graph.rank_files() after construction\n\n@dataclass(frozen=True)\nclass Tag:\n    \"\"\"A single symbol occurrence extracted from source code.\"\"\"\n\n    name: str\n    kind: TagKind\n    symbol_kind: SymbolKind\n    line: int\n    file: Path  # Absolute path to source file\n    signature: str = \"\"\n```\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:55:59.919498-08:00","created_by":"Michael 
Barrett","updated_at":"2026-02-11T17:55:59.919498-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.30","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:55:59.921043-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.31","title":"TOON encoder does not quote values containing newlines/control chars","description":"**File**: `/Users/mike/git/repoguide/repoguide/toon.py`\n**Line(s)**: 86-113\n**Description**: `_encode_value` has a logic gap in its quoting heuristic. The function checks for values that look numeric (line 104) and returns them unquoted. However, it does not handle values that start with a digit but are not valid numbers (e.g., `\"3abc\"`, `\"123.456.789\"`). These pass the `_LOOKS_NUMERIC` check but fail the regex and fall through to be returned unquoted, which is correct.\n\nHowever, there is a quoting gap: values consisting only of whitespace (e.g., `\"   \"`) will be caught by the `value != value.strip()` check on line 98, which is correct. But a value like `\"\"` (already handled on line 96) vs a value that is `\"0\"` — the numeric regex matches `\"0\"` and returns it unquoted (line 105), meaning `0` will be interpreted as a number rather than a string by TOON consumers. If the intention is to preserve string types, numeric-looking strings should be quoted.\n\nAdditionally, the function does not handle values containing only spaces (these would be caught by the strip check), but does not handle values with embedded newlines — the `_quote` function handles `\\n` escaping, but `_encode_value` does not check for newlines as a reason to quote. A value like `\"hello\\nworld\"` would pass all checks and be returned unquoted, inserting a raw newline into the TOON output and corrupting the format.\n\n**Suggested Fix**: Add an explicit check for control characters (newlines, tabs, etc.) that require quoting:\n\n```python\ndef _encode_value(value: str) -\u003e str:\n    if not value:\n        return '\"\"'\n    if value != value.strip():\n        return _quote(value)\n    if \"\\n\" in value or \"\\r\" in value or \"\\t\" in value:\n        return _quote(value)\n    # ... 
rest of checks\n```\n","status":"closed","priority":0,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:01.118742-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:04:55.048089-08:00","closed_at":"2026-02-11T18:04:55.048089-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.31","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:01.120292-08:00","created_by":"Michael Barrett"}]}
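The check suggested in this issue covers only `\n`, `\r` and `\t`. Other control characters (NUL, form feed, ESC and so on) would still be emitted raw. A single character class catches them all. Below is a minimal sketch with the hypothetical names `_UNSAFE_CHARS` and `has_unsafe_chars`.

```python
from __future__ import annotations

import re

# C0 controls (U+0000-U+001F, which include \t, \n and \r), DEL, plus NEL and the
# Unicode line/paragraph separators, which str.splitlines() also treats as breaks.
_UNSAFE_CHARS = re.compile(r"[\x00-\x1f\x7f\x85\u2028\u2029]")


def has_unsafe_chars(value: str) -> bool:
    """True if value contains a character that must not appear unquoted."""
    return _UNSAFE_CHARS.search(value) is not None
```

In `_encode_value`, this would replace the three `in` tests with `if has_unsafe_chars(value): return _quote(value)`. That only helps if `_quote` can escape each of these characters. The issue confirms that `_quote` handles `\n`, but it is not established here which other escapes the TOON format defines. Rejecting such values outright is the safer fallback.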
+{"id":"repoguide-3t0.32","title":"Redundant AST parent-chain traversal for method detection","description":"**File**: /Users/mike/git/repoguide/repoguide/parsing.py\n**Line(s)**: 44-83\n**Description**: In `extract_tags`, for every match from the query cursor, the code iterates over all capture names in `match_dict` (line 51). For definition tags that are functions, `_is_method` is called (line 63), and if true, `_get_enclosing_class_name` is called (line 66). Both `_is_method` and `_get_enclosing_class_name` walk the AST parent chain and perform overlapping work -- they both check the same parent/grandparent chain for `class_definition`. When a function is indeed a method, the parent chain is traversed twice doing nearly identical work.\n\n**Suggested Fix**: Combine `_is_method` and `_get_enclosing_class_name` into a single function that returns the class name if the function is a method, or `None` otherwise. This avoids the redundant AST walk.\n\n```python\ndef _get_method_class_name(func_node: Node) -\u003e str | None:\n    \"\"\"If func_node is a method, return its enclosing class name; otherwise None.\"\"\"\n    parent = func_node.parent\n    if parent is None:\n        return None\n\n    class_node = None\n    if (\n        parent.type == \"block\"\n        and parent.parent\n        and parent.parent.type == \"class_definition\"\n    ):\n        class_node = parent.parent\n    elif parent.type == \"decorated_definition\":\n        grandparent = parent.parent\n        if (\n            grandparent\n            and grandparent.type == \"block\"\n            and grandparent.parent\n            and grandparent.parent.type == \"class_definition\"\n        ):\n            class_node = grandparent.parent\n\n    if class_node is None:\n        return None\n\n    for child in class_node.children:\n        if child.type == \"identifier\":\n            return child.text.decode(\"utf-8\")\n    return None\n\n# Usage in extract_tags:\nif tag_kind == TagKind.DEFINITION 
and symbol_kind == SymbolKind.FUNCTION:\n    class_name = _get_method_class_name(def_node)\n    if class_name:\n        symbol_kind = SymbolKind.METHOD\n        name_text = f\"{class_name}.{name_text}\"\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:02.01911-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:00:27.003792-08:00","closed_at":"2026-02-11T18:00:27.003795-08:00","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.32","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:02.020757-08:00","created_by":"Michael Barrett"}],"comments":[{"id":10,"issue_id":"repoguide-3t0.32","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.3 -- same duplicated _is_method/_get_enclosing_class_name logic in parsing.py","created_at":"2026-02-12T02:00:26Z"}]}
+{"id":"repoguide-3t0.33","title":"queries/__init__.py missing from __future__ import annotations","description":"**File**: /Users/mike/git/repoguide/repoguide/queries/__init__.py\n**Line(s)**: 1\n**Description**: The `queries/__init__.py` module has a docstring but does not include `from __future__ import annotations` as the second line. Per the project's Python conventions (CLAUDE.md), every source module should start with the module docstring followed by `from __future__ import annotations`. While this module currently has no type annotations, adding the import maintains consistency and prevents future omissions if annotations are added later.\n\n**Suggested Fix**: Add `from __future__ import annotations` after the docstring.\n\n```python\n\"\"\"Tree-sitter query files for language-specific symbol extraction.\"\"\"\n\nfrom __future__ import annotations\n```","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:13.318639-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:56:13.318639-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.33","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:13.319984-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.34","title":"Space-joined dependency symbols are ambiguous","description":"**File**: `/Users/mike/git/repoguide/repoguide/toon.py`\n**Line(s)**: 52-54\n**Description**: Dependency symbols are joined with a space (`\" \".join(d.symbols)`) in the TOON output. If any symbol name itself contains a space (unlikely for identifiers, but possible with qualified names or edge cases), the resulting value would be ambiguous — consumers cannot distinguish between two symbols `[\"foo bar\", \"baz\"]` and three symbols `[\"foo\", \"bar\", \"baz\"]` since both produce `\"foo bar baz\"`.\n\n**Suggested Fix**: Use a comma separator or quote individual symbols, or validate that symbols cannot contain spaces:\n\n```python\ndep_rows = [\n    [str(d.source), str(d.target), \",\".join(d.symbols)]\n    for d in repo_map.dependencies\n]\n```\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:15.278631-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:56:15.278631-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.34","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:15.280065-08:00","created_by":"Michael Barrett"}]}
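Switching to a comma only moves the problem: a symbol containing a comma would be just as ambiguous, and in tabular TOON rows a comma may also need quoting. This issue's third option is to validate that no symbol contains the separator. That option can be sketched with hypothetical `join_symbols` and `split_symbols` helpers.

```python
from __future__ import annotations

SEPARATOR = " "


def join_symbols(symbols: list[str], sep: str = SEPARATOR) -> str:
    """Join symbol names, refusing any name that would make splitting ambiguous."""
    for name in symbols:
        if not name or sep in name:
            raise ValueError(f"symbol {name!r} is empty or contains separator {sep!r}")
    return sep.join(symbols)


def split_symbols(joined: str, sep: str = SEPARATOR) -> list[str]:
    """Inverse of join_symbols; an empty string means no symbols."""
    return joined.split(sep) if joined else []
```

Raising an error keeps the output unambiguous by construction. Quoting each symbol instead would avoid rejecting input, at the cost of a more complex parser on the consumer side.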
+{"id":"repoguide-3t0.35","title":"Typer CLI options use PEP 604 syntax despite documented caution","description":"**File**: `/Users/mike/git/repoguide/repoguide/cli.py`\n**Line(s)**: 47-53\n**Description**: The `max_files` parameter uses `int | None` type annotation with Typer. Typer with PEP 604 union syntax (`int | None`) may not work correctly in all Typer versions because Typer inspects annotations at runtime. The project's own CLAUDE.md conventions note: \"frameworks that inspect annotations at runtime (e.g., Typer, Pydantic v1) may require `Optional[str]` instead of PEP 604 syntax.\"\n\nWhile `from __future__ import annotations` defers evaluation (making it a string), Typer typically uses `get_type_hints()` which evaluates the annotation, and on Python 3.13+ the PEP 604 syntax is natively supported. This may work on the target Python 3.13+, but it deviates from the project's own documented caution.\n\n**Suggested Fix**: Either verify Typer works correctly with `int | None` on the target Python version and document that the CLAUDE.md caveat does not apply here, or use `Optional[int]` for consistency with the project's stated convention.\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:26.438501-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:00:07.967457-08:00","closed_at":"2026-02-11T18:00:07.96746-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.35","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:26.440035-08:00","created_by":"Michael Barrett"}],"comments":[{"id":7,"issue_id":"repoguide-3t0.35","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.36 -- same Typer PEP 604 annotation issue in cli.py","created_at":"2026-02-12T02:00:07Z"}]}
+{"id":"repoguide-3t0.36","title":"Typer annotations use PEP 604 syntax despite project convention warning against it","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 46-67\n**Description**: The `main` function uses PEP 604 union syntax (`int | None`, `str | None`, `Path | None`) in its Typer option annotations. The project's own CLAUDE.md conventions explicitly warn: \"frameworks that inspect annotations at runtime (e.g., Typer, Pydantic v1) may require `Optional[str]` instead of PEP 604 syntax.\" Typer does inspect annotations at runtime to determine CLI option types.\n\nWhile this works in Python 3.13 with `from __future__ import annotations` (which defers annotation evaluation), it creates a subtle dependency on the `__future__` import. If that import were ever removed, or if a code reorganization moved these annotations into a context where they are evaluated eagerly, the CLI would break at import time. Using `Optional[X]` for Typer annotations would be more defensive and consistent with the project's own documented guidance.\n\n**Suggested Fix**: Use `Optional[int]`, `Optional[str]`, and `Optional[Path]` for the Typer option type annotations to align with the project's stated convention for runtime-inspected frameworks.\n\n```python\nfrom typing import Annotated, Optional\n\n# ...\n\nmax_files: Annotated[\n    Optional[int],\n    typer.Option(\"--max-files\", \"-n\", help=\"Maximum number of files to include.\"),\n] = None,\nlanguage: Annotated[\n    Optional[str],\n    typer.Option(\"--language\", \"-l\", help=\"Restrict to a specific language.\"),\n] = None,\ncache: Annotated[\n    Optional[Path],\n    typer.Option(\"--cache\", help=\"Cache file; reuse if newer than all source files.\"),\n] = None,\n```\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:28.196345-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:56:28.196345-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.36","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:28.197828-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.37","title":"max_files=0 or negative values not validated","description":"**File**: `/Users/mike/git/repoguide/repoguide/ranking.py`\n**Line(s)**: 22-23\n**Description**: `select_files` checks `max_files \u003e= len(repo_map.files)` and returns early, but does not validate that `max_files` is a positive integer. If `max_files=0` is passed, the function will return `repo_map.files[:0]` which is an empty list, and the CLI will still produce TOON output with zero files. If `max_files` is negative, `repo_map.files[:negative]` will slice from the end, returning all but the last N files — which is almost certainly not the user's intent.\n\n**Suggested Fix**: Validate the input or clamp it:\n\n```python\ndef select_files(\n    repo_map: RepoMap,\n    *,\n    max_files: int | None = None,\n) -\u003e RepoMap:\n    if max_files is not None and max_files \u003c 1:\n        msg = f\"max_files must be positive, got {max_files}\"\n        raise ValueError(msg)\n    # ...\n```\n\nAlternatively, Typer can enforce `min=1` on the CLI option.\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:35.904585-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:28:03.074994-08:00","closed_at":"2026-02-11T18:28:03.074994-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.37","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:35.906199-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.38","title":"rank_files both mutates input and returns it, causing ambiguity","description":"**File**: `/Users/mike/git/repoguide/repoguide/models.py`\n**Line(s)**: 38-45\n**Description**: `FileInfo` is a mutable dataclass with a mutable `tags` list and a mutable `rank` float. It is used as a graph node (its `path` field is used as a dict key in `graph.py` line 78: `ranks.get(fi.path, 0.0)`), and the `file_infos` list is mutated in-place across multiple functions (`rank_files` sorts it, modifies `rank`). This shared mutable state makes the data flow harder to reason about.\n\nMore concretely, `rank_files` in `graph.py` modifies `fi.rank` in-place AND sorts the list AND returns it. The caller in `cli.py` reassigns: `file_infos = rank_files(graph, file_infos)`. Since the function both mutates and returns, the reassignment is redundant but not harmful. However, this pattern is confusing — does the caller need the return value or not? A reader might remove the reassignment thinking it is unnecessary, or might assume the function returns a new list.\n\n**Suggested Fix**: Choose one pattern: either return a new sorted list (functional style) or mutate in place and return `None` (imperative style). Mixing both is a maintenance hazard. Recommend the functional approach:\n\n```python\ndef rank_files(\n    graph: nx.MultiDiGraph,\n    file_infos: list[FileInfo],\n) -\u003e list[FileInfo]:\n    \"\"\"Returns a new list sorted by rank, without mutating the input.\"\"\"\n    ranked = []\n    if graph.number_of_edges() == 0:\n        uniform = 1.0 / max(len(file_infos), 1)\n        ranked = [FileInfo(path=fi.path, language=fi.language, tags=fi.tags, rank=uniform) for fi in file_infos]\n    else:\n        ranks = nx.pagerank(graph, alpha=0.85)\n        ranked = [FileInfo(path=fi.path, language=fi.language, tags=fi.tags, rank=ranks.get(fi.path, 0.0)) for fi in file_infos]\n    ranked.sort(key=lambda fi: fi.rank, reverse=True)\n    return ranked\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:56:52.480579-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:28:03.019651-08:00","closed_at":"2026-02-11T18:28:03.019651-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.38","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:56:52.482277-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.39","title":"EXTENSION_MAP and LANGUAGES duplicated extension data","description":"**File**: `/Users/mike/git/repoguide/repoguide/languages.py`\n**Line(s)**: 15-17, 44-46\n**Description**: `EXTENSION_MAP` and `LANGUAGES` are maintained as separate dictionaries that must be kept in sync manually. `EXTENSION_MAP` maps extensions to language names, and `LANGUAGES` maps language names to `TreeSitterLanguage` objects (which also contain their extensions in a tuple). This creates a data duplication problem: the extension `.py` is listed in both `EXTENSION_MAP` and in the `TreeSitterLanguage(extensions=(\".py\",))` tuple.\n\nWhen adding a new language, a developer must update both dictionaries and ensure they match. If they only update `LANGUAGES`, `language_for_extension` will not find the new language. If they only update `EXTENSION_MAP`, the language object will have incomplete extension metadata.\n\n**Suggested Fix**: Derive `EXTENSION_MAP` from `LANGUAGES` automatically:\n\n```python\nEXTENSION_MAP: dict[str, str] = {}\nfor _name, _lang in LANGUAGES.items():\n    for _ext in _lang.extensions:\n        EXTENSION_MAP[_ext] = _name\n```\n\nOr build it as a module-level computed dict:\n\n```python\nEXTENSION_MAP: dict[str, str] = {\n    ext: lang.name\n    for lang in LANGUAGES.values()\n    for ext in lang.extensions\n}\n```\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:57:04.33388-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:57:04.33388-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.39","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:57:04.335269-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.4","title":"Overly broad exception handler masks parsing errors","description":"**File**: /Users/mike/git/repoguide/repoguide/cli.py\n**Line(s)**: 93-95\n**Description**: The broad `except Exception` clause on line 93 catches and suppresses all exceptions during file parsing, logging only a brief message to stderr. While this prevents the tool from crashing on malformed files, it also silently swallows potentially important errors such as `PermissionError`, `MemoryError`, or unexpected tree-sitter crashes that could indicate a malicious or adversarial input file crafted to exploit tree-sitter parsing vulnerabilities. This makes debugging harder and could mask security-relevant failures.\n\n**Suggested Fix**: Narrow the exception handler to catch only expected parsing failures (e.g., `UnicodeDecodeError`, `SyntaxError`, or tree-sitter-specific errors), and let unexpected errors propagate. At minimum, log the full exception type.\n\n```python\n# Current code\ntry:\n    tags = extract_tags(abs_path, lang_config)\nexcept Exception as exc:\n    print(f\"Warning: failed to parse {rel_path}: {exc}\", file=sys.stderr)\n    continue\n\n# Suggested fix\ntry:\n    tags = extract_tags(abs_path, lang_config)\nexcept (UnicodeDecodeError, ValueError, OSError) as exc:\n    print(f\"Warning: failed to parse {rel_path}: {type(exc).__name__}: {exc}\", file=sys.stderr)\n    continue\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:52:53.670294-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:59:29.565766-08:00","closed_at":"2026-02-11T17:59:29.56577-08:00","labels":["code-review","reviewer:security"],"dependencies":[{"issue_id":"repoguide-3t0.4","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:52:53.672016-08:00","created_by":"Michael Barrett"}],"comments":[{"id":2,"issue_id":"repoguide-3t0.4","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.5 -- same broad except Exception issue in cli.py lines 91-95","created_at":"2026-02-12T01:59:29Z"}]}
+{"id":"repoguide-3t0.40","title":"No guard against empty nodes list in match_dict captures","description":"**File**: `/Users/mike/git/repoguide/repoguide/parsing.py`\n**Line(s)**: 44-48\n**Description**: The code accesses `match_dict.get(\"name\", [])` and then checks `if not name_nodes: continue`. If the tree-sitter query produces a match where `@name` captures multiple nodes, only `name_nodes[0]` is used (line 48). The code does not log or warn when multiple name nodes are captured, which could silently pick the wrong name in edge cases.\n\nMore importantly, `nodes[0]` is accessed on line 58 (`def_node = nodes[0]`) without checking if `nodes` is non-empty. While tree-sitter queries should always populate capture groups with at least one node, a defensive check or assertion would prevent a potential `IndexError` from a malformed query.\n\n**Suggested Fix**: Add a guard or assertion:\n\n```python\ndef_node = nodes[0] if nodes else None\nif def_node is None:\n    continue\n```\n","status":"open","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:57:15.279953-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:57:15.279953-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.40","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:57:15.281376-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.5","title":"Broad except Exception swallows real bugs in extract_tags","description":"**File**: `/Users/mike/git/repoguide/repoguide/cli.py`\n**Line(s)**: 91-95\n**Description**: The broad `except Exception` catch around `extract_tags` silently swallows all errors and continues processing. While a warning is printed to stderr, this defensive pattern masks real bugs in the parsing logic. For example, if `extract_tags` has a programming error (e.g., `AttributeError`, `TypeError`), it will be quietly logged as a \"parse failure\" rather than surfacing as a crash. This makes debugging significantly harder in production.\n\nThe warning is also printed with `print()` instead of using the `typer.echo(err=True)` pattern used elsewhere in the same function, which is inconsistent.\n\n**Suggested Fix**: Narrow the exception handling to only catch expected parsing errors, and use consistent error output:\n\n```python\n# Narrow to expected tree-sitter/IO errors only\ntry:\n    tags = extract_tags(abs_path, lang_config)\nexcept (OSError, UnicodeDecodeError) as exc:\n    typer.echo(f\"Warning: failed to parse {rel_path}: {exc}\", err=True)\n    continue\n```\n\nIf broader exception handling is truly needed (e.g., tree-sitter can raise various errors), at minimum log the exception type so the swallowed error is diagnosable.\n","status":"closed","priority":1,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:01.950199-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:04:55.05636-08:00","closed_at":"2026-02-11T18:04:55.05636-08:00","close_reason":"Closed","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.5","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:01.951706-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.6","title":"Inline import of re module in _collapse_whitespace is inconsistent with codebase style","description":"**File**: /Users/mike/git/repoguide/repoguide/parsing.py\n**Line(s)**: 187-191\n**Description**: The `_collapse_whitespace` function imports `re` inside the function body rather than at the module level. This is inconsistent with the rest of the codebase -- for example, `toon.py` imports `re` at the top of the module. Deferred imports are appropriate for heavy or optional dependencies, but `re` is a stdlib module that is already imported elsewhere in the project.\n\n**Suggested Fix**: Move `import re` to the module-level imports, alongside the other stdlib imports at the top of the file.\n\n```python\n# At the top of parsing.py, add re to module-level imports:\nfrom __future__ import annotations\n\nimport re\nfrom pathlib import Path\n# ... rest of imports\n\n# Then simplify the function:\ndef _collapse_whitespace(text: str) -\u003e str:\n    \"\"\"Collapse multi-line whitespace into single spaces.\"\"\"\n    return re.sub(r\"\\s+\", \" \", text)\n```\n","status":"closed","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:07.164387-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:59:52.619154-08:00","closed_at":"2026-02-11T17:59:52.619157-08:00","labels":["code-review","reviewer:readability"],"dependencies":[{"issue_id":"repoguide-3t0.6","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:07.165766-08:00","created_by":"Michael Barrett"}],"comments":[{"id":5,"issue_id":"repoguide-3t0.6","author":"Michael Barrett","text":"Duplicate of repoguide-3t0.16 -- same inline re import in _collapse_whitespace","created_at":"2026-02-12T01:59:52Z"}]}
+{"id":"repoguide-3t0.7","title":"Parser and query recompiled on every file parse - missing cache","description":"**File**: /Users/mike/git/repoguide/repoguide/languages.py\n**Line(s)**: 35-41\n**Description**: `TreeSitterLanguage.get_tag_query()` reloads the `.scm` file from disk and recompiles the query every time it is called. In `cli.py`, `extract_tags()` is called once per file (line 92), and each call invokes `language.get_tag_query()` (parsing.py line 39). For a repository with hundreds or thousands of Python files, this means the same query file is read from disk and compiled hundreds or thousands of times, creating unnecessary I/O and CPU overhead.\n\nSimilarly, `get_parser()` creates a new parser instance per file. While lighter, this is also redundant.\n\n**Suggested Fix**: Cache the compiled query and parser per language instance. Since `TreeSitterLanguage` is frozen, use `functools.lru_cache` or a module-level cache dict.\n\n```python\nimport functools\n\n@dataclass(frozen=True)\nclass TreeSitterLanguage:\n    \"\"\"A tree-sitter language with its tag query.\"\"\"\n\n    name: str\n    extensions: tuple[str, ...]\n\n    def get_language(self) -\u003e Language:\n        \"\"\"Get the tree-sitter Language object.\"\"\"\n        return get_language(self.name)\n\n    @functools.lru_cache(maxsize=1)\n    def get_parser(self) -\u003e Parser:\n        \"\"\"Get a configured tree-sitter Parser (cached).\"\"\"\n        return get_parser(self.name)\n\n    @functools.lru_cache(maxsize=1)\n    def get_tag_query(self) -\u003e Query:\n        \"\"\"Load and compile the tag query for this language (cached).\"\"\"\n        from tree_sitter import Query as TSQuery\n\n        scm_text = _load_query_file(self.name)\n        lang = self.get_language()\n        return TSQuery(lang, scm_text)\n```\n\nNote: Since the dataclass is frozen, `self` is hashable and `lru_cache` works directly. Each `TreeSitterLanguage` instance will cache its own query and parser after the first call.\n","status":"closed","priority":1,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:11.278435-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:07:54.315528-08:00","closed_at":"2026-02-11T18:07:54.315528-08:00","close_reason":"Closed","labels":["code-review","reviewer:perf"],"dependencies":[{"issue_id":"repoguide-3t0.7","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:11.279807-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.8","title":"Symlink traversal may expose files outside repository root","description":"**File**: /Users/mike/git/repoguide/repoguide/discovery.py\n**Line(s)**: 57-58\n**Description**: The `discover_files` function uses `root.rglob(\"*\")` to walk the directory tree, and checks `path.is_file()` on line 58. However, it does not check for or exclude symbolic links. A symlink could point outside the repository root to sensitive files elsewhere on the filesystem (e.g., `/etc/passwd`, private keys, etc.), causing the tool to parse and include information about files the user may not intend to expose. While `rglob` by default follows symlinks in Python's `pathlib`, the lack of a symlink check means a repository containing a malicious symlink could cause information about arbitrary files to appear in the output.\n\nAdditionally, a symlink loop could cause `rglob` to recurse indefinitely (though Python 3.13+ handles this better).\n\n**Suggested Fix**: Add a symlink check to exclude symlinked files, or at minimum resolve paths and verify they remain under the repository root.\n\n```python\n# Add after line 58\nfor path in sorted(root.rglob(\"*\")):\n    if not path.is_file():\n        continue\n\n    # Suggested addition: skip symlinks\n    if path.is_symlink():\n        continue\n\n    rel = path.relative_to(root)\n    # ... rest of function\n```\n\nAlternatively, resolve the path and verify it is still under root:\n\n```python\nresolved = path.resolve()\nif not resolved.is_relative_to(root.resolve()):\n    continue\n```\n","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:13.503675-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:17:46.244486-08:00","closed_at":"2026-02-11T18:17:46.244486-08:00","close_reason":"Closed","labels":["code-review","reviewer:security"],"dependencies":[{"issue_id":"repoguide-3t0.8","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:13.505192-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-3t0.9","title":"Subtle inconsistency in cache write vs stdout newline handling","description":"**File**: `/Users/mike/git/repoguide/repoguide/cli.py`\n**Line(s)**: 114-117\n**Description**: The TOON output is written with `typer.echo(output)` which appends a trailing newline. However, when caching, the code writes `output + \"\\n\"` (line 116). If `encode()` already returns content with a trailing newline, the cache will have a double newline. Conversely, the `encode()` docstring explicitly says \"no trailing newline\", so `typer.echo(output)` adds a newline that the cache also manually adds.\n\nThe real issue: when the cache is served on line 84, `typer.echo(cache.read_text(\"utf-8\"), nl=False)` is used with `nl=False`, meaning the cached content is printed verbatim (with its explicit `\\n`). But the non-cached path uses `typer.echo(output)` which adds its own newline. This means both paths produce the same stdout output, but the cached file has a trailing newline that `encode()` does not produce. This inconsistency could cause issues if the cache file is consumed by another tool.\n\n**Suggested Fix**: Be consistent — either always let `typer.echo` add the newline, or store the exact output:\n\n```python\noutput = encode(repo_map)\nif cache:\n    cache.write_text(output + \"\\n\", \"utf-8\")\ntyper.echo(output)\n```\n\nThis is currently correct in behavior, but the comment and approach should be documented to prevent future drift. Consider storing exactly what `typer.echo` would produce, or using `nl=False` consistently.\n","status":"open","priority":3,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T17:53:22.085657-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T17:53:22.085657-08:00","labels":["code-review","reviewer:logic"],"dependencies":[{"issue_id":"repoguide-3t0.9","depends_on_id":"repoguide-3t0","type":"parent-child","created_at":"2026-02-11T17:53:22.087204-08:00","created_by":"Michael Barrett"}]}
+{"id":"repoguide-lz0","title":"Write README.md","description":"Replace empty README with full documentation: what it does, installation, usage, example output, Claude Code hook integration, TOON format explanation, architecture, supported languages, and dev commands.","status":"closed","priority":2,"issue_type":"task","owner":"loki77@gmail.com","created_at":"2026-02-11T18:40:00.673513-08:00","created_by":"Michael Barrett","updated_at":"2026-02-11T18:40:04.824794-08:00","closed_at":"2026-02-11T18:40:04.824794-08:00","close_reason":"README written with all planned sections"}
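
Issue repoguide-3t0.7 ends with the note that each `TreeSitterLanguage` instance caches its own query and parser. When `functools.lru_cache` decorates a method in the class body, there is one cache per decorated function, shared by every instance and keyed on `self`, so `maxsize=1` retains only the most recently used language. The sketch below is a hypothetical stand-in (the `Lang` class, its two methods, and the `COMPILE_CALLS` counter are illustrative, not the package's code) that counts how often the expensive "compile" step actually runs when files of two languages alternate.

```python
from __future__ import annotations

import functools
from collections import Counter
from dataclasses import dataclass

# Hypothetical counter: how many times the expensive "compile" step really ran.
COMPILE_CALLS: Counter[str] = Counter()


@dataclass(frozen=True)
class Lang:
    """Illustrative stand-in for a frozen (and therefore hashable) language descriptor."""

    name: str

    @functools.lru_cache(maxsize=1)  # ONE cache shared by all instances, keyed on self
    def query_maxsize_1(self) -> str:
        COMPILE_CALLS[f"maxsize_1:{self.name}"] += 1
        return f"compiled query for {self.name}"

    @functools.lru_cache(maxsize=None)  # unbounded: keeps one entry per distinct instance
    def query_unbounded(self) -> str:
        COMPILE_CALLS[f"unbounded:{self.name}"] += 1
        return f"compiled query for {self.name}"


def simulate(file_languages: list[str]) -> Counter[str]:
    """Request each file's query once and return how many real compiles happened."""
    COMPILE_CALLS.clear()
    Lang.query_maxsize_1.cache_clear()
    Lang.query_unbounded.cache_clear()
    langs = {name: Lang(name) for name in set(file_languages)}
    for name in file_languages:
        langs[name].query_maxsize_1()
        langs[name].query_unbounded()
    return Counter(COMPILE_CALLS)  # copy, so later runs don't alter this result


if __name__ == "__main__":
    # A mixed-language repository whose files alternate between two languages.
    print(simulate(["python", "go", "python", "go"]))
```

With the alternating order above, the `maxsize=1` variant compiles each language's query twice (every call evicts the other language's entry), while the unbounded variant compiles each only once. A single-language repository would not show the difference. Note also that an `lru_cache` on a method holds strong references to `self`, which is harmless for module-level language singletons.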

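Issue repoguide-3t0.38 recommends returning a new ranked list instead of both mutating and returning the input. One way to express that without spelling out every `FileInfo` field (and risking a missed field when the model grows) is `dataclasses.replace`. This is a sketch under stated assumptions: `FileInfo` is simplified to the four fields named in the issue (`path`, `language`, `tags`, `rank`), and a precomputed `ranks` dict stands in for the `nx.pagerank` result, so the example runs without networkx.

```python
from __future__ import annotations

import dataclasses
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FileInfo:
    """Simplified, hypothetical stand-in for the package's FileInfo model."""

    path: Path
    language: str
    tags: list[str] = field(default_factory=list)
    rank: float = 0.0


def rank_files(file_infos: list[FileInfo], ranks: dict[Path, float]) -> list[FileInfo]:
    """Return a NEW list sorted by rank (highest first); the input list and its items are untouched.

    `ranks` stands in for PageRank scores; an empty dict plays the role of a graph
    with no edges, in which case every file gets the same uniform rank.
    """
    if not ranks:
        uniform = 1.0 / max(len(file_infos), 1)
        ranked = [dataclasses.replace(fi, rank=uniform) for fi in file_infos]
    else:
        ranked = [dataclasses.replace(fi, rank=ranks.get(fi.path, 0.0)) for fi in file_infos]
    # sorted() is stable even with reverse=True, so ties keep their input order.
    return sorted(ranked, key=lambda fi: fi.rank, reverse=True)
```

`dataclasses.replace` makes a shallow copy, so each new object shares its `tags` list with the original; if callers may later mutate `tags`, copy the list explicitly as well.
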
sourcecrumb-0.1.0/.beads/beads.left.meta.json ADDED

@@ -0,0 +1 @@
+{"version":"0.49.4","timestamp":"2026-02-12T18:50:08.476107-08:00","commit":"8ad6259"}

sourcecrumb-0.1.0/.beads/config.yaml ADDED

@@ -0,0 +1,67 @@
+# Beads Configuration File
+# This file configures default behavior for all bd commands in this repository
+# All settings can also be set via environment variables (BD_* prefix)
+# or overridden with command-line flags
+# Issue prefix for this repository (used by bd init)
+# If not set, bd init will auto-detect from directory name
+# Example: issue-prefix: "myproject" creates issues like "myproject-1", "myproject-2", etc.
+# issue-prefix: ""
+# Use no-db mode: load from JSONL, no SQLite, write back after each command
+# When true, bd will use .beads/issues.jsonl as the source of truth
+# instead of SQLite database
+# no-db: false
+# Disable daemon for RPC communication (forces direct database access)
+# no-daemon: false
+# Disable auto-flush of database to JSONL after mutations
+# no-auto-flush: false
+# Disable auto-import from JSONL when it's newer than database
+# no-auto-import: false
+# Enable JSON output by default
+# json: false
+# Default actor for audit trails (overridden by BD_ACTOR or --actor)
+# actor: ""
+# Path to database (overridden by BEADS_DB or --db)
+# db: ""
+# Auto-start daemon if not running (can also use BEADS_AUTO_START_DAEMON)
+# auto-start-daemon: true
+# Debounce interval for auto-flush (can also use BEADS_FLUSH_DEBOUNCE)
+# flush-debounce: "5s"
+# Export events (audit trail) to .beads/events.jsonl on each flush/sync
+# When enabled, new events are appended incrementally using a high-water mark.
+# Use 'bd export --events' to trigger manually regardless of this setting.
+# events-export: false
+# Git branch for beads commits (bd sync will commit to this branch)
+# IMPORTANT: Set this for team projects so all clones use the same sync branch.
+# This setting persists across clones (unlike database config which is gitignored).
+# Can also use BEADS_SYNC_BRANCH env var for local override.
+# If not set, bd sync will require you to run 'bd config set sync.branch <branch>'.
+# sync-branch: "beads-sync"
+# Multi-repo configuration (experimental - bd-307)
+# Allows hydrating from multiple repositories and routing writes to the correct JSONL
+# repos:
+#   primary: "."  # Primary repo (where this database lives)
+#   additional:   # Additional repos to hydrate from (read-only)
+#     - ~/beads-planning  # Personal planning repo
+#     - ~/work-planning   # Work planning repo
+# Integration settings (access with 'bd config get/set')
+# These are stored in the database, not in this file:
+# - jira.url
+# - jira.project
+# - linear.url
+# - linear.api-key
+# - github.org
+# - github.repo

sourcecrumb-0.1.0/.beads/daemon.lock ADDED

@@ -0,0 +1,7 @@
+{
+  "pid": 93295,
+  "parent_pid": 93286,
+  "database": "/Users/mike/git/sourcecrumb/.beads/beads.db",
+  "version": "0.49.4",
+  "started_at": "2026-02-13T02:50:08.303363Z"
+}