wikifier 4.1.1__tar.gz → 4.1.2__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (48) hide show
  1. {wikifier-4.1.1/wikifier.egg-info → wikifier-4.1.2}/PKG-INFO +10 -1
  2. {wikifier-4.1.1 → wikifier-4.1.2}/README.md +9 -0
  3. {wikifier-4.1.1 → wikifier-4.1.2}/pyproject.toml +1 -1
  4. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/__init__.py +1 -1
  5. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/cli.py +67 -10
  6. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/import_cache.py +37 -8
  7. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/parsers/python.py +11 -4
  8. {wikifier-4.1.1 → wikifier-4.1.2/wikifier.egg-info}/PKG-INFO +10 -1
  9. {wikifier-4.1.1 → wikifier-4.1.2}/CONTRIBUTING.md +0 -0
  10. {wikifier-4.1.1 → wikifier-4.1.2}/LICENSE +0 -0
  11. {wikifier-4.1.1 → wikifier-4.1.2}/MANIFEST.in +0 -0
  12. {wikifier-4.1.1 → wikifier-4.1.2}/diagnostics.html +0 -0
  13. {wikifier-4.1.1 → wikifier-4.1.2}/docs/Basis-v0.3.md +0 -0
  14. {wikifier-4.1.1 → wikifier-4.1.2}/docs/RELEASE_NOTES.md +0 -0
  15. {wikifier-4.1.1 → wikifier-4.1.2}/docs/TRADEOFFS.md +0 -0
  16. {wikifier-4.1.1 → wikifier-4.1.2}/docs/spec.md +0 -0
  17. {wikifier-4.1.1 → wikifier-4.1.2}/docs/v0.4-Execution-Plan.md +0 -0
  18. {wikifier-4.1.1 → wikifier-4.1.2}/docs/v0.4-execution-plan.md +0 -0
  19. {wikifier-4.1.1 → wikifier-4.1.2}/index.html +0 -0
  20. {wikifier-4.1.1 → wikifier-4.1.2}/setup.cfg +0 -0
  21. {wikifier-4.1.1 → wikifier-4.1.2}/skills/run.md +0 -0
  22. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/__main__.py +0 -0
  23. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/contracts.py +0 -0
  24. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/daemon.py +0 -0
  25. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/diagnostics.py +0 -0
  26. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/gap1_validation_harness.py +0 -0
  27. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/health.py +0 -0
  28. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/locking.py +0 -0
  29. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/mcp/__init__.py +0 -0
  30. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/mcp/server.py +0 -0
  31. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/parsers/__init__.py +0 -0
  32. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/parsers/bree.py +0 -0
  33. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/parsers/cdia.py +0 -0
  34. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/parsers/javascript.py +0 -0
  35. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/resolution.py +0 -0
  36. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/exclude_patterns.txt +0 -0
  37. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/file_health.md +0 -0
  38. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/library.md +0 -0
  39. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/monitored_paths.txt +0 -0
  40. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/pending_updates.md +0 -0
  41. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/wikifier.bat +0 -0
  42. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/wikifier.ps1 +0 -0
  43. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier/scripts/wikifier.sh +0 -0
  44. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier.egg-info/SOURCES.txt +0 -0
  45. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier.egg-info/dependency_links.txt +0 -0
  46. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier.egg-info/entry_points.txt +0 -0
  47. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier.egg-info/requires.txt +0 -0
  48. {wikifier-4.1.1 → wikifier-4.1.2}/wikifier.egg-info/top_level.txt +0 -0
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: wikifier
3
- Version: 4.1.1
3
+ Version: 4.1.2
4
4
  Summary: Agent-first, zero-dependency, self-maintaining codebase documentation & change tracking system
5
5
  Author-email: Aron Amos <aron@example.com>
6
6
  Maintainer: Aron Amos
@@ -97,6 +97,15 @@ See the final synth in `Findings/M5-Dogfood-Progress.md` and full Assessment-Rep
97
97
  - Synced descriptions in README.md, `skills/run.md`, `wikifier/mcp/README.md`.
98
98
  - All under protocol (FRESH, record-change + mark-green with subid, main clean). Version to 4.1.1. See the separation-fix commit.
99
99
 
100
+ **v4.1.2 (2026-06)**: Very minor patch for mapping / update speed hygiene (no new behaviour or scope; pure internal improvements for large projects).
101
+
102
+ - Faster candidate collection everywhere that drives check-changes, monitor, and update-maps (Python-primary paths): switched to `os.scandir`-based recursive scan (std lib, avoids walk overhead) + git `ls-files --cached --others --exclude-standard` fast-path when in a git repo (dramatically faster on real monorepos; falls back cleanly).
103
+ - Consistent early pruning: `exclude_patterns.txt` (user-editable, populated by init) now applied more broadly during mapping walks in both sh and Python collectors (venvs, caches, site-packages, etc. stop descent sooner).
104
+ - Small parser micro-opt: hoisted common regex compiles (docstring strip, dynamic import detectors) to module level in the Python parser (hot path on dirty files during maps).
105
+ - Minor sh-side note + skeleton for git fast collection in traditional update-maps path (real wins already flow through the Python collectors used by check-changes/streaming/lib/MCP).
106
+ - All changes FRESH-checked + recorded+marked under subid=mapping-speed-hygiene. Complements existing levers (monitored_paths.txt narrowing, --dir / directory= scoping, --stream / --max-files / --max-time streaming+budgets, python-primary, incremental dirty + BRC reverse index).
107
+ - Version to 4.1.2. See the speed-hygiene edits + this commit.
108
+
100
109
  ### Historical (pre-v4.0)
101
110
 
102
111
  #### What's New in v0.3.3 (Gap #1)
@@ -65,6 +65,15 @@ See the final synth in `Findings/M5-Dogfood-Progress.md` and full Assessment-Rep
65
65
  - Synced descriptions in README.md, `skills/run.md`, `wikifier/mcp/README.md`.
66
66
  - All under protocol (FRESH, record-change + mark-green with subid, main clean). Version to 4.1.1. See the separation-fix commit.
67
67
 
68
+ **v4.1.2 (2026-06)**: Very minor patch for mapping / update speed hygiene (no new behaviour or scope; pure internal improvements for large projects).
69
+
70
+ - Faster candidate collection everywhere that drives check-changes, monitor, and update-maps (Python-primary paths): switched to `os.scandir`-based recursive scan (std lib, avoids walk overhead) + git `ls-files --cached --others --exclude-standard` fast-path when in a git repo (dramatically faster on real monorepos; falls back cleanly).
71
+ - Consistent early pruning: `exclude_patterns.txt` (user-editable, populated by init) now applied more broadly during mapping walks in both sh and Python collectors (venvs, caches, site-packages, etc. stop descent sooner).
72
+ - Small parser micro-opt: hoisted common regex compiles (docstring strip, dynamic import detectors) to module level in the Python parser (hot path on dirty files during maps).
73
+ - Minor sh-side note + skeleton for git fast collection in traditional update-maps path (real wins already flow through the Python collectors used by check-changes/streaming/lib/MCP).
74
+ - All changes FRESH-checked + recorded+marked under subid=mapping-speed-hygiene. Complements existing levers (monitored_paths.txt narrowing, --dir / directory= scoping, --stream / --max-files / --max-time streaming+budgets, python-primary, incremental dirty + BRC reverse index).
75
+ - Version to 4.1.2. See the speed-hygiene edits + this commit.
76
+
68
77
  ### Historical (pre-v4.0)
69
78
 
70
79
  #### What's New in v0.3.3 (Gap #1)
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
4
 
5
5
  [project]
6
6
  name = "wikifier"
7
- version = "4.1.1"
7
+ version = "4.1.2"
8
8
  description = "Agent-first, zero-dependency, self-maintaining codebase documentation & change tracking system"
9
9
  readme = "README.md"
10
10
  license = {text = "MIT"}
@@ -62,4 +62,4 @@ from .contracts import (
62
62
  compute_acs_confidence,
63
63
  )
64
64
 
65
- __version__ = "4.1.1"
65
+ __version__ = "4.1.2"
@@ -172,18 +172,75 @@ def _collect_candidate_source_files(root: Path) -> List[Path]:
172
172
  root = Path(root).resolve()
173
173
  except Exception:
174
174
  root = Path(root)
175
+ # Also respect the project's exclude_patterns.txt (if present) for parity with sh
176
+ # mapping paths and check-changes. Simple dir-name globs only for pruning speed.
177
+ # This makes python-primary update-maps benefit from user custom excludes (venvs etc)
178
+ # without any behavior change.
179
+ # Look relative to explicit WIKIFIER_PROJECT_ROOT (if set for the target) or the
180
+ # passed root; excludes live at the logical project root, not arbitrary monitored subdirs.
181
+ ep_root = Path(os.environ.get("WIKIFIER_PROJECT_ROOT", root))
182
+ ep = ep_root / "exclude_patterns.txt"
183
+ if ep.exists():
184
+ try:
185
+ for line in ep.read_text(errors="ignore").splitlines():
186
+ p = line.strip()
187
+ if p and not p.startswith("#"):
188
+ p = p.split()[0] # first token
189
+ if p:
190
+ EXCLUDES.add(p)
191
+ # also common glob forms as exact for dirname match
192
+ if p.endswith("/*") or p.endswith("*"):
193
+ EXCLUDES.add(p.rstrip("/*"))
194
+ except Exception:
195
+ pass
175
196
 
176
- for dirpath, dirnames, filenames in os.walk(root):
177
- # prune in-place (same pattern as resolution._discover_*)
178
- dirnames[:] = [d for d in dirnames if d not in EXCLUDES]
179
- for fn in filenames:
180
- if fn.lower().endswith(exts):
181
- p = Path(dirpath) / fn
182
- try:
183
- if p.is_file():
184
- candidates.append(p)
185
- except Exception:
197
+ # Fast path: if inside a git repo, use `git ls-files` + untracked (respects .gitignore, dramatically faster
198
+ # on large checkouts than any walk; falls back to scandir scan). This is a pure speed opt for "updates"
199
+ # (check-changes, update-maps) with near-identical or better candidate set for real codebases.
200
+ git_dir = root / ".git"
201
+ if git_dir.exists() or (root / ".git" / "HEAD").exists(): # works for worktrees too
202
+ try:
203
+ import subprocess
204
+ # cached + others (untracked but not ignored), exclude standard ignores
205
+ out = subprocess.check_output(
206
+ ["git", "ls-files", "--cached", "--others", "--exclude-standard", "-z"],
207
+ cwd=root, stderr=subprocess.DEVNULL
208
+ )
209
+ for entry in out.split(b"\0"):
210
+ if not entry:
186
211
  continue
212
+ p = (root / entry.decode("utf-8", "ignore")).resolve()
213
+ if p.suffix.lower() in exts: # reuse the set from above (adjusted)
214
+ # quick filter for excludes we still want even if git surfaces them
215
+ parts = p.parts
216
+ if not any(part in EXCLUDES or any(part.startswith(e) for e in (".",)) for part in parts):
217
+ candidates.append(p)
218
+ if candidates:
219
+ return candidates # success, use git list
220
+ except Exception:
221
+ pass # fall through to scandir
222
+
223
+ # Use os.scandir for faster directory traversal (std lib only; avoids full listdir + separate is_dir stats on large trees).
224
+ # Pruning is applied on the fly. Behavior identical to prior walk.
225
+ exts_lower = tuple(e.lower() for e in exts)
226
+ def _scan_dir(d: Path) -> None:
227
+ try:
228
+ with os.scandir(d) as it:
229
+ for entry in it:
230
+ try:
231
+ name = entry.name
232
+ if entry.is_dir(follow_symlinks=False):
233
+ if name not in EXCLUDES and not name.startswith('.'):
234
+ _scan_dir(Path(entry.path))
235
+ elif entry.is_file(follow_symlinks=False):
236
+ lname = name.lower()
237
+ if lname.endswith(exts_lower):
238
+ candidates.append(Path(entry.path))
239
+ except Exception:
240
+ continue
241
+ except Exception:
242
+ pass
243
+ _scan_dir(root)
187
244
  return candidates
188
245
 
189
246
 
@@ -1852,15 +1852,44 @@ def generate_update_events(
1852
1852
  # Real early scope projection (proportional for 50k+) - Micro-step 1
1853
1853
  projector_stats: Dict[str, Any] = {"degraded": True}
1854
1854
  proj: Dict[str, Any] = {}
1855
- # Basic candidate collection (inline for self-contained Micro-step 1; mirrors common excludes)
1856
- candidates = []
1855
+ # Faster candidate collection using os.scandir (avoids repeated listdir overhead).
1856
+ # Respects exclude_patterns.txt when present (for consistency with check-changes + mapping speed).
1857
+ # Same semantics as before.
1858
+ candidates: List[Path] = []
1857
1859
  exts = {'.py', '.js', '.ts', '.jsx', '.tsx'}
1858
- exclude_dirs = {'__pycache__', '.git', 'node_modules', '.venv', 'venv', 'build', 'dist', '.next', '.cache'}
1859
- for dirpath, dirnames, filenames in os.walk(root):
1860
- dirnames[:] = [d for d in dirnames if d not in exclude_dirs and not d.startswith('.')]
1861
- for f in filenames:
1862
- if os.path.splitext(f)[1].lower() in exts:
1863
- candidates.append(Path(dirpath) / f)
1860
+ exclude_dirs = {'__pycache__', '.git', 'node_modules', '.venv', 'venv', 'build', 'dist', '.next', '.cache',
1861
+ '.pnpm', '.yarn', '.store', 'tmp', 'temp', '.turbo', '.mypy_cache', '.ruff_cache'}
1862
+ # Load project excludes if available (project root level)
1863
+ ep = root / "exclude_patterns.txt"
1864
+ if ep.exists():
1865
+ try:
1866
+ for line in ep.read_text(errors="ignore").splitlines():
1867
+ p = line.strip()
1868
+ if p and not p.startswith("#"):
1869
+ p = p.split()[0]
1870
+ if p:
1871
+ exclude_dirs.add(p)
1872
+ if p.endswith("/*") or p.endswith("*"):
1873
+ exclude_dirs.add(p.rstrip("/*"))
1874
+ except Exception:
1875
+ pass
1876
+ def _scan(d: Path) -> None:
1877
+ try:
1878
+ with os.scandir(d) as it:
1879
+ for entry in it:
1880
+ try:
1881
+ name = entry.name
1882
+ if entry.is_dir(follow_symlinks=False):
1883
+ if name not in exclude_dirs and not name.startswith('.'):
1884
+ _scan(Path(entry.path))
1885
+ elif entry.is_file(follow_symlinks=False):
1886
+ if os.path.splitext(name)[1].lower() in exts:
1887
+ candidates.append(Path(entry.path))
1888
+ except Exception:
1889
+ continue
1890
+ except Exception:
1891
+ pass
1892
+ _scan(root)
1864
1893
  candidates_rel: List[str] = []
1865
1894
  for p in candidates:
1866
1895
  try:
@@ -43,6 +43,13 @@ import re
43
43
  from pathlib import Path
44
44
  from typing import List, Dict, Optional, Any
45
45
 
46
+ # Module-level compiled regexes for small repeated speed win on every parse (docstring strip + dynamic import detectors).
47
+ # Zero behavior change.
48
+ _DOCSTRING_RE1 = re.compile(r'"""[\s\S]*?"""')
49
+ _DOCSTRING_RE2 = re.compile(r"'''[\s\S]*?'''")
50
+ _DYN_IMPORT_RE = re.compile(r'(?P<call>(?:importlib\.)?import_module)\s*\(', re.MULTILINE)
51
+ _DYN_DUNDER_RE = re.compile(r'(?P<call>__import__)\s*\(', re.MULTILINE)
52
+
46
53
  # Diagnostics & Failure Transparency (Limitation #5) - same robust import pattern as JS parser
47
54
  try:
48
55
  from . import diagnostics
@@ -249,8 +256,8 @@ def _strip_docstrings(content: str) -> str:
249
256
  strings well), but it is good enough for v0.4 and keeps us zero-dependency.
250
257
  """
251
258
  # Remove """...""" and '''...''' (non-greedy, handles both single and double)
252
- content = re.sub(r'"""[\s\S]*?"""', '', content)
253
- content = re.sub(r"'''[\s\S]*?'''", '', content)
259
+ content = _DOCSTRING_RE1.sub('', content)
260
+ content = _DOCSTRING_RE2.sub('', content)
254
261
  return content
255
262
 
256
263
 
@@ -304,8 +311,8 @@ def parse_python_imports(filepath: str) -> List[Dict[str, Any]]:
304
311
  dynamic_imports: List[Dict[str, Any]] = []
305
312
  if _extract_candidate_literals is not None and _apply_dynamic_registry is not None:
306
313
  dyn_patterns = [
307
- (re.compile(r'(?P<call>(?:importlib\.)?import_module)\s*\(', re.MULTILINE), "import_module"),
308
- (re.compile(r'(?P<call>__import__)\s*\(', re.MULTILINE), "dunder_import"),
314
+ (_DYN_IMPORT_RE, "import_module"),
315
+ (_DYN_DUNDER_RE, "dunder_import"),
309
316
  ]
310
317
  for pat, ptype in dyn_patterns:
311
318
  for match in pat.finditer(content):
@@ -1,6 +1,6 @@
1
1
  Metadata-Version: 2.4
2
2
  Name: wikifier
3
- Version: 4.1.1
3
+ Version: 4.1.2
4
4
  Summary: Agent-first, zero-dependency, self-maintaining codebase documentation & change tracking system
5
5
  Author-email: Aron Amos <aron@example.com>
6
6
  Maintainer: Aron Amos
@@ -97,6 +97,15 @@ See the final synth in `Findings/M5-Dogfood-Progress.md` and full Assessment-Rep
97
97
  - Synced descriptions in README.md, `skills/run.md`, `wikifier/mcp/README.md`.
98
98
  - All under protocol (FRESH, record-change + mark-green with subid, main clean). Version to 4.1.1. See the separation-fix commit.
99
99
 
100
+ **v4.1.2 (2026-06)**: Very minor patch for mapping / update speed hygiene (no new behaviour or scope; pure internal improvements for large projects).
101
+
102
+ - Faster candidate collection everywhere that drives check-changes, monitor, and update-maps (Python-primary paths): switched to `os.scandir`-based recursive scan (std lib, avoids walk overhead) + git `ls-files --cached --others --exclude-standard` fast-path when in a git repo (dramatically faster on real monorepos; falls back cleanly).
103
+ - Consistent early pruning: `exclude_patterns.txt` (user-editable, populated by init) now applied more broadly during mapping walks in both sh and Python collectors (venvs, caches, site-packages, etc. stop descent sooner).
104
+ - Small parser micro-opt: hoisted common regex compiles (docstring strip, dynamic import detectors) to module level in the Python parser (hot path on dirty files during maps).
105
+ - Minor sh-side note + skeleton for git fast collection in traditional update-maps path (real wins already flow through the Python collectors used by check-changes/streaming/lib/MCP).
106
+ - All changes FRESH-checked + recorded+marked under subid=mapping-speed-hygiene. Complements existing levers (monitored_paths.txt narrowing, --dir / directory= scoping, --stream / --max-files / --max-time streaming+budgets, python-primary, incremental dirty + BRC reverse index).
107
+ - Version to 4.1.2. See the speed-hygiene edits + this commit.
108
+
100
109
  ### Historical (pre-v4.0)
101
110
 
102
111
  #### What's New in v0.3.3 (Gap #1)
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes
File without changes