npm - sdtk-wiki-kit - Versions diffs - 0.1.0 → 0.1.1 - Mend

sdtk-wiki-kit 0.1.0 → 0.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13)

package/README.md +87 -11
package/assets/atlas/build_atlas.py +164 -79
package/package.json +1 -1
package/src/commands/help.js +10 -3
package/src/commands/lint.js +2 -1
package/src/commands/search.js +88 -0
package/src/commands/wiki.js +83 -9
package/src/index.js +4 -1
package/src/lib/wiki-compile.js +694 -6
package/src/lib/wiki-extract.js +637 -0
package/src/lib/wiki-flags.js +8 -0
package/src/lib/wiki-lint.js +179 -2
package/src/lib/wiki-search.js +175 -0

package/README.md CHANGED

@@ -36,13 +36,15 @@ Implemented in the Foundation/Beta package:
 | Run non-destructive wiki lint | `sdtk-wiki lint` |
 | Run stale-page prune dry-run report | `sdtk-wiki wiki prune --dry-run` |
 | Generate local discovery plan from gap evidence | `sdtk-wiki wiki discover --plan` |
-| Generate compile dry-run preview from a local plan | `sdtk-wiki wiki compile --dry-run` |
-| Ask grounded questions over built graph | `sdtk-wiki ask` |
-| Save one redacted query record after successful Ask | `sdtk-wiki ask --save-query` |
+| Generate semantic extraction dry-run report | `sdtk-wiki wiki extract --dry-run` |
+| Generate compile dry-run preview and JSON sidecar | `sdtk-wiki wiki compile --dry-run` |
+| Apply an approved compile JSON sidecar | `sdtk-wiki wiki compile --apply --yes` |
+| Search generated personal-brain pages locally | `sdtk-wiki search` |
+| Ask grounded questions over built graph | `sdtk-wiki ask` with `wiki.ask` entitlement/runtime preconditions |
+| Save one redacted query record after successful Ask | `sdtk-wiki ask --save-query` with `wiki.ask` entitlement/runtime preconditions |
 Not implemented in the Foundation/Beta runtime:
-- `sdtk-wiki wiki compile --apply`
 - automatic web discovery or web fetch
 - automatic source ingest from the web
 - destructive prune/delete/archive
@@ -60,6 +62,7 @@ Not implemented in the Foundation/Beta runtime:
 | `.sdtk/wiki/raw` | metadata-only raw/source registry |
 | `.sdtk/wiki/provenance` | source/build/ingest provenance |
 | `.sdtk/wiki/reports` | lint, prune, discover, and compile preview reports |
+| `.sdtk/wiki/personal-brain` | generated semantic personal-brain pages from explicit apply |
 | `.sdtk/wiki/queries` | opt-in redacted Ask query records |
 | `.sdtk/atlas` | legacy Atlas compatibility output, readable only |
@@ -164,14 +167,40 @@ Safety:
 - no compile/apply
 - no prune/delete/archive
+### Semantic Extraction Dry-Run
+```powershell
+sdtk-wiki wiki extract --project-path <path> --source-root <path> --dry-run
+```
+Reads local Markdown sources and writes a semantic extraction JSON report under
+`.sdtk/wiki/reports`. The report can identify local source records, GitHub
+tool candidates, concept candidates, relations, comparisons, syntheses, and
+source-quality findings.
+Safety:
+- local source roots only
+- no web fetch
+- no page generation
+- no graph/viewer rebuild side effects
+- no raw/provenance mutation
+- no `.sdtk/atlas` mutation
 ### Compile Dry-Run Preview
 ```powershell
 sdtk-wiki wiki compile --plan <path> --project-path <path> --dry-run
 ```
-Reads a local markdown or JSON compile plan and writes a preview report under
-`.sdtk/wiki/reports`.
+Reads a local structured markdown plan, JSON operation plan, or
+`sdtk_wiki_semantic_extraction` JSON report and writes both:
+- `.sdtk/wiki/reports/compile-dry-run-preview-YYYY-MM-DD.md`
+- `.sdtk/wiki/reports/compile-apply-plan-YYYY-MM-DD.json`
+The markdown report is for human review. The JSON sidecar is the only supported
+source of truth for explicit apply.
 Supported operation types:
@@ -180,9 +209,41 @@ Supported operation types:
 - `add_relation`
 - `add_source_ref`
-Unknown operation types are reported as `unsupported_operation`. The current
-runtime has no `--apply` behavior and does not modify wiki pages, raw sources,
-provenance, or `.sdtk/atlas`.
+Unknown operation types are reported as `unsupported_operation`. Dry-run does
+not modify wiki pages, raw sources, provenance, or `.sdtk/atlas`.
+### Compile Apply
+```powershell
+sdtk-wiki wiki compile --plan <compile-apply-plan-json> --project-path <path> --apply --yes
+```
+Applies only a `record_type: "sdtk_wiki_compile_apply_plan"` JSON sidecar
+generated by compile dry-run. Markdown plans and raw semantic extraction JSON
+are rejected for apply.
+Apply behavior:
+- requires `--apply --yes`
+- writes only under `.sdtk/wiki/personal-brain`
+- create-only or same-content no-op
+- no overwrite with different content
+- no delete, archive, rewrite, or reorder
+- no raw/provenance descriptor mutation
+- no `.sdtk/atlas` mutation
+### Local Search
+```powershell
+sdtk-wiki search --project-path <path> "<query>"
+```
+Searches generated personal-brain Markdown pages under
+`.sdtk/wiki/personal-brain/**/*.md`.
+Search is deterministic, read-only, and non-premium. It does not require
+`wiki.ask` entitlement, does not call an LLM/RAG runtime, does not write query
+history, and does not mutate project files.
 ### Ask
@@ -190,7 +251,9 @@ provenance, or `.sdtk/atlas`.
 sdtk-wiki ask --question "<text>" [--project-path <path>] [--json] [--source <id-or-path>] [--max-sources <n>] [--save-query]
 ```
-Native `sdtk-wiki ask` is the canonical Q&A command for capability `wiki.ask`.
+Native `sdtk-wiki ask` is implemented as the canonical Q&A command for
+capability `wiki.ask`, but it is not a free local search command. It requires
+valid `wiki.ask` entitlement and runtime preconditions.
 Preconditions:
@@ -237,6 +300,15 @@ Preview a compile plan without applying it:
 sdtk-wiki wiki compile --plan <local-plan.md-or-json> --project-path . --dry-run
 ```
+Build a personal-brain from local Markdown sources and search it:
+```powershell
+sdtk-wiki wiki extract --project-path . --source-root docs --dry-run
+sdtk-wiki wiki compile --project-path . --plan .sdtk/wiki/reports/semantic-extraction-dry-run-<stamp>.json --dry-run
+sdtk-wiki wiki compile --project-path . --plan .sdtk/wiki/reports/compile-apply-plan-<date>.json --apply --yes
+sdtk-wiki search --project-path . "multi-agent"
+```
 Ask and save an opt-in redacted query record:
 ```powershell
@@ -244,6 +316,10 @@ sdtk-wiki atlas build --project-path .
 sdtk-wiki ask --project-path . --question "Which docs describe the deployment path?" --save-query
 ```
+This flow requires valid `wiki.ask` entitlement/runtime preconditions. Use
+`sdtk-wiki search` for non-premium local validation of generated
+personal-brain pages.
 ## Foundation/Beta Boundaries
 This release is local-first and report-first. It is a foundation for a
@@ -252,10 +328,10 @@ second-brain workflow, not a fully autonomous second brain.
 Do not claim the Foundation/Beta runtime includes:
 - web fetch/discover
-- compile `--apply`
 - destructive prune/delete/archive
 - query list/show/delete
 - default full prompt/full answer query persistence
+- premium Ask without valid `wiki.ask` entitlement/runtime preconditions
 - `.sdtk/atlas` as canonical storage
 See `products/sdtk-wiki/governance/SDTK_WIKI_USAGE_GUIDE.md` for the fuller
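
The "Apply behavior" rules added to the README above (create-only or same-content no-op, no overwrite with different content, writes confined to `.sdtk/wiki/personal-brain`) can be illustrated with a small Python sketch. This is not the package's code: the real apply logic is not shown in this part of the diff, and the function name `apply_create_only`, its return values, and the exceptions it raises are assumptions chosen for illustration.

```python
from pathlib import Path


def apply_create_only(base: Path, rel_path: str, content: str) -> str:
    """Hypothetical helper: write a page under `base` with create-only semantics.

    Returns "created" when the file is new and "noop" when an identical file
    already exists. Raises instead of overwriting different content.
    """
    base_resolved = base.resolve()
    target = (base / rel_path).resolve()

    # Refuse any path that escapes the personal-brain directory.
    if base_resolved not in target.parents:
        raise ValueError(f"Refusing to write outside {base_resolved}: {target}")

    if target.exists():
        if target.read_text(encoding="utf-8") == content:
            return "noop"  # same content: nothing to do
        # Different content: never overwrite (assumed to surface as an error).
        raise FileExistsError(f"Existing page differs, not overwriting: {target}")

    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return "created"
```

Under these assumptions, running the same approved plan twice is safe: the second run finds identical files and changes nothing, while a page edited since the first run is left alone rather than replaced.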

package/assets/atlas/build_atlas.py CHANGED

@@ -147,20 +147,48 @@ def _assert_inside(base: Path, target: Path) -> None:
         raise ValueError(f"Refusing to write outside SDTK-WIKI workspace: {resolved_target}")
-def _is_excluded(
-    path: Path,
-    root: Path,
-    exclude_frags: list[str],
-) -> bool:
-    try:
-        rel = path.relative_to(root).as_posix().lower()
-    except ValueError:
-        rel = path.as_posix().lower()
-    for frag in exclude_frags:
-        norm_frag = frag.replace("\\", "/").lower()
-        if norm_frag in rel:
-            return True
-    return False
+def _is_excluded(
+    path: Path,
+    root: Path,
+    exclude_frags: list[str],
+) -> bool:
+    return _match_exclude(path=path, root=root, exclude_frags=exclude_frags) is not None
+def _display_scan_path(path: Path, root: Path) -> str:
+    try:
+        return path.relative_to(root).as_posix()
+    except ValueError:
+        return path.as_posix()
+def _normalise_exclude_fragment(frag: str) -> list[str]:
+    norm_frag = frag.replace("\\", "/").strip("/").lower()
+    return [part for part in norm_frag.split("/") if part and part != "."]
+def _match_exclude(
+    path: Path,
+    root: Path,
+    exclude_frags: list[str],
+) -> str | None:
+    rel = _display_scan_path(path, root).lower()
+    rel_parts = [part for part in rel.split("/") if part and part != "."]
+    for frag in exclude_frags:
+        frag_parts = _normalise_exclude_fragment(frag)
+        if not frag_parts:
+            continue
+        if len(frag_parts) == 1:
+            if frag_parts[0] in rel_parts:
+                return frag
+            continue
+        for idx in range(0, len(rel_parts) - len(frag_parts) + 1):
+            if rel_parts[idx : idx + len(frag_parts)] == frag_parts:
+                return frag
+    return None
 def _extract_title(text: str) -> str:
@@ -322,9 +350,9 @@ def _compute_file_hash(md_file: Path) -> str:
     return hashlib.sha256(content).hexdigest()
-def _parse_doc_record(md_file: Path, root: Path) -> dict[str, Any]:
-    rel = md_file.relative_to(root).as_posix()
-    text = md_file.read_text(encoding="utf-8", errors="replace")
+def _parse_doc_record(md_file: Path, root: Path) -> dict[str, Any]:
+    rel = _display_scan_path(md_file, root)
+    text = md_file.read_text(encoding="utf-8", errors="replace")
     frontmatter_fields, body_text = _parse_frontmatter(text)
     title = str(
         frontmatter_fields.get("title")
@@ -363,39 +391,70 @@ def _parse_doc_record(md_file: Path, root: Path) -> dict[str, Any]:
     }
-def list_indexable_markdown_files(
-    root: Path,
-    scan_roots: list[Path],
-    exclude_frags: list[str],
-) -> list[Path]:
-    files: list[Path] = []
-    seen_paths: set[str] = set()
-    for scan_root in scan_roots:
-        if not scan_root.exists():
-            print(f"[atlas] Warning: scan root does not exist, skipping: {scan_root}", file=sys.stderr)
-            continue
+def list_indexable_markdown_files(
+    root: Path,
+    scan_roots: list[Path],
+    exclude_frags: list[str],
+) -> list[Path]:
+    return collect_indexable_markdown_files(root, scan_roots, exclude_frags)["files"]
+def collect_indexable_markdown_files(
+    root: Path,
+    scan_roots: list[Path],
+    exclude_frags: list[str],
+) -> dict[str, Any]:
+    files: list[Path] = []
+    seen_paths: set[str] = set()
+    skipped_files: list[dict[str, str]] = []
+    scanned_count = 0
+    for scan_root in scan_roots:
+        if not scan_root.exists():
+            print(f"[atlas] Warning: scan root does not exist, skipping: {scan_root}", file=sys.stderr)
+            continue
         if scan_root.is_file() and scan_root.suffix.lower() == ".md":
             candidates = [scan_root]
         elif scan_root.is_dir():
             candidates = [p for p in sorted(scan_root.rglob("*.md")) if p.is_file()]
         else:
-            candidates = []
-        for md_file in candidates:
-            if _is_excluded(md_file, root=root, exclude_frags=exclude_frags):
-                continue
-            try:
-                rel = md_file.relative_to(root).as_posix()
-            except ValueError:
-                rel = md_file.as_posix()
-            if rel in seen_paths:
-                continue
-            seen_paths.add(rel)
-            files.append(md_file)
-    files.sort(key=lambda p: p.as_posix())
-    return files
+            candidates = []
+        for md_file in candidates:
+            scanned_count += 1
+            matched_exclude = _match_exclude(md_file, root=root, exclude_frags=exclude_frags)
+            display_path = _display_scan_path(md_file, root)
+            if matched_exclude is not None:
+                skipped_files.append(
+                    {
+                        "path": display_path,
+                        "reason": f"exclude:{matched_exclude}",
+                    }
+                )
+                continue
+            try:
+                rel = md_file.relative_to(root).as_posix()
+            except ValueError:
+                rel = md_file.as_posix()
+            if rel in seen_paths:
+                skipped_files.append(
+                    {
+                        "path": display_path,
+                        "reason": "duplicate_scan_root",
+                    }
+                )
+                continue
+            seen_paths.add(rel)
+            files.append(md_file)
+    files.sort(key=lambda p: p.as_posix())
+    return {
+        "files": files,
+        "scanned_count": scanned_count,
+        "indexed_count": len(files),
+        "skipped_count": len(skipped_files),
+        "skipped_files": skipped_files,
+    }
 # ---------------------------------------------------------------------------
@@ -639,16 +698,17 @@ def write_wiki_pages_and_provenance(
     }
-def build_docs_incremental(
-    root: Path,
-    atlas_dir: Path,
-    generated: str,
-    scan_roots: list[Path],
-    exclude_frags: list[str],
-) -> tuple[list[dict[str, Any]], dict[str, Any], dict[str, int]]:
-    prior_state = load_atlas_state(atlas_dir)
-    prior_documents = prior_state.get("documents", {})
-    current_files = list_indexable_markdown_files(root, scan_roots, exclude_frags)
+def build_docs_incremental(
+    root: Path,
+    atlas_dir: Path,
+    generated: str,
+    scan_roots: list[Path],
+    exclude_frags: list[str],
+) -> tuple[list[dict[str, Any]], dict[str, Any], dict[str, Any]]:
+    prior_state = load_atlas_state(atlas_dir)
+    prior_documents = prior_state.get("documents", {})
+    scan_result = collect_indexable_markdown_files(root, scan_roots, exclude_frags)
+    current_files = scan_result["files"]
     current_rel_paths = {}
     for md_file in current_files:
@@ -710,12 +770,16 @@ def build_docs_incremental(
         "generated": generated,
         "documents": next_documents,
     }
-    build_stats = {
-        "discovered_count": len(current_rel_paths),
-        "reused_count": reused_count,
-        "reparsed_count": reparsed_count,
-        "removed_count": removed_count,
-    }
+    build_stats = {
+        "discovered_count": len(current_rel_paths),
+        "scanned_count": scan_result["scanned_count"],
+        "indexed_count": len(current_rel_paths),
+        "skipped_count": scan_result["skipped_count"],
+        "skipped_files": scan_result["skipped_files"],
+        "reused_count": reused_count,
+        "reparsed_count": reparsed_count,
+        "removed_count": removed_count,
+    }
     return docs, next_state, build_stats
@@ -814,11 +878,11 @@ def build_graph(docs: list[dict[str, Any]]) -> dict[str, Any]:
 # ---------------------------------------------------------------------------
 # Summary markdown
 # ---------------------------------------------------------------------------
-def build_summary(
+def build_summary(
     docs: list[dict[str, Any]],
     graph: dict[str, Any],
     generated: str,
-    stats: dict[str, int] | None,
+    stats: dict[str, Any] | None,
     root: Path,
     scan_roots: list[Path],
     exclude_frags: list[str],
@@ -848,16 +912,30 @@ def build_summary(
     for fam, cnt in sorted(family_counts.items(), key=lambda x: -x[1]):
         lines.append(f"| {fam} | {cnt} |")
-    if stats is not None:
-        lines += [
-            "",
-            "## Incremental Build",
-            "",
-            f"Discovered markdown docs: {stats['discovered_count']}",
-            f"Reused cached docs: {stats['reused_count']}",
-            f"Reparsed docs: {stats['reparsed_count']}",
-            f"Removed stale docs: {stats['removed_count']}",
-        ]
+    if stats is not None:
+        lines += [
+            "",
+            "## Incremental Build",
+            "",
+            f"Discovered markdown docs: {stats['discovered_count']}",
+            f"Scanned markdown candidates: {stats.get('scanned_count', stats['discovered_count'])}",
+            f"Indexed markdown docs: {stats.get('indexed_count', stats['discovered_count'])}",
+            f"Skipped markdown docs: {stats.get('skipped_count', 0)}",
+            f"Reused cached docs: {stats['reused_count']}",
+            f"Reparsed docs: {stats['reparsed_count']}",
+            f"Removed stale docs: {stats['removed_count']}",
+        ]
+        skipped_files = stats.get("skipped_files") or []
+        if skipped_files:
+            lines += [
+                "",
+                "## Skipped Markdown Files",
+                "",
+                "| Path | Reason |",
+                "|------|--------|",
+            ]
+            for skipped in skipped_files:
+                lines.append(f"| {skipped['path']} | {skipped['reason']} |")
     lines += [
         "",
@@ -959,12 +1037,19 @@ def build_atlas(
         scan_roots=roots,
         exclude_frags=frags,
     )
-    print(f"[atlas] Indexed {len(docs)} documents.")
-    if verbose:
-        print(
-            f"[atlas] Incremental build: reused {stats['reused_count']} cached, "
-            f"reparsed {stats['reparsed_count']}, removed {stats['removed_count']}."
-        )
+    print(f"[atlas] Indexed {len(docs)} documents.")
+    print(
+        f"[atlas] Scan coverage: scanned {stats.get('scanned_count', len(docs))}, "
+        f"indexed {stats.get('indexed_count', len(docs))}, "
+        f"skipped {stats.get('skipped_count', 0)}."
+    )
+    if verbose:
+        print(
+            f"[atlas] Incremental build: reused {stats['reused_count']} cached, "
+            f"reparsed {stats['reparsed_count']}, removed {stats['removed_count']}."
+        )
+        for skipped in stats.get("skipped_files", []):
+            print(f"[atlas] Skipped markdown: {skipped['path']} ({skipped['reason']})")
     print("[atlas] Building graph...")
     graph = build_graph(docs)
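
The exclusion change in this file replaces a substring test with a match on whole path segments, so some fragments that excluded files in 0.1.0 no longer do. The sketch below restates both versions as standalone functions that take an already-relative POSIX path string, so the difference can be tried without the rest of `build_atlas.py`. The names `old_is_excluded` and `new_match_exclude` and the sample paths are illustrative only; the matching logic follows the diff.

```python
from typing import List, Optional


def old_is_excluded(rel: str, exclude_frags: List[str]) -> bool:
    """0.1.0 behaviour: substring test against the lowercased relative path."""
    rel = rel.lower()
    for frag in exclude_frags:
        if frag.replace("\\", "/").lower() in rel:
            return True
    return False


def normalise_fragment(frag: str) -> List[str]:
    """Split a fragment into lowercased path segments, dropping '.' and empties."""
    norm = frag.replace("\\", "/").strip("/").lower()
    return [part for part in norm.split("/") if part and part != "."]


def new_match_exclude(rel: str, exclude_frags: List[str]) -> Optional[str]:
    """0.1.1 behaviour: match whole segments and return the fragment that hit."""
    rel_parts = [part for part in rel.lower().split("/") if part and part != "."]
    for frag in exclude_frags:
        frag_parts = normalise_fragment(frag)
        if not frag_parts:
            continue
        if len(frag_parts) == 1:
            if frag_parts[0] in rel_parts:
                return frag
            continue
        # Multi-segment fragment: look for a contiguous run of equal segments.
        for idx in range(0, len(rel_parts) - len(frag_parts) + 1):
            if rel_parts[idx : idx + len(frag_parts)] == frag_parts:
                return frag
    return None


if __name__ == "__main__":
    # "node" is a substring of "nodes" but not a whole segment.
    print(old_is_excluded("docs/nodes/readme.md", ["node"]))
    print(new_match_exclude("docs/nodes/readme.md", ["node"]))
    # A multi-segment fragment still matches a contiguous segment run.
    print(new_match_exclude("docs/build/out.md", ["docs/build"]))
```

One consequence worth checking when upgrading: fragments that relied on partial names (for example `draft` to skip a `drafts/` directory) matched under the old substring test but would need to be written as the full segment name under the new one.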

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "sdtk-wiki-kit",
-  "version": "0.1.0",
+  "version": "0.1.1",
   "description": "Project-local wiki and knowledge graph toolkit for SDTK workspaces.",
   "bin": {
     "sdtk-wiki": "bin/sdtk-wiki.js"

package/src/commands/help.js CHANGED

@@ -16,6 +16,7 @@ Usage:
   sdtk-wiki wiki discover --help
   sdtk-wiki wiki compile --help
   sdtk-wiki ask --help
+  sdtk-wiki search --help
   sdtk-wiki lint --help
 R1 command model:
@@ -27,8 +28,9 @@ R1 command model:
   wiki ingest          Register one local source in metadata-only raw/provenance state.
   wiki prune           Write a report-only dry-run stale managed-page review.
   wiki discover        Write a local-only discovery plan from WIKI gap evidence.
-  wiki compile         Write a compile dry-run preview from a local plan.
+  wiki compile         Preview or explicitly apply local personal-brain compile plans.
   ask                  Ask grounded questions over the built SDTK-WIKI graph.
+  search               Search generated personal-brain pages locally without premium Ask.
   lint                 Write a report-first, non-destructive wiki lint report.
 Workspace paths:
@@ -50,6 +52,10 @@ Premium Ask:
                        Requires .sdtk/wiki/graph plus local entitlement/runtime preconditions.
                        Query history, discover, compile, and cleanup automation are not enabled in R1.
+Local Search:
+  sdtk-wiki search     Deterministic, read-only local search over .sdtk/wiki/personal-brain.
+                       Does not require wiki.ask entitlement and does not perform LLM/RAG behavior.
 Maintenance:
   sdtk-wiki wiki prune --dry-run is report-only and writes under .sdtk/wiki/reports.
   It never deletes, archives, applies, or mutates .sdtk/atlas.`);
@@ -57,8 +63,9 @@ Maintenance:
   sdtk-wiki wiki discover --plan is plan-only and writes under .sdtk/wiki/reports.
   It never fetches web sources, ingests sources, compiles pages, applies edits, prunes, or mutates .sdtk/atlas.`);
   console.log(`
-  sdtk-wiki wiki compile --dry-run writes a compile dry-run preview under .sdtk/wiki/reports.
-  It never applies changes, rewrites pages, mutates raw/provenance files, or mutates .sdtk/atlas.`);
+  sdtk-wiki wiki compile --dry-run writes a markdown preview plus JSON sidecar under .sdtk/wiki/reports.
+  sdtk-wiki wiki compile --apply --yes consumes only the JSON sidecar and writes create-only personal-brain pages.
+  It never rewrites pages, mutates raw/provenance files, or mutates .sdtk/atlas.`);
   return 0;
 }

package/src/commands/lint.js CHANGED

@@ -13,13 +13,14 @@ function cmdLintHelp() {
   sdtk-wiki lint [--project-path <path>]
 Purpose:
-  Run report-first, non-destructive lint checks over canonical .sdtk/wiki content.
+  Run report-first, non-destructive lint checks over canonical .sdtk/wiki content and local source-quality evidence.
 Output:
   .sdtk/wiki/reports/lint-report-YYYY-MM-DD.md
 Behavior:
   Findings are written to the report and do not auto-modify wiki or source files.
+  Source-quality checks report mojibake-like text, missing source URLs, weak titles, duplicate repo/source candidates, low-confidence extraction, and raw/graph/provenance coverage mismatch.
   Completed lint runs exit 0 even when findings exist.
   Missing workspace or fatal report-write failures exit non-zero.

package/src/commands/search.js ADDED

@@ -0,0 +1,88 @@
+"use strict";
+const { parseFlags } = require("../lib/args");
+const { runWikiSearch } = require("../lib/wiki-search");
+const SEARCH_FLAG_DEFS = {
+  help: { type: "boolean", alias: "h" },
+  "project-path": { type: "string" },
+  json: { type: "boolean" },
+  limit: { type: "string" },
+};
+function parseSearchFlags(args) {
+  return parseFlags(args || [], SEARCH_FLAG_DEFS);
+}
+function printSearchHelp() {
+  console.log(`SDTK-WIKI Local Search
+Usage:
+  sdtk-wiki search --project-path <path> "multi-agent"
+  sdtk-wiki search --project-path <path> --json --limit 10 "Claude Code"
+Purpose:
+  Deterministically search generated personal-brain Markdown pages.
+Inputs:
+  .sdtk/wiki/personal-brain/**/*.md
+Behavior:
+  Read-only and non-premium.
+  No wiki.ask entitlement is required.
+  No LLM, RAG, web search, query history, compile/apply, prune, or project mutation is performed.`);
+  return 0;
+}
+function printHumanResult(result) {
+  const lines = [
+    `Query: ${result.query}`,
+    `Search mode: ${result.searchMode}`,
+    `Personal brain: ${result.personalBrainPath}`,
+    `Scanned files: ${result.scannedFiles}`,
+    `Matches: ${result.totalMatches}`,
+    "",
+  ];
+  if (result.matches.length === 0) {
+    lines.push("No local personal-brain matches found.");
+  } else {
+    result.matches.forEach((match, index) => {
+      lines.push(`${index + 1}. ${match.path}`);
+      lines.push(`   title: ${match.title}`);
+      lines.push(`   score: ${match.score}`);
+      lines.push(`   why: ${match.why}`);
+      lines.push(`   snippet: ${match.snippet}`);
+      lines.push("");
+    });
+  }
+  lines.push("No entitlement, LLM/RAG runtime, query history, or project mutation was used.");
+  console.log(lines.join("\n").trimEnd());
+}
+function cmdSearch(args) {
+  const { flags, positional } = parseSearchFlags(args || []);
+  if (flags.help) {
+    return printSearchHelp();
+  }
+  const query = positional.join(" ");
+  const result = runWikiSearch({
+    projectPath: flags["project-path"],
+    query,
+    limit: flags.limit,
+  });
+  if (flags.json) {
+    console.log(JSON.stringify(result, null, 2));
+  } else {
+    printHumanResult(result);
+  }
+  return 0;
+}
+module.exports = {
+  cmdSearch,
+  parseSearchFlags,
+  printSearchHelp,
+};
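
The command above delegates to `runWikiSearch` in `src/lib/wiki-search.js`, which is not shown in this part of the diff. As a rough idea of what "deterministic, read-only" search over `.sdtk/wiki/personal-brain/**/*.md` could look like, here is a minimal Python sketch. The scoring rule (counting query-term occurrences), the tie-break by path, and the function name `search_pages` are assumptions, not the package's algorithm; only the result keys `query`, `scannedFiles`, `totalMatches`, and `matches` mirror fields read by `printHumanResult`.

```python
from pathlib import Path
from typing import Any, Dict, List


def search_pages(brain_dir: Path, query: str, limit: int = 10) -> Dict[str, Any]:
    """Hypothetical read-only search: rank Markdown pages by query-term count."""
    terms = [term for term in query.lower().split() if term]
    files = sorted(brain_dir.rglob("*.md")) if brain_dir.is_dir() else []

    matches: List[Dict[str, Any]] = []
    for md_file in files:
        text = md_file.read_text(encoding="utf-8", errors="replace").lower()
        score = sum(text.count(term) for term in terms)  # assumed scoring rule
        if score > 0:
            matches.append(
                {"path": md_file.relative_to(brain_dir).as_posix(), "score": score}
            )

    # Sort by score, then by path, so equal scores always come out in the same order.
    matches.sort(key=lambda match: (-match["score"], match["path"]))
    return {
        "query": query,
        "scannedFiles": len(files),
        "totalMatches": len(matches),
        "matches": matches[:limit],
    }
```

Sorting on a fixed key with an explicit tie-break is what makes the output repeatable for the same files and query. Nothing is written to disk, which matches the read-only behavior described in the help text.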