llmwikify 0.15.0__tar.gz

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (50)
  1. llmwikify-0.15.0/CHANGELOG.md +259 -0
  2. llmwikify-0.15.0/LICENSE +21 -0
  3. llmwikify-0.15.0/MANIFEST.in +10 -0
  4. llmwikify-0.15.0/PKG-INFO +624 -0
  5. llmwikify-0.15.0/README.md +571 -0
  6. llmwikify-0.15.0/docs/CONFIGURATION_GUIDE.md +342 -0
  7. llmwikify-0.15.0/docs/CONFIG_GUIDE.md +319 -0
  8. llmwikify-0.15.0/docs/LLM_WIKI_PRINCIPLES.md +75 -0
  9. llmwikify-0.15.0/docs/MCP_SETUP.md +364 -0
  10. llmwikify-0.15.0/docs/REFERENCE_TRACKING_GUIDE.md +241 -0
  11. llmwikify-0.15.0/docs/SINK_DESIGN_DECISIONS.md +329 -0
  12. llmwikify-0.15.0/docs/plans/v0.15.0.md +161 -0
  13. llmwikify-0.15.0/pyproject.toml +137 -0
  14. llmwikify-0.15.0/setup.cfg +4 -0
  15. llmwikify-0.15.0/src/llmwikify/__init__.py +66 -0
  16. llmwikify-0.15.0/src/llmwikify/cli/__init__.py +5 -0
  17. llmwikify-0.15.0/src/llmwikify/cli/commands.py +658 -0
  18. llmwikify-0.15.0/src/llmwikify/config.py +186 -0
  19. llmwikify-0.15.0/src/llmwikify/core/__init__.py +6 -0
  20. llmwikify-0.15.0/src/llmwikify/core/index.py +309 -0
  21. llmwikify-0.15.0/src/llmwikify/core/wiki.py +2110 -0
  22. llmwikify-0.15.0/src/llmwikify/extractors/__init__.py +24 -0
  23. llmwikify-0.15.0/src/llmwikify/extractors/base.py +92 -0
  24. llmwikify-0.15.0/src/llmwikify/extractors/html.py +17 -0
  25. llmwikify-0.15.0/src/llmwikify/extractors/pdf.py +53 -0
  26. llmwikify-0.15.0/src/llmwikify/extractors/text.py +50 -0
  27. llmwikify-0.15.0/src/llmwikify/extractors/web.py +53 -0
  28. llmwikify-0.15.0/src/llmwikify/extractors/youtube.py +75 -0
  29. llmwikify-0.15.0/src/llmwikify/llm_client.py +142 -0
  30. llmwikify-0.15.0/src/llmwikify/mcp/__init__.py +5 -0
  31. llmwikify-0.15.0/src/llmwikify/mcp/server.py +260 -0
  32. llmwikify-0.15.0/src/llmwikify/py.typed +1 -0
  33. llmwikify-0.15.0/src/llmwikify/utils/__init__.py +5 -0
  34. llmwikify-0.15.0/src/llmwikify/utils/helpers.py +17 -0
  35. llmwikify-0.15.0/src/llmwikify.egg-info/PKG-INFO +624 -0
  36. llmwikify-0.15.0/src/llmwikify.egg-info/SOURCES.txt +48 -0
  37. llmwikify-0.15.0/src/llmwikify.egg-info/dependency_links.txt +1 -0
  38. llmwikify-0.15.0/src/llmwikify.egg-info/entry_points.txt +3 -0
  39. llmwikify-0.15.0/src/llmwikify.egg-info/requires.txt +29 -0
  40. llmwikify-0.15.0/src/llmwikify.egg-info/top_level.txt +1 -0
  41. llmwikify-0.15.0/tests/conftest.py +71 -0
  42. llmwikify-0.15.0/tests/test_cli.py +124 -0
  43. llmwikify-0.15.0/tests/test_extractors.py +149 -0
  44. llmwikify-0.15.0/tests/test_index.py +186 -0
  45. llmwikify-0.15.0/tests/test_llm_client.py +88 -0
  46. llmwikify-0.15.0/tests/test_query_flow.py +610 -0
  47. llmwikify-0.15.0/tests/test_recommend.py +111 -0
  48. llmwikify-0.15.0/tests/test_sink_flow.py +994 -0
  49. llmwikify-0.15.0/tests/test_v015_features.py +554 -0
  50. llmwikify-0.15.0/tests/test_wiki_core.py +637 -0
@@ -0,0 +1,259 @@
+ # Changelog
+
+ All notable changes to llmwikify will be documented in this file.
+
+ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+ ## [Unreleased]
+
+ ### Planned
+ - Incremental index updates
+ - Web UI (optional)
+ - Graph visualization (Graphviz/Mermaid)
+
+ ---
+
+ ## [0.15.0] - 2026-04-10
+
+ ### Added
+ - **Enhanced `ingest_source()` metadata**: returns rich file metadata for LLM context:
+   - `file_type`: detected from the file extension (markdown, pdf, text, html, etc.)
+   - `file_size`: byte size of the raw source file
+   - `word_count`: word count of the extracted text
+   - `has_images` / `image_count`: detects markdown image references
+   - `text_extracted`: boolean flag
+   - `content_preview`: first 200 characters of the extracted text
+   - **No summary returned**: respects the "LLM reads source" principle
+ - **Clue-based lint detection**: three new observation types; the LLM makes the final judgment:
+   - `dated_claim` (critical, max 3): pages referencing years at least 3 years older than the latest raw source
+   - `topic_overlap` (informational, max 2): `Query:` pages with ≥85% keyword Jaccard overlap
+   - `missing_cross_ref` (informational, max 3): concepts mentioned in 2+ pages without a wikilink
+ - **`hints` structure in `lint()`**: two-tier classification:
+   - `hints.critical[]`: items that demand attention (max 3)
+   - `hints.informational[]`: optional context (max 5)
+   - Total max 8 hints per lint pass
+ - **`_detect_file_type()` helper**: static method for file extension detection
+
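The `ingest_source()` metadata fields listed above can be approximated with the standard library alone. This is a minimal sketch, not the package's implementation: the function names `detect_file_type` and `ingest_metadata`, the extension-to-type table, and the markdown image regex are all assumptions chosen for illustration (the real helper is a static method named `_detect_file_type()`).

```python
import re
from pathlib import Path

# Hypothetical extension table; the package's actual mapping may differ.
_EXT_TO_TYPE = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".pdf": "pdf",
    ".txt": "text",
    ".html": "html",
    ".htm": "html",
}

# Matches markdown image references such as ![alt](path/to/image.png).
_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def detect_file_type(path: Path) -> str:
    """Map a file extension (case-insensitively) to a coarse type name."""
    return _EXT_TO_TYPE.get(path.suffix.lower(), "unknown")


def ingest_metadata(path: Path, raw: bytes, text: str) -> dict:
    """Describe a source for the LLM; deliberately returns no summary."""
    image_count = len(_IMAGE_RE.findall(text))
    return {
        "file_type": detect_file_type(path),
        "file_size": len(raw),              # byte size of the raw source file
        "word_count": len(text.split()),    # whitespace-delimited words
        "has_images": image_count > 0,
        "image_count": image_count,
        "text_extracted": bool(text.strip()),
        "content_preview": text[:200],      # first 200 characters only
    }
```

Returning counts and a short preview, rather than a summary, leaves the reading and interpretation of the source to the LLM, which is the principle the entry cites.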
+ ### Changed
+ - `lint()` return structure now includes `hints: {critical: [...], informational: [...]}`
+ - All lint hints use observational language (non-directive, respects LLM autonomy)
+
+ ### Principle Coverage
+ - **Ingest: "LLM reads source, discusses key takeaways"**: ✅ Enhanced (metadata, not summary)
+ - **Lint: "stale claims superseded by newer sources"**: ✅ New (dated_claim, clue-based)
+ - **Lint: "missing cross-references"**: ✅ New (missing_cross_ref, clue-based)
+ - **Zero domain assumption**: ✅ All hints are observations, not judgments
+
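The caps in the 0.15.0 entry fit together arithmetically: `dated_claim` (max 3) fills the critical tier, while `topic_overlap` (max 2) plus `missing_cross_ref` (max 3) fill the informational tier (max 5), for at most 8 hints. A minimal sketch of that capping, assuming observations arrive as plain dicts with a `type` key; `build_hints` and the dict shape are hypothetical, not the package's API.

```python
from collections import defaultdict

# Tier and per-type cap for each observation type, per the 0.15.0 entry.
TYPE_RULES = {
    "dated_claim": ("critical", 3),
    "topic_overlap": ("informational", 2),
    "missing_cross_ref": ("informational", 3),
}
TIER_CAPS = {"critical": 3, "informational": 5}


def build_hints(observations: list[dict]) -> dict:
    """Group observations into capped critical/informational tiers."""
    per_type = defaultdict(int)
    hints = {"critical": [], "informational": []}
    for obs in observations:
        rule = TYPE_RULES.get(obs.get("type"))
        if rule is None:
            continue  # unknown observation types are ignored in this sketch
        tier, type_cap = rule
        if per_type[obs["type"]] >= type_cap or len(hints[tier]) >= TIER_CAPS[tier]:
            continue  # either the per-type cap or the tier cap is reached
        per_type[obs["type"]] += 1
        hints[tier].append(obs)
    return hints
```

Capping per type as well as per tier keeps one noisy check from crowding the others out of a tier.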
+ ---
+
+ ## [0.14.0] - 2026-04-10
+
+ ### Added
+ - **`merge_or_replace` parameter** in `synthesize_query()`: replaces `update_existing` with three explicit strategies:
+   - `"sink"` (default): append to the sink buffer for later review
+   - `"merge"`: the LLM reads the old content, consolidates it, and replaces the formal page
+   - `"replace"`: overwrite the formal page entirely
+ - **Sink suggestion generation**: each sink entry now includes actionable observations:
+   - **Content gap detection**: compares the new answer with the formal page to find missing topics
+   - **Source quality analysis**: checks for missing citations, new sources, and completeness
+   - **Query pattern analysis**: detects repeated questions and trends toward increasing complexity
+   - **Knowledge growth suggestions**: identifies new concepts and possible contradictions
+ - **Sink dedup detection**: flags entries with >70% text similarity to existing sink entries
+ - **Sink urgency tracking** in `sink_status()`: ok / attention (7d+) / aging (14d+) / stale (30d+)
+ - **`sink_warnings` in `lint()`**: reports stale or aging sinks that need attention
+ - **Observational hint format** in `synthesize_query()`: non-directive language that respects LLM autonomy
+
+ ### Changed
+ - **BREAKING**: `update_existing` parameter removed from `synthesize_query()` and the MCP tool
+ - **BREAKING**: the `wiki_synthesize` MCP tool now uses `merge_or_replace` (string enum) instead of `update_existing` (boolean)
+ - `synthesize_query()` returns `status: "merged"` or `"replaced"` instead of `"updated"`
+ - Hint format changed from `suggestion` (directive) to `observation` + `options` (non-directive)
+
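The `sink_status()` urgency labels map the age of a sink onto four buckets. A minimal sketch, assuming age is measured from the oldest pending entry (the changelog does not say which timestamp is used) and reading "7d+" as "7 days or more"; `sink_urgency` is a hypothetical name.

```python
from datetime import datetime, timezone
from typing import Optional

# (minimum age in days, label), checked from most to least urgent.
URGENCY_LEVELS = [(30, "stale"), (14, "aging"), (7, "attention")]


def sink_urgency(oldest_entry: datetime, now: Optional[datetime] = None) -> str:
    """Classify a sink by the age of its oldest pending entry.

    Expects timezone-aware datetimes when `now` is omitted.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    age_days = (now - oldest_entry).days  # whole days, partial days dropped
    for min_days, label in URGENCY_LEVELS:
        if age_days >= min_days:
            return label
    return "ok"
```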
+ ---
+
+ ## [0.13.0] - 2026-04-10
+
+ ### Added
+ - **Query Sink feature**: compound answers without creating duplicate pages
+   - When a similar query page exists, new answers are appended to `sink/` instead of creating timestamped copies
+   - Sink files: `sink/Query: Topic.sink.md`, one per formal query page
+   - Chronological entries with timestamp, query, answer, and sources
+   - Bidirectional linking: sink frontmatter → formal page, formal page frontmatter → sink
+ - **New methods on the Wiki class**:
+   - `sink_status()`: overview of all sinks with entry counts
+   - `read_sink(page_name)`: read pending entries from a sink file
+   - `clear_sink(page_name)`: clear processed entries after a merge
+   - `_append_to_sink()`: internal method to append entries
+   - `_get_sink_info_for_page()`: get the sink status for a page
+   - `_find_or_create_sink_file()`: find or create a sink file
+   - `_update_page_sink_meta()`: update formal page frontmatter with sink metadata
+ - **Enhanced `synthesize_query()`**: returns `status: "sunk"` when the answer goes to a sink
+ - **Enhanced `read_page()`**: returns `has_sink` and `sink_entries`; supports reading sink files via the `sink/` prefix
+ - **Enhanced `search()`**: attaches `has_sink` and `sink_entries` to each result
+ - **Enhanced `lint()`**: includes `sink_status` in the return value
+ - **Enhanced `_update_index_file()`**: shows the pending sink entry count in index.md
+ - **New MCP tool**: `wiki_sink_status`, an overview of all query sinks
+ - **Updated wiki.md template**: documents the sink workflow and conventions
+ - **27 new tests** in `test_sink_flow.py`: comprehensive sink feature coverage
+
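The sink mechanism above amounts to appending timestamped entries to one file per formal page instead of creating a new page. The sketch below is an illustration under assumptions: only the `sink/Query: Topic.sink.md` naming comes from the changelog, while the entry layout, the `formal_page` frontmatter key, and the function names are hypothetical. It builds strings only and leaves file I/O to the caller.

```python
from datetime import datetime
from pathlib import PurePosixPath


def sink_path(page_name: str) -> PurePosixPath:
    """One sink file per formal query page, e.g. sink/Query: Topic.sink.md."""
    return PurePosixPath("sink") / f"{page_name}.sink.md"


def new_sink_text(page_name: str) -> str:
    """Start a sink whose frontmatter points back at its formal page.

    `formal_page` is a hypothetical key name used for illustration.
    """
    return f'---\nformal_page: "[[{page_name}]]"\n---\n'


def append_sink_entry(sink_text: str, query: str, answer: str,
                      sources: list[str], when: datetime) -> str:
    """Return sink_text with one timestamped entry appended at the end."""
    stamp = when.strftime("%Y-%m-%d %H:%M")
    entry = f"\n## [{stamp}] {query}\n\n{answer.strip()}\n"
    if sources:
        entry += "\nSources: " + ", ".join(sources) + "\n"
    return sink_text + entry
```

Appending at the end keeps entries in chronological order, so a later merge can read the sink top to bottom before `clear_sink()` empties it.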
+ ---
+
+ ## [0.12.6] - 2026-04-10
+
+ ### Added
+ - **wiki_synthesize MCP tool**: query knowledge compounding cycle
+   - `synthesize_query()` saves query answers as persistent wiki pages
+   - Auto-generates page names: `Query: {Topic}` with a date suffix for duplicates
+   - Smart duplicate detection via keyword overlap (Jaccard similarity ≥ 0.3)
+   - `update_existing=True` revises the existing page instead of creating a new one
+   - Auto-appends a structured Sources section:
+     - Wiki pages as `[[wikilinks]]`
+     - Raw sources as `[Source: filename](raw/path)` markdown links
+   - Auto-logs to `log.md` in a parseable format: `## [timestamp] query | ... → [[page]]`
+   - New page auto-indexed in FTS5 and `index.md`
+ - **27 new tests** in `test_query_flow.py`: comprehensive coverage of synthesize scenarios
+
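The duplicate check compares keyword sets with Jaccard similarity, |A ∩ B| / |A ∪ B|, and treats a score of 0.3 or more as a match. A minimal sketch; the tokenizer (lowercase alphanumeric words, a small stop-word list, the `Query:` prefix stripped) and the function names are assumptions, since the changelog does not specify how keywords are extracted.

```python
import re
from typing import Optional

# Small illustrative stop-word list; the package's list (if any) is unknown.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "what", "how", "for"}


def keywords(text: str) -> set[str]:
    """Lowercase word tokens minus stop words (assumed tokenizer)."""
    text = re.sub(r"^Query:\s*", "", text)  # ignore the page-name prefix
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def find_similar_page(topic: str, existing_pages: list[str],
                      threshold: float = 0.3) -> Optional[str]:
    """Return the most similar existing page at or above threshold, else None."""
    query_kw = keywords(topic)
    best_page, best_score = None, 0.0
    for page in existing_pages:
        score = jaccard(query_kw, keywords(page))
        if score >= threshold and score > best_score:
            best_page, best_score = page, score
    return best_page
```

For example, "How does the Rust borrow checker work" and "Query: Rust Borrow Checker" share 3 of 5 distinct keywords, a score of 0.6, so the existing page is reused.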
117
+ ---
118
+
119
+ ## [0.12.5] - 2026-04-10
120
+
121
+ ### Added
122
+ - **Raw source collection**: All ingest sources unified into `raw/` directory
123
+ - URL/YouTube: extracted text saved to `raw/`
124
+ - Local files outside `raw/`: copied in (cross-platform safe via `read_bytes`/`write_bytes`)
125
+ - Local files already in `raw/`: no copy needed
126
+ - Returns `source_raw_path` and `hint` for LLM guidance
127
+ - **Source citation conventions** in generated `wiki.md`:
128
+ - Raw sources cited with standard markdown links: `[Source: Title](raw/filename)`
129
+ - Explicitly prohibits `[[raw/filename]]` wikilink syntax
130
+ - Two approaches: page-level `## Sources` section or inline citations
131
+ - **MCP config auto-read**: `MCPServer(wiki)` now reads from `wiki.config["mcp"]` when no explicit config passed
132
+
133
+ ### Changed
134
+ - `ingest_source()` now returns `source_raw_path` field alongside `source_name`
135
+ - `ingest_source()` instructions updated with citation guidance (step 6-7)
136
+ - Log entries for ingest now include raw path: `Source (url): Title → raw/slug.md`
137
+
138
+ ### Testing
139
+ - 4 new test cases: raw collection, skip, duplicate, instruction validation
140
+ - Total: 83 → 87 tests passing
141
+
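The two local-file cases under "Raw source collection" can be sketched with `pathlib`. This is an illustration, not the package's code: `collect_raw` is a hypothetical name, and keeping the original filename (overwriting any same-named file in `raw/`) is an assumption; the changelog lists a separate "duplicate" test case whose behavior is not described here.

```python
from pathlib import Path


def collect_raw(source: Path, wiki_root: Path) -> Path:
    """Make sure a local source file lives under wiki_root/raw/; return that path."""
    raw_dir = (wiki_root / "raw").resolve()
    src = source.resolve()
    if raw_dir in src.parents:
        # Already inside raw/ (at any depth): no copy needed.
        return src
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / src.name
    # Byte-level copy: read_bytes/write_bytes perform no newline or encoding
    # translation, so the copy is identical on every platform.
    target.write_bytes(src.read_bytes())
    return target
```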
142
+ ---
143
+
144
+ ## [0.12.4] - 2026-04-10
145
+
146
+ ### Added
147
+ - **wiki_read_schema MCP tool** — Read `wiki.md` (schema/conventions file)
148
+ - Returns content, file path, and hint to save copy before changes
149
+ - **wiki_update_schema MCP tool** — Update `wiki.md` with new conventions
150
+ - Validates format (warnings only, does not block)
151
+ - Returns suggestions for post-update actions
152
+ - **wiki.md reference in ingest** — `ingest_source()` instructions now direct LLM to `wiki.md` for conventions
153
+
154
+ ### Changed
155
+ - `ingest_source()` instructions updated: "See wiki.md for wiki conventions and workflows"
156
+
+ ---
+
+ ## [0.12.3] - 2026-04-10
+
+ ### Changed
+ - **Pure-data wiki_ingest**: `ingest_source()` returns extracted data for LLM processing and does NOT automatically create wiki pages
+ - **URL raw persistence**: URL/YouTube extracted text is always saved to `raw/` for persistence
+ - **Optional LLM smart CLI**: `_llm_process_source()` and `execute_operations()` are optional and only used when an LLM client is configured
+ - **Unified error handling**: all operations return structured dicts with an `error` key on failure
+
+ ---
+
+ ## [0.12.2] - 2026-04-10
+
+ ### Improved
+ - **ON CONFLICT for pages table**: `created_at` is preserved on updates; only mutable fields change
+ - **FTS5 snippet highlighting**: search results include `**highlighted**` snippets via the FTS5 `snippet()` function
+ - **LIKE fallback**: FTS5 syntax errors gracefully fall back to a `LIKE` search
+
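The ON CONFLICT change corresponds to SQLite's upsert syntax (available since SQLite 3.24): a repeated write updates only the mutable columns, so `created_at` keeps its first value. A minimal sketch with Python's built-in `sqlite3`; the `pages` column names and the `upsert_page` helper are assumptions, not the package's actual schema or API.

```python
import sqlite3


def upsert_page(conn: sqlite3.Connection, name: str, content: str, ts: str) -> None:
    """Insert a page, or update content/updated_at while keeping created_at."""
    conn.execute(
        """
        INSERT INTO pages (name, content, created_at, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            content = excluded.content,
            updated_at = excluded.updated_at
        """,
        (name, content, ts, ts),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (name TEXT PRIMARY KEY, content TEXT,"
    " created_at TEXT, updated_at TEXT)"
)
upsert_page(conn, "Home", "v1", "2026-04-10T09:00")
upsert_page(conn, "Home", "v2", "2026-04-11T10:00")
print(conn.execute("SELECT * FROM pages").fetchone())
# ('Home', 'v2', '2026-04-10T09:00', '2026-04-11T10:00')
```

`excluded` refers to the row that failed to insert, so the `SET` clause copies only the new content and timestamp, and `created_at` is never touched on conflict.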
+ ---
+
+ ## [0.12.1] - 2026-04-10
+
+ ### Changed
+ - **Optimized wiki_init**:
+   - Removed the `agent` parameter (zero domain assumption)
+   - Added an `overwrite` parameter for idempotent re-initialization
+   - Always skips `wiki.md` and the config example if they already exist
+   - Structured return with `created_files`, `skipped_files`, `existing_files`
+
+ ---
+
+ ## [0.12.0] - 2026-04-10
+
+ ### Added
+ - **Phase 1: Complete CLI commands**: 15 commands total
+   - `init`, `ingest`, `write_page`, `read_page`, `search`
+   - `lint`, `status`, `log`, `references`, `build-index`
+   - `export-index`, `batch`, `hint`, `recommend`, `serve`
+ - **Auto-index**: `write_page()` automatically updates `index.md`
+ - **wiki.md template**: generated on `init()` with conventions and workflows
+ - **hint command**: smart suggestions for wiki improvement
+ - **recommend command**: missing-page and orphan detection
+
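Both `recommend` checks can be derived from the wikilink graph: a missing page is a link target with no page behind it, and an orphan is a page that no other page links to. A minimal sketch over an in-memory `{page_name: markdown}` dict; the regex (which also accepts `[[Target|label]]` aliases) and the `recommend` function shown here are assumptions, not the package's implementation.

```python
import re

# [[Target]] or [[Target|display text]]; captures only the target name.
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def recommend(pages: dict[str, str]) -> dict[str, list[str]]:
    """Report link targets that do not exist and pages nobody links to."""
    incoming: dict[str, set[str]] = {name: set() for name in pages}
    missing: set[str] = set()
    for source, text in pages.items():
        for target in (m.strip() for m in WIKILINK_RE.findall(text)):
            if target == source:
                continue  # self-links neither create nor rescue a page
            if target in pages:
                incoming[target].add(source)
            else:
                missing.add(target)
    orphans = [name for name, sources in incoming.items() if not sources]
    return {"missing_pages": sorted(missing), "orphan_pages": sorted(orphans)}
```

Collecting incoming links per page is the same bookkeeping that bidirectional reference tracking needs, so both checks come out of a single pass over the pages.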
+ ---
+
+ ## [0.11.1] - 2026-04-10
+
+ ### Changed
+ - **Enforced zero domain assumption**: all exclusion patterns are empty by default
+   - `default_exclude_patterns`: `[]` (was: dates, months, quarters)
+   - `exclude_frontmatter`: `[]` (was: `redirect_to`)
+   - `archive_directories`: `[]` (was: archive, logs, history)
+   - Users must explicitly configure exclusions in `.wiki-config.yaml`
+
+ ---
+
+ ## [0.11.0] - 2026-04-10
+
+ ### Changed
+ - **Modular package structure**: evolved from the single-file `llmwikify.py`
+   - `core/wiki.py`: Wiki class (business logic)
+   - `core/index.py`: WikiIndex class (FTS5 + reference tracking)
+   - `extractors/`: content extractors (text, pdf, web, youtube)
+   - `cli/commands.py`: CLI interface
+   - `mcp/server.py`: MCP server
+   - `utils/helpers.py`: utility functions
+ - **Configuration system**: `config.py` with `load_config()`, `get_default_config()`
+ - **Public API stability maintained** across the refactor
+
+ ### Added
+ - Optional dependencies for extractors (`pymupdf`, `trafilatura`, `youtube-transcript-api`)
+
+ ---
+
+ ## [0.10.0] - 2026-04-10
+
+ ### Changed
+ - Renamed core module from `wiki.py` to `llmwikify.py` for consistency
+ - Updated version numbering scheme from `v10.0.0` to `v0.10.0`
+
+ ### Fixed
+ - Module import paths in all test files
+ - Documentation references to the core module
+
+ ---
+
+ ## [0.9.0] - 2026-04-09
+
+ ### Added
+ - SQLite FTS5 full-text search
+ - Bidirectional reference tracking
+ - MCP server support (8 tools)
+ - CLI with 15 commands
+ - Smart recommendations engine
+ - Configuration system (`.wiki-config.yaml`)
+ - Comprehensive test suite (48 tests)
+
+ ### Features
+ - Zero core dependencies (standard library only)
+ - Optional dependencies for extended functionality
+ - Performance optimized (1000x faster than naive implementation)
+ - Pure tool design (zero domain assumptions)
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2026 Your Name
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,10 @@
+ include README.md
+ include LICENSE
+ include CHANGELOG.md
+ include pyproject.toml
+ recursive-include src/llmwikify *.py
+ recursive-include src/llmwikify py.typed
+ recursive-include docs *.md
+ recursive-include tests *.py
+ recursive-exclude tests __pycache__
+ recursive-exclude tests *.pyc