llmwikify 0.15.0__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- llmwikify-0.15.0/CHANGELOG.md +259 -0
- llmwikify-0.15.0/LICENSE +21 -0
- llmwikify-0.15.0/MANIFEST.in +10 -0
- llmwikify-0.15.0/PKG-INFO +624 -0
- llmwikify-0.15.0/README.md +571 -0
- llmwikify-0.15.0/docs/CONFIGURATION_GUIDE.md +342 -0
- llmwikify-0.15.0/docs/CONFIG_GUIDE.md +319 -0
- llmwikify-0.15.0/docs/LLM_WIKI_PRINCIPLES.md +75 -0
- llmwikify-0.15.0/docs/MCP_SETUP.md +364 -0
- llmwikify-0.15.0/docs/REFERENCE_TRACKING_GUIDE.md +241 -0
- llmwikify-0.15.0/docs/SINK_DESIGN_DECISIONS.md +329 -0
- llmwikify-0.15.0/docs/plans/v0.15.0.md +161 -0
- llmwikify-0.15.0/pyproject.toml +137 -0
- llmwikify-0.15.0/setup.cfg +4 -0
- llmwikify-0.15.0/src/llmwikify/__init__.py +66 -0
- llmwikify-0.15.0/src/llmwikify/cli/__init__.py +5 -0
- llmwikify-0.15.0/src/llmwikify/cli/commands.py +658 -0
- llmwikify-0.15.0/src/llmwikify/config.py +186 -0
- llmwikify-0.15.0/src/llmwikify/core/__init__.py +6 -0
- llmwikify-0.15.0/src/llmwikify/core/index.py +309 -0
- llmwikify-0.15.0/src/llmwikify/core/wiki.py +2110 -0
- llmwikify-0.15.0/src/llmwikify/extractors/__init__.py +24 -0
- llmwikify-0.15.0/src/llmwikify/extractors/base.py +92 -0
- llmwikify-0.15.0/src/llmwikify/extractors/html.py +17 -0
- llmwikify-0.15.0/src/llmwikify/extractors/pdf.py +53 -0
- llmwikify-0.15.0/src/llmwikify/extractors/text.py +50 -0
- llmwikify-0.15.0/src/llmwikify/extractors/web.py +53 -0
- llmwikify-0.15.0/src/llmwikify/extractors/youtube.py +75 -0
- llmwikify-0.15.0/src/llmwikify/llm_client.py +142 -0
- llmwikify-0.15.0/src/llmwikify/mcp/__init__.py +5 -0
- llmwikify-0.15.0/src/llmwikify/mcp/server.py +260 -0
- llmwikify-0.15.0/src/llmwikify/py.typed +1 -0
- llmwikify-0.15.0/src/llmwikify/utils/__init__.py +5 -0
- llmwikify-0.15.0/src/llmwikify/utils/helpers.py +17 -0
- llmwikify-0.15.0/src/llmwikify.egg-info/PKG-INFO +624 -0
- llmwikify-0.15.0/src/llmwikify.egg-info/SOURCES.txt +48 -0
- llmwikify-0.15.0/src/llmwikify.egg-info/dependency_links.txt +1 -0
- llmwikify-0.15.0/src/llmwikify.egg-info/entry_points.txt +3 -0
- llmwikify-0.15.0/src/llmwikify.egg-info/requires.txt +29 -0
- llmwikify-0.15.0/src/llmwikify.egg-info/top_level.txt +1 -0
- llmwikify-0.15.0/tests/conftest.py +71 -0
- llmwikify-0.15.0/tests/test_cli.py +124 -0
- llmwikify-0.15.0/tests/test_extractors.py +149 -0
- llmwikify-0.15.0/tests/test_index.py +186 -0
- llmwikify-0.15.0/tests/test_llm_client.py +88 -0
- llmwikify-0.15.0/tests/test_query_flow.py +610 -0
- llmwikify-0.15.0/tests/test_recommend.py +111 -0
- llmwikify-0.15.0/tests/test_sink_flow.py +994 -0
- llmwikify-0.15.0/tests/test_v015_features.py +554 -0
- llmwikify-0.15.0/tests/test_wiki_core.py +637 -0
llmwikify-0.15.0/CHANGELOG.md
ADDED
@@ -0,0 +1,259 @@
+# Changelog
+
+All notable changes to llmwikify will be documented in this file.
+
+The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
+and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
+
+## [Unreleased]
+
+### Planned
+- Incremental index updates
+- Web UI (optional)
+- Graph visualization (graphviz/Mermaid)
+
+---
+
+## [0.15.0] - 2026-04-10
+
+### Added
+- **Enhanced `ingest_source()` metadata** — returns rich file metadata for LLM context:
+  - `file_type` — detected from extension (markdown, pdf, text, html, etc.)
+  - `file_size` — byte size of the raw source file
+  - `word_count` — word count of extracted text
+  - `has_images` / `image_count` — detects markdown image references
+  - `text_extracted` — boolean flag
+  - `content_preview` — first 200 chars of extracted text
+  - **No summary returned** — respects "LLM reads source" principle
+- **Clue-based lint detection** — three new observation types, LLM makes final judgment:
+  - `dated_claim` (critical, max 3): Pages referencing years ≥3 years older than latest raw source
+  - `topic_overlap` (informational, max 2): Query: pages with ≥85% keyword Jaccard overlap
+  - `missing_cross_ref` (informational, max 3): Concepts mentioned in 2+ pages without wikilink
+- **`hints` structure in `lint()`** — two-tier classification:
+  - `hints.critical[]` — demands attention (max 3)
+  - `hints.informational[]` — optional context (max 5)
+  - Total max 8 hints per lint pass
+- **`_detect_file_type()` helper** — static method for file extension detection
+
+### Changed
+- `lint()` return structure now includes `hints: {critical: [...], informational: [...]}`
+- All lint hints use observational language (non-directive, respects LLM autonomy)
+
+### Principle Coverage
+- **Ingest: "LLM reads source, discusses key takeaways"** — ✅ Enhanced (metadata, not summary)
+- **Lint: "stale claims superseded by newer sources"** — ✅ New (dated_claim, clue-based)
+- **Lint: "missing cross-references"** — ✅ New (missing_cross_ref, clue-based)
+- **Zero domain assumption** — ✅ All hints are observations, not judgments
+
+---
+
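The metadata fields listed for `ingest_source()` in 0.15.0 can be illustrated with a short sketch. This is not the package's implementation: the function names `detect_file_type` and `build_source_metadata`, the extension map, and the image regex are assumptions made for illustration. Only the field names and the 200-character preview come from the changelog.

```python
import re
from pathlib import Path

# Hypothetical extension map; the real `_detect_file_type()` may use different labels.
_EXTENSION_TYPES = {
    ".md": "markdown",
    ".markdown": "markdown",
    ".pdf": "pdf",
    ".txt": "text",
    ".html": "html",
    ".htm": "html",
}

# Matches markdown image references such as ![alt](path/to/img.png).
_IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def detect_file_type(path: str) -> str:
    """Map a file extension to a coarse type label, defaulting to 'unknown'."""
    return _EXTENSION_TYPES.get(Path(path).suffix.lower(), "unknown")


def build_source_metadata(path: str, raw_bytes: bytes, text: str) -> dict:
    """Assemble the changelog's metadata fields; no summary is produced."""
    image_count = len(_IMAGE_REF.findall(text))
    return {
        "file_type": detect_file_type(path),
        "file_size": len(raw_bytes),        # byte size of the raw source
        "word_count": len(text.split()),    # whitespace-separated tokens
        "has_images": image_count > 0,
        "image_count": image_count,
        "text_extracted": bool(text.strip()),
        "content_preview": text[:200],      # first 200 characters
    }
```

The sketch returns descriptive metadata only, in line with the "No summary returned" entry: the LLM is expected to read the source itself.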
+## [0.14.0] - 2026-04-10
+
+### Added
+- **`merge_or_replace` parameter** in `synthesize_query()` — replaces `update_existing` with three explicit strategies:
+  - `"sink"` (default) — append to sink buffer for later review
+  - `"merge"` — LLM reads old content, consolidates, replaces formal page
+  - `"replace"` — overwrite the formal page entirely
+- **Sink suggestion generation** — each sink entry now includes actionable observations:
+  - **Content Gap detection**: compares new answer with formal page to find missing topics
+  - **Source quality analysis**: checks for missing citations, new sources, completeness
+  - **Query pattern analysis**: detects repeated questions, increasing complexity trends
+  - **Knowledge growth suggestions**: identifies new concepts, possible contradictions
+- **Sink dedup detection**: flags entries with >70% text similarity to existing sink entries
+- **Sink urgency tracking** in `sink_status()`: ok / attention (7d+) / aging (14d+) / stale (30d+)
+- **`sink_warnings` in `lint()`**: reports stale/aging sinks that need attention
+- **Observational hint format** in `synthesize_query()`: non-directive language respects LLM autonomy
+
+### Changed
+- **BREAKING**: `update_existing` parameter removed from `synthesize_query()` and MCP tool
+- **BREAKING**: `wiki_synthesize` MCP tool now uses `merge_or_replace` (string enum) instead of `update_existing` (boolean)
+- `synthesize_query()` returns `status: "merged"` or `"replaced"` instead of `"updated"`
+- Hint format changed from `suggestion` (directive) to `observation` + `options` (non-directive)
+
+---
+
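The urgency tiers and the 70% dedup threshold from the 0.14.0 entries can be read as two small rules. The sketch below is one possible reading, not the package's code. It assumes urgency is measured from the oldest pending sink entry, and it uses `difflib.SequenceMatcher` as a stand-in similarity measure because the changelog does not say which one is used. Both function names are hypothetical.

```python
from datetime import datetime
from difflib import SequenceMatcher


def classify_sink_urgency(oldest_entry: datetime, now: datetime) -> str:
    """Map the age of the oldest pending entry onto the changelog's four tiers."""
    age_days = (now - oldest_entry).days
    if age_days >= 30:
        return "stale"
    if age_days >= 14:
        return "aging"
    if age_days >= 7:
        return "attention"
    return "ok"


def is_near_duplicate(new_text: str, existing_texts: list, threshold: float = 0.7) -> bool:
    """Return True if new_text is more than `threshold` similar to any existing entry.

    SequenceMatcher.ratio() is an assumed stand-in for the real similarity measure.
    """
    return any(
        SequenceMatcher(None, new_text, old).ratio() > threshold
        for old in existing_texts
    )
```

Checking the tiers from oldest to newest keeps each threshold a lower bound, which matches the "7d+ / 14d+ / 30d+" notation.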
+## [0.13.0] - 2026-04-10
+
+### Added
+- **Query Sink feature** — Compound answers without creating duplicate pages
+  - When a similar query page exists, new answers append to `sink/` instead of creating timestamped copies
+  - Sink files: `sink/Query: Topic.sink.md` — one per formal query page
+  - Chronological entries with timestamp, query, answer, and sources
+  - Bidirectional linking: sink frontmatter → formal page, formal page frontmatter → sink
+- **New methods on Wiki class**:
+  - `sink_status()` — overview of all sinks with entry counts
+  - `read_sink(page_name)` — read pending entries from a sink file
+  - `clear_sink(page_name)` — clear processed entries after merge
+  - `_append_to_sink()` — internal method to append entries
+  - `_get_sink_info_for_page()` — get sink status for a page
+  - `_find_or_create_sink_file()` — find or create sink file
+  - `_update_page_sink_meta()` — update formal page frontmatter with sink metadata
+- **Enhanced `synthesize_query()`**: returns `status: "sunk"` when answer goes to sink
+- **Enhanced `read_page()`**: returns `has_sink` and `sink_entries`, supports reading sink files via `sink/` prefix
+- **Enhanced `search()`**: attaches `has_sink` and `sink_entries` to each result
+- **Enhanced `lint()`**: includes `sink_status` in return value
+- **Enhanced `_update_index_file()`**: shows pending sink entry count in index.md
+- **New MCP tool**: `wiki_sink_status` — overview of all query sinks
+- **Updated wiki.md template**: documents sink workflow and conventions
+- **27 new tests** in `test_sink_flow.py` — comprehensive sink feature coverage
+
+---
+
+## [0.12.6] - 2026-04-10
+
+### Added
+- **wiki_synthesize MCP tool** — Query knowledge compounding cycle
+  - `synthesize_query()` saves query answers as persistent wiki pages
+  - Auto-generates page names: `Query: {Topic}` with date suffix for duplicates
+  - Smart duplicate detection via keyword overlap (Jaccard similarity ≥ 0.3)
+  - `update_existing=True` revises existing page instead of creating new
+  - Auto-appends structured Sources section:
+    - Wiki pages as `[[wikilinks]]`
+    - Raw sources as `[Source: filename](raw/path)` markdown links
+  - Auto-logs to `log.md` with parseable format: `## [timestamp] query | ... → [[page]]`
+  - New page auto-indexed in FTS5 and `index.md`
+- **27 new tests** in `test_query_flow.py` — comprehensive coverage of synthesize scenarios
+
+---
+
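The duplicate detection described in 0.12.6 relies on keyword Jaccard similarity with a 0.3 threshold. A minimal sketch of that measure follows, assuming keywords are lowercase alphanumeric tokens with no stop-word filtering. The package's actual tokenization is not shown in this changelog, and `keywords`, `jaccard`, and `find_similar_page` are hypothetical names.

```python
import re
from typing import Iterable, Optional


def keywords(text: str) -> set:
    """Lowercase alphanumeric tokens (an assumed definition of 'keyword')."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over the keyword sets of a and b."""
    ka, kb = keywords(a), keywords(b)
    if not ka and not kb:
        return 0.0  # avoid dividing by zero when both texts have no keywords
    return len(ka & kb) / len(ka | kb)


def find_similar_page(query: str, page_titles: Iterable[str],
                      threshold: float = 0.3) -> Optional[str]:
    """Return the best-matching title whose similarity is >= threshold, else None."""
    candidates = [(jaccard(query, title), title) for title in page_titles]
    candidates = [c for c in candidates if c[0] >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]
```

The same measure with a stricter cutoff (0.85) would correspond to the `topic_overlap` lint hint listed under 0.15.0.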
+## [0.12.5] - 2026-04-10
+
+### Added
+- **Raw source collection**: All ingest sources unified into `raw/` directory
+  - URL/YouTube: extracted text saved to `raw/`
+  - Local files outside `raw/`: copied in (cross-platform safe via `read_bytes`/`write_bytes`)
+  - Local files already in `raw/`: no copy needed
+  - Returns `source_raw_path` and `hint` for LLM guidance
+- **Source citation conventions** in generated `wiki.md`:
+  - Raw sources cited with standard markdown links: `[Source: Title](raw/filename)`
+  - Explicitly prohibits `[[raw/filename]]` wikilink syntax
+  - Two approaches: page-level `## Sources` section or inline citations
+- **MCP config auto-read**: `MCPServer(wiki)` now reads from `wiki.config["mcp"]` when no explicit config passed
+
+### Changed
+- `ingest_source()` now returns `source_raw_path` field alongside `source_name`
+- `ingest_source()` instructions updated with citation guidance (step 6-7)
+- Log entries for ingest now include raw path: `Source (url): Title → raw/slug.md`
+
+### Testing
+- 4 new test cases: raw collection, skip, duplicate, instruction validation
+- Total: 83 → 87 tests passing
+
+---
+
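The raw-collection rule for local files in 0.12.5 (copy files that live outside `raw/`, leave files already inside it alone) can be sketched as follows. The function name `collect_into_raw` and the directory layout are assumptions, and the sketch does not handle name collisions, which the changelog's "duplicate" test case suggests the real code addresses in some way.

```python
from pathlib import Path


def collect_into_raw(source: Path, wiki_root: Path) -> Path:
    """Ensure `source` lives under <wiki_root>/raw and return its path there."""
    raw_dir = (wiki_root / "raw").resolve()
    raw_dir.mkdir(parents=True, exist_ok=True)
    source = source.resolve()

    # Files already inside raw/ need no copy.
    if raw_dir in source.parents:
        return source

    # Byte-level copy avoids newline or encoding translation on any platform.
    target = raw_dir / source.name
    target.write_bytes(source.read_bytes())
    return target
```

Copying through `read_bytes`/`write_bytes` keeps the stored file byte-identical to the original, which is the cross-platform property the changelog entry points to.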
+## [0.12.4] - 2026-04-10
+
+### Added
+- **wiki_read_schema MCP tool** — Read `wiki.md` (schema/conventions file)
+  - Returns content, file path, and hint to save copy before changes
+- **wiki_update_schema MCP tool** — Update `wiki.md` with new conventions
+  - Validates format (warnings only, does not block)
+  - Returns suggestions for post-update actions
+- **wiki.md reference in ingest** — `ingest_source()` instructions now direct LLM to `wiki.md` for conventions
+
+### Changed
+- `ingest_source()` instructions updated: "See wiki.md for wiki conventions and workflows"
+
+---
+
+## [0.12.3] - 2026-04-10
+
+### Changed
+- **Pure-data wiki_ingest**: `ingest_source()` returns extracted data for LLM processing, does NOT automatically create wiki pages
+- **URL raw persistence**: URL/YouTube extracted text always saved to `raw/` for persistence
+- **Optional LLM smart CLI**: `_llm_process_source()` and `execute_operations()` are optional, only used when LLM client configured
+- **Unified error handling**: All operations return structured dicts with `error` key on failure
+
+---
+
+## [0.12.2] - 2026-04-10
+
+### Improved
+- **ON CONFLICT for pages table**: `created_at` preserved on updates, only mutable fields changed
+- **FTS5 snippet highlighting**: Search results include `**highlighted**` snippets via FTS5 snippet function
+- **LIKE fallback**: FTS5 syntax errors gracefully fall back to `LIKE` search
+
+---
+
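The "ON CONFLICT for pages table" entry in 0.12.2 describes an upsert that keeps `created_at` intact. The sketch below shows the general SQLite pattern under an assumed schema: the column names and the `upsert_page` helper are hypothetical, and the package's real `pages` table may differ.

```python
import sqlite3


def upsert_page(conn: sqlite3.Connection, name: str, content: str, now: str) -> None:
    """Insert a page, or update only its mutable fields if it already exists."""
    conn.execute(
        """
        INSERT INTO pages (name, content, created_at, updated_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT(name) DO UPDATE SET
            content = excluded.content,
            updated_at = excluded.updated_at
        """,
        (name, content, now, now),
    )


conn = sqlite3.connect(":memory:")
# Assumed schema for illustration only.
conn.execute(
    "CREATE TABLE pages ("
    "name TEXT PRIMARY KEY, content TEXT, created_at TEXT, updated_at TEXT)"
)
upsert_page(conn, "Home", "v1", "2026-04-09T10:00:00")
upsert_page(conn, "Home", "v2", "2026-04-10T12:00:00")
row = conn.execute(
    "SELECT content, created_at, updated_at FROM pages WHERE name = ?", ("Home",)
).fetchone()
```

Because `created_at` is absent from the `DO UPDATE SET` clause, the second call changes the content and `updated_at` while the original creation timestamp survives. A plain `INSERT OR REPLACE` would instead delete and recreate the row, resetting it.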
+## [0.12.1] - 2026-04-10
+
+### Changed
+- **Optimized wiki_init**:
+  - Removed `agent` parameter (zero domain assumption)
+  - Added `overwrite` parameter for idempotent re-initialization
+  - Always skips `wiki.md` and config example if they exist
+  - Structured return with `created_files`, `skipped_files`, `existing_files`
+
+---
+
+## [0.12.0] - 2026-04-10
+
+### Added
+- **Phase 1: Complete CLI commands** — 15 commands total
+  - `init`, `ingest`, `write_page`, `read_page`, `search`
+  - `lint`, `status`, `log`, `references`, `build-index`
+  - `export-index`, `batch`, `hint`, `recommend`, `serve`
+- **Auto-index**: `write_page()` automatically updates `index.md`
+- **wiki.md template**: Generated on `init()` with conventions and workflows
+- **hint command**: Smart suggestions for wiki improvement
+- **recommend command**: Missing pages and orphan detection
+
+---
+
+## [0.11.1] - 2026-04-10
+
+### Changed
+- **Enforced zero domain assumption**: All exclusion patterns empty by default
+  - `default_exclude_patterns`: `[]` (was: dates, months, quarters)
+  - `exclude_frontmatter`: `[]` (was: `redirect_to`)
+  - `archive_directories`: `[]` (was: archive, logs, history)
+  - Users must explicitly configure exclusions in `.wiki-config.yaml`
+
+---
+
+## [0.11.0] - 2026-04-10
+
+### Changed
+- **Modular package structure** — evolved from single-file `llmwikify.py`
+  - `core/wiki.py` — Wiki class (business logic)
+  - `core/index.py` — WikiIndex class (FTS5 + reference tracking)
+  - `extractors/` — Content extractors (text, pdf, web, youtube)
+  - `cli/commands.py` — CLI interface
+  - `mcp/server.py` — MCP server
+  - `utils/helpers.py` — Utility functions
+- **Configuration system**: `config.py` with `load_config()`, `get_default_config()`
+- **Public API stability maintained** across refactor
+
+### Added
+- Optional dependencies for extractors (`pymupdf`, `trafilatura`, `youtube-transcript-api`)
+
+---
+
+## [0.10.0] - 2026-04-10
+
+### Changed
+- Renamed core module from `wiki.py` to `llmwikify.py` for consistency
+- Updated version numbering scheme from `v10.0.0` to `v0.10.0`
+
+### Fixed
+- Module import paths in all test files
+- Documentation references to core module
+
+---
+
+## [0.9.0] - 2026-04-09
+
+### Added
+- SQLite FTS5 full-text search
+- Bidirectional reference tracking
+- MCP server support (8 tools)
+- CLI with 15 commands
+- Smart recommendations engine
+- Configuration system (.wiki-config.yaml)
+- Comprehensive test suite (48 tests)
+
+### Features
+- Zero core dependencies (standard library only)
+- Optional dependencies for extended functionality
+- Performance optimized (1000x faster than naive implementation)
+- Pure tool design (zero domain assumptions)
llmwikify-0.15.0/LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Your Name
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
llmwikify-0.15.0/MANIFEST.in
ADDED
@@ -0,0 +1,10 @@
+include README.md
+include LICENSE
+include CHANGELOG.md
+include pyproject.toml
+recursive-include src/llmwikify *.py
+recursive-include src/llmwikify py.typed
+recursive-include docs *.md
+recursive-include tests *.py
+recursive-exclude tests __pycache__
+recursive-exclude tests *.pyc