@creativeaitools/agent-wiki 2.0.0

@@ -0,0 +1,2584 @@
1
+
2
+ # Agent Wiki v2 Specification
3
+
4
+ Version: 2.0.0
5
+ Last Updated: 2026-06-27
6
+
7
+ ---
8
+
9
+ ## 1. Purpose
10
+
11
+ This specification defines the **v2 format, rules, and runtime expectations** for an AI-agent-compatible wiki.
12
+
13
+ The goal of the system is to make the wiki useful for both:
14
+
15
+ - **humans**, who need readable pages, durable notes, summaries, and workflows
16
+ - **agents**, who need stable structure, normalized records, explicit claims, and machine-facing cache artifacts
17
+
18
+ This spec merges two requirements:
19
+
20
+ 1. a **knowledge ontology** that distinguishes entities, concepts, sources, claims, evidence, relationships, contradictions, and questions
21
+ 2. a **practical wiki architecture** that works as a markdown-first knowledge system with compile-time normalization in either standalone vault mode or embedded workspace mode
22
+
23
+ This document defines the v2 contract for:
24
+
25
+ - operating modes
26
+ - folder layout
27
+ - page types
28
+ - frontmatter fields
29
+ - structured claims and evidence
30
+ - relationship representation
31
+ - compile output files
32
+ - dashboard generation
33
+ - freshness and health rules
34
+ - minimum validation rules
35
+
36
+ ---
37
+
38
+ ## 2. Design Principles
39
+
40
+ The wiki must separate:
41
+
42
+ - **things** from **ideas**
43
+ - **claims** from **evidence**
44
+ - **sources** from **summaries**
45
+ - **facts** from **interpretations**
46
+ - **confidence** from **certainty theater**
47
+ - **human-edited content** from **compiled/generated artifacts**
48
+ - **page structure** from **compiled machine caches**
49
+
50
+ The wiki is intended to act as:
51
+
52
+ - a human-readable knowledge base
53
+ - a belief-tracking layer
54
+ - an agent-friendly context substrate
55
+ - a source-traceable research system
56
+ - a maintenance surface for contradictions, stale content, and open questions
57
+
58
+ ---
59
+
60
+ ## 3. Normative Language
61
+
62
+ The keywords **MUST**, **MUST NOT**, **SHOULD**, **SHOULD NOT**, and **MAY** in this document are to be interpreted as requirement levels.
63
+
64
+ - **MUST**: required for v2 compliance
65
+ - **SHOULD**: strongly recommended unless there is a justified reason not to
66
+ - **MAY**: optional
67
+
68
+ ---
69
+
70
+ ## 4. Scope of v2
71
+
72
+ v2 is intentionally constrained.
73
+
74
+ v2 includes:
75
+
76
+ - vault and workspace operating modes
77
+ - lifecycle CLI commands for initialization and health checks
78
+ - page typing
79
+ - structured claims
80
+ - structured evidence
81
+ - aliases
82
+ - relations
83
+ - generated reports
84
+ - machine-facing compile outputs
85
+
86
+ v2 does **not** require:
87
+
88
+ - a dedicated top-level timeline folder
89
+ - full contradiction pages as primary authoring surfaces
90
+ - a separate metrics/state object system
91
+ - automatic semantic deduplication
92
+ - ontology inference beyond explicit page metadata and claims
93
+
94
+ Those can be added in a later version.
95
+
96
+ ---
97
+
98
+ ## 5. Knowledge Model
99
+
100
+ The system recognizes the following knowledge object types.
101
+
102
+ ### 5.1 Primary object types
103
+
104
+ #### Entity
105
+ A durable thing in the world or system.
106
+
107
+ Examples:
108
+ - person
109
+ - organization
110
+ - project
111
+ - product
112
+ - system
113
+ - place
114
+ - event
115
+ - artifact
116
+ - document-as-thing
117
+
118
+ #### Concept
119
+ An abstract idea, reusable pattern, definition, method, workflow, runbook, checklist, or operational playbook.
120
+
121
+ Examples:
122
+ - principle
123
+ - method
124
+ - workflow pattern
125
+ - theory
126
+ - policy
127
+ - standard
128
+ - abstraction
129
+ - taxonomy definition
130
+ - runbook
131
+ - checklist
132
+ - workflow
133
+ - playbook
134
+
135
+ #### Source
136
+ An origin of information.
137
+
138
+ Examples:
139
+ - PDF
140
+ - webpage
141
+ - article
142
+ - transcript
143
+ - meeting notes
144
+ - email
145
+ - dataset
146
+ - screenshot
147
+ - raw imported file
148
+ - source bridge page
149
+
150
+ #### Claim
151
+ A statement that can be evaluated for support, confidence, freshness, and conflict.
152
+
153
+ #### Evidence
154
+ A bounded support, challenge, or context record attached to a claim.
155
+
156
+ #### Relationship
157
+ A typed connection between two objects.
158
+
159
+ #### Contradiction
160
+ A tracked conflict between claims, sources, definitions, dates, or interpretations.
161
+
162
+ #### Question
163
+ An unresolved uncertainty or research gap.
164
+
165
+
166
+ ### 5.2 Secondary object types
167
+
168
+ #### Synthesis
169
+ A maintained summary, overview, comparison, timeline, or analysis derived from other pages or sources.
170
+
171
+ #### Timeline Event
172
+ A dated event record represented inside an entity page, synthesis page, or compiled cache.
173
+
174
+ #### Alias
175
+ An alternate name for a page/object.
176
+
177
+ #### Metric / State
178
+ Optional quantitative or stateful information. Not a required first-class authored page type in v2.
179
+
180
+ ---
181
+
182
+ ## 6. Operating Modes and Wiki Layout
183
+
184
+ Agent Wiki supports two operating modes.
185
+
186
+ - **Vault mode**: the wiki root is the primary repository or folder. Source material enters through `_inbox/`, `import-link`, or direct source-page creation. Original inbox files are retained in `raw/` after promotion.
187
+ - **Workspace mode**: the wiki root is embedded inside a larger workspace, normally at `workspace/wiki`. Source candidates may live outside the wiki directory and are discovered by workspace scanning, while deliberate captures may still enter through the wiki's `_inbox/`. Original workspace files stay in place and are referenced by canonical source pages through workspace-relative `originPath` values.
188
+
189
+ A v2-compliant wiki MUST have a `wikiType` of `vault` or `workspace`. If `_system/config.json` is absent, tools SHOULD default to `vault` mode for backward compatibility.
190
+
191
+ ### 6.1 Vault Mode Layout
192
+
193
+ A vault-mode wiki SHOULD use the following top-level structure when initialized.
194
+
195
+ ```text
196
+ <wiki>/
197
+ AGENTS.md
198
+ WIKI.md
199
+ overview.md
200
+ index.md
201
+ INBOX.md
202
+
203
+ sources/
204
+ entities/
205
+ concepts/
206
+ claims/
207
+ syntheses/
208
+ questions/
209
+ reports/
210
+ skills/
211
+
212
+ _inbox/
213
+ raw/
214
+ _attachments/
215
+ _archive/
216
+
217
+ _system/
218
+ config.example.json
219
+ cache/
220
+ indexes/
221
+ logs/
222
+ ```
223
+
224
+ Fresh template repositories MAY omit empty runtime/content directories. Initialization tooling and workflows SHOULD create missing directories when they are needed.
225
+
226
+ ### 6.2 Workspace Mode Layout
227
+
228
+ A workspace-mode wiki is stored inside a larger workspace. The default wiki directory is `wiki/`.
229
+
230
+ ```text
231
+ <workspace>/
232
+ docs/
233
+ research/
234
+ decisions/
235
+ wiki/
236
+ AGENTS.md
237
+ WIKI.md
238
+ overview.md
239
+ index.md
240
+ INBOX.md
241
+
242
+ sources/
243
+ entities/
244
+ concepts/
245
+ claims/
246
+ syntheses/
247
+ questions/
248
+ reports/
249
+ skills/
250
+
251
+ _attachments/
252
+ _archive/
253
+
254
+ _system/
255
+ config.json
256
+ config.example.json
257
+ cache/
258
+ indexes/
259
+ logs/
260
+ state/
261
+ ```
262
+
263
+ Workspace mode MUST include `_inbox/`, `_inbox/trash/`, and `raw/` inside the wiki root for deliberate external captures and notes. Workspace discovery MUST still exclude the wiki directory itself, so these inbox/raw folders are not treated as workspace source candidates.
264
+
265
+ Workspace discovery state SHOULD live under `_system/state/` or another deterministic local runtime location inside the wiki root. It is local operational state, not canonical wiki knowledge.
266
+
267
+ ### 6.3 Required top-level files
268
+
269
+ #### `AGENTS.md`
270
+ MUST describe how agents are expected to behave in the wiki.
271
+
272
+ Typical contents:
273
+ - editing conventions
274
+ - generated artifact rules
275
+ - compile expectations
276
+ - page ownership expectations
277
+ - naming conventions
278
+ - what agents may or may not rewrite
279
+
280
+ #### `WIKI.md`
281
+ MUST describe the wiki schema and editorial rules in human-readable form.
282
+
283
+ Typical contents:
284
+ - folder meanings
285
+ - page types
286
+ - claim/evidence rules
287
+ - confidence meanings
288
+ - status vocabularies
289
+ - report meanings
290
+
291
+ #### `index.md`
292
+ SHOULD be the deterministic root-level page catalog.
293
+
294
+ The file SHOULD be regenerated as a whole by `agent-wiki index` from compiled page metadata. It is not a place for durable human-authored prose; use `README.md`, `WIKI.md`, `ONBOARD.md`, or other root documentation for that.
295
+
296
+ #### `overview.md`
297
+ SHOULD be the human-facing landing page for the wiki.
298
+
299
+ The file SHOULD provide a long-form narrative overview of the wiki, including a wiki summary and paragraph-form summaries for each active page type. It MAY be AI-authored or AI-maintained, but it is durable orientation prose and SHOULD NOT be regenerated automatically on every compile.
300
+
301
+ `overview.md` is not evidence, not a generated report, and not a replacement for compiled caches. Claims in `overview.md` SHOULD be treated as orientation unless they are represented in canonical pages, claims, evidence records, or source pages.
302
+
303
+ #### `INBOX.md`
304
+ SHOULD be a short navigation pointer to the durable inbox rules in `WIKI.md` and the operational `process-inbox` skill. It MUST NOT duplicate the full inbox workflow; `WIKI.md` owns lifecycle concepts and the skill owns exact commands.
305
+
306
+ ### 6.3.1 Optional top-level files
307
+
308
+ #### `log.md`
309
+ Operational log entries belong in `_system/logs/log.md`.
310
+
311
+ ### 6.4 Required directories
312
+
313
+ #### `sources/`
314
+ Stores canonical verbatim source pages.
315
+
316
+ Large sources MAY be represented by one parent source page and multiple source part pages under `sources/parts/`.
317
+
318
+ #### `entities/`
319
+ Stores durable thing pages.
320
+
321
+ #### `concepts/`
322
+ Stores concept pages, including workflow, runbook, checklist, and playbook concepts.
323
+
324
+ #### `claims/`
325
+ Stores standalone claim pages representing atomic propositions with dedicated evidence tracking.
326
+
327
+ #### `syntheses/`
328
+ Stores maintained rollups, analyses, comparisons, summaries, and timeline-style syntheses.
329
+
330
+ #### `questions/`
331
+ Stores open question pages.
332
+
333
+
334
+ #### `reports/`
335
+ Stores generated dashboard pages and maintenance views.
336
+
337
+ #### `skills/`
338
+ Stores agent skill definitions at the wiki root so the wiki follows common portable skill conventions. Skills are human-authored operational instructions and supporting files, not authored knowledge pages.
339
+
340
+ #### `_inbox/`
341
+ Stores raw files waiting to be promoted into canonical source pages. This folder exists in both vault and workspace mode. Files in `_inbox/` are not canonical source pages and MUST NOT be treated as evidence for claims.
342
+
343
+ #### `raw/`
344
+ Stores retained original raw files after inbox promotion. This folder exists in both vault and workspace mode. Files in `raw/` are not canonical source pages and MUST NOT be treated as evidence for claims.
345
+
346
+ #### `_attachments/`
347
+ Stores binary assets and attachments referenced by source pages or other pages (PDFs, images, raw files). Created on vault initialization; MAY be empty.
348
+
349
+ #### `_archive/`
350
+ Stores deprecated or no-longer-maintained pages that have been removed from active content folders. Created on vault initialization; MAY be empty.
351
+
352
+ #### `_system/`
353
+ Stores machine-generated runtime and compile artifacts.
354
+
355
+ Sub-directories:
356
+ - `cache/` — compiled artifact outputs (do not hand-edit)
357
+ - `indexes/` — generated index files (do not hand-edit)
358
+ - `logs/` — compile run logs (do not hand-edit)
359
+ - `state/` — local runtime state for deterministic workflows, including workspace source discovery (do not hand-edit)
360
+
361
+ Files:
362
+ - `config.example.json` — tracked example for optional local system configuration
363
+
364
+ The compile pipeline reads from the wiki and writes to `_system/cache/`, `_system/indexes/`, and `_system/logs/`. Utility commands exposed by `agent-wiki` MAY update deterministic generated catalog pages or scaffold new authored pages when explicitly invoked. The root `skills/` directory is not a compile output and is not scanned for page frontmatter.
365
+
366
+ `_system/config.json`, when present, is local operational configuration, not canonical vault knowledge. It SHOULD be ignored by version control and SHOULD NOT be committed to shared template repositories. `_system/config.example.json` SHOULD be tracked when the project wants to document the supported shape of local configuration.
367
+
368
+ Local config SHOULD NOT contain secrets, API keys, access tokens, private credentials, or machine-specific state that changes on every run. Detection results such as whether a converter is currently installed SHOULD be checked at runtime rather than stored as durable truth.
369
+
370
+ Each skill SHOULD live in its own sub-directory under `skills/`, containing at minimum an instruction file. Deterministic operations SHOULD be exposed through the TypeScript `agent-wiki` CLI instead of bundled Python-era script folders. Example layout:
371
+
372
+ ```text
373
+ skills/
374
+ compile-wiki/
375
+ SKILL.md
376
+ process-inbox/
377
+ SKILL.md
378
+ ```
379
+
380
+ ### 6.5 Local system configuration
381
+
382
+ `_system/config.json` MAY define local tool policy and command preferences used by deterministic scripts and skills. The file is optional. Tools SHOULD use conservative defaults when it is absent.
383
+
384
+ Shared repositories SHOULD track `_system/config.example.json` and ignore `_system/config.json`. Operators MAY create `_system/config.json` by copying the example or by approving an onboarding setup action. Agents MUST NOT write `_system/config.json` unless the operator explicitly approves the local choices to persist.
385
+
386
+ Recommended shape:
387
+
388
+ ```json
389
+ {
390
+ "schemaVersion": 1,
391
+ "wikiType": "vault",
392
+ "pythonCommand": null,
393
+ "knownVaults": {
394
+ "my-vault-name": "/absolute/path/to/vault"
395
+ },
396
+ "workspace": {
397
+ "root": null,
398
+ "wikiDir": "wiki",
399
+ "scan": {
400
+ "includeExtensions": [".md", ".markdown", ".txt", ".pdf", ".docx", ".csv", ".json", ".yaml", ".yml"],
401
+ "excludeDirs": [".git", ".hg", ".svn", ".obsidian", ".venv", "venv", "env", "__pycache__", "node_modules", "dist", "build", "_system", "reports"],
402
+ "excludeFileGlobs": ["*.lock", "package-lock.json", "pnpm-lock.yaml", "yarn.lock"]
403
+ }
404
+ },
405
+ "conversion": {
406
+ "enabled": true,
407
+ "defaultBackend": "auto",
408
+ "backendOrder": ["pymupdf4llm", "markitdown"],
409
+ "allowNetwork": false,
410
+ "allowOcr": false,
411
+ "allowLlm": false,
412
+ "allowTranscription": false,
413
+ "allowHostedDocumentIntelligence": false,
414
+ "backends": {
415
+ "pymupdf4llm": {
416
+ "enabled": true,
417
+ "command": null,
418
+ "formats": ["pdf"]
419
+ },
420
+ "markitdown": {
421
+ "enabled": true,
422
+ "command": "markitdown",
423
+ "formats": ["pdf", "docx", "pptx", "xlsx", "html", "csv", "json", "xml", "epub"]
424
+ },
425
+ "arxiv2md": {
426
+ "enabled": false,
427
+ "command": null,
428
+ "formats": ["pdf"]
429
+ },
430
+ "marker": {
431
+ "enabled": false,
432
+ "command": null,
433
+ "formats": ["pdf"]
434
+ }
435
+ }
436
+ }
437
+ }
438
+ ```
439
+
440
+ `wikiType` MUST be either `vault` or `workspace` when present. Missing `wikiType` SHOULD be interpreted as `vault`.
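
The `wikiType` rule above can be sketched as a small resolver. This is a minimal illustration, not the package's implementation: the function name `resolveWikiType` and the shape of its `config` argument are assumptions, and `null` stands for an absent `_system/config.json`.

```typescript
type WikiType = "vault" | "workspace";

// Hypothetical helper (not part of the published CLI).
// `config` is the parsed `_system/config.json`, or null when the file is absent.
function resolveWikiType(config: { wikiType?: unknown } | null): WikiType {
  const value = config === null ? undefined : config.wikiType;

  // Missing config or missing field: default to vault mode for backward compatibility.
  if (value === undefined) {
    return "vault";
  }
  if (value === "vault") {
    return "vault";
  }
  if (value === "workspace") {
    return "workspace";
  }
  // Present but not one of the two allowed values.
  throw new Error(`Invalid wikiType: ${JSON.stringify(value)}`);
}
```

Rejecting an invalid value instead of silently defaulting keeps a typo such as `"workspce"` from being treated as vault mode. Whether a real tool should throw or report a `doctor` finding is a design choice this sketch does not settle.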
441
+
442
+ `workspace.root` MAY be `null` for vault mode. In workspace mode, it SHOULD be the absolute workspace root when known. `workspace.wikiDir` is the path from the workspace root to the wiki root, defaulting to `wiki`.
443
+
444
+ `workspace.scan` MAY define deterministic workspace discovery policy. Discovery policy is local operational policy, not canonical wiki knowledge. Workspace scanning MUST exclude the wiki directory itself and SHOULD exclude source-control, dependency, build, cache, and generated-output directories.
445
+
446
+ `knownVaults` is an optional object that maps Obsidian vault names (as registered in the Obsidian app) to their absolute paths on the local file system. When present, agents MAY use this map to resolve `obsidian://` cross-vault links to readable file paths. Keys SHOULD match the vault folder name exactly as Obsidian registers it. Values MUST be absolute paths. This field is local operator configuration and MUST NOT be committed to shared template repositories.
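
As a rough illustration of how an agent might use `knownVaults`, the sketch below resolves an `obsidian://open?vault=...&file=...` link to a file path. The function name `resolveObsidianLink` is hypothetical. The sketch assumes POSIX-style absolute paths, handles only the `open` action with `vault` and `file` parameters, and does not append a `.md` extension when the link omits one.

```typescript
// Hypothetical helper: maps an Obsidian "open" URI to a local path using knownVaults.
// Returns null when the link cannot be resolved.
function resolveObsidianLink(
  link: string,
  knownVaults: Record<string, string>,
): string | null {
  const prefix = "obsidian://open?";
  if (!link.startsWith(prefix)) {
    return null;
  }

  // Parse the query string by hand so the sketch has no runtime dependencies.
  const params: Record<string, string> = {};
  for (const pair of link.slice(prefix.length).split("&")) {
    const eq = pair.indexOf("=");
    if (eq > 0) {
      params[pair.slice(0, eq)] = decodeURIComponent(pair.slice(eq + 1));
    }
  }

  const vault = params["vault"];
  const file = params["file"];
  if (vault === undefined || file === undefined) {
    return null;
  }

  // Only own keys of the map count as registered vaults.
  if (!Object.prototype.hasOwnProperty.call(knownVaults, vault)) {
    return null;
  }
  // Refuse paths that try to escape the vault root.
  if (file.split("/").indexOf("..") !== -1) {
    return null;
  }

  const root = knownVaults[vault].replace(/\/+$/, "");
  return root + "/" + file;
}
```

Returning `null` for unknown vaults leaves the caller free to keep the link as an opaque reference instead of guessing a path.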
447
+
448
+ `pythonCommand` MAY be `null` to use the active environment. Otherwise it MAY name `python3`, `python`, or a project-local virtual environment path such as `.venv/bin/python`.
449
+
450
+ Each wiki root remains a single Agent Wiki. In vault mode, the repository or selected root is the wiki root. In workspace mode, the wiki root is the configured wiki directory inside a larger workspace. Skills, scripts, and config files MUST read and write wiki content relative to the selected wiki root unless a workspace-mode command explicitly reads source candidates from the workspace root.
451
+
452
+ The lifecycle CLI MAY track multiple local Agent Wiki roots through a machine-local registry outside any wiki root, conventionally `~/.config/agent-wiki/registry.json`. Registry entries MUST refer only to Agent Wiki roots created or migrated by the CLI. The registry is local operator state, not canonical wiki knowledge, and MUST NOT be stored inside a wiki. Operators MAY target a registered wiki with `agent-wiki --wiki NAME <command>`.
453
+
454
+ Obsidian is an optional editor for the wiki root. Opening the wiki root as an Obsidian vault MUST NOT change where skills or scripts read and write content.
455
+
456
+ Configuration SHOULD express operator policy and preferences, not transient detection state. For example, a backend can be enabled in config but still unavailable at runtime if the command or Python package is not installed. Tools SHOULD detect that condition during execution and report it clearly.
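
The split between configured policy and runtime availability could look like the sketch below. It assumes the `conversion` shape from the recommended config. `pickBackend` and the `isAvailable` callback are hypothetical names, and the callback stands in for whatever runtime probe a tool actually uses.

```typescript
interface BackendConfig {
  enabled: boolean;
  command: string | null;
  formats: string[];
}

interface ConversionConfig {
  enabled: boolean;
  backendOrder: string[];
  backends: Record<string, BackendConfig>;
}

// Hypothetical selector: walks backendOrder and returns the first backend that is
// enabled in config, supports the format, and is actually available at runtime.
function pickBackend(
  conversion: ConversionConfig,
  format: string,
  isAvailable: (name: string) => boolean,
): string | null {
  if (!conversion.enabled) {
    return null;
  }
  for (const name of conversion.backendOrder) {
    const backend = conversion.backends[name];
    if (backend === undefined || !backend.enabled) {
      continue; // not configured, or disabled by operator policy
    }
    if (backend.formats.indexOf(format) === -1) {
      continue; // backend does not handle this format
    }
    if (isAvailable(name)) {
      return name; // enabled in config and present at runtime
    }
  }
  return null;
}
```

Availability is passed in as a function so that it is checked on every run and never stored in config, which matches the rule that detection state is not durable truth.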
457
+
458
+ Operating system, platform, and shell detection SHOULD be reported by onboarding probes as runtime environment state and SHOULD NOT be persisted in `_system/config.json` unless a future operator policy field requires an explicit override.
459
+
460
+ If `.venv/` is used, it SHOULD be project-local and ignored by version control. Shared template repositories SHOULD NOT require it to exist.
461
+
462
+ `.gitignore` SHOULD include `_system/config.json` and `.venv/` so local setup choices and installed packages do not become shared vault content.
463
+
464
+ ### 6.6 Lifecycle CLI
465
+
466
+ The system SHOULD provide an agent-independent lifecycle CLI. The command name MAY be installed as `agent-wiki`, and the package SHOULD support Node/npm-based local development:
467
+
468
+ ```bash
469
+ npm install
470
+ npm run build
471
+ npm link
472
+ ```
473
+
474
+ The lifecycle CLI SHOULD support initializing a vault-mode wiki:
475
+
476
+ ```bash
477
+ agent-wiki init --type vault --root /path/to/wiki
478
+ agent-wiki registry add MyWiki --root /path/to/wiki --type vault
479
+ agent-wiki --wiki MyWiki onboard --check
480
+ ```
481
+
482
+ It SHOULD support initializing a workspace-mode wiki:
483
+
484
+ ```bash
485
+ agent-wiki init --type workspace --workspace-root /path/to/workspace --wiki-dir wiki
486
+ agent-wiki registry add MyProject --root /path/to/workspace/wiki --type workspace
487
+ agent-wiki --wiki MyProject onboard --check
488
+ ```
489
+
490
+ `init` SHOULD create the required content, generated, system, and inbox lifecycle directories for the selected mode. Both vault mode and workspace mode SHOULD create `_inbox/`, `_inbox/trash/`, and `raw/`.
491
+
492
+ By default, `init` SHOULD write `_system/config.json` with `schemaVersion`, `wikiType`, and workspace settings appropriate to the selected mode. It SHOULD preserve unrelated existing config fields when updating an existing local config. A `--no-config` flag MAY suppress this for advanced bare-skeleton setup or tests.
493
+
494
+ By default, `init` SHOULD copy missing bundled root documentation, root-level `skills/`, package metadata, and `_system/config.example.json` into the wiki. It MUST NOT overwrite existing files. A `--no-template` flag MAY suppress this for advanced bare-skeleton setup or tests.
495
+
496
+ The lifecycle CLI SHOULD provide a read-only health check:
497
+
498
+ ```bash
499
+ agent-wiki doctor --wiki-root /path/to/wiki --type vault
500
+ agent-wiki doctor --wiki-root /path/to/workspace/wiki --type workspace
501
+ agent-wiki --wiki MyWiki doctor
502
+ ```
503
+
504
+ `doctor` SHOULD verify mode-specific required folders, local config sanity, required template script/skill availability, and whether `wikiType` is valid. It MUST NOT create files, write config, install packages, run conversion, or mutate wiki content. It SHOULD return a non-zero exit code only for errors, not warnings or informational findings.
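
The exit-code rule for `doctor` can be illustrated with a short sketch. The `Finding` shape and the function name `doctorExitCode` are assumptions, as is the choice of `1` for the non-zero code; the specification only requires that the code be non-zero for errors.

```typescript
type Severity = "error" | "warning" | "info";

interface Finding {
  severity: Severity;
  message: string;
}

// Hypothetical helper: warnings and informational findings never fail the check.
function doctorExitCode(findings: Finding[]): number {
  return findings.some((finding) => finding.severity === "error") ? 1 : 0;
}
```

With this rule, a wiki that only has warnings (for example, a missing optional folder) still passes when `doctor` runs in a scripted check.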
505
+
506
+ The lifecycle CLI SHOULD provide machine-local registry commands:
507
+
508
+ ```bash
509
+ agent-wiki registry add MyWiki --root /path/to/wiki --type vault
510
+ agent-wiki registry show MyWiki
511
+ agent-wiki registry remove MyWiki
512
+ agent-wiki list
513
+ agent-wiki check --all
514
+ agent-wiki check --all --full
515
+ ```
516
+
517
+ `agent-wiki list` SHOULD list registered wiki names, types, and paths. `agent-wiki check --all` SHOULD run a light read-only check across registered wikis using `doctor` and the deterministic onboarding summary. `agent-wiki check --all --full` MAY also run compile and index validation and therefore MAY write generated cache/index files.
518
+
519
+ The lifecycle CLI SHOULD provide scheduled-agent prompt generation for recurring skill-based maintenance:
520
+
521
+ ```bash
522
+ agent-wiki schedule prompt process-inbox
523
+ agent-wiki schedule prompt extract-primitives
524
+ agent-wiki schedule prompt update-overview
525
+ agent-wiki schedule prompt process-inbox MyWiki OtherWiki
526
+ agent-wiki schedule prompt update-overview --wiki MyWiki
527
+ ```
528
+
529
+ Schedule prompt commands MUST print prompts for an external scheduled-agent harness. They MUST NOT execute the skill workflow themselves. By default, schedule prompt commands SHOULD target all registered Agent Wiki roots. Operators MAY target one or more registered wikis by name. Generated prompts SHOULD instruct the scheduled agent to read each wiki's `AGENTS.md` and `WIKI.md`, follow the local skill instructions, log per-wiki failures, and continue to the next wiki.
530
+
531
+ ### 6.7 Workspace Discovery CLI
532
+
533
+ Workspace mode SHOULD provide deterministic discovery commands that operate from the workspace root while storing local state under the wiki root.
534
+
535
+ The CLI SHOULD support:
536
+
537
+ ```bash
538
+ agent-wiki workspace scan --workspace-root /path/to/workspace --wiki-dir wiki --json
539
+ agent-wiki workspace pending --workspace-root /path/to/workspace --wiki-dir wiki --json
540
+ agent-wiki workspace mark-sourced --workspace-root /path/to/workspace --path docs/example.md --source-id source.2026-06-27.document.example --source-path sources/2026-06-27-document-example.md
541
+ ```
542
+
543
+ Workspace scanning SHOULD identify candidate non-code files outside the wiki directory using deterministic include/exclude rules. It SHOULD report file path, modified time, size, extension, content hash, recommended source type, and any known source-page mapping.
544
+
545
+ Workspace discovery MUST NOT semantically read files, create source pages, modify workspace files, move files, delete files, or treat workspace files as canonical evidence. It only reports candidates and records local mapping state.
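
A path-only candidate filter consistent with the discovery rules above might look like this sketch. It assumes workspace-relative paths with forward slashes and the `workspace.scan` fields from the recommended config. `ScanPolicy`, `globToRegExp`, and `isScanCandidate` are hypothetical names, and glob support is deliberately limited to `*` wildcards matched against the file name.

```typescript
interface ScanPolicy {
  wikiDir: string;
  includeExtensions: string[];
  excludeDirs: string[];
  excludeFileGlobs: string[];
}

// Minimal glob support (assumption): only `*` is treated as a wildcard.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Decides from the path alone whether a workspace file is a source candidate.
// It never opens the file, in line with the rule that discovery does not read content.
function isScanCandidate(relPath: string, policy: ScanPolicy): boolean {
  // The wiki directory itself is never a source candidate.
  if (relPath === policy.wikiDir || relPath.startsWith(policy.wikiDir + "/")) {
    return false;
  }

  const segments = relPath.split("/");
  const dirs = segments.slice(0, -1);
  const fileName = relPath.slice(relPath.lastIndexOf("/") + 1);

  if (dirs.some((dir) => policy.excludeDirs.indexOf(dir) !== -1)) {
    return false;
  }
  if (policy.excludeFileGlobs.some((glob) => globToRegExp(glob).test(fileName))) {
    return false;
  }

  // Extension check is case-insensitive; dotfiles without an extension are skipped.
  const dot = fileName.lastIndexOf(".");
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : "";
  return policy.includeExtensions.indexOf(ext) !== -1;
}
```

A real scanner would additionally collect modified time, size, and a content hash for each accepted path; those steps are omitted here.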
546
+
547
+ After an agent creates a canonical source page for a workspace file, `mark-sourced` MAY record the relationship between the workspace-relative source path and the wiki source page. That mapping is local operational state and SHOULD NOT replace source-page metadata.
548
+
549
+ ### 6.8 Onboarding probe
550
+
551
+ The system SHOULD provide a deterministic onboarding probe at `agent-wiki onboard`.
552
+
553
+ The probe SHOULD support:
554
+
555
+ ```bash
556
+ agent-wiki onboard --check
557
+ ```
558
+
559
+ It MAY also support a read-only human question helper:
560
+
561
+ ```bash
562
+ agent-wiki onboard --check --questions
563
+ ```
564
+
565
+ `agent-wiki onboard --check` SHOULD inspect local environment capabilities and print a structured report. The report SHOULD be suitable for both human review and agent-guided setup.
566
+
567
+ The probe SHOULD check:
568
+
569
+ - operating system and platform details needed for setup guidance
570
+ - available Python commands, including `python3`, `python`, and `.venv/bin/python`
571
+ - Python versions for available commands
572
+ - whether `.venv/` exists
573
+ - whether `_system/config.json` exists
574
+ - whether `_system/config.json` declares `wikiType`
575
+ - whether `.obsidian/` exists at the wiki root
576
+ - whether mode-specific required runtime/content folders exist
577
+ - whether `import-link` has a local config file and whether it appears configured
578
+ - available local conversion CLI commands, such as `markitdown`, `marker`, and `arxiv2md`
579
+ - importable Python conversion packages under each available Python command, such as `pymupdf4llm`, `markitdown`, and `marker`
580
+
581
+ The probe MUST NOT install packages, create virtual environments, write `_system/config.json`, create folders, modify skill config, or mutate vault content when run with `--check`.
582
+
583
+ Onboarding decisions SHOULD remain operator-driven. Agents MAY use the probe output to ask the operator a short series of setup questions, then run lifecycle commands or write local config only after the operator has approved those actions.
584
+
585
+ `agent-wiki onboard` MAY support an explicit mutating config writer:
586
+
587
+ ```bash
588
+ agent-wiki onboard --write-config --python-command python3 --conversion disabled
589
+ ```
590
+
591
+ `--write-config` MAY create or update local `_system/config.json`. It MUST NOT be implied by `--check` or `--questions`. Agents MUST run it only after the operator has approved the specific local choices to persist.
592
+
593
+ When `_system/config.example.json` exists, `--write-config` SHOULD start from the example shape and update only approved local policy fields. It SHOULD preserve unrelated existing config fields when updating an existing `_system/config.json`.
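
One way to read the merge behavior described above is a recursive overlay of approved fields onto a base config. The sketch below is an interpretation, not the package's implementation. `overlay` and `buildConfig` are hypothetical names, and the choice to use the existing config as the base when one exists, falling back to the example shape otherwise, is an assumption.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json | undefined): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively overlays `approved` onto `base`. Fields absent from `approved`
// are preserved, and neither input object is mutated.
function overlay(base: JsonObject, approved: JsonObject): JsonObject {
  const result: JsonObject = { ...base };
  for (const key of Object.keys(approved)) {
    const next = approved[key];
    const prev = result[key];
    result[key] = isObject(prev) && isObject(next) ? overlay(prev, next) : next;
  }
  return result;
}

// Assumption: an existing config wins as the base; otherwise start from the example.
function buildConfig(
  existing: JsonObject | null,
  example: JsonObject | null,
  approved: JsonObject,
): JsonObject {
  const base = existing ?? example ?? {};
  return overlay(base, approved);
}
```

Because the overlay touches only keys present in `approved`, the same list of keys can double as the report of exactly which fields were written.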
594
+
595
+ `--write-config` MUST write operator policy and command preferences only. It MUST NOT write transient detection state such as whether a package or command is currently installed. It MUST NOT create `.venv/`, install packages, create folders, modify `skills/import-link/config.json`, or run the compile pipeline.
596
+
597
+ The config writer SHOULD require explicit flags for choices that materially change behavior, including:
598
+
599
+ - `--python-command <command>` for the preferred Python command
600
+ - `--conversion disabled` to keep inbox conversion disabled
601
+ - `--conversion available-local` to enable conversion using only already installed local backends
602
+ - `--conversion custom` when explicit backend choices or policy flags are supplied
603
+
604
+ The supported wiki root for the probe is the current working directory. In workspace mode, callers SHOULD run the probe from the embedded wiki root, not from the workspace root.
605
+
606
+ Network, OCR, LLM, transcription, and hosted document-intelligence conversion behavior MUST remain disabled unless explicitly enabled by dedicated flags. The writer SHOULD report exactly which fields were written.
607
+
608
+ Onboarding questions SHOULD be compact multiple-choice prompts. The operator should be able to answer quickly with letter choices such as `1A 2B 3A`. Agents SHOULD avoid long open-ended setup questions unless a path, command, or credential must be supplied by the operator.
609
+
610
+ Each setup question SHOULD include:
611
+
612
+ - a short label
613
+ - two to four lettered choices
614
+ - a clear recommended choice when one exists
615
+ - a one-sentence consequence for each choice
616
+ - a short answer format, such as `Reply with: 1A 2B 3A 4C`
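
The compact answer format can be parsed with a few lines. This is a minimal sketch: `parseSetupAnswers` is a hypothetical name, and limiting letters to `A` through `D` is an assumption drawn from the "two to four lettered choices" guidance above.

```typescript
// Hypothetical parser for replies such as "1A 2B 3A 4C".
// Returns a map from question number to the chosen (upper-cased) letter.
function parseSetupAnswers(reply: string): Map<number, string> {
  const answers = new Map<number, string>();
  for (const token of reply.trim().split(/\s+/)) {
    const match = /^(\d+)([A-Da-d])$/.exec(token);
    if (match === null) {
      // Covers malformed tokens and an empty reply.
      throw new Error(`Unrecognized answer token: "${token}"`);
    }
    answers.set(Number(match[1]), match[2].toUpperCase());
  }
  return answers;
}
```

Failing loudly on an unrecognized token lets the agent re-ask the question instead of persisting a guessed choice.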
617
+
618
+ Question text SHOULD be friendly and operational. It SHOULD describe what the choice will do, not narrate internal implementation details.
619
+
620
+ Recommended setup questions include:
621
+
622
+ - which Python command to use
623
+ - whether to use or create a project-local `.venv/`
624
+ - whether inbox conversion should be enabled
625
+ - which conversion backend policy to use
626
+ - whether network, OCR, LLM, transcription, or hosted document-intelligence behavior is allowed
627
+ - whether mode-specific missing runtime folders should be created with `agent-wiki init`
628
+ - whether `_system/config.json` should be written
629
+
630
+ After core onboarding, agents SHOULD recommend optional Obsidian setup when the operator wants an Obsidian workflow. The recommendation SHOULD be concise and operational:
631
+
632
+ 1. Open Obsidian.
633
+ 2. Click the current vault name at the bottom of the file explorer pane, or use Obsidian's vault switcher if the control is not visible.
634
+ 3. Click "Manage vaults..."
635
+ 4. Click "Open folder as vault".
636
+ 5. Navigate to the wiki root.
637
+ 6. Click "Select Folder".
638
+
639
+ Opening the wiki root as an Obsidian vault may create local `.obsidian/` settings. `.obsidian/` is local application state and SHOULD be ignored by version control.
640
+
641
+ ### 6.9 Project development workflow
642
+
643
+ Changes to this project SHOULD move from contract to implementation in a consistent order.
644
+
645
+ When adding a feature or changing project behavior, the recommended workflow is:
646
+
647
+ 1. Update this specification.
648
+ 2. Update configuration files or configuration templates when the change affects operator policy, defaults, or local setup.
649
+ 3. Update deterministic scripts.
650
+ 4. Update skill instructions and skill-local support files.
651
+ 5. Update root-level Markdown documentation other than this specification.
652
+
653
+ Each step SHOULD be skipped when the change does not affect that surface. The specification SHOULD be reviewed first because it defines the contract that configuration, scripts, skills, and root-level documentation implement.
654
+
655
+ ### 6.10 Deterministic page scaffolding
656
+
657
+ The system SHOULD provide a deterministic page scaffolding utility at `agent-wiki create-page`.
658
+
659
+ The page scaffolder exists to reduce schema drift when agents create new authored knowledge pages. It is an operational helper, not an authorship engine. It MUST NOT decide what a page means, invent claims, write interpretations, choose evidence, or synthesize source material on its own. The caller remains responsible for supplying the title, page type, subtype where applicable, body prose, source references, claim references, and other semantic fields.
660
+
661
+ The scaffolder SHOULD support canonical page types:
662
+
663
+ - `source`
664
+ - `entity`
665
+ - `concept`
666
+ - `claim`
667
+ - `question`
668
+ - `synthesis`
669
+
670
+ The scaffolder MUST NOT create generated page types such as `index` or `report`.
671
+
672
+ For `source` pages, the scaffolder is only a deterministic source-page writer. It MUST support ordinary whole source pages, large-source parent manifest pages, and large-source part pages. It MUST NOT fetch links, capture web pages, convert binary files, move raw files, decide split points, perform OCR, call LLMs, or determine whether a large source should be partitioned. Source-oriented workflows such as `import-link` and `process-inbox` own acquisition, conversion, raw-file lifecycle, large-source partitioning decisions, source segment preparation, and provenance gathering. Those workflows MAY call the scaffolder to write validated canonical source pages once they have prepared the verbatim Markdown body and required source metadata.
673
+
674
+ The scaffolder SHOULD provide a command-line interface shaped like:
675
+
676
+ ```bash
677
+ agent-wiki create-page \
678
+ --type synthesis \
679
+ --subtype analysis \
680
+ --slug large-document-ingestion \
681
+ --title "Large Document Ingestion" \
682
+ --body-file /tmp/body.md
683
+ ```
684
+
685
+ The exact option set MAY evolve, but the scaffolder SHOULD support:
686
+
687
+ - `--type <pageType>` for the page type.
688
+ - `--subtype <subtype>` for the page-type-specific subtype when applicable, such as `sourceType`, `entityType`, `conceptType`, `claimType`, or `synthesisType`.
689
+ - `--slug <slug>` for the stable filename and ID suffix.
690
+ - `--title <title>` for the page title.
691
+ - `--body-file <path>` for substantive Markdown body prose.
692
+ - `--body <text>` for short body text when shell quoting is safe.
693
+ - repeated reference flags where useful, such as `--source-page <id>`, `--derived-claim <id>`, `--related-page <id>`, or `--tag <tag>`.
694
+ - source-specific flags where useful, such as `--source-url <url>`, `--origin-path <path>`, `--retrieved-at <date>`, `--source-role <whole|parent|part>`, `--source-part <id>`, `--parent-source-id <id>`, `--part-index <n>`, `--part-count <n>`, or `--locator <locator>`.
695
+ - `--dry-run` to print the resolved path and frontmatter without writing.
696
+ - `--no-log` so skills can create several pages and then write one batch log entry.
697
+
698
+ For each created page, the scaffolder MUST:
699
+
700
+ - resolve wiki paths relative to the wiki root
701
+ - select the correct folder for the requested `pageType`
702
+ - construct the stable `id` using this specification's naming rules
703
+ - create the required frontmatter for the page type using the current date for `createdAt` and `updatedAt`
704
+ - map `--subtype` to the correct page-type-specific field
705
+ - derive the filename from the stable ID using the filename rules in Section 8.2
706
+ - refuse to overwrite an existing file unless a future explicit update mode is specified
707
+ - check for duplicate IDs in existing wiki pages before writing
708
+ - require verbatim Markdown body content for `source` pages
709
+ - validate source role requirements: `whole` source pages stand alone; `parent` source pages carry ordered `sourceParts`; `part` source pages carry `parentSourceId`, `partIndex`, `partCount`, and a stable `locator`
710
+ - require substantive Markdown body prose for `entity`, `concept`, `claim`, `question`, and `synthesis` pages
711
+ - write valid Markdown with YAML frontmatter followed by the supplied body prose
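The source-role requirements in the list above can be sketched as a small validator. This is an illustrative Python sketch, not the scaffolder's actual implementation: the function name `validate_source_role` and the error strings are hypothetical, and reading "stand alone" as "no parent pointer and no part manifest" is an assumption. Only the field names come from this specification.

```python
def validate_source_role(frontmatter: dict) -> list[str]:
    """Return a list of problems; an empty list means the role fields look valid."""
    role = frontmatter.get("sourceRole")
    if role not in ("whole", "parent", "part"):
        return [f"unknown sourceRole: {role!r}"]

    errors = []
    if role == "whole":
        # Assumption: "stand alone" means no parent pointer and no part manifest.
        if frontmatter.get("parentSourceId"):
            errors.append("whole source pages must not set parentSourceId")
        if frontmatter.get("sourceParts"):
            errors.append("whole source pages must not list sourceParts")
    elif role == "parent":
        if not frontmatter.get("sourceParts"):
            errors.append("parent source pages must carry ordered sourceParts")
    else:  # role == "part"
        for field in ("parentSourceId", "partIndex", "partCount", "locator"):
            if frontmatter.get(field) in (None, ""):
                errors.append(f"part source pages must carry {field}")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every missing field in one pass.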
712
+
713
+ The scaffolder SHOULD produce predictable, machine-readable console output for success and failure so skills can report results clearly. It SHOULD return a non-zero exit code when validation fails, when the target path already exists, or when the requested ID already exists elsewhere in the vault.
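A skill that drives the scaffolder from Python might assemble the argument list as follows. This is a minimal sketch that assumes the command shape shown earlier in this section; the helper name `build_create_page_args` is hypothetical, and it only builds arguments without running anything.

```python
def build_create_page_args(page_type, slug, title, body_file,
                           subtype=None, dry_run=False, no_log=False):
    """Build an argv list for `agent-wiki create-page` using flags named in this section."""
    args = ["agent-wiki", "create-page", "--type", page_type]
    if subtype:
        args += ["--subtype", subtype]
    # Passing the body as a file avoids shell quoting problems with long prose.
    args += ["--slug", slug, "--title", title, "--body-file", body_file]
    if dry_run:
        args.append("--dry-run")
    if no_log:
        args.append("--no-log")
    return args
```

The resulting list can be passed to `subprocess.run(args)`. A caller would then treat a non-zero return code as a validation failure, an existing target path, or a duplicate ID, as described above.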
714
+
715
+ Skills that use the scaffolder SHOULD still write one operational log entry after the meaningful skill run or change batch through `agent-wiki log`. They SHOULD NOT rely on the scaffolder to log every individual page when multiple pages are created as part of one operation.
716
+
717
+ ---
718
+
719
+ ## 7. Folder Semantics
720
+
721
+ ### 7.1 `sources/`
722
+
723
+ `source` pages represent verbatim source material. They are created by the `import-link` and `process-inbox` skills.
724
+
725
+ A `source` page SHOULD include:
726
+ - verbatim content (text and images)
727
+ - source metadata
728
+ - attachments (images, PDFs, etc.)
729
+ - retrieval information
730
+
731
+ A page in `sources/` MUST have `pageType: source`.
732
+
733
+ #### 7.1.1 Large sources
734
+
735
+ Large sources SHOULD NOT be stored as one giant markdown body when doing so would make extraction, review, or evidence citation unreliable.
736
+
737
+ When captured or converted source text exceeds the large-source threshold, agents SHOULD create:
738
+
739
+ - one parent source page for the whole source
740
+ - multiple child source part pages for bounded verbatim text segments
741
+
742
+ The parent source page represents the document, transcript, webpage capture, or other source as a whole. It SHOULD include bibliographic metadata, retrieval metadata, attachment references, the retained raw file path when applicable, and a manifest of child source part paths. Its body SHOULD stay short and SHOULD NOT contain the full long-form source text.
743
+
744
+ Source part pages are canonical source pages scoped to a deterministic segment of the parent source. They SHOULD contain the verbatim extracted text for that segment, preserve available locators, and point back to the parent source.
745
+
746
+ Source part pages SHOULD live under:
747
+
748
+ ```text
749
+ sources/parts/
750
+ ```
751
+
752
+ Large-source partitioning SHOULD use deterministic split rules:
753
+
754
+ 1. Prefer semantic boundaries such as chapters, sections, headings, appendix boundaries, transcript topic blocks, or slide boundaries.
755
+ 2. Fall back to page ranges, timestamps, or other stable locators when semantic structure is unavailable.
756
+ 3. Keep each part near a target size of 8,000-15,000 words.
757
+ 4. Do not exceed 20,000 words in one part unless preserving an indivisible structure requires it.
758
+ 5. Merge very small adjacent sections when that preserves meaning and stays within the target size.
759
+ 6. Avoid splitting inside tables, code blocks, quoted blocks, or list structures when possible.
760
+
761
+ A source SHOULD be partitioned when converted text is larger than roughly 25,000 words or when an agent cannot reliably process the full source in one extraction pass. Tools MAY use token estimates instead of word counts, but the chosen threshold SHOULD be stable and documented.
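The threshold and merge rules above can be sketched as a greedy grouping over per-section word counts. This is an illustration under stated assumptions, not the partitioning logic of any shipped workflow: the names `should_partition` and `group_sections` are hypothetical, sections are assumed to be already split at semantic boundaries, and an oversized section is kept whole as if it were indivisible.

```python
TARGET_MAX_WORDS = 15_000           # upper end of the 8,000-15,000 target range
PARTITION_THRESHOLD_WORDS = 25_000  # "roughly 25,000 words" from this section


def should_partition(word_count: int) -> bool:
    """True when converted text is larger than the large-source threshold."""
    return word_count > PARTITION_THRESHOLD_WORDS


def group_sections(section_word_counts: list[int]) -> list[list[int]]:
    """Greedily merge adjacent sections into parts, returned as lists of section indexes."""
    parts, current, current_words = [], [], 0
    for index, words in enumerate(section_word_counts):
        # Close the current part when adding this section would pass the target size.
        if current and current_words + words > TARGET_MAX_WORDS:
            parts.append(current)
            current, current_words = [], 0
        current.append(index)
        current_words += words
    if current:
        parts.append(current)
    return parts
```

Because the grouping depends only on section order and word counts, the same input always yields the same parts. A real tool would also need a fallback split by page range or timestamp for sections above the 20,000-word ceiling.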
762
+
763
+ Extraction workflows SHOULD process child source part pages, not the parent source body. Evidence SHOULD cite the most specific available source part and locator.
764
+
765
+ The parent source page SHOULD use `status: partitioned` while one or more child parts remain `status: unprocessed`. It SHOULD use `status: processed` only after all child parts have been processed or intentionally archived.
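The parent status rule can be expressed as a small function over the child part statuses. This is a sketch only; the function name `parent_source_status` is hypothetical, and treating a parent with no recorded parts as still `partitioned` is an assumption.

```python
def parent_source_status(child_statuses: list[str]) -> str:
    """Derive the parent page status from the statuses of its child parts."""
    if any(status == "unprocessed" for status in child_statuses):
        return "partitioned"
    # Every part must be processed or intentionally archived before the parent is done.
    if child_statuses and all(s in ("processed", "archived") for s in child_statuses):
        return "processed"
    # Assumption: with no parts listed, or an unexpected status, stay conservative.
    return "partitioned"
```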
766
+
767
+ #### 7.1.2 Source conversion
768
+
769
+ Raw inbox files MAY be converted to Markdown before source page creation. Conversion is an intake step that produces the text used for canonical `source` pages or source part pages.
770
+
771
+ Plain text and Markdown inbox files SHOULD be treated as already prepared source body files. `process-inbox` SHOULD pass those files, or prepared source-part files derived from them, to `agent-wiki create-page` with `--body-file` rather than copying the body into `--body`. This preserves formatting, avoids shell quoting failures, and lets the scaffolder validate canonical `source` pages from file-backed body content.
772
+
773
+ In vault mode, original raw files SHOULD be retained in `raw/` after successful inbox promotion. Raw files in `_inbox/` or `raw/` are not canonical evidence. The converted source page or source part page is the canonical evidence surface.
774
+
775
+ In workspace mode, original workspace files MUST remain in place. A workspace source page SHOULD use `originPath` to point to the workspace-relative file path. Workspace files outside the wiki directory are discovery inputs until promoted into canonical source pages; they MUST NOT be treated as canonical evidence merely because discovery reported them.
776
+
777
+ Conversion tools SHOULD preserve document structure and stable locators when available, including headings, page ranges, slide numbers, table boundaries, timestamps, and section paths. When a source is partitioned, partition locators SHOULD use the most specific stable locator available from the conversion output.
778
+
779
+ Conversion behavior SHOULD be deterministic:
780
+
781
+ 1. Tools SHOULD use a stable backend order for automatic conversion.
782
+ 2. Tools MUST NOT install converters, model dependencies, or system packages during a skill run.
783
+ 3. Tools MUST NOT call network, cloud OCR, LLM, transcription, or hosted document-intelligence services unless the operator explicitly requests or configures that behavior.
784
+ 4. If no configured local conversion path exists, the source candidate MUST remain where it is and the failure reason SHOULD be reported to the operator. In vault mode that means the raw file remains in `_inbox/`; in workspace mode that means the workspace file remains untouched.
785
+ 5. If conversion succeeds but produces warnings or degraded output, those warnings SHOULD be recorded in source metadata.
786
+
787
+ Automatic conversion tools SHOULD read local policy from `_system/config.json` when that file exists. The config MAY define the preferred Python command, whether conversion is enabled, the automatic backend order, backend-specific command names, and whether network, OCR, LLM, transcription, or hosted document-intelligence behavior is allowed. Missing config SHOULD fall back to conservative local-only defaults.
788
+
789
+ When optional Python converter packages are installed, they SHOULD be installed in a project-local virtual environment such as `.venv/`. Agents MUST NOT create virtual environments or install packages unless the operator explicitly asks them to do so. `.venv/` is local environment state and MUST NOT be treated as vault content.
790
+
791
+ Common local converter backends MAY include:
792
+
793
+ - `pymupdf4llm` for fast local extraction from native-text PDFs
794
+ - `markitdown` for general document-to-Markdown conversion
795
+ - `arxiv2md` for arXiv or academic sources where a structured arXiv source can be identified
796
+ - `marker` for complex PDFs where higher-fidelity local extraction is needed
797
+
798
+ These backend names are examples, not required dependencies. The wiki schema MUST remain stable regardless of the converter used.
799
+
800
+ ### 7.2 `entities/`
801
+
802
+ An `entity` page represents a durable thing.
803
+
804
+ Typical entity kinds:
805
+ - person
806
+ - organization
807
+ - project
808
+ - product
809
+ - system
810
+ - place
811
+ - event
812
+ - artifact
813
+
814
+ A page in `entities/` MUST have `pageType: entity`.
815
+
816
+ ### 7.3 `concepts/`
817
+
818
+ A `concept` page represents a definition, method, abstraction, policy, standard, workflow, runbook, checklist, or operational playbook.
819
+
820
+ A page in `concepts/` MUST have `pageType: concept`.
821
+
822
+ ### 7.4 `syntheses/`
823
+
824
+ A `synthesis` page represents maintained cross-source interpretation or rollup.
825
+
826
+ Examples:
827
+ - overview
828
+ - analysis
829
+ - comparison
830
+ - brief
831
+ - timeline
832
+ - summary
833
+
834
+ A page in `syntheses/` MUST have `pageType: synthesis`.
835
+
836
+ Agents SHOULD create a synthesis page when the user asks for durable interpretation across multiple sources, claims, entities, concepts, or time periods. Synthesis pages are appropriate for briefs, comparisons, literature-style summaries, chronological narratives, decision memos, and maintained analyses that should remain available as authored knowledge.
837
+
838
+ Agents SHOULD NOT create a synthesis page for:
839
+ - a single atomic proposition that belongs in `claims/`
840
+ - a raw or verbatim captured item that belongs in `sources/`
841
+ - an unresolved unknown that belongs in `questions/`
842
+ - a deterministic maintenance output that belongs in `reports/`
843
+ - whole-wiki orientation that belongs in root `overview.md`
844
+
845
+ Synthesis pages are secondary authored interpretation. They MUST preserve uncertainty, identify their source basis, and avoid presenting unsupported conclusions as established fact.
846
+
847
+ ### 7.5 `questions/`
848
+
849
+ A `question` page represents an unresolved issue.
850
+
851
+ A page in `questions/` MUST have `pageType: question`.
852
+
853
+
854
+
855
+ ### 7.6 `claims/`
856
+
857
+ A `claim` page represents a standalone atomic proposition that tracks its own evidence independent of any one source.
858
+
859
+ A page in `claims/` MUST have `pageType: claim`.
860
+
861
+ ### 7.7 `reports/`
862
+
863
+ A `report` page is generated and SHOULD NOT be treated as an authoritative source of truth.
864
+
865
+ A page in `reports/` MUST have `pageType: report` if it includes frontmatter.
866
+
867
+ Reports are views over compiled or source page data.
868
+
869
+ ### 7.8 `index.md`
870
+
871
+ `index.md` is the deterministic root-level page catalog for the wiki.
872
+
873
+ It SHOULD have `pageType: index`. It MUST NOT be typed as `report`.
874
+
875
+ The `index` page type is reserved for wiki-level navigation and page discovery. There is typically only one `index` page per wiki root.
876
+
877
+ `index.md` SHOULD be regenerated as a whole by `agent-wiki index` from `_system/cache/pages.json`. The script MUST NOT independently define page truth; it only renders a deterministic catalog from compiled page metadata.
878
+
879
+ The script SHOULD support:
880
+
881
+ - `--write` to rewrite `index.md`
882
+ - `--check` to verify that `index.md` matches the deterministic rendered output
883
+
884
+ The generated page SHOULD include frontmatter and grouped page tables by `pageType`. It MAY include root documentation files as a separate documentation section when those files are intentionally outside the normal page catalog.
885
+
886
+ Because the whole file is deterministic, agents and humans SHOULD NOT place durable manual prose in `index.md`. Durable orientation content belongs in root documentation files such as `README.md`, `WIKI.md`, `ONBOARD.md`, and `AGENTS.md`.
887
+
888
+ ### 7.9 `overview.md`
889
+
890
+ `overview.md` is the root-level narrative landing page for the wiki.
891
+
892
+ It SHOULD have `pageType: overview`. It MUST NOT be typed as `report`, `index`, or `synthesis`.
893
+
894
+ The `overview` page type is reserved for wiki-level orientation. There is typically only one `overview` page per wiki root, and it SHOULD live at root `overview.md`.
895
+
896
+ The page SHOULD include:
897
+ - a human-facing summary of the wiki
898
+ - paragraph-form summaries of each active page type
899
+ - enough context for a new human reader to understand what is in the wiki and where to go next
900
+
901
+ `overview.md` MAY be written or refreshed by an agent, but it SHOULD be updated intentionally after meaningful content changes rather than regenerated as part of every compile run. It is durable orientation prose, not a deterministic artifact.
902
+
903
+ `overview.md` MUST NOT be treated as primary evidence for claims unless the relevant material has been promoted into canonical source, claim, evidence, or page metadata records.
904
+
905
+ ### 7.10 Authored knowledge page bodies
906
+
907
+ When an agent or human creates an `entity`, `concept`, `claim`, `question`, or `synthesis` page, the page MUST include a substantive Markdown body after the frontmatter.
908
+
909
+ The body SHOULD be detailed, human-facing prose that explains what the page represents, why it matters, and how the structured fields should be understood. It SHOULD NOT be a placeholder, a one-line restatement of the title, or only a machine-readable metadata dump.
910
+
911
+ For each page type, the body SHOULD cover the natural human context for that page:
912
+
913
+ - `entity` pages SHOULD describe the entity, its role in the vault, important identifiers or aliases, and known context or uncertainty.
914
+ - `concept` pages SHOULD explain the concept, its meaning, boundaries, related methods or examples, and any important distinctions.
915
+ - `claim` pages SHOULD restate the proposition in prose, summarize the evidence posture, and note important caveats or uncertainty.
916
+ - `question` pages SHOULD explain why the question exists, what is already known, what remains unresolved, and what would count as resolution.
917
+ - `synthesis` pages SHOULD provide maintained narrative interpretation, scope, source basis, and current conclusions or open tensions.
918
+
919
+ Agents MUST preserve existing human-authored body prose unless the operator explicitly asks for a rewrite.
920
+
921
+ ---
922
+
923
+ ## 8. Page Identity and Naming
924
+
925
+ Each page MUST have a stable `id`.
926
+
927
+ ### 8.1 Requirements
928
+ - `id` MUST be globally unique within the wiki root.
929
+ - *Note: Duplicate IDs will not self-repair. The compiler flags collisions in the console and logs the offending file paths in `_system/logs/`. In the compiled indexes, the last processed file with the duplicate ID will overwrite previous entries.*
930
+ - `id` SHOULD be stable over time
931
+ - `id` SHOULD NOT depend on the page filename alone
932
+ - `id` SHOULD use dotted lowercase namespace-style format
933
+ - *Exception for Source Pages:* Source pages use the format `source.<yyyy-mm-dd>.<sourceType>.<sourceSlug>` to balance semantic density with chronological sorting and collision prevention.
934
+ - *Exception for attachments:* Attachment IDs are generated using `agent-wiki uuid` and stored in the frontmatter of the source page as the value of the `attachments` field. This allows for easy reference to attachments from the source page and ensures that attachments are properly linked to their sources.
935
+ - *Exception for evidence blocks:* Evidence block IDs are generated using `agent-wiki uuid` and stored in the frontmatter of the source page as the value of the `evidence` field. This allows for easy reference to evidence blocks from the source page and ensures that evidence blocks are properly linked to their sources.
936
+
937
+ Examples:
938
+ - `entity.place.riverside-community-garden`
939
+ - `concept.policy.watershed-management`
940
+ - `source.2026-04-12.webpage.urban-tree-canopy`
941
+ - `synthesis.overview.coastal-resilience`
942
+ - `question.accessibility.evacuation-routing`
943
+
944
+ #### Rationale: Dotted Namespaces vs. UUIDs
945
+
946
+ While UUIDs guarantee mathematical uniqueness without central coordination, the dotted lowercase namespace format prioritizes **semantic density** and **agent ergonomics**:
947
+ - **Context at a Glance:** Humans and agents can immediately infer what an ID points to without needing to resolve the node.
948
+ - **Token Efficiency:** Descriptive IDs like `synthesis.overview.coastal-resilience` provide rich metadata at a low token cost.
949
+ - **Collision Prevention:** Scoping IDs by `<pageType>.<namespace>.<slug>` prevents common naming collisions in a flat namespace.
950
+
951
+ ### 8.2 Filenames
952
+ Filenames MAY change; IDs SHOULD remain stable. Filenames are generated from the `id` by replacing dots with hyphens. Source pages also drop the leading `source.` prefix (see §8.4.1), giving `<yyyy-mm-dd>-<sourceType>-<sourceSlug>.md`; all other pages use `<idWithHyphens>.md`.
953
+
954
+ ### 8.3 Canonical names
955
+ Entities and concepts SHOULD include `canonicalName`.
956
+
957
+ ### 8.4 Internal linking convention
958
+
959
+ Internal references in wiki-native pages and wiki-native documentation MUST use Obsidian-style wikilinks.
960
+
961
+ ```md
962
+ [[page-slug]]
963
+ [[page-slug|Display Text]]
964
+ [[page-slug#section-heading]]
965
+ ```
966
+
967
+ Standard markdown links (`[text](path)`) MUST NOT be used for internal vault-page references. They MAY be used for external URLs.
968
+
969
+ This convention applies to:
970
+ - page body content
971
+ - wiki-native root docs (`AGENTS.md`, `WIKI.md`, `INBOX.md`, `ONBOARD.md`, `CLAUDE.md`, etc.)
972
+ - the **navigation/display reference fields** in frontmatter: `sourcePages`, `derivedClaims`, `relatedPages`, `relatedClaims`, `extractedEntities`, `extractedConcepts`, `extractedClaims`, `extractedQuestions`, `originPath`
973
+
974
+ Public repository documentation MAY use standard markdown links for repository readability, especially `README.md` when it is intended to render cleanly on GitHub.
975
+
976
+ #### 8.4.1 Reference target vs. display text
977
+
978
+ The wikilink **target is the filename stem, not the page ID** (see §8.2: the id is hyphenated, and the `source.` prefix is dropped for source pages). The page ID is carried as the wikilink **alias** (display text) so the type-prefixed ID stays human-legible:
979
+
980
+ ```yaml
981
+ # id source.2026-04-12.webpage.tidal-flood-map lives in
982
+ # sources/2026-04-12-webpage-tidal-flood-map.md
983
+ sourcePages: ["[[2026-04-12-webpage-tidal-flood-map|source.2026-04-12.webpage.tidal-flood-map]]"]
984
+ derivedClaims: ["[[claim-descriptive-high-tide-risk|claim.descriptive.high-tide-risk]]"]
985
+ originPath: "[[raw/2026-04-12-report|raw/2026-04-12-report.md]]"
986
+ ```
987
+
988
+ Writing the dotted ID directly as the target (`[[source.2026-04-12.webpage.tidal-flood-map]]`) does **not** resolve in Obsidian, because no file is named that. `agent-wiki create-page` wraps supported fields automatically; `agent-wiki migrate-refs-to-links` converts existing pages.
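The target and alias rule can be illustrated with a short Python sketch. The helper names `filename_stem` and `wikilink_for_id` are hypothetical and are not part of `agent-wiki`; the sketch only mirrors the mapping described in this subsection.

```python
def filename_stem(page_id: str) -> str:
    """Hyphenate the ID, dropping the leading `source.` prefix for source pages."""
    prefix = "source."
    if page_id.startswith(prefix):
        page_id = page_id[len(prefix):]
    return page_id.replace(".", "-")


def wikilink_for_id(page_id: str) -> str:
    """Use the filename stem as the link target and the page ID as the alias."""
    return f"[[{filename_stem(page_id)}|{page_id}]]"
```

Applied to the IDs in the YAML example above, this yields the same `sourcePages` and `derivedClaims` values.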
989
+
990
+ #### 8.4.2 Raw-ID fields (MUST NOT be wikilinked)
991
+
992
+ The following frontmatter fields are resolved by **exact ID match** during compilation (`agent-wiki compile` builds an id→page map and looks these up). They MUST remain bare IDs — wrapping them in `[[ ]]` breaks resolution:
993
+
994
+ - `id`, `parentSourceId`, `subjectPageId`, `sourceIds`, `sourceParts`
995
+ - `evidence[].sourceId`, relation `sourceClaimIds`, timeline `sourceIds`
996
+
997
+ In short: **grounding/relationship lists are links; structural pointers used by the compiler are raw IDs.**
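A lint pass for this rule might look like the sketch below. It assumes frontmatter has already been parsed into a Python dictionary; the function name `find_wikilinked_raw_ids` is hypothetical. Relation `sourceClaimIds` and timeline `sourceIds` are left out because their nesting is not defined in this section.

```python
import re

RAW_ID_FIELDS = ("id", "parentSourceId", "subjectPageId", "sourceIds", "sourceParts")
WIKILINK = re.compile(r"\[\[.*?\]\]")


def find_wikilinked_raw_ids(frontmatter: dict) -> list[str]:
    """Return the raw-ID fields whose values were wrapped in [[ ]] by mistake."""
    offenders = []
    for field in RAW_ID_FIELDS:
        value = frontmatter.get(field)
        values = value if isinstance(value, list) else [value]
        if any(isinstance(v, str) and WIKILINK.search(v) for v in values):
            offenders.append(field)
    # Evidence entries carry their own raw source pointer.
    for index, entry in enumerate(frontmatter.get("evidence") or []):
        source_id = entry.get("sourceId") if isinstance(entry, dict) else None
        if isinstance(source_id, str) and WIKILINK.search(source_id):
            offenders.append(f"evidence[{index}].sourceId")
    return offenders
```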
998
+
999
+ Skill instruction files SHOULD use explicit relative paths when directing agents to project files, schemas, scripts, or examples. Skills MAY mention wikilinks only when the desired output is wiki-native content that should contain wikilinks.
1000
+
1001
+ Rationale: wikilinks decouple vault references from file system paths, survive renames, and are resolved natively by Obsidian and compatible tooling. Public repository docs have a different audience and SHOULD remain readable in standard markdown renderers.
1002
+
1003
+ ### 8.5 Attachment IDs
1004
+
1005
+ Attachments (binary assets like images, PDFs, etc.) stored in `_attachments/` do not use frontmatter IDs. Instead, their **filename** acts as their unique identifier for internal linking (e.g., via Obsidian wikilinks).
1006
+
1007
+ To prevent silent overwrites in the flat `_attachments/` directory, attachment IDs MUST use the following pattern:
1008
+ `yyyy-mm-dd-<source-slug>-<UUID>-<index>.<ext>`
1009
+
1010
+ - `yyyy-mm-dd`: The date of capture.
1011
+ - `<source-slug>`: The same 4-word summary as the source file.
1012
+ - `<UUID>`: A unique identifier generated specifically for the attachment.
1013
+ - `<index>`: An incremental index (starting at 1) for sources containing multiple attachments.
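The pattern above can be assembled as in the following sketch. It uses Python's standard `uuid.uuid4()` as a stand-in for `agent-wiki uuid`, which is an assumption; the function name `attachment_filename` is hypothetical.

```python
import uuid
from datetime import date


def attachment_filename(captured: date, source_slug: str, index: int, ext: str) -> str:
    """Build `yyyy-mm-dd-<source-slug>-<UUID>-<index>.<ext>` for one attachment."""
    if index < 1:
        raise ValueError("attachment index starts at 1")
    # Stand-in for `agent-wiki uuid`; any freshly generated UUID avoids silent overwrites.
    attachment_uuid = uuid.uuid4()
    return f"{captured.isoformat()}-{source_slug}-{attachment_uuid}-{index}.{ext.lstrip('.')}"
```

The UUID is what keeps two captures of the same source on the same day from colliding in the flat `_attachments/` directory.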
1014
+
1015
+ ### 8.6 Cross-vault linking
1016
+
1017
+ Pages in one vault MAY reference pages in a separate Obsidian vault using an `obsidian://` URI link.
1018
+
1019
+ An `obsidian://` URI has the form:
1020
+
1021
+ ```
1022
+ obsidian://open?vault=<vault-name>&file=<url-encoded-file-path>
1023
+ ```
1024
+
1025
+ - `<vault-name>` is the name of the target vault as registered in Obsidian (the folder name Obsidian uses to identify the vault).
1026
+ - `<url-encoded-file-path>` is the path to the target file within that vault, URL-encoded (spaces become `%20`; slashes become `%2F`, as in the examples below).
1027
+
1028
+ To obtain the URI for a target page, open the target vault in Obsidian, right-click the file in the file explorer, and select **Copy Obsidian URL**.
1029
+
1030
+ In the linking page, write the cross-vault reference as a standard markdown link (NOT a wikilink, since wikilinks only resolve within the same vault):
1031
+
1032
+ ```md
1033
+ [Display text](obsidian://open?vault=my-other-vault&file=folder%2Fpage-slug)
1034
+ ```
1035
+
1036
+ Example:
1037
+
1038
+ ```md
1039
+ [Working with multiple vaults](obsidian://open?vault=o4e-06&file=00%20Obsidian%20for%20Everyone%20course%2F00-02%20First%20steps%20with%20Obsidian%2FWorking%20with%20multiple%20vaults)
1040
+ ```
1041
+
1042
+ Cross-vault links are Obsidian-local and will not resolve in GitHub, plain markdown renderers, or agent contexts. Pages that use cross-vault links SHOULD note this limitation in a comment or body prose so future readers and agents do not treat broken links as vault errors.
1043
+
1044
+ #### Agent resolution of `obsidian://` URIs
1045
+
1046
+ Agents MUST NOT attempt to launch or dispatch `obsidian://` URIs through the OS protocol handler.
1047
+
1048
+ An agent MAY resolve an `obsidian://` URI to a readable file path when `knownVaults` is present in `_system/config.json`. The resolution procedure is:
1049
+
1050
+ 1. Parse the URI query string and extract the `vault` and `file` parameters.
1051
+ 2. URL-decode the `file` parameter (replace `%20` with space, `%2F` with `/`, etc.) to obtain the relative file path within the target vault.
1052
+ 3. Append `.md` if the decoded path has no file extension.
1053
+ 4. Look up the `vault` value as a key in `knownVaults`. If the key is absent, stop and report that the vault is not configured locally.
1054
+ 5. Construct the full absolute file path: `<knownVaults[vault]>/<decoded-file-path>`.
1055
+ 6. Verify the file exists before reading. If it does not exist, report the missing path rather than silently failing.
1056
+
1057
+ When `knownVaults` is absent or the target vault is not listed, agents MUST treat the `obsidian://` URI as an opaque external reference and MUST NOT guess or scan for the target vault root.
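The six-step resolution procedure can be sketched in Python with the standard library. This is an illustration, not a reference implementation: the function name `resolve_obsidian_uri` and the choice of exceptions are assumptions, and `known_vaults` stands for the `knownVaults` mapping already loaded from `_system/config.json`.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def resolve_obsidian_uri(uri: str, known_vaults: dict[str, str]) -> Path:
    """Resolve an obsidian:// URI to a local file path without launching Obsidian."""
    parsed = urlparse(uri)
    if parsed.scheme != "obsidian":
        raise ValueError("not an obsidian:// URI")
    # Step 1: extract the `vault` and `file` query parameters.
    params = dict(pair.split("=", 1) for pair in parsed.query.split("&") if "=" in pair)
    if "vault" not in params or "file" not in params:
        raise ValueError("URI must carry both vault and file parameters")
    vault = unquote(params["vault"])
    # Step 2: URL-decode the file parameter (%20 -> space, %2F -> /).
    file_path = unquote(params["file"])
    # Step 3: append .md when the decoded path has no extension.
    if not Path(file_path).suffix:
        file_path += ".md"
    # Step 4: stop when the vault is not configured locally.
    if vault not in known_vaults:
        raise LookupError(f"vault {vault!r} is not configured locally")
    # Step 5: build the full path under the configured vault root.
    full_path = Path(known_vaults[vault]) / file_path
    # Step 6: report a missing file explicitly instead of failing silently.
    if not full_path.is_file():
        raise FileNotFoundError(f"missing file: {full_path}")
    return full_path
```

Raising distinct exceptions for "vault not configured" and "file missing" lets the caller report the two cases separately, as steps 4 and 6 require.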
1058
+
1059
+ ---
1060
+
1061
+ ## 9. Required Universal Frontmatter
1062
+
1063
+ Every authored page except purely generated disposable report pages SHOULD include frontmatter.
1064
+
1065
+ Minimum universal frontmatter:
1066
+
1067
+ ```yaml
1068
+ id: entity.place.riverside-community-garden
1069
+ pageType: entity
1070
+ title: Riverside Community Garden
1071
+ status: active
1072
+ createdAt: 2026-04-12
1073
+ updatedAt: 2026-04-12
1074
+ aliases: []
1075
+ tags: []
1076
+ ```
1077
+
1078
+ ### 9.1 Universal fields
1079
+
1080
+ #### `id`
1081
+ Type: string
1082
+ Required: yes
1083
+
1084
+ #### `pageType`
1085
+ Type: enum
1086
+ Required: yes
1087
+
1088
+ Allowed values:
1089
+ - `source`
1090
+ - `entity`
1091
+ - `concept`
1092
+ - `claim`
1093
+ - `synthesis`
1094
+ - `question`
1095
+ - `report`
1096
+ - `index`
1097
+ - `overview`
1098
+
1099
+ #### `title`
1100
+ Type: string
1101
+ Required: yes
1102
+
1103
+ #### `status`
1104
+ Type: string
1105
+ Required: yes
1106
+ Interpretation depends partly on page type.
1107
+
1108
+ #### `createdAt`
1109
+ Type: date (`YYYY-MM-DD`)
1110
+ Required: yes
1111
+
1112
+ #### `updatedAt`
1113
+ Type: date (`YYYY-MM-DD`)
1114
+ Required: yes
1115
+
1116
+ #### `aliases`
1117
+ Type: string[]
1118
+ Required: yes, but MAY be empty
1119
+
1120
+ #### `tags`
1121
+ Type: string[]
1122
+ Required: yes, but MAY be empty
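A check for the universal fields above might look like the following sketch. It assumes the frontmatter has already been parsed into a dictionary; the function name `validate_universal_frontmatter` and the error strings are hypothetical.

```python
import re

REQUIRED_FIELDS = ("id", "pageType", "title", "status",
                   "createdAt", "updatedAt", "aliases", "tags")
PAGE_TYPES = {"source", "entity", "concept", "claim", "synthesis",
              "question", "report", "index", "overview"}
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate_universal_frontmatter(frontmatter: dict) -> list[str]:
    """Return a list of problems with the universal fields; empty means valid."""
    errors = [f"missing required field: {name}"
              for name in REQUIRED_FIELDS if name not in frontmatter]
    if "pageType" in frontmatter and frontmatter["pageType"] not in PAGE_TYPES:
        errors.append(f"unknown pageType: {frontmatter['pageType']!r}")
    for name in ("createdAt", "updatedAt"):
        # str() also covers YAML parsers that return date objects.
        if name in frontmatter and not DATE_PATTERN.match(str(frontmatter[name])):
            errors.append(f"{name} must use YYYY-MM-DD")
    for name in ("aliases", "tags"):
        # Empty lists are allowed; the field only has to be present and list-typed.
        if name in frontmatter and not isinstance(frontmatter[name], list):
            errors.append(f"{name} must be a list")
    return errors
```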
1123
+
1124
+ ### 9.2 Recommended universal fields
1125
+
1126
+ ```yaml
1127
+ canonicalName: <Canonical Name>
1128
+ owner:
1129
+ summary:
1130
+ sourcePages: []
1131
+ relatedPages: []
1132
+ confidence:
1133
+ freshness:
1134
+ ```
1135
+
1136
+ These are optional in v2, but strongly recommended where applicable.
1137
+
1138
+ ---
1139
+
1140
+ ## 10. Page-Type Specific Frontmatter
1141
+
1142
+ This section defines the pure schema templates for each page type, followed by a concrete example.
1143
+
1144
+ ### 10.1 Source pages
1145
+
1146
+ **Schema:**
1147
+ ```yaml
1148
+ id: source.<yyyy-mm-dd>.<sourceType>.<sourceSlug>
1149
+ pageType: source
1150
+ title: <title>
1151
+ status: <status>
1152
+ sourceType: <sourceType>
1153
+ sourceRole: <sourceRole>
1154
+ parentSourceId: <sourceId>
1155
+ partIndex: <number>
1156
+ partCount: <number>
1157
+ locator: <locator>
1158
+ sourceParts: []
1159
+ originUrl: <url>
1160
+ originPath: <wikilink-to-local-raw-file>
1161
+ convertedAt: <yyyy-mm-dd>
1162
+ conversionTool: <tool>
1163
+ conversionToolVersion: <version>
1164
+ conversionBackend: <backend>
1165
+ conversionWarnings: []
1166
+ publishedAt: <yyyy-mm-dd>
1167
+ retrievedAt: <yyyy-mm-dd>
1168
+ updatedAt: <yyyy-mm-dd>
1169
+ createdAt: <yyyy-mm-dd>
1170
+ aliases: []
1171
+ tags: []
1172
+ attachments: []
1173
+ ```
1174
+
1175
+ **Example:**
1176
+ ```yaml
1177
+ id: source.2026-04-28.webpage.urban-tree-canopy
1178
+ pageType: source
1179
+ title: Urban Tree Canopy Assessment
1180
+ status: processed
1181
+ sourceType: webpage
1182
+ sourceRole: whole
1183
+ parentSourceId:
1184
+ partIndex:
1185
+ partCount:
1186
+ locator:
1187
+ sourceParts: []
1188
+ originUrl: https://example.com/reports/urban-tree-canopy
1189
+ originPath:
1190
+ convertedAt:
1191
+ conversionTool:
1192
+ conversionToolVersion:
1193
+ conversionBackend:
1194
+ conversionWarnings: []
1195
+ publishedAt: 2026-04-25
1196
+ retrievedAt: 2026-04-28
1197
+ updatedAt: 2026-04-28
1198
+ createdAt: 2026-04-28
1199
+ aliases: []
1200
+ tags: [urban-planning, tree-canopy]
1201
+ attachments: []
1202
+ ```
1203
+
1204
+ #### `status`
1205
+
1206
+ Allowed values:
1207
+ - `unprocessed`
1208
+ - `partitioned`
1209
+ - `processed`
1210
+ - `archived`
1211
+
1212
+ #### `sourceType`
1213
+
1214
+ Allowed values:
1215
+ - `webpage`
1216
+ - `article`
1217
+ - `document`
1218
+ - `pdf`
1219
+ - `transcript`
1220
+ - `email`
1221
+ - `meeting-notes`
1222
+ - `dataset`
1223
+ - `screenshot`
1224
+ - `bridge`
1225
+ - `import`
1226
+ - `other`
1227
+
1228
+ #### `sourceRole`
1229
+
1230
+ Allowed values:
1231
+ - `whole`
1232
+ - `parent`
1233
+ - `part`
1234
+
1235
+ Use `whole` for ordinary source pages that contain the complete captured source body in one page.
1236
+
1237
+ Use `parent` for the parent page of a large partitioned source. Parent source pages SHOULD include `sourceParts` and SHOULD NOT include the full long-form verbatim source body.
1238
+
1239
+ Use `part` for child source part pages. Part source pages SHOULD include `parentSourceId`, `partIndex`, `partCount`, and `locator` when available.
1240
+
1241
+ #### `sourceParts`
1242
+
1243
+ Ordered relative paths to child source part pages. This field SHOULD be present on parent source pages and empty or omitted on ordinary source pages and part pages.
1244
+
1245
+ #### `parentSourceId`
1246
+
1247
+ The source ID of the parent source page. This field SHOULD be present on part source pages and empty or omitted on ordinary source pages and parent source pages.
1248
+
1249
+ #### `partIndex`
1250
+
1251
+ One-based ordinal for a source part within its parent source. This field SHOULD be present on part source pages.
1252
+
1253
+ #### `partCount`
1254
+
1255
+ Total number of source parts for the parent source. This field SHOULD be present on part source pages and MAY be present on parent source pages.
1256
+
1257
+ #### `locator`
1258
+
1259
+ A stable locator for the part within the parent source, such as page range, heading path, timestamp range, slide range, or section range.
1260
+
1261
+ Source pages SHOULD include `originUrl` for externally retrieved material. Source pages promoted from local raw inbox files MAY use `originPath` instead. At least one of `originUrl` or `originPath` SHOULD be present.
1262
+
1263
+ #### Conversion provenance
1264
+
1265
+ Source pages SHOULD include conversion provenance when the canonical source body was produced by converting a raw file or external asset into Markdown.
1266
+
1267
+ Recommended fields:
1268
+
1269
+ - `convertedAt` - date the conversion was performed
1270
+ - `conversionTool` - converter or wrapper used
1271
+ - `conversionToolVersion` - converter version when available
1272
+ - `conversionBackend` - selected backend when the tool supports multiple backends
1273
+ - `conversionWarnings` - ordered list of warnings, quality notes, skipped content, or degraded extraction notices
1274
+
1275
+ For partitioned sources, conversion provenance SHOULD appear on the parent source page and MAY also appear on child source part pages when part-level conversion details differ. Child source parts SHOULD still include locators that let evidence point back to the relevant portion of the converted source.
1276
+
1277
+ Large source parent IDs SHOULD use the ordinary source ID format:
1278
+
1279
+ ```text
1280
+ source.<yyyy-mm-dd>.<sourceType>.<sourceSlug>
1281
+ ```
1282
+
1283
+ Large source part IDs SHOULD append a stable part suffix:
1284
+
1285
+ ```text
1286
+ source.<yyyy-mm-dd>.<sourceType>.<sourceSlug>.part<nnn>
1287
+ ```
1288
+
1289
+ Large source part filenames SHOULD preserve the same ordering:
1290
+
1291
+ ```text
1292
+ sources/parts/<yyyy-mm-dd>-<sourceType>-<sourceSlug>-part<nnn>.md
1293
+ ```
1294
+
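The ID and filename patterns above can be sketched as two small helpers. This is an illustrative sketch, not part of the package: the function names are hypothetical, and the three-digit zero padding for `<nnn>` is an assumption read from the placeholder width.

```python
def part_id(date: str, source_type: str, slug: str, index: int, width: int = 3) -> str:
    """Build a source part ID by appending a part suffix to the parent source ID."""
    if index < 1:
        raise ValueError("partIndex is one-based")
    # Assumption: <nnn> means a zero-padded ordinal of `width` digits.
    return f"source.{date}.{source_type}.{slug}.part{index:0{width}d}"


def part_filename(date: str, source_type: str, slug: str, index: int, width: int = 3) -> str:
    """Build the matching filename so lexical order equals part order."""
    if index < 1:
        raise ValueError("partIndex is one-based")
    return f"sources/parts/{date}-{source_type}-{slug}-part{index:0{width}d}.md"
```

Zero padding keeps a plain lexical sort of filenames in the same order as `partIndex`, at least until the part count exceeds the padded width.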
1295
+ ### 10.2 Entity pages
1296
+
1297
+ **Schema:**
1298
+ ```yaml
1299
+ id: entity.<entityType>.<entitySlug>
1300
+ pageType: entity
1301
+ title: <title>
1302
+ entityType: <entityType>
1303
+ canonicalName: <canonicalName>
1304
+ status: active
1305
+ createdAt: <yyyy-mm-dd>
1306
+ updatedAt: <yyyy-mm-dd>
1307
+ aliases: []
1308
+ tags: []
1309
+ ```
1310
+
1311
+ **Example:**
1312
+ ```yaml
1313
+ id: entity.place.riverside-community-garden
1314
+ pageType: entity
1315
+ title: Riverside Community Garden
1316
+ entityType: place
1317
+ canonicalName: Riverside Community Garden
1318
+ status: active
1319
+ createdAt: 2026-04-12
1320
+ updatedAt: 2026-04-12
1321
+ aliases: [riverside-garden]
1322
+ tags: [urban-agriculture]
1323
+ ```
1324
+
1325
+ #### `entityType`
1326
+ Allowed values:
1327
+ - `person`
1328
+ - `organization`
1329
+ - `project`
1330
+ - `product`
1331
+ - `system`
1332
+ - `place`
1333
+ - `event`
1334
+ - `artifact`
1335
+ - `document`
1336
+ - `other`
1337
+
1338
+ ### 10.3 Concept pages
1339
+
1340
+ **Schema:**
1341
+ ```yaml
1342
+ id: concept.<conceptType>.<conceptSlug>
1343
+ pageType: concept
1344
+ title: <title>
1345
+ conceptType: <conceptType>
1346
+ status: active
1347
+ createdAt: <yyyy-mm-dd>
1348
+ updatedAt: <yyyy-mm-dd>
1349
+ aliases: []
1350
+ tags: []
1351
+ ```
1352
+
1353
+ **Example:**
1354
+ ```yaml
1355
+ id: concept.method.adaptive-reuse
1356
+ pageType: concept
1357
+ title: Adaptive Reuse
1358
+ conceptType: method
1359
+ status: active
1360
+ createdAt: 2026-04-12
1361
+ updatedAt: 2026-04-12
1362
+ aliases: [building-reuse]
1363
+ tags: [architecture]
1364
+ ```
1365
+
1366
+ #### `conceptType`
1367
+ Allowed values:
1368
+ - `definition`
1369
+ - `principle`
1370
+ - `framework`
1371
+ - `method`
1372
+ - `policy`
1373
+ - `standard`
1374
+ - `pattern`
1375
+ - `workflow`
1376
+ - `runbook`
1377
+ - `checklist`
1378
+ - `playbook`
1379
+ - `theory`
1380
+ - `taxonomy`
1381
+ - `other`
1382
+
1383
+ ### 10.4 Synthesis pages
1384
+
1385
+ **Schema:**
1386
+ ```yaml
1387
+ id: synthesis.<synthesisType>.<synthesisSlug>
1388
+ pageType: synthesis
1389
+ title: <title>
1390
+ synthesisType: <synthesisType>
1391
+ scope: <scope>
1392
+ status: active
1393
+ sourcePages: []
1394
+ derivedClaims: []
1395
+ createdAt: <yyyy-mm-dd>
1396
+ updatedAt: <yyyy-mm-dd>
1397
+ aliases: []
1398
+ tags: []
1399
+ ```
1400
+
1401
+ **Example:**
1402
+ ```yaml
1403
+ id: synthesis.overview.coastal-resilience
1404
+ pageType: synthesis
1405
+ title: Coastal Resilience Overview
1406
+ synthesisType: overview
1407
+ scope: coastal flood mitigation
1408
+ status: active
1409
+ sourcePages: ["[[2026-04-12-webpage-tidal-flood-map|source.2026-04-12.webpage.tidal-flood-map]]"]
1410
+ derivedClaims: ["[[claim-descriptive-high-tide-risk|claim.descriptive.high-tide-risk]]"]
1411
+ createdAt: 2026-04-12
1412
+ updatedAt: 2026-04-12
1413
+ aliases: []
1414
+ tags: [climate-resilience]
1415
+ ```
1416
+
1417
+ #### `synthesisType`
1418
+ Allowed values:
1419
+ - `summary`
1420
+ - `overview`
1421
+ - `analysis`
1422
+ - `timeline`
1423
+ - `brief`
1424
+ - `comparison`
1425
+
1426
+ ### 10.5 Synthesis workflow rules
1427
+
1428
+ Synthesis pages are durable authored knowledge, not deterministic reports. They combine judgment, source selection, prose, and uncertainty management. A synthesis page MAY cite source pages, derived claim pages, related entities, related concepts, questions, or prior syntheses, but it MUST remain clear about what is directly sourced and what is interpretive.
1429
+
1430
+ #### When to create a synthesis
1431
+
1432
+ Agents SHOULD create a synthesis page when at least one of the following is true:
1433
+ - the user asks to synthesize, compare, summarize, brief, analyze, or narrate across more than one source or page
1434
+ - several claims or sources need an integrated explanation that is more useful than a flat list
1435
+ - the vault needs a durable current-state brief for a topic, project, decision area, or research thread
1436
+ - a chronological account is needed and the chronology is more naturally maintained as a narrative than as isolated timeline records
1437
+ - contradictions, open questions, or competing interpretations need to be held together in one maintained reading
1438
+
1439
+ Agents SHOULD update an existing synthesis instead of creating a new one when the existing page has the same scope, audience, and synthesis type. Agents SHOULD create a new synthesis when the scope, time horizon, audience, or analytical question is materially different.
1440
+
1441
+ #### Expected body structure
1442
+
1443
+ The body of a synthesis page MUST be substantive Markdown prose. It SHOULD be written for a human reader and SHOULD contain enough context to stand alone without requiring the reader to inspect every referenced source first.
1444
+
1445
+ A synthesis body SHOULD normally include:
1446
+ - scope and purpose
1447
+ - source basis or coverage
1448
+ - main synthesis in paragraph form
1449
+ - important evidence, claims, or examples
1450
+ - uncertainty, limits, contradictions, or unresolved questions
1451
+ - current conclusion or next-step implication, when appropriate
1452
+
1453
+ Timeline syntheses SHOULD include a chronological narrative and MAY also include a structured `timeline:` field when individual events need deterministic extraction into `_system/cache/timeline-events.json`.
1454
+
1455
+ Comparison syntheses SHOULD make comparison dimensions explicit. Brief syntheses SHOULD prioritize concise conclusions and decision-relevant context. Analysis syntheses SHOULD explain reasoning and uncertainty instead of only listing findings.
1456
+
1457
+ #### Source and evidence grounding
1458
+
1459
+ Synthesis pages MUST list their source basis in `sourcePages` when source pages are used. If the synthesis relies on established claim pages, it SHOULD list them in `derivedClaims`.
1460
+
1461
+ Synthesis prose SHOULD cite the most specific canonical source page, source part, claim page, or question page needed to support the discussion. Large-document syntheses SHOULD cite source part pages rather than only parent source manifests when the relevant evidence came from a specific part.
1462
+
1463
+ Synthesis pages MUST NOT be used to launder unsupported assertions into accepted knowledge. If a synthesis introduces an atomic proposition that should be tracked independently, the agent SHOULD create or update a claim page and reference it from `derivedClaims`.
1464
+
1465
+ Evidence entries for claim pages SHOULD point back to canonical source pages whenever possible. They SHOULD NOT point only to a synthesis page unless the synthesis itself is the best available authored source for an interpretive claim about the wiki's analysis.
1466
+
1467
+ When the evidence base is incomplete, contested, or weak, the synthesis body MUST say so plainly. Agents MUST preserve minority views, contradictions, and caveats that matter to the synthesis question.
1468
+
1469
+ #### Maintenance rules
1470
+
1471
+ When a synthesis page is meaningfully changed, the agent MUST update `updatedAt`. If the source basis changes, the agent SHOULD update `sourcePages`, `derivedClaims`, and any relevant relationships at the same time.
1472
+
1473
+ Agents SHOULD maintain synthesis pages by revising the existing body prose in place, while preserving human-authored material unless the operator explicitly asks for a rewrite. If a prior conclusion becomes stale or contradicted, agents SHOULD revise the conclusion and record the reason in prose rather than silently removing the older context.
1474
+
1475
+ Agents SHOULD create question pages for unresolved issues discovered during synthesis when the question is important enough to track independently. Agents SHOULD create or update claim pages for important atomic assertions that need evidence tracking outside the synthesis body.
1476
+
1477
+ Synthesis pages SHOULD be refreshed intentionally after meaningful new sources or claims are added to their scope. They SHOULD NOT be regenerated automatically during every compile run.
1478
+
1479
+ #### Skill boundary
1480
+
1481
+ The deterministic page scaffolder MAY create the initial synthesis page file and required frontmatter, but it does not decide what to synthesize or write the synthesis body.
1482
+
1483
+ A dedicated synthesis skill SHOULD be added if agents are expected to frequently handle requests such as "synthesize these sources", "write a brief", "compare these documents", "summarize this research thread", or "make a timeline synthesis". Such a skill SHOULD own source and claim selection, synthesis type selection, body prose, uncertainty handling, updates to related claim/question records, and operational logging.
1484
+
1485
+ ### 10.6 Question pages
1486
+
1487
+ Questions are first-class authored pages in v2.
1488
+
1489
+ They represent known unknowns, unresolved research tasks, or ambiguity the system should not erase.
1490
+
1491
+ #### Question rules
1492
+
1493
+ - Questions MUST have stable IDs.
1494
+ - Questions MUST link related pages or claims.
1495
+ - Resolved questions MUST remain in the vault with updated status, not be deleted.
1496
+
1497
+ **Schema:**
1498
+ ```yaml
1499
+ id: question.<domain>.<questionSlug>
1500
+ pageType: question
1501
+ title: <title>
1502
+ priority: <priority>
1503
+ status: open
1504
+ relatedClaims: []
1505
+ relatedPages: []
1506
+ openedAt: <yyyy-mm-dd>
1507
+ createdAt: <yyyy-mm-dd>
1508
+ updatedAt: <yyyy-mm-dd>
1509
+ aliases: []
1510
+ tags: []
1511
+ ```
1512
+
1513
+ **Example:**
1514
+ ```yaml
1515
+ id: question.accessibility.evacuation-routing
1516
+ pageType: question
1517
+ title: Which evacuation routes are accessible during high-water events?
1518
+ priority: high
1519
+ status: open
1520
+ relatedClaims: []
1521
+ relatedPages: []
1522
+ openedAt: 2026-04-12
1523
+ createdAt: 2026-04-12
1524
+ updatedAt: 2026-04-12
1525
+ aliases: []
1526
+ tags: [emergency-planning]
1527
+ ```
1528
+
1529
+ #### `priority`
1530
+ Allowed values:
1531
+ - `low`
1532
+ - `medium`
1533
+ - `high`
1534
+ - `critical`
1535
+
1536
+ #### `status`
1537
+ Allowed values for question pages:
1538
+ - `open`
1539
+ - `researching`
1540
+ - `blocked`
1541
+ - `resolved`
1542
+ - `dropped`
1543
+
1544
+ ### 10.7 Claim pages
1545
+
1546
+ See also: Section 11. Structured Claims.
1547
+
1548
+ **Schema:**
1549
+ ```yaml
1550
+ id: claim.<claimType>.<claimSlug>
1551
+ pageType: claim
1552
+ title: <title>
1553
+ claimType: <claimType>
1554
+ status: <status>
1555
+ confidence: <float>
1556
+ text: <text>
1557
+ subjectPageId: <page-id>
1558
+ sourceIds: []
1559
+ evidence: []
1560
+ createdAt: <yyyy-mm-dd>
1561
+ updatedAt: <yyyy-mm-dd>
1562
+ aliases: []
1563
+ tags: []
1564
+ ```
1565
+
1566
+ **Example:**
1567
+ ```yaml
1568
+ id: claim.historical.library-reopened-2024
1569
+ pageType: claim
1570
+ title: Northside Library reopened in 2024
1571
+ claimType: historical
1572
+ status: supported
1573
+ confidence: 0.90
1574
+ text: Northside Library reopened to the public in 2024 after seismic upgrades were completed.
1575
+ subjectPageId: entity.place.northside-library
1576
+ sourceIds:
1577
+ - source.2026-04-12.library-renovation-notice
1578
+ evidence: []
1579
+ createdAt: 2026-04-12
1580
+ updatedAt: 2026-04-12
1581
+ aliases: []
1582
+ tags: []
1583
+ ```
1584
+
1585
+ ---
1586
+
1587
+ ## 11. Structured Claims
1588
+
1589
+ Claims are a primary **page type** in the system. They are authored as top-level, standalone files in the `claims/` directory.
1590
+
1591
+ For v2, Standalone Claim Pages are the normative shape. However, pages MAY also contain zero or more embedded claims in their frontmatter under the `claims:` key for convenience. Both formats are parsed identically by the compile pipeline.
1592
+
1593
+ ### 11.1 Claim shape
1594
+
1595
+ **Schema:**
1596
+ ```yaml
1597
+ claims:
1598
+   - id: claim.<claimType>.<claimSlug>
1599
+     text: <text>
1600
+     status: <status>
1601
+     confidence: <float>
1602
+     claimType: <claimType>
1603
+     relatedClaimIds: []
1604
+     evidence:
1605
+       - id: <evidenceId>
1606
+         sourceId: <sourceId>
1607
+         path: <sourcePath>
1608
+         lines: <lineRange>
1609
+         kind: <kind>
1610
+         relation: <relation>
1611
+         weight: <float>
1612
+         note: <note>
1613
+         excerpt: <text>
1614
+         retrievedAt: <yyyy-mm-dd>
1615
+         updatedAt: <yyyy-mm-dd>
1616
+     createdAt: <yyyy-mm-dd>
1617
+     updatedAt: <yyyy-mm-dd>
1618
+     validFrom: <yyyy-mm-dd>
1619
+     validTo: <yyyy-mm-dd>
1620
+ ```
1621
+
1622
+ **Example:**
1623
+ ```yaml
1624
+ claims:
1625
+   - id: claim.descriptive.school-energy-retrofit
1626
+     text: The Lincoln Middle School heat-pump retrofit reduced annual building energy use by 18 percent.
1627
+     status: supported
1628
+     confidence: 0.91
1629
+     claimType: descriptive
1630
+     relatedClaimIds: []
1631
+     evidence:
1632
+       - id: evidence.quote.supports.a1b2c3d4
1633
+         sourceId: source.2026-04-12.webpage.school-energy-audit
1634
+         path: sources/2026-04-12.webpage.school-energy-audit.md
1635
+         lines: 55-79
1636
+         kind: quote
1637
+         relation: supports
1638
+         weight: 0.86
1639
+         note: The audit compares normalized energy use before and after the retrofit.
1640
+         excerpt: "Weather-normalized annual energy consumption fell by 18 percent after commissioning."
1641
+         retrievedAt: 2026-04-12
1642
+         updatedAt: 2026-04-12
1643
+     createdAt: 2026-04-12
1644
+     updatedAt: 2026-04-12
1645
+     validFrom: 2026-04-12
1646
+     validTo:
1647
+ ```
1648
+
1649
+ ### 11.2 Required claim fields
1650
+
1651
+ #### `id`
1652
+ Type: string
1653
+ Required: yes
1654
+ Must be globally unique.
1655
+
1656
+ #### `text`
1657
+ Type: string
1658
+ Required: yes
1659
+
1660
+ #### `status`
1661
+ Type: enum
1662
+ Required: yes
1663
+
1664
+ Allowed values:
1665
+ - `supported`
1666
+ - `weakly_supported`
1667
+ - `inferred`
1668
+ - `unverified`
1669
+ - `contested`
1670
+ - `contradicted`
1671
+ - `deprecated`
1672
+
1673
+ #### `confidence`
1674
+ Type: number
1675
+ Required: yes
1676
+ Range: `0.0` to `1.0`
1677
+
1678
+ #### `claimType`
1679
+ Type: enum
1680
+ Required: yes
1681
+
1682
+ Allowed values:
1683
+ - `descriptive`
1684
+ - `historical`
1685
+ - `causal`
1686
+ - `interpretive`
1687
+ - `normative`
1688
+ - `forecast`
1689
+
1690
+ #### `evidence`
1691
+ Type: array
1692
+ Required: yes, but MAY be empty in draft states
1693
+
1694
+ #### `createdAt`
1695
+ Type: date
1696
+ Required: yes
1697
+
1698
+ #### `updatedAt`
1699
+ Type: date
1700
+ Required: yes
1701
+
1702
+ ### 11.3 Optional claim fields
1703
+
1704
+ - `relatedClaimIds: string[]`
1705
+ - `validFrom: date | null`
1706
+ - `validTo: date | null`
1707
+ - `tags: string[]`
1708
+ - `note: string`
1709
+
1710
+ ### 11.4 Claim rules
1711
+
1712
+ - Claim IDs MUST be stable.
1713
+ - Claim IDs MUST be unique across the vault.
1714
+ - Claims SHOULD be atomic and not overloaded.
1715
+ - A claim SHOULD express one proposition, not several glued together.
1716
+ - A claim MAY be attached to entity, concept, source, synthesis, or question pages when appropriate.
1717
+ - Pages SHOULD NOT hide all important assertions in prose if those assertions matter for machine use.
1718
+
1719
+ ---
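The required fields in 11.2 and the uniqueness rule in 11.4 lend themselves to a mechanical check. The following is a minimal sketch, assuming claims have already been parsed from frontmatter into plain dictionaries; `validate_claim` and `duplicate_claim_ids` are hypothetical names, not functions shipped with the package.

```python
CLAIM_STATUSES = {
    "supported", "weakly_supported", "inferred", "unverified",
    "contested", "contradicted", "deprecated",
}
CLAIM_TYPES = {"descriptive", "historical", "causal", "interpretive", "normative", "forecast"}
REQUIRED_FIELDS = (
    "id", "text", "status", "confidence", "claimType",
    "evidence", "createdAt", "updatedAt",
)


def validate_claim(claim: dict) -> list[str]:
    """Return human-readable problems for one claim record (empty list means valid)."""
    errors = [f"missing required field: {name}" for name in REQUIRED_FIELDS if name not in claim]
    if "status" in claim and claim["status"] not in CLAIM_STATUSES:
        errors.append(f"unknown status: {claim['status']}")
    if "claimType" in claim and claim["claimType"] not in CLAIM_TYPES:
        errors.append(f"unknown claimType: {claim['claimType']}")
    if "confidence" in claim:
        value = claim["confidence"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            errors.append("confidence must be a number from 0.0 to 1.0")
    # An empty evidence list is allowed (draft state), but it must be a list.
    if "evidence" in claim and not isinstance(claim["evidence"], list):
        errors.append("evidence must be a list")
    return errors


def duplicate_claim_ids(claims: list[dict]) -> set[str]:
    """Return claim IDs that occur more than once across the vault."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for claim in claims:
        claim_id = claim.get("id")
        if claim_id in seen:
            dupes.add(claim_id)
        elif claim_id is not None:
            seen.add(claim_id)
    return dupes
```

Returning a list of messages rather than raising on the first problem lets a validator report every issue on a page in one pass.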
1720
+
1721
+ ## 12. Evidence
1722
+
1723
+ Evidence entries attach provenance and support semantics to a claim.
1724
+
1725
+ ### 12.1 Evidence shape
1726
+
1727
+ **Schema:**
1728
+ ```yaml
1729
+ evidence:
1730
+   - id: evidence.<kind>.<relation>.<uuid>
1731
+     sourceId: <source-id>
1732
+     path: <source-path>
1733
+     lines: <line-range>
1734
+     kind: <kind>
1735
+     relation: <relation>
1736
+     weight: <float>
1737
+     note: <note>
1738
+     excerpt: <text>
1739
+     retrievedAt: <yyyy-mm-dd>
1740
+     updatedAt: <yyyy-mm-dd>
1741
+ ```
1742
+
1743
+ **Example:**
1744
+ ```yaml
1745
+ evidence:
1746
+   - id: evidence.quote.supports.a1b2c3d4
1747
+     sourceId: source.2026-04-28.article.urban-tree-canopy
1748
+     path: sources/2026-04-28.article.urban-tree-canopy.md
1749
+     lines: 10-18
1750
+     kind: quote
1751
+     relation: supports
1752
+     weight: 0.82
1753
+     note: Direct statement from the canopy assessment
1754
+     excerpt: "..."
1755
+     retrievedAt: 2026-04-12
1756
+     updatedAt: 2026-04-12
1757
+ ```
1758
+
1759
+ Nested block YAML is the canonical representation for evidence records. Obsidian's Properties UI MAY display nested evidence lists as JSON-like inline objects instead of editable nested fields. That display is cosmetic and MUST NOT be treated as a schema violation when the underlying Markdown frontmatter is valid YAML matching this shape.
1760
+
1761
+ ### 12.2 Required evidence fields
1762
+
1763
+ #### `id`
1764
+ Type: string
1765
+ Required: yes
1766
+
1767
+ #### `sourceId`
1768
+ Type: string
1769
+ Required: yes
1770
+ Must reference an existing source page ID when possible.
1771
+
1772
+ #### `path`
1773
+ Type: string
1774
+ Required: yes
1775
+ Path to the supporting page or source page.
1776
+
1777
+ #### `kind`
1778
+ Type: enum
1779
+ Required: yes
1780
+
1781
+ Allowed values:
1782
+ - `quote`
1783
+ - `summary`
1784
+ - `measurement`
1785
+ - `observation`
1786
+ - `screenshot`
1787
+ - `transcript`
1788
+ - `inference`
1789
+
1790
+ #### `relation`
1791
+ Type: enum
1792
+ Required: yes
1793
+
1794
+ Allowed values:
1795
+ - `supports`
1796
+ - `weakens`
1797
+ - `contradicts`
1798
+ - `context_only`
1799
+
1800
+ #### `weight`
1801
+ Type: number
1802
+ Required: yes
1803
+ Range: `0.0` to `1.0`
1804
+
1805
+ #### `updatedAt`
1806
+ Type: date
1807
+ Required: yes
1808
+
1809
+ ### 12.3 Optional evidence fields
1810
+
1811
+ - `lines: string`
1812
+ - `note: string`
1813
+ - `excerpt: string`
1814
+ - `retrievedAt: date`
1815
+ - `locatorText: string`
1816
+
1817
+ ### 12.4 Evidence rules
1818
+
1819
+ - Evidence MUST NOT imply stronger support than it actually provides.
1820
+ - `context_only` evidence MUST NOT be treated as direct support during compile scoring.
1821
+ - Evidence SHOULD point back to a source page, not only to a synthesis page, whenever possible.
1822
+ - Claims SHOULD have at least one evidence item to avoid appearing in evidence-gap reports.
1823
+ - Evidence entries MAY represent negative evidence using `weakens` or `contradicts`.
1824
+
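The rules above say how `context_only` evidence must be treated but do not define a scoring formula. The sketch below shows one way a compiler might summarize evidence for a claim while respecting those rules. The weight-sum aggregation and the `support_summary` name are assumptions made for illustration, not behavior defined by this specification.

```python
def support_summary(evidence: list[dict]) -> dict:
    """Summarize evidence for one claim without counting context_only as support."""
    supporting = 0.0
    opposing = 0.0
    context_only = 0
    for item in evidence:
        relation = item["relation"]
        if relation == "supports":
            supporting += item["weight"]
        elif relation in ("weakens", "contradicts"):
            # Negative evidence is tracked separately rather than netted out.
            opposing += item["weight"]
        elif relation == "context_only":
            # Counted for visibility only; never added to the supporting total.
            context_only += 1
    return {
        "supporting": supporting,
        "opposing": opposing,
        "contextOnly": context_only,
        # A claim with no evidence items belongs in the evidence-gap report.
        "evidenceGap": len(evidence) == 0,
    }
```

Keeping supporting and opposing totals separate avoids implying stronger support than the evidence provides when both kinds are present.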
1825
+ ---
1826
+
1827
+ ## 13. Relationships
1828
+
1829
+ Relationships are explicit machine-readable edges between objects.
1830
+
1831
+ Relationships MAY be authored in page frontmatter under `relations:`.
1832
+
1833
+ ### 13.1 Relationship shape
1834
+
1835
+ **Schema:**
1836
+ ```yaml
1837
+ relations:
1838
+   - subject: <subject-id>
1839
+     predicate: <predicate>
1840
+     object: <object-id>
1841
+     confidence: <float>
1842
+     sourceClaimIds: []
1843
+ ```
1844
+
1845
+ **Example:**
1846
+ ```yaml
1847
+ relations:
1848
+   - subject: entity.place.lincoln-middle-school
1849
+     predicate: uses
1850
+     object: entity.system.ground-source-heat-pump
1851
+     confidence: 0.88
1852
+     sourceClaimIds: ["claim.descriptive.school-energy-retrofit"]
1853
+ ```
1854
+
1855
+ ### 13.2 Required relationship fields
1856
+
1857
+ #### `subject`
1858
+ Type: string
1859
+ Required: yes
1860
+
1861
+ #### `predicate`
1862
+ Type: enum/string
1863
+ Required: yes
1864
+
1865
+ #### `object`
1866
+ Type: string
1867
+ Required: yes
1868
+
1869
+ #### `confidence`
1870
+ Type: number
1871
+ Required: yes
1872
+ Range: `0.0` to `1.0`
1873
+
1874
+ ### 13.3 Optional relationship fields
1875
+
1876
+ - `sourceClaimIds: string[]`
1877
+ - `note: string`
1878
+ - `updatedAt: date`
1879
+
1880
+ ### 13.4 Recommended predicates
1881
+
1882
+ v2 SHOULD use a controlled predicate set:
1883
+
1884
+ - `is_a`
1885
+ - `part_of`
1886
+ - `depends_on`
1887
+ - `uses`
1888
+ - `produces`
1889
+ - `founded_by`
1890
+ - `owned_by`
1891
+ - `located_in`
1892
+ - `related_to`
1893
+ - `supports`
1894
+ - `contradicts`
1895
+ - `mentions`
1896
+ - `applies_to`
1897
+ - `derived_from`
1898
+
1899
+ ### 13.5 Relationship rules
1900
+
1901
+ - Relationship IDs are optional in v2, but compiled output MAY assign normalized IDs.
1902
+ - Relationships SHOULD be grounded by source claims where possible.
1903
+ - Freeform predicates SHOULD be avoided in v2.
1904
+
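Because relationship IDs are optional at authoring time, a compiler that wants stable IDs has to derive them. The sketch below shows one possible approach: hash the subject, predicate, and object triple. The `relation.<hash>` ID format, the `controlledPredicate` flag, and the function name are all hypothetical; the specification does not define how normalized IDs are assigned.

```python
import hashlib

RECOMMENDED_PREDICATES = {
    "is_a", "part_of", "depends_on", "uses", "produces", "founded_by",
    "owned_by", "located_in", "related_to", "supports", "contradicts",
    "mentions", "applies_to", "derived_from",
}


def normalize_relation(relation: dict) -> dict:
    """Check required fields and attach a deterministic ID derived from the triple."""
    for name in ("subject", "predicate", "object", "confidence"):
        if name not in relation:
            raise ValueError(f"missing required field: {name}")
    triple = "|".join((relation["subject"], relation["predicate"], relation["object"]))
    digest = hashlib.sha256(triple.encode("utf-8")).hexdigest()[:12]
    return {
        **relation,
        "id": f"relation.{digest}",  # hypothetical ID scheme
        # Freeform predicates are flagged instead of rejected, since the rule is a SHOULD.
        "controlledPredicate": relation["predicate"] in RECOMMENDED_PREDICATES,
    }
```

Deriving the ID from the triple means the same edge authored on two pages collapses to one ID, which makes deduplication in the compiled output straightforward.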
1905
+ ---
1906
+
1907
+ ## 14. Contradictions
1908
+
1909
+ v2 tracks contradictions primarily through compiled outputs and reports.
1910
+
1911
+ Contradictions MAY also be represented in page content.
1912
+
1913
+ v2 does not require contradiction pages, but the compiler MUST be able to surface contradiction records.
1914
+
1915
+ ### 14.1 Compiled contradiction shape
1916
+
1917
+ **Schema:**
1918
+ ```yaml
1919
+ id: contradiction.<contradictionType>.<contradictionSlug>
1920
+ type: <type>
1921
+ status: <status>
1922
+ summary: <summary>
1923
+ claimIds: []
1924
+ sourceIds: []
1925
+ resolution: <resolution>
1926
+ updatedAt: <yyyy-mm-dd>
1927
+ ```
1928
+
1929
+ **Example:**
1930
+ ```yaml
1931
+ id: contradiction.interpretation-conflict.ferry-ridership
1932
+ type: interpretation-conflict
1933
+ status: open
1934
+ summary: Two claims disagree on whether weekend ferry ridership has recovered to pre-closure levels.
1935
+ claimIds:
1936
+ - claim.descriptive.ferry-ridership-recovered
1937
+ - claim.descriptive.ferry-ridership-still-depressed
1938
+ sourceIds:
1939
+ - source.2026-04-20.ferry-ridership-dashboard
1940
+ resolution:
1941
+ updatedAt: 2026-04-29
1942
+ ```
1943
+
1944
+ ### 14.2 Required fields
1945
+
1946
+ - `id`
1947
+ - `type`
1948
+ - `status`
1949
+ - `summary`
1950
+ - `claimIds`
1951
+ - `updatedAt`
1952
+
1953
+ ### 14.3 Allowed contradiction types
1954
+
1955
+ - `direct-conflict`
1956
+ - `date-conflict`
1957
+ - `scope-conflict`
1958
+ - `definition-conflict`
1959
+ - `interpretation-conflict`
1960
+
1961
+ ### 14.4 Allowed contradiction status values
1962
+
1963
+ - `open`
1964
+ - `under-review`
1965
+ - `resolved`
1966
+ - `dismissed`
1967
+
1968
+ ### 14.5 Detection strategy
1969
+
1970
+ **Explicit detection (compiled from flags):**
1971
+ - Claims with `status: contradicted`
1972
+ - Evidence entries with `relation: contradicts`
1973
+
1974
+ **Semantic conflict detection (cross-claim analysis):**
1975
+
1976
+ The compiler SHOULD also detect implicit conflicts by comparing claims that share the same `subjectPageId`.
1977
+
1978
+ - **Date conflict** (`type: date-conflict`): Two or more `claimType: historical` claims on the same subject that have different `date` field values and are both in an active (non-deprecated, non-contradicted) status.
1979
+ - **Scope conflict** (`type: scope-conflict`): Claims with `status: contested` coexisting with claims of `status: supported` or `weakly_supported` on the same subject, indicating active unresolved disagreement.
1980
+
1981
+ Semantic contradiction detection operates on structured fields only. It does not perform natural-language text comparison.
1982
+
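The explicit and scope-conflict checks above operate on structured fields only, so they can be sketched in a few lines. This is an illustrative sketch over already-parsed claim dictionaries. The function names are hypothetical, and status values follow the claim status enum in Section 11.2.

```python
from collections import defaultdict


def explicit_contradictions(claims: list[dict]) -> list[str]:
    """Return IDs of claims flagged as contradicted or carrying contradicting evidence."""
    flagged = []
    for claim in claims:
        has_contradicting_evidence = any(
            item.get("relation") == "contradicts" for item in claim.get("evidence", [])
        )
        if claim.get("status") == "contradicted" or has_contradicting_evidence:
            flagged.append(claim["id"])
    return flagged


def scope_conflicts(claims: list[dict]) -> dict[str, list[str]]:
    """Map subjectPageId to claim IDs where contested and supported claims coexist."""
    by_subject: dict[str, list[dict]] = defaultdict(list)
    for claim in claims:
        subject = claim.get("subjectPageId")
        if subject:
            by_subject[subject].append(claim)
    conflicts = {}
    for subject, group in by_subject.items():
        contested = [c["id"] for c in group if c.get("status") == "contested"]
        supported = [c["id"] for c in group if c.get("status") in ("supported", "weakly_supported")]
        if contested and supported:
            conflicts[subject] = sorted(contested + supported)
    return conflicts
```

Neither function compares claim text, which matches the statement that detection does not perform natural-language comparison.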
1983
+ ---
1984
+
1985
+ ## 15. Timelines
1986
+
1987
+ Timelines represent dated events and temporal changes tied to pages in the wiki. They exist to support chronology, historical tracking, date-based retrieval, and temporal conflict detection.
1988
+
1989
+ Timeline data does not require a top-level `timelines/` folder in v2. It is represented through page-level `timeline:` records, synthesis pages with `synthesisType: timeline`, and compiled timeline cache output.
1990
+ ### 15.1 Structure
1991
+
1992
+ Timeline entries MUST be represented under a `timeline:` field.
1993
+
1994
+ **Schema:**
1995
+
1996
+ ```yaml
1997
+ timeline:
1998
+   - id: tl.<slug>.<index>
1999
+     date: <yyyy-mm-dd>
2000
+     endDate: <yyyy-mm-dd>
2001
+     text: <text>
2002
+     eventType: <eventType>
2003
+     status: <status>
2004
+     confidence: <float>
2005
+     relatedClaims: []
2006
+     sourceIds: []
2007
+     updatedAt: <yyyy-mm-dd>
2008
+ ```
2009
+
2010
+ **Example:**
2011
+
2012
+ ```yaml
2013
+ timeline:
2014
+   - id: tl.riverside-garden.001
2015
+     date: 2026-04-12
2016
+     endDate:
2017
+     text: Riverside Community Garden opened its spring seedling exchange.
2018
+     eventType: community-event
2019
+     status: supported
2020
+     confidence: 0.90
2021
+     relatedClaims:
2022
+       - "[[claim.historical.seedling-exchange-opened]]"
2023
+     sourceIds:
2024
+       - source.2026-04-12.webpage.garden-newsletter
2025
+     updatedAt: 2026-04-12
2026
+ ```
2027
+
2028
+ ### 15.2 Required and Optional Fields
2029
+
2030
+ **Required fields:**
2031
+ - `id`
2032
+ - `date`
2033
+ - `text`
2034
+
2035
+ **Optional fields:**
2036
+ - `endDate`
2037
+ - `eventType`
2038
+ - `status`
2039
+ - `confidence`
2040
+ - `relatedClaims`
2041
+ - `sourceIds`
2042
+ - `relatedPages`
2043
+ - `note`
2044
+ - `createdAt`
2045
+ - `updatedAt`
2046
+
2047
+ ### 15.3 Placement and Semantics
2048
+
2049
+ Timeline entries MAY appear on any authored page type when that page is the natural owner of the event, including entity, concept, source, synthesis, and question pages.
2050
+
2051
+ A timeline entry SHALL be authored on the page that most naturally owns the event. It SHOULD reference related claims and source IDs when the event matters for reasoning, retrieval, or contradiction analysis.
2052
+
2053
+ For a single-point event, use `date`. For a bounded range, use both `date` and `endDate`.
2054
+
2055
+ A synthesis page that acts as a dedicated chronology SHALL use:
2056
+
2057
+ ```yaml
2058
+ pageType: synthesis
2059
+ synthesisType: timeline
2060
+ ```
2061
+
2062
+ ### 15.4 Compile and Validation
2063
+
2064
+ The compile pipeline SHOULD extract timeline entries into:
2065
+
2066
+ ```text
2067
+ _system/cache/timeline-events.json
2068
+ ```
2069
+
2070
+ This cache is used for chronological queries, filtering, timeline reports, and temporal conflict detection.
2071
+
2072
+ A v2 validator SHOULD check:
2073
+ - every timeline entry has an `id`
2074
+ - every timeline entry has a valid `date`
2075
+ - every timeline entry has `text`
2076
+ - timeline IDs are unique
2077
+ - `endDate`, if present, is not earlier than `date`
2078
+ - referenced claim IDs and source IDs exist when possible
2079
+
2080
+ The compiler SHOULD flag timeline conflicts when multiple entries appear to describe the same event but disagree on date, range, or ordering.
2081
+
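The validator checks listed above can be sketched as a single pass over all timeline entries. This is a minimal sketch, assuming entries have been collected from every page into one list of dictionaries; `validate_timeline` is a hypothetical name. It accepts dates either as ISO strings or as `date` objects, since some YAML parsers convert unquoted dates automatically. Reference checks against claim and source IDs are left out.

```python
from datetime import date


def _parse_date(value):
    """Return a date for ISO strings or date objects, or None if missing or invalid."""
    if isinstance(value, date):
        return value
    try:
        return date.fromisoformat(str(value))
    except ValueError:
        return None


def validate_timeline(entries: list[dict]) -> list[str]:
    """Return problems found across all timeline entries (empty list means valid)."""
    errors = []
    seen_ids = set()
    for position, entry in enumerate(entries):
        entry_id = entry.get("id")
        label = entry_id or f"entry[{position}]"
        if not entry_id:
            errors.append(f"{label}: missing id")
        elif entry_id in seen_ids:
            errors.append(f"{label}: duplicate id")
        else:
            seen_ids.add(entry_id)
        if not entry.get("text"):
            errors.append(f"{label}: missing text")
        start = _parse_date(entry.get("date"))
        if start is None:
            errors.append(f"{label}: missing or invalid date")
        raw_end = entry.get("endDate")
        if raw_end:  # an empty endDate means a single-point event
            end = _parse_date(raw_end)
            if end is None:
                errors.append(f"{label}: invalid endDate")
            elif start is not None and end < start:
                errors.append(f"{label}: endDate is earlier than date")
    return errors
```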
2082
+ ---
2083
+
2084
+ ## 16. Aliases
2085
+
2086
+ Entities and concepts SHOULD include aliases when relevant.
2087
+
2088
+ **Example:**
2089
+ ```yaml
2090
+ canonicalName: Riverside Community Garden
2091
+ aliases:
2092
+ - riverside-garden
2093
+ - river-garden
2094
+ ```
2095
+
2096
+ Alias support exists to improve:
2097
+ - search
2098
+ - deduplication
2099
+ - matching
2100
+ - claim linking
2101
+ - prompt grounding
2102
+
2103
+ ---
2104
+
2105
+ ## 17. Authoritative Sources of Truth
2106
+
2107
+ The system has multiple layers with different authorities.
2108
+
2109
+ ### 17.1 Authoritative layers
2110
+
2111
+ Primary truth sources:
2112
+ 1. page frontmatter
2113
+ 2. authored page content where structured references exist
2114
+ 3. compiled caches derived from the above
2115
+
2116
+ ### 17.2 Non-authoritative layers
2117
+
2118
+ These are views, not truth sources:
2119
+ - `reports/`
2120
+ - `_system/logs/log.md`
2121
+ - ad hoc dashboard summaries
2122
+ - search indexes
2123
+ - prompt supplements that do not round-trip back to pages
2124
+
2125
+ ### 17.3 Rule
2126
+ Compiled outputs SHALL reflect page truth.
2127
+ Reports SHALL reflect compiled or page truth.
2128
+ Reports SHALL NOT silently become the canonical data layer.
2129
+
2130
+ ### 17.4 Documentation layers
2131
+
2132
+ The project documentation has separate audiences. Agents SHOULD load the smallest authoritative document that can answer the current task.
2133
+
2134
+ Documentation layers:
2135
+
2136
+ - `AGENTS.md` — agent behavior contract for editing, compiling, linking, logging, and preserving human content.
2137
+ - `WIKI.md` — compact runtime schema, editorial guide, page type summary, ID formats, status enums, and examples for ordinary vault operations.
2138
+ - `INBOX.md` — short pointer to `WIKI.md` inbox rules and the `process-inbox` skill.
2139
+ - `ONBOARD.md` — first-run setup and local environment configuration workflow.
2140
+ - `AGENT-WIKI-SPEC-v2.md` — full project and development contract for maintainers, system changes, script behavior, validation rules, compatibility rules, and unresolved ambiguity.
2141
+
2142
+ Skills and ordinary wiki operations SHOULD prefer `WIKI.md` for schema, allowed enums, ID formats, and examples. They SHOULD consult `AGENT-WIKI-SPEC-v2.md` only when changing project behavior, updating scripts, updating skills, modifying configuration policy, resolving ambiguity, or when `WIKI.md` does not contain enough detail.
2143
+
2144
+ If `WIKI.md` conflicts with `AGENT-WIKI-SPEC-v2.md`, the full specification remains canonical until the conflict is resolved.
2145
+
2146
+ ---
2147
+
2148
+ ## 18. Compile Pipeline
2149
+
2150
+ The compile step reads the authored wiki and emits stable machine-facing artifacts.
2151
+
2152
+ ### 18.1 Compile goals
2153
+
2154
+ The compile pipeline exists so agents and runtime code do not need to scrape arbitrary markdown.
2155
+
2156
+ It MUST:
2157
+ - normalize page metadata
2158
+ - extract claims
2159
+ - extract evidence
2160
+ - extract relations
2161
+ - compute health signals
2162
+ - emit stable cache files
2163
+ - generate reports
2164
+
2165
+ ### 18.2 Minimum v2 compile outputs
2166
+
2167
+ The following files MUST be emitted under `_system/cache/`:
2168
+
2169
+ - `agent-digest.json`
2170
+ - `claims.jsonl`
2171
+ - `pages.json`
2172
+ - `relations.jsonl`
2173
+
2174
+ The following files SHOULD also be emitted:
2175
+
2176
+ - `contradictions.json`
2177
+ - `questions.json`
2178
+ - `timeline-events.json`
2179
+ - `source-index.json`
2180
+
2181
+ ### 18.3 Required cache files
2182
+
2183
+ #### `agent-digest.json`
2184
+ Purpose:
2185
+ - compact high-signal prompt supplement
2186
+ - runtime context pack
2187
+ - first-pass retrieval layer
2188
+
2189
+ This file SHOULD contain:
2190
+ - key page summaries
2191
+ - important claims
2192
+ - notable open questions
2193
+ - notable contradictions
2194
+ - high-priority entity/concept summaries
2195
+
2196
+ #### `claims.jsonl`
2197
+ Purpose:
2198
+ - claim-level retrieval
2199
+ - fast lookup by claim ID
2200
+ - status/confidence filtering
2201
+ - backlinks to owning pages
2202
+
2203
+ Each line SHOULD contain:
2204
+ - normalized claim record
2205
+ - owning page ID
2206
+ - page path
2207
+ - evidence summary
2208
+ - freshness info if available
2209
+
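As a rough illustration of the claim-level retrieval described above, the following sketch parses JSONL text and filters by status and confidence. The record field names (`id`, `status`, `confidence`, `pageId`) and the sample records are assumptions made for illustration; this section does not fix the exact record shape.

```python
import json


def load_claims(jsonl_text):
    """Parse JSONL text into a list of claim records, skipping blank lines."""
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def filter_claims(claims, status=None, min_confidence=0.0):
    """Return claims matching an optional status and a minimum confidence."""
    return [
        c for c in claims
        if (status is None or c.get("status") == status)
        and c.get("confidence", 0.0) >= min_confidence
    ]


def index_by_id(claims):
    """Build a dict for fast lookup by claim ID."""
    return {c["id"]: c for c in claims}


# Illustrative records only; real cache lines may carry more fields.
SAMPLE_JSONL = "\n".join(json.dumps(r) for r in [
    {"id": "claim.descriptive.garden-weekly-produce-donations",
     "status": "supported", "confidence": 0.91,
     "pageId": "entity.place.riverside-community-garden"},
    {"id": "claim.descriptive.sensor-readings-drifted",
     "status": "unverified", "confidence": 0.4,
     "pageId": "entity.device.river-gauge"},  # hypothetical owning page
])

claims = load_claims(SAMPLE_JSONL)
supported = filter_claims(claims, status="supported", min_confidence=0.8)
print([c["id"] for c in supported])
```

Because each line is an independent JSON object, a consumer can stream the file line by line instead of loading it whole.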
2210
+ #### `pages.json`
2211
+ Purpose:
2212
+ - normalized metadata index for all pages
2213
+
2214
+ Each page record SHOULD include:
2215
+ - `id`
2216
+ - `pageType`
2217
+ - `title`
2218
+ - `path`
2219
+ - `status`
2220
+ - `updatedAt`
2221
+ - `aliases`
2222
+ - `tags`
2223
+ - page-type-specific metadata
2224
+ - counts for claims/relations/questions if available
2225
+
2226
+ #### `relations.jsonl`
2227
+ Purpose:
2228
+ - graph edge retrieval
2229
+ - relationship traversal
2230
+ - cheap graph context generation
2231
+
2232
+ Each line SHOULD contain:
2233
+ - normalized subject
2234
+ - predicate
2235
+ - object
2236
+ - page source
2237
+ - source claim IDs if present
2238
+ - confidence
2239
+
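One possible way to use this file for cheap graph traversal is sketched below. It assumes each line is a JSON object with `subject`, `predicate`, and `object` keys, mirroring the authored relation fields; the helper names are hypothetical.

```python
import json
from collections import defaultdict


def build_adjacency(jsonl_text):
    """Map each subject ID to a list of (predicate, object) edges."""
    edges = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        rel = json.loads(line)
        edges[rel["subject"]].append((rel["predicate"], rel["object"]))
    return dict(edges)


def neighbors(adjacency, subject, predicate=None):
    """Return objects reachable from a subject, optionally by one predicate."""
    return [
        obj for pred, obj in adjacency.get(subject, [])
        if predicate is None or pred == predicate
    ]


# Illustrative edge taken from the example entity page in this spec.
SAMPLE_RELATIONS = json.dumps({
    "subject": "entity.place.riverside-community-garden",
    "predicate": "supports",
    "object": "entity.organization.neighborhood-food-pantry",
    "confidence": 0.88,
})

adjacency = build_adjacency(SAMPLE_RELATIONS)
print(neighbors(adjacency, "entity.place.riverside-community-garden", "supports"))
```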
2240
+ ### 18.4 Recommended cache files
2241
+
2242
+ #### `contradictions.json`
2243
+ Conflict registry.
2244
+
2245
+ #### `questions.json`
2246
+ Open question registry.
2247
+
2248
+ #### `timeline-events.json`
2249
+ Chronological event index.
2250
+
2251
+ #### `source-index.json`
2252
+ Source metadata registry.
2253
+
2254
+ The source index SHOULD preserve large-source structure when present. Source records SHOULD include `sourceRole`, `parentSourceId`, `sourceParts`, `partIndex`, `partCount`, and `locator` when those fields exist on source pages.
2255
+
2256
+ Tools SHOULD be able to answer:
2257
+
2258
+ - which source parts belong to a parent source
2259
+ - which parent source owns a source part
2260
+ - which source parts remain `status: unprocessed`
2261
+ - which locator should be used for evidence citation
2262
+
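The questions above could be answered with lookups along these lines. This is a sketch that assumes `source-index.json` yields a list of records carrying the fields named in this section plus `id` and `status`; the sample records and helper names are hypothetical.

```python
def parts_of(records, parent_id):
    """Return the source parts owned by a parent, ordered by partIndex."""
    parts = [
        r for r in records
        if r.get("sourceRole") == "part" and r.get("parentSourceId") == parent_id
    ]
    return sorted(parts, key=lambda r: r.get("partIndex", 0))


def parent_of(records, part_id):
    """Return the parent source ID for a part, or None if not found."""
    for r in records:
        if r.get("id") == part_id and r.get("sourceRole") == "part":
            return r.get("parentSourceId")
    return None


def unprocessed_parts(records):
    """Return IDs of source parts whose status is still 'unprocessed'."""
    return [
        r["id"] for r in records
        if r.get("sourceRole") == "part" and r.get("status") == "unprocessed"
    ]


# Hypothetical index contents for illustration.
RECORDS = [
    {"id": "source.report", "sourceRole": "parent",
     "sourceParts": ["source.report.p1", "source.report.p2"]},
    {"id": "source.report.p2", "sourceRole": "part", "parentSourceId": "source.report",
     "partIndex": 2, "partCount": 2, "status": "unprocessed", "locator": "pages 41-80"},
    {"id": "source.report.p1", "sourceRole": "part", "parentSourceId": "source.report",
     "partIndex": 1, "partCount": 2, "status": "processed", "locator": "pages 1-40"},
]

print([p["id"] for p in parts_of(RECORDS, "source.report")])
print(unprocessed_parts(RECORDS))
```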
2263
+ ### 18.5 Agent digest limits
2264
+
2265
+ The `agent-digest.json` output truncates content to keep the file compact for use as a prompt supplement. Implementations SHOULD define the truncation limits as named constants so they can be tuned as vault size grows.
2266
+
2267
+ | Constant | Default | Description |
2268
+ |---|---|---|
2269
+ | `MAX_DIGEST_KEY_PAGES` | `50` | Max entity/concept pages included |
2270
+ | `MAX_DIGEST_CLAIMS` | `30` | Max top supported claims included |
2271
+ | `MAX_DIGEST_QUESTIONS` | `20` | Max open question pages included |
2272
+ | `MAX_DIGEST_CONTRADICTIONS` | `10` | Max open contradictions included |
2273
+
2274
+ Implementations MUST NOT silently discard high-value pages due to truncation without surfacing the total counts in `vaultStats`. Operators SHOULD increase limits if `vaultStats` shows totals significantly exceeding the defaults.
2275
+
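A minimal sketch of truncation that still surfaces totals is shown below. Only the constant names and `vaultStats` come from this section; the digest section keys and the `total*` stat names are assumptions, and the inputs are assumed to be pre-sorted by priority.

```python
MAX_DIGEST_KEY_PAGES = 50
MAX_DIGEST_CLAIMS = 30
MAX_DIGEST_QUESTIONS = 20
MAX_DIGEST_CONTRADICTIONS = 10

# Hypothetical mapping from digest section key to its vaultStats total key.
SECTION_TOTALS = {
    "keyPages": "totalKeyPages",
    "claims": "totalClaims",
    "questions": "totalQuestions",
    "contradictions": "totalContradictions",
}


def build_digest(key_pages, claims, questions, contradictions):
    """Truncate each section to its limit and record untruncated totals."""
    return {
        "keyPages": key_pages[:MAX_DIGEST_KEY_PAGES],
        "claims": claims[:MAX_DIGEST_CLAIMS],
        "questions": questions[:MAX_DIGEST_QUESTIONS],
        "contradictions": contradictions[:MAX_DIGEST_CONTRADICTIONS],
        "vaultStats": {
            "totalKeyPages": len(key_pages),
            "totalClaims": len(claims),
            "totalQuestions": len(questions),
            "totalContradictions": len(contradictions),
        },
    }


def truncated_sections(digest):
    """List sections whose total exceeds what the digest actually includes."""
    stats = digest["vaultStats"]
    return [
        section for section, total_key in SECTION_TOTALS.items()
        if stats[total_key] > len(digest[section])
    ]


digest = build_digest([f"page-{i}" for i in range(60)], ["c1", "c2"], [], [])
print(truncated_sections(digest))
```

An operator could run a check like `truncated_sections` after each compile to decide whether a limit needs raising.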
2276
+ ---
2277
+
2278
+ ## 19. Search and Indexes
2279
+
2280
+ The compiler MAY emit additional indexes under `_system/indexes/`.
2281
+
2282
+ Examples:
2283
+ - alias index
2284
+ - tag index
2285
+ - page type index
2286
+ - stale page index
2287
+ - path-to-id index
2288
+ - id-to-path index
2289
+
2290
+ These indexes are implementation details and not normative v2 authored data.
2291
+
2292
+ ---
2293
+
2294
+ ## 20. Reports
2295
+
2296
+ Reports are generated maintenance views.
2297
+
2298
+ ### 20.1 Required reports
2299
+
2300
+ When dashboard generation is enabled, the system SHOULD generate:
2301
+
2302
+ - `reports/open-questions.md`
2303
+ - `reports/contradictions.md`
2304
+ - `reports/low-confidence.md`
2305
+ - `reports/claim-health.md`
2306
+ - `reports/stale-pages.md`
2307
+
2308
+ ### 20.2 Recommended additional reports
2309
+
2310
+ - `reports/orphaned-claims.md`
2311
+ - `reports/evidence-gaps.md`
2312
+ - `reports/relationship-gaps.md`
2313
+ - `reports/timeline-conflicts.md`
2314
+
2315
+ ### 20.3 Report rules
2316
+
2317
+ - Reports SHOULD be fully regenerable.
2318
+ - Reports SHOULD NOT be treated as primary truth.
2319
+ - Compiler-generated reports SHOULD be treated as fully replaceable generated files.
2320
+ - Reports SHOULD identify the compile timestamp.
2321
+
2322
+ ---
2323
+
2324
+ ## 21. Logs
2325
+
2326
+ Logs capture operational history. They do not replace page frontmatter, structured claims, evidence, or compile caches.
2327
+
2328
+ ### 21.1 Log locations
2329
+
2330
+ - `_system/logs/log.md` is the canonical operational log for generated compile events and meaningful skill runs or change batches.
2331
+ - Files in `_system/logs/` SHOULD be written by tooling and MUST NOT be hand-edited.
2332
+
2333
+ ### 21.2 Log authority
2334
+
2335
+ Logs are non-authoritative operational records. Agents and tooling MUST NOT treat log entries as primary evidence for claims unless the relevant material is promoted into a canonical `source` page.
2336
+
2337
+ ### 21.3 `_system/logs/log.md` entries
2338
+
2339
+ Entries in `_system/logs/log.md` SHOULD be prepended so the most recent entry appears first. Entries SHOULD be concise. Each entry SHOULD include:
2340
+
2341
+ - date
2342
+ - actor or tool, when known
2343
+ - changed area
2344
+ - short reason or outcome
2345
+
2346
+ Skills SHOULD write one log entry after each meaningful skill run or change batch. They SHOULD NOT log every individual file write when those writes are part of one coherent operation.
2347
+
2348
+ Trivial report/cache regeneration does not need a log entry unless it records a meaningful vault change or operational incident.
2349
+
2350
+ ### 21.4 Log writer
2351
+
2352
+ Operational log entries SHOULD be written through `agent-wiki log`.
2353
+
2354
+ The log writer MUST support:
2355
+
2356
+ ```bash
2357
+ agent-wiki log --message "<message>"
2358
+ ```
2359
+
2360
+ The log writer MUST prepend the new entry to `_system/logs/log.md`.
2361
+
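The prepend behavior can be sketched as follows. This is not the `agent-wiki log` implementation; the entry layout (a single bullet with `|`-separated fields) and the assumption that the log file has no header block are illustrative choices.

```python
from datetime import date
from pathlib import Path


def format_entry(message, actor=None, area=None, on=None):
    """Format one concise log entry: date, optional actor and area, message."""
    fields = [(on or date.today()).isoformat()]
    if actor:
        fields.append(actor)
    if area:
        fields.append(area)
    fields.append(message)
    return "- " + " | ".join(fields) + "\n"


def prepend_entry(existing_log, entry):
    """Return log text with the new entry first (most recent entry on top)."""
    return entry + existing_log


def prepend_to_file(path, entry):
    """Prepend an entry to a log file, creating the file if it is missing."""
    log_path = Path(path)
    existing = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
    log_path.write_text(prepend_entry(existing, entry), encoding="utf-8")


entry = format_entry("Regenerated claims cache", actor="compiler",
                     area="cache", on=date(2026, 4, 12))
print(prepend_entry("- 2026-04-11 | ingest | sources | Added newsletter\n", entry))
```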
2362
+ ---
2363
+
2364
+ ## 22. Health Rules
2365
+
2366
+ The system SHOULD compute health signals at compile time.
2367
+
2368
+ ### 22.1 Low confidence
2369
+
2370
+ A claim SHOULD be considered low confidence when:
2371
+ - `confidence < 0.50`
2372
+ - or status is `weakly_supported`, `unverified`, or `contested`
2373
+
2374
+ Exact threshold MAY be configurable, but SHOULD be stable.
2375
+
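The low-confidence rule above might be expressed as the following predicate. It is a sketch that assumes claims are available as dictionaries with `confidence` and `status` keys; the constant and function names are hypothetical.

```python
LOW_CONFIDENCE_THRESHOLD = 0.50  # default from this section; MAY be configurable
LOW_CONFIDENCE_STATUSES = {"weakly_supported", "unverified", "contested"}


def is_low_confidence(claim, threshold=LOW_CONFIDENCE_THRESHOLD):
    """True when confidence is below the threshold or the status is weak."""
    confidence = claim.get("confidence")
    below_threshold = confidence is not None and confidence < threshold
    return below_threshold or claim.get("status") in LOW_CONFIDENCE_STATUSES


print(is_low_confidence({"confidence": 0.91, "status": "supported"}))
print(is_low_confidence({"confidence": 0.91, "status": "contested"}))
```

Note that a claim exactly at the threshold is not flagged on confidence alone, since the rule uses a strict `<` comparison.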
2376
+ ### 22.2 Evidence gaps
2377
+
2378
+ A claim SHOULD appear in evidence-gap reporting when:
2379
+ - it has zero evidence entries
2380
+ - or only `context_only` evidence exists
2381
+
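A small sketch of the evidence-gap check follows. It assumes `context_only` is a value of each evidence entry's `relation` field, which this section does not state explicitly; the function name is hypothetical.

```python
def has_evidence_gap(claim):
    """True when a claim has no evidence, or only context_only evidence."""
    evidence = claim.get("evidence") or []
    if not evidence:
        return True
    # Assumption: context_only appears in the evidence `relation` field.
    return all(e.get("relation") == "context_only" for e in evidence)


print(has_evidence_gap({"evidence": []}))
print(has_evidence_gap({"evidence": [{"relation": "supports"}]}))
```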
2382
+ ### 22.3 Staleness
2383
+
2384
+ A page or claim MAY be considered stale when:
2385
+ - the age of `updatedAt` exceeds configured freshness expectations
2386
+ - or linked source retrieval dates are old
2387
+ - or evidence is old relative to domain expectations
2388
+
2389
+ v2 does not prescribe one universal stale threshold because domains vary.
2390
+
2391
+ ### 22.4 Contradictions
2392
+
2393
+ A contradiction SHOULD be surfaced when:
2394
+ - two claims with overlapping scope conflict materially
2395
+ - evidence relations include `contradicts`
2396
+ - a claim status is `contradicted`
2397
+ - multiple source-backed dates or definitions disagree
2398
+
2399
+ ---
2400
+
2401
+ ## 23. Freshness Model
2402
+
2403
+ Freshness SHOULD be tracked at multiple levels when possible.
2404
+
2405
+ ### 23.1 Recommended fields
2406
+ - page `updatedAt`
2407
+ - claim `updatedAt`
2408
+ - evidence `updatedAt`
2409
+ - source `publishedAt`
2410
+ - source `retrievedAt`
2411
+
2412
+ ### 23.2 Rule
2413
+ A recently edited page is not automatically a fresh page.
2414
+ Compile SHOULD distinguish between recent edits and recent underlying evidence.
2415
+
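To make the distinction concrete, the sketch below reports edit recency and evidence recency separately. It assumes dates are already parsed into `datetime.date` values and that evidence recency is judged by the newest `retrievedAt`; the single `max_age_days` window and the output keys are hypothetical.

```python
from datetime import date


def freshness_summary(page, today, max_age_days):
    """Report edit recency and evidence recency as separate signals."""
    edited_age = (today - page["updatedAt"]).days
    evidence_dates = [
        e["retrievedAt"]
        for claim in page.get("claims", [])
        for e in claim.get("evidence", [])
        if "retrievedAt" in e
    ]
    newest = max(evidence_dates) if evidence_dates else None
    evidence_age = (today - newest).days if newest is not None else None
    return {
        "recentlyEdited": edited_age <= max_age_days,
        # With no dated evidence at all, the page is not treated as fresh.
        "evidenceFresh": evidence_age is not None and evidence_age <= max_age_days,
    }


page = {
    "updatedAt": date(2026, 4, 12),
    "claims": [{"evidence": [{"retrievedAt": date(2025, 1, 1)}]}],
}
summary = freshness_summary(page, today=date(2026, 4, 20), max_age_days=90)
print(summary)
```

Here the page was edited eight days ago but its newest evidence is over a year old, so the two signals disagree.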
2416
+ ---
2417
+
2418
+ ## 24. Validation Rules
2419
+
2420
+ A v2 validator SHOULD check the following.
2421
+
2422
+ ### 24.1 Required validation
2423
+ - every page has a valid `pageType`
2424
+ - every page has a unique `id`
2425
+ - required frontmatter fields are present
2426
+ - claims have unique IDs
2427
+ - claims have required fields
2428
+ - confidence fields are numeric and in range
2429
+ - evidence entries have required fields
2430
+ - relation entries have required fields
2431
+ - question pages use allowed status enums
2432
+ - pages are stored in folders consistent with `pageType`
2433
+
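A few of the required checks can be sketched as below. The sketch assumes pages are already parsed into dictionaries, that confidence is valid in the range 0.0 to 1.0, and that the allowed page types are supplied by the caller; the error message wording is hypothetical.

```python
def validate_pages(pages, allowed_page_types):
    """Check page ID uniqueness, pageType validity, and claim ID/confidence rules."""
    errors = []
    seen_page_ids = set()
    seen_claim_ids = set()
    for page in pages:
        page_id = page.get("id")
        if not page_id:
            errors.append("page is missing an id")
        elif page_id in seen_page_ids:
            errors.append(f"duplicate page id: {page_id}")
        else:
            seen_page_ids.add(page_id)
        if page.get("pageType") not in allowed_page_types:
            errors.append(f"invalid pageType on {page_id}: {page.get('pageType')}")
        for claim in page.get("claims", []):
            claim_id = claim.get("id")
            if claim_id in seen_claim_ids:
                errors.append(f"duplicate claim id: {claim_id}")
            seen_claim_ids.add(claim_id)
            confidence = claim.get("confidence")
            # bool is a subclass of int in Python, so exclude it explicitly.
            numeric = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
            if not numeric or not 0.0 <= confidence <= 1.0:
                errors.append(f"confidence out of range on {claim_id}: {confidence}")
    return errors


PAGES = [
    {"id": "entity.a", "pageType": "entity", "claims": [{"id": "claim.x", "confidence": 0.9}]},
    {"id": "entity.a", "pageType": "entity", "claims": [{"id": "claim.y", "confidence": 1.4}]},
    {"id": "note.b", "pageType": "memo", "claims": []},
]
errors = validate_pages(PAGES, {"entity", "question"})
print(errors)
```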
2434
+ ### 24.2 Recommended validation
2435
+ - source IDs referenced by evidence exist
2436
+ - related page references exist
2437
+ - claim IDs referenced by relationships exist when provided
2438
+ - aliases do not duplicate canonical title unnecessarily
2439
+ - source part pages have valid `parentSourceId`, `partIndex`, `partCount`, and `locator` when `sourceRole: part`
2440
+ - parent source pages have ordered, existing `sourceParts` when `sourceRole: parent`
2441
+ - partitioned parent source pages do not contain the full long-form source body
2442
+ - source pages promoted from non-Markdown raw files include conversion provenance when available
2443
+ - conversion warnings are preserved when conversion output is degraded, incomplete, or produced with fallback behavior
2444
+ - large converted sources preserve stable locators on source part pages
2445
+
2446
+ ---
2447
+
2448
+ ## 25. Human Editing Expectations
2449
+
2450
+ Humans MAY:
2451
+ - add prose
2452
+ - add notes and commentary
2453
+ - create new pages
2454
+ - update frontmatter
2455
+ - add or revise claims manually
2456
+ - add questions
2457
+
2458
+ Humans SHOULD NOT:
2459
+ - directly hand-edit cache files
2460
+ - treat reports as canonical data
2461
+ - bypass IDs for important pages
2462
+ - mix unrelated claims into one compound claim
2463
+
2464
+ ---
2465
+
2466
+ ## 26. Agent Editing Expectations
2467
+
2468
+ Agents MUST:
2469
+ - preserve human-authored content unless explicitly directed otherwise
2470
+ - use stable IDs when generating claims or pages
2471
+ - update `updatedAt` when meaningfully changing structured content
2472
+ - avoid inventing unsupported certainty
2473
+ - update `_system/logs/log.md` through `agent-wiki log` after each meaningful skill run or change batch
2474
+
2475
+ Agents SHOULD:
2476
+ - create question pages for unresolved important unknowns
2477
+ - attach evidence to claims where possible
2478
+ - reuse canonical IDs instead of duplicating objects
2479
+
2480
+ Agents MUST NOT:
2481
+ - silently rewrite human commentary unless explicitly directed otherwise
2482
+ - delete unresolved uncertainty by omission
2483
+ - convert weak evidence into strong support semantics
2484
+ - treat reports as primary truth records
2485
+
2486
+ ---
2487
+
2488
+ ## 27. Example Minimal Entity Page
2489
+
2490
+ ```md
2491
+ ---
2492
+ id: entity.place.riverside-community-garden
2493
+ pageType: entity
2494
+ title: Riverside Community Garden
2495
+ entityType: place
2496
+ canonicalName: Riverside Community Garden
2497
+ status: active
2498
+ createdAt: 2026-04-12
2499
+ updatedAt: 2026-04-12
2500
+ aliases:
2501
+ - riverside-garden
2502
+ tags:
2503
+ - urban-agriculture
2504
+ - community
2505
+ claims:
2506
+ - id: claim.descriptive.garden-weekly-produce-donations
2507
+   text: Riverside Community Garden donates a portion of its weekly produce harvest to the neighborhood food pantry.
2508
+   status: supported
2509
+   confidence: 0.91
2510
+   claimType: descriptive
2511
+   relatedClaimIds: []
2512
+   evidence:
2513
+   - id: evidence.quote.supports.a1b2c3d4
2514
+     sourceId: source.2026-04-12.webpage.garden-newsletter
2515
+     path: sources/2026-04-12.webpage.garden-newsletter.md
2516
+     lines: 55-79
2517
+     kind: quote
2518
+     relation: supports
2519
+     weight: 0.86
2520
+     note: The newsletter describes the weekly donation arrangement.
2521
+     excerpt: "Each Friday harvest includes a pantry donation box."
2522
+     retrievedAt: 2026-04-12
2523
+     updatedAt: 2026-04-12
2524
+   createdAt: 2026-04-12
2525
+   updatedAt: 2026-04-12
2526
+ relations:
2527
+ - subject: entity.place.riverside-community-garden
2528
+   predicate: supports
2529
+   object: entity.organization.neighborhood-food-pantry
2530
+   confidence: 0.88
2531
+   sourceClaimIds:
2532
+   - "[[claim.descriptive.garden-weekly-produce-donations]]"
2533
+ ---
2534
+
2535
+ # Riverside Community Garden
2536
+
2537
+ Riverside Community Garden is a neighborhood garden that coordinates volunteer planting, harvest tracking, and weekly produce donations.
2538
+
2539
+ ```
2540
+
2541
+ ---
2542
+
2543
+ ## 28. Example Question Page
2544
+
2545
+ ```md
2546
+ ---
2547
+ id: question.maintenance.flood-sensor-calibration
2548
+ pageType: question
2549
+ title: Which flood sensors need calibration before storm season?
2550
+ priority: high
2551
+ status: open
2552
+ relatedClaims:
2553
+ - "[[claim.descriptive.sensor-readings-drifted]]"
2554
+ relatedPages:
2555
+ - "[[coastal-resilience-overview]]"
2556
+ openedAt: 2026-04-12
2557
+ createdAt: 2026-04-12
2558
+ updatedAt: 2026-04-12
2559
+ aliases: []
2560
+ tags:
2561
+ - flood-monitoring
2562
+ - open-question
2563
+ ---
2564
+
2565
+ # Which flood sensors need calibration before storm season?
2566
+
2567
+ ## Context
2568
+ This question exists because several river gauge readings drifted from manual spot checks during the spring inspection.
2569
+
2570
+ ## Current concern
2571
+ We need to identify which sensors require calibration before they are used for storm-season alerting.
2572
+ ```
2573
+
2574
+ ---
2575
+
2576
+ ## 29. Compatibility Notes
2577
+
2578
+ v2 implementations MAY add fields beyond this spec, provided they do not break:
2579
+
2580
+ - required fields
2581
+ - required enum values
2582
+ - compile output expectations
2583
+
2584
+ Unknown fields MUST be preserved by conforming tooling when possible.