logseq-matryca-parser 0.3.3__tar.gz → 1.1.1__tar.gz
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.cursorignore +28 -3
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.repomixignore +25 -2
- logseq_matryca_parser-1.1.1/CHANGELOG.md +40 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/CONTRIBUTING.md +14 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/PKG-INFO +49 -11
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/README.md +47 -9
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/ARCHITECTURE.md +71 -5
- logseq_matryca_parser-1.1.1/docs/RELEASE_PROCESS.md +67 -0
- logseq_matryca_parser-1.1.1/docs/logseq_ast_primer.md +152 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/pyproject.toml +2 -2
- logseq_matryca_parser-1.1.1/repomix-output-parser.xml +15235 -0
- logseq_matryca_parser-1.1.1/scripts/extract_changelog.py +138 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/__init__.py +26 -8
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/agent_writer.py +6 -1
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/graph.py +140 -16
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/kinetic.py +6 -1
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/logos_core.py +4 -5
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/logos_parser.py +325 -90
- logseq_matryca_parser-1.1.1/src/logseq_matryca_parser/logseq_markdown.py +136 -0
- logseq_matryca_parser-1.1.1/src/logseq_matryca_parser/logseq_paths.py +107 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_agent_writer.py +2 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_graph.py +64 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_logos_parser.py +256 -18
- logseq_matryca_parser-1.1.1/tests/test_logseq_markdown.py +203 -0
- logseq_matryca_parser-1.1.1/tests/test_logseq_paths.py +129 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/uv.lock +4 -4
- logseq_matryca_parser-0.3.3/.cursorrules +0 -5
- logseq_matryca_parser-0.3.3/docs/logseq_ast_primer.md +0 -82
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/FUNDING.yml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/dependabot.yml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/workflows/ci.yml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.github/workflows/pypi_publish.yml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.gitignore +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/.pre-commit-config.yaml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/LICENSE +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/Makefile +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/NOTICE +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/ROADMAP_AGENT_NATIVE_XRAY.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/ROADMAP_HEADLESS_WRITER.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/ROADMAP_OBSIDIAN_ADAPTER.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/SECURITY.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/claude-skill-logseq-read/SKILL.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/claude-skill-logseq-read/scripts/parse_logseq.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/ARCHITECTURE_BLUEPRINT.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/CODE_SCAFFOLD.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/LOGSEQ_ASSET_RESOLUTION_SPEC.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/LOGSEQ_DATASCRIPT_MAPPING.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/LOGSEQ_TEMPORAL_ONTOLOGY.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/OFFICIAL_MLDOC_SPECS.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/design-docs/REFERENCE_SPEC.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/error_log.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_CLI_HYDRATION_AND_ENRICHMENT.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_CONTEXT_SYNTHESIS_AND_SCOPING.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_EMBED_EXPANSION_AND_FLUENT_QUERIES.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_GRAPH_RAG_SEMANTICS.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_INCREMENTAL_WATCHER.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_INLINE_SHIELD_AND_NAMESPACES.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_ROBUSTNESS_AND_SOFT_BREAKS.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_TOML_FIX_AND_PYPI_DISTRIBUTION.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/docs/roadmaps/ROADMAP_UUID_AND_GRAPH_SUPERPOWERS.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/examples/demo_logseq_journal.md +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/examples/run_demo.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/legacy/local_digestor.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/lib/bindings/utils.js +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/lib/tom-select/tom-select.complete.min.js +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/lib/tom-select/tom-select.css +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/lib/vis-9.1.2/vis-network.css +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/lib/vis-9.1.2/vis-network.min.js +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/.gitignore +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/NOTICE +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/__main__.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/agent_press.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/exceptions.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/forge.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/lens.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/pyproject.toml +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/src/logseq_matryca_parser/synapse.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_agent_press.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_forge.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_kinetic.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_lens.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_package_version.py +0 -0
- {logseq_matryca_parser-0.3.3 → logseq_matryca_parser-1.1.1}/tests/test_synapse.py +0 -0
|
@@ -1,10 +1,35 @@
|
|
|
1
1
|
# .cursorignore
|
|
2
2
|
# (Nota: Cursor ignora già automaticamente tutto ciò che è nel .gitignore)
|
|
3
3
|
|
|
4
|
+
# =========================
|
|
5
|
+
# Python & Virtual Environments
|
|
6
|
+
# =========================
|
|
7
|
+
.venv/
|
|
8
|
+
venv/
|
|
9
|
+
env/
|
|
10
|
+
__pycache__/
|
|
11
|
+
*.pyc
|
|
12
|
+
|
|
13
|
+
# =========================
|
|
14
|
+
# Linter & Test Caches
|
|
15
|
+
# =========================
|
|
16
|
+
.ruff_cache/
|
|
17
|
+
.mypy_cache/
|
|
18
|
+
.pytest_cache/
|
|
19
|
+
.coverage
|
|
20
|
+
htmlcov/
|
|
21
|
+
|
|
22
|
+
# =========================
|
|
23
|
+
# Build Artifacts
|
|
24
|
+
# =========================
|
|
25
|
+
dist/
|
|
26
|
+
build/
|
|
27
|
+
*.egg-info/
|
|
28
|
+
|
|
4
29
|
# =========================
|
|
5
30
|
# Lockfiles (Letali per il Codebase Indexing)
|
|
6
31
|
# =========================
|
|
7
|
-
# I lockfile devono stare su Git, ma l'IA non deve MAI leggerli,
|
|
32
|
+
# I lockfile devono stare su Git, ma l'IA non deve MAI leggerli,
|
|
8
33
|
# sono solo un muro di versioni incomprensibili.
|
|
9
34
|
poetry.lock
|
|
10
35
|
uv.lock
|
|
@@ -33,7 +58,7 @@ tests/fixtures/*.md
|
|
|
33
58
|
# =========================
|
|
34
59
|
# Assets Vettoriali
|
|
35
60
|
# =========================
|
|
36
|
-
# Le immagini PNG/JPG Cursor le ignora da solo, ma gli SVG sono file di testo!
|
|
61
|
+
# Le immagini PNG/JPG Cursor le ignora da solo, ma gli SVG sono file di testo!
|
|
37
62
|
# Se l'IA legge un SVG, legge migliaia di coordinate matematiche inutili.
|
|
38
63
|
*.svg
|
|
39
64
|
|
|
@@ -42,4 +67,4 @@ tests/fixtures/*.md
|
|
|
42
67
|
# =========================
|
|
43
68
|
.vscode/
|
|
44
69
|
.idea/
|
|
45
|
-
.clinerules
|
|
70
|
+
.clinerules
|
|
@@ -1,5 +1,3 @@
|
|
|
1
|
-
# .repomixignore
|
|
2
|
-
|
|
3
1
|
# =========================
|
|
4
2
|
# Repomix Outputs & AI Generated
|
|
5
3
|
# =========================
|
|
@@ -50,6 +48,31 @@ __pycache__/
|
|
|
50
48
|
.ruff_cache/
|
|
51
49
|
.venv/
|
|
52
50
|
venv/
|
|
51
|
+
env/
|
|
52
|
+
|
|
53
|
+
# =========================
|
|
54
|
+
# Python Build & Distribution (NUOVO)
|
|
55
|
+
# =========================
|
|
56
|
+
build/
|
|
57
|
+
dist/
|
|
58
|
+
*.egg-info/
|
|
59
|
+
*.egg
|
|
60
|
+
|
|
61
|
+
# =========================
|
|
62
|
+
# Testing Coverage & Tox (NUOVO)
|
|
63
|
+
# =========================
|
|
64
|
+
.coverage
|
|
65
|
+
coverage.xml
|
|
66
|
+
htmlcov/
|
|
67
|
+
.tox/
|
|
68
|
+
|
|
69
|
+
# =========================
|
|
70
|
+
# IDE & Workflows (NUOVO)
|
|
71
|
+
# =========================
|
|
72
|
+
.vscode/
|
|
73
|
+
.idea/
|
|
74
|
+
.cursor/
|
|
75
|
+
.github/
|
|
53
76
|
|
|
54
77
|
# =========================
|
|
55
78
|
# Token Killers (Logs, DBs, Vector Graphics)
|
|
@@ -0,0 +1,40 @@
|
|
|
1
|
+
# Changelog
|
|
2
|
+
|
|
3
|
+
All notable changes to **logseq-matryca-parser** (The Logos Protocol) are documented in this file.
|
|
4
|
+
|
|
5
|
+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
|
|
6
|
+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
|
|
7
|
+
|
|
8
|
+
## [Unreleased]
|
|
9
|
+
|
|
10
|
+
## [1.1.1] - 2026-05-28
|
|
11
|
+
|
|
12
|
+
### Added
|
|
13
|
+
|
|
14
|
+
- **Graph page aliases** — `LogseqGraph.load_directory` honors `title::`, `alias::` / `aliases::` for `pages` lookup and backlinks; incremental reload re-applies enrichment after watcher edits.
|
|
15
|
+
- **LaTeX math shielding** — `_shield_inline_code` masks `$$...$$` and `$...$` spans so wikilinks/tags inside equations are not extracted.
|
|
16
|
+
- **Datalog query dead zones** — `#+BEGIN_QUERY` … `#+END_QUERY` blocks are ignored for entity extraction (parse-loop state plus shielding).
|
|
17
|
+
- **Numbered list blocks** — `logos_parser.py` recognizes ordered-list markers (`1. `, `12. `, etc.) as outliner bullets alongside `-` and `*`.
|
|
18
|
+
- **Markdown task checkboxes** — `[ ]`, `[-]`, and `[x]`/`[X]` on block text map to `TODO`, `DOING`, and `DONE` before Org-mode prefix fallback.
|
|
19
|
+
|
|
20
|
+
### Fixed
|
|
21
|
+
|
|
22
|
+
- **Logseq OG parity (parser)** — `{{embed [[Page]]}}` and similar macros expose nested wikilinks; Unicode tags and markdown boundaries (`**#tag**`, `==#tag==`); comma-separated `tags::` / `alias::` / `aliases::` inject implicit graph tokens; `~~~` fences share code-block immunity with ` ``` ` fences.
|
|
23
|
+
- **Property contiguity** — block `key:: value` lines apply only while contiguous below the bullet; after a soft-break, later `key::` lines stay in `content` / `clean_text` (Logseq-native behavior).
|
|
24
|
+
- **Property bullet lists** — empty `alias::` / `tags::` followed by indented `-` bullets serialize as `list[str]` without orphan AST children.
|
|
25
|
+
- **Case-insensitive property keys** — all property keys normalized to lowercase at parse time; `TITLE::` frontmatter overrides graph page titles like `title::`.
|
|
26
|
+
- **Extended task markers** — `DELEGATED`, `POSTPONED`, `IN-PROGRESS` (longest-prefix matching) alongside existing Org-mode statuses.
|
|
27
|
+
- **Aliased block references** — `[Visible](((uuid)))` clean text retains visible alias only (no surrounding `[` `]`).
|
|
28
|
+
|
|
29
|
+
## [1.0.0] - 2026-05-28
|
|
30
|
+
|
|
31
|
+
### Added
|
|
32
|
+
|
|
33
|
+
- **LOGOS engine** — deterministic Stack-Machine parser (`StackMachineParser`) producing strict `LogseqPage` / `LogseqNode` ASTs from Spatial Markdown.
|
|
34
|
+
- **SYNAPSE adapters** — LangChain and LlamaIndex exporters with parent-child lineage metadata.
|
|
35
|
+
- **FORGE exporters** — JSON, Markdown, Obsidian, and enriched chunk payloads.
|
|
36
|
+
- **LENS visualizer** — interactive topology HTML via NetworkX / PyVis.
|
|
37
|
+
- **KINETIC CLI** — `matryca-parse` Typer entry point for export, visualization, and agent read/write workflows.
|
|
38
|
+
- **Headless CRUD** — append-only agent writer and X-Ray press utilities for sovereign graph mutation.
|
|
39
|
+
- **Logseq-native serialization** — round-trip page and block property layout via `logseq_markdown.py`.
|
|
40
|
+
- **Graph query layer** — `LogseqGraph` with backlinks, effective property inheritance, and optional filesystem watcher.
|
|
@@ -8,6 +8,20 @@ To maintain the architectural integrity of the project, please follow the guidel
|
|
|
8
8
|
|
|
9
9
|
---
|
|
10
10
|
|
|
11
|
+
## 📚 Documentation
|
|
12
|
+
|
|
13
|
+
User-facing behavior is documented in:
|
|
14
|
+
|
|
15
|
+
- [`README.md`](README.md) — overview, quickstart, and feature matrix
|
|
16
|
+
- [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) — LOGOS, SYNAPSE, `LogseqGraph`, agents, and data flow
|
|
17
|
+
- [`docs/logseq_ast_primer.md`](docs/logseq_ast_primer.md) — Logseq Spatial Markdown domain rules
|
|
18
|
+
- [`CHANGELOG.md`](CHANGELOG.md) — shipped releases (current: **1.1.1**) and **Unreleased** changes (Keep a Changelog)
|
|
19
|
+
- [`docs/RELEASE_PROCESS.md`](docs/RELEASE_PROCESS.md) — version bump, tag, and PyPI publish checklist
|
|
20
|
+
|
|
21
|
+
When you add or change observable parser or graph behavior, update the relevant doc sections and add a bullet under **`## [Unreleased]`** in `CHANGELOG.md` (see [`.cursor/rules/05-auto-changelog.mdc`](.cursor/rules/05-auto-changelog.mdc)).
|
|
22
|
+
|
|
23
|
+
---
|
|
24
|
+
|
|
11
25
|
## 🏛️ Architectural Philosophy
|
|
12
26
|
|
|
13
27
|
Before writing any code, please understand our core principles:
|
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
Metadata-Version: 2.4
|
|
2
2
|
Name: logseq-matryca-parser
|
|
3
|
-
Version:
|
|
3
|
+
Version: 1.1.1
|
|
4
4
|
Summary: The Logos Protocol: Deterministic Logseq AST parsing for Matryca.ai.
|
|
5
5
|
Project-URL: Homepage, https://github.com/MarcoPorcellato/logseq-matryca-parser
|
|
6
6
|
Project-URL: Bug Tracker, https://github.com/MarcoPorcellato/logseq-matryca-parser/issues
|
|
@@ -10,7 +10,7 @@ License: Apache-2.0
|
|
|
10
10
|
License-File: LICENSE
|
|
11
11
|
License-File: NOTICE
|
|
12
12
|
Keywords: ai,ast,knowledge-graph,logseq,parser,rag
|
|
13
|
-
Classifier: Development Status ::
|
|
13
|
+
Classifier: Development Status :: 5 - Production/Stable
|
|
14
14
|
Classifier: Intended Audience :: Developers
|
|
15
15
|
Classifier: License :: OSI Approved :: Apache Software License
|
|
16
16
|
Classifier: Operating System :: OS Independent
|
|
@@ -44,10 +44,10 @@ Description-Content-Type: text/markdown
|
|
|
44
44
|
[](https://www.python.org/downloads/)
|
|
45
45
|
[](https://github.com/MarcoPorcellato/logseq-matryca-parser/blob/main/LICENSE)
|
|
46
46
|
[](https://github.com/MarcoPorcellato/logseq-matryca-parser#-quickstart)
|
|
47
|
-
[](#)
|
|
48
48
|

|
|
49
49
|
|
|
50
|
-
**
|
|
50
|
+
**v1.1.1** — Logseq OG parity release (see [CHANGELOG](CHANGELOG.md)) — **200+ tests**, full bidirectional Headless CRUD engine, native markdown serialization, and static typing; ready for production Enterprise integration.
|
|
51
51
|
|
|
52
52
|
> *Turning a forest of local plain-text files into a unified semantic powerhouse.*
|
|
53
53
|
|
|
@@ -57,7 +57,7 @@ Description-Content-Type: text/markdown
|
|
|
57
57
|
|
|
58
58
|
[👉 **TRY THE LIVE INTERACTIVE DEMO**](https://MarcoPorcellato.github.io/logseq-matryca-parser/)
|
|
59
59
|
|
|
60
|
-
[📘 **
|
|
60
|
+
[📘 **ARCHITECTURE**](docs/ARCHITECTURE.md) · [AST Primer](docs/logseq_ast_primer.md) · [Changelog](CHANGELOG.md) · [Release process](docs/RELEASE_PROCESS.md)
|
|
61
61
|
|
|
62
62
|
</div>
|
|
63
63
|
|
|
@@ -100,6 +100,7 @@ It acts as the strict **File System Driver** for your LLM OS. By using a determi
|
|
|
100
100
|
| **Block references `((uuid))`** | Treated as opaque text or dropped | **Resolved** against `LogseqGraph`; optional **embed expansion** and **Obsidian `[[Page#^anchor]]`** export |
|
|
101
101
|
| **Property inheritance** | Page-level frontmatter at best | **`get_effective_properties`**: page + ancestor outline keys merged top-down (Org-mode style), then exposed on enriched chunks |
|
|
102
102
|
| **Live sync** | Re-read whole tree or poll | **`LogseqGraph.start_watching()`** (optional `watchdog`): **per-file invalidation** — re-parse one page, purge stale UUIDs from registries, refresh backlinks |
|
|
103
|
+
| **Page aliases & titles** | Filename-only or manual link maps | **`title::`**, **`alias::`** / **`aliases::`** re-key `graph.pages` and wire **backlinks** for alias wikilinks |
|
|
103
104
|
|
|
104
105
|
---
|
|
105
106
|
|
|
@@ -138,7 +139,37 @@ Logseq Matryca Parser is a deterministic **Stack-Machine engine** that acts as t
|
|
|
138
139
|
|
|
139
140
|
---
|
|
140
141
|
|
|
141
|
-
## ⚡ Recent superpowers (
|
|
142
|
+
## ⚡ Recent superpowers (v1.1.1)
|
|
143
|
+
|
|
144
|
+
### Native parity (parser + graph)
|
|
145
|
+
|
|
146
|
+
| Area | Capability |
|
|
147
|
+
| :--- | :--- |
|
|
148
|
+
| **Graph index** | `title::` / `TITLE::` overrides the filename-derived page title; `alias::` / `aliases::` inject extra keys into `graph.pages` (comma-separated strings, bullet-list values, or Python lists). |
|
|
149
|
+
| **Backlinks** | `[[Dev]]` resolves against alias keys the same way as canonical titles (`get_backlinks("Dev")`). |
|
|
150
|
+
| **Incremental reload** | `invalidate_and_reload_page` re-applies title/alias enrichment after watcher edits. |
|
|
151
|
+
| **Parser shields** | LaTeX `$…$` / `$$…$$`, `#+BEGIN_QUERY` … `#+END_QUERY`, fenced code (` ``` ` and `~~~`), drawers, and `{{embed [[Page]]}}` macros do not emit false wikilinks/tags. |
|
|
152
|
+
| **Property contiguity** | `key:: value` lines apply only while contiguous under the bullet; after a soft-break, later property syntax stays in block text. |
|
|
153
|
+
| **Property bullet lists** | `alias::` / `tags::` with indented `-` children become `list[str]` properties — no spurious AST child nodes. |
|
|
154
|
+
| **Outliner bullets** | Ordered-list markers (`1. `, `12. `, …) are first-class bullets alongside `-` and `*`. |
|
|
155
|
+
| **Tasks** | GFM checkboxes (`[ ]`, `[-]`, `[x]`) plus Org-mode markers including `DELEGATED`, `POSTPONED`, `IN-PROGRESS`. |
|
|
156
|
+
| **Aliased block refs** | `[Label](((uuid)))` cleans to `Label` in `clean_text` for RAG-friendly prose. |
|
|
157
|
+
|
|
158
|
+
```python
|
|
159
|
+
from logseq_matryca_parser.graph import LogseqGraph
|
|
160
|
+
|
|
161
|
+
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
|
|
162
|
+
|
|
163
|
+
# file_name.md with frontmatter: title:: Custom Title
|
|
164
|
+
page = graph.pages["Custom Title"]
|
|
165
|
+
|
|
166
|
+
# Development.md with alias:: Dev, Coding — wikilinks to aliases resolve
|
|
167
|
+
assert graph.pages["Dev"] is graph.pages["Development"]
|
|
168
|
+
linker = graph.pages["Linker"].root_nodes[0]
|
|
169
|
+
assert linker in graph.get_backlinks("Dev")
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
Deep dive: [Architecture §3.6 — LogseqGraph](docs/ARCHITECTURE.md#36-logseqgraph--namespace-scoping-o1-invalidation-live-watch) and [AST primer — page properties](docs/logseq_ast_primer.md#5-page-properties-title-aliases-and-graph-indexing).
|
|
142
173
|
|
|
143
174
|
### Obsidian-native export
|
|
144
175
|
Compile an entire Logseq graph into an **Obsidian vault layout**: YAML frontmatter from page properties, list body preserved, Logseq `((uuid))` links rewritten to **`[[Page#^anchor]]`**, and trailing **`^block-id`** on referenced blocks. Namespace titles become nested folders (e.g. `Projects/AI/Demo.md`).
|
|
@@ -179,7 +210,7 @@ matryca-parse agent-read /path/to/graph --query "quantum"
|
|
|
179
210
|
The agent reads cheap topology now; the registry resolves aliases back to sovereign UUIDs when you wire targeted writes.
|
|
180
211
|
|
|
181
212
|
### Headless Write Engine & AST Linter (Wave 12)
|
|
182
|
-
The parser is **no longer read-only**. Wave 12 adds a **headless Markdown splicer** ([`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py)): `append_child_to_node` uses AST line numbers and indentation (`(indent_level + 1) × tab_size`) to insert a new bullet **atomically** into the sovereign `.md` file—via `tempfile` + `os.replace`—without Logseq’s fragile HTTP API. Pair **`agent-read`** with **`agent-write`**: X-Ray persists its alias map to **`.matryca_xray_state.json`** at the graph root so stateless CLI invocations can **read, then write** in sequence.
|
|
213
|
+
The parser is **no longer read-only**. Wave 12 adds a **headless Markdown splicer** ([`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py)): `append_child_to_node` uses AST line numbers and indentation (`(indent_level + 1) × tab_size`) to insert a new bullet **atomically** into the sovereign `.md` file—via `tempfile` + `os.replace`—without Logseq’s fragile HTTP API. Beyond surgical node splicing, the engine now supports **full bidirectional page generation** via [`serialize_logseq_page`](src/logseq_matryca_parser/logseq_markdown.py) and [`write_logseq_page`](src/logseq_matryca_parser/logseq_markdown.py)—rebuilding entire Logseq-compliant `.md` pages from an in-memory AST. Pair **`agent-read`** with **`agent-write`**: X-Ray persists its alias map to **`.matryca_xray_state.json`** at the graph root so stateless CLI invocations can **read, then write** in sequence.
|
|
183
214
|
|
|
184
215
|
```bash
|
|
185
216
|
matryca-parse agent-read /path/to/graph --tag idea
|
|
@@ -194,13 +225,15 @@ For graph hygiene, **`LogseqGraph.get_broken_references()`** flags nodes whose `
|
|
|
194
225
|
|
|
195
226
|
| Feature | Description |
|
|
196
227
|
| :--- | :--- |
|
|
197
|
-
| **LOGOS Engine** | Deterministic AST parsing.
|
|
198
|
-
| **
|
|
228
|
+
| **LOGOS Engine** | Deterministic AST parsing. Property contiguity, bullet-list properties, lowercase keys, multiline blocks, extended task markers, GFM checkboxes, numbered bullets, and **shielded** code/math/query regions. |
|
|
229
|
+
| **LogseqGraph** | In-memory vault: `pages` index (with **title/alias enrichment**), backlinks, effective properties, namespace resolution, fluent `GraphQuery`, optional **watchdog** invalidation. |
|
|
230
|
+
| **Advanced Task Extraction** | Task **state** (TODO / DOING / DELEGATED / IN-PROGRESS / …), **priority** markers `[#A]`–`[#C]` promoted to `task_priority`, and **SCHEDULED** / **DEADLINE** Logseq timestamps normalized to **UTC Unix epoch seconds** on `scheduled_at` / `deadline_at` for temporal graph and retrieval pipelines. |
|
|
199
231
|
| **SYNAPSE Adapter** | Native exports for **LangChain** and **LlamaIndex** with automated lineage metadata; **context-enriched** chunks with breadcrumbs, embed expansion, and inherited properties. |
|
|
200
232
|
| **FORGE** | JSON, clean Markdown, and **Obsidian** vault serialization (`ObsidianForgeVisitor`, `ForgeExporter.to_obsidian_markdown`). |
|
|
201
233
|
| **LENS Visualizer** | 60FPS interactive graph rendering (10k+ nodes) with Glassmorphism HUD. |
|
|
202
234
|
| **Agent-Native Printing Press** | [`agent_press.py`](src/logseq_matryca_parser/agent_press.py): **`SessionAliasRegistry`** maps session aliases ↔ block UUIDs; **`to_xray_markdown`** emits token-minimal outline text for autonomous agents (`matryca-parse agent-read`). |
|
|
203
|
-
| **
|
|
235
|
+
| **Native Markdown Serialization** | [`logseq_markdown.py`](src/logseq_matryca_parser/logseq_markdown.py) + [`logseq_paths.py`](src/logseq_matryca_parser/logseq_paths.py): rebuild and write Logseq-compliant markdown pages from an AST—page properties as raw `key:: value` lines, block properties indented at **parent whitespace + exactly 2 spaces**, and namespace titles mapped via **`___`** pathing rules. |
|
|
236
|
+
| **Headless Write Engine** | [`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py): **`append_child_to_node`** splices child bullets into on-disk Markdown from AST topology; **`serialize_logseq_page`** / **`write_logseq_page`** emit full pages; **`matryca-parse agent-write`** resolves aliases via **`.matryca_xray_state.json`**. |
|
|
204
237
|
| **AST Linters** | **`LogseqGraph.get_broken_references()`** returns originating nodes when `block_refs` target UUIDs absent from the global registry. |
|
|
205
238
|
| **Sovereign AI** | 100% Local. Zero telemetry. Private by design. |
|
|
206
239
|
|
|
@@ -247,12 +280,17 @@ matryca-parse export /path/to/logseq/graph output --format obsidian
|
|
|
247
280
|
|
|
248
281
|
### Python API
|
|
249
282
|
```python
|
|
283
|
+
from logseq_matryca_parser.graph import LogseqGraph
|
|
250
284
|
from logseq_matryca_parser.logos_parser import LogosParser
|
|
251
285
|
from logseq_matryca_parser.synapse import SynapseAdapter
|
|
252
286
|
|
|
253
|
-
# Parse to AST
|
|
287
|
+
# Parse a single page to AST
|
|
254
288
|
page = LogosParser().parse_page_file("page.md")
|
|
255
289
|
|
|
290
|
+
# Load the whole vault (pages, backlinks, node registry)
|
|
291
|
+
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
|
|
292
|
+
effective = graph.get_effective_properties(graph.pages["My Page"].root_nodes[0].uuid)
|
|
293
|
+
|
|
256
294
|
# Export to LangChain with lineage metadata
|
|
257
295
|
docs = SynapseAdapter.to_langchain_documents(page.root_nodes, source_name=page.title)
|
|
258
296
|
```
|
|
@@ -8,10 +8,10 @@
|
|
|
8
8
|
[](https://www.python.org/downloads/)
|
|
9
9
|
[](https://github.com/MarcoPorcellato/logseq-matryca-parser/blob/main/LICENSE)
|
|
10
10
|
[](https://github.com/MarcoPorcellato/logseq-matryca-parser#-quickstart)
|
|
11
|
-
[](#)
|
|
12
12
|

|
|
13
13
|
|
|
14
|
-
**
|
|
14
|
+
**v1.1.1** — Logseq OG parity release (see [CHANGELOG](CHANGELOG.md)) — **200+ tests**, full bidirectional Headless CRUD engine, native markdown serialization, and static typing; ready for production Enterprise integration.
|
|
15
15
|
|
|
16
16
|
> *Turning a forest of local plain-text files into a unified semantic powerhouse.*
|
|
17
17
|
|
|
@@ -21,7 +21,7 @@
|
|
|
21
21
|
|
|
22
22
|
[👉 **TRY THE LIVE INTERACTIVE DEMO**](https://MarcoPorcellato.github.io/logseq-matryca-parser/)
|
|
23
23
|
|
|
24
|
-
[📘 **
|
|
24
|
+
[📘 **ARCHITECTURE**](docs/ARCHITECTURE.md) · [AST Primer](docs/logseq_ast_primer.md) · [Changelog](CHANGELOG.md) · [Release process](docs/RELEASE_PROCESS.md)
|
|
25
25
|
|
|
26
26
|
</div>
|
|
27
27
|
|
|
@@ -64,6 +64,7 @@ It acts as the strict **File System Driver** for your LLM OS. By using a determi
|
|
|
64
64
|
| **Block references `((uuid))`** | Treated as opaque text or dropped | **Resolved** against `LogseqGraph`; optional **embed expansion** and **Obsidian `[[Page#^anchor]]`** export |
|
|
65
65
|
| **Property inheritance** | Page-level frontmatter at best | **`get_effective_properties`**: page + ancestor outline keys merged top-down (Org-mode style), then exposed on enriched chunks |
|
|
66
66
|
| **Live sync** | Re-read whole tree or poll | **`LogseqGraph.start_watching()`** (optional `watchdog`): **per-file invalidation** — re-parse one page, purge stale UUIDs from registries, refresh backlinks |
|
|
67
|
+
| **Page aliases & titles** | Filename-only or manual link maps | **`title::`**, **`alias::`** / **`aliases::`** re-key `graph.pages` and wire **backlinks** for alias wikilinks |
|
|
67
68
|
|
|
68
69
|
---
|
|
69
70
|
|
|
@@ -102,7 +103,37 @@ Logseq Matryca Parser is a deterministic **Stack-Machine engine** that acts as t
|
|
|
102
103
|
|
|
103
104
|
---
|
|
104
105
|
|
|
105
|
-
## ⚡ Recent superpowers (
|
|
106
|
+
## ⚡ Recent superpowers (v1.1.1)
|
|
107
|
+
|
|
108
|
+
### Native parity (parser + graph)
|
|
109
|
+
|
|
110
|
+
| Area | Capability |
|
|
111
|
+
| :--- | :--- |
|
|
112
|
+
| **Graph index** | `title::` / `TITLE::` overrides the filename-derived page title; `alias::` / `aliases::` inject extra keys into `graph.pages` (comma-separated strings, bullet-list values, or Python lists). |
|
|
113
|
+
| **Backlinks** | `[[Dev]]` resolves against alias keys the same way as canonical titles (`get_backlinks("Dev")`). |
|
|
114
|
+
| **Incremental reload** | `invalidate_and_reload_page` re-applies title/alias enrichment after watcher edits. |
|
|
115
|
+
| **Parser shields** | LaTeX `$…$` / `$$…$$`, `#+BEGIN_QUERY` … `#+END_QUERY`, fenced code (` ``` ` and `~~~`), drawers, and `{{embed [[Page]]}}` macros do not emit false wikilinks/tags. |
|
|
116
|
+
| **Property contiguity** | `key:: value` lines apply only while contiguous under the bullet; after a soft-break, later property syntax stays in block text. |
|
|
117
|
+
| **Property bullet lists** | `alias::` / `tags::` with indented `-` children become `list[str]` properties — no spurious AST child nodes. |
|
|
118
|
+
| **Outliner bullets** | Ordered-list markers (`1. `, `12. `, …) are first-class bullets alongside `-` and `*`. |
|
|
119
|
+
| **Tasks** | GFM checkboxes (`[ ]`, `[-]`, `[x]`) plus Org-mode markers including `DELEGATED`, `POSTPONED`, `IN-PROGRESS`. |
|
|
120
|
+
| **Aliased block refs** | `[Label](((uuid)))` cleans to `Label` in `clean_text` for RAG-friendly prose. |
|
|
121
|
+
|
|
122
|
+
```python
|
|
123
|
+
from logseq_matryca_parser.graph import LogseqGraph
|
|
124
|
+
|
|
125
|
+
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
|
|
126
|
+
|
|
127
|
+
# file_name.md with frontmatter: title:: Custom Title
|
|
128
|
+
page = graph.pages["Custom Title"]
|
|
129
|
+
|
|
130
|
+
# Development.md with alias:: Dev, Coding — wikilinks to aliases resolve
|
|
131
|
+
assert graph.pages["Dev"] is graph.pages["Development"]
|
|
132
|
+
linker = graph.pages["Linker"].root_nodes[0]
|
|
133
|
+
assert linker in graph.get_backlinks("Dev")
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
Deep dive: [Architecture §3.6 — LogseqGraph](docs/ARCHITECTURE.md#36-logseqgraph--namespace-scoping-o1-invalidation-live-watch) and [AST primer — page properties](docs/logseq_ast_primer.md#5-page-properties-title-aliases-and-graph-indexing).
|
|
106
137
|
|
|
107
138
|
### Obsidian-native export
|
|
108
139
|
Compile an entire Logseq graph into an **Obsidian vault layout**: YAML frontmatter from page properties, list body preserved, Logseq `((uuid))` links rewritten to **`[[Page#^anchor]]`**, and trailing **`^block-id`** on referenced blocks. Namespace titles become nested folders (e.g. `Projects/AI/Demo.md`).
|
|
@@ -143,7 +174,7 @@ matryca-parse agent-read /path/to/graph --query "quantum"
|
|
|
143
174
|
The agent reads cheap topology now; the registry resolves aliases back to sovereign UUIDs when you wire targeted writes.
|
|
144
175
|
|
|
145
176
|
### Headless Write Engine & AST Linter (Wave 12)
|
|
146
|
-
The parser is **no longer read-only**. Wave 12 adds a **headless Markdown splicer** ([`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py)): `append_child_to_node` uses AST line numbers and indentation (`(indent_level + 1) × tab_size`) to insert a new bullet **atomically** into the sovereign `.md` file—via `tempfile` + `os.replace`—without Logseq’s fragile HTTP API. Pair **`agent-read`** with **`agent-write`**: X-Ray persists its alias map to **`.matryca_xray_state.json`** at the graph root so stateless CLI invocations can **read, then write** in sequence.
|
|
177
|
+
The parser is **no longer read-only**. Wave 12 adds a **headless Markdown splicer** ([`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py)): `append_child_to_node` uses AST line numbers and indentation (`(indent_level + 1) × tab_size`) to insert a new bullet **atomically** into the sovereign `.md` file—via `tempfile` + `os.replace`—without Logseq’s fragile HTTP API. Beyond surgical node splicing, the engine now supports **full bidirectional page generation** via [`serialize_logseq_page`](src/logseq_matryca_parser/logseq_markdown.py) and [`write_logseq_page`](src/logseq_matryca_parser/logseq_markdown.py)—rebuilding entire Logseq-compliant `.md` pages from an in-memory AST. Pair **`agent-read`** with **`agent-write`**: X-Ray persists its alias map to **`.matryca_xray_state.json`** at the graph root so stateless CLI invocations can **read, then write** in sequence.
|
|
147
178
|
|
|
148
179
|
```bash
|
|
149
180
|
matryca-parse agent-read /path/to/graph --tag idea
|
|
@@ -158,13 +189,15 @@ For graph hygiene, **`LogseqGraph.get_broken_references()`** flags nodes whose `
|
|
|
158
189
|
|
|
159
190
|
| Feature | Description |
|
|
160
191
|
| :--- | :--- |
|
|
161
|
-
| **LOGOS Engine** | Deterministic AST parsing.
|
|
162
|
-
| **
|
|
192
|
+
| **LOGOS Engine** | Deterministic AST parsing. Property contiguity, bullet-list properties, lowercase keys, multiline blocks, extended task markers, GFM checkboxes, numbered bullets, and **shielded** code/math/query regions. |
|
|
193
|
+
| **LogseqGraph** | In-memory vault: `pages` index (with **title/alias enrichment**), backlinks, effective properties, namespace resolution, fluent `GraphQuery`, optional **watchdog** invalidation. |
|
|
194
|
+
| **Advanced Task Extraction** | Task **state** (TODO / DOING / DELEGATED / IN-PROGRESS / …), **priority** markers `[#A]`–`[#C]` promoted to `task_priority`, and **SCHEDULED** / **DEADLINE** Logseq timestamps normalized to **UTC Unix epoch seconds** on `scheduled_at` / `deadline_at` for temporal graph and retrieval pipelines. |
|
|
163
195
|
| **SYNAPSE Adapter** | Native exports for **LangChain** and **LlamaIndex** with automated lineage metadata; **context-enriched** chunks with breadcrumbs, embed expansion, and inherited properties. |
|
|
164
196
|
| **FORGE** | JSON, clean Markdown, and **Obsidian** vault serialization (`ObsidianForgeVisitor`, `ForgeExporter.to_obsidian_markdown`). |
|
|
165
197
|
| **LENS Visualizer** | 60FPS interactive graph rendering (10k+ nodes) with Glassmorphism HUD. |
|
|
166
198
|
| **Agent-Native Printing Press** | [`agent_press.py`](src/logseq_matryca_parser/agent_press.py): **`SessionAliasRegistry`** maps session aliases ↔ block UUIDs; **`to_xray_markdown`** emits token-minimal outline text for autonomous agents (`matryca-parse agent-read`). |
|
|
167
|
-
| **
|
|
199
|
+
| **Native Markdown Serialization** | [`logseq_markdown.py`](src/logseq_matryca_parser/logseq_markdown.py) + [`logseq_paths.py`](src/logseq_matryca_parser/logseq_paths.py): rebuild and write Logseq-compliant markdown pages from an AST—page properties as raw `key:: value` lines, block properties indented at **parent whitespace + exactly 2 spaces**, and namespace titles mapped via **`___`** pathing rules. |
|
|
200
|
+
| **Headless Write Engine** | [`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py): **`append_child_to_node`** splices child bullets into on-disk Markdown from AST topology; **`serialize_logseq_page`** / **`write_logseq_page`** emit full pages; **`matryca-parse agent-write`** resolves aliases via **`.matryca_xray_state.json`**. |
|
|
168
201
|
| **AST Linters** | **`LogseqGraph.get_broken_references()`** returns originating nodes when `block_refs` target UUIDs absent from the global registry. |
|
|
169
202
|
| **Sovereign AI** | 100% Local. Zero telemetry. Private by design. |
|
|
170
203
|
|
|
@@ -211,12 +244,17 @@ matryca-parse export /path/to/logseq/graph output --format obsidian
|
|
|
211
244
|
|
|
212
245
|
### Python API
|
|
213
246
|
```python
|
|
247
|
+
from logseq_matryca_parser.graph import LogseqGraph
|
|
214
248
|
from logseq_matryca_parser.logos_parser import LogosParser
|
|
215
249
|
from logseq_matryca_parser.synapse import SynapseAdapter
|
|
216
250
|
|
|
217
|
-
# Parse to AST
|
|
251
|
+
# Parse a single page to AST
|
|
218
252
|
page = LogosParser().parse_page_file("page.md")
|
|
219
253
|
|
|
254
|
+
# Load the whole vault (pages, backlinks, node registry)
|
|
255
|
+
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
|
|
256
|
+
effective = graph.get_effective_properties(graph.pages["My Page"].root_nodes[0].uuid)
|
|
257
|
+
|
|
220
258
|
# Export to LangChain with lineage metadata
|
|
221
259
|
docs = SynapseAdapter.to_langchain_documents(page.root_nodes, source_name=page.title)
|
|
222
260
|
```
|
|
@@ -187,9 +187,17 @@ Auxiliary **FORGE** serialization (JSON / flat Markdown / Obsidian) appears as a
|
|
|
187
187
|
- **Push** the freshly built `LogseqNode` onto the stack and register its UUID with `PageRegistry` for deterministic identity and future block-reference linkage.
|
|
188
188
|
This yields **finite-state, linear-time** traversal with explicit ascend/descend behavior — not regex-driven whole-document guessing.
|
|
189
189
|
|
|
190
|
-
- **Spatial indentation rules.** In Logseq, **indentation defines the AST**, not list decoration. Heading blocks and bullets both participate as first-class structural lines. Levels are **normalized post-pass** to tree depth (`_normalize_indent_levels`) so persisted `indent_level` reflects hierarchical depth independent of authoring quirks after stack repair.
|
|
190
|
+
- **Spatial indentation rules.** In Logseq, **indentation defines the AST**, not list decoration. Heading blocks and bullets both participate as first-class structural lines. The bullet detector accepts **unordered markers** (`-`, `*`) and **ordered-list markers** (`1. `, `12. `, …) via a shared `BULLET_PATTERN`, so numbered outlines participate in the stack machine like standard bullets. Levels are **normalized post-pass** to tree depth (`_normalize_indent_levels`) so persisted `indent_level` reflects hierarchical depth independent of authoring quirks after stack repair.
|
|
191
191
|
|
|
192
|
-
- **
|
|
192
|
+
- **GFM task checkboxes (before Org-mode tasks).** On the first line of a block, GitHub-flavored checkboxes are recognized and mapped to **`task_status`** before Org-mode prefix fallback: `[ ]` → **`TODO`**, `[-]` → **`DOING`**, `[x]` / `[X]` → **`DONE`**. The checkbox token is stripped from **`clean_text`** so embeddings stay prose-only.
|
|
193
|
+
|
|
194
|
+
- **Org-mode task prefixes (extended).** After checkbox handling, **`_extract_task_status`** matches longest-first Org prefixes (`TODO`, `DOING`, `DELEGATED`, `IN-PROGRESS`, …) at the start of the first line and promotes the remainder to **`clean_text`**.
|
|
195
|
+
|
|
196
|
+
- **Protected regions (entity extraction dead zones).** Wikilink, tag, and block-reference harvesters run on **`_shield_inline_code`**-masked text so literals inside **fenced code** (backtick and tilde fences), **inline code**, **LaTeX** (`$…$` and `$$…$$`), **`#+BEGIN_QUERY` … `#+END_QUERY`** blocks (parse-loop state plus shielding), and **Org drawers** do not produce false graph tokens. **`{{embed [[Page]]}}`** and similar macros are **not** fully opaque: nested wikilinks inside embed bodies are harvested for graph indexing.
|
|
197
|
+
|
|
198
|
+
- **Block properties & `id::`.** Subsequent lines matching `key:: value` attach to **`current_node`** only while **`properties_allowed`** remains true (contiguous property window immediately under the bullet). A **soft-break** continuation disables further property extraction; later `key::` lines merge into **`content`** as plain text. Keys are normalized with **`_normalize_property_key`** (lowercase) for Datomic parity. An empty value (`alias::` with no inline text) opens a **pending bullet-list** accumulator: indented `-` / `*` lines deeper than the property line become **`list[str]`** values without creating child **`LogseqNode`** entries. Page frontmatter uses the same key normalization (`TITLE::` ≡ `title::`). Parsed properties live in **`LogseqNode.properties`**. Native **`id::`** values are preserved in **`source_uuid`** (and in **`properties["id"]`** when applicable) so **`((uuid))`** references match Logseq; the parser’s stable **`uuid`** field remains the synthetic identity used for AST wiring and adapters.
|
|
199
|
+
|
|
200
|
+
- **Aliased block references in `clean_text`.** Markdown links of the form **`[Visible](((uuid)))`** are reduced to **`Visible`** in **`clean_text`** (brackets stripped) while UUIDs still populate **`block_refs`** for graph resolution.
|
|
193
201
|
|
|
194
202
|
#### Sovereign UUID architecture and zero-corruption guarantee
|
|
195
203
|
|
|
@@ -276,6 +284,24 @@ Both paths keep **existing topology intact** relative to their contract: append-
|
|
|
276
284
|
|
|
277
285
|
The **in-memory graph** ([`graph.py`](../src/logseq_matryca_parser/graph.py)) is the runtime **RAM image** of the sovereign vault: `pages: dict[str, LogseqPage]`, a private **`_node_registry`** keyed by synthetic block UUID, and a **`_backlink_registry`** mapping normalized link targets to source node UUIDs.
|
|
278
286
|
|
|
287
|
+
#### Page title overrides and alias indexing (`_enrich_pages_index`)
|
|
288
|
+
|
|
289
|
+
After every bulk or incremental parse, the graph applies a **post-parse enrichment pass** before backlink construction:
|
|
290
|
+
|
|
291
|
+
1. **Filename → canonical title.** Each markdown file is first keyed by **`derive_page_title_from_source_path`** (see §3.9).
|
|
292
|
+
2. **`title::` override.** If page frontmatter contains a non-empty string **`title`**, the frozen `LogseqPage` is updated via **`model_copy(update={"title": custom})`**, the old filename key is removed from **`pages`**, and the page is re-inserted under the custom title (collision with another file’s title is skipped with a debug log).
|
|
293
|
+
3. **Alias injection.** For each canonical dict entry where **`dict_key == page.title`**, values from **`alias::`** and **`aliases::`** are normalized (comma-separated strings or Python lists; `[[Page]]` / `#tag` adornments stripped using the same rules as [`logseq_markdown.py`](../src/logseq_matryca_parser/logseq_markdown.py)) and registered as **additional keys** pointing at the **same `LogseqPage` instance** — e.g. `pages["Dev"]` and `pages["Development"]` share identity.
|
|
294
|
+
4. **Backlinks.** **`_build_backlink_registry`** walks **unique pages** (`id(page)` deduplication) so alias keys do not double-count outgoing links. Incoming wikilinks such as **`[[Dev]]`** normalize to lowercase registry keys and resolve through **`get_backlinks("Dev")`** like any other page title.
|
|
295
|
+
|
|
296
|
+
**Incremental parity:** **`invalidate_and_reload_page`** drops **all** `pages` keys tied to the file’s `source_path` (not only the first alias hit), merges the freshly parsed page, re-runs **`_enrich_pages_index`**, then re-registers nodes and appends backlinks for the enriched instance. **`_page_title_for_source_path`** returns the canonical **`page.title`**, not an arbitrary alias key.
|
|
297
|
+
|
|
298
|
+
```python
|
|
299
|
+
graph = LogseqGraph.load_directory("/vault")
|
|
300
|
+
dev = graph.pages["Dev"] # alias key
|
|
301
|
+
assert dev is graph.pages["Development"]
|
|
302
|
+
assert linker in graph.get_backlinks("Dev")
|
|
303
|
+
```
|
|
304
|
+
|
|
279
305
|
#### Namespace shadowing (`resolve_relative_page_link`)
|
|
280
306
|
|
|
281
307
|
Relative page resolution follows **Logseq-style longest-prefix wins**: for a current page title split on **`/`** (namespace segments), the resolver tries candidates **`prefix + "/" + link_target`** for prefixes from **full namespace down to empty**, and returns the **first title that exists** in `pages`. Only if no contextual page exists does it fall back to a **global** title match. Thus a contextual page **`Progetti/AI/Sviluppo`** **shadows** a global **`Sviluppo`** when resolving from **`Progetti/AI/Matryca`** — matching the **nested-namespace shadowing** semantics described in the scoping roadmap.
|
|
@@ -287,9 +313,9 @@ Full-directory loads are expensive for always-on agents. **`invalidate_and_reloa
|
|
|
287
313
|
1. Ignore paths outside tracked **`pages/*.md`** and **`journals/*.md`**.
|
|
288
314
|
2. Re-parse the file with **`StackMachineParser.parse_page_file`**, producing a fresh `LogseqPage`.
|
|
289
315
|
3. If the path previously mapped to a page, collect **all synthetic UUIDs** from the old tree and call **`_purge_stale_page_uuids`**: remove each UUID from **`_node_registry`**, scrub those UUIDs from every **`_backlink_registry`** source list, and delete backlink keys that become empty.
|
|
290
|
-
4.
|
|
316
|
+
4. Remove every **`pages`** key whose value shares the file’s **`source_path`**, insert the freshly parsed page under its filename title, run **`_enrich_pages_index`** (title + aliases), then **`_register_page_nodes`** and **`_append_page_backlinks`** for the enriched page.
|
|
291
317
|
|
|
292
|
-
This keeps **global indexes consistent** without rebuilding the entire graph.
|
|
318
|
+
This keeps **global indexes consistent** without rebuilding the entire graph — including alias keys and custom titles declared in frontmatter.
|
|
293
319
|
|
|
294
320
|
#### Live filesystem watcher (`start_watching`)
|
|
295
321
|
|
|
@@ -358,6 +384,46 @@ Rich styling injects **ANSI escape sequences** that waste tokens and can cause m
|
|
|
358
384
|
|
|
359
385
|
This complements §3.4 **AGENT WRITER** (weekly append + headless splice) and §3.2 **SYNAPSE** (human/RAG chunking): one stack, multiple projections — **enriched chunks for vectors**, **X-Ray + alias state for agent context**, **append / splice for durable writes**.
|
|
360
386
|
|
|
387
|
+
### 3.8 Bidirectional I/O and Logseq Layouts
|
|
388
|
+
|
|
389
|
+
Wave 12 established **surgical writes** (single-line splices); v1.0 completes the loop with **full page round-tripping**. [`logseq_markdown.py`](../src/logseq_matryca_parser/logseq_markdown.py) is the native serializer that projects a parsed **`LogseqPage`** back onto sovereign Spatial Markdown — the inverse of LOGOS ingestion.
|
|
390
|
+
|
|
391
|
+
#### Page properties (file header)
|
|
392
|
+
|
|
393
|
+
Page-level metadata is emitted as **raw `key:: value` lines** at the top of the file — no YAML frontmatter wrapper. [`format_logseq_page_properties`](../src/logseq_matryca_parser/logseq_markdown.py) renders each entry on its own line (list-valued keys such as `tags::` are flattened to comma-separated tokens), followed by a **blank separator line** before the first outline bullet. This mirrors how Logseq stores page properties in vanilla `.md` exports and keeps Git diffs line-granular.
|
|
394
|
+
|
|
395
|
+
#### Block properties (strict indentation contract)
|
|
396
|
+
|
|
397
|
+
Block-scoped properties are serialized **immediately after the bullet text line**, never interleaved with child bullets. The indent rule is strict and deterministic:
|
|
398
|
+
|
|
399
|
+
```text
|
|
400
|
+
{parent_leading_whitespace} {key}:: {value}
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
That is, take the **exact leading whitespace** of the parent bullet line and append **exactly two additional spaces** (`_block_property_indent`). Continuation lines of multiline block bodies use the same `parent + 2` column. [`format_logseq_block_property_lines`](../src/logseq_matryca_parser/logseq_markdown.py) respects **`properties_order`** when present so round-trips preserve author ordering.
|
|
404
|
+
|
|
405
|
+
#### Full-page emission
|
|
406
|
+
|
|
407
|
+
[`serialize_logseq_page`](../src/logseq_matryca_parser/logseq_markdown.py) walks `page.root_nodes` depth-first, emitting `- {first_line}` bullets scaled by `indent_level × tab_size`, then property lines, then continuations, then children. [`write_logseq_page`](../src/logseq_matryca_parser/logseq_markdown.py) persists the result with UTF-8 encoding. Together with §3.4’s **`append_child_to_node`**, the stack now supports **point mutations** and **whole-page regeneration** from the same AST — bidirectional I/O without Logseq’s HTTP API.
|
|
408
|
+
|
|
409
|
+
### 3.9 Namespace & Path Translation
|
|
410
|
+
|
|
411
|
+
Semantic page titles and OS filesystem paths speak different dialects. [`logseq_paths.py`](../src/logseq_matryca_parser/logseq_paths.py) centralizes that translation so graph loaders, exporters, and the write engine agree on **where a page lives on disk**.
|
|
412
|
+
|
|
413
|
+
#### Title ↔ filename mapping
|
|
414
|
+
|
|
415
|
+
Logseq namespaces use **`/`** in titles (e.g. `Projects/AI`). On disk, each segment is flattened into a single filename stem with the **`___`** separator and percent-encoding for reserved characters — e.g. `Projects/AI` → `Projects___AI.md`. The inverse helpers **`filename_to_page_title`** and **`derive_page_title_from_source_path`** reconstruct semantic titles from `pages/` or `journals/` paths, including nested directory layouts when namespace segments are stored as folders.
|
|
416
|
+
|
|
417
|
+
| Direction | Function | Example |
|
|
418
|
+
| --------- | -------- | ------- |
|
|
419
|
+
| Title → stem | `page_title_to_filename` | `Projects/AI` → `Projects___AI` |
|
|
420
|
+
| Stem → title | `filename_to_page_title` | `Projects___AI` → `Projects/AI` |
|
|
421
|
+
| Title → relative path | `page_title_to_relative_path` | `pages/Projects___AI.md` |
|
|
422
|
+
|
|
423
|
+
#### Graph discovery filters
|
|
424
|
+
|
|
425
|
+
When scanning a vault root, **`is_excluded_graph_path`** drops noise directories — notably **`.recycle`**, **`.git`**, and the internal **`logseq`** config tree — so incremental watchers and bulk loaders never ingest backup blobs or VCS metadata as pages. This keeps **`LogseqGraph.load_directory`** and **`invalidate_and_reload_page`** focused on sovereign content under `pages/` and `journals/`.
|
|
426
|
+
|
|
361
427
|
---
|
|
362
428
|
|
|
363
429
|
## 4. Data Flow Sequence
|
|
@@ -425,4 +491,4 @@ Recursive and character-budget chunkers assume **approximately flat prose**. Log
|
|
|
425
491
|
|
|
426
492
|
---
|
|
427
493
|
|
|
428
|
-
*This document reflects the implementations in `src/logseq_matryca_parser/logos_parser.py`, `synapse.py`, `graph.py`, `forge.py`, `lens.py`, `logos_core.py`, `agent_writer.py`, and `
|
|
494
|
+
*This document reflects the implementations in `src/logseq_matryca_parser/logos_parser.py`, `synapse.py`, `graph.py`, `forge.py`, `lens.py`, `logos_core.py`, `agent_writer.py`, `agent_press.py`, `logseq_markdown.py`, and `logseq_paths.py`, and complements narrative primers such as [`logseq_ast_primer.md`](logseq_ast_primer.md).*
|
|
@@ -0,0 +1,67 @@
|
|
|
1
|
+
# Release process
|
|
2
|
+
|
|
3
|
+
**Logseq Matryca Parser** (The Logos Protocol · Marco Porcellato · [Matryca.ai](https://matryca.ai)) uses a **curated** [`CHANGELOG.md`](../CHANGELOG.md) (Keep a Changelog). PyPI publishing is triggered when you push a `v*` git tag.
|
|
4
|
+
|
|
5
|
+
---
|
|
6
|
+
|
|
7
|
+
## During development
|
|
8
|
+
|
|
9
|
+
Add user-facing bullets under **`## [Unreleased]`** (`Added` / `Changed` / `Fixed` / `Removed` / `Security`). One line per notable change. See [`.cursor/rules/05-auto-changelog.mdc`](../.cursor/rules/05-auto-changelog.mdc).
|
|
10
|
+
|
|
11
|
+
---
|
|
12
|
+
|
|
13
|
+
## Release day (local)
|
|
14
|
+
|
|
15
|
+
Replace `X.Y.Z` with the semver you are shipping (no `v` prefix in `pyproject.toml`; use `vX.Y.Z` for the git tag).
|
|
16
|
+
|
|
17
|
+
### 1. Prepare (Cursor or manual)
|
|
18
|
+
|
|
19
|
+
- [ ] Move everything from `[Unreleased]` to `## [X.Y.Z] - YYYY-MM-DD` in `CHANGELOG.md`
|
|
20
|
+
- [ ] Leave an empty `## [Unreleased]` section at the top
|
|
21
|
+
- [ ] Set `version = "X.Y.Z"` in `pyproject.toml`
|
|
22
|
+
- [ ] Run `make all` (ruff, mypy, pytest)
|
|
23
|
+
|
|
24
|
+
**Cursor shortcut:** ask the agent to *“prepare release vX.Y.Z”* (see [`.cursor/rules/04-release-preparation.mdc`](../.cursor/rules/04-release-preparation.mdc)).
|
|
25
|
+
|
|
26
|
+
### 2. Verify release notes (optional but recommended)
|
|
27
|
+
|
|
28
|
+
```bash
|
|
29
|
+
python scripts/extract_changelog.py vX.Y.Z | less
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
You should see exactly the section that will appear on GitHub if you attach release notes manually.
|
|
33
|
+
|
|
34
|
+
### 3. Commit, tag, push
|
|
35
|
+
|
|
36
|
+
```bash
|
|
37
|
+
git add CHANGELOG.md pyproject.toml
|
|
38
|
+
git commit -m "chore: release X.Y.Z"
|
|
39
|
+
git tag vX.Y.Z
|
|
40
|
+
git push origin main
|
|
41
|
+
git push origin vX.Y.Z
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
### 4. CI does the rest
|
|
45
|
+
|
|
46
|
+
On tag push, [`.github/workflows/pypi_publish.yml`](../.github/workflows/pypi_publish.yml):
|
|
47
|
+
|
|
48
|
+
1. Builds sdist and wheel with `python -m build`
|
|
49
|
+
2. Publishes to PyPI (trusted publishing)
|
|
50
|
+
|
|
51
|
+
---
|
|
52
|
+
|
|
53
|
+
## Troubleshooting
|
|
54
|
+
|
|
55
|
+
| Problem | Fix |
|
|
56
|
+
|---------|-----|
|
|
57
|
+
| PyPI version already exists | Bump patch version; never re-use a published version. |
|
|
58
|
+
| Notes look wrong | Re-run locally: `python scripts/extract_changelog.py vX.Y.Z` and compare to `CHANGELOG.md`. |
|
|
59
|
+
| CI fails on tests | Run `make all` locally before tagging. |
|
|
60
|
+
|
|
61
|
+
---
|
|
62
|
+
|
|
63
|
+
## Related
|
|
64
|
+
|
|
65
|
+
- [`CHANGELOG.md`](../CHANGELOG.md)
|
|
66
|
+
- [`CONTRIBUTING.md`](../CONTRIBUTING.md) — quality gates before tag
|
|
67
|
+
- [`scripts/extract_changelog.py`](../scripts/extract_changelog.py)
|