logseq-matryca-parser 1.1.1__tar.gz → 1.2.0__tar.gz
- logseq_matryca_parser-1.2.0/.github/workflows/github_release.yml +53 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/CHANGELOG.md +27 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/CONTRIBUTING.md +1 -1
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/PKG-INFO +55 -28
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/README.md +54 -27
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/ARCHITECTURE.md +45 -8
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/RELEASE_PROCESS.md +21 -4
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/LOGSEQ_ASSET_RESOLUTION_SPEC.md +2 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/logseq_ast_primer.md +74 -9
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/pyproject.toml +1 -1
- logseq_matryca_parser-1.2.0/scripts/debug_pre_release.py +170 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/__init__.py +1 -1
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/agent_writer.py +1 -1
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/graph.py +58 -6
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/logos_core.py +3 -1
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/logos_parser.py +132 -21
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/logseq_markdown.py +114 -9
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/logseq_paths.py +2 -1
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_graph.py +15 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_logos_parser.py +221 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_logseq_paths.py +12 -0
- logseq_matryca_parser-1.2.0/tests/test_pre_release_roundtrip.py +79 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/uv.lock +1 -1
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.cursorignore +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/FUNDING.yml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/ISSUE_TEMPLATE/bug_report.yml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/ISSUE_TEMPLATE/feature_request.yml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/PULL_REQUEST_TEMPLATE.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/dependabot.yml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/workflows/ci.yml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.github/workflows/pypi_publish.yml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.gitignore +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.pre-commit-config.yaml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/.repomixignore +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/LICENSE +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/Makefile +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/NOTICE +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/ROADMAP_AGENT_NATIVE_XRAY.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/ROADMAP_HEADLESS_WRITER.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/ROADMAP_OBSIDIAN_ADAPTER.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/SECURITY.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/claude-skill-logseq-read/SKILL.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/claude-skill-logseq-read/scripts/parse_logseq.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/ARCHITECTURE_BLUEPRINT.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/CODE_SCAFFOLD.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/LOGSEQ_DATASCRIPT_MAPPING.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/LOGSEQ_TEMPORAL_ONTOLOGY.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/OFFICIAL_MLDOC_SPECS.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/design-docs/REFERENCE_SPEC.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/error_log.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_CLI_HYDRATION_AND_ENRICHMENT.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_CONTEXT_SYNTHESIS_AND_SCOPING.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_EMBED_EXPANSION_AND_FLUENT_QUERIES.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_GRAPH_RAG_SEMANTICS.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_INCREMENTAL_WATCHER.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_INLINE_SHIELD_AND_NAMESPACES.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_ROBUSTNESS_AND_SOFT_BREAKS.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_TOML_FIX_AND_PYPI_DISTRIBUTION.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/docs/roadmaps/ROADMAP_UUID_AND_GRAPH_SUPERPOWERS.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/examples/demo_logseq_journal.md +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/examples/run_demo.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/legacy/local_digestor.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/lib/bindings/utils.js +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/lib/tom-select/tom-select.complete.min.js +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/lib/tom-select/tom-select.css +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/lib/vis-9.1.2/vis-network.css +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/lib/vis-9.1.2/vis-network.min.js +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/repomix-output-parser.xml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/scripts/extract_changelog.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/.gitignore +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/NOTICE +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/__main__.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/agent_press.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/exceptions.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/forge.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/kinetic.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/lens.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/pyproject.toml +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/src/logseq_matryca_parser/synapse.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_agent_press.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_agent_writer.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_forge.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_kinetic.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_lens.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_logseq_markdown.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_package_version.py +0 -0
- {logseq_matryca_parser-1.1.1 → logseq_matryca_parser-1.2.0}/tests/test_synapse.py +0 -0
@@ -0,0 +1,53 @@
+name: GitHub Release
+
+on:
+  push:
+    tags:
+      - "v*"
+  workflow_dispatch:
+    inputs:
+      tag:
+        description: "Existing tag to publish (e.g. v1.1.1)"
+        required: true
+        type: string
+
+permissions:
+  contents: write
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v6
+        with:
+          fetch-depth: 0
+
+      - name: Resolve tag name
+        id: tag
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "name=${{ inputs.tag }}" >> "$GITHUB_OUTPUT"
+          else
+            echo "name=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
+          fi
+
+      - name: Build release notes from CHANGELOG
+        id: notes
+        run: |
+          BODY_FILE="$(mktemp)"
+          python scripts/extract_changelog.py "${{ steps.tag.outputs.name }}" > "$BODY_FILE"
+          {
+            echo "body<<CHANGELOG_EOF"
+            cat "$BODY_FILE"
+            echo "CHANGELOG_EOF"
+          } >> "$GITHUB_OUTPUT"
+
+      - name: Create or update GitHub Release
+        uses: softprops/action-gh-release@v2
+        with:
+          tag_name: ${{ steps.tag.outputs.name }}
+          name: ${{ steps.tag.outputs.name }}
+          body: ${{ steps.notes.outputs.body }}
+          generate_release_notes: false
+          make_latest: true
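The "Resolve tag name" step in the workflow above takes the tag from the manual `tag` input on `workflow_dispatch` and otherwise strips the `refs/tags/` prefix from `GITHUB_REF`. The same branching can be sketched in Python for illustration; `resolve_tag_name` is a hypothetical helper and is not part of the package or the workflow.

```python
def resolve_tag_name(event_name: str, input_tag: str, github_ref: str) -> str:
    """Mirror the workflow's shell branching (hypothetical helper, illustration only)."""
    if event_name == "workflow_dispatch":
        # Manual run: use the tag supplied through the `tag` input.
        return input_tag
    # Tag push: same effect as the shell expansion ${GITHUB_REF#refs/tags/}.
    # str.removeprefix needs Python 3.9 or newer.
    return github_ref.removeprefix("refs/tags/")


pushed = resolve_tag_name("push", "", "refs/tags/v1.2.0")
manual = resolve_tag_name("workflow_dispatch", "v1.1.1", "refs/heads/main")
```

Like the shell expansion, `removeprefix` leaves the value untouched when the prefix is absent, so a ref that is not a tag passes through unchanged.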
@@ -7,6 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.2.0] - 2026-05-29
+
+### Added
+
+- **Asset extraction** — `LogseqNode.assets` collects markdown images, `{{pdf}}` macros, and local `[label](path)` attachments; `resolve_asset_path` decodes percent-encoded paths (`%20`).
+- **YAML frontmatter** — `---` blocks at file start populate `LogseqPage.properties` like native Logseq page properties.
+- **`page-tags::`** — block and page properties named `page-tags` inject implicit graph tokens like `tags::`.
+
+### Fixed
+
+- **Round-trip serialization** — soft-break continuations no longer double-indent; list-shaped block properties (`tags::` + bullets) serialize as Logseq bullet lists instead of Python repr; `:LOGBOOK:` drawers and derived temporal fields (`scheduled::`, `repeater::`, …) are not emitted as bogus `key::` lines; YAML frontmatter pages round-trip with `---` fences and stable block UUIDs; `title` from YAML or `title::` frontmatter overrides the graph page title for deterministic IDs.
+- **Property comma-split in wikilinks** — `tags::` / `alias::` comma separation ignores commas inside `[[...]]` tokens.
+- **Properties after code fences** — `key::` lines immediately following a closing fence are parsed into block properties (Logseq contiguity exception).
+- **Org warning periods** — `DEADLINE` / `SCHEDULED` payloads with `-3d`-style warning periods parse without datetime failures.
+- **Quoted property values** — outer `"` / `'` are stripped from block property values in the AST.
+- **Query macro shielding** — `{{query}}` / `{{advancedquery}}` inline macros do not emit false wikilink graph tokens (embed macros still do).
+- **Case-insensitive page routing** — `LogseqGraph.get_page` and `resolve_relative_page_link` resolve titles via a lowercase index (Datomic / Logseq parity).
+- **HTML comment shielding** — wikilinks and tags inside `<!-- ... -->` are masked before entity extraction (no ghost graph links).
+- **Graph token parity** — list-shaped block properties (`tags::` with bullets) feed wikilinks/tags into the AST; page-level properties (YAML and `key::` frontmatter) merge into `page.refs`.
+- **Temporal ranges and repeaters** — `SCHEDULED` / `DEADLINE` markers with `HH:MM - HH:MM` ranges parse using the start time; repeater tokens (`.+1w`, `++1d`) are stripped before datetime parsing.
+- **Legacy namespace filenames** — `filename_to_page_title` decodes `___`, `%2F`, and Dendron-style `.` separators.
+- **BOM-prefixed graph files** — `parse_page_file` reads with `utf-8-sig` so Windows-synced BOM bytes do not break the first bullet.
+- **Markdown escape shielding** — `\#` and `\[\[` no longer yield tags or wikilinks in graph metadata.
+- **Empty bullets** — bare `-` / `*` lines parse as empty blocks instead of failing `BULLET_PATTERN`.
+- **Wikilink header anchors** — `[[Page#Section]]` resolves to the page name only for graph routing.
+- **Hybrid alias links** — `[Alias]([[Page]])` is no longer treated as a file asset.
+
 ## [1.1.1] - 2026-05-28
 
 ### Added
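The "Property comma-split in wikilinks" fix listed in the changelog hunk above says that `tags::` / `alias::` values are split on commas except where a comma sits inside a `[[...]]` token. A minimal sketch of that idea, assuming a simple bracket-depth scan; `split_property_values` is a hypothetical name and this is not the package's actual implementation.

```python
def split_property_values(raw: str) -> list[str]:
    """Split on commas that sit outside [[...]] tokens (hypothetical helper)."""
    values: list[str] = []
    current: list[str] = []
    depth = 0  # number of currently open "[[" pairs
    i = 0
    while i < len(raw):
        pair = raw[i:i + 2]
        if pair == "[[":
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "]]" and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif raw[i] == "," and depth == 0:
            # A top-level comma ends the current value.
            values.append("".join(current).strip())
            current = []
            i += 1
        else:
            # Any other character, including a comma inside a wikilink.
            current.append(raw[i])
            i += 1
    values.append("".join(current).strip())
    return [value for value in values if value]


tags = split_property_values("[[Smith, John]], project, [[AI]]")
```

A depth counter is used here rather than a single regular expression so that the comma rule stays readable; whether the real parser handles nested brackets this way is not stated in the diff.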
@@ -15,7 +15,7 @@ User-facing behavior is documented in:
 - [`README.md`](README.md) — overview, quickstart, and feature matrix
 - [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md) — LOGOS, SYNAPSE, `LogseqGraph`, agents, and data flow
 - [`docs/logseq_ast_primer.md`](docs/logseq_ast_primer.md) — Logseq Spatial Markdown domain rules
-- [`CHANGELOG.md`](CHANGELOG.md) — shipped releases (current: **1.
+- [`CHANGELOG.md`](CHANGELOG.md) — shipped releases (current: **1.2.0**) and **Unreleased** changes (Keep a Changelog)
 - [`docs/RELEASE_PROCESS.md`](docs/RELEASE_PROCESS.md) — version bump, tag, and PyPI publish checklist
 
 When you add or change observable parser or graph behavior, update the relevant doc sections and add a bullet under **`## [Unreleased]`** in `CHANGELOG.md` (see [`.cursor/rules/05-auto-changelog.mdc`](.cursor/rules/05-auto-changelog.mdc)).
@@ -1,6 +1,6 @@
 Metadata-Version: 2.4
 Name: logseq-matryca-parser
-Version: 1.
+Version: 1.2.0
 Summary: The Logos Protocol: Deterministic Logseq AST parsing for Matryca.ai.
 Project-URL: Homepage, https://github.com/MarcoPorcellato/logseq-matryca-parser
 Project-URL: Bug Tracker, https://github.com/MarcoPorcellato/logseq-matryca-parser/issues
@@ -43,11 +43,12 @@ Description-Content-Type: text/markdown
 [](https://github.com/MarcoPorcellato/logseq-matryca-parser/actions)
 [](https://www.python.org/downloads/)
 [](https://github.com/MarcoPorcellato/logseq-matryca-parser/blob/main/LICENSE)
-[](https://pypi.org/project/logseq-matryca-parser/)
+[](https://pypi.org/project/logseq-matryca-parser/)
 [](#)
 
 
-**v1.
+**v1.2.0** — Graph parity, multimodal assets & format-preserving round-trip (see [CHANGELOG](CHANGELOG.md)) — **233 tests**, YAML frontmatter ingest/serialize, asset path resolution, case-insensitive page routing, and extended LOGOS shielding; ready for production Enterprise integration.
 
 > *Turning a forest of local plain-text files into a unified semantic powerhouse.*
 
@@ -101,6 +102,8 @@ It acts as the strict **File System Driver** for your LLM OS. By using a determi
 | **Property inheritance** | Page-level frontmatter at best | **`get_effective_properties`**: page + ancestor outline keys merged top-down (Org-mode style), then exposed on enriched chunks |
 | **Live sync** | Re-read whole tree or poll | **`LogseqGraph.start_watching()`** (optional `watchdog`): **per-file invalidation** — re-parse one page, purge stale UUIDs from registries, refresh backlinks |
 | **Page aliases & titles** | Filename-only or manual link maps | **`title::`**, **`alias::`** / **`aliases::`** re-key `graph.pages` and wire **backlinks** for alias wikilinks |
+| **Case-insensitive pages** | Exact string match on filenames | **`get_page`** / **`resolve_relative_page_link`** use a lowercase index (Datomic / Logseq parity) |
+| **Attachments & assets** | Opaque `` text in chunks | **`LogseqNode.assets`** + **`LogseqPage.resolve_asset_path`** for graph-root PDFs and images |
 
 ---
 
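The "Case-insensitive pages" row added above describes page lookup through a lowercase index. A minimal sketch of that pattern, assuming a secondary dictionary keyed by the lowercased title; `PageIndex` and its string payloads are hypothetical stand-ins and do not reproduce `LogseqGraph` internals.

```python
from typing import Dict, Optional


class PageIndex:
    """Toy page registry with a lowercase lookup index (hypothetical, not LogseqGraph)."""

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}      # canonical title -> page payload
        self._by_lower: Dict[str, str] = {}  # lowercased title -> canonical title

    def add(self, title: str, payload: str) -> None:
        self.pages[title] = payload
        self._by_lower[title.lower()] = title

    def get_page(self, title: str) -> Optional[str]:
        # Normalise the query the same way the index keys were normalised.
        canonical = self._by_lower.get(title.lower())
        return None if canonical is None else self.pages[canonical]


index = PageIndex()
index.add("My Page", "page content")
found = index.get_page("my page")
```

Keeping the canonical title as the primary key means exact-match access through `pages` keeps working, while the secondary index only adds the case-folded route.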
@@ -139,37 +142,57 @@ Logseq Matryca Parser is a deterministic **Stack-Machine engine** that acts as t
 
 ---
 
-## ⚡ Recent superpowers (v1.
+## ⚡ Recent superpowers (v1.2.0)
 
-###
+### Graph parity, assets, and parser hardening
 
 | Area | Capability |
 | :--- | :--- |
-| **
-| **
-|
-| **
-| **
-| **Property
-
-
-
+| **Asset extraction** | `LogseqNode.assets` collects markdown images, `{{pdf}}` macros, and local `[label](path)` attachments; `LogseqPage.resolve_asset_path` maps to absolute paths (`%20` decode, graph-root relative). |
+| **YAML frontmatter** | `---` blocks at file start populate `LogseqPage.properties` like native `key::` lines; **`title:`** in YAML sets `page.title` at parse; **`serialize_logseq_page`** preserves `---` fences on round-trip when the source file used YAML. |
+| **`page-tags::`** | Block and page `page-tags::` inject implicit graph tokens like `tags::`; list-shaped values feed `refs`. |
+| **Case-insensitive routing** | `LogseqGraph.get_page` and `resolve_relative_page_link` resolve titles via a lowercase index (Datomic parity). |
+| **Extended shielding** | HTML comments, `{{query}}` / `{{advancedquery}}`, and escaped `\#` / `\[\[` do not emit false graph tokens (embed macros still harvest nested wikilinks). |
+| **Property & temporal fixes** | Comma-split ignores commas inside `[[wikilinks]]`; properties after code fences; quoted value stripping; `SCHEDULED`/`DEADLINE` ranges, repeaters, and Org warning periods; legacy `___` / `%2F` / Dendron filenames; UTF-8 BOM via `utf-8-sig`. |
+
+### Round-trip serialization (v1.2.0)
+
+| Area | Capability |
+| :--- | :--- |
+| **Soft-break bodies** | Multiline block continuations serialize without double-indenting alignment spaces. |
+| **List-shaped block props** | `tags::` / `page-tags::` with indented `-` bullets round-trip as Logseq lists (not Python repr). |
+| **`:LOGBOOK:` drawers** | Org drawers re-emit as `:LOGBOOK:` / `:END:` blocks, not bogus `logbook::` property lines. |
+| **Derived temporal keys** | Parsed `scheduled::`, `repeater::`, and related derived fields are omitted from serialized `key::` output. |
+| **Stable block UUIDs** | Parse → `serialize_logseq_page` → parse preserves block `id::` / UUIDs on the same outline. |
 
 ```python
 from logseq_matryca_parser.graph import LogseqGraph
+from logseq_matryca_parser.logos_parser import LogosParser
 
 graph = LogseqGraph.load_directory("/path/to/logseq/graph")
 
-#
-page = graph.pages["
+# Case-insensitive page lookup
+page = graph.get_page("my page") # same object as graph.pages["My Page"]
 
-#
-
-
-
+# Assets on a parsed block (Vision / document pipelines)
+single = LogosParser().parse_page_file("pages/Notes.md")
+block = single.root_nodes[0]
+if block.assets:
+    abs_path = single.resolve_asset_path(block.assets[0])
 ```
 
-Deep dive: [Architecture §3.6 — LogseqGraph](docs/ARCHITECTURE.md#36-logseqgraph--namespace-scoping-o1-invalidation-live-watch)
+Deep dive: [Architecture §3.1 — LOGOS](docs/ARCHITECTURE.md#31-logos--deterministic-stack-machine-parsing) · [§3.6 — LogseqGraph](docs/ARCHITECTURE.md#36-logseqgraph--namespace-scoping-o1-invalidation-live-watch) · [AST primer](docs/logseq_ast_primer.md).
+
+### Still included from v1.1.1
+
+| Area | Capability |
+| :--- | :--- |
+| **Graph index** | `title::` / `TITLE::` overrides filename titles; `alias::` / `aliases::` inject extra `graph.pages` keys. |
+| **Backlinks** | `[[Dev]]` resolves against alias keys (`get_backlinks("Dev")`). |
+| **Incremental reload** | `invalidate_and_reload_page` re-applies title/alias enrichment after watcher edits. |
+| **Parser shields** | LaTeX, `#+BEGIN_QUERY`, fenced code, drawers; `{{embed [[Page]]}}` harvests nested wikilinks. |
+| **Property contiguity** | `key::` contiguous under bullets; soft-break closes the window (fence exception in v1.2.0). |
+| **Tasks & bullets** | GFM checkboxes, extended Org markers, ordered-list bullets, aliased `((uuid))` clean text. |
 
 ### Obsidian-native export
 Compile an entire Logseq graph into an **Obsidian vault layout**: YAML frontmatter from page properties, list body preserved, Logseq `((uuid))` links rewritten to **`[[Page#^anchor]]`**, and trailing **`^block-id`** on referenced blocks. Namespace titles become nested folders (e.g. `Projects/AI/Demo.md`).
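The "Asset extraction" row in the hunk above says asset links are mapped to absolute paths with `%20` decoding. A minimal sketch of that kind of resolution, assuming the link is relative to the directory of the page file; `resolve_asset` is a hypothetical helper, and the real `LogseqPage.resolve_asset_path` may use different rules for locating the graph root.

```python
import posixpath
from urllib.parse import unquote


def resolve_asset(page_file: str, link: str) -> str:
    """Resolve a percent-encoded relative asset link (hypothetical helper)."""
    decoded = unquote(link)  # "%20" becomes a space
    page_dir = posixpath.dirname(page_file)
    # normpath collapses ".." segments without touching the filesystem.
    return posixpath.normpath(posixpath.join(page_dir, decoded))


pdf_path = resolve_asset("/graph/pages/Notes.md", "../assets/My%20Paper.pdf")
```

`posixpath` is used so the sketch behaves the same on every platform; production code would more likely use `pathlib.Path` and check that the resolved file exists.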
@@ -225,14 +248,15 @@ For graph hygiene, **`LogseqGraph.get_broken_references()`** flags nodes whose `
 
 | Feature | Description |
 | :--- | :--- |
-| **LOGOS Engine** | Deterministic AST parsing.
-| **
+| **LOGOS Engine** | Deterministic AST parsing. YAML + native frontmatter ingest, **format-preserving** `serialize_logseq_page` (YAML vs `key::` by source), list-shaped block property layout, **assets**, property contiguity (incl. post-fence), comma-safe wikilink splits, temporal ranges/repeaters, legacy filename decode, BOM-safe reads, and **shielded** code/math/query/HTML/escape regions. |
+| **Multimodal assets** | **`LogseqNode.assets`** + **`LogseqPage.resolve_asset_path`** for PDFs and images relative to the graph root (Vision / document RAG). |
+| **LogseqGraph** | In-memory vault: `pages` index (with **title/alias enrichment** and **case-insensitive lookup**), backlinks, effective properties, namespace resolution, fluent `GraphQuery`, optional **watchdog** invalidation. |
 | **Advanced Task Extraction** | Task **state** (TODO / DOING / DELEGATED / IN-PROGRESS / …), **priority** markers `[#A]`–`[#C]` promoted to `task_priority`, and **SCHEDULED** / **DEADLINE** Logseq timestamps normalized to **UTC Unix epoch seconds** on `scheduled_at` / `deadline_at` for temporal graph and retrieval pipelines. |
 | **SYNAPSE Adapter** | Native exports for **LangChain** and **LlamaIndex** with automated lineage metadata; **context-enriched** chunks with breadcrumbs, embed expansion, and inherited properties. |
 | **FORGE** | JSON, clean Markdown, and **Obsidian** vault serialization (`ObsidianForgeVisitor`, `ForgeExporter.to_obsidian_markdown`). |
 | **LENS Visualizer** | 60FPS interactive graph rendering (10k+ nodes) with Glassmorphism HUD. |
 | **Agent-Native Printing Press** | [`agent_press.py`](src/logseq_matryca_parser/agent_press.py): **`SessionAliasRegistry`** maps session aliases ↔ block UUIDs; **`to_xray_markdown`** emits token-minimal outline text for autonomous agents (`matryca-parse agent-read`). |
-| **Native Markdown Serialization** | [`logseq_markdown.py`](src/logseq_matryca_parser/logseq_markdown.py) + [`logseq_paths.py`](src/logseq_matryca_parser/logseq_paths.py): rebuild and write Logseq-compliant markdown
+| **Native Markdown Serialization** | [`logseq_markdown.py`](src/logseq_matryca_parser/logseq_markdown.py) + [`logseq_paths.py`](src/logseq_matryca_parser/logseq_paths.py): rebuild and write Logseq-compliant markdown from an AST—page header preserves **YAML `---` or native `key::`** by source format, block properties at **parent whitespace + 2 spaces** (including bullet-list `tags::`), `:LOGBOOK:` drawers, and namespace titles via **`___`** pathing rules. |
 | **Headless Write Engine** | [`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py): **`append_child_to_node`** splices child bullets into on-disk Markdown from AST topology; **`serialize_logseq_page`** / **`write_logseq_page`** emit full pages; **`matryca-parse agent-write`** resolves aliases via **`.matryca_xray_state.json`**. |
 | **AST Linters** | **`LogseqGraph.get_broken_references()`** returns originating nodes when `block_refs` target UUIDs absent from the global registry. |
 | **Sovereign AI** | 100% Local. Zero telemetry. Private by design. |
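The LOGOS Engine row above lists "BOM-safe reads", which the changelog attributes to reading files with `utf-8-sig`. A small standalone demonstration of why that codec matters, using only the standard library; `read_first_line` and the temporary file are illustrative and do not call the package.

```python
import codecs
import os
import tempfile


def read_first_line(path: str, encoding: str) -> str:
    """Return the first line of a text file decoded with the given codec."""
    with open(path, encoding=encoding) as handle:
        return handle.readline().rstrip("\n")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "page.md")
    # Write a UTF-8 byte order mark followed by a Logseq-style bullet.
    with open(path, "wb") as handle:
        handle.write(codecs.BOM_UTF8 + "- first bullet\n".encode("utf-8"))
    plain = read_first_line(path, "utf-8")         # BOM survives as U+FEFF
    bom_safe = read_first_line(path, "utf-8-sig")  # BOM is stripped on decode
```

With plain `utf-8` the first line starts with the invisible U+FEFF character, so a bullet pattern anchored at the start of the line would not match; `utf-8-sig` removes the mark when present and otherwise behaves like `utf-8`.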
@@ -259,8 +283,8 @@ Marker syntax (`[#A]`, `SCHEDULED: <...>`, `DEADLINE: <...>`) is stripped from `
 ## 🛠️ Quickstart
 
 ```bash
-# Install from
-pip install
+# Install from PyPI (latest: v1.2.0)
+pip install logseq-matryca-parser
 
 # Optional: filesystem watcher for live incremental graph updates
 pip install 'logseq-matryca-parser[watch]'
@@ -284,12 +308,15 @@ from logseq_matryca_parser.graph import LogseqGraph
 from logseq_matryca_parser.logos_parser import LogosParser
 from logseq_matryca_parser.synapse import SynapseAdapter
 
-# Parse a single page to AST
+# Parse a single page to AST (YAML or native frontmatter; utf-8-sig BOM-safe)
 page = LogosParser().parse_page_file("page.md")
+if page.root_nodes[0].assets:
+    absolute = page.resolve_asset_path(page.root_nodes[0].assets[0])
 
 # Load the whole vault (pages, backlinks, node registry)
 graph = LogseqGraph.load_directory("/path/to/logseq/graph")
-
+page_obj = graph.get_page("My Page") # case-insensitive
+effective = graph.get_effective_properties(page_obj.root_nodes[0].uuid)
 
 # Export to LangChain with lineage metadata
 docs = SynapseAdapter.to_langchain_documents(page.root_nodes, source_name=page.title)
@@ -7,11 +7,12 @@
 [](https://github.com/MarcoPorcellato/logseq-matryca-parser/actions)
 [](https://www.python.org/downloads/)
 [](https://github.com/MarcoPorcellato/logseq-matryca-parser/blob/main/LICENSE)
-[](https://pypi.org/project/logseq-matryca-parser/)
+[](https://pypi.org/project/logseq-matryca-parser/)
 [](#)
 
 
-**v1.
+**v1.2.0** — Graph parity, multimodal assets & format-preserving round-trip (see [CHANGELOG](CHANGELOG.md)) — **233 tests**, YAML frontmatter ingest/serialize, asset path resolution, case-insensitive page routing, and extended LOGOS shielding; ready for production Enterprise integration.
 
 > *Turning a forest of local plain-text files into a unified semantic powerhouse.*
 
@@ -65,6 +66,8 @@ It acts as the strict **File System Driver** for your LLM OS. By using a determi
 | **Property inheritance** | Page-level frontmatter at best | **`get_effective_properties`**: page + ancestor outline keys merged top-down (Org-mode style), then exposed on enriched chunks |
 | **Live sync** | Re-read whole tree or poll | **`LogseqGraph.start_watching()`** (optional `watchdog`): **per-file invalidation** — re-parse one page, purge stale UUIDs from registries, refresh backlinks |
 | **Page aliases & titles** | Filename-only or manual link maps | **`title::`**, **`alias::`** / **`aliases::`** re-key `graph.pages` and wire **backlinks** for alias wikilinks |
+| **Case-insensitive pages** | Exact string match on filenames | **`get_page`** / **`resolve_relative_page_link`** use a lowercase index (Datomic / Logseq parity) |
+| **Attachments & assets** | Opaque `` text in chunks | **`LogseqNode.assets`** + **`LogseqPage.resolve_asset_path`** for graph-root PDFs and images |
 
 ---
 
@@ -103,37 +106,57 @@ Logseq Matryca Parser is a deterministic **Stack-Machine engine** that acts as t
|
|
|
103
106
|
|
|
104
107
|
---
|
|
105
108
|
|
|
106
|
-
## ⚡ Recent superpowers (v1.
|
|
109
|
+
## ⚡ Recent superpowers (v1.2.0)
|
|
107
110
|
|
|
108
|
-
###
|
|
111
|
+
### Graph parity, assets, and parser hardening
|
|
109
112
|
|
|
110
113
|
| Area | Capability |
|
|
111
114
|
| :--- | :--- |
|
|
112
|
-
| **
|
|
113
|
-
| **
|
|
114
|
-
|
|
|
115
|
-
| **
|
|
116
|
-
| **
|
|
117
|
-
| **Property
|
|
118
|
-
|
|
119
|
-
|
|
120
|
-
|
|
115
|
+
| **Asset extraction** | `LogseqNode.assets` collects markdown images, `{{pdf}}` macros, and local `[label](path)` attachments; `LogseqPage.resolve_asset_path` maps to absolute paths (`%20` decode, graph-root relative). |
|
|
116
|
+
| **YAML frontmatter** | `---` blocks at file start populate `LogseqPage.properties` like native `key::` lines; **`title:`** in YAML sets `page.title` at parse; **`serialize_logseq_page`** preserves `---` fences on round-trip when the source file used YAML. |
|
|
117
|
+
| **`page-tags::`** | Block and page `page-tags::` inject implicit graph tokens like `tags::`; list-shaped values feed `refs`. |
|
|
118
|
+
| **Case-insensitive routing** | `LogseqGraph.get_page` and `resolve_relative_page_link` resolve titles via a lowercase index (Datomic parity). |
|
|
119
|
+
| **Extended shielding** | HTML comments, `{{query}}` / `{{advancedquery}}`, and escaped `\#` / `\[\[` do not emit false graph tokens (embed macros still harvest nested wikilinks). |
|
|
120
|
+
| **Property & temporal fixes** | Comma-split ignores commas inside `[[wikilinks]]`; properties after code fences; quoted value stripping; `SCHEDULED`/`DEADLINE` ranges, repeaters, and Org warning periods; legacy `___` / `%2F` / Dendron filenames; UTF-8 BOM via `utf-8-sig`. |
|
|
121
|
+
|
|
122
|
+
### Round-trip serialization (v1.2.0)
|
|
123
|
+
|
|
124
|
+
| Area | Capability |
|
|
125
|
+
| :--- | :--- |
|
|
126
|
+
| **Soft-break bodies** | Multiline block continuations serialize without double-indenting alignment spaces. |
|
|
127
|
+
| **List-shaped block props** | `tags::` / `page-tags::` with indented `-` bullets round-trip as Logseq lists (not Python repr). |
|
|
128
|
+
| **`:LOGBOOK:` drawers** | Org drawers re-emit as `:LOGBOOK:` / `:END:` blocks, not bogus `logbook::` property lines. |
|
|
129
|
+
| **Derived temporal keys** | Parsed `scheduled::`, `repeater::`, and related derived fields are omitted from serialized `key::` output. |
|
|
130
|
+
| **Stable block UUIDs** | Parse → `serialize_logseq_page` → parse preserves block `id::` / UUIDs on the same outline. |
|
|
121
131
|
|
|
122
132
|
```python
|
|
123
133
|
from logseq_matryca_parser.graph import LogseqGraph
|
|
134
|
+
from logseq_matryca_parser.logos_parser import LogosParser
|
|
124
135
|
|
|
125
136
|
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
|
|
126
137
|
|
|
127
|
-
#
|
|
128
|
-
page = graph.pages["
|
|
138
|
+
# Case-insensitive page lookup
|
|
139
|
+
page = graph.get_page("my page")  # same object as graph.pages["My Page"]
|
|
129
140
|
|
|
130
|
-
#
|
|
131
|
-
|
|
132
|
-
|
|
133
|
-
|
|
141
|
+
# Assets on a parsed block (Vision / document pipelines)
|
|
142
|
+
single = LogosParser().parse_page_file("pages/Notes.md")
|
|
143
|
+
block = single.root_nodes[0]
|
|
144
|
+
if block.assets:
|
|
145
|
+
abs_path = single.resolve_asset_path(block.assets[0])
|
|
134
146
|
```
|
|
135
147
|
|
|
136
|
-
Deep dive: [Architecture §3.6 — LogseqGraph](docs/ARCHITECTURE.md#36-logseqgraph--namespace-scoping-o1-invalidation-live-watch)
|
|
148
|
+
Deep dive: [Architecture §3.1 — LOGOS](docs/ARCHITECTURE.md#31-logos--deterministic-stack-machine-parsing) · [§3.6 — LogseqGraph](docs/ARCHITECTURE.md#36-logseqgraph--namespace-scoping-o1-invalidation-live-watch) · [AST primer](docs/logseq_ast_primer.md).
|
|
149
|
+
|
|
150
|
+
### Still included from v1.1.1
|
|
151
|
+
|
|
152
|
+
| Area | Capability |
|
|
153
|
+
| :--- | :--- |
|
|
154
|
+
| **Graph index** | `title::` / `TITLE::` overrides filename titles; `alias::` / `aliases::` inject extra `graph.pages` keys. |
|
|
155
|
+
| **Backlinks** | `[[Dev]]` resolves against alias keys (`get_backlinks("Dev")`). |
|
|
156
|
+
| **Incremental reload** | `invalidate_and_reload_page` re-applies title/alias enrichment after watcher edits. |
|
|
157
|
+
| **Parser shields** | LaTeX, `#+BEGIN_QUERY`, fenced code, drawers; `{{embed [[Page]]}}` harvests nested wikilinks. |
|
|
158
|
+
| **Property contiguity** | `key::` contiguous under bullets; soft-break closes the window (fence exception in v1.2.0). |
|
|
159
|
+
| **Tasks & bullets** | GFM checkboxes, extended Org markers, ordered-list bullets, aliased `((uuid))` clean text. |
|
|
137
160
|
|
|
138
161
|
### Obsidian-native export
|
|
139
162
|
Compile an entire Logseq graph into an **Obsidian vault layout**: YAML frontmatter from page properties, list body preserved, Logseq `((uuid))` links rewritten to **`[[Page#^anchor]]`**, and trailing **`^block-id`** on referenced blocks. Namespace titles become nested folders (e.g. `Projects/AI/Demo.md`).
|
|
@@ -189,14 +212,15 @@ For graph hygiene, **`LogseqGraph.get_broken_references()`** flags nodes whose `
|
|
|
189
212
|
|
|
190
213
|
| Feature | Description |
|
|
191
214
|
| :--- | :--- |
|
|
192
|
-
| **LOGOS Engine** | Deterministic AST parsing.
|
|
193
|
-
| **
|
|
215
|
+
| **LOGOS Engine** | Deterministic AST parsing. YAML + native frontmatter ingest, **format-preserving** `serialize_logseq_page` (YAML vs `key::` by source), list-shaped block property layout, **assets**, property contiguity (incl. post-fence), comma-safe wikilink splits, temporal ranges/repeaters, legacy filename decode, BOM-safe reads, and **shielded** code/math/query/HTML/escape regions. |
|
|
216
|
+
| **Multimodal assets** | **`LogseqNode.assets`** + **`LogseqPage.resolve_asset_path`** for PDFs and images relative to the graph root (Vision / document RAG). |
|
|
217
|
+
| **LogseqGraph** | In-memory vault: `pages` index (with **title/alias enrichment** and **case-insensitive lookup**), backlinks, effective properties, namespace resolution, fluent `GraphQuery`, optional **watchdog** invalidation. |
|
|
194
218
|
| **Advanced Task Extraction** | Task **state** (TODO / DOING / DELEGATED / IN-PROGRESS / …), **priority** markers `[#A]`–`[#C]` promoted to `task_priority`, and **SCHEDULED** / **DEADLINE** Logseq timestamps normalized to **UTC Unix epoch seconds** on `scheduled_at` / `deadline_at` for temporal graph and retrieval pipelines. |
|
|
195
219
|
| **SYNAPSE Adapter** | Native exports for **LangChain** and **LlamaIndex** with automated lineage metadata; **context-enriched** chunks with breadcrumbs, embed expansion, and inherited properties. |
|
|
196
220
|
| **FORGE** | JSON, clean Markdown, and **Obsidian** vault serialization (`ObsidianForgeVisitor`, `ForgeExporter.to_obsidian_markdown`). |
|
|
197
221
|
| **LENS Visualizer** | 60FPS interactive graph rendering (10k+ nodes) with Glassmorphism HUD. |
|
|
198
222
|
| **Agent-Native Printing Press** | [`agent_press.py`](src/logseq_matryca_parser/agent_press.py): **`SessionAliasRegistry`** maps session aliases ↔ block UUIDs; **`to_xray_markdown`** emits token-minimal outline text for autonomous agents (`matryca-parse agent-read`). |
|
|
199
|
-
| **Native Markdown Serialization** | [`logseq_markdown.py`](src/logseq_matryca_parser/logseq_markdown.py) + [`logseq_paths.py`](src/logseq_matryca_parser/logseq_paths.py): rebuild and write Logseq-compliant markdown
|
|
223
|
+
| **Native Markdown Serialization** | [`logseq_markdown.py`](src/logseq_matryca_parser/logseq_markdown.py) + [`logseq_paths.py`](src/logseq_matryca_parser/logseq_paths.py): rebuild and write Logseq-compliant markdown from an AST—page header preserves **YAML `---` or native `key::`** by source format, block properties at **parent whitespace + 2 spaces** (including bullet-list `tags::`), `:LOGBOOK:` drawers, and namespace titles via **`___`** pathing rules. |
|
|
200
224
|
| **Headless Write Engine** | [`agent_writer.py`](src/logseq_matryca_parser/agent_writer.py): **`append_child_to_node`** splices child bullets into on-disk Markdown from AST topology; **`serialize_logseq_page`** / **`write_logseq_page`** emit full pages; **`matryca-parse agent-write`** resolves aliases via **`.matryca_xray_state.json`**. |
|
|
201
225
|
| **AST Linters** | **`LogseqGraph.get_broken_references()`** returns originating nodes when `block_refs` target UUIDs absent from the global registry. |
|
|
202
226
|
| **Sovereign AI** | 100% Local. Zero telemetry. Private by design. |
|
|
@@ -223,8 +247,8 @@ Marker syntax (`[#A]`, `SCHEDULED: <...>`, `DEADLINE: <...>`) is stripped from `
|
|
|
223
247
|
## 🛠️ Quickstart
|
|
224
248
|
|
|
225
249
|
```bash
|
|
226
|
-
# Install from
|
|
227
|
-
pip install
|
|
250
|
+
# Install from PyPI (latest: v1.2.0)
|
|
251
|
+
pip install logseq-matryca-parser
|
|
228
252
|
|
|
229
253
|
# Optional: filesystem watcher for live incremental graph updates
|
|
230
254
|
pip install 'logseq-matryca-parser[watch]'
|
|
@@ -248,12 +272,15 @@ from logseq_matryca_parser.graph import LogseqGraph
|
|
|
248
272
|
from logseq_matryca_parser.logos_parser import LogosParser
|
|
249
273
|
from logseq_matryca_parser.synapse import SynapseAdapter
|
|
250
274
|
|
|
251
|
-
# Parse a single page to AST
|
|
275
|
+
# Parse a single page to AST (YAML or native frontmatter; utf-8-sig BOM-safe)
|
|
252
276
|
page = LogosParser().parse_page_file("page.md")
|
|
277
|
+
if page.root_nodes[0].assets:
|
|
278
|
+
absolute = page.resolve_asset_path(page.root_nodes[0].assets[0])
|
|
253
279
|
|
|
254
280
|
# Load the whole vault (pages, backlinks, node registry)
|
|
255
281
|
graph = LogseqGraph.load_directory("/path/to/logseq/graph")
|
|
256
|
-
|
|
282
|
+
page_obj = graph.get_page("My Page")  # case-insensitive
|
|
283
|
+
effective = graph.get_effective_properties(page_obj.root_nodes[0].uuid)
|
|
257
284
|
|
|
258
285
|
# Export to LangChain with lineage metadata
|
|
259
286
|
docs = SynapseAdapter.to_langchain_documents(page.root_nodes, source_name=page.title)
|
|
@@ -193,12 +193,26 @@ Auxiliary **FORGE** serialization (JSON / flat Markdown / Obsidian) appears as a
|
|
|
193
193
|
|
|
194
194
|
- **Org-mode task prefixes (extended).** After checkbox handling, **`_extract_task_status`** matches longest-first Org prefixes (`TODO`, `DOING`, `DELEGATED`, `IN-PROGRESS`, …) at the start of the first line and promotes the remainder to **`clean_text`**.
|
|
195
195
|
|
|
196
|
-
- **Protected regions (entity extraction dead zones).** Wikilink, tag, and block-reference harvesters run on **`_shield_inline_code`**-masked text so literals inside **fenced code** (backtick and tilde fences), **inline code**, **LaTeX** (`$…$` and `$$…$$`), **`#+BEGIN_QUERY` … `#+END_QUERY`** blocks (parse-loop state plus shielding), and **Org drawers** do not produce false graph tokens. **`{{embed [[Page]]}}`** and similar macros are **not** fully opaque: nested wikilinks inside embed bodies are harvested for graph indexing.
|
|
196
|
+
- **Protected regions (entity extraction dead zones).** Wikilink, tag, and block-reference harvesters run on **`_shield_inline_code`**-masked text so literals inside **fenced code** (backtick and tilde fences), **inline code**, **LaTeX** (`$…$` and `$$…$$`), **`#+BEGIN_QUERY` … `#+END_QUERY`** blocks (parse-loop state plus shielding), **HTML comments** (`<!-- … -->`), **escaped Markdown** (`\#`, `\[\[`), and **Org drawers** do not produce false graph tokens. **`{{query}}`** and **`{{advancedquery}}`** inline macros are fully shielded (no nested wikilink harvest). **`{{embed [[Page]]}}`** and similar embed macros are **not** fully opaque: nested wikilinks inside embed bodies are still harvested for graph indexing.
|
|
197
197
|
|
|
198
|
-
- **Block properties & `id::`.** Subsequent lines matching `key:: value` attach to **`current_node`** only while **`properties_allowed`** remains true (contiguous property window immediately under the bullet). A **soft-break** continuation disables further property extraction; later `key::` lines merge into **`content`** as plain text. Keys are normalized with **`_normalize_property_key`** (lowercase) for Datomic parity. An empty value (`alias::` with no inline text) opens a **pending bullet-list** accumulator: indented `-` / `*` lines deeper than the property line become **`list[str]`** values without creating child **`LogseqNode`** entries. Page frontmatter uses the same key normalization (`TITLE::` ≡ `title::`). Parsed properties live in **`LogseqNode.properties`**. Native **`id::`** values are preserved in **`source_uuid`** (and in **`properties["id"]`** when applicable) so **`((uuid))`** references match Logseq; the parser’s stable **`uuid`** field remains the synthetic identity used for AST wiring and adapters.
|
|
198
|
+
- **Block properties & `id::`.** Subsequent lines matching `key:: value` attach to **`current_node`** only while **`properties_allowed`** remains true (contiguous property window immediately under the bullet). A **soft-break** continuation disables further property extraction; later `key::` lines merge into **`content`** as plain text — **except** the Logseq contiguity exception: a `key::` line immediately after a closing code fence is still parsed as block metadata when contiguous under the bullet. Outer **`"`** / **`'`** on property values are stripped in the AST. Keys are normalized with **`_normalize_property_key`** (lowercase) for Datomic parity. Comma-separated **`tags::`**, **`alias::`**, **`aliases::`**, and **`page-tags::`** values split on commas but **ignore commas inside `[[wikilink]]` tokens**. An empty value (`alias::` with no inline text) opens a **pending bullet-list** accumulator: indented `-` / `*` lines deeper than the property line become **`list[str]`** values without creating child **`LogseqNode`** entries; list-shaped values also feed implicit graph tokens on **`LogseqNode.refs`**. Page frontmatter uses the same key normalization (`TITLE::` ≡ `title::`). Parsed properties live in **`LogseqNode.properties`**. Native **`id::`** values are preserved in **`source_uuid`** (and in **`properties["id"]`** when applicable) so **`((uuid))`** references match Logseq; the parser’s stable **`uuid`** field remains the synthetic identity used for AST wiring and adapters.
|
|
199
|
+
|
|
200
|
+
- **Structural edge cases.** Bare `-` or `*` lines (no text after the marker) parse as **empty blocks** instead of failing the bullet detector. **`[[Page#Section]]`** wikilinks contribute **`Page`** only (header anchor stripped) for graph routing. **`[Alias]([[Page]])`** hybrid alias links are wikilinks, not file **assets**.
|
|
199
201
|
|
|
200
202
|
- **Aliased block references in `clean_text`.** Markdown links of the form **`[Visible](((uuid)))`** are reduced to **`Visible`** in **`clean_text`** (brackets stripped) while UUIDs still populate **`block_refs`** for graph resolution.
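The comma-splitting rule described in the block-properties bullet above (split on commas, but not inside `[[wikilink]]` tokens) can be sketched as follows. The function name `split_property_values` is hypothetical and the parser's actual helper may differ; this only illustrates the technique.

```python
def split_property_values(raw: str) -> list[str]:
    """Split a property value on commas, ignoring commas inside [[...]]."""
    values: list[str] = []
    current: list[str] = []
    depth = 0  # how many [[ ... ]] pairs are currently open
    i = 0
    while i < len(raw):
        pair = raw[i:i + 2]
        if pair == "[[":
            depth += 1
            current.append(pair)
            i += 2
        elif pair == "]]" and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif raw[i] == "," and depth == 0:
            # A top-level comma ends the current value.
            values.append("".join(current).strip())
            current = []
            i += 1
        else:
            current.append(raw[i])
            i += 1
    values.append("".join(current).strip())
    return [value for value in values if value]
```

For example, `split_property_values("[[Smith, John]], project")` keeps `[[Smith, John]]` as one token instead of splitting it at the inner comma.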
|
|
201
203
|
|
|
204
|
+
#### Asset extraction and path resolution
|
|
205
|
+
|
|
206
|
+
During node build, **`_extract_assets`** scans block **`content`** for multimodal references:
|
|
207
|
+
|
|
208
|
+
| Source | Example | `LogseqNode.assets` entry |
|
|
209
|
+
| ------ | ------- | --------------------------- |
|
|
210
|
+
| Markdown image | `![scan](../assets/scan.png)` | `../assets/scan.png` |
|
|
211
|
+
| PDF macro | `{{pdf mydoc.pdf}}` | `mydoc.pdf` |
|
|
212
|
+
| Local attachment link | `[spec](../assets/specs.pdf)` | `../assets/specs.pdf` (not hybrid `[[wikilink]]` links) |
|
|
213
|
+
|
|
214
|
+
**`LogseqPage.resolve_asset_path(asset_link)`** ([`logos_core.py`](../src/logseq_matryca_parser/logos_core.py)) decodes percent-encoded paths (`%20`), normalizes `../assets/` and `assets/` against the inferred **graph root**, and falls back to **`graph_root/assets/<filename>`** when needed — the contract Vision and document-ingestion pipelines use to map AST tokens to absolute filesystem paths.
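The resolution contract above can be approximated in a few lines. This is a sketch under stated assumptions, not the code in `logos_core.py`: `resolve_asset_path_sketch` is a hypothetical name, the graph root is passed in explicitly rather than inferred, and the fallback is applied whenever the link has no `assets` segment (the real method may also check the filesystem).

```python
from pathlib import Path
from urllib.parse import unquote


def resolve_asset_path_sketch(asset_link: str, graph_root: Path) -> Path:
    """Map an asset token from the AST to a path under the graph root."""
    decoded = unquote(asset_link)  # "my%20scan.png" -> "my scan.png"
    relative = Path(decoded)
    if relative.is_absolute():
        return relative
    parts = relative.parts
    if "assets" in parts:
        # "../assets/x.png" and "assets/x.png" both anchor at <root>/assets.
        tail = parts[parts.index("assets"):]
        return graph_root.joinpath(*tail)
    # Assumed fallback: a bare filename lives in <root>/assets/.
    return graph_root / "assets" / relative.name
```

A caller would typically check `path.exists()` on the result before handing it to a Vision or document-ingestion step.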
|
|
215
|
+
|
|
202
216
|
#### Sovereign UUID architecture and zero-corruption guarantee
|
|
203
217
|
|
|
204
218
|
One of the most critical aspects of parsing a Logseq graph for AI is maintaining the integrity of block references (`((uuid))`) without causing infinite loops or polluting a vector database with duplicates.
|
|
@@ -213,7 +227,7 @@ The Logos protocol uses a **dual-track identity system** so vanilla Logseq compa
|
|
|
213
227
|
|
|
214
228
|
- **Task priority (`PRIORITY_PATTERN`).** Priority tags match `\[#([A-Z])\]` (Logseq’s A/B/C style). On the **first line** of a block, a match sets **`LogseqNode.task_priority`** to the captured letter and **`PRIORITY_PATTERN.sub("", …)`** removes the marker from **`clean_text`** so priority is a **typed attribute**, not redundant surface noise in retrieval text.
|
|
215
229
|
|
|
216
|
-
- **Temporal markers (`TIME_PATTERN`) → epoch fields.** Lines matching `\b(SCHEDULED|DEADLINE):\s*(<[^>]+>)` are parsed by **`_extract_time_properties`**: the `<…>` payload is interpreted through **`_parse_logseq_datetime`** (multiple Logseq date formats), then normalized to **UTC Unix epoch seconds** and assigned to **`LogseqNode.scheduled_at`** and **`LogseqNode.deadline_at`** respectively. Auxiliary keys (`scheduled_iso`, `deadline_journal_day`, repeaters, etc.) may still land in **`properties`** for human/debug parity, but the **integer epoch pair on the node** is the stable contract for **temporal graph edges**, range filters, and **GraphRAG** planners—without forcing downstream graph databases to re-scan Markdown.
|
|
230
|
+
- **Temporal markers (`TIME_PATTERN`) → epoch fields.** Lines matching `\b(SCHEDULED|DEADLINE):\s*(<[^>]+>)` are parsed by **`_extract_time_properties`**: the `<…>` payload is interpreted through **`_parse_logseq_datetime`** (multiple Logseq date formats, **time ranges** `HH:MM - HH:MM` using the start time, **repeater** tokens such as `.+1w` / `++1d` stripped before parsing, and **Org warning periods** like `-3d` handled without datetime failures), then normalized to **UTC Unix epoch seconds** and assigned to **`LogseqNode.scheduled_at`** and **`LogseqNode.deadline_at`** respectively. Auxiliary keys (`scheduled_iso`, `deadline_journal_day`, repeaters, etc.) may still land in **`properties`** for human/debug parity, but the **integer epoch pair on the node** is the stable contract for **temporal graph edges**, range filters, and **GraphRAG** planners—without forcing downstream graph databases to re-scan Markdown.
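To make the temporal bullet above concrete, here is a minimal sketch of the normalization steps it lists: strip repeaters and warning periods, keep the start of a time range, then convert to epoch seconds. Only `TIME_PATTERN` is taken from the text; the other patterns and the name `extract_epochs` are hypothetical, only one date format is handled, and timestamps are assumed to be UTC wall-clock times, which may not match how the parser treats local time.

```python
import re
from datetime import datetime, timezone

TIME_PATTERN = re.compile(r"\b(SCHEDULED|DEADLINE):\s*(<[^>]+>)")
REPEATER = re.compile(r"(?<!\S)(?:\.\+|\+\+|\+)\d+[hdwmy](?!\S)")  # .+1w, ++1d, +2m
WARNING = re.compile(r"(?<!\S)-{1,2}\d+[hdwmy](?!\S)")             # -3d, --2d
TIME_RANGE = re.compile(r"(\d{1,2}:\d{2})\s*-\s*\d{1,2}:\d{2}")    # 09:00 - 10:30
WEEKDAY = re.compile(r"\b(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)\b")


def extract_epochs(line: str) -> dict[str, int]:
    """Return {'scheduled_at': ..., 'deadline_at': ...} for markers on a line."""
    result: dict[str, int] = {}
    for kind, stamp in TIME_PATTERN.findall(line):
        payload = stamp[1:-1]                      # drop the < > delimiters
        payload = REPEATER.sub("", payload)
        payload = WARNING.sub("", payload)
        payload = TIME_RANGE.sub(r"\1", payload)   # keep only the start time
        payload = WEEKDAY.sub("", payload)
        payload = " ".join(payload.split())        # collapse leftover spaces
        fmt = "%Y-%m-%d %H:%M" if ":" in payload else "%Y-%m-%d"
        moment = datetime.strptime(payload, fmt).replace(tzinfo=timezone.utc)
        result[f"{kind.lower()}_at"] = int(moment.timestamp())
    return result
```

Returning plain integers keeps the result directly usable for range filters, which is the contract the bullet describes for `scheduled_at` and `deadline_at`.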
|
|
217
231
|
|
|
218
232
|
#### Node anatomy — raw Markdown to temporal `LogseqNode`
|
|
219
233
|
|
|
@@ -293,6 +307,14 @@ After every bulk or incremental parse, the graph applies a **post-parse enrichme
|
|
|
293
307
|
3. **Alias injection.** For each canonical dict entry where **`dict_key == page.title`**, values from **`alias::`** and **`aliases::`** are normalized (comma-separated strings or Python lists; `[[Page]]` / `#tag` adornments stripped using the same rules as [`logseq_markdown.py`](../src/logseq_matryca_parser/logseq_markdown.py)) and registered as **additional keys** pointing at the **same `LogseqPage` instance** — e.g. `pages["Dev"]` and `pages["Development"]` share identity.
|
|
294
308
|
4. **Backlinks.** **`_build_backlink_registry`** walks **unique pages** (`id(page)` deduplication) so alias keys do not double-count outgoing links. Incoming wikilinks such as **`[[Dev]]`** normalize to lowercase registry keys and resolve through **`get_backlinks("Dev")`** like any other page title.
|
|
295
309
|
|
|
310
|
+
#### Case-insensitive page routing
|
|
311
|
+
|
|
312
|
+
**`LogseqGraph`** maintains **`_lower_title_map`**: canonical display titles keyed by lowercase form. **`get_page(title)`** returns a direct **`pages[title]`** hit when present, otherwise resolves via the lowercase index (Datomic / Logseq parity). **`resolve_relative_page_link`** uses the same lookup when testing namespace candidates. **`get_backlinks`** normalizes page-title targets case-insensitively.
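The lookup order described above (exact key first, lowercase index second) can be sketched with two dictionaries. `PageIndexSketch` and `add_page` are hypothetical names used only for illustration; `pages`, `_lower_title_map`, and `get_page` mirror the names in the text, but the real `LogseqGraph` does considerably more.

```python
from typing import Optional


class PageIndexSketch:
    """Toy index: exact-title dict plus a lowercase-to-canonical title map."""

    def __init__(self) -> None:
        self.pages: dict[str, object] = {}
        self._lower_title_map: dict[str, str] = {}

    def add_page(self, title: str, page: object) -> None:
        self.pages[title] = page
        # First writer wins, so the canonical display title stays stable.
        self._lower_title_map.setdefault(title.lower(), title)

    def get_page(self, title: str) -> Optional[object]:
        if title in self.pages:  # direct hit
            return self.pages[title]
        canonical = self._lower_title_map.get(title.lower())
        return self.pages.get(canonical) if canonical is not None else None
```

Keeping the exact-match check first means a correctly cased lookup never pays for the extra `lower()` call.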
|
|
313
|
+
|
|
314
|
+
#### Implicit graph tokens (`refs`)
|
|
315
|
+
|
|
316
|
+
**`LogseqPage.refs`** merges wikilinks and tags harvested from page-level properties (native `key::` frontmatter and **YAML `---` blocks** at parse time). **`page-tags::`** is treated like **`tags::`** for implicit injection. Block-level list-shaped **`tags::`** / **`page-tags::`** values also contribute tokens to **`LogseqNode.refs`** so list bullets and comma-separated strings stay aligned with Logseq’s graph indexing semantics.
|
|
317
|
+
|
|
296
318
|
**Incremental parity:** **`invalidate_and_reload_page`** drops **all** `pages` keys tied to the file’s `source_path` (not only the first alias hit), merges the freshly parsed page, re-runs **`_enrich_pages_index`**, then re-registers nodes and appends backlinks for the enriched instance. **`_page_title_for_source_path`** returns the canonical **`page.title`**, not an arbitrary alias key.
|
|
297
319
|
|
|
298
320
|
```python
|
|
@@ -388,23 +410,37 @@ This complements §3.4 **AGENT WRITER** (weekly append + headless splice) and §
|
|
|
388
410
|
|
|
389
411
|
Wave 12 established **surgical writes** (single-line splices); v1.0 completes the loop with **full page round-tripping**. [`logseq_markdown.py`](../src/logseq_matryca_parser/logseq_markdown.py) is the native serializer that projects a parsed **`LogseqPage`** back onto sovereign Spatial Markdown — the inverse of LOGOS ingestion.
|
|
390
412
|
|
|
413
|
+
#### Ingestion vs native serialization
|
|
414
|
+
|
|
415
|
+
**Read path:** **`StackMachineParser.parse_page_file`** reads files with **`encoding="utf-8-sig"`** so a UTF-8 BOM from Windows sync tools does not break the first bullet. Page metadata at file start may be either:
|
|
416
|
+
|
|
417
|
+
1. **Native Logseq frontmatter** — raw `key:: value` lines (no leading `- `), blank line, then bullets.
|
|
418
|
+
2. **YAML frontmatter** — `---` delimited block with `key: value` lines mapped into **`LogseqPage.properties`** (same lowercase key normalization).
|
|
419
|
+
|
|
420
|
+
**Write path:** [`serialize_logseq_page`](../src/logseq_matryca_parser/logseq_markdown.py) is **format-preserving** for page headers. If `page.raw_content` began with `---`, page properties re-emit as a YAML fence via [`_format_yaml_frontmatter`](../src/logseq_matryca_parser/logseq_markdown.py); otherwise [`format_logseq_page_properties`](../src/logseq_matryca_parser/logseq_markdown.py) writes native **`key:: value`** lines. At parse time, a non-empty **`title`** property (YAML or `title::`) sets **`LogseqPage.title`**; [`LogseqGraph`](../src/logseq_matryca_parser/graph.py) re-applies **`title::`** overrides when loading a vault directory.
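The format-preserving decision on the write path can be sketched as a single branch on how the source file began. This is an illustrative approximation, not `serialize_logseq_page` itself: `serialize_page_header` is a hypothetical name, YAML values are written without quoting or escaping, and flattening lists to comma-separated text inside the YAML fence is an assumption (the text only confirms that behaviour for native `key::` lines).

```python
def serialize_page_header(raw_content: str, properties: dict[str, object]) -> str:
    """Emit a YAML fence or native key:: lines, matching the source format."""

    def flat(value: object) -> str:
        # List values become comma-separated tokens.
        return ", ".join(map(str, value)) if isinstance(value, list) else str(value)

    if not properties:
        return ""
    # Ignore a leading BOM when checking for the YAML fence.
    if raw_content.lstrip("\ufeff").startswith("---"):
        body = "\n".join(f"{key}: {flat(value)}" for key, value in properties.items())
        return f"---\n{body}\n---\n\n"
    body = "\n".join(f"{key}:: {flat(value)}" for key, value in properties.items())
    return f"{body}\n\n"  # blank line before the first bullet
```

Branching on the original text rather than on a stored flag keeps the sketch stateless; a real serializer might instead record the header format once at parse time.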
|
|
421
|
+
|
|
422
|
+
**Obsidian FORGE** (§3.5) may emit YAML frontmatter on export even for pages that were natively `key::` on disk — that projection is separate from sovereign round-trip writes.
|
|
423
|
+
|
|
391
424
|
#### Page properties (file header)
|
|
392
425
|
|
|
393
|
-
|
|
426
|
+
- **Native:** [`format_logseq_page_properties`](../src/logseq_matryca_parser/logseq_markdown.py) renders `key:: value` lines (list-valued keys such as `tags::` flatten to comma-separated tokens), then a blank line before the first bullet.
|
|
427
|
+
- **YAML:** [`_format_yaml_frontmatter`](../src/logseq_matryca_parser/logseq_markdown.py) renders `key: value` lines inside `---` fences when the ingested file used YAML.
|
|
394
428
|
|
|
395
429
|
#### Block properties (strict indentation contract)
|
|
396
430
|
|
|
397
|
-
Block-scoped properties are serialized **immediately after the bullet text line
|
|
431
|
+
Block-scoped properties are serialized **immediately after the bullet text line** (and after `:LOGBOOK:` drawers when present), never interleaved with child bullets. The indent rule is strict and deterministic:
|
|
398
432
|
|
|
399
433
|
```text
|
|
400
434
|
{parent_leading_whitespace} {key}:: {value}
|
|
401
435
|
```
|
|
402
436
|
|
|
403
|
-
That is, take the **exact leading whitespace** of the parent bullet line and append **exactly two additional spaces** (`_block_property_indent`). Continuation lines of multiline block bodies use the same `parent + 2` column
|
|
437
|
+
That is, take the **exact leading whitespace** of the parent bullet line and append **exactly two additional spaces** (`_block_property_indent`). Continuation lines of multiline block bodies use the same `parent + 2` column; [`_serialize_logseq_node_lines`](../src/logseq_matryca_parser/logseq_markdown.py) strips redundant alignment prefix on soft-break lines so continuations do not double-indent on round-trip.
|
|
438
|
+
|
|
439
|
+
[`format_logseq_block_property_lines`](../src/logseq_matryca_parser/logseq_markdown.py) respects **`properties_order`** when present. **List-shaped** keys (`tags`, `alias`, `aliases`, `page-tags`) with list values emit `key::` plus indented `-` item lines via [`_format_block_property_list_lines`](../src/logseq_matryca_parser/logseq_markdown.py). **`:LOGBOOK:`** metadata re-emits as Org drawers via [`_format_logbook_drawer_lines`](../src/logseq_matryca_parser/logseq_markdown.py), not as `logbook::` property lines. Keys in **`_DERIVED_BLOCK_PROPERTY_KEYS`** (`scheduled`, `repeater`, `logbook`, etc.) are parsed into AST fields or drawers and are **not** written back as bogus `key::` lines.
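The indentation contract above (parent leading whitespace plus two spaces, with list-shaped keys emitted as `key::` followed by `-` items) can be sketched as follows. `block_property_lines` and `LIST_SHAPED_KEYS` are hypothetical names, and placing list items two further spaces in is an assumption; the text only says the items are indented.

```python
LIST_SHAPED_KEYS = {"tags", "alias", "aliases", "page-tags"}


def block_property_lines(bullet_line: str, properties: dict[str, object]) -> list[str]:
    """Render block properties under a bullet at parent indent + 2 spaces."""
    # Exact leading whitespace of the parent bullet (tabs or spaces).
    leading = bullet_line[: len(bullet_line) - len(bullet_line.lstrip())]
    indent = leading + "  "
    lines: list[str] = []
    for key, value in properties.items():
        if key in LIST_SHAPED_KEYS and isinstance(value, list):
            lines.append(f"{indent}{key}::")
            # Assumed layout: items sit two spaces deeper than the key line.
            lines.extend(f"{indent}  - {item}" for item in value)
        else:
            lines.append(f"{indent}{key}:: {value}")
    return lines
```

Because the indent is derived from the bullet line itself, the same function works for tab-indented and space-indented outlines.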
|
|
404
440
|
|
|
405
441
|
#### Full-page emission
|
|
406
442
|
|
|
407
|
-
[`serialize_logseq_page`](../src/logseq_matryca_parser/logseq_markdown.py) walks `page.root_nodes` depth-first, emitting `- {first_line}` bullets scaled by `indent_level × tab_size`, then
|
|
443
|
+
[`serialize_logseq_page`](../src/logseq_matryca_parser/logseq_markdown.py) walks `page.root_nodes` depth-first, emitting `- {first_line}` bullets scaled by `indent_level × tab_size`, then continuations, then drawers/properties, then children. [`write_logseq_page`](../src/logseq_matryca_parser/logseq_markdown.py) persists the result with UTF-8 encoding. Together with §3.4’s **`append_child_to_node`**, the stack now supports **point mutations** and **whole-page regeneration** from the same AST — bidirectional I/O without Logseq’s HTTP API.
|
|
408
444
|
|
|
409
445
|
### 3.9 Namespace & Path Translation
|
|
410
446
|
|
|
@@ -412,12 +448,13 @@ Semantic page titles and OS filesystem paths speak different dialects. [`logseq_
|
|
|
412
448
|
|
|
413
449
|
#### Title ↔ filename mapping
|
|
414
450
|
|
|
415
|
-
Logseq namespaces use **`/`** in titles (e.g. `Projects/AI`). On disk, each segment is flattened into a single filename stem with the **`___`** separator and percent-encoding for reserved characters — e.g. `Projects/AI` → `Projects___AI.md`. The inverse helpers **`filename_to_page_title`** and **`derive_page_title_from_source_path`** reconstruct semantic titles from `pages/` or `journals/` paths, including nested directory layouts when namespace segments are stored as folders.
|
|
451
|
+
Logseq namespaces use **`/`** in titles (e.g. `Projects/AI`). On disk, each segment is flattened into a single filename stem with the **`___`** separator (modern Logseq) and percent-encoding for reserved characters — e.g. `Projects/AI` → `Projects___AI.md`. The inverse helpers **`filename_to_page_title`** and **`derive_page_title_from_source_path`** reconstruct semantic titles from `pages/` or `journals/` paths, including nested directory layouts when namespace segments are stored as folders. **`filename_to_page_title`** also decodes **legacy** encodings: URL-encoded **`%2F`** namespace separators and Dendron-style **`.` → `/`** segment splits when reconstructing titles from flat stems.
|
|
416
452
|
|
|
417
453
|
| Direction | Function | Example |
|
|
418
454
|
| --------- | -------- | ------- |
|
|
419
455
|
| Title → stem | `page_title_to_filename` | `Projects/AI` → `Projects___AI` |
|
|
420
456
|
| Stem → title | `filename_to_page_title` | `Projects___AI` → `Projects/AI` |
|
|
457
|
+
| Legacy stem | `filename_to_page_title` | `Work%2FTasks` → `Work/Tasks`; Dendron `a.b.c` → `a/b/c` |
|
|
421
458
|
| Title → relative path | `page_title_to_relative_path` | `pages/Projects___AI.md` |
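The mappings in the table can be sketched with two small helpers. These are hypothetical stand-ins (`title_to_stem`, `stem_to_title`) for `page_title_to_filename` and `filename_to_page_title`, not their actual code: the forward direction omits percent-encoding of reserved characters, and the order in which legacy forms are tried is an assumption.

```python
from urllib.parse import unquote


def title_to_stem(title: str) -> str:
    """'Projects/AI' -> 'Projects___AI' (reserved-character encoding omitted)."""
    return "___".join(title.split("/"))


def stem_to_title(stem: str) -> str:
    """Rebuild a namespaced title from a modern or legacy filename stem."""
    if "___" in stem:
        # Modern Logseq: triple underscore separates namespace segments.
        return "/".join(unquote(part) for part in stem.split("___"))
    decoded = unquote(stem)
    if "/" in decoded:
        # Legacy: the separator was URL-encoded as %2F.
        return decoded
    if "." in decoded:
        # Dendron-style dotted hierarchy.
        return decoded.replace(".", "/")
    return decoded
```

Note that the Dendron branch is ambiguous by nature: a flat page whose title contains a literal dot would also be split, so a real implementation likely needs extra context to decide.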
|
|
422
459
|
|
|
423
460
|
#### Graph discovery filters
|
|
@@ -1,6 +1,11 @@
|
|
|
1
1
|
# Release process
|
|
2
2
|
|
|
3
|
-
**Logseq Matryca Parser** (The Logos Protocol · Marco Porcellato · [Matryca.ai](https://matryca.ai)) uses a **curated** [`CHANGELOG.md`](../CHANGELOG.md) (Keep a Changelog).
|
|
3
|
+
**Logseq Matryca Parser** (The Logos Protocol · Marco Porcellato · [Matryca.ai](https://matryca.ai)) uses a **curated** [`CHANGELOG.md`](../CHANGELOG.md) (Keep a Changelog). Pushing a `v*` git tag triggers **two** workflows:
|
|
4
|
+
|
|
5
|
+
| Workflow | Result |
|
|
6
|
+
|----------|--------|
|
|
7
|
+
| [`.github/workflows/pypi_publish.yml`](../.github/workflows/pypi_publish.yml) | Builds and publishes the package to **PyPI** (OIDC). |
|
|
8
|
+
| [`.github/workflows/github_release.yml`](../.github/workflows/github_release.yml) | Creates a **GitHub Release** with notes extracted from `CHANGELOG.md`. |
|
|
4
9
|
|
|
5
10
|
---
|
|
6
11
|
|
|
@@ -43,10 +48,21 @@ git push origin vX.Y.Z
|
|
|
43
48
|
|
|
44
49
|
### 4. CI does the rest
|
|
45
50
|
|
|
46
|
-
On tag push
|
|
51
|
+
On tag push:
|
|
52
|
+
|
|
53
|
+
1. **PyPI** — builds sdist/wheel and publishes (trusted publishing).
|
|
54
|
+
2. **GitHub Release** — publishes release notes from `scripts/extract_changelog.py`.
|
|
55
|
+
|
|
56
|
+
Verify both under **Actions** on GitHub (`Publish to PyPI` and `GitHub Release`).
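The contents of `scripts/extract_changelog.py` are not shown in this diff, so the following is only a sketch of how release notes for one tag could be pulled from a Keep a Changelog file. The function name `extract_release_notes` and the assumed heading shape (`## [X.Y.Z] - date`) are illustrative, not taken from the script.

```python
import re


def extract_release_notes(changelog: str, tag: str) -> str:
    """Return the body of the '## [X.Y.Z]' section matching a vX.Y.Z tag."""
    version = tag[1:] if tag.startswith("v") else tag
    heading = re.compile(r"^## \[?" + re.escape(version) + r"\]?(?:\s|$)")
    collected: list[str] = []
    inside = False
    for line in changelog.splitlines():
        if line.startswith("## "):
            if inside:
                break  # the next version heading ends the section
            inside = bool(heading.match(line))
            continue
        if inside:
            collected.append(line)
    return "\n".join(collected).strip()
```

An empty return value is a useful signal that the tag has no matching changelog entry, which is worth checking before tagging.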
|
|
57
|
+
|
|
58
|
+
#### Retroactive release (tag already pushed)
|
|
59
|
+
|
|
60
|
+
If the tag exists but no GitHub Release was created (e.g. before `github_release.yml` existed):
|
|
61
|
+
|
|
62
|
+
1. Open **Actions → GitHub Release → Run workflow**.
|
|
63
|
+
2. Enter the tag (e.g. `v1.1.1`) and run.
|
|
47
64
|
|
|
48
|
-
|
|
49
|
-
2. Publishes to PyPI (trusted publishing)
|
|
65
|
+
PyPI does not allow re-publishing an existing version; bump the patch version if the wheel upload failed.
|
|
50
66
|
|
|
51
67
|
---
|
|
52
68
|
|
|
@@ -54,6 +70,7 @@ On tag push, [`.github/workflows/pypi_publish.yml`](../.github/workflows/pypi_pu
|
|
|
54
70
|
|
|
55
71
|
| Problem | Fix |
|
|
56
72
|
|---------|-----|
|
|
73
|
+
| Tag on GitHub but no **Release** page | Run **GitHub Release** workflow manually (`workflow_dispatch`) with that tag. |
|
|
57
74
|
| PyPI version already exists | Bump patch version; never re-use a published version. |
|
|
58
75
|
| Notes look wrong | Re-run locally: `python scripts/extract_changelog.py vX.Y.Z` and compare to `CHANGELOG.md`. |
|
|
59
76
|
| CI fails on tests | Run `make all` locally before tagging. |
|
|
@@ -1,4 +1,6 @@
|
|
|
1
1
|
> **Note for Contributors:** This document is part of the original *Document-Driven Development* phase. It was generated as a structural blueprint during the initial scaffolding of the Logos Protocol. It is preserved here for historical context and to explain the initial design intent. For the current, active architecture, please refer to the main `ARCHITECTURE.md` at the root of the `/docs` folder.
|
|
2
|
+
>
|
|
3
|
+
> **Implementation (v1.2.0+):** Runtime asset harvesting and resolution live in `LogseqNode.assets`, `LogseqPage.resolve_asset_path`, and `_extract_assets` in `logos_parser.py` — see [Architecture §3.1 — Asset extraction](../ARCHITECTURE.md#asset-extraction-and-path-resolution).
|
|
2
4
|
|
|
3
5
|
# **LOGSEQ\_ASSET\_RESOLUTION\_SPEC.md**
|
|
4
6
|
|