quackspace 0.1.0__tar.gz

@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 Marco Cassar
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
@@ -0,0 +1,188 @@
1
+ Metadata-Version: 2.4
2
+ Name: quackspace
3
+ Version: 0.1.0
4
+ Summary: A DuckDB-backed knowledge layer over your local work that helps LLMs navigate everything
5
+ Keywords: llm,mcp,rag,knowledge-base,search,duckdb,obsidian,markdown,notes,retrieval,agent
6
+ Author: Marco Cassar
7
+ Author-email: Marco Cassar <marcocassar@mdcusa.net>
8
+ License-Expression: MIT
9
+ License-File: LICENSE
10
+ Classifier: Development Status :: 3 - Alpha
11
+ Classifier: Intended Audience :: Developers
12
+ Classifier: Programming Language :: Python :: 3
13
+ Classifier: Programming Language :: Python :: 3.12
14
+ Classifier: Programming Language :: Python :: 3.13
15
+ Classifier: Operating System :: POSIX
16
+ Classifier: Operating System :: MacOS
17
+ Classifier: Topic :: Software Development :: Libraries
18
+ Classifier: Topic :: Text Processing :: Indexing
19
+ Classifier: Environment :: Console
20
+ Requires-Dist: duckdb>=1.5.3
21
+ Requires-Dist: mcp[cli]>=1.27.2
22
+ Requires-Dist: python-frontmatter>=1.3.0
23
+ Requires-Dist: pyyaml>=6.0.3
24
+ Requires-Python: >=3.12
25
+ Project-URL: Homepage, https://github.com/Ocramaru/quackspace
26
+ Project-URL: Repository, https://github.com/Ocramaru/quackspace
27
+ Project-URL: Issues, https://github.com/Ocramaru/quackspace/issues
28
+ Description-Content-Type: text/markdown
29
+
30
+ # quack
31
+
32
+ A DuckDB-backed knowledge layer over your local work — any files: notes, docs,
33
+ code, configs, assets — that helps LLMs navigate everything. It generates a
34
+ cheap, precise meta layer from your files; you author metadata in one editable
35
+ place (`.index.yaml`), and everything else is derived. Plays well with Obsidian
36
+ but does not require it.
37
+
38
+ PyPI package: `quackspace` · command: `quack`
39
+
40
+ ## Install
41
+
42
+ ```bash
43
+ curl -fsSL https://raw.githubusercontent.com/Ocramaru/quackspace/main/install.sh | bash
44
+ ```
45
+
46
+ This installs `uv` (if needed) and the `quack` CLI globally. Then create a space
47
+ anywhere:
48
+
49
+ ```bash
50
+ quack init my-workspace # make & scaffold the folder (or `quack init` to use the current one)
51
+ cd my-workspace
52
+ quack mcp install # connect an LLM (Claude Code, Kiro, …) over MCP
53
+ ```
54
+
55
+ Prefer Python packaging? `uv tool install quackspace` (or `pipx install
56
+ quackspace`) does the same as the one-liner's second step.
57
+
58
+ ### Releasing
59
+
60
+ Releases publish to PyPI via GitHub Actions + Trusted Publishing (no stored
61
+ token). One-time: add a PyPI *pending publisher* (project `quackspace`, owner
62
+ `Ocramaru`, repo `quackspace`, workflow `publish.yml`, environment `pypi`) and a
63
+ GitHub environment named `pypi`. Then bump `version` in `pyproject.toml`, tag,
64
+ and publish a GitHub Release — the workflow builds and uploads. Manual fallback:
65
+ `uv build && uv publish --token <pypi-token>`.
66
+
67
+ ## Architecture
68
+
69
+ ```
70
+ <your-root>/                 ← any directory; quack finds it by the .quack/ marker
71
+ ├── .quack/                  ← the toolkit (and the root marker)
72
+ │   ├── GUIDE.md             hand-written: how an AI should search the tree
73
+ │   ├── map.yaml             GENERATED: folder tree + folder descriptions
74
+ │   ├── quack.duckdb         GENERATED: catalog (files, tags, links, FTS) — the queryable store
75
+ │   ├── diagram.md           GENERATED: whole-graph Mermaid link diagram
76
+ │   ├── config.yaml          your AI assistant choice
77
+ │   └── src/quack/           the CLI + library
78
+ ├── .quackignore             optional: extra ignore patterns
79
+ ├── QUACK.md                 visible navigation anchor for LLMs
80
+ ├── src/ docs/ notes/ …      ANY files: code, configs, markdown, assets
81
+ │   ├── .index.yaml          EDITABLE: per-file description + tags (links derived)
82
+ │   └── _diagrams.md         GENERATED: this folder's Mermaid link graph
83
+ └── …
84
+ ```
85
+
86
+ **One rule:** the only thing you edit is each folder's `.index.yaml`
87
+ (description + tags per file; Markdown may also use frontmatter). `quack reindex`
88
+ MERGES it — preserving your text — and regenerates every map, catalog, and
89
+ diagram from the files + `[[wikilinks]]`, so the navigation layer can never
90
+ drift. The root can be named anything; quack locates it by walking up for
91
+ `.quack/` (like git finds `.git`).
92
+
93
+ ## The catalog (DuckDB)
94
+
95
+ `quack reindex` builds `.quack/quack.duckdb`, a single embedded catalog of all
96
+ metadata: `files` (name, rel, folder, ext, title, description, tags_csv, n_links,
97
+ n_inbound, is_orphan, is_binary, file_modified, described_at, stale, body),
98
+ `tags(name, tag)`, and `links(src, dst, dst_exists)`, plus a BM25 full-text
99
+ index over name/description/body. (`stale` is true when a file changed after its
100
+ description was written — see `quack generate --stale`.)
101
+
102
+ It is the queryable store and **the graph lives here too**: the `links` table
103
+ is the edge list, and multi-hop traversal is a recursive CTE behind
104
+ `quack search` and `quack graph`. There is no separate graph.json: a query
105
+ pulls only the relevant slice into context instead of loading the whole graph,
106
+ which matters as the tree grows. The catalog is derived, never authoritative;
107
+ delete it and `quack reindex` rebuilds it from the files.
108
+
109
+ ```bash
110
+ quack sql "SELECT folder, count(*) FROM files GROUP BY folder"
111
+ quack sql "SELECT rel FROM files WHERE stale" # descriptions to refresh
112
+ quack sql "SELECT src, dst FROM links WHERE NOT dst_exists" # broken links
113
+ quack search "regex" --fts # BM25 ranking
114
+ quack graph path file-a file-b # shortest link path
115
+ ```
116
+
117
+ ## The `quack` command
118
+
119
+ Once installed (see [Install](#install)), `quack` is on your PATH and finds the
120
+ root from any directory inside it by walking up for `.quack/` — so commands work
121
+ no matter where you invoke them. (Developing on a checkout instead? `uv run
122
+ quack` inside `.quack/` is the equivalent.)
123
+
124
+ ```bash
125
+ quack init [dir] # create & scaffold a new space (dir, or the current folder)
126
+ quack reindex # regenerate everything (indexes, map, catalog, diagrams)
127
+ quack reindex --no-diagrams
128
+ quack diagram # diagrams only
129
+ quack doctor # check files + MCP registration
130
+ quack doctor --files # only files; --mcp only MCP; --strict to fail on issues
131
+ quack new "Title" -f projects -d "one-line description" -t tag,tag # new markdown note
132
+ quack describe PATH -d "…" -t tag,tag # record a description + tags for any file
133
+ quack setup # choose the AI assistant (arrow-key menu)
134
+ quack generate # AI: write description + tags for files missing one
135
+ quack generate --stale # also refresh descriptions whose file changed since they were written
136
+ quack search "terms" # auto-hybrid: structural + FTS + semantic + graph
137
+ quack search "terms" --fts # force DuckDB BM25 full-text ranking
138
+ quack search "terms" --semantic # force vss semantic ranking
139
+ quack embed # build semantic embeddings (DuckDB vss)
140
+ quack graph path|central|clusters # graph queries
141
+ quack sql "SELECT ..." # query the catalog directly
142
+ quack mcp install # register the MCP server with clients
143
+ quack where # show root / toolkit / command paths
144
+ ```
145
+
146
+ Root resolution: `--root` > walk up for `.quack/` > `$QUACK_ROOT` >
147
+ `$OBSIDIAN_VAULT` > the package location.
148
+
149
+ ## LLM access (MCP)
150
+
151
+ `quack mcp install` writes a project-root `.mcp.json` (the auto-discover
152
+ convention Claude Code and others pick up) and offers to register with installed
153
+ client CLIs (kiro-cli, claude). The server exposes typed tools: `map`, `search`,
154
+ `get_file`, `sql`, `graph_path`, `central`, `clusters` (read), plus `describe`
155
+ and `reindex` (write), each returning `root` so the LLM can join `root` +
156
+ relative path. `QUACK.md` at the root is a visible anchor telling any LLM how to
157
+ navigate even without MCP.
158
+
159
+ **Seeding quack on a repo an agent already knows.** Point the MCP server at a
160
+ codebase and ask the assistant to annotate it: for each relevant file it calls
161
+ `describe(path, description, tags)` (writing into `.index.yaml` — the file
162
+ itself is untouched), then `reindex()` once. No per-file model shell-out; the
163
+ agent records what it already understands, and the catalog becomes searchable.
164
+
165
+ ## AI is optional
166
+
167
+ The assistant is used for one thing: writing short descriptions + tags for your
168
+ files (`quack generate`) so the search index is rich. quack works fully without
169
+ it — you just author descriptions yourself by editing each folder's
170
+ `.index.yaml`.
171
+
172
+ - `quack setup` shows an arrow-key menu of assistants (kiro-cli, claude, a
173
+ custom command, or "use without AI"), probes which are installed, and writes
174
+ the choice to `.quack/config.yaml`. `quack init` is an alias that runs it.
175
+ - `quack generate` fills in missing descriptions using that command. If none is
176
+ set up, it explains what the AI is for and offers to run setup.
177
+ - Set `ai.skip: true` in `config.yaml` to use quack without AI permanently;
178
+ `generate` then stops offering to set one up.
179
+ - Swap assistants anytime by editing `ai.command` (use `{prompt}` for the
180
+ prompt, or omit it to pipe on stdin) or re-running `quack setup`.
181
+
182
+ ## Keeping it in sync
183
+
184
+ Run `quack reindex` after structural changes. To automate, wire it to one of:
185
+ - Obsidian **Shell Commands** plugin (run on save),
186
+ - a git **pre-commit** hook (`quack doctor --strict --files && quack reindex`),
187
+ - a file-watcher,
188
+ - `quack kiro install` (writes a Kiro reindex-on-save hook).
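
Reviewer note: the root-resolution order stated in the README above (`--root` > walk up for `.quack/` > `$QUACK_ROOT` > `$OBSIDIAN_VAULT` > the package location) can be sketched as follows. This is a minimal illustration and not code from the package: `walk_up` and `resolve_root` are hypothetical names, and the package-location fallback is passed in as an argument because the diff does not show how it is computed.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

MARKER = ".quack"  # the directory the README names as the root marker


def walk_up(start: Path) -> Path | None:
    """Return the nearest directory at or above `start` that contains MARKER."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / MARKER).is_dir():
            return candidate
    return None


def resolve_root(
    explicit_root: str | None = None,
    cwd: Path | None = None,
    env: Mapping[str, str] | None = None,
    package_dir: Path | None = None,
) -> Path | None:
    """Hypothetical helper mirroring the documented precedence order."""
    env = os.environ if env is None else env
    if explicit_root:  # 1. --root always wins
        return Path(explicit_root).resolve()
    found = walk_up(cwd or Path.cwd())  # 2. walk up for .quack/
    if found is not None:
        return found
    for var in ("QUACK_ROOT", "OBSIDIAN_VAULT"):  # 3 and 4. environment variables
        if env.get(var):
            return Path(env[var]).resolve()
    return package_dir  # 5. package location (assumed to be supplied by the caller)
```

Checking the marker before the environment variables is what lets commands work from any subdirectory of a space, even when `$QUACK_ROOT` points somewhere else.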
@@ -0,0 +1,159 @@
1
+ # quack
2
+
3
+ A DuckDB-backed knowledge layer over your local work — any files: notes, docs,
4
+ code, configs, assets — that helps LLMs navigate everything. It generates a
5
+ cheap, precise meta layer from your files; you author metadata in one editable
6
+ place (`.index.yaml`), and everything else is derived. Plays well with Obsidian
7
+ but does not require it.
8
+
9
+ PyPI package: `quackspace` · command: `quack`
10
+
11
+ ## Install
12
+
13
+ ```bash
14
+ curl -fsSL https://raw.githubusercontent.com/Ocramaru/quackspace/main/install.sh | bash
15
+ ```
16
+
17
+ This installs `uv` (if needed) and the `quack` CLI globally. Then create a space
18
+ anywhere:
19
+
20
+ ```bash
21
+ quack init my-workspace # make & scaffold the folder (or `quack init` to use the current one)
22
+ cd my-workspace
23
+ quack mcp install # connect an LLM (Claude Code, Kiro, …) over MCP
24
+ ```
25
+
26
+ Prefer Python packaging? `uv tool install quackspace` (or `pipx install
27
+ quackspace`) does the same as the one-liner's second step.
28
+
29
+ ### Releasing
30
+
31
+ Releases publish to PyPI via GitHub Actions + Trusted Publishing (no stored
32
+ token). One-time: add a PyPI *pending publisher* (project `quackspace`, owner
33
+ `Ocramaru`, repo `quackspace`, workflow `publish.yml`, environment `pypi`) and a
34
+ GitHub environment named `pypi`. Then bump `version` in `pyproject.toml`, tag,
35
+ and publish a GitHub Release — the workflow builds and uploads. Manual fallback:
36
+ `uv build && uv publish --token <pypi-token>`.
37
+
38
+ ## Architecture
39
+
40
+ ```
41
+ <your-root>/                 ← any directory; quack finds it by the .quack/ marker
42
+ ├── .quack/                  ← the toolkit (and the root marker)
43
+ │   ├── GUIDE.md             hand-written: how an AI should search the tree
44
+ │   ├── map.yaml             GENERATED: folder tree + folder descriptions
45
+ │   ├── quack.duckdb         GENERATED: catalog (files, tags, links, FTS) — the queryable store
46
+ │   ├── diagram.md           GENERATED: whole-graph Mermaid link diagram
47
+ │   ├── config.yaml          your AI assistant choice
48
+ │   └── src/quack/           the CLI + library
49
+ ├── .quackignore             optional: extra ignore patterns
50
+ ├── QUACK.md                 visible navigation anchor for LLMs
51
+ ├── src/ docs/ notes/ …      ANY files: code, configs, markdown, assets
52
+ │   ├── .index.yaml          EDITABLE: per-file description + tags (links derived)
53
+ │   └── _diagrams.md         GENERATED: this folder's Mermaid link graph
54
+ └── …
55
+ ```
56
+
57
+ **One rule:** the only thing you edit is each folder's `.index.yaml`
58
+ (description + tags per file; Markdown may also use frontmatter). `quack reindex`
59
+ MERGES it — preserving your text — and regenerates every map, catalog, and
60
+ diagram from the files + `[[wikilinks]]`, so the navigation layer can never
61
+ drift. The root can be named anything; quack locates it by walking up for
62
+ `.quack/` (like git finds `.git`).
63
+
64
+ ## The catalog (DuckDB)
65
+
66
+ `quack reindex` builds `.quack/quack.duckdb`, a single embedded catalog of all
67
+ metadata: `files` (name, rel, folder, ext, title, description, tags_csv, n_links,
68
+ n_inbound, is_orphan, is_binary, file_modified, described_at, stale, body),
69
+ `tags(name, tag)`, and `links(src, dst, dst_exists)`, plus a BM25 full-text
70
+ index over name/description/body. (`stale` is true when a file changed after its
71
+ description was written — see `quack generate --stale`.)
72
+
73
+ It is the queryable store and **the graph lives here too**: the `links` table
74
+ is the edge list, and multi-hop traversal is a recursive CTE behind
75
+ `quack search` and `quack graph`. There is no separate graph.json: a query
76
+ pulls only the relevant slice into context instead of loading the whole graph,
77
+ which matters as the tree grows. The catalog is derived, never authoritative;
78
+ delete it and `quack reindex` rebuilds it from the files.
79
+
80
+ ```bash
81
+ quack sql "SELECT folder, count(*) FROM files GROUP BY folder"
82
+ quack sql "SELECT rel FROM files WHERE stale" # descriptions to refresh
83
+ quack sql "SELECT src, dst FROM links WHERE NOT dst_exists" # broken links
84
+ quack search "regex" --fts # BM25 ranking
85
+ quack graph path file-a file-b # shortest link path
86
+ ```
87
+
88
+ ## The `quack` command
89
+
90
+ Once installed (see [Install](#install)), `quack` is on your PATH and finds the
91
+ root from any directory inside it by walking up for `.quack/` — so commands work
92
+ no matter where you invoke them. (Developing on a checkout instead? `uv run
93
+ quack` inside `.quack/` is the equivalent.)
94
+
95
+ ```bash
96
+ quack init [dir] # create & scaffold a new space (dir, or the current folder)
97
+ quack reindex # regenerate everything (indexes, map, catalog, diagrams)
98
+ quack reindex --no-diagrams
99
+ quack diagram # diagrams only
100
+ quack doctor # check files + MCP registration
101
+ quack doctor --files # only files; --mcp only MCP; --strict to fail on issues
102
+ quack new "Title" -f projects -d "one-line description" -t tag,tag # new markdown note
103
+ quack describe PATH -d "…" -t tag,tag # record a description + tags for any file
104
+ quack setup # choose the AI assistant (arrow-key menu)
105
+ quack generate # AI: write description + tags for files missing one
106
+ quack generate --stale # also refresh descriptions whose file changed since they were written
107
+ quack search "terms" # auto-hybrid: structural + FTS + semantic + graph
108
+ quack search "terms" --fts # force DuckDB BM25 full-text ranking
109
+ quack search "terms" --semantic # force vss semantic ranking
110
+ quack embed # build semantic embeddings (DuckDB vss)
111
+ quack graph path|central|clusters # graph queries
112
+ quack sql "SELECT ..." # query the catalog directly
113
+ quack mcp install # register the MCP server with clients
114
+ quack where # show root / toolkit / command paths
115
+ ```
116
+
117
+ Root resolution: `--root` > walk up for `.quack/` > `$QUACK_ROOT` >
118
+ `$OBSIDIAN_VAULT` > the package location.
119
+
120
+ ## LLM access (MCP)
121
+
122
+ `quack mcp install` writes a project-root `.mcp.json` (the auto-discover
123
+ convention Claude Code and others pick up) and offers to register with installed
124
+ client CLIs (kiro-cli, claude). The server exposes typed tools: `map`, `search`,
125
+ `get_file`, `sql`, `graph_path`, `central`, `clusters` (read), plus `describe`
126
+ and `reindex` (write), each returning `root` so the LLM can join `root` +
127
+ relative path. `QUACK.md` at the root is a visible anchor telling any LLM how to
128
+ navigate even without MCP.
129
+
130
+ **Seeding quack on a repo an agent already knows.** Point the MCP server at a
131
+ codebase and ask the assistant to annotate it: for each relevant file it calls
132
+ `describe(path, description, tags)` (writing into `.index.yaml` — the file
133
+ itself is untouched), then `reindex()` once. No per-file model shell-out; the
134
+ agent records what it already understands, and the catalog becomes searchable.
135
+
136
+ ## AI is optional
137
+
138
+ The assistant is used for one thing: writing short descriptions + tags for your
139
+ files (`quack generate`) so the search index is rich. quack works fully without
140
+ it — you just author descriptions yourself by editing each folder's
141
+ `.index.yaml`.
142
+
143
+ - `quack setup` shows an arrow-key menu of assistants (kiro-cli, claude, a
144
+ custom command, or "use without AI"), probes which are installed, and writes
145
+ the choice to `.quack/config.yaml`. `quack init` is an alias that runs it.
146
+ - `quack generate` fills in missing descriptions using that command. If none is
147
+ set up, it explains what the AI is for and offers to run setup.
148
+ - Set `ai.skip: true` in `config.yaml` to use quack without AI permanently;
149
+ `generate` then stops offering to set one up.
150
+ - Swap assistants anytime by editing `ai.command` (use `{prompt}` for the
151
+ prompt, or omit it to pipe on stdin) or re-running `quack setup`.
152
+
153
+ ## Keeping it in sync
154
+
155
+ Run `quack reindex` after structural changes. To automate, wire it to one of:
156
+ - Obsidian **Shell Commands** plugin (run on save),
157
+ - a git **pre-commit** hook (`quack doctor --strict --files && quack reindex`),
158
+ - a file-watcher,
159
+ - `quack kiro install` (writes a Kiro reindex-on-save hook).
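
Reviewer note: the `ai.command` convention from the "AI is optional" section above (a `{prompt}` placeholder, or the prompt piped on stdin when the placeholder is omitted) could be handled roughly as below. This is a sketch under assumptions: `build_invocation` is a hypothetical helper, splitting the command with `shlex` is my choice, and the diff does not show how quack actually parses the setting.

```python
from __future__ import annotations

import shlex

PLACEHOLDER = "{prompt}"


def build_invocation(command: str, prompt: str) -> tuple[list[str], str | None]:
    """Return (argv, stdin_text) for a configured assistant command.

    Hypothetical helper: when the command contains `{prompt}`, the prompt is
    substituted into the argument list and nothing is piped; otherwise the
    prompt is returned as the text to send on stdin.
    """
    argv = shlex.split(command)
    if any(PLACEHOLDER in arg for arg in argv):
        # Substitute after splitting so a prompt with spaces stays one argument.
        return [arg.replace(PLACEHOLDER, prompt) for arg in argv], None
    return argv, prompt
```

The result could then be passed to `subprocess.run(argv, input=stdin_text, text=True, capture_output=True)`. Substituting after the split avoids shell quoting problems when the prompt contains spaces or quotes.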
@@ -0,0 +1,49 @@
1
+ [project]
2
+ name = "quackspace"
3
+ version = "0.1.0"
4
+ description = "A DuckDB-backed knowledge layer over your local work that helps LLMs navigate everything"
5
+ readme = "README.md"
6
+ authors = [
7
+ { name = "Marco Cassar", email = "marcocassar@mdcusa.net" }
8
+ ]
9
+ license = "MIT"
10
+ license-files = ["LICENSE"]
11
+ keywords = [
12
+ "llm", "mcp", "rag", "knowledge-base", "search", "duckdb",
13
+ "obsidian", "markdown", "notes", "retrieval", "agent",
14
+ ]
15
+ classifiers = [
16
+ "Development Status :: 3 - Alpha",
17
+ "Intended Audience :: Developers",
18
+ "Programming Language :: Python :: 3",
19
+ "Programming Language :: Python :: 3.12",
20
+ "Programming Language :: Python :: 3.13",
21
+ "Operating System :: POSIX",
22
+ "Operating System :: MacOS",
23
+ "Topic :: Software Development :: Libraries",
24
+ "Topic :: Text Processing :: Indexing",
25
+ "Environment :: Console",
26
+ ]
27
+ requires-python = ">=3.12"
28
+ dependencies = [
29
+ "duckdb>=1.5.3",
30
+ "mcp[cli]>=1.27.2",
31
+ "python-frontmatter>=1.3.0",
32
+ "pyyaml>=6.0.3",
33
+ ]
34
+
35
+ [project.urls]
36
+ Homepage = "https://github.com/Ocramaru/quackspace"
37
+ Repository = "https://github.com/Ocramaru/quackspace"
38
+ Issues = "https://github.com/Ocramaru/quackspace/issues"
39
+
40
+ [project.scripts]
41
+ quack = "quack.cli:main"
42
+ quack-mcp = "quack.mcp_server:main"
43
+
44
+ [build-system]
45
+ requires = ["uv_build>=0.11.15,<0.12.0"]
46
+ build-backend = "uv_build"
47
+
48
+ [tool.uv.build-backend]
49
+ module-name = "quack"
@@ -0,0 +1,3 @@
1
+ """quack, a knowledge layer over your local work that helps LLMs navigate it."""
2
+
3
+ __version__ = "0.1.0"
@@ -0,0 +1,217 @@
1
+ """The meta collection: one DuckDB catalog of all file metadata.
2
+
3
+ `quack reindex` rebuilds `.quack/quack.duckdb` from the files (+ the editable
4
+ .index.yaml store). It is a derived artifact, never the source of truth, so it
5
+ can be deleted and regenerated at any time. DuckDB is embedded (no server) and
6
+ gives real SQL plus BM25 full-text search over everything, the fast metadata
7
+ search `ls` can't do.
8
+
9
+ Schema:
10
+   files(name, rel, folder, ext, title, description, tags_csv, n_links,
11
+         n_inbound, is_orphan, is_binary, file_modified, described_at, stale,
12
+         body)
13
+   tags(name, tag)              -- one row per (file, tag)
14
+   links(src, dst, dst_exists)  -- one row per wikilink edge
15
+ A DuckDB FTS index is built over files(name, description, body) for `match_bm25`.
16
+ `stale` is true when the file changed after its description was written.
17
+ """
18
+
19
+ from __future__ import annotations
20
+
21
+ from collections import defaultdict
22
+ from pathlib import Path
23
+
24
+ import duckdb
25
+
26
+ from .core import Space
27
+
28
+ DB_NAME = "quack.duckdb"
29
+
30
+
31
+ def db_path(space: Space) -> Path:
32
+     return space.root / ".quack" / DB_NAME
33
+
34
+
35
+ def build(space: Space) -> dict:
36
+     """Rebuild the catalog from scratch over the loaded space. Returns a
37
+     summary. The space already carries effective metadata (authored .index.yaml
38
+     overlaid on each file)."""
39
+     path = db_path(space)
40
+     if path.exists():
41
+         path.unlink()  # rebuild clean; the files + .index.yaml are the truth
42
+
43
+     names = set(space.by_name)
44
+     inbound: dict[str, int] = defaultdict(int)
45
+     for e in space.entries:
46
+         for target in e.links:
47
+             if target in names:
48
+                 inbound[target] += 1
49
+
50
+     con = duckdb.connect(str(path))
51
+     try:
52
+         _create_schema(con)
53
+         for e in space.entries:
54
+             con.execute(
55
+                 "INSERT INTO files VALUES "
56
+                 "(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
57
+                 [
58
+                     e.name,
59
+                     e.rel,
60
+                     e.folder,
61
+                     e.ext,
62
+                     e.name,
63
+                     e.description,
64
+                     ",".join(e.tags),
65
+                     len(e.links),
66
+                     inbound.get(e.name, 0),
67
+                     inbound.get(e.name, 0) == 0 and len(e.links) == 0,
68
+                     e.is_binary,
69
+                     e.modified,
70
+                     e.described_at,
71
+                     e.stale,
72
+                     e.body,
73
+                 ],
74
+             )
75
+             for tag in e.tags:
76
+                 con.execute("INSERT INTO tags VALUES (?, ?)", [e.name, tag])
77
+             for dst in e.links:
78
+                 con.execute(
79
+                     "INSERT INTO links VALUES (?, ?, ?)",
80
+                     [e.name, dst, dst in names],
81
+                 )
82
+         _build_fts(con)
83
+         n_files = con.execute("SELECT count(*) FROM files").fetchone()[0]
84
+         n_tags = con.execute("SELECT count(*) FROM tags").fetchone()[0]
85
+         n_links = con.execute("SELECT count(*) FROM links").fetchone()[0]
86
+     finally:
87
+         con.close()
88
+
89
+     return {"db": str(path), "files": n_files, "tags": n_tags, "links": n_links}
90
+
91
+
92
+ def _create_schema(con: duckdb.DuckDBPyConnection) -> None:
93
+     con.execute(
94
+         """
95
+         CREATE TABLE files (
96
+             name VARCHAR,
97
+             rel VARCHAR,
98
+             folder VARCHAR,
99
+             ext VARCHAR,
100
+             title VARCHAR,
101
+             description VARCHAR,
102
+             tags_csv VARCHAR,
103
+             n_links INTEGER,
104
+             n_inbound INTEGER,
105
+             is_orphan BOOLEAN,
106
+             is_binary BOOLEAN,
107
+             file_modified VARCHAR,
108
+             described_at VARCHAR,
109
+             stale BOOLEAN,
110
+             body VARCHAR
111
+         );
112
+         CREATE TABLE tags (name VARCHAR, tag VARCHAR);
113
+         CREATE TABLE links (src VARCHAR, dst VARCHAR, dst_exists BOOLEAN);
114
+         """
115
+     )
116
+
117
+
118
+ def _build_fts(con: duckdb.DuckDBPyConnection) -> None:
119
+     """Create the BM25 full-text index over the searchable note fields."""
120
+     con.execute("INSTALL fts; LOAD fts;")
121
+     con.execute(
122
+         "PRAGMA create_fts_index('files', 'name', 'name', 'description', 'body', "
123
+         "overwrite=1);"
124
+     )
125
+
126
+
127
+ def connect(explicit_root: str | None = None) -> duckdb.DuckDBPyConnection:
128
+     """Open the catalog read-only for querying. Caller closes it."""
129
+     space = Space.load(explicit_root)
130
+     path = db_path(space)
131
+     if not path.exists():
132
+         raise RuntimeError(
133
+             f"No catalog at {path}. Run `quack reindex` to build it."
134
+         )
135
+     return duckdb.connect(str(path), read_only=True)
136
+
137
+
138
+ def query(sql: str, explicit_root: str | None = None) -> tuple[list[str], list[tuple]]:
139
+     """Run a SQL query against the catalog. Returns (column_names, rows)."""
140
+     con = connect(explicit_root)
141
+     try:
142
+         cur = con.execute(sql)
143
+         cols = [d[0] for d in cur.description] if cur.description else []
144
+         return cols, cur.fetchall()
145
+     finally:
146
+         con.close()
147
+
148
+
149
+ def neighbours(
150
+     names: list[str], explicit_root: str | None = None, hops: int = 1
151
+ ) -> list[tuple[str, str, int, str]]:
152
+     """Graph traversal in SQL: notes within `hops` of any seed name, in either
153
+     link direction. Returns [(name, rel, distance, via_seed), ...], excluding
154
+     the seeds, where via_seed is one seed that reaches the note at min distance.
155
+
156
+     Uses a recursive CTE so only the relevant subgraph is materialized, the
157
+     whole point of keeping the graph in DuckDB instead of a flat file.
158
+     """
159
+     if not names:
160
+         return []
161
+     con = connect(explicit_root)
162
+     try:
163
+         placeholders = ",".join("?" for _ in names)
164
+         rows = con.execute(
165
+             f"""
166
+             WITH RECURSIVE
167
+             -- undirected edge view over existing notes only
168
+             edge(a, b) AS (
169
+                 SELECT src, dst FROM links WHERE dst_exists
170
+                 UNION ALL
171
+                 SELECT dst, src FROM links WHERE dst_exists
172
+             ),
173
+             walk(name, dist, seed) AS (
174
+                 SELECT name, 0, name FROM files WHERE name IN ({placeholders})
175
+                 UNION
176
+                 SELECT e.b, w.dist + 1, w.seed
177
+                 FROM walk w JOIN edge e ON e.a = w.name
178
+                 WHERE w.dist < ?
179
+             ),
180
+             ranked AS (
181
+                 SELECT w.name, n.rel, w.dist, w.seed,
182
+                        row_number() OVER (PARTITION BY w.name ORDER BY w.dist) AS rn
183
+                 FROM walk w JOIN files n ON n.name = w.name
184
+                 WHERE w.dist > 0
185
+                   AND w.name NOT IN ({placeholders})  -- a seed is not its own neighbour
186
+             )
187
+             SELECT name, rel, dist, seed FROM ranked WHERE rn = 1
188
+             ORDER BY dist, name
189
+             """,
190
+             [*names, hops, *names],
191
+         ).fetchall()
192
+         return rows
193
+     finally:
194
+         con.close()
195
+
196
+
197
+ def fts_search(
198
+     terms: str, explicit_root: str | None = None, limit: int = 10
199
+ ) -> list[tuple[str, str, float]]:
200
+     """BM25 full-text search. Returns [(rel, description, score), ...]."""
201
+     con = connect(explicit_root)
202
+     try:
203
+         rows = con.execute(
204
+             """
205
+             SELECT rel, description, score FROM (
206
+                 SELECT rel, description,
207
+                        fts_main_files.match_bm25(name, ?) AS score
208
+                 FROM files
209
+             ) WHERE score IS NOT NULL
210
+             ORDER BY score DESC
211
+             LIMIT ?
212
+             """,
213
+             [terms, limit],
214
+         ).fetchall()
215
+         return rows
216
+     finally:
217
+         con.close()
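
Reviewer note: for readers less used to recursive CTEs, the traversal in `neighbours` above behaves like a hop-bounded, multi-source breadth-first search over an undirected view of the `links` edges. The pure-Python sketch below illustrates that logic on an in-memory edge list. It is not part of the package: `neighbours_bfs` is a hypothetical name, it assumes every edge endpoint and seed exists in `files`, it omits the `rel` column, and it breaks `via_seed` ties by seed order, whereas the SQL leaves ties unspecified.

```python
from __future__ import annotations

from collections import defaultdict, deque


def neighbours_bfs(
    edges: list[tuple[str, str]], seeds: list[str], hops: int = 1
) -> list[tuple[str, int, str]]:
    """Return [(name, distance, via_seed), ...] for notes within `hops` of a seed."""
    adjacent: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:  # undirected: follow links in both directions
        adjacent[src].add(dst)
        adjacent[dst].add(src)

    # Multi-source BFS: each seed starts at distance 0 and labels what it reaches first.
    best: dict[str, tuple[int, str]] = {s: (0, s) for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        dist, seed = best[node]
        if dist == hops:  # same bound as `WHERE w.dist < ?` in the CTE
            continue
        for nxt in sorted(adjacent[node]):
            if nxt not in best:
                best[nxt] = (dist + 1, seed)
                queue.append(nxt)

    seed_set = set(seeds)  # a seed is not its own neighbour
    found = [(n, d, s) for n, (d, s) in best.items() if n not in seed_set]
    return sorted(found, key=lambda row: (row[1], row[0]))


edges = [("a", "b"), ("b", "c"), ("d", "a")]
print(neighbours_bfs(edges, ["a"], hops=2))
```

The SQL version does the same work inside DuckDB, so only the rows within `hops` of the seeds are returned to Python.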