@astrofoundry/grimoire 1.2.4 → 1.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +9 -122
- package/package.json +1 -1
package/README.md
@@ -1,148 +1,35 @@
 # grimoire
 
-Documentation
+Documentation search powered by vector embeddings.
 
-##
+## Install
 
 ```bash
 npm install -g @astrofoundry/grimoire
 grimoire init
-# Enter API URL and API key (provided by admin)
-grimoire search "how to query firestore"
 ```
 
-##
+## Usage
 
 ```bash
-
-
-
-
-
-### Firebase / GCP
-
-```bash
-# Authenticate for Firestore access (grimoire-docs project)
-gcloud auth application-default login --project=grimoire-docs
-```
-
-### Vector indexes (one-time, before first search)
-
-```bash
-gcloud firestore indexes composite create \
-  --collection-group=grimoire_chunks \
-  --query-scope=COLLECTION \
-  --field-config='field-path=embedding,vector-config={"dimension":"768","flat":{}}' \
-  --database="(default)" \
-  --project=grimoire-docs
-
-gcloud firestore indexes composite create \
-  --collection-group=grimoire_chunks \
-  --query-scope=COLLECTION \
-  --field-config='field-path=source,order=ASCENDING' \
-  --field-config='field-path=embedding,vector-config={"dimension":"768","flat":{}}' \
-  --database="(default)" \
-  --project=grimoire-docs
+grimoire search "<query>"
+grimoire search "<query>" --source <name>
+grimoire list
+grimoire stats
 ```
 
-##
+## Admin
 
 ```bash
-
-grimoire add <name> --url <start_url>
-
-# Refresh a source (scrape → convert → chunk → embed → store)
+grimoire add <name> --url <url>
 grimoire refresh <source>
-
-# Full refresh (purge all data, re-scrape everything)
 grimoire refresh <source> --full
-
-# Re-run from cached HTML (skip scraping)
 grimoire refresh <source> --from-raw
-
-# Re-store from cached embeddings (skip scraping + embedding)
 grimoire refresh <source> --from-store
-
-# Override concurrency (default: 10)
 grimoire refresh <source> --concurrency 20
-
-# Refresh all sources
 grimoire refresh --all
-
-# Search across all sources
-grimoire search "<query>"
-
-# Search within a specific source
-grimoire search "<query>" --source <name>
-
-# List all configured sources
-grimoire list
-
-# Show statistics
-grimoire stats
-
-# Export source as JSON
 grimoire export <source>
-
-# API key management (admin only)
 grimoire apikey create <name>
 grimoire apikey list
 grimoire apikey revoke <name>
 ```
-
-## Configuration
-
-Sources are defined in `config/sources.yaml`. Each source needs site-specific cleanup config.
-
-```yaml
-sources:
-  my-source:
-    name: My Docs # Display name
-    start_url: https://example.com/docs
-    nav_selector: nav # CSS selector for navigation element
-    content_selector: article # CSS selector for main content
-    include_patterns: # URL patterns to include
-      - /docs
-    exclude_patterns: # URL patterns to exclude (optional)
-      - /docs/legacy
-    remove_selectors: # CSS selectors to strip from content (site-specific)
-      - footer
-      - nav
-      - .sidebar
-    remove_text_patterns: # Regex patterns to strip from markdown (site-specific)
-      - "^Cookie notice.*$"
-    concurrency: 10 # Parallel browser tabs (default: 10)
-    rate_limit_ms: 1000 # Delay between requests (optional)
-```
-
-The converter only strips `style`, `script`, `noscript`, `iframe`, `svg` by default. All other cleanup (nav, footer, banners, site-specific UI elements) must be configured per source via `remove_selectors` and `remove_text_patterns`.
-
-See `config/sources.yaml` for the Firebase Firestore example with full cleanup config.
-
-## Environment Variables
-
-Set in `.env` at project root (auto-loaded by CLI):
-
-```bash
-GOOGLE_CLOUD_PROJECT=grimoire-docs # Firebase/GCP project ID
-GEMINI_API_KEY=... # Google Gemini API key
-RERANKER_URL=... # llama-cpp reranker endpoint
-```
-
-## Releasing
-
-```bash
-pnpm release:patch # bump, commit, tag, push → GH Actions deploys functions + publishes npm
-pnpm release:minor
-pnpm release:major
-```
-
-## Development
-
-```bash
-pnpm test # Run tests
-pnpm lint # ESLint
-pnpm check # Typecheck + lint + test
-pnpm build # Compile TypeScript
-pnpm build:watch # Watch mode
-```
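The 1.2.4 README's Configuration section (dropped in 1.3.0) describes a per-source schema for `config/sources.yaml`. It states that the converter strips only `style`, `script`, `noscript`, `iframe` and `svg` by default, and that everything else is cleaned per source via `remove_selectors` and `remove_text_patterns`. The sketch below restates that schema as a TypeScript type and shows one plausible reading of how the fields combine. It is not grimoire's actual code. `SourceConfig`, `shouldCrawl`, `stripTextPatterns`, `selectorsToRemove` and `effectiveConcurrency` are hypothetical names. Three behaviours are also assumptions: substring matching for URL patterns, multiline (`gm`) regex flags, and a CLI `--concurrency` value taking precedence over the per-source setting.

```typescript
// Hypothetical TypeScript view of one entry under `sources:` in
// config/sources.yaml, with keys copied from the 1.2.4 README example.
// Illustrative sketch only; this is NOT grimoire's actual type.
interface SourceConfig {
  name: string; // display name
  start_url: string;
  nav_selector: string; // CSS selector for the navigation element
  content_selector: string; // CSS selector for the main content
  include_patterns: string[]; // URL patterns to include
  exclude_patterns?: string[]; // URL patterns to exclude (optional)
  remove_selectors: string[]; // CSS selectors stripped from content
  remove_text_patterns: string[]; // regexes stripped from the markdown
  concurrency?: number; // parallel browser tabs (default: 10)
  rate_limit_ms?: number; // delay between requests (optional)
}

// Elements the README says the converter always strips.
const DEFAULT_STRIPPED_TAGS = ["style", "script", "noscript", "iframe", "svg"];
const DEFAULT_CONCURRENCY = 10;

// Assumption: a pattern matches when it appears anywhere in the URL path.
function shouldCrawl(url: string, src: SourceConfig): boolean {
  const path = new URL(url).pathname;
  const included = src.include_patterns.some((p) => path.includes(p));
  const excluded = (src.exclude_patterns ?? []).some((p) => path.includes(p));
  return included && !excluded;
}

// Built-in defaults first, then the site-specific selectors.
function selectorsToRemove(src: SourceConfig): string[] {
  return [...DEFAULT_STRIPPED_TAGS, ...src.remove_selectors];
}

// Assumption: each pattern is compiled with the "g" and "m" flags so that
// ^ and $ anchor per line, as the "^Cookie notice.*$" example implies.
function stripTextPatterns(markdown: string, patterns: string[]): string {
  return patterns.reduce(
    (text, pattern) => text.replace(new RegExp(pattern, "gm"), ""),
    markdown,
  );
}

// Assumption: a CLI --concurrency value wins over the per-source setting.
function effectiveConcurrency(src: SourceConfig, cliOverride?: number): number {
  return cliOverride ?? src.concurrency ?? DEFAULT_CONCURRENCY;
}

// The README's example source, transcribed into the hypothetical type.
const example: SourceConfig = {
  name: "My Docs",
  start_url: "https://example.com/docs",
  nav_selector: "nav",
  content_selector: "article",
  include_patterns: ["/docs"],
  exclude_patterns: ["/docs/legacy"],
  remove_selectors: ["footer", "nav", ".sidebar"],
  remove_text_patterns: ["^Cookie notice.*$"],
  concurrency: 10,
  rate_limit_ms: 1000,
};

console.log(shouldCrawl("https://example.com/docs/query", example)); // true
console.log(shouldCrawl("https://example.com/docs/legacy/v1", example)); // false
console.log(
  JSON.stringify(
    stripTextPatterns("Cookie notice: we use cookies\n# Querying", example.remove_text_patterns),
  ),
); // "\n# Querying"
console.log(effectiveConcurrency(example, 20)); // 20
```

Substring matching means a pattern such as `/docs` would also admit a path like `/docsearch`. A real implementation may use prefix or glob matching instead; the README does not say which.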