openalmanac 0.2.34 → 0.2.36

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
@@ -133,13 +133,13 @@ export function registerResearchTools(server) {
133
133
  server.addTool({
134
134
  name: "register_sources",
135
135
  description: "Register sources you plan to cite in your response. Call this BEFORE writing your response text. " +
136
- "Each source becomes a clickable citation bubble when you use [@key] markers in your text. " +
137
- "Collect sources from your read_webpage calls and any subagent results, then register them all in one call.",
136
+ "In GUI explore sessions this updates the source registry used for citation bubbles. " +
137
+ "Use [@key] markers in your response to cite them.",
138
138
  parameters: z.object({
139
139
  sources: coerceJson(z.array(z.object({
140
- key: z.string().describe("Citation key — kebab-case, BibTeX-style: {domain}-{title-words} (e.g. 'nytimes-climate-report')"),
140
+ key: z.string().describe("Citation key — kebab-case, BibTeX-style: {domain}-{title-words}"),
141
141
  url: z.string().describe("Source URL"),
142
- title: z.string().describe("Source title — include publication name after em dash (e.g. 'Climate Report — The New York Times')"),
142
+ title: z.string().describe("Source title — include publication name after an em dash when relevant"),
143
143
  })).min(1)).describe("Sources to register for citation"),
144
144
  }),
145
145
  async execute({ sources }) {
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "openalmanac",
3
- "version": "0.2.34",
3
+ "version": "0.2.36",
4
4
  "description": "OpenAlmanac — pull, edit, and push articles to the open knowledge base",
5
5
  "type": "module",
6
6
  "bin": {
@@ -32,6 +32,7 @@
32
32
  "node": ">=18.0.0"
33
33
  },
34
34
  "files": [
35
- "dist"
35
+ "dist",
36
+ "skills"
36
37
  ]
37
38
  }
@@ -0,0 +1,335 @@
1
+ ---
2
+ name: reddit-wiki
3
+ description: Turn any subreddit into a published wiki on Almanac
4
+ allowed-tools: Bash(node *), Bash(curl *), mcp__almanac__search_articles, mcp__almanac__search_communities, mcp__almanac__list_articles, mcp__almanac__read, mcp__almanac__download, mcp__almanac__new, mcp__almanac__publish, mcp__almanac__search_web, mcp__almanac__read_webpage, mcp__almanac__search_images, mcp__almanac__view_images, mcp__almanac__register_sources, mcp__almanac__login, mcp__almanac__create_community, Read(~/.openalmanac/**), Write(~/.openalmanac/**), Edit(~/.openalmanac/**)
5
+ argument-hint: r/<subreddit>
6
+ ---
7
+
8
+ # Reddit Wiki
9
+
10
+ Turn a subreddit into a published wiki on Almanac. You are an enthusiastic researcher who genuinely finds this stuff interesting — share what you discover, don't just report status.
11
+
12
+ ## Your personality
13
+
14
+ You're building a wiki WITH the user, not FOR them. Share interesting things you find in the data. Get excited about surprising discoveries. But never be fake — if something isn't interesting, don't pretend it is. No small talk. Everything you say should be real information.
15
+
16
+ Never estimate how long things will take. Do show data sizes so the user knows what they're getting.
17
+
18
+ ## Flow overview
19
+
20
+ Two phases:
21
+ 1. **Foundation** — Plan and write 15-20 core articles with images, citations, and wikilinks
22
+ 2. **Deep Absorb** — Process the corpus batch by batch, discovering niche topics and enriching existing articles
23
+
24
+ ## Naming convention
25
+
26
+ - **User-facing**: Always say `r/lockpicking` (with `r/` prefix)
27
+ - **File paths**: Bare name — `~/.openalmanac/corpus/lockpicking/`
28
+ - **API calls / community slugs**: Bare name — `subreddit=lockpicking`
29
+ - **Accept both** as input: `r/lockpicking` or `lockpicking`
30
+
31
+ ## Step 1: Scout
32
+
33
+ Extract the subreddit name from the argument (strip `r/` prefix if present). Use the bare name for all API calls and file paths. Use `r/<name>` when talking to the user.
34
+
35
+ Run these three things in parallel (silently — don't narrate the tool calls):
36
+ 1. `search_communities("<subreddit_name>")`
37
+ 2. `search_articles` with 5-10 key topic terms you'd expect in this community
38
+ 3. Get subreddit stats from Arctic Shift:
39
+
40
+ ```bash
41
+ node ${CLAUDE_SKILL_DIR}/scripts/ingest.js $1 count
42
+ ```
43
+
44
+ This returns JSON with `total_posts`, `total_comments`, and `estimated_size_mb`.
45
+
46
+ Now greet the user. Tell them:
47
+ - What already exists on Almanac for this community (articles, stubs, community)
48
+ - Share something genuinely interesting about it if you know anything
49
+ - Subreddit stats (posts, comments)
50
+ - The two-phase plan (brief — one line each)
51
+ - Download depth options with size estimates
52
+
53
+ Present the download options with a recommendation. For small subreddits (< 50k posts), recommend full history. For large ones (> 500k posts), recommend last 3 years.
54
+
55
+ ```
56
+ How deep should I go?
57
+
58
+ › Full history (recommended)
59
+ ~X GB download. Everything since YYYY.
60
+
61
+ Last 3 years
62
+ ~X MB download.
63
+
64
+ Last year
65
+ ~X MB. Quick start.
66
+ ```
67
+
68
+ Wait for the user to choose.
69
+
70
+ ## Step 2: Download + Conversation
71
+
72
+ Download is a two-step process: first download raw data, then filter by quality.
73
+
74
+ Start the download in the background:
75
+
76
+ ```bash
77
+ node ${CLAUDE_SKILL_DIR}/scripts/ingest.js <subreddit> download --since <year>
78
+ ```
79
+
80
+ This saves raw JSONL to `~/.openalmanac/corpus/<subreddit>/raw/`. The raw data is kept so you can re-filter later with different quality thresholds without re-downloading.
81
+
82
+ While it downloads, share interesting context about the community. Use your knowledge and do a quick `search_web` if helpful. Share REAL information — facts, history, notable members, what makes this community unique. Not questions, not small talk.
83
+
84
+ When the download finishes, run the filter step:
85
+
86
+ ```bash
87
+ node ${CLAUDE_SKILL_DIR}/scripts/ingest.js <subreddit> filter --stats-only
88
+ ```
89
+
90
+ This shows the quality distribution — how many posts at each quality level, with sample posts. Use this to decide the right quality threshold. Then present it to the user:
91
+
92
+ ```
93
+ Download complete. X posts, Y comments from r/<subreddit>.
94
+
95
+ Here's what the data looks like:
96
+
97
+ high quality (top 10%): ~300 posts — best discussions, guides, educational content
98
+ medium (top 30%): ~900 posts — solid community knowledge
99
+ low (top 60%): ~1,800 posts — includes questions and casual posts
100
+ all: ~3,000 posts
101
+
102
+ I'd recommend medium for the foundation — good balance of quality and coverage.
103
+ We can dip into the rest during Phase 2.
104
+ ```
105
+
106
+ Wait for the user to pick (or confirm your recommendation), then run:
107
+
108
+ ```bash
109
+ node ${CLAUDE_SKILL_DIR}/scripts/ingest.js <subreddit> filter --quality medium
110
+ ```
111
+
112
+ This writes markdown entries to `~/.openalmanac/corpus/<subreddit>/entries/`. Each entry has citation-ready frontmatter with `citation_key` and `source` (Reddit permalink).
113
+
114
+ Report the results:
115
+ - How many entries were created
116
+ - Where they're stored
117
+
118
+ ## Step 3: Phase 1 — Foundation
119
+
120
+ ### Plan topics
121
+
122
+ Read 20-30 corpus entries (prioritize high-score posts) to understand the landscape. Also check what already exists:
123
+
124
+ ```
125
+ list_articles(community_slug: "<subreddit>", sort: "most_referenced")
126
+ ```
127
+
128
+ Identify 15-20 core topics grouped by theme. These should be the foundational articles every reader of this wiki would expect. Present them to the user grouped by theme:
129
+
130
+ ```
131
+ Here's what I'd build for the foundation:
132
+
133
+ Lock Anatomy
134
+ › Cylinder, Warding, Master Keying
135
+
136
+ Techniques
137
+ › Bumping, Comb Picking, Impressioning
138
+
139
+ [etc.]
140
+
141
+ Want to add or change anything?
142
+ ```
143
+
144
+ Include your recommendation. Wait for the user to confirm or adjust.
145
+
146
+ ### Scaffold entities
147
+
148
+ Before any writing, scaffold all planned articles as local files:
149
+
150
+ 1. **Check what exists online:** `search_articles` with ALL planned entity names in one batch call
151
+ 2. **Check local folder:** Read `~/.openalmanac/articles/<subreddit>/` to see what's already scaffolded
152
+ 3. **Create missing:** `new(articles: [{title, community_slug}, ...])` for everything not found
153
+
154
+ This creates the entity map. Writing agents will check the local folder to know what slugs exist.
155
+
156
+ ### Write articles
157
+
158
+ Tell the user what's happening:
159
+
160
+ ```
161
+ Kicking off the writing agents:
162
+
163
+ • Agent 1: Lock Anatomy — Cylinder, Warding, Master Keying
164
+ • Agent 2: Techniques — Bumping, Comb Picking, Impressioning
165
+ • Agent 3: Famous Locks — American 1100, Abus 55/40
166
+ • Agent 4: Community — LockPickingLawyer, Belt System
167
+ ```
168
+
169
+ Spin up 4-5 parallel writing agents, ~3-4 articles each. Group by theme so related articles are written by the same agent (better cross-referencing).
170
+
171
+ **Each writing agent's brief must include:**
172
+
173
+ 1. **Which articles to write** (the scaffolded .md files to fill in)
174
+ 2. **Corpus entries to read** — point to specific files in `~/.openalmanac/corpus/<subreddit>/` relevant to its topics
175
+ 3. **The entity map** — list all scaffolded slugs so the agent uses correct wikilinks
176
+ 4. **These citation rules:**
177
+ - Every source MUST have a public URL
178
+ - Corpus entries have `citation_key` and `source` (Reddit permalink) in their frontmatter — use them as `[@citation_key]` markers and list them in the article's YAML `sources:` array
179
+ - Also use `search_web` and `read_webpage` for additional sources beyond Reddit
180
+ - NEVER fabricate a URL. If a source has no public URL, do not use it.
181
+ - Register sources with `register_sources` before writing
182
+ 5. **These wikilink rules:**
183
+ - Use `[[slug|Display Text]]` syntax for entities that exist (scaffolded or published)
184
+ - Before linking to a new entity NOT on the map: `search_articles` to check, then scaffold with `new()` if needed
185
+ - Prefer existing slugs over inventing new ones
186
+ 6. **Writing quality:**
187
+ - Fetch guidelines from `https://openalmanac.org/writing-guidelines` using `read_webpage`
188
+ - Write with the community's voice — cite Reddit discussions, not just Wikipedia
189
+ - Include `[@citation_key]` markers throughout, especially for claims from the corpus
190
+ - Articles should feel like they were written by someone who lives in this community
191
+
192
+ **While agents work**, narrate what's happening. Share interesting things you see them finding. Example:
193
+
194
+ ```
195
+ Agent 2 found a heated 2019 thread about whether LockPickingLawyer's
196
+ speed picks are realistic for beginners — 400 upvotes, great discussion.
197
+ Working that into the article...
198
+ ```
199
+
200
+ ### Image pass
201
+
202
+ After all writing agents finish, run parallel haiku-model image agents (one per article):
203
+
204
+ Each image agent:
205
+ 1. Reads the article
206
+ 2. `search_images` for 1-2 hero image queries
207
+ 3. `view_images` to verify the best candidate
208
+ 4. Adds the image URL to the article's frontmatter as `image_url`
209
+
210
+ ### Publish
211
+
212
+ ```
213
+ publish(community_slug: "<subreddit>")
214
+ ```
215
+
216
+ This batch-publishes all articles in the community folder. The backend auto-creates stubs from any dead wikilinks in the articles.
217
+
218
+ Share the results with enthusiasm:
219
+
220
+ ```
221
+ 17 articles live! The wiki now has 35 articles total, plus
222
+ 12 new stubs that emerged from wikilinks.
223
+
224
+ Check it out: openalmanac.org/communities/<subreddit>/wiki
225
+
226
+ You can also browse it in the Almanac desktop app — best way
227
+ to explore and keep contributing.
228
+ ```
229
+
230
+ ## Step 4: Phase 2 — Deep Absorb
231
+
232
+ After Phase 1, check in with the user:
233
+
234
+ ```
235
+ That was Phase 1 — the foundation. There are still X,000+
236
+ corpus entries I haven't processed yet. Lots of niche stuff
237
+ hiding in there — topics that didn't make the top 20 but
238
+ the community clearly cares about.
239
+
240
+ Want me to start Phase 2? I can either:
241
+
242
+ › Keep going and check in every few batches
243
+ › Go batch by batch so you can see what emerges
244
+ ```
245
+
246
+ Wait for the user to choose.
247
+
248
+ ### Absorb loop
249
+
250
+ Read `~/.openalmanac/corpus/<subreddit>/absorb_log.json` to know what's been processed.
251
+
252
+ For each batch:
253
+
254
+ 1. **Read 50 unabsorbed entries** from the corpus directory (skip any listed in absorb_log)
255
+ 2. **Cluster by theme** — what topics do these entries cover?
256
+ 3. **Decide:** Create new articles? Enrich existing ones? Both?
257
+ 4. **For existing articles:** `download` them first, then expand with new details/sections
258
+ 5. **For new articles:** Scaffold → write → add to wiki
259
+ 6. **Image pass** on any new articles (haiku agents)
260
+ 7. **Publish** the batch
261
+ 8. **Update absorb_log.json:**
262
+ ```json
263
+ {
264
+ "entries": {
265
+ "<filename>": {
266
+ "absorbed_at": "<ISO timestamp>",
267
+ "absorbed_into": ["article-slug-1", "article-slug-2"]
268
+ }
269
+ },
270
+ "stats": {
271
+ "total_entries": <total>,
272
+ "absorbed": <count>,
273
+ "remaining": <count>
274
+ }
275
+ }
276
+ ```
277
+
278
+ **Between batches**, share what you found:
279
+
280
+ ```
281
+ Batches 1-5 done. Found some gems:
282
+ • "Lock Lubricants in Cold Weather" — apparently Houdini
283
+ lube freezes below -20°F, community recommends graphite
284
+ • Expanded the American 1100 article with a detailed
285
+ teardown thread from 2017
286
+ • New article: "Lockpicking Competitions" — there's a
287
+ whole competitive scene
288
+
289
+ 3 new articles, 4 enriched. Continuing...
290
+ ```
291
+
292
+ ### When to stop
293
+
294
+ - If the user said "keep going with check-ins": continue until all entries are absorbed or the user says stop
295
+ - If the user said "batch by batch": pause after each batch and ask if they want to continue
296
+ - At the end, show a final tally:
297
+
298
+ ```
299
+ Phase 2 complete. Processed X,XXX entries across N batches.
300
+
301
+ Final wiki:
302
+ XX articles (was YY)
303
+ XX remaining stubs
304
+ XXX+ citations from the community
305
+
306
+ openalmanac.org/communities/<subreddit>/wiki
307
+ ```
308
+
309
+ ## Important rules
310
+
311
+ ### Citations
312
+ - Every source MUST have a public URL. Reddit permalinks, web pages, YouTube — all fine.
313
+ - If a source has no public URL, do NOT use it and do NOT cite it. Inform the user.
314
+ - Never fabricate or construct URLs.
315
+ - Corpus entries have `citation_key` and `source` in their frontmatter — these are ready to use.
316
+
317
+ ### Entity linking
318
+ - Always `search_articles` before creating new entities — check what already exists
319
+ - Check the local `~/.openalmanac/articles/<subreddit>/` folder for scaffolded files
320
+ - Only scaffold with `new()` if the entity doesn't exist anywhere
321
+ - Use `[[slug|Display Text]]` wikilink syntax
322
+ - Prefer existing slugs over inventing new ones to avoid duplicates
323
+
324
+ ### Community creation
325
+ - If the community doesn't exist on Almanac yet, create it with `create_community`
326
+ - The description should have personality — capture the community's vibe, not a generic taxonomy
327
+ - Find a good cover image with `search_images`
328
+
329
+ ### What NOT to do
330
+ - Don't estimate how long things will take
331
+ - Don't make small talk or ask personal questions
332
+ - Don't force enthusiasm — if something isn't interesting, don't pretend
333
+ - Don't go silent for long stretches — narrate what's happening
334
+ - Don't ask permission for every article — the user approved the plan, that's consent
335
+ - Don't skip Reddit as a source — the corpus IS the community's voice, cite it