@forwardimpact/basecamp 0.3.0 → 2.0.0

@@ -0,0 +1,335 @@
1
+ ---
2
+ name: process-hyprnote
3
+ description: Process Hyprnote meeting sessions (memos, summaries, transcripts) into the knowledge graph. Extracts people, organizations, projects, and topics from AI-generated meeting summaries and user notes, creating or updating Obsidian-compatible notes in knowledge/. Use when the user asks to process meeting notes or after Hyprnote sessions.
4
+ ---
5
+
6
+ # Process Hyprnote
7
+
8
+ Process meeting sessions from Hyprnote (a local AI meeting notes app) and
9
+ extract structured knowledge into `knowledge/`. Hyprnote records meetings,
10
+ transcribes them, and generates AI summaries. This skill reads that output and
11
+ feeds it into the knowledge graph — the same way `extract-entities` processes
12
+ emails and calendar events.
13
+
14
+ ## Trigger
15
+
16
+ Run this skill:
17
+
18
+ - When the user asks to process meeting notes or Hyprnote sessions
19
+ - After new meetings have been recorded in Hyprnote
20
+ - When the user asks to update the knowledge base from recent meetings
21
+
22
+ ## Prerequisites
23
+
24
+ - Hyprnote installed with session data at
25
+ `~/Library/Application Support/hyprnote/sessions/`
26
+ - User identity configured in `USER.md`
27
+
28
+ ## Inputs
29
+
30
+ - `~/Library/Application Support/hyprnote/sessions/{uuid}/` — session
31
+ directories, each containing:
32
+ - `_meta.json` — session metadata (title, created_at, participants)
33
+ - `_memo.md` — user's markdown notes (YAML frontmatter + body)
34
+ - `_summary.md` — AI-generated meeting summary (YAML frontmatter + body),
35
+ optional
36
+ - `transcript.json` — word-level transcript with speaker channels, optional
37
+ - `~/.cache/fit/basecamp/state/graph_processed` — tracks processed files (TSV)
38
+ - `USER.md` — user identity for self-exclusion
39
+
40
+ ## Outputs
41
+
42
+ - `knowledge/People/*.md` — person notes (new or updated)
43
+ - `knowledge/Organizations/*.md` — organization notes (new or updated)
44
+ - `knowledge/Projects/*.md` — project notes (new or updated)
45
+ - `knowledge/Topics/*.md` — topic notes (new or updated)
46
+ - `~/.cache/fit/basecamp/state/graph_processed` — updated with processed session
47
+ files
48
+
49
+ ---
50
+
51
+ ## Before Starting
52
+
53
+ 1. Read `USER.md` to get the user's name, email, and domain.
54
+ 2. List all session directories:
55
+
56
+ ```bash
57
+ ls "$HOME/Library/Application Support/hyprnote/sessions/"
58
+ ```
59
+
60
+ 3. For each session, check if it needs processing by looking up its key files in
61
+ the graph state:
62
+
63
+ ```bash
64
+ grep -F "{file_path}" ~/.cache/fit/basecamp/state/graph_processed
65
+ ```
66
+
67
+ A session needs processing if:
68
+
69
+ - Its `_memo.md` path is **not** in `graph_processed`, OR
70
+ - Its `_memo.md` hash has changed (compute SHA-256 and compare), OR
71
+ - Its `_summary.md` exists and is either missing from `graph_processed` or has a changed hash
72
+
73
+ **Process all unprocessed sessions in one run** (there are typically only a few).
74
+
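The "needs processing" rules above can be sketched in Python. This is a minimal illustration, not the package's own implementation: it assumes each line of `graph_processed` is a `path<TAB>sha256` pair (the source only says the file is TSV), and the helper names `sha256_of`, `load_state`, and `needs_processing` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_state(state_file: Path) -> dict[str, str]:
    """Parse the state file, ASSUMING one `path<TAB>hash` pair per line."""
    state: dict[str, str] = {}
    if not state_file.exists():
        return state
    for line in state_file.read_text().splitlines():
        path, sep, rest = line.partition("\t")
        if sep:
            # Keep only the first field after the path, in case more columns exist.
            state[path] = rest.split("\t")[0]
    return state


def needs_processing(session_dir: Path, state: dict[str, str]) -> bool:
    """True if the memo or summary is unseen or its hash has changed."""
    for name in ("_memo.md", "_summary.md"):
        path = session_dir / name
        if not path.exists():
            continue  # `_summary.md` is optional
        if state.get(str(path)) != sha256_of(path):
            return True
    return False
```

A lookup miss and a changed hash are handled by the same comparison, because `state.get` returns `None` for an unseen path.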
75
+ ## Step 0: Build Knowledge Index
76
+
77
+ Scan existing notes to avoid duplicates and resolve entities:
78
+
79
+ ```bash
80
+ ls knowledge/People/ knowledge/Organizations/ knowledge/Projects/ knowledge/Topics/ 2>/dev/null
81
+ ```
82
+
83
+ For each existing note, read the header fields to build a mental index of known
84
+ entities (same approach as `extract-entities` Step 0).
85
+
86
+ ## Step 1: Read Session Data
87
+
88
+ For each unprocessed session, read files in this order:
89
+
90
+ ### 1a. Read `_meta.json`
91
+
92
+ ```json
93
+ {
94
+ "created_at": "2026-02-16T13:01:59.187Z",
95
+ "id": "7888363f-4cc6-4987-8470-92f386e5bdfc",
96
+ "participants": [],
97
+ "title": "Director-Level Hiring Pipeline",
98
+ "user_id": "00000000-0000-0000-0000-000000000000"
99
+ }
100
+ ```
101
+
102
+ Extract: **session date** (from `created_at`), **title**, **participants** (may
103
+ be empty — Hyprnote doesn't always populate this).
104
+
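As a rough sketch of the extraction described above, the three fields can be pulled out of `_meta.json` as follows. The function name `parse_meta` and the shape of the returned dictionary are assumptions made for illustration.

```python
import json
from datetime import datetime


def parse_meta(raw: str) -> dict:
    """Extract the session date, title, and participants from `_meta.json` text."""
    meta = json.loads(raw)
    # Swap the trailing "Z" for an explicit offset so Python < 3.11 can parse it.
    created = datetime.fromisoformat(meta["created_at"].replace("Z", "+00:00"))
    return {
        "date": created.date().isoformat(),  # calendar date in UTC
        "title": (meta.get("title") or "").strip(),
        "participants": meta.get("participants") or [],  # may be empty
    }
```

The date here is the UTC calendar date of `created_at`; converting to the user's local timezone first may be preferable for meetings held late in the day.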
105
+ ### 1b. Read `_memo.md`
106
+
107
+ YAML frontmatter (id, session_id) followed by the user's markdown notes.
108
+ Example:
109
+
110
+ ```markdown
111
+ ---
112
+ id: 213e0f78-a66a-468d-b8e5-bc3fbbe04bf4
113
+ session_id: 213e0f78-a66a-468d-b8e5-bc3fbbe04bf4
114
+ ---
115
+
116
+ Chat with Sarah about the product roadmap.
117
+ ```
118
+
119
+ The memo contains the user's own notes — names, observations, action items. This
120
+ is high-signal content; every name or observation here is intentional.
121
+
122
+ ### 1c. Read `_summary.md` (if exists)
123
+
124
+ YAML frontmatter (id, position, session_id, title) followed by an AI-generated
125
+ meeting summary. This is the richest source — structured bullet points covering
126
+ topics discussed, decisions made, action items, and key details.
127
+
128
+ ```markdown
129
+ ---
130
+ id: 152d9bc9-0cdc-4fb2-9916-cb7670f3a6df
131
+ position: 1
132
+ session_id: 213e0f78-a66a-468d-b8e5-bc3fbbe04bf4
133
+ title: Summary
134
+ ---
135
+
136
+ # Product Roadmap Review
137
+
138
+ - Both speakers reviewed the Q2 roadmap priorities...
139
+ ```
140
+
141
+ ### 1d. Read `transcript.json` (if exists, for disambiguation only)
142
+
143
+ The transcript is word-level with speaker channels:
144
+
145
+ ```json
146
+ {
147
+ "transcripts": [{
148
+ "words": [
149
+ {"channel": 0, "text": "Hello", "start_ms": 0, "end_ms": 500},
150
+ {"channel": 1, "text": "Hi", "start_ms": 600, "end_ms": 900}
151
+ ]
152
+ }]
153
+ }
154
+ ```
155
+
156
+ **Do NOT process the full transcript for entity extraction.** It's too noisy.
157
+ Only consult the transcript when you need to:
158
+
159
+ - Disambiguate a name mentioned in the summary or memo
160
+ - Confirm who said what (channel 0 = user, channel 1 = other speaker)
161
+ - Find context around a specific topic or decision
162
+
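One way to consult the transcript without reading all of it is to merge words into per-speaker utterances and then search those. The sketch below assumes the JSON shape shown above; the helper names `utterances` and `context_for` are hypothetical.

```python
def utterances(transcript: dict) -> list[tuple[int, str]]:
    """Merge consecutive same-channel words into (channel, text) pairs."""
    merged: list[tuple[int, str]] = []
    for segment in transcript.get("transcripts", []):
        for word in segment.get("words", []):
            channel, text = word["channel"], word["text"].strip()
            if merged and merged[-1][0] == channel:
                # Same speaker is still talking: extend the last utterance.
                merged[-1] = (channel, f"{merged[-1][1]} {text}")
            else:
                merged.append((channel, text))
    return merged


def context_for(transcript: dict, term: str) -> list[tuple[int, str]]:
    """Return only the utterances that mention `term` (case-insensitive)."""
    needle = term.lower()
    return [(ch, text) for ch, text in utterances(transcript) if needle in text.lower()]
```

Searching for a name from the summary then returns a handful of utterances tagged with channel 0 (user) or 1 (other speaker), rather than the full word list.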
163
+ ### 1e. Filter: Skip Empty Sessions
164
+
165
+ Skip sessions where:
166
+
167
+ - `_memo.md` body is empty or contains only whitespace (including non-breaking spaces)
168
+ - No `_summary.md` exists
169
+ - Title is empty or generic ("Hello", "Welcome to Hyprnote", "Test")
170
+
171
+ If a session has **either** a substantive memo **or** a `_summary.md`, process
172
+ it.
173
+
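The filter above could be expressed as a small predicate. This sketch rests on one reading of the rules, which is an assumption: a generic title always skips the session, and otherwise the session is skipped only when it has neither a substantive memo nor a summary. Treating a literal `&nbsp;` as whitespace is also an assumption, and the names `memo_body` and `should_skip` are hypothetical.

```python
GENERIC_TITLES = {"", "hello", "welcome to hyprnote", "test"}


def memo_body(text: str) -> str:
    """Drop a leading YAML frontmatter block and return the stripped body."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() == "---":
                lines = lines[i + 1:]
                break
    body = "\n".join(lines)
    # ASSUMPTION: a literal "&nbsp;" counts as whitespace; str.strip() already
    # removes real non-breaking space characters.
    return body.replace("&nbsp;", " ").strip()


def should_skip(title: str, memo_text: str, has_summary: bool) -> bool:
    """Skip generic-titled sessions and sessions with no memo body and no summary."""
    if title.strip().lower() in GENERIC_TITLES:
        return True
    return not memo_body(memo_text) and not has_summary
```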
174
+ ## Step 2: Classify Source
175
+
176
+ Hyprnote sessions are **meetings**. They follow meeting rules from
177
+ `extract-entities`:
178
+
179
+ - **CAN create** new People, Organization, Project, and Topic notes
180
+ - **CAN update** existing notes
181
+ - **CAN detect** state changes
182
+
183
+ Apply the same "Would I prep for this person?" test from `extract-entities` Step
184
+ 5 when deciding whether to create a note for someone mentioned.
185
+
186
+ ## Step 3: Extract Entities
187
+
188
+ Combine content from `_memo.md` and `_summary.md` (prefer summary when both
189
+ exist, as it's more detailed). Extract:
190
+
191
+ ### People
192
+
193
+ Look for names in:
194
+
195
+ - Memo text ("chat with Sarah Chen", "interview with David Kim")
196
+ - Summary bullet points ("the user will serve as the senior engineer", "Alex
197
+ from the platform team")
198
+ - Participant list in `_meta.json`
199
+
200
+ For each person:
201
+
202
+ - Resolve against knowledge index (Step 0)
203
+ - Extract role, organization, relationship to user
204
+ - Note what was discussed with/about them
205
+
206
+ ### Organizations
207
+
208
+ - Explicit mentions ("Acme Corp", "TechCo", "Global Services")
209
+ - Inferred from people's roles or context
210
+
211
+ ### Projects
212
+
213
+ - Explicit project names ("Customer Portal", "Q2 Migration")
214
+ - Described initiatives ("the hiring pipeline", "the product launch")
215
+
216
+ ### Topics
217
+
218
+ - Recurring themes ("AI coding agents", "interview process", "architecture
219
+ decisions")
220
+ - Only create Topic notes for subjects that span multiple meetings or are
221
+ strategically important
222
+
223
+ ### Self-exclusion
224
+
225
+ Never create or update notes for the user. Skip any entity that matches the
226
+ name, email, or @domain from `USER.md`.
227
+
228
+ **Exception for interview sessions:** If the session is clearly a job interview
229
+ (title or memo indicates "interview with {Name}"), the interviewee is a
230
+ **candidate** — create or update their note in `knowledge/Candidates/` instead
231
+ of `knowledge/People/`, following the candidate note template from
232
+ `track-candidates`.
233
+
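The self-exclusion rule above can be sketched as a single check. The `UserIdentity` class, the `is_self` helper, and the sample values used with them are hypothetical; how `USER.md` is actually parsed is not shown in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserIdentity:
    name: str
    email: str
    domain: str  # e.g. "example.com", without the leading "@"


def is_self(user: UserIdentity, name: str = "", email: str = "") -> bool:
    """True if the entity matches the user's name, email, or email domain."""
    name, email = name.strip().lower(), email.strip().lower()
    if name and name == user.name.lower():
        return True
    if email and email == user.email.lower():
        return True
    # Domain match: anyone with an address at the user's own domain.
    return bool(email) and email.endswith("@" + user.domain.lower())
```

Meeting notes often carry names without email addresses, so in practice the name comparison does most of the work and the domain check only applies when an address is known.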
234
+ ## Step 4: Extract Content
235
+
236
+ For each entity that has or will have a note, extract from the session:
237
+
238
+ ### Decisions
239
+
240
+ Signals in summaries: "decided", "agreed", "plan to", "established", "will serve
241
+ as"
242
+
243
+ ### Commitments / Action Items
244
+
245
+ Signals: "will share", "plans to", "needs to", "to be created", "will upload"
246
+
247
+ Extract: Owner, action, deadline (if mentioned), status (open).
248
+
249
+ ### Key Facts
250
+
251
+ - Specific numbers (headcount, budget, timeline)
252
+ - Preferences ("non-traditional backgrounds", "fusion of skills")
253
+ - Process details (interview stages, evaluation criteria)
254
+ - Strategic context (market trends, competitive landscape)
255
+
256
+ ### Activity Summary
257
+
258
+ One line per session for each entity:
259
+
260
+ ```markdown
261
+ - **2026-02-14** (meeting): Discussed hiring pipeline. 11 internal candidates,
262
+ plan to shortlist to 6-7. [[People/Sarah Chen]] managing the team.
263
+ ```
264
+
265
+ ### Interview Notes (for Candidates)
266
+
267
+ If the session is an interview, extract:
268
+
269
+ - Impressions and observations from the memo
270
+ - Technical assessment notes
271
+ - Strengths and concerns
272
+ - Any interview scoring or decisions
273
+
274
+ Add these to the candidate's `## Notes` section.
275
+
276
+ ## Step 5: Write Updates
277
+
278
+ Follow the same patterns as `extract-entities` Steps 7-10:
279
+
280
+ ### For NEW entities
281
+
282
+ Create notes using templates from
283
+ `.claude/skills/extract-entities/references/TEMPLATES.md`.
284
+
285
+ For **candidates** (interview sessions), use the candidate template from
286
+ `track-candidates` instead.
287
+
288
+ ### For EXISTING entities
289
+
290
+ Apply targeted edits:
291
+
292
+ - Add new activity entry at the TOP of the Activity section
293
+ - Update Last seen / Last activity date
294
+ - Add new key facts (skip duplicates)
295
+ - Update open items (mark completed, add new)
296
+ - Apply state changes
297
+
298
+ **Use precise edits — don't rewrite the entire file.**
299
+
300
+ ### Bidirectional links
301
+
302
+ Verify links go both ways (same rules as `extract-entities` Step 10). Always use
303
+ absolute paths: `[[People/Name]]`, `[[Organizations/Name]]`,
304
+ `[[Projects/Name]]`.
305
+
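A quick way to check the bidirectional-link rule is to scan `knowledge/` for links whose target does not link back. The sketch below assumes Obsidian-style `[[Folder/Name]]` links, optionally with a `|alias` or `#heading` suffix; the function names are hypothetical and this is not the check used by `extract-entities`.

```python
import re
from pathlib import Path

# Capture the target up to the closing bracket, an alias "|", or a heading "#".
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def links_in(note: Path) -> set[str]:
    """Return link targets such as "People/Sarah Chen" found in one note."""
    return {m.strip() for m in WIKILINK.findall(note.read_text(encoding="utf-8"))}


def missing_backlinks(root: Path) -> list[tuple[str, str]]:
    """List (source, target) pairs where target exists but does not link back."""
    notes = {p.relative_to(root).with_suffix("").as_posix(): p for p in root.rglob("*.md")}
    missing = []
    for source, path in sorted(notes.items()):
        for target in sorted(links_in(path)):
            if target in notes and source not in links_in(notes[target]):
                missing.append((source, target))
    return missing
```

Calling `missing_backlinks(Path("knowledge"))` after writing updates returns the pairs that still need a reverse link; links to notes that do not exist yet are ignored.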
306
+ ## Step 6: Update Graph State
307
+
308
+ After processing each session, mark its files as processed:
309
+
310
+ ```bash
311
+ python3 .claude/skills/extract-entities/scripts/state.py update \
312
+ "$HOME/Library/Application Support/hyprnote/sessions/{uuid}/_memo.md"
313
+
314
+ # Also mark _summary.md if it exists
315
+ python3 .claude/skills/extract-entities/scripts/state.py update \
316
+ "$HOME/Library/Application Support/hyprnote/sessions/{uuid}/_summary.md"
317
+ ```
318
+
319
+ This prevents reprocessing unless the files change.
320
+
321
+ ## Quality Checklist
322
+
323
+ Before completing, verify:
324
+
325
+ - [ ] Skipped empty/test/onboarding sessions
326
+ - [ ] Read `_memo.md` and, when present, `_summary.md` for each processed session
327
+ - [ ] Applied "Would I prep?" test to each person
328
+ - [ ] Excluded self and @user.domain from entity extraction
329
+ - [ ] Interview sessions created/updated Candidate notes (not People notes)
330
+ - [ ] Used absolute paths `[[Folder/Name]]` in ALL links
331
+ - [ ] Summaries describe relationship, not communication method
332
+ - [ ] Key facts are substantive (no filler)
333
+ - [ ] Open items are commitments (no meta-tasks)
334
+ - [ ] Bidirectional links are consistent
335
+ - [ ] Graph state updated for all processed session files
@@ -40,12 +40,15 @@ their calendar.
40
40
  Run the sync as a single Python script. This avoids N+1 sqlite3 invocations (one
41
41
  per event for attendees) and handles all data transformation in one pass:
42
42
 
43
- python3 scripts/sync.py
43
+ python3 scripts/sync.py [--days N]
44
+
45
+ - `--days N` — how many days back to sync (default: 30)
44
46
 
45
47
  The script:
46
48
 
47
49
  1. Finds the Calendar database (Sonoma+ path first, then fallback)
48
- 2. Queries all events in a 14-day past/future window with a single SQL query
50
+ 2. Queries all events in a sliding window (`--days` past / 14 days future) with
51
+ a single SQL query
49
52
  3. Batch-fetches all attendees for those events in one query
50
53
  4. Writes one JSON file per event to `~/.cache/fit/basecamp/apple_calendar/`
51
54
  5. Cleans up JSON files for events now outside the window
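To make the sliding window in step 2 concrete, here is a minimal, self-contained sketch of the bounds calculation. It assumes the Calendar database stores Core Data timestamps (seconds since 2001-01-01 UTC), as the script's `coredata_to_iso` helper suggests; the names `COREDATA_EPOCH`, `FUTURE_DAYS`, and `window_bounds` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Core Data timestamps count seconds from 2001-01-01 00:00:00 UTC.
COREDATA_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)
FUTURE_DAYS = 14  # the forward half of the window stays fixed at 14 days


def window_bounds(days_back: int = 30, now: Optional[datetime] = None) -> tuple[float, float]:
    """Return (start, end) of the sync window as Core Data timestamps."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days_back)
    end = now + timedelta(days=FUTURE_DAYS)
    return (
        (start - COREDATA_EPOCH).total_seconds(),
        (end - COREDATA_EPOCH).total_seconds(),
    )
```

With the default of 30 days back, the window spans 44 days in total, so only the past side moves when `--days` changes.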
@@ -95,7 +98,7 @@ Each `{event_id}.json` file:
95
98
  ## Constraints
96
99
 
97
100
  - Open database read-only (`-readonly`)
98
- - This sync is stateless — always queries the current 14-day window
101
+ - This sync is stateless — always queries the current sliding window
99
102
  - All-day events may have null end times — use start date as end date
100
103
  - All-day events have timezone `_float` — omit timezone from output
101
104
  - Output format matches Google Calendar event format for downstream consistency
@@ -1,14 +1,18 @@
1
1
  #!/usr/bin/env python3
2
2
  """Sync Apple Calendar events to ~/.cache/fit/basecamp/apple_calendar/ as JSON.
3
3
 
4
- Queries the macOS Calendar SQLite database for events in a 14-day sliding
5
- window (past and future) and writes one JSON file per event.
4
+ Queries the macOS Calendar SQLite database for events in a sliding window
5
+ (past and future) and writes one JSON file per event.
6
6
 
7
- Usage: python3 scripts/sync.py
7
+ Usage: python3 scripts/sync.py [--days N]
8
+
9
+ Options:
10
+ --days N How many days back to sync (default: 30)
8
11
 
9
12
  Requires: macOS with Calendar app configured and Full Disk Access granted.
10
13
  """
11
14
 
15
+ import argparse
12
16
  import json
13
17
  import os
14
18
  import subprocess
@@ -82,11 +86,16 @@ def coredata_to_iso(ts, tz_name=None):
82
86
 
83
87
 
84
88
  def main():
89
+ parser = argparse.ArgumentParser(description="Sync Apple Calendar events.")
90
+ parser.add_argument("--days", type=int, default=30,
91
+ help="How many days back to sync (default: 30)")
92
+ args = parser.parse_args()
93
+
85
94
  db = find_db()
86
95
  os.makedirs(OUTDIR, exist_ok=True)
87
96
 
88
97
  now = datetime.now(timezone.utc)
89
- start = now - timedelta(days=14)
98
+ start = now - timedelta(days=args.days)
90
99
  end = now + timedelta(days=14)
91
100
  START_TS = (start - EPOCH).total_seconds()
92
101
  END_TS = (end - EPOCH).total_seconds()
@@ -32,6 +32,8 @@ their email.
32
32
 
33
33
  - `~/.cache/fit/basecamp/apple_mail/{thread_id}.md` — one markdown file per
34
34
  email thread
35
+ - `~/.cache/fit/basecamp/apple_mail/attachments/{thread_id}/` — copied
36
+ attachment files for each thread (PDFs, images, documents, etc.)
35
37
  - `~/.cache/fit/basecamp/state/apple_mail_last_sync` — updated with new sync
36
38
  timestamp
37
39
 
@@ -42,16 +44,19 @@ their email.
42
44
  Run the sync as a single Python script. This avoids N+1 shell invocations and
43
45
  handles all data transformation in one pass:
44
46
 
45
- python3 scripts/sync.py
47
+ python3 scripts/sync.py [--days N]
48
+
49
+ - `--days N` — how many days back to look on first sync (default: 30)
46
50
 
47
51
  The script:
48
52
 
49
53
  1. Finds the Mail database (`~/Library/Mail/V*/MailData/Envelope Index`)
50
- 2. Loads last sync timestamp (or defaults to 30 days ago for first sync)
54
+ 2. Loads last sync timestamp (or defaults to `--days` days ago for first sync)
51
55
  3. Discovers the thread grouping column (`conversation_id` or `thread_id`)
52
56
  4. Finds threads with new messages since last sync (up to 500)
53
- 5. For each thread: fetches messages, batch-fetches recipients, parses `.emlx`
54
- files for full email bodies (falling back to database summaries)
57
+ 5. For each thread: fetches messages, batch-fetches recipients and attachment
58
+ metadata, parses `.emlx` files for full email bodies (falling back to
59
+ database summaries), copies attachment files to the output directory
55
60
  6. Writes one markdown file per thread to `~/.cache/fit/basecamp/apple_mail/`
56
61
  7. Updates sync state timestamp
57
62
  8. Reports summary (threads processed, files written)
@@ -95,6 +100,10 @@ Each `{thread_id}.md` file:
95
100
  **Cc:** ...
96
101
 
97
102
  {next_body}
103
+
104
+ **Attachments:**
105
+ - [report.pdf](attachments/{thread_id}/report.pdf)
106
+ - image001.png *(not available)*
98
107
  ```
99
108
 
100
109
  Rules:
@@ -120,7 +129,10 @@ Rules:
120
129
  - `.emlx` parse error → fall back to database summary field
121
130
  - HTML-only email → strip tags and use as plain text body (handled by
122
131
  parse-emlx.py)
123
- - `find` timeout → skip that message's body, use summary
132
+ - `find` timeout → skip that message's body and use the summary; the attachment index is left empty
133
+ - Attachment file not found on disk → listed as `*(not available)*` in markdown
134
+ - Attachment copy fails (permissions, disk full) → listed as `*(not available)*`
135
+ - Filename collision across messages → prefixed with `{message_id}_`
124
136
  - Always update sync state, even on partial success
125
137
 
126
138
  ## Constraints
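The attachment rules in the error-handling list above (missing file, failed copy, filename collision) might be combined along these lines. This is a sketch under stated assumptions rather than the script's actual code: the `copy_attachments` name, the `(message_id, name, source)` tuple shape, and the choice to use the prefixed filename as the link text are all hypothetical.

```python
import shutil
from pathlib import Path


def copy_attachments(items, out_dir: Path) -> list[str]:
    """Copy attachments into out_dir and return markdown list lines.

    `items` is a list of (message_id, name, source_path_or_None) tuples and
    `out_dir` is assumed to be `.../attachments/{thread_id}`.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    used: set[str] = set()
    lines = []
    for message_id, name, source in items:
        # Prefix with the message id when an earlier message already used this name.
        target_name = name if name not in used else f"{message_id}_{name}"
        try:
            if source is None or not Path(source).is_file():
                raise FileNotFoundError(name)
            shutil.copy2(source, out_dir / target_name)
        except OSError:
            # Missing file or failed copy: list the attachment without a link.
            lines.append(f"- {name} *(not available)*")
            continue
        used.add(target_name)
        lines.append(f"- [{target_name}](attachments/{out_dir.name}/{target_name})")
    return lines
```

Catching `OSError` covers both the missing-file case and copy failures such as permission errors or a full disk, so one attachment failing does not abort the rest of the thread.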
@@ -30,11 +30,11 @@ apply Core Data conversion.
30
30
 
31
31
  ## addresses (sender and recipient addresses)
32
32
 
33
- | Column | Type | Notes |
34
- | --------- | ------- | ------------------------------------- |
35
- | `ROWID` | INTEGER | Primary key |
36
- | `address` | TEXT | Email address |
37
- | `comment` | TEXT | Display name (e.g., `"Olsson, Dick"`) |
33
+ | Column | Type | Notes |
34
+ | --------- | ------- | ------------------------------------ |
35
+ | `ROWID` | INTEGER | Primary key |
36
+ | `address` | TEXT | Email address |
37
+ | `comment` | TEXT | Display name (e.g., `"Chen, Sarah"`) |
38
38
 
39
39
  **IMPORTANT:** The display name is in `comment`, not a `name` or `display_name`
40
40
  column.
@@ -86,3 +86,30 @@ Use case-insensitive `LIKE` patterns to match both:
86
86
  - `%/Inbox%` (catches IMAP `/INBOX` and EWS `/Inbox`)
87
87
  - `%/INBOX%` (explicit uppercase match)
88
88
  - `%/Sent%` (catches `Sent Messages`, `Sent Items`, `Sent%20Items`)
89
+
90
+ ## attachments (email attachments)
91
+
92
+ | Column | Type | Notes |
93
+ | --------------- | ------- | --------------------------------------- |
94
+ | `ROWID` | INTEGER | Primary key |
95
+ | `message` | INTEGER | FK → messages.ROWID (ON DELETE CASCADE) |
96
+ | `attachment_id` | TEXT | Used as subdirectory name on disk |
97
+ | `name` | TEXT | Original filename (e.g., `report.pdf`) |
98
+
99
+ **Constraints:** `UNIQUE(message, attachment_id)` — each attachment within a
100
+ message has a unique identifier.
101
+
102
+ **IMPORTANT:** Column is `message` (not `message_id`), matching the convention
103
+ used by the `recipients` table.
104
+
105
+ ### Filesystem mapping
106
+
107
+ Attachment files on disk follow this path structure:
108
+
109
+ ```
110
+ ~/Library/Mail/V10/.../Attachments/{message_ROWID}/{attachment_id}/{filename}
111
+ ```
112
+
113
+ - `{message_ROWID}` — the `messages.ROWID` value (same as `attachments.message`)
114
+ - `{attachment_id}` — the `attachments.attachment_id` value
115
+ - `{filename}` — the actual file on disk (may differ from `attachments.name`)
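The table and path mapping above can be tied together with a short sketch: one batched query against `attachments`, then a lookup on disk. Because the source elides part of the path, the `attachments_root` parameter (the `Attachments` directory itself) is a hypothetical input the caller must supply, and both function names are assumptions.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def attachments_for(conn: sqlite3.Connection, message_ids: list[int]) -> list[tuple]:
    """Batch-fetch (message, attachment_id, name) rows for the given messages."""
    if not message_ids:
        return []
    placeholders = ",".join("?" * len(message_ids))
    sql = (
        "SELECT message, attachment_id, name FROM attachments "
        f"WHERE message IN ({placeholders}) ORDER BY message, ROWID"
    )
    return conn.execute(sql, message_ids).fetchall()


def resolve_on_disk(attachments_root: Path, message: int,
                    attachment_id: str, name: str) -> Optional[Path]:
    """Return the file under {root}/{message}/{attachment_id}/, or None if absent."""
    folder = attachments_root / str(message) / attachment_id
    if not folder.is_dir():
        return None
    preferred = folder / name
    if preferred.is_file():
        return preferred
    # The on-disk filename may differ from `attachments.name`; take the first file.
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return files[0] if files else None
```

Note that the query filters on the `message` column, not `message_id`, in line with the warning above. Falling back to the first file in the directory is a guess at how to handle a name mismatch.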