@xiaotianxt/skills 0.1.0

Files changed (120)
  1. package/EXCLUDED.md +42 -0
  2. package/LICENSE +21 -0
  3. package/README.md +165 -0
  4. package/SECURITY.md +23 -0
  5. package/SOURCES.md +45 -0
  6. package/bin/skills.mjs +241 -0
  7. package/package.json +38 -0
  8. package/skills/1password/SKILL.md +94 -0
  9. package/skills/1password/agents/openai.yaml +4 -0
  10. package/skills/1password/references/item-management.md +80 -0
  11. package/skills/1password/references/op-cli.md +107 -0
  12. package/skills/apple-calendar-event/SKILL.md +81 -0
  13. package/skills/apple-calendar-event/agents/openai.yaml +4 -0
  14. package/skills/apple-calendar-event/scripts/calendar_audit.py +201 -0
  15. package/skills/apple-calendar-event/scripts/calendar_event.py +164 -0
  16. package/skills/bro-browser/SKILL.md +118 -0
  17. package/skills/bro-browser/agents/openai.yaml +4 -0
  18. package/skills/bro-browser/references/tool-map.md +102 -0
  19. package/skills/bro-browser/references/workflows.md +146 -0
  20. package/skills/bro-browser/scripts/bro-call.mjs +189 -0
  21. package/skills/calendar/SKILL.md +182 -0
  22. package/skills/calendar/agents/openai.yaml +4 -0
  23. package/skills/calendar/references/operations.md +255 -0
  24. package/skills/calendar/scripts/calendar_list_review.py +157 -0
  25. package/skills/calendar/scripts/event_dedupe_preview.py +155 -0
  26. package/skills/canvas/SKILL.md +70 -0
  27. package/skills/canvas/agents/openai.yaml +4 -0
  28. package/skills/canvas/references/canvas-api.md +76 -0
  29. package/skills/course-exam-review-planner/SKILL.md +127 -0
  30. package/skills/cx/SKILL.md +25 -0
  31. package/skills/gh-fix-ci/LICENSE.txt +201 -0
  32. package/skills/gh-fix-ci/SKILL.md +81 -0
  33. package/skills/gh-fix-ci/agents/openai.yaml +6 -0
  34. package/skills/gh-fix-ci/assets/github-small.svg +3 -0
  35. package/skills/gh-fix-ci/assets/github.png +0 -0
  36. package/skills/gh-fix-ci/scripts/inspect_pr_checks.py +509 -0
  37. package/skills/gh-review-workflow/SKILL.md +61 -0
  38. package/skills/gh-review-workflow/agents/openai.yaml +4 -0
  39. package/skills/gh-review-workflow/references/workflow.md +48 -0
  40. package/skills/gh-review-workflow/scripts/fetch_review_state.py +222 -0
  41. package/skills/gh-review-workflow/scripts/resolve_review_threads.py +83 -0
  42. package/skills/github/SKILL.md +74 -0
  43. package/skills/github/agents/openai.yaml +6 -0
  44. package/skills/github/assets/github-small.svg +3 -0
  45. package/skills/github/assets/github.png +0 -0
  46. package/skills/gws-calendar/SKILL.md +126 -0
  47. package/skills/gws-calendar-agenda/SKILL.md +52 -0
  48. package/skills/gws-calendar-insert/SKILL.md +66 -0
  49. package/skills/gws-docs/SKILL.md +48 -0
  50. package/skills/gws-docs-write/SKILL.md +49 -0
  51. package/skills/gws-drive/SKILL.md +137 -0
  52. package/skills/gws-drive-upload/SKILL.md +52 -0
  53. package/skills/gws-gmail/SKILL.md +62 -0
  54. package/skills/gws-gmail-forward/SKILL.md +55 -0
  55. package/skills/gws-gmail-reply/SKILL.md +58 -0
  56. package/skills/gws-gmail-reply-all/SKILL.md +62 -0
  57. package/skills/gws-gmail-send/SKILL.md +57 -0
  58. package/skills/gws-gmail-triage/SKILL.md +50 -0
  59. package/skills/gws-gmail-watch/SKILL.md +58 -0
  60. package/skills/gws-shared/SKILL.md +27 -0
  61. package/skills/helium-browser-mcp/SKILL.md +137 -0
  62. package/skills/helium-browser-mcp/agents/openai.yaml +4 -0
  63. package/skills/helium-browser-mcp/scripts/obmcp.mjs +92 -0
  64. package/skills/helium-browser-mcp/scripts/openbrowsermcp-stdio-proxy.mjs +170 -0
  65. package/skills/learn/SKILL.md +122 -0
  66. package/skills/learn/agents/openai.yaml +7 -0
  67. package/skills/learn/assets/AGENTS.template.md +33 -0
  68. package/skills/learn/assets/errorlog.template.typ +61 -0
  69. package/skills/learn/assets/reading-sequence.template.md +23 -0
  70. package/skills/learn/assets/source-index.template.md +17 -0
  71. package/skills/learn/assets/tasklog.template.typ +57 -0
  72. package/skills/learn/assets/workbook.template.typ +60 -0
  73. package/skills/learn/references/learning-science.md +103 -0
  74. package/skills/learn/scripts/init_learning_workspace.py +70 -0
  75. package/skills/macos-messages/SKILL.md +258 -0
  76. package/skills/memory/SKILL.md +33 -0
  77. package/skills/memory/codex.md +186 -0
  78. package/skills/memory/opencode.md +164 -0
  79. package/skills/mimestreamctl/SKILL.md +170 -0
  80. package/skills/mimestreamctl/agents/openai.yaml +4 -0
  81. package/skills/mimestreamctl/scripts/mimestreamctl +33 -0
  82. package/skills/mon/SKILL.md +51 -0
  83. package/skills/mon/scripts/mon_spend_review.py +458 -0
  84. package/skills/ocr/SKILL.md +136 -0
  85. package/skills/ocr/agents/openai.yaml +4 -0
  86. package/skills/ocr/references/local-ocr-best-practices.md +297 -0
  87. package/skills/ocr/references/mineru-api.md +159 -0
  88. package/skills/ocr/scripts/ocr-router +22 -0
  89. package/skills/ocr/scripts/ocr_router.py +741 -0
  90. package/skills/panopto-mp4-bulk-download/SKILL.md +57 -0
  91. package/skills/panopto-mp4-bulk-download/agents/openai.yaml +4 -0
  92. package/skills/panopto-mp4-bulk-download/references/url-patterns.md +26 -0
  93. package/skills/panopto-mp4-bulk-download/scripts/panopto_bulk_mp4.sh +213 -0
  94. package/skills/rust-systems-style/SKILL.md +109 -0
  95. package/skills/rust-systems-style/agents/openai.yaml +4 -0
  96. package/skills/rust-systems-style/references/rust-review-checklist.md +77 -0
  97. package/skills/rust-systems-style/references/style-sources.md +68 -0
  98. package/skills/ship-ai-native-cli/SKILL.md +76 -0
  99. package/skills/ship-ai-native-cli/agents/openai.yaml +4 -0
  100. package/skills/ship-ai-native-cli/references/case-notes.md +83 -0
  101. package/skills/ship-ai-native-cli/references/product-method.md +82 -0
  102. package/skills/ship-ai-native-cli/references/release-checklist.md +147 -0
  103. package/skills/ship-ai-native-cli/references/rust-cli-shape.md +111 -0
  104. package/skills/telegram-mtproto-session/SKILL.md +125 -0
  105. package/skills/telegram-mtproto-session/agents/openai.yaml +4 -0
  106. package/skills/telegram-mtproto-session/scripts/telegram_session.py +687 -0
  107. package/skills/tg/SKILL.md +173 -0
  108. package/skills/things3-manager/SKILL.md +116 -0
  109. package/skills/things3-manager/scripts/things +42 -0
  110. package/skills/things3-manager/scripts/things_cli.py +514 -0
  111. package/skills/web-artifacts-builder/LICENSE.txt +202 -0
  112. package/skills/web-artifacts-builder/SKILL.md +74 -0
  113. package/skills/web-artifacts-builder/scripts/bundle-artifact.sh +54 -0
  114. package/skills/web-artifacts-builder/scripts/init-artifact.sh +379 -0
  115. package/skills/web-artifacts-builder/scripts/shadcn-components.tar.gz +0 -0
  116. package/skills/yeet/LICENSE.txt +201 -0
  117. package/skills/yeet/SKILL.md +71 -0
  118. package/skills/yeet/agents/openai.yaml +6 -0
  119. package/skills/yeet/assets/yeet-small.svg +3 -0
  120. package/skills/yeet/assets/yeet.png +0 -0
@@ -0,0 +1,458 @@
+ #!/usr/bin/env python3
+ """Review Monarch transactions for shared-expense reimbursements.
+
+ This script is intentionally local-only: it reads JSON produced by `mon
+ transactions --json` and emits a compact spending review. It never reads tokens
+ or calls Monarch directly.
+ """
+
+ from __future__ import annotations
+
+ import argparse
+ import datetime as dt
+ import json
+ import re
+ import sys
+ from dataclasses import dataclass, field
+ from pathlib import Path
+ from typing import Any
+
+
+ SPEND_CATEGORIES = {
+     "Restaurants & Bars",
+     "Groceries",
+     "Entertainment & Recreation",
+     "Travel & Vacation",
+     "Shopping",
+     "Coffee Shops",
+     "Public Transit",
+     "Internet & Cable",
+     "Insurance",
+     "Office Supplies & Expenses",
+     "Miscellaneous",
+     "Cash & ATM",
+ }
+
+ NON_CONSUMPTION_OUTFLOW = {
+     "Transfer",
+     "Credit Card Payment",
+     "Rent",
+ }
+
+ NON_REIMBURSEMENT_INFLOW = {
+     "Paychecks",
+     "Interest",
+     "Credit Card Payment",
+ }
+
+ DEFAULT_OWN_NAMES = {
+     "yupei tian",
+ }
+
+ INFRA_NAMES = {
+     "carnegie mellon",
+     "chase",
+     "discover",
+ }
+
+
+ @dataclass
+ class Tx:
+     id: str
+     date: dt.date
+     amount: float
+     category: str
+     merchant: str
+     raw_name: str
+     account: str
+     pending: bool
+     hidden: bool
+
+     @classmethod
+     def from_monarch(cls, value: dict[str, Any]) -> "Tx":
+         category = (value.get("category") or {}).get("name") or ""
+         merchant = (value.get("merchant") or {}).get("name") or ""
+         raw_name = value.get("plaidName") or ""
+         account = (value.get("account") or {}).get("displayName") or ""
+         return cls(
+             id=str(value.get("id") or ""),
+             date=dt.date.fromisoformat(value["date"]),
+             amount=float(value["amount"]),
+             category=category,
+             merchant=merchant,
+             raw_name=raw_name,
+             account=account,
+             pending=bool(value.get("pending")),
+             hidden=bool(value.get("hideFromReports")),
+         )
+
+     @property
+     def name(self) -> str:
+         return self.merchant or self.raw_name
+
+     @property
+     def abs_amount(self) -> float:
+         return abs(self.amount)
+
+
+ @dataclass
+ class Event:
+     date: dt.date
+     expenses: list[Tx]
+     reimbursements: list[Tx] = field(default_factory=list)
+
+     @property
+     def gross(self) -> float:
+         return sum(tx.abs_amount for tx in self.expenses)
+
+     @property
+     def reimbursement_total(self) -> float:
+         return sum(tx.amount for tx in self.reimbursements)
+
+     @property
+     def net(self) -> float:
+         return self.gross - self.reimbursement_total
+
+     @property
+     def confidence(self) -> str:
+         if not self.reimbursements:
+             return "none"
+         ratio = self.reimbursement_total / self.gross if self.gross else 0
+         if 0.55 <= ratio <= 1.05:
+             return "high"
+         if 0.25 <= ratio < 0.55 or 1.05 < ratio <= 1.25:
+             return "medium"
+         return "low"
+
+
+ def money(value: float) -> str:
+     return f"${value:,.2f}"
+
+
+ def load_transactions(path: Path) -> list[Tx]:
+     data = json.loads(path.read_text())
+     values = data.get("allTransactions", {}).get("results", [])
+     txs = [Tx.from_monarch(value) for value in values]
+     seen: set[str] = set()
+     unique: list[Tx] = []
+     for tx in txs:
+         key = tx.id or f"{tx.date}:{tx.amount}:{tx.name}:{tx.account}"
+         if key in seen:
+             continue
+         seen.add(key)
+         unique.append(tx)
+     return sorted(unique, key=lambda tx: (tx.date, tx.amount, tx.name))
+
+
+ def compact_name(tx: Tx) -> str:
+     text = tx.raw_name or tx.merchant
+     match = re.search(r"zelle payment from\s+(.+?)(?:\s+[A-Z0-9]{8,}|\s+\d{8,}|$)", text, re.I)
+     if match:
+         return f"Zelle from {match.group(1).strip()}"
+     match = re.search(r"payment from\s+(.+)", text, re.I)
+     if match:
+         return f"Payment from {match.group(1).strip()}"
+     return tx.name
+
+
+ def text_has_any(text: str, needles: set[str]) -> bool:
+     low = text.lower()
+     return any(needle in low for needle in needles)
+
+
+ def is_visible_settled(tx: Tx, include_pending: bool) -> bool:
+     if tx.hidden:
+         return False
+     if tx.pending and not include_pending:
+         return False
+     return True
+
+
+ def is_consumption_outflow(tx: Tx, include_pending: bool) -> bool:
+     if not is_visible_settled(tx, include_pending) or tx.amount >= 0:
+         return False
+     if tx.category in NON_CONSUMPTION_OUTFLOW:
+         return False
+     return tx.category in SPEND_CATEGORIES or bool(tx.category)
+
+
+ def is_reimbursement_candidate(tx: Tx, include_pending: bool, own_names: set[str]) -> bool:
+     if not is_visible_settled(tx, include_pending) or tx.amount <= 0:
+         return False
+     if tx.category in NON_REIMBURSEMENT_INFLOW:
+         return False
+     full_text = f"{tx.merchant} {tx.raw_name}"
+     if text_has_any(full_text, own_names | INFRA_NAMES):
+         return False
+     if tx.category in {"Transfer", "Other Income", "Business Income", "Shopping"}:
+         return True
+     low = full_text.lower()
+     return "zelle payment from" in low or "payment from" in low
+
+
+ def is_merchant_credit(tx: Tx, include_pending: bool, own_names: set[str]) -> bool:
+     if not is_visible_settled(tx, include_pending) or tx.amount <= 0:
+         return False
+     if is_reimbursement_candidate(tx, include_pending, own_names):
+         return False
+     if tx.category in NON_REIMBURSEMENT_INFLOW:
+         return False
+     return tx.category in SPEND_CATEGORIES
+
+
+ def build_events(txs: list[Tx], min_anchor: float, include_pending: bool) -> list[Event]:
+     by_day: dict[dt.date, list[Tx]] = {}
+     for tx in txs:
+         if is_consumption_outflow(tx, include_pending):
+             by_day.setdefault(tx.date, []).append(tx)
+
+     events: list[Event] = []
+     for day, day_txs in sorted(by_day.items()):
+         anchors = [
+             tx
+             for tx in day_txs
+             if tx.category in {"Restaurants & Bars", "Groceries", "Entertainment & Recreation", "Travel & Vacation", "Shopping"}
+             and tx.abs_amount >= min_anchor
+         ]
+         if not anchors:
+             continue
+         events.append(Event(date=day, expenses=sorted(anchors, key=lambda tx: tx.amount)))
+     return events
+
+
+ def assign_reimbursements(events: list[Event], reimbursements: list[Tx], window_days: int) -> None:
+     for reimb in sorted(reimbursements, key=lambda tx: (tx.date, tx.amount)):
+         candidates = [
+             event
+             for event in events
+             if dt.timedelta(days=0) <= reimb.date - event.date <= dt.timedelta(days=window_days)
+         ]
+         if not candidates:
+             continue
+         candidates.sort(
+             key=lambda event: (
+                 (reimb.date - event.date).days,
+                 abs(event.net - reimb.amount),
+                 -event.gross,
+             )
+         )
+         chosen = candidates[0]
+         if chosen.reimbursement_total + reimb.amount <= chosen.gross * 1.25:
+             chosen.reimbursements.append(reimb)
+
+
+ def summarize(txs: list[Tx], events: list[Event], reimbursements: list[Tx], merchant_credits: list[Tx], include_pending: bool) -> dict[str, Any]:
+     consumption = [tx for tx in txs if is_consumption_outflow(tx, include_pending)]
+     raw_spend = sum(tx.abs_amount for tx in consumption)
+     assigned_reimbursements = {tx.id for event in events for tx in event.reimbursements}
+     assigned_total = sum(tx.amount for tx in reimbursements if tx.id in assigned_reimbursements)
+     merchant_credit_total = sum(tx.amount for tx in merchant_credits)
+     adjusted = raw_spend - assigned_total
+     cash_adjusted = adjusted - merchant_credit_total
+
+     by_category: dict[str, float] = {}
+     for tx in consumption:
+         by_category[tx.category] = by_category.get(tx.category, 0.0) + tx.abs_amount
+
+     reclassification_rows: list[dict[str, Any]] = []
+     category_offsets: dict[str, float] = {}
+     for event in events:
+         for reimb in event.reimbursements:
+             for category, amount in allocate_reimbursement_to_categories(event, reimb.amount):
+                 category_offsets[category] = category_offsets.get(category, 0.0) + amount
+                 reclassification_rows.append(
+                     {
+                         "date": reimb.date.isoformat(),
+                         "source": compact_name(reimb),
+                         "originalCategory": reimb.category,
+                         "assignedEventDate": event.date.isoformat(),
+                         "assignedCategory": category,
+                         "signedAmount": round(-amount, 2),
+                         "eventGross": round(event.gross, 2),
+                         "eventMerchants": [tx.name for tx in event.expenses],
+                     }
+                 )
+
+     category_after = {
+         category: amount - category_offsets.get(category, 0.0)
+         for category, amount in by_category.items()
+     }
+
+     unresolved_large = [
+         tx
+         for tx in consumption
+         if tx.abs_amount >= 80 and all(tx not in event.expenses for event in events if event.reimbursements)
+     ]
+
+     return {
+         "transactionCount": len(txs),
+         "rawConsumptionSpend": round(raw_spend, 2),
+         "assignedReimbursements": round(assigned_total, 2),
+         "merchantCreditTotal": round(merchant_credit_total, 2),
+         "adjustedConsumptionSpend": round(adjusted, 2),
+         "cashImpactAfterCredits": round(cash_adjusted, 2),
+         "categorySpend": [
+             {"category": category, "amount": round(amount, 2)}
+             for category, amount in sorted(by_category.items(), key=lambda item: -item[1])
+         ],
+         "categorySpendAfterReimbursements": [
+             {
+                 "category": category,
+                 "raw": round(by_category[category], 2),
+                 "aaOffset": round(-category_offsets.get(category, 0.0), 2),
+                 "adjusted": round(amount, 2),
+             }
+             for category, amount in sorted(category_after.items(), key=lambda item: -item[1])
+         ],
+         "reclassificationLedger": reclassification_rows,
+         "events": [
+             {
+                 "date": event.date.isoformat(),
+                 "gross": round(event.gross, 2),
+                 "reimbursements": round(event.reimbursement_total, 2),
+                 "net": round(event.net, 2),
+                 "confidence": event.confidence,
+                 "expenses": [
+                     {
+                         "amount": round(tx.abs_amount, 2),
+                         "category": tx.category,
+                         "merchant": tx.name,
+                     }
+                     for tx in event.expenses
+                 ],
+                 "matchedInflows": [
+                     {
+                         "date": tx.date.isoformat(),
+                         "amount": round(tx.amount, 2),
+                         "category": tx.category,
+                         "source": compact_name(tx),
+                     }
+                     for tx in event.reimbursements
+                 ],
+             }
+             for event in events
+             if event.reimbursements
+         ],
+         "unresolvedLargeOutflows": [
+             {
+                 "date": tx.date.isoformat(),
+                 "amount": round(tx.abs_amount, 2),
+                 "category": tx.category,
+                 "merchant": tx.name,
+             }
+             for tx in unresolved_large
+         ],
+         "merchantCredits": [
+             {
+                 "date": tx.date.isoformat(),
+                 "amount": round(tx.amount, 2),
+                 "category": tx.category,
+                 "merchant": tx.name,
+                 "pending": tx.pending,
+             }
+             for tx in merchant_credits
+         ],
+     }
+
+
+ def allocate_reimbursement_to_categories(event: Event, amount: float) -> list[tuple[str, float]]:
+     by_category: dict[str, float] = {}
+     for tx in event.expenses:
+         by_category[tx.category] = by_category.get(tx.category, 0.0) + tx.abs_amount
+
+     if len(by_category) == 1:
+         category = next(iter(by_category))
+         return [(category, round(amount, 2))]
+
+     allocations: list[tuple[str, float]] = []
+     remaining = round(amount, 2)
+     categories = sorted(by_category.items(), key=lambda item: -item[1])
+     for index, (category, gross) in enumerate(categories):
+         if index == len(categories) - 1:
+             allocated = remaining
+         else:
+             allocated = round(amount * gross / event.gross, 2)
+             remaining = round(remaining - allocated, 2)
+         allocations.append((category, allocated))
+     return allocations
+
+
+ def render_markdown(summary: dict[str, Any]) -> str:
+     lines = [
+         "# Monarch Spend Review",
+         "",
+         f"- Transactions: {summary['transactionCount']}",
+         f"- Raw consumption spend: {money(summary['rawConsumptionSpend'])}",
+         f"- Matched reimbursements / AA: {money(summary['assignedReimbursements'])}",
+         f"- Adjusted consumption spend: {money(summary['adjustedConsumptionSpend'])}",
+         f"- Merchant credits / refunds, listed separately: {money(summary['merchantCreditTotal'])}",
+         f"- Cash impact after credits: {money(summary['cashImpactAfterCredits'])}",
+         "",
+         "## Matched Shared-Spend Events",
+     ]
+     for event in summary["events"]:
+         lines.append(
+             f"- {event['date']}: gross {money(event['gross'])}, reimbursed {money(event['reimbursements'])}, net {money(event['net'])}, confidence {event['confidence']}"
+         )
+         for tx in event["expenses"]:
+             lines.append(f" - expense: {money(tx['amount'])} {tx['category']} at {tx['merchant']}")
+         for tx in event["matchedInflows"]:
+             lines.append(f" - inflow: {money(tx['amount'])} on {tx['date']} from {tx['source']} ({tx['category']})")
+
+     lines.extend(["", "## Category Spend Before Reimbursements"])
+     for row in summary["categorySpend"]:
+         lines.append(f"- {row['category']}: {money(row['amount'])}")
+
+     lines.extend(["", "## Category Spend After AA Reclassification"])
+     for row in summary["categorySpendAfterReimbursements"]:
+         lines.append(
+             f"- {row['category']}: raw {money(row['raw'])}, AA offset {money(row['aaOffset'])}, adjusted {money(row['adjusted'])}"
+         )
+
+     lines.extend(["", "## Reclassification Ledger"])
+     for row in summary["reclassificationLedger"]:
+         merchants = ", ".join(row["eventMerchants"])
+         lines.append(
+             f"- {row['date']}: {money(row['signedAmount'])} from {row['source']} -> {row['assignedCategory']} for {row['assignedEventDate']} event ({merchants})"
+         )
+
+     lines.extend(["", "## Unresolved Large Outflows"])
+     for tx in summary["unresolvedLargeOutflows"]:
+         lines.append(f"- {tx['date']}: {money(tx['amount'])} {tx['category']} at {tx['merchant']}")
+
+     if summary["merchantCredits"]:
+         lines.extend(["", "## Merchant Credits / Refunds"])
+         for tx in summary["merchantCredits"]:
+             pending = " pending" if tx["pending"] else ""
+             lines.append(f"- {tx['date']}: {money(tx['amount'])} {tx['category']} from {tx['merchant']}{pending}")
+
+     return "\n".join(lines) + "\n"
+
+
+ def main() -> int:
+     parser = argparse.ArgumentParser(description="Review Monarch transaction JSON for shared-spend reimbursements.")
+     parser.add_argument("--input", required=True, type=Path, help="JSON file from `mon transactions --json`.")
+     parser.add_argument("--format", choices=["json", "markdown"], default="markdown")
+     parser.add_argument("--min-anchor", type=float, default=45.0, help="Minimum outflow amount to treat as a shared-spend anchor.")
+     parser.add_argument("--window-days", type=int, default=3, help="Days after an expense to match incoming reimbursements.")
+     parser.add_argument("--include-pending", action="store_true", help="Include pending transactions.")
+     parser.add_argument("--own-name", action="append", default=[], help="Name fragment to treat as own/internal transfer. Repeatable.")
+     args = parser.parse_args()
+
+     own_names = DEFAULT_OWN_NAMES | {value.lower() for value in args.own_name}
+     txs = load_transactions(args.input)
+     reimbursements = [tx for tx in txs if is_reimbursement_candidate(tx, args.include_pending, own_names)]
+     merchant_credits = [tx for tx in txs if is_merchant_credit(tx, args.include_pending, own_names)]
+     events = build_events(txs, args.min_anchor, args.include_pending)
+     assign_reimbursements(events, reimbursements, args.window_days)
+     summary = summarize(txs, events, reimbursements, merchant_credits, args.include_pending)
+
+     if args.format == "json":
+         print(json.dumps(summary, indent=2, sort_keys=True))
+     else:
+         sys.stdout.write(render_markdown(summary))
+     return 0
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
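The confidence bands in `Event.confidence` and the proportional split in `allocate_reimbursement_to_categories` are easier to check with concrete numbers. The following is a minimal standalone sketch, not part of the package: the function names `confidence_band` and `split_reimbursement` and the sample amounts are assumptions made for illustration, while the thresholds and the rounding order are copied from the script above.

```python
from __future__ import annotations


def confidence_band(gross: float, inflows: list[float]) -> str:
    """Classify how well matched inflows cover an event's gross spend.

    Same thresholds as Event.confidence in the script; the name and
    signature are hypothetical.
    """
    if not inflows:
        return "none"
    ratio = sum(inflows) / gross if gross else 0
    if 0.55 <= ratio <= 1.05:
        return "high"
    if 0.25 <= ratio < 0.55 or 1.05 < ratio <= 1.25:
        return "medium"
    return "low"


def split_reimbursement(by_category: dict[str, float], amount: float) -> list[tuple[str, float]]:
    """Split one reimbursement across categories in proportion to gross spend.

    Categories are processed largest first; the last one receives whatever
    is left, so the rounded parts always add up to the rounded amount.
    """
    if len(by_category) == 1:
        return [(next(iter(by_category)), round(amount, 2))]

    gross_total = sum(by_category.values())
    remaining = round(amount, 2)
    ordered = sorted(by_category.items(), key=lambda item: -item[1])
    allocations: list[tuple[str, float]] = []
    for index, (category, gross) in enumerate(ordered):
        if index == len(ordered) - 1:
            allocated = remaining  # remainder absorbs rounding drift
        else:
            allocated = round(amount * gross / gross_total, 2)
            remaining = round(remaining - allocated, 2)
        allocations.append((category, allocated))
    return allocations


# Hypothetical event: $60 dinner plus $40 groceries, one $33.33 inflow.
print(confidence_band(100.0, [33.33]))
print(split_reimbursement({"Restaurants & Bars": 60.0, "Groceries": 40.0}, 33.33))
```

Giving the remainder to the last category keeps the per-category offsets consistent with the reimbursement total, which independent rounding of each share would not guarantee.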
@@ -0,0 +1,136 @@
+ ---
+ name: ocr
+ description: Extract text, Markdown, tables, formulas, and structured content from PDFs, scanned documents, screenshots, and images using the best available local or cloud OCR route. Use when Codex needs OCR, PDF text-layer extraction, MinerU local or official API parsing, VLM document parsing, table/formula extraction, scanned PDF handling, or when deciding whether a PDF should be read directly, OCRed, parsed by MinerU, or uploaded to a cloud API.
+ ---
+
+ # OCR
+
+ Use this skill to choose and run the right document extraction path instead of defaulting to OCR for every PDF.
+
+ ## Quick Start
+
+ Use the router first for PDFs and files where the best path is unclear:
+
+ ```bash
+ ocr-doc /path/to/file.pdf --profile-only
+ ```
+
+ The installed wrapper is:
+
+ ```bash
+ /Users/yupeit/bin/ocr-doc
+ ```
+
+ It runs:
+
+ ```bash
+ /Users/yupeit/dev/skills/skills/ocr/scripts/ocr-router
+ ```
+
+ The wrapper uses the skill-local virtualenv at `.venv/` when present. If the venv is missing, recreate it:
+
+ ```bash
+ python3 -m venv /Users/yupeit/dev/skills/skills/ocr/.venv
+ /Users/yupeit/dev/skills/skills/ocr/.venv/bin/python -m pip install requests pymupdf pyyaml
+ ```
+
+ ## Decision Tree
+
+ 1. For screenshots or single images, use the offline Apple Vision CLI:
+
+ ```bash
+ ocr /path/to/image.png
+ ocr capture
+ ocr fullscreen
+ ```
+
+ 2. For a PDF with a strong text layer and no need for formulas/tables/layout JSON, use native text extraction:
+
+ ```bash
+ ocr-doc file.pdf --engine native-text
+ ```
+
+ Use `--show-profile` when you want the profile printed and extraction to continue. Use `--profile-only` when you only want the recommendation.
+
+ 3. For PDFs where tables, formulas, layout, page markers, or image assets matter, prefer MinerU:
+
+ ```bash
+ ocr-doc file.pdf --engine mineru-local --require-structure --need-formulas --need-tables
+ ```
+
+ 4. For non-confidential documents where local MinerU is too slow or unavailable, use the official MinerU API only after explicit upload permission:
+
+ ```bash
+ ocr-doc file.pdf --engine mineru-api --allow-cloud --model-version vlm
+ ```
+
+ 5. For small non-confidential documents needing a quick agent-friendly Markdown result, use MinerU Agent API:
+
+ ```bash
+ ocr-doc file.pdf --engine mineru-agent --allow-cloud
+ ```
+
+ 6. For images/PDFs where semantic visual understanding is more important than deterministic layout, use Gemini VLM:
+
+ ```bash
+ ocr-doc file.pdf --engine gemini-vlm --allow-cloud
+ ```
+
+ ## Cloud Safety
+
+ Never upload confidential, private, school-restricted, client, credential-bearing, or unknown-sensitivity documents to cloud OCR.
+
+ The router refuses cloud upload unless either:
+
+ - The user interactively types `UPLOAD`.
+ - The caller passes `--allow-cloud`, which must only be used after the user explicitly allows cloud upload for that document.
+
+ Use `--no-cloud` when confidentiality is unknown:
+
+ ```bash
+ ocr-doc file.pdf --no-cloud
+ ```
+
+ MinerU official API credentials are read from `MINERU_API_TOKEN` / `MINERU_TOKEN`, then Keychain service `codex.mineru`, account `credential`. Never print the token.
+
+ ## MinerU Local Lessons
+
+ For long technical books with a real PDF text layer, do not run full OCR blindly. Use MinerU `pipeline + txt` as the base when formulas/tables matter:
+
+ ```bash
+ uvx 'mineru[all]' -p file.pdf -o out -b pipeline -m txt -l en -f true -t true --image-analysis false
+ ```
+
+ For full textbooks or long technical PDFs, do not use a single `ocr-doc --engine mineru-local` whole-book run as the default. Use chunked local MinerU scripts or an equivalent chunked workflow:
+
+ ```bash
+ python3 /Users/yupeit/dev/learn/quant/scripts/run_mineru_chunks.py \
+ --pdf file.pdf \
+ --output-dir out_chunks \
+ --page-count PAGE_COUNT \
+ --chunk-size 64 \
+ --backend pipeline \
+ --method txt \
+ --lang en \
+ --formula \
+ --table \
+ --no-image-analysis \
+ --timeout-seconds 86400
+ ```
+
+ Then merge `*_content_list.json` chunks with the existing merge script. This is aligned with the prior John Hull textbook workflow: `pipeline + txt` for the whole book, then `vlm-auto-engine` only on selected formula-heavy pages for overlay.
+
+ For cloud MinerU on a non-confidential born-digital textbook, override the generic cloud defaults: start with `--model-version pipeline --no-is-ocr --enable-formula --enable-table`, then compare `vlm` on selected difficult pages. Do not default a full textbook to `vlm + OCR` without a cost/quality reason.
+
+ For large documents, chunk the run and keep logs; local MinerU may wait for the final result before writing the user-facing output. A previous 880-page technical book worked best with 64-page chunks and long timeouts.
+
+ For map-like or diagram-heavy pages, MinerU may output only an image reference. If visible labels are the goal, compare native text-layer extraction and Apple Vision OCR.
+
+ Apple Vision can fail inside a restricted Codex sandbox with a Foundation/Vision error. In that case rerun through the approved local entrypoint `/Users/yupeit/bin/ocr-doc` or `/Users/yupeit/bin/ocr` outside the sandbox.
+
+ Read [references/local-ocr-best-practices.md](references/local-ocr-best-practices.md) before doing long or quality-sensitive extraction.
+
+ ## References
+
+ - Read [references/mineru-api.md](references/mineru-api.md) before changing MinerU official/agent API calls.
+ - Read [references/local-ocr-best-practices.md](references/local-ocr-best-practices.md) for local tool choices, PDF shape heuristics, and previous MinerU findings.
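The Decision Tree and Cloud Safety sections of this SKILL.md can be read together as one small routing function. The sketch below is only an illustration of that reading and is not the logic of `ocr_router.py`, whose contents are not shown in this section. `DocProfile`, its fields, and `choose_route` are hypothetical names, and the returned `"apple-vision"` stands for the separate `ocr` CLI rather than an `--engine` value.

```python
from dataclasses import dataclass


@dataclass
class DocProfile:
    """Hypothetical summary of a document; every field is an assumption."""

    is_image: bool = False               # screenshot or single image
    strong_text_layer: bool = False      # PDF already carries usable text
    needs_structure: bool = False        # tables, formulas, layout matter
    wants_semantic_vision: bool = False  # visual meaning over exact layout
    is_small: bool = False               # quick Markdown result is enough
    local_mineru_usable: bool = True     # local MinerU is installed and fast enough
    confidential: bool = True            # unknown sensitivity counts as confidential


def choose_route(doc: DocProfile, allow_cloud: bool = False) -> str:
    """Pick a route name, staying local unless cloud use is clearly permitted."""
    # Cloud needs both explicit permission and a non-confidential document.
    cloud_ok = allow_cloud and not doc.confidential

    if doc.wants_semantic_vision and cloud_ok:
        return "gemini-vlm"
    if doc.is_image:
        return "apple-vision"
    if doc.strong_text_layer and not doc.needs_structure:
        return "native-text"
    if doc.local_mineru_usable or not cloud_ok:
        return "mineru-local"
    return "mineru-agent" if doc.is_small else "mineru-api"


# A confidential PDF never leaves the machine, even with allow_cloud=True.
print(choose_route(DocProfile(needs_structure=True, local_mineru_usable=False), allow_cloud=True))
```

Defaulting `confidential` to `True` mirrors the rule that unknown-sensitivity documents must not be uploaded: the caller has to assert non-confidentiality and pass permission before any cloud route is returned.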
@@ -0,0 +1,4 @@
+ interface:
+   display_name: "OCR"
+   short_description: "Local and MinerU document OCR routing"
+   default_prompt: "Use $ocr to extract text or Markdown from this PDF using the best local or MinerU workflow."