lifeos 1.0.0
- package/LICENSE +21 -0
- package/README.en.md +202 -0
- package/README.md +202 -0
- package/assets/lifeos-rules.en.md +162 -0
- package/assets/lifeos-rules.zh.md +162 -0
- package/assets/lifeos.yaml +56 -0
- package/assets/prompts/AI_LLMResearch_Prompt.en.md +120 -0
- package/assets/prompts/AI_LLMResearch_Prompt.zh.md +120 -0
- package/assets/prompts/Art_ChinesePainting_Prompt.en.md +147 -0
- package/assets/prompts/Art_ChinesePainting_Prompt.zh.md +148 -0
- package/assets/prompts/Cryptography_Prompt.en.md +148 -0
- package/assets/prompts/Cryptography_Prompt.zh.md +147 -0
- package/assets/prompts/History_ChineseCulture_Prompt.en.md +98 -0
- package/assets/prompts/History_ChineseCulture_Prompt.zh.md +98 -0
- package/assets/prompts/Math_HigherMathematics_Prompt.en.md +117 -0
- package/assets/prompts/Math_HigherMathematics_Prompt.zh.md +116 -0
- package/assets/schema/Frontmatter_Schema.md +139 -0
- package/assets/skills/_shared/completion-report.en.md +30 -0
- package/assets/skills/_shared/completion-report.zh.md +30 -0
- package/assets/skills/_shared/dual-agent-orchestrator.en.md +40 -0
- package/assets/skills/_shared/dual-agent-orchestrator.zh.md +40 -0
- package/assets/skills/_shared/learning-lifecycle.en.md +53 -0
- package/assets/skills/_shared/learning-lifecycle.zh.md +53 -0
- package/assets/skills/_shared/lifecycle.en.md +84 -0
- package/assets/skills/_shared/lifecycle.zh.md +84 -0
- package/assets/skills/_shared/memory-protocol.en.md +36 -0
- package/assets/skills/_shared/memory-protocol.zh.md +36 -0
- package/assets/skills/_shared/template-loading.en.md +26 -0
- package/assets/skills/_shared/template-loading.zh.md +26 -0
- package/assets/skills/archive/SKILL.en.md +300 -0
- package/assets/skills/archive/SKILL.zh.md +300 -0
- package/assets/skills/ask/SKILL.en.md +164 -0
- package/assets/skills/ask/SKILL.zh.md +164 -0
- package/assets/skills/brainstorm/SKILL.en.md +242 -0
- package/assets/skills/brainstorm/SKILL.zh.md +242 -0
- package/assets/skills/brainstorm/references/action-options.en.md +65 -0
- package/assets/skills/brainstorm/references/action-options.zh.md +65 -0
- package/assets/skills/knowledge/SKILL.en.md +202 -0
- package/assets/skills/knowledge/SKILL.zh.md +202 -0
- package/assets/skills/project/SKILL.en.md +133 -0
- package/assets/skills/project/SKILL.zh.md +133 -0
- package/assets/skills/project/references/execution-agent-prompt.en.md +148 -0
- package/assets/skills/project/references/execution-agent-prompt.zh.md +148 -0
- package/assets/skills/project/references/planning-agent-prompt.en.md +162 -0
- package/assets/skills/project/references/planning-agent-prompt.zh.md +162 -0
- package/assets/skills/read-pdf/SKILL.en.md +199 -0
- package/assets/skills/read-pdf/SKILL.zh.md +199 -0
- package/assets/skills/read-pdf/scripts/read_pdf.py +354 -0
- package/assets/skills/research/SKILL.en.md +107 -0
- package/assets/skills/research/SKILL.zh.md +107 -0
- package/assets/skills/research/references/execution-agent-prompt.en.md +166 -0
- package/assets/skills/research/references/execution-agent-prompt.zh.md +166 -0
- package/assets/skills/research/references/planning-agent-prompt.en.md +129 -0
- package/assets/skills/research/references/planning-agent-prompt.zh.md +129 -0
- package/assets/skills/revise/SKILL.en.md +258 -0
- package/assets/skills/revise/SKILL.zh.md +258 -0
- package/assets/skills/revise/references/grading-protocol.en.md +99 -0
- package/assets/skills/revise/references/grading-protocol.zh.md +99 -0
- package/assets/skills/today/SKILL.en.md +211 -0
- package/assets/skills/today/SKILL.zh.md +211 -0
- package/assets/templates/en/Daily_Template.md +25 -0
- package/assets/templates/en/Draft_Template.md +29 -0
- package/assets/templates/en/Knowledge_Template.md +86 -0
- package/assets/templates/en/Project_Template.md +110 -0
- package/assets/templates/en/Research_Template.md +46 -0
- package/assets/templates/en/Retrospective_Template.md +89 -0
- package/assets/templates/en/Revise_Template.md +24 -0
- package/assets/templates/en/Wiki_Template.md +35 -0
- package/assets/templates/zh/Daily_Template.md +26 -0
- package/assets/templates/zh/Draft_Template.md +29 -0
- package/assets/templates/zh/Knowledge_Template.md +86 -0
- package/assets/templates/zh/Project_Template.md +110 -0
- package/assets/templates/zh/Research_Template.md +46 -0
- package/assets/templates/zh/Retrospective_Template.md +89 -0
- package/assets/templates/zh/Revise_Template.md +24 -0
- package/assets/templates/zh/Wiki_Template.md +35 -0
- package/bin/lifeos.js +24 -0
- package/dist/active-docs/citations.d.ts +20 -0
- package/dist/active-docs/citations.js +74 -0
- package/dist/active-docs/citations.js.map +1 -0
- package/dist/active-docs/derived-memory.d.ts +57 -0
- package/dist/active-docs/derived-memory.js +161 -0
- package/dist/active-docs/derived-memory.js.map +1 -0
- package/dist/active-docs/index.d.ts +51 -0
- package/dist/active-docs/index.js +165 -0
- package/dist/active-docs/index.js.map +1 -0
- package/dist/active-docs/long-term-profile.d.ts +24 -0
- package/dist/active-docs/long-term-profile.js +75 -0
- package/dist/active-docs/long-term-profile.js.map +1 -0
- package/dist/active-docs/taskboard.d.ts +12 -0
- package/dist/active-docs/taskboard.js +146 -0
- package/dist/active-docs/taskboard.js.map +1 -0
- package/dist/active-docs/userprofile.d.ts +12 -0
- package/dist/active-docs/userprofile.js +169 -0
- package/dist/active-docs/userprofile.js.map +1 -0
- package/dist/cli/commands/doctor.d.ts +9 -0
- package/dist/cli/commands/doctor.js +125 -0
- package/dist/cli/commands/doctor.js.map +1 -0
- package/dist/cli/commands/init.d.ts +1 -0
- package/dist/cli/commands/init.js +129 -0
- package/dist/cli/commands/init.js.map +1 -0
- package/dist/cli/commands/rename.d.ts +7 -0
- package/dist/cli/commands/rename.js +188 -0
- package/dist/cli/commands/rename.js.map +1 -0
- package/dist/cli/commands/upgrade.d.ts +6 -0
- package/dist/cli/commands/upgrade.js +66 -0
- package/dist/cli/commands/upgrade.js.map +1 -0
- package/dist/cli/index.d.ts +1 -0
- package/dist/cli/index.js +52 -0
- package/dist/cli/index.js.map +1 -0
- package/dist/cli/utils/assets.d.ts +3 -0
- package/dist/cli/utils/assets.js +20 -0
- package/dist/cli/utils/assets.js.map +1 -0
- package/dist/cli/utils/install-assets.d.ts +39 -0
- package/dist/cli/utils/install-assets.js +141 -0
- package/dist/cli/utils/install-assets.js.map +1 -0
- package/dist/cli/utils/lang.d.ts +1 -0
- package/dist/cli/utils/lang.js +32 -0
- package/dist/cli/utils/lang.js.map +1 -0
- package/dist/cli/utils/managed-assets.d.ts +9 -0
- package/dist/cli/utils/managed-assets.js +20 -0
- package/dist/cli/utils/managed-assets.js.map +1 -0
- package/dist/cli/utils/mcp-register.d.ts +2 -0
- package/dist/cli/utils/mcp-register.js +132 -0
- package/dist/cli/utils/mcp-register.js.map +1 -0
- package/dist/cli/utils/sync-vault.d.ts +14 -0
- package/dist/cli/utils/sync-vault.js +132 -0
- package/dist/cli/utils/sync-vault.js.map +1 -0
- package/dist/cli/utils/ui.d.ts +14 -0
- package/dist/cli/utils/ui.js +78 -0
- package/dist/cli/utils/ui.js.map +1 -0
- package/dist/cli/utils/version.d.ts +1 -0
- package/dist/cli/utils/version.js +4 -0
- package/dist/cli/utils/version.js.map +1 -0
- package/dist/config.d.ts +127 -0
- package/dist/config.js +356 -0
- package/dist/config.js.map +1 -0
- package/dist/core.d.ts +106 -0
- package/dist/core.js +286 -0
- package/dist/core.js.map +1 -0
- package/dist/db/consolidation.d.ts +14 -0
- package/dist/db/consolidation.js +28 -0
- package/dist/db/consolidation.js.map +1 -0
- package/dist/db/index.d.ts +22 -0
- package/dist/db/index.js +39 -0
- package/dist/db/index.js.map +1 -0
- package/dist/db/schema.d.ts +7 -0
- package/dist/db/schema.js +175 -0
- package/dist/db/schema.js.map +1 -0
- package/dist/index.d.ts +3 -0
- package/dist/index.js +5 -0
- package/dist/index.js.map +1 -0
- package/dist/server.d.ts +6 -0
- package/dist/server.js +303 -0
- package/dist/server.js.map +1 -0
- package/dist/services/capture.d.ts +101 -0
- package/dist/services/capture.js +297 -0
- package/dist/services/capture.js.map +1 -0
- package/dist/services/enhance.d.ts +51 -0
- package/dist/services/enhance.js +184 -0
- package/dist/services/enhance.js.map +1 -0
- package/dist/services/layer0.d.ts +24 -0
- package/dist/services/layer0.js +90 -0
- package/dist/services/layer0.js.map +1 -0
- package/dist/services/maintenance.d.ts +27 -0
- package/dist/services/maintenance.js +73 -0
- package/dist/services/maintenance.js.map +1 -0
- package/dist/services/retrieval.d.ts +120 -0
- package/dist/services/retrieval.js +571 -0
- package/dist/services/retrieval.js.map +1 -0
- package/dist/services/startup.d.ts +28 -0
- package/dist/services/startup.js +112 -0
- package/dist/services/startup.js.map +1 -0
- package/dist/skill-context/ask-global.d.ts +8 -0
- package/dist/skill-context/ask-global.js +21 -0
- package/dist/skill-context/ask-global.js.map +1 -0
- package/dist/skill-context/base.d.ts +48 -0
- package/dist/skill-context/base.js +5 -0
- package/dist/skill-context/base.js.map +1 -0
- package/dist/skill-context/daily-global.d.ts +8 -0
- package/dist/skill-context/daily-global.js +25 -0
- package/dist/skill-context/daily-global.js.map +1 -0
- package/dist/skill-context/index.d.ts +32 -0
- package/dist/skill-context/index.js +171 -0
- package/dist/skill-context/index.js.map +1 -0
- package/dist/skill-context/knowledge-strict.d.ts +8 -0
- package/dist/skill-context/knowledge-strict.js +26 -0
- package/dist/skill-context/knowledge-strict.js.map +1 -0
- package/dist/skill-context/review-strict.d.ts +8 -0
- package/dist/skill-context/review-strict.js +26 -0
- package/dist/skill-context/review-strict.js.map +1 -0
- package/dist/skill-context/revise-strict.d.ts +8 -0
- package/dist/skill-context/revise-strict.js +26 -0
- package/dist/skill-context/revise-strict.js.map +1 -0
- package/dist/skill-context/seed-profiles.d.ts +21 -0
- package/dist/skill-context/seed-profiles.js +80 -0
- package/dist/skill-context/seed-profiles.js.map +1 -0
- package/dist/types.d.ts +165 -0
- package/dist/types.js +76 -0
- package/dist/types.js.map +1 -0
- package/dist/utils/context-policy.d.ts +57 -0
- package/dist/utils/context-policy.js +333 -0
- package/dist/utils/context-policy.js.map +1 -0
- package/dist/utils/scan-state.d.ts +41 -0
- package/dist/utils/scan-state.js +79 -0
- package/dist/utils/scan-state.js.map +1 -0
- package/dist/utils/segmenter.d.ts +19 -0
- package/dist/utils/segmenter.js +75 -0
- package/dist/utils/segmenter.js.map +1 -0
- package/dist/utils/shared.d.ts +103 -0
- package/dist/utils/shared.js +313 -0
- package/dist/utils/shared.js.map +1 -0
- package/dist/utils/vault-indexer.d.ts +53 -0
- package/dist/utils/vault-indexer.js +256 -0
- package/dist/utils/vault-indexer.js.map +1 -0
- package/package.json +59 -0
@@ -0,0 +1,354 @@
#!/usr/bin/env python3
"""Read the specified pages or chapter of a PDF and write a structured JSON intermediate result."""

from __future__ import annotations

import argparse
import json
import re
import sys
import tempfile
import unicodedata
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional, Sequence, Tuple

import fitz


MAX_DEFAULT_PAGES = 50


@dataclass
class ChapterMatch:
    level: int
    title: str
    start_page: int
    end_page: int


class ReadPdfError(Exception):
    """Carries a readable error message back to the CLI."""


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Extract PDF content by page range or chapter name and write a JSON file."
    )
    parser.add_argument("pdf_path", help="PDF path; Vault-relative or absolute")
    parser.add_argument(
        "target",
        nargs="?",
        help="Page range, single page, comma list, or chapter name, e.g. 245-260 / 245 / 245,247-249 / Chapter 3",
    )
    parser.add_argument(
        "--output",
        help="Output JSON path; defaults to /tmp/read-pdf-<timestamp>.json",
    )
    parser.add_argument(
        "--images-dir",
        help="Output directory for page PNGs; defaults to a temporary directory",
    )
    parser.add_argument(
        "--dpi",
        type=int,
        default=300,
        help="Page rendering DPI, default 300",
    )
    parser.add_argument(
        "--skip-render",
        action="store_true",
        help="Extract text only; do not render page PNGs",
    )
    parser.add_argument(
        "--list-toc",
        action="store_true",
        help="List the PDF TOC and exit",
    )
    parser.add_argument(
        "--max-pages",
        type=int,
        default=MAX_DEFAULT_PAGES,
        help=f"Maximum number of pages allowed per run, default {MAX_DEFAULT_PAGES}",
    )
    parser.add_argument(
        "--force-large-range",
        action="store_true",
        help="Allow ranges larger than --max-pages",
    )
    return parser.parse_args()


def normalize_text(value: str) -> str:
    # NFKC folds full-width and compatibility characters; whitespace is removed entirely.
    normalized = unicodedata.normalize("NFKC", value).strip().lower()
    normalized = re.sub(r"\s+", "", normalized)
    return normalized


def resolve_pdf_path(raw_path: str, cwd: Path) -> Path:
    candidate = Path(raw_path)
    if not candidate.is_absolute():
        candidate = (cwd / candidate).resolve()
    if not candidate.exists():
        raise ReadPdfError(f"PDF file not found: {raw_path}")
    if candidate.suffix.lower() != ".pdf":
        raise ReadPdfError(f"Target file is not a PDF: {candidate}")
    return candidate


def get_toc_entries(doc: fitz.Document) -> List[Tuple[int, str, int]]:
    toc = doc.get_toc(simple=True)
    return [(int(level), str(title), int(page)) for level, title, page in toc]


def dump_toc(doc: fitz.Document) -> None:
    toc = [
        {"level": level, "title": title, "page": page}
        for level, title, page in get_toc_entries(doc)
    ]
    print(json.dumps(toc, ensure_ascii=False, indent=2))


def parse_page_token(token: str, page_count: int) -> List[int]:
    token = token.strip()
    if not token:
        return []
    if "-" in token:
        start_str, end_str = token.split("-", 1)
        if not start_str.isdigit() or not end_str.isdigit():
            raise ReadPdfError(f"Invalid page range: {token}")
        start = int(start_str)
        end = int(end_str)
        if start > end:
            raise ReadPdfError(f"Page range start is greater than end: {token}")
        return validate_pages(list(range(start, end + 1)), page_count)
    if not token.isdigit():
        raise ReadPdfError(f"Invalid page number: {token}")
    return validate_pages([int(token)], page_count)


def validate_pages(pages: Sequence[int], page_count: int) -> List[int]:
    # Page numbers are 1-based throughout this script.
    invalid_pages = [page for page in pages if page < 1 or page > page_count]
    if invalid_pages:
        raise ReadPdfError(
            f"Pages out of range: {invalid_pages}. The PDF has {page_count} pages."
        )
    return list(pages)


def parse_page_spec(spec: str, page_count: int) -> Optional[List[int]]:
    # Returns None when the spec is not purely digits, commas, and hyphens,
    # which tells the caller to treat it as a chapter name instead.
    compact = spec.replace(" ", "")
    if not compact or not re.fullmatch(r"[\d,\-]+", compact):
        return None
    pages: List[int] = []
    for token in compact.split(","):
        pages.extend(parse_page_token(token, page_count))
    return sorted(set(pages))


def resolve_chapter(doc: fitz.Document, query: str) -> ChapterMatch:
    toc_entries = get_toc_entries(doc)
    if not toc_entries:
        raise ReadPdfError("The PDF has no TOC, so chapters cannot be matched. Use a page range instead.")

    normalized_query = normalize_text(query)
    exact_matches: List[Tuple[int, str, int, int]] = []
    fuzzy_matches: List[Tuple[int, str, int, int]] = []

    for index, (level, title, start_page) in enumerate(toc_entries):
        normalized_title = normalize_text(title)
        if not normalized_title:
            continue
        # A chapter ends just before the next entry at the same or a higher level.
        end_page = doc.page_count
        for next_level, _next_title, next_page in toc_entries[index + 1 :]:
            if next_level <= level:
                # Clamp so a sibling that starts on the same page cannot yield an empty range.
                end_page = max(start_page, next_page - 1)
                break
        entry = (level, title, start_page, end_page)
        if normalized_title == normalized_query:
            exact_matches.append(entry)
        elif normalized_query in normalized_title or normalized_title in normalized_query:
            fuzzy_matches.append(entry)

    # Exact matches take precedence; substring matches are only a fallback.
    matches = exact_matches or fuzzy_matches
    if not matches:
        preview = [
            {"level": level, "title": title, "page": page}
            for level, title, page in toc_entries[:20]
        ]
        raise ReadPdfError(
            "No matching chapter found. Run --list-toc to view the TOC, or refer to these entries:\n"
            + json.dumps(preview, ensure_ascii=False, indent=2)
        )
    if len(matches) > 1:
        candidates = [
            {"level": level, "title": title, "start_page": start_page, "end_page": end_page}
            for level, title, start_page, end_page in matches[:10]
        ]
        raise ReadPdfError(
            "Multiple chapters matched; use a more precise chapter name:\n"
            + json.dumps(candidates, ensure_ascii=False, indent=2)
        )

    level, title, start_page, end_page = matches[0]
    return ChapterMatch(level=level, title=title, start_page=start_page, end_page=end_page)


def render_pages(
    doc: fitz.Document,
    pages: Sequence[int],
    dpi: int,
    images_dir: Optional[Path],
) -> Tuple[Path, List[Dict[str, Any]]]:
    target_dir = images_dir
    if target_dir is None:
        target_dir = Path(tempfile.mkdtemp(prefix="read-pdf-pages-"))
    else:
        target_dir.mkdir(parents=True, exist_ok=True)

    images: List[Dict[str, Any]] = []
    for page_number in pages:
        page = doc[page_number - 1]  # fitz pages are 0-based
        pix = page.get_pixmap(dpi=dpi)
        image_path = target_dir / f"page_{page_number}.png"
        pix.save(str(image_path))
        images.append({"page": page_number, "path": str(image_path)})
    return target_dir, images


def extract_text(doc: fitz.Document, pages: Sequence[int]) -> Tuple[Dict[str, str], List[int]]:
    full_text: Dict[str, str] = {}
    missing_text_pages: List[int] = []
    for page_number in pages:
        text = doc[page_number - 1].get_text("text")
        full_text[str(page_number)] = text
        if not text.strip():
            # No extractable text, for example a scanned page.
            missing_text_pages.append(page_number)
    return full_text, missing_text_pages


def build_output_path(raw_output: Optional[str]) -> Path:
    if raw_output:
        output_path = Path(raw_output)
        if not output_path.is_absolute():
            output_path = (Path.cwd() / output_path).resolve()
    else:
        timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        output_path = Path(f"/tmp/read-pdf-{timestamp}.json")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    return output_path


def build_result(
    pdf_input: str,
    resolved_pdf_path: Path,
    pages: Sequence[int],
    full_text: Dict[str, str],
    images: Sequence[Dict[str, Any]],
    missing_text_pages: Sequence[int],
    target: str,
    doc: fitz.Document,
    chapter_match: Optional[ChapterMatch],
) -> Dict[str, Any]:
    result: Dict[str, Any] = {
        "source": pdf_input,
        "resolved_path": str(resolved_pdf_path),
        "target": target,
        "page_count": doc.page_count,
        "pages": list(pages),
        "full_text": full_text,
        "images": list(images),
        "charts": [],
        "formulas": [],
        "tables": [],
        "text_layer_missing_pages": list(missing_text_pages),
        "generated_at": datetime.now().isoformat(timespec="seconds"),
    }
    if chapter_match:
        result["mode"] = "chapter"
        result["chapter"] = {
            "level": chapter_match.level,
            "title": chapter_match.title,
            "start_page": chapter_match.start_page,
            "end_page": chapter_match.end_page,
        }
    else:
        result["mode"] = "pages"
    return result


def ensure_page_limit(pages: Sequence[int], max_pages: int, force_large_range: bool) -> None:
    if len(pages) <= max_pages or force_large_range:
        return
    raise ReadPdfError(
        f"This run matched {len(pages)} pages, exceeding the limit of {max_pages}. "
        "Split the work into batches, or pass --force-large-range explicitly."
    )


def main() -> int:
    args = parse_args()
    if not args.target and not args.list_toc:
        print(
            "Error: missing target. Provide a page range, single page, comma list, or chapter name.",
            file=sys.stderr,
        )
        return 2

    try:
        resolved_pdf_path = resolve_pdf_path(args.pdf_path, Path.cwd())
        with fitz.open(str(resolved_pdf_path)) as doc:
            if args.list_toc:
                dump_toc(doc)
                return 0

            # A target that is not a page spec is treated as a chapter name.
            page_spec = parse_page_spec(args.target, doc.page_count)
            chapter_match: Optional[ChapterMatch] = None
            if page_spec is None:
                chapter_match = resolve_chapter(doc, args.target)
                pages = list(range(chapter_match.start_page, chapter_match.end_page + 1))
            else:
                pages = page_spec

            ensure_page_limit(pages, args.max_pages, args.force_large_range)
            full_text, missing_text_pages = extract_text(doc, pages)

            images: List[Dict[str, Any]] = []
            if not args.skip_render:
                images_dir = Path(args.images_dir).resolve() if args.images_dir else None
                _, images = render_pages(doc, pages, args.dpi, images_dir)

            result = build_result(
                pdf_input=args.pdf_path,
                resolved_pdf_path=resolved_pdf_path,
                pages=pages,
                full_text=full_text,
                images=images,
                missing_text_pages=missing_text_pages,
                target=args.target,
                doc=doc,
                chapter_match=chapter_match,
            )

            output_path = build_output_path(args.output)
            output_path.write_text(
                json.dumps(result, ensure_ascii=False, indent=2),
                encoding="utf-8",
            )

            print(f"JSON written to: {output_path}")
            print(
                "Summary: "
                f"processed {len(result['pages'])} pages, "
                f"rendered {len(result['images'])} images, "
                f"{len(result['text_layer_missing_pages'])} pages without a text layer."
            )
            return 0
    except ReadPdfError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 2
    except Exception as exc:  # pragma: no cover
        print(f"Unexpected error: {exc}", file=sys.stderr)
        return 1


if __name__ == "__main__":
    raise SystemExit(main())
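
As a rough illustration of how a downstream step might consume the JSON this script writes, the sketch below summarizes a result dictionary. The helper name `summarize_result` and the sample values are hypothetical; only the keys (`mode`, `chapter`, `pages`, `images`, `text_layer_missing_pages`) are taken from `build_result` in the script.

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Return a one-line summary of a read_pdf.py result (hypothetical helper)."""
    pages = result.get("pages", [])
    images = result.get("images", [])
    missing = result.get("text_layer_missing_pages", [])
    if result.get("mode") == "chapter":
        scope = f"chapter {result['chapter']['title']!r}"
    else:
        scope = "page selection"
    # Pages without a text layer can only be read from their rendered PNG, if one exists.
    rendered = {image["page"] for image in images}
    with_fallback = [page for page in missing if page in rendered]
    return (
        f"{scope}: {len(pages)} pages, {len(images)} images, "
        f"{len(missing)} without text layer ({len(with_fallback)} with a PNG fallback)"
    )


# Sample values are invented for illustration.
sample = {
    "mode": "pages",
    "pages": [245, 246],
    "images": [
        {"page": 245, "path": "page_245.png"},
        {"page": 246, "path": "page_246.png"},
    ],
    "text_layer_missing_pages": [246],
}
print(summarize_result(sample))
```

A consumer along these lines could decide, per page, whether to rely on `full_text` or fall back to the rendered image.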
@@ -0,0 +1,107 @@
---
name: research
description: "Conduct deep research on a specified topic or draft, producing structured research reports to {research directory}/. Uses dual-Agent workflow: the Planning Agent scans local drafts, matches expert personas, and creates a `type: plan, status: active` research plan; the Execution Agent combines local drafts with WebSearch external sources to write the report and updates the plan to `status: done`. Supports topic mode (direct topic) and file mode (draft as anchor). Use this skill when the user wants to deeply understand a topic, needs systematic research, wants to expand a draft into a full report, or says '/research'."
version: 1.0.0
dependencies:
  templates: []
  prompts:
    - path: "{system directory}/Prompts/"
      scan: true
      when: "Planning Agent matches expert persona by domain"
  schemas:
    - path: "{system directory}/{schema subdirectory}/Frontmatter_Schema.md"
  agents:
    - path: references/planning-agent-prompt.md
      role: planning
    - path: references/execution-agent-prompt.md
      role: execution
---
> [!config]
> Path references in this skill use logical names (e.g., `{research directory}`).
> The Orchestrator resolves actual paths from `lifeos.yaml` and injects them into the context.
> Path mappings:
> - `{drafts directory}` → directories.drafts
> - `{diary directory}` → directories.diary
> - `{research directory}` → directories.research
> - `{plans directory}` → directories.plans
> - `{system directory}` → directories.system
> - `{templates subdirectory}` → subdirectories.system.templates
> - `{schema subdirectory}` → subdirectories.system.schema
> - `{archived plans subdirectory}` → subdirectories.system.archive.plans
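
A minimal sketch of how the logical-name resolution described in the config note might work, assuming `lifeos.yaml` has already been parsed into a nested dictionary. The `resolve_placeholders` function, the `LOGICAL_NAMES` table, and the sample directory values are hypothetical; only the logical names and config keys come from the path mappings.

```python
import re
from typing import Any, Dict

# Logical name -> dotted key in lifeos.yaml (a subset of the path mappings).
LOGICAL_NAMES = {
    "drafts directory": "directories.drafts",
    "research directory": "directories.research",
    "plans directory": "directories.plans",
    "system directory": "directories.system",
    "schema subdirectory": "subdirectories.system.schema",
}


def lookup(config: Dict[str, Any], dotted_key: str) -> str:
    """Walk a nested dict using a dotted key such as 'directories.plans'."""
    node: Any = config
    for part in dotted_key.split("."):
        node = node[part]
    return str(node)


def resolve_placeholders(text: str, config: Dict[str, Any]) -> str:
    """Replace each known {logical name}; unknown placeholders are left untouched."""

    def substitute(match: "re.Match[str]") -> str:
        key = LOGICAL_NAMES.get(match.group(1))
        return lookup(config, key) if key else match.group(0)

    return re.sub(r"\{([^{}]+)\}", substitute, text)


# Directory values below are made up for the example.
config = {
    "directories": {
        "drafts": "00_Drafts",
        "research": "30_Research",
        "plans": "20_Plans",
        "system": "90_System",
    },
    "subdirectories": {"system": {"schema": "Schema"}},
}
print(resolve_placeholders("{system directory}/{schema subdirectory}/Frontmatter_Schema.md", config))
```

Leaving unknown placeholders intact, rather than raising, is a design choice of this sketch so that template text such as `[plan file path]` or unrelated braces passes through unchanged.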

You are LifeOS's deep research orchestrator, responsible for coordinating the Planning Agent and Execution Agent to complete systematic research. You ensure research has a clear scope, appropriate expert persona, fully leverages local drafts as first-hand sources, and combines external search to produce high-quality reports.

# Phase 0: Memory Pre-check (Required)

Follow `_shared/dual-agent-orchestrator.en.md` Phase 0, with entity type `filters.type = "research"`.

# Workflow Overview

| Phase   | Actor              | Responsibility                                                                      |
| ------- | ------------------ | ----------------------------------------------------------------------------------- |
| Phase 1 | Planning Agent     | Scan local drafts, formulate research strategy, generate plan file                  |
| Phase 2 | Orchestrator (you) | Ask user clarification questions, wait for confirmation                             |
| Phase 3 | Execution Agent    | Execute research per the plan, write report, and update the plan to `status: done` |
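
The three phases can be read as a small state machine over the plan file. The sketch below is illustrative only: the `Plan` class, the `confirmed` flag, and the function names are hypothetical, and only the `status` values `active` and `done` come from the skill description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Plan:
    topic: str
    status: str = "active"  # Phase 1: the Planning Agent creates the plan as active
    answers: Dict[str, str] = field(default_factory=dict)
    confirmed: bool = False  # hypothetical flag standing in for user confirmation


def record_review(plan: Plan, answers: Dict[str, str], confirmed: bool) -> Plan:
    """Phase 2: the Orchestrator writes the user's answers into the plan."""
    plan.answers.update(answers)
    plan.confirmed = confirmed
    return plan


def execute(plan: Plan) -> Plan:
    """Phase 3: run only after confirmation, then mark the plan done."""
    if not plan.confirmed:
        raise RuntimeError("plan must be confirmed by the user before execution")
    # The Execution Agent would research and write the report at this point.
    plan.status = "done"
    return plan


plan = Plan(topic="React Server Components")
record_review(plan, {"familiarity": "Intermediate"}, confirmed=True)
print(execute(plan).status)
```

Guarding `execute` on the confirmation flag mirrors the rule that the Execution Agent starts only after the user has reviewed the plan.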

# Your Responsibilities as Orchestrator

Follow the standard orchestration flow in `_shared/dual-agent-orchestrator.en.md`. The following responsibility is specific to the research skill:

- During Phase 2 (user review), ask the user the clarification questions directly in the conversation, write the answers into the plan file, then prompt the user to review and confirm the plan.

# Input Context

| Trigger mode | Example                                             | Description                                         |
| ------------ | --------------------------------------------------- | --------------------------------------------------- |
| Topic mode   | `/research React Server Components`                 | Topic-centric research, drafts as local supplement  |
| File mode    | `/research {drafts directory}/AI_Agent_Thoughts.md` | Specified draft as core anchor, expanding outward   |
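
One way to tell the two trigger modes apart, sketched under the assumption that file mode is signalled by an argument ending in `.md`. That heuristic, the `detect_mode` name, and the returned labels are hypothetical and not taken from the skill itself; the `missing-file` label stands for the case where the named draft does not exist and the user has to confirm the path or fall back to topic mode.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def detect_mode(argument: str, vault_root: Path) -> Tuple[str, str]:
    """Classify a /research argument as file mode, topic mode, or a missing draft."""
    candidate = argument.strip()
    if candidate.lower().endswith(".md"):
        if (vault_root / candidate).is_file():
            return ("file", candidate)
        # The argument looks like a draft path but nothing exists there.
        return ("missing-file", candidate)
    return ("topic", candidate)


# Build a throwaway vault so the example runs anywhere; names are invented.
with tempfile.TemporaryDirectory() as root:
    vault = Path(root)
    (vault / "Drafts").mkdir()
    (vault / "Drafts" / "AI_Agent_Thoughts.md").write_text("draft", encoding="utf-8")
    print(detect_mode("Drafts/AI_Agent_Thoughts.md", vault))
    print(detect_mode("React Server Components", vault))
    print(detect_mode("Drafts/Missing.md", vault))
```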

# Phase 1: Launch Planning Agent

Follow `_shared/dual-agent-orchestrator.en.md` Phase 1. Replace the placeholder `[user's input]` with the user's actual input.

After the Planning Agent returns, begin Phase 2 (user review) by asking the user **directly** in the conversation:

```
I've created a research plan for "[Topic]" at: `[plan file path]`

Please answer the following questions, and I'll write them into the plan before starting execution:

1. What is your current familiarity with this topic? (Beginner / Intermediate / Advanced)
2. Do you prefer theoretical understanding or example-driven practice?
```

After receiving answers:

1. Write the answers into the "Clarification Question Answers" section of the plan file
2. If the Domain in the plan is TBD, additionally ask about the domain
3. Prompt the user to review the plan, then wait for confirmation

# Phase 3: Launch Execution Agent (After User Confirmation)

Follow `_shared/dual-agent-orchestrator.en.md` Phase 3.

# Edge Cases

| Situation                     | Handling                                                                      |
| ----------------------------- | ----------------------------------------------------------------------------- |
| Topic too broad               | Planning Agent splits into subtopics and marks priority                       |
| Existing related research     | Update the existing report, do not create a duplicate file                    |
| Specified draft doesn't exist | Prompt user to confirm path, or switch to topic mode                          |
| No related drafts             | Proceed normally; "Core Insights from Drafts" section notes "No local drafts" |
| WebSearch returns nothing     | Rely on local drafts, note limitations in the report                          |
| WebFetch fails                | Mark in "References" as "(link inaccessible, for reference only)"             |

# Follow-up Handling

When the user requests additions or modifications, edit the existing research report file directly; do not create duplicate files.

After execution, the plan file remains in `{plans directory}/` with status `done`, waiting for `/archive` to move it into `{archived plans subdirectory}`.

# Memory System Integration

> Common protocols (file change notification, skill completion, session wrap-up) are documented in `_shared/memory-protocol.md`. Only skill-specific queries and behaviors are listed below.

### Pre-query

See Phase 0 for query code.
@@ -0,0 +1,107 @@
---
name: research
description: "Conduct deep research on a specified topic or draft, producing a structured research report in {研究目录}/. Uses a dual-Agent workflow: the Planning Agent scans local drafts, matches an expert persona, and generates a research plan with `type: plan, status: active`; the Execution Agent combines local drafts with external WebSearch material to write the report and updates the plan to `status: done`. Supports topic mode (a topic given directly) and file mode (expanding from a draft as the anchor). Use this skill when the user wants to understand a topic in depth, needs systematic research, wants to expand a draft into a full report, or says '/research'."
version: 1.0.0
dependencies:
  templates: []
  prompts:
    - path: "{系统目录}/提示词/"
      scan: true
      when: "Planning Agent matches expert persona by domain"
  schemas:
    - path: "{系统目录}/{规范子目录}/Frontmatter_Schema.md"
  agents:
    - path: references/planning-agent-prompt.md
      role: planning
    - path: references/execution-agent-prompt.md
      role: execution
---
> [!config]
> Path references in this skill use logical names (e.g., `{研究目录}`).
> The Orchestrator resolves the actual paths from `lifeos.yaml` and injects them into the context.
> Path mappings:
> - `{草稿目录}` → directories.drafts
> - `{日记目录}` → directories.diary
> - `{研究目录}` → directories.research
> - `{计划目录}` → directories.plans
> - `{系统目录}` → directories.system
> - `{模板子目录}` → subdirectories.system.templates
> - `{规范子目录}` → subdirectories.system.schema
> - `{归档计划子目录}` → subdirectories.system.archive.plans

You are LifeOS's deep research orchestrator, responsible for coordinating the Planning Agent and the Execution Agent to complete systematic research. You ensure the research has a clear scope and a suitable expert persona, makes full use of local drafts as first-hand material, and combines external search to produce a high-quality report.

# Phase 0: Memory Pre-check (Required)

Follow `_shared/dual-agent-orchestrator.zh.md` Phase 0, with entity type `filters.type = "research"`.

# Workflow Overview

| Phase   | Actor              | Responsibility                                                                       |
| ------- | ------------------ | ------------------------------------------------------------------------------------ |
| Phase 1 | Planning Agent     | Scan local drafts, formulate the research strategy, generate the plan file           |
| Phase 2 | Orchestrator (you) | Ask the user clarification questions, wait for confirmation                          |
| Phase 3 | Execution Agent    | Execute research per the plan, write the report, update the plan to `status: done`  |

# Your Responsibilities as Orchestrator

Follow the standard orchestration flow in `_shared/dual-agent-orchestrator.zh.md`. The following responsibility is specific to the research skill:

- During Phase 2 (user review), ask the user the clarification questions directly in the conversation, write the answers into the plan file once received, then prompt the user to review and confirm.

# Input Context

| Trigger mode | Example                                 | Description                                               |
| ------------ | --------------------------------------- | --------------------------------------------------------- |
| Topic mode   | `/research React Server Components`     | Research centered on the topic, drafts as local supplement |
| File mode    | `/research {草稿目录}/AI_Agent_思考.md` | Specified draft as the core anchor, expanding outward     |

# Phase 1: Launch Planning Agent

Follow `_shared/dual-agent-orchestrator.zh.md` Phase 1. Replace the placeholder `[user's input]` with the user's actual input.

After the Planning Agent returns, begin Phase 2 (user review) by asking the user **directly in the conversation**:

```
I've drawn up a research plan for "[Topic]" at: `[plan file path]`

Please answer the following questions; I'll write them into the plan before starting execution:

1. How familiar are you with this topic at present? (Beginner / Intermediate / Advanced)
2. Do you lean toward theoretical understanding or example-driven practice?
```

After receiving answers:

1. Write the answers into the "Clarification Question Answers" section of the plan file
2. If the Domain in the plan is TBD, additionally ask about the domain
3. Prompt the user to review the plan, then wait for confirmation

# Phase 3: Launch Execution Agent (After User Confirmation)

Follow `_shared/dual-agent-orchestrator.zh.md` Phase 3.

# Edge Cases

| Situation                      | Handling                                                                       |
| ------------------------------ | ------------------------------------------------------------------------------ |
| Topic too broad                | Planning Agent splits it into subtopics and marks priorities                   |
| Related research already exists | Update the existing report, do not create a duplicate file                    |
| Specified draft does not exist | Prompt the user to confirm the path, or switch to topic mode                   |
| No related drafts              | Proceed normally; the "Core Insights from Drafts" section notes "No local drafts" |
| WebSearch returns no results   | Rely on local drafts and note the limitation in the report                     |
| WebFetch fails                 | Mark in "References" as "(link inaccessible, for reference only)"              |

# Follow-up Handling

When the user asks for additions or modifications, edit the existing research report file directly; do not create duplicate files.

After execution, the plan file remains in `{计划目录}/` with status `done`, waiting for `/archive` to move it into `{归档计划子目录}`.

# Memory System Integration

> Common protocols (file change notification, skill completion, session wrap-up) are documented in `_shared/memory-protocol.md`. Only the queries and behaviors specific to this skill are listed below.

### Pre-query

See the query code in Phase 0.