npm - create-ccc-tutor - Versions diffs - 0.1.0 → 0.2.0 - Mend

create-ccc-tutor 0.1.0 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/package.json +1 -1
package/template/.claude/commands/exam.md +13 -0
package/template/.claude/commands/slide.md +24 -5
package/template/.claude-plugin/plugin.json +13 -26
package/template/.codex/skills/exam/SKILL.md +13 -0
package/template/.codex/skills/slide/SKILL.md +27 -3
package/template/.harness/scripts/pdf-rag.sh +40 -0
package/template/.harness/scripts/pdf_rag.py +485 -0
package/template/.harness/scripts/requirements-pdf.txt +6 -0
package/template/.harness/scripts/tests/test_pdf_rag.py +228 -0
package/template/.harness/state/install.json +1 -1
package/template/constitution.md +1 -1
package/template/course/README.md +1 -1
package/template/docs/features/pdf-vision-implementation.md +109 -0
package/template/docs/features/pdf-vision.md +226 -0
package/template/docs/features/slide-query-implementation.md +2 -2
package/template/docs/features/slide-query.md +2 -0
package/template/gitignore +4 -0

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "create-ccc-tutor",
-  "version": "0.1.0",
+  "version": "0.2.0",
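   "description": "Multi-subject AI study assistant installed with one command (based on CCC-MAGI): slide looks up the course slides, exam solves problems; strictly grounded in the slides, with citations, no fabrication.",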
   "bin": {
     "create-ccc-tutor": "bin/cli.js"

package/template/.claude/commands/exam.md CHANGED

@@ -42,6 +42,19 @@ $ARGUMENTS
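 - Can't locate it (problem number out of range, description doesn't match) → tell the user the problem wasn't found in that file, briefly list which problems the file actually contains, and ask the user to confirm.
 - **Restate the original problem text first** for the user to check, then start solving (to make sure it's the right problem).
+# Step 2.5: Use the retrieval engine to find the basis for the solution (course slides)
+When the basis for a solution comes from the slide decks, use the `pdf-vision` engine to locate it; don't page through PDFs one by one:
+```bash
+.harness/scripts/pdf-rag.sh query --subject <subject> -q "<concept/method this problem tests>" -k 5
+```
+- Read each hit page with Read (Claude views images natively); when a hit page has `visual_flag=true` or the problem involves a figure/table, you must look at the image. Cite with `source_file_exact` + page number.
+- `miss=true` → the slides give no basis for that step; per iron rule 2, mark it "no direct basis in the course slides".
+- `mode=keyword` / engine failure → say "semantic search is not enabled", or fall back to reading directly with Read; for **a step that requires viewing an image when image viewing is unavailable**, state plainly that it can't be answered, and don't pass off a text-only answer as one.
+- Reading values from figures: number labeled on the figure = exact; read off the scale = estimate (mark "not exact"); illegible = give only relative size; figure not read = say so and fall back to text; text/figure conflict = list both, cite each, don't adjudicate on the slides' behalf.
 # Iron rules of problem solving (must not be violated)
 1. **Solve from the course slides first.** For solution methods, formulas, and criteria, cite the same subject's slide decks first; mark every step that can be cited with its source `(Lecture N, page M)`.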
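A minimal Python sketch of the triage in Step 2.5, assuming the JSON shape that `slide.md` documents for `pdf-rag.sh query` (`results`, `miss`, `mode`, and per-hit `source_file_exact`, `page`, `visual_flag`). The function name `triage_exam_hits` and its return shape are hypothetical and not part of the package; it only turns one query payload into the notes and image-viewing decisions the rules call for.

```python
import json


# Hypothetical helper: maps one `pdf-rag.sh query` JSON payload to the actions
# Step 2.5 asks for. Field names follow the output documented for /slide.
def triage_exam_hits(raw_json: str, problem_has_figure: bool = False) -> dict:
    payload = json.loads(raw_json)
    notes = []
    if payload.get("mode") == "keyword":
        # Degraded search: tell the user semantic search is off.
        notes.append("semantic search is not enabled")
    if payload.get("miss"):
        # Iron rule 2: the slides give no basis for this step.
        return {"pages": [], "notes": notes + ["no direct basis in the course slides"]}
    pages = []
    for hit in payload.get("results", []):
        pages.append({
            "cite": f'{hit["source_file_exact"]}, page {hit["page"]}',
            # Look at the image when the page is flagged or the problem involves a figure.
            "view_image": bool(hit.get("visual_flag")) or problem_has_figure,
        })
    return {"pages": pages, "notes": notes}


sample = ('{"mode": "keyword", "miss": false, "results": '
          '[{"source_file_exact": "Lecture 3.pdf", "page": 9, "visual_flag": true}]}')
print(triage_exam_hits(sample))
```

Keeping the miss check ahead of the page loop mirrors the rule that a miss is reported as "no basis", never softened into low-relevance citations.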

package/template/.claude/commands/slide.md CHANGED

@@ -18,7 +18,7 @@ $ARGUMENTS
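 1. Use Bash to list the available subjects: `ls -d course/*/ 2>/dev/null`.
 2. The current session's subject is recorded in `.harness/state/current-subject.txt` (it may not exist). Read it first.
 3. Decide which subject to use this time:
-   - **The user names a subject in this input** (a subject folder name appears in the text, e.g. `course_code(example)`, `course_code(example)`, or an obviously equivalent phrasing) → use it and write it to `.harness/state/current-subject.txt` (overwrite).
+   - **The user names a subject in this input** (a subject folder name appears in the text, e.g. `ECON-10005`, `COMP-5990`, or an obviously equivalent phrasing) → use it and write it to `.harness/state/current-subject.txt` (overwrite).
    - **Not named, but the state file has a current subject** → keep using that subject; no need to ask again.
    - **Not named, and no state either**:
      - If there is **only one subject** under `course/` → use it directly, write it to state, and open the answer with one line stating "Current subject: X".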
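The subject-selection order above can be sketched in Python. This is a hypothetical helper, not part of the package: it reuses the document's paths (`course/<subject>/` and `.harness/state/current-subject.txt`) but only detects a subject when its exact folder name appears in the user's text, so the "obviously equivalent phrasing" case is left to the model. Returning `None` stands for "stop and ask the user".

```python
from pathlib import Path
from typing import Optional

# Paths mirror the document; the function name and return convention are assumptions.
STATE_FILE = Path(".harness/state/current-subject.txt")


def resolve_subject(user_text: str, root: Path = Path(".")) -> Optional[str]:
    """Return the subject to use, or None when the user must be asked."""
    course = root / "course"
    subjects = sorted(p.name for p in course.iterdir() if p.is_dir()) if course.is_dir() else []
    state = root / STATE_FILE
    named = [s for s in subjects if s in user_text]
    if named:
        chosen = named[0]                      # 1. named in this input -> use it
    elif state.is_file() and state.read_text().strip():
        return state.read_text().strip()       # 2. keep the session's subject
    elif len(subjects) == 1:
        chosen = subjects[0]                   # 3. only one subject -> use it
    else:
        return None                            # 4. zero or several subjects -> ask
    state.parent.mkdir(parents=True, exist_ok=True)
    state.write_text(chosen)                   # overwrite, like `printf '%s' ... >`
    return chosen
```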
@@ -51,10 +51,29 @@ $ARGUMENTS
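 # Step 3: Find the answer
-1. Based on the question's topic, judge which lecture is most likely relevant (look at file names/topics).
-2. Read the relevant PDF with the Read tool. PDFs are large with many pages, so **you must read in chunks using the `pages` parameter** (at most 20 pages at a time), paging as needed to find the relevant content.
-3. If it isn't found in one lecture, widen the search to other lectures on neighboring topics.
-4. Note the **exact file name and page number** where the answer is, for the citation.
+This project uses the `pdf-vision` engine for "fast retrieval + image viewing": **retrieve to locate first, then read the page**; don't page through every PDF one by one anymore.
+1. Call retrieval from Bash (the first run builds the index and may download the embedding model once over the network, so it takes a moment; later runs are fast):
+   ```bash
+   .harness/scripts/pdf-rag.sh query --subject <subject> -q "<the user's question>" -k 5
+   ```
+   Output is JSON: `results` lists the hit pages `[{source_file_exact, lecture, page, score, visual_flag, png_path, text_path}]`; `miss=true` means no hit; `mode` is `semantic` or `keyword` (degraded).
+2. For each hit page: read that page with the Read tool (by `source_file_exact` + `page`; the `pages` parameter works). **Claude's Read renders PDFs visually: you can see the page's figures, tables, and charts directly.** When a hit page has `visual_flag=true` or the question involves a figure/table, you must look at the image, not just the text.
+3. `miss=true` → follow "Case E" in Step 4: first say the slides don't cover it, then ask whether the user wants outside supplementary material; **never** present low-relevance results as an authoritative answer.
+4. Cite with `source_file_exact` (the exact file name) + page number.
+**Degradation and honesty (extension of the iron rules):**
+- `mode=keyword` (semantic search not enabled, usually because the first model download failed or the machine is offline) → answer text-based questions as usual, but tell the user "semantic search is not enabled; this lookup used keyword search".
+- `pdf-rag.sh` fails outright (non-zero exit code) → fall back to reading the relevant PDFs directly with Read (the old path), citing sources as usual.
+- **Image viewing is unavailable, and the question can only be answered from the figure** (e.g. a value that can only be read off a chart) → say explicitly "this needs the figure, and image viewing is currently unavailable"; **do not** pass off a text-only answer as having answered from the figure.
+**Rules for reading values from figures (fidelity to the hit):**
+- A number labeled directly on the figure → read it as is, mark it "exact".
+- Can only be estimated from the axes/scale → mark it "estimate, not an exact reading".
+- Illegible/obscured → mark it "illegible" and give only relative size (higher/lower/about equal).
+- Figure not read → mark "the figure on this page was not read" and fall back to text only.
+- Text layer conflicts with the figure → list both, cite each, point out the inconsistency, and don't adjudicate on the slides' behalf.
+- Tie every visual conclusion to a specific page (`Lecture N, page M`).
 # Step 4: Compose the answer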
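One way to make the figure-reading rules mechanical is to attach an explicit label to every value read from a figure. The sketch below is hypothetical (the `Reading` enum, `annotate`, and `text_vs_figure` are illustrative names, not package code); it covers the labeled, scale-estimate, illegible, and not-read cases, lists text/figure conflicts side by side instead of resolving them, and binds each claim to a `(Lecture N, page M)` location.

```python
from enum import Enum


# Illustrative labels for the four figure-reading outcomes.
class Reading(Enum):
    LABELED = "exact"
    SCALE_ESTIMATE = "estimate, not an exact reading"
    ILLEGIBLE = "illegible; relative size only"
    NOT_READ = "the figure on this page was not read; text only"


def where(lecture: int, page: int) -> str:
    # Every visual conclusion is tied to a specific page.
    return f"(Lecture {lecture}, page {page})"


def annotate(value, kind: Reading, lecture: int, page: int) -> str:
    if kind is Reading.NOT_READ:
        return f"{kind.value} {where(lecture, page)}"
    return f"{value} [{kind.value}] {where(lecture, page)}"


def text_vs_figure(text_value, fig_value, lecture: int, page: int) -> str:
    """List both sources when they disagree; never pick a winner."""
    loc = where(lecture, page)
    if text_value == fig_value:
        return f"{text_value} {loc}"
    return f"text layer: {text_value} {loc} | figure: {fig_value} {loc} (inconsistent)"
```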

package/template/.claude-plugin/plugin.json CHANGED

@@ -1,41 +1,28 @@
 {
-  "name": "ccc-magi",
-  "version": "0.10.2",
-  "description": "Generic AI development harness with cross-model audit, plain-language spec, persistent memory, EARS notation, three-section constitution, locale-aware UX, MAGI 7-position AI team (Core / Verdict / Planner / Programmer / Tester / Reviewer / Archivist), Simple+Pro onboarding modes (5 vs 16 questions), and feature-aware session resume with checkpoint state restoration. Drop-in for any codebase (greenfield or brownfield).",
+  "name": "ccc-tutor",
+  "version": "0.1.0",
+  "description": "多科目 AI 复习助手（基于 CCC-MAGI harness）：slide 依据课件查知识点、exam 解题目，严格依据课件、标出处、不编造。Multi-subject AI study assistant: slide answers strictly from course materials with citations, exam locates and solves problems from the exam folder.",
   "author": {
-    "name": "Eric Cheng",
-    "email": "Haizhou0807@gmail.com"
+    "name": "Ericcccccc777"
   },
-  "homepage": "https://github.com/Ericcccccc777/CCC-MAGI",
+  "homepage": "https://github.com/Ericcccccc777/CCC-tutor",
   "repository": {
     "type": "git",
-    "url": "https://github.com/Ericcccccc777/CCC-MAGI.git"
+    "url": "https://github.com/Ericcccccc777/CCC-tutor.git"
   },
   "license": "Apache-2.0",
   "keywords": [
-    "ai-harness",
+    "ccc-tutor",
+    "ai-tutor",
     "claude-code",
     "codex",
-    "spec-driven-development",
-    "cross-model-audit",
-    "ccc",
-    "ccc-magi",
-    "constitution-driven",
-    "persistent-memory",
-    "magi-system",
-    "magi-verdict",
-    "magi-archivist",
-    "session-resume",
-    "workflow-checkpoint",
-    "decision-log",
-    "simple-onboarding",
-    "team-collaboration",
-    "ears-notation",
-    "i18n-aware",
-    "tier-1-tested-claude-codex"
+    "slide",
+    "exam",
+    "course-review",
+    "ccc-magi"
   ],
   "engines": {
     "claude-code": ">=2.0.0"
   },
-  "_note": "This manifest exists so CCC-MAGI can be submitted to the Anthropic claude-community marketplace. The full CCC-MAGI install (constitution.md, .harness/state/, slot rendering) requires the install-into.sh or npx create-ccc-magi path — plugin-only install would ship just the skills + shims, not the project-level harness."
+  "_note": "Plugin-only install ships skills + commands; the full tutor (course/ structure, pre-filled constitution + install.json) comes from `npx create-ccc-tutor`."
 }

package/template/.codex/skills/exam/SKILL.md CHANGED

@@ -44,6 +44,19 @@ pdftotext -layout "course/<subject>/exam/<file name>.pdf" - 2>/dev/null \
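 - Locate the specific problem from what the user says ("problem number / which problem / description"); if it can't be located, say it wasn't found, list which problems the file actually contains, and ask the user to confirm.
 - **Restate the original problem text first** for the user to check, then start solving.
+# Step 2.5: Use the retrieval engine to find the basis for the solution (course slides, pdf-vision)
+When the basis for a solution comes from the slide decks, use the engine to locate it; don't page through PDFs one by one:
+```bash
+.harness/scripts/pdf-rag.sh query --subject <subject> -q "<concept/method this problem tests>" -k 5
+```
+- Read a hit page's text with `cat "<text_path>"`; when a hit page has `visual_flag=true` or the problem involves a figure/table, view the image with `codex exec -i "<png_path>" "Describe only what is visible in the figure; don't guess"`. Cite with `source_file_exact` + page number.
+- `miss=true` → the slides give no basis for that step; per iron rule 2, mark it "no direct basis in the course slides".
+- `mode=keyword` / engine failure (non-zero exit code) → say "semantic search is not enabled" or fall back to `pdftotext`; for **a step that requires viewing an image when image viewing is unavailable**, state plainly that it can't be answered, and don't pass off a text-only answer as one.
+- Reading values from figures: labeled number = exact; read off the scale = estimate (mark "not exact"); illegible = give only relative size; figure not read = say so and fall back to text; text/figure conflict = list both, cite each, don't adjudicate on the slides' behalf.
 # Iron rules of problem solving (must not be violated; same as the Claude version)
 1. **Solve from the course slides first.** For methods/formulas/criteria, cite the same subject's slide decks first; mark every citable step with `(Lecture N, page M)`.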
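For a hit with `visual_flag=true`, the Codex image call above can be assembled programmatically. A minimal sketch, assuming only what the document shows (`codex exec -i <png_path> "<prompt>"`): it builds the argument list without running it, returns `None` when `png_path` is empty so the caller can report that image viewing is unavailable, and uses a paraphrased prompt. The helper name `codex_view_argv` and the example path are hypothetical.

```python
import shlex

# Paraphrase of the "describe only what is visible" prompt from the skill.
PROMPT = "Describe only what is visible in the figure; do not guess"


def codex_view_argv(hit: dict):
    """Build argv for viewing one hit page's PNG, or None if there is no image."""
    png = hit.get("png_path") or ""
    if not png:
        return None  # image viewing unavailable -> the answer must say so
    return ["codex", "exec", "-i", png, PROMPT]


argv = codex_view_argv({"png_path": "renders/lec3_p9.png", "visual_flag": True})
print(shlex.join(argv))  # shell-quoted form, handy for logging
```

Building an argv list (rather than a single shell string) keeps paths with spaces intact if the list is later handed to `subprocess.run`.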

package/template/.codex/skills/slide/SKILL.md CHANGED

@@ -18,7 +18,7 @@ metadata:
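 1. List subjects: `ls -d course/*/ 2>/dev/null`.
 2. Read the current session subject: `cat .harness/state/current-subject.txt 2>/dev/null` (it may not exist).
 3. Decide the subject:
-   - The user named a subject this time (a subject folder name appears, e.g. `course_code(example)`, `course_code(example)`) → use it and write it: `printf '%s' '<subject>' > .harness/state/current-subject.txt`.
+   - The user named a subject this time (a subject folder name appears, e.g. `ECON-10005`, `COMP-5990`) → use it and write it: `printf '%s' '<subject>' > .harness/state/current-subject.txt`.
    - Not named, state exists → keep using it.
    - Not named, no state: only one subject → use it directly and write it to state (open by stating "Current subject: X"); multiple subjects → **stop and ask the user** which subject (list the options) and wait for the answer. Don't guess.
    - User says to switch subjects → update state.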
@@ -39,9 +39,33 @@ metadata:
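 - No specific question beyond the subject (the user only triggered the skill / only gave the subject name) → ask the user what they want to ask, then stop.
 - Question too broad/vague/dependent on earlier context → **don't guess**; ask the user to state the full question. You may attach clearly labeled related content, but write "this is related content, not a direct answer".
-# Step 2: Read the slides (extract PDF text with the shell)
+# Step 2: Locate with the retrieval engine + view images (pdf-vision)
-Codex has no built-in PDF rendering, so use the shell to convert the PDF to text and read that:
+This project uses the `pdf-vision` engine for "fast retrieval + image viewing": retrieve to locate first, then read the page:
+```bash
+ls "course/<subject>/slide/"     # empty → tell the user to add PDFs first, then stop
+.harness/scripts/pdf-rag.sh query --subject <subject> -q "<the user's question>" -k 5
+```
+Output is JSON: `results` = hit pages `[{source_file_exact, lecture, page, score, visual_flag, png_path, text_path}]`; `miss=true` = no hit; `mode` = `semantic`/`keyword`.
+1. Read the hit page's text: `cat "<text_path>"` (the engine has already extracted it, including OCR text for scanned pages, so there is no need to re-extract).
+2. **View the image (key for Codex)**: when a hit page has `visual_flag=true` or the question involves a figure/table, have Codex view the image via `png_path`:
+   ```bash
+   codex exec -i "<png_path>" "Describe only what is visible in this page's figure/table: labeled numbers, axes, legend, trends; if something is illegible, say so. Do not guess."
+   ```
+   Merge the description into the answer, applying the value-reading rules below.
+3. `miss=true` → follow Case E (first say the slides don't cover it, then ask whether the user wants outside supplementary material); never present low-relevance results as an authoritative answer.
+4. Cite with `source_file_exact` + page number.
+**Degradation and honesty:** `mode=keyword` → answer text questions as usual and say "semantic search is not enabled; using keyword search"; `pdf-rag.sh` fails (non-zero exit code) → fall back to the shell text extraction below; image viewing unavailable (`png_path` empty / `codex exec -i` fails) while the question requires the figure → say plainly "this needs the figure; image viewing is currently unavailable; it can't be answered from the slides", and don't pass off a text-only answer.
+**Rules for reading values from figures:** labeled number = exact; read off the scale = estimate (mark "not exact"); illegible = give only relative size; figure not read = say so and fall back to text; text/figure conflict = list both, cite each, don't adjudicate on the slides' behalf; tie every visual conclusion to a specific page.
+---
+**Fallback path** (only when `pdf-rag.sh` is unavailable): use the shell to convert the PDF to text and read that:
 ```bash
 ls "course/<subject>/slide/"     # see which slide decks exist; if empty, tell the user to add PDFs first and stop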
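The degradation rule (non-zero exit code → fall back to `pdftotext`) can be sketched as a small Python caller. It assumes the wrapper behavior shown in `pdf-rag.sh` (failures print `{"ok":false,...}` and exit with code 3) and the `mode` field documented above; `query_or_fallback` is a hypothetical name, and the command is passed in so the sketch can be exercised without the real script. Unparseable output is treated as a failure too, which is an assumption beyond the documented rule.

```python
import json
import subprocess


def query_or_fallback(cmd: list) -> dict:
    """Run a pdf-rag query command; signal the pdftotext fallback on any failure."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        # e.g. exit code 3 from the wrapper when dependencies can't be installed
        return {"fallback": "pdftotext", "exit_code": proc.returncode}
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return {"fallback": "pdftotext", "exit_code": 0}
    if payload.get("mode") == "keyword":
        payload.setdefault("notices", []).append(
            "semantic search is not enabled; using keyword search")
    return payload


# Real use (assumed): [".harness/scripts/pdf-rag.sh", "query", "--subject",
#                      "ECON-10005", "-q", "<question>", "-k", "5"]
```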

package/template/.harness/scripts/pdf-rag.sh ADDED

@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
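+# pdf-rag.sh: dependency-bootstrapping wrapper around pdf_rag.py. Both the Claude and Codex skills call it.
+#
+#   .harness/scripts/pdf-rag.sh index  --subject ECON-10005
+#   .harness/scripts/pdf-rag.sh query  --subject ECON-10005 -q "question" -k 5
+#   .harness/scripts/pdf-rag.sh render --pdf "course/.../x.pdf" --page 9 --subject ECON-10005
+#
+# The isolated venv lives in .harness/.venv-pdf (gitignored). The first run installs pymupdf/fastembed/numpy/pypdf
+# and triggers the one-time download of the multilingual embedding model (about 400-470 MB). If installation
+# fails, the script exits with code 3 plus a JSON error, and the skills degrade accordingly (existing pdftotext text path only).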
+set -euo pipefail
+HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ROOT="$(cd "$HERE/../.." && pwd)"
+VENV="$ROOT/.harness/.venv-pdf"
+REQ="$HERE/requirements-pdf.txt"
+PY="$VENV/bin/python"
+STAMP="$VENV/.installed"
+cd "$ROOT"
+fail_json() { printf '{"ok":false,"error":"%s"}\n' "$1"; exit 3; }
+if [ ! -x "$PY" ]; then
+  PYBIN="$(command -v python3 || true)"
+  [ -n "$PYBIN" ] || fail_json "python3-not-found"
+  "$PYBIN" -m venv "$VENV" >&2 || fail_json "venv-create-failed"
+fi
+# Install dependencies only once (stamped with a hash of the requirements file; a change to requirements triggers a reinstall)
+REQ_HASH="$("$PY" -c "import hashlib,sys;print(hashlib.sha256(open(sys.argv[1],'rb').read()).hexdigest())" "$REQ" 2>/dev/null || echo none)"
+if [ ! -f "$STAMP" ] || [ "$(cat "$STAMP" 2>/dev/null)" != "$REQ_HASH" ]; then
+  if "$PY" -m pip install --quiet --disable-pip-version-check -r "$REQ" >&2; then
+    printf '%s' "$REQ_HASH" > "$STAMP"
+  else
+    fail_json "dependency-install-failed"
+  fi
+fi
+exec "$PY" "$HERE/pdf_rag.py" "$@"
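The wrapper's install-once logic (hash `requirements-pdf.txt`, compare it with the `.installed` stamp, reinstall on mismatch, write the stamp only after a successful install) can be mirrored in Python for illustration. This is a sketch, not a replacement for the shell script: the `install` callback stands in for the `pip install -r` step, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def requirements_hash(req: Path) -> str:
    # Same digest as the shell one-liner: SHA-256 of the raw file bytes.
    return hashlib.sha256(req.read_bytes()).hexdigest()


def needs_install(req: Path, stamp: Path) -> bool:
    """Missing stamp or a stamp for different requirements means (re)install."""
    if not stamp.is_file():
        return True
    return stamp.read_text() != requirements_hash(req)


def ensure_installed(req: Path, stamp: Path, install) -> bool:
    """Run install() only when needed; write the stamp only after it succeeds."""
    if not needs_install(req, stamp):
        return False
    if not install():
        raise RuntimeError("dependency-install-failed")  # the shell script exits 3 here
    stamp.write_text(requirements_hash(req))
    return True
```

Writing the stamp last means a failed or interrupted install is retried on the next run instead of being recorded as done.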