rag-fanuc 1.0.0__tar.gz

@@ -0,0 +1,223 @@
+ Metadata-Version: 2.4
+ Name: rag-fanuc
+ Version: 1.0.0
+ Summary: FANUC RAG Knowledge Base — pluggable retrieval + SAG hybrid search + OKF concepts
+ Home-page: https://github.com/Ikalus1988/self-grow-wiki
+ Author: Ikalus1988
+ Author-email: ikalus1988@users.noreply.github.com
+ Classifier: Development Status :: 4 - Beta
+ Classifier: Intended Audience :: Developers
+ Classifier: License :: OSI Approved :: MIT License
+ Classifier: Operating System :: OS Independent
+ Classifier: Programming Language :: Python :: 3
+ Classifier: Programming Language :: Python :: 3.10
+ Classifier: Programming Language :: Python :: 3.11
+ Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
+ Requires-Python: >=3.10
+ Description-Content-Type: text/markdown
+ Requires-Dist: chromadb>=0.4
+ Requires-Dist: numpy>=1.21
+ Requires-Dist: torch>=1.13
+ Requires-Dist: transformers>=4.20
+ Requires-Dist: pyyaml>=6.0
+ Provides-Extra: test
+ Requires-Dist: pytest>=7.0; extra == "test"
+ Dynamic: author
+ Dynamic: author-email
+ Dynamic: classifier
+ Dynamic: description
+ Dynamic: description-content-type
+ Dynamic: home-page
+ Dynamic: provides-extra
+ Dynamic: requires-dist
+ Dynamic: requires-python
+ Dynamic: summary
+
+ # FANUC RAG Knowledge Base
+
+ > 190 FANUC industrial robot technical documents → 200,000+ vectors → natural-language Q&A
+
+ ## Quick Start
+
+ ```bash
+ # 1. Clone the repository
+ git clone https://github.com/Ikalus1988/self-grow-wiki.git
+ cd self-grow-wiki
+
+ # 2. Install dependencies
+ python3 -m venv ~/mkdocs-env
+ source ~/mkdocs-env/bin/activate
+ pip install -r requirements.txt
+
+ # 3. Prepare the vector store (choose one)
+ # Option A: rebuild from PDFs (requires the PDF source files; takes 30-60 minutes)
+ python3 scripts/import/rag_builder.py --input /path/to/pdf/
+
+ # Option B: migrate an existing vector store (3.0 GB; ready in seconds)
+ # Place the rag_chromadb/ directory at ~/rag_chromadb/
+
+ # 4. Edit the configuration
+ # Set the vector store path in rag_core.py
+ # CHROMA_PATH = "/your/path/rag_chromadb"
+
+ # 5. Start the services
+ chmod +x start_rag.sh
+ ./start_rag.sh
+ ```
+
+ ## Service Ports
+
+ | Service | Port | Description |
+ |---------|------|-------------|
+ | RAG Web UI | 7860 | Gradio Q&A interface |
+ | Admin panel | 7861 | Gradio admin console |
+ | RAG API | 8002 | FastAPI HTTP interface |
+ | Ollama | 11434 | Local LLM fallback |
+
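One way to confirm that the four services came up is to probe each port. The sketch below is not part of the package: the port numbers are copied from the table, while `is_port_open`, the `127.0.0.1` host, and the timeout are assumptions made for illustration.

```python
import socket

# Ports copied from the service table; host and timeout are assumed values.
SERVICE_PORTS = {
    "RAG Web UI": 7860,
    "Admin panel": 7861,
    "RAG API": 8002,
    "Ollama": 11434,
}


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in SERVICE_PORTS.items():
        state = "up" if is_port_open(port) else "down"
        print(f"{name:<12} {port:>5}  {state}")
```

A TCP probe only shows that something is listening; it does not verify that the listener is the expected service.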
+ ## Directory Structure
+
+ ```
+ self-grow-wiki/
+ ├── rag_core.py                  # Core retrieval + generation
+ ├── rag_web.py                   # Gradio Web UI
+ ├── rag_api.py                   # FastAPI HTTP API
+ ├── rag_admin.py                 # Admin panel
+ ├── rag_feedback_card.py         # Feedback cards
+ ├── start_rag.sh                 # One-command startup script
+ ├── auto_flywheel.py             # Automated flywheel
+ ├── daily_audit.py               # Daily audit
+ ├── badcase_review.py            # Badcase review
+ ├── kb_learning.py               # Self-learning module
+ ├── synonyms.json                # Synonym table
+ ├── requirements.txt             # Python dependencies
+ ├── wxauto_bot.py                # WeChat bot (legacy)
+ ├── wxauto_bot/
+ │   └── bot.py                   # WeChat bot (v7, recommended)
+ ├── scripts/
+ │   ├── import/                  # PDF import tools
+ │   │   ├── rag_builder.py       # Batch import
+ │   │   ├── rag_builder_ocr.py   # OCR-enhanced import
+ │   │   ├── rag_import_fanuc.py  # FANUC-specific import
+ │   │   └── import_batch.py      # Batch import
+ │   ├── audit/                   # Audit tools
+ │   │   ├── audit_chunks_p1.py
+ │   │   ├── audit_exam_p2.py
+ │   │   ├── audit_pdf_chunk_p3.py
+ │   │   └── audit_pdf_chunk_v2.py
+ │   ├── exam/                    # Exam generation
+ │   │   ├── gen_exam.py
+ │   │   ├── gen_exam_c.py
+ │   │   ├── gen_exam_c_v2.py
+ │   │   └── gen_exam_v2.py
+ │   ├── docs/                    # Documentation tools
+ │   │   ├── doc_classifier.py
+ │   │   ├── doc_verify.py
+ │   │   ├── doc_verify_v2.py
+ │   │   ├── generate_mkdocs.py
+ │   │   └── graph_to_obsidian.py
+ │   ├── kb_selfcheck.py          # Knowledge base self-check
+ │   ├── rag_phase2_semantic_tag.py
+ │   └── rag_phase2b_refine.py
+ ├── lessons/                     # Lessons (used by MisakaNet)
+ └── tests/                       # Tests
+ ```
+
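+ ## Core Features
+
+ ### Retrieval Enhancements
+
+ | Capability | Description |
+ |------------|-------------|
+ | Hybrid semantic + BM25 search | RRF fusion balances semantic and keyword matching |
+ | Alarm code normalization | Variants such as SRVO-023 and SRVO023 all match |
+ | Forced recall for model series | Documents for the M-900 / R-2000iC series are never missed |
+ | Brand filtering | Queries containing FANUC automatically exclude KUKA/ABB documents |
+ | Synonym expansion | 49 synonym groups automatically expand the query |
+ | Multi-question splitting | "Difference between A and B" is split and each part is retrieved separately |
+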
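The RRF fusion named in the first row can be illustrated with a short sketch of standard Reciprocal Rank Fusion. This is not the code in `rag_core.py`: the function name `rrf_fuse`, the constant `k=60` (the value commonly used for RRF), and the document IDs are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed constant, not a value taken
    from this package.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    semantic = ["doc_a", "doc_b", "doc_c"]  # hypothetical vector-search order
    keyword = ["doc_b", "doc_d", "doc_a"]   # hypothetical BM25 order
    for doc_id, score in rrf_fuse([semantic, keyword]):
        print(f"{doc_id}  {score:.4f}")
```

Because RRF uses only ranks, it can combine cosine distances and BM25 scores without putting them on a common scale, which is probably why it suits a hybrid setup like this one.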
+ ### Four-Channel LLM Failover
+
+ | Priority | Model | Source | Response time |
+ |----------|-------|--------|---------------|
+ | 1 | MiMo-V2-Flash | Mify intranet | ~1.5s |
+ | 2 | MiMo-V2-Pro | Mify intranet | ~0.5s |
+ | 3 | DeepSeek-Chat | DeepSeek API | ~2s |
+ | 4 | Qwen2.5:3b | Local Ollama | ~50s |
+
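A priority-ordered failover chain like the one in the table can be sketched as a loop that returns the first successful answer. The package's actual client code is not shown here, so `ask_with_failover`, the `Channel` alias, and the stub callables are hypothetical; only the model names come from the table.

```python
from typing import Callable, Sequence

# A channel is a (name, callable) pair. The callable takes a prompt and
# returns an answer, raising an exception on failure. Both are hypothetical.
Channel = tuple[str, Callable[[str], str]]


def ask_with_failover(prompt: str, channels: Sequence[Channel]) -> tuple[str, str]:
    """Try channels in priority order and return (channel_name, answer)."""
    errors = []
    for name, call in channels:
        try:
            return name, call(prompt)
        except Exception as exc:  # any failure moves on to the next channel
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all channels failed: " + "; ".join(errors))


if __name__ == "__main__":
    def unreachable(prompt: str) -> str:
        raise TimeoutError("endpoint not reachable")

    def local_stub(prompt: str) -> str:
        return f"(stub answer to: {prompt})"

    channels = [
        ("MiMo-V2-Flash", unreachable),
        ("MiMo-V2-Pro", unreachable),
        ("DeepSeek-Chat", unreachable),
        ("Qwen2.5:3b", local_stub),
    ]
    print(ask_with_failover("What does SRVO-023 mean?", channels))
```

With a last-resort channel that takes around 50 seconds, a real deployment would probably also set a timeout per channel so that a hung endpoint does not block the fallback.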
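+ ### Quality Assurance
+
+ - **Daily automated audit**: cron runs at 06:00 and samples 7-8 questions; a pass rate of >=90% counts as passing
+ - **Ingestion quality gate**: each new PDF is checked for contamination, duplicates, and binary residue before ingestion
+ - **Self-learning flywheel**: audit failure -> badcase -> review -> synonyms -> retrieval enhancement
+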
+ ## Migration Guide
+
+ ### Minimal Migration (Recommended)
+
+ ```bash
+ # Only the code repository needs to move (git clone is enough)
+ git clone https://github.com/Ikalus1988/self-grow-wiki.git
+ cd self-grow-wiki
+
+ # Rebuild the vector store from PDFs
+ pip install -r requirements.txt
+ python3 scripts/import/rag_builder.py --input /path/to/pdf/
+ ```
+
+ ### Full Migration (Keep the Existing Vector Store)
+
+ ```bash
+ # 1. Code repository
+ git clone https://github.com/Ikalus1988/self-grow-wiki.git
+
+ # 2. Vector store (3.0 GB)
+ # Pack it on the old server; -C ~ stores the path relative to the home directory
+ tar czf rag_chromadb.tar.gz -C ~ rag_chromadb
+ # Transfer the archive to the new server and unpack it into the home directory
+ tar xzf rag_chromadb.tar.gz -C ~/
+
+ # 3. Edit the configuration
+ # Set CHROMA_PATH in rag_core.py to the new path
+
+ # 4. Start
+ cd self-grow-wiki
+ ./start_rag.sh
+ ```
+
+ ### Feishu Bot Configuration
+
+ Feishu integration is handled by Hermes Gateway and configured in `~/.hermes/.env`:
+
+ ```bash
+ FEISHU_APP_ID=your-app-id
+ FEISHU_APP_SECRET=your-app-secret
+ FEISHU_CONNECTION_MODE=websocket
+ FEISHU_HOME_CHANNEL=group-id
+ FEISHU_GROUP_ALLOWED_CHATS=group-id-1,group-id-2
+ FEISHU_ALLOWED_USERS=user-id-1,user-id-2
+ ```
+
+ To change the Feishu entry point:
+ 1. **Reuse the existing app**: copy `.env` to the new server and restart Hermes Gateway
+ 2. **Create a new app**: create one at https://open.feishu.cn/app/, obtain the App ID/Secret, and update `.env`
+ 3. **Change the groups**: update `FEISHU_HOME_CHANNEL` and `FEISHU_GROUP_ALLOWED_CHATS`
+
+ ## Environment Variables
+
+ | Variable | Description | Default |
+ |----------|-------------|---------|
+ | `CHROMA_PATH` | ChromaDB vector store path | `~/rag_chromadb` |
+ | `OLLAMA_MODELS` | Ollama model directory | `~/ollama/models` |
+ | `OLLAMA_HOST` | Ollama listen address | `0.0.0.0:11434` |
+ | `FEISHU_WEBHOOK` | Feishu webhook (optional) | none |
+
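For illustration, the variables and defaults in the table could be read in Python roughly as follows. How `rag_core.py` actually consumes them is not shown in this package listing, so the `Settings` dataclass and `load_settings` function are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    chroma_path: Path
    ollama_models: Path
    ollama_host: str
    feishu_webhook: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables, falling back to the documented defaults."""
    return Settings(
        chroma_path=Path(env.get("CHROMA_PATH", "~/rag_chromadb")).expanduser(),
        ollama_models=Path(env.get("OLLAMA_MODELS", "~/ollama/models")).expanduser(),
        ollama_host=env.get("OLLAMA_HOST", "0.0.0.0:11434"),
        # Optional: an unset or empty value means no webhook is configured.
        feishu_webhook=env.get("FEISHU_WEBHOOK") or None,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing the environment as a mapping keeps the loader easy to exercise in tests without touching the real process environment.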
+ ## Tech Stack
+
+ - **Embedding model**: bge-base-zh-v1.5 (768 dimensions)
+ - **Vector database**: ChromaDB (cosine distance)
+ - **Web UI**: Gradio
+ - **HTTP API**: FastAPI
+ - **Local LLM**: Ollama + Qwen2.5:3b
+ - **WeChat bot**: wxauto v7
+
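As a reminder of what the cosine distance setting measures, here is a dependency-free sketch. It uses the usual definition, one minus cosine similarity; the helper name `cosine_distance` and the tiny 2-dimensional vectors are assumptions for illustration (the real embeddings have 768 dimensions).

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # same direction
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Smaller values mean closer vectors: 0 for identical directions, 1 for orthogonal ones, and 2 for opposite ones.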
+ ## License
+
+ MIT License
@@ -0,0 +1,34 @@
+ #!/usr/bin/env python3
+ """Extract third-party dependencies from project source files."""
+ import re
+
+ # Source files to scan, relative to the current working directory.
+ files = [
+     "rag_core.py", "kb_learning.py", "daily_audit.py",
+     "badcase_review.py",
+ ]
+
+ # Standard-library modules to ignore (hand-maintained, not exhaustive).
+ stdlib = {
+     "os","sys","time","json","re","logging","math","random","threading",
+     "datetime","pathlib","argparse","collections","hashlib","subprocess",
+     "urllib","typing","textwrap","pickle","io","shutil","types","functools",
+     "itertools","bisect","copy","warnings","statistics","base64","uuid",
+     "tempfile","operator","pprint","traceback","ctypes","glob","inspect",
+     "abc","enum","html","http","socket","ssl","string","struct",
+     "configparser","csv","netrc","platform","shelve","sqlite3",
+ }
+
+ imports = set()
+ for f in files:
+     try:
+         with open(f, encoding="utf-8") as fh:
+             for line in fh:
+                 code = line.split("#", 1)[0]  # drop trailing comments
+                 m = re.match(r"\s*from\s+(\w+)", code)
+                 if m:
+                     names = [m.group(1)]
+                 else:
+                     # "import a, b.c as d" may name several modules.
+                     m = re.match(r"\s*import\s+(.+)", code)
+                     names = m.group(1).split(",") if m else []
+                 for name in names:
+                     # Keep the top-level package only: "b.c as d" -> "b".
+                     mod = re.split(r"[.\s]", name.strip(), maxsplit=1)[0]
+                     if mod and mod not in stdlib:
+                         imports.add(mod)
+     except FileNotFoundError:
+         pass  # skip files that are absent from this checkout
+
+ for m in sorted(imports):
+     print(m)
@@ -0,0 +1,80 @@
+ #!/usr/bin/env python3
+ """Scan local lessons/ directory and rebuild lessons/_index.json.
+
+ Scope: this script only manages lessons inside this repository
+ (self-grow-wiki). It does not touch MisakaNet's lessons.json; that
+ file is owned by a different repo and updated via a separate process.
+
+ Output: lessons/_index.json, a list of {id, title, domain, tags, url,
+ updated} entries parsed from each lesson's frontmatter.
+
+ Usage:
+     python3 _update_lessons.py   # rebuild _index.json from disk
+ """
+ import json
+ import re
+ from datetime import date
+ from pathlib import Path
+
+ HERE = Path(__file__).resolve().parent
+ LESSONS_DIR = HERE / "lessons"
+ INDEX_FILE = LESSONS_DIR / "_index.json"
+
+ _FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---", re.DOTALL)
+
+
+ def _parse_frontmatter(text: str) -> dict:
+     """Extract simple YAML-ish frontmatter. Returns {} if absent or malformed."""
+     m = _FRONTMATTER_RE.match(text)
+     if not m:
+         return {}
+     meta = {}
+     for line in m.group(1).splitlines():
+         if ":" not in line:
+             continue
+         k, _, v = line.partition(":")
+         meta[k.strip()] = v.strip().strip('"').strip("'")
+     return meta
+
+
+ def _parse_tags(raw: str) -> list:
+     """Split an inline tag value such as "[a, b]" or "a, b" into a list."""
+     items = raw.strip().strip("[]").split(",")
+     return [t.strip().strip('"').strip("'") for t in items if t.strip()]
+
+
+ def _slug_from_filename(name: str) -> str:
+     return name[: -len(".md")] if name.endswith(".md") else name
+
+
+ def build_index() -> list:
+     """Walk lessons/*.md and produce an index entry per file."""
+     entries = []
+     if not LESSONS_DIR.is_dir():
+         return entries
+     for md_path in sorted(LESSONS_DIR.glob("*.md")):
+         try:
+             text = md_path.read_text(encoding="utf-8")
+         except OSError:
+             continue  # skip unreadable files
+         meta = _parse_frontmatter(text)
+         slug = _slug_from_filename(md_path.name)
+         entries.append({
+             "id": meta.get("id", slug),
+             "title": meta.get("title", slug),
+             "domain": meta.get("domain", ""),
+             # Frontmatter values are strings, so tags are split into a list.
+             "tags": _parse_tags(meta.get("tags", "")),
+             "url": f"lessons/{md_path.name}",
+             "updated": meta.get("updated", str(date.today())),
+         })
+     return entries
+
+
+ def main() -> int:
+     if not LESSONS_DIR.is_dir():
+         # Without this guard, write_text below raises FileNotFoundError.
+         print(f"No lessons directory at {LESSONS_DIR}")
+         return 1
+     entries = build_index()
+     INDEX_FILE.write_text(
+         json.dumps(entries, ensure_ascii=False, indent=2) + "\n",
+         encoding="utf-8",
+     )
+     print(f"Indexed {len(entries)} lesson(s) → {INDEX_FILE.relative_to(HERE)}")
+     for e in entries:
+         print(f"  - [{e['domain'] or '?'}] {e['id']}: {e['title']}")
+     return 0
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
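To make the expected frontmatter format concrete, the sketch below runs a compact restatement of the script's frontmatter parsing on one lesson file. The sample lesson text and all of its field values are hypothetical, and `parse_frontmatter` is a standalone copy written for this example rather than an import from the package.

```python
import re

FRONTMATTER_RE = re.compile(r"^---\s*\n(.*?)\n---", re.DOTALL)

# Hypothetical lesson file; none of these values come from the repository.
SAMPLE_LESSON = """---
id: srvo-023-basics
title: "Diagnosing SRVO-023"
domain: fanuc
tags: [alarm, servo]
updated: 2024-01-01
---
# Lesson body starts here
"""


def parse_frontmatter(text: str) -> dict:
    """Parse flat `key: value` frontmatter; every value comes back as a string."""
    match = FRONTMATTER_RE.match(text)
    if not match:
        return {}
    meta = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"').strip("'")
    return meta


if __name__ == "__main__":
    for key, value in parse_frontmatter(SAMPLE_LESSON).items():
        print(f"{key!r}: {value!r}")
```

A value such as `[alarm, servo]` comes back as plain text, so list-valued fields need a further split, and nested or multi-line YAML is outside what this parser handles.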