solo-doc 0.0.3 → 0.1.2
- package/README.md +94 -30
- package/dist/bin/solo-doc.js +75 -21
- package/dist/src/ai/OllamaClient.js +59 -0
- package/dist/src/commands/VSCommand.js +157 -0
- package/dist/src/strategies/OCPStrategy.js +17 -0
- package/dist/src/utils/StrategyDetector.js +32 -0
- package/dist/src/utils/TocExtractor.js +33 -0
- package/dist/src/utils/filename.js +18 -0
- package/package.json +2 -1
package/README.md
CHANGED
|
@@ -1,61 +1,125 @@
|
|
|
1
1
|
# Solo-Doc CLI
|
|
2
2
|
|
|
3
|
-
Solo-Doc
|
|
3
|
+
Solo-Doc is a powerful Node.js CLI tool that crawls complex documentation sites and converts them into a single Markdown file that preserves the original hierarchy.
|
|
4
4
|
|
|
5
|
-
|
|
5
|
+
**About the name**: "Solo" refers to merging many documentation pages into a single (solo) file; "Doc" stands for documentation.
|
|
6
6
|
|
|
7
|
-
##
|
|
7
|
+
## ✨ Features
|
|
8
8
|
|
|
9
|
-
-
|
|
10
|
-
-
|
|
11
|
-
-
|
|
12
|
-
-
|
|
13
|
-
- **
|
|
14
|
-
- **
|
|
9
|
+
- **🧠 Smart detection**:
|
|
10
|
+
- **Automatic strategy detection**: Detects the documentation type (Red Hat OpenShift or Alauda) from the URL.
|
|
11
|
+
- **Automatic naming**: Generates the output filename from the documentation path (for example `acp-building_application.md`), so you do not have to specify one.
|
|
12
|
+
- **🏗 Multiple strategies**: Dedicated strategies for different documentation frameworks:
|
|
13
|
+
- **OCP (Red Hat OpenShift)**: Optimized for static single-page HTML documentation.
|
|
14
|
+
- **ACP (Alauda Container Platform)**: Optimized for dynamic, client-side-rendered (Rspress-based) documentation, crawled with Puppeteer.
|
|
15
|
+
- **🌲 Preserved hierarchy**: Keeps the document's original table-of-contents structure (1, 1.1, 1.1.1...).
|
|
16
|
+
- **✨ Clean output**: Removes navigation bars, sidebars, headers, and footers, keeping only the core content.
|
|
17
|
+
- **📄 Single-file output**: Merges all crawled pages into one complete Markdown file.
|
|
15
18
|
|
|
16
|
-
##
|
|
19
|
+
## 📦 Installation
|
|
17
20
|
|
|
18
|
-
###
|
|
21
|
+
### Install via NPM (recommended)
|
|
19
22
|
|
|
20
|
-
|
|
23
|
+
You can install the tool globally:
|
|
21
24
|
|
|
22
25
|
```bash
|
|
23
26
|
npm install -g solo-doc
|
|
24
27
|
```
|
|
25
28
|
|
|
26
|
-
##
|
|
29
|
+
## 🚀 Usage
|
|
27
30
|
|
|
28
|
-
|
|
31
|
+
After a global install, you can run the `solo-doc` command from any terminal.
|
|
29
32
|
|
|
30
|
-
###
|
|
33
|
+
### Basic usage (auto-detection)
|
|
31
34
|
|
|
32
|
-
|
|
35
|
+
Just provide a URL. Solo-Doc detects the site type and generates a meaningful filename.
|
|
33
36
|
|
|
34
37
|
```bash
|
|
35
|
-
|
|
38
|
+
# Crawl the Alauda documentation
|
|
39
|
+
# Output file: acp-building_application.md
|
|
40
|
+
solo-doc "https://docs.alauda.io/container_platform/4.2/developer/building_application/index.html"
|
|
41
|
+
|
|
42
|
+
# Crawl the Red Hat documentation
|
|
43
|
+
# Output file: ocp-building_applications.md
|
|
44
|
+
solo-doc "https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html-single/building_applications/index"
|
|
45
|
+
```
|
|
46
|
+
|
|
47
|
+
### 📝 Custom output filename
|
|
48
|
+
|
|
49
|
+
Use the `-o` option to specify a custom output path.
|
|
50
|
+
|
|
51
|
+
```bash
|
|
52
|
+
solo-doc "https://docs.alauda.io/..." -o my-manual.md
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
### 🔧 Forcing the strategy type
|
|
56
|
+
|
|
57
|
+
If the URL cannot be detected automatically (for example a private IP or a custom domain), use `--type` to force a strategy.
|
|
58
|
+
|
|
59
|
+
```bash
|
|
60
|
+
# Force the ACP strategy for a private deployment
|
|
61
|
+
solo-doc "http://10.1.2.3/docs/index.html" --type acp
|
|
36
62
|
```
|
|
37
63
|
|
|
38
|
-
|
|
64
|
+
## ⚙️ Options
|
|
39
65
|
|
|
40
|
-
|
|
66
|
+
| Option | Description | Default |
|
|
67
|
+
|--------|-------------|---------|
|
|
68
|
+
| `<url>` | The documentation URL to crawl | (required) |
|
|
69
|
+
| `-o, --output <path>` | Output file path. If omitted, a filename is generated from the URL. | `[type]-[path-segment].md` |
|
|
70
|
+
| `-t, --type <type>` | Force the strategy type (`ocp` or `acp`). | Auto-detected |
|
|
71
|
+
| `--limit <number>` | Limit the number of pages crawled (for testing/debugging). | Unlimited |
|
|
72
|
+
| `--no-headless` | Run the browser in visible mode (ACP only, for debugging). | Headless |
|
|
73
|
+
|
|
74
|
+
## 🧪 Experimental features (Beta)
|
|
75
|
+
|
|
76
|
+
### 🤖 Document comparison (AI VS mode)
|
|
77
|
+
|
|
78
|
+
Compares the content and structure of two documents using a local AI model. This feature is in Beta.
|
|
79
|
+
|
|
80
|
+
> **⚠️ Prerequisites**:
|
|
81
|
+
> 1. [Ollama](https://ollama.com/) is installed and running locally.
|
|
82
|
+
> 2. The required model has been pulled (`qwen3-vl:8b` or a similar multimodal or large-text model is recommended).
|
|
83
|
+
> 3. The Ollama service is listening on `http://127.0.0.1:11434`.
|
|
84
|
+
|
|
85
|
+
#### Usage
|
|
41
86
|
|
|
42
87
|
```bash
|
|
43
|
-
solo-doc
|
|
88
|
+
solo-doc vs <baseline-url> <target-url> [options]
|
|
44
89
|
```
|
|
45
90
|
|
|
46
|
-
|
|
47
|
-
|
|
48
|
-
|
|
49
|
-
|
|
91
|
+
#### Example
|
|
92
|
+
|
|
93
|
+
```bash
|
|
94
|
+
# Compare the OpenShift and Alauda documentation
|
|
95
|
+
solo-doc vs \
|
|
96
|
+
"https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html-single/building_applications/index" \
|
|
97
|
+
"https://docs.alauda.io/container_platform/4.2/developer/building_application/index.html" \
|
|
98
|
+
--model qwen3-vl:8b
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
This command runs the following steps in order:
|
|
102
|
+
1. **Automatic crawl**: Crawls each URL and saves it as a Markdown file (skipped if the file already exists).
|
|
103
|
+
2. **TOC extraction**: Extracts the table-of-contents tree of both documents.
|
|
104
|
+
3. **AI analysis**: Calls the local Ollama model and runs a two-step analysis using the prompts defined in `solo-doc-prompt.md`:
|
|
105
|
+
- Generates `vs-result.md`: a detailed analysis of content and structure differences.
|
|
106
|
+
- Generates `vs-tree.md`: a merged table-of-contents tree annotated with the differences.
|
|
107
|
+
|
|
108
|
+
#### VS mode options
|
|
109
|
+
|
|
110
|
+
| Option | Description | Default |
|
|
111
|
+
|--------|-------------|---------|
|
|
112
|
+
| `--model <name>` | Name of the Ollama model to use. | `qwen3-vl:8b` |
|
|
113
|
+
| `-f, --force` | Force a re-crawl even if the file already exists. | false |
|
|
50
114
|
|
|
51
|
-
##
|
|
115
|
+
## ✅ Requirements
|
|
52
116
|
|
|
53
|
-
- Node.js >=
|
|
54
|
-
- Google Chrome (
|
|
117
|
+
- Node.js >= 20
|
|
118
|
+
- Google Chrome (used for ACP crawling)
|
|
55
119
|
|
|
56
|
-
##
|
|
120
|
+
## 💻 Development
|
|
57
121
|
|
|
58
122
|
```bash
|
|
59
|
-
#
|
|
60
|
-
npm run dev --
|
|
123
|
+
# Run in development mode
|
|
124
|
+
npm run dev -- "https://docs.alauda.io/..."
|
|
61
125
|
```
|
package/dist/bin/solo-doc.js
CHANGED
|
@@ -8,37 +8,91 @@ const commander_1 = require("commander");
|
|
|
8
8
|
const CrawlerContext_1 = require("../src/CrawlerContext");
|
|
9
9
|
const OCPStrategy_1 = require("../src/strategies/OCPStrategy");
|
|
10
10
|
const ACPStrategy_1 = require("../src/strategies/ACPStrategy");
|
|
11
|
+
const StrategyDetector_1 = require("../src/utils/StrategyDetector");
|
|
12
|
+
const filename_1 = require("../src/utils/filename");
|
|
13
|
+
const VSCommand_1 = require("../src/commands/VSCommand");
|
|
14
|
+
const chalk_1 = __importDefault(require("chalk"));
|
|
11
15
|
const path_1 = __importDefault(require("path"));
|
|
16
|
+
const fs_1 = __importDefault(require("fs"));
|
|
12
17
|
const program = new commander_1.Command();
|
|
13
18
|
program
|
|
14
19
|
.name('solo-doc')
|
|
15
20
|
.description('CLI to crawl documentation sites and convert to single Markdown file')
|
|
16
21
|
.version('1.0.0');
|
|
22
|
+
// VS Command
|
|
17
23
|
program
|
|
18
|
-
.command('
|
|
19
|
-
.description('
|
|
20
|
-
.
|
|
21
|
-
.
|
|
22
|
-
.
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
24
|
+
.command('vs')
|
|
25
|
+
.description('Compare two documentation sites using AI (Beta)')
|
|
26
|
+
.argument('<baseline>', 'Baseline documentation URL')
|
|
27
|
+
.argument('<target>', 'Target documentation URL')
|
|
28
|
+
.option('--model <model>', 'Ollama model to use', 'qwen3-vl:8b')
|
|
29
|
+
.action(async (baseline, target, options) => {
|
|
30
|
+
try {
|
|
31
|
+
console.log(chalk_1.default.yellow('⚠️ [Beta Feature] This feature requires a local Ollama instance running at http://127.0.0.1:11434'));
|
|
32
|
+
await VSCommand_1.VSCommand.run(baseline, target, options);
|
|
33
|
+
}
|
|
34
|
+
catch (error) {
|
|
35
|
+
console.error(chalk_1.default.red(`[VS Mode] Failed: ${error.message}`));
|
|
36
|
+
process.exit(1);
|
|
37
|
+
}
|
|
27
38
|
});
|
|
39
|
+
// Default Crawl Command (Implicit)
|
|
28
40
|
program
|
|
29
|
-
.
|
|
30
|
-
.
|
|
31
|
-
.option('-o, --output <path>', 'Output file path'
|
|
41
|
+
.argument('<url>', 'The documentation URL to crawl')
|
|
42
|
+
.option('-t, --type <type>', 'Force specify strategy type (ocp, acp)')
|
|
43
|
+
.option('-o, --output <path>', 'Output file path')
|
|
32
44
|
.option('--limit <number>', 'Limit number of pages (for debug)', parseInt)
|
|
33
|
-
.option('--no-headless', 'Run in headful mode (show browser)')
|
|
45
|
+
.option('--no-headless', 'Run in headful mode (show browser) - Only for ACP')
|
|
46
|
+
.option('-f, --force', 'Force overwrite existing file')
|
|
34
47
|
.action(async (url, options) => {
|
|
35
|
-
|
|
36
|
-
|
|
37
|
-
|
|
38
|
-
|
|
39
|
-
|
|
40
|
-
|
|
41
|
-
|
|
42
|
-
|
|
48
|
+
try {
|
|
49
|
+
// 1. Determine Strategy
|
|
50
|
+
let type = options.type;
|
|
51
|
+
if (!type) {
|
|
52
|
+
const detected = StrategyDetector_1.StrategyDetector.detect(url);
|
|
53
|
+
if (detected !== StrategyDetector_1.StrategyType.UNKNOWN) {
|
|
54
|
+
type = detected;
|
|
55
|
+
console.log(chalk_1.default.blue(`[Solo-Doc] Auto-detected strategy: ${type.toUpperCase()}`));
|
|
56
|
+
}
|
|
57
|
+
}
|
|
58
|
+
if (!type || (type !== 'ocp' && type !== 'acp')) {
|
|
59
|
+
console.error(chalk_1.default.red('Error: Could not detect documentation type.'));
|
|
60
|
+
console.error(chalk_1.default.yellow('Please use --type <ocp|acp> to specify the documentation type manually.'));
|
|
61
|
+
process.exit(1);
|
|
62
|
+
}
|
|
63
|
+
// 2. Instantiate Strategy
|
|
64
|
+
let strategy;
|
|
65
|
+
let defaultOutput;
|
|
66
|
+
if (type === 'ocp' || type === StrategyDetector_1.StrategyType.OCP) {
|
|
67
|
+
strategy = new OCPStrategy_1.OCPStrategy();
|
|
68
|
+
defaultOutput = (0, filename_1.generateDefaultFilename)(url, 'ocp');
|
|
69
|
+
}
|
|
70
|
+
else {
|
|
71
|
+
strategy = new ACPStrategy_1.ACPStrategy();
|
|
72
|
+
defaultOutput = (0, filename_1.generateDefaultFilename)(url, 'acp');
|
|
73
|
+
}
|
|
74
|
+
// 3. Prepare Context
|
|
75
|
+
const context = new CrawlerContext_1.CrawlerContext(strategy);
|
|
76
|
+
const outputPath = path_1.default.resolve(process.cwd(), options.output || defaultOutput);
|
|
77
|
+
// Check if file exists
|
|
78
|
+
if (fs_1.default.existsSync(outputPath) && !options.force) {
|
|
79
|
+
console.log(chalk_1.default.yellow('--------------------------------------------------'));
|
|
80
|
+
console.log(chalk_1.default.yellow(`ℹ File already exists: ${outputPath}`));
|
|
81
|
+
console.log(chalk_1.default.yellow(' Skipping crawl to save time.'));
|
|
82
|
+
console.log(chalk_1.default.gray(' Use --force or -f to overwrite.'));
|
|
83
|
+
console.log(chalk_1.default.yellow('--------------------------------------------------'));
|
|
84
|
+
return;
|
|
85
|
+
}
|
|
86
|
+
// 4. Run
|
|
87
|
+
await context.run(url, {
|
|
88
|
+
output: outputPath,
|
|
89
|
+
limit: options.limit,
|
|
90
|
+
headless: options.headless
|
|
91
|
+
});
|
|
92
|
+
}
|
|
93
|
+
catch (error) {
|
|
94
|
+
console.error(chalk_1.default.red(`[Solo-Doc] Failed: ${error.message}`));
|
|
95
|
+
process.exit(1);
|
|
96
|
+
}
|
|
43
97
|
});
|
|
44
98
|
program.parse(process.argv);
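The default crawl action above resolves the strategy in a fixed order: an explicit `--type` wins, otherwise the detector decides, and anything other than `ocp` or `acp` is an error. A minimal sketch of that decision as a pure function follows. `resolveStrategyType` and the injected `detect` callback are hypothetical names used only for illustration; the package itself inlines this logic and calls `process.exit(1)` instead of throwing.

```javascript
// Hypothetical helper: mirrors the resolution order of the action handler.
// `detect` stands in for StrategyDetector.detect and must return
// 'ocp', 'acp', or 'unknown'.
function resolveStrategyType(url, forcedType, detect) {
  // An explicit --type value takes precedence over auto-detection.
  const type = forcedType || detect(url);
  if (type !== 'ocp' && type !== 'acp') {
    throw new Error('Could not detect documentation type; pass --type <ocp|acp>.');
  }
  return type;
}

// Usage with a stand-in detector (an assumption for the example only).
const demoDetect = (u) => (u.includes('redhat.com') ? 'ocp' : 'unknown');
console.log(resolveStrategyType('https://docs.redhat.com/x', undefined, demoDetect));
console.log(resolveStrategyType('http://10.1.2.3/docs/index.html', 'acp', demoDetect));
```

Keeping the decision separate from the I/O makes the "private IP needs `--type`" case easy to exercise without launching a crawl.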
|
|
package/dist/src/ai/OllamaClient.js
@@ -0,0 +1,59 @@
|
|
|
1
|
+
"use strict";
|
|
2
|
+
var __importDefault = (this && this.__importDefault) || function (mod) {
|
|
3
|
+
return (mod && mod.__esModule) ? mod : { "default": mod };
|
|
4
|
+
};
|
|
5
|
+
Object.defineProperty(exports, "__esModule", { value: true });
|
|
6
|
+
exports.OllamaClient = void 0;
|
|
7
|
+
const axios_1 = __importDefault(require("axios"));
|
|
8
|
+
class OllamaClient {
|
|
9
|
+
constructor(options) {
|
|
10
|
+
// Use 127.0.0.1 instead of localhost to avoid IPv6 issues (ECONNREFUSED ::1)
|
|
11
|
+
this.endpoint = options.endpoint || 'http://127.0.0.1:11434';
|
|
12
|
+
this.model = options.model;
|
|
13
|
+
}
|
|
14
|
+
async generate(prompt, onToken) {
|
|
15
|
+
try {
|
|
16
|
+
const response = await axios_1.default.post(`${this.endpoint}/api/generate`, {
|
|
17
|
+
model: this.model,
|
|
18
|
+
prompt: prompt,
|
|
19
|
+
stream: true
|
|
20
|
+
}, {
|
|
21
|
+
responseType: 'stream'
|
|
22
|
+
});
|
|
23
|
+
let fullResponse = '';
|
|
24
|
+
return new Promise((resolve, reject) => {
|
|
25
|
+
const stream = response.data;
|
|
26
|
+
stream.on('data', (chunk) => {
|
|
27
|
+
const lines = chunk.toString().split('\n').filter(Boolean);
|
|
28
|
+
for (const line of lines) {
|
|
29
|
+
try {
|
|
30
|
+
const json = JSON.parse(line);
|
|
31
|
+
if (json.response) {
|
|
32
|
+
fullResponse += json.response;
|
|
33
|
+
if (onToken) {
|
|
34
|
+
onToken(json.response);
|
|
35
|
+
}
|
|
36
|
+
}
|
|
37
|
+
if (json.done) {
|
|
38
|
+
// stream ended
|
|
39
|
+
}
|
|
40
|
+
}
|
|
41
|
+
catch (e) {
|
|
42
|
+
// ignore partial JSON
|
|
43
|
+
}
|
|
44
|
+
}
|
|
45
|
+
});
|
|
46
|
+
stream.on('end', () => {
|
|
47
|
+
resolve(fullResponse);
|
|
48
|
+
});
|
|
49
|
+
stream.on('error', (err) => {
|
|
50
|
+
reject(err);
|
|
51
|
+
});
|
|
52
|
+
});
|
|
53
|
+
}
|
|
54
|
+
catch (error) {
|
|
55
|
+
throw new Error(`Ollama API call failed: ${error.message}. Is Ollama running at ${this.endpoint}?`);
|
|
56
|
+
}
|
|
57
|
+
}
|
|
58
|
+
}
|
|
59
|
+
exports.OllamaClient = OllamaClient;
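The `data` handler above splits each chunk on newlines and ignores any line that fails `JSON.parse`. Stream chunk boundaries are not guaranteed to line up with line boundaries, so an object split across two chunks would be dropped together with its token. The sketch below shows one possible line-buffered alternative. `createNdjsonParser` is a hypothetical helper, not part of the package, and it assumes chunks never split a multi-byte character.

```javascript
// Hypothetical helper: buffers partial lines until their newline arrives.
function createNdjsonParser(onObject) {
  let buffer = '';
  return {
    push(chunk) {
      buffer += chunk.toString();
      const lines = buffer.split('\n');
      // The last element is an incomplete line (or '' if the chunk ended
      // on a newline); keep it for the next chunk.
      buffer = lines.pop();
      for (const line of lines) {
        if (line.trim()) onObject(JSON.parse(line));
      }
    },
    flush() {
      // Call on stream end to handle a final line without a trailing newline.
      if (buffer.trim()) onObject(JSON.parse(buffer));
      buffer = '';
    },
  };
}

// Usage: chunk boundaries deliberately fall in the middle of objects.
const streamedTokens = [];
const ndjson = createNdjsonParser((obj) => {
  if (obj.response) streamedTokens.push(obj.response);
});
ndjson.push('{"response":"Hel');
ndjson.push('lo"}\n{"response":" world"}\n{"done":tr');
ndjson.push('ue}');
ndjson.flush();
console.log(streamedTokens.join('')); // prints "Hello world"
```

Unlike the swallow-on-error approach, this version lets a genuinely malformed line surface as an exception, which may or may not be desirable for a CLI.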
|
|
package/dist/src/commands/VSCommand.js
@@ -0,0 +1,157 @@
|
|
|
1
|
+
"use strict";
|
|
2
|
+
var __importDefault = (this && this.__importDefault) || function (mod) {
|
|
3
|
+
return (mod && mod.__esModule) ? mod : { "default": mod };
|
|
4
|
+
};
|
|
5
|
+
Object.defineProperty(exports, "__esModule", { value: true });
|
|
6
|
+
exports.VSCommand = void 0;
|
|
7
|
+
const fs_1 = __importDefault(require("fs"));
|
|
8
|
+
const path_1 = __importDefault(require("path"));
|
|
9
|
+
const chalk_1 = __importDefault(require("chalk"));
|
|
10
|
+
const ora_1 = __importDefault(require("ora"));
|
|
11
|
+
const CrawlerContext_1 = require("../CrawlerContext");
|
|
12
|
+
const OCPStrategy_1 = require("../strategies/OCPStrategy");
|
|
13
|
+
const ACPStrategy_1 = require("../strategies/ACPStrategy");
|
|
14
|
+
const StrategyDetector_1 = require("../utils/StrategyDetector");
|
|
15
|
+
const filename_1 = require("../utils/filename");
|
|
16
|
+
const TocExtractor_1 = require("../utils/TocExtractor");
|
|
17
|
+
const OllamaClient_1 = require("../ai/OllamaClient");
|
|
18
|
+
class VSCommand {
|
|
19
|
+
static async run(baselineUrl, targetUrl, options) {
|
|
20
|
+
console.log(chalk_1.default.blue(`[VS Mode] Starting comparison between:`));
|
|
21
|
+
console.log(chalk_1.default.gray(`Baseline: ${baselineUrl}`));
|
|
22
|
+
console.log(chalk_1.default.gray(`Target: ${targetUrl}`));
|
|
23
|
+
console.log(chalk_1.default.gray(`Model: ${options.model}`));
|
|
24
|
+
// 1. Crawl Baseline
|
|
25
|
+
const baselineFile = await VSCommand.crawlUrl(baselineUrl, 'baseline');
|
|
26
|
+
// 2. Crawl Target
|
|
27
|
+
const targetFile = await VSCommand.crawlUrl(targetUrl, 'target');
|
|
28
|
+
// 3. Extract TOC
|
|
29
|
+
const baselineContent = fs_1.default.readFileSync(baselineFile, 'utf-8');
|
|
30
|
+
const targetContent = fs_1.default.readFileSync(targetFile, 'utf-8');
|
|
31
|
+
const baselineToc = TocExtractor_1.TocExtractor.extract(baselineContent);
|
|
32
|
+
const targetToc = TocExtractor_1.TocExtractor.extract(targetContent);
|
|
33
|
+
console.log(chalk_1.default.green(`[VS Mode] TOC extracted.`));
|
|
34
|
+
// 4. Load Prompts
|
|
35
|
+
const promptPath = path_1.default.resolve(process.cwd(), 'solo-doc-prompt.md');
|
|
36
|
+
let promptContent = '';
|
|
37
|
+
if (fs_1.default.existsSync(promptPath)) {
|
|
38
|
+
promptContent = fs_1.default.readFileSync(promptPath, 'utf-8');
|
|
39
|
+
}
|
|
40
|
+
else {
|
|
41
|
+
// Try to look in package root (assuming we might be running from bin)
|
|
42
|
+
// or just fail gracefully
|
|
43
|
+
const altPath = path_1.default.resolve(__dirname, '../../solo-doc-prompt.md');
|
|
44
|
+
if (fs_1.default.existsSync(altPath)) {
|
|
45
|
+
promptContent = fs_1.default.readFileSync(altPath, 'utf-8');
|
|
46
|
+
}
|
|
47
|
+
else {
|
|
48
|
+
console.warn(chalk_1.default.yellow('[VS Mode] Warning: solo-doc-prompt.md not found. Using default internal prompts.'));
|
|
49
|
+
// Define fallback prompts here if needed, or throw
|
|
50
|
+
// For now, let's throw to ensure user provides the file as requested
|
|
51
|
+
throw new Error('Could not find solo-doc-prompt.md in current directory.');
|
|
52
|
+
}
|
|
53
|
+
}
|
|
54
|
+
const prompts = VSCommand.parsePrompts(promptContent);
|
|
55
|
+
if (prompts.length < 2) {
|
|
56
|
+
throw new Error('Found fewer than 2 prompt templates in solo-doc-prompt.md');
|
|
57
|
+
}
|
|
58
|
+
const client = new OllamaClient_1.OllamaClient({ model: options.model });
|
|
59
|
+
// 5. Step 1: Independent Comparison
|
|
60
|
+
console.log(chalk_1.default.blue(`[VS Mode] Step 1: Analyzing differences...`));
|
|
61
|
+
// Replace placeholders
|
|
62
|
+
// Note: The prompt template has [看附件ocp的文档目录树]
|
|
63
|
+
let prompt1 = prompts[0];
|
|
64
|
+
prompt1 = prompt1.replace('[看附件ocp的文档目录树]', '\n' + baselineToc + '\n');
|
|
65
|
+
prompt1 = prompt1.replace('[看附件alauda的文档目录树]', '\n' + targetToc + '\n');
|
|
66
|
+
const spinner1 = (0, ora_1.default)('Waiting for AI response (this may take a while)...').start();
|
|
67
|
+
let hasStartedOutput1 = false;
|
|
68
|
+
const result1 = await client.generate(prompt1, (token) => {
|
|
69
|
+
if (!hasStartedOutput1) {
|
|
70
|
+
spinner1.stop();
|
|
71
|
+
process.stdout.write(chalk_1.default.cyan('AI Thinking: '));
|
|
72
|
+
hasStartedOutput1 = true;
|
|
73
|
+
}
|
|
74
|
+
process.stdout.write(token);
|
|
75
|
+
});
|
|
76
|
+
if (!hasStartedOutput1)
|
|
77
|
+
spinner1.stop();
|
|
78
|
+
process.stdout.write('\n');
|
|
79
|
+
const result1File = 'vs-result.md';
|
|
80
|
+
fs_1.default.writeFileSync(result1File, result1);
|
|
81
|
+
console.log(chalk_1.default.green(`[VS Mode] Step 1 complete. Saved to ${result1File}`));
|
|
82
|
+
// 6. Step 2: Integration
|
|
83
|
+
console.log(chalk_1.default.blue(`[VS Mode] Step 2: Integrating into documentation tree...`));
|
|
84
|
+
let prompt2 = prompts[1];
|
|
85
|
+
// The prompt says "基于OpenShift文档的目录树" - we should inject it if it's not explicitly placeholder
|
|
86
|
+
// Or we just prepend context.
|
|
87
|
+
// The template: "基于OpenShift文档的目录树,和上面的详细对比总结。"
|
|
88
|
+
// We construct the full prompt by prepending data.
|
|
89
|
+
const context2 = `
|
|
90
|
+
OpenShift文档目录树:
|
|
91
|
+
${baselineToc}
|
|
92
|
+
|
|
93
|
+
详细对比总结:
|
|
94
|
+
${result1}
|
|
95
|
+
|
|
96
|
+
`;
|
|
97
|
+
prompt2 = context2 + prompt2;
|
|
98
|
+
const spinner2 = (0, ora_1.default)('Waiting for AI response (this may take a while)...').start();
|
|
99
|
+
let hasStartedOutput2 = false;
|
|
100
|
+
const result2 = await client.generate(prompt2, (token) => {
|
|
101
|
+
if (!hasStartedOutput2) {
|
|
102
|
+
spinner2.stop();
|
|
103
|
+
process.stdout.write(chalk_1.default.cyan('AI Thinking: '));
|
|
104
|
+
hasStartedOutput2 = true;
|
|
105
|
+
}
|
|
106
|
+
process.stdout.write(token);
|
|
107
|
+
});
|
|
108
|
+
if (!hasStartedOutput2)
|
|
109
|
+
spinner2.stop();
|
|
110
|
+
process.stdout.write('\n');
|
|
111
|
+
const result2File = 'vs-tree.md';
|
|
112
|
+
fs_1.default.writeFileSync(result2File, result2);
|
|
113
|
+
console.log(chalk_1.default.green(`[VS Mode] Step 2 complete. Saved to ${result2File}`));
|
|
114
|
+
console.log(chalk_1.default.green(`[VS Mode] All tasks finished.`));
|
|
115
|
+
}
|
|
116
|
+
static async crawlUrl(url, prefix) {
|
|
117
|
+
// Try to detect type
|
|
118
|
+
let type = StrategyDetector_1.StrategyDetector.detect(url);
|
|
119
|
+
let strategy;
|
|
120
|
+
// Simple logic: if detects OCP, use OCP. Else ACP (more generic).
|
|
121
|
+
if (type === StrategyDetector_1.StrategyType.OCP) {
|
|
122
|
+
strategy = new OCPStrategy_1.OCPStrategy();
|
|
123
|
+
}
|
|
124
|
+
else {
|
|
125
|
+
// Default to ACP which uses Puppeteer
|
|
126
|
+
strategy = new ACPStrategy_1.ACPStrategy();
|
|
127
|
+
}
|
|
128
|
+
const filename = (0, filename_1.generateDefaultFilename)(url, prefix);
|
|
129
|
+
const outputPath = path_1.default.resolve(process.cwd(), filename);
|
|
130
|
+
// Check if file exists
|
|
131
|
+
if (fs_1.default.existsSync(outputPath)) {
|
|
132
|
+
console.log(chalk_1.default.yellow('--------------------------------------------------'));
|
|
133
|
+
console.log(chalk_1.default.yellow(`ℹ File already exists: ${outputPath}`));
|
|
134
|
+
console.log(chalk_1.default.yellow(' Using cached version for comparison.'));
|
|
135
|
+
console.log(chalk_1.default.yellow('--------------------------------------------------'));
|
|
136
|
+
return outputPath;
|
|
137
|
+
}
|
|
138
|
+
console.log(chalk_1.default.blue(`[VS Mode] Crawling ${url} -> ${filename}...`));
|
|
139
|
+
const context = new CrawlerContext_1.CrawlerContext(strategy);
|
|
140
|
+
// Suppress console log from crawler to keep output clean?
|
|
141
|
+
// Or keep it to show progress. Keep it.
|
|
142
|
+
await context.run(url, { output: outputPath, headless: true });
|
|
143
|
+
return outputPath;
|
|
144
|
+
}
|
|
145
|
+
static parsePrompts(content) {
|
|
146
|
+
// Match content inside ``` ... ``` blocks that follow "Prompt模板"
|
|
147
|
+
// Regex: /Prompt模板.*?\n```([\s\S]*?)```/g
|
|
148
|
+
const regex = /Prompt模板.*?\n```([\s\S]*?)```/g;
|
|
149
|
+
const matches = [];
|
|
150
|
+
let match;
|
|
151
|
+
while ((match = regex.exec(content)) !== null) {
|
|
152
|
+
matches.push(match[1].trim());
|
|
153
|
+
}
|
|
154
|
+
return matches;
|
|
155
|
+
}
|
|
156
|
+
}
|
|
157
|
+
exports.VSCommand = VSCommand;
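`parsePrompts` above implies a specific layout for `solo-doc-prompt.md`: a line containing `Prompt模板`, immediately followed by a fenced block. The sketch below reuses the same regular expression on a made-up sample to show what is and is not matched. The sample headings and prompt text are assumptions; the real prompt file is not part of this diff.

```javascript
// Same regular expression as parsePrompts in the diff above.
function parsePromptTemplates(content) {
  const regex = /Prompt模板.*?\n```([\s\S]*?)```/g;
  const matches = [];
  let match;
  while ((match = regex.exec(content)) !== null) {
    matches.push(match[1].trim());
  }
  return matches;
}

const tripleTick = '`'.repeat(3); // avoids a literal fence inside this example

// Made-up sample: two templates, each fence directly under its heading line.
const samplePromptFile = [
  '## Prompt模板 1',
  tripleTick,
  'Compare [看附件ocp的文档目录树] with [看附件alauda的文档目录树].',
  tripleTick,
  '',
  '## Prompt模板 2',
  tripleTick,
  'Merge the differences into one tree.',
  tripleTick,
].join('\n');

// A blank line between the heading and the fence prevents a match,
// because `.` does not match a newline.
const samplePromptWithGap = ['## Prompt模板 1', '', tripleTick, 'text', tripleTick].join('\n');

console.log(parsePromptTemplates(samplePromptFile).length);    // prints 2
console.log(parsePromptTemplates(samplePromptWithGap).length); // prints 0
```

Since `VSCommand.run` throws when fewer than two templates are found, a stray blank line in the prompt file would be enough to stop the comparison.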
|
|
package/dist/src/strategies/OCPStrategy.js
@@ -47,6 +47,23 @@ class OCPStrategy {
|
|
|
47
47
|
this.name = 'OCP (Red Hat OpenShift)';
|
|
48
48
|
}
|
|
49
49
|
async execute(url, options) {
|
|
50
|
+
// Optimisation: Try to convert multi-page URL (/html/) to single-page URL (/html-single/)
|
|
51
|
+
// Example: .../html/building_applications/index -> .../html-single/building_applications/index
|
|
52
|
+
if (url.includes('/html/') && !url.includes('/html-single/')) {
|
|
53
|
+
const singlePageUrl = url.replace('/html/', '/html-single/');
|
|
54
|
+
console.log(chalk_1.default.blue(`[OCP] Detected multi-page URL. Attempting to switch to single-page version for better results...`));
|
|
55
|
+
console.log(chalk_1.default.gray(`Original: ${url}`));
|
|
56
|
+
console.log(chalk_1.default.cyan(`Optimized: ${singlePageUrl}`));
|
|
57
|
+
try {
|
|
58
|
+
// Verify if the single page exists
|
|
59
|
+
await axios_1.default.head(singlePageUrl);
|
|
60
|
+
url = singlePageUrl;
|
|
61
|
+
console.log(chalk_1.default.green(`[OCP] Successfully switched to single-page version.`));
|
|
62
|
+
}
|
|
63
|
+
catch (e) {
|
|
64
|
+
console.log(chalk_1.default.yellow(`[OCP] Single-page version not found. Falling back to original URL.`));
|
|
65
|
+
}
|
|
66
|
+
}
|
|
50
67
|
const spinner = (0, ora_1.default)('Fetching OCP content...').start();
|
|
51
68
|
try {
|
|
52
69
|
// 1. Fetch the single page HTML
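The URL rewrite added to `OCPStrategy.execute` in this hunk can be isolated as a pure string transformation. The sketch below covers only that rewrite; `toSinglePageUrl` is a hypothetical name, and the `HEAD` request the package uses to confirm the single-page URL exists is intentionally left out.

```javascript
// Hypothetical helper: rewrites a multi-page OCP docs URL to its
// single-page form, leaving every other URL unchanged.
function toSinglePageUrl(url) {
  if (url.includes('/html/') && !url.includes('/html-single/')) {
    // String.prototype.replace with a string pattern replaces only
    // the first occurrence, matching the behaviour in the diff.
    return url.replace('/html/', '/html-single/');
  }
  return url;
}

const multiPageUrl =
  'https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html/building_applications/index';
console.log(toSinglePageUrl(multiPageUrl));
```

In the package the rewritten URL is adopted only if the `HEAD` request succeeds; otherwise the original URL is kept.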
|
|
package/dist/src/utils/StrategyDetector.js
@@ -0,0 +1,32 @@
|
|
|
1
|
+
"use strict";
|
|
2
|
+
Object.defineProperty(exports, "__esModule", { value: true });
|
|
3
|
+
exports.StrategyDetector = exports.StrategyType = void 0;
|
|
4
|
+
var StrategyType;
|
|
5
|
+
(function (StrategyType) {
|
|
6
|
+
StrategyType["OCP"] = "ocp";
|
|
7
|
+
StrategyType["ACP"] = "acp";
|
|
8
|
+
StrategyType["UNKNOWN"] = "unknown";
|
|
9
|
+
})(StrategyType || (exports.StrategyType = StrategyType = {}));
|
|
10
|
+
class StrategyDetector {
|
|
11
|
+
static detect(url) {
|
|
12
|
+
try {
|
|
13
|
+
// Add protocol if missing to ensure URL parsing works
|
|
14
|
+
if (!url.startsWith('http://') && !url.startsWith('https://')) {
|
|
15
|
+
url = 'https://' + url;
|
|
16
|
+
}
|
|
17
|
+
const urlObj = new URL(url);
|
|
18
|
+
const hostname = urlObj.hostname;
|
|
19
|
+
if (hostname.includes('redhat.com') || hostname.includes('openshift.com')) {
|
|
20
|
+
return StrategyType.OCP;
|
|
21
|
+
}
|
|
22
|
+
if (hostname.includes('alauda.io') || hostname.includes('alauda.cn')) {
|
|
23
|
+
return StrategyType.ACP;
|
|
24
|
+
}
|
|
25
|
+
return StrategyType.UNKNOWN;
|
|
26
|
+
}
|
|
27
|
+
catch (e) {
|
|
28
|
+
return StrategyType.UNKNOWN;
|
|
29
|
+
}
|
|
30
|
+
}
|
|
31
|
+
}
|
|
32
|
+
exports.StrategyDetector = StrategyDetector;
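`StrategyDetector.detect` matches with `hostname.includes(...)`, which is a substring test: a host such as `redhat.com.example.org` would also be classified as OCP. For comparison, here is a sketch of a stricter suffix-based check. `hostMatches` and `detectStrict` are hypothetical names, this is not the package's behaviour, and unlike the original it does not prepend `https://` to scheme-less input.

```javascript
// Hypothetical helper: true only for the domain itself or its subdomains.
function hostMatches(hostname, domain) {
  return hostname === domain || hostname.endsWith('.' + domain);
}

// Hypothetical stricter variant of the detector shown in the diff.
function detectStrict(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return 'unknown'; // unparsable input
  }
  if (['redhat.com', 'openshift.com'].some((d) => hostMatches(hostname, d))) return 'ocp';
  if (['alauda.io', 'alauda.cn'].some((d) => hostMatches(hostname, d))) return 'acp';
  return 'unknown';
}

console.log(detectStrict('https://docs.redhat.com/en/documentation')); // prints "ocp"
console.log(detectStrict('https://redhat.com.example.org/'));          // prints "unknown"
```

Whether the looser substring match matters depends on how much the input URLs are trusted; for a local CLI it is mostly a correctness detail.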
|
|
package/dist/src/utils/TocExtractor.js
@@ -0,0 +1,33 @@
|
|
|
1
|
+
"use strict";
|
|
2
|
+
Object.defineProperty(exports, "__esModule", { value: true });
|
|
3
|
+
exports.TocExtractor = void 0;
|
|
4
|
+
/**
|
|
5
|
+
* Extracts the Table of Contents (headers) from Markdown content.
|
|
6
|
+
* Returns a string representation of the tree.
|
|
7
|
+
*/
|
|
8
|
+
class TocExtractor {
|
|
9
|
+
static extract(markdown) {
|
|
10
|
+
const lines = markdown.split('\n');
|
|
11
|
+
const tocLines = [];
|
|
12
|
+
let inCodeBlock = false;
|
|
13
|
+
for (const line of lines) {
|
|
14
|
+
// Simple code block detection to avoid headers inside code blocks
|
|
15
|
+
if (line.trim().startsWith('```')) {
|
|
16
|
+
inCodeBlock = !inCodeBlock;
|
|
17
|
+
continue;
|
|
18
|
+
}
|
|
19
|
+
if (inCodeBlock)
|
|
20
|
+
continue;
|
|
21
|
+
// Match headers H1 to H3 only
|
|
22
|
+
// Regex: ^#{1,3}\s
|
|
23
|
+
if (line.match(/^#{1,3}\s/)) {
|
|
24
|
+
tocLines.push(line.trim());
|
|
25
|
+
}
|
|
26
|
+
}
|
|
27
|
+
if (tocLines.length === 0) {
|
|
28
|
+
return "No headers found.";
|
|
29
|
+
}
|
|
30
|
+
return tocLines.join('\n');
|
|
31
|
+
}
|
|
32
|
+
}
|
|
33
|
+
exports.TocExtractor = TocExtractor;
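To make the extraction rules concrete, the sketch below restates the logic of `TocExtractor.extract` as a standalone function and runs it on a small made-up document. `extractToc` and the sample text are illustrative only. It shows the two filters in effect: lines inside fenced code blocks are skipped, and headings deeper than H3 are ignored.

```javascript
// Standalone restatement of the extraction logic from the diff above.
function extractToc(markdown) {
  const tocLines = [];
  let inCodeBlock = false;
  for (const line of markdown.split('\n')) {
    // Toggle on every fence line so '#' comments in code are not headings.
    if (line.trim().startsWith('```')) {
      inCodeBlock = !inCodeBlock;
      continue;
    }
    if (inCodeBlock) continue;
    // One to three '#' followed by whitespace: H1 to H3 only.
    if (/^#{1,3}\s/.test(line)) tocLines.push(line.trim());
  }
  return tocLines.length === 0 ? 'No headers found.' : tocLines.join('\n');
}

const codeFence = '`'.repeat(3); // avoids a literal fence inside this example
const sampleMarkdown = [
  '# Guide',
  codeFence + 'bash',
  '# not a heading',
  codeFence,
  '## Install',
  '#### Too deep',
  '### Options',
].join('\n');

console.log(extractToc(sampleMarkdown));
// prints:
// # Guide
// ## Install
// ### Options
```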
|
|
package/dist/src/utils/filename.js
@@ -0,0 +1,18 @@
|
|
|
1
|
+
"use strict";
|
|
2
|
+
Object.defineProperty(exports, "__esModule", { value: true });
|
|
3
|
+
exports.generateDefaultFilename = void 0;
|
|
4
|
+
const generateDefaultFilename = (urlStr, typePrefix) => {
|
|
5
|
+
try {
|
|
6
|
+
const u = new URL(urlStr);
|
|
7
|
+
// Get the last path segment that isn't 'index.html' or 'index' or empty
|
|
8
|
+
const segments = u.pathname.split('/').filter(s => s && s !== 'index.html' && s !== 'index');
|
|
9
|
+
const lastSegment = segments.length > 0 ? segments[segments.length - 1] : 'docs';
|
|
10
|
+
// Sanitize filename
|
|
11
|
+
const safeName = lastSegment.replace(/[^a-zA-Z0-9-_]/g, '_');
|
|
12
|
+
return `${typePrefix}-${safeName}.md`;
|
|
13
|
+
}
|
|
14
|
+
catch (e) {
|
|
15
|
+
return `${typePrefix}-docs.md`;
|
|
16
|
+
}
|
|
17
|
+
};
|
|
18
|
+
exports.generateDefaultFilename = generateDefaultFilename;
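The naming rule in `generateDefaultFilename` is easiest to see with the URLs from the README. The sketch below copies the function body into a standalone helper, here called `defaultFilename` for illustration, and applies it to those URLs plus two edge cases chosen for this example.

```javascript
// Standalone copy of the naming logic from the diff above.
function defaultFilename(urlStr, typePrefix) {
  try {
    const u = new URL(urlStr);
    // Last path segment that is not empty, 'index', or 'index.html'.
    const segments = u.pathname
      .split('/')
      .filter((s) => s && s !== 'index.html' && s !== 'index');
    const lastSegment = segments.length > 0 ? segments[segments.length - 1] : 'docs';
    // Anything outside letters, digits, '-' and '_' becomes '_'.
    const safeName = lastSegment.replace(/[^a-zA-Z0-9-_]/g, '_');
    return `${typePrefix}-${safeName}.md`;
  } catch (e) {
    return `${typePrefix}-docs.md`; // unparsable URL
  }
}

console.log(defaultFilename(
  'https://docs.alauda.io/container_platform/4.2/developer/building_application/index.html', 'acp'));
// prints "acp-building_application.md"
console.log(defaultFilename(
  'https://docs.redhat.com/en/documentation/openshift_container_platform/4.20/html-single/building_applications/index', 'ocp'));
// prints "ocp-building_applications.md"
console.log(defaultFilename('https://docs.alauda.io/container_platform/4.2/', 'acp'));
// prints "acp-4_2.md"
console.log(defaultFilename('not a url', 'acp'));
// prints "acp-docs.md"
```

Because only the last segment is used, two different documents that end in the same segment would map to the same filename; in VS mode the `baseline` and `target` prefixes keep the two inputs apart.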
|
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "solo-doc",
|
|
3
|
-
"version": "0.
|
|
3
|
+
"version": "0.1.2",
|
|
4
4
|
"main": "dist/bin/solo-doc.js",
|
|
5
5
|
"bin": {
|
|
6
6
|
"solo-doc": "dist/bin/solo-doc.js"
|
|
@@ -13,6 +13,7 @@
|
|
|
13
13
|
"clean": "rm -rf dist",
|
|
14
14
|
"build": "tsc",
|
|
15
15
|
"prepublishOnly": "npm run clean && npm run build",
|
|
16
|
+
"release": "npm run clean && npm run build && npm version patch --force && npm publish --access=public",
|
|
16
17
|
"start": "node dist/bin/solo-doc.js",
|
|
17
18
|
"dev": "ts-node bin/solo-doc.ts"
|
|
18
19
|
},
|