docmk 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/skills/pdf/SKILL.md +89 -0
- package/.claude/skills/web-scraping/SKILL.md +78 -0
- package/CLAUDE.md +90 -0
- package/bin/docmk.js +3 -0
- package/dist/index.d.ts +1 -0
- package/dist/index.js +636 -0
- package/dist/index.js.map +1 -0
- package/final-site/assets/main-B4orIFxK.css +1 -0
- package/final-site/assets/main-CSoKXua6.js +25 -0
- package/final-site/favicon.svg +4 -0
- package/final-site/index.html +26 -0
- package/final-site/robots.txt +4 -0
- package/final-site/sitemap.xml +14 -0
- package/my-docs/api/README.md +152 -0
- package/my-docs/api/advanced.md +260 -0
- package/my-docs/getting-started/README.md +24 -0
- package/my-docs/tutorials/README.md +272 -0
- package/my-docs/tutorials/customization.md +492 -0
- package/package.json +59 -0
- package/postcss.config.js +6 -0
- package/site/assets/main-BZUsYUCF.css +1 -0
- package/site/assets/main-q6laQtCD.js +114 -0
- package/site/favicon.svg +4 -0
- package/site/index.html +23 -0
- package/site/robots.txt +4 -0
- package/site/sitemap.xml +34 -0
- package/site-output/assets/main-B4orIFxK.css +1 -0
- package/site-output/assets/main-CSoKXua6.js +25 -0
- package/site-output/favicon.svg +4 -0
- package/site-output/index.html +26 -0
- package/site-output/robots.txt +4 -0
- package/site-output/sitemap.xml +14 -0
- package/src/builder/index.ts +189 -0
- package/src/builder/vite-dev.ts +117 -0
- package/src/cli/commands/build.ts +48 -0
- package/src/cli/commands/dev.ts +53 -0
- package/src/cli/commands/preview.ts +57 -0
- package/src/cli/index.ts +42 -0
- package/src/client/App.vue +15 -0
- package/src/client/components/SearchBox.vue +204 -0
- package/src/client/components/Sidebar.vue +18 -0
- package/src/client/components/SidebarItem.vue +108 -0
- package/src/client/index.html +21 -0
- package/src/client/layouts/AppLayout.vue +99 -0
- package/src/client/lib/utils.ts +6 -0
- package/src/client/main.ts +42 -0
- package/src/client/pages/Home.vue +279 -0
- package/src/client/pages/SkillPage.vue +565 -0
- package/src/client/router.ts +16 -0
- package/src/client/styles/global.css +92 -0
- package/src/client/utils/routes.ts +69 -0
- package/src/parser/index.ts +253 -0
- package/src/scanner/index.ts +127 -0
- package/src/types/index.ts +45 -0
- package/tailwind.config.js +65 -0
- package/test-build/assets/main-C2ARPC0e.css +1 -0
- package/test-build/assets/main-CHIQpV3B.js +25 -0
- package/test-build/favicon.svg +4 -0
- package/test-build/index.html +47 -0
- package/test-build/robots.txt +4 -0
- package/test-build/sitemap.xml +19 -0
- package/test-dist/assets/main-B4orIFxK.css +1 -0
- package/test-dist/assets/main-CSoKXua6.js +25 -0
- package/test-dist/favicon.svg +4 -0
- package/test-dist/index.html +26 -0
- package/test-dist/robots.txt +4 -0
- package/test-dist/sitemap.xml +14 -0
- package/tsconfig.json +30 -0
- package/tsup.config.ts +13 -0
- package/vite.config.ts +21 -0
package/.claude/skills/pdf/SKILL.md
ADDED
@@ -0,0 +1,89 @@
+---
+title: PDF Processing
+description: Comprehensive guide to working with PDF files including extraction, manipulation, and generation
+tags: ["pdf", "documents", "data-extraction"]
+---
+
+# PDF Processing
+
+This skill covers various aspects of working with PDF documents programmatically.
+
+## Overview
+
+PDF (Portable Document Format) is a widely used file format for documents. This skill includes:
+
+- Text extraction from PDFs
+- PDF generation and manipulation
+- Form handling
+- Metadata processing
+- Document conversion
+
+## Key Libraries and Tools
+
+### Python
+- **PyPDF2/PyPDF4** - Basic PDF operations
+- **pdfplumber** - Text extraction with layout preservation
+- **ReportLab** - PDF generation
+- **pdftk** - Command-line PDF toolkit
+
+### JavaScript/Node.js
+- **pdf-lib** - Create and modify PDF documents
+- **pdf-parse** - Simple PDF parsing
+- **puppeteer** - Generate PDFs from web content
+
+## Common Use Cases
+
+### Text Extraction
+Extract text content from PDF files while preserving formatting and structure.
+
+### Document Generation
+Create PDFs programmatically from data, templates, or web content.
+
+### Form Processing
+Handle PDF forms, extract form data, and fill forms programmatically.
+
+## Best Practices
+
+1. **Memory Management** - Large PDFs can consume significant memory
+2. **Error Handling** - PDFs can be corrupted or password-protected
+3. **Performance** - Consider streaming for large documents
+4. **Security** - Be cautious with user-uploaded PDFs
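Reviewer sketch (not part of the package): the "Error Handling" and "Security" items in the PDF skill's best-practices list can be illustrated with a cheap pre-check that runs before an uploaded file reaches any PDF library. The names `looks_like_pdf` and `MAX_PDF_BYTES` are hypothetical, and the 20 MB cap is an arbitrary assumption. The check is deliberately strict about the `%PDF-` header sitting at offset 0, which some lenient readers do not require.

```python
import io

# Assumed upload cap; pick a limit that fits the deployment.
MAX_PDF_BYTES = 20 * 1024 * 1024


def looks_like_pdf(stream, max_bytes=MAX_PDF_BYTES):
    """Return True if the binary stream is non-empty, within the size cap,
    and starts with the '%PDF-' header.

    This is only a first-line filter: it does not detect corruption,
    encryption, or hostile content inside a structurally valid file.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()
    stream.seek(0)
    if size == 0 or size > max_bytes:
        return False
    header = stream.read(5)
    stream.seek(0)  # rewind so the real parser can read from the start
    return header == b"%PDF-"


print(looks_like_pdf(io.BytesIO(b"%PDF-1.7\n...")))
print(looks_like_pdf(io.BytesIO(b"<html>not a pdf</html>")))
```

A file that passes this filter still needs the usual `try`/`except` around the actual parsing call, since a valid header says nothing about the rest of the file.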
+
+## Examples
+
+### Basic Text Extraction (Python)
+```python
+import PyPDF2
+
+with open('document.pdf', 'rb') as file:
+    reader = PyPDF2.PdfReader(file)
+    text = ""
+    for page in reader.pages:
+        text += page.extract_text()
+    print(text)
+```
+
+### PDF Generation (JavaScript)
+```javascript
+import { PDFDocument, rgb } from 'pdf-lib'
+import fs from 'fs'
+
+const pdfDoc = await PDFDocument.create()
+const page = pdfDoc.addPage()
+
+page.drawText('Hello, PDF!', {
+  x: 50,
+  y: 750,
+  size: 30,
+  color: rgb(0, 0, 0),
+})
+
+const pdfBytes = await pdfDoc.save()
+fs.writeFileSync('output.pdf', pdfBytes)
+```
+
+## Troubleshooting
+
+- **Encoding Issues** - Some PDFs may have encoding problems
+- **Layout Preservation** - Complex layouts may not extract cleanly
+- **Performance** - Large PDFs may require optimization
package/.claude/skills/web-scraping/SKILL.md
ADDED
@@ -0,0 +1,78 @@
+---
+title: Web Scraping
+description: Techniques and tools for extracting data from websites
+tags: ["scraping", "data-extraction", "automation"]
+---
+
+# Web Scraping
+
+Web scraping is the process of extracting data from websites programmatically.
+
+## Overview
+
+Web scraping involves:
+
+- Fetching web pages
+- Parsing HTML content
+- Extracting specific data
+- Handling dynamic content
+- Managing rate limits and ethics
+
+## Tools and Libraries
+
+### Python
+- **BeautifulSoup** - HTML parsing
+- **Scrapy** - Full-featured scraping framework
+- **Selenium** - Browser automation
+- **requests** - HTTP library
+
+### JavaScript/Node.js
+- **Puppeteer** - Chrome automation
+- **Playwright** - Multi-browser automation
+- **Cheerio** - Server-side jQuery
+- **axios** - HTTP client
+
+## Key Concepts
+
+### Respect robots.txt
+Always check the robots.txt file of websites before scraping.
+
+### Rate Limiting
+Implement delays between requests to avoid overwhelming servers.
+
+### User Agents
+Rotate user agents to appear more like regular browsers.
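Reviewer sketch (not part of the package): the "Respect robots.txt" and "Rate Limiting" concepts in the web-scraping skill can be combined using only Python's standard library. The sample robots.txt text, the `example-bot` user agent, and the helper names `build_policy` and `polite_urls` are all assumptions made for illustration. A real scraper would download the site's own robots.txt and perform an HTTP request for each URL that is yielded.

```python
import time
from urllib.robotparser import RobotFileParser

# Example robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_policy(robots_txt):
    """Parse robots.txt text into a RobotFileParser without network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(parser, user_agent, urls, default_delay=1.0, sleep=time.sleep):
    """Yield only URLs allowed by robots.txt, pausing between consecutive yields."""
    # Prefer the site's Crawl-delay; fall back to our own default if absent.
    delay = parser.crawl_delay(user_agent) or default_delay
    first = True
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt, skip without waiting
        if not first:
            sleep(delay)
        first = False
        yield url


policy = build_policy(ROBOTS_TXT)
candidates = ["https://example.com/docs", "https://example.com/private/admin"]
# sleep is stubbed out here so the demo finishes instantly
print(list(polite_urls(policy, "example-bot", candidates, sleep=lambda s: None)))
```

Passing `sleep` as a parameter keeps the delay logic testable without actually waiting.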
+
+## Examples
+
+### Basic Scraping (Python)
+```python
+import requests
+from bs4 import BeautifulSoup
+
+response = requests.get('https://example.com')
+soup = BeautifulSoup(response.content, 'html.parser')
+
+titles = soup.find_all('h2', class_='title')
+for title in titles:
+    print(title.text)
+```
+
+### Dynamic Content (JavaScript)
+```javascript
+const puppeteer = require('puppeteer');
+
+(async () => {
+  const browser = await puppeteer.launch();
+  const page = await browser.newPage();
+
+  await page.goto('https://example.com');
+  await page.waitForSelector('.dynamic-content');
+
+  const content = await page.$eval('.dynamic-content',
+    el => el.textContent);
+
+  console.log(content);
+  await browser.close();
+})();
+```
package/CLAUDE.md
ADDED
@@ -0,0 +1,90 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+DocGen is a documentation-generation CLI tool that scans any directory and automatically generates a static documentation site. It supports Markdown rendering, full-text search, code highlighting, and more.
+
+## Common Commands
+
+```bash
+# Install dependencies
+npm install
+
+# Build the CLI (output to dist/)
+npm run build
+
+# Type check
+npm run typecheck
+
+# Lint
+npm run lint
+
+# Run the CLI in development mode (executed directly with tsx)
+npm run dev
+
+# Start the documentation dev server with the CLI
+node dist/index.js dev --dir ./my-docs --port 3000
+
+# Build the static site
+node dist/index.js build --dir ./my-docs --output ./dist
+
+# Preview the build output
+node dist/index.js preview --output ./dist --port 4173
+```
+
+## Architecture
+
+```
+src/
+├── cli/                  # CLI entry point and command implementations
+│   ├── index.ts          # Main entry; defines commands with Commander.js
+│   └── commands/         # dev/build/preview commands
+├── scanner/              # Directory scanner; recursively scans for .md files
+├── parser/               # Markdown parsing with gray-matter + markdown-it + shiki
+├── builder/              # Vite build logic
+│   ├── index.ts          # Production build; generates sitemap/robots.txt
+│   └── vite-dev.ts       # Dev server with file watching and hot reload
+├── client/               # Vue 3 front-end SPA
+│   ├── pages/            # Home (home-page stats) and SkillPage (document rendering)
+│   ├── components/       # Sidebar, SearchBox, etc.
+│   └── router.ts         # Route config; catch-all matches document paths
+└── types/                # TypeScript type definitions
+```
+
+## Core Data Flow
+
+1. **Scanner** (`src/scanner/index.ts`): Scans the source directory, recursively collects `.md` files, and builds a `SkillDirectory[]` tree
+2. **Parser** (`src/parser/index.ts`): Parses frontmatter, renders Markdown to HTML, extracts the TOC, and applies shiki code highlighting
+3. **Builder**:
+   - Development mode: injects the config through the `/api/config` endpoint and uses chokidar to watch for file changes and trigger hot reload
+   - Production mode: base64-encodes the config and injects it into the HTML `<head>`, and generates sitemap.xml and robots.txt
+4. **Client**: The Vue 3 SPA reads the config from `globalThis.__DOCGEN_CONFIG__` or `/api/config` and renders the documents
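Reviewer sketch (not part of the package): the production-mode step in the data flow, where the config is base64-encoded and injected into the HTML `<head>`, might look roughly like the following. The exact markup the builder emits is not shown in this diff, so the `<script>` shape, the choice to store the encoded string directly on `globalThis.__DOCGEN_CONFIG__`, and the helper names `inject_config` and `read_config` are assumptions. One plausible reason for the encoding is that a config containing `</script>` or quotes cannot break the surrounding HTML.

```python
import base64
import json
import re


def inject_config(html, config):
    """Embed config as base64-encoded JSON in a <script> just before </head>."""
    raw = json.dumps(config, ensure_ascii=False).encode("utf-8")
    payload = base64.b64encode(raw).decode("ascii")
    # Assumed markup; the real builder's output format is not shown in the diff.
    script = f'<script>globalThis.__DOCGEN_CONFIG__ = "{payload}";</script>'
    return html.replace("</head>", script + "</head>", 1)


def read_config(html):
    """Recover the config from HTML produced by inject_config, or None if absent."""
    match = re.search(r'__DOCGEN_CONFIG__ = "([A-Za-z0-9+/=]*)"', html)
    if match is None:
        return None
    return json.loads(base64.b64decode(match.group(1)).decode("utf-8"))


page = "<html><head><title>Docs</title></head><body></body></html>"
config = {"navigation": [], "files": [{"path": "/api", "title": "</script> & \"quotes\""}]}
built = inject_config(page, config)
print(read_config(built) == config)
```

The round trip through `read_config` stands in for what the client would do in the browser after decoding the payload.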
+
+## Key Types
+
+- `SkillDirectory`: Directory node, containing `children` and an optional `skillFile` (SKILL.md)
+- `SkillFile`: File node, containing `content`, `frontmatter`, and `lastModified`
+- `DocGenConfig`: Complete site configuration, containing `navigation`, `files`, and `directories`
+
+## Tech Stack
+
+- **CLI**: Commander.js, bundled with tsup
+- **Front end**: Vue 3 + Vue Router + Vite + Tailwind CSS
+- **Parsing**: gray-matter (frontmatter) + markdown-it (rendering) + shiki (code highlighting)
+- **File watching**: chokidar
+
+## Documentation Directory Conventions
+
+```
+docs/
+├── category/
+│   ├── SKILL.md          # Main document (optional; serves as the directory's entry point)
+│   ├── advanced.md       # Other documents
+│   └── sub-category/     # Subdirectory
+└── another/
+    └── SKILL.md
+```
+
+Routing rules: a `SKILL.md` file maps to its directory path (e.g. `/api/SKILL.md` → `/api`); other `.md` files keep their file name (e.g. `/api/advanced.md` → `/api/advanced`).
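Reviewer sketch (not part of the package): the routing rule at the end of CLAUDE.md can be expressed as a small pure function. The name `route_for`, the case-sensitive match on `SKILL.md`, and mapping a top-level `SKILL.md` to `/` are assumptions; the package's own implementation in `src/client/utils/routes.ts` is not shown in this excerpt and may differ.

```python
import posixpath


def route_for(md_path):
    """Map a document path such as '/api/SKILL.md' to its route.

    SKILL.md collapses to its directory; any other .md file keeps its
    file name without the extension.
    """
    directory, filename = posixpath.split(md_path)
    if filename == "SKILL.md":
        # Assumption: a SKILL.md at the root maps to the site root.
        return directory or "/"
    stem, ext = posixpath.splitext(filename)
    if ext != ".md":
        raise ValueError(f"not a Markdown document: {md_path}")
    return posixpath.join(directory, stem)


for path in ["/api/SKILL.md", "/api/advanced.md", "/SKILL.md"]:
    print(path, "->", route_for(path))
```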
package/bin/docmk.js
ADDED
package/dist/index.d.ts
ADDED
@@ -0,0 +1 @@
+#!/usr/bin/env node